imclab/california-education-data

 
 

Public CDE Data

CDE is the California Department of Education.

Usage

First install Node.js, then run:

git clone https://github.com/oztu/california-education-data.git
cd california-education-data
node ./scripts/setup/unzipCsvs.js

The data will be available in data/csv.

Updating the data sets

Here is how the data was generated (and how to update it):

Academic Performance Index (API)

To update the Academic Performance Index data, run npm install followed by node ./scripts/crawling/getAPIFiles.js. The script crawls the data automatically and puts it in the tmp folder.

The files are retrieved from http://www.cde.ca.gov/ta/ac/ap/apidatafiles.asp. The data files are converted to CSV, and the layout HTML pages are converted to JSON files. For convenience, a header line containing the field names from the layout file is added to each CSV.
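As a rough illustration of the header step, here is a minimal sketch of prepending a header line built from a layout. It is not the code in getAPIFiles.js. The layout shape (an array of objects with a name property) and the field names are hypothetical.

```javascript
// Quote a value only when it contains a comma, quote, or line break.
function csvEscape(value) {
  const s = String(value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Build a header line from the layout's field names and put it in front
// of the CSV body. `layout` is assumed to look like [{ name: '...' }, ...].
function addHeader(layout, csvBody) {
  const header = layout.map((field) => csvEscape(field.name)).join(',');
  return header + '\n' + csvBody;
}

// Hypothetical layout and data, for illustration only.
const exampleLayout = [{ name: 'school_code' }, { name: 'school_name' }];
console.log(addHeader(exampleLayout, '123,Example School\n'));
```

Keeping the field names in the first row means most CSV tools can address columns by name without consulting the layout JSON.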

The scripts in scripts/mongo can be used to load this data into a MongoDB instance and clean it up a bit for running map/reduce.

Annual Financial Data (SACS and ALT)

This data is a bit trickier to update since it is distributed as self-extracting .EXE files here: http://www.cde.ca.gov/ds/fd/fd/. I downloaded the EXEs, extracted them, then ran the resulting .mdb files through mdbToCsv to generate CSV files. Finally, I ran node ./scripts/setup/zipCsvs.js to gzip them in order to stay within GitHub's limit on large file sizes.
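For readers curious what the gzip step involves, the following is a minimal sketch using Node's built-in zlib module. It is an assumption about the general approach, not the contents of zipCsvs.js or unzipCsvs.js, and the helper names gzipCsv and gunzipCsv are hypothetical.

```javascript
const zlib = require('zlib');

// Compress CSV text into a gzip Buffer (what you would write to a .gz file).
function gzipCsv(csvText) {
  return zlib.gzipSync(Buffer.from(csvText, 'utf8'));
}

// Decompress a gzip Buffer back into the original CSV text.
function gunzipCsv(gzBuffer) {
  return zlib.gunzipSync(gzBuffer).toString('utf8');
}

// Round trip on a small made-up CSV.
const sample = 'a,b\n1,2\n';
console.log(gunzipCsv(gzipCsv(sample)) === sample);
```

For very large files, the streaming variants (zlib.createGzip and zlib.createGunzip piped between file streams) avoid holding the whole file in memory.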

To create a sqlite3 database from the CSVs, first unzip them using node ./scripts/setup/unzipCsvs.js, then execute the Node.js scripts whose names start with "load" in ./scripts/sqlite.
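If you would rather not run each load script by hand, something like the sketch below could find and run them. It assumes the scripts end in .js and that running them in alphabetical order is acceptable, neither of which is stated here. The helper names are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');
const { execFileSync } = require('child_process');

// Return the paths of files in `dir` whose names start with "load"
// and end with ".js", sorted alphabetically (an assumed order).
function findLoadScripts(dir) {
  return fs.readdirSync(dir)
    .filter((name) => name.startsWith('load') && name.endsWith('.js'))
    .sort()
    .map((name) => path.join(dir, name));
}

// Run each script with node, one after another, showing its output.
function runLoadScripts(dir) {
  for (const script of findLoadScripts(dir)) {
    execFileSync('node', [script], { stdio: 'inherit' });
  }
}

// Usage, from the repository root:
// runLoadScripts('./scripts/sqlite');
```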

About

Data crawled from California Department of Education (http://www.cde.ca.gov)
