# Texas school accountability data
This notebook has the scripts needed to cut, filter and analyze school accountability data from the Texas Education Agency.

### Download the data
Accountability data for 2013-2016 are in the `data` folder inside this repo, but here's how you'd get campus-level summary data files if they weren't there already. I'll use the 2015 data file as an example.

First, go to the [accountability data portal](https://rptsvr1.tea.texas.gov/perfreport/account/2015/) and click the "Download data" link on the left rail.

<img src="img/1-portal-page.gif" style="border: 1px solid #ccc; margin: 20px auto 40px auto;" />

On the resulting page, click the "Campus-level Data" radio button, then scroll down and click "Continue."

<img src="img/2-data-page.gif" style="border: 1px solid #ccc; margin: 20px auto 40px auto;" />

Finally, on the data download page, select "Tab delimited" from the select menu. Click the "Select all" button. Then click the "Download" button.

<img src="img/3-download-page.gif" style="border: 1px solid #ccc; margin: 20px auto 40px auto;" />

I renamed this file `2015-tx-school-acc-data.dat` and dropped it in the `data` folder, then repeated the process for the other years.

Also, I snagged the file layouts ([e.g.](https://rptsvr1.tea.texas.gov/perfreport/account/2015/download/camprate.html)) and saved them as .tsv files in the `/data` directory, though in practice they didn't always match the actual data. I used them as a rough guide and consulted a sample of published summary reports [like this one](https://rptsvr1.tea.texas.gov/perfreport/account/2013/static/summary/campus/c227901170.pdf) to check expected values against the actual data.

Also also, I grabbed a .csv file with [spatial data of every school in Texas](http://schoolsdata.tea-texas.opendata.arcgis.com/datasets/059432fd0dcb4a208974c235e837c94f_0) and saved it as `school_locations.csv`. (TODO: grab the [districts shapefile](http://schoolsdata.tea-texas.opendata.arcgis.com/datasets/e115fed14c0f4ca5b942dc3323626b1c_0), too.)

### Cut and stack
Using `awk`, I cut out the columns I needed from each file and appended them to `data/stacked_data.csv`. (The file layouts are slightly different each year, and remember that the column layouts provided by the state aren't always correct.) Then I joined a few columns of location data.

In [1]:
%%bash
# truncate (or create) file
:> data/stacked_data.csv

# write headers
echo "campus_id,campus_name,campus_population,campus_pct_disadvantaged,campus_pct_english_language_learners,district_name,index1_target_score,index1_score,index2_target_score,index2_score,index3_target_score,index3_score,index4_target_score,index4_score,distinction_reading,distinction_math,distinction_student_progress,distinction_science,distinction_social_studies,distinction_close_performance_gap,distinction_postsecondary_readiness,jjaep,daep,year,overall_rating,updated_rating" >> data/stacked_data.csv

# 2013 data
awk -F '\t' '{OFS=","; if (NR!=1) {print $1,$6,$44,$46,$48,$51,$20,$19,$25,$24,$30,$29,$35,$34,$5,$3,$4,".",".",".",".",$12,$11,"2013",$49,$50;}}' data/2013-tx-school-acc-data.dat >> data/stacked_data.csv

# 2014 data
awk -F '\t' '{OFS=","; if (NR!=1) {print $1,$9,$49,$51,$53,$56,$23,$22,$28,$27,$33,$32,$39,$37,$6,$3,$5,$7,$8,$2,$4,$15,$13,"2014",$54,$55;}}' data/2014-tx-school-acc-data.dat >> data/stacked_data.csv

# 2015 data
awk -F '\t' '{OFS=","; if (NR!=1) {print $1,$9,$49,$51,$53,$56,$23,$22,$28,$27,$33,$32,$38,$37,$6,$3,$5,$7,$8,$2,$4,$15,$13,"2015",$54,$55;}}' data/2015-tx-school-acc-data.dat >> data/stacked_data.csv

# join to location data on campus ID
csvcut -c 9,7,2,1,15 data/school_locations.csv | csvjoin -c "campus_id,CAMPUS" data/stacked_data.csv - > data/stacked_data_with_coordinates.csv

# check for ish
csvclean -n data/stacked_data_with_coordinates.csv

No errors.


In [None]:
"""
Here is a Python dict with a 1-indexed column layout for each year of data, for whoever deals
with this next year.
"""

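A minimal sketch of that dict, transcribed from the `awk` calls in the cut-and-stack cell rather than from the state's layout files. The names `FIELDS`, `COLUMNS`, `LAYOUTS` and `cut_row` are hypothetical, `None` marks a field the 2013 cut fills with ".", and `year` is tacked on at the end of each row instead of in its header position.

```python
# Output field names, in header order, minus "year" (which is a literal, not a column).
FIELDS = (
    "campus_id", "campus_name", "campus_population", "campus_pct_disadvantaged",
    "campus_pct_english_language_learners", "district_name",
    "index1_target_score", "index1_score", "index2_target_score", "index2_score",
    "index3_target_score", "index3_score", "index4_target_score", "index4_score",
    "distinction_reading", "distinction_math", "distinction_student_progress",
    "distinction_science", "distinction_social_studies",
    "distinction_close_performance_gap", "distinction_postsecondary_readiness",
    "jjaep", "daep", "overall_rating", "updated_rating",
)

# 1-indexed source columns, copied as-is from the awk print lists.
# Note that 2014 reads index4_target_score from column 39 and 2015 from column 38.
COLUMNS = {
    2013: (1, 6, 44, 46, 48, 51, 20, 19, 25, 24, 30, 29, 35, 34,
           5, 3, 4, None, None, None, None, 12, 11, 49, 50),
    2014: (1, 9, 49, 51, 53, 56, 23, 22, 28, 27, 33, 32, 39, 37,
           6, 3, 5, 7, 8, 2, 4, 15, 13, 54, 55),
    2015: (1, 9, 49, 51, 53, 56, 23, 22, 28, 27, 33, 32, 38, 37,
           6, 3, 5, 7, 8, 2, 4, 15, 13, 54, 55),
}

LAYOUTS = {year: dict(zip(FIELDS, cols)) for year, cols in COLUMNS.items()}


def cut_row(fields, year):
    """Pick one year's columns out of an already-split tab-delimited row."""
    row = {
        name: (fields[col - 1] if col is not None else ".")
        for name, col in LAYOUTS[year].items()
    }
    row["year"] = str(year)
    return row
```

Next year's file would then need one new entry in `COLUMNS`, checked against a few published summary reports before trusting it.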



### Group data by campus
Hey, I've got some Python analysis to do. Should I use `numpy`? `pandas`?

<img src="img/achewood.png" />

Ha ha OK guys, settle down, I'll use `agate`. First up, I need to create a table and group by the campus ID.

In [None]:
import agate

school_ratings = agate.Table.from_csv('data/stacked_data_with_coordinates.csv')
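The grouping step itself isn't in the cell above. With agate it would presumably be a `group_by` call on `campus_id`; the plain-Python sketch below only shows the shape of the result (one list of yearly rows per campus) without depending on the library. The function name `group_by_campus` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows with invented values; the real input would be
# data/stacked_data_with_coordinates.csv.
SAMPLE = """campus_id,campus_name,year
000000001,EXAMPLE H S,2013
000000001,EXAMPLE H S,2014
000000002,SAMPLE EL,2013
"""


def group_by_campus(lines):
    """Collect rows into a dict keyed by campus_id, preserving file order."""
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        groups[row["campus_id"]].append(row)
    return dict(groups)


by_campus = group_by_campus(io.StringIO(SAMPLE))
```

Each value in `by_campus` is that campus's rows across years, which is the unit the later analysis would work on.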

### Process the data
* Run the campus names through a couple of text transforms to standardize names.
* Convert 0/1, Y/N variables into booleans.
* Convert "." and " " values (the state's flags for n/a) into `None`.

In [None]:
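import re

# Order matters: "JR H S" has to be tried before the bare "H S" pattern,
# and \b keeps "EL" from matching the tail end of a name like "DANIEL".
TEXT_TRANSFORMS = (
    (r"\bJR H S$", "Junior High School"),
    (r"\bH S$", "High School"),
    (r"\bMIDDLE$", "Middle School"),
    (r"\bINT$", "Intermediate"),
    (r"\bEL$", "Elementary"),
)


def standardize_name(name):
    """Apply each transform in turn to a single campus name."""
    name = name.strip()
    for pattern, replacement in TEXT_TRANSFORMS:
        name = re.sub(pattern, replacement, name, flags=re.IGNORECASE)
    return name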
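The last two bullets in the list above don't have code yet. Here is a minimal sketch in plain Python, assuming the flags arrive as strings exactly as the list describes them. The helper names and the flag sets are hypothetical, and which columns actually hold 0/1 or Y/N values would need checking against the data.

```python
# Assumed flag values, taken from the bullet list; adjust after inspecting the data.
NULL_FLAGS = {".", ""}       # " " becomes "" once stripped
TRUE_FLAGS = {"1", "Y"}
FALSE_FLAGS = {"0", "N"}


def clean_value(value):
    """Return None for the state's n/a flags, otherwise the value unchanged."""
    if value is None or value.strip() in NULL_FLAGS:
        return None
    return value


def to_bool(value):
    """Map 0/1 and Y/N flags to booleans, passing n/a through as None."""
    value = clean_value(value)
    if value is None:
        return None
    flag = value.strip().upper()
    if flag in TRUE_FLAGS:
        return True
    if flag in FALSE_FLAGS:
        return False
    # Fail loudly so an unexpected code doesn't get silently coerced.
    raise ValueError(f"unexpected flag value: {value!r}")
```

Raising on unknown values is a deliberate choice here, since the state's layouts don't always match the files and a surprise code is worth seeing.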


### Chop out local school data

In [None]:
local_districts = ["Austin", "Round Rock", "Leander", "Pflugerville", "Hays", "Del Valle", "Georgetown", "Bastrop", "Manor", "Lake Travis", "Eanes", "San Marcos", "Hutto", "Dripping Springs"]
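One way to use that list is sketched below, under a loud assumption: that `district_name` values in the stacked file look like "AUSTIN ISD" or "HAYS CISD". The function name `build_local_matcher` is hypothetical. Matching the whole name instead of a prefix keeps "Austin" from also catching a district such as "AUSTINWELL-TIVOLI ISD".

```python
import re


def build_local_matcher(districts):
    """Return a function that tests whether a row belongs to one of the districts.

    Assumes district names are the short name followed by "ISD" or "CISD".
    """
    names = "|".join(re.escape(name) for name in districts)
    pattern = re.compile(rf"^(?:{names}) C?ISD$", re.IGNORECASE)

    def is_local(row):
        return bool(pattern.match(row["district_name"].strip()))

    return is_local
```

In the notebook this would be called as `is_local = build_local_matcher(local_districts)` and then used to filter rows.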