# NERDS Data Analysis with Python and Pandas

```python
import pandas as pd
import numpy as np
```

This project is intended to gauge whether there are funding discrepancies among Bexar County Independent School Districts in per-pupil spending relative to economic disadvantage rates. If a discrepancy is detected (a low correlation coefficient, threshold TBD) or there are outlier schools on the resulting scatterplot, we can work with an education reporter to investigate further. <br>
The data come from two sources: 
- The NERD$ data come from Georgetown University's [National Education Resource Database on School Spending Organization](https://georgetown.app.box.com/s/1dknmu4bjltrehzdygh63xnzebcki4ni/file/1060770984817) and represent the 2018-2019 academic year. <br>
- The economic disadvantage data come from [SquareMeals.org](https://data.texas.gov/dataset/School-Nutrition-Programs-Contact-Information-and-/jezb-2499) and also represent the 2018-2019 academic year.<br>
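The two checks in the project description (how strongly spending tracks disadvantage, and which schools sit far off the trend) can be sketched as below. This is a minimal sketch, not the final method: it assumes `pp_stloc_raw_TX` is per-pupil state and local spending and that `SiteISP` (which we take to be the identified student percentage from the school-lunch file) is the disadvantage measure; both column names appear in the merged data further down. The 2-standard-deviation residual cutoff and the toy numbers are placeholders, not findings.

```python
import numpy as np
import pandas as pd


def flag_spending_outliers(df, spend_col="pp_stloc_raw_TX", ed_col="SiteISP", z=2.0):
    """Return (Pearson r, rows far from the fitted spending-vs-disadvantage line).

    z is the residual cutoff in standard deviations; 2.0 is a placeholder.
    """
    clean = df[[spend_col, ed_col]].dropna()
    r = clean[spend_col].corr(clean[ed_col])  # Series.corr defaults to Pearson
    # Least-squares line predicting spending from the disadvantage rate
    slope, intercept = np.polyfit(clean[ed_col], clean[spend_col], 1)
    residuals = clean[spend_col] - (slope * clean[ed_col] + intercept)
    # Keep rows whose residual is more than z sample standard deviations from zero
    far = residuals[residuals.abs() > z * residuals.std()]
    return r, df.loc[far.index]


# Made-up numbers: nine schools on a straight line plus one (J) well above it
toy = pd.DataFrame({
    "schoolname": list("ABCDEFGHIJ"),
    "SiteISP": [10, 20, 30, 40, 50, 60, 70, 80, 90, 50],
    "pp_stloc_raw_TX": [7200, 7400, 7600, 7800, 8000, 8200, 8400, 8600, 8800, 12000],
})
r, outliers = flag_spending_outliers(toy)
print(round(r, 2))                   # 0.38
print(list(outliers["schoolname"]))  # ['J']
```

Without school J the toy points lie on a perfect line (r = 1.0), so the example also shows how a single outlier can pull the coefficient down, which is why the two checks are worth running together.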

## Data Import

```python
nerds_full = pd.read_csv("/Users/wratko/Documents/projects/tx-school_funding/jupyter-notebooks/essential-data/tx-nerds-2018_19.csv")
ed_full = pd.read_csv("/Users/wratko/Documents/projects/tx-school_funding/jupyter-notebooks/essential-data/tx-sch_lunch-2018-19.csv")
```

Merge the two dataframes into one on district ID (`distid`) and school name (`schoolname`):

```python
complete_data = pd.merge(nerds_full, ed_full, on=['distid', 'schoolname'])
```
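`pd.merge` defaults to an inner join, so any school whose name is spelled differently in the two files is dropped without warning. The lunch file mixes "EL" and "ELEMENTARY" (see the `schoolname_y` column in the output further down), so this seems worth checking. A minimal sketch using pandas' `indicator` option on hypothetical two-row stand-ins for `nerds_full` and `ed_full`:

```python
import pandas as pd

# Hypothetical stand-ins for nerds_full and ed_full (not real values)
nerds = pd.DataFrame({
    "distid": [1, 1],
    "schoolname": ["NORTH EL", "SOUTH EL"],
    "pp_stloc_raw_TX": [9000.0, 8500.0],
})
lunch = pd.DataFrame({
    "distid": [1, 1],
    "schoolname": ["NORTH EL", "SOUTH ELEMENTARY"],  # same school, different spelling
    "SiteISP": [55.0, 70.0],
})

# An outer merge with indicator=True labels each row "both", "left_only" or
# "right_only", so rows an inner merge would silently drop become visible.
check = pd.merge(nerds, lunch, on=["distid", "schoolname"], how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["distid", "schoolname", "_merge"]])
```

Running the same outer merge on the real files and counting the `left_only` and `right_only` rows would show how many schools `complete_data` is missing.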

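```python
# Test merge on district ID alone. distid repeats in both files, so this pairs
# every school in one file with every school in the same district in the other.
dist_only = pd.merge(nerds_full, ed_full, on='distid')
```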

```python
dist_only.tail()
```

| Unnamed: 0 | distid | schoolid_stateassigned | distname_x | schoolname_x | enroll_raw_TX | pp_stloc_raw_TX | CEID | distname_y | SiteID | schoolname_y | CECounty | SiteCounty | SiteISP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33856 | 15912 | 15912102 | SOUTHWEST ISD | SUN VALLEY EL | 612 | 8254.138497 | 87 | SOUTHWEST ISD | 110 | SPICEWOOD PARK ELEMENTARHY | BEXAR | BEXAR | 75.52 |
| 33857 | 15912 | 15912102 | SOUTHWEST ISD | SUN VALLEY EL | 612 | 8254.138497 | 87 | SOUTHWEST ISD | 6 | CAST STEM HIGH SCHOOL | BEXAR | BEXAR | |
| 33858 | 15912 | 15912102 | SOUTHWEST ISD | SUN VALLEY EL | 612 | 8254.138497 | 87 | SOUTHWEST ISD | 103 | INDIAN CREEK EL | BEXAR | BEXAR | 78.05 |
| 33859 | 15912 | 15912102 | SOUTHWEST ISD | SUN VALLEY EL | 612 | 8254.138497 | 87 | SOUTHWEST ISD | 111 | MEDIO CREEK ELEMENTARY | BEXAR | BEXAR | 69.09 |
| 33860 | 15912 | 15912102 | SOUTHWEST ISD | SUN VALLEY EL | 612 | 8254.138497 | 87 | SOUTHWEST ISD | 107 | BIG COUNTRY EL | BEXAR | BEXAR | 47.81 |
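The rows above show what the district-only merge does: SUN VALLEY EL from the NERD$ file is paired with SPICEWOOD PARK, CAST STEM, INDIAN CREEK and other schools from the lunch file, because `distid` repeats on both sides. A small sketch of that row multiplication using hypothetical schools, plus pandas' `validate` argument, which raises a `MergeError` instead of letting it happen silently:

```python
import pandas as pd

# Hypothetical district with 2 schools in one file and 3 in the other
left = pd.DataFrame({"distid": [1, 1], "schoolname": ["A EL", "B EL"]})
right = pd.DataFrame({"distid": [1, 1, 1], "schoolname": ["A EL", "B EL", "C MS"]})

# Joining on distid alone pairs every left row with every right row in the district
by_district = pd.merge(left, right, on="distid")
print(len(by_district))  # 6, i.e. 2 x 3

# validate="one_to_one" requires the key to be unique on both sides
try:
    pd.merge(left, right, on="distid", validate="one_to_one")
except pd.errors.MergeError as err:
    print("MergeError:", err)

# With the school name added to the key, each pair is unique and the check passes
by_school = pd.merge(left, right, on=["distid", "schoolname"], validate="one_to_one")
print(len(by_school))  # 2
```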


```python
dist_only.to_csv("/Users/wratko/Documents/projects/tx-school_funding/data_output/test-distonly.csv")
```

Save the school-level merge to a CSV:

```python
complete_data.to_csv("/Users/wratko/Documents/projects/tx-school_funding/data_output/complete_data.csv")
```
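The `Unnamed: 0` column in the `dist_only.tail()` output above is the name pandas gives a CSV's blank index header when the file is read back, which suggests one of the input files was written with its index. A minimal sketch of that round trip on a hypothetical two-row frame; passing `index=False` to the `to_csv` calls in this notebook would keep the extra column out of the output files.

```python
import io

import pandas as pd

df = pd.DataFrame({"distid": [1, 2], "value": [3.5, 4.5]})  # hypothetical data

# By default to_csv writes the index as an unnamed first column...
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))     # ['Unnamed: 0', 'distid', 'value']

# ...and index=False leaves it out, so the file reads back with the original columns
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(list(without_index.columns))  # ['distid', 'value']
```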

Now let's look at the districts to make sure we got them all:

```python
complete_data.head()
```

```python
unique_distid = complete_data["distid"].unique()
print(unique_distid)
```

```
[ 15901 130901  46902  15911  15914  15904  15916  15913 163908  15910
  15915  15907  94902  15908  15917  15912]
```


```python
unique_distname_y = complete_data["distname_y"].unique()
print(unique_distname_y)
```

```
['ALAMO HEIGHTS ISD' 'BOERNE ISD' 'COMAL ISD' 'EAST CENTRAL ISD'
 'FT SAM HOUSTON ISD' 'HARLANDALE ISD' 'JUDSON ISD' 'LACKLAND ISD'
 'MEDINA VALLEY ISD' 'NORTH EAST ISD' 'NORTHSIDE ISD-SAN ANTONIO'
 'SAN ANTONIO ISD' 'SCHERTZ-CIBOLO-U CITY ISD' 'SOUTH SAN ANTONIO ISD'
 'SOUTHSIDE ISD' 'SOUTHWEST ISD']
```
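The two `unique()` calls print IDs and names separately, so matching them up relies on both arrays coming out in the same order. A sketch that lists them side by side instead and checks that each ID has exactly one name; `district_lookup` is a hypothetical helper, and the toy frame stands in for `complete_data`.

```python
import pandas as pd


def district_lookup(df):
    """Return one row per (distid, distname_y) pair, sorted by distid."""
    pairs = df[["distid", "distname_y"]].drop_duplicates().sort_values("distid")
    # If an ID has two names (or a name two IDs), the pairing is ambiguous
    if not (pairs["distid"].is_unique and pairs["distname_y"].is_unique):
        raise ValueError("distid and distname_y do not map one-to-one")
    return pairs.reset_index(drop=True)


# Toy stand-in for complete_data: several schools per district
toy = pd.DataFrame({"distid": [2, 1, 2], "distname_y": ["B ISD", "A ISD", "B ISD"]})
result = district_lookup(toy)
print(result)
print(len(result))  # number of districts: 2
```

On the real data, `len(district_lookup(complete_data))` would give the district count directly.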


Here we're going to break the 16 districts out into their own data frames for analysis.

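```python
# One DataFrame per district, filtered on district ID.
# IDs pair with names in the order printed by the two unique() calls above.
alamo_heights = complete_data[complete_data.distid == 15901]
boerne = complete_data[complete_data.distid == 130901]
comal = complete_data[complete_data.distid == 46902]
east_central = complete_data[complete_data.distid == 15911]
ft_sam_houston = complete_data[complete_data.distid == 15914]
harlandale = complete_data[complete_data.distid == 15904]
judson = complete_data[complete_data.distid == 15916]
lackland = complete_data[complete_data.distid == 15913]
medina_valley = complete_data[complete_data.distid == 163908]
north_east = complete_data[complete_data.distid == 15910]
northside = complete_data[complete_data.distid == 15915]
san_antonio = complete_data[complete_data.distid == 15907]
schertz_cibolo = complete_data[complete_data.distid == 94902]
south_san_antonio = complete_data[complete_data.distid == 15908]
southside = complete_data[complete_data.distid == 15917]
southwest = complete_data[complete_data.distid == 15912]
```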
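Writing out one variable per district works, but the same split can be done in one step with `groupby`, which keeps every district's frame in a single dictionary that can be looped over when the per-district scatterplots and correlations are computed. A sketch, with `split_by_district` as a hypothetical helper and a toy frame standing in for `complete_data`:

```python
import pandas as pd


def split_by_district(df, key="distname_y"):
    """Return {district name: that district's rows}, one entry per district."""
    # groupby yields (key, sub-DataFrame) pairs; keys come back sorted by default
    return {name: group.reset_index(drop=True) for name, group in df.groupby(key)}


# Toy stand-in for complete_data (hypothetical names)
toy = pd.DataFrame({
    "distname_y": ["A ISD", "B ISD", "A ISD"],
    "schoolname": ["A1 EL", "B1 EL", "A2 EL"],
})
districts = split_by_district(toy)
for name, frame in districts.items():
    print(name, len(frame))  # A ISD 2, then B ISD 1
```

With the real data, `split_by_district(complete_data)["ALAMO HEIGHTS ISD"]` should hold the same rows as `alamo_heights`, re-indexed from zero.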
