# Urban Institute - Data Exploration

This notebook is devoted to exploring data from the Urban Institute. 
Some data is requested directly from the API:
https://educationdata.urban.org/documentation/#how_to_use
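Both API URLs used in this notebook follow the same pattern. They start with the base `https://educationdata.urban.org/api/v1`, followed by the level (`school-districts`), the source (`saipe`, `ccd`), an optional topic (`finance`), and the year. Below is a small helper for building them. It assumes this pattern also holds for other endpoints, and `build_url` is a hypothetical name, not part of any Urban Institute package:

```python
def build_url(level, source, year, topic=None,
              base="https://educationdata.urban.org/api/v1"):
    """Assemble an Education Data Portal endpoint URL.

    Assumption: endpoints look like {base}/{level}/{source}/[{topic}/]{year}/,
    inferred from the two URLs used in this notebook.
    """
    parts = [base, level, source]
    if topic is not None:
        parts.append(topic)
    parts.append(str(year))
    return "/".join(parts) + "/"

# The two endpoints requested later in this notebook
saipe_url = build_url("school-districts", "saipe", 2008)
finance_url = build_url("school-districts", "ccd", 2015, topic="finance")
```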

Other data is downloaded from the Urban Institute's "Data Explorer":
https://educationdata.urban.org/data-explorer/school-districts/ 
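Some IDs in this data have leading zeros, such as the `leaid` values `0200120` and `0100003` in the SAIPE results below. If a Data Explorer download is read with default settings, pandas parses those columns as integers and drops the zeros. The sketch below assumes the CSV uses the same `leaid`/`fips` column names as the API. The two-row extract is made up for illustration:

```python
import io

import pandas as pd

# Hypothetical two-row extract standing in for a Data Explorer download;
# column names are assumed to match the API fields (leaid, fips).
csv_text = """leaid,fips,year
0200120,2,2008
0100003,1,2008
"""

# Default parsing turns the IDs into integers and drops the leading zeros.
naive = pd.read_csv(io.StringIO(csv_text))

# Reading the ID columns as strings keeps them intact for joins against API data.
districts = pd.read_csv(io.StringIO(csv_text), dtype={"leaid": str, "fips": str})
```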

Citation: 

Education Data Portal (Version 0.5.0 - Beta), Urban Institute, Center on Education Data and Policy, accessed August 23, 2019, https://educationdata.urban.org/documentation/, [datasets].

_The Education Data Explorer harmonizes data from all major federal datasets, including the US Department of Education Common Core of Data, the US Department of Education Civil Rights Data Collection, the US Department of Education EDFacts, the US Census Bureau Small Area Income and Poverty Estimates, the US Department of Education Integrated Postsecondary Education Data System, and the US Department of Education College Scorecard._

In [22]:
# Data manipulation
import pandas as pd
import numpy as np

# Options for pandas
pd.options.display.max_columns = None
pd.options.display.max_rows = None

In [1]:
# Display all cell outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

from IPython import get_ipython
ipython = get_ipython()

# autoreload extension
if 'autoreload' not in ipython.extension_manager.loaded:
    %load_ext autoreload

%autoreload 2

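# Visualizations (offline mode only; in plotly >= 4 the online `plotly.plotly`
# module moved to the separate chart_studio package, and it is not used here)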
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)

import cufflinks as cf
cf.go_offline(connected=True)
cf.set_config_file(theme='white')

In [45]:
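from json import loads
# Import the submodule explicitly; a bare `import urllib` does not guarantee
# that `urllib.request` is available
import urllib.request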

In [46]:
url = "https://educationdata.urban.org/api/v1/school-districts/saipe/2008/"
response = urllib.request.urlopen(url)
data = loads(response.read())

Note: `urlopen` is a function in the standard library's `urllib.request` module, so the third-party `requests` library isn't needed. The submodule does have to be imported explicitly (`import urllib.request`), because a bare `import urllib` doesn't guarantee that `urllib.request` is available.
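Here is a slightly tidier version of the download step that uses only the standard library. `json.load` reads directly from the response object, and the `with` block closes the connection afterwards. The name `fetch_json` and the 60-second timeout are illustrative choices, not part of the API:

```python
import json
import urllib.request

def fetch_json(url, timeout=60):
    """Download one API page and decode its JSON body.

    The context manager closes the connection once the body has been read.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With this helper, `data = fetch_json(url)` replaces the two-line `urlopen`/`loads` pattern used in the cells below.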

In [79]:
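# Note: despite the variable name, this requests the 2015 finance endpoint
# (the same URL as the finance_2016 cell below)
url3 = "https://educationdata.urban.org/api/v1/school-districts/ccd/finance/2015/"
response3 = urllib.request.urlopen(url3)
finance_2014 = loads(response3.read())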

In [81]:
#finance_2014['results'][0].keys()

In [82]:
# Each element of 'results' is one district's finance record
finance_results2 = pd.DataFrame(finance_2014['results'])


In [83]:
finance_results2.fips.value_counts()

4    710
1    180
5     56
2     54
Name: fips, dtype: int64
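The `fips` column holds state FIPS codes: 1 is Alabama, 2 is Alaska, 4 is Arizona and 5 is Arkansas. The counts add up to 1,000, which is the size of one API page, so this first page appears to cover only the first few states. Below is a sketch for labeling them. The lookup table is deliberately partial, and `FIPS_TO_STATE` and `label_states` are hypothetical names:

```python
import pandas as pd

# Partial lookup covering only the codes seen in this first page of results;
# a full analysis would need every state FIPS code.
FIPS_TO_STATE = {1: "AL", 2: "AK", 4: "AZ", 5: "AR"}

def label_states(fips):
    """Map state FIPS codes (ints or numeric strings) to postal abbreviations.

    Codes missing from the lookup become NaN.
    """
    return pd.to_numeric(fips).map(FIPS_TO_STATE)

# e.g. label_states(finance_results2.fips).value_counts()
```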

In [85]:
len(finance_2014)

4
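`len(finance_2014)` is 4 because the decoded JSON is a wrapper around the records, not the records themselves. In the SAIPE response further down, the wrapper's keys are `count`, `next`, `previous` and `results`, and the finance response presumably has the same shape. Below is a small helper for summarizing that wrapper (`describe_page` is a hypothetical name):

```python
def describe_page(payload):
    """Summarize the paging wrapper the API puts around each response."""
    return {
        "total_records": payload["count"],         # records across all pages
        "records_on_page": len(payload["results"]),
        "has_next_page": payload["next"] is not None,
    }
```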

In [66]:
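# Note: despite the variable name, this also requests the 2015 finance endpoint,
# which is why its fips counts match finance_results2 above
url2 = "https://educationdata.urban.org/api/v1/school-districts/ccd/finance/2015/"
response2 = urllib.request.urlopen(url2)
finance_2016 = loads(response2.read())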

In [76]:
finance_2016['results'][0].keys()

dict_keys(['year', 'fips', 'leaid', 'censusid', 'rev_total', 'rev_fed_total', 'rev_fed_child_nutrition_act', 'rev_fed_state_title_i', 'rev_fed_state_idea', 'rev_fed_state_math_sci_teach', 'rev_fed_state_drug_free', 'rev_fed_state_vocational', 'rev_fed_state_bilingual_ed', 'rev_fed_state_other', 'rev_fed_direct_impact_aid', 'rev_fed_direct_indian_ed', 'rev_fed_direct_other', 'rev_fed_arra', 'rev_fed_nonspec', 'rev_state_total', 'rev_state_gen_formula_assist', 'rev_state_special_ed', 'rev_state_transportation', 'rev_state_staff_improve', 'rev_state_compens_basic_ed', 'rev_state_vocational_ed', 'rev_state_outlay_capital_debt', 'rev_state_bilingual_ed', 'rev_state_gifted_talented', 'rev_state_sch_lunch', 'rev_state_oth_prog', 'rev_state_employee_benefits', 'rev_state_not_employee_benefits', 'rev_state_nonspec', 'rev_local_total', 'rev_local_parent_govt', 'rev_local_prop_tax', 'rev_local_sales_tax', 'rev_local_utility_tax', 'rev_local_income_tax', 'rev_local_other_tax', 'rev_local_other_sch

In [70]:
# Each element of 'results' is one district's finance record
finance_results = pd.DataFrame(finance_2016['results'])



In [78]:
finance_results.fips.value_counts()

4    710
1    180
5     56
2     54
Name: fips, dtype: int64
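Both finance cells request the same `.../ccd/finance/2015/` URL, so the two `value_counts()` outputs match. To compare years, each request needs that year's URL. Below is a sketch for stacking one decoded page per year into a single long frame. The `requested_year` column is an illustrative addition that lets each record's own `year` field be checked against the request that produced it. `stack_years` is a hypothetical name:

```python
import pandas as pd

def stack_years(pages_by_year):
    """Combine one decoded API page per year into a single long DataFrame.

    pages_by_year: dict mapping a year to a decoded JSON payload that has
    a "results" list of records.
    """
    frames = []
    for year, payload in pages_by_year.items():
        frame = pd.DataFrame(payload["results"])
        frame["requested_year"] = year  # which request each row came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)

# e.g. stack_years({2015: finance_2016}) once other years' URLs are fetched
```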

## Analysis/Modeling
Do work here

In [48]:
df = pd.DataFrame(data)

In [54]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
count       1000 non-null int64
next        1000 non-null object
previous    0 non-null object
results     1000 non-null object
dtypes: int64(1), object(3)
memory usage: 31.3+ KB


In [49]:
df.head()
df.columns

| | count | next | previous | results |
|---|---|---|---|---|
| 0 | 13754 | https://educationdata.urban.org/api/v1/school-... | | {'district_id': '00001', 'district_name': 'DEW... |
| 1 | 13754 | https://educationdata.urban.org/api/v1/school-... | | {'district_id': '00001', 'district_name': 'MAR... |
| 2 | 13754 | https://educationdata.urban.org/api/v1/school-... | | {'district_id': '00001', 'district_name': 'VAU... |
| 3 | 13754 | https://educationdata.urban.org/api/v1/school-... | | {'district_id': '00001', 'district_name': 'CAS... |
| 4 | 13754 | https://educationdata.urban.org/api/v1/school-... | | {'district_id': '00001', 'district_name': 'FOR... |


Index(['count', 'next', 'previous', 'results'], dtype='object')
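`count` is 13754, but `df` has only 1,000 rows. The API returns results one page at a time, and `next` holds the URL of the following page. `previous` is null because this is the first page. Below is a sketch that follows `next` until it runs out, assuming that the last page has a null `next`. The `fetch_page` parameter (a hypothetical name, like `download_json`) lets the loop be tried offline with stub pages:

```python
import json
import urllib.request

def download_json(page_url):
    """Fetch one page of the API and decode its JSON body."""
    with urllib.request.urlopen(page_url) as response:
        return json.load(response)

def fetch_all_pages(url, fetch_page=download_json):
    """Collect the 'results' from every page by following each 'next' link.

    Assumes the last page has next = null (None after decoding).
    """
    records = []
    while url:
        payload = fetch_page(url)
        records.extend(payload["results"])
        url = payload["next"]
    return records
```

`pd.DataFrame(fetch_all_pages(url))` would then hold every record. At 1,000 records per page, that is 14 requests for the 2008 SAIPE data.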

In [50]:
# Each element of 'results' is one district's SAIPE record
results = pd.DataFrame(data['results'])



In [53]:
results.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
district_id                        1000 non-null object
district_name                      1000 non-null object
est_population_5_17                1000 non-null float64
est_population_5_17_pct            996 non-null float64
est_population_5_17_poverty        1000 non-null float64
est_population_5_17_poverty_pct    996 non-null float64
est_population_total               1000 non-null float64
fips                               1000 non-null object
leaid                              1000 non-null object
year                               1000 non-null int64
dtypes: float64(5), int64(1), object(4)
memory usage: 78.2+ KB
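Most columns have 1,000 non-null values, but the two `_pct` columns have only 996. Some districts therefore have population counts but no percentage. Here is a quick way to pull those rows up for inspection (`rows_with_missing` is a hypothetical helper):

```python
import pandas as pd

def rows_with_missing(frame, columns):
    """Return the rows where any of the given columns is null."""
    return frame[frame[columns].isna().any(axis=1)]

# e.g. rows_with_missing(results, ["est_population_5_17_pct",
#                                  "est_population_5_17_poverty_pct"])
```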


In [62]:
results.leaid.value_counts()

4900360    1
1300440    1
2800570    1
0200120    1
5000014    1
1800210    1
3800060    1
2700108    1
1700176    1
3000098    1
3100008    1
2200600    1
3100021    1
0400005    1
4400360    1
3100067    1
3700480    1
4600046    1
2400450    1
2700001    1
2400480    1
3000097    1
0600029    1
2700006    1
4900390    1
3500480    1
3100172    1
5000013    1
1600150    1
2600019    1
5300330    1
5500240    1
3800033    1
3800024    1
0100003    1
4900240    1
2200540    1
0200180    1
3200360    1
3800021    1
2500013    1
5500300    1
0200510    1
4100015    1
4600042    1
1600570    1
3300049    1
0400026    1
5500056    1
1300120    1
1700045    1
0600035    1
2700127    1
3800044    1
1200210    1
4600028    1
4100048    1
0200070    1
5000011    1
4700360    1
5100420    1
4400690    1
0100210    1
0200270    1
3800032    1
4600053    1
1700223    1
3100029    1
0200210    1
3800016    1
4900120    1
1300390    1
5400060    1
0600052    1
1600630    1
1200660    1
5400390    1
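Every count shown is 1, so no `leaid` repeats in the rows listed. A listing this long is tedious to scan, though. Below is a compact summary that reports only the duplicates, if any. `check_unique_ids` is a hypothetical helper:

```python
import pandas as pd

def check_unique_ids(ids):
    """Summarize whether an ID column has duplicates without printing every count."""
    counts = ids.value_counts()
    return {
        "rows": len(ids),
        "distinct": ids.nunique(),
        "is_unique": ids.is_unique,
        "duplicated_ids": counts[counts > 1].index.tolist(),
    }

# e.g. check_unique_ids(results.leaid)
```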

## Results
Show graphs and stats here

## Conclusions and Next Steps
Summarize findings here