# Completeness Check

*Justin R. Garrard*

### *Executive Summary*

This notebook covers the **Data Understanding** phase of the CRISP-DM process.


* **[Data Completeness Limitations]** The visualization will only be able to use data from 2008-2016, owing to the limited availability of certain data (latitude/longitude, assessments).


* **[Preliminary Modeling Indicator Selection]** We were able to select a number of promising indicators for use in modeling. A separate notebook will be compiled to evaluate their utility.


Descriptions of each pre-existing column can be found in Appendix B. Use CTRL+F to find the column you are interested in.

### *Objectives*

1. **[Understanding]** To develop a working knowledge of the dataset's indicators and completeness.


2. **[Feature Selection]** To select likely indicators for use in modeling.

### Setup

In [1]:
# Import libraries
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
from ipywidgets import interact, interactive

In [2]:
# Declare global variables
DATA_DIR = os.path.join('..', 'data', 'processed')
DATA_FILE = os.path.join(DATA_DIR, 'processed.csv')
plt.style.use('ggplot')

In [3]:
# Useful functions
def null_counter(df):
    """Return a DataFrame listing the null count and percent null for each column."""
    record_nulls = []
    for col in df.columns:
        nulls = df[col].isnull().sum()
        percent_null = round((nulls / df.shape[0]) * 100, 2)
        record_nulls.append([col, nulls, percent_null])
    output = pd.DataFrame(record_nulls, columns=['Attribute', 'Null Count', '% Null'])
    return output

def get_year_range(df):
    """Return the sorted list of distinct values in the 'year' column."""
    return sorted(df['year'].unique())

### Preliminaries

In this section we preview the data, taking note of its scope and completeness.

***High-Level Overview***

* We have ~585,000 records with 51 indicators.


* The data ranges from 1986 to 2018.


* Completeness varies significantly between attributes.

In [4]:
# Load and preview data
edu_df = pd.read_csv(DATA_FILE)

nRow, nCol = edu_df.shape
print(f'There are {nRow} rows and {nCol} columns.')
print('')

YEAR_RANGE = get_year_range(edu_df)
print(f'Data spans the years {YEAR_RANGE[0]} to {YEAR_RANGE[-1]}.')
print('')

print('Available columns include:')
display(null_counter(edu_df))



There are 585650 rows and 51 columns.

Data spans the years 1986 to 2018.

Available columns include:


Unnamed: 0,Attribute,Null Count,% Null
0,leaid,0,0.0
1,year,0,0.0
2,read_test_num_valid,441477,75.38
3,read_test_pct_prof_midpt,441477,75.38
4,math_test_num_valid,441567,75.4
5,math_test_pct_prof_midpt,441567,75.4
6,lea_name,2521,0.43
7,state_leaid,2565,0.44
8,street_location,203079,34.68
9,city_location,202967,34.66


In [5]:
# Partition columns for easier analysis
location_cols = ["leaid", "year", "lea_name", "state_leaid", 
                 "street_location", "city_location", "state_location",
                 "zip_location", "zip4_location", "fips", "agency_type",
                 "number_of_schools", "county_code", "county_name", 
                 "latitude", "longitude", "cbsa", "cbsa_type", "csa"]

demographic_cols = ["leaid", "year", "teachers_total_fte", "staff_total_fte",
                    "spec_ed_students", "english_language_learners", 
                    "enrollment_x", "enrollment_y", "cmsa", "district_id",
                    "est_population_total", "est_population_5_17", 
                    "est_population_5_17_poverty", "est_population_5_17_poverty_pct",
                    "est_population_5_17_pct", "enrollment_fall_responsible", 
                    "enrollment_fall_school"]

assessment_cols = ["leaid","year","read_test_num_valid", 
                    "read_test_pct_prof_midpt", "math_test_num_valid",
                    "math_test_pct_prof_midpt", "grad_rate_midpt"]

financial_cols = ["leaid", "year", "rev_total", "rev_fed_total", "rev_state_total",
                "rev_local_total", "exp_total", "exp_current_instruction_total",
                "exp_current_supp_serve_total", "exp_current_other", "exp_nonelsec",
                "salaries_total", "benefits_employee_total", "debt_longterm_outstand_beg_FY"]

# Sanity check: prints an empty set if all columns are accounted for
subsets = set(location_cols + demographic_cols + assessment_cols + financial_cols)
print(set(edu_df.columns) - subsets)

set()


### Location Data

***High-Level Overview***


* Significant portions of the dataset are only available for certain year ranges (e.g. lat/long records begin in 2006).


* The vast majority (75%) of school districts have between 1 and 5 schools. However, at least two school districts have upwards of 1,000 schools.


* Some entries hold negative placeholder codes (such as -2 or -1) and need to be converted to NaN.


* The definition of [CBSA](https://www.census.gov/topics/housing/housing-patterns/about/core-based-statistical-areas.html) took some digging. It appears to be a Census indicator describing specific urban zones. 


* Likely candidates for clustering indicators are **fips, number_of_schools, and cbsa_type**.
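
The NaN conversion noted above could look something like the sketch below. It assumes that the negative codes seen in the `describe()` output (-1, -2, -3, -9) mark missing or not-applicable values; that reading is not confirmed by this notebook, and `mask_sentinels` and the toy frame are hypothetical names used only for illustration. The conversion is limited to named columns because some attributes, such as longitude, hold legitimate negative values.

```python
import numpy as np
import pandas as pd

# Assumption: these negative codes flag missing / not-applicable entries.
SENTINELS = [-1, -2, -3, -9]

def mask_sentinels(df, columns, sentinels=SENTINELS):
    """Return a copy of df with sentinel codes in `columns` replaced by NaN."""
    cleaned = df.copy()
    cleaned[columns] = cleaned[columns].replace(sentinels, np.nan)
    return cleaned

# Toy frame standing in for location_df (values are illustrative only)
toy = pd.DataFrame({
    'leaid': [1, 2, 3],
    'number_of_schools': [6.0, -2.0, 16.0],
    'cbsa_type': [2.0, 1.0, -2.0],
})
cleaned = mask_sentinels(toy, ['number_of_schools', 'cbsa_type'])
print(cleaned.isnull().sum())
```

Working on a copy keeps the raw values available for comparison, which is useful while the meaning of each code is still being confirmed.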


In [6]:
# High-Level Overview
location_df = edu_df[location_cols]
display(null_counter(location_df))
print('')
display(location_df.head())

Unnamed: 0,Attribute,Null Count,% Null
0,leaid,0,0.0
1,year,0,0.0
2,lea_name,2521,0.43
3,state_leaid,2565,0.44
4,street_location,203079,34.68
5,city_location,202967,34.66
6,state_location,202963,34.66
7,zip_location,202967,34.66
8,zip4_location,284276,48.54
9,fips,2504,0.43





Unnamed: 0,leaid,year,lea_name,state_leaid,street_location,city_location,state_location,zip_location,zip4_location,fips,agency_type,number_of_schools,county_code,county_name,latitude,longitude,cbsa,cbsa_type,csa
0,100005,2013,ALBERTVILLE CITY,101,107 WEST MAIN ST,ALBERTVILLE,AL,35950.0,25.0,1.0,1.0,6.0,1095.0,MARSHALL COUNTY,34.267502,-86.208603,10700.0,2.0,290.0
1,100006,2013,MARSHALL COUNTY,48,12380 US HWY 431 SOUTH,GUNTERSVILLE,AL,35976.0,9351.0,1.0,1.0,16.0,1095.0,MARSHALL COUNTY,34.305,-86.286697,10700.0,2.0,290.0
2,100007,2013,HOOVER CITY,158,2810 METROPOLITAN WAY,HOOVER,AL,35243.0,5500.0,1.0,1.0,17.0,1073.0,JEFFERSON COUNTY,33.4062,-86.766899,13820.0,1.0,142.0
3,100008,2013,MADISON CITY,169,211 CELTIC DR,MADISON,AL,35758.0,1615.0,1.0,1.0,11.0,1089.0,MADISON COUNTY,34.687302,-86.744904,26620.0,1.0,290.0
4,100011,2013,LEEDS CITY,167,8121 PARKWAY DR,LEEDS,AL,35094.0,,1.0,1.0,3.0,1073.0,JEFFERSON COUNTY,33.543301,-86.541298,13820.0,1.0,142.0


In [7]:
# General statistics
display(location_df.describe())

Unnamed: 0,leaid,year,zip_location,fips,agency_type,number_of_schools,county_code,latitude,longitude,cbsa,cbsa_type,csa
count,585650.0,585650.0,382683.0,583146.0,583129.0,579245.0,583029.0,242140.0,242140.0,314631.0,314631.0,314629.0
mean,2984780.0,2002.479623,51242.348071,29.711683,1.8708,5.415857,29610.561507,39.642938,-91.786224,24041.530854,0.435396,138.533174
std,1464476.0,9.560158,28943.066672,14.638911,1.823968,19.461834,14586.647008,4.665949,15.199142,17436.385912,1.454066,183.776201
min,7230.0,1986.0,-2.0,1.0,1.0,-2.0,-2.0,-14.278038,-170.695602,-2.0,-2.0,-2.0
25%,1810080.0,1994.0,27105.0,18.0,1.0,1.0,18109.0,36.314642,-98.994393,11020.0,1.0,-2.0
50%,3025020.0,2003.0,55388.0,30.0,1.0,3.0,30089.0,40.44849,-89.554304,24940.0,1.0,34.0
75%,4023190.0,2011.0,74601.0,40.0,2.0,5.0,40095.0,42.650081,-80.398127,38060.0,1.0,260.0
max,7800030.0,2018.0,99929.0,78.0,9.0,1756.0,78030.0,71.299927,145.755997,79600.0,2.0,950.0


In [8]:
# Interactive Scatterplot for Location Metrics by Year
%matplotlib notebook

year_range = get_year_range(location_df)
metrics = list(location_df.columns)[2:]

@interact(year=(year_range[0],year_range[-1],1), metric=metrics)
def loc_metric_explorer(year, metric):
    # Clear any old figures
    plt.close()
    
    # Take a snapshot of the data for the given year
    snapshot = location_df[location_df['year'] == year].copy()
    snapshot.sort_values(metric, ascending=True, inplace=True)
    y_pos = np.arange(len(snapshot[metric]))
    
    # Scatter the chosen metric against district ID (leaid)
    plt.figure(figsize=(8, 8), num='Location Metric Explorer Tool')
    plt.scatter(snapshot['leaid'], snapshot[metric], alpha=0.5)
    plt.xlabel('leaid')
    plt.ylabel(metric)
    plt.title(f'{metric}: {year}')

    
interactive_plot = interactive(loc_metric_explorer,
                               year=2005,
                               metric=metrics[0])

interactive(children=(IntSlider(value=2002, description='year', max=2018, min=1986), Dropdown(description='met…

In [9]:
# Interactive Scatterplot for Location Map by Year
%matplotlib notebook

year_range = get_year_range(location_df)

@interact(year=(year_range[0],year_range[-1],1))
def loc_map_explorer(year):
    # Clear any old figures
    plt.close()
    
    # Take a snapshot of the data for the given year
    snapshot = location_df[location_df['year'] == year].copy()
    
    # Plot each district's coordinates for the chosen year
    plt.figure(figsize=(8, 8), num='Location Explorer Tool')
    plt.scatter(snapshot['longitude'], snapshot['latitude'], alpha=0.5)
    plt.xlabel('longitude')
    plt.ylabel('latitude')
    plt.title('Lat/Long of Districts')

    
interactive_plot = interactive(loc_map_explorer,
                               year=2005)

interactive(children=(IntSlider(value=2002, description='year', max=2018, min=1986), Output()), _dom_classes=(…

### Demographic Data

***High-Level Overview***

* There are two enrollment columns resulting from the join process: one from the enrollment CSV and one from the finance CSV. "enrollment_x" is the more complete of the two (4.54% null versus 9.78% for "enrollment_y").


* "staff_total_fte" and "enrollment_fall_school" are mostly empty.


* "cmsa" is a redundant attribute overlapping with cbsa from the Directory data.


* Likely candidates for clustering indicators include **teachers_total_fte, spec_ed_students, english_language_learners, est_population_5_17, and est_population_5_17_poverty**.
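
One possible way to reconcile the two enrollment columns is to prefer `enrollment_x` and fall back to `enrollment_y` where it is null. This is a minimal sketch rather than the method used in this project: `coalesce_enrollment` and the toy frame are hypothetical, and any negative placeholder codes would need to be converted to NaN first, since `combine_first` only fills true nulls.

```python
import numpy as np
import pandas as pd

def coalesce_enrollment(df, primary='enrollment_x', fallback='enrollment_y'):
    """Prefer `primary`, filling its nulls from `fallback`."""
    return df[primary].combine_first(df[fallback])

# Toy frame standing in for demographic_df (values are illustrative only)
toy = pd.DataFrame({
    'enrollment_x': [4713.0, np.nan, 13943.0, np.nan],
    'enrollment_y': [4713.0, 5604.0, np.nan, np.nan],
})
toy['enrollment'] = coalesce_enrollment(toy)

# Rows where both sources are present but disagree deserve a manual look
both = toy[['enrollment_x', 'enrollment_y']].notnull().all(axis=1)
disagree = both & (toy['enrollment_x'] != toy['enrollment_y'])
print(toy['enrollment'].isnull().sum(), disagree.sum())
```

Counting the disagreeing rows before merging gives a quick sense of whether the two sources can be treated as interchangeable.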

In [10]:
# High-Level Overview
demographic_df = edu_df[demographic_cols]
display(null_counter(demographic_df))
print('')
display(demographic_df.head())

Unnamed: 0,Attribute,Null Count,% Null
0,leaid,0,0.0
1,year,0,0.0
2,teachers_total_fte,30564,5.22
3,staff_total_fte,495827,84.66
4,spec_ed_students,95644,16.33
5,english_language_learners,223167,38.11
6,enrollment_x,26608,4.54
7,enrollment_y,57290,9.78
8,cmsa,317281,54.18
9,district_id,282501,48.24





Unnamed: 0,leaid,year,teachers_total_fte,staff_total_fte,spec_ed_students,english_language_learners,enrollment_x,enrollment_y,cmsa,district_id,est_population_total,est_population_5_17,est_population_5_17_poverty,est_population_5_17_poverty_pct,est_population_5_17_pct,enrollment_fall_responsible,enrollment_fall_school
0,100005,2013,259.5,,316.0,549.0,4713.0,4713.0,,5.0,21522.0,4010.0,1563.0,0.389776,0.186321,4713.0,4713.0
1,100006,2013,359.2,,679.0,292.0,5604.0,5604.0,,6.0,48162.0,8607.0,2136.0,0.24817,0.178709,5604.0,5604.0
2,100007,2013,938.13,,1079.0,508.0,13943.0,13943.0,,7.0,82139.0,14792.0,1409.0,0.095254,0.180085,13943.0,13943.0
3,100008,2013,529.0,,963.0,243.0,9554.0,9554.0,,8.0,44617.0,9559.0,785.0,0.082122,0.214246,9554.0,9554.0
4,100011,2013,94.0,,213.0,81.0,1913.0,1913.0,,11.0,11854.0,1745.0,442.0,0.253295,0.147208,1913.0,1913.0


In [11]:
# General statistics
display(demographic_df.describe())

Unnamed: 0,leaid,year,teachers_total_fte,staff_total_fte,spec_ed_students,english_language_learners,enrollment_x,enrollment_y,cmsa,district_id,est_population_total,est_population_5_17,est_population_5_17_poverty,est_population_5_17_poverty_pct,est_population_5_17_pct,enrollment_fall_responsible,enrollment_fall_school
count,585650.0,585650.0,555086.0,89823.0,490006.0,362483.0,559042.0,528360.0,268369.0,303149.0,303149.0,303149.0,303149.0,302910.0,303039.0,404716.0,147960.0
mean,2984780.0,2002.479623,163.770496,353.866662,342.516383,231.971742,2720.118,2955.169,86699.768133,14003.989012,23656.81,3892.255,688.375007,0.163252,0.167499,2831.828,2674.765
std,1464476.0,9.560158,728.226451,1241.007235,1603.064613,2315.370671,12512.65,13035.75,220303.948502,14108.143683,110545.3,18322.41,5282.505121,0.098864,0.042398,13263.63,12688.2
min,7230.0,1986.0,-3.0,-3.0,-9.0,-9.0,-9.0,-3.0,-2.0,1.0,0.0,0.0,0.0,0.0,0.0,-3.0,-3.0
25%,1810080.0,1994.0,15.0,34.65,16.0,0.0,196.0,288.0,0.0,4050.0,2438.0,395.0,50.0,0.088759,0.148026,215.0,175.0
50%,3025020.0,2003.0,48.07,100.04,82.0,2.0,713.0,859.0,0.0,9510.0,7097.0,1163.0,147.0,0.147155,0.172299,762.0,630.0
75%,4023190.0,2011.0,135.0,297.77,275.0,38.0,2190.0,2410.0,6680.0,20840.0,18869.0,3120.0,425.0,0.219653,0.193048,2292.0,2022.0
max,7800030.0,2018.0,70888.6,69989.27,169308.0,326893.0,1077381.0,1077381.0,978840.0,99965.0,8622698.0,1399391.0,487440.0,1.0,0.776471,1077381.0,1014020.0


In [12]:
# Interactive Scatterplot for Demographic Metrics by Year
%matplotlib notebook

year_range = get_year_range(location_df)
metrics = list(demographic_df.columns)[2:]

@interact(year=(year_range[0],year_range[-1],1), metric=metrics)
def dem_metric_explorer(year, metric):
    # Clear any old figures
    plt.close()
    
    # Take a snapshot of the data for the given year
    snapshot = demographic_df[demographic_df['year'] == year].copy()
    snapshot.sort_values(metric, ascending=True, inplace=True)
    y_pos = np.arange(len(snapshot[metric]))
    
    # Scatter the chosen metric against district ID (leaid)
    plt.figure(figsize=(8, 8), num='Demographic Metric Explorer Tool')
    plt.scatter(snapshot['leaid'], snapshot[metric], alpha=0.5, color='blue')
    plt.xlabel('leaid')
    plt.ylabel(metric)
    plt.title(f'{metric}: {year}')

    
interactive_plot = interactive(dem_metric_explorer,
                               year=2005,
                               metric=metrics[0])

interactive(children=(IntSlider(value=2002, description='year', max=2018, min=1986), Dropdown(description='met…

### Assessment Data

***High-Level Overview***

* Assessment data is only available from 2009-2017.


* Graduation rate data is only available from 2010-2017.


* All indicators in this section are likely clustering candidates.
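
Year ranges like the ones above can be read off the interactive plots, but they can also be tabulated directly. The sketch below reports the first and last year in which each column has at least one non-null value; `year_coverage` and the toy frame are hypothetical and the toy values are illustrative only.

```python
import numpy as np
import pandas as pd

def year_coverage(df, year_col='year'):
    """First and last year with at least one non-null value, per column."""
    rows = []
    for col in df.columns.drop(year_col):
        years = df.loc[df[col].notnull(), year_col]
        rows.append([col, years.min(), years.max()])
    return pd.DataFrame(rows, columns=['Attribute', 'First Year', 'Last Year'])

# Toy frame standing in for assessment_df
toy = pd.DataFrame({
    'year': [2008, 2009, 2010, 2011],
    'read_test_num_valid': [np.nan, 100.0, 120.0, np.nan],
    'grad_rate_midpt': [np.nan, np.nan, 90.0, 91.0],
})
coverage = year_coverage(toy)
print(coverage)
```

Note that this only shows the outer bounds of availability; a column can still be sparse or have gaps inside its range.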

In [13]:
# High-level overview
assessment_df = edu_df[assessment_cols]
display(null_counter(assessment_df))
print('')
display(assessment_df.head())

Unnamed: 0,Attribute,Null Count,% Null
0,leaid,0,0.0
1,year,0,0.0
2,read_test_num_valid,441477,75.38
3,read_test_pct_prof_midpt,441477,75.38
4,math_test_num_valid,441567,75.4
5,math_test_pct_prof_midpt,441567,75.4
6,grad_rate_midpt,489833,83.64





Unnamed: 0,leaid,year,read_test_num_valid,read_test_pct_prof_midpt,math_test_num_valid,math_test_pct_prof_midpt,grad_rate_midpt
0,100005,2013,2380.0,34.0,2402.0,33.0,94.0
1,100006,2013,2992.0,37.0,2994.0,38.0,89.0
2,100007,2013,7328.0,59.0,7343.0,57.0,95.0
3,100008,2013,5087.0,73.0,5102.0,70.0,97.0
4,100011,2013,962.0,40.0,963.0,30.0,82.0


In [14]:
# General statistics
display(assessment_df.describe())

Unnamed: 0,leaid,year,read_test_num_valid,read_test_pct_prof_midpt,math_test_num_valid,math_test_pct_prof_midpt,grad_rate_midpt
count,585650.0,585650.0,144173.0,144173.0,144083.0,144083.0,95817.0
mean,2984780.0,2002.479623,1602.30186,60.127631,1586.654067,55.194666,82.15229
std,1464476.0,9.560158,6057.769528,22.101465,5984.727835,23.881388,19.841209
min,7230.0,1986.0,0.0,-3.0,0.0,-3.0,-3.0
25%,1810080.0,1994.0,151.0,45.0,150.0,37.0,77.0
50%,3025020.0,2003.0,439.0,63.0,435.0,57.0,89.0
75%,4023190.0,2011.0,1245.0,77.0,1237.0,75.0,93.0
max,7800030.0,2018.0,354300.0,99.5,355121.0,99.5,99.0


In [15]:
# Interactive Scatterplot for Assessment Metrics by Year
%matplotlib notebook

year_range = get_year_range(location_df)
metrics = list(assessment_df.columns)[2:]

@interact(year=(year_range[0],year_range[-1],1), metric=metrics)
def assess_metric_explorer(year, metric):
    # Clear any old figures
    plt.close()
    
    # Take a snapshot of the data for the given year
    snapshot = assessment_df[assessment_df['year'] == year].copy()
    snapshot.sort_values(metric, ascending=True, inplace=True)
    y_pos = np.arange(len(snapshot[metric]))
    
    # Scatter the chosen metric against district ID (leaid)
    plt.figure(figsize=(8, 8), num='Assessment Metric Explorer Tool')
    plt.scatter(snapshot['leaid'], snapshot[metric], alpha=0.5, color='green')
    plt.xlabel('leaid')
    plt.ylabel(metric)
    plt.title(f'{metric}: {year}')

    
interactive_plot = interactive(assess_metric_explorer,
                               year=2005,
                               metric=metrics[0])

interactive(children=(IntSlider(value=2002, description='year', max=2018, min=1986), Dropdown(description='met…

### Financial Data

***High-Level Overview*** 

* Financial data covers 1994-2016.


* The amount of data available warrants its own EDA.


* Likely clustering indicators include **rev_total** and **exp_total**.


In [16]:
# High-level overview
financial_df = edu_df[financial_cols]
display(null_counter(financial_df))
print('')
display(financial_df.head())

Unnamed: 0,Attribute,Null Count,% Null
0,leaid,0,0.0
1,year,0,0.0
2,rev_total,180871,30.88
3,rev_fed_total,180871,30.88
4,rev_state_total,180871,30.88
5,rev_local_total,180871,30.88
6,exp_total,180871,30.88
7,exp_current_instruction_total,180871,30.88
8,exp_current_supp_serve_total,180871,30.88
9,exp_current_other,180871,30.88





Unnamed: 0,leaid,year,rev_total,rev_fed_total,rev_state_total,rev_local_total,exp_total,exp_current_instruction_total,exp_current_supp_serve_total,exp_current_other,exp_nonelsec,salaries_total,benefits_employee_total,debt_longterm_outstand_beg_FY
0,100005,2013,43875000.0,5380000.0,25102000.0,13393000.0,43121000.0,21958000.0,13381000.0,2843000.0,847000.0,21802000.0,8443000.0,32614000.0
1,100006,2013,55299000.0,7152000.0,34055000.0,14092000.0,54005000.0,27312000.0,19664000.0,4429000.0,1283000.0,29544000.0,11545000.0,18807000.0
2,100007,2013,162705000.0,5936000.0,68667000.0,88102000.0,168763000.0,92745000.0,51488000.0,8138000.0,4325000.0,89891000.0,33218000.0,297560000.0
3,100008,2013,100874000.0,4509000.0,53164000.0,43201000.0,104829000.0,50177000.0,31748000.0,4343000.0,1069000.0,47922000.0,17886000.0,128448000.0
4,100011,2013,18899000.0,1525000.0,9914000.0,7460000.0,17957000.0,9153000.0,5741000.0,1054000.0,226000.0,9057000.0,3400000.0,40521000.0


In [17]:
# General statistics
display(financial_df.describe())

Unnamed: 0,leaid,year,rev_total,rev_fed_total,rev_state_total,rev_local_total,exp_total,exp_current_instruction_total,exp_current_supp_serve_total,exp_current_other,exp_nonelsec,salaries_total,benefits_employee_total,debt_longterm_outstand_beg_FY
count,585650.0,585650.0,404779.0,404779.0,404779.0,404779.0,404779.0,404779.0,404779.0,404779.0,404779.0,404779.0,404779.0,404779.0
mean,2984780.0,2002.479623,29501070.0,2505222.0,13540740.0,13455100.0,29828470.0,15269240.0,8717093.0,1041356.0,313723.8,15377810.0,5085043.0,15944120.0
std,1464476.0,9.560158,179241600.0,18609900.0,81577270.0,87122010.0,190253500.0,110480000.0,42867530.0,5306289.0,2634643.0,88275910.0,39051230.0,121499300.0
min,7230.0,1986.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-3.0,-3.0,-34000.0
25%,1810080.0,1994.0,2583000.0,144000.0,1050000.0,767000.0,2528000.0,1244000.0,783000.0,73000.0,0.0,1212000.0,300000.0,0.0
50%,3025020.0,2003.0,8014000.0,476000.0,3612000.0,2988000.0,7933000.0,3996000.0,2389000.0,274000.0,3000.0,4015000.0,1159000.0,1055000.0
75%,4023190.0,2011.0,23242000.0,1507000.0,10413000.0,9900000.0,23300000.0,11815000.0,6949000.0,792000.0,99000.0,11991000.0,3864000.0,8940000.0
max,7800030.0,2018.0,29855530000.0,3120314000.0,11072730000.0,16721500000.0,30727360000.0,18814050000.0,5408653000.0,584491000.0,280400000.0,12189520000.0,7450059000.0,13800260000.0


In [18]:
# Interactive Scatterplot for Financial Metrics by Year
%matplotlib notebook

year_range = get_year_range(location_df)
metrics = list(financial_df.columns)[2:]

@interact(year=(year_range[0],year_range[-1],1), metric=metrics)
def fin_metric_explorer(year, metric):
    # Clear any old figures
    plt.close()
    
    # Take a snapshot of the data for the given year
    snapshot = financial_df[financial_df['year'] == year].copy()
    snapshot.sort_values(metric, ascending=True, inplace=True)
    y_pos = np.arange(len(snapshot[metric]))
    
    # Scatter the chosen metric against district ID (leaid)
    plt.figure(figsize=(8, 8), num='Financial Metric Explorer Tool')
    plt.scatter(snapshot['leaid'], snapshot[metric], alpha=0.5, color='purple')
    plt.xlabel('leaid')
    plt.ylabel(metric)
    plt.title(f'{metric}: {year}')

    
interactive_plot = interactive(fin_metric_explorer,
                               year=2005,
                               metric=metrics[0])

interactive(children=(IntSlider(value=2002, description='year', max=2018, min=1986), Dropdown(description='met…

### Appendix A: Null Visualizer

This section provides a tool for visualizing where null values occur over time.
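
As a static complement to the interactive tool, the same information can be summarized in a single table. The following is a sketch under the assumption that a percent-null figure per column and year is what is wanted; `null_share_by_year` and the toy frame are hypothetical names, and the toy values are illustrative only.

```python
import numpy as np
import pandas as pd

def null_share_by_year(df, year_col='year'):
    """Percent of null values per column (rows) for each year (columns)."""
    is_null = df.drop(columns=year_col).isnull()
    share = is_null.groupby(df[year_col]).mean() * 100
    return share.round(2).T

# Toy frame standing in for edu_df
toy = pd.DataFrame({
    'year': [2005, 2005, 2006, 2006],
    'latitude': [np.nan, np.nan, 34.2, np.nan],
    'rev_total': [1.0, np.nan, 2.0, 3.0],
})
table = null_share_by_year(toy)
print(table)
```

A table like this could also be passed to a heatmap function to show all years at once, instead of stepping through them with a slider.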

In [19]:
# Interactive Bar Chart of Null Counts per Column by Year
%matplotlib notebook

year_range = get_year_range(edu_df)

@interact(year=(year_range[0],year_range[-1],1))
def null_explorer(year):
    # Clear any old figures
    plt.close()
    
    # Take a snapshot of the data for the given year
    snapshot = edu_df[edu_df['year'] == year].copy()
    
    # Plot the number of null values in each column (log scale)
    plt.figure(figsize=(8, 8), num='Null Value Explorer Tool')
    plt.barh(list(edu_df.columns), snapshot.isnull().sum())
    plt.xscale("log")

    
interactive_plot = interactive(null_explorer,
                               year=2005)

interactive(children=(IntSlider(value=2002, description='year', max=2018, min=1986), Output()), _dom_classes=(…

In [20]:
# # Export to file as necessary
# tgt_year = 2016

# null_df = edu_df[edu_df['year'] == tgt_year].copy()
# null_df = null_df[null_df.isnull().any(axis=1)]
# display(null_df)
# null_df.to_csv('nulls.csv')

### Appendix B: Data Definitions

This section contains descriptions of each column as taken from their corresponding reference file in `references/`.

#### Directory Data

| variable                       | format                    | label                                                                              |
| ------------------------------ | ------------------------- | ---------------------------------------------------------------------------------- |
| leaid                          | string                    | Local education agency identification number (NCES)                                |
| year                           | numeric                   | Academic year (fall semester)                                                      |
| lea_name                       | string                    | Local education agency name                                                        |
| fips                           | fips                      | Federal Information Processing Standards state code                                |
| state_leaid                    | string                    | Local education agency identification number (state)                               |
| street_mailing                 | string                    | Street of mailing address                                                          |
| city_mailing                   | string                    | City of mailing address                                                            |
| state_mailing                  | string                    | State of mailing address                                                           |
| zip_mailing                    | string                    | Zip code of mailing address                                                        |
| zip4_mailing                   | string                    | 4-digit zip code of mailing address                                                |
| street_location                | string                    | Street of location                                                                 |
| city_location                  | string                    | City of location                                                                   |
| state_location                 | string                    | State of location                                                                  |
| zip_location                   | string                    | Zip code of location                                                               |
| zip4_location                  | string                    | 4-digit zip code of location                                                       |
| phone                          | string                    | Telephone number                                                                   |
| latitude                       | numeric                   | Latitude of institution                                                            |
| longitude                      | numeric                   | Longitude of institution                                                           |
| urban_centric_locale           | urban_centric_locale      | Degree of urbanization (urban-centric locale)                                      |
| cbsa                           | numeric                   | Core-based statistical area                                                        |
| cbsa_type                      | cbsa_type                 | Core-based statistical area type: Metropolitan or micropolitan                     |
| csa                            | numeric                   | Combined statistical area                                                          |
| cmsa                           | numeric                   | Consolidated metropolitan statistical area                                         |
| necta                          | numeric                   | New England city and town area                                                     |
| county_code                    | numeric                   | County code                                                                        |
| county_name                    | string                    | County name                                                                        |
| congress_district_id           | numeric                   | State and 114th congressional district identification number                       |
| state_leg_district_lower       | string                    | State legislative district—lower                                                   |
| state_leg_district_upper       | string                    | State legislative district—upper                                                   |
| bureau_indian_education        | yes_no                    | Bureau of Indian Education school                                                  |
| supervisory_union_number       | string                    | Supervisory union number                                                           |
| agency_type                    | agency_type               | Agency type                                                                        |
| boundary_change_indicator      | boundary_change_indicator | Boundary change                                                                    |
| agency_charter_indicator       | agency_charter_indicator  | Agency charter                                                                     |
| lowest_grade_offered           | grade_offered_ccd         | Lowest grade offered                                                               |
| highest_grade_offered          | grade_offered_ccd         | Highest grade offered                                                              |
| number_of_schools              | numeric                   | Number of schools associated with this agency                                      |
| enrollment                     | numeric                   | Student enrollment                                                                 |
| spec_ed_students               | numeric                   | Number of special education students                                               |
| english_language_learners      | numeric                   | Number of English language learners                                                |
| migrant_students               | numeric                   | Number of migrant students                                                         |
| teachers_prek_fte              | numeric                   | Number of full-time equivalent prekindergarten teachers                            |
| teachers_kindergarten_fte      | numeric                   | Number of full-time equivalent kindergarten teachers                               |
| teachers_elementary_fte        | numeric                   | Number of full-time equivalent elementary school teachers                          |
| teachers_secondary_fte         | numeric                   | Number of full-time equivalent secondary school teachers                           |
| teachers_ungraded_fte          | numeric                   | Number of full-time equivalent ungraded teachers                                   |
| teachers_total_fte             | numeric                   | Total full-time equivalent teachers                                                |
| instructional_aides_fte        | numeric                   | Number of full-time equivalent instructional aides or paraprofessionals            |
| coordinators_fte               | numeric                   | Number of full-time equivalent instructional coordinators and supervisors          |
| guidance_counselors_elem_fte   | numeric                   | Number of full-time equivalent elementary school guidance counselors               |
| guidance_counselors_sec_fte    | numeric                   | Number of full-time equivalent secondary school guidance counselors                |
| guidance_counselors_other_fte  | numeric                   | Number of full-time equivalent other guidance counselors                           |
| guidance_counselors_total_fte  | numeric                   | Total full-time equivalent guidance counselors                                     |
| school_counselors_fte          | numeric                   | Number of full-time school counselors                                              |
| librarian_specialists_fte      | numeric                   | Number of full-time equivalent librarians or media specialists                     |
| librarian_support_staff_fte    | numeric                   | Number of full-time equivalent library or media support staff                      |
| lea_administrators_fte         | numeric                   | Number of full-time equivalent local education agency administrators               |
| lea_admin_support_staff_fte    | numeric                   | Number of full-time equivalent local education agency administrative support staff |
| lea_staff_total_fte            | numeric                   | Total full-time equivalent LEA staff                                               |
| school_administrators_fte      | numeric                   | Number of full-time equivalent school administrators                               |
| school_admin_support_staff_fte | numeric                   | Number of full-time equivalent school administrative support staff                 |
| school_staff_total_fte         | numeric                   | Total full-time equivalent school staff                                            |
| support_staff_students_fte     | numeric                   | Number of full-time equivalent student support services staff                      |
| support_staff_other_fte        | numeric                   | Number of full-time equivalent other support services staff                        |
| staff_total_fte                | numeric                   | Total full-time equivalent staff                                                   |
| other_staff_fte                | numeric                   | Number of other full-time equivalent staff                                         |

#### Enrollment Data

| variable   | format  | label                                               |
| ---------- | ------- | --------------------------------------------------- |
| leaid      | string  | Local education agency identification number (NCES) |
| year       | numeric | Academic year (fall semester)                       |
| fips       | fips    | Federal Information Processing Standards state code |
| grade      | grade   | Grade                                               |
| race       | race    | Race and ethnicity                                  |
| sex        | sex     | Sex                                                 |
| enrollment | numeric | Student enrollment                                  |

#### Financial Data

| variable                        | format  | label                                                                                             |
| ------------------------------- | ------- | ------------------------------------------------------------------------------------------------- |
| year                            | numeric | Academic year (fall semester)                                                                     |
| leaid                           | string  | Local education agency identification number (NCES)                                               |
| fips                            | fips    | Federal Information Processing Standards state code                                               |
| censusid                        | string  | US Census Bureau 14-digit government identification number                                        |
| rev_total                       | numeric | Total revenue                                                                                     |
| rev_fed_total                   | numeric | Total federal revenue                                                                             |
| rev_fed_child_nutrition_act     | numeric | Federal revenue through the state for the Child Nutrition Act                                     |
| rev_fed_state_title_i           | numeric | Federal revenue through the state for Title I                                                     |
| rev_fed_state_idea              | numeric | Federal revenue through the state for the Individuals with Disabilities Education Act             |
| rev_fed_state_math_sci_teach    | numeric | Federal revenue through the state for math, science, and teacher quality                          |
| rev_fed_state_drug_free         | numeric | Federal revenue through the state for safe and drug-free schools                                  |
| rev_fed_state_vocational        | numeric | Federal revenue through the state for vocational and tech education                               |
| rev_fed_state_bilingual_ed      | numeric | Federal revenue through the state for bilingual education                                         |
| rev_fed_state_other             | numeric | Federal revenue through the state for other purposes                                              |
| rev_fed_direct_impact_aid       | numeric | Direct federal revenue for impact aid                                                             |
| rev_fed_direct_indian_ed        | numeric | Direct federal revenue for Indian Education                                                       |
| rev_fed_direct_other            | numeric | Direct federal revenue for other purposes                                                         |
| rev_fed_arra                    | numeric | Federal revenue from the American Recovery and Reinvestment Act                                   |
| rev_fed_nonspec                 | numeric | Federal revenue for nonspecified purposes                                                         |
| rev_state_total                 | numeric | Total state revenue                                                                               |
| rev_state_gen_formula_assist    | numeric | State revenue from general formula assistance                                                     |
| rev_state_special_ed            | numeric | State revenue for special education programs                                                      |
| rev_state_transportation        | numeric | State revenue for transportation programs                                                         |
| rev_state_staff_improve         | numeric | State revenue for staff improvement programs                                                      |
| rev_state_compens_basic_ed      | numeric | State revenue for compensatory and basic skills programs                                          |
| rev_state_vocational_ed         | numeric | State revenue for vocational education programs                                                   |
| rev_state_outlay_capital_debt   | numeric | State revenue for capital outlay and debt services programs                                       |
| rev_state_bilingual_ed          | numeric | State revenue for bilingual education                                                             |
| rev_state_gifted_talented       | numeric | State revenue for gifted and talented programs                                                    |
| rev_state_sch_lunch             | numeric | State revenue for school lunch programs                                                           |
| rev_state_oth_prog              | numeric | State revenue for other programs                                                                  |
| rev_state_employee_benefits     | numeric | State revenue, on behalf of the local education agency, for employee benefits                     |
| rev_state_not_employee_benefits | numeric | State revenue, on behalf of the local education agency, for benefits other than employee benefits |
| rev_state_nonspec               | numeric | State revenue for nonspecified purposes                                                           |
| rev_local_total                 | numeric | Total local revenue                                                                               |
| rev_local_parent_govt           | numeric | Local revenue from parent government contributions                                                |
| rev_local_prop_tax              | numeric | Local revenue from property taxes                                                                 |
| rev_local_sales_tax             | numeric | Local revenue from general sales taxes                                                            |
| rev_local_utility_tax           | numeric | Local revenue from public utility taxes                                                           |
| rev_local_income_tax            | numeric | Local revenue from individual and corporate income taxes                                          |
| rev_local_other_tax             | numeric | Local revenue from all other taxes                                                                |
| rev_local_other_sch_systems     | numeric | Local revenue from other school systems                                                           |
| rev_local_cities_counties       | numeric | Local revenue from cities and counties                                                            |
| rev_local_tuition_fees          | numeric | Local revenue from tuition fees from pupils and parents                                           |
| rev_local_transportation_fees   | numeric | Local revenue from transportation fees from pupils and parents                                    |
| rev_local_sch_lunch             | numeric | Local revenue from school lunches                                                                 |
| rev_local_textbook_sales_rents  | numeric | Local revenue from textbook sales and rentals                                                     |
| rev_local_dist_activ_receipts   | numeric | Local revenue from district activity receipts                                                     |
| rev_local_student_fees_nonspec  | numeric | Local revenue from nonspecified student fees                                                      |
| rev_local_oth_sales_serv        | numeric | Local revenue from other sales and services                                                       |
| rev_local_interest_earnings     | numeric | Local revenue from interest earnings                                                              |
| rev_local_rents_royalties       | numeric | Local revenue from rents and royalties                                                            |
| rev_local_property_sale         | numeric | Local revenue from property sales                                                                 |
| rev_local_fines_forfeits        | numeric | Local revenue from fines and forfeits                                                             |
| rev_local_private_contrib       | numeric | Local revenue from private contributions                                                          |
| rev_local_misc                  | numeric | Local revenue from miscellaneous sources                                                          |
| rev_nces                        | numeric | NCES local revenue, Census Bureau state revenue                                                   |
| exp_total                       | numeric | Total expenditures                                                                                |
| exp_current_elsec_total         | numeric | Total current expenditures for elementary and secondary education                                 |
| exp_current_state_local_funds   | numeric | Current expenditures from state and local funds                                                   |
| exp_current_federal_funds       | numeric | Current expenditures from federal funds                                                           |
| exp_current_instruction_total   | numeric | Total current expenditures for instruction                                                        |
| exp_current_supp_serve_total    | numeric | Total current expenditures for support services                                                   |
| exp_current_pupils              | numeric | Current expenditures for pupil support services                                                   |
| exp_current_instruc_staff       | numeric | Current expenditures for instructional staff support services                                     |
| exp_current_general_admin       | numeric | Current expenditures for general administration support services                                  |
| exp_current_sch_admin           | numeric | Current expenditures for school administration support services                                   |
| exp_current_operation_plant     | numeric | Current expenditures for support services for operation and maintenance of plant                  |
| exp_current_student_transport   | numeric | Current expenditures for student transportation support services                                  |
| exp_current_bco                 | numeric | Current expenditures for business, central, and other support services                            |
| exp_current_supp_serv_nonspec   | numeric | Current expenditures for support services for nonspecified purposes                               |
| exp_current_other               | numeric | Total current expenditures for other elementary or secondary programs                             |
| exp_current_food_serv           | numeric | Current expenditures for food services                                                            |
| exp_current_enterprise          | numeric | Current expenditures for enterprise operations                                                    |
| exp_current_other_elsec         | numeric | Current expenditures for other elementary or secondary school purposes                            |
| exp_nonelsec                    | numeric | Total non-elementary or secondary school expenditures                                             |
| exp_nonelsec_community_serv     | numeric | Non-elementary or secondary school expenditures for community services                            |
| exp_nonelsec_adult_education    | numeric | Non-elementary or secondary school expenditures for adult education                               |
| exp_nonelsec_other              | numeric | Non-elementary or secondary school expenditures for other purposes                                |
| exp_current_arra                | numeric | Current expenditures for the American Recovery and Reinvestment Act                               |
| exp_textbooks                   | numeric | Expenditures on textbooks                                                                         |
| exp_utilities_energy            | numeric | Expenditures for utilities and energy services                                                    |
| exp_tech_supplies_services      | numeric | Expenditures for technology-related supplies and purchased services                               |
| exp_tech_equipment              | numeric | Expenditures for technology-related equipment                                                     |
| outlay_capital_total            | numeric | Total capital outlay expenditures                                                                 |
| outlay_capital_construction     | numeric | Capital outlay for construction                                                                   |
| outlay_capital_land_structures  | numeric | Capital outlay for land and existing structures                                                   |
| outlay_capital_instruc_equip    | numeric | Capital outlay for instructional equipment                                                        |
| outlay_capital_other_equip      | numeric | Capital outlay for other equipment                                                                |
| outlay_capital_nonspec_equip    | numeric | Capital outlay for nonspecified equipment                                                         |
| outlay_capital_arra             | numeric | Capital outlay for the American Recovery and Reinvestment Act                                     |
| payments_private_schools        | numeric | Payments to private schools                                                                       |
| payments_charter_schools        | numeric | Payments to charter schools                                                                       |
| payments_state_govt             | numeric | Payments to state governments                                                                     |
| payments_local_govt             | numeric | Payments to local governments                                                                     |
| payments_other_sch_system       | numeric | Payments to other school systems                                                                  |
| salaries_total                  | numeric | Total salary amount                                                                               |
| salaries_instruction            | numeric | Salaries for instruction                                                                          |
| salaries_teachers_regular_prog  | numeric | Teacher salaries for regular education programs                                                   |
| salaries_teachers_sped          | numeric | Teacher salaries for special education programs                                                   |
| salaries_teachers_vocational    | numeric | Teacher salaries for vocational education programs                                                |
| salaries_teachers_other_ed      | numeric | Teacher salaries for other education programs                                                     |
| salaries_supp_pupils            | numeric | Salaries for pupil support services                                                               |
| salaries_supp_instruc_staff     | numeric | Salaries for instructional staff support services                                                 |
| salaries_supp_general_admin     | numeric | Salaries for general administration support services                                              |
| salaries_supp_sch_admin         | numeric | Salaries for school administration support services                                               |
| salaries_supp_operation_plant   | numeric | Salaries for support services for operation and maintenance of plant                              |
| salaries_supp_stud_transport    | numeric | Salaries for student transportation support services                                              |
| salaries_supp_bco               | numeric | Salaries for business, central, and other support services                                        |
| salaries_food_service           | numeric | Salaries for food services                                                                        |
| benefits_employee_total         | numeric | Total employee benefits (dollars)                                                                 |
| benefits_employee_instruction   | numeric | Employee benefits for instruction (dollars)                                                       |
| benefits_supp_pupils            | numeric | Employee benefits for pupil support services (dollars)                                            |
| benefits_supp_instruc_staff     | numeric | Employee benefits for instructional staff support services (dollars)                              |
| benefits_supp_general_admin     | numeric | Employee benefits for general administration support services (dollars)                           |
| benefits_supp_sch_admin         | numeric | Employee benefits for school administration support services (dollars)                            |
| benefits_supp_operation_plant   | numeric | Employee benefits for support services for operation and maintenance of plant (dollars)           |
| benefits_supp_stud_transport    | numeric | Employee benefits for student transportation support services (dollars)                           |
| benefits_supp_bco               | numeric | Employee benefits for business, central, and other support services (dollars)                     |
| benefits_food_service           | numeric | Employee benefits for food services (dollars)                                                     |
| benefits_enterprise_operations  | numeric | Employee benefits for enterprise operations (dollars)                                             |
| debt_interest                   | numeric | Interest on debt                                                                                  |
| debt_longterm_outstand_beg_FY   | numeric | Long-term debt outstanding at beginning of fiscal year                                            |
| debt_longterm_issued_FY         | numeric | Long-term debt issued during fiscal year                                                          |
| debt_longterm_retired_FY        | numeric | Long-term debt retired during fiscal year                                                         |
| debt_longterm_outstand_end_FY   | numeric | Long-term debt outstanding at end of fiscal year                                                  |
| debt_shortterm_outstand_beg_FY  | numeric | Short-term debt outstanding at beginning of fiscal year                                           |
| debt_shortterm_outstand_end_FY  | numeric | Short-term debt outstanding at end of fiscal year                                                 |
| assets_sinking_fund             | numeric | Assets in a sinking fund (dollars)                                                                |
| assets_bond_fund                | numeric | Assets in a bond fund (dollars)                                                                   |
| assets_other                    | numeric | Assets in other funds (dollars)                                                                   |
| enrollment_fall_responsible     | numeric | Number of students for which the reporting local education agency is financially responsible      |
| enrollment_fall_school          | numeric | Number of students attending school within the reporting local education agency                   |
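
The total columns above can be used for a quick sanity check on the financial data. The sketch below assumes that `rev_total` is meant to equal the sum of `rev_fed_total`, `rev_state_total`, and `rev_local_total`; that relationship is our assumption and should be confirmed against the source documentation. The toy rows, the `flag_revenue_mismatch` helper, and the tolerance are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the financial data (values invented for illustration).
fin_df = pd.DataFrame({
    'leaid': ['0100005', '0100006', '0100007', '0100008'],
    'rev_fed_total': [100.0, 250.0, 80.0, np.nan],
    'rev_state_total': [400.0, 900.0, 300.0, 200.0],
    'rev_local_total': [500.0, 850.0, 120.0, 300.0],
    'rev_total': [1000.0, 2000.0, 450.0, 500.0],
})

def flag_revenue_mismatch(df, tol=1.0):
    """Return rows where rev_total differs from the sum of its parts by more than tol.

    Assumption: rev_total = federal + state + local. Rows with any missing
    component are skipped rather than flagged.
    """
    components = ['rev_fed_total', 'rev_state_total', 'rev_local_total']
    # min_count=3 makes the row sum NaN unless all three components are present
    parts = df[components].sum(axis=1, min_count=len(components))
    gap = (df['rev_total'] - parts).abs()
    return df.loc[gap > tol]

mismatches = flag_revenue_mismatch(fin_df)
print(mismatches[['leaid', 'rev_total']])
```

Rows with a missing component are left out on purpose, since completeness is assessed separately with `null_counter`.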

#### Assessment Data

|variable|format|label|
|---|---|---|
|leaid|string|Local education agency identification number (NCES)|
|leaid_num|numeric|Local education agency identification number (NCES) (numeric)|
|year|numeric|Academic year (fall semester)|
|lea_name|string|Local education agency name|
|fips|fips|Federal Information Processing Standards state code|
|grade_edfacts|grade_edfacts|Grade category (as reported in EDFacts)|
|race|race|Race and ethnicity|
|sex|sex|Sex|
|lep|lep|Students with limited English proficiency|
|homeless|special_pop|Students who are homeless|
|migrant|special_pop|Students who are migrants|
|disability|disability|Students with disabilities|
|econ_disadvantaged|special_pop|Students who are economically disadvantaged|
|foster_care|special_pop|Students who are in foster care|
|military_connected|special_pop|Students who are connected to the military|
|read_test_num_valid|numeric|Number of students who completed a reading or language arts assessment and for whom a proficiency level was assigned|
|read_test_pct_prof_midpt|numeric|Midpoint of the range used to report the share of students scoring proficient on a reading or language arts assessment (0–100 scale)|
|read_test_pct_prof_high|numeric|High end of the range used to report the share of students scoring proficient on a reading or language arts assessment (0–100 scale)|
|read_test_pct_prof_low|numeric|Low end of the range used to report the share of students scoring proficient on a reading or language arts assessment (0–100 scale)|
|math_test_num_valid|numeric|Number of students who completed a mathematics assessment and for whom a proficiency level was assigned|
|math_test_pct_prof_midpt|numeric|Midpoint of the range used to report the share of students scoring proficient on a mathematics assessment (0–100 scale)|
|math_test_pct_prof_high|numeric|High end of the range used to report the share of students scoring proficient on a mathematics assessment (0–100 scale)|
|math_test_pct_prof_low|numeric|Low end of the range used to report the share of students scoring proficient on a mathematics assessment (0–100 scale)|
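
Proficiency is reported as a range (`_low`, `_high`) with a midpoint (`_midpt`). A minimal sketch of how the three columns could be cross-checked follows, assuming the midpoint is the arithmetic mean of the low and high ends. The toy values and the `midpoint_is_consistent` helper are hypothetical, and any rounding in the real data may require a looser tolerance.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the assessment data (values invented for illustration).
assess_df = pd.DataFrame({
    'read_test_pct_prof_low': [20.0, 50.0, 90.0, np.nan],
    'read_test_pct_prof_high': [24.0, 59.0, 100.0, np.nan],
    'read_test_pct_prof_midpt': [22.0, 54.5, 90.0, np.nan],
})

def midpoint_is_consistent(df, prefix, atol=0.5):
    """Flag rows where <prefix>_midpt matches the mean of <prefix>_low and <prefix>_high.

    Rows where all three values are missing are treated as consistent.
    """
    expected = (df[f'{prefix}_low'] + df[f'{prefix}_high']) / 2
    matches = np.isclose(
        df[f'{prefix}_midpt'].to_numpy(),
        expected.to_numpy(),
        atol=atol,
        equal_nan=True,
    )
    return pd.Series(matches, index=df.index)

ok = midpoint_is_consistent(assess_df, 'read_test_pct_prof')
print(assess_df.loc[~ok])
```

The same helper applies to the mathematics columns by passing `'math_test_pct_prof'` as the prefix.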

#### Graduation Data

| variable           | format      | label                                                           |
| ------------------ | ----------- | --------------------------------------------------------------- |
| leaid              | string      | Local education agency identification number (NCES)             |
| year               | numeric     | Academic year (fall semester)                                   |
| fips               | fips        | Federal Information Processing Standards state code             |
| leaid_num          | numeric     | Local education agency identification number (NCES) (numeric)   |
| lea_name           | string      | Local education agency name                                     |
| race               | race        | Race and ethnicity                                              |
| lep                | lep         | Students with limited English proficiency                       |
| homeless           | special_pop | Students who are homeless                                       |
| disability         | disability  | Students with disabilities                                      |
| econ_disadvantaged | special_pop | Students who are economically disadvantaged                     |
| foster_care        | special_pop | Students who are in foster care                                 |
| cohort_num         | numeric     | Students in the adjusted cohort graduation rate cohort          |
| grad_rate_low      | numeric     | Low end of the high school graduation rate range (0–100 scale)  |
| grad_rate_high     | numeric     | High end of the high school graduation rate range (0–100 scale) |
| grad_rate_midpt    | numeric     | Midpoint of the high school graduation rate range (0–100 scale) |

#### SAIPE Data

| variable                        | format  | label                                                                                             |
| ------------------------------- | ------- | ------------------------------------------------------------------------------------------------- |
| leaid                           | string  | Local education agency identification number (NCES)                                               |
| year                            | numeric | Academic year (fall semester)                                                                     |
| fips                            | fips    | Federal Information Processing Standards state code                                               |
| district_id                     | string  | District identification numbers reported in the US Census Small Area Income and Poverty Estimates |
| district_name                   | string  | District names reported in the US Census Small Area Income and Poverty Estimates                  |
| est_population_total            | numeric | Estimated total population                                                                        |
| est_population_5_17             | numeric | Estimated total population ages 5–17                                                              |
| est_population_5_17_pct         | numeric | Share of population that is school age (ages 5–17)                                                |
| est_population_5_17_poverty     | numeric | Estimated population ages 5–17 in poverty                                                         |
| est_population_5_17_poverty_pct | numeric | Share of school-age population (ages 5–17) in poverty                                             |
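
The share columns can in principle be re-derived from the count columns. The sketch below computes the school-age poverty share as a fraction between 0 and 1. Whether `est_population_5_17_poverty_pct` is stored on a 0–1 or a 0–100 scale is not stated above, so compare against it only after confirming the scale. The toy values and the `poverty_share_derived` column name are hypothetical.

```python
import pandas as pd

# Toy rows standing in for the SAIPE data (values invented for illustration).
saipe_df = pd.DataFrame({
    'leaid': ['0100005', '0100006', '0100007'],
    'est_population_5_17': [2000, 500, 0],
    'est_population_5_17_poverty': [400, 50, 0],
})

# Mask zero populations so the division yields NaN instead of inf or a 0/0 warning
school_age = saipe_df['est_population_5_17'].where(saipe_df['est_population_5_17'] > 0)

# Derived share as a fraction (0-1); multiply by 100 if a percentage is needed
saipe_df['poverty_share_derived'] = saipe_df['est_population_5_17_poverty'] / school_age

print(saipe_df[['leaid', 'poverty_share_derived']])
```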