# Select School Location and Attendance Boundaries
The school location and attendance boundary files were obtained in the program `NCES_1av1_ObtainSchoolData`.

This program reads the unzipped National Center for Education Statistics (NCES) Education Demographic and Geographic Estimates (EDGE) data files and selects the school location information for a single county or a short list of counties.

## Description of Program
- program:    NCES_2av1_SelectCountySchools
- task:       Select School Location and Attendance Boundaries
- Version:    2021-06-06 - Robeson County, NC (37155)
-             2021-06-22 - Joplin MO [Jasper and Newton County] (29145, 29097)
- project:    Interdependent Networked Community Resilience Modeling Environment (IN-CORE) Subtask 5.2 - Social Institutions
- funding:    NIST Financial Assistance Award Numbers: 70NANB15H044 and 70NANB20H008
- author:     Nathanael Rosenheim

- Suggested Citation:
Rosenheim, N. (2021) "Obtain, Clean, and Explore School Location and Attendance Boundary Data".
Archived on GitHub and ICPSR.

In [1]:
# Import Python Packages Required for program
import pandas as pd       # Pandas for reading in data 
import geopandas as gpd   # Geopandas for reading Shapefiles
import numpy as np        # Numpy for flagging selected observations (np.where)
import os                 # Operating System (os) For folders and finding working directory
import folium as fm       # folium has more dynamic maps - but requires internet connection

In [2]:
# Display versions being used - important information for replication
import sys
print("Python Version     ", sys.version)
print("pandas version:    ", pd.__version__)
print("geopandas version: ", gpd.__version__)
print("numpy version:     ", np.__version__)
print("folium version:    ", fm.__version__)

Python Version      3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:37:01) [MSC v.1916 64 bit (AMD64)]
pandas version:     1.2.4
geopandas version:  0.9.0
numpy version:      1.20.2
folium version:     0.12.1


In [3]:
# Store Program Name for output files to have the same name
programname = "NCES_2av1_SelectCountySchools_2021-06-22"
# Make directory to save output
if not os.path.exists(programname):
    os.mkdir(programname)

## Obtain NCES Files
The previous program `NCES_1av1_ObtainSchoolData` downloaded and unzipped the NCES data files. This program reads these files in as GeoPandas GeoDataFrames. There are four types of school location data:
1. Public Schools
2. Private Schools
3. Postsecondary Schools
4. School Attendance Boundaries

The file `NCES_1av1_ObtainSchoolData_2021-06-04.csv` has a list of all of the files obtained and unzipped. The location of the unzipped shapefile will be used in a loop that will read in each file.

In [4]:
filelist_df = pd.read_csv('NCES_1av1_ObtainSchoolData_2021-06-04.csv')
filelist_df

Unnamed: 0,File Description,School Year,Documentation File Name,Data File Name,Unzipped Shapefile File Location,Documentation File URL,Data File URL
0,Postsecondary School File,2015-2016,EDGE_GEOCODE_POSTSEC_FILEDOC.pdf,EDGE_GEOCODE_POSTSECONDARYSCH_1516.zip,EDGE_GEOCODE_POSTSECONDARYSCH_1516/EDGE_GEOCOD...,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/
1,Public District File,2015-2016,EDGE_GEOCODE_PUBLIC_FILEDOC.pdf,EDGE_GEOCODE_PUBLICLEA_1516.zip,EDGE_GEOCODE_PUBLICLEA_1516/EDGE_GEOCODE_PUBLI...,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/
2,Public School File,2015-2016,EDGE_GEOCODE_PUBLIC_FILEDOC.pdf,EDGE_GEOCODE_PUBLICSCH_1516.zip,EDGE_GEOCODE_PUBLICSCH_1516/EDGE_GEOCODE_PUBLI...,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/
3,Private School File,2015-2016,EDGE_GEOCODE_PSS1718_FILEDOC.pdf,EDGE_GEOCODE_PRIVATESCH_15_16.zip,EDGE_GEOCODE_PRIVATESCH_15_16.shp,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/
4,School Attendance Boundaries Single Shapefile,2015-2016,EDGE_SABS_2015_2016_TECHDOC.pdf,SABS_1516.zip,SABS_1516/SABS_1516.shp,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/


## Create output folder to save files

In [5]:
output_directory = 'programs_edge_data_county'
# Make directory to save output
if not os.path.exists(output_directory):
    print("Making new directory to save output: ",output_directory)
    os.mkdir(output_directory)
else:
    print("Directory",output_directory,"Already exists.")

Making new directory to save output:  programs_edge_data_county


### Loop through file list and create geodataframe

In [8]:
# Where are unzipped files saved
sourcedatafolder = '../programs_edge_data_unzipped'

schooldata = {} # create empty dictionary to store geodataframes for each file

for index, files in filelist_df.iterrows():
    print("\nRead in shapefile for ",files['File Description'],"Files for School Year",files['School Year'])
    
    # break loop for debugging
    # if index == 1:
    #    break
    
    #where is unzipped shapefile
    file = files['Unzipped Shapefile File Location']
    filepath = sourcedatafolder+"/"+file
    print("   Checking to see if file",file,"has been unzipped...")
       
    # Check if file has been unzipped
    if not os.path.exists(filepath):
        print("   Warning file: ",file, "has not been unzipped - run the obtain data program first")
    else:
        print("   file saved as geopandas dataframe in a dictionary with 2 keys.")
        schooldata[(files['File Description'],files['School Year'])] = gpd.read_file(filepath)
        print("   To see top rows of geodataframe use the command:")
        print("   schooldata[('",files['File Description'],"','",files['School Year'],"')].head()",sep="")


Read in shapefile for  Postsecondary School File Files for School Year 2015-2016
   Checking to see if file EDGE_GEOCODE_POSTSECONDARYSCH_1516/EDGE_GEOCODE_POSTSECONDARYSCH_1516.shp has been unzipped...
   file saved as geopandas dataframe in a dictionary with 2 keys.
   To see top rows of geodataframe use the command:
   schooldata[('Postsecondary School File','2015-2016')].head()

Read in shapefile for  Public District File Files for School Year 2015-2016
   Checking to see if file EDGE_GEOCODE_PUBLICLEA_1516/EDGE_GEOCODE_PUBLICLEA_1516.shp has been unzipped...
   file saved as geopandas dataframe in a dictionary with 2 keys.
   To see top rows of geodataframe use the command:
   schooldata[('Public District File','2015-2016')].head()

Read in shapefile for  Public School File Files for School Year 2015-2016
   Checking to see if file EDGE_GEOCODE_PUBLICSCH_1516/EDGE_GEOCODE_PUBLICSCH_1516.shp has been unzipped...
   file saved as geopandas dataframe in a dictionary with 2 keys.
   

### Check Geopandas Dataframes top rows and column names
Initial data exploration to look at variables and columns.

In [10]:
schooldata[('Postsecondary School File','2015-2016')].head()

Unnamed: 0,UNITID,INSTNM,ADDR,CITY,STABBR,ZIP,STFIP15,CNTY15,NMCNTY15,LOCALE15,...,NMCBSA15,CBSATYPE15,CSA15,NMCSA15,NECTA15,NMNECTA15,CD15,SLDL15,SLDU15,geometry
0,100654,Alabama A & M University,4900 Meridian Street,Normal,AL,35762,1,1089,Madison County,12,...,"Huntsville, AL",1,290,"Huntsville-Decatur-Albertville, AL",N,N,105,19,7,POINT (-86.56850 34.78337)
1,100663,University of Alabama at Birmingham,Administration Bldg Suite 1070,Birmingham,AL,35294-0110,1,1073,Jefferson County,12,...,"Birmingham-Hoover, AL",1,142,"Birmingham-Hoover-Talladega, AL",N,N,107,55,18,POINT (-86.80917 33.50223)
2,100690,Amridge University,1200 Taylor Rd,Montgomery,AL,36117-3553,1,1101,Montgomery County,12,...,"Montgomery, AL",1,N,N,N,N,102,74,25,POINT (-86.17401 32.36261)
3,100706,University of Alabama in Huntsville,301 Sparkman Dr,Huntsville,AL,35899,1,1089,Madison County,12,...,"Huntsville, AL",1,290,"Huntsville-Decatur-Albertville, AL",N,N,105,6,2,POINT (-86.63842 34.72282)
4,100724,Alabama State University,915 S Jackson Street,Montgomery,AL,36104-0271,1,1101,Montgomery County,12,...,"Montgomery, AL",1,N,N,N,N,107,77,26,POINT (-86.29568 32.36432)


In [11]:
schooldata[('Postsecondary School File','2015-2016')].columns

Index(['UNITID', 'INSTNM', 'ADDR', 'CITY', 'STABBR', 'ZIP', 'STFIP15',
       'CNTY15', 'NMCNTY15', 'LOCALE15', 'LAT1516', 'LON1516', 'CBSA15',
       'NMCBSA15', 'CBSATYPE15', 'CSA15', 'NMCSA15', 'NECTA15', 'NMNECTA15',
       'CD15', 'SLDL15', 'SLDU15', 'geometry'],
      dtype='object')

In [12]:
schooldata[('Public District File','2015-2016')].head()

Unnamed: 0,OBJECTID,LEAID,NAME,OPSTFIPS,LSTREE,LCITY,LSTATE,LZIP,LZIP4,STFIP15,...,NMCBSA15,CBSATYPE15,CSA15,NMCSA15,NECTA15,NMNECTA15,CD15,SLDL15,SLDU15,geometry
0,1,100240,Autauga County,1,153 W 4th St,Prattville,AL,36067,M,1,...,"Montgomery, AL",1,N,N,N,N,102,88,30,POINT (-86.47421 32.46275)
1,2,100270,Baldwin County,1,2600-A N Hand Ave,Bay Minette,AL,36507,M,1,...,"Daphne-Fairhope-Foley, AL",1,380,"Mobile-Daphne-Fairhope, AL",N,N,101,64,22,POINT (-87.78749 30.91143)
2,3,100300,Barbour County,1,100 Court Square Courthouse,Clayton,AL,36016,M,1,...,N,N,N,N,N,N,102,84,28,POINT (-85.45379 31.87828)
3,4,101410,Eufaula City,1,333 State Docks Road,Eufaula,AL,36027,M,1,...,N,N,N,N,N,N,102,84,28,POINT (-85.15129 31.86830)
4,5,100028,Three Springs New Tuskegee,1,65 Enterprise Loop,Green Pond,AL,35074,M,1,...,"Birmingham-Hoover, AL",1,142,"Birmingham-Hoover-Talladega, AL",N,N,106,49,14,POINT (-87.19120 33.16752)


In [13]:
schooldata[('Public District File','2015-2016')].columns

Index(['OBJECTID', 'LEAID', 'NAME', 'OPSTFIPS', 'LSTREE', 'LCITY', 'LSTATE',
       'LZIP', 'LZIP4', 'STFIP15', 'CNTY15', 'NMCNTY15', 'LAT1516', 'LON1516',
       'CBSA15', 'NMCBSA15', 'CBSATYPE15', 'CSA15', 'NMCSA15', 'NECTA15',
       'NMNECTA15', 'CD15', 'SLDL15', 'SLDU15', 'geometry'],
      dtype='object')

In [14]:
schooldata[('Public School File','2015-2016')].head()

Unnamed: 0,NCESSCH,NAME,OPSTFIPS,LSTREE,LCITY,LSTATE,LZIP,LZIP4,STFIP15,CNTY15,...,NMCBSA15,CBSATYPE15,CSA15,NMCSA15,NECTA15,NMNECTA15,CD15,SLDL15,SLDU15,geometry
0,10000200277,Sequoyah Sch - Chalkville Campus,1,1000 Industrial School Road,Birmingham,AL,35220,M,1,1073,...,"Birmingham-Hoover, AL",1,142,"Birmingham-Hoover-Talladega, AL",N,N,106,44,20,POINT (-86.62875 33.67366)
1,10000201667,Camps,1,1601 County Rd. 57,Prattville,AL,36067,M,1,1001,...,"Montgomery, AL",1,N,N,N,N,102,42,30,POINT (-86.53013 32.52168)
2,10000201670,Det Ctr,1,2109 Bashi Rd Bldg 509,Thomasville,AL,36784,M,1,1025,...,N,N,N,N,N,N,107,68,24,POINT (-87.75053 31.93844)
3,10000201705,Wallace Sch - Mt Meigs Campus,1,1000 Industrial School Road,Mount Meigs,AL,36057,M,1,1101,...,"Montgomery, AL",1,N,N,N,N,103,75,25,POINT (-86.08236 32.37481)
4,10000201706,McNeel Sch - Vacca Campus,1,8950 Roebuck Blvd,Birmingham,AL,35206,M,1,1073,...,"Birmingham-Hoover, AL",1,142,"Birmingham-Hoover-Talladega, AL",N,N,107,58,20,POINT (-86.71006 33.58338)


In [15]:
schooldata[('Public School File','2015-2016')].columns

Index(['NCESSCH', 'NAME', 'OPSTFIPS', 'LSTREE', 'LCITY', 'LSTATE', 'LZIP',
       'LZIP4', 'STFIP15', 'CNTY15', 'NMCNTY15', 'LOCALE15', 'LAT1516',
       'LON1516', 'CBSA15', 'NMCBSA15', 'CBSATYPE15', 'CSA15', 'NMCSA15',
       'NECTA15', 'NMNECTA15', 'CD15', 'SLDL15', 'SLDU15', 'geometry'],
      dtype='object')

In [16]:
schooldata[('Private School File','2015-2016')].head()

Unnamed: 0,PPIN,LAT1516,LON1516,PINST,PL_ADD,PL_CIT,PL_STABB,PL_ZIP,PL_ZIP4,STFIP15,...,NMCBSA15,CBSATYPE15,CSA15,NMCSA15,NECTA15,NMNECTA15,CD15,SLDL15,SLDU15,geometry
0,33,34.023572,-85.989119,ST JAMES CATHOLIC SCHOOL,511 EWING AVE.,GADSDEN,AL,35901,M,1,...,"Gadsden, AL",1.0,N,N,N,N,104.0,28.0,10.0,POINT (-85.98912 34.02357)
1,44,33.173709,-87.529118,HOLY SPIRIT CATHOLIC SCHOOL,601 JAMES I HARRISON JR PKWY E,TUSCALOOSA,AL,35405,3208,1,...,"Tuscaloosa, AL",1.0,N,N,N,N,107.0,70.0,24.0,POINT (-87.52912 33.17371)
2,77,34.69004,-86.573801,HOLY SPIRIT SCHOOL,619 AIRPORT RD SW,HUNTSVILLE,AL,35802,4358,1,...,"Huntsville, AL",1.0,290,"Huntsville-Decatur-Albertville, AL",N,N,105.0,20.0,7.0,POINT (-86.57380 34.69004)
3,88,33.47581,-86.796311,OUR LADY OF SORROWS,1720 OXMOOR RD,BIRMINGHAM,AL,35209,4097,1,...,"Birmingham-Hoover, AL",1.0,142,"Birmingham-Hoover-Talladega, AL",N,N,107.0,52.0,18.0,POINT (-86.79631 33.47581)
4,124,34.532244,-86.99859,ST ANN SCHOOL,3910A SPRING AVE SW,DECATUR,AL,35603,1294,1,...,"Decatur, AL",1.0,290,"Huntsville-Decatur-Albertville, AL",N,N,105.0,8.0,3.0,POINT (-86.99859 34.53224)


In [17]:
schooldata[('Private School File','2015-2016')].columns

Index(['PPIN', 'LAT1516', 'LON1516', 'PINST', 'PL_ADD', 'PL_CIT', 'PL_STABB',
       'PL_ZIP', 'PL_ZIP4', 'STFIP15', 'CNTY15', 'NMCNTY15', 'LOCALE15',
       'CBSA15', 'NMCBSA15', 'CBSATYPE15', 'CSA15', 'NMCSA15', 'NECTA15',
       'NMNECTA15', 'CD15', 'SLDL15', 'SLDU15', 'geometry'],
      dtype='object')

In [18]:
schooldata[('School Attendance Boundaries Single Shapefile','2015-2016')].head()

Unnamed: 0,SrcName,ncessch,schnam,leaid,gslo,gshi,defacto,stAbbrev,openEnroll,Shape_Leng,Shape_Area,level,MultiBdy,geometry
0,,10000500870,Ala Avenue Middle Sch,100005,07,08,1,AL,0,146828.793028,101255200.0,2,0,"MULTIPOLYGON (((-9601421.827 4062780.032, -960..."
1,,10000500871,Albertville High Sch,100005,09,12,1,AL,0,146828.793028,101255200.0,3,0,"MULTIPOLYGON (((-9601421.827 4062780.032, -960..."
2,,10000500879,Evans Elem Sch,100005,05,06,1,AL,0,146828.793028,101255200.0,2,0,"MULTIPOLYGON (((-9601421.827 4062780.032, -960..."
3,,10000500889,Albertville Elem Sch,100005,03,04,1,AL,0,146828.793028,101255200.0,1,0,"MULTIPOLYGON (((-9601421.827 4062780.032, -960..."
4,,10000501616,Big Spring Lake Kinderg Sch,100005,KG,KG,1,AL,0,146828.793028,101255200.0,1,0,"MULTIPOLYGON (((-9601421.827 4062780.032, -960..."


In [19]:
schooldata[('School Attendance Boundaries Single Shapefile','2015-2016')].columns

Index(['SrcName', 'ncessch', 'schnam', 'leaid', 'gslo', 'gshi', 'defacto',
       'stAbbrev', 'openEnroll', 'Shape_Leng', 'Shape_Area', 'level',
       'MultiBdy', 'geometry'],
      dtype='object')

## Select NCES data for a single county
The point data files all have a county FIPS code variable (`CNTY15` for the 2015-2016 school year), which can be used to select data for one or more counties. The School Attendance Boundary (SAB) file does not have a county FIPS code variable, but it does have a unique ID for each school (`ncessch`). The same ID appears in the Public School File as `NCESSCH`.

>"SABS relies on standard CCD IDs to uniquely identify schools (NCESSCH) and school districts (LEAID). This
allows the SABS data to be linked across a broad range of institutional data that include the CCD ID. In a
few rare cases, districts provided boundaries for schools that did not contain a corresponding CCD
school ID. These schools were assigned with a temporary ID by concatenating the LEAID with a fixed
string of ‘9999’ and a final single digit that was automatically incremented if more than one instance
occurred." (Geverdt, 2018d p. 7)
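
The temporary ID rule in the quotation can be expressed as a short check. The sketch below is illustrative and not part of the original program: the function name `is_temporary_sabs_id` is hypothetical, the second ID in the example is made up, and both IDs are assumed to be stored as strings.

```python
def is_temporary_sabs_id(ncessch: str, leaid: str) -> bool:
    """Return True if ncessch looks like LEAID + '9999' + one digit."""
    prefix = leaid + "9999"
    suffix = ncessch[len(prefix):]
    return ncessch.startswith(prefix) and len(suffix) == 1 and suffix.isdigit()

# Regular CCD school ID taken from the Missouri rows shown in this notebook
print(is_temporary_sabs_id("290411000038", "2904110"))   # False
# Made-up temporary ID built from the rule in the quotation
print(is_temporary_sabs_id("290411099991", "2904110"))   # True
```

A check like this could be used to count how many selected boundaries carry a temporary ID and therefore cannot be matched to the Public School File.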

In [20]:
def select_var(data, selectvar: str, selectlist):
    """
    Select the observations whose value of selectvar is in selectlist.
    
    Args:
        :param data: data to select from
        :type data: pandas dataframe or geopandas dataframe
        :param selectvar: Variable to select from
        :param selectlist: List of values to select       
    
    Returns:
        dataframe: selected values from data
    """
    
    # Make a copy of object - deep = True - creates a new object
    data_selected = data[data[selectvar].isin(selectlist)].copy(deep=True)
    
    # How many observations selected 
    obs = len(data_selected.index)
    print(obs,"observations selected using ",selectvar," in list ",selectlist)
    
    # Return the selected observations
    return data_selected
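
As a rough illustration of the selection that `select_var` performs, the sketch below applies the same `isin` filter to a small made-up table. The rows and school names are invented. The only assumption carried over from the NCES files is that `CNTY15` holds five-character county FIPS codes stored as text, which is why the codes in the selection list are written as strings.

```python
import pandas as pd

# Toy stand-in for an NCES point file; the rows are invented for illustration
toy = pd.DataFrame({
    "NAME": ["School A", "School B", "School C", "School D"],
    "CNTY15": ["29145", "29097", "37155", "01089"],
})

# Same isin-based filter as select_var, with string FIPS codes
county_list = ["29145", "29097"]
selected = toy[toy["CNTY15"].isin(county_list)].copy()
print(selected["NAME"].tolist())          # ['School A', 'School B']

# Integer codes do not match string codes, so nothing is selected
n_int_matches = int(toy["CNTY15"].isin([29145, 29097]).sum())
print(n_int_matches)                      # 0
```

If the county list is given as integers the comparison finds no matches, so the codes should be kept as strings, including any leading zero (for example `01089`).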

### Loop through school data to select data by county

In [22]:
select_schooldata = {} # start empty dictionary for selected school data
county_list = ['29145', '29097']
for key in schooldata:
    print(key)
    
    # The SAB file does not have a county variable, so skip it here
    if "School Attendance Boundaries" in str(key):
        print("School Attendance Boundaries can not be selected by geography.")
        continue
        
    select_schooldata[key] = select_var(schooldata[key],'CNTY15',county_list)

('Postsecondary School File', '2015-2016')
9 observations selected using  CNTY15  in list  ['29145', '29097']
('Public District File', '2015-2016')
13 observations selected using  CNTY15  in list  ['29145', '29097']
('Public School File', '2015-2016')
72 observations selected using  CNTY15  in list  ['29145', '29097']
('Private School File', '2015-2016')
10 observations selected using  CNTY15  in list  ['29145', '29097']
('School Attendance Boundaries Single Shapefile', '2015-2016')
School Attendance Boundaries can not be selected by geography.


In [23]:
select_schooldata[('Public School File', '2015-2016')].head()

Unnamed: 0,NCESSCH,NAME,OPSTFIPS,LSTREE,LCITY,LSTATE,LZIP,LZIP4,STFIP15,CNTY15,...,NMCBSA15,CBSATYPE15,CSA15,NMCSA15,NECTA15,NMNECTA15,CD15,SLDL15,SLDU15,geometry
50478,290000700817,CROWDER AVTS,29,601 LACLEDE AVENUE,NEOSHO,MO,64850,M,29,29145,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,160,32,POINT (-94.36335 36.81182)
50511,290000902211,GATEWAY SCHOOL,29,1823 W 20TH ST,JOPLIN,MO,64804,0202,29,29097,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,161,32,POINT (-94.54270 37.06558)
50563,290002203067,COLLEGE VIEW SCHOOL,29,1101 N GOETZ BLVD,JOPLIN,MO,64801,1431,29,29097,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,162,32,POINT (-94.46598 37.10067)
50729,290411000038,AVILLA ELEM.,29,400 SARCOXIE ST,AVILLA,MO,64833,0007,29,29097,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,127,32,POINT (-94.12960 37.19188)
50901,290735000196,Carl Junction Intermediate,29,206 S Roney,Carl Junction,MO,64834,9402,29,29097,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,163,32,POINT (-94.56828 37.17630)


In [24]:
select_schooldata[('Public District File', '2015-2016')].head()

Unnamed: 0,OBJECTID,LEAID,NAME,OPSTFIPS,LSTREE,LCITY,LSTATE,LZIP,LZIP4,STFIP15,...,NMCBSA15,CBSATYPE15,CSA15,NMCSA15,NECTA15,NMNECTA15,CD15,SLDL15,SLDU15,geometry
8740,8741,2904110,AVILLA R-XIII,29,400 SARCOXIE ST,AVILLA,MO,64833,7,29,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,127,32,POINT (-94.12960 37.19188)
8741,8742,2907350,CARL JUNCTION R-I,29,206 S RONEY,CARL JUNCTION,MO,64834,9402,29,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,163,32,POINT (-94.56830 37.17630)
8742,8743,2907460,CARTHAGE R-IX,29,710 LYON ST,CARTHAGE,MO,64836,1700,29,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,163,32,POINT (-94.31187 37.17388)
8743,8744,2916140,JASPER CO. R-V,29,201 W MERCER ST,JASPER,MO,64755,9345,29,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,127,32,POINT (-94.30389 37.34045)
8744,8745,2916350,JOPLIN SCHOOLS,29,310 W 8TH STREET,JOPLIN,MO,64804,128,29,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,161,32,POINT (-94.51594 37.08282)


## Select School Attendance Boundaries
The SABs can be selected using the `NCESSCH` variable in the Public School File, and the `LEAID` variable in the Public District File.
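
The two-ID selection can be sketched on a small made-up table. This is an illustration under stated assumptions, not the program's own function: all rows and IDs below are invented, the IDs are assumed to be strings, and the sketch builds the masks without adding columns to the input.

```python
import pandas as pd

# Invented SAB-like rows; IDs are strings
sabs = pd.DataFrame({
    "ncessch": ["290411000038", "290735000196", "010000500870", "291635009999"],
    "leaid":   ["2904110", "2907350", "0100005", "2916350"],
})
ncessch_list = ["290411000038"]          # schools located in the selected counties
leaid_list = ["2907350", "2916350"]      # districts located in the selected counties

in_school_list = sabs["ncessch"].isin(ncessch_list)
in_district_list = sabs["leaid"].isin(leaid_list)

# Keep a boundary if either ID matches (logical OR)
selected = sabs[in_school_list | in_district_list]
print(len(selected))                      # 3
```

The OR condition keeps a boundary whose school ID is missing from the public school list as long as its district is in the district list.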

In [25]:
schooldata[('School Attendance Boundaries Single Shapefile', '2015-2016')].head()

Unnamed: 0,SrcName,ncessch,schnam,leaid,gslo,gshi,defacto,stAbbrev,openEnroll,Shape_Leng,Shape_Area,level,MultiBdy,geometry
0,,10000500870,Ala Avenue Middle Sch,100005,07,08,1,AL,0,146828.793028,101255200.0,2,0,"MULTIPOLYGON (((-9601421.827 4062780.032, -960..."
1,,10000500871,Albertville High Sch,100005,09,12,1,AL,0,146828.793028,101255200.0,3,0,"MULTIPOLYGON (((-9601421.827 4062780.032, -960..."
2,,10000500879,Evans Elem Sch,100005,05,06,1,AL,0,146828.793028,101255200.0,2,0,"MULTIPOLYGON (((-9601421.827 4062780.032, -960..."
3,,10000500889,Albertville Elem Sch,100005,03,04,1,AL,0,146828.793028,101255200.0,1,0,"MULTIPOLYGON (((-9601421.827 4062780.032, -960..."
4,,10000501616,Big Spring Lake Kinderg Sch,100005,KG,KG,1,AL,0,146828.793028,101255200.0,1,0,"MULTIPOLYGON (((-9601421.827 4062780.032, -960..."


In [26]:
# Create list of `NCESSCH` values
NCESSCH_list = select_schooldata[('Public School File', '2015-2016')].NCESSCH.tolist()

In [27]:
# Create list of `LEAID` values
LEAID_list = select_schooldata[('Public District File', '2015-2016')].LEAID.tolist()

In [28]:
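def select_sabs(data,NCESSCH_list,LEAID_list):
    # Flag SABs whose school ID or district ID is in the selected lists.
    # The two flag columns are added to the input dataframe in place.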
    data['slcncessch'] = np.where(data['ncessch'].isin(NCESSCH_list),1,0)
    data['slcleaid']   = np.where(data['leaid'].isin(LEAID_list),1,0)
    
    data_selected = data[(data['slcncessch'] == 1) |
                         (data['slcleaid'] == 1)].copy(deep=True)
    
    # How many observations selected 
    obs = len(data_selected.index)
    print(obs,"observations selected")
    
    return data_selected

In [29]:
data = schooldata[('School Attendance Boundaries Single Shapefile', '2015-2016')]
select_schooldata[('School Attendance Boundaries Single Shapefile', '2015-2016')] = select_sabs(data,NCESSCH_list,LEAID_list)

66 observations selected


In [30]:
select_schooldata[('School Attendance Boundaries Single Shapefile', '2015-2016')].head()

Unnamed: 0,SrcName,ncessch,schnam,leaid,gslo,gshi,defacto,stAbbrev,openEnroll,Shape_Leng,Shape_Area,level,MultiBdy,geometry,slcncessch,slcleaid
36517,,290411000038,AVILLA ELEM.,2904110,KG,8,1,MO,0,110681.690329,270608300.0,1,0,"POLYGON ((-10492617.442 4469940.846, -10492607...",1,1
36668,,290735000196,Carl Junction Intermediate,2907350,04,6,1,MO,0,135293.856846,531247500.0,2,0,"POLYGON ((-10532864.012 4465184.437, -10532864...",1,1
36669,,290735000197,Carl Junction Primary 2-3,2907350,02,3,1,MO,0,135293.856846,531247500.0,1,0,"POLYGON ((-10532864.012 4465184.437, -10532864...",1,1
36670,,290735000198,Carl Junction High School,2907350,09,12,1,MO,0,135293.856846,531247500.0,3,0,"POLYGON ((-10532864.012 4465184.437, -10532864...",1,1
36671,,290735000199,Carl Junction Jr. High,2907350,07,8,1,MO,0,135293.856846,531247500.0,2,0,"POLYGON ((-10532864.012 4465184.437, -10532864...",1,1


### Split SABs by level and open enrollment

The SAB file has five different levels (`level`):

- 1 = Primary
- 2 = Middle
- 3 = High
- 4 = Other
- N = Not Applicable

The SAB files have a flag for schools that allow open enrollment `openEnroll`.

The SAB file can be split into non-overlapping files by level and by whether the school allows open enrollment.
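
One way to reason about the split is to give every boundary exactly one layer label. The sketch below is a hypothetical helper, not the program's code: the function `sab_layer`, the sample rows, and the label `Unassigned` (used here for a level such as `N` that matches none of the four level layers) are all assumptions made for illustration. The codes are treated as strings.

```python
from collections import Counter

def sab_layer(level: str, open_enroll: str) -> str:
    """Assign one layer name to a (level, openEnroll) pair; codes are strings."""
    if open_enroll == "1":
        return "Open Enroll"
    names = {"1": "Primary", "2": "Middle", "3": "High", "4": "Other"}
    return names.get(level, "Unassigned")

# Invented (level, openEnroll) pairs for illustration
rows = [("1", "0"), ("1", "0"), ("2", "0"), ("3", "1"), ("N", "0")]
counts = Counter(sab_layer(level, flag) for level, flag in rows)
print(counts)

# Every row gets exactly one label, so the layer counts sum to the row count
assert sum(counts.values()) == len(rows)
```

Counting an `Unassigned` group makes it easy to see whether any boundary would be left out of all layers.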

In [31]:
df = select_schooldata[('School Attendance Boundaries Single Shapefile', '2015-2016')]
df[['ncessch','level','openEnroll']].groupby(['level','openEnroll']).aggregate(['count'])

level,openEnroll,ncessch count
1,0,39
2,0,15
3,0,10
4,0,2


In [32]:
df['level'].describe()

count     66
unique     4
top        1
freq      39
Name: level, dtype: object

In [33]:
sab_boundaries = {}
sab_boundaries[('Primary School Attendance Boundaries', '2015-2016')] = \
    df[(df['level']=='1') & (df['openEnroll']=='0')].copy(deep=True)
sab_boundaries[('Middle School Attendance Boundaries', '2015-2016')] = \
    df[(df['level']=='2') & (df['openEnroll']=='0')].copy(deep=True)
sab_boundaries[('High School Attendance Boundaries', '2015-2016')] = \
    df[(df['level']=='3') & (df['openEnroll']=='0')].copy(deep=True)
sab_boundaries[('Other School Attendance Boundaries', '2015-2016')] = \
    df[(df['level']=='4') & (df['openEnroll']=='0')].copy(deep=True)
sab_boundaries[('Open Enroll School Attendance Boundaries', '2015-2016')] = \
    df[(df['openEnroll']=='1')].copy(deep=True)

In [34]:
sab_boundaries[('Primary School Attendance Boundaries', '2015-2016')].head()

Unnamed: 0,SrcName,ncessch,schnam,leaid,gslo,gshi,defacto,stAbbrev,openEnroll,Shape_Leng,Shape_Area,level,MultiBdy,geometry,slcncessch,slcleaid
36517,,290411000038,AVILLA ELEM.,2904110,KG,8,1,MO,0,110681.690329,270608300.0,1,0,"POLYGON ((-10492617.442 4469940.846, -10492607...",1,1
36669,,290735000197,Carl Junction Primary 2-3,2907350,02,3,1,MO,0,135293.856846,531247500.0,1,0,"POLYGON ((-10532864.012 4465184.437, -10532864...",1,1
36673,,290735002811,Carl Junction Primary K-1,2907350,PK,1,1,MO,0,135293.856846,531247500.0,1,0,"POLYGON ((-10532864.012 4465184.437, -10532864...",1,1
36678,Columbian Elementary,290746000204,COLUMBIAN ELEM.,2907460,KG,4,0,MO,0,39572.97436,78969890.0,1,0,"POLYGON ((-10500961.953 4474940.516, -10500937...",1,1
36679,Fairview Elementary,290746000206,FAIRVIEW ELEM.,2907460,KG,4,0,MO,0,63137.827301,108390900.0,1,0,"POLYGON ((-10493288.699 4474940.936, -10493291...",1,1


## Explore Selected Data
Create a single map that shows all selected data.
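
A map that shows every layer needs a bounding box that covers all of the selected geometries. The sketch below shows one way to derive the two corners from per-feature bounds. The helper `map_corners` and the sample boxes are hypothetical. It assumes the bounds are in longitude/latitude (x = longitude, y = latitude), while folium expects `[latitude, longitude]` pairs.

```python
def map_corners(bounds):
    """bounds: iterable of (minx, miny, maxx, maxy) tuples in longitude/latitude.

    Returns ([south, west], [north, east]) in [lat, lon] order.
    """
    bounds = list(bounds)
    west = min(b[0] for b in bounds)
    south = min(b[1] for b in bounds)
    east = max(b[2] for b in bounds)
    north = max(b[3] for b in bounds)
    return [south, west], [north, east]

# Two invented bounding boxes near Joplin, MO
sw, ne = map_corners([(-94.6, 36.7, -94.3, 37.0), (-94.5, 36.9, -94.0, 37.4)])
print(sw, ne)   # [36.7, -94.6] [37.4, -94.0]
```

The corner pair has the same shape as the `[sw_corner, ne_corner]` list passed to `fit_bounds`.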

In [35]:
for key in select_schooldata:
    print(str(key[0]))
    
    # Set Coordinate Reference System to WGS84    
    select_schooldata[key]['geometry'] = select_schooldata[key]['geometry'].to_crs(epsg=4326) 

Postsecondary School File
Public District File
Public School File
Private School File
School Attendance Boundaries Single Shapefile


In [36]:
for key in sab_boundaries:
    # Set Coordinate Reference System to WGS84    
    sab_boundaries[key]['geometry'] = sab_boundaries[key]['geometry'].to_crs(epsg=4326) 

In [37]:
gdf = select_schooldata[('School Attendance Boundaries Single Shapefile', '2015-2016')]

In [38]:
from folium import plugins # Add minimap and search plugin functions to maps
from folium.map import FeatureGroup, Marker # Explicit imports for the names used below

style_function1 = lambda x: {
            'fillColor': 'green',
            'color': 'black',
            'weight': 1,
            'fillOpacity': 0.05
        }

# Plot school attendance zones
# What location should the map be centered on?
center_x = (gdf.bounds.minx.mean() + gdf.bounds.maxx.mean())/2
center_y = (gdf.bounds.miny.mean() + gdf.bounds.maxy.mean())/2
print(f'The center of the map is located at {center_x} {center_y}')

gdf_map = fm.Map(location=[center_y, center_x])

# Add School attendance boundaries to map
for key in sab_boundaries:
    print(key)
    check_obs = len(sab_boundaries[key].index)
    if check_obs != 0:
        layer_gdf = sab_boundaries[key]
        layer_name = str(key[0])
        fm.GeoJson(
                layer_gdf.to_json(),
                name= layer_name,
                style_function= style_function1,
                tooltip=fm.features.GeoJsonTooltip(fields=['schnam'],sticky=True)).add_to(gdf_map)
    else: 
        print(key,"layer has no observations.")

# Add NCES Markers to map
def add_nces_markers(locations, layername, labelvar, iconcolor, iconname):
    feature_group = FeatureGroup(name=layername)
    for idx, row in locations.iterrows():
        # Get lat and lon of points
        lon = row['geometry'].x
        lat = row['geometry'].y

        # Get popup layer information
        popuplayer = row[labelvar]
        # Add marker to the map
        feature_group.add_child(Marker([lat, lon], 
                                    popup=popuplayer,
                                    icon=fm.Icon(color=iconcolor, icon=iconname)))
    return feature_group

publicschoolmarkers = add_nces_markers(locations = select_schooldata[('Public School File', '2015-2016')],
                                       layername = "Public Schools",
                                       labelvar = 'NAME',
                                       iconcolor = 'green',
                                       iconname = 'school')
gdf_map.add_child(publicschoolmarkers)

leamarkers = add_nces_markers(locations = select_schooldata[('Public District File', '2015-2016')],
                                       layername = "Public District",
                                       labelvar = 'NAME',
                                       iconcolor = 'beige',
                                       iconname = 'school')
gdf_map.add_child(leamarkers)

## Pick up here  ####


# Create Private School Locations on top of the map
locations = select_schooldata[('Private School File', '2015-2016')]
feature_group = FeatureGroup(name='Private Schools')
for idx, row in locations.iterrows():
    # Get lat and lon of points
    lon = row['geometry'].x
    lat = row['geometry'].y

    # Get NAME information
    schoolname = row['PINST']
    # Add marker to the map
    feature_group.add_child(Marker([lat, lon], 
                                popup=schoolname,
                                icon=fm.Icon(color="red", icon="school")))
gdf_map.add_child(feature_group)

# Create Post Secondary School Locations on top of the map
locations = select_schooldata[('Postsecondary School File', '2015-2016')]
feature_group = FeatureGroup(name='Postsecondary Schools')
for idx, row in locations.iterrows():
    # Get lat and lon of points
    lon = row['geometry'].x
    lat = row['geometry'].y

    # Get NAME information
    schoolname = row['INSTNM']
    # Add marker to the map
    feature_group.add_child(Marker([lat, lon], 
                                popup=schoolname,
                                icon=fm.Icon(color="blue", icon="school")))
gdf_map.add_child(feature_group)


fm.LayerControl(collapsed=False, autoZIndex=False).add_to(gdf_map)

# Add minimap
plugins.MiniMap().add_to(gdf_map)

# How should the map be bound - look for the southwest and northeast corners of the data
sw_corner = [gdf.bounds.miny.min(),gdf.bounds.minx.min()]
ne_corner = [gdf.bounds.maxy.max(),gdf.bounds.maxx.max()]
print(f'The map data file is bounded by at {sw_corner} {ne_corner}')
gdf_map.fit_bounds([sw_corner, ne_corner])

gdf_map.save(f'{programname}/{programname}.html')
display(gdf_map)

The center of the map is located at -94.42470912452822 37.067897371759834
('Primary School Attendance Boundaries', '2015-2016')
('Middle School Attendance Boundaries', '2015-2016')
('High School Attendance Boundaries', '2015-2016')
('Other School Attendance Boundaries', '2015-2016')
('Open Enroll School Attendance Boundaries', '2015-2016')
('Open Enroll School Attendance Boundaries', '2015-2016') layer has no observations.
The map data file is bounded by at [36.68990556697614, -94.61867524942402] [37.4361027118508, -93.98570909949031]


## Save files as Shapefiles

In [39]:
for index, files in filelist_df.iterrows():
    print("\nSave shapefile for ",files['File Description'],"Files for School Year",files['School Year'])
    
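    # Note: the suffix 37155 is the Robeson County, NC FIPS code from the 2021-06-06 version;
    # the data selected in this version are for Jasper and Newton Counties, MO (29097, 29145)
    newfilename = files['Data File Name'][:-4]+'_37155.shp'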
    print("\nNew Shapefile name ",newfilename)
    select_schooldata[(files['File Description'],files['School Year'])].to_file(programname+"/"+newfilename)


Save shapefile for  Postsecondary School File Files for School Year 2015-2016

New Shapefile name  EDGE_GEOCODE_POSTSECONDARYSCH_1516_37155.shp

Save shapefile for  Public District File Files for School Year 2015-2016

New Shapefile name  EDGE_GEOCODE_PUBLICLEA_1516_37155.shp

Save shapefile for  Public School File Files for School Year 2015-2016

New Shapefile name  EDGE_GEOCODE_PUBLICSCH_1516_37155.shp

Save shapefile for  Private School File Files for School Year 2015-2016

New Shapefile name  EDGE_GEOCODE_PRIVATESCH_15_16_37155.shp

Save shapefile for  School Attendance Boundaries Single Shapefile Files for School Year 2015-2016

New Shapefile name  SABS_1516_37155.shp


In [41]:
# Save each SAB layer as its own shapefile.
# Empty layers are skipped: writing one raises
# "ValueError: Cannot write empty DataFrame to file."
sab_files = {'Primary': 'SABS_1516_37155_Primary.shp',
             'Middle': 'SABS_1516_37155_Middle.shp',
             'High': 'SABS_1516_37155_High.shp',
             'Open Enroll': 'SABS_1516_37155_Open.shp'}
for layer, newfilename in sab_files.items():
    layer_gdf = sab_boundaries[(layer+' School Attendance Boundaries', '2015-2016')]
    if layer_gdf.empty:
        print(layer,"layer has no observations - shapefile not saved.")
    else:
        layer_gdf.to_file(programname+"/"+newfilename)

## Combine Files and Save as CSV File

In [42]:
select_schooldata[('Public District File','2015-2016')].head()

Unnamed: 0,OBJECTID,LEAID,NAME,OPSTFIPS,LSTREE,LCITY,LSTATE,LZIP,LZIP4,STFIP15,...,NMCBSA15,CBSATYPE15,CSA15,NMCSA15,NECTA15,NMNECTA15,CD15,SLDL15,SLDU15,geometry
8740,8741,2904110,AVILLA R-XIII,29,400 SARCOXIE ST,AVILLA,MO,64833,7,29,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,127,32,POINT (-94.12960 37.19187)
8741,8742,2907350,CARL JUNCTION R-I,29,206 S RONEY,CARL JUNCTION,MO,64834,9402,29,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,163,32,POINT (-94.56830 37.17630)
8742,8743,2907460,CARTHAGE R-IX,29,710 LYON ST,CARTHAGE,MO,64836,1700,29,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,163,32,POINT (-94.31187 37.17387)
8743,8744,2916140,JASPER CO. R-V,29,201 W MERCER ST,JASPER,MO,64755,9345,29,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,127,32,POINT (-94.30389 37.34045)
8744,8745,2916350,JOPLIN SCHOOLS,29,310 W 8TH STREET,JOPLIN,MO,64804,128,29,...,"Joplin, MO",1,309,"Joplin-Miami, MO-OK",N,N,2907,161,32,POINT (-94.51595 37.08282)


In [43]:
def prepare_data_for_append(gdf,copyvars,level,schtype,years):
    append_gdf = gdf[copyvars].copy()

    # All data frames need to have the same column names
    colnames = ['ncesid','name','addr','city','stabbr','zip','cnty15','geometry']
    append_gdf.columns = colnames
    
    append_gdf['level'] = level
    append_gdf['schtype'] = schtype
    # Geometries are points, so latitude and longitude can be read directly
    append_gdf['lat'] = append_gdf['geometry'].y
    append_gdf['lon'] = append_gdf['geometry'].x
    append_gdf['schyr'] = years
    
    return append_gdf

# Post Secondary Schools
copyvars = ['UNITID','INSTNM','ADDR','CITY','STABBR','ZIP','CNTY15','geometry']
pss_schooldata = prepare_data_for_append(select_schooldata[('Postsecondary School File','2015-2016')],
                                            copyvars, 5,5,'2015-2016')

# Public Schools
copyvars = ['NCESSCH','NAME','LSTREE','LCITY','LSTATE','LZIP','CNTY15','geometry']
public_schooldata = prepare_data_for_append(select_schooldata[('Public School File','2015-2016')],
                                            copyvars, 99,1,'2015-2016')

# Private Schools
copyvars = ['PPIN','PINST','PL_ADD','PL_CIT','PL_STABB','PL_ZIP','CNTY15','geometry']
private_schooldata = prepare_data_for_append(select_schooldata[('Private School File','2015-2016')],
                                            copyvars, 99,2,'2015-2016')

# Public Districts
copyvars = ['LEAID','NAME','LSTREE','LCITY','LSTATE','LZIP','CNTY15','geometry']
district_schooldata = prepare_data_for_append(select_schooldata[('Public District File','2015-2016')],
                                            copyvars, 99,4,'2015-2016')

append_schooldata = pd.concat([public_schooldata,
                              private_schooldata,
                              district_schooldata,
                              pss_schooldata], 
                              ignore_index=True, sort=False)


append_schooldata.head()



Unnamed: 0,ncesid,name,addr,city,stabbr,zip,cnty15,geometry,level,schtype,lat,lon,schyr
0,290000700817,CROWDER AVTS,601 LACLEDE AVENUE,NEOSHO,MO,64850,29145,POINT (-94.36335 36.81182),99,1,36.811816,-94.36335,2015-2016
1,290000902211,GATEWAY SCHOOL,1823 W 20TH ST,JOPLIN,MO,64804,29097,POINT (-94.54270 37.06558),99,1,37.065577,-94.542705,2015-2016
2,290002203067,COLLEGE VIEW SCHOOL,1101 N GOETZ BLVD,JOPLIN,MO,64801,29097,POINT (-94.46598 37.10067),99,1,37.10067,-94.465982,2015-2016
3,290411000038,AVILLA ELEM.,400 SARCOXIE ST,AVILLA,MO,64833,29097,POINT (-94.12960 37.19187),99,1,37.191873,-94.129601,2015-2016
4,290735000196,Carl Junction Intermediate,206 S Roney,Carl Junction,MO,64834,29097,POINT (-94.56828 37.17629),99,1,37.176294,-94.568283,2015-2016


In [44]:
append_schooldata.ncesid.describe()

count              104
unique             104
top       291635002431
freq                 1
Name: ncesid, dtype: object

In [45]:
append_schooldata.to_csv(programname+"/"+programname+".csv")
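
When the combined CSV file is read back, the ID columns may be parsed as numbers unless a text type is requested. The sketch below is a minimal illustration with an invented two-row file held in memory; the column names follow the combined file, but the rows are made up, and reading the real file would use its path in place of the `io.StringIO` object.

```python
import io
import pandas as pd

# Invented two-row CSV standing in for the saved file
csv_text = (
    "ncesid,name,cnty15\n"
    "010000500870,School A,01089\n"
    "290411000038,School B,29097\n"
)

# Default parsing converts the ID columns to integers
as_default = pd.read_csv(io.StringIO(csv_text))
# Requesting str keeps the IDs exactly as written
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"ncesid": str, "cnty15": str})

print(as_default["cnty15"].tolist())   # [1089, 29097]
print(as_text["cnty15"].tolist())      # ['01089', '29097']
```

Keeping the IDs as text preserves leading zeros, which matters for states whose FIPS code is below 10.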