# National Center for Education Statistics School Data

The National Center for Education Statistics (NCES), U.S. Department of Education, publishes school location, attendance boundary, and school characteristics data. These data are sourced from two programs: Education Demographic and Geographic Estimates (EDGE) and Common Core of Data (CCD).

The EDGE program provides GIS shapefiles for school locations and School Attendance Boundaries for public, private, and postsecondary schools. The CCD program provides basic information on public schools, such as staff, membership (students), and lunch programs.

References for Data Documentation:
> Geverdt, D., (2018a). Education Demographic and Geographic Estimates (EDGE) Geocodes: Public Schools and Local Education
Agencies, (NCES 2018-080). U.S. Department of Education. Washington, DC: National Center for Education Statistics. Retrieved
[2021-06-04] from http://nces.ed.gov/pubsearch/.


> Geverdt, D., (2018b). Education Demographic and Geographic Estimates (EDGE) Program, Geocodes: Private Schools (NCES 2018-084).
U.S. Department of Education. Washington, DC: National Center for Education Statistics. Retrieved [2021-06-04] from
http://nces.ed.gov/pubsearch/.

> Geverdt, D., (2018c). Education Demographic and Geographic Estimates (EDGE) Program, Geocodes: Postsecondary Schools (NCES
2018-084). U.S. Department of Education. Washington, DC: National Center for Education Statistics. Retrieved [2021-06-04] from
http://nces.ed.gov/pubsearch/.

> Geverdt, D., (2018d). School Attendance Boundary Survey (SABS) File Documentation: 2015-16 (NCES 2018-099). U.S.
Department of Education. Washington, DC: National Center for Education Statistics. Retrieved [2021-06-02] from
http://nces.ed.gov/pubsearch.


The shapefiles are very large (556 MB for the SABS single shapefile) and cover the entire United States. The files should therefore be downloaded only once, which avoids repeating a slow download each time the files are used.

## Description of Program
- program:    NCES_1av1_ObtainSchoolData
- task:       Obtain School Location and Attendance Boundaries
- Version:    2021-06-04
- project:    Interdependent Networked Community Resilience Modeling Environment (IN-CORE) Subtask 5.2 - Social Institutions
- funding:	  NIST Financial Assistance Award Numbers: 70NANB15H044 and 70NANB20H008 
- author:     Nathanael Rosenheim

- Suggested Citation:
Rosenheim, N. (2021) "Obtain, Clean, and Explore School Location and Attendance Boundary Data".
Archived on Github and ICPSR.

## Background 

### Estimated School Building and Administrative Office Locations
> Purpose of School Location Data: "The National Center for Education Statistics (NCES) Education Demographic and Geographic Estimates (EDGE) program develops data resources and information to help data users investigate the social and
spatial context of education. School point locations (latitude/longitude values) are a key component of the NCES data collection." (Geverdt, 2018a p. 1) 

School location data is collected for:
* Public schools
    - elementary
    - public secondary
    - local education agencies
* Private schools
    - elementary
    - secondary
* Postsecondary schools that participate in the federal student financial aid programs
    - college
    - university
    - technical and vocational institution 
    
Note: a Local Education Agency (LEA) is "A public board of education or other public authority within a state that maintains administrative control of public elementary or secondary schools in a city, county, township, school district, or other political subdivision of a state. School districts and county offices of education are both LEAs. Under the Local Control Funding Formula, charter schools are increasingly treated as LEAs." (EdSource, https://edsource.org/glossary/local-education-agency-lea)

The school location data can be linked to:
* Public schools (Geverdt, 2018a p. 1) 
    - Common Core of Data (CCD) school and agency universe
    - enrollment, staffing, and program participation 
* Private schools (Geverdt, 2018b p. 1) 
    - Private School Survey (PSS) collection
    - biennial data about enrollment, staffing, type of program, and other basic administrative features
* Postsecondary Schools (Geverdt, 2018c p. 1) 
    - Integrated Postsecondary Education Data System (IPEDS) collection
    - enrollments, program completions, graduation rates, faculty and staff, finances, institutional prices, and student financial aid

### School Attendance Boundaries
> "The School Attendance Boundaries Survey (SABS) was an experimental survey conducted by the U.S.
Department of Education’s (ED) National Center for Education Statistics (NCES) with assistance from the
U.S. Census Bureau to collect school attendance boundaries for regular schools in the 50 states and the
District of Columbia. Attendance boundaries, sometimes known as school catchment areas, define the
geographic extent served by a local school for the purpose of student assignments. School district
administrators create attendance areas to help organize and plan district-wide services, and districts
may adjust individual school boundaries to help balance the physical capacity of local schools with
changes in the local school-age population." (Geverdt, 2018d p. 1)

## Setup Python Environment

In [2]:
# Import Python Packages Required for program
import pandas as pd       # Pandas for reading in data 
import geopandas as gpd   # Geopandas for reading Shapefiles
import os                 # Operating System (os) For folders and finding working directory
import zipfile            # Zipfile for working with compressed Zipped files

In [3]:
# Display versions being used - important information for replication
import sys # System (sys) for finding current python version
print("Python Version     ", sys.version)
print("pandas version:    ", pd.__version__)
print("geopandas version: ", gpd.__version__)
#print("zipfile version:   ", zipfile.__version__)

Python Version      3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:37:01) [MSC v.1916 64 bit (AMD64)]
pandas version:     1.2.4
geopandas version:  0.9.0


### Install packages to local session
It is possible to add a package to your local session - the package must be installed with each run.
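Before installing, it can be useful to check whether a package is already available in the current session. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of the original program.

```python
import importlib.util

def is_installed(package_name):
    """Return True if a top-level package can be found in the current environment."""
    return importlib.util.find_spec(package_name) is not None

# zipfile ships with Python; wget is only found if it was installed with pip
for name in ("zipfile", "wget"):
    status = "available" if is_installed(name) else "missing - install with pip"
    print(name, "->", status)
```

If a package is reported as missing, the `!pip install` cell can be run for that session.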

In [4]:
# wget is a package that helps obtain data from websites and to download the files to local machine
!pip install wget



In [5]:
import wget

In [6]:
print("wget Version     ", wget.__version__)

wget Version      3.2


In [7]:
# To learn how to use the wget command the help(wget) command has more details
# help(wget) # uncomment this line to see help information for wget package

## Obtain NCES Files
This section of code provides details on the web addresses for obtaining the NCES data. These data files are quite large, so it is recommended that the files be downloaded only once. To facilitate downloading the files, a Comma Separated Values (CSV) file was created using Microsoft Excel (note that CSV files are easier to read into the notebook). The CSV file includes the descriptions and the names of the files to be obtained. This input file can be modified for different school years.
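As an illustration of how the input file is used, the sketch below builds the full download address by joining the URL column and the file name column. It is an assumption-laden example: it uses the standard-library `csv` module instead of pandas so that it runs on its own, it includes only a subset of the columns, and the helper name `full_urls` is hypothetical. The two rows are copied from the file list used in this notebook.

```python
import csv
import io

# Subset of the columns in the input CSV, with two rows from the file list
sample_csv = """File Description,School Year,Data File Name,Data File URL
Public School File,2015-2016,EDGE_GEOCODE_PUBLICSCH_1516.zip,https://nces.ed.gov/programs/edge/data/
School Attendance Boundaries Single Shapefile,2015-2016,SABS_1516.zip,https://nces.ed.gov/programs/edge/data/
"""

def full_urls(csv_text):
    """Join the base URL and the file name for every row of the file list."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Data File URL"] + row["Data File Name"] for row in reader]

urls = full_urls(sample_csv)
for url in urls:
    print(url)
```

Keeping the base URL and the file name in separate columns means that only the file name needs to change when a different school year is requested.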

In [20]:
filelist_df = pd.read_csv('NCES_1av1_ObtainSchoolData_2021-06-04.csv')
filelist_df

Unnamed: 0,File Description,School Year,Documentation File Name,Data File Name,Unzipped Shapefile File Location,Documentation File URL,Data File URL
0,Postsecondary School File,2015-2016,EDGE_GEOCODE_POSTSEC_FILEDOC.pdf,EDGE_GEOCODE_POSTSECONDARYSCH_1516.zip,EDGE_GEOCODE_POSTSECONDARYSCH_1516/EDGE_GEOCOD...,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/
1,Public District File,2015-2016,EDGE_GEOCODE_PUBLIC_FILEDOC.pdf,EDGE_GEOCODE_PUBLICLEA_1516.zip,EDGE_GEOCODE_PUBLICLEA_1516/EDGE_GEOCODE_PUBLI...,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/
2,Public School File,2015-2016,EDGE_GEOCODE_PUBLIC_FILEDOC.pdf,EDGE_GEOCODE_PUBLICSCH_1516.zip,EDGE_GEOCODE_PUBLICSCH_1516/EDGE_GEOCODE_PUBLI...,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/
3,Private School File,2015-2016,EDGE_GEOCODE_PSS1718_FILEDOC.pdf,EDGE_GEOCODE_PRIVATESCH_15_16.zip,EDGE_GEOCODE_PRIVATESCH_15_16.shp,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/
4,School Attendance Boundaries Single Shapefile,2015-2016,EDGE_SABS_2015_2016_TECHDOC.pdf,SABS_1516.zip,SABS_1516/SABS_1516.shp,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/


### Notice - Data files have Documentation Files
It is important to download the data files and the documentation files.

To match the School Attendance Boundaries, the 2015-2016 school location data will be used.
Data for other years also exist - the file names in the input CSV file could be updated for a different school year.

## Create output folder to save files
This notebook makes a new directory where the NCES data will be saved. It is recommended that the files be saved inside a project directory called `SourceData/nces.ed.gov/programs_edge_data/`. The `SourceData` folder is a common folder where all project members can see what data are included in the project development. Within the `SourceData` folder, subfolders named after the source website, such as `nces.ed.gov`, help document the provenance of the data. This notebook should be saved inside the directory `SourceData/nces.ed.gov/`.

Consistent folder names help reinforce the provenance of data.
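The folder layout described above can be sketched as follows. This is a minimal example, not the notebook's own code: it uses `os.makedirs` with `exist_ok=True` so that nested folders are created in one call, the helper name `ensure_directory` is hypothetical, and the demonstration runs inside a temporary folder so nothing is written to the project.

```python
import os
import tempfile

def ensure_directory(path):
    """Create path (and any missing parent folders); return True if it was created."""
    already_there = os.path.isdir(path)
    os.makedirs(path, exist_ok=True)
    return not already_there

# Demonstrate inside a temporary folder that is removed afterwards
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "SourceData", "nces.ed.gov", "programs_edge_data")
    first_call = ensure_directory(target)   # folder does not exist yet
    second_call = ensure_directory(target)  # folder already exists
    print("Created on first call:", first_call)
    print("Created on second call:", second_call)
```

Calling the helper a second time is harmless, which makes it safe to rerun the notebook.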

In [18]:
output_directory = 'programs_edge_data'
# Make directory to save output
if not os.path.exists(output_directory):
    print("Making new directory to save output: ",output_directory)
    os.mkdir(output_directory)
else:
    print("Directory",output_directory,"Already exists.")

Directory programs_edge_data Already exists.


### Loop through file list and download the data file and documentation for each file
This loop steps through each row (`iterrows`) in the dataframe. For each row, the loop creates a dictionary with the name and location of the data documentation file and the data file. A second, internal loop steps through the two files to download. The internal loop first checks whether the file has already been downloaded. If the file has `not` been downloaded, the program uses the `wget.download` function to download the data from the `url`. If the file has been downloaded (`else`), the program prints a message that the file already exists. This loop helps manage the downloading of many complex files and the associated documentation. The structure of the loop reinforces the provenance of the data, which will help future project members understand the source of the school location and attendance data.
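The check-then-download logic can be isolated in a small function, sketched below under stated assumptions. The function name `download_if_missing` and the stand-in `fake_fetch` are hypothetical; the sketch defaults to the standard-library `urllib.request.urlretrieve` where the notebook itself uses `wget.download`, and the stand-in replaces the real download so the example runs offline.

```python
import os
import tempfile
from urllib.request import urlretrieve

def download_if_missing(base_url, filename, folder, fetch=urlretrieve):
    """Download base_url + filename into folder unless the file is already there.

    Returns True when a download was performed and False when it was skipped.
    """
    filepath = os.path.join(folder, filename)
    if os.path.exists(filepath):
        return False
    fetch(base_url + filename, filepath)
    return True

def fake_fetch(url, filepath):
    """Stand-in for a real download: writes a small placeholder file."""
    with open(filepath, "w") as f:
        f.write("placeholder for " + url)

with tempfile.TemporaryDirectory() as tmp:
    args = ("https://nces.ed.gov/programs/edge/data/", "SABS_1516.zip", tmp)
    # The first call "downloads" the file, the second call finds it and skips
    results = [download_if_missing(*args, fetch=fake_fetch) for _ in range(2)]
print(results)
```

Passing the download function as an argument is a design choice for this sketch only; it allows the skip logic to be checked without contacting the NCES website.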

In [10]:
for index, files in filelist_df.iterrows():
    print("\nDownloading",files['File Description'],"Files for School Year",files['School Year'])
    
    # Create dictionary with documentation and data file names and associated URL
    downloadfiles = {files['Documentation File Name']:files['Documentation File URL'],
                     files['Data File Name']:files['Data File URL']}
    for file in downloadfiles:
        # Set file path where file will be downloaded
        filepath = output_directory+"/"+file
        print("   Checking to see if file",file,"has been downloaded...")
        
        # set URL for where the file is located
        url = downloadfiles[file]+file
        
        # Check if file exists - if not then download
        if not os.path.exists(filepath):
            print("   Downloading: ",file, "from \n",url)
            wget.download(url, out=output_directory)
        else:
            print("   file",file,"already exists in folder ",output_directory)
            print("   original file was downloaded from", url)


Downloading Postsecondary School File Files for School Year 2015-2016
   Checking to see if file EDGE_GEOCODE_POSTSEC_FILEDOC.pdf has been downloaded...
   file EDGE_GEOCODE_POSTSEC_FILEDOC.pdf already exists in folder  programs_edge_data
   original file was downloaded from https://nces.ed.gov/programs/edge/docs/EDGE_GEOCODE_POSTSEC_FILEDOC.pdf
   Checking to see if file EDGE_GEOCODE_POSTSECONDARYSCH_1516.zip has been downloaded...
   file EDGE_GEOCODE_POSTSECONDARYSCH_1516.zip already exists in folder  programs_edge_data
   original file was downloaded from https://nces.ed.gov/programs/edge/data/EDGE_GEOCODE_POSTSECONDARYSCH_1516.zip

Downloading Public District File Files for School Year 2015-2016
   Checking to see if file EDGE_GEOCODE_PUBLIC_FILEDOC.pdf has been downloaded...
   file EDGE_GEOCODE_PUBLIC_FILEDOC.pdf already exists in folder  programs_edge_data
   original file was downloaded from https://nces.ed.gov/programs/edge/docs/EDGE_GEOCODE_PUBLIC_FILEDOC.pdf
   Checking to

## Unzip Folders
Each of the zip files stores its spatial data files in a different folder structure.
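Because the internal structure differs between archives, it can help to list the contents of a zip file before extracting it. The sketch below is a hedged example using the standard-library `zipfile` module; the helper name `shapefile_members` is hypothetical, and the two small archives are created only for the demonstration, with member names modeled on the shapefile locations in the file list.

```python
import os
import tempfile
import zipfile

def shapefile_members(zip_path):
    """Return the paths inside the zip archive that end in .shp."""
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        return [name for name in zip_ref.namelist() if name.lower().endswith(".shp")]

with tempfile.TemporaryDirectory() as tmp:
    # One demo archive stores files in a subfolder, the other at the top level
    nested = os.path.join(tmp, "nested.zip")
    flat = os.path.join(tmp, "flat.zip")
    with zipfile.ZipFile(nested, "w") as z:
        z.writestr("SABS_1516/SABS_1516.shp", b"")
        z.writestr("SABS_1516/SABS_1516.dbf", b"")
    with zipfile.ZipFile(flat, "w") as z:
        z.writestr("EDGE_GEOCODE_PRIVATESCH_15_16.shp", b"")
    nested_shp = shapefile_members(nested)
    flat_shp = shapefile_members(flat)

print(nested_shp)
print(flat_shp)
```

The returned paths show where each shapefile will land relative to the extraction folder.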

In [16]:
unzipped_output_directory = 'programs_edge_data_unzipped'
# Make directory to save output
if not os.path.exists(unzipped_output_directory):
    print("Making new directory to save unzipped output: ",unzipped_output_directory)
    os.mkdir(unzipped_output_directory)
else:
    print("Directory",unzipped_output_directory,"Already exists.")

Directory programs_edge_data_unzipped Already exists.


In [11]:
filelist_df

Unnamed: 0,File Description,School Year,Documentation File Name,Data File Name,Documentation File URL,Data File URL
0,Postsecondary School File,2015-2016,EDGE_GEOCODE_POSTSEC_FILEDOC.pdf,EDGE_GEOCODE_POSTSECONDARYSCH_1516.zip,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/
1,Public District File,2015-2016,EDGE_GEOCODE_PUBLIC_FILEDOC.pdf,EDGE_GEOCODE_PUBLICLEA_1516.zip,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/
2,Public School File,2015-2016,EDGE_GEOCODE_PUBLIC_FILEDOC.pdf,EDGE_GEOCODE_PUBLICSCH_1516.zip,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/
3,Private School File,2015-2016,EDGE_GEOCODE_PSS1718_FILEDOC.pdf,EDGE_GEOCODE_PRIVATESCH_15_16.zip,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/
4,School Attendance Boundaries Single Shapefile,2015-2016,EDGE_SABS_2015_2016_TECHDOC.pdf,SABS_1516.zip,https://nces.ed.gov/programs/edge/docs/,https://nces.ed.gov/programs/edge/data/


In [19]:
for index, files in filelist_df.iterrows():
    print("\n Unzipping",files['File Description'],"Files for School Year",files['School Year'])
    
    file = files['Data File Name']
    # Set file path where the zip file was downloaded
    filepath = output_directory+"/"+file
    print("   Checking to see if zip file exists",file,"has been downloaded...")

    # Check if the zip file exists - if it does then unzip it
    if not os.path.exists(filepath):
        print("   Warning file: ",file, "has not been downloaded - run first part of program first")
    else:
        print("   file",file,"already exists in folder ",output_directory)
        print("   files will be unzipped. to", unzipped_output_directory)
        with zipfile.ZipFile(filepath, 'r') as zip_ref:
            zip_ref.extractall(unzipped_output_directory)


 Unzipping Postsecondary School File Files for School Year 2015-2016
   Checking to see if zip file exists EDGE_GEOCODE_POSTSECONDARYSCH_1516.zip has been downloaded...
   file EDGE_GEOCODE_POSTSECONDARYSCH_1516.zip already exists in folder  programs_edge_data
   files will be unzipped. to programs_edge_data_unzipped

 Unzipping Public District File Files for School Year 2015-2016
   Checking to see if zip file exists EDGE_GEOCODE_PUBLICLEA_1516.zip has been downloaded...
   file EDGE_GEOCODE_PUBLICLEA_1516.zip already exists in folder  programs_edge_data
   files will be unzipped. to programs_edge_data_unzipped

 Unzipping Public School File Files for School Year 2015-2016
   Checking to see if zip file exists EDGE_GEOCODE_PUBLICSCH_1516.zip has been downloaded...
   file EDGE_GEOCODE_PUBLICSCH_1516.zip already exists in folder  programs_edge_data
   files will be unzipped. to programs_edge_data_unzipped

 Unzipping Private School File Files for School Year 2015-2016
   Checking to s