# Add Teacher Counts
This program reads the unzipped National Center for Education Statistics (NCES) Common Core of Data (CCD), along with private school and postsecondary staff files, and adds teacher and staff counts to the school location data.

## Description of Program
- program:    NCES_2bv1_AddTeacherCount
- task:       Add Count of Teachers/Staff to School Data
- version:    2021-06-15
- project:    Interdependent Networked Community Resilience Modeling Environment (IN-CORE) Subtask 5.2 - Social Institutions
- funding:    NIST Financial Assistance Award Numbers: 70NANB15H044 and 70NANB20H008 
- author:     Nathanael Rosenheim

- Suggested Citation:
Rosenheim, N. (2021) "Obtain, Clean, and Explore School Location and Attendance Boundary Data".
Archived on GitHub and ICPSR.

In [None]:
# Import Python Packages Required for program
import pandas as pd       # Pandas for reading in data 
import geopandas as gpd   # Geopandas for reading Shapefiles
import numpy as np        # Numpy helps with selected data
import os                 # Operating System (os) For folders and finding working directory
import folium as fm       # folium has more dynamic maps - but requires internet connection

In [None]:
# Display versions being used - important information for replication
import sys
print("Python Version     ", sys.version)
print("pandas version:    ", pd.__version__)
print("geopandas version: ", gpd.__version__)
print("numpy version:     ", np.__version__)
print("folium version:    ", fm.__version__)

Python Version      3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:37:01) [MSC v.1916 64 bit (AMD64)]
pandas version:     1.2.4
geopandas version:  0.9.0
numpy version:      1.20.2
folium version:     0.12.1


In [None]:
# Store Program Name for output files to have the same name
programname = "NCES_2bv1_AddTeacherCount_2021-06-15"
# Make directory to save output (no error if it already exists)
os.makedirs(programname, exist_ok=True)

## Read in NCES Teacher Count Files
Files for the CCD were downloaded manually and saved to the Source Data Folder.

In [None]:
sourcefolder = 'ccd_data/ccd_sch_059_1516_w_2a_011717_csv/'
sourcefile = 'ccd_sch_059_1516_w_2a_011717.csv'
ccd_sch = pd.read_csv(sourcefolder+sourcefile)
ccd_sch.head()

Unnamed: 0,SURVYEAR,FIPST,STABR,STATENAME,SEANAME,LEAID,ST_LEAID,LEA_NAME,SCHID,ST_SCHID,NCESSCH,SCH_NAME,FTE
0,2015-2016,1,AL,ALABAMA,Alabama Department Of Education,100002,210,Alabama Youth Services,277,210-0020,10000200277,Sequoyah Sch - Chalkville Campus,-1.0
1,2015-2016,1,AL,ALABAMA,Alabama Department Of Education,100002,210,Alabama Youth Services,1667,210-0050,10000201667,Camps,-1.0
2,2015-2016,1,AL,ALABAMA,Alabama Department Of Education,100002,210,Alabama Youth Services,1670,210-0060,10000201670,Det Ctr,-1.0
3,2015-2016,1,AL,ALABAMA,Alabama Department Of Education,100002,210,Alabama Youth Services,1705,210-0030,10000201705,Wallace Sch - Mt Meigs Campus,-1.0
4,2015-2016,1,AL,ALABAMA,Alabama Department Of Education,100002,210,Alabama Youth Services,1706,210-0040,10000201706,McNeel Sch - Vacca Campus,-1.0
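
The `FTE` column is -1.0 for every school in the rows above. The following is a minimal sketch of converting such values to `NaN` before computing any statistics. It assumes that negative `FTE` values are missing-data codes rather than real counts, which should be checked against the CCD file documentation. The frame `ccd_example`, its IDs, its values, and the `FTE_clean` column are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of a staff file; IDs and values are illustrative only
ccd_example = pd.DataFrame({
    "ncesid": ["school_a", "school_b", "school_c"],
    "FTE": [-1.0, 25.0, 12.5],
})

# Assumption: negative FTE values are missing-data codes, not real counts.
# Keep values that are >= 0 and replace everything else with NaN.
ccd_example["FTE_clean"] = ccd_example["FTE"].where(ccd_example["FTE"] >= 0, np.nan)

print(ccd_example)
```

Writing to a separate column keeps the original codes available for inspection.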


Files for the Private School Survey (PSS) were downloaded manually.

In [None]:
sourcefolder = 'pss_data/pss1516_pu_csv/'
sourcefile = 'pss1516_pu.csv'
pss_sch = pd.read_csv(sourcefolder+sourcefile)
pss_sch.head()


Unnamed: 0,pfnlwt,repw1,repw2,repw3,repw4,repw5,repw6,repw7,repw8,repw9,...,f_p665,s_kg,p_indian,p_asian,p_pacific,p_hisp,p_white,p_black,p_tr,sttch_rt
0,1.326531,1.326531,1.326531,1.326531,1.326531,1.326531,1.326531,1.326531,1.326531,1.326531,...,0,2,0.0,0.0,0.0,5.882353,94.117647,0.0,0.0,3.469388
1,1.482143,1.482143,1.482143,1.482143,1.482143,1.482143,1.482143,1.482143,1.482143,1.482143,...,0,11,0.0,0.0,0.0,0.0,0.0,99.21875,0.78125,13.763441
2,1.355932,1.355932,1.355932,1.355932,1.355932,1.355932,1.355932,1.355932,1.355932,1.355932,...,0,0,0.0,0.0,0.0,0.0,0.0,100.0,0.0,1.136364
3,1.441065,1.441065,1.441065,1.441065,1.441065,1.441065,1.441065,1.441065,1.441065,1.441065,...,0,15,0.0,1.376147,0.0,2.752294,93.119266,1.376147,1.376147,12.178771
4,1.318182,1.318182,1.318182,1.318182,1.318182,1.318182,1.318182,1.318182,1.318182,1.318182,...,0,0,0.0,0.0,0.0,0.0,93.548387,6.451613,0.0,7.75


### Postsecondary Data
Postsecondary data were downloaded and unzipped. The files are in an Access database, which cannot be read directly into pandas, so the table for staff counts was exported as a DBF file.

S2015_OC - Full- and part-time staff by occupational category, race/ethnicity, and gender: Fall 2015

In [None]:
sourcefolder = 'IPEDS_data/IPEDS_2015-16_Final/'
sourcefile = 'S2015_OC.DBF'
# geopandas can read the DBF table; the geometry column is empty because the file has no shapes
ipeds_sch = gpd.read_file(sourcefolder+sourcefile)
ipeds_sch.head()

Unnamed: 0,UNITID,STAFFCAT,FTPT,OCCUPCAT,SABDTYPE,HRTOTLT,HRTOTLM,HRTOTLW,HRAIANT,HRAIANM,...,HR2MORT,HR2MORM,HR2MORW,HRUNKNT,HRUNKNM,HRUNKNW,HRNRALT,HRNRALM,HRNRALW,geometry
0,100654.0,1100.0,1.0,100.0,-2.0,963.0,422.0,541.0,2.0,0.0,...,0.0,0.0,0.0,8.0,2.0,6.0,41.0,25.0,16.0,
1,100663.0,1100.0,1.0,100.0,-2.0,9385.0,3929.0,5456.0,19.0,9.0,...,63.0,22.0,41.0,6.0,2.0,4.0,361.0,203.0,158.0,
2,100690.0,1100.0,1.0,100.0,-2.0,70.0,44.0,26.0,0.0,0.0,...,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,
3,100706.0,1100.0,1.0,100.0,-2.0,1521.0,787.0,734.0,9.0,5.0,...,1.0,0.0,1.0,16.0,7.0,9.0,34.0,22.0,12.0,
4,100724.0,1100.0,1.0,100.0,-2.0,1042.0,441.0,601.0,4.0,3.0,...,0.0,0.0,0.0,65.0,31.0,34.0,0.0,0.0,0.0,


In [None]:
ipeds_sch.UNITID.describe()

count    211925.000000
mean     252423.907826
std      123606.860865
min      100654.000000
25%      161457.000000
50%      206482.000000
75%      381945.000000
max      487676.000000
Name: UNITID, dtype: float64

In [None]:
# Keep one observation for each UNITID with total staff count
ipeds_sch_hrtotal = ipeds_sch.loc[(ipeds_sch['STAFFCAT'] == 1100) & (ipeds_sch['OCCUPCAT'] == 100)].copy()
#ipeds_sch_hrtotal = ipeds_sch.loc[(ipeds_sch['OCCUPCAT'] == 100)].copy()
ipeds_sch_hrtotal.head()

Unnamed: 0,UNITID,STAFFCAT,FTPT,OCCUPCAT,SABDTYPE,HRTOTLT,HRTOTLM,HRTOTLW,HRAIANT,HRAIANM,...,HR2MORT,HR2MORM,HR2MORW,HRUNKNT,HRUNKNM,HRUNKNW,HRNRALT,HRNRALM,HRNRALW,geometry
0,100654.0,1100.0,1.0,100.0,-2.0,963.0,422.0,541.0,2.0,0.0,...,0.0,0.0,0.0,8.0,2.0,6.0,41.0,25.0,16.0,
1,100663.0,1100.0,1.0,100.0,-2.0,9385.0,3929.0,5456.0,19.0,9.0,...,63.0,22.0,41.0,6.0,2.0,4.0,361.0,203.0,158.0,
2,100690.0,1100.0,1.0,100.0,-2.0,70.0,44.0,26.0,0.0,0.0,...,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,
3,100706.0,1100.0,1.0,100.0,-2.0,1521.0,787.0,734.0,9.0,5.0,...,1.0,0.0,1.0,16.0,7.0,9.0,34.0,22.0,12.0,
4,100724.0,1100.0,1.0,100.0,-2.0,1042.0,441.0,601.0,4.0,3.0,...,0.0,0.0,0.0,65.0,31.0,34.0,0.0,0.0,0.0,


In [None]:
ipeds_sch_hrtotal.UNITID.describe()

count      7282.000000
mean     285848.703378
std      135003.540788
min      100654.000000
25%      171308.500000
50%      224008.500000
75%      443691.500000
max      487676.000000
Name: UNITID, dtype: float64
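
The filter above is meant to leave one row per `UNITID`, and the later merges rely on that through `validate = "one_to_one"`. Here is a minimal sketch of checking this directly on a small hypothetical frame. The codes 1100 and 100 are taken from the filter in this notebook; the frame `staff_long` and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical long-format staff table: several rows per institution
staff_long = pd.DataFrame({
    "UNITID": [1, 1, 1, 2, 2],
    "STAFFCAT": [1100, 1100, 2100, 1100, 2100],
    "OCCUPCAT": [100, 200, 100, 100, 100],
    "HRTOTLT": [50, 20, 45, 30, 28],
})

# Same filter as in the notebook: keep the rows coded 1100 / 100
totals = staff_long.loc[
    (staff_long["STAFFCAT"] == 1100) & (staff_long["OCCUPCAT"] == 100)
].copy()

# Fail early if any institution still has more than one row
assert totals["UNITID"].is_unique, "expected one row per UNITID"
print(totals)
```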

In [None]:
ipeds_sch_hrtotal.loc[ipeds_sch_hrtotal['UNITID'] == 199281]

Unnamed: 0,UNITID,STAFFCAT,FTPT,OCCUPCAT,SABDTYPE,HRTOTLT,HRTOTLM,HRTOTLW,HRAIANT,HRAIANM,...,HR2MORT,HR2MORM,HR2MORW,HRUNKNT,HRUNKNM,HRUNKNW,HRNRALT,HRNRALM,HRNRALW,geometry
2729,199281.0,1100.0,1.0,100.0,-2.0,942.0,410.0,532.0,296.0,111.0,...,14.0,5.0,9.0,3.0,1.0,2.0,16.0,8.0,8.0,


In [None]:
# UNITID is read as float; cast to int first so the string is '199281' rather than '199281.0'
ipeds_sch_hrtotal['ncesid'] = ipeds_sch_hrtotal['UNITID'].astype(int).apply(str)
ipeds_sch_hrtotal.loc[ipeds_sch_hrtotal['ncesid'] == '199281']

Unnamed: 0,UNITID,STAFFCAT,FTPT,OCCUPCAT,SABDTYPE,HRTOTLT,HRTOTLM,HRTOTLW,HRAIANT,HRAIANM,...,HR2MORM,HR2MORW,HRUNKNT,HRUNKNM,HRUNKNW,HRNRALT,HRNRALM,HRNRALW,geometry,ncesid
2729,199281.0,1100.0,1.0,100.0,-2.0,942.0,410.0,532.0,296.0,111.0,...,5.0,9.0,3.0,1.0,2.0,16.0,8.0,8.0,,199281
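
The integer cast matters because `UNITID` is stored as a float. A short sketch of the difference, using two `UNITID` values that appear in the output above; the variable names are hypothetical.

```python
import pandas as pd

# Two UNITID values from the output above, stored as floats
unitid = pd.Series([100654.0, 199281.0])

# Converting a float straight to a string keeps the decimal part
as_str_direct = unitid.astype(str)

# Casting to int first yields a clean ID string that can match other files
as_str_via_int = unitid.astype(int).astype(str)

print(as_str_direct.tolist())
print(as_str_via_int.tolist())
```

Only the second form can match an ID such as `'199281'` in the school location file.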


## Read in NCES School Location File
School location files were obtained and cleaned with the previous program, `NCES_2av1_SelectCountySchools`.

In [None]:
sourceprogram = 'NCES_2av1_SelectCountySchools_2021-06-06'
selected_schools = pd.read_csv(sourceprogram+'/'+sourceprogram+'.csv')
selected_schools.head()

Unnamed: 0.1,Unnamed: 0,ncesid,name,addr,city,stabbr,zip,cnty15,geometry,level,schtype,lat,lon,schyr
0,0,370004002349,CIS Academy,818 West 3rd Street,Pembroke,NC,28372,37155,POINT (-79.20335664043833 34.68503759480223),99,1,34.685038,-79.203357,2015-2016
1,1,370034603302,Southeastern Academy,12251 NC HWY 41 North,Lumberton,NC,28358,37155,POINT (-78.87378865362859 34.65169717880272),99,1,34.651697,-78.873789,2015-2016
2,2,370225003249,Sandy Grove Middle,300 Chason Road,Lumber Bridge,NC,28357,37155,POINT (-79.06581931618486 34.89650979378273),99,1,34.89651,-79.065819,2015-2016
3,3,370393001569,Deep Branch Elementary,4045 Deep Branch Road,Lumberton,NC,28360,37155,POINT (-79.14600999186194 34.63037683072379),99,1,34.630377,-79.14601,2015-2016
4,4,370393001570,Fairgrove Middle,1953 Fairgrove Sch Road,Fairmont,NC,28340,37155,POINT (-79.17370687961406 34.49329831006692),99,1,34.493298,-79.173707,2015-2016


## Merge Staff Count Data with School Locations

In [None]:
ccd_sch['ncesid'] = ccd_sch['NCESSCH'].apply(str)
ccd_sch.head()

Unnamed: 0,SURVYEAR,FIPST,STABR,STATENAME,SEANAME,LEAID,ST_LEAID,LEA_NAME,SCHID,ST_SCHID,NCESSCH,SCH_NAME,FTE,ncesid
0,2015-2016,1,AL,ALABAMA,Alabama Department Of Education,100002,210,Alabama Youth Services,277,210-0020,10000200277,Sequoyah Sch - Chalkville Campus,-1.0,10000200277
1,2015-2016,1,AL,ALABAMA,Alabama Department Of Education,100002,210,Alabama Youth Services,1667,210-0050,10000201667,Camps,-1.0,10000201667
2,2015-2016,1,AL,ALABAMA,Alabama Department Of Education,100002,210,Alabama Youth Services,1670,210-0060,10000201670,Det Ctr,-1.0,10000201670
3,2015-2016,1,AL,ALABAMA,Alabama Department Of Education,100002,210,Alabama Youth Services,1705,210-0030,10000201705,Wallace Sch - Mt Meigs Campus,-1.0,10000201705
4,2015-2016,1,AL,ALABAMA,Alabama Department Of Education,100002,210,Alabama Youth Services,1706,210-0040,10000201706,McNeel Sch - Vacca Campus,-1.0,10000201706
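
NCES school IDs are 12 digits long, but the Alabama IDs above show only 11. This suggests that the leading zero was dropped when `NCESSCH` was read as an integer. The North Carolina schools used in this notebook start with 37 and are not affected. Below is a minimal sketch of two ways to keep the full ID, assuming a 12-digit width; the in-memory CSV text and variable names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical one-column CSV containing two 12-digit school IDs
csv_text = "NCESSCH\n010000200277\n370393001569\n"

# Default read: the column is parsed as an integer and the leading zero is lost
as_int = pd.read_csv(io.StringIO(csv_text))

# Option 1: restore the width afterwards (assumes every ID should be 12 digits)
padded = as_int["NCESSCH"].astype(str).str.zfill(12)

# Option 2: read the column as text in the first place
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"NCESSCH": str})

print(padded.tolist())
print(as_text["NCESSCH"].tolist())
```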


In [None]:
addteacher = pd.merge(left = ccd_sch[['ncesid','FTE']],
                      right = selected_schools,
                     left_on = 'ncesid',
                     right_on = 'ncesid',
                     how = "right",
                     validate = "one_to_one")
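
A right merge keeps every school in the location file, whether or not a staff record matches it. The following minimal sketch shows how pandas' `indicator=True` option can report how many schools found a match. The two small frames and their values are hypothetical.

```python
import pandas as pd

# Hypothetical staff file and school location file
staff = pd.DataFrame({"ncesid": ["A", "B", "C"], "FTE": [10.0, 20.5, 30.0]})
schools = pd.DataFrame({"ncesid": ["B", "C", "D"],
                        "name": ["School B", "School C", "School D"]})

# how="right" keeps all rows of `schools`; validate raises if an ID repeats;
# indicator=True adds a `_merge` column recording where each row came from
merged = pd.merge(left=staff, right=schools, on="ncesid",
                  how="right", validate="one_to_one", indicator=True)

match_counts = merged["_merge"].value_counts()
print(match_counts)
```

Rows marked `right_only` are schools with no staff record in that source, which is why the counts reported by `describe()` can be lower than the number of schools.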

In [None]:
addteacher.ncesid.describe()

count               55
unique              55
top       370034603302
freq                 1
Name: ncesid, dtype: object

In [None]:
addteacher['FTE'].describe()

count     45.000000
mean      33.398667
std       22.050478
min        7.990000
25%       19.180000
50%       28.880000
75%       38.400000
max      130.760000
Name: FTE, dtype: float64

In [None]:
addteacher = pd.merge(left = pss_sch[['ppin','p410']],
                      right = addteacher,
                     left_on = 'ppin',
                     right_on = 'ncesid',
                     how = "right",
                     validate = "one_to_one")

In [None]:
addteacher['p410'].describe()

count     5.000000
mean      6.000000
std       4.062019
min       3.000000
25%       4.000000
50%       4.000000
75%       6.000000
max      13.000000
Name: p410, dtype: float64

In [None]:
addteacher.ncesid.describe()

count               55
unique              55
top       370034603302
freq                 1
Name: ncesid, dtype: object

In [None]:
addteacher = pd.merge(left = ipeds_sch_hrtotal[['ncesid','HRTOTLT']],
                      right = addteacher,
                     left_on = 'ncesid',
                     right_on = 'ncesid',
                     how = "right",
                     validate = "one_to_one")

In [None]:
addteacher.ncesid.describe()

count               55
unique              55
top       370034603302
freq                 1
Name: ncesid, dtype: object

In [None]:
addteacher['HRTOTLT'].describe()

count      2.000000
mean     667.500000
std      388.201623
min      393.000000
25%      530.250000
50%      667.500000
75%      804.750000
max      942.000000
Name: HRTOTLT, dtype: float64

In [None]:
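# Combine the three sources into one staff count:
# start with the CCD value (FTE), then fill gaps from PSS (p410), then from IPEDS (HRTOTLT)
addteacher['numstaff'] = addteacher['FTE']
addteacher['numstaff'] = addteacher['numstaff'].fillna(addteacher['p410'])
addteacher['numstaff'] = addteacher['numstaff'].fillna(addteacher['HRTOTLT'])
addteacher['numstaff'].describe()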

count     52.000000
mean      55.152692
std      136.875098
min        3.000000
25%       16.100000
50%       27.530000
75%       39.390000
max      942.000000
Name: numstaff, dtype: float64
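
In the output above, 52 of the 55 schools receive a staff count. Because the three sources may not measure staffing in the same way, it can help to record which source supplied each value. This is a minimal sketch on a hypothetical frame; the `numstaff_source` column and its labels are not part of the original program.

```python
import numpy as np
import pandas as pd

# Hypothetical merged frame: each school has a value from at most one source
example = pd.DataFrame({
    "FTE": [28.0, np.nan, np.nan, np.nan],
    "p410": [np.nan, 4.0, np.nan, np.nan],
    "HRTOTLT": [np.nan, np.nan, 942.0, np.nan],
})

# Same priority order as the notebook: FTE, then p410, then HRTOTLT
example["numstaff"] = example["FTE"].fillna(example["p410"]).fillna(example["HRTOTLT"])

# Hypothetical label column: the first condition that is true wins
conditions = [example["FTE"].notna(), example["p410"].notna(), example["HRTOTLT"].notna()]
example["numstaff_source"] = np.select(conditions, ["CCD", "PSS", "IPEDS"], default="none")

print(example)
```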

In [None]:
addteacher.to_csv(programname+"/"+programname+".csv")