Preprocessing is performed by the `SchoolYear` class, which is imported from `src.preprocessing_schoolid`.

In [3]:
import os, sys
# Set absolute path to the root folder of the directory
full_path = os.getcwd()
home_folder = 'CPS_GradRate_Analysis'
root = full_path.split(home_folder)[0] + home_folder + '/'
sys.path.append(root)
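The string split above matches `home_folder` anywhere in the path, including as a substring of another folder name. A minimal alternative sketch, assuming the notebook lives somewhere below a folder named exactly `CPS_GradRate_Analysis`, walks up the parent folders instead. The helper name `find_project_root` and the example path are hypothetical.

```python
from pathlib import PurePath, PurePosixPath
from typing import Optional


def find_project_root(start: PurePath, folder_name: str) -> Optional[PurePath]:
    """Return `start` or its nearest ancestor whose name is exactly `folder_name`."""
    for candidate in [start, *start.parents]:
        if candidate.name == folder_name:
            return candidate
    return None  # no folder with that name on the way up


# Hypothetical location, used only to illustrate the lookup.
example = PurePosixPath('/home/user/CPS_GradRate_Analysis/notebooks')
root_path = find_project_root(example, 'CPS_GradRate_Analysis')
print(root_path)
```

In a notebook, `Path.cwd()` from `pathlib` could be passed as `start`, and `str(root_path)` appended to `sys.path`.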

In [7]:
%load_ext autoreload
%autoreload 2

from src.preprocessing_schoolid import SchoolYear


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


The SchoolYear class represents data from a CPS school gathered during an individual school year.  

The SchoolYear class takes two arguments: 
  - a path to a School Profile CSV from a given year.
  - a path to a Progress Report CSV from the same year.
  
The School Profile CSVs were downloaded from the [Chicago Data Portal](https://data.cityofchicago.org/):

  - [2016-2017 Profile](https://data.cityofchicago.org/Education/Chicago-Public-Schools-School-Profile-Information-/8i6r-et8s)
  - [2017-2018 Profile](https://data.cityofchicago.org/Education/Chicago-Public-Schools-School-Profile-Information-/w4qj-h7bg)
  - [2018-2019 Profile](https://data.cityofchicago.org/Education/Chicago-Public-Schools-School-Profile-Information-/kh4r-387c)

Download the files and place them in the `data/chicago_data_portal_csv_files` folder.

In [11]:
# After downloading the CSVs, instantiate a SchoolYear object
path_to_pr_1819 = '../data/chicago_data_portal_csv_files/Chicago_Public_Schools_-_School_Progress_Reports_SY1819.csv'
path_to_sp_1819 = '../data/chicago_data_portal_csv_files/Chicago_Public_Schools_-_School_Profile_Information_SY1819.csv'
sy_1819 = SchoolYear(path_to_sp_1819, path_to_pr_1819)

In [23]:
!ls ../data/chicago_data_portal_csv_files/

Chicago_Public_Schools_-_School_Profile_Information_SY1617.csv
Chicago_Public_Schools_-_School_Profile_Information_SY1718.csv
Chicago_Public_Schools_-_School_Profile_Information_SY1819.csv
Chicago_Public_Schools_-_School_Progress_Reports_SY1516.csv
Chicago_Public_Schools_-_School_Progress_Reports_SY1819.csv
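The listing above shows Progress Report files for SY1516 and SY1819 only, so it can help to confirm that both CSVs for a given year exist before constructing a `SchoolYear`. The sketch below is one way to do that. The helper `missing_files` is hypothetical, and the file names are placeholders written to a temporary folder.

```python
import tempfile
from pathlib import Path


def missing_files(folder, filenames):
    """Return the names from `filenames` that are not files inside `folder`."""
    return [name for name in filenames if not (Path(folder) / name).is_file()]


# Demonstrate with a temporary folder holding only one of two expected files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'profile.csv').write_text('School_ID\n')
    missing = missing_files(tmp, ['profile.csv', 'progress.csv'])

print(missing)
```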


In [24]:
path_to_pr_1718 = '../data/chicago_data_portal_csv_files/Chicago_Public_Schools_-_School_Progress_Reports_SY1718.csv'
path_to_sp_1718 = '../data/chicago_data_portal_csv_files/Chicago_Public_Schools_-_School_Profile_Information_SY1718.csv'
sy_1718 = SchoolYear(path_to_sp_1718, path_to_pr_1718)

The original data has been converted into dataframes, which can be accessed by the `sp_df` and `pr_df` attributes.

In [12]:
sy_1819.sp_df.sample()

Unnamed: 0,School_ID,Legacy_Unit_ID,Finance_ID,Short_Name,Long_Name,Primary_Category,Is_High_School,Is_Middle_School,Is_Elementary_School,Is_Pre_School,Summary,Administrator_Title,Administrator,Secondary_Contact_Title,Secondary_Contact,Address,City,State,Zip,Phone,Fax,CPS_School_Profile,Website,Facebook,Twitter,Youtube,Pinterest,Attendance_Boundaries,Grades_Offered_All,Grades_Offered,Student_Count_Total,Student_Count_Low_Income,Student_Count_Special_Ed,Student_Count_English_Learners,Student_Count_Black,Student_Count_Hispanic,Student_Count_White,Student_Count_Asian,Student_Count_Native_American,Student_Count_Other_Ethnicity,Student_Count_Asian_Pacific_Islander,Student_Count_Multi,Student_Count_Hawaiian_Pacific_Islander,Student_Count_Ethnicity_Not_Available,Statistics_Description,Demographic_Description,Dress_Code,PreK_School_Day,Kindergarten_School_Day,School_Hours,Freshman_Start_End_Time,After_School_Hours,Earliest_Drop_Off_Time,Classroom_Languages,Bilingual_Services,Refugee_Services,Title_1_Eligible,PreSchool_Inclusive,Preschool_Instructional,Significantly_Modified,Hard_Of_Hearing,Visual_Impairments,Transportation_Bus,Transportation_El,Transportation_Metra,School_Latitude,School_Longitude,Average_ACT_School,Mean_ACT,College_Enrollment_Rate_School,College_Enrollment_Rate_Mean,Graduation_Rate_School,Graduation_Rate_Mean,Overall_Rating,Rating_Status,Rating_Statement,Classification_Description,School_Year,Third_Contact_Title,Third_Contact_Name,Fourth_Contact_Title,Fourth_Contact_Name,Fifth_Contact_Title,Fifth_Contact_Name,Sixth_Contact_Title,Sixth_Contact_Name,Seventh_Contact_Title,Seventh_Contact_Name,Network,Is_GoCPS_Participant,Is_GoCPS_PreK,Is_GoCPS_Elementary,Is_GoCPS_High_School,Open_For_Enrollment_Date,Closed_For_Enrollment_Date
327,610103,5180,24751,OKEEFFE,Isabelle C O'Keeffe Elementary School,ES,False,True,True,True,O’Keeffe School of Excellence will prepare our...,Principal,Tabitha Natasha White,Assistant Principal,Lindsay Mortensen,6940 S MERRILL AVE,Chicago,Illinois,60649,7735351000.0,7735351000.0,http://cps.edu/Schools/Pages/school.aspx?Schoo...,http://okeeffe.soe.org,,,,,True,"PK,K,1,2,3,4,5,6,7,8","PK,K-8",623,575,83,9,606,7,2,0,0,0,0,6,1,1,There are 623 students enrolled at OKEEFFE. 9...,The largest demographic at OKEEFFE is Black. ...,True,Full Day,Full Day,08:45 AM-03:45 PM,,4:00 pm - 5:30 pm,8:30 am,,,,True,,,,,,"5, 15, 71, J14",,Metra Electric District (ME),41.768602,-87.572806,,,,68.2,,78.2,Level 2,INTENSIVE SUPPORT,"This school received a Level 2 rating, which i...",Schools that have an attendance boundary. Gene...,School Year 2018-2019,,,,,,,,,,,AUSL,True,False,True,False,09/01/2004 12:00:00 AM,


In [13]:
# For the 2018/2019 school year, there are 660 records and 95 columns in the school profile csv: 
print(sy_1819.sp_df.shape)

(660, 95)


In [14]:
# For the 2018/2019 school year, there are 654 records and 182 columns in the progress report csv: 
print(sy_1819.pr_df.shape)

(654, 182)


The `merge_pr_and_sp` method merges the Progress Report (`pr_df`) and School Profile (`sp_df`) dataframes on `School_ID`.
It is called in the object's `__init__` method, and the result is stored in:
   - `merged_df`: the merged dataframe, which the later preprocessing and filtering methods alter.
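The body of `merge_pr_and_sp` is not shown in this notebook. As a rough sketch of the idea, a pandas merge on `School_ID` with `_sp` and `_pr` suffixes produces the kind of column names seen in `merged_df`. The toy frames and the choice of an inner join are assumptions.

```python
import pandas as pd

# Toy stand-ins for the School Profile and Progress Report dataframes.
sp_df = pd.DataFrame({
    'School_ID': [1, 2, 3],
    'Short_Name': ['A', 'B', 'C'],
    'Is_High_School': [True, False, True],
})
pr_df = pd.DataFrame({
    'School_ID': [2, 3, 4],
    'Short_Name': ['B', 'C', 'D'],
    'School_Type': ['Charter', 'Magnet', 'Charter'],
})

# Columns present in both frames (other than the key) receive the suffixes.
merged_df = sp_df.merge(pr_df, on='School_ID', how='inner', suffixes=('_sp', '_pr'))

print(merged_df.columns.tolist())
print(merged_df.shape)
```

Only the schools present in both toy frames survive the inner join, and the shared `Short_Name` column appears twice, once per source.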

In [15]:
sy_1819.merged_df.sample()

Unnamed: 0,School_ID,Legacy_Unit_ID,Finance_ID,Short_Name_sp,Long_Name_sp,Primary_Category_sp,Is_High_School,Is_Middle_School,Is_Elementary_School,Is_Pre_School,Summary,Administrator_Title,Administrator,Secondary_Contact_Title,Secondary_Contact,Address_sp,City_sp,State_sp,Zip_sp,Phone_sp,Fax_sp,CPS_School_Profile_sp,Website_sp,Facebook,Twitter,Youtube,Pinterest,Attendance_Boundaries,Grades_Offered_All,Grades_Offered,Student_Count_Total,Student_Count_Low_Income,Student_Count_Special_Ed,Student_Count_English_Learners,Student_Count_Black,Student_Count_Hispanic,Student_Count_White,Student_Count_Asian,Student_Count_Native_American,Student_Count_Other_Ethnicity,Student_Count_Asian_Pacific_Islander,Student_Count_Multi,Student_Count_Hawaiian_Pacific_Islander,Student_Count_Ethnicity_Not_Available,Statistics_Description,Demographic_Description,Dress_Code,PreK_School_Day,Kindergarten_School_Day,School_Hours,Freshman_Start_End_Time,After_School_Hours,Earliest_Drop_Off_Time,Classroom_Languages,Bilingual_Services,Refugee_Services,Title_1_Eligible,PreSchool_Inclusive,Preschool_Instructional,Significantly_Modified,Hard_Of_Hearing,Visual_Impairments,Transportation_Bus,Transportation_El,Transportation_Metra,School_Latitude_sp,School_Longitude_sp,Average_ACT_School,Mean_ACT,College_Enrollment_Rate_School,College_Enrollment_Rate_Mean,Graduation_Rate_School,Graduation_Rate_Mean,Overall_Rating,Rating_Status,Rating_Statement,Classification_Description,School_Year,Third_Contact_Title,Third_Contact_Name,Fourth_Contact_Title,Fourth_Contact_Name,Fifth_Contact_Title,Fifth_Contact_Name,Sixth_Contact_Title,Sixth_Contact_Name,Seventh_Contact_Title,Seventh_Contact_Name,Network,Is_GoCPS_Participant,Is_GoCPS_PreK,Is_GoCPS_Elementary,Is_GoCPS_High_School,Open_For_Enrollment_Date,Closed_For_Enrollment_Date,Short_Name_pr,Long_Name_pr,School_Type,Primary_Category_pr,Address_pr,City_pr,State_pr,Zip_pr,Phone_pr,Fax_pr,CPS_School_Profile_pr,Website_pr,Progress_Report_Year,Blue_Ribbon_Award_Year,Excelerate_
Award_Gold_Year,Spot_Light_Award_Year,Improvement_Award_Year,Excellence_Award_Year,Student_Growth_Rating,Student_Growth_Description,Growth_Reading_Grades_Tested_Pct_ES,Growth_Reading_Grades_Tested_Label_ES,Growth_Math_Grades_Tested_Pct_ES,Growth_Math_Grades_Tested_Label_ES,Student_Attainment_Rating,Student_Attainment_Description,Attainment_Reading_Pct_ES,Attainment_Reading_Lbl_ES,Attainment_Math_Pct_ES,Attainment_Math_Lbl_ES,Culture_Climate_Rating,Culture_Climate_Description,School_Survey_Student_Response_Rate_Pct,School_Survey_Student_Response_Rate_Avg_Pct,School_Survey_Teacher_Response_Rate_Pct,School_Survey_Teacher_Response_Rate_Avg_Pct,School_Survey_Parent_Response_Rate_Pct,School_Survey_Parent_Response_Rate_Avg_Pct,Healthy_School_Certification,Healthy_School_Certification_Description,Creative_School_Certification,Creative_School_Certification_Description,NWEA_Reading_Growth_Grade_3_Pct,NWEA_Reading_Growth_Grade_3_Lbl,NWEA_Reading_Growth_Grade_4_Pct,NWEA_Reading_Growth_Grade_4_Lbl,NWEA_Reading_Growth_Grade_5_Pct,NWEA_Reading_Growth_Grade_5_Lbl,NWEA_Reading_Growth_Grade_6_Pct,NWEA_Reading_Growth_Grade_6_Lbl,NWEA_Reading_Growth_Grade_7_Pct,NWEA_Reading_Growth_Grade_7_Lbl,NWEA_Reading_Growth_Grade_8_Pct,NWEA_Reading_Growth_Grade_8_Lbl,NWEA_Math_Growth_Grade_3_Pct,NWEA_Math_Growth_Grade_3_Lbl,NWEA_Math_Growth_Grade_4_Pct,NWEA_Math_Growth_Grade_4_Lbl,NWEA_Math_Growth_Grade_5_Pct,NWEA_Math_Growth_Grade_5_Lbl,NWEA_Math_Growth_Grade_6_Pct,NWEA_Math_Growth_Grade_6_Lbl,NWEA_Math_Growth_Grade_7_Pct,NWEA_Math_Growth_Grade_7_Lbl,NWEA_Math_Growth_Grade_8_Pct,NWEA_Math_Growth_Grade_8_Lbl,NWEA_Reading_Attainment_Grade_2_Pct,NWEA_Reading_Attainment_Grade_2_Lbl,NWEA_Reading_Attainment_Grade_3_Pct,NWEA_Reading_Attainment_Grade_3_Lbl,NWEA_Reading_Attainment_Grade_4_Pct,NWEA_Reading_Attainment_Grade_4_Lbl,NWEA_Reading_Attainment_Grade_5_Pct,NWEA_Reading_Attainment_Grade_5_Lbl,NWEA_Reading_Attainment_Grade_6_Pct,NWEA_Reading_Attainment_Grade_6_Lbl,NWEA_Reading_Attainment_Grade_7_Pct,
NWEA_Reading_Attainment_Grade_7_Lbl,NWEA_Reading_Attainment_Grade_8_Pct,NWEA_Reading_Attainment_Grade_8_Lbl,NWEA_Math_Attainment_Grade_2_Pct,NWEA_Math_Attainment_Grade_2_Lbl,NWEA_Math_Attainment_Grade_3_Pct,NWEA_Math_Attainment_Grade_3_Lbl,NWEA_Math_Attainment_Grade_4_Pct,NWEA_Math_Attainment_Grade_4_Lbl,NWEA_Math_Attainment_Grade_5_Pct,NWEA_Math_Attainment_Grade_5_Lbl,NWEA_Math_Attainment_Grade_6_Pct,NWEA_Math_Attainment_Grade_6_Lbl,NWEA_Math_Attainment_Grade_7_Pct,NWEA_Math_Attainment_Grade_7_Lbl,NWEA_Math_Attainment_Grade_8_Pct,NWEA_Math_Attainment_Grade_8_Lbl,School_Survey_Involved_Families,School_Survey_Supportive_Environment,School_Survey_Ambitious_Instruction,School_Survey_Effective_Leaders,School_Survey_Collaborative_Teachers,School_Survey_Safety,Suspensions_Per_100_Students_Year_1_Pct,Suspensions_Per_100_Students_Year_2_Pct,Suspensions_Per_100_Students_Avg_Pct,Misconducts_To_Suspensions_Year_1_Pct,Misconducts_To_Suspensions_Year_2_Pct,Misconducts_To_Suspensions_Avg_Pct,Average_Length_Suspension_Year_1_Pct,Average_Length_Suspension_Year_2_Pct,Average_Length_Suspension_Avg_Pct,Behavior_Discipline_Year_1,Behavior_Discipline_Year_2,School_Survey_School_Community,School_Survey_Parent_Teacher_Partnership,School_Survey_Quality_Of_Facilities,Student_Attendance_Year_1_Pct,Student_Attendance_Year_2_Pct,Student_Attendance_Avg_Pct,Teacher_Attendance_Year_1_Pct,Teacher_Attendance_Year_2_Pct,Teacher_Attendance_Avg_Pct,One_Year_Dropout_Rate_Year_1_Pct,One_Year_Dropout_Rate_Year_2_Pct,One_Year_Dropout_Rate_Avg_Pct,Other_Metrics_Year_1,Other_Metrics_Year_2,Freshmen_On_Track_School_Pct_Year_2,Freshmen_On_Track_CPS_Pct_Year_2,Freshmen_On_Track_School_Pct_Year_1,Freshmen_On_Track_CPS_Pct_Year_1,Graduation_4_Year_School_Pct_Year_2,Graduation_4_Year_CPS_Pct_Year_2,Graduation_4_Year_School_Pct_Year_1,Graduation_4_Year_CPS_Pct_Year_1,Graduation_5_Year_School_Pct_Year_2,Graduation_5_Year_CPS_Pct_Year_2,Graduation_5_Year_School_Pct_Year_1,Graduation_5_Year_CPS_Pct_Year_1,College_Enr
ollment_School_Pct_Year_2,College_Enrollment_CPS_Pct_Year_2,College_Enrollment_School_Pct_Year_1,College_Enrollment_CPS_Pct_Year_1,College_Persistence_School_Pct_Year_2,College_Persistence_CPS_Pct_Year_2,College_Persistence_School_Pct_Year_1,College_Persistence_CPS_Pct_Year_1,Progress_Toward_Graduation_Year_1,Progress_Toward_Graduation_Year_2,State_School_Report_Card_URL,Mobility_Rate_Pct,Chronic_Truancy_Pct,Empty_Progress_Report_Message,School_Survey_Rating_Description,Supportive_School_Award,Supportive_School_Award_Desc,Parent_Survey_Results_Year,School_Latitude_pr,School_Longitude_pr,PSAT_Grade_9_Score_School_Avg,PSAT_Grade_10_Score_School_Avg,SAT_Grade_11_Score_School_Avg,SAT_Grade_11_Score_CPS_Avg,Growth_PSAT_Grade_9_School_Pct,Growth_PSAT_Grade_9_School_Lbl,Growth_PSAT_Reading_Grade_10_School_Pct,Growth_PSAT_Reading_Grade_10_School_Lbl,Growth_SAT_Grade_11_School_Pct,Growth_SAT_Grade_11_School_Lbl,Attainment_PSAT_Grade_9_School_Pct,Attainment_PSAT_Grade_9_School_Lbl,Attainment_PSAT_Grade_10_School_Pct,Attainment_PSAT_Grade_10_School_Lbl,Attainment_SAT_Grade_11_School_Pct,Attainment_SAT_Grade_11_School_Lbl,Attainment_All_Grades_School_Pct,Attainment_All_Grades_School_Lbl,Growth_PSAT_Math_Grade_10_School_Pct,Growth_PSAT_Math_Grade_10_School_Lbl,Growth_SAT_Reading_Grade_11_School_Pct,Growth_SAT_Reading_Grade_11_School_Lbl,Growth_SAT_Math_Grade_11_School_Pct,Growth_SAT_Math_Grade_11_School_Lbl
118,400053,1934,66145,NOBLE - GOLDER HS,Noble - Golder College Prep,HS,True,False,False,False,"Golder College Prep fosters curious, competent...",Director,Mr. Vincent Gay,AP of Operations,Amy Hynes,1454 W SUPERIOR ST,Chicago,Illinois,60642,3122660000.0,3122438000.0,http://cps.edu/Schools/Pages/school.aspx?Schoo...,http://www.goldercollegeprep.org,https://www.facebook.com/GolderCollegePrep,,,,False,9101112,9-12,657,592,113,70,103,523,15,2,2,0,0,4,0,8,There are 657 students enrolled at NOBLE - GOL...,The largest demographic at NOBLE - GOLDER HS i...,True,,,7:55 AM - 3:40 PM,"M - R 7:55 a.m. - 3:40 p.m., F 7:55 a.m. - 1:0...",3:50 PM - 4:50 PM,7:00 AM,"French, Spanish",True,False,True,,,,,,"9, 65, 66",Blue,,41.895282,-87.664483,,,87.1,68.2,87.0,78.2,Level 1+,NOT APPLICABLE,"This school received a Level 1+ rating, which ...","Schools that are open to all Chicago children,...",School Year 2018-2019,Deans of Instructions,Jenna Mullen and Chad French,Office Manager/Enrollment,Janet Calvillo,Dean of Culture,Jorge Pagan,Deans of Students,Megan McCabe and Carolyn Raeckers,,,Charter,True,False,False,True,09/01/2004 12:00:00 AM,,NOBLE - GOLDER HS,Noble - Golder College Prep,Charter,HS,1454 W SUPERIOR ST,Chicago,Illinois,60642,3122660000.0,3122438000.0,http://cps.edu/Schools/Pages/school.aspx?Schoo...,https://nobleschools.org/golder,2018,,,,,,ABOVE AVERAGE,Student Growth measures the change in standard...,,,,,ABOVE EXPECTATIONS,Student Attainment measures how well the schoo...,,,,,ORGANIZED,Results are based on student and teacher respo...,85.2,81.4,91.1,79.9,< 30%,35.6,Not Achieved,This school is not a participant of the Health...,EMERGING,This school is Emerging in the arts. 
It rarely...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NEUTRAL,STRONG,VERY STRONG,NEUTRAL,NEUTRAL,WEAK,,,5.6,,,13.5,,,2.0 days,2018.0,2017.0,NOT ENOUGH DATA,NOT ENOUGH DATA,NOT ENOUGH DATA,93.1,93.2,93.3,,,95.0,3.0,2.5,6.4,2017.0,2018.0,86.6,89.4,85.5,88.7,80.0,75.6,84.7,74.7,87.0,78.2,86.2,77.5,87.1,68.2,81.1,59.8,72.9,72.3,70.9,71.9,2017.0,2018.0,http://iirc.niu.edu/School.aspx?schoolid=15016...,8.0,37.9,,This school is “Organized for Improvement” whi...,NOT RATED,This school has not submitted an action plan t...,2018.0,41.895282,-87.664483,870.0,949.0,1035.0,969.0,98.0,98th,99.0,99th,98.0,98th,56.8,56.8,64.2,64.2,63.2,63.2,61.2,61.2,28.0,28th,50.0,50th,67.0,67th


In [16]:
# The merged dataframe has 651 rows and 276 columns (95 + 182, minus 1 for the shared School_ID key):
sy_1819.merged_df.shape

(651, 276)
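The 651 rows are fewer than either input (660 and 654), which is what an inner join on `School_ID` produces when some IDs appear in only one file. Whether `merge_pr_and_sp` uses an inner join is an assumption here. A sketch for finding such unmatched IDs, using toy data and the `indicator` option of `merge`:

```python
import pandas as pd

# Toy ID lists standing in for the two source files.
sp_ids = pd.DataFrame({'School_ID': [1, 2, 3, 4]})
pr_ids = pd.DataFrame({'School_ID': [3, 4, 5]})

# An outer join keeps every ID; the _merge column records where each came from.
check = sp_ids.merge(pr_ids, on='School_ID', how='outer', indicator=True)
counts = check['_merge'].value_counts()
unmatched = check.loc[check['_merge'] != 'both', 'School_ID'].tolist()

print(counts)
print(unmatched)
```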

# Attributes

The SchoolYear class has attributes giving the total number of schools and the total number of high schools for the school year.

In [17]:
sy_1819.total_school_count

651

In [18]:
sy_1819.total_high_school_count

176
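The value of `total_school_count` matches the 651 rows of `merged_df`. How the class computes these attributes is not shown, but counts like these could be derived from a merged dataframe as in the sketch below. The toy data and variable names are illustrative only.

```python
import pandas as pd

# Toy merged dataframe with one row per school.
merged = pd.DataFrame({
    'School_ID': [10, 11, 12, 13],
    'Is_High_School': [True, False, True, False],
})

total_school_count = len(merged)                               # one row per school
total_high_school_count = int(merged['Is_High_School'].sum())  # True counts as 1

print(total_school_count, total_high_school_count)
```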

# Preprocessing (which will not cause leakage with train-test-split)

Several preprocessing steps make analysis, visualization, and modeling easier. The steps below are meant to be performed before a train-test split or cross-validation. They will not cause data leakage, because each one transforms a row using only that row's own values.

The preprocessing methods are as follows:

  - **convert_is_high_school_to_bool**: some School Year Profiles encode `Is_High_School` as a boolean, others encode it as Y/N. This method ensures all values are booleans.
  - **make_percent_low_income**: creates a `perc_low_income` column by dividing `Student_Count_Low_Income` by `Student_Count_Total`.
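Neither method body appears in this notebook. A minimal sketch of both transformations on a toy dataframe follows. The helper `yn_to_bool` is hypothetical, and raising on unexpected values is a design choice of this sketch rather than documented behavior of `SchoolYear`.

```python
import pandas as pd


def yn_to_bool(series: pd.Series) -> pd.Series:
    """Convert a 'Y'/'N' column to booleans; return boolean columns unchanged."""
    if series.dtype == bool:
        return series
    mapped = series.map({'Y': True, 'N': False})
    if mapped.isna().any():
        raise ValueError('column contains values other than Y and N')
    return mapped.astype(bool)


toy = pd.DataFrame({
    'Is_High_School': ['Y', 'N', 'Y'],
    'Student_Count_Low_Income': [50, 30, 0],
    'Student_Count_Total': [100, 120, 80],
})

toy['Is_High_School'] = yn_to_bool(toy['Is_High_School'])
# Row-wise ratio: each value depends only on its own row.
toy['perc_low_income'] = toy['Student_Count_Low_Income'] / toy['Student_Count_Total']

print(toy)
```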

In [16]:
# Ensure Is_High_School is a boolean
sy_1819.convert_is_high_school_to_bool()['Is_High_School'][:5]

0     True
1    False
2     True
3    False
4    False
Name: Is_High_School, dtype: bool

In [28]:
sy_1718.convert_is_high_school_to_bool()['Is_High_School'][:5]

0    False
1     True
2     True
3     True
4    False
Name: Is_High_School, dtype: bool

In [17]:
# Create perc_low_income column

sy_1819.make_percent_low_income()['perc_low_income'][:5]

0    0.654028
1    0.082397
2    0.974026
3    0.783133
4    0.266667
Name: perc_low_income, dtype: float64

# Filtering

Several filtering decisions are crucial to analysis and model building. For example, when modeling graduation rates, only high schools that report a graduation rate should be included in the dataset. For the same graduation rate problem, one may also want to filter out Options Schools. Options Schools serve special populations and may have different missions than non-Options schools. Their graduation rates, for example, can be near zero.

In [18]:
# Make another copy of the 1819 School Year to compare changes after filtering
sy_1819_unaltered = SchoolYear(path_to_sp_1819, path_to_pr_1819)

## Isolating High Schools

For modeling graduation rates, a first step is to remove all schools other than high schools from the dataset.
The `isolate_high_schools` method does that. 

In [19]:
sy_1819.isolate_high_schools()
sy_1819.merged_df.sample(5)['Is_High_School']

586    True
118    True
627    True
417    True
316    True
Name: Is_High_School, dtype: bool
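The sampled rows keep their original index labels, which is consistent with a plain boolean-mask filter. A sketch of such a filter on toy data, assuming `Is_High_School` has already been converted to booleans:

```python
import pandas as pd

toy = pd.DataFrame({
    'School_ID': [1, 2, 3],
    'Is_High_School': [True, False, True],
})

# Keep only the rows where the mask is True; .copy() avoids modifying a view later.
high_schools = toy[toy['Is_High_School']].copy()

print(high_schools)
```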

## Drop No Graduation Rate Schools

In [20]:
sy_1819.drop_no_gr_schools()
sy_1819.merged_df['Graduation_Rate_School'].isna().sum()

0
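The check above confirms that no missing graduation rates remain. One way to get that result in pandas is `dropna` restricted to the graduation rate column, sketched here on toy data. Whether `drop_no_gr_schools` works this way is an assumption.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'School_ID': [1, 2, 3],
    'Graduation_Rate_School': [87.0, np.nan, 64.5],
})

# Drop rows only when the graduation rate itself is missing.
with_rates = toy.dropna(subset=['Graduation_Rate_School'])

print(with_rates['Graduation_Rate_School'].isna().sum())
```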

# Isolate Important Columns



The preprocessing function `isolate_important_columns` reduces the number of columns in each dataset from 92 to 20.

In [None]:
from src.preprocessing.preprocessing import isolate_important_columns

df_dict = {year: isolate_important_columns(df_dict[year]) for year in df_dict}
df_dict['2017-2018']

After this reduction, the following columns are left:

  - School_ID
  - Graduation_Rate_School
  - Student_Count_Total
  - Student_Count_Low_Income
  - Student_Count_Special_Ed
  - Student_Count_English_Learners
  - 10 Columns Counting Populations of Different Ethnicities
  - **Is_High_School**
  - **Dress_Code**
  - Classroom_Languages
  - Transportation_El
  
The bolded columns require preprocessing, which is shown below.
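For reference, the column selection listed above can be sketched with a prefix match. In the School Profile header shown earlier, 14 columns start with `Student_Count_` (the four named counts plus the ten ethnicity counts), which together with the six other listed columns gives 20. The helper `select_columns` is hypothetical and is not the implementation of `isolate_important_columns`.

```python
import pandas as pd

FIXED_COLUMNS = [
    'School_ID', 'Graduation_Rate_School', 'Is_High_School',
    'Dress_Code', 'Classroom_Languages', 'Transportation_El',
]


def select_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the fixed columns plus every Student_Count_* column, in original order."""
    keep = [c for c in df.columns
            if c in FIXED_COLUMNS or c.startswith('Student_Count_')]
    return df[keep]


toy = pd.DataFrame({
    'School_ID': [1],
    'Summary': ['free text'],
    'Student_Count_Total': [100],
    'Student_Count_Black': [40],
    'Dress_Code': [True],
})

reduced = select_columns(toy)
print(reduced.columns.tolist())
```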

# Is_High_School

The school profiles for 2016-2017 and 2017-2018 encode `Is_High_School` as 'Y/N', whereas 2018-2019 encodes it as 'True/False'.  

The function below converts Y/N to True/False to ensure consistency.

In [None]:
from src.preprocessing.preprocessing import convert_is_high_school_to_bool

df_dict = {year: convert_is_high_school_to_bool(df_dict[year]) for year in df_dict}
df_dict['2016-2017']['Is_High_School']

# Dress_Code

The same conversion is applied to the `Dress_Code` column.

In [None]:
from src.preprocessing.preprocessing import convert_dress_code_to_bool

df_dict = {year: convert_dress_code_to_bool(df_dict[year]) for year in df_dict}
df_dict['2016-2017']['Dress_Code']

In [None]:
# Add Year column to dataframes
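The cell above is a placeholder. One possible way to add a year column, assuming a dictionary that maps school-year labels to dataframes like the `df_dict` used in the surrounding cells, is sketched below with toy data. The column name `Year` is an assumption.

```python
import pandas as pd

# Toy dictionary keyed by school year, mirroring the shape of df_dict.
toy_dict = {
    '2017-2018': pd.DataFrame({'School_ID': [1, 2]}),
    '2018-2019': pd.DataFrame({'School_ID': [1, 3]}),
}

# assign returns a new dataframe, so the originals are left untouched.
with_year = {year: df.assign(Year=year) for year, df in toy_dict.items()}

# With the year recorded on each row, the frames can be stacked safely.
combined = pd.concat(list(with_year.values()), ignore_index=True)
print(combined)
```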

In [None]:
df_dict['2018-2019']

In [None]:
# Interesting: Primary_Category would be a good feature to convert to Primary_Is_High_School.
# This would signal whether a school is specifically a high school.
df_hs['2018-2019']['Primary_Category']
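A sketch of the feature suggested in the comment above, on toy data. The sample rows earlier in the notebook show `Primary_Category` values of `ES` and `HS`. Treating `HS` as the only high school code is an assumption.

```python
import pandas as pd

toy = pd.DataFrame({
    'Short_Name': ['OKEEFFE', 'NOBLE - GOLDER HS'],
    'Primary_Category': ['ES', 'HS'],
})

# True only when the school's primary category is high school.
toy['Primary_Is_High_School'] = toy['Primary_Category'].eq('HS')

print(toy)
```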