# Data Preparation

## Filtering Strategy

**AggregateLevel:** "S". Keeps school-level rows only, so each record describes an individual school.  
**CharterSchool:** "No" or "N". Excludes charter schools to focus on traditional public high schools.   
**DASS:** "No" or "N". Removes alternative/continuation programs so graduation rates reflect typical comprehensive high schools.   
**ReportingCategory:** "TA". Keeps aggregate totals for each school (not broken down by subgroup) to simplify modeling.
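
The four filters above could be applied roughly as follows. This is a minimal sketch on toy rows, not the notebook's own code: `filter_schools` is a hypothetical helper, and the column names follow the ACGR file (`AggregateLevel`, `CharterSchool`, ...). Other CDE files in this notebook use spaced names such as `Aggregate Level`, so the names would need adjusting there.

```python
import pandas as pd


def filter_schools(df: pd.DataFrame) -> pd.DataFrame:
    """Keep school-level, non-charter, non-DASS rows for the total ("TA") category."""
    mask = (
        (df["AggregateLevel"] == "S")
        & df["CharterSchool"].isin(["No", "N"])
        & df["DASS"].isin(["No", "N"])
        & (df["ReportingCategory"] == "TA")
    )
    return df.loc[mask].reset_index(drop=True)


# toy rows (hypothetical schools); "RB" stands in for any subgroup code
toy = pd.DataFrame(
    {
        "AggregateLevel": ["S", "D", "S", "S", "S", "S"],
        "CharterSchool": ["No", "No", "Yes", "N", "N", "No"],
        "DASS": ["No", "No", "No", "Yes", "N", "No"],
        "ReportingCategory": ["TA", "TA", "TA", "TA", "TA", "RB"],
        "SchoolName": ["A High", "B District", "C Charter", "D Continuation", "E High", "F High"],
    }
)

filtered = filter_schools(toy)
print(filtered["SchoolName"].to_list())  # ['A High', 'E High']
```

Accepting both "No" and "N" matters because the files shown below are not consistent about which spelling they use.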


In [1]:
# import libraries
import importlib.util
import os

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import subprocess
import sys

from pathlib import Path

In [3]:
# import other libraries
from helper import (
    rpkl
)

# check if jcds library is installed
package_name = "jcds"

if importlib.util.find_spec(package_name) is None:
    print(f" '{package_name}' not found. Installing from GitHub... ")
    subprocess.check_call(
        [
            sys.executable,
            "-m",
            "pip",
            "install",
            "git+https://github.com/junclemente/jcds.git",
        ]
    )
else:
    print(f" '{package_name}' is already installed.")

from jcds import eda as jeda
from jcds import reports as jrep

 'jcds' is already installed.


In [4]:
# main data folder path
data_folder = Path("../data")
raw_pickle = data_folder / "raw_pickle"

# California: Department of Education

## Adjusted Cohort Graduation Rate and Outcome Data (ACGR)

Four-year Adjusted Cohort Graduation Rate (ACGR) and outcome data, reported by race/ethnicity, student group, and gender.  

[Data Dictionary: ACGR](https://www.cde.ca.gov/ds/ad/fsacgr.asp)

In [None]:
df_acgr = rpkl(raw_pickle, "raw_acgr.pkl")

df_acgr.columns.to_list()

['AcademicYear',
 'AggregateLevel',
 'CountyCode',
 'DistrictCode',
 'SchoolCode',
 'CountyName',
 'DistrictName',
 'SchoolName',
 'CharterSchool',
 'DASS',
 'ReportingCategory',
 'CohortStudents',
 'Regular HS Diploma Graduates (Count)',
 'Regular HS Diploma Graduates (Rate)',
 "Met UC/CSU Grad Req's (Count)",
 "Met UC/CSU Grad Req's (Rate)",
 'Seal of Biliteracy (Count)',
 'Seal of Biliteracy (Rate)',
 'Golden State Seal Merit Diploma (Count)',
 'Golden State Seal Merit Diploma (Rate',
 'CHSPE Completer (Count)',
 'CHSPE Completer (Rate)',
 'Adult Ed. HS Diploma (Count)',
 'Adult Ed. HS Diploma (Rate)',
 'SPED Certificate (Count)',
 'SPED Certificate (Rate)',
 'GED Completer (Count)',
 'GED Completer (Rate)',
 'Other Transfer (Count)',
 'Other Transfer (Rate)',
 'Dropout (Count)',
 'Dropout (Rate)',
 'Still Enrolled (Count)',
 'Still Enrolled (Rate)']

In [6]:
# select columns
cols_acgr = [
    "AcademicYear",
    "AggregateLevel",
    "CountyCode",
    "DistrictCode",
    "SchoolCode",
    "CountyName",
    "DistrictName",
    "SchoolName",
    "CharterSchool",
    "DASS",
    "ReportingCategory",
    "CohortStudents",  # QA for weighting
    # "Regular HS Diploma Graduates (Count)",
    "Regular HS Diploma Graduates (Rate)",  # target variable
    # "Met UC/CSU Grad Req's (Count)",
    "Met UC/CSU Grad Req's (Rate)",  # academic readiness/intensity feature
    # "Seal of Biliteracy (Count)",
    "Seal of Biliteracy (Rate)",  # language proficiency (inclusion tentative)
    # "Golden State Seal Merit Diploma (Count)",
    # "Golden State Seal Merit Diploma (Rate",
    # "CHSPE Completer (Count)",
    # "CHSPE Completer (Rate)",
    # "Adult Ed. HS Diploma (Count)",
    # "Adult Ed. HS Diploma (Rate)",
    # "SPED Certificate (Count)",
    # "SPED Certificate (Rate)",
    # "GED Completer (Count)",
    # "GED Completer (Rate)",
    # "Other Transfer (Count)",
    # "Other Transfer (Rate)",
    # "Dropout (Count)",
    "Dropout (Rate)",  # secondary target
    # "Still Enrolled (Count)",
    "Still Enrolled (Rate)",  # 5th year senior
]

df_acgr[cols_acgr]

Unnamed: 0,AcademicYear,AggregateLevel,CountyCode,DistrictCode,SchoolCode,CountyName,DistrictName,SchoolName,CharterSchool,DASS,ReportingCategory,CohortStudents,Regular HS Diploma Graduates (Rate),Met UC/CSU Grad Req's (Rate),Seal of Biliteracy (Rate),Dropout (Rate),Still Enrolled (Rate)
66594,2020-21,S,01,31609,0131755,Alameda,California School for the Blind (State Special...,California School for the Blind,No,No,TA,11,0.0,0.0,0.0,63.6,0.0
66654,2020-21,S,01,31617,0131763,Alameda,California School for the Deaf-Fremont (State ...,California School for the Deaf-Fremont,No,No,TA,38,63.2,0.0,33.3,2.6,28.9
66718,2020-21,S,01,61119,0000001,Alameda,Alameda Unified,"Nonpublic, Nonsectarian Schools",No,No,TA,*,*,*,*,*,*
66782,2020-21,S,01,61119,0106401,Alameda,Alameda Unified,Alameda Science and Technology Institute,No,No,TA,43,100.0,95.3,2.3,0.0,0.0
66910,2020-21,S,01,61119,0130229,Alameda,Alameda Unified,Alameda High,No,No,TA,394,92.4,73.9,22.8,2.3,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
254262,2020-21,S,58,72736,0000000,Yuba,Marysville Joint Unified,District Office,No,No,TA,69,56.5,0.0,0.0,43.5,0.0
254314,2020-21,S,58,72736,0000001,Yuba,Marysville Joint Unified,"Nonpublic, Nonsectarian Schools",No,No,TA,*,*,*,*,*,*
254426,2020-21,S,58,72736,5830013,Yuba,Marysville Joint Unified,Lindhurst High,No,No,TA,226,87.2,36.0,11.7,6.2,5.8
254630,2020-21,S,58,72736,5835202,Yuba,Marysville Joint Unified,Marysville High,No,No,TA,201,90.5,37.4,1.1,7.0,2.0
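
Several rows in the output above show `*` in place of counts and rates. Assuming `*` marks a suppressed value (the data dictionary should confirm this), the affected columns would load as strings and need converting before modeling. A minimal sketch on toy rows:

```python
import pandas as pd

# toy rows mirroring the shape of the ACGR output; values are strings because of "*"
toy = pd.DataFrame(
    {
        "SchoolName": ["Alameda High", "Nonpublic, Nonsectarian Schools"],
        "CohortStudents": ["394", "*"],
        "Regular HS Diploma Graduates (Rate)": ["92.4", "*"],
    }
)

numeric_cols = ["CohortStudents", "Regular HS Diploma Graduates (Rate)"]

cleaned = toy.copy()
# errors="coerce" turns anything non-numeric (here "*") into NaN
cleaned[numeric_cols] = cleaned[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(cleaned.dtypes)
print(cleaned[numeric_cols].isna().sum())
```

Coercing to `NaN` keeps the suppressed rows visible, so the decision to drop or impute them can be made explicitly later.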


## Absenteeism

### Chronic Absenteeism Data

[Data Dictionary: Chronic Absenteeism](https://www.cde.ca.gov/ds/ad/fsabd.asp)


In [None]:
# load raw dataset and filter
df_chron_abs = rpkl(raw_pickle, "raw_chronic_absent.pkl")
df_chron_abs.columns.to_list()


['Academic Year',
 'Aggregate Level',
 'County Code',
 'District Code',
 'School Code',
 'County Name',
 'District Name',
 'School Name',
 'Charter School',
 'Reporting Category',
 'ChronicAbsenteeismEligibleCumula',
 'ChronicAbsenteeismCount',
 'ChronicAbsenteeismRate']

In [9]:
# select columns
chron_abs_cols = [
    "Academic Year",
    "Aggregate Level",
    "County Code",
    "District Code",
    "School Code",
    "County Name",
    "District Name",
    "School Name",
    # "Charter School",
    # "Reporting Category",
    # "ChronicAbsenteeismEligibleCumula",
    # "ChronicAbsenteeismCount",
    "ChronicAbsenteeismRate",
]

df_chron_abs[chron_abs_cols]

Unnamed: 0,Academic Year,Aggregate Level,County Code,District Code,School Code,County Name,District Name,School Name,ChronicAbsenteeismRate
57598,2020-21,S,01,10017,0130419,Alameda,Alameda County Office of Education,Alameda County Community,84.4
57599,2020-21,S,01,10017,0130401,Alameda,Alameda County Office of Education,Alameda County Juvenile Hall/Court,61.7
57621,2020-21,S,01,31609,0131755,Alameda,California School for the Blind (State Special...,California School for the Blind,8.8
57644,2020-21,S,01,31617,0131763,Alameda,California School for the Deaf-Fremont (State ...,California School for the Deaf-Fremont,11.6
58027,2020-21,S,01,61119,6090013,Alameda,Alameda Unified,Edison Elementary,3.7
...,...,...,...,...,...,...,...,...,...
263006,2020-21,S,58,72751,6056832,Yuba,Wheatland,Lone Tree Elementary,16.7
263008,2020-21,S,58,72751,6056840,Yuba,Wheatland,Wheatland Elementary,27.1
263062,2020-21,S,58,72769,0123570,Yuba,Wheatland Union High,Wheatland Community Day High,
263063,2020-21,S,58,72769,0133751,Yuba,Wheatland Union High,Edward P. Duplex,100


### Absenteeism by Reason

[Data Dictionary: Absenteeism by Reason](https://www.cde.ca.gov/ds/ad/fsabr.asp)


In [12]:
df_abs = rpkl(raw_pickle, "raw_absent_reason.pkl")

df_abs.columns.to_list()

['Academic Year',
 'Aggregate Level',
 'County Code',
 'District Code',
 'School Code',
 'County Name',
 'District Name',
 'School Name',
 'Charter School',
 'DASS',
 'Reporting Category',
 'Eligible Cumulative Enrollment',
 'Count of Students with One or More Absences',
 'Average Days Absent',
 'Total Days Absent',
 'Excused Absences (percent)',
 'Unexcused Absences (percent)',
 'Out-of-School Suspension Absences (percent)',
 'Incomplete Independent Study Absences (percent)',
 'Excused Absences (count)',
 'Unexcused Absences (count)',
 'Out-of-School Suspension Absences (count)',
 'Incomplete Independent Study Absences (count)']

In [13]:
# select columns
abs_cols = [
    "Academic Year",
    "Aggregate Level",
    "County Code",
    "District Code",
    "School Code",
    "County Name",
    "District Name",
    "School Name",
    "Charter School",
    "DASS",
    "Reporting Category",
    "Eligible Cumulative Enrollment",
    # "Count of Students with One or More Absences",
    # "Average Days Absent",
    # "Total Days Absent",
    # "Excused Absences (percent)",
    "Unexcused Absences (percent)",
    "Out-of-School Suspension Absences (percent)",
    # "Incomplete Independent Study Absences (percent)",
    # "Excused Absences (count)",
    # "Unexcused Absences (count)",
    # "Out-of-School Suspension Absences (count)",
    # "Incomplete Independent Study Absences (count)",
]

df_abs[abs_cols]

Unnamed: 0,Academic Year,Aggregate Level,County Code,District Code,School Code,County Name,District Name,School Name,Charter School,DASS,Reporting Category,Eligible Cumulative Enrollment,Unexcused Absences (percent),Out-of-School Suspension Absences (percent)
583,2021-22,S,01,31609,0131755,Alameda,California School for the Blind (State Special...,California School for the Blind,No,No,TA,67,46.9,0
608,2021-22,S,01,31617,0131763,Alameda,California School for the Deaf-Fremont (State ...,California School for the Deaf-Fremont,No,No,TA,329,41.8,1.8
628,2021-22,S,01,61119,0000000,Alameda,Alameda Unified,District Office,No,No,TA,22,0,0
647,2021-22,S,01,61119,0106401,Alameda,Alameda Unified,Alameda Science and Technology Institute,No,No,TA,170,20.2,0
670,2021-22,S,01,61119,0111765,Alameda,Alameda Unified,Ruby Bridges Elementary,No,No,TA,473,28.4,0.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
227659,2021-22,S,58,72751,6056816,Yuba,Wheatland,Bear River,No,No,TA,586,43.9,0.2
227682,2021-22,S,58,72751,6056832,Yuba,Wheatland,Lone Tree Elementary,No,No,TA,390,29,0.1
227706,2021-22,S,58,72751,6056840,Yuba,Wheatland,Wheatland Elementary,No,No,TA,352,33.5,0
227743,2021-22,S,58,72769,0000000,Yuba,Wheatland Union High,District Office,No,No,TA,*,*,*


## Public Schools and Districts

[Data Dictionary: Public Schools and Districts](https://www.cde.ca.gov/ds/si/ds/fspubschls.asp)


In [14]:
df_schooldata = rpkl(raw_pickle, "raw_school_data.pkl")

df_schooldata.columns.to_list()

['CDSCode',
 'NCESDist',
 'NCESSchool',
 'StatusType',
 'County',
 'District',
 'School',
 'Street',
 'StreetAbr',
 'City',
 'Zip',
 'State',
 'MailStreet',
 'MailStrAbr',
 'MailCity',
 'MailZip',
 'MailState',
 'Phone',
 'Ext',
 'FaxNumber',
 'WebSite',
 'OpenDate',
 'ClosedDate',
 'Charter',
 'CharterNum',
 'FundingType',
 'DOC',
 'DOCType',
 'SOC',
 'SOCType',
 'EdOpsCode',
 'EdOpsName',
 'EILCode',
 'EILName',
 'GSoffered',
 'GSserved',
 'Virtual',
 'Magnet',
 'YearRoundYN',
 'FederalDFCDistrictID',
 'Latitude',
 'Longitude',
 'AdmFName',
 'AdmLName',
 'LastUpDate',
 'Multilingual']

In [15]:
cols_schooldata = [
    "CDSCode",
    "NCESDist",
    "NCESSchool",
    "StatusType",
    "County",
    "District",
    "School",
    # "Street",
    # "StreetAbr",
    # "City",
    "Zip",
    # "State",
    # "MailStreet",
    # "MailStrAbr",
    # "MailCity",
    # "MailZip",
    # "MailState",
    # "Phone",
    # "Ext",
    # "FaxNumber",
    # "WebSite",
    "OpenDate",
    # "ClosedDate",
    "Charter",
    # "CharterNum",
    # "FundingType",
    # "DOC",
    "DOCType",
    # "SOC",
    "SOCType",
    "EdOpsCode",
    # "EdOpsName",
    "EILCode",
    # "EILName",
    # "GSoffered",
    "GSserved",
    "Virtual",
    "Magnet",
    "YearRoundYN",
    "FederalDFCDistrictID",
    "Latitude",
    "Longitude",
    # "AdmFName",
    # "AdmLName",
    # "LastUpDate",
    "Multilingual",
]

df_schooldata[cols_schooldata]

Unnamed: 0,CDSCode,NCESDist,NCESSchool,StatusType,County,District,School,Zip,OpenDate,Charter,...,EdOpsCode,EILCode,GSserved,Virtual,Magnet,YearRoundYN,FederalDFCDistrictID,Latitude,Longitude,Multilingual
2,01100170112607,0691051,10947,Active,Alameda,Alameda County Office of Education,Envision Academy for Arts & Technology,94612-3355,2006-08-28 00:00:00,Y,...,TRAD,ELEMHIGH,6-12,N,N,N,0601614,37.804520,-122.26815,N
3,01100170114363,0691051,12013,Active,Alameda,Alameda County Office of Education,American Indian Public Charter School II,94607-4900,2007-07-01 00:00:00,Y,...,TRAD,ELEM,K-8,N,N,N,0601880,37.800368,-122.26548,N
5,01100170123968,0691051,12844,Active,Alameda,Alameda County Office of Education,Community School for Creative Education,94606-4903,2011-08-22 00:00:00,Y,...,TRAD,ELEM,K-8,N,N,N,0601691,37.784648,-122.23863,Y
6,01100170124172,0691051,12901,Active,Alameda,Alameda County Office of Education,Yu Ming Charter,94607-2477,2011-08-09 00:00:00,Y,...,TRAD,ELEM,K-8,N,N,N,0602013,37.818228,-122.28233,Y
8,01100170126748,0691051,13155,Active,Alameda,Alameda County Office of Education,LPS Oakland R & D Campus,94605-4037,2012-08-21 00:00:00,Y,...,TRAD,HS,9-12,N,N,N,0601967,37.759536,-122.16291,N
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18390,58727516056816,0642330,06925,Active,Yuba,Wheatland,Bear River,95692-9286,1980-07-01 00:00:00,N,...,TRAD,INTMIDJR,4-8,N,N,N,No Data,39.007843,-121.43617,N
18392,58727516056832,0642330,06927,Active,Yuba,Wheatland,Lone Tree Elementary,95903,1980-07-01 00:00:00,N,...,TRAD,ELEM,K-5,N,N,N,No Data,39.100027,-121.33614,N
18393,58727516056840,0642330,06928,Active,Yuba,Wheatland,Wheatland Elementary,95692-8215,1980-07-01 00:00:00,N,...,TRAD,ELEM,K-3,N,N,N,No Data,39.012487,-121.42900,N
18396,58727516118806,0642330,11548,Active,Yuba,Wheatland,Wheatland Charter Academy,95903,2001-08-22 00:00:00,Y,...,TRAD,ELEM,K-5,N,N,N,No Data,39.102475,-121.33536,N


## Free or Reduced-Price Meal (Student Poverty)

[Data Dictionary: FRPM](https://www.cde.ca.gov/ds/ad/fsspfrpm.asp)


In [16]:
df_frpm = rpkl(raw_pickle, "raw_frpm.pkl")

df_frpm.columns.to_list()

['Academic Year',
 'County Code',
 'District Code',
 'School Code',
 'County Name',
 'District Name',
 'School Name',
 'District Type',
 'School Type',
 'Educational Option Type',
 'NSLP Provision Status',
 'Charter School (Y/N)',
 'Charter School Number',
 'Charter Funding Type',
 'IRC',
 'Low Grade',
 'High Grade',
 'Enrollment (K-12)',
 'Free Meal Count (K-12)',
 'Percent (%) Eligible Free (K-12)',
 'FRPM Count (K-12)',
 'Percent (%) Eligible FRPM (K-12)',
 'Enrollment (Ages 5-17)',
 'Free Meal Count (Ages 5-17)',
 'Percent (%) Eligible Free (Ages 5-17)',
 'FRPM Count (Ages 5-17)',
 'Percent (%) Eligible FRPM (Ages 5-17)',
 'CALPADS Fall 1 Certification Status']

In [17]:
cols_frpm = [
    "Academic Year",
    "County Code",
    "District Code",
    "School Code",
    "County Name",
    "District Name",
    "School Name",
    "District Type",
    "School Type",
    "Educational Option Type",
    # "NSLP Provision Status",
    "Charter School (Y/N)",
    # "Charter School Number",
    # "Charter Funding Type",
    "IRC",
    # "Low Grade",
    # "High Grade",
    "Enrollment (K-12)",
    # "Free Meal Count (K-12)",
    "Percent (%) Eligible Free (K-12)",
    "FRPM Count (K-12)",
    "Percent (%) Eligible FRPM (K-12)",
    # "Enrollment (Ages 5-17)",
    # "Free Meal Count (Ages 5-17)",
    # "Percent (%) Eligible Free (Ages 5-17)",
    # "FRPM Count (Ages 5-17)",
    # "Percent (%) Eligible FRPM (Ages 5-17)",
    "CALPADS Fall 1 Certification Status",
]

df_frpm[cols_frpm]

Unnamed: 0,Academic Year,County Code,District Code,School Code,County Name,District Name,School Name,District Type,School Type,Educational Option Type,Charter School (Y/N),IRC,Enrollment (K-12),Percent (%) Eligible Free (K-12),FRPM Count (K-12),Percent (%) Eligible FRPM (K-12),CALPADS Fall 1 Certification Status
0,2021-2022,1,10017,130419,Alameda,Alameda County Office of Education,Alameda County Community,County Office of Education (COE),County Community,County Community School,N,N,57,0.789474,47,0.824561,Y
1,2021-2022,1,10017,130401,Alameda,Alameda County Office of Education,Alameda County Juvenile Hall/Court,County Office of Education (COE),Juvenile Court Schools,Juvenile Court School,N,N,64,1.000000,64,1.000000,Y
14,2021-2022,1,31609,131755,Alameda,California School for the Blind (State Special...,California School for the Blind,State Special Schools,State Special Schools,State Special School,N,N,62,1.000000,62,1.000000,Y
15,2021-2022,1,31617,131763,Alameda,California School for the Deaf-Fremont (State ...,California School for the Deaf-Fremont,State Special Schools,State Special Schools,State Special School,N,N,318,1.000000,318,1.000000,Y
17,2021-2022,1,61119,130229,Alameda,Alameda Unified,Alameda High,Unified School District,High Schools (Public),Traditional,N,N,1808,0.172013,327,0.180863,Y
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10550,2021-2022,58,72751,6056832,Yuba,Wheatland,Lone Tree Elementary,Elementary School District,Elementary Schools (Public),Traditional,N,N,351,0.193732,119,0.339031,Y
10552,2021-2022,58,72751,6056840,Yuba,Wheatland,Wheatland Elementary,Elementary School District,Elementary Schools (Public),Traditional,N,N,340,0.523529,189,0.555882,Y
10554,2021-2022,58,72769,133751,Yuba,Wheatland Union High,Edward P. Duplex,High School District,Continuation High Schools,Continuation School,N,N,45,0.711111,45,1.000000,Y
10556,2021-2022,58,72769,123570,Yuba,Wheatland Union High,Wheatland Community Day High,High School District,District Community Day Schools,Community Day School,N,N,5,0.600000,4,0.800000,Y
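
In the FRPM output above, the county and school codes appear without leading zeros (`1`, `130419`), while the other files show `01` and `0130419`. Judging from the `CDSCode` values in the Public Schools file, the 14-digit code looks like a 2-digit county, 5-digit district, and 7-digit school code concatenated. Under that assumption, a join key could be rebuilt as sketched below; `build_cds_code` is a hypothetical helper, not part of this notebook.

```python
import pandas as pd


def build_cds_code(df: pd.DataFrame) -> pd.Series:
    """Zero-pad the three code columns and concatenate them into a 14-character key."""
    county = df["County Code"].astype(str).str.zfill(2)
    district = df["District Code"].astype(str).str.zfill(5)
    school = df["School Code"].astype(str).str.zfill(7)
    return county + district + school


# two rows taken from the FRPM output above, with codes stored as integers
toy = pd.DataFrame(
    {
        "County Code": [1, 58],
        "District Code": [10017, 72751],
        "School Code": [130419, 6056832],
        "School Name": ["Alameda County Community", "Lone Tree Elementary"],
    }
)

toy["CDSCode"] = build_cds_code(toy)
print(toy["CDSCode"].to_list())  # ['01100170130419', '58727516056832']
```

The second key matches the `CDSCode` listed for Lone Tree Elementary in the Public Schools output, which supports the assumed layout.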


## CBEDS Data about Schools & Districts

[Data Dictionary: CBEDS](https://www.cde.ca.gov/ds/ad/fscbedsorab19.asp)

In [19]:
df_cbeds = rpkl(raw_pickle, "raw_cbeds.pkl")

df_cbeds.columns.to_list()

['Cdscode',
 'CountyName',
 'DistrictName',
 'SchoolName',
 'Description',
 'Level',
 'Section',
 'RowNumber',
 'Value',
 'Year']

In [20]:
col_cbeds = [
    "Cdscode",
    "CountyName",
    "DistrictName",
    "SchoolName",
    #  'Description',
    "Level",
    #  'Section',
    #  'RowNumber',
    "Value",
    "Year",
]

df_cbeds[col_cbeds]

Unnamed: 0,Cdscode,CountyName,DistrictName,SchoolName,Level,Value,Year
18,01100170112607,Alameda,Alameda County Office of Education,Envision Academy for Arts & Technology,S,True,2122
19,01100170112607,Alameda,Alameda County Office of Education,Envision Academy for Arts & Technology,S,True,2122
20,01100170112607,Alameda,Alameda County Office of Education,Envision Academy for Arts & Technology,S,0,2122
21,01100170112607,Alameda,Alameda County Office of Education,Envision Academy for Arts & Technology,S,0,2122
22,01100170112607,Alameda,Alameda County Office of Education,Envision Academy for Arts & Technology,S,True,2122
...,...,...,...,...,...,...,...
58706,58727695838305,Yuba,Wheatland Union High,Wheatland Union High,S,True,2122
58707,58727695838305,Yuba,Wheatland Union High,Wheatland Union High,S,True,2122
58708,58727695838305,Yuba,Wheatland Union High,Wheatland Union High,S,True,2122
58709,58727695838305,Yuba,Wheatland Union High,Wheatland Union High,S,20210811,2122


## Staff Data Files

### Student / Staff Ratio

[Data Dictionary: Student-Staff Ratio](https://www.cde.ca.gov/ds/ad/fsstrat.asp)

In [21]:
df_ss_ratio = rpkl(raw_pickle, "raw_student_staff_ratio.pkl")

df_ss_ratio.columns.to_list()

['Academic Year',
 'Aggregate Level',
 'County Code',
 'District Code',
 'School Code',
 'County Name',
 'District Name',
 'School Name',
 'Charter School',
 'DASS',
 'School Grade Span',
 'TOTAL_ENR_N',
 'TCH_FTE_N',
 'ADM_FTE_N',
 'PSV_FTE_N',
 'OTH_FTE_N',
 'STU_TCH_RATIO',
 'STU_ADM_RATIO',
 'STU_PSV_RATIO',
 'STU_OTH_RATIO']

In [22]:
cols_ss_ratio = [
    "Academic Year",
    "Aggregate Level",
    "County Code",
    "District Code",
    "School Code",
    "County Name",
    "District Name",
    "School Name",
    "Charter School",
    "DASS",
    "School Grade Span",
    # "TOTAL_ENR_N",
    # "TCH_FTE_N",
    # "ADM_FTE_N",
    # "PSV_FTE_N",
    # "OTH_FTE_N",
    "STU_TCH_RATIO",  # student / teacher ratio
    "STU_ADM_RATIO",  # student / admin ratio
    "STU_PSV_RATIO",  # student / pupil services staff ratio (includes counselors)
    # "STU_OTH_RATIO",
]

df_ss_ratio[cols_ss_ratio]

Unnamed: 0,Academic Year,Aggregate Level,County Code,District Code,School Code,County Name,District Name,School Name,Charter School,DASS,School Grade Span,STU_TCH_RATIO,STU_ADM_RATIO,STU_PSV_RATIO
556,2021-22,S,01,10017,0000000,Alameda,Alameda County Office of Education,District Office,N,N,GS_K12,*,*,*
571,2021-22,S,01,31609,0131755,Alameda,California School for the Blind (State Special...,California School for the Blind,N,N,GS_K12,4.8,12.4,3.9
572,2021-22,S,01,31617,0000000,Alameda,California School for the Deaf-Fremont (State ...,District Office,N,N,GS_K12,*,*,*
573,2021-22,S,01,31617,0131763,Alameda,California School for the Deaf-Fremont (State ...,California School for the Deaf-Fremont,N,N,GS_K12,4.4,40.3,159
574,2021-22,S,01,61119,0000000,Alameda,Alameda Unified,District Office,N,N,GS_K12,*,*,*
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30235,2021-22,S,58,72751,6056816,Yuba,Wheatland,Bear River,N,N,GS_K12,18.9,284,180.3
30236,2021-22,S,58,72751,6056832,Yuba,Wheatland,Lone Tree Elementary,N,N,GS_K6,20.6,*,175.5
30237,2021-22,S,58,72751,6056840,Yuba,Wheatland,Wheatland Elementary,N,N,GS_K6,20,340,123.6
30239,2021-22,S,58,72769,0000000,Yuba,Wheatland Union High,District Office,N,N,GS_K12,*,3,*


### Staff Education

[Data Dictionary: Staff Education](https://www.cde.ca.gov/ds/ad/fssted.asp)


In [23]:
df_staff_ed = rpkl(raw_pickle, "raw_staff_edu.pkl")

df_staff_ed.columns.to_list()

['Academic Year',
 'Aggregate Level',
 'County Code',
 'District Code',
 'School Code',
 'County Name',
 'District Name',
 'School Name',
 'Charter School',
 'DASS',
 'Staff Type',
 'School Grade Span',
 'Staff Gender',
 'Total Staff Count',
 'Associate',
 'Baccalaureate',
 'Baccalaureate Plus',
 'Master',
 'Master Plus',
 'Doctorate',
 'Special (Juris Doctor)',
 'None']

In [24]:
cols_staff_ed = [
    "Academic Year",
    "Aggregate Level",
    "County Code",
    "District Code",
    "School Code",
    "County Name",
    "District Name",
    "School Name",
    "Charter School",
    "DASS",
    "Staff Type",
    "School Grade Span",
    # "Staff Gender",
    "Total Staff Count",
    "Associate",
    "Baccalaureate",
    "Baccalaureate Plus",
    "Master",
    "Master Plus",
    "Doctorate",
    "Special (Juris Doctor)",
    "None",
]

df_staff_ed[cols_staff_ed]

Unnamed: 0,Academic Year,Aggregate Level,County Code,District Code,School Code,County Name,District Name,School Name,Charter School,DASS,...,School Grade Span,Total Staff Count,Associate,Baccalaureate,Baccalaureate Plus,Master,Master Plus,Doctorate,Special (Juris Doctor),None
7395,2021-22,S,01,10017,0000000,Alameda,Alameda County Office of Education,District Office,N,N,...,GS_K12,5,0,0,0,0,4,1,0,0
7396,2021-22,S,01,10017,0000000,Alameda,Alameda County Office of Education,District Office,N,N,...,GS_K12,5,0,0,0,0,4,1,0,0
7397,2021-22,S,01,10017,0000000,Alameda,Alameda County Office of Education,District Office,N,N,...,GS_K12,6,0,0,0,0,5,1,0,0
7398,2021-22,S,01,10017,0000000,Alameda,Alameda County Office of Education,District Office,N,N,...,GS_K12,6,0,0,0,0,5,1,0,0
7399,2021-22,S,01,10017,0000000,Alameda,Alameda County Office of Education,District Office,N,N,...,GS_K12,1,0,0,0,0,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
360896,2021-22,S,58,10587,0000000,Yuba,Yuba County Office of Education,District Office,N,N,...,GS_K12,11,0,2,1,7,0,1,0,0
360897,2021-22,S,58,10587,0000000,Yuba,Yuba County Office of Education,District Office,N,N,...,GS_K12,9,0,2,1,6,0,0,0,0
360898,2021-22,S,58,10587,0000000,Yuba,Yuba County Office of Education,District Office,N,N,...,GS_K12,2,0,0,0,1,0,1,0,0
360899,2021-22,S,58,10587,0000000,Yuba,Yuba County Office of Education,District Office,N,N,...,GS_K12,1,0,0,1,0,0,0,0,0
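
The degree columns above are raw counts, which are hard to compare across schools of different sizes. One option is to express them as a share of `Total Staff Count`. The sketch below uses two rows from the output and assumes the counts are already numeric; treating `Master`, `Master Plus`, and `Doctorate` as "advanced" is an illustrative choice, not a definition from the data dictionary.

```python
import pandas as pd

# two rows from the staff education output above (counts only)
toy = pd.DataFrame(
    {
        "Total Staff Count": [11, 9],
        "Master": [7, 6],
        "Master Plus": [0, 0],
        "Doctorate": [1, 0],
    }
)

advanced_cols = ["Master", "Master Plus", "Doctorate"]  # assumed grouping

# share of staff holding a master's degree or higher
toy["advanced_degree_share"] = (
    toy[advanced_cols].sum(axis=1) / toy["Total Staff Count"]
).round(3)

print(toy)
```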


### Staff Experience

[Data Dictionary: Staff Experience](https://www.cde.ca.gov/ds/ad/fsstex.asp)

In [25]:
df_staff_xp = rpkl(raw_pickle, "raw_staff_exp.pkl")

df_staff_xp.columns.to_list()

['Academic Year',
 'Aggregate Level',
 'County Code',
 'District Code',
 'School Code',
 'County Name',
 'District Name',
 'School Name',
 'Charter School',
 'DASS',
 'Staff Type',
 'School Grade Span',
 'Staff Gender',
 'Total Staff Count',
 'Average Total Years Experience',
 'Average District Years Experience',
 'Experienced',
 'Inexperienced',
 'First Year',
 'Second Year']

In [26]:
cols_staff_xp = [
    "Academic Year",
    "Aggregate Level",
    "County Code",
    "District Code",
    "School Code",
    "County Name",
    "District Name",
    "School Name",
    "Charter School",
    "DASS",
    "Staff Type",
    "School Grade Span",
    # "Staff Gender",
    "Total Staff Count",
    "Average Total Years Experience",
    "Average District Years Experience",
    "Experienced",  # 2+ years experience
    "Inexperienced",  # <2 years experience
    "First Year",  # number of staff in their first year
    "Second Year",  # number of staff in their second year
]

df_staff_xp[cols_staff_xp]

Unnamed: 0,Academic Year,Aggregate Level,County Code,District Code,School Code,County Name,District Name,School Name,Charter School,DASS,Staff Type,School Grade Span,Total Staff Count,Average Total Years Experience,Average District Years Experience,Experienced,Inexperienced,First Year,Second Year
7395,2021-22,S,01,10017,0000000,Alameda,Alameda County Office of Education,District Office,N,N,ADM,GS_K12,5,20.6,9.2,5,0,0,0
7396,2021-22,S,01,10017,0000000,Alameda,Alameda County Office of Education,District Office,N,N,ADM,GS_K12,5,20.6,9.2,5,0,0,0
7397,2021-22,S,01,10017,0000000,Alameda,Alameda County Office of Education,District Office,N,N,ALL,GS_K12,6,20.0,10.2,6,0,0,0
7398,2021-22,S,01,10017,0000000,Alameda,Alameda County Office of Education,District Office,N,N,ALL,GS_K12,6,20.0,10.2,6,0,0,0
7399,2021-22,S,01,10017,0000000,Alameda,Alameda County Office of Education,District Office,N,N,OTH,GS_K12,1,17.0,15.0,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
360896,2021-22,S,58,10587,0000000,Yuba,Yuba County Office of Education,District Office,N,N,ALL,GS_K12,11,21.6,14.6,11,0,0,0
360897,2021-22,S,58,10587,0000000,Yuba,Yuba County Office of Education,District Office,N,N,ALL,GS_K12,9,22.8,16.0,9,0,0,0
360898,2021-22,S,58,10587,0000000,Yuba,Yuba County Office of Education,District Office,N,N,ALL,GS_K12,2,16.5,8.5,2,0,0,0
360899,2021-22,S,58,10587,0000000,Yuba,Yuba County Office of Education,District Office,N,N,OTH,GS_K12,1,16.0,15.0,1,0,0,0


## Enrollment by School

[Data Dictionary: Enrollment by School](https://www.cde.ca.gov/ds/ad/fsenrps.asp)


In [27]:
df_enroll = rpkl(raw_pickle, "raw_school_enroll.pkl")

df_enroll.columns.to_list()

['ACADEMIC_YEAR',
 'CDS_CODE',
 'COUNTY',
 'DISTRICT',
 'SCHOOL',
 'ENR_TYPE',
 'RACE_ETHNICITY',
 'GENDER',
 'GR_KN',
 'GR_1',
 'GR_2',
 'GR_3',
 'GR_4',
 'GR_5',
 'GR_6',
 'GR_7',
 'GR_8',
 'UNGR_ELM',
 'GR_9',
 'GR_10',
 'GR_11',
 'GR_12',
 'UNGR_SEC',
 'ENR_TOTAL',
 'ADULT']

In [28]:
cols_enroll = [
    "ACADEMIC_YEAR",
    "CDS_CODE",
    "COUNTY",
    "DISTRICT",
    "SCHOOL",
    "ENR_TYPE",
    "RACE_ETHNICITY",
    "GENDER",
    # "GR_KN",
    # "GR_1",
    # "GR_2",
    # "GR_3",
    # "GR_4",
    # "GR_5",
    # "GR_6",
    # "GR_7",
    # "GR_8",
    # "UNGR_ELM",
    "GR_9",
    "GR_10",
    "GR_11",
    "GR_12",
    "UNGR_SEC",
    "ENR_TOTAL",
    # "ADULT",
]

df_enroll[cols_enroll]

Unnamed: 0,ACADEMIC_YEAR,CDS_CODE,COUNTY,DISTRICT,SCHOOL,ENR_TYPE,RACE_ETHNICITY,GENDER,GR_9,GR_10,GR_11,GR_12,UNGR_SEC,ENR_TOTAL
15,2020-21,01100170112607,ALAMEDA,Alameda County Office of Education,Envision Academy for Arts & Technology,P,0,F,1,0,1,0,0,6
16,2020-21,01100170112607,ALAMEDA,Alameda County Office of Education,Envision Academy for Arts & Technology,P,0,M,0,1,1,0,0,3
17,2020-21,01100170112607,ALAMEDA,Alameda County Office of Education,Envision Academy for Arts & Technology,P,1,F,1,1,0,1,0,3
18,2020-21,01100170112607,ALAMEDA,Alameda County Office of Education,Envision Academy for Arts & Technology,P,2,F,1,0,0,0,0,1
19,2020-21,01100170112607,ALAMEDA,Alameda County Office of Education,Envision Academy for Arts & Technology,P,2,M,0,1,0,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
793375,2022-23,58727695838305,YUBA,Wheatland Union High,Wheatland Union High,P,7,F,35,54,60,43,0,192
793376,2022-23,58727695838305,YUBA,Wheatland Union High,Wheatland Union High,P,7,M,61,67,61,63,0,252
793377,2022-23,58727695838305,YUBA,Wheatland Union High,Wheatland Union High,P,7,X,0,0,1,0,0,1
793378,2022-23,58727695838305,YUBA,Wheatland Union High,Wheatland Union High,P,9,F,16,10,14,12,0,52
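
The enrollment file has one row per school, race/ethnicity, and gender combination, so it needs collapsing before it can be joined to the one-row-per-school files. A minimal sketch, using three of the Wheatland Union High rows shown above and assuming a single `ENR_TYPE` per school (summing across several enrollment types could double count):

```python
import pandas as pd

# three subgroup rows from the output above; a real school has more rows than this
toy = pd.DataFrame(
    {
        "ACADEMIC_YEAR": ["2022-23"] * 3,
        "CDS_CODE": ["58727695838305"] * 3,
        "RACE_ETHNICITY": [7, 7, 9],
        "GENDER": ["F", "M", "F"],
        "GR_9": [35, 61, 16],
        "GR_10": [54, 67, 10],
        "GR_11": [60, 61, 14],
        "GR_12": [43, 63, 12],
    }
)

grade_cols = ["GR_9", "GR_10", "GR_11", "GR_12"]

# sum the subgroup rows to one row per school and year
school_totals = toy.groupby(["ACADEMIC_YEAR", "CDS_CODE"], as_index=False)[grade_cols].sum()

# total high school enrollment across grades 9-12
school_totals["ENR_9_12"] = school_totals[grade_cols].sum(axis=1)

print(school_totals)
```

For these three rows the grade 9-12 total is 496, which equals the sum of their `ENR_TOTAL` values (192 + 252 + 52).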


# California Department of Education: School Climate, Health, and Learning Surveys

## Perception of Safety by Grade Level

In [29]:
df_safety = rpkl(raw_pickle, "raw_safety_percept_grade.pkl")

df_safety.columns.to_list()

['geography',
 'geo_type',
 'grade',
 'very_safe_pct',
 'safe_pct',
 'neither_pct',
 'unsafe_pct',
 'very_unsafe_pct',
 'years',
 'level_of_safety_filter']

In [30]:
cols_safety = [
    "geography",
    "geo_type",
    "grade",
    "very_safe_pct",
    "safe_pct",
    "neither_pct",
    "unsafe_pct",
    "very_unsafe_pct",
    "years",
    "level_of_safety_filter",
]

df_safety[cols_safety]

Unnamed: 0,geography,geo_type,grade,very_safe_pct,safe_pct,neither_pct,unsafe_pct,very_unsafe_pct,years,level_of_safety_filter
0,California,State,9,0.128,0.420,0.364,0.053,0.035,2017-2019,All
1,California,State,11,0.134,0.403,0.373,0.055,0.036,2017-2019,All
2,Alameda County,County,9,0.132,0.459,0.341,0.044,0.023,2017-2019,All
3,Alameda County,County,11,0.145,0.423,0.351,0.051,0.029,2017-2019,All
4,Amador County,County,9,0.153,0.403,0.374,0.048,0.021,2017-2019,All
...,...,...,...,...,...,...,...,...,...,...
109,Ventura County,County,11,0.162,0.420,0.335,0.050,0.033,2017-2019,All
110,Yolo County,County,9,0.139,0.424,0.371,0.043,0.023,2017-2019,All
111,Yolo County,County,11,0.162,0.437,0.342,0.034,0.025,2017-2019,All
112,Yuba County,County,9,0.075,0.415,0.359,0.097,0.055,2017-2019,All


## Perception of Safety by School Connectedness

In [31]:
df_connected = rpkl(raw_pickle, "raw_safety_connect.pkl")

df_connected.columns.to_list()

['Geography',
 'Connectedness',
 'Very Safe',
 'Safe',
 'Neither Safe nor Unsafe',
 'Unsafe',
 'Very Unsafe',
 'Safety_Positive']

In [32]:
cols_connected = [
    "Geography",
    "Connectedness",
    "Very Safe",
    "Safe",
    "Neither Safe nor Unsafe",
    "Unsafe",
    "Very Unsafe",
    "Safety_Positive",
]

df_connected[cols_connected]

Unnamed: 0,Geography,Connectedness,Very Safe,Safe,Neither Safe nor Unsafe,Unsafe,Very Unsafe,Safety_Positive
0,California,High,0.268,0.559,0.157,0.011,0.005,0.827
1,California,Medium,0.052,0.334,0.520,0.065,0.028,0.386
2,California,Low,0.069,0.111,0.428,0.196,0.196,0.180
3,Alameda County,High,0.268,0.582,0.138,0.009,0.004,0.850
4,Alameda County,Medium,0.060,0.370,0.494,0.057,0.020,0.430
...,...,...,...,...,...,...,...,...
172,Yolo County,Medium,0.064,0.385,0.475,0.053,0.022,0.449
173,Yolo County,Low,0.083,0.136,0.456,0.163,0.162,0.219
174,Yuba County,High,0.234,0.584,0.160,0.015,0.007,0.818
175,Yuba County,Medium,0.036,0.331,0.498,0.086,0.049,0.367
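
In the rows shown above, `Safety_Positive` equals `Very Safe` + `Safe` (for example, 0.268 + 0.559 = 0.827 for California, High). The grade-level safety file has no such column. If a comparable feature is wanted there, it could be derived the same way; the column name `safety_positive` below is a hypothetical choice.

```python
import pandas as pd

# the two statewide rows from the grade-level safety output
toy = pd.DataFrame(
    {
        "geography": ["California", "California"],
        "grade": [9, 11],
        "very_safe_pct": [0.128, 0.134],
        "safe_pct": [0.420, 0.403],
    }
)

# combined share of students reporting "very safe" or "safe"
toy["safety_positive"] = (toy["very_safe_pct"] + toy["safe_pct"]).round(3)

print(toy[["geography", "grade", "safety_positive"]])
```

Rounding to three decimals matches the precision of the source columns and avoids floating-point noise in the sum.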
