# Capstone 2 Data Wrangling File

The purpose of this notebook is to download and review the following files from the Governor's Office of Student Achievement (GOSA) website for school year 2019-20 (SY19-20):

**Student Operational Data**
- Attendance
- Enrollment by Subgroup Programs

**Student Academic Performance Data**
- ACT Scores
- SAT Scores

**Teacher Categorization Data**
- Educator Experience
- Emergency and Provisional Teacher Credentials
- Out of Field Teachers


## Dataset Issues

In reviewing the data sets, a few issues were observed:
 - Institution/school numbers are not unique across the state. For example, Baker County K12 School in Baker County has the number 105; Lanier College and Career Academy in Hall County also has the number 105.
 - A number of counts in the datasets show up as "TFS", which stands for too few students. According to the GOSA website, this means that fewer than 10 students were included. When a count is "TFS", the corresponding average score is NaN (ACT file) or "TFS" (SAT file).
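
One possible way to work around the non-unique institution numbers is to identify a school by the pair (district code, institution number). The sketch below uses a few toy rows copied from this notebook rather than the downloaded files. The `SCHOOL_KEY` column name is hypothetical. Note that the ACT and SAT files spell the district code column `SCHOOL_DISTRCT_CD`, while the attendance and enrollment files use `SCHOOL_DSTRCT_CD`.

```python
import pandas as pd

# Toy rows copied from examples that appear in this notebook.
schools = pd.DataFrame({
    'SCHOOL_DSTRCT_CD': [604, 669, 601],
    'INSTN_NUMBER': [105, 105, 103],
    'INSTN_NAME': ['Baker County K12 School',
                   'Lanier College and Career Academy',
                   'Appling County High School'],
})

# INSTN_NUMBER alone repeats across districts.
number_is_unique = schools['INSTN_NUMBER'].is_unique

# The (district code, institution number) pair has no duplicates here.
pair_is_unique = not schools.duplicated(['SCHOOL_DSTRCT_CD', 'INSTN_NUMBER']).any()

# SCHOOL_KEY is a hypothetical helper column, not part of the GOSA files.
schools['SCHOOL_KEY'] = (schools['SCHOOL_DSTRCT_CD'].astype(str)
                         + '-' + schools['INSTN_NUMBER'].astype(str))
print(schools[['SCHOOL_KEY', 'INSTN_NAME']])
```

Whether the pair is unique across the full statewide files is an assumption that should be checked with the same `duplicated` call on the real data.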

## Imports

In [1]:
import pandas as pd

## Load the Georgia Education Data

In [2]:
# Gather URLs
act_url = 'https://download.gosa.ga.gov/2020/ACT_RECENT_2020_JUN_21_2021.csv'
attendance_url = 'https://download.gosa.ga.gov/2020/Attendance_2020_Dec112020.csv'
experience_url = 'https://download.gosa.ga.gov/2020/Educators_Inexperienced_2020_Dec112020.csv'
credential_url = 'https://download.gosa.ga.gov/2020/Educators_EMERGENCY_WAIVERS_2020_Dec112020.csv'
enrollment_url = 'https://download.gosa.ga.gov/2020/Enrollment_by_Subgroups_Programs_2020_Dec112020.csv'
oof_url = 'https://download.gosa.ga.gov/2020/Educators_OUT_OF_FIELD_2020_Dec112020.csv'
sat_url = 'https://download.gosa.ga.gov/2020/SAT_HIGHEST_2020_JUN_21_2021.csv'

In [3]:
# Create list for urls
urls = [act_url, attendance_url, experience_url, credential_url, enrollment_url, oof_url, sat_url]

In [4]:
# Create dataframes
d = {url: pd.read_csv(url) for url in urls}

## Explore Student Operational Data

### Attendance Data

In [5]:
attendance_df = d[attendance_url]

In [6]:
attendance_df.head()

Unnamed: 0,LONG_SCHOOL_YEAR,DETAIL_LVL_DESC,SCHOOL_DSTRCT_CD,SCHOOL_DSTRCT_NM,INSTN_NUMBER,INSTN_NAME,GRADES_SERVED_DESC,STUDENT_COUNT_ALL,FIVE_OR_FEWER_PERCENT_ALL,SIX_TO_FIFTEEN_PERCENT_ALL,...,CHRONIC_ABSENT_PERC_HISPANI,CHRONIC_ABSENT_PERC_MULTI,CHRONIC_ABSENT_PERC_FEMALE,CHRONIC_ABSENT_PERC_MALE,CHRONIC_ABSENT_PERC_SWD,CHRONIC_ABSENT_PERC_NOT_SWD,CHRONIC_ABSENT_PERC_ED,CHRONIC_ABSENT_PERC_NOT_ED,CHRONIC_ABSENT_PERC_LEP,CHRONIC_ABSENT_PERC_MIGRANT
0,2019-20,School,601,Appling County,103,Appling County High School,09101112,1027,55.6,32.9,...,11.8,23.1,10.7,13.3,15.4,11.6,12.1,0.0,13.6,8.7
1,2019-20,School,601,Appling County,177,Appling County Elementary School,02030405,520,62.5,32.5,...,4.4,8.3,6.8,5.2,6.5,5.8,6.0,0.0,4.9,0.0
2,2019-20,School,601,Appling County,195,Appling County Middle School,060708,867,62.5,30.1,...,3.9,17.6,6.2,8.3,10.4,6.7,7.3,0.0,2.6,1.9
3,2019-20,School,601,Appling County,277,Appling County Primary School,"PK,KK,01,02",579,54.9,37.8,...,6.7,11.5,7.6,7.2,10.3,7.0,7.4,0.0,3.7,0.0
4,2019-20,School,601,Appling County,1050,Altamaha Elementary School,"PK,KK,01,02,03,04,05",380,51.1,42.1,...,7.4,7.7,6.4,4.1,6.8,5.0,5.3,0.0,9.1,0.0


In [7]:
#attendance_df.info()

The attendance dataset shows the total number of students at the school and the percentage of students who missed five or fewer days, 6 to 15 days, and more than 15 days. The percentages are also available by race, gender, students-with-disabilities (SWD) status, economically disadvantaged (ED) status, limited English proficiency (LEP), and migrant status. Chronic absenteeism rates (missing at least 10% of school days) by subgroup are also included.

In [8]:
#How many districts are included?
attendance_df['SCHOOL_DSTRCT_NM'].nunique()

216

In [34]:
#How many schools are included?
attendance_df['INSTN_NAME'].nunique()

2181
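
Counting unique `INSTN_NAME` values can be slightly off: district summary rows carry the name "All Column Values", and two schools in different districts would collapse into one if they shared a name. A minimal sketch of an alternative count, using toy rows shaped like the attendance file, keeps only school-level rows and counts distinct (district code, institution number) pairs. The sample values are illustrative, not the real download.

```python
import pandas as pd

# Toy rows shaped like the attendance file (values are illustrative).
attendance_sample = pd.DataFrame({
    'DETAIL_LVL_DESC': ['School', 'School', 'District', 'School', 'District'],
    'SCHOOL_DSTRCT_CD': [601, 601, 601, 602, 602],
    'INSTN_NUMBER': ['0103', '0177', 'ALL', '0103', 'ALL'],
    'INSTN_NAME': ['Appling County High School',
                   'Appling County Elementary School',
                   'All Column Values',
                   'Atkinson County High School',
                   'All Column Values'],
})

# Keep school-level rows only, dropping the district summary rows.
school_rows = attendance_sample[attendance_sample['DETAIL_LVL_DESC'] == 'School']

# Count distinct (district code, institution number) pairs.
n_schools = len(school_rows[['SCHOOL_DSTRCT_CD', 'INSTN_NUMBER']].drop_duplicates())
print(n_schools)
```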

In [38]:
#How many high schools are included?
attendance_hs = attendance_df[attendance_df['GRADES_SERVED_DESC'].str.contains('09')]
attendance_hs.head()

Unnamed: 0,LONG_SCHOOL_YEAR,DETAIL_LVL_DESC,SCHOOL_DSTRCT_CD,SCHOOL_DSTRCT_NM,INSTN_NUMBER,INSTN_NAME,GRADES_SERVED_DESC,STUDENT_COUNT_ALL,FIVE_OR_FEWER_PERCENT_ALL,SIX_TO_FIFTEEN_PERCENT_ALL,...,CHRONIC_ABSENT_PERC_HISPANI,CHRONIC_ABSENT_PERC_MULTI,CHRONIC_ABSENT_PERC_FEMALE,CHRONIC_ABSENT_PERC_MALE,CHRONIC_ABSENT_PERC_SWD,CHRONIC_ABSENT_PERC_NOT_SWD,CHRONIC_ABSENT_PERC_ED,CHRONIC_ABSENT_PERC_NOT_ED,CHRONIC_ABSENT_PERC_LEP,CHRONIC_ABSENT_PERC_MIGRANT
0,2019-20,School,601,Appling County,0103,Appling County High School,09101112,1027,55.6,32.9,...,11.8,23.1,10.7,13.3,15.4,11.6,12.1,0.0,13.6,8.7
6,2019-20,District,601,Appling County,ALL,All Column Values,"PK,KK,01,02,03,04,05,06,07,08,09,10,11,12",3533,57.4,34.4,...,6.7,17.0,8.0,8.8,10.5,8.0,8.4,0.0,4.7,2.3
7,2019-20,School,602,Atkinson County,0103,Atkinson County High School,09101112,486,64.4,28.4,...,7.4,0.0,6.4,7.2,3.0,7.4,6.8,0.0,19.4,14.8
11,2019-20,District,602,Atkinson County,ALL,All Column Values,"PK,KK,01,02,03,04,05,06,07,08,09,10,11,12",1702,60.3,33.2,...,5.9,10.0,6.8,7.1,6.3,7.0,6.9,0.0,5.4,8.1
14,2019-20,School,603,Bacon County,0302,Bacon County High School,09101112,603,50.7,32.7,...,17.4,20.0,15.7,17.8,27.8,15.1,22.1,11.8,20.0,16.0
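
The filter above also returns district-level rows, because a district's `GRADES_SERVED_DESC` lists every grade, and it does not yet produce a count. A hedged sketch of one way to answer the question, again on toy rows rather than the real file, restricts the match to rows where `DETAIL_LVL_DESC` is "School":

```python
import pandas as pd

# Toy rows shaped like the attendance file (values are illustrative).
sample = pd.DataFrame({
    'DETAIL_LVL_DESC': ['School', 'School', 'District'],
    'INSTN_NAME': ['Appling County High School',
                   'Appling County Middle School',
                   'All Column Values'],
    'GRADES_SERVED_DESC': ['09101112',
                           '060708',
                           'PK,KK,01,02,03,04,05,06,07,08,09,10,11,12'],
})

# School-level rows only; district summaries would otherwise match '09' too.
is_school = sample['DETAIL_LVL_DESC'] == 'School'

# na=False treats a missing grade description as "does not serve grade 9".
serves_grade_9 = sample['GRADES_SERVED_DESC'].str.contains('09', na=False)

high_schools = sample[is_school & serves_grade_9]
n_high_schools = len(high_schools)
print(n_high_schools)
```

Treating "serves grade 9" as the definition of a high school is the same assumption the original cell makes; K-12 schools would be counted as well.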


### Enrollment Data

In [9]:
enrollment_df = d[enrollment_url]

In [10]:
enrollment_df.head()

Unnamed: 0,DETAIL_LVL_DESC,INSTN_NUMBER,SCHOOL_DSTRCT_CD,LONG_SCHOOL_YEAR,INSTN_NAME,SCHOOL_DSTRCT_NM,GRADES_SERVED_DESC,ENROLL_PERCENT_ASIAN,ENROLL_PERCENT_NATIVE,ENROLL_PERCENT_BLACK,...,ENROLL_COUNT_SPECIAL_ED_PK,ENROLL_PCT_SPECIAL_ED_PK,ENROLL_COUNT_VOCATION_9_12,ENROLL_PCT_VOCATION_9_12,ENROLL_COUNT_ALT_PROGRAMS,ENROLL_PCT_ALT_PROGRAMS,ENROLL_COUNT_GIFTED,ENROLL_PCT_GIFTED,ENROLL_PERCENT_MALE,ENROLL_PERCENT_FEMALE
0,School,103,601,2019-20,Appling County High School,Appling County,09101112,1.0,0.0,21.0,...,0,0.0,579.0,59.2,53.0,7.0,59.0,6.0,51.0,49.0
1,School,177,601,2019-20,Appling County Elementary School,Appling County,02030405,1.0,0.0,28.0,...,0,0.0,,,0.0,0.0,25.0,5.1,52.0,48.0
2,School,195,601,2019-20,Appling County Middle School,Appling County,060708,1.0,0.0,23.0,...,0,0.0,,,22.0,2.6,59.0,7.1,50.0,50.0
3,School,277,601,2019-20,Appling County Primary School,Appling County,"PK,KK,01,02",0.0,0.0,27.0,...,26,17.1,,,0.0,0.0,11.0,2.0,48.0,52.0
4,School,1050,601,2019-20,Altamaha Elementary School,Appling County,"PK,KK,01,02,03,04,05",0.0,0.0,7.0,...,10,20.0,,,0.0,0.0,36.0,10.0,50.0,50.0


In [11]:
#enrollment_df.info()

The enrollment dataset shows the grade levels at the school and the percentages of students by race and sub-groups (migrant, ED, SWD, LEP, gender). Also included are the counts and percentages of remedial middle school students, early intervention elementary students, special education, alternative programs, and gifted students.

## Explore Student Academic Performance Data

### ACT Scores

In [12]:
act_df = d[act_url]

In [13]:
act_df.head()

Unnamed: 0,LONG_SCHOOL_YEAR,SCHOOL_DISTRCT_CD,SCHOOL_DSTRCT_NM,INSTN_NUMBER,INSTN_NAME,SUBGRP_DESC,TEST_CMPNT_TYP_CD,NATIONAL_NUM_TESTED_CNT,STATE_NUM_TESTED_CNT,DSTRCT_NUM_TESTED_CNT,INSTN_NUM_TESTED_CNT,NATIONAL_AVG_SCORE_VAL,STATE_AVG_SCORE_VAL,DSTRCT_AVG_SCORE_VAL,INSTN_AVG_SCORE_VAL
0,2019-20,644,DeKalb County,2055,Druid Hills High School,All Students,English,1670497,26811,1337,80,19.7,20.2,19.5,21.6
1,2019-20,619,Calhoun County,113,Calhoun County High School,All Students,English,1670497,26811,TFS,TFS,19.7,20.2,,
2,2019-20,772,Dalton Public Schools,110,Morris Innovative High School,All Students,English,1670497,26811,82,TFS,19.7,20.2,18.5,
3,2019-20,604,Baker County,105,Baker County K12 School,All Students,English,1670497,26811,TFS,TFS,19.7,20.2,,
4,2019-20,657,Floyd County,107,Pepperell High School,All Students,English,1670497,26811,201,57,19.7,20.2,19.1,17.9


In [14]:
act_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2417 entries, 0 to 2416
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   LONG_SCHOOL_YEAR         2417 non-null   object 
 1   SCHOOL_DISTRCT_CD        2417 non-null   int64  
 2   SCHOOL_DSTRCT_NM         2417 non-null   object 
 3   INSTN_NUMBER             2417 non-null   int64  
 4   INSTN_NAME               2417 non-null   object 
 5   SUBGRP_DESC              2417 non-null   object 
 6   TEST_CMPNT_TYP_CD        2417 non-null   object 
 7   NATIONAL_NUM_TESTED_CNT  2417 non-null   int64  
 8   STATE_NUM_TESTED_CNT     2417 non-null   int64  
 9   DSTRCT_NUM_TESTED_CNT    2417 non-null   object 
 10  INSTN_NUM_TESTED_CNT     2417 non-null   object 
 11  NATIONAL_AVG_SCORE_VAL   2417 non-null   float64
 12  STATE_AVG_SCORE_VAL      2417 non-null   float64
 13  DSTRCT_AVG_SCORE_VAL     2166 non-null   float64
 14  INSTN_AVG_SCORE_VAL     

In [15]:
# Check to see which subgroups are represented
act_df['SUBGRP_DESC'].unique()

array(['All Students'], dtype=object)

In [16]:
# Check to see which component scores are included
act_df['TEST_CMPNT_TYP_CD'].unique()

array(['English', 'Mathematics', 'Composite', 'Reading', 'Science',
       'Writing Subscore'], dtype=object)

In [17]:
# For a specific school, what data is included?
act_df[act_df['INSTN_NAME'] == 'Druid Hills High School']

Unnamed: 0,LONG_SCHOOL_YEAR,SCHOOL_DISTRCT_CD,SCHOOL_DSTRCT_NM,INSTN_NUMBER,INSTN_NAME,SUBGRP_DESC,TEST_CMPNT_TYP_CD,NATIONAL_NUM_TESTED_CNT,STATE_NUM_TESTED_CNT,DSTRCT_NUM_TESTED_CNT,INSTN_NUM_TESTED_CNT,NATIONAL_AVG_SCORE_VAL,STATE_AVG_SCORE_VAL,DSTRCT_AVG_SCORE_VAL,INSTN_AVG_SCORE_VAL
0,2019-20,644,DeKalb County,2055,Druid Hills High School,All Students,English,1670497,26811,1337,80,19.7,20.2,19.5,21.6
129,2019-20,644,DeKalb County,2055,Druid Hills High School,All Students,Reading,1670497,26810,1337,80,21.2,21.6,20.9,23.3
176,2019-20,644,DeKalb County,2055,Druid Hills High School,All Students,Science,1670497,26810,1337,80,20.6,20.8,19.9,21.9
228,2019-20,644,DeKalb County,2055,Druid Hills High School,All Students,Writing Subscore,678906,5088,332,13,6.4,6.8,6.7,6.5
1813,2019-20,644,DeKalb County,2055,Druid Hills High School,All Students,Mathematics,1670497,26810,1337,80,20.4,20.2,19.2,20.6
1870,2019-20,644,DeKalb County,2055,Druid Hills High School,All Students,Composite,1670497,26810,1337,80,20.6,20.8,20.0,22.0


In [18]:
# How many schools have TFS for the composite score?
act_df[(act_df['INSTN_NUM_TESTED_CNT'] == 'TFS') & (act_df['TEST_CMPNT_TYP_CD'] == 'Composite')].nunique()

LONG_SCHOOL_YEAR            1
SCHOOL_DISTRCT_CD          46
SCHOOL_DSTRCT_NM           46
INSTN_NUMBER               35
INSTN_NAME                 48
SUBGRP_DESC                 1
TEST_CMPNT_TYP_CD           1
NATIONAL_NUM_TESTED_CNT     1
STATE_NUM_TESTED_CNT        1
DSTRCT_NUM_TESTED_CNT      16
INSTN_NUM_TESTED_CNT        1
NATIONAL_AVG_SCORE_VAL      1
STATE_AVG_SCORE_VAL         1
DSTRCT_AVG_SCORE_VAL       14
INSTN_AVG_SCORE_VAL         0
dtype: int64

The ACT dataset shows the school information (name, number), the test component (English, Mathematics, Reading, Science, Writing Subscore, or Composite), and the counts and average scores. The counts and average scores are provided at the national, state, district, and school levels.

For some counts, "TFS" (too few students) is listed. This results in a NaN value for the average.
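
Because of the "TFS" entries, the district and school count columns are read as `object` rather than as numbers. One way to make them usable, sketched here on two toy rows, is to record which rows were suppressed and then convert the column with `pd.to_numeric`. The `INSTN_SUPPRESSED` column name is hypothetical.

```python
import pandas as pd

# Toy rows shaped like the ACT file (values copied from the output above).
act_sample = pd.DataFrame({
    'INSTN_NAME': ['Druid Hills High School', 'Calhoun County High School'],
    'INSTN_NUM_TESTED_CNT': ['80', 'TFS'],
    'INSTN_AVG_SCORE_VAL': [21.6, None],
})

# Remember which rows were suppressed before the marker is lost.
# INSTN_SUPPRESSED is a hypothetical helper column.
act_sample['INSTN_SUPPRESSED'] = act_sample['INSTN_NUM_TESTED_CNT'].eq('TFS')

# errors='coerce' turns anything non-numeric, including 'TFS', into NaN.
act_sample['INSTN_NUM_TESTED_CNT'] = pd.to_numeric(
    act_sample['INSTN_NUM_TESTED_CNT'], errors='coerce')

print(act_sample.dtypes)
```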

### SAT Scores

In [19]:
sat_df = d[sat_url]

In [20]:
sat_df.head()

Unnamed: 0,LONG_SCHOOL_YEAR,SCHOOL_DISTRCT_CD,SCHOOL_DSTRCT_NM,INSTN_NUMBER,INSTN_NAME,SUBGRP_DESC,TEST_CMPNT_TYP_CD,NATIONAL_NUM_TESTED_CNT,STATE_NUM_TESTED_CNT,DSTRCT_NUM_TESTED_CNT,INSTN_NUM_TESTED_CNT,STATE_AVG_SCORE_VAL,DSTRCT_AVG_SCORE_VAL,INSTN_AVG_SCORE_VAL
0,2019-20,601,Appling County,103,Appling County High School,All Students,Combined Test Score,2198460,43074,62,62,1028,1005,1005
1,2019-20,601,Appling County,103,Appling County High School,All Students,Essay Analysis Score - New,1258478,15097,17,17,3,2,2
2,2019-20,601,Appling County,103,Appling County High School,All Students,Essay Reading Score - New,1258478,15097,17,17,5,4,4
3,2019-20,601,Appling County,103,Appling County High School,All Students,Essay Total,1258478,15097,17,17,13,11,11
4,2019-20,601,Appling County,103,Appling County High School,All Students,Essay Writing Score - New,1258478,15097,17,17,5,5,5


In [21]:
sat_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3276 entries, 0 to 3275
Data columns (total 14 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   LONG_SCHOOL_YEAR         3276 non-null   object
 1   SCHOOL_DISTRCT_CD        3276 non-null   int64 
 2   SCHOOL_DSTRCT_NM         3276 non-null   object
 3   INSTN_NUMBER             3276 non-null   int64 
 4   INSTN_NAME               3276 non-null   object
 5   SUBGRP_DESC              3276 non-null   object
 6   TEST_CMPNT_TYP_CD        3276 non-null   object
 7   NATIONAL_NUM_TESTED_CNT  3276 non-null   int64 
 8   STATE_NUM_TESTED_CNT     3276 non-null   int64 
 9   DSTRCT_NUM_TESTED_CNT    3276 non-null   object
 10  INSTN_NUM_TESTED_CNT     3276 non-null   object
 11  STATE_AVG_SCORE_VAL      3276 non-null   int64 
 12  DSTRCT_AVG_SCORE_VAL     3276 non-null   object
 13  INSTN_AVG_SCORE_VAL      3276 non-null   object
dtypes: int64(5), object(9)
memory usage: 358

In [22]:
# Check to see which subject scores are included
sat_df['TEST_CMPNT_TYP_CD'].unique()

array(['Combined Test Score', 'Essay Analysis Score - New',
       'Essay Reading Score - New', 'Essay Total',
       'Essay Writing Score - New', 'Math Section Score - New',
       'Reading Test  Score - New', 'WritLang Test  Score - New'],
      dtype=object)

In [23]:
# How many schools have TFS for the combined test score?
sat_df[(sat_df['INSTN_NUM_TESTED_CNT'] == 'TFS') 
       & (sat_df['TEST_CMPNT_TYP_CD'] == 'Combined Test Score')].sort_values(by = 'INSTN_NUMBER')

Unnamed: 0,LONG_SCHOOL_YEAR,SCHOOL_DISTRCT_CD,SCHOOL_DSTRCT_NM,INSTN_NUMBER,INSTN_NAME,SUBGRP_DESC,TEST_CMPNT_TYP_CD,NATIONAL_NUM_TESTED_CNT,STATE_NUM_TESTED_CNT,DSTRCT_NUM_TESTED_CNT,INSTN_NUM_TESTED_CNT,STATE_AVG_SCORE_VAL,DSTRCT_AVG_SCORE_VAL,INSTN_AVG_SCORE_VAL
2692,2019-20,736,Thomas County,100,Bishop Hall Charter School,All Students,Combined Test Score,2198460,43074,93,TFS,1028,1009,TFS
2652,2019-20,731,Taliaferro County,102,Taliaferro County School,All Students,Combined Test Score,2198460,43074,TFS,TFS,1028,TFS,TFS
3200,2019-20,7830103,Commission Charter Schools- CCAT School,103,Statesboro STEAM Academy,All Students,Combined Test Score,2198460,43074,TFS,TFS,1028,TFS,TFS
500,2019-20,629,Clarke County,104,Classic City High School,All Students,Combined Test Score,2198460,43074,318,TFS,1028,967,TFS
24,2019-20,604,Baker County,105,Baker County K12 School,All Students,Combined Test Score,2198460,43074,TFS,TFS,1028,TFS,TFS
1740,2019-20,669,Hall County,105,Lanier College and Career Academy,All Students,Combined Test Score,2198460,43074,604,TFS,1028,1039,TFS
104,2019-20,610,Berrien County,106,Berrien Academy Performance Learning Center,All Students,Combined Test Score,2198460,43074,71,TFS,1028,1012,TFS
2548,2019-20,722,Rockdale County,106,Z_Rockdale Career Academy,All Students,Combined Test Score,2198460,43074,520,TFS,1028,984,TFS
1252,2019-20,657,Floyd County,107,Pepperell High School,All Students,Combined Test Score,2198460,43074,96,TFS,1028,1064,TFS
3128,2019-20,772,Dalton Public Schools,110,Morris Innovative High School,All Students,Combined Test Score,2198460,43074,184,TFS,1028,970,TFS


The SAT dataset shows the school information (name, number), the test component, and the counts and average scores. Counts are provided at the national, state, district, and school levels; average scores are provided at the state, district, and school levels.

As in the ACT dataset, "TFS" (too few students) is listed for some counts. In this file the corresponding average is also reported as "TFS" rather than NaN, which is why the district and school average columns are read as `object`.
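
For a file where "TFS" appears in both the count and the average columns, one option is to declare the marker as missing at load time through the `na_values` argument of `pd.read_csv`. The sketch below reads a small in-memory CSV, not the GOSA download, so it can run without network access.

```python
import io

import pandas as pd

# A tiny in-memory stand-in for the SAT file (values are illustrative).
csv_text = (
    "INSTN_NAME,INSTN_NUM_TESTED_CNT,INSTN_AVG_SCORE_VAL\n"
    "Appling County High School,62,1005\n"
    "Bishop Hall Charter School,TFS,TFS\n"
)

# na_values tells the parser to treat 'TFS' as a missing value,
# so both columns come back numeric instead of object.
sat_sample = pd.read_csv(io.StringIO(csv_text), na_values=['TFS'])

print(sat_sample.dtypes)
```

The trade-off is that the distinction between "suppressed" and "genuinely missing" is lost once the file is parsed this way.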

## Explore Teacher Categorization Data

### Educator Experience

In [24]:
experience_df = d[experience_url]

In [25]:
experience_df.head()

Unnamed: 0,LONG_SCHOOL_YEAR,SCHOOL_DSTRCT_NM,INSTN_NAME,LABEL_LVL_3_DESC,LABEL_LVL_2_DESC,FTE,INEXPERIENCED_FTE,INEXPERIENCED_FTE_PCT
0,2019-20,Appling County,Altamaha Elementary School,Leaders,Not Applicable,1.0,0.0,0
1,2019-20,Appling County,Altamaha Elementary School,Teachers,Total,27.3,6.0,22
2,2019-20,Appling County,Appling County Elementary School,Leaders,Not Applicable,2.0,1.5,75
3,2019-20,Appling County,Appling County Elementary School,Teachers,Total,44.9,14.0,31
4,2019-20,Appling County,Appling County High School,Leaders,Not Applicable,2.5,0.5,20


In [26]:
#experience_df.info()

The experience dataset shows the total number of full-time equivalents (FTEs) at the school at the leader and teacher levels. Also included are the number of inexperienced FTEs and the percentage of total FTEs that are inexperienced.
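
As a quick sanity check on how the percentage column relates to the two FTE columns, the sketch below recomputes it for four of the rows shown above. It assumes the published value is the inexperienced share rounded to a whole percent, which holds for these rows but has not been verified against the full file.

```python
import pandas as pd

# Rows copied from the experience_df.head() output above.
experience_sample = pd.DataFrame({
    'FTE': [27.3, 44.9, 2.0, 2.5],
    'INEXPERIENCED_FTE': [6.0, 14.0, 1.5, 0.5],
    'INEXPERIENCED_FTE_PCT': [22, 31, 75, 20],
})

# Share of FTEs that are inexperienced, as a whole-number percentage.
computed_pct = (experience_sample['INEXPERIENCED_FTE']
                / experience_sample['FTE'] * 100).round()

# True where the recomputed value agrees with the published column.
matches = computed_pct.eq(experience_sample['INEXPERIENCED_FTE_PCT'])
print(matches.all())
```

Rows with an FTE of zero, if any exist in the real data, would need separate handling to avoid division by zero.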

### Teacher Emergency Credentials

In [27]:
credential_df = d[credential_url]

In [28]:
credential_df.head()

Unnamed: 0,LONG_SCHOOL_YEAR,SCHOOL_DSTRCT_NM,INSTN_NAME,LABEL_LVL_3_DESC,LABEL_LVL_2_DESC,FTE,OUTOFFIELD_FTE,OUTOFFIELD_FTE_PCT
0,2019-20,Appling County,Altamaha Elementary School,Teachers,Total,27.3,1.0,4
1,2019-20,Appling County,Appling County Elementary School,Teachers,Total,44.9,2.0,4
2,2019-20,Appling County,Appling County High School,Teachers,Total,55.7,8.1,15
3,2019-20,Appling County,Appling County Middle School,Teachers,Total,53.9,13.0,24
4,2019-20,Appling County,Appling County Primary School,Teachers,Total,48.2,0.0,0


In [29]:
credential_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3779 entries, 0 to 3778
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   LONG_SCHOOL_YEAR    3779 non-null   object 
 1   SCHOOL_DSTRCT_NM    3779 non-null   object 
 2   INSTN_NAME          3779 non-null   object 
 3   LABEL_LVL_3_DESC    3779 non-null   object 
 4   LABEL_LVL_2_DESC    3779 non-null   object 
 5   FTE                 3779 non-null   float64
 6   OUTOFFIELD_FTE      3779 non-null   float64
 7   OUTOFFIELD_FTE_PCT  3779 non-null   int64  
dtypes: float64(2), int64(1), object(5)
memory usage: 236.3+ KB


The emergency credentials dataset shows the total number of full-time equivalents (FTEs) at the school at the teacher level. Also included are the number of out-of-field FTEs (those with emergency or provisional credentials) and the percentage of total FTEs that are out-of-field.

Note that this description of out-of-field is different from the out-of-field dataset.

### Out-of-Field Teachers

In [30]:
oof_df = d[oof_url]

In [31]:
oof_df.head()

Unnamed: 0,LONG_SCHOOL_YEAR,SCHOOL_DSTRCT_NM,INSTN_NAME,LABEL_LVL_3_DESC,LABEL_LVL_2_DESC,FTE,OUTOFFIELD_FTE,OUTOFFIELD_FTE_PCT
0,2019-20,Appling County,Altamaha Elementary School,Teachers,Total,27.3,1.0,4
1,2019-20,Appling County,Appling County Elementary School,Teachers,Total,44.9,1.0,2
2,2019-20,Appling County,Appling County High School,Teachers,Total,55.7,2.0,4
3,2019-20,Appling County,Appling County Middle School,Teachers,Total,53.9,0.0,0
4,2019-20,Appling County,Appling County Primary School,Teachers,Total,48.2,0.0,0


In [32]:
oof_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3779 entries, 0 to 3778
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   LONG_SCHOOL_YEAR    3779 non-null   object 
 1   SCHOOL_DSTRCT_NM    3779 non-null   object 
 2   INSTN_NAME          3779 non-null   object 
 3   LABEL_LVL_3_DESC    3779 non-null   object 
 4   LABEL_LVL_2_DESC    3779 non-null   object 
 5   FTE                 3779 non-null   float64
 6   OUTOFFIELD_FTE      3779 non-null   float64
 7   OUTOFFIELD_FTE_PCT  3779 non-null   int64  
dtypes: float64(2), int64(1), object(5)
memory usage: 236.3+ KB


In [33]:
oof_df.compare(credential_df)

Unnamed: 0_level_0,OUTOFFIELD_FTE,OUTOFFIELD_FTE,OUTOFFIELD_FTE_PCT,OUTOFFIELD_FTE_PCT
Unnamed: 0_level_1,self,other,self,other
1,1.0,2.0,2.0,4.0
2,2.0,8.1,4.0,15.0
3,0.0,13.0,0.0,24.0
5,4.0,24.1,2.0,10.0
7,3.0,7.3,10.0,23.0
...,...,...,...,...
3774,0.0,3.7,0.0,9.0
3775,0.0,2.0,0.0,4.0
3776,0.0,2.0,0.0,4.0
3777,0.0,13.4,0.0,7.0


The out-of-field dataset shows the total number of full-time equivalents (FTEs) at the school at the teacher level. Also included are the number of out-of-field FTEs (those teaching in a subject or field for which the teacher is not certified or licensed) and the percentage of total FTEs that are out-of-field.

Note that this definition of out-of-field is different from the emergency credentials dataset.
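
Since both files use the same column names for different definitions, it may help to line them up by school rather than by row position, which is what `compare` relies on. A minimal sketch, assuming district name plus school name identifies a row (the real files may also need `LABEL_LVL_3_DESC` and `LABEL_LVL_2_DESC` in the key), merges toy versions of the two frames with explicit suffixes. The suffix names are hypothetical.

```python
import pandas as pd

# Assumed join key; the real files may need the LABEL_LVL_* columns as well.
keys = ['SCHOOL_DSTRCT_NM', 'INSTN_NAME']
names = ['Altamaha Elementary School',
         'Appling County Elementary School',
         'Appling County High School']

# Toy rows copied from the two head() outputs above.
oof_sample = pd.DataFrame({
    'SCHOOL_DSTRCT_NM': ['Appling County'] * 3,
    'INSTN_NAME': names,
    'OUTOFFIELD_FTE': [1.0, 1.0, 2.0],
})
credential_sample = pd.DataFrame({
    'SCHOOL_DSTRCT_NM': ['Appling County'] * 3,
    'INSTN_NAME': names,
    'OUTOFFIELD_FTE': [1.0, 2.0, 8.1],
})

# validate='one_to_one' raises if the key is duplicated on either side.
side_by_side = oof_sample.merge(
    credential_sample, on=keys,
    suffixes=('_OOF', '_EMERGENCY'), validate='one_to_one')

# Schools where the two definitions give different FTE counts.
differs = side_by_side[
    side_by_side['OUTOFFIELD_FTE_OOF'] != side_by_side['OUTOFFIELD_FTE_EMERGENCY']]
print(differs)
```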