# Education data prep **Part 2**

### This script combines the following 3 datasets, aggregates them by county, redesigns column naming structure, and re-calculates rates:
1. District Student Mobility/Stability Statistics 2011-2012 **by Instructional Program/Service Type**
2. District Student Mobility/Stability Statistics 2011-2012 **by Gender & Race/Ethnicity**
3. District Graduation Data Statistics 2011-2012 **by Instructional Program Service Type**
## Reference: Column Naming conventions

- This dataset is designed so you should never have to look at the columns to find the name of one (since there are around 140 columns). Just look here for reference instead.
- For instance, to get the rate for any variable, just use `_rate` after a variable. So `graduated` becomes `graduated_rate`

| Type | Naming | Example |
| - | - | - |
| County Total | variable | `stable` |
| Count | group + variable | `disabled_stable` |
| Rate | group + variable + "rate" | `disabled_stable_rate` |
| Group Total | group + group total | `disabled_pupil_total` |

<br>

#### Mobility/Stability columns

| GROUPS | VARIABLES | GROUP TOTALS |
| - | - | - |
| disabled | stable | pupil_total |
| limited_eng | mobile | 
| poor | mobile_instances |
| migrant | 
| title_1 | 
| homeless |
| gifted |
| male |
| female |
| white |
| asian |
| black |
| hispanic |

<br>

#### Graduation columns

| GROUPS | VARIABLES | GROUP TOTALS |
| - | - | - |
| disabled | graduated | grad_base_total |
| limited_eng | completed |
| poor |
| migrant |
| title_1 |
| homeless |
| gifted |

<br>

**What are group totals?**
- Notice they aren't just called "total". This is because, for graduation data, we don't care about the total number of students. We care about the total number of students who are actually in the pool for graduation. So, we call it `grad_base_total` and use that when calculating rate

**Rates are calculated by dividing a variable by its group total, then multiplying by 100**

---
---
---

In [46]:
import pandas as pd
import df_util
from df_util import head
input_path = lambda name: f'../input-data/{name}.csv'
work_path = lambda name: f'../working-data/{name}.csv'

df_grad_rate_raw = pd.read_csv(input_path('dist_grad_rate__cfyh-6xxg'))
df_mobility_raw = pd.read_csv(input_path('dist_student_mobility__6wcd-ysh5'))
df_mobility_demo_raw = pd.read_csv(input_path('dist_mobility_demographics__rg84-k4d3'))

head(df_grad_rate_raw, df_mobility_raw, df_mobility_demo_raw)

38 cols x 185 rows


Unnamed: 0,organization_code,organization_name,students_with_disabilities_final_grad_base,students_with_disabilities_graduates_total,students_with_disabilities_graduation_rate,students_with_disabilities_completers_total,students_with_disabilities_completion_rate,limited_english_proficient_final_grad_base,limited_english_proficient_graduates_total,limited_english_proficient_graduation_rate,...,homeless_graduates_total,homeless_graduation_rate,homeless_completers_total,homeless_completion_rate,gifted_talented_final_grad_base,gifted_talented_graduates_total,gifted_talented_graduation_rate,gifted_talented_completers_total,gifted_talented_completion_rate,county_name
0,9999.0,STATE TOTAL,5775.0,3099.0,53.7,3222.0,55.8,6171.0,3289.0,53.3,...,1175.0,49.1,1262.0,52.7,6604.0,6048.0,91.6,6156.0,93.2,
1,10.0,MAPLETON 1,49.0,18.0,36.7,19.0,38.8,219.0,73.0,33.3,...,12.0,29.3,16.0,39.0,44.0,27.0,61.4,27.0,61.4,ADAMS
2,20.0,ADAMS 12 FIVE STAR SCHOOLS,250.0,118.0,47.2,127.0,50.8,379.0,257.0,67.8,...,62.0,58.5,65.0,61.3,227.0,201.0,88.5,208.0,91.6,ADAMS


60 cols x 184 rows


Unnamed: 0,school_year,org_code,organization_name,category,total_pupil_count_all_students,total_stable_pupil_count_all_students,total_stability_rate_all_students,total_mobile_student_count_all_students,total_student_mobility_rate_all_students,total_instances_of_mobility_all_students,...,homeless_student_mobility_rate,homeless_instances_of_mobility,homeless_mobility_incidence_rate,gifted_talented_pupil_count,gifted_talented_stable_student_count,gifted_talented_stability_rate,gifted_talented_mobile_student_count,gifted_talented_student_mobility_rate,gifted_talented_instances_of_mobility,gifted_talented_mobility_incidence_rate
0,20112012.0,9999.0,STATE TOTAL,1STATE TOTALS (INCLUDING ALTERNATIVE SCHOOLS),939283.0,705064.0,75.1,231706.0,24.7,253577.0,...,45.3,11558.0,54.2,73344.0,66620.0,90.8,6641.0,9.1,7366.0,10.0
1,20112012.0,10.0,MAPLETON 1,1DISTRICT TOTALS (INCLUDING ALTERNATIVE SCHOOLS),9037.0,5077.0,56.2,3919.0,43.4,4133.0,...,32.7,79.0,36.9,250.0,205.0,82.0,44.0,17.6,47.0,18.8
2,20112012.0,20.0,ADAMS 12 FIVE STAR SCHOOLS,1DISTRICT TOTALS (INCLUDING ALTERNATIVE SCHOOLS),49889.0,34283.0,68.7,15424.0,30.9,16854.0,...,57.2,481.0,68.2,3590.0,3225.0,89.8,361.0,10.1,404.0,11.3


74 cols x 185 rows


Unnamed: 0,school_year,org_code,organization_name,category,total_pupil_count,total_stable_student_count,total_stability_rate,total_mobile_student_count,total_student_mobility_rate,total_instances_of_mobility,...,total_native_hawaiian_or_other_pacific_islander_student_mobility_rate,total_native_hawaiian_or_other_pacific_islander_instances_of_mobility,total_native_hawaiian_or_other_pacific_islander_mobility_incidence_rate,total_two_or_more_races_pupil_count,total_two_or_more_races_stable_student_count,total_two_or_more_races_stability_rate,total_two_or_more_races_mobile_student_count,total_two_or_more_races_student_mobility_rate,total_two_or_more_races_instances_of_mobility,total_two_or_more_races_mobility_incidence_rate
0,20112012.0,9999.0,STATE TOTAL,1STATE TOTALS (INCLUDING ALTERNATIVE SCHOOLS),939283.0,705064.0,75.1,231706.0,24.7,253577.0,...,34.8,840.0,38.0,29329.0,21501.0,73.3,7718.0,26.3,8433.0,28.8
1,20112012.0,10.0,MAPLETON 1,1DISTRICT TOTALS (INCLUDING ALTERNATIVE SCHOOLS),9037.0,5077.0,56.2,3919.0,43.4,4133.0,...,70.8,17.0,70.8,219.0,129.0,58.9,90.0,41.1,91.0,41.6
2,20112012.0,20.0,ADAMS 12 FIVE STAR SCHOOLS,1DISTRICT TOTALS (INCLUDING ALTERNATIVE SCHOOLS),49889.0,34283.0,68.7,15424.0,30.9,16854.0,...,45.3,42.0,48.8,662.0,455.0,68.7,203.0,30.7,222.0,33.5


In [52]:
not_state_total = 'organization_name != "STATE TOTAL"'
df_grad_rate = (df_grad_rate_raw
    .query(not_state_total)
    .rename_col('organization_code', 'org_code')
    .move_col('county_name', 2)
    .rename_col('county_name', 'county')
)
df_mobility = (df_mobility_raw
    .query(not_state_total)
    .drop(columns=['school_year', 'category'])
)
df_mobility_demo = (df_mobility_demo_raw
    .query(not_state_total)
    .drop(columns=['school_year', 'category'])
)
head(df_grad_rate, df_mobility, df_mobility_demo)

38 cols x 183 rows


Unnamed: 0,org_code,organization_name,county,students_with_disabilities_final_grad_base,students_with_disabilities_graduates_total,students_with_disabilities_graduation_rate,students_with_disabilities_completers_total,students_with_disabilities_completion_rate,limited_english_proficient_final_grad_base,limited_english_proficient_graduates_total,...,homeless_final_grad_base,homeless_graduates_total,homeless_graduation_rate,homeless_completers_total,homeless_completion_rate,gifted_talented_final_grad_base,gifted_talented_graduates_total,gifted_talented_graduation_rate,gifted_talented_completers_total,gifted_talented_completion_rate
56,1040.0,ACADEMY 20,EL PASO,111.0,65.0,58.6,69.0,62.2,28.0,21.0,...,11.0,6.0,54.5,6.0,54.5,294.0,287.0,97.6,290.0,98.6
2,20.0,ADAMS 12 FIVE STAR SCHOOLS,ADAMS,250.0,118.0,47.2,127.0,50.8,379.0,257.0,...,106.0,62.0,58.5,65.0,61.3,227.0,201.0,88.5,208.0,91.6
3,30.0,ADAMS COUNTY 14,ADAMS,59.0,32.0,54.2,32.0,54.2,170.0,86.0,...,99.0,52.0,52.5,57.0,57.6,30.0,27.0,90.0,27.0,90.0


59 cols x 183 rows


Unnamed: 0,org_code,organization_name,county,total_pupil_count_all_students,total_stable_pupil_count_all_students,total_stability_rate_all_students,total_mobile_student_count_all_students,total_student_mobility_rate_all_students,total_instances_of_mobility_all_students,total_mobility_incidence_rate_all_students,...,homeless_student_mobility_rate,homeless_instances_of_mobility,homeless_mobility_incidence_rate,gifted_talented_pupil_count,gifted_talented_stable_student_count,gifted_talented_stability_rate,gifted_talented_mobile_student_count,gifted_talented_student_mobility_rate,gifted_talented_instances_of_mobility,gifted_talented_mobility_incidence_rate
57,1040.0,ACADEMY 20,EL PASO,25881.0,20007.0,77.3,5853.0,22.6,6077.0,23.5,...,25.0,7.0,25.0,2667.0,2405.0,90.2,262.0,9.8,264.0,9.9
2,20.0,ADAMS 12 FIVE STAR SCHOOLS,ADAMS,49889.0,34283.0,68.7,15424.0,30.9,16854.0,33.8,...,57.2,481.0,68.2,3590.0,3225.0,89.8,361.0,10.1,404.0,11.3
3,30.0,ADAMS COUNTY 14,ADAMS,8265.0,5510.0,66.7,3038.0,36.8,3397.0,41.1,...,49.7,529.0,59.7,377.0,317.0,84.1,75.0,19.9,89.0,23.6


73 cols x 183 rows


Unnamed: 0,org_code,organization_name,county,total_pupil_count,total_stable_student_count,total_stability_rate,total_mobile_student_count,total_student_mobility_rate,total_instances_of_mobility,total_mobility_incidence_rate,...,total_native_hawaiian_or_other_pacific_islander_student_mobility_rate,total_native_hawaiian_or_other_pacific_islander_instances_of_mobility,total_native_hawaiian_or_other_pacific_islander_mobility_incidence_rate,total_two_or_more_races_pupil_count,total_two_or_more_races_stable_student_count,total_two_or_more_races_stability_rate,total_two_or_more_races_mobile_student_count,total_two_or_more_races_student_mobility_rate,total_two_or_more_races_instances_of_mobility,total_two_or_more_races_mobility_incidence_rate
56,1040.0,ACADEMY 20,EL PASO,25881.0,20007.0,77.3,5853.0,22.6,6077.0,23.5,...,28.2,31.0,28.2,1340.0,960.0,71.6,380.0,28.4,396.0,29.6
2,20.0,ADAMS 12 FIVE STAR SCHOOLS,ADAMS,49889.0,34283.0,68.7,15424.0,30.9,16854.0,33.8,...,45.3,42.0,48.8,662.0,455.0,68.7,203.0,30.7,222.0,33.5
3,30.0,ADAMS COUNTY 14,ADAMS,8265.0,5510.0,66.7,3038.0,36.8,3397.0,41.1,...,0.0,0.0,0.0,55.0,28.0,50.9,26.0,47.3,28.0,50.9


### Standardize county names
> Just ONE of the imported datasets (grad rate) has a county column. Using teh district column present in all 3, let's merge the county column with the others.

In [36]:
df = df_grad_rate.copy()[['organization_name', 'county_name']]

org_name_to_county_name_map = {
    'CHARTER SCHOOL INSTITUTE': 'DENVER',
    'MOUNTAIN BOCES': 'CHAFFEE',
    'CENTENNIAL BOCES': 'WELD',
    'SAN JUAN BOCES': 'LA PLATA',
    'EXPEDITIONARY BOCES': 'DENVER',
}
for org_name, new_county_name in org_name_to_county_name_map.items():
    df.loc[df.organization_name == org_name, 'county_name'] = new_county_name

county = df
head(county)

2 cols x 183 rows


Unnamed: 0,organization_name,county_name
1,MAPLETON 1,ADAMS
2,ADAMS 12 FIVE STAR SCHOOLS,ADAMS
3,ADAMS COUNTY 14,ADAMS


### Do the stuff
1. Merge county column
2. Refactor county and bring to front.
3. Drop unneeded columns
4. Drop duplicates (for some reason the 'STATE TOTAL' row is duplicated 2-4 times on some of the datasets. Weird.)
5. Repeat for the other two datasets

In [None]:
df_mobility = (county
    .merge(df_mobility, how='left')
    .drop(columns=['category', 'school_year', 'org_code'])
    .drop_duplicates()
    .move_col('county_name', 0)
    .rename(columns={'organization_name': 'dist', 'county_name':'county'})
)
# df.to_csv(work_path('dist_mobility_rate'), index=False)
head(df_mobility)

In [None]:
df_mobility_demo = (county
    .merge(df_mobility_demo, how='left')
    .drop(columns=['category', 'school_year', 'org_code'])
    .drop_duplicates()
    .move_col('county_name', 0)
    .rename(columns={'organization_name': 'dist', 'county_name':'county'})
)
# df.to_csv(work_path('dist_mobility_rate_demographics'), index=False)
head(df_mobility_demo)

In [None]:
df_grad_rate = df_grad_rate.drop(columns='county_name')
df_grad_rate = (county
    .merge(df_grad_rate, how='left')
    .drop(columns=['organization_code'])
    .drop_duplicates()
    .move_col('county_name', 0)
    .rename(columns={'organization_name': 'dist', 'county_name':'county'})
)
# df.to_csv(work_path('dist_grad_rate'), index=False)
head(df_grad_rate)

In [1]:
import pandas as pd, numpy as np
from geo_df import GeoDF
import df_util
from df_util import head, separate_by
input_path = lambda name: f'../input-data/{name}.csv'
work_path = lambda name: f'../working-data/{name}.csv'

# These 3 datasets have each been cleaned already, and had their county names standardized so they can be joined
grad_raw = pd.read_csv(work_path('dist_grad_rate'))
mob_raw = pd.read_csv(work_path('dist_mobility_rate'))
mob_dem_raw = pd.read_csv(work_path('dist_mobility_rate_demographics'))
head(grad_raw, mob_raw, mob_dem_raw)

37 cols x 184 rows


Unnamed: 0,county,dist,students_with_disabilities_final_grad_base,students_with_disabilities_graduates_total,students_with_disabilities_graduation_rate,students_with_disabilities_completers_total,students_with_disabilities_completion_rate,limited_english_proficient_final_grad_base,limited_english_proficient_graduates_total,limited_english_proficient_graduation_rate,...,homeless_final_grad_base,homeless_graduates_total,homeless_graduation_rate,homeless_completers_total,homeless_completion_rate,gifted_talented_final_grad_base,gifted_talented_graduates_total,gifted_talented_graduation_rate,gifted_talented_completers_total,gifted_talented_completion_rate
0,STATE TOTAL,STATE TOTAL,5775.0,3099.0,53.7,3222.0,55.8,6171.0,3289.0,53.3,...,2394.0,1175.0,49.1,1262.0,52.7,6604.0,6048.0,91.6,6156.0,93.2
1,ADAMS,MAPLETON 1,49.0,18.0,36.7,19.0,38.8,219.0,73.0,33.3,...,41.0,12.0,29.3,16.0,39.0,44.0,27.0,61.4,27.0,61.4
2,ADAMS,ADAMS 12 FIVE STAR SCHOOLS,250.0,118.0,47.2,127.0,50.8,379.0,257.0,67.8,...,106.0,62.0,58.5,65.0,61.3,227.0,201.0,88.5,208.0,91.6


58 cols x 184 rows


Unnamed: 0,county,dist,total_pupil_count_all_students,total_stable_pupil_count_all_students,total_stability_rate_all_students,total_mobile_student_count_all_students,total_student_mobility_rate_all_students,total_instances_of_mobility_all_students,total_mobility_incidence_rate_all_students,students_with_disabilities_pupil_count,...,homeless_student_mobility_rate,homeless_instances_of_mobility,homeless_mobility_incidence_rate,gifted_talented_pupil_count,gifted_talented_stable_student_count,gifted_talented_stability_rate,gifted_talented_mobile_student_count,gifted_talented_student_mobility_rate,gifted_talented_instances_of_mobility,gifted_talented_mobility_incidence_rate
0,STATE TOTAL,STATE TOTAL,939283.0,705064.0,75.1,231706.0,24.7,253577.0,27.0,84121.0,...,45.3,11558.0,54.2,73344.0,66620.0,90.8,6641.0,9.1,7366.0,10.0
1,ADAMS,MAPLETON 1,9037.0,5077.0,56.2,3919.0,43.4,4133.0,45.7,735.0,...,32.7,79.0,36.9,250.0,205.0,82.0,44.0,17.6,47.0,18.8
2,ADAMS,ADAMS 12 FIVE STAR SCHOOLS,49889.0,34283.0,68.7,15424.0,30.9,16854.0,33.8,4339.0,...,57.2,481.0,68.2,3590.0,3225.0,89.8,361.0,10.1,404.0,11.3


72 cols x 184 rows


Unnamed: 0,county,dist,total_pupil_count,total_stable_student_count,total_stability_rate,total_mobile_student_count,total_student_mobility_rate,total_instances_of_mobility,total_mobility_incidence_rate,total_female_pupil_count,...,total_native_hawaiian_or_other_pacific_islander_student_mobility_rate,total_native_hawaiian_or_other_pacific_islander_instances_of_mobility,total_native_hawaiian_or_other_pacific_islander_mobility_incidence_rate,total_two_or_more_races_pupil_count,total_two_or_more_races_stable_student_count,total_two_or_more_races_stability_rate,total_two_or_more_races_mobile_student_count,total_two_or_more_races_student_mobility_rate,total_two_or_more_races_instances_of_mobility,total_two_or_more_races_mobility_incidence_rate
0,STATE TOTAL,STATE TOTAL,939283.0,705064.0,75.1,231706.0,24.7,253577.0,27.0,458512.0,...,34.8,840.0,38.0,29329.0,21501.0,73.3,7718.0,26.3,8433.0,28.8
1,ADAMS,MAPLETON 1,9037.0,5077.0,56.2,3919.0,43.4,4133.0,45.7,4450.0,...,70.8,17.0,70.8,219.0,129.0,58.9,90.0,41.1,91.0,41.6
2,ADAMS,ADAMS 12 FIVE STAR SCHOOLS,49889.0,34283.0,68.7,15424.0,30.9,16854.0,33.8,24340.0,...,45.3,42.0,48.8,662.0,455.0,68.7,203.0,30.7,222.0,33.5


### Merge and group by county

In [2]:
# Remove the columns duplicated across mobility demographics and mobility datasets
mob_dem = mob_dem_raw.drop(columns=[
    'total_pupil_count', 'total_stable_student_count', 'total_stability_rate', 'total_mobile_student_count',
    'total_student_mobility_rate', 'total_instances_of_mobility', 'total_mobility_incidence_rate'])

# Combine the two mobility datasets
mob = mob_raw.merge(mob_dem, on=['county', 'dist'])

# Combined the mobility and graduate data into one df
df_raw_dist = mob.merge(grad_raw, on=['county', 'dist'])
df_raw_dist = df_raw_dist[df_raw_dist.county != 'STATE TOTAL']

head(df_raw_dist)

156 cols x 183 rows


Unnamed: 0,county,dist,total_pupil_count_all_students,total_stable_pupil_count_all_students,total_stability_rate_all_students,total_mobile_student_count_all_students,total_student_mobility_rate_all_students,total_instances_of_mobility_all_students,total_mobility_incidence_rate_all_students,students_with_disabilities_pupil_count,...,homeless_final_grad_base,homeless_graduates_total,homeless_graduation_rate,homeless_completers_total,homeless_completion_rate,gifted_talented_final_grad_base,gifted_talented_graduates_total,gifted_talented_graduation_rate,gifted_talented_completers_total,gifted_talented_completion_rate
1,ADAMS,MAPLETON 1,9037.0,5077.0,56.2,3919.0,43.4,4133.0,45.7,735.0,...,41.0,12.0,29.3,16.0,39.0,44.0,27.0,61.4,27.0,61.4
2,ADAMS,ADAMS 12 FIVE STAR SCHOOLS,49889.0,34283.0,68.7,15424.0,30.9,16854.0,33.8,4339.0,...,106.0,62.0,58.5,65.0,61.3,227.0,201.0,88.5,208.0,91.6
3,ADAMS,ADAMS COUNTY 14,8265.0,5510.0,66.7,3038.0,36.8,3397.0,41.1,876.0,...,99.0,52.0,52.5,57.0,57.6,30.0,27.0,90.0,27.0,90.0


## Column Name Manipulation
---

In [3]:
df = df_raw_dist.copy()

### Remove all rates. They got messed up when we aggregated by county

In [4]:
df = separate_by(df, "rate", mode='exclude')

#### Remove native american and native hawaiian because the group sizes are very small and values are 0 for a lot of counties. Remove "two_or_more_races" because it's inconsistent, and difficult to compare groups

In [5]:
df = separate_by(df, "american_indian", mode='exclude')
df = separate_by(df, "native_hawaiian", mode='exclude')
df = separate_by(df, "two_or_more", mode='exclude')

### Standardize group names, then shorten group names
- Graduation data has `limited_english_proficient` and `econ_disadvant` 
- Mobility data `english_language_learners` and `economically_disadvantaged`

**Standardize these to `limited_english` and `econ_disadvant`, and shorten the others**

In [6]:
df = df.col_replace({
    # Mobility/Stability groups
    "limited_english_proficient": "limited_eng",
    "english_language_learners": "limited_eng",
    "economically_disadvantaged": "poor",
    "econ_disadvant": "poor",
    "students_with_disabilities": "disabled",
    "gifted_talented": "gifted",
    # Demographics
    "black_or_african_american": "black",
    "hispanic_or_latino": "hispanic",
    # Graduation data
    "final_grad_base": "grad_base_total",
    "graduates_total": "graduated",
    "completers_total": "completed",
    # Mobility/Stability data
    "instances_of_mobility": "mobile_instances",
    "pupil_count": "pupil_total",
    "_student_count": "",
    # Variable totals
    "_all_students": "",
    "total_": "",
}
).rename(columns={'stable_pupil_total': 'stable'})

### Rename more stuff for readability/consistency

In [7]:
df_dist_counts = df.copy()
df_county_counts = (df
    .copy()
    .groupby(['county'])
    .sum(numeric_only=True)
    .reset_index()
)
head(df_dist_counts, df_county_counts)

79 cols x 183 rows


Unnamed: 0,county,dist,pupil_total,stable,mobile,mobile_instances,disabled_pupil_total,disabled_stable,disabled_mobile,disabled_mobile_instances,...,migrant_completed,title_1_grad_base_total,title_1_graduated,title_1_completed,homeless_grad_base_total,homeless_graduated,homeless_completed,gifted_grad_base_total,gifted_graduated,gifted_completed
1,ADAMS,MAPLETON 1,9037.0,5077.0,3919.0,4133.0,735.0,469.0,261.0,279.0,...,5.0,218.0,118.0,124.0,41.0,12.0,16.0,44.0,27.0,27.0
2,ADAMS,ADAMS 12 FIVE STAR SCHOOLS,49889.0,34283.0,15424.0,16854.0,4339.0,3001.0,1325.0,1501.0,...,12.0,224.0,80.0,98.0,106.0,62.0,65.0,227.0,201.0,208.0
3,ADAMS,ADAMS COUNTY 14,8265.0,5510.0,3038.0,3397.0,876.0,636.0,266.0,311.0,...,4.0,419.0,296.0,301.0,99.0,52.0,57.0,30.0,27.0,27.0


78 cols x 63 rows


Unnamed: 0,county,pupil_total,stable,mobile,mobile_instances,disabled_pupil_total,disabled_stable,disabled_mobile,disabled_mobile_instances,limited_eng_pupil_total,...,migrant_completed,title_1_grad_base_total,title_1_graduated,title_1_completed,homeless_grad_base_total,homeless_graduated,homeless_completed,gifted_grad_base_total,gifted_graduated,gifted_completed
0,ADAMS,98546.0,67272.0,31222.0,33925.0,8848.0,6263.0,2588.0,2896.0,20773.0,...,33.0,935.0,529.0,559.0,360.0,190.0,204.0,402.0,337.0,345.0
1,ALAMOSA,2775.0,1882.0,885.0,950.0,223.0,159.0,63.0,66.0,368.0,...,4.0,28.0,22.0,23.0,6.0,6.0,6.0,0.0,0.0,0.0
2,ARAPAHOE,124639.0,94109.0,30134.0,32269.0,11842.0,9461.0,2354.0,2568.0,25370.0,...,9.0,488.0,202.0,213.0,243.0,96.0,102.0,909.0,820.0,828.0


## Calculate Rates
---

In [8]:
def get_rates(df, index):
    df = df.copy()
    df_rates = df.copy()[index]

    for c in ['stable', 'mobile', 'mobile_instances']:
        group_rate = (df[c] / df['pupil_total'] * 100).round(2).fillna(0)
        df_rates[f"{c}_rate"] = group_rate
        df[f"{c}_rate"] = group_rate

    # Calculate rates dynamically
    for group in [
            'disabled', 'limited_eng', 'poor', 'migrant', 'title_1', 'homeless', 'gifted',
            'male', 'female', 'white', 'black', 'hispanic', 'asian']:

        for c in [c for c in df.columns if group in c and "total" not in c]:
            var = c.replace(f"{group}_", '')

            if var in ['graduated', 'completed']:
                new = df[c] / df[f"{group}_grad_base_total"]
            else:
                new = df[c] / df[f"{group}_pupil_total"]
            
            new = (new * 100).round(2).fillna(0)
            df_rates[f"{c}_rate"] = new
            df[f"{c}_rate"] = new

    return df, df_rates

In [9]:
df_dist_all, df_dist_rates = get_rates(df_dist_counts, ['county', 'dist'])
df_county_all, df_county_rates = get_rates(df_county_counts, ['county'])

In [10]:
head(df_dist_all, df_dist_counts, df_dist_rates, df_county_all, df_county_counts, df_county_rates)

138 cols x 183 rows


Unnamed: 0,county,dist,pupil_total,stable,mobile,mobile_instances,disabled_pupil_total,disabled_stable,disabled_mobile,disabled_mobile_instances,...,white_mobile_instances_rate,black_stable_rate,black_mobile_rate,black_mobile_instances_rate,hispanic_stable_rate,hispanic_mobile_rate,hispanic_mobile_instances_rate,asian_stable_rate,asian_mobile_rate,asian_mobile_instances_rate
1,ADAMS,MAPLETON 1,9037.0,5077.0,3919.0,4133.0,735.0,469.0,261.0,279.0,...,52.08,48.04,51.4,53.63,60.31,39.19,42.19,47.22,52.78,53.7
2,ADAMS,ADAMS 12 FIVE STAR SCHOOLS,49889.0,34283.0,15424.0,16854.0,4339.0,3001.0,1325.0,1501.0,...,32.27,55.98,43.94,47.41,67.3,32.23,36.81,81.07,18.84,21.15
3,ADAMS,ADAMS COUNTY 14,8265.0,5510.0,3038.0,3397.0,876.0,636.0,266.0,311.0,...,47.45,48.2,51.8,53.6,68.67,35.2,39.57,80.0,20.0,20.0


79 cols x 183 rows


Unnamed: 0,county,dist,pupil_total,stable,mobile,mobile_instances,disabled_pupil_total,disabled_stable,disabled_mobile,disabled_mobile_instances,...,migrant_completed,title_1_grad_base_total,title_1_graduated,title_1_completed,homeless_grad_base_total,homeless_graduated,homeless_completed,gifted_grad_base_total,gifted_graduated,gifted_completed
1,ADAMS,MAPLETON 1,9037.0,5077.0,3919.0,4133.0,735.0,469.0,261.0,279.0,...,5.0,218.0,118.0,124.0,41.0,12.0,16.0,44.0,27.0,27.0
2,ADAMS,ADAMS 12 FIVE STAR SCHOOLS,49889.0,34283.0,15424.0,16854.0,4339.0,3001.0,1325.0,1501.0,...,12.0,224.0,80.0,98.0,106.0,62.0,65.0,227.0,201.0,208.0
3,ADAMS,ADAMS COUNTY 14,8265.0,5510.0,3038.0,3397.0,876.0,636.0,266.0,311.0,...,4.0,419.0,296.0,301.0,99.0,52.0,57.0,30.0,27.0,27.0


61 cols x 183 rows


Unnamed: 0,county,dist,stable_rate,mobile_rate,mobile_instances_rate,disabled_stable_rate,disabled_mobile_rate,disabled_mobile_instances_rate,disabled_graduated_rate,disabled_completed_rate,...,white_mobile_instances_rate,black_stable_rate,black_mobile_rate,black_mobile_instances_rate,hispanic_stable_rate,hispanic_mobile_rate,hispanic_mobile_instances_rate,asian_stable_rate,asian_mobile_rate,asian_mobile_instances_rate
1,ADAMS,MAPLETON 1,56.18,43.37,45.73,63.81,35.51,37.96,36.73,38.78,...,52.08,48.04,51.4,53.63,60.31,39.19,42.19,47.22,52.78,53.7
2,ADAMS,ADAMS 12 FIVE STAR SCHOOLS,68.72,30.92,33.78,69.16,30.54,34.59,47.2,50.8,...,32.27,55.98,43.94,47.41,67.3,32.23,36.81,81.07,18.84,21.15
3,ADAMS,ADAMS COUNTY 14,66.67,36.76,41.1,72.6,30.37,35.5,54.24,54.24,...,47.45,48.2,51.8,53.6,68.67,35.2,39.57,80.0,20.0,20.0


137 cols x 63 rows


Unnamed: 0,county,pupil_total,stable,mobile,mobile_instances,disabled_pupil_total,disabled_stable,disabled_mobile,disabled_mobile_instances,limited_eng_pupil_total,...,white_mobile_instances_rate,black_stable_rate,black_mobile_rate,black_mobile_instances_rate,hispanic_stable_rate,hispanic_mobile_rate,hispanic_mobile_instances_rate,asian_stable_rate,asian_mobile_rate,asian_mobile_instances_rate
0,ADAMS,98546.0,67272.0,31222.0,33925.0,8848.0,6263.0,2588.0,2896.0,20773.0,...,32.45,54.66,45.08,47.91,67.49,32.71,36.4,78.55,21.37,23.54
1,ALAMOSA,2775.0,1882.0,885.0,950.0,223.0,159.0,63.0,66.0,368.0,...,35.54,57.14,42.86,42.86,70.15,29.47,32.69,64.0,36.0,36.0
2,ARAPAHOE,124639.0,94109.0,30134.0,32269.0,11842.0,9461.0,2354.0,2568.0,25370.0,...,21.16,67.47,32.03,34.57,72.76,26.75,29.2,78.59,21.3,22.56


78 cols x 63 rows


Unnamed: 0,county,pupil_total,stable,mobile,mobile_instances,disabled_pupil_total,disabled_stable,disabled_mobile,disabled_mobile_instances,limited_eng_pupil_total,...,migrant_completed,title_1_grad_base_total,title_1_graduated,title_1_completed,homeless_grad_base_total,homeless_graduated,homeless_completed,gifted_grad_base_total,gifted_graduated,gifted_completed
0,ADAMS,98546.0,67272.0,31222.0,33925.0,8848.0,6263.0,2588.0,2896.0,20773.0,...,33.0,935.0,529.0,559.0,360.0,190.0,204.0,402.0,337.0,345.0
1,ALAMOSA,2775.0,1882.0,885.0,950.0,223.0,159.0,63.0,66.0,368.0,...,4.0,28.0,22.0,23.0,6.0,6.0,6.0,0.0,0.0,0.0
2,ARAPAHOE,124639.0,94109.0,30134.0,32269.0,11842.0,9461.0,2354.0,2568.0,25370.0,...,9.0,488.0,202.0,213.0,243.0,96.0,102.0,909.0,820.0,828.0


60 cols x 63 rows


Unnamed: 0,county,stable_rate,mobile_rate,mobile_instances_rate,disabled_stable_rate,disabled_mobile_rate,disabled_mobile_instances_rate,disabled_graduated_rate,disabled_completed_rate,limited_eng_stable_rate,...,white_mobile_instances_rate,black_stable_rate,black_mobile_rate,black_mobile_instances_rate,hispanic_stable_rate,hispanic_mobile_rate,hispanic_mobile_instances_rate,asian_stable_rate,asian_mobile_rate,asian_mobile_instances_rate
0,ADAMS,68.26,31.68,34.43,70.78,29.25,32.73,47.54,50.1,69.99,...,32.45,54.66,45.08,47.91,67.49,32.71,36.4,78.55,21.37,23.54
1,ALAMOSA,67.82,31.89,34.23,71.3,28.25,29.6,86.67,93.33,72.01,...,35.54,57.14,42.86,42.86,70.15,29.47,32.69,64.0,36.0,36.0
2,ARAPAHOE,75.51,24.18,25.89,79.89,19.88,21.69,51.26,52.06,74.94,...,21.16,67.47,32.03,34.57,72.76,26.75,29.2,78.59,21.3,22.56


## Save
---

In [11]:
df_dist_all.to_csv(work_path('education_dist'), index=False)
df_dist_counts.to_csv(work_path('education_dist_counts'), index=False)
df_dist_rates.to_csv(work_path('education_dist_rates'), index=False)

df_county_all.to_csv(work_path('education_county'), index=False)
df_county_counts.to_csv(work_path('education_county_counts'), index=False)
df_county_rates.to_csv(work_path('education_county_rates'), index=False)