# Team 82 Data Cleanup

In [1]:
import os
import pandas as pd
import numpy as np

## Census Data

### Current Spending of Public Elementary-Secondary School Systems by State_2012-2018
Survey Component: Annual Survey of School System Finance
<br />
Type of Government (GOVTYPE_LABEL): State and Local

#### Cleanup Methodology
* Removed the following columns:
    * The `Survey Component (SVY_COMP_LABEL)` column because it contains the same value, `Annual Survey of School System Finance`, for all rows.
    * The `Aggregate Description (AGG_DESC)` column because its values are not human readable and represent the same data as the `Meaning of Aggregate Description (AGG_DESC_LABEL)` column.
    * The `Type of Government (GOVTYPE_LABEL)` column because it contains the same value, `State and Local`, for all rows.
* Renamed the following columns:
    * `Year (YEAR)`: removed the parenthetical, leaving "Year"
    * `Geographic Area Name (NAME)`: removed the parenthetical and renamed to "State"
    * `Amount Formatted (AMOUNT_FORMATTED)`: removed the parenthetical and renamed to "Spending"
    * `Meaning of Aggregate Description (AGG_DESC_LABEL)`: renamed to "Description"

* Simplified the values of the `Description` column by renaming the value `Elementary-secondary education school system total current expenditures` to "total" and removing the text "Elementary-secondary education school system current expenditures"
* Updated `Spending` column datatype to int64
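
The steps above can be sketched in pandas roughly as follows. This is a minimal sketch rather than the team's actual code: the helper name `clean_census_table`, the sample rows, the `AGG_DESC` codes, the " - " separator in the description labels, and the comma-formatted amounts are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample rows using the column headers named in the list above.
raw = pd.DataFrame({
    "Year (YEAR)": [2018, 2018],
    "Geographic Area Name (NAME)": ["Alabama", "Alabama"],
    "Survey Component (SVY_COMP_LABEL)": ["Annual Survey of School System Finance"] * 2,
    "Aggregate Description (AGG_DESC)": ["CODE_A", "CODE_B"],  # made-up codes
    "Meaning of Aggregate Description (AGG_DESC_LABEL)": [
        "Elementary-secondary education school system total current expenditures",
        "Elementary-secondary education school system current expenditures - Instruction",
    ],
    "Type of Government (GOVTYPE_LABEL)": ["State and Local"] * 2,
    "Amount Formatted (AMOUNT_FORMATTED)": ["7,000,000", "4,000,000"],  # assumed format
})

def clean_census_table(df, value_name):
    """Drop the redundant columns, rename the rest, and cast the amount column."""
    out = df.drop(columns=[
        "Survey Component (SVY_COMP_LABEL)",
        "Aggregate Description (AGG_DESC)",
        "Type of Government (GOVTYPE_LABEL)",
    ]).rename(columns={
        "Year (YEAR)": "Year",
        "Geographic Area Name (NAME)": "State",
        "Amount Formatted (AMOUNT_FORMATTED)": value_name,
        "Meaning of Aggregate Description (AGG_DESC_LABEL)": "Description",
    })
    total = "Elementary-secondary education school system total current expenditures"
    prefix = "Elementary-secondary education school system current expenditures"
    desc = out["Description"].replace({total: "total"})
    # Remove the shared prefix and any separator left behind (separator is assumed).
    out["Description"] = desc.str.replace(prefix, "", regex=False).str.strip(" -")
    # Remove thousands separators before casting to int64.
    amounts = out[value_name].astype(str).str.replace(",", "", regex=False)
    out[value_name] = amounts.astype("int64")
    return out

spending = clean_census_table(raw, "Spending")
```

Passing the target column name as `value_name` would let the same helper serve the tables below, where the amount column is renamed to "Revenue" or "Percentage" instead.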

### Per Pupil Amounts for Current Spending of Public Elementary-Secondary School Systems-US and State-2012 - 2018
Survey Component: Annual Survey of School System Finance
<br />
Type of Government (GOVTYPE_LABEL): State and Local

#### Cleanup Methodology
* Removed the following columns:
    * The `Survey Component (SVY_COMP_LABEL)` column because it contains the same value, `Annual Survey of School System Finance`, for all rows.
    * The `Aggregate Description (AGG_DESC)` column because its values are not human readable and represent the same data as the `Meaning of Aggregate Description (AGG_DESC_LABEL)` column.
    * The `Type of Government (GOVTYPE_LABEL)` column because it contains the same value, `State and Local`, for all rows.
* Renamed the following columns:
    * `Geographic Area Name (NAME)`: removed the parenthetical and renamed to "State"
    * `Year (YEAR)`: removed the parenthetical, leaving "Year"
    * `Meaning of Aggregate Description (AGG_DESC_LABEL)`: renamed to "Description"
    * `Amount Formatted (AMOUNT_FORMATTED)`: removed the parenthetical and renamed to "Spending"

* Simplified the values of the `Description` column by renaming the value `Elementary-secondary education school system total current expenditures` to "total" and removing the text "Elementary-secondary education school system current expenditures"
* Updated `Spending` column datatype to int64

### Percentage Distribution of Public Elementary-Secondary School System Revenue by Source-US and State-2012 - 2018
Survey Component: Annual Survey of School System Finance
<br />
Type of Government (GOVTYPE_LABEL): State and Local

#### Cleanup Methodology
* Removed the following columns:
    * The `Survey Component (SVY_COMP_LABEL)` column because it contains the same value, `Annual Survey of School System Finance`, for all rows.
    * The `Aggregate Description (AGG_DESC)` column because its values are not human readable and represent the same data as the `Meaning of Aggregate Description (AGG_DESC_LABEL)` column.
    * The `Type of Government (GOVTYPE_LABEL)` column because it contains the same value, `State and Local`, for all rows.
* Renamed the following columns:
    * `Geographic Area Name (NAME)`: removed the parenthetical and renamed to "State"
    * `Year (YEAR)`: removed the parenthetical, leaving "Year"
    * `Meaning of Aggregate Description (AGG_DESC_LABEL)`: renamed to "Description"
    * `Amount Formatted (AMOUNT_FORMATTED)`: removed the parenthetical and renamed to "Percentage"
* Simplified the values of the `Description` column by renaming the value `Elementary-secondary education school system total current expenditures` to "total" and removing the text "Elementary-secondary education school system current expenditures"
* Replaced the percentage values for DC where the value was "X" with zero
* Updated `Percentage` datatype to float64
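
The placeholder replacement and type conversion described in the last two bullets might look like the sketch below. The sample rows and the state label spelling are hypothetical; only the "X" marker and the `Percentage` column name come from the notes above.

```python
import pandas as pd

# Hypothetical rows; "X" marks a value that does not apply (here, D.C.).
pct = pd.DataFrame({
    "State": ["Alabama", "District of Columbia"],
    "Percentage": ["54.9", "X"],
})

# Replace the placeholder with zero, then cast the whole column to float64.
pct["Percentage"] = pct["Percentage"].replace("X", "0").astype("float64")
```

The same pattern would apply to the "N" and "X" placeholders in the `Revenue` columns of the later tables, with `int64` as the target type.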

### Revenue from Federal Sources for Public Elementary-Secondary School Systems-US and States-2012 - 2018
Survey Component: Annual Survey of School System Finance
<br />
Type of Government (GOVTYPE_LABEL): State and Local

#### Cleanup Methodology
* Removed the following columns:
    * The `Survey Component (SVY_COMP_LABEL)` column because it contains the same value, `Annual Survey of School System Finance`, for all rows.
    * The `Aggregate Description (AGG_DESC)` column because its values are not human readable and represent the same data as the `Meaning of Aggregate Description (AGG_DESC_LABEL)` column.
    * The `Type of Government (GOVTYPE_LABEL)` column because it contains the same value, `State and Local`, for all rows.
* Renamed the following columns:
    * `Geographic Area Name (NAME)`: removed the parenthetical and renamed to "State"
    * `Year (YEAR)`: removed the parenthetical, leaving "Year"
    * `Meaning of Aggregate Description (AGG_DESC_LABEL)`: renamed to "Description"
    * `Amount Formatted (AMOUNT_FORMATTED)`: removed the parenthetical and renamed to "Revenue"

* Simplified the values of the `Description` column by renaming the value `Elementary-secondary education school system total current expenditures` to "total" and removing the text "Elementary-secondary education school system current expenditures"
* Replaced "N" values in the `Revenue` column with zero
* Updated `Revenue` column datatype to int64

### Revenue from State Sources for Public Elementary-Secondary School Systems-US and State-2012 - 2018
Survey Component: Annual Survey of School System Finance
<br />
Type of Government (GOVTYPE_LABEL): State and Local

#### Cleanup Methodology
* Removed the following columns:
    * The `Survey Component (SVY_COMP_LABEL)` column because it contains the same value, `Annual Survey of School System Finance`, for all rows.
    * The `Aggregate Description (AGG_DESC)` column because its values are not human readable and represent the same data as the `Meaning of Aggregate Description (AGG_DESC_LABEL)` column.
    * The `Type of Government (GOVTYPE_LABEL)` column because it contains the same value, `State and Local`, for all rows.
* Removed D.C. from the dataset because it does not receive any state funding
* Renamed the following columns:
    * `Geographic Area Name (NAME)`: removed the parenthetical and renamed to "State"
    * `Year (YEAR)`: removed the parenthetical, leaving "Year"
    * `Meaning of Aggregate Description (AGG_DESC_LABEL)`: renamed to "Description"
    * `Amount Formatted (AMOUNT_FORMATTED)`: removed the parenthetical and renamed to "Revenue"

* Simplified the values of the `Description` column by renaming the value `Elementary-secondary education school system total current expenditures` to "total" and removing the text "Elementary-secondary education school system current expenditures"
* Updated `Revenue` column datatype to int64

### Revenue from Local Sources for Public Elementary-Secondary School Systems-US and State-2012 - 2018
Survey Component: Annual Survey of School System Finance
<br />
Type of Government (GOVTYPE_LABEL): State and Local

#### Cleanup Methodology
* Removed the following columns:
    * The `Survey Component (SVY_COMP_LABEL)` column because it contains the same value, `Annual Survey of School System Finance`, for all rows.
    * The `Aggregate Description (AGG_DESC)` column because its values are not human readable and represent the same data as the `Meaning of Aggregate Description (AGG_DESC_LABEL)` column.
    * The `Type of Government (GOVTYPE_LABEL)` column because it contains the same value, `State and Local`, for all rows.

* Renamed the following columns:
    * `Geographic Area Name (NAME)`: removed the parenthetical and renamed to "State"
    * `Year (YEAR)`: removed the parenthetical, leaving "Year"
    * `Meaning of Aggregate Description (AGG_DESC_LABEL)`: renamed to "Description"
    * `Amount Formatted (AMOUNT_FORMATTED)`: removed the parenthetical and renamed to "Revenue"

* Simplified the values of the `Description` column by renaming the value `Elementary-secondary education school system total current expenditures` to "total" and removing the text "Elementary-secondary education school system current expenditures"
* Replaced "X" values in the `Revenue` column with zero
* Updated `Revenue` column datatype to int64

### Summary of Public Elementary-Secondary School System Finances-US and States-2012-2018
Survey Component: Annual Survey of School System Finance
<br />
Type of Government (GOVTYPE_LABEL): State and Local

#### Cleanup Methodology
* Removed the following columns:
    * The `Survey Component (SVY_COMP_LABEL)` column because it contains the same value, `Annual Survey of School System Finance`, for all rows.
    * The `Aggregate Description (AGG_DESC)` column because its values are not human readable and represent the same data as the `Meaning of Aggregate Description (AGG_DESC_LABEL)` column.
    * The `Type of Government (GOVTYPE_LABEL)` column because it contains the same value, `State and Local`, for all rows.
* Renamed the following columns:
    * `Geographic Area Name (NAME)`: removed the parenthetical and renamed to "State"
    * `Year (YEAR)`: removed the parenthetical, leaving "Year"
    * `Meaning of Aggregate Description (AGG_DESC_LABEL)`: renamed to "Description"
    * `Amount Formatted (AMOUNT_FORMATTED)`: removed the parenthetical and renamed to "Revenue"

* Simplified the values of the `Description` column by renaming the value `Elementary-secondary education school system total current expenditures` to "total" and removing the text "Elementary-secondary education school system current expenditures"
* Replaced "X" values in the `Revenue` column with zero
* Updated `Revenue` column datatype to int64

# Team 82 Data Cleanup - NCES Parent Demographic Data

## NCES Parent Demographic Data

### Education Attainment of Parent
Education Demographic and Geographic Estimates (EDGE) 2014-2018
<br />
Geography: All Districts
Population Group: Parents of Relevant Children
* PDP 02 Selected Social Characteristics on Parents in the United States
    * `'PDP025 Attainment'`


#### Cleanup Methodology
* Removed the following columns:
    * Columns that end in `moe` (margin of error), because this data will not be used for this project:
    * `'PDP02.5_29moe'`
    * `'PDP02.5_30pctmoe'`
    * `'PDP02.5_31pctmoe'`
    * `'PDP02.5_32pctmoe'`
    * `'PDP02.5_33pctmoe'`
    * `'PDP02.5_34pctmoe'`
    * `'PDP02.5_35pctmoe'`
    * `'PDP02.5_36pctmoe'`
    * `'PDP02.5_37pctmoe'`
    * `'PDP02.5_38pctmoe'`
    * `'PDP02.5_30moe'`
    * `'PDP02.5_31moe'`
    * `'PDP02.5_32moe'`
    * `'PDP02.5_33moe'`
    * `'PDP02.5_34moe'`
    * `'PDP02.5_35moe'`
    * `'PDP02.5_36moe'`
    * `'PDP02.5_37moe'`
    * `'PDP02.5_38moe'`
    
* Renamed the following columns to readable names and converted them to int64
    (POP = population 25 years or older; pc = per cent)
    * `'PDP02.5_30est'` to `'num_Educational_Attain_POP_LT9th'`
    * `'PDP02.5_31est'` to `'num_Educational_Attain_POP_9th-12th'`
    * `'PDP02.5_32est'` to `'num_Educational_Attain_POP_HS_GRAD'`
    * `'PDP02.5_33est'` to `'num_Educational_Attain_POP_SomeColl'`
    * `'PDP02.5_34est'` to `'num_Educational_Attain_POP_AssocDeg'`
    * `'PDP02.5_35est'` to `'num_Educational_Attain_POP_BacDeg'`
    * `'PDP02.5_36est'` to `'num_Educational_Attain_POP_GradProf'`
    * `'PDP02.5_37pct'` to `'pct_Educational_Attain_POP_HS_Grad_higher'`
    * `'PDP02.5_29est'` to `'num_Educational_Attain_POP'`
    * `'PDP02.5_30pct'` to `'pc_ Educational_Attain_POP_LT9th'`
    * `'PDP02.5_31pct'` to `'pc_Educational_Attain_POP_9th-12th'`
    * `'PDP02.5_32pct'` to `'pc_Educational_Attain_POP_HS_GRAD'`
    * `'PDP02.5_33pct'` to `'pc_Educational_Attain_POP_SomeColl'`
    * `'PDP02.5_34pct'` to `'pc_Educational_Attain_POP_AssocDeg'`
    * `'PDP02.5_35pct'` to `'pc_Educational_Attain_POP_BacDeg'`
    * `'PDP02.5_36pct'` to `'pc_Educational_Attain_POP_GradProf'`
    * `'PDP02.5_38pct'` to `'pct_Educational_Attain_BS_Deg_higher'`
    



In [2]:
parent_social_by_district = pd.read_csv('./data_sets/EDGE_Export_122152216654_Social Demographics parents 2014_18/PDP02.5_202_USSchoolDistrictAll_12215227826.txt', sep='|')

In [3]:
parent_social_by_district

Unnamed: 0,GeoId,Geography,LEAID,Year,Iteration,PDP02.5_29est,PDP02.5_29moe,PDP02.5_30est,PDP02.5_30moe,PDP02.5_30pct,...,PDP02.5_36pct,PDP02.5_36pctmoe,PDP02.5_37est,PDP02.5_37moe,PDP02.5_37pct,PDP02.5_37pctmoe,PDP02.5_38est,PDP02.5_38moe,PDP02.5_38pct,PDP02.5_38pctmoe
0,97000US2700106,A.C.G.C. Public School District,2700106,2014-2018,202,905,79,15,14,1.7,...,4.4,1.3,855,78,94.5,2.6,180,33,19.9,3.2
1,97000US4500690,Abbeville County School District,4500690,2014-2018,202,2965,336,50,38,1.7,...,8.4,3.1,2740,326,92.4,3.2,715,165,24.1,4.8
2,97000US5500030,Abbotsford School District,5500030,2014-2018,202,670,113,90,39,13.4,...,0.6,0.5,545,104,81.3,6.0,80,25,11.9,3.7
3,97000US4807380,Abbott Independent School District,4807380,2014-2018,202,210,61,0,13,0.0,...,11.9,6.4,205,61,97.6,3.5,105,39,50.0,11.6
4,97000US5300030,Aberdeen School District,5300030,2014-2018,202,2895,321,330,161,11.4,...,9.2,3.2,2410,295,83.2,6.1,540,141,18.7,4.6
5,97000US2800360,Aberdeen School District,2800360,2014-2018,202,1260,218,105,74,8.3,...,3.2,3.3,1155,214,91.7,5.8,190,108,15.1,8.0
6,97000US1600030,Aberdeen School District 58,1600030,2014-2018,202,575,151,95,61,16.5,...,7.8,5.4,420,124,73.0,12.1,95,51,16.5,8.2
7,97000US4807410,Abernathy Independent School District,4807410,2014-2018,202,755,149,35,36,4.6,...,4.0,3.7,680,129,90.1,7.1,175,73,23.2,8.1
8,97000US4807440,Abilene Independent School District,4807440,2014-2018,202,13660,659,525,151,3.8,...,6.5,1.3,12175,689,89.1,1.9,2615,388,19.1,2.5
9,97000US2003180,Abilene Unified School District 435,2003180,2014-2018,202,1390,174,75,49,5.4,...,7.2,4.6,1270,168,91.4,3.9,410,94,29.5,6.7


In [4]:
#debug look at columns
#parent_social_by_district.columns

In [5]:
# Drop the margin-of-error (moe) columns; they are not used in this project
columns_to_drop = [
    'PDP02.5_29moe', 'PDP02.5_30moe', 'PDP02.5_31moe', 'PDP02.5_32moe',
    'PDP02.5_33moe', 'PDP02.5_34moe', 'PDP02.5_35moe', 'PDP02.5_36moe',
    'PDP02.5_37moe', 'PDP02.5_38moe', 'PDP02.5_30pctmoe', 'PDP02.5_31pctmoe',
    'PDP02.5_32pctmoe', 'PDP02.5_33pctmoe', 'PDP02.5_34pctmoe', 'PDP02.5_35pctmoe',
    'PDP02.5_36pctmoe', 'PDP02.5_37pctmoe', 'PDP02.5_38pctmoe',
]
parent_social_by_district.drop(columns=columns_to_drop, inplace=True)
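
Since every dropped column ends in `moe`, the explicit list could also be derived from the column names. The following is a small sketch on a stand-in frame (the values are illustrative, not taken from the export), not a replacement for the cell above.

```python
import pandas as pd

# Small stand-in frame with the same naming pattern as the EDGE export.
sample = pd.DataFrame({
    "LEAID": [2700106],
    "PDP02.5_29est": [905],
    "PDP02.5_29moe": [79],
    "PDP02.5_30pct": [1.7],
    "PDP02.5_30pctmoe": [1.5],  # illustrative value
})

# Collect every column whose name ends in "moe" and drop them in one step.
moe_columns = [col for col in sample.columns if col.endswith("moe")]
trimmed = sample.drop(columns=moe_columns)
```

Deriving the list this way avoids typing each name, at the cost of silently dropping any future column that happens to end in `moe`.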

In [6]:
parent_social_by_district.columns

Index(['GeoId', 'Geography', 'LEAID', 'Year', 'Iteration', 'PDP02.5_29est',
       'PDP02.5_30est', 'PDP02.5_30pct', 'PDP02.5_31est', 'PDP02.5_31pct',
       'PDP02.5_32est', 'PDP02.5_32pct', 'PDP02.5_33est', 'PDP02.5_33pct',
       'PDP02.5_34est', 'PDP02.5_34pct', 'PDP02.5_35est', 'PDP02.5_35pct',
       'PDP02.5_36est', 'PDP02.5_36pct', 'PDP02.5_37est', 'PDP02.5_37pct',
       'PDP02.5_38est', 'PDP02.5_38pct'],
      dtype='object')

In [7]:
# renaming columns with names that are more useful
df2 =parent_social_by_district
columns_to_rename = {
    'PDP02.5_30est': 'num_Educational_Attain_POP_LT9th',
    'PDP02.5_31est': 'num_Educational_Attain_POP_9th-12th',
    'PDP02.5_32est': 'num_Educational_Attain_POP_HS_GRAD',
    'PDP02.5_33est': 'num_Educational_Attain_POP_SomeColl',
    'PDP02.5_34est': 'num_Educational_Attain_POP_AssocDeg',
    'PDP02.5_35est': 'num_Educational_Attain_POP_BacDeg',
    'PDP02.5_36est': 'num_Educational_Attain_POP_GradProf',
    'PDP02.5_37pct': 'pct_Educational_Attain_POP_HS_Grad_higher',
    'PDP02.5_29est': 'num_Educational_Attain_POP',
    'PDP02.5_30pct': 'pc_ Educational_Attain_POP_LT9th',
    'PDP02.5_31pct': 'pc_Educational_Attain_POP_9th-12th',
    'PDP02.5_32pct': 'pc_Educational_Attain_POP_HS_GRAD',
    'PDP02.5_33pct': 'pc_Educational_Attain_POP_SomeColl',
    'PDP02.5_34pct': 'pc_Educational_Attain_POP_AssocDeg',
    'PDP02.5_35pct': 'pc_Educational_Attain_POP_BacDeg',
    'PDP02.5_36pct': 'pc_Educational_Attain_POP_GradProf',
    'PDP02.5_38pct': 'pct_Educational_Attain_BS_Deg_higher'
}
parent_social_by_district.rename(columns=columns_to_rename, inplace=True)
parent_social_by_district

Unnamed: 0,GeoId,Geography,LEAID,Year,Iteration,num_Educational_Attain_POP,num_Educational_Attain_POP_LT9th,pc_ Educational_Attain_POP_LT9th,num_Educational_Attain_POP_9th-12th,pc_Educational_Attain_POP_9th-12th,...,num_Educational_Attain_POP_AssocDeg,pc_Educational_Attain_POP_AssocDeg,num_Educational_Attain_POP_BacDeg,pc_Educational_Attain_POP_BacDeg,num_Educational_Attain_POP_GradProf,pc_Educational_Attain_POP_GradProf,PDP02.5_37est,pct_Educational_Attain_POP_HS_Grad_higher,PDP02.5_38est,pct_Educational_Attain_BS_Deg_higher
0,97000US2700106,A.C.G.C. Public School District,2700106,2014-2018,202,905,15,1.7,35,3.9,...,175,19.3,140,15.5,40,4.4,855,94.5,180,19.9
1,97000US4500690,Abbeville County School District,4500690,2014-2018,202,2965,50,1.7,175,5.9,...,580,19.6,460,15.5,250,8.4,2740,92.4,715,24.1
2,97000US5500030,Abbotsford School District,5500030,2014-2018,202,670,90,13.4,30,4.5,...,65,9.7,75,11.2,4,0.6,545,81.3,80,11.9
3,97000US4807380,Abbott Independent School District,4807380,2014-2018,202,210,0,0.0,4,1.9,...,45,21.4,80,38.1,25,11.9,205,97.6,105,50.0
4,97000US5300030,Aberdeen School District,5300030,2014-2018,202,2895,330,11.4,155,5.4,...,255,8.8,275,9.5,265,9.2,2410,83.2,540,18.7
5,97000US2800360,Aberdeen School District,2800360,2014-2018,202,1260,105,8.3,4,0.3,...,160,12.7,150,11.9,40,3.2,1155,91.7,190,15.1
6,97000US1600030,Aberdeen School District 58,1600030,2014-2018,202,575,95,16.5,60,10.4,...,45,7.8,50,8.7,45,7.8,420,73.0,95,16.5
7,97000US4807410,Abernathy Independent School District,4807410,2014-2018,202,755,35,4.6,40,5.3,...,30,4.0,145,19.2,30,4.0,680,90.1,175,23.2
8,97000US4807440,Abilene Independent School District,4807440,2014-2018,202,13660,525,3.8,955,7.0,...,1480,10.8,1725,12.6,890,6.5,12175,89.1,2615,19.1
9,97000US2003180,Abilene Unified School District 435,2003180,2014-2018,202,1390,75,5.4,45,3.2,...,145,10.4,305,21.9,100,7.2,1270,91.4,410,29.5


### Update Population Type

In [8]:
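# Update the data type of the parent population columns.
# Note: columns[6:] starts at the "less than 9th grade" count, so the total
# population column (index 5) keeps the dtype it was read with, and
# astype('int64') drops the fractional part of the percentage columns.
cols = parent_social_by_district.columns[6:]
for col in cols:
    parent_social_by_district[col] = parent_social_by_district[col].astype('int64')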
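
One thing to keep in mind with this conversion is that casting a float column to `int64` truncates it. The sketch below, on a stand-in frame, shows the effect and one possible alternative that casts only the count columns. Selecting columns by the `num_` prefix is an assumption based on the renamed column names, not something the notebook does.

```python
import pandas as pd

sample = pd.DataFrame({
    "num_Educational_Attain_POP_LT9th": [15.0, 50.0],
    "pc_Educational_Attain_POP_9th-12th": [3.9, 5.9],
})

# astype('int64') truncates toward zero, so 3.9 becomes 3.
truncated = sample["pc_Educational_Attain_POP_9th-12th"].astype("int64")

# Alternative: cast only the count columns and keep percentages as floats.
count_cols = [col for col in sample.columns if col.startswith("num_")]
typed = sample.astype({col: "int64" for col in count_cols})
```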


## NCES Parent Economic Demographics Data

### Poverty Status of Parent
Education Demographic and Geographic Estimates (EDGE) 2014-2018
<br />
Geography: All Districts
Population Group: Parents of Relevant Children
* PDP 03 Selected Economic Characteristics on Parents in the United States
    * `'PDP3.8 Percentage of People Whose Income in the Past 12 Months Is Below the Poverty Level'`


#### Cleanup Methodology
    
* Renamed the following columns to readable names and converted them to a numeric data type (float64). Where percentage data is not available, the value is set to NaN.

    * `'PDP03.8_72pct'` to `'pct_Below PovLvL_All_Ages'`
    * `'PDP03.8_73pct'` to `'pct_Below PovLvL_Age_gte_18'`
    * `'PDP03.8_74pct'` to `'pct_Below PovLvL_Age_18_64'`
    * `'PDP03.8_75pct'` to `'pct_Below PovLvL_Age_gte_65'`
    * `'PDP03.8_72pctmoe'` to `'pctmoe_Below PovLvL_All_Ages'`
    * `'PDP03.8_73pctmoe'` to `'pctmoe_Below PovLvL_Age_gte_18'`
    * `'PDP03.8_74pctmoe'` to `'pctmoe_Below PovLvL_Age_18_64'`
    * `'PDP03.8_75pctmoe'` to `'pctmoe_Below PovLvL_Age_gte_65'`
    

In [9]:
parent_econ_by_district = pd.read_csv("./data_sets/EDGE_Export_122153919246_Economic Demographics Parents 2014-2018/PDP03.8_202_USSchoolDistrictAll_122153917324.txt", sep='|')
parent_econ_by_district.columns
#parent_econ_by_district 

Index(['GeoId', 'Geography', 'LEAID', 'Year', 'Iteration', 'PDP03.8_72pct',
       'PDP03.8_72pctmoe', 'PDP03.8_73pct', 'PDP03.8_73pctmoe',
       'PDP03.8_74pct', 'PDP03.8_74pctmoe', 'PDP03.8_75pct',
       'PDP03.8_75pctmoe'],
      dtype='object')

In [10]:
#Rename Columns
df5 =parent_econ_by_district
columns_to_rename = {
     'PDP03.8_72pct' : 'pct_Below PovLvL_All_Ages',
     'PDP03.8_73pct' : 'pct_Below PovLvL_Age_gte_18',
     'PDP03.8_74pct' : 'pct_Below PovLvL_Age_18_64',
     'PDP03.8_75pct' : 'pct_Below PovLvL_Age_gte_65',
     'PDP03.8_72pctmoe' : 'pctmoe_Below PovLvL_All_Ages',
     'PDP03.8_73pctmoe' : 'pctmoe_Below PovLvL_Age_gte_18',
     'PDP03.8_74pctmoe' : 'pctmoe_Below PovLvL_Age_18_64',
     'PDP03.8_75pctmoe' : 'pctmoe_Below PovLvL_Age_gte_65'
}

parent_econ_by_district.rename(columns=columns_to_rename, inplace=True)
#debug 
parent_econ_by_district

Unnamed: 0,GeoId,Geography,LEAID,Year,Iteration,pct_Below PovLvL_All_Ages,pctmoe_Below PovLvL_All_Ages,pct_Below PovLvL_Age_gte_18,pctmoe_Below PovLvL_Age_gte_18,pct_Below PovLvL_Age_18_64,pctmoe_Below PovLvL_Age_18_64,pct_Below PovLvL_Age_gte_65,pctmoe_Below PovLvL_Age_gte_65
0,97000US2700106,A.C.G.C. Public School District,2700106,2014-2018,202,11.8,3.1,11.8,3.1,11.8,3.1,-,**
1,97000US4500690,Abbeville County School District,4500690,2014-2018,202,21.9,5.5,21.9,5.5,21.9,5.5,0.0,100.0
2,97000US5500030,Abbotsford School District,5500030,2014-2018,202,16.0,6.9,16.0,6.9,16.1,6.9,0.0,100.0
3,97000US4807380,Abbott Independent School District,4807380,2014-2018,202,0.0,17.3,0.0,17.3,0.0,17.3,-,**
4,97000US5300030,Aberdeen School District,5300030,2014-2018,202,18.3,5.1,18.3,5.1,18.4,5.2,0.0,98.9
5,97000US2800360,Aberdeen School District,2800360,2014-2018,202,14.9,7.5,14.9,7.5,15.1,7.6,0.0,88.4
6,97000US1600030,Aberdeen School District 58,1600030,2014-2018,202,16.0,12.5,16.0,12.5,16.0,12.6,0.0,100.0
7,97000US4807410,Abernathy Independent School District,4807410,2014-2018,202,6.9,6.9,6.9,6.9,6.9,6.9,-,**
8,97000US4807440,Abilene Independent School District,4807440,2014-2018,202,16.7,2.3,16.7,2.4,16.8,2.4,0.0,100.0
9,97000US2003180,Abilene Unified School District 435,2003180,2014-2018,202,2.3,2.2,2.3,2.2,2.3,2.2,0.0,100.0


### Update Population Type

In [11]:
# Update the percent poverty columns to a numeric type; force non-numbers to NaN
pct_cols = parent_econ_by_district.columns[5:]

for col in pct_cols:
    parent_econ_by_district[col] = pd.to_numeric(parent_econ_by_district[col], errors='coerce')
#debug
#parent_econ_by_district
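
As a quick illustration of what the coercion does to the `-` and `**` markers visible in the output above, here is a minimal sketch on a stand-in Series. Counting the resulting NaN values is an extra step added here for illustration.

```python
import pandas as pd

# Values as they appear in the export: "-" and "**" mark unavailable data.
raw = pd.Series(["0.0", "-", "100.0", "**"])

# errors="coerce" turns anything that cannot be parsed into NaN.
numeric = pd.to_numeric(raw, errors="coerce")

# Counting the NaN values shows how many cells were unavailable.
missing_count = int(numeric.isna().sum())
```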



##  Teacher wage information
Teacher wage data by district 2018-2019

In [12]:
file = "./data_sets/EDGE_ACS_CWIFT2018/EDGE_ACS_CWIFT2018_LEA1819.csv"
# received encoding error reading file.   Find encoding type
#import chardet
#with open(file, 'rb') as rawdata:
#    #result = chardet.detect(rawdata.read(100000))
#result

teacher_wage_by_district = pd.read_csv(file, encoding='ISO-8859-1')
#teacher_wage_by_district.columns

In [13]:
teacher_wage_by_district.drop(columns='Unnamed: 5', inplace=True)

In [14]:
df_c = parent_econ_by_district
df1 = df_c.merge(teacher_wage_by_district, how='left', on='LEAID')
df1




Unnamed: 0,GeoId,Geography,LEAID,Year,Iteration,pct_Below PovLvL_All_Ages,pctmoe_Below PovLvL_All_Ages,pct_Below PovLvL_Age_gte_18,pctmoe_Below PovLvL_Age_gte_18,pct_Below PovLvL_Age_18_64,pctmoe_Below PovLvL_Age_18_64,pct_Below PovLvL_Age_gte_65,pctmoe_Below PovLvL_Age_gte_65,LEA_NAME,ST_NAME,LEA_CWIFTEST,LEA_CWIFTSE
0,97000US2700106,A.C.G.C. Public School District,2700106,2014-2018,202,11.8,3.1,11.8,3.1,11.8,3.1,,,A.C.G.C. Public School District,Minnesota,0.814,0.034
1,97000US4500690,Abbeville County School District,4500690,2014-2018,202,21.9,5.5,21.9,5.5,21.9,5.5,0.0,100.0,Abbeville County School District,South Carolina,0.777,0.035
2,97000US5500030,Abbotsford School District,5500030,2014-2018,202,16.0,6.9,16.0,6.9,16.1,6.9,0.0,100.0,Abbotsford School District,Wisconsin,0.795,0.023
3,97000US4807380,Abbott Independent School District,4807380,2014-2018,202,0.0,17.3,0.0,17.3,0.0,17.3,,,Abbott Independent School District,Texas,0.823,0.039
4,97000US5300030,Aberdeen School District,5300030,2014-2018,202,18.3,5.1,18.3,5.1,18.4,5.2,0.0,98.9,Aberdeen School District,Washington,0.866,0.032
5,97000US2800360,Aberdeen School District,2800360,2014-2018,202,14.9,7.5,14.9,7.5,15.1,7.6,0.0,88.4,Aberdeen School District,Mississippi,0.767,0.038
6,97000US1600030,Aberdeen School District 58,1600030,2014-2018,202,16.0,12.5,16.0,12.5,16.0,12.6,0.0,100.0,Aberdeen School District 58,Idaho,0.764,0.039
7,97000US4807410,Abernathy Independent School District,4807410,2014-2018,202,6.9,6.9,6.9,6.9,6.9,6.9,,,Abernathy Independent School District,Texas,0.826,0.025
8,97000US4807440,Abilene Independent School District,4807440,2014-2018,202,16.7,2.3,16.7,2.4,16.8,2.4,0.0,100.0,Abilene Independent School District,Texas,0.825,0.023
9,97000US2003180,Abilene Unified School District 435,2003180,2014-2018,202,2.3,2.2,2.3,2.2,2.3,2.2,0.0,100.0,Abilene Unified School District 435,Kansas,0.712,0.043
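
A left merge keeps every district from the poverty table even when no wage row matches. The sketch below shows one way to surface such rows using the `indicator` and `validate` options of `DataFrame.merge`. The frames are stand-ins, the `LEAID` value `9999999` is hypothetical, and it is an assumption that `LEAID` is unique in both files.

```python
import pandas as pd

# Stand-in frames; the first two rows reuse values from the output above.
econ = pd.DataFrame({
    "LEAID": [2700106, 4500690, 9999999],  # 9999999 is a made-up district
    "pct_Below PovLvL_All_Ages": [11.8, 21.9, 5.0],
})
wage = pd.DataFrame({
    "LEAID": [2700106, 4500690],
    "LEA_CWIFTEST": [0.814, 0.777],
})

# indicator=True adds a "_merge" column; validate raises if LEAID repeats on either side.
merged = econ.merge(wage, how="left", on="LEAID",
                    indicator=True, validate="one_to_one")

# Rows that found no partner in the wage table.
unmatched = merged[merged["_merge"] == "left_only"]
```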


# Assessments by School District
Math 2017-2018
ELA 2017-2018
See assessments-sy2017-18-public-file-documentation.docx, section 1.5.2 Academic Achievement Files for Table Layout

In [132]:
import glob



# Select grades 4 & 8 for the following groups:
# All Students (ALL), Economically Disadvantaged (ECD), Homeless (HOM)


selected_col_math1011 = ['STNAM', 'FIPST', 'LEAID', 'leanm10',
                        'ALL_MTH04numvalid_1011', 'ALL_MTH04pctprof_1011',
                        'ECD_MTH04numvalid_1011', 'ECD_MTH04pctprof_1011',
                        'HOM_MTH04numvalid_1011', 'HOM_MTH04pctprof_1011',
                        'ALL_MTH08numvalid_1011', 'ALL_MTH08pctprof_1011',
                        'ECD_MTH08numvalid_1011', 'ECD_MTH08pctprof_1011',
                        'HOM_MTH08numvalid_1011', 'HOM_MTH08pctprof_1011']
selected_col_math1112 = ['STNAM', 'FIPST', 'LEAID', 'leanm11',
                        'ALL_MTH04numvalid_1112', 'ALL_MTH04pctprof_1112',
                        'ECD_MTH04numvalid_1112', 'ECD_MTH04pctprof_1112',
                        'HOM_MTH04numvalid_1112', 'HOM_MTH04pctprof_1112',
                        'ALL_MTH08numvalid_1112', 'ALL_MTH08pctprof_1112',
                        'ECD_MTH08numvalid_1112', 'ECD_MTH08pctprof_1112',
                        'HOM_MTH08numvalid_1112', 'HOM_MTH08pctprof_1112']

selected_col_math1213 = ['STNAM', 'FIPST', 'LEAID', 'LEANM',
                        'ALL_MTH04NUMVALID_1213', 'ALL_MTH04PCTPROF_1213',
                        'ECD_MTH04NUMVALID_1213', 'ECD_MTH04PCTPROF_1213',
                        'HOM_MTH04NUMVALID_1213', 'HOM_MTH04PCTPROF_1213',
                        'ALL_MTH08NUMVALID_1213', 'ALL_MTH08PCTPROF_1213',
                        'ECD_MTH08NUMVALID_1213', 'ECD_MTH08PCTPROF_1213',
                        'HOM_MTH08NUMVALID_1213', 'HOM_MTH08PCTPROF_1213']

selected_col_math1314 = ['STNAM', 'FIPST', 'LEAID', 'LEANM',
                        'ALL_MTH04NUMVALID_1314', 'ALL_MTH04PCTPROF_1314',
                        'ECD_MTH04NUMVALID_1314', 'ECD_MTH04PCTPROF_1314',
                        'HOM_MTH04NUMVALID_1314', 'HOM_MTH04PCTPROF_1314',
                        'ALL_MTH08NUMVALID_1314', 'ALL_MTH08PCTPROF_1314',
                        'ECD_MTH08NUMVALID_1314', 'ECD_MTH08PCTPROF_1314',
                        'HOM_MTH08NUMVALID_1314', 'HOM_MTH08PCTPROF_1314']
selected_col_math1415 = ['STNAM', 'FIPST', 'LEAID',
                        'LEANM','DATE_CUR',
                        'ALL_MTH04NUMVALID_1415', 'ALL_MTH04PCTPROF_1415',
                        'ECD_MTH04NUMVALID_1415', 'ECD_MTH04PCTPROF_1415',
                        'HOM_MTH04NUMVALID_1415', 'HOM_MTH04PCTPROF_1415',
                        'ALL_MTH08NUMVALID_1415', 'ALL_MTH08PCTPROF_1415',
                        'ECD_MTH08NUMVALID_1415', 'ECD_MTH08PCTPROF_1415',
                        'HOM_MTH08NUMVALID_1415', 'HOM_MTH08PCTPROF_1415']
selected_col_math1516 = ['STNAM', 'FIPST', 'LEAID','LEANM','DATE_CUR',
                        'ALL_MTH04NUMVALID_1516', 'ALL_MTH04PCTPROF_1516',
                        'ECD_MTH04NUMVALID_1516', 'ECD_MTH04PCTPROF_1516',
                        'HOM_MTH04NUMVALID_1516', 'HOM_MTH04PCTPROF_1516',
                        'ALL_MTH08NUMVALID_1516', 'ALL_MTH08PCTPROF_1516',
                        'ECD_MTH08NUMVALID_1516', 'ECD_MTH08PCTPROF_1516',
                        'HOM_MTH08NUMVALID_1516', 'HOM_MTH08PCTPROF_1516']

selected_col_math1617 = ['STNAM', 'FIPST', 'LEAID',
                        'ST_LEAID', 'LEANM', 'DATE_CUR',
                        'ALL_MTH04NUMVALID_1617', 'ALL_MTH04PCTPROF_1617',
                        'ECD_MTH04NUMVALID_1617', 'ECD_MTH04PCTPROF_1617',
                        'HOM_MTH04NUMVALID_1617', 'HOM_MTH04PCTPROF_1617',
                        'ALL_MTH08NUMVALID_1617', 'ALL_MTH08PCTPROF_1617',
                        'ECD_MTH08NUMVALID_1617', 'ECD_MTH08PCTPROF_1617',
                        'HOM_MTH08NUMVALID_1617', 'HOM_MTH08PCTPROF_1617']

selected_col_math1718 = ['STNAM', 'FIPST', 'LEAID',
                        'ST_LEAID', 'LEANM', 'DATE_CUR',
                        'ALL_MTH04NUMVALID_1718', 'ALL_MTH04PCTPROF_1718',
                        'ECD_MTH04NUMVALID_1718', 'ECD_MTH04PCTPROF_1718',
                        'HOM_MTH04NUMVALID_1718', 'HOM_MTH04PCTPROF_1718',
                        'ALL_MTH08NUMVALID_1718', 'ALL_MTH08PCTPROF_1718',
                        'ECD_MTH08NUMVALID_1718', 'ECD_MTH08PCTPROF_1718',
                        'HOM_MTH08NUMVALID_1718', 'HOM_MTH08PCTPROF_1718']

math_columns = {1011:selected_col_math1011, 1112:selected_col_math1112,
                1213:selected_col_math1213, 1314:selected_col_math1314,
                1415:selected_col_math1415, 1516:selected_col_math1516,
                1617:selected_col_math1617, 1718:selected_col_math1718}

math_columns1618 = {1617:selected_col_math1617, 1718:selected_col_math1718}

dfs= ('df_1011','df_1112','df_1213','df_1314','df_1415','df_1516','df_1617','df_1718')
dfs1618= ('df_1617','df_1718')

privacy_symbols = ['GE','GT','LE','LT']

# Description: replaces assessment score ranges and privacy-masked values with a single number.
# Input:  one cell value from a dataframe column
# Return: the value unchanged if it is not a string or is a plain value;
#         for a range "low-high", the rounded midpoint;
#         for a value prefixed with a privacy symbol (GE, GT, LE, LT), half of the
#         bound, except that a bound of 1 is kept as 1.

def process_scores(data):
    if not isinstance(data, str):
        return data
    if "-" in data:
        # range such as "20-24": use the rounded midpoint
        t = data.split("-")
        return round((int(t[0]) + int(t[1])) / 2)
    if any(substring in data for substring in privacy_symbols):
        # strip the two-letter symbol and halve the bound
        # (the same rule is applied to all four symbols)
        num = int(data[2:])
        return 1 if num == 1 else round(num / 2)
    return data
                
def process_privacy_symbols(data):
    ret = 0
    



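assessment_folder = "./data_sets/Assessments/"
# glob returns files in arbitrary order; sort so that zip(dfs, files) pairs each
# file with the intended school year (assumes the file names sort chronologically)
files = sorted(glob.glob(assessment_folder + 'math*.csv'))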
#print(files)


# Create dictionary  for multiple math dataframes

list_of_math_dfs={}

for df, file in zip(dfs, files):
    #print("{} {}".format(file,df))
    list_of_math_dfs[df] = pd.read_csv(file)
    #print(list_of_math_dfs[df].shape)
    #print(list_of_math_dfs[df].dtypes)
    #print(list(list_of_math_dfs[df]))


# clean up Math Assessment Dataframes


for df, sel_col in zip(dfs, math_columns):
    # Skip the identifier columns, whose count differs by school year
    if df in ('df_1415', 'df_1516'):
        cc = math_columns[sel_col][5:]
    elif df in ('df_1617', 'df_1718'):
        cc = math_columns[sel_col][6:]
    else:
        cc = math_columns[sel_col][4:]
    for col in cc:
        # replace ranges and privacy symbols, then force remaining non-numbers to NaN
        list_of_math_dfs[df][col] = list_of_math_dfs[df][col].map(process_scores)
        list_of_math_dfs[df][col] = pd.to_numeric(list_of_math_dfs[df][col], errors='coerce')

In [89]:
list_of_math_dfs['df_1415'][selected_col_math1415]

Unnamed: 0,STNAM,FIPST,LEAID,LEANM,DATE_CUR,ALL_MTH04NUMVALID_1415,ALL_MTH04PCTPROF_1415,ECD_MTH04NUMVALID_1415,ECD_MTH04PCTPROF_1415,HOM_MTH04NUMVALID_1415,HOM_MTH04PCTPROF_1415,ALL_MTH08NUMVALID_1415,ALL_MTH08PCTPROF_1415,ECD_MTH08NUMVALID_1415,ECD_MTH08PCTPROF_1415,HOM_MTH08NUMVALID_1415,HOM_MTH08PCTPROF_1415
0,ALABAMA,1,100005,Albertville City,13APR16,392.0,41.0,201.0,27.0,5.0,,339.0,19.0,156.0,12.0,17.0,10.0
1,ALABAMA,1,100006,Marshall County,13APR16,440.0,44.0,354.0,40.0,56.0,34.0,393.0,25.0,290.0,22.0,35.0,15.0
2,ALABAMA,1,100007,Hoover City,13APR16,1011.0,65.0,280.0,42.0,5.0,,1140.0,61.0,271.0,32.0,6.0,25.0
3,ALABAMA,1,100008,Madison City,13APR16,736.0,73.0,183.0,47.0,7.0,25.0,778.0,59.0,171.0,32.0,6.0,25.0
4,ALABAMA,1,100011,Leeds City,13APR16,146.0,37.0,75.0,22.0,3.0,,118.0,32.0,65.0,22.0,2.0,
5,ALABAMA,1,100012,Boaz City,13APR16,197.0,47.0,141.0,37.0,8.0,25.0,176.0,27.0,115.0,22.0,7.0,25.0
6,ALABAMA,1,100013,Trussville City,13APR16,333.0,74.0,34.0,64.0,,,345.0,46.0,38.0,24.0,,
7,ALABAMA,1,100030,Alexander City,13APR16,217.0,42.0,149.0,37.0,,,236.0,37.0,138.0,22.0,,
8,ALABAMA,1,100060,Andalusia City,13APR16,121.0,27.0,79.0,22.0,1.0,,124.0,27.0,67.0,12.0,1.0,
9,ALABAMA,1,100090,Anniston City,13APR16,156.0,27.0,116.0,22.0,24.0,30.0,136.0,12.0,103.0,8.0,9.0,25.0
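
The values above (for example 27.0, 22.0 and 25.0) are what range and privacy-protected entries look like after conversion. As a rough, self-contained sketch of that conversion, the snippet below uses a hypothetical helper `midpoint_score` and a hypothetical `PRIVACY_PREFIXES` tuple; the notebook's own `process_scores` and `privacy_symbols` are defined elsewhere and may differ (for instance, this sketch checks prefixes with `startswith` rather than a substring test).

```python
import pandas as pd

# Assumption: illustrative two-letter prefixes, not the notebook's privacy_symbols.
PRIVACY_PREFIXES = ("GE", "GT", "LE", "LT")


def midpoint_score(value):
    """Illustrative stand-in for process_scores (hypothetical helper)."""
    if not isinstance(value, str):
        return value                      # already numeric or NaN
    if "-" in value:
        parts = value.split("-")          # "20-24" -> rounded midpoint
        return round((int(parts[0]) + int(parts[1])) / 2)
    if value.startswith(PRIVACY_PREFIXES) and value[2:].isdigit():
        bound = int(value[2:])            # "LT50" -> half of the bound
        return 1 if bound == 1 else round(bound / 2)
    return value                          # left for pd.to_numeric to coerce


raw = pd.Series(["20-24", "LT50", "PS", "41", 7])
cleaned = pd.to_numeric(raw.map(midpoint_score), errors="coerce")
# "20-24" -> 22.0, "LT50" -> 25.0, "PS" -> NaN, "41" -> 41.0, 7 -> 7.0
```

Halving the bound regardless of the direction of the symbol is a simplification carried over from the cleanup code, so the resulting percentages are best read as rough estimates.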


In [146]:
selected_col_rla1011 = ['STNAM', 'FIPST', 'LEAID', 'leanm10',
                        'ALL_RLA04numvalid_1011', 'ALL_RLA04pctprof_1011',
                        'ECD_RLA04numvalid_1011', 'ECD_RLA04pctprof_1011',
                        'HOM_RLA04numvalid_1011', 'HOM_RLA04pctprof_1011',
                        'ALL_RLA08numvalid_1011', 'ALL_RLA08pctprof_1011',
                        'ECD_RLA08numvalid_1011', 'ECD_RLA08pctprof_1011',
                        'HOM_RLA08numvalid_1011', 'HOM_RLA08pctprof_1011']
selected_col_rla1112 = ['STNAM', 'FIPST', 'LEAID', 'leanm11',
                        'ALL_RLA04numvalid_1112', 'ALL_RLA04pctprof_1112',
                        'ECD_RLA04numvalid_1112', 'ECD_RLA04pctprof_1112',
                        'HOM_RLA04numvalid_1112', 'HOM_RLA04pctprof_1112',
                        'ALL_RLA08numvalid_1112', 'ALL_RLA08pctprof_1112',
                        'ECD_RLA08numvalid_1112', 'ECD_RLA08pctprof_1112',
                        'HOM_RLA08numvalid_1112', 'HOM_RLA08pctprof_1112']

selected_col_rla1213 = ['STNAM', 'FIPST', 'LEAID', 'LEANM',
                        'ALL_RLA04numvalid_1213', 'ALL_RLA04pctprof_1213',
                        'ECD_RLA04numvalid_1213', 'ECD_RLA04pctprof_1213',
                        'HOM_RLA04numvalid_1213', 'HOM_RLA04pctprof_1213',
                        'ALL_RLA08numvalid_1213', 'ALL_RLA08pctprof_1213',
                        'ECD_RLA08numvalid_1213', 'ECD_RLA08pctprof_1213',
                        'HOM_RLA08numvalid_1213', 'HOM_RLA08pctprof_1213']

selected_col_rla1314 = ['STNAM', 'FIPST', 'LEAID', 'LEANM',
                        'ALL_RLA04NUMVALID_1314', 'ALL_RLA04PCTPROF_1314',
                        'ECD_RLA04NUMVALID_1314', 'ECD_RLA04PCTPROF_1314',
                        'HOM_RLA04NUMVALID_1314', 'HOM_RLA04PCTPROF_1314',
                        'ALL_RLA08NUMVALID_1314', 'ALL_RLA08PCTPROF_1314',
                        'ECD_RLA08NUMVALID_1314', 'ECD_RLA08PCTPROF_1314',
                        'HOM_RLA08NUMVALID_1314', 'HOM_RLA08PCTPROF_1314']

selected_col_rla1415 = ['STNAM', 'FIPST', 'LEAID','LEANM','DATE_CUR',
                        'ALL_RLA04NUMVALID_1415', 'ALL_RLA04PCTPROF_1415',
                        'ECD_RLA04NUMVALID_1415', 'ECD_RLA04PCTPROF_1415',
                        'HOM_RLA04NUMVALID_1415', 'HOM_RLA04PCTPROF_1415',
                        'ALL_RLA08NUMVALID_1415', 'ALL_RLA08PCTPROF_1415',
                        'ECD_RLA08NUMVALID_1415', 'ECD_RLA08PCTPROF_1415',
                        'HOM_RLA08NUMVALID_1415', 'HOM_RLA08PCTPROF_1415']

selected_col_rla1516 = ['STNAM', 'FIPST', 'LEAID', 'LEANM', 'DATE_CUR',
                        'ALL_RLA04NUMVALID_1516', 'ALL_RLA04PCTPROF_1516',
                        'ECD_RLA04NUMVALID_1516', 'ECD_RLA04PCTPROF_1516',
                        'HOM_RLA04NUMVALID_1516', 'HOM_RLA04PCTPROF_1516',
                        'ALL_RLA08NUMVALID_1516', 'ALL_RLA08PCTPROF_1516',
                        'ECD_RLA08NUMVALID_1516', 'ECD_RLA08PCTPROF_1516',
                        'HOM_RLA08NUMVALID_1516', 'HOM_RLA08PCTPROF_1516']

selected_col_rla1617 = ['STNAM', 'FIPST', 'LEAID',
                        'ST_LEAID', 'LEANM', 'DATE_CUR',
                        'ALL_RLA04NUMVALID_1617', 'ALL_RLA04PCTPROF_1617',
                        'ECD_RLA04NUMVALID_1617', 'ECD_RLA04PCTPROF_1617',
                        'HOM_RLA04NUMVALID_1617', 'HOM_RLA04PCTPROF_1617',
                        'ALL_RLA08NUMVALID_1617', 'ALL_RLA08PCTPROF_1617',
                        'ECD_RLA08NUMVALID_1617', 'ECD_RLA08PCTPROF_1617',
                        'HOM_RLA08NUMVALID_1617', 'HOM_RLA08PCTPROF_1617']

selected_col_rla1718 = ['STNAM', 'FIPST', 'LEAID',
                        'ST_LEAID', 'LEANM', 'DATE_CUR',
                        'ALL_RLA04NUMVALID_1718', 'ALL_RLA04PCTPROF_1718',
                        'ECD_RLA04NUMVALID_1718', 'ECD_RLA04PCTPROF_1718',
                        'HOM_RLA04NUMVALID_1718', 'HOM_RLA04PCTPROF_1718',
                        'ALL_RLA08NUMVALID_1718', 'ALL_RLA08PCTPROF_1718',
                        'ECD_RLA08NUMVALID_1718', 'ECD_RLA08PCTPROF_1718',
                        'HOM_RLA08NUMVALID_1718', 'HOM_RLA08PCTPROF_1718']


rla_columns = {1011:selected_col_rla1011, 1112:selected_col_rla1112,
                1213:selected_col_rla1213, 1314:selected_col_rla1314,
                1415:selected_col_rla1415, 1516:selected_col_rla1516,
                1617:selected_col_rla1617, 1718:selected_col_rla1718}

rla_columns1618 = {1617:selected_col_rla1617, 1718:selected_col_rla1718}

# clean up RLA Assessment Dataframes

# get RLA score data files
assessment_folder = "./data_sets/Assessments/"
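# sorted() makes the pairing with dfs below independent of filesystem order
# (assumes the file names sort in school-year order).
files = sorted(glob.glob('./data_sets/Assessments/rla*.csv'))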
#print(files)

# Create dictionary for multiple RLA dataframes
list_of_rla_dfs={}

for df, file in zip(dfs, files):
    #print("{} {}".format(file,df))
    list_of_rla_dfs[df] = pd.read_csv(file)


# clean up Reading/Language Arts Assessment Dataframes

for df, sel_col in zip(dfs, rla_columns):
    # process PS ranges
    # The score columns follow the identifier columns, and the number of
    # identifier columns depends on the school year.
    if df in ('df_1415', 'df_1516'):
        cc = rla_columns[sel_col][5:]
    elif df in ('df_1617', 'df_1718'):
        cc = rla_columns[sel_col][6:]
    else:
        cc = rla_columns[sel_col][4:]
    for col in cc:
        # Convert ranges and privacy-protected values, then coerce the rest to NaN.
        list_of_rla_dfs[df][col] = list_of_rla_dfs[df][col].map(process_scores)
        list_of_rla_dfs[df][col] = pd.to_numeric(list_of_rla_dfs[df][col], errors='coerce')


In [147]:
list_of_rla_dfs['df_1213'].columns

Index(['STNAM', 'FIPST', 'LEAID', 'LEANM', 'ALL_RLA00numvalid_1213',
       'ALL_RLA00pctprof_1213', 'MAM_RLA00numvalid_1213',
       'MAM_RLA00pctprof_1213', 'MAS_RLA00numvalid_1213',
       'MAS_RLA00pctprof_1213',
       ...
       'MIG_RLA05numvalid_1213', 'MIG_RLA05pctprof_1213',
       'MIG_RLA06numvalid_1213', 'MIG_RLA06pctprof_1213',
       'MIG_RLA07numvalid_1213', 'MIG_RLA07pctprof_1213',
       'MIG_RLA08numvalid_1213', 'MIG_RLA08pctprof_1213',
       'MIG_RLAHSnumvalid_1213', 'MIG_RLAHSpctprof_1213'],
      dtype='object', length=228)
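
With 228 columns per file, the hand-written `selected_col_*` lists are easy to get subtly wrong (letter case changes between years, and a missing comma silently joins two names). As a hedged alternative sketch, the score columns could be picked out with a regular expression. The column list and the names `pattern` and `score_cols` below are illustrative only, and the pattern assumes every file follows the `<subgroup>_RLA<grade><metric>_<year>` naming seen in the output above.

```python
import re

import pandas as pd

# Hypothetical miniature of one assessment file's header row.
columns = ["STNAM", "FIPST", "LEAID", "LEANM",
           "ALL_RLA00numvalid_1213", "ALL_RLA04numvalid_1213",
           "ECD_RLA08pctprof_1213", "MIG_RLA08pctprof_1213",
           "HOM_RLAHSpctprof_1213"]
df = pd.DataFrame([[0] * len(columns)], columns=columns)

# Subgroups ALL/ECD/HOM, grades 04 and 08, either metric, any letter case.
pattern = re.compile(r"^(ALL|ECD|HOM)_RLA0[48](numvalid|pctprof)_\d{4}$",
                     re.IGNORECASE)
score_cols = [c for c in df.columns if pattern.match(c)]
# score_cols == ['ALL_RLA04numvalid_1213', 'ECD_RLA08pctprof_1213']
```

The identifier columns (`STNAM`, `LEAID`, and so on) would still need to be listed per year, since their names and count change between files.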

In [135]:
# Separate out data into grades 4 & 8

a = dict(zip(dfs, math_columns))
for df in dfs:
    list_of_math_dfs[df] = list_of_math_dfs[df][math_columns[a[df]]]
    


list_of_math_dfs['df_1011'].to_csv (r'./data_sets/clean/math_scores_sy1011.csv',index=False, header=True)
list_of_math_dfs['df_1112'].to_csv (r'./data_sets/clean/math_scores_sy1112.csv',index=False, header=True)
list_of_math_dfs['df_1213'].to_csv (r'./data_sets/clean/math_scores_sy1213.csv',index=False, header=True)
list_of_math_dfs['df_1314'].to_csv (r'./data_sets/clean/math_scores_sy1314.csv',index=False, header=True)
list_of_math_dfs['df_1415'].to_csv (r'./data_sets/clean/math_scores_sy1415.csv',index=False, header=True)
list_of_math_dfs['df_1516'].to_csv (r'./data_sets/clean/math_scores_sy1516.csv',index=False, header=True)
list_of_math_dfs['df_1617'].to_csv (r'./data_sets/clean/math_scores_sy1617.csv',index=False, header=True)
list_of_math_dfs['df_1718'].to_csv (r'./data_sets/clean/math_scores_sy1718.csv',index=False, header=True)

a = dict(zip(dfs, rla_columns))
for df in dfs:
    list_of_rla_dfs[df] = list_of_rla_dfs[df][rla_columns[a[df]]]
    


list_of_rla_dfs['df_1011'].to_csv (r'./data_sets/clean/rla_scores_sy1011.csv',index=False, header=True)
list_of_rla_dfs['df_1112'].to_csv (r'./data_sets/clean/rla_scores_sy1112.csv',index=False, header=True)
list_of_rla_dfs['df_1213'].to_csv (r'./data_sets/clean/rla_scores_sy1213.csv',index=False, header=True)
list_of_rla_dfs['df_1314'].to_csv (r'./data_sets/clean/rla_scores_sy1314.csv',index=False, header=True)
list_of_rla_dfs['df_1415'].to_csv (r'./data_sets/clean/rla_scores_sy1415.csv',index=False, header=True)
list_of_rla_dfs['df_1516'].to_csv (r'./data_sets/clean/rla_scores_sy1516.csv',index=False, header=True)
list_of_rla_dfs['df_1617'].to_csv (r'./data_sets/clean/rla_scores_sy1617.csv',index=False, header=True)
list_of_rla_dfs['df_1718'].to_csv (r'./data_sets/clean/rla_scores_sy1718.csv',index=False, header=True)

#    print (df)


#list_of_math_dfs['df_1516'].columns
#a = dict(zip(dfs, math_columns))
#math_columns[a['df_1415']]
#list_of_math_dfs['df_1415'][math_columns[a['df_1415']]]
#list_of_math_dfs['df_1415'] = list_of_math_dfs['df_1415'][math_columns[a['df_1415']]]
#list_of_math_dfs['df_1415']

#print (math_columns[1011])


#df_math_merge =  list_of_math_dfs[dfs[3]]
#print (selected_col_lra1011a )
#df_math_merge.columns
#df_math_merge= df_math_merge[selected_col_math1011a]
#df_math_merge
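
The export step writes one CSV per school year for each subject. A possible way to avoid copy-paste slips in the file names is to derive each name from the dictionary key. The sketch below assumes keys of the form `df_<YYYY>`; the helper `write_clean_frames` is hypothetical, and the demo writes to a temporary directory rather than `./data_sets/clean/`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_clean_frames(frames, subject, out_dir):
    """Write each frame to <out_dir>/<subject>_scores_sy<YYYY>.csv (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for key, frame in frames.items():
        school_year = key.split("_")[1]   # 'df_1718' -> '1718'
        path = out_dir / f"{subject}_scores_sy{school_year}.csv"
        frame.to_csv(path, index=False, header=True)
        written.append(path)
    return written


# Tiny stand-in frames in place of list_of_rla_dfs.
demo = {"df_1617": pd.DataFrame({"LEAID": [100005]}),
        "df_1718": pd.DataFrame({"LEAID": [100006]})}
with tempfile.TemporaryDirectory() as tmp:
    names = [p.name for p in write_clean_frames(demo, "rla", tmp)]
# names == ['rla_scores_sy1617.csv', 'rla_scores_sy1718.csv']
```

Because the subject is passed in once per call, the math and RLA exports cannot overwrite each other's files.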
    

