# Academy of Py Report


## Analysis

* District-wide, 100% of students passed reading (a score above 60), while 90.91% passed math.
* Students from Charter schools show higher average scores and a higher overall passing rate (100%) than students from District schools (93.40%).
* School size tracks with scores and passing rates. Small schools (under 1,500 students) reported a 100% math passing rate, and the overall passing rate falls as school size grows (97.53% for medium schools, 93.38% for large schools).
* Per-student funding is not directly proportional to average test scores. On average, the passing rate did not increase with a higher per-student budget: the two lowest spending bands passed at 100%, while the highest band (>$635) passed at 94.29%.

#### Retrieve data from the raw CSV files for schools and students, and create a data frame for each

In [705]:
# Dependencies
import pandas as pd
import os

In [706]:
# Raw Schools data
raw_data_file_sch = "schools_complete.csv"
schools = os.path.join("raw_data",raw_data_file_sch)

In [707]:
# get school data into a dataframe
sch_df = pd.read_csv(schools)

In [708]:
# get number of schools
number_of_schools = sch_df['School ID'].count()
# get total budget
budget = sch_df['budget'].sum()

In [709]:
# Raw Students data
raw_data_file_stu = "students_complete.csv"
students = os.path.join("raw_data",raw_data_file_stu)

In [710]:
# get student data into a dataframe
stu_df = pd.read_csv(students)

In [711]:
student_population = stu_df['Student ID'].count() 
# alternative: student_population = len(stu_df)

In [712]:
# average scores
r_avg = stu_df['reading_score'].mean()
m_avg = stu_df['math_score'].mean()

# passing grade must be >60
r_pass = stu_df.loc[stu_df['reading_score'] > 60,"reading_score"]
read_pass = r_pass.count()
m_pass = stu_df.loc[stu_df["math_score"] > 60,"math_score"]
math_pass = m_pass.count()
r_per = (read_pass/student_population)*100 
m_per = (math_pass/student_population)*100 
pass_rate = (r_per+m_per)/2

# put it all in a one-row dataframe built from a dictionary
district_df = pd.DataFrame({'Total Schools':[number_of_schools],
                            'Total Students':[student_population],
                            'Total Budget': ['${:,.2f}'.format(budget)],
                            'Average Math Score':[m_avg], 
                            'Average Reading Score':[r_avg],
                            '% Passing Math':['{:,.2f}%'.format(m_per)],
                            '% Passing Reading':['{:,.2f}%'.format(r_per)],
                            'Overall Passing Rate':['{:,.2f}%'.format(pass_rate)]                             
                            }
                          )
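
The overall passing rate above is the average of the reading and math passing percentages. The sketch below, on hypothetical toy scores, shows that definition next to an alternative (the share of students who pass both subjects). The alternative is an assumption added for comparison and is not used anywhere in this report.

```python
import pandas as pd

# Hypothetical toy data; column names mirror students_complete.csv.
toy = pd.DataFrame({
    "reading_score": [95, 70, 61, 88],
    "math_score":    [59, 80, 60, 90],
})

passed_reading = toy["reading_score"] > 60   # same threshold as the report
passed_math = toy["math_score"] > 60

# The mean of a boolean Series is the fraction of True values.
reading_pct = passed_reading.mean() * 100
math_pct = passed_math.mean() * 100

# Definition used in this report: average of the two subject rates.
average_of_rates = (reading_pct + math_pct) / 2

# Alternative (an assumption, not in the report): share passing both subjects.
passed_both_pct = (passed_reading & passed_math).mean() * 100
```

On this toy data the two definitions disagree, which is worth keeping in mind when reading the "Overall Passing Rate" column.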

## District Summary

In [713]:
# reorder columns
order_district = ['Total Schools','Total Students','Total Budget','Average Reading Score','% Passing Reading','Average Math Score','% Passing Math','Overall Passing Rate']

In [714]:
district_df = district_df[order_district]
district_df

,Total Schools,Total Students,Total Budget,Average Reading Score,% Passing Reading,Average Math Score,% Passing Math,Overall Passing Rate
0,15,39170,"$24,649,428.00",81.87784,100.00%,78.985371,90.91%,95.45%


In [715]:
# aggregate Schools and Students data into one dataframe
sch_df = sch_df.rename(columns={'name':'school'})
sch_stu_df = pd.merge(sch_df,stu_df,on='school',how='outer')
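
An outer merge silently keeps rows that fail to match, so it can help to check the join. The following is a minimal sketch on hypothetical miniature tables (`schools_toy` and `students_toy` are invented for illustration), using the `validate` and `indicator` parameters of `pd.merge`.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
schools_toy = pd.DataFrame({
    "school": ["Alpha High", "Beta High"],
    "type": ["Charter", "District"],
    "budget": [1000, 2000],
})
students_toy = pd.DataFrame({
    "Student ID": [0, 1, 2],
    "school": ["Alpha High", "Alpha High", "Beta High"],
    "math_score": [80, 55, 90],
})

merged = pd.merge(
    schools_toy,
    students_toy,
    on="school",
    how="outer",
    validate="one_to_many",  # raises MergeError if a school name repeats in schools_toy
    indicator=True,          # adds a "_merge" column: both / left_only / right_only
)

# Rows that appear in only one of the two tables.
unmatched = merged[merged["_merge"] != "both"]
```

An empty `unmatched` frame suggests every student row found its school and every school has at least one student.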

In [716]:
#Students by Schools
bySchool = sch_stu_df.groupby('school')
students_sch = bySchool['Student ID'].count()

#Budget by school (each school has a single budget value, so take the first one in each group)
school_budget = bySchool['budget'].first()
school_budget_ft = school_budget.map('${:,.2f}'.format)
budget_per_stu = school_budget/students_sch
budget_per_stu_ft = budget_per_stu.map('${:,.2f}'.format)

#Score averages by School
math_avg = bySchool['math_score'].mean()
read_avg = bySchool['reading_score'].mean()

#School type (each school has a single type, so take the first value in each group)
school_type = bySchool['type'].first()

In [717]:
#Math pass percentage
mathpass = sch_stu_df.loc[sch_stu_df['math_score']>60]
mathpass_sch = mathpass['school'].value_counts()
mathpercentage = (mathpass_sch/students_sch)*100

#Reading pass percentage
readpass = sch_stu_df.loc[sch_stu_df['reading_score']>60]
readpass_sch = readpass['school'].value_counts()
readpercentage = (readpass_sch/students_sch)*100

#Overall passing rate
pass_rate = (readpercentage+mathpercentage)/2

#Collect all together
school_dict = {'type':school_type,
               'students':students_sch,
               'budget':school_budget_ft,
               'per student budget':budget_per_stu_ft,
               'Average Math Score':math_avg,
               'Average Reading Score': read_avg,
               '% passing Math':mathpercentage,
               '% passing Reading':readpercentage,
               'Overall passing':pass_rate
              }
schools_df = pd.DataFrame(school_dict)
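
The per-school figures above can also be computed in a single `groupby(...).agg(...)` call. This is a sketch under assumptions: the toy frame and the output column names (`pct_math`, `overall_passing`, and so on) are hypothetical, and only the input column names follow `sch_stu_df`. One difference from the `value_counts` approach is that a school with no passing students would get 0% here rather than a missing value.

```python
import pandas as pd

# Hypothetical merged frame with the same column names as sch_stu_df.
toy = pd.DataFrame({
    "school": ["Alpha High", "Alpha High", "Beta High", "Beta High"],
    "type": ["Charter", "Charter", "District", "District"],
    "budget": [1000, 1000, 3000, 3000],
    "Student ID": [0, 1, 2, 3],
    "math_score": [80, 55, 90, 70],
    "reading_score": [75, 65, 58, 99],
})

# Boolean helper columns: their per-school mean is the passing fraction.
toy = toy.assign(
    math_pass=toy["math_score"] > 60,
    read_pass=toy["reading_score"] > 60,
)

summary = toy.groupby("school").agg(
    school_type=("type", "first"),
    students=("Student ID", "count"),
    budget=("budget", "first"),
    avg_math=("math_score", "mean"),
    avg_reading=("reading_score", "mean"),
    pct_math=("math_pass", "mean"),
    pct_reading=("read_pass", "mean"),
)

# Convert fractions to percentages and derive the remaining columns.
summary["pct_math"] = summary["pct_math"] * 100
summary["pct_reading"] = summary["pct_reading"] * 100
summary["per_student_budget"] = summary["budget"] / summary["students"]
summary["overall_passing"] = (summary["pct_math"] + summary["pct_reading"]) / 2
```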

## School Summary

In [718]:
order_schools = ['type','students','budget','per student budget','Average Reading Score', '% passing Reading','Average Math Score','% passing Math','Overall passing']
schools_df = schools_df[order_schools]
schools_df

school,type,students,budget,per student budget,Average Reading Score,% passing Reading,Average Math Score,% passing Math,Overall passing
Bailey High School,District,4976,"$3,124,928.00",$628.00,81.033963,100.0,77.048432,87.439711,93.719855
Cabrera High School,Charter,1858,"$1,081,356.00",$582.00,83.97578,100.0,83.061895,100.0,100.0
Figueroa High School,District,2949,"$1,884,411.00",$639.00,81.15802,100.0,76.711767,86.43608,93.21804
Ford High School,District,2739,"$1,763,916.00",$644.00,80.746258,100.0,77.102592,87.221614,93.610807
Griffin High School,Charter,1468,"$917,500.00",$625.00,83.816757,100.0,83.351499,100.0,100.0
Hernandez High School,District,4635,"$3,022,020.00",$652.00,80.934412,100.0,77.289752,86.450917,93.225458
Holden High School,Charter,427,"$248,087.00",$581.00,83.814988,100.0,83.803279,100.0,100.0
Huang High School,District,2917,"$1,910,635.00",$655.00,81.182722,100.0,76.629414,86.83579,93.417895
Johnson High School,District,4761,"$3,094,650.00",$650.00,80.966394,100.0,77.072464,86.704474,93.352237
Pena High School,Charter,962,"$585,858.00",$609.00,84.044699,100.0,83.839917,100.0,100.0


### Top 5 Performing Schools (By Passing Rate)

In [719]:
top_5 = schools_df.sort_values(by='Overall passing', ascending=False)[:5]
top_5

school,type,students,budget,per student budget,Average Reading Score,% passing Reading,Average Math Score,% passing Math,Overall passing
Cabrera High School,Charter,1858,"$1,081,356.00",$582.00,83.97578,100.0,83.061895,100.0,100.0
Griffin High School,Charter,1468,"$917,500.00",$625.00,83.816757,100.0,83.351499,100.0,100.0
Holden High School,Charter,427,"$248,087.00",$581.00,83.814988,100.0,83.803279,100.0,100.0
Pena High School,Charter,962,"$585,858.00",$609.00,84.044699,100.0,83.839917,100.0,100.0
Shelton High School,Charter,1761,"$1,056,600.00",$600.00,83.725724,100.0,83.359455,100.0,100.0


### Bottom 5 Performing Schools (By Passing Rate)

In [720]:
bottom_5 = schools_df.sort_values(by='Overall passing', ascending=True)[:5]
bottom_5

school,type,students,budget,per student budget,Average Reading Score,% passing Reading,Average Math Score,% passing Math,Overall passing
Figueroa High School,District,2949,"$1,884,411.00",$639.00,81.15802,100.0,76.711767,86.43608,93.21804
Rodriguez High School,District,3999,"$2,547,363.00",$637.00,80.744686,100.0,76.842711,86.446612,93.223306
Hernandez High School,District,4635,"$3,022,020.00",$652.00,80.934412,100.0,77.289752,86.450917,93.225458
Johnson High School,District,4761,"$3,094,650.00",$650.00,80.966394,100.0,77.072464,86.704474,93.352237
Huang High School,District,2917,"$1,910,635.00",$655.00,81.182722,100.0,76.629414,86.83579,93.417895
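
Several schools tie at an overall passing rate of 100.0, so which five appear in the top list depends on how ties are ordered. As a sketch on a hypothetical five-school frame, `nlargest` and `nsmallest` give the same kind of ranking as sorting and slicing, and `keep="all"` makes ties visible.

```python
import pandas as pd

# Hypothetical school-level frame; only the ranking column matters here.
toy = pd.DataFrame(
    {"Overall passing": [100.0, 100.0, 100.0, 93.2, 93.4]},
    index=["A High", "B High", "C High", "D High", "E High"],
)

# Default keep="first": ties are broken by row order.
top_2 = toy.nlargest(2, "Overall passing")

# keep="all": every row tied with the last selected value is returned too.
top_2_with_ties = toy.nlargest(2, "Overall passing", keep="all")

bottom_2 = toy.nsmallest(2, "Overall passing")
```

Here `top_2_with_ties` has three rows, not two, because three schools share the top value.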


### Math Scores by Grade

In [721]:
# group by school and grade to obtain the math average
byGrade = sch_stu_df.groupby(['school','grade'])
bySchool_math_df = pd.DataFrame(byGrade['math_score'].mean().unstack())
reorder_grades = ['9th','10th','11th','12th']
bySchool_math_df = bySchool_math_df[reorder_grades]
bySchool_math_df

school,9th,10th,11th,12th
Bailey High School,77.083676,76.996772,77.515588,76.492218
Cabrera High School,83.094697,83.154506,82.76556,83.277487
Figueroa High School,76.403037,76.539974,76.884344,77.151369
Ford High School,77.361345,77.672316,76.918058,76.179963
Griffin High School,82.04401,84.229064,83.842105,83.356164
Hernandez High School,77.438495,77.337408,77.136029,77.186567
Holden High School,83.787402,83.429825,85.0,82.855422
Huang High School,77.027251,75.908735,76.446602,77.225641
Johnson High School,77.187857,76.691117,77.491653,76.863248
Pena High School,83.625455,83.372,84.328125,84.121547
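
The group-then-unstack step can also be written with `pivot_table`. The sketch below uses hypothetical toy scores. `reindex(columns=...)` puts the grades in school order (the default sort is alphabetical, which places `10th` before `9th`) and would insert an empty column for a missing grade instead of raising an error.

```python
import pandas as pd

# Hypothetical toy data with the same column names as sch_stu_df.
toy = pd.DataFrame({
    "school": ["Alpha High"] * 4 + ["Beta High"] * 4,
    "grade": ["9th", "10th", "11th", "12th"] * 2,
    "math_score": [70, 80, 90, 60, 65, 75, 85, 95],
})

grade_order = ["9th", "10th", "11th", "12th"]

# One row per school, one column per grade, cells hold the mean math score.
math_by_grade = toy.pivot_table(
    index="school", columns="grade", values="math_score", aggfunc="mean"
).reindex(columns=grade_order)
```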


### Reading Scores by Grade

In [722]:
# groups by school and grade to obtain the reading average
bySchool_reading_df = pd.DataFrame(byGrade['reading_score'].mean().unstack())
bySchool_reading_df = bySchool_reading_df[reorder_grades]
bySchool_reading_df

school,9th,10th,11th,12th
Bailey High School,81.303155,80.907183,80.945643,80.912451
Cabrera High School,83.676136,84.253219,83.788382,84.287958
Figueroa High School,81.198598,81.408912,80.640339,81.384863
Ford High School,80.632653,81.262712,80.403642,80.662338
Griffin High School,83.369193,83.706897,84.288089,84.013699
Hernandez High School,80.86686,80.660147,81.39614,80.857143
Holden High School,83.677165,83.324561,83.815534,84.698795
Huang High School,81.290284,81.512386,81.417476,80.305983
Johnson High School,81.260714,80.773431,80.616027,81.227564
Pena High School,83.807273,83.612,84.335938,84.59116


### Scores by School Spending

In [723]:
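# use the numeric per-student budget again (the summary column holds formatted strings)
schools_df['per student budget'] = budget_per_stu
topspend = schools_df['per student budget'].max()
bottomspend = schools_df['per student budget'].min()

# split the spending range into four equal-width bins; the first edge is 0 so the
# lowest-spending school falls inside the first bin
t2 = (topspend - bottomspend)/4
spending = [0, bottomspend + t2, bottomspend+(2*t2), bottomspend+(3*t2), topspend]
spending_labels = ['<$598','$598-616','$616-635','>$635']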

school_spending = pd.cut(schools_df['per student budget'], spending, labels=spending_labels)
scores_dict = {"Spending":school_spending,
                 'Overall passing rate':pass_rate,
                "Average Math Score":math_avg,
                "Average Reading Score":read_avg,
                "% passing math":mathpercentage,
                "% passing read":readpercentage}
spending_df = pd.DataFrame(scores_dict)
reorder_scores = ['Spending','Overall passing rate','Average Reading Score','% passing read','Average Math Score','% passing math']
spending_df = spending_df[reorder_scores]
spending_df.groupby('Spending').mean()

Spending,Overall passing rate,Average Reading Score,% passing read,Average Math Score,% passing math
<$598,100.0,83.933814,100.0,83.455399,100.0
$598-616,100.0,83.885211,100.0,83.599686,100.0
$616-635,96.859928,82.42536,100.0,80.199966,93.719855
>$635,94.292535,81.368774,100.0,77.866721,88.585069
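
The spending labels above are typed by hand, so they have to be kept in step with the computed bin edges. One possible alternative, sketched here on hypothetical budgets, builds the labels from the edges themselves. The label format differs from the report's and is only an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical per-student budgets for six schools.
per_student = pd.Series(
    [578.0, 600.0, 609.0, 625.0, 644.0, 655.0],
    index=["A", "B", "C", "D", "E", "F"],
)

# Four equal-width bins between the observed minimum and maximum.
edges = np.linspace(per_student.min(), per_student.max(), 5)

# Build the labels from the edges so they cannot drift from the bins.
labels = [f"${lo:.0f}-{hi:.0f}" for lo, hi in zip(edges[:-1], edges[1:])]

# include_lowest=True keeps the minimum-spending school in the first bin.
bands = pd.cut(per_student, bins=edges, labels=labels, include_lowest=True)
```

Using 0 as the first edge, as the report does, has the same effect as `include_lowest=True` for keeping the lowest-spending school in the first bin.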


### Scores by School Size

In [724]:
sch_size = [0,1500,3000,5000]
name_size = ['Small(<1500)', 'Medium(1500-3000)','Large(3000-5000)']
school_size = pd.cut(schools_df['students'],sch_size, labels=name_size)
scores_dict['Size'] = school_size
reorder_scores[0]='Size'
size_df = pd.DataFrame(scores_dict)
size_df = size_df[reorder_scores]
size_df.groupby("Size").mean()

Size,Overall passing rate,Average Reading Score,% passing read,Average Math Score,% passing math
Small(<1500),100.0,83.892148,100.0,83.664898,100.0
Medium(1500-3000),97.530843,82.82274,100.0,80.904987,95.061685
Large(3000-5000),93.380214,80.919864,100.0,77.06334,86.760428
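
By default `pd.cut` uses right-closed intervals, so a school with exactly 1,500 students would land in the bin labelled `Small(<1500)`. The sketch below uses hypothetical enrolment figures to show the boundary behaviour with and without `right=False`.

```python
import pandas as pd

# Hypothetical enrolments sitting on or near the bin edges.
sizes = pd.Series([1499, 1500, 3000, 5000])
sch_size = [0, 1500, 3000, 5000]
name_size = ["Small(<1500)", "Medium(1500-3000)", "Large(3000-5000)"]

# Default right=True: bins are (0, 1500], (1500, 3000], (3000, 5000].
right_closed = pd.cut(sizes, sch_size, labels=name_size)

# right=False: bins are [0, 1500), [1500, 3000), [3000, 5000).
left_closed = pd.cut(sizes, sch_size, labels=name_size, right=False)
```

With `right=False` the value 5000 falls outside the last bin and becomes missing, so the top edge would need to be raised if that variant were used.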


### Scores by School Type

In [725]:
scores_dict['type']=school_type
reorder_scores[0]='type'
type_df = pd.DataFrame(scores_dict)
type_df = type_df[reorder_scores]
type_df.groupby('type').mean()

type,Overall passing rate,Average Reading Score,% passing read,Average Math Score,% passing math
Charter,100.0,83.896421,100.0,83.473852,100.0
District,93.395371,80.966636,100.0,76.956733,86.790742
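
The grouped means in the spending, size, and type tables average school-level percentages, so each school counts equally regardless of enrolment. The sketch below, on hypothetical numbers, contrasts that with a student-weighted rate. The weighted version is an assumption for comparison and is not how the tables above were produced.

```python
import pandas as pd

# Hypothetical school-level rows: two District schools of different size, one Charter.
toy = pd.DataFrame({
    "type": ["District", "District", "Charter"],
    "students": [1000, 4000, 500],
    "pct_math": [100.0, 80.0, 100.0],
})

# What groupby(...).mean() reports: each school counts equally.
unweighted = toy.groupby("type")["pct_math"].mean()

# Student-weighted alternative: total passers divided by total students.
passers = toy["pct_math"] / 100 * toy["students"]
weighted = (
    passers.groupby(toy["type"]).sum()
    / toy["students"].groupby(toy["type"]).sum()
    * 100
)
```

For the two hypothetical District schools the unweighted mean is 90%, while the student-weighted rate is about 84%, because the larger school has the lower rate.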
