# PyAcademy Analysis

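* Charter schools clearly outperformed district schools. All five of the top-performing schools by overall passing rate are charter schools, and all five of the bottom performers are district schools. Charter schools also have higher average scores and higher passing rates in both math and reading. Their advantage is largest in math: charter schools pass an average of 93.6% of their students in math, compared with 66.5% for district schools.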


* Surprisingly, schools that spend less per student outperform schools that spend more. This holds for average reading and math scores and for the percentage of students passing each subject. In this data, higher per-student spending is not associated with better outcomes: schools spending over $640 per student have the lowest overall passing rate of any spending range (73.7%).


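* Smaller schools tend to outperform larger ones. Average scores and passing rates both fall as enrollment rises, from a 95.0% overall passing rate at small schools (under 1,500 students) to 73.8% at large schools (over 3,000 students). Large district schools seem to struggle in both subjects, but particularly in math, where large schools pass only 66.5% of their students on average.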

```python
# import dependencies
import pandas as pd
```

```python
# file paths for the raw data
school_file = "raw_data/schools_complete.csv"
students_file = "raw_data/students_complete.csv"
```

```python
# read each CSV file into a pandas DataFrame
schools_df = pd.read_csv(school_file)
students_df = pd.read_csv(students_file)
```

```python
# district summary: count the schools
total_schools = schools_df["name"].count()
```

```python
# district summary: count the students in the district
total_students = students_df["name"].count()
```

```python
# district summary: total budget across all schools
total_budget = schools_df["budget"].sum()
```

```python
# district summary: average reading and math scores
district_average_reading = students_df["reading_score"].mean()
district_average_math = students_df["math_score"].mean()
```

```python
# count the students passing math (a score above 69)
pass_count_math = students_df[students_df["math_score"] > 69].count()
pass_count_math = pass_count_math["math_score"]
```

```python
# count the students passing reading (a score above 69)
pass_count_reading = students_df[students_df["reading_score"] > 69].count()
pass_count_reading = pass_count_reading["reading_score"]
```
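Both counting cells filter the whole DataFrame and call `.count()`, which counts every column before one is picked out. Summing a boolean mask is a common shorthand that gives the same number, because `True` counts as 1. A minimal sketch on a small made-up DataFrame (the scores are hypothetical, not taken from the data set):

```python
import pandas as pd

# hypothetical scores, not taken from students_complete.csv
students = pd.DataFrame({"math_score": [55, 69, 70, 88, 92],
                         "reading_score": [71, 64, 90, 69, 85]})

# approach used in the notebook: filter rows, count all columns, pick one
filtered_count = students[students["math_score"] > 69].count()["math_score"]

# shorthand: summing a boolean Series counts the True values
mask_sum = int((students["math_score"] > 69).sum())

print(filtered_count, mask_sum)  # 3 3
```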

```python
# district summary: fraction of all students passing reading and math
percent_passing_reading = pass_count_reading / total_students
percent_passing_math = pass_count_math / total_students
```

```python
# overall passing rate: the average of the math and reading pass rates
overall_pass = (percent_passing_math + percent_passing_reading) / 2
```
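This overall rate is the mean of the two subject pass rates, which is not the same as the share of students who pass both subjects. A small hypothetical example (four made-up students) shows how far the two measures can diverge:

```python
import pandas as pd

# four hypothetical students
scores = pd.DataFrame({"math_score": [80, 60, 75, 50],
                       "reading_score": [60, 80, 90, 85]})

pass_math = scores["math_score"] > 69        # True, False, True, False
pass_reading = scores["reading_score"] > 69  # False, True, True, True

# the notebook's definition: average of the two subject pass rates
average_of_rates = (pass_math.mean() + pass_reading.mean()) / 2

# alternative definition: share of students passing both subjects
passed_both = (pass_math & pass_reading).mean()

print(average_of_rates, passed_both)  # 0.625 0.25
```

Either definition can be reasonable; the point is that the overall rates reported here describe the average of the subject rates.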

```python
# create a one-row summary table of the district's key metrics,
# with each value pre-formatted as a display string
district_summary_table = pd.DataFrame(
    {
        "Total Schools": "{:,.0f}".format(total_schools),
        "Total Students": "{:,.0f}".format(total_students),
        "Total Budget": "${:,.2f}".format(total_budget),
        "Average Math Score": "{:,.2f}".format(district_average_math),
        "Average Reading Score": "{:,.2f}".format(district_average_reading),
        "% Passing Math": "{:.2%}".format(percent_passing_math),
        "% Passing Reading": "{:.2%}".format(percent_passing_reading),
        "Overall Passing Rate": "{:.2%}".format(overall_pass),
    },
    index=[0],
)
# put the columns in display order
district_summary_table = district_summary_table[["Total Schools", "Total Students", "Total Budget", "Average Math Score", "Average Reading Score", "% Passing Math", "% Passing Reading", "Overall Passing Rate"]]
```
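Storing pre-formatted strings in the DataFrame makes the table display nicely, but the values can no longer be sorted numerically or used in further arithmetic. One alternative, sketched here on the assumption that formatting is only needed for display, is to keep the numbers numeric and format a copy at the end. The format strings are the ones used above; the dictionary name `display_formats` is hypothetical, and the math pass rate is the rounded figure from the District Summary output.

```python
import pandas as pd

# numeric summary: the district budget and a rounded math pass rate
summary = pd.DataFrame({"Total Budget": [24649428.0],
                        "% Passing Math": [0.7498]})

# hypothetical mapping from column name to a str.format pattern
display_formats = {"Total Budget": "${:,.2f}", "% Passing Math": "{:.2%}"}

# format a copy for display; `summary` keeps its numeric values
formatted = summary.copy()
for column, pattern in display_formats.items():
    formatted[column] = formatted[column].map(pattern.format)

print(formatted.loc[0, "Total Budget"], formatted.loc[0, "% Passing Math"])
# $24,649,428.00 74.98%
```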


# District Summary

```python
district_summary_table
```

|   | Total Schools | Total Students | Total Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | Overall Passing Rate |
|---|---|---|---|---|---|---|---|---|
| 0 | 15 | 39,170 | $24,649,428.00 | 78.99 | 81.88 | 74.98% | 85.81% | 80.39% |


```python
# District Summary complete
# Begin the analysis of the individual schools:
# build on schools_df by adding summary columns calculated from students_df
```

```python
# rename the student columns so both tables share a "School Name" key
students_df = students_df.rename(columns={"name": "Student Name", "school": "School Name"})
```

```python
# rename the school columns for display ("size" becomes "Total Students")
schools_summary_table = schools_df.rename(columns={"name": "School Name", "type": "School Type", "size": "Total Students", "budget": "Total Budget"})
```

```python
# calculate the budget per student and add it as a new column
schools_summary_table["Per Student Budget"] = schools_summary_table["Total Budget"] / schools_summary_table["Total Students"]
```

```python
# calculate the average scores for each school:
# group students_df by school and take the mean of the numeric columns
# (numeric_only=True skips text columns such as "Student Name",
# which pandas 2.x refuses to average)
df = students_df.groupby("School Name").mean(numeric_only=True)
df = df.reset_index()
```

```python
# merge the per-school averages into the summary and give them readable names
schools_summary_table = pd.merge(schools_summary_table, df, on="School Name")
schools_summary_table = schools_summary_table.rename(columns={"reading_score": "Average Reading Score", "math_score": "Average Math Score"})
```

```python
# count the students in each school who passed reading (used for the percentage later)
df2 = students_df[["School Name", "reading_score"]]
df2 = df2.loc[df2["reading_score"] > 69]
df2 = df2.groupby("School Name").count()
df2 = df2.rename(columns={"reading_score": "# of Pass Reading"})
df2 = df2.reset_index()
```

```python
# count the students in each school who passed math (used for the percentage later)
df3 = students_df[["School Name", "math_score"]]
df3 = df3.loc[df3["math_score"] > 69]
df3 = df3.groupby("School Name").count()
df3 = df3.rename(columns={"math_score": "# of Pass Math"})
df3 = df3.reset_index()
```

```python
# merge the pass counts into the summary DataFrame so the percentages can be calculated
schools_summary_table = pd.merge(schools_summary_table, df2, on="School Name")
schools_summary_table = pd.merge(schools_summary_table, df3, on="School Name")
```
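`pd.merge` defaults to an inner join, so a school with no passing students in a subject would have no row in the count table and would silently disappear from the summary. Every school that appears in the outputs below has passing students in both subjects, but a left join followed by `fillna(0)` guards against the case. A sketch with hypothetical schools and counts:

```python
import pandas as pd

# hypothetical summary with three schools
summary = pd.DataFrame({"School Name": ["A", "B", "C"],
                        "Total Students": [100, 80, 50]})
# hypothetical pass counts: school C has no passing students, so it has no row
math_passed = pd.DataFrame({"School Name": ["A", "B"],
                            "# of Pass Math": [70, 60]})

inner = pd.merge(summary, math_passed, on="School Name")  # default how="inner"
left = pd.merge(summary, math_passed, on="School Name", how="left")
# schools missing from the count table get NaN; treat that as zero passers
left["# of Pass Math"] = left["# of Pass Math"].fillna(0).astype(int)

print(len(inner), len(left))  # 2 3
```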

```python
# calculate the passing percentages (0-100 scale) for reading and math, plus the overall rate
schools_summary_table["% Passing Reading"] = (schools_summary_table["# of Pass Reading"] / schools_summary_table["Total Students"]) * 100
schools_summary_table["% Passing Math"] = (schools_summary_table["# of Pass Math"] / schools_summary_table["Total Students"]) * 100
schools_summary_table["Overall Passing Rate"] = (schools_summary_table["% Passing Reading"] + schools_summary_table["% Passing Math"]) / 2
```

```python
# drop the helper columns that are not needed in the summary
# ("Student ID" here is the meaningless per-school mean of the student IDs)
schools_summary_table = schools_summary_table.drop(["School ID", "# of Pass Reading", "# of Pass Math", "Student ID"], axis=1)
```
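The per-school averages and pass rates built up over the last several cells can also be computed in a single `groupby` using named aggregation (available since pandas 0.25). This is a sketch on hypothetical students, not a drop-in replacement: it divides by the number of students in the student file, whereas the notebook divides by the `size` column of the school file, so the two agree only when those counts match.

```python
import pandas as pd

# hypothetical student records
students = pd.DataFrame({
    "School Name": ["A", "A", "A", "B", "B"],
    "reading_score": [72, 65, 90, 80, 85],
    "math_score": [68, 75, 88, 60, 95],
})

# pass flags: 1 = passed, 0 = did not; the mean of the flags is the pass fraction
students["pass_reading"] = (students["reading_score"] > 69).astype(int)
students["pass_math"] = (students["math_score"] > 69).astype(int)

# column names contain spaces, so they are passed via ** unpacking
per_school = students.groupby("School Name").agg(
    **{
        "Average Reading Score": ("reading_score", "mean"),
        "Average Math Score": ("math_score", "mean"),
        "% Passing Reading": ("pass_reading", "mean"),
        "% Passing Math": ("pass_math", "mean"),
    }
)
# express the pass rates on the notebook's 0-100 scale
per_school[["% Passing Reading", "% Passing Math"]] *= 100
per_school["Overall Passing Rate"] = (
    per_school["% Passing Reading"] + per_school["% Passing Math"]
) / 2
print(per_school)
```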

# Schools Summary 

```python
schools_summary_table
```

|   | School Name | School Type | Total Students | Total Budget | Per Student Budget | Average Reading Score | Average Math Score | % Passing Reading | % Passing Math | Overall Passing Rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Huang High School | District | 2917 | 1910635 | 655.0 | 81.182722 | 76.629414 | 81.316421 | 65.683922 | 73.500171 |
| 1 | Figueroa High School | District | 2949 | 1884411 | 639.0 | 81.15802 | 76.711767 | 80.739234 | 65.988471 | 73.363852 |
| 2 | Shelton High School | Charter | 1761 | 1056600 | 600.0 | 83.725724 | 83.359455 | 95.854628 | 93.867121 | 94.860875 |
| 3 | Hernandez High School | District | 4635 | 3022020 | 652.0 | 80.934412 | 77.289752 | 80.862999 | 66.752967 | 73.807983 |
| 4 | Griffin High School | Charter | 1468 | 917500 | 625.0 | 83.816757 | 83.351499 | 97.138965 | 93.392371 | 95.265668 |
| 5 | Wilson High School | Charter | 2283 | 1319574 | 578.0 | 83.989488 | 83.274201 | 96.539641 | 93.867718 | 95.203679 |
| 6 | Cabrera High School | Charter | 1858 | 1081356 | 582.0 | 83.97578 | 83.061895 | 97.039828 | 94.133477 | 95.586652 |
| 7 | Bailey High School | District | 4976 | 3124928 | 628.0 | 81.033963 | 77.048432 | 81.93328 | 66.680064 | 74.306672 |
| 8 | Holden High School | Charter | 427 | 248087 | 581.0 | 83.814988 | 83.803279 | 96.252927 | 92.505855 | 94.379391 |
| 9 | Pena High School | Charter | 962 | 585858 | 609.0 | 84.044699 | 83.839917 | 95.945946 | 94.594595 | 95.27027 |


```python
# Summary table for individual schools complete
# Begin creating the table of the top five schools by overall passing rate
```

```python
# sort by overall passing rate, highest first, and keep the top five schools
top5 = schools_summary_table.sort_values("Overall Passing Rate", ascending=False).reset_index(drop=True)
top5 = top5.iloc[0:5].set_index("School Name")
```

```python
# sort by overall passing rate, lowest first, and keep the bottom five schools
bottom5 = schools_summary_table.sort_values("Overall Passing Rate", ascending=True).reset_index(drop=True)
bottom5 = bottom5.iloc[0:5].set_index("School Name")
```
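`DataFrame.nlargest` and `DataFrame.nsmallest` express the same top/bottom selection in one call each. A sketch on hypothetical passing rates (school names and values are made up):

```python
import pandas as pd

# hypothetical overall passing rates
summary = pd.DataFrame({
    "School Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Overall Passing Rate": [95.6, 73.3, 95.3, 74.3, 88.0, 73.8, 95.2],
})

# five highest and five lowest rates, indexed by school name like top5/bottom5
top = summary.nlargest(5, "Overall Passing Rate").set_index("School Name")
bottom = summary.nsmallest(5, "Overall Passing Rate").set_index("School Name")

print(list(top.index))     # ['A', 'C', 'G', 'E', 'D']
print(list(bottom.index))  # ['B', 'F', 'D', 'E', 'G']
```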

```python
# top and bottom five tables complete
# create tables of the average math and reading scores by school and grade level
```

```python
# average math score for each school and grade, with one column per grade
math_scores_by_grade = students_df.groupby(["School Name", "grade"])["math_score"].mean()
math_scores_by_grade = pd.DataFrame(math_scores_by_grade).unstack()
```

```python
# average reading score for each school and grade, with one column per grade
reading_scores_by_grade = students_df.groupby(["School Name", "grade"])["reading_score"].mean()
reading_scores_by_grade = pd.DataFrame(reading_scores_by_grade).unstack("grade")
```
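Because the grade labels are strings, `groupby` sorts them alphabetically, so `unstack` puts 9th after 12th in the grade tables. Reindexing the columns with an explicit order fixes the layout. This sketch unstacks the Series directly (so the columns are plain grade labels rather than a two-level header), and `grade_order` is an assumed list matching the four grades present in this data:

```python
import pandas as pd

# hypothetical student scores for one school
students = pd.DataFrame({
    "School Name": ["A", "A", "A", "A"],
    "grade": ["9th", "10th", "11th", "12th"],
    "math_score": [70, 75, 80, 85],
})

by_grade = students.groupby(["School Name", "grade"])["math_score"].mean().unstack()
print(list(by_grade.columns))  # ['10th', '11th', '12th', '9th']

grade_order = ["9th", "10th", "11th", "12th"]  # assumed set of grades
by_grade = by_grade.reindex(columns=grade_order)
print(list(by_grade.columns))  # ['9th', '10th', '11th', '12th']
```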

```python
# create a table of scores by per-student spending
# (.copy() so the new column is not added to schools_summary_table itself)
scores_by_spending = schools_summary_table.copy()
# create bins and bin labels; pd.cut intervals are right-inclusive, e.g. (0, 600],
# so a school spending exactly $600 falls in "Under $600"
bins = [0, 600, 620, 640, 655]
spending_ranges = ["Under $600", "$600-620", "$620-640", "Over $640"]
# create a new column for the binned data
scores_by_spending["Spending Ranges (Per Student)"] = pd.cut(scores_by_spending["Per Student Budget"], bins, labels=spending_ranges)
# group by the binned column and average the numeric columns
scores_by_spending = scores_by_spending.groupby("Spending Ranges (Per Student)", observed=False).mean(numeric_only=True)
# drop the columns that are not score metrics
scores_by_spending = scores_by_spending.drop(["Total Students", "Total Budget", "Per Student Budget"], axis=1)
```
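`pd.cut` builds right-closed intervals by default, so with these edges a budget of exactly $600 lands in "Under $600" and $620 lands in "$600-620". Values above the last edge become NaN and would drop out of the grouped table, so the last edge has to cover the largest per-student budget. A quick check with hypothetical budgets:

```python
import pandas as pd

bins = [0, 600, 620, 640, 655]
spending_ranges = ["Under $600", "$600-620", "$620-640", "Over $640"]

# hypothetical per-student budgets, including values on the bin edges
budgets = pd.Series([578.0, 600.0, 609.0, 620.0, 655.0, 660.0])
binned = pd.cut(budgets, bins, labels=spending_ranges)

print(binned.tolist())
# ['Under $600', 'Under $600', '$600-620', '$600-620', 'Over $640', nan]
```

Passing `right=False` to `pd.cut` makes the intervals left-closed instead, which would move the boundary values into the next bin up.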

```python
# create a table of scores by school size
# (.copy() so the new column is not added to schools_summary_table itself)
scores_by_size = schools_summary_table.copy()
# create bins and bin labels (intervals are right-inclusive, e.g. (0, 1500])
bins = [0, 1500, 3000, 5000]
enrollments = ["Small(Under 1500)", "Medium(1500-3000)", "Large (Over 3000)"]
# create a new column for the binned data
scores_by_size["Enrollments"] = pd.cut(scores_by_size["Total Students"], bins, labels=enrollments)
# group by the binned column and average the numeric columns
scores_by_size = scores_by_size.groupby("Enrollments", observed=False).mean(numeric_only=True)
# drop the columns that are not score metrics
scores_by_size = scores_by_size.drop(["Total Students", "Total Budget", "Per Student Budget"], axis=1)
```
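Writing `scores_by_size = schools_summary_table` does not copy the DataFrame: both names refer to the same object, so adding a binned column through one name adds it to the other as well. A minimal demonstration, using three enrollment figures from the Schools Summary table:

```python
import pandas as pd

original = pd.DataFrame({"Total Students": [427, 2283, 4976]})

alias = original  # same object, not a copy
alias["Enrollments"] = pd.cut(alias["Total Students"], [0, 1500, 3000, 5000])
print("Enrollments" in original.columns)  # True: the original changed too

original2 = pd.DataFrame({"Total Students": [427, 2283, 4976]})
independent = original2.copy()  # separate data
independent["Enrollments"] = pd.cut(independent["Total Students"], [0, 1500, 3000, 5000])
print("Enrollments" in original2.columns)  # False
```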

```python
# create a table of scores by school type, averaging only the numeric columns
scores_by_type = schools_summary_table.groupby("School Type").mean(numeric_only=True)
scores_by_type = scores_by_type.drop(["Total Students", "Total Budget", "Per Student Budget"], axis=1)
```

```python
# data wrangling complete; display the tables
```

# Top 5 Performing Schools

```python
top5
```

| School Name | School Type | Total Students | Total Budget | Per Student Budget | Average Reading Score | Average Math Score | % Passing Reading | % Passing Math | Overall Passing Rate |
|---|---|---|---|---|---|---|---|---|---|
| Cabrera High School | Charter | 1858 | 1081356 | 582.0 | 83.97578 | 83.061895 | 97.039828 | 94.133477 | 95.586652 |
| Thomas High School | Charter | 1635 | 1043130 | 638.0 | 83.84893 | 83.418349 | 97.308869 | 93.272171 | 95.29052 |
| Pena High School | Charter | 962 | 585858 | 609.0 | 84.044699 | 83.839917 | 95.945946 | 94.594595 | 95.27027 |
| Griffin High School | Charter | 1468 | 917500 | 625.0 | 83.816757 | 83.351499 | 97.138965 | 93.392371 | 95.265668 |
| Wilson High School | Charter | 2283 | 1319574 | 578.0 | 83.989488 | 83.274201 | 96.539641 | 93.867718 | 95.203679 |


# Bottom 5 Performing Schools

```python
bottom5
```

| School Name | School Type | Total Students | Total Budget | Per Student Budget | Average Reading Score | Average Math Score | % Passing Reading | % Passing Math | Overall Passing Rate |
|---|---|---|---|---|---|---|---|---|---|
| Rodriguez High School | District | 3999 | 2547363 | 637.0 | 80.744686 | 76.842711 | 80.220055 | 66.366592 | 73.293323 |
| Figueroa High School | District | 2949 | 1884411 | 639.0 | 81.15802 | 76.711767 | 80.739234 | 65.988471 | 73.363852 |
| Huang High School | District | 2917 | 1910635 | 655.0 | 81.182722 | 76.629414 | 81.316421 | 65.683922 | 73.500171 |
| Johnson High School | District | 4761 | 3094650 | 650.0 | 80.966394 | 77.072464 | 81.222432 | 66.057551 | 73.639992 |
| Ford High School | District | 2739 | 1763916 | 644.0 | 80.746258 | 77.102592 | 79.299014 | 68.309602 | 73.804308 |


# Math Scores By Grade

```python
math_scores_by_grade
```

| School Name | 10th | 11th | 12th | 9th |
|---|---|---|---|---|
| Bailey High School | 76.996772 | 77.515588 | 76.492218 | 77.083676 |
| Cabrera High School | 83.154506 | 82.76556 | 83.277487 | 83.094697 |
| Figueroa High School | 76.539974 | 76.884344 | 77.151369 | 76.403037 |
| Ford High School | 77.672316 | 76.918058 | 76.179963 | 77.361345 |
| Griffin High School | 84.229064 | 83.842105 | 83.356164 | 82.04401 |
| Hernandez High School | 77.337408 | 77.136029 | 77.186567 | 77.438495 |
| Holden High School | 83.429825 | 85.0 | 82.855422 | 83.787402 |
| Huang High School | 75.908735 | 76.446602 | 77.225641 | 77.027251 |
| Johnson High School | 76.691117 | 77.491653 | 76.863248 | 77.187857 |
| Pena High School | 83.372 | 84.328125 | 84.121547 | 83.625455 |


# Reading Scores By Grade

```python
reading_scores_by_grade
```

| School Name | 10th | 11th | 12th | 9th |
|---|---|---|---|---|
| Bailey High School | 80.907183 | 80.945643 | 80.912451 | 81.303155 |
| Cabrera High School | 84.253219 | 83.788382 | 84.287958 | 83.676136 |
| Figueroa High School | 81.408912 | 80.640339 | 81.384863 | 81.198598 |
| Ford High School | 81.262712 | 80.403642 | 80.662338 | 80.632653 |
| Griffin High School | 83.706897 | 84.288089 | 84.013699 | 83.369193 |
| Hernandez High School | 80.660147 | 81.39614 | 80.857143 | 80.86686 |
| Holden High School | 83.324561 | 83.815534 | 84.698795 | 83.677165 |
| Huang High School | 81.512386 | 81.417476 | 80.305983 | 81.290284 |
| Johnson High School | 80.773431 | 80.616027 | 81.227564 | 81.260714 |
| Pena High School | 83.612 | 84.335938 | 84.59116 | 83.807273 |


# Math and Reading Scores by Spending (Per Student)

```python
scores_by_spending
```

| Spending Ranges (Per Student) | Average Reading Score | Average Math Score | % Passing Reading | % Passing Math | Overall Passing Rate |
|---|---|---|---|---|---|
| Under $600 | 83.892196 | 83.43621 | 96.459627 | 93.541501 | 95.000564 |
| $600-620 | 84.044699 | 83.839917 | 95.945946 | 94.594595 | 95.27027 |
| $620-640 | 82.120471 | 79.474551 | 87.46808 | 77.139934 | 82.304007 |
| Over $640 | 80.957446 | 77.023555 | 80.675217 | 66.70101 | 73.688113 |
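The binned table points to a negative relationship between spending and results. As a rough check, the sketch below computes the Pearson correlation between per-student budget and overall passing rate for the ten schools shown in the Schools Summary output (values copied from that table and rounded to two decimals). It measures association only, and ten schools is a small sample.

```python
import pandas as pd

# per-student budget and overall passing rate for the ten schools
# displayed in the Schools Summary output (rates rounded)
schools = pd.DataFrame({
    "Per Student Budget": [655, 639, 600, 652, 625, 578, 582, 628, 581, 609],
    "Overall Passing Rate": [73.50, 73.36, 94.86, 73.81, 95.27,
                             95.20, 95.59, 74.31, 94.38, 95.27],
})

# Series.corr uses the Pearson method by default
r = schools["Per Student Budget"].corr(schools["Overall Passing Rate"])
print(round(r, 2))  # a negative coefficient: more spending, lower pass rate
```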


# Math and Reading Scores by School Size

```python
scores_by_size
```

| Enrollments | Average Reading Score | Average Math Score | % Passing Reading | % Passing Math | Overall Passing Rate |
|---|---|---|---|---|---|
| Small(Under 1500) | 83.892148 | 83.664898 | 96.445946 | 93.497607 | 94.971776 |
| Medium(1500-3000) | 82.82274 | 80.904987 | 90.588593 | 83.556977 | 87.072785 |
| Large (Over 3000) | 80.919864 | 77.06334 | 81.059691 | 66.464293 | 73.761992 |


# Math and Reading Scores by Type of School

```python
scores_by_type
```

| School Type | Average Reading Score | Average Math Score | % Passing Reading | % Passing Math | Overall Passing Rate |
|---|---|---|---|---|---|
| Charter | 83.896421 | 83.473852 | 96.586489 | 93.62083 | 95.10366 |
| District | 80.966636 | 76.956733 | 80.799062 | 66.548453 | 73.673757 |
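The first observation says charter schools' advantage is largest in math. Subtracting the district row from the charter row of the school-type table makes that concrete; the values below are copied from that table.

```python
import pandas as pd

# pass-rate rows copied from the "scores by type" table
by_type = pd.DataFrame(
    {"% Passing Reading": [96.586489, 80.799062],
     "% Passing Math": [93.62083, 66.548453]},
    index=["Charter", "District"],
)

# charter minus district, in percentage points
gap = by_type.loc["Charter"] - by_type.loc["District"]
print(gap.round(2))
```

The gap works out to about 15.8 percentage points in reading and 27.1 in math, consistent with math being where charter schools pull furthest ahead.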
