# PyCity Schools Analysis

- The purpose of this analysis was to understand trends in school performance based on varying school budgets and school sizes. 
- Based on the analysis, schools with higher per-student budgets did not have a higher % overall passing than schools with lower per-student budgets. The top 5 schools by % overall passing had an average per-student budget of about 606 and were all charter schools. The bottom 5 schools by % overall passing had an average per-student budget of about 647 and were all district schools. This suggests that per-student budget alone is not a leading indicator of school performance; other factors likely matter, such as how effectively the budget is allocated, student-to-teacher ratios, student and teacher educational backgrounds, and curriculum differences.
- Another finding is that the gap between the top 5 and bottom 5 schools is much larger in math than in reading, so math is the larger contributor to the difference in overall pass rate. In the top 5 schools, an average of 97% of students passed reading and 94% passed math, for an average overall passing rate of 91%. In the bottom 5 schools, an average of 81% of students passed reading but only 66% passed math, for an average overall passing rate of 53%. To improve their overall pass rate, the lowest-performing schools should focus first on improving the math pass rate.
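
The averages quoted in the bullets above can be re-derived from the ten rows shown in the highest- and lowest-performing tables later in this notebook. The sketch below is a minimal check, not part of the original analysis: `top5`, `bottom5`, and `comparison` are hypothetical names, and the values are copied by hand from those tables, rounded to two decimals.

```python
import pandas as pd

columns = ["Per Student Budget", "% Passing Math", "% Passing Reading", "% Passing Overall"]

# Values copied from the highest-performing schools table (all charter schools)
top5 = pd.DataFrame(
    [[582, 94.13, 97.04, 91.33],   # Cabrera
     [638, 93.27, 97.31, 90.95],   # Thomas
     [625, 93.39, 97.14, 90.60],   # Griffin
     [578, 93.87, 96.54, 90.58],   # Wilson
     [609, 94.59, 95.95, 90.54]],  # Pena
    columns=columns,
)

# Values copied from the lowest-performing schools table (all district schools)
bottom5 = pd.DataFrame(
    [[637, 66.37, 80.22, 52.99],   # Rodriguez
     [639, 65.99, 80.74, 53.20],   # Figueroa
     [655, 65.68, 81.32, 53.51],   # Huang
     [652, 66.75, 80.86, 53.53],   # Hernandez
     [650, 66.06, 81.22, 53.54]],  # Johnson
    columns=columns,
)

# Average each column per group, then compute the top-minus-bottom gap
comparison = pd.DataFrame({"Top 5": top5.mean(), "Bottom 5": bottom5.mean()}).round(1)
comparison["Gap"] = (comparison["Top 5"] - comparison["Bottom 5"]).round(1)
print(comparison)
```

Comparing the `Gap` values for math and reading shows which subject contributes more to the difference in overall pass rate.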




In [54]:
### Setup
# Import pandas
import pandas as pd

# Set path to files and save in a variable
schools_file = "Resources/schools_complete.csv"
students_file = "Resources/students_complete.csv"

# Store files in dataframes
schools_df = pd.read_csv(schools_file)
students_df = pd.read_csv(students_file)

# Combine dataframes
df = pd.merge(schools_df, students_df, how = "left", on = ["school_name"])
df = df.rename(columns = {"school_name" : "School Name", "type" : "Type", "size" : "Size", "budget" : "Budget", "student_name" : "Student Name", "gender" : "Gender", "grade" : "Grade", "reading_score" : "Reading Score", "math_score" : "Math Score"})
df.head()

,School ID,School Name,Type,Size,Budget,Student ID,Student Name,Gender,Grade,Reading Score,Math Score
0,0,Huang High School,District,2917,1910635,0,Paul Bradley,M,9th,66,79
1,0,Huang High School,District,2917,1910635,1,Victor Smith,M,12th,94,61
2,0,Huang High School,District,2917,1910635,2,Kevin Rodriguez,M,12th,90,60
3,0,Huang High School,District,2917,1910635,3,Dr. Richard Scott,M,12th,67,58
4,0,Huang High School,District,2917,1910635,4,Bonnie Ray,F,9th,97,84


## District Summary

In [55]:
#Count unique Schools
count_unique_schools = len(df["School Name"].unique())
print(count_unique_schools)

15


In [56]:
#Total students
total_students = df["Student ID"].count()
print(total_students)

39170


In [57]:
#Total budget (each school's budget counted once, even if two schools share the same amount)
total_budget = schools_df["budget"].sum()
print(total_budget)

24649428


In [58]:
# Average math score
average_math_score = df["Math Score"].mean()
print(average_math_score)

78.98537145774827


In [59]:
# Average reading score
average_reading_score = df["Reading Score"].mean()
print(average_reading_score)

81.87784018381414


In [60]:
# Drop rows that have any missing values
df = df.dropna(how = 'any')

In [61]:
# % of students who passed math (score of 70 or higher)
df = df.astype({"Math Score" : "int64"})
count_math_pass = (df["Math Score"] >= 70).sum()

percent_pass_math = count_math_pass / total_students *100
print(percent_pass_math)

74.9808526933878


In [62]:
# % of students who passed reading (score of 70 or higher)
df = df.astype({"Reading Score" : "int64"})
count_reading_pass = (df["Reading Score"] >= 70).sum()

percent_pass_reading = count_reading_pass / total_students *100
print(percent_pass_reading)

85.80546336482001


In [63]:
# % of students who passed math AND reading
count_mathreading_pass = ((df["Math Score"] >= 70) & (df["Reading Score"] >= 70)).sum()

percent_pass_mathreading = count_mathreading_pass / total_students *100
print(percent_pass_mathreading)

65.17232575950983


In [64]:
# Create summary dataframe
district_summary_df = pd.DataFrame({"Total Unique Schools" : [count_unique_schools],
                                   "Total Students" : [total_students],
                                   "Total Budget" : [total_budget],
                                   "Average Math Score" : [average_math_score],
                                   "Average Reading Score" : [average_reading_score],
                                   "Percent of Students Passing Math" : [percent_pass_math],
                                   "Percent of Students Passing Reading" : [percent_pass_reading],
                                   "Percent of Students Passing Overall" : [percent_pass_mathreading]
                                   })
district_summary_df

,Total Unique Schools,Total Students,Total Budget,Average Math Score,Average Reading Score,Percent of Students Passing Math,Percent of Students Passing Reading,Percent of Students Passing Overall
0,15,39170,24649428,78.985371,81.87784,74.980853,85.805463,65.172326
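
Each pass rate above is a count of passing students divided by the total number of students. An equivalent formulation is to take the mean of a boolean mask, since `True` counts as 1 and `False` as 0. The sketch below illustrates this on the five student rows shown by `df.head()` earlier; `sample_df` is a hypothetical name, and five rows are of course not representative of the district.

```python
import pandas as pd

# Scores of the five students shown in the df.head() output
sample_df = pd.DataFrame({
    "Math Score": [79, 61, 60, 58, 84],
    "Reading Score": [66, 94, 90, 67, 97],
})

# 70 is the passing threshold used throughout this notebook
passed_math = sample_df["Math Score"] >= 70
passed_reading = sample_df["Reading Score"] >= 70

# The mean of a boolean Series is the fraction of True values
percent_math = passed_math.mean() * 100
percent_reading = passed_reading.mean() * 100
percent_overall = (passed_math & passed_reading).mean() * 100

print(percent_math, percent_reading, percent_overall)
```

Because the denominator is the length of the mask itself, this form cannot drift out of sync with a student count computed in an earlier cell.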


## School Summary

In [65]:
#Set Up
school_summary = schools_df.set_index(["school_name"])["type"]
school_summary_df = pd.DataFrame(school_summary)

In [66]:
#Total students per school
students_per_school = df.groupby(["School Name"]).count()["Student ID"]

In [67]:
#Total budget per school (select the column before aggregating so text columns are skipped)
budget_per_school = df.groupby("School Name")["Budget"].mean()


In [68]:
#Total budget per student per school
budget_per_student = budget_per_school / students_per_school

In [69]:
#Average math score per school
total_school_math_score = df.groupby("School Name")["Math Score"].sum()
average_school_math_score = total_school_math_score / students_per_school


In [70]:
#Average reading score per school
total_school_reading_score = df.groupby("School Name")["Reading Score"].sum()
average_school_reading_score = total_school_reading_score / students_per_school


In [71]:
# % of students who passed math per school
df["Pass Math"] = df["Math Score"] >= 70
filter_pass_math_df = df.loc[df["Pass Math"] == True, :]
count_pass_math_per_school = filter_pass_math_df.groupby(["School Name"]).count()["Student ID"]
percent_pass_math_per_school = count_pass_math_per_school / students_per_school *100

In [72]:
# % of students who passed reading per school
df["Pass Reading"] = df["Reading Score"] >= 70
filter_pass_reading_df = df.loc[df["Pass Reading"] == True, :]
count_pass_reading_per_school = filter_pass_reading_df.groupby(["School Name"]).count()["Student ID"]
percent_pass_reading_per_school = count_pass_reading_per_school / students_per_school *100

In [73]:
# % of students who passed math AND reading per school
df["Pass Math and Reading"] = ((df["Reading Score"] >= 70) & (df["Math Score"] >= 70))
filter_pass_mathreading_df = df.loc[df["Pass Math and Reading"] == True, :]
count_pass_mathreading_per_school = filter_pass_mathreading_df.groupby(["School Name"]).count()["Student ID"]
percent_pass_mathreading_per_school = count_pass_mathreading_per_school / students_per_school *100

In [74]:
# Create summary dataframe
school_summary_df = school_summary_df.rename(columns = {"type" : "Type"})
school_summary_df["Total Students"] = students_per_school
school_summary_df["Total Budget"] = budget_per_school
school_summary_df["Total Budget"] = school_summary_df["Total Budget"].map("${:,.2f}".format)
school_summary_df["Per Student Budget"] = budget_per_student
school_summary_df["Per Student Budget"] = school_summary_df["Per Student Budget"].map("{:,.2f}".format)
school_summary_df["Average Math Score"] = average_school_math_score
school_summary_df["Average Reading Score"] = average_school_reading_score
school_summary_df["% Passing Math"] = percent_pass_math_per_school
school_summary_df["% Passing Reading"] = percent_pass_reading_per_school
school_summary_df["% Passing Overall"] = percent_pass_mathreading_per_school
school_summary_df


school_name,Type,Total Students,Total Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Passing Overall
Huang High School,District,2917,"$1,910,635.00",655.0,76.629414,81.182722,65.683922,81.316421,53.513884
Figueroa High School,District,2949,"$1,884,411.00",639.0,76.711767,81.15802,65.988471,80.739234,53.204476
Shelton High School,Charter,1761,"$1,056,600.00",600.0,83.359455,83.725724,93.867121,95.854628,89.892107
Hernandez High School,District,4635,"$3,022,020.00",652.0,77.289752,80.934412,66.752967,80.862999,53.527508
Griffin High School,Charter,1468,"$917,500.00",625.0,83.351499,83.816757,93.392371,97.138965,90.599455
Wilson High School,Charter,2283,"$1,319,574.00",578.0,83.274201,83.989488,93.867718,96.539641,90.582567
Cabrera High School,Charter,1858,"$1,081,356.00",582.0,83.061895,83.97578,94.133477,97.039828,91.334769
Bailey High School,District,4976,"$3,124,928.00",628.0,77.048432,81.033963,66.680064,81.93328,54.642283
Holden High School,Charter,427,"$248,087.00",581.0,83.803279,83.814988,92.505855,96.252927,89.227166
Pena High School,Charter,962,"$585,858.00",609.0,83.839917,84.044699,94.594595,95.945946,90.540541
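
The per-school metrics above are built one Series at a time. As an alternative sketch, the same kind of table can be produced with a single `groupby(...).agg(...)` call using named aggregation. The frame below is made-up toy data (`toy_df` and the school names `A` and `B` are hypothetical), not the PyCity dataset, and the output column names are chosen only for this illustration.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "School Name": ["A", "A", "A", "B", "B"],
    "Budget": [300, 300, 300, 500, 500],   # school-level budget repeated on each student row
    "Math Score": [80, 60, 90, 70, 50],
    "Reading Score": [75, 65, 95, 85, 72],
})

# Boolean pass flags, using the notebook's threshold of 70
toy_df["Pass Math"] = toy_df["Math Score"] >= 70
toy_df["Pass Reading"] = toy_df["Reading Score"] >= 70
toy_df["Pass Overall"] = toy_df["Pass Math"] & toy_df["Pass Reading"]

# Named aggregation: output_name=(input_column, aggregation)
summary = toy_df.groupby("School Name").agg(
    total_students=("Math Score", "size"),
    total_budget=("Budget", "first"),
    average_math=("Math Score", "mean"),
    average_reading=("Reading Score", "mean"),
    pct_pass_math=("Pass Math", "mean"),
    pct_pass_reading=("Pass Reading", "mean"),
    pct_pass_overall=("Pass Overall", "mean"),
)

# Convert pass fractions to percentages and derive the per-student budget
pct_cols = ["pct_pass_math", "pct_pass_reading", "pct_pass_overall"]
summary[pct_cols] = summary[pct_cols] * 100
summary["per_student_budget"] = summary["total_budget"] / summary["total_students"]
print(summary)
```

Keeping the budget columns numeric and applying currency formatting only at display time avoids having to convert strings back to numbers before later calculations.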


## Highest-Performing Schools (by % Overall Passing)

In [75]:
# Highest performing schools by % overall passing (top 5 rows)
top_schools_df = school_summary_df.sort_values("% Passing Overall", ascending = False)
top_schools_df.head()


school_name,Type,Total Students,Total Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Passing Overall
Cabrera High School,Charter,1858,"$1,081,356.00",582.0,83.061895,83.97578,94.133477,97.039828,91.334769
Thomas High School,Charter,1635,"$1,043,130.00",638.0,83.418349,83.84893,93.272171,97.308869,90.948012
Griffin High School,Charter,1468,"$917,500.00",625.0,83.351499,83.816757,93.392371,97.138965,90.599455
Wilson High School,Charter,2283,"$1,319,574.00",578.0,83.274201,83.989488,93.867718,96.539641,90.582567
Pena High School,Charter,962,"$585,858.00",609.0,83.839917,84.044699,94.594595,95.945946,90.540541


## Lowest-Performing Schools (by % Overall Passing)

In [76]:
# Lowest performing schools by % overall passing (first 5 rows of the ascending sort)
bottom_schools_df = school_summary_df.sort_values("% Passing Overall")
bottom_schools_df.head()

school_name,Type,Total Students,Total Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Passing Overall
Rodriguez High School,District,3999,"$2,547,363.00",637.0,76.842711,80.744686,66.366592,80.220055,52.988247
Figueroa High School,District,2949,"$1,884,411.00",639.0,76.711767,81.15802,65.988471,80.739234,53.204476
Huang High School,District,2917,"$1,910,635.00",655.0,76.629414,81.182722,65.683922,81.316421,53.513884
Hernandez High School,District,4635,"$3,022,020.00",652.0,77.289752,80.934412,66.752967,80.862999,53.527508
Johnson High School,District,4761,"$3,094,650.00",650.0,77.072464,80.966394,66.057551,81.222432,53.539172


## Math Scores by Grade

In [77]:
# Average math score per school and grade
average_math_by_grade = df.groupby(["School Name", "Grade"])["Math Score"].mean()
average_math_sorted = average_math_by_grade.to_frame()
average_math_sorted

School Name,Grade,Math Score
Bailey High School,10th,76.996772
Bailey High School,11th,77.515588
Bailey High School,12th,76.492218
Bailey High School,9th,77.083676
Cabrera High School,10th,83.154506
Cabrera High School,11th,82.76556
Cabrera High School,12th,83.277487
Cabrera High School,9th,83.094697
Figueroa High School,10th,76.539974
Figueroa High School,11th,76.884344
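
Because grade labels are strings, they sort alphabetically, which places `10th` before `9th` in the output above. One possible way to present grades in school order is to convert `Grade` to an ordered categorical and pivot so that each grade becomes a column. The sketch below uses made-up data (`toy_df`, the school names, and the scores are all hypothetical).

```python
import pandas as pd

toy_df = pd.DataFrame({
    "School Name": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Grade": ["10th", "9th", "12th", "11th", "9th", "10th", "11th", "12th"],
    "Math Score": [70, 80, 90, 60, 85, 75, 65, 95],
})

# An ordered categorical makes grades sort in school order instead of alphabetically
grade_order = ["9th", "10th", "11th", "12th"]
toy_df["Grade"] = pd.Categorical(toy_df["Grade"], categories=grade_order, ordered=True)

# One row per school, one column per grade
math_by_grade = toy_df.pivot_table(
    index="School Name", columns="Grade", values="Math Score",
    aggfunc="mean", observed=False,
)
print(math_by_grade)
```

The wide layout also makes it easier to compare grades within a school at a glance.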


## Reading Scores by Grade

In [78]:
# Average reading score per school and grade
average_reading_by_grade = df.groupby(["School Name", "Grade"])["Reading Score"].mean()
average_reading_sorted = average_reading_by_grade.to_frame()
average_reading_sorted


School Name,Grade,Reading Score
Bailey High School,10th,80.907183
Bailey High School,11th,80.945643
Bailey High School,12th,80.912451
Bailey High School,9th,81.303155
Cabrera High School,10th,84.253219
Cabrera High School,11th,83.788382
Cabrera High School,12th,84.287958
Cabrera High School,9th,83.676136
Figueroa High School,10th,81.408912
Figueroa High School,11th,80.640339


## Scores by School Spending

In [79]:
school_summary_df.head()

school_name,Type,Total Students,Total Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Passing Overall
Huang High School,District,2917,"$1,910,635.00",655.0,76.629414,81.182722,65.683922,81.316421,53.513884
Figueroa High School,District,2949,"$1,884,411.00",639.0,76.711767,81.15802,65.988471,80.739234,53.204476
Shelton High School,Charter,1761,"$1,056,600.00",600.0,83.359455,83.725724,93.867121,95.854628,89.892107
Hernandez High School,District,4635,"$3,022,020.00",652.0,77.289752,80.934412,66.752967,80.862999,53.527508
Griffin High School,Charter,1468,"$917,500.00",625.0,83.351499,83.816757,93.392371,97.138965,90.599455


In [80]:
school_summary_df = school_summary_df.astype({"Per Student Budget": "float64"}, errors = 'raise')

In [81]:
# Scores by school spending
spending_bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]

school_summary_df["Spend Range"] = pd.cut(school_summary_df["Per Student Budget"], spending_bins, labels = labels, include_lowest = True)

# Select each numeric column before averaging so text columns (Type, Total Budget) are never aggregated
spending_groups = school_summary_df.groupby("Spend Range", observed = False)
spending_math_scores = spending_groups["Average Math Score"].mean()
spending_reading_scores = spending_groups["Average Reading Score"].mean()
spending_passing_math = spending_groups["% Passing Math"].mean()
spending_passing_reading = spending_groups["% Passing Reading"].mean()
overall_passing_spending = spending_groups["% Passing Overall"].mean()

# Pass each Series directly (not wrapped in a list) so the result has one row per spend range
spending_summary_df = pd.DataFrame({"Average Math Score" : spending_math_scores,
                                   "Average Reading Score" : spending_reading_scores,
                                   "% Passing Math (Percentage Of Students Who Passed Math)" : spending_passing_math,
                                   "% Passing Reading (Percentage Of Students Who Passed Reading)" : spending_passing_reading,
                                   "% Overall Passing (Percentage Of Students Who Passed Math and Reading)" : overall_passing_spending,
                                   })

spending_summary_df


Spend Range,Average Math Score,Average Reading Score,% Passing Math (Percentage Of Students Who Passed Math),% Passing Reading (Percentage Of Students Who Passed Reading),% Overall Passing (Percentage Of Students Who Passed Math and Reading)
<$585,83.455399,...,93.460096,96.610877,90.369459
$585-630,...,...,...,...,...
$630-645,...,...,...,...,...
$645-680,...,...,...,...,...
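
`pd.cut` uses right-inclusive intervals by default, so with the bin edges used in this section a per-student budget of exactly 585 lands in the first bin even though its label reads `<$585`. The sketch below illustrates this on a few hand-picked budgets; the `budgets` values are assumptions chosen for illustration, with 585 and 630 deliberately placed on bin edges.

```python
import pandas as pd

spending_bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]

# Hypothetical per-student budgets, including two values that sit exactly on a bin edge
budgets = pd.Series([578, 585, 600, 630, 639, 655])

# Default right=True: each bin includes its upper edge and excludes its lower edge
spend_range = pd.cut(budgets, spending_bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"Per Student Budget": budgets, "Spend Range": spend_range}))
```

Passing `right=False` would make the intervals left-inclusive instead, which matches a label such as `<$585` more literally.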


## Scores by School Size

In [82]:
# Scores by School Size

size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

school_summary_df["School Size"] = pd.cut(school_summary_df["Total Students"], size_bins, labels = size_labels, include_lowest = True)

# Select each numeric column before averaging so text columns (Type, Total Budget) are never aggregated
size_groups = school_summary_df.groupby("School Size", observed = False)
math_scores_size = size_groups["Average Math Score"].mean()
reading_scores_size = size_groups["Average Reading Score"].mean()
passing_math_size = size_groups["% Passing Math"].mean()
passing_reading_size = size_groups["% Passing Reading"].mean()
passing_spending_size = size_groups["% Passing Overall"].mean()

# Pass each Series directly (not wrapped in a list) so the result has one row per school size
size_summary_df = pd.DataFrame({"Average Math Score" : math_scores_size,
                                   "Average Reading Score" : reading_scores_size,
                                   "% Passing Math (Percentage Of Students Who Passed Math)" : passing_math_size,
                                   "% Passing Reading (Percentage Of Students Who Passed Reading)" : passing_reading_size,
                                   "% Overall Passing (Percentage Of Students Who Passed Math and Reading)" : passing_spending_size,
                                   })

size_summary_df


School Size,Average Math Score,Average Reading Score,% Passing Math (Percentage Of Students Who Passed Math),% Passing Reading (Percentage Of Students Who Passed Reading),% Overall Passing (Percentage Of Students Who Passed Math and Reading)
Small (<1000),83.821598,83.929843,93.550225,96.099437,89.883853
Medium (1000-2000),...,...,...,...,...
Large (2000-5000),...,...,...,...,...


## Scores by School Type

In [41]:
# Scores by School Type
# Select each numeric column before averaging so the text column (Total Budget) is never aggregated
type_groups = school_summary_df.groupby("Type")
math_scores_type = type_groups["Average Math Score"].mean()
reading_scores_type = type_groups["Average Reading Score"].mean()
passing_math_type = type_groups["% Passing Math"].mean()
passing_reading_type = type_groups["% Passing Reading"].mean()
passing_spending_type = type_groups["% Passing Overall"].mean()

# Pass each Series directly (not wrapped in a list) so the result has one row per school type
type_summary_df = pd.DataFrame({"Average Math Score" : math_scores_type,
                                   "Average Reading Score" : reading_scores_type,
                                   "% Passing Math (Percentage Of Students Who Passed Math)" : passing_math_type,
                                   "% Passing Reading (Percentage Of Students Who Passed Reading)" : passing_reading_type,
                                   "% Overall Passing (Percentage Of Students Who Passed Math and Reading)" : passing_spending_type,
                                   })

type_summary_df


Type,Average Math Score,Average Reading Score,% Passing Math (Percentage Of Students Who Passed Math),% Passing Reading (Percentage Of Students Who Passed Reading),% Overall Passing (Percentage Of Students Who Passed Math and Reading)
Charter,83.473852,83.896421,93.620830,96.586489,90.432244
District,76.9567...,80.9666...,66.5484...,80.7990...,53.6722...
