# PyCity Schools Analysis
---
In this project, we analyze the passing grades of students in different schools.

One conclusion we can draw from this analysis is that Charter schools substantially outperform District schools in overall passing rate. The "Highest Performing Schools (by % Overall Passing)" and "Lowest Performing Schools (by % Overall Passing)" sections show this: the top 5 schools are all Charter schools and the bottom 5 are all District schools. Averaged across schools, Charter schools have an overall passing rate of 90.43%, compared with 53.67% for District schools.

Another conclusion we can draw from this analysis is that schools that spend more per student tend to have a lower overall passing rate. In the section "Scores by School Spending", schools spending <$585 per student have an overall passing rate of 90.37%, schools spending $585-630 have 81.42%, schools spending $630-645 have 62.86%, and schools spending $645-680 have 53.53%.

Lastly, small and medium schools have a much higher overall passing rate than large schools, where size is the number of students. In the section "Scores by School Size", schools with fewer than 1,000 students have an overall passing rate of 89.88%, schools with 1,000 to 2,000 students have 90.62%, and schools with 2,000 to 5,000 students have 58.29%.

In conclusion, this data suggests that Charter schools with fewer than 1,000 students and a budget of less than $585 per student would have a higher overall passing rate than other schools.

---


In [48]:
# Import appropriate modules

import pandas as pd
from pathlib import Path

# Read the CSV files and merge them into a DataFrame

school_data_path = Path("Resources/schools_complete.csv")
student_data_path = Path("Resources/students_complete.csv")

# Setup separate DataFrames for school and student data
school_data = pd.read_csv(school_data_path)
student_data = pd.read_csv(student_data_path)

# Merge the school_data and student_data DataFrames by school name
school_student_df = pd.merge(student_data, school_data, how="left", on="school_name")
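
Every student row should match exactly one school row, and the merge can be asked to check that. The sketch below uses small made-up frames (`students` and `schools` and their values are illustrative, not the project data) together with the `validate` argument of `pd.merge`.

```python
import pandas as pd

# Toy stand-ins for the student and school tables (illustrative values only)
students = pd.DataFrame({
    "student_name": ["A", "B", "C"],
    "school_name": ["North", "North", "South"],
})
schools = pd.DataFrame({
    "school_name": ["North", "South"],
    "type": ["District", "Charter"],
})

# validate="many_to_one" raises a MergeError if a school name appears more
# than once in the right-hand frame, which would otherwise duplicate students
merged = pd.merge(students, schools, how="left", on="school_name",
                  validate="many_to_one")
```

If the check passes, the merged frame has the same number of rows as the student table.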

# District Summary


#### Total unique school count
---

In [49]:
# Find the total number of unique schools

school_count = len(school_student_df["school_name"].unique())
school_count

15

#### Total Student Count
---

In [50]:
# Find the total number of students

student_count = len(school_student_df["student_name"])
student_count

39170

#### Total Budget
---

In [51]:
# Find the total budget using the school_data DataFrame

total_budget = school_data["budget"].sum()
total_budget

24649428

#### Total Average Math Score
----

In [52]:
# Find the average math score

average_math_score = school_student_df["math_score"].mean()
average_math_score

78.98537145774827

#### Total Average Reading Score
---

In [53]:
# Find the average reading score

average_reading_score = school_student_df["reading_score"].mean()
average_reading_score

81.87784018381414

#### Total Percentage of Students Passing Math
----

In [54]:
# Find percentage of students passing math. Passing grade is 70 or higher

math_pass = school_student_df.loc[(school_student_df["math_score"]) >= 70, :]["math_score"].count()
math_pass_percent = math_pass / student_count * 100
math_pass_percent

74.9808526933878

#### Total Percentage of Students Passing Reading
----

In [55]:
# Find percentage of students passing reading. Passing grade is 70 or higher

read_pass = school_student_df.loc[(school_student_df["reading_score"]) >= 70, :]["reading_score"].count()
read_pass_percent = read_pass / student_count * 100
read_pass_percent

85.80546336482001

#### Total Percentage of Students Passing Math and Reading
----

In [56]:
# Find the percentage of students who passed math and reading

overall_pass = school_student_df.loc[((school_student_df["reading_score"]) >= 70) & 
                                     ((school_student_df["math_score"]) >= 70), :]["reading_score"].count()

overall_pass_percent = overall_pass / student_count * 100

overall_pass_percent

65.17232575950983
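
The three passing percentages above follow the same pattern: build a boolean mask and measure the share of `True` values. As a minimal sketch on made-up scores (the frame `scores` is hypothetical), the mean of a boolean Series gives that share directly.

```python
import pandas as pd

# Hypothetical scores for four students
scores = pd.DataFrame({
    "math_score": [65, 70, 90, 80],
    "reading_score": [72, 69, 88, 70],
})

# Passing grade is 70 or higher
passed_math = scores["math_score"] >= 70
passed_reading = scores["reading_score"] >= 70

# The mean of a boolean Series is the fraction of True values
math_pct = passed_math.mean() * 100
reading_pct = passed_reading.mean() * 100
overall_pct = (passed_math & passed_reading).mean() * 100
```

This avoids dividing by a separately computed student count, so the numerator and denominator always come from the same rows.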

### District key metrics
---

In [57]:
# Create DataFrame for district's key metrics

district_summary = pd.DataFrame([{"Total Schools": school_count, "Total Students": student_count, "Total Budget": total_budget,
                                  "Average Math Score": average_math_score, "Average Reading Score": average_reading_score,
                                  "% Passing Math": math_pass_percent, "% Passing Reading": read_pass_percent, "% Overall Passing": overall_pass_percent}])

# Formatting
district_summary["Total Students"] = district_summary["Total Students"].map("{:,}".format)
district_summary["Total Budget"] = district_summary["Total Budget"].map("${:,.2f}".format)

district_summary

Unnamed: 0,Total Schools,Total Students,Total Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
0,15,39170,"$24,649,428.00",78.985371,81.87784,74.980853,85.805463,65.172326


# School Summary
---

In [58]:
# We gather these as they will be useful later when we do loops
# Create a list of schools

school_list = school_data["school_name"].unique()

# Create a new school_data DataFrame with index as school_name for easier access

school_data_new = school_data.set_index("school_name")

#### School type
----

In [59]:
# Create a dictionary to hold the school type for each school
# Each key is a column name; each value is the list of entries for that column
school_type = {"school_name": [], "type": []}

# Loop through each school in school_list and add the school to "school_name"
# column and the school type in the "type" column

for school in school_list:
    school_type["school_name"].append(school)
    school_type["type"].append(school_data_new.at[school, "type"])

# Store it in a DataFrame sorted alphabetically by school name, with a fresh integer index
school_type = pd.DataFrame(school_type).sort_values("school_name")
school_type = school_type.reset_index(drop=True)

school_type.head()


Unnamed: 0,school_name,type
0,Bailey High School,District
1,Cabrera High School,Charter
2,Figueroa High School,District
3,Ford High School,District
4,Griffin High School,Charter
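
The loop above copies one column out of the school table. Assuming school names are unique, the same result can be sketched by selecting and sorting columns directly. The frame `school_data_demo` below is a two-row stand-in for `school_data`, using values that appear elsewhere in this notebook.

```python
import pandas as pd

# Two-row stand-in for school_data
school_data_demo = pd.DataFrame({
    "school_name": ["Huang High School", "Bailey High School"],
    "type": ["District", "District"],
    "size": [2917, 4976],
})

# Select the columns of interest, sort by name, and rebuild the integer index
school_type_demo = (
    school_data_demo[["school_name", "type"]]
    .sort_values("school_name")
    .reset_index(drop=True)
)
```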


#### Total Students
---

In [60]:
# Create a dictionary to hold the total number of students for each school
# Each key is a column name; each value is the list of entries for that column
per_school_stu_count = {"school_name": [], "total_students": []}

# Loop through each school in school_list and add the school to "school_name"
# column and the total students in the "total_students" column

for school in school_list:
    per_school_stu_count["school_name"].append(school)
    per_school_stu_count["total_students"].append(school_data_new.at[school, "size"])

# Store it in a DataFrame sorted alphabetically by school name, with a fresh integer index
per_school_stu_count = pd.DataFrame(per_school_stu_count).sort_values("school_name")
per_school_stu_count = per_school_stu_count.reset_index(drop=True)

per_school_stu_count.head()


Unnamed: 0,school_name,total_students
0,Bailey High School,4976
1,Cabrera High School,1858
2,Figueroa High School,2949
3,Ford High School,2739
4,Griffin High School,1468


#### Budget per School
---

In [82]:
# Create dictionaries to store the budget per school
# and the budget per capita per school, respectively
per_school_budget = {"school_name": [], "budget": []}              # Dictionary for budget
per_school_capita = {"school_name": [], "budget_per_capita": []}   # Dictionary for budget per capita


# Loop through school_data and add to the school budget and school budget per capita 
for school in school_list:

    # Defining these variables for cleaner code

    # Current school's budget
    cur_budget = school_data_new.at[school, "budget"]    

    # Current school's budget per capita
    cur_budget_capita = int(school_data_new.at[school, "budget"]) / int(school_data_new.at[school, "size"])

    per_school_budget["school_name"].append(school)
    per_school_budget["budget"].append(cur_budget)
    per_school_capita["school_name"].append(school)
    per_school_capita["budget_per_capita"].append(cur_budget_capita)

# Create a DataFrame for each of budget per school and budget per capita per school
per_school_budget = pd.DataFrame(per_school_budget)

per_school_capita = pd.DataFrame(per_school_capita)

per_school_budget.head()
per_school_capita.head()

Unnamed: 0,school_name,budget_per_capita
0,Huang High School,655.0
1,Figueroa High School,639.0
2,Shelton High School,600.0
3,Hernandez High School,652.0
4,Griffin High School,625.0
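
Budget per capita is simply the budget divided by the school size, so it can also be computed for all schools at once with column arithmetic. This is a sketch on a two-row stand-in (`budget_demo` is a hypothetical name; the figures are taken from the tables in this notebook).

```python
import pandas as pd

# Two-row stand-in for school_data
budget_demo = pd.DataFrame({
    "school_name": ["Huang High School", "Griffin High School"],
    "size": [2917, 1468],
    "budget": [1910635, 917500],
})

# Element-wise division: one per-student figure for each school
budget_demo["budget_per_capita"] = budget_demo["budget"] / budget_demo["size"]
```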


#### Average math score
---

In [83]:
# Group the data by the school name. Makes average calculations easier
school_summary_df = school_student_df.groupby("school_name")

In [84]:
# Find average math score for each school

per_school_av_math = school_summary_df[["math_score"]].mean()

# Reset the index so we have "school_name" as a column for merging later
per_school_av_math = per_school_av_math.reset_index()
per_school_av_math.head()

Unnamed: 0,school_name,math_score
0,Bailey High School,77.048432
1,Cabrera High School,83.061895
2,Figueroa High School,76.711767
3,Ford High School,77.102592
4,Griffin High School,83.351499


#### Average reading score
---

In [85]:
# Find average reading score for each school
per_school_av_read = school_summary_df[["reading_score"]].mean()

# Reset the index so we have "school_name" as a column for merging later
per_school_av_read = per_school_av_read.reset_index()
per_school_av_read.head()

Unnamed: 0,school_name,reading_score
0,Bailey High School,81.033963
1,Cabrera High School,83.97578
2,Figueroa High School,81.15802
3,Ford High School,80.746258
4,Griffin High School,83.816757


#### % Passing Math
---

In [86]:
# We compute the number of students passing math per school
per_school_math_pass = {"school_name": [], "%_pass_math": []}      # Dictionary to store school and their math passing %

# Loop through the school list and compute each school's math passing % from school_student_df
for school in school_list:

    # DataFrame to store all students in current school
    cur_school = school_student_df[(school_student_df["school_name"] == school)]

    # DataFrame to store students in current school with a math_score of 70 or higher
    # (the mask is built from cur_school so its index matches the frame being filtered)
    math_over_70 = cur_school[cur_school["math_score"] >= 70]

    # Add the current school and the percentage passing math into their respective columns in the dictionary for the DataFrame
    per_school_math_pass["school_name"].append(school)
    per_school_math_pass["%_pass_math"].append(len(math_over_70) / len(cur_school) * 100)


# Replace the dictionary with a dataframe of the collected values
per_school_math_pass = pd.DataFrame(per_school_math_pass)

per_school_math_pass.head()




Unnamed: 0,school_name,%_pass_math
0,Huang High School,65.683922
1,Figueroa High School,65.988471
2,Shelton High School,93.867121
3,Hernandez High School,66.752967
4,Griffin High School,93.392371
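
When a subset of a DataFrame is filtered with a boolean mask, pandas aligns the mask to the subset by index label. A mask built from the full frame has a different index than the subset, and pandas emits a `UserWarning` that the boolean Series key will be reindexed. The sketch below, on made-up data, builds the mask from the subset itself so the two indexes match.

```python
import pandas as pd

# Hypothetical scores for two schools
df = pd.DataFrame({
    "school_name": ["North", "South", "North", "South"],
    "math_score": [60, 90, 75, 68],
})

# The subset keeps the original index labels (1 and 3)
cur_school = df[df["school_name"] == "South"]

# Mask built from cur_school, so it carries the same index labels
mask = cur_school["math_score"] >= 70
math_over_70 = cur_school[mask]

pct_pass = len(math_over_70) / len(cur_school) * 100
```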


#### % Passing Reading
---

In [87]:
# We compute the number of students passing reading per school
per_school_read_pass = {"school_name": [], "%_pass_read": []}   # Dictionary to store school and their reading passing %

# Loop through the school list and compute each school's reading passing % from school_student_df
for school in school_list:

    # DataFrame to store all students in current school
    cur_school = school_student_df[(school_student_df["school_name"] == school)]

    # DataFrame to store students in current school with a reading_score of 70 or higher
    # (the mask is built from cur_school so its index matches the frame being filtered)
    read_over_70 = cur_school[cur_school["reading_score"] >= 70]

    # Add the current school and the percentage passing reading into their respective columns in the dictionary for the DataFrame
    per_school_read_pass["school_name"].append(school)
    per_school_read_pass["%_pass_read"].append(len(read_over_70) / len(cur_school) * 100)


# Replace the dictionary with a dataframe of the collected values
per_school_read_pass = pd.DataFrame(per_school_read_pass)

per_school_read_pass.head()



Unnamed: 0,school_name,%_pass_read
0,Huang High School,81.316421
1,Figueroa High School,80.739234
2,Shelton High School,95.854628
3,Hernandez High School,80.862999
4,Griffin High School,97.138965


#### % Overall Pass
---

In [88]:
# We compute the number of students passing math and reading per school
per_school_overall_pass = {"school_name": [], "%_pass_overall": []}     # Dictionary to store school and their overall passing %

# Loop through the school list and compute each school's overall passing % from school_student_df
for school in school_list:

    # DataFrame to store all students in current school
    cur_school = school_student_df[(school_student_df["school_name"] == school)]

    # DataFrame to store students in current school with both reading_score and math_score of 70 or higher
    # (the mask is built from cur_school so its index matches the frame being filtered)
    read_math_over_70 = cur_school[(cur_school["reading_score"] >= 70) & (cur_school["math_score"] >= 70)]

    # Add the current school and the overall passing percentage into their respective columns in the dictionary for the DataFrame
    per_school_overall_pass["school_name"].append(school)
    per_school_overall_pass["%_pass_overall"].append(len(read_math_over_70) / len(cur_school) * 100)


# Replace the dictionary with a dataframe of the collected values
per_school_overall_pass = pd.DataFrame(per_school_overall_pass)

per_school_overall_pass.head()



Unnamed: 0,school_name,%_pass_overall
0,Huang High School,53.513884
1,Figueroa High School,53.204476
2,Shelton High School,89.892107
3,Hernandez High School,53.527508
4,Griffin High School,90.599455
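
The three per-school passing rates share the same structure, so they can also be produced in one pass by grouping boolean pass flags by school. This is a sketch on made-up data, not the notebook's method. The column names mirror the ones used above, and `flags` and `rates` are hypothetical names.

```python
import pandas as pd

# Hypothetical scores for two schools
df = pd.DataFrame({
    "school_name": ["North", "North", "South", "South"],
    "math_score": [60, 75, 90, 72],
    "reading_score": [80, 65, 85, 71],
})

# One boolean column per passing criterion (70 or higher)
flags = pd.DataFrame({
    "school_name": df["school_name"],
    "%_pass_math": df["math_score"] >= 70,
    "%_pass_read": df["reading_score"] >= 70,
})
flags["%_pass_overall"] = flags["%_pass_math"] & flags["%_pass_read"]

# The group mean of a boolean column is the share of students who passed
rates = flags.groupby("school_name").mean() * 100
```

The result is indexed by school name, with one percentage column per criterion.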


#### Merge the per school data into a Dataframe
----

In [89]:
# Merging all the DataFrames into one that summarizes the key metrics per school

per_school_summary = pd.merge(school_type, per_school_stu_count, on="school_name", how="left")
per_school_summary = pd.merge(per_school_summary, per_school_budget, on="school_name", how="left")
per_school_summary = pd.merge(per_school_summary, per_school_capita, on="school_name", how="left")
per_school_summary = pd.merge(per_school_summary, per_school_av_math, on="school_name", how="left")
per_school_summary = pd.merge(per_school_summary, per_school_av_read, on="school_name", how="left")
per_school_summary = pd.merge(per_school_summary, per_school_math_pass, on="school_name", how="left")
per_school_summary = pd.merge(per_school_summary, per_school_read_pass, on="school_name", how="left")
per_school_summary = pd.merge(per_school_summary, per_school_overall_pass, on="school_name", how="left")

###  Formatting  ###

# Rename the column names for better readability
per_school_summary = per_school_summary.rename(columns={"school_name": "School", "type": "School Type", "total_students": "Total Students",
                                                "budget": "Total School Budget", "budget_per_capita": "Per Student Budget",
                                                "math_score": "Average Math Score", "reading_score": "Average Reading Score",
                                                "%_pass_math": "% Passing Math", "%_pass_read": "% Passing Reading", 
                                                "%_pass_overall": "% Overall Passing"})

# Make the index as the school_name and remove index name for the school column
per_school_summary = per_school_summary.set_index("School")
per_school_summary.index.name = None

# Make a copy to format as we may need to do calculations on the original DataFrame later on
per_school_summary_format = per_school_summary.copy()
per_school_summary_format["Total School Budget"] = per_school_summary_format["Total School Budget"].map("${:,.2f}".format)
per_school_summary_format["Per Student Budget"] = per_school_summary_format["Per Student Budget"].map("${:,.2f}".format)
per_school_summary_format

Unnamed: 0,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Bailey High School,District,4976,"$3,124,928.00",$628.00,77.048432,81.033963,66.680064,81.93328,54.642283
Cabrera High School,Charter,1858,"$1,081,356.00",$582.00,83.061895,83.97578,94.133477,97.039828,91.334769
Figueroa High School,District,2949,"$1,884,411.00",$639.00,76.711767,81.15802,65.988471,80.739234,53.204476
Ford High School,District,2739,"$1,763,916.00",$644.00,77.102592,80.746258,68.309602,79.299014,54.289887
Griffin High School,Charter,1468,"$917,500.00",$625.00,83.351499,83.816757,93.392371,97.138965,90.599455
Hernandez High School,District,4635,"$3,022,020.00",$652.00,77.289752,80.934412,66.752967,80.862999,53.527508
Holden High School,Charter,427,"$248,087.00",$581.00,83.803279,83.814988,92.505855,96.252927,89.227166
Huang High School,District,2917,"$1,910,635.00",$655.00,76.629414,81.182722,65.683922,81.316421,53.513884
Johnson High School,District,4761,"$3,094,650.00",$650.00,77.072464,80.966394,66.057551,81.222432,53.539172
Pena High School,Charter,962,"$585,858.00",$609.00,83.839917,84.044699,94.594595,95.945946,90.540541
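
A chain of merges on the same key can be folded into a single expression with `functools.reduce`. The following is a sketch with three small made-up frames (`a`, `b`, and `c` are hypothetical), not a replacement for the cell above.

```python
from functools import reduce

import pandas as pd

# Three hypothetical per-school tables sharing the "school_name" key
a = pd.DataFrame({"school_name": ["North", "South"], "type": ["District", "Charter"]})
b = pd.DataFrame({"school_name": ["North", "South"], "size": [100, 50]})
c = pd.DataFrame({"school_name": ["South", "North"], "budget": [30000, 60000]})

# Merge the frames left to right, always on the same key
summary = reduce(
    lambda left, right: pd.merge(left, right, on="school_name", how="left"),
    [a, b, c],
)
```

Row order in the inputs does not matter because each merge matches on the key rather than on position.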


# Highest Performing Schools (by % Overall Passing)
---

In [90]:
# Sort the per_school_summary DataFrame by the % Overall Passing column in descending order

top_schools = per_school_summary.sort_values("% Overall Passing", ascending=False)

# Keep only the first 5 rows
top_schools = top_schools.head(5)
top_schools

Unnamed: 0,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Cabrera High School,Charter,1858,1081356,582.0,83.061895,83.97578,94.133477,97.039828,91.334769
Thomas High School,Charter,1635,1043130,638.0,83.418349,83.84893,93.272171,97.308869,90.948012
Griffin High School,Charter,1468,917500,625.0,83.351499,83.816757,93.392371,97.138965,90.599455
Wilson High School,Charter,2283,1319574,578.0,83.274201,83.989488,93.867718,96.539641,90.582567
Pena High School,Charter,962,585858,609.0,83.839917,84.044699,94.594595,95.945946,90.540541


# Lowest Performing Schools (by % Overall Passing)
---

In [91]:
# Sort the per_school_summary DataFrame by the % Overall Passing column in ascending order
bottom_schools = per_school_summary.sort_values("% Overall Passing")

# Keep only the first 5 rows
bottom_schools = bottom_schools.head(5)
bottom_schools

Unnamed: 0,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Rodriguez High School,District,3999,2547363,637.0,76.842711,80.744686,66.366592,80.220055,52.988247
Figueroa High School,District,2949,1884411,639.0,76.711767,81.15802,65.988471,80.739234,53.204476
Huang High School,District,2917,1910635,655.0,76.629414,81.182722,65.683922,81.316421,53.513884
Hernandez High School,District,4635,3022020,652.0,77.289752,80.934412,66.752967,80.862999,53.527508
Johnson High School,District,4761,3094650,650.0,77.072464,80.966394,66.057551,81.222432,53.539172


# Math Scores by Grade
---

In [92]:
# Collect each grade into a DataFrame

grade_9s = school_student_df[(school_student_df["grade"] == "9th")]
grade_10s = school_student_df[(school_student_df["grade"] == "10th")]
grade_11s = school_student_df[(school_student_df["grade"] == "11th")]
grade_12s = school_student_df[(school_student_df["grade"] == "12th")]

# Group each of the grade DataFrame by schools and calculate the average math score for each school and rename the columns appropriately
grade_9s_math_scores = grade_9s.groupby("school_name")[["math_score"]].mean().reset_index().rename(columns={"math_score": "Grade 9"})
grade_10s_math_scores = grade_10s.groupby("school_name")[["math_score"]].mean().reset_index().rename(columns={"math_score": "Grade 10"})
grade_11s_math_scores = grade_11s.groupby("school_name")[["math_score"]].mean().reset_index().rename(columns={"math_score": "Grade 11"})
grade_12s_math_scores = grade_12s.groupby("school_name")[["math_score"]].mean().reset_index().rename(columns={"math_score": "Grade 12"})

# Merge the averages for each of the grades per school into one DataFrame
math_scores_by_grade = pd.merge(grade_9s_math_scores, grade_10s_math_scores, how="left", on="school_name")
math_scores_by_grade = pd.merge(math_scores_by_grade, grade_11s_math_scores, how="left", on="school_name")
math_scores_by_grade = pd.merge(math_scores_by_grade, grade_12s_math_scores, how="left", on="school_name")

# Set the index and remove the index name
math_scores_by_grade = math_scores_by_grade.set_index("school_name")
math_scores_by_grade.index.name = None

math_scores_by_grade


Unnamed: 0,Grade 9,Grade 10,Grade 11,Grade 12
Bailey High School,77.083676,76.996772,77.515588,76.492218
Cabrera High School,83.094697,83.154506,82.76556,83.277487
Figueroa High School,76.403037,76.539974,76.884344,77.151369
Ford High School,77.361345,77.672316,76.918058,76.179963
Griffin High School,82.04401,84.229064,83.842105,83.356164
Hernandez High School,77.438495,77.337408,77.136029,77.186567
Holden High School,83.787402,83.429825,85.0,82.855422
Huang High School,77.027251,75.908735,76.446602,77.225641
Johnson High School,77.187857,76.691117,77.491653,76.863248
Pena High School,83.625455,83.372,84.328125,84.121547


# Reading Scores by Grade
---

In [93]:
# Group each of the grade DataFrames by school, calculate the average reading score for each school, and rename the columns appropriately
grade_9s_read_scores = grade_9s.groupby("school_name")[["reading_score"]].mean().reset_index().rename(columns={"reading_score": "Grade 9"})
grade_10s_read_scores = grade_10s.groupby("school_name")[["reading_score"]].mean().reset_index().rename(columns={"reading_score": "Grade 10"})
grade_11s_read_scores = grade_11s.groupby("school_name")[["reading_score"]].mean().reset_index().rename(columns={"reading_score": "Grade 11"})
grade_12s_read_scores = grade_12s.groupby("school_name")[["reading_score"]].mean().reset_index().rename(columns={"reading_score": "Grade 12"})

# Merge the averages for each of the grades per school into one DataFrame 
read_scores_by_grade = pd.merge(grade_9s_read_scores, grade_10s_read_scores, how="left", on="school_name")
read_scores_by_grade = pd.merge(read_scores_by_grade, grade_11s_read_scores, how="left", on="school_name")
read_scores_by_grade = pd.merge(read_scores_by_grade, grade_12s_read_scores, how="left", on="school_name")

# Set the index as school_name and remove the index name
read_scores_by_grade = read_scores_by_grade.set_index("school_name")
read_scores_by_grade.index.name = None

read_scores_by_grade

Unnamed: 0,Grade 9,Grade 10,Grade 11,Grade 12
Bailey High School,81.303155,80.907183,80.945643,80.912451
Cabrera High School,83.676136,84.253219,83.788382,84.287958
Figueroa High School,81.198598,81.408912,80.640339,81.384863
Ford High School,80.632653,81.262712,80.403642,80.662338
Griffin High School,83.369193,83.706897,84.288089,84.013699
Hernandez High School,80.86686,80.660147,81.39614,80.857143
Holden High School,83.677165,83.324561,83.815534,84.698795
Huang High School,81.290284,81.512386,81.417476,80.305983
Johnson High School,81.260714,80.773431,80.616027,81.227564
Pena High School,83.807273,83.612,84.335938,84.59116


# Scores by School Spending
---

In [94]:
# Setup the bin values
spending_bins = [0, 585, 630, 645, 680]
spending_labels = ["<$585", "$585-630", "$630-645", "$645-680"]

# Create a copy of the school summary for spending summary computations
spending_summary = per_school_summary.copy()

# Use `pd.cut` to categorize spending based on the bins.
spending_summary["Spending Ranges (Per Student)"] = pd.cut(spending_summary["Per Student Budget"], spending_bins, labels=spending_labels)

spending_summary.head()

Unnamed: 0,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing,Spending Ranges (Per Student)
Bailey High School,District,4976,3124928,628.0,77.048432,81.033963,66.680064,81.93328,54.642283,$585-630
Cabrera High School,Charter,1858,1081356,582.0,83.061895,83.97578,94.133477,97.039828,91.334769,<$585
Figueroa High School,District,2949,1884411,639.0,76.711767,81.15802,65.988471,80.739234,53.204476,$630-645
Ford High School,District,2739,1763916,644.0,77.102592,80.746258,68.309602,79.299014,54.289887,$630-645
Griffin High School,Charter,1468,917500,625.0,83.351499,83.816757,93.392371,97.138965,90.599455,$585-630


In [95]:
# Create groups by spending category and calculate the mean for each and store in a DataFrame

# We reset the index to have the "Spending Ranges (Per Student)" column for merging
spending_math_scores = spending_summary.groupby(["Spending Ranges (Per Student)"])[["Average Math Score"]].mean().reset_index()
spending_reading_scores = spending_summary.groupby(["Spending Ranges (Per Student)"])[["Average Reading Score"]].mean().reset_index()
spending_passing_math = spending_summary.groupby(["Spending Ranges (Per Student)"])[["% Passing Math"]].mean().reset_index()
spending_passing_reading = spending_summary.groupby(["Spending Ranges (Per Student)"])[["% Passing Reading"]].mean().reset_index()
overall_passing_spending = spending_summary.groupby(["Spending Ranges (Per Student)"])[["% Overall Passing"]].mean().reset_index()

#### School Spending Scores Summary
----

In [96]:
# Merge each of the groups above into one DataFrame

spending_summary = pd.merge(spending_math_scores, spending_reading_scores, on="Spending Ranges (Per Student)", how="left")
spending_summary = pd.merge(spending_summary, spending_passing_math, on="Spending Ranges (Per Student)", how="left")
spending_summary = pd.merge(spending_summary, spending_passing_reading, on="Spending Ranges (Per Student)", how="left")
spending_summary = pd.merge(spending_summary, overall_passing_spending, on="Spending Ranges (Per Student)", how="left")

# Set the index as the spending category
spending_summary = spending_summary.set_index("Spending Ranges (Per Student)")

spending_summary

Spending Ranges (Per Student),Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
<$585,83.455399,83.933814,93.460096,96.610877,90.369459
$585-630,81.899826,83.155286,87.133538,92.718205,81.418596
$630-645,78.518855,81.624473,73.484209,84.391793,62.857656
$645-680,76.99721,81.027843,66.164813,81.133951,53.526855


# Scores by School Size
---

In [97]:
# Setup the bin values
size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

# Make copy of per_school_summary to use for these calculations
size_summary = per_school_summary.copy()

# Use `pd.cut` to categorize school size based on the bins.

size_summary["School Size"] = pd.cut(size_summary["Total Students"], size_bins, labels=size_labels)

size_summary.head()

Unnamed: 0,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing,School Size
Bailey High School,District,4976,3124928,628.0,77.048432,81.033963,66.680064,81.93328,54.642283,Large (2000-5000)
Cabrera High School,Charter,1858,1081356,582.0,83.061895,83.97578,94.133477,97.039828,91.334769,Medium (1000-2000)
Figueroa High School,District,2949,1884411,639.0,76.711767,81.15802,65.988471,80.739234,53.204476,Large (2000-5000)
Ford High School,District,2739,1763916,644.0,77.102592,80.746258,68.309602,79.299014,54.289887,Large (2000-5000)
Griffin High School,Charter,1468,917500,625.0,83.351499,83.816757,93.392371,97.138965,90.599455,Medium (1000-2000)


In [98]:
# Calculate averages for the desired columns and store it in a Dataframe

# We reset the index to have the "School Size" column for merging
size_math_scores = size_summary.groupby(["School Size"])[["Average Math Score"]].mean().reset_index()
size_reading_scores = size_summary.groupby(["School Size"])[["Average Reading Score"]].mean().reset_index()
size_passing_math = size_summary.groupby(["School Size"])[["% Passing Math"]].mean().reset_index()
size_passing_reading = size_summary.groupby(["School Size"])[["% Passing Reading"]].mean().reset_index()
size_overall_passing = size_summary.groupby(["School Size"])[["% Overall Passing"]].mean().reset_index()


#### School Size Scores Summary
----

In [99]:
# Merge each of the groups above into one DataFrame

size_summary = pd.merge(size_math_scores, size_reading_scores, on="School Size", how="left")
size_summary = pd.merge(size_summary, size_passing_math, on="School Size", how="left")
size_summary = pd.merge(size_summary, size_passing_reading, on="School Size", how="left")
size_summary = pd.merge(size_summary, size_overall_passing, on="School Size", how="left")

# Set the index as the school size
size_summary = size_summary.set_index("School Size")

size_summary

School Size,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Small (<1000),83.821598,83.929843,93.550225,96.099437,89.883853
Medium (1000-2000),83.374684,83.864438,93.599695,96.79068,90.621535
Large (2000-5000),77.746417,81.344493,69.963361,82.766634,58.286003


# Scores by School Type
----

In [100]:
# Make a copy of per_school_summary for these calculations

type_summary = per_school_summary.copy()

In [101]:
# Group the per_school_summary DataFrame by "School Type" and average the results.

# We reset the index to have the "School Type" column for merging
average_math_score_by_type = type_summary.groupby(["School Type"])[["Average Math Score"]].mean().reset_index()
average_reading_score_by_type = type_summary.groupby(["School Type"])[["Average Reading Score"]].mean().reset_index()
average_percent_passing_math_by_type = type_summary.groupby(["School Type"])[["% Passing Math"]].mean().reset_index()
average_percent_passing_reading_by_type = type_summary.groupby(["School Type"])[["% Passing Reading"]].mean().reset_index()
average_percent_overall_passing_by_type = type_summary.groupby(["School Type"])[["% Overall Passing"]].mean().reset_index()


#### School Type Scores Summary
----

In [102]:
# Merge each of the groups above into one DataFrame

type_summary = pd.merge(average_math_score_by_type, average_reading_score_by_type, on="School Type", how="left")
type_summary = pd.merge(type_summary, average_percent_passing_math_by_type, on="School Type", how="left")
type_summary = pd.merge(type_summary, average_percent_passing_reading_by_type, on="School Type", how="left")
type_summary = pd.merge(type_summary, average_percent_overall_passing_by_type, on="School Type", how="left")

# Set the index as the school type
type_summary = type_summary.set_index("School Type")

type_summary


School Type,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Charter,83.473852,83.896421,93.62083,96.586489,90.432244
District,76.956733,80.966636,66.548453,80.799062,53.672208
