Deliverables:
- A high-level snapshot of the district's key metrics, presented in a table format
- An overview of the key metrics for each school, presented in a table format
- Tables presenting each of the following metrics:
    - Top 5 and bottom 5 performing schools, based on the overall passing rate
    - The average math score received by students in each grade level at each school
    - The average reading score received by students in each grade level at each school
    - School performance based on the budget per student
    - School performance based on the school size 
    - School performance based on the type of school

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to change the path if needed.)
school_data_to_load = "Resources/schools_complete.csv"
student_data_to_load = "Resources/students_complete.csv"

# Read the School Data and Student Data and store into a Pandas DataFrame
schools_df = pd.read_csv(school_data_to_load)
students_df = pd.read_csv(student_data_to_load)

# Cleaning Student Names and Replacing Substrings in a Python String
# Add each prefix and suffix to remove to a list.
prefixes_suffixes = ["Dr. ", "Mr. ","Ms. ", "Mrs. ", "Miss ", " MD", " DDS", " DVM", " PhD"]

# Iterate through the "prefixes_suffixes" list and replace each entry with an empty string, "".
# regex=False matches each entry literally, so the "." in "Dr. " is not treated as a wildcard.
for word in prefixes_suffixes:
    students_df["student_name"] = students_df["student_name"].str.replace(word, "", regex=False)

# Check names.
#students_df.head(10)
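
The name cleanup above depends on whether `str.replace` treats each prefix as a literal string or as a regular expression, and the default has differed between pandas versions. Here is a minimal sketch of the difference, using two made-up names (they are not from the data set):

```python
import pandas as pd

# Hypothetical names, chosen only to show the difference.
names = pd.Series(["Dr. Amy Lee", "Dru Hill"])

# As a regex, "." matches any character, so "Dr. " also matches "Dru ".
as_regex = names.str.replace("Dr. ", "", regex=True)

# As a literal, only the exact text "Dr. " is removed.
as_literal = names.str.replace("Dr. ", "", regex=False)

print(as_regex.tolist())    # ['Amy Lee', 'Hill']
print(as_literal.tolist())  # ['Amy Lee', 'Dru Hill']
```

Passing `regex` explicitly keeps the result the same regardless of the installed pandas version.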



## Deliverable 1: Replace the reading and math scores.

### Replace the 9th grade reading and math scores at Thomas High School with NaN.

In [2]:
# Step 1. Import numpy as np.
import numpy as np


In [3]:
# Step 2. Use the loc method on students_df to select all the reading scores from the 9th grade at Thomas High School and replace them with NaN.
students_df.loc[(students_df["school_name"] == "Thomas High School") & (students_df["grade"] == "9th"), ["reading_score"]]=np.nan


In [4]:
# Step 3. Refactor the code in Step 2 to replace the math scores with NaN.
students_df.loc[(students_df["school_name"] == "Thomas High School") & (students_df["grade"] == "9th"), ["math_score"]]=np.nan


In [5]:
# Step 4. Check the student data for NaNs.
#students_df.isnull().sum()
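
Steps 2 through 4 can also be written as one masked assignment. This is a minimal sketch on a toy frame (the three rows and their scores are invented for illustration); it uses float scores so that assigning NaN does not change the column dtype:

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "school_name": ["Thomas High School", "Thomas High School", "Pena High School"],
    "grade": ["9th", "10th", "9th"],
    "reading_score": [90.0, 80.0, 70.0],
    "math_score": [60.0, 75.0, 85.0],
})

# Build the row mask once, then blank both score columns in a single assignment.
mask = (toy["school_name"] == "Thomas High School") & (toy["grade"] == "9th")
toy.loc[mask, ["reading_score", "math_score"]] = np.nan

# Count the missing values per column, as in Step 4.
print(toy.isnull().sum())

# mean() skips NaN by default, so only the two remaining scores are averaged.
print(toy["math_score"].mean())  # 80.0
```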

## Deliverable 2: Repeat the school district analysis

### District-Level Stats

In [6]:
#create a merged dataframe to work from

school_data_complete_df = pd.merge(students_df, schools_df, on="school_name")

In [7]:
#getting number of students

#total students in district
student_count = school_data_complete_df.student_name.count()

#students in ninth grade at thomas high
student_remove = school_data_complete_df.loc[(school_data_complete_df["school_name"] == "Thomas High School") & (school_data_complete_df["grade"] == "9th")]["student_name"].count()

#total students less ninth grade thomas high
student_count_calculations = student_count - student_remove


In [8]:
#getting number of schools
school_count = len(school_data_complete_df.school_name.unique())

In [9]:
#getting total district budget

total_budget = schools_df["budget"].sum()

In [10]:
#district-wide score averages

avg_math_score = school_data_complete_df.math_score.mean()

avg_reading_score = school_data_complete_df.reading_score.mean()


In [11]:
#district wide math passing percentage and district wide reading passing percentages

#num passing
passing_math = school_data_complete_df[school_data_complete_df["math_score"] >= 70]

passing_reading = school_data_complete_df[school_data_complete_df["reading_score"] >= 70]

pass_math_count = passing_math["student_name"].count()
pass_reading_count = passing_reading["student_name"].count()

#calc percentage
pass_math_percentage = pass_math_count/float(student_count_calculations)*100
pass_reading_percentage = pass_reading_count/float(student_count_calculations)*100


In [12]:
#district wide math AND reading passing percentages

#num passing
pass_math_reading = school_data_complete_df[(school_data_complete_df["math_score"] >= 70) &(school_data_complete_df["reading_score"] >= 70) ]

pass_math_reading_count = pass_math_reading["student_name"].count()

#percentage
pass_math_reading_percentage = pass_math_reading_count/float(student_count_calculations)*100
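
The reason for the adjusted student count above is that a comparison such as `score >= 70` is False for NaN, so students with blanked scores drop out of the numerator but would still inflate the denominator. A minimal sketch with four invented scores:

```python
import numpy as np
import pandas as pd

# Invented scores; one is missing.
scores = pd.DataFrame({"math_score": [80.0, 65.0, np.nan, 90.0]})

passing = (scores["math_score"] >= 70).sum()   # NaN >= 70 is False
valid = scores["math_score"].notna().sum()     # students who actually have a score

pct_all_rows = passing / len(scores) * 100     # counts the NaN row in the denominator
pct_valid_rows = passing / valid * 100         # excludes it

print(pct_all_rows)              # 50.0
print(round(pct_valid_rows, 1))  # 66.7
```

Using `notna().sum()` as the denominator is an alternative to subtracting a separately counted group of students; both give the same figure when the only missing scores are the ones that were blanked.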

In [13]:
# create district summary data frame

district_summary_df = pd.DataFrame(
    [{"Total Schools": school_count,
     "Total Students": student_count,
     "Total Budget": total_budget,
     "Average Math Score": avg_math_score,
     "Average Reading Score": avg_reading_score,
     "% Passing Math": pass_math_percentage,
     "% Passing Reading": pass_reading_percentage,
     "% Overall Passing": pass_math_reading_percentage}])

#### Note on Data Frame Delivery

For this project, whenever I build a DataFrame I also build a second one with "\_format" in its name that is used only for display. Formatting converts the numbers to strings, which makes it impossible to run calculations on them. With this approach I keep the original DataFrame for calculations and have a second, formatted one for display.

I know .astype() or pd.to_numeric() can convert most formatted numbers back, but they fail on columns where I add dollar signs or commas, so I use this approach everywhere to keep things consistent.
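
A minimal sketch of the same idea using `DataFrame.copy()` instead of rebuilding the frame from scratch. The variable names here are shortened for illustration; the two values are the district totals shown in the summary table:

```python
import pandas as pd

# Numeric frame, kept for calculations.
summary_df = pd.DataFrame([{"Total Students": 39170, "Total Budget": 24649428}])

# Independent copy, used only for display.
summary_format_df = summary_df.copy()
summary_format_df["Total Students"] = summary_format_df["Total Students"].map("{:,}".format)
summary_format_df["Total Budget"] = summary_format_df["Total Budget"].map("${:,.2f}".format)

print(summary_format_df.loc[0, "Total Budget"])  # $24,649,428.00
print(summary_df.loc[0, "Total Budget"])         # 24649428 (still a number)
```

`copy()` guarantees the display frame starts out identical to the numeric one, including any corrections already applied to it.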

In [14]:
# build the data frame for formatting

district_summary_format_df = pd.DataFrame(
    [{"Total Schools": school_count,
     "Total Students": student_count,
     "Total Budget": total_budget,
     "Average Math Score": avg_math_score,
     "Average Reading Score": avg_reading_score,
     "% Passing Math": pass_math_percentage,
     "% Passing Reading": pass_reading_percentage,
     "% Overall Passing": pass_math_reading_percentage}])


In [15]:
#format the table to look better

district_summary_format_df["Total Students"] = district_summary_format_df["Total Students"].map("{:,}".format)
district_summary_format_df["Total Budget"] = district_summary_format_df["Total Budget"].map("${:,.2f}".format)
district_summary_format_df["Average Math Score"] = district_summary_format_df["Average Math Score"].map("{:.1f}".format)
district_summary_format_df["Average Reading Score"] = district_summary_format_df["Average Reading Score"].map("{:.1f}".format)
district_summary_format_df["% Passing Math"] = district_summary_format_df["% Passing Math"].map("{:.1f}".format)
district_summary_format_df["% Passing Reading"] = district_summary_format_df["% Passing Reading"].map("{:.1f}".format)
district_summary_format_df["% Overall Passing"] = district_summary_format_df["% Overall Passing"].map("{:.1f}".format)

district_summary_format_df

,Total Schools,Total Students,Total Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
0,15,39170,"$24,649,428.00",78.9,81.9,74.8,85.7,64.9


### School-Level Stats

In [16]:
#get each school's type as a Series indexed by school name

per_school_types = schools_df.set_index(["school_name"])["type"]

In [17]:
#get the student count per school

#method 1
#per_school_counts = schools_df.set_index(["school_name"])["size"]

#method 2
per_school_counts = school_data_complete_df["school_name"].value_counts()

In [18]:
#get the budget-per-student for each school

#get the total budget per school
per_school_budget = schools_df.set_index(["school_name"])["budget"]

#per student spending

per_school_capita = per_school_budget / per_school_counts

In [19]:
#get per school average math and reading scores
#select the score column before calling mean() so non-numeric columns are never averaged

per_school_math = school_data_complete_df.groupby("school_name")["math_score"].mean()

per_school_reading = school_data_complete_df.groupby("school_name")["reading_score"].mean()

In [20]:
#indiv math and reading passing percentages by school

#math
per_school_passing_math = school_data_complete_df[school_data_complete_df["math_score"] >= 70]
per_school_passing_math = per_school_passing_math.groupby(["school_name"]).count()["student_name"]
per_school_passing_math = per_school_passing_math / per_school_counts * 100

#reading
per_school_passing_reading = school_data_complete_df[school_data_complete_df["reading_score"] >= 70]
per_school_passing_reading = per_school_passing_reading.groupby(["school_name"]).count()["student_name"]
per_school_passing_reading = per_school_passing_reading / per_school_counts * 100

In [21]:
#overall passing percentages by school

per_school_pass_all = school_data_complete_df[(school_data_complete_df["math_score"] >= 70) & (school_data_complete_df["reading_score"] >= 70)]
per_school_pass_all = per_school_pass_all.groupby(["school_name"]).count()["student_name"]
per_school_pass_all = per_school_pass_all / per_school_counts * 100

In [22]:
#correcting thomas high school's stats after removing all the 9th graders

thomas_pop_for_calcs = school_data_complete_df.loc[(school_data_complete_df["school_name"] == "Thomas High School") & (school_data_complete_df["grade"] != "9th")]["student_name"].count()


#thomas math and reading passing percentages
thomas_passing_math = school_data_complete_df.loc[(school_data_complete_df["school_name"] == "Thomas High School") & (school_data_complete_df["grade"] != "9th") & (school_data_complete_df["math_score"] >= 70)]
thomas_passing_math = thomas_passing_math.count()["student_name"]
thomas_passing_math = thomas_passing_math / thomas_pop_for_calcs * 100

thomas_passing_reading = school_data_complete_df.loc[(school_data_complete_df["school_name"] == "Thomas High School") & (school_data_complete_df["grade"] != "9th") & (school_data_complete_df["reading_score"] >= 70)]
thomas_passing_reading = thomas_passing_reading.count()["student_name"]
thomas_passing_reading = thomas_passing_reading / thomas_pop_for_calcs * 100

#thomas overall passing percentages
thomas_passing_all = school_data_complete_df.loc[(school_data_complete_df["school_name"] == "Thomas High School") & (school_data_complete_df["grade"] != "9th") & (school_data_complete_df["math_score"] >= 70) & (school_data_complete_df["reading_score"] >= 70)]
thomas_passing_all = thomas_passing_all.count()["student_name"]
thomas_passing_all = thomas_passing_all / thomas_pop_for_calcs * 100
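
The Thomas High School correction could also be folded into the per-school calculation by dividing by the number of students who have a score, rather than by the total head count. This is a sketch under that assumption, with a hypothetical helper `pct_passing` and invented scores:

```python
import numpy as np
import pandas as pd

# Invented scores; one Thomas High School score is missing.
toy = pd.DataFrame({
    "school_name": ["Thomas High School"] * 3 + ["Pena High School"] * 2,
    "math_score": [np.nan, 80.0, 60.0, 75.0, 90.0],
})

def pct_passing(scores, cutoff=70):
    """Percent of non-missing scores at or above the cutoff (hypothetical helper)."""
    return (scores >= cutoff).sum() / scores.notna().sum() * 100

per_school = toy.groupby("school_name")["math_score"].agg(pct_passing)

print(per_school["Thomas High School"])  # 50.0
print(per_school["Pena High School"])    # 100.0
```

With this denominator, schools without missing scores get the same percentage as before, and no separate correction step is needed for the school with blanked scores.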


In [23]:
#making the school summary data frame

per_school_summary_df = pd.DataFrame({
    "School Type": per_school_types,
    "Total Students": per_school_counts,
    "Total School Budget": per_school_budget,
    "Per Student Budget": per_school_capita,
    "Average Math Score": per_school_math,
    "Average Reading Score": per_school_reading,
    "% Passing Math": per_school_passing_math,
    "% Passing Reading": per_school_passing_reading,
    "% Overall Passing": per_school_pass_all})


In [24]:
#correct thomas high school's row

per_school_summary_df.loc["Thomas High School", ["% Passing Math"]] = thomas_passing_math
per_school_summary_df.loc["Thomas High School", ["% Passing Reading"]] = thomas_passing_reading
per_school_summary_df.loc["Thomas High School", ["% Overall Passing"]] = thomas_passing_all

In [25]:
#create the formatted df

per_school_summary_format_df = pd.DataFrame({
    "School Type": per_school_types,
    "Total Students": per_school_counts,
    "Total School Budget": per_school_budget,
    "Per Student Budget": per_school_capita,
    "Average Math Score": per_school_math,
    "Average Reading Score": per_school_reading,
    "% Passing Math": per_school_passing_math,
    "% Passing Reading": per_school_passing_reading,
    "% Overall Passing": per_school_pass_all})

In [26]:
#correct thomas high school's row

per_school_summary_format_df.loc["Thomas High School", ["% Passing Math"]] = thomas_passing_math
per_school_summary_format_df.loc["Thomas High School", ["% Passing Reading"]] = thomas_passing_reading
per_school_summary_format_df.loc["Thomas High School", ["% Overall Passing"]] = thomas_passing_all

In [27]:
#format the summary df

per_school_summary_format_df["Total Students"] = per_school_summary_format_df["Total Students"].map("{:,}".format)
per_school_summary_format_df["Total School Budget"] = per_school_summary_format_df["Total School Budget"].map("${:,.2f}".format)
per_school_summary_format_df["Per Student Budget"] = per_school_summary_format_df["Per Student Budget"].map("${:,.2f}".format)
per_school_summary_format_df["Average Math Score"] = per_school_summary_format_df["Average Math Score"].map("{:.1f}".format)
per_school_summary_format_df["Average Reading Score"] = per_school_summary_format_df["Average Reading Score"].map("{:.1f}".format)
per_school_summary_format_df["% Passing Math"] = per_school_summary_format_df["% Passing Math"].map("{:.1f}".format)
per_school_summary_format_df["% Passing Reading"] = per_school_summary_format_df["% Passing Reading"].map("{:.1f}".format)
per_school_summary_format_df["% Overall Passing"] = per_school_summary_format_df["% Overall Passing"].map("{:.1f}".format)


In [28]:
per_school_summary_format_df

,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Bailey High School,District,4976,"$3,124,928.00",$628.00,77.0,81.0,66.7,81.9,54.6
Cabrera High School,Charter,1858,"$1,081,356.00",$582.00,83.1,84.0,94.1,97.0,91.3
Figueroa High School,District,2949,"$1,884,411.00",$639.00,76.7,81.2,66.0,80.7,53.2
Ford High School,District,2739,"$1,763,916.00",$644.00,77.1,80.7,68.3,79.3,54.3
Griffin High School,Charter,1468,"$917,500.00",$625.00,83.4,83.8,93.4,97.1,90.6
Hernandez High School,District,4635,"$3,022,020.00",$652.00,77.3,80.9,66.8,80.9,53.5
Holden High School,Charter,427,"$248,087.00",$581.00,83.8,83.8,92.5,96.3,89.2
Huang High School,District,2917,"$1,910,635.00",$655.00,76.6,81.2,65.7,81.3,53.5
Johnson High School,District,4761,"$3,094,650.00",$650.00,77.1,81.0,66.1,81.2,53.5
Pena High School,Charter,962,"$585,858.00",$609.00,83.8,84.0,94.6,95.9,90.5


In [29]:
#sorting and finding top 5 schools

top_schools = per_school_summary_format_df.sort_values(["% Overall Passing"], ascending = False)

top_schools.head()

,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Cabrera High School,Charter,1858,"$1,081,356.00",$582.00,83.1,84.0,94.1,97.0,91.3
Griffin High School,Charter,1468,"$917,500.00",$625.00,83.4,83.8,93.4,97.1,90.6
Thomas High School,Charter,1635,"$1,043,130.00",$638.00,83.4,83.9,93.2,97.0,90.6
Wilson High School,Charter,2283,"$1,319,574.00",$578.00,83.3,84.0,93.9,96.5,90.6
Pena High School,Charter,962,"$585,858.00",$609.00,83.8,84.0,94.6,95.9,90.5


In [30]:
#sorting and finding bottom 5 schools

bottom_schools = per_school_summary_format_df.sort_values(["% Overall Passing"], ascending = True)

bottom_schools.head()

,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Rodriguez High School,District,3999,"$2,547,363.00",$637.00,76.8,80.7,66.4,80.2,53.0
Figueroa High School,District,2949,"$1,884,411.00",$639.00,76.7,81.2,66.0,80.7,53.2
Hernandez High School,District,4635,"$3,022,020.00",$652.00,77.3,80.9,66.8,80.9,53.5
Huang High School,District,2917,"$1,910,635.00",$655.00,76.6,81.2,65.7,81.3,53.5
Johnson High School,District,4761,"$3,094,650.00",$650.00,77.1,81.0,66.1,81.2,53.5
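
One caveat on the two rankings above: they sort the formatted frame, where "% Overall Passing" holds strings, so the sort is alphabetical rather than numeric. It gives a sensible order here because every displayed value has the same width (two digits and one decimal). A minimal sketch with invented values of different widths shows how the two orders can diverge, and one way to rank by the numbers while still displaying the formatted text:

```python
import pandas as pd

# Invented percentages with different widths.
numeric = pd.Series([9.5, 100.0, 53.2], index=["A", "B", "C"])
formatted = numeric.map("{:.1f}".format)

# Sorting the strings compares character by character.
print(formatted.sort_values().index.tolist())  # ['B', 'C', 'A']

# Sorting the numbers, then reusing that order for the formatted values.
order = numeric.sort_values().index
print(formatted.loc[order].tolist())           # ['9.5', '53.2', '100.0']
```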


### Grade-Level Stats

In [31]:
# making individual grade-level dFs

ninth_graders = school_data_complete_df[school_data_complete_df["grade"] == "9th"]
tenth_graders = school_data_complete_df[school_data_complete_df["grade"] == "10th"]
eleventh_graders = school_data_complete_df[school_data_complete_df["grade"] == "11th"]
twelfth_graders = school_data_complete_df[school_data_complete_df["grade"] == "12th"]

In [32]:
# getting average math scores (select the column first so only numeric data is averaged)

ninth_grade_math_scores = ninth_graders.groupby("school_name")["math_score"].mean()
tenth_grade_math_scores = tenth_graders.groupby("school_name")["math_score"].mean()
eleventh_grade_math_scores = eleventh_graders.groupby("school_name")["math_score"].mean()
twelfth_grade_math_scores = twelfth_graders.groupby("school_name")["math_score"].mean()

In [33]:
#getting average reading scores (select the column first so only numeric data is averaged)

ninth_grade_reading_scores = ninth_graders.groupby("school_name")["reading_score"].mean()
tenth_grade_reading_scores = tenth_graders.groupby("school_name")["reading_score"].mean()
eleventh_grade_reading_scores = eleventh_graders.groupby("school_name")["reading_score"].mean()
twelfth_grade_reading_scores = twelfth_graders.groupby("school_name")["reading_score"].mean()

In [34]:
#making math grade avg summary table

math_scores_by_grade = pd.DataFrame({
    "9th":ninth_grade_math_scores,
    "10th":tenth_grade_math_scores,
    "11th":eleventh_grade_math_scores,
    "12th":twelfth_grade_math_scores
})

In [35]:
#make the formatted table

math_scores_by_grade_format = pd.DataFrame({
    "9th":ninth_grade_math_scores,
    "10th":tenth_grade_math_scores,
    "11th":eleventh_grade_math_scores,
    "12th":twelfth_grade_math_scores
})

In [36]:
#formatting number columns
math_scores_by_grade_format["9th"] = math_scores_by_grade_format["9th"].map("{:.1f}".format)
math_scores_by_grade_format["10th"] = math_scores_by_grade_format["10th"].map("{:.1f}".format)
math_scores_by_grade_format["11th"] = math_scores_by_grade_format["11th"].map("{:.1f}".format)
math_scores_by_grade_format["12th"] = math_scores_by_grade_format["12th"].map("{:.1f}".format)

#dropping header title
math_scores_by_grade_format.index.name = None

In [37]:
#view math scores
math_scores_by_grade_format

,9th,10th,11th,12th
Bailey High School,77.1,77.0,77.5,76.5
Cabrera High School,83.1,83.2,82.8,83.3
Figueroa High School,76.4,76.5,76.9,77.2
Ford High School,77.4,77.7,76.9,76.2
Griffin High School,82.0,84.2,83.8,83.4
Hernandez High School,77.4,77.3,77.1,77.2
Holden High School,83.8,83.4,85.0,82.9
Huang High School,77.0,75.9,76.4,77.2
Johnson High School,77.2,76.7,77.5,76.9
Pena High School,83.6,83.4,84.3,84.1


In [38]:
#making reading grade avg summary table

reading_scores_by_grade = pd.DataFrame({
    "9th":ninth_grade_reading_scores,
    "10th":tenth_grade_reading_scores,
    "11th":eleventh_grade_reading_scores,
    "12th":twelfth_grade_reading_scores
})

In [39]:
#making the formatted table

reading_scores_by_grade_format = pd.DataFrame({
    "9th":ninth_grade_reading_scores,
    "10th":tenth_grade_reading_scores,
    "11th":eleventh_grade_reading_scores,
    "12th":twelfth_grade_reading_scores
})

In [40]:
#formatting number columns
reading_scores_by_grade_format["9th"] = reading_scores_by_grade_format["9th"].map("{:.1f}".format)
reading_scores_by_grade_format["10th"] = reading_scores_by_grade_format["10th"].map("{:.1f}".format)
reading_scores_by_grade_format["11th"] = reading_scores_by_grade_format["11th"].map("{:.1f}".format)
reading_scores_by_grade_format["12th"] = reading_scores_by_grade_format["12th"].map("{:.1f}".format)

#dropping header title
reading_scores_by_grade_format.index.name = None

In [41]:
#view reading table

reading_scores_by_grade_format

,9th,10th,11th,12th
Bailey High School,81.3,80.9,80.9,80.9
Cabrera High School,83.7,84.3,83.8,84.3
Figueroa High School,81.2,81.4,80.6,81.4
Ford High School,80.6,81.3,80.4,80.7
Griffin High School,83.4,83.7,84.3,84.0
Hernandez High School,80.9,80.7,81.4,80.9
Holden High School,83.7,83.3,83.8,84.7
Huang High School,81.3,81.5,81.4,80.3
Johnson High School,81.3,80.8,80.6,81.2
Pena High School,83.8,83.6,84.3,84.6
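
The grade-level tables above are built from four filtered frames. As an alternative, a single `groupby` on school and grade followed by `unstack` can produce the same layout. This is a sketch on invented data with two schools and two grades:

```python
import pandas as pd

# Invented scores for two schools and two grades.
toy = pd.DataFrame({
    "school_name": ["A", "A", "A", "B", "B"],
    "grade": ["9th", "9th", "10th", "9th", "10th"],
    "math_score": [70.0, 80.0, 90.0, 60.0, 85.0],
})

# Average per (school, grade), then pivot the grade level into columns.
by_grade = toy.groupby(["school_name", "grade"])["math_score"].mean().unstack("grade")

# Grade labels sort alphabetically ("10th" before "9th"), so set the column order explicitly.
by_grade = by_grade[["9th", "10th"]]

print(by_grade)
```

The explicit column list matters because the grade labels are strings; without it, "10th" would appear before "9th".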


### Group By Per Student Spending Stats

In [42]:
#define and cut by bins

spending_bins = [0,585,630,645,675]
group_names = ["<$586","$586-630","$631-645","$646-675"]

In [43]:
#add bins to bigger summary table

per_school_summary_df["Spending Ranges (Per Student)"] = pd.cut(per_school_capita, spending_bins, labels = group_names)

In [44]:
#calculate the stats we want for the new table
#select each column before mean() so the text column "School Type" is never averaged;
#observed=False keeps every spending bin in the result, even one with no schools

spending_math_scores = per_school_summary_df.groupby("Spending Ranges (Per Student)", observed=False)["Average Math Score"].mean()
spending_reading_scores = per_school_summary_df.groupby("Spending Ranges (Per Student)", observed=False)["Average Reading Score"].mean()
spending_passing_math = per_school_summary_df.groupby("Spending Ranges (Per Student)", observed=False)["% Passing Math"].mean()
spending_passing_reading = per_school_summary_df.groupby("Spending Ranges (Per Student)", observed=False)["% Passing Reading"].mean()
spending_passing_overall = per_school_summary_df.groupby("Spending Ranges (Per Student)", observed=False)["% Overall Passing"].mean()

In [45]:
#build the spending summary df

spending_summary_df = pd.DataFrame({
    "Average Math Score": spending_math_scores,
    "Average Reading Score": spending_reading_scores,
    "% Passing Math": spending_passing_math,
    "% Passing Reading": spending_passing_reading,
    "% Overall Passing": spending_passing_overall
})

In [46]:
#build the formatted df

spending_summary_df_format = pd.DataFrame({
    "Average Math Score": spending_math_scores,
    "Average Reading Score": spending_reading_scores,
    "% Passing Math": spending_passing_math,
    "% Passing Reading": spending_passing_reading,
    "% Overall Passing": spending_passing_overall
})

In [47]:
#format the spending summary df

spending_summary_df_format["Average Math Score"] = spending_summary_df_format["Average Math Score"].map("{:.1f}".format).astype("float")
spending_summary_df_format["Average Reading Score"] = spending_summary_df_format["Average Reading Score"].map("{:.1f}".format).astype("float")
spending_summary_df_format["% Passing Math"] = spending_summary_df_format["% Passing Math"].map("{:.0f}".format)
spending_summary_df_format["% Passing Reading"] = spending_summary_df_format["% Passing Reading"].map("{:.0f}".format)
spending_summary_df_format["% Overall Passing"] = spending_summary_df_format["% Overall Passing"].map("{:.0f}".format)

spending_summary_df_format

Spending Ranges (Per Student),Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
<$586,83.5,83.9,93,97,90
$586-630,81.9,83.2,87,93,81
$631-645,78.5,81.6,73,84,63
$646-675,77.0,81.0,66,81,54
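
For reference, `pd.cut` treats each bin as open on the left and closed on the right by default, so a school spending exactly $585 per student falls in the "<$586" group. A minimal sketch with the same bins and labels; the sample amounts are illustrative, with 585.0 added as a hypothetical edge case:

```python
import pandas as pd

spending_bins = [0, 585, 630, 645, 675]
group_names = ["<$586", "$586-630", "$631-645", "$646-675"]

# Illustrative per-student budgets; 585.0 sits exactly on a bin edge.
per_student = pd.Series([582.0, 585.0, 628.0, 638.0, 655.0])

# Default right=True gives the intervals (0, 585], (585, 630], (630, 645], (645, 675].
labels = pd.cut(per_student, spending_bins, labels=group_names)

print(labels.astype(str).tolist())
# ['<$586', '<$586', '$586-630', '$631-645', '$646-675']
```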


### Group by School Size Stats

In [48]:
size_bins = [0,999,1999,5000]

size_group_names = ["Small (<1000)", "Medium (1000-1999)", "Large (2000 - 5000)"]

In [49]:
#add to the school summary df

per_school_summary_df["School Size"] = pd.cut(per_school_summary_df["Total Students"],size_bins,labels = size_group_names)

In [50]:
#calculate summary stats by school size
#select each column before mean() so the text column "School Type" is never averaged;
#observed=False keeps every size bin in the result, even one with no schools

size_math_scores = per_school_summary_df.groupby("School Size", observed=False)["Average Math Score"].mean()
size_reading_scores = per_school_summary_df.groupby("School Size", observed=False)["Average Reading Score"].mean()
size_passing_math = per_school_summary_df.groupby("School Size", observed=False)["% Passing Math"].mean()
size_passing_reading = per_school_summary_df.groupby("School Size", observed=False)["% Passing Reading"].mean()
size_passing_overall = per_school_summary_df.groupby("School Size", observed=False)["% Overall Passing"].mean()

In [51]:
#build the size summary df

size_summary_df = pd.DataFrame({
    "Average Math Score": size_math_scores,
    "Average Reading Score": size_reading_scores,
    "% Passing Math": size_passing_math,
    "% Passing Reading": size_passing_reading,
    "% Overall Passing": size_passing_overall
})

In [52]:
#build the formatted size summary df

size_summary_df_format = pd.DataFrame({
    "Average Math Score": size_math_scores,
    "Average Reading Score": size_reading_scores,
    "% Passing Math": size_passing_math,
    "% Passing Reading": size_passing_reading,
    "% Overall Passing": size_passing_overall
})

In [53]:
#format the size summary df

size_summary_df_format["Average Math Score"] = size_summary_df_format["Average Math Score"].map("{:.1f}".format).astype("float")
size_summary_df_format["Average Reading Score"] = size_summary_df_format["Average Reading Score"].map("{:.1f}".format).astype("float")
size_summary_df_format["% Passing Math"] = size_summary_df_format["% Passing Math"].map("{:.0f}".format)
size_summary_df_format["% Passing Reading"] = size_summary_df_format["% Passing Reading"].map("{:.0f}".format)
size_summary_df_format["% Overall Passing"] = size_summary_df_format["% Overall Passing"].map("{:.0f}".format)

size_summary_df_format

School Size,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Small (<1000),83.8,83.9,94,96,90
Medium (1000-1999),83.4,83.9,94,97,91
Large (2000 - 5000),77.7,81.3,70,83,58


### Group by School Type Stats

In [54]:
# calculate summary stats by school type
# select each column before mean() so only numeric data is averaged

type_math_scores = per_school_summary_df.groupby("School Type")["Average Math Score"].mean()
type_reading_scores = per_school_summary_df.groupby("School Type")["Average Reading Score"].mean()
type_passing_math = per_school_summary_df.groupby("School Type")["% Passing Math"].mean()
type_passing_reading = per_school_summary_df.groupby("School Type")["% Passing Reading"].mean()
type_passing_overall = per_school_summary_df.groupby("School Type")["% Overall Passing"].mean()


In [55]:
# build the df

type_summary_df = pd.DataFrame({
    "Average Math Score": type_math_scores,
    "Average Reading Score": type_reading_scores,
    "% Passing Math": type_passing_math,
    "% Passing Reading": type_passing_reading,
    "% Overall Passing": type_passing_overall
})

In [56]:
# build the formatted df

type_summary_df_format = pd.DataFrame({
    "Average Math Score": type_math_scores,
    "Average Reading Score": type_reading_scores,
    "% Passing Math": type_passing_math,
    "% Passing Reading": type_passing_reading,
    "% Overall Passing": type_passing_overall
})

In [57]:
#format the df

type_summary_df_format["Average Math Score"] = type_summary_df_format["Average Math Score"].map("{:.1f}".format).astype("float")
type_summary_df_format["Average Reading Score"] = type_summary_df_format["Average Reading Score"].map("{:.1f}".format).astype("float")
type_summary_df_format["% Passing Math"] = type_summary_df_format["% Passing Math"].map("{:.0f}".format)
type_summary_df_format["% Passing Reading"] = type_summary_df_format["% Passing Reading"].map("{:.0f}".format)
type_summary_df_format["% Overall Passing"] = type_summary_df_format["% Overall Passing"].map("{:.0f}".format)

type_summary_df_format

School Type,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing
Charter,83.5,83.9,94,97,90
District,77.0,81.0,67,81,54
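
The three grouped summaries (spending, size, type) each compute one metric at a time and then assemble a DataFrame. As a possible shortcut, selecting a list of columns after `groupby` returns the whole summary in one step. This is a sketch on invented rows; the `metrics` list is a name introduced here for illustration:

```python
import pandas as pd

# Invented per-school rows, including a text column that must not be averaged.
toy = pd.DataFrame({
    "School Type": ["Charter", "Charter", "District"],
    "Average Math Score": [83.0, 84.0, 77.0],
    "% Overall Passing": [91.0, 90.0, 54.0],
    "Total School Budget": ["$1", "$2", "$3"],
})

# Only the listed numeric columns are averaged; the text column is left out.
metrics = ["Average Math Score", "% Overall Passing"]
type_summary = toy.groupby("School Type")[metrics].mean()

print(type_summary)
```

Because the columns are chosen before `mean()` runs, the result does not depend on how a given pandas version handles non-numeric columns.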
