### District Summary

* Create a high level snapshot (in table form) of the district's key metrics, including:
  * Total Schools
  * Total Students
  * Total Budget
  * Average Math Score
  * Average Reading Score
  * % Passing Math
  * % Passing Reading
  * Overall Passing Rate (Average of the above two)

In [132]:
# import dependencies 
import pandas as pd

In [133]:
# create path for file to load
schools_file = "./Resources/schools_complete.csv"
students_file = "./Resources/students_complete.csv"

# read csv and store in Pandas DataFrame 
schools_df = pd.read_csv(schools_file)
students_df = pd.read_csv(students_file)

In [134]:
schools_df.head()

Unnamed: 0,School ID,school_name,type,size,budget
0,0,Huang High School,District,2917,1910635
1,1,Figueroa High School,District,2949,1884411
2,2,Shelton High School,Charter,1761,1056600
3,3,Hernandez High School,District,4635,3022020
4,4,Griffin High School,Charter,1468,917500


In [135]:
students_df.head()

Unnamed: 0,Student ID,student_name,gender,grade,school_name,reading_score,math_score
0,0,Paul Bradley,M,9th,Huang High School,66,79
1,1,Victor Smith,M,12th,Huang High School,94,61
2,2,Kevin Rodriguez,M,12th,Huang High School,90,60
3,3,Dr. Richard Scott,M,12th,Huang High School,67,58
4,4,Bonnie Ray,F,9th,Huang High School,97,84


In [136]:
# Calculate the total number of unique schools in the schools DataFrame
schools_total = len(schools_df["school_name"].unique())
schools_total

15

In [137]:
# Calculate the total number of unique students in the students DataFrame
students_total = len(students_df["Student ID"].unique())
students_total

39170

In [138]:
# Calculate the Total Budget 
budget_total = schools_df["budget"].sum()
budget_total

24649428

In [139]:
# Calculate the Average Math Score 
avg_math_score = round(students_df["math_score"].mean())
avg_math_score

79

In [140]:
# Calculate the Average Reading Score
avg_reading_score = round(students_df["reading_score"].mean())
avg_reading_score

82

In [141]:
# Find the number of students passing math (score > 69)
passing_math_ttl = students_df[students_df["math_score"] > 69].count()["student_name"]

# Calculate the fraction of students passing math (formatted as a % in the summary table)
passing_math_pct = passing_math_ttl / students_total
passing_math_pct

0.749808526933878
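
As a cross-check on the filter-and-count approach above, a pass rate can also be taken as the mean of the boolean mask, since `True` counts as 1 and `False` as 0. The sketch below uses a small made-up DataFrame (`toy_df` and its scores are illustrative assumptions, not values from `students_complete.csv`). Note that this divides by the number of rows, so it matches the calculation above only when each row is one distinct student.

```python
import pandas as pd

# Hypothetical toy data, not taken from students_complete.csv
toy_df = pd.DataFrame({"math_score": [58, 70, 84, 69, 91]})

# The comparison yields a boolean Series; its mean is the fraction of True values
toy_passing_math_pct = (toy_df["math_score"] > 69).mean()

# Same result via the filter-and-count approach used in the notebook
toy_passing_math_ttl = toy_df[toy_df["math_score"] > 69].count()["math_score"]
toy_check = toy_passing_math_ttl / len(toy_df)

print(toy_passing_math_pct, toy_check)
```

A score of exactly 69 does not pass under the `> 69` rule, which is why only three of the five toy rows count.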

In [142]:
# Find the number of students passing reading (score > 69)
passing_reading_ttl = students_df[students_df["reading_score"] > 69].count()["student_name"]

# Calculate the fraction of students passing reading (formatted as a % in the summary table)
passing_reading_pct = passing_reading_ttl / students_total
passing_reading_pct

0.8580546336482001

In [143]:
# Calculate Overall Passing Rate (Average of the above two)
overall_passing = (passing_math_pct + passing_reading_pct)/2
overall_passing

0.8039315802910391

In [154]:
# Create the District Summary DataFrame: a one-row table giving a "high level snapshot"
district_summary = pd.DataFrame({"Total Schools": [schools_total],
                                 "Total Student": [students_total],
                                 "Total Budget": [budget_total],
                                 "Average Math Score": [avg_math_score],
                                 "Average Reading Score": [avg_reading_score],
                                 "% Passing Math": [passing_math_pct],
                                 "% Passing Reading": [passing_reading_pct],
                                 "Overall Passing Rate": [overall_passing]})

# Display the three passing rates as whole-number percentages
district_summary = district_summary.style.format({
    '% Passing Math': '{:,.0%}'.format,
    '% Passing Reading': '{:,.0%}'.format,
    'Overall Passing Rate': '{:,.0%}'.format})

district_summary

Unnamed: 0,Total Schools,Total Student,Total Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,Overall Passing Rate
0,15,39170,24649428,79,82,75%,86%,80%
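
One thing to keep in mind: `DataFrame.style.format(...)` returns a `Styler` object rather than a DataFrame, so after the reassignment above `district_summary` no longer supports ordinary DataFrame operations. If the numeric values are needed later, one option is to keep the numeric table and build a separate display copy. The sketch below is only an illustration: the names `summary_numeric` and `summary_display` are assumptions, and the three values are copied from the outputs above.

```python
import pandas as pd

# Values copied from the cell outputs above; the variable names are illustrative
summary_numeric = pd.DataFrame({
    "% Passing Math": [0.749808526933878],
    "% Passing Reading": [0.8580546336482001],
    "Overall Passing Rate": [0.8039315802910391],
})

# Build a display copy holding percentage strings; summary_numeric stays numeric
summary_display = summary_numeric.copy()
for col in summary_display.columns:
    summary_display[col] = summary_display[col].map("{:.0%}".format)

print(summary_display)
```

The trade-off is that `summary_display` holds strings, so any further arithmetic should use `summary_numeric`.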


### School Summary

* Create an overview table that summarizes key metrics about each school, including:
  * School Name
  * School Type
  * Total Students
  * Total School Budget
  * Per Student Budget
  * Average Math Score
  * Average Reading Score
  * % Passing Math
  * % Passing Reading
  * Overall Passing Rate (Average of the above two)

In [145]:
# Merge the two DataFrames on school_name (pd.merge defaults to an inner join)
merged_data = pd.merge(schools_df, students_df, on="school_name")
merged_data.head()

Unnamed: 0,School ID,school_name,type,size,budget,Student ID,student_name,gender,grade,reading_score,math_score
0,0,Huang High School,District,2917,1910635,0,Paul Bradley,M,9th,66,79
1,0,Huang High School,District,2917,1910635,1,Victor Smith,M,12th,94,61
2,0,Huang High School,District,2917,1910635,2,Kevin Rodriguez,M,12th,90,60
3,0,Huang High School,District,2917,1910635,3,Dr. Richard Scott,M,12th,67,58
4,0,Huang High School,District,2917,1910635,4,Bonnie Ray,F,9th,97,84


In [146]:
school_counts = merged_data["school_name"].value_counts()
school_counts

Bailey High School       4976
Johnson High School      4761
Hernandez High School    4635
Rodriguez High School    3999
Figueroa High School     2949
Huang High School        2917
Ford High School         2739
Wilson High School       2283
Cabrera High School      1858
Wright High School       1800
Shelton High School      1761
Thomas High School       1635
Griffin High School      1468
Pena High School          962
Holden High School        427
Name: school_name, dtype: int64

In [147]:
school_counts.sum()

39170

In [148]:
# Group the merged data by school name 
groupby_school = merged_data.groupby(['school_name'])

In [149]:
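# Note: len() of a grouped result counts the groups (one per school), not the students,
# so this returns the number of schools rather than a student total
grp_ttl_students = len(groupby_school["student_name"].unique())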
grp_ttl_students

15
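
The result of 15 is the number of groups (schools), because `len()` counts the entries of the grouped result, one per school. To get a student count for each school instead, one possibility is `nunique()` on the ID column or `size()` on the groups. The sketch below uses a made-up table (`toy_merged` and its rows are illustrative assumptions) in which two students at the same school share a name, to show why counting unique names can undercount.

```python
import pandas as pd

# Hypothetical toy data: two different students at "A High" share the name "Ann Lee"
toy_merged = pd.DataFrame({
    "school_name": ["A High", "A High", "B High", "B High", "B High"],
    "Student ID": [0, 1, 2, 3, 4],
    "student_name": ["Ann Lee", "Ann Lee", "Cy Roy", "Di Fox", "Ed Poe"],
})
toy_groups = toy_merged.groupby("school_name")

# len() of the grouped result is the number of groups, i.e. schools
n_schools = len(toy_groups["student_name"].unique())

# Per-school counts: unique IDs, unique names, and raw row counts
students_per_school = toy_groups["Student ID"].nunique()
names_per_school = toy_groups["student_name"].nunique()
rows_per_school = toy_groups.size()

print(n_schools)
print(students_per_school)
print(names_per_school)
```

In this toy table, counting unique names gives 1 for "A High" while counting unique IDs gives 2, so an ID column is the safer key when names may repeat.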