### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [75]:
# Dependencies and Setup
import pandas as pd
from pathlib import Path

# File to Load (Remember to Change These)
school_data_to_load = Path("Resources/schools_complete.csv")
student_data_to_load = Path("Resources/students_complete.csv")

# Read School and Student Data File and store into Pandas DataFrames
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)

# Combine the data into a single dataset (one row per student, with that student's school details)
school_data_complete = pd.merge(student_data, school_data, how="left", on="school_name")

# Display the first five rows
school_data_complete.head()




| | Student ID | student_name | gender | year | school_name | reading_score | maths_score | School ID | type | size | budget |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Paul Bradley | M | 9 | Huang High School | 96 | 94 | 0 | Government | 2917 | 1910635 |
| 1 | 1 | Victor Smith | M | 12 | Huang High School | 90 | 43 | 0 | Government | 2917 | 1910635 |
| 2 | 2 | Kevin Rodriguez | M | 12 | Huang High School | 41 | 76 | 0 | Government | 2917 | 1910635 |
| 3 | 3 | Richard Scott | M | 12 | Huang High School | 89 | 86 | 0 | Government | 2917 | 1910635 |
| 4 | 4 | Bonnie Ray | F | 9 | Huang High School | 87 | 69 | 0 | Government | 2917 | 1910635 |
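A left merge keeps every student row even when no school matches, which would silently leave `NaN` school details. A minimal sketch, using small made-up frames in place of the two CSV files, of checking the join with the `validate` and `indicator` options of `pd.merge`:

```python
import pandas as pd

# Hypothetical toy frames standing in for schools_complete.csv and students_complete.csv
school_data = pd.DataFrame({
    "school_name": ["Huang High School", "Griffin High School"],
    "type": ["Government", "Independent"],
    "budget": [1000, 600],
})
student_data = pd.DataFrame({
    "student_name": ["Paul Bradley", "Victor Smith", "Bonnie Ray"],
    "school_name": ["Huang High School", "Huang High School", "Griffin High School"],
    "maths_score": [94, 43, 69],
})

# validate="many_to_one" raises a MergeError if a school_name appears more than
# once in school_data; indicator=True adds a "_merge" column recording the match.
school_data_complete = pd.merge(
    student_data, school_data, how="left", on="school_name",
    validate="many_to_one", indicator=True,
)

# Every student should have found a matching school row.
unmatched = (school_data_complete["_merge"] != "both").sum()
school_data_complete = school_data_complete.drop(columns="_merge")
print(unmatched)  # 0
```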


## Local Government Area Summary

* Calculate the total number of schools

* Calculate the total number of students

* Calculate the total budget

* Calculate the average maths score 

* Calculate the average reading score

* Calculate the percentage of students with a passing maths score (50 or greater)

* Calculate the percentage of students with a passing reading score (50 or greater)

* Calculate the percentage of students who passed maths **and** reading (% Overall Passing)

* Create a dataframe to hold the above results

* Optional: give the displayed data cleaner formatting

In [76]:
# Calculations and variables

school_count = len(school_data_complete["school_name"].unique())
student_count = school_data_complete["Student ID"].count()
total_budget = school_data["budget"].sum()
average_maths_score = school_data_complete["maths_score"].mean()
average_reading_score = school_data_complete["reading_score"].mean()

passing_maths = school_data_complete[school_data_complete["maths_score"] >= 50]
passing_maths_count = passing_maths["student_name"].count()
percent_passing_maths = passing_maths_count / student_count * 100

passing_reading = school_data_complete[school_data_complete["reading_score"] >= 50]
passing_reading_count = passing_reading["student_name"].count()
percent_passing_reading = passing_reading_count / student_count * 100

passing_maths_reading = school_data_complete[(school_data_complete["maths_score"] >= 50) & (school_data_complete["reading_score"] >= 50)]
passing_maths_reading_count = passing_maths_reading["student_name"].count()
percent_passing_maths_reading = passing_maths_reading_count / student_count * 100

# Create a summary dataframe
summary_df = pd.DataFrame({
    "Total Schools": [school_count],
    "Total Students": [student_count],
    "Total Budget": [total_budget],
    "Average Maths Score": [average_maths_score],
    "Average Reading Score": [average_reading_score],
    "% Passing Math": [percent_passing_maths],
    "% Passing Reading": [percent_passing_reading],
    "% Overall Passing": [percent_passing_maths_reading]
})

# Format columns
summary_df["Total Students"] = summary_df["Total Students"].map("{:,}".format)
summary_df["Total Budget"] = summary_df["Total Budget"].map("${:,.2f}".format)

# Display summary dataframe
summary_df




| | Total Schools | Total Students | Total Budget | Average Maths Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|---|---|---|
| 0 | 15 | 39170 | $24,649,428.00 | 70.338192 | 69.980138 | 86.078632 | 84.426857 | 72.808272 |
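Because the mean of a boolean Series is the fraction of `True` values, each pass rate above can also be computed in one step, without filtering and counting `student_name`. A sketch on hypothetical scores, using the 50-mark threshold from the task:

```python
import pandas as pd

# Hypothetical toy scores; the notebook itself works on school_data_complete.
scores = pd.DataFrame({
    "maths_score": [94, 43, 76, 86],
    "reading_score": [96, 90, 41, 89],
})

PASS_MARK = 50  # passing threshold stated in the task (50 or greater)

passed_maths = scores["maths_score"] >= PASS_MARK
passed_reading = scores["reading_score"] >= PASS_MARK

# The mean of a boolean Series is the share of True values.
percent_passing_maths = passed_maths.mean() * 100
percent_passing_reading = passed_reading.mean() * 100
percent_overall_passing = (passed_maths & passed_reading).mean() * 100

print(percent_passing_maths, percent_passing_reading, percent_overall_passing)  # 75.0 75.0 50.0
```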


## School Summary

* Create an overview table that summarises key metrics about each school, including:
  * School Name
  * School Type
  * Total Students
  * Total School Budget
  * Per Student Budget
  * Average Maths Score
  * Average Reading Score
  * % Passing Maths
  * % Passing Reading
  * % Overall Passing (The percentage of students that passed maths **and** reading.)
  
* Create a dataframe to hold the above results

In [77]:
# Get each school's type from school_data, indexed by school_name
per_school_types = school_data.set_index(["school_name"])["type"]

# Get each school's total student count
per_school_counts = school_data_complete["school_name"].value_counts()

# Get each school's total school budget using the school_data DF
per_school_budget = school_data.groupby(["school_name"])["budget"].mean()

# Calculate the per capita spending
per_school_capita = per_school_budget / per_school_counts

# Calculate each school's average maths and reading scores
student_school_scores = school_data_complete.groupby(["school_name"])[["maths_score", "reading_score"]].mean()

# Filter to students with a passing maths score and, separately, a passing reading score (50 or greater)
per_school_passing_maths = school_data_complete[(school_data_complete["maths_score"] >= 50)]
per_school_passing_reading = school_data_complete[(school_data_complete["reading_score"] >= 50)]

# Count the students passing maths and passing reading at each school
per_school_passing_maths = per_school_passing_maths.groupby(["school_name"]).count()["student_name"]
per_school_passing_reading = per_school_passing_reading.groupby(["school_name"]).count()["student_name"]

# Convert the counts to percentages of each school's enrolment
per_school_passing_maths = per_school_passing_maths / per_school_counts * 100
per_school_passing_reading = per_school_passing_reading / per_school_counts * 100

# Filter to students who passed both maths and reading
per_passing_maths_reading = school_data_complete[(school_data_complete["maths_score"] >= 50) & (school_data_complete["reading_score"] >= 50)]

# Count the students who passed both at each school
per_passing_maths_reading = per_passing_maths_reading.groupby(["school_name"]).count()["student_name"]

# Calculate the overall passing percentage
per_overall_passing_percentage = per_passing_maths_reading / per_school_counts * 100

# Assemble the per-school Series into one DataFrame; pandas aligns them on school_name

per_school_summary_df = pd.DataFrame({
    "School Type": per_school_types,
    "Total Students": per_school_counts,
    "Total School Budget": per_school_budget,
    "Per Student Budget": per_school_capita,
    "Average Math Score": student_school_scores["maths_score"],
    "Average Reading Score": student_school_scores["reading_score"],
    "% Passing Math": per_school_passing_maths,
    "% Passing Reading": per_school_passing_reading,
    "% Overall Passing": per_overall_passing_percentage})

# Format columns
per_school_summary_df["Total School Budget"] = per_school_summary_df["Total School Budget"].map("${:,.0f}".format)
per_school_summary_df["Per Student Budget"] = per_school_summary_df["Per Student Budget"].map("${:,.0f}".format)
per_school_summary_df["% Passing Math"] = per_school_summary_df["% Passing Math"].map("{:.1f}%".format)
per_school_summary_df["% Passing Reading"] = per_school_summary_df["% Passing Reading"].map("{:.1f}%".format)
per_school_summary_df["% Overall Passing"] = per_school_summary_df["% Overall Passing"].map("{:.1f}%".format)

# Display summary dataframe
per_school_summary_df






| | School Type | Total Students | Total School Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|---|---|---|---|
| Bailey High School | Government | 4976 | $3,124,928 | $628 | 72.352894 | 71.008842 | 91.6% | 87.4% | 80.1% |
| Cabrera High School | Independent | 1858 | $1,081,356 | $582 | 71.657158 | 71.359526 | 90.9% | 89.1% | 80.8% |
| Figueroa High School | Government | 2949 | $1,884,411 | $639 | 68.698542 | 69.077993 | 81.7% | 82.8% | 67.7% |
| Ford High School | Government | 2739 | $1,763,916 | $644 | 69.091274 | 69.572472 | 82.4% | 82.2% | 67.5% |
| Griffin High School | Independent | 1468 | $917,500 | $625 | 71.788147 | 71.245232 | 91.2% | 88.5% | 81.3% |
| Hernandez High School | Government | 4635 | $3,022,020 | $652 | 68.874865 | 69.186408 | 80.9% | 81.9% | 66.4% |
| Holden High School | Independent | 427 | $248,087 | $581 | 72.583138 | 71.660422 | 89.9% | 88.5% | 78.9% |
| Huang High School | Government | 2917 | $1,910,635 | $655 | 68.935207 | 68.910525 | 81.7% | 81.5% | 66.7% |
| Johnson High School | Government | 4761 | $3,094,650 | $650 | 68.8431 | 69.039277 | 82.1% | 82.0% | 67.2% |
| Pena High School | Independent | 962 | $585,858 | $609 | 72.088358 | 71.613306 | 91.7% | 86.6% | 79.2% |
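The per-school figures can also be built in a single `groupby` with named aggregation, keeping every value numeric until display. A sketch on a hypothetical five-student frame; it assumes, as after the left merge, that each school's `type` and `budget` repeat on every student row, and the short column names are illustrative only:

```python
import pandas as pd

# Hypothetical student-level rows shaped like school_data_complete.
students = pd.DataFrame({
    "school_name": ["A", "A", "A", "B", "B"],
    "type": ["Government"] * 3 + ["Independent"] * 2,
    "budget": [1200] * 3 + [1000] * 2,
    "maths_score": [94, 43, 76, 86, 60],
    "reading_score": [96, 90, 41, 89, 45],
})

PASS_MARK = 50

# Boolean helper columns turn every pass rate into a plain mean.
students = students.assign(
    pass_maths=students["maths_score"] >= PASS_MARK,
    pass_reading=students["reading_score"] >= PASS_MARK,
)
students["pass_both"] = students["pass_maths"] & students["pass_reading"]

per_school = students.groupby("school_name").agg(
    school_type=("type", "first"),           # repeated on every row, so take one
    total_students=("maths_score", "count"),  # non-null scores = students here
    total_budget=("budget", "first"),
    avg_maths=("maths_score", "mean"),
    avg_reading=("reading_score", "mean"),
    pct_pass_maths=("pass_maths", "mean"),
    pct_pass_reading=("pass_reading", "mean"),
    pct_overall=("pass_both", "mean"),
)
per_school["per_student_budget"] = per_school["total_budget"] / per_school["total_students"]

# Convert the boolean means (fractions) to percentages.
pct_cols = ["pct_pass_maths", "pct_pass_reading", "pct_overall"]
per_school[pct_cols] = per_school[pct_cols] * 100
print(per_school)
```

Keeping the columns numeric means later sorting and binning compare numbers; strings such as `"80.1%"` can be produced only for display, for example with `DataFrame.style.format`.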


## Top Performing Schools (By % Overall Passing)

* Sort and display the top five performing schools by % overall passing.

In [78]:
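# Highest overall passing rate first.
# "% Overall Passing" holds formatted strings such as "80.1%" at this point, so this is a text sort;
# it orders correctly here only because every value has two digits before the decimal point.
top_schools = per_school_summary_df.sort_values(["% Overall Passing"], ascending=False)
top_schools.head()

| | School Type | Total Students | Total School Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|---|---|---|---|
| Griffin High School | Independent | 1468 | $917,500 | $625 | 71.788147 | 71.245232 | 91.2% | 88.5% | 81.3% |
| Cabrera High School | Independent | 1858 | $1,081,356 | $582 | 71.657158 | 71.359526 | 90.9% | 89.1% | 80.8% |
| Bailey High School | Government | 4976 | $3,124,928 | $628 | 72.352894 | 71.008842 | 91.6% | 87.4% | 80.1% |
| Wright High School | Independent | 1800 | $1,049,400 | $583 | 72.047222 | 70.969444 | 91.8% | 86.7% | 79.7% |
| Rodriguez High School | Government | 3999 | $2,547,363 | $637 | 72.047762 | 70.935984 | 90.8% | 87.4% | 79.4% |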


## Bottom Performing Schools (By % Overall Passing)

* Sort and display the five worst-performing schools by % overall passing.

In [79]:
# Lowest overall passing rate first (same text-sort caveat as the top-schools cell).
bottom_schools = per_school_summary_df.sort_values(["% Overall Passing"], ascending=True)
bottom_schools.head()

| | School Type | Total Students | Total School Budget | Per Student Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|---|---|---|---|
| Hernandez High School | Government | 4635 | $3,022,020 | $652 | 68.874865 | 69.186408 | 80.9% | 81.9% | 66.4% |
| Huang High School | Government | 2917 | $1,910,635 | $655 | 68.935207 | 68.910525 | 81.7% | 81.5% | 66.7% |
| Johnson High School | Government | 4761 | $3,094,650 | $650 | 68.8431 | 69.039277 | 82.1% | 82.0% | 67.2% |
| Ford High School | Government | 2739 | $1,763,916 | $644 | 69.091274 | 69.572472 | 82.4% | 82.2% | 67.5% |
| Wilson High School | Independent | 2283 | $1,319,574 | $578 | 69.170828 | 68.876916 | 82.8% | 81.3% | 67.5% |
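Sorting already-formatted percentage strings compares characters, not numbers. The values in this notebook all fall between 66% and 82%, so a text sort happens to give the right order, but it would not for values such as 9.5% or 100.0%. A sketch with hypothetical schools, ranking the numeric values with `nlargest` and `nsmallest` instead:

```python
import pandas as pd

# Hypothetical overall pass rates, kept as numbers.
overall = pd.Series(
    {"School A": 81.3, "School B": 66.4, "School C": 9.5, "School D": 100.0},
    name="% Overall Passing",
)

# Once formatted, the values are strings and sort character by character.
as_text = overall.map("{:.1f}%".format)
text_order = as_text.sort_values(ascending=False).index.tolist()
print(text_order)  # ['School C', 'School A', 'School B', 'School D']

# Ranking the numeric values gives the intended order.
top = overall.nlargest(2)
bottom = overall.nsmallest(2)
print(top.index.tolist())     # ['School D', 'School A']
print(bottom.index.tolist())  # ['School C', 'School B']
```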


## Maths Scores by Year

* Create a table that lists the average maths score for students of each year level (9, 10, 11, 12) at each school.

  * Create a pandas series for each year. Hint: use a conditional statement.
  
  * Group each series by school
  
  * Combine the series into a dataframe
  
  * Optional: give the displayed data cleaner formatting

In [85]:
# Average maths score for each year level at each school.
# "year" is read from the CSV as an integer, so compare against 9, not the string '9'.
ninth_math = school_data_complete.loc[school_data_complete["year"] == 9].groupby("school_name")["maths_score"].mean()
tenth_math = school_data_complete.loc[school_data_complete["year"] == 10].groupby("school_name")["maths_score"].mean()
eleventh_math = school_data_complete.loc[school_data_complete["year"] == 11].groupby("school_name")["maths_score"].mean()
twelfth_math = school_data_complete.loc[school_data_complete["year"] == 12].groupby("school_name")["maths_score"].mean()

maths_scores = pd.DataFrame({
    "9": ninth_math,
    "10": tenth_math,
    "11": eleventh_math,
    "12": twelfth_math,
})

# Display with one decimal place
maths_scores.style.format({"9": "{:.1f}", "10": "{:.1f}", "11": "{:.1f}", "12": "{:.1f}"})
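Filtering `year` with a string such as `'9'` returns no rows, because `read_csv` parses the `year` column as integers and an integer never equals a string. A small illustration on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows; like read_csv on the student file, "year" holds integers.
df = pd.DataFrame({
    "school_name": ["School A", "School A", "School B"],
    "year": [9, 10, 9],
    "maths_score": [94, 43, 76],
})

# An integer column never equals the string "9", so this filter keeps no rows.
string_match = df.loc[df["year"] == "9"]
print(len(string_match))  # 0

# Comparing with the integer 9 selects the intended rows.
ninth_math = df.loc[df["year"] == 9].groupby("school_name")["maths_score"].mean()
print(ninth_math.to_dict())  # {'School A': 94.0, 'School B': 76.0}
```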


## Reading Score by Year

* Perform the same operations as above for reading scores
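The reading table can follow the same four-filter pattern with `reading_score`. As an alternative, `pivot_table` builds the whole school-by-year grid in one call; a sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows shaped like school_data_complete.
df = pd.DataFrame({
    "school_name": ["School A", "School A", "School A", "School B", "School B"],
    "year": [9, 9, 10, 9, 12],
    "reading_score": [96, 90, 41, 89, 70],
})

# One row per school, one column per year, mean reading score in each cell.
reading_scores = df.pivot_table(
    index="school_name", columns="year", values="reading_score", aggfunc="mean"
)

# Keep all four year levels even if a year has no students in this sample (shown as NaN).
reading_scores = reading_scores.reindex(columns=[9, 10, 11, 12])
print(reading_scores.round(1))
```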

## Scores by School Spending

* Create a table that breaks down school performance by average per-student spending range. Use four reasonable bins to group schools by spending, and include each of the following in the table:
  * Average Maths Score
  * Average Reading Score
  * % Passing Maths
  * % Passing Reading
  * % Overall Passing (the percentage of students who passed maths **and** reading)
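One way to build the spending table is to bin each school's per-student budget with `pd.cut` and average the school-level metrics within each bin. A sketch with hypothetical per-school values; the bin edges and labels are assumptions chosen for illustration, not values given in the task:

```python
import pandas as pd

# Hypothetical per-school metrics, kept numeric (not yet formatted as strings).
per_school = pd.DataFrame(
    {
        "per_student_budget": [578, 582, 625, 644, 655],
        "avg_maths": [69.2, 71.7, 71.8, 69.1, 68.9],
        "pct_overall": [67.5, 80.8, 81.3, 67.5, 66.7],
    },
    index=["School V", "School W", "School X", "School Y", "School Z"],
)

# Four illustrative, right-inclusive bins: (0, 585], (585, 630], (630, 645], (645, 680].
spending_bins = [0, 585, 630, 645, 680]
spending_labels = ["<= $585", "$585-630", "$630-645", "$645-680"]

per_school["spending_range"] = pd.cut(
    per_school["per_student_budget"], bins=spending_bins, labels=spending_labels
)

# Average the school-level metrics within each spending range.
# observed=False keeps every bin in the result, even one with no schools.
spending_summary = per_school.groupby("spending_range", observed=False)[
    ["avg_maths", "pct_overall"]
].mean()
print(spending_summary)
```

Averaging school-level percentages counts each school equally regardless of enrolment; grouping the student-level rows by the same bins would weight schools by their number of students instead.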

## Scores by School Size

* Perform the same operations as above, based on school size.
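School size can be binned the same way. This sketch bins the student-level rows rather than the school rows, so each mean is taken over students and larger schools carry more weight. The sizes, scores, bin edges and labels are all hypothetical:

```python
import pandas as pd

# Hypothetical student-level rows with the school's size repeated on each row.
students = pd.DataFrame({
    "size": [427, 427, 1858, 1858, 4976],
    "maths_score": [80, 45, 70, 60, 90],
    "reading_score": [75, 55, 40, 65, 85],
})

# Illustrative, right-inclusive size bands (assumed, not from the task).
size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small (<=1000)", "Medium (1001-2000)", "Large (2001-5000)"]
students["school_size"] = pd.cut(students["size"], bins=size_bins, labels=size_labels)

# Boolean pass flags using the task's threshold of 50.
students["pass_maths"] = students["maths_score"] >= 50
students["pass_reading"] = students["reading_score"] >= 50
students["pass_both"] = students["pass_maths"] & students["pass_reading"]

# Grouping student rows (not school rows) weights each school by enrolment.
size_summary = students.groupby("school_size", observed=False).agg(
    avg_maths=("maths_score", "mean"),
    avg_reading=("reading_score", "mean"),
    pct_pass_maths=("pass_maths", "mean"),
    pct_pass_reading=("pass_reading", "mean"),
    pct_overall=("pass_both", "mean"),
)
pct_cols = ["pct_pass_maths", "pct_pass_reading", "pct_overall"]
size_summary[pct_cols] = size_summary[pct_cols] * 100
print(size_summary)
```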

## Scores by School Type

* Perform the same operations as above, based on school type.
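School type needs no binning, since each school already has a category. A sketch grouping hypothetical per-school values (the `school_type`, `avg_maths` and `pct_overall` column names are illustrative):

```python
import pandas as pd

# Hypothetical per-school rows (numeric, unformatted).
per_school = pd.DataFrame({
    "school_type": ["Government", "Independent", "Government", "Independent"],
    "avg_maths": [72.4, 71.7, 68.7, 72.6],
    "pct_overall": [80.1, 80.8, 67.7, 78.9],
})

# The type column is already categorical, so group on it directly.
type_summary = per_school.groupby("school_type")[["avg_maths", "pct_overall"]].mean()
print(type_summary.round(2))
```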