## PyCitySchools

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [None]:
# Import dependencies
import pandas as pd

# Create references for each CSV file
school_data_to_load = "Resources/schools_complete.csv"
student_data_to_load = "Resources/students_complete.csv"

# Read each CSV into a pandas dataframe
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)

# Combine both dataframes into a single dataframe, keeping every student row
complete_df = pd.merge(student_data, school_data, how="left", on="school_name")
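A left merge silently duplicates student rows if a school name appears twice in the school table, and leaves blanks if a student's school is missing from it. The sketch below shows one way to guard against both, using toy frames with made-up names and values rather than the real CSV files. `validate` and `indicator` are standard `pd.merge` parameters; everything else here is illustrative.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (hypothetical values, not the real data)
schools = pd.DataFrame({
    "school_name": ["Alpha High", "Beta High"],
    "type": ["District", "Charter"],
})
students = pd.DataFrame({
    "student_name": ["Ann", "Bob", "Cy"],
    "school_name": ["Alpha High", "Beta High", "Alpha High"],
})

# validate="many_to_one" raises a MergeError if a school name is duplicated in `schools`;
# indicator=True adds a "_merge" column recording where each row found a match
merged = pd.merge(students, schools, how="left", on="school_name",
                  validate="many_to_one", indicator=True)

# Students whose school has no entry in `schools` would be flagged "left_only"
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), "rows merged,", len(unmatched), "unmatched")
```

If `unmatched` is empty and the merged row count equals the student row count, the combined dataframe is safe to use for the calculations that follow.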

## District Summary

* Calculate the total number of schools

* Calculate the total number of students

* Calculate the total budget

* Calculate the average math score 

* Calculate the average reading score

* Calculate the percentage of students with a passing math score (70 or greater)

* Calculate the percentage of students with a passing reading score (70 or greater)

* Calculate the percentage of students who passed math **and** reading (% Overall Passing)

* Create a dataframe to hold the above results

* Optional: give the displayed data cleaner formatting

In [None]:
# Check for incomplete rows: every column should report the same non-null count
complete_df.count()
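`count()` reports non-null values per column, so a gap only shows up as a smaller number. As a possible complement, the sketch below counts missing values directly and pulls out the affected rows. The frame is a toy example with made-up values.

```python
import pandas as pd

# Toy frame with one deliberately missing math score (hypothetical values)
toy = pd.DataFrame({
    "student_name": ["Ann", "Bob", "Cy"],
    "math_score": [80.0, float("nan"), 65.0],
    "reading_score": [90, 70, 88],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# Rows that have at least one missing value
incomplete_rows = toy[toy.isna().any(axis=1)]

print(missing_per_column)
print(incomplete_rows)
```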

In [None]:
# Declare variables, count and calculate totals, and store values
total_schools = school_data["School ID"].count()
total_students = student_data["Student ID"].count()
total_budget = school_data["budget"].sum()

In [None]:
# Declare variables, calculate averages, and store values
average_math = complete_df["math_score"].mean()
average_reading = complete_df["reading_score"].mean()

In [None]:
# Filter passing rows (score >= 70) with .loc, count them, and divide by the student total
# Note: the results are fractions between 0 and 1, not values out of 100
passing_math_scores = complete_df.loc[complete_df["math_score"] >= 70, :]
passed_math = passing_math_scores["Student ID"].count()
math_percentage = passed_math / total_students

passing_reading_scores = complete_df.loc[complete_df["reading_score"] >= 70, :]
passed_reading = passing_reading_scores["Student ID"].count()
reading_percentage = passed_reading / total_students

In [None]:
# Filter students who passed both subjects with .loc, then count them by "Student ID"
passing_both = complete_df.loc[(complete_df["math_score"] >= 70) & (complete_df["reading_score"] >= 70), :]
passed_both = passing_both["Student ID"].count()

In [None]:
# Calculate the fraction of all students who passed both math and reading, and store the value
passed_both_percentage = passed_both / total_students
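The filter-count-divide steps above can also be written more compactly: the mean of a boolean Series is the fraction of `True` values. The sketch below illustrates this on a toy frame with made-up scores. The column names match the notebook, the values do not come from the real data.

```python
import pandas as pd

# Toy scores (hypothetical values)
toy = pd.DataFrame({
    "math_score": [80, 65, 70, 90],
    "reading_score": [90, 72, 60, 85],
})

# Boolean masks: True where the score is 70 or greater
passed_math = toy["math_score"] >= 70
passed_reading = toy["reading_score"] >= 70

# The mean of a boolean Series is the share of True values
math_rate = passed_math.mean()
reading_rate = passed_reading.mean()
both_rate = (passed_math & passed_reading).mean()

print(math_rate, reading_rate, both_rate)
```

Like the cells above, this produces fractions between 0 and 1. Multiply by 100 or apply a percent format for display.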

In [None]:
# Create a dataframe to hold the above results and print to screen
unformatted_df = pd.DataFrame({"Total Schools":[total_schools],
                               "Total Students":[total_students],
                               "Total Budget":[total_budget],
                               "Average Math Score":[average_math],
                               "Average Reading Score":[average_reading],
                               "% Passing Math":[math_percentage],
                               "% Passing Reading":[reading_percentage],
                               "% Overall Passing":[passed_both_percentage]})
unformatted_df

In [None]:
# Optional: format the values in unformatted_df for display, or apply formats when building the dataframe above
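One way to handle the optional formatting step is to copy the numeric table and map Python format strings over the copy, so the original numbers stay available for later calculations. The sketch below uses a toy one-row summary with made-up values and assumes the passing columns hold fractions between 0 and 1.

```python
import pandas as pd

# Toy summary row (hypothetical values)
summary = pd.DataFrame({
    "Total Students": [1200],
    "Total Budget": [750000.0],
    "% Passing Math": [0.8125],
})

# Format a copy so the numeric original stays usable
formatted = summary.copy()
formatted["Total Students"] = formatted["Total Students"].map("{:,}".format)
formatted["Total Budget"] = formatted["Total Budget"].map("${:,.2f}".format)
# The % format spec multiplies by 100 and appends a percent sign
formatted["% Passing Math"] = formatted["% Passing Math"].map("{:.2%}".format)

print(formatted)
```

The formatted columns become strings, so sorting or binning should be done on the unformatted dataframe.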

## School Summary

* Create an overview table that summarizes key metrics about each school, including:
  * School Name
  * School Type
  * Total Students
  * Total School Budget
  * Per Student Budget
  * Average Math Score
  * Average Reading Score
  * % Passing Math
  * % Passing Reading
  * % Overall Passing (The percentage of students that passed math **and** reading.)
  
* Create a dataframe to hold the above results

In [None]:
# Create a new dataframe indexed by school name and sort it alphabetically
overview_df = school_data[["school_name", "type", "size", "budget"]].set_index("school_name").sort_index()
overview_df.index.name = None
overview_df

In [None]:
# Rename columns using .rename (school_name is already the index, so it is not listed here)
renamed_overview_df = overview_df.rename(columns={"type":"School Type",
                                                  "size":"Total Students",
                                                  "budget":"Total School Budget"})
renamed_overview_df.index.name = None
renamed_overview_df

In [None]:
# Declare variables, extract data from each row of dataframe, calculate Per Student Budget, and store values for new column
students_per_school = renamed_overview_df["Total Students"]
budget_per_school = renamed_overview_df["Total School Budget"]
budget_per_student = budget_per_school / students_per_school
budget_per_student

In [None]:
# Group the complete dataframe by school name; .head() on a GroupBy shows the first five rows of each school
grouped_by_school_df = complete_df.groupby("school_name")
grouped_by_school_df.head()

In [None]:
# Declare variables, calculate averages by school using .mean, and store values
per_school_math_average = grouped_by_school_df["math_score"].mean()
per_school_reading_average = grouped_by_school_df["reading_score"].mean()
per_school_math_average

In [None]:
per_school_reading_average

In [None]:
# Declare variable, count number of students per school, and store values
per_school_total_students = grouped_by_school_df["Student ID"].count()
per_school_total_students

In [None]:
# Math scores per school
# Declare variables, apply conditional statements, calculate percentages, and store values
per_school_passing_math = complete_df[complete_df["math_score"] >= 70].groupby(["school_name"])
per_school_passed_math = per_school_passing_math["Student ID"].count()
per_school_math_percentage = per_school_passed_math / per_school_total_students
per_school_math_percentage

In [None]:
# Reading scores per school
# Declare variables, apply conditional statements, calculate percentages, and store values
per_school_passing_reading = complete_df[complete_df["reading_score"] >= 70].groupby(["school_name"])
per_school_passed_reading = per_school_passing_reading["Student ID"].count()
per_school_reading_percentage = per_school_passed_reading / per_school_total_students
per_school_reading_percentage

In [None]:
# Both math and reading scores per school
# Declare variable, apply conditional statement, and store value
per_school_passing_both = complete_df[(complete_df["math_score"] >= 70) & 
                                      (complete_df["reading_score"] >= 70)].groupby(["school_name"])
per_school_passed_both = per_school_passing_both["Student ID"].count()

# Declare variable, calculate percentage, and store value
per_school_passed_both_percentage = per_school_passed_both / per_school_total_students
per_school_passed_both_percentage
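A possible alternative to filtering first and grouping second is to add boolean pass flags and take their mean within each school. The sketch below uses toy data with made-up school names and scores. The flag column names (`passed_math`, `passed_reading`, `passed_both`) are hypothetical.

```python
import pandas as pd

# Toy student rows (hypothetical values)
toy = pd.DataFrame({
    "school_name": ["Alpha High", "Alpha High", "Beta High", "Beta High"],
    "math_score": [80, 60, 75, 90],
    "reading_score": [65, 85, 70, 95],
})

# Add one boolean flag per passing condition
flags = toy.assign(
    passed_math=toy["math_score"] >= 70,
    passed_reading=toy["reading_score"] >= 70,
)
flags["passed_both"] = flags["passed_math"] & flags["passed_reading"]

# The mean of each flag within a school is that school's passing fraction
rates = flags.groupby("school_name")[["passed_math", "passed_reading", "passed_both"]].mean()
print(rates)
```

In this toy data no student at Alpha High passes both subjects, and the school still appears with a rate of 0.0. With the filter-then-group approach such a school drops out of the filtered groups, and the division yields `NaN` for it.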

In [None]:
# Add the new columns to the overview dataframe; each Series aligns on the school-name index
renamed_overview_df["Per Student Budget"] = budget_per_student
renamed_overview_df["Average Math Score"] = per_school_math_average
renamed_overview_df["Average Reading Score"] = per_school_reading_average
renamed_overview_df["% Passing Math"] = per_school_math_percentage
renamed_overview_df["% Passing Reading"] = per_school_reading_percentage
renamed_overview_df["% Overall Passing"] = per_school_passed_both_percentage
renamed_overview_df.index.name = None
renamed_overview_df
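Assigning a Series as a new column works here because pandas matches rows by index label, not by position. The toy example below, with made-up names and values, shows a Series in a different order still landing on the correct rows.

```python
import pandas as pd

# Toy overview table indexed by school name (hypothetical values)
table = pd.DataFrame({"Total Students": [100, 200]},
                     index=["Alpha High", "Beta High"])

# A Series whose labels are in the opposite order
averages = pd.Series({"Beta High": 75.0, "Alpha High": 82.0})

# Assignment aligns on index labels, so the order of the Series does not matter
table["Average Math Score"] = averages
print(table)
```

A label that exists in the table but not in the Series would receive `NaN`, which is worth checking for after the assignment.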

In [None]:
# Optional: format the values in renamed_overview_df for display, or apply formats when building the dataframe above

## Top Performing Schools (By % Overall Passing)

* Sort and display the top five performing schools by % overall passing.

In [None]:
# Sort and display the top five performing schools by % overall passing (ascending=False sorts high to low)
top_five_df = renamed_overview_df.sort_values("% Overall Passing", ascending=False)
top_five_df.index.name = None
top_five_df.head(5)

## Bottom Performing Schools (By % Overall Passing)

* Sort and display the five worst-performing schools by % overall passing.

In [None]:
# Sort and display the worst five performing schools by % overall passing (default sort is ascending)
worst_five_df = renamed_overview_df.sort_values("% Overall Passing")
worst_five_df.index.name = None
worst_five_df.head(5)
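As a possible shortcut for both rankings, `nlargest` and `nsmallest` select and order rows in one call. The sketch below uses a toy table with made-up school labels and passing fractions, and picks two rows instead of five to keep the output short.

```python
import pandas as pd

# Toy per-school table (hypothetical values)
toy = pd.DataFrame(
    {"% Overall Passing": [0.91, 0.54, 0.89, 0.53, 0.90, 0.55]},
    index=["A", "B", "C", "D", "E", "F"],
)

# Highest values first
top_two = toy.nlargest(2, "% Overall Passing")

# Lowest values first
bottom_two = toy.nsmallest(2, "% Overall Passing")

print(top_two)
print(bottom_two)
```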

## Math Scores by Grade

* Create a table that lists the average math score for students of each grade level (9th, 10th, 11th, 12th) at each school.

  * Create a pandas series for each grade. Hint: use a conditional statement.
  
  * Group each series by school
  
  * Combine the series into a dataframe
  
  * Optional: give the displayed data cleaner formatting

In [None]:
# Create a pandas series for each grade
# Use GroupBy on the complete dataframe to split the data by "grade"
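The steps listed for this section (one series per grade, grouped by school, combined into a dataframe) could be sketched as follows. The data is a toy frame with made-up schools and scores, and the grade labels `"9th"` through `"12th"` are assumed to match the values stored in the `grade` column.

```python
import pandas as pd

# Toy student rows (hypothetical values)
toy = pd.DataFrame({
    "school_name": ["Alpha High"] * 5 + ["Beta High"] * 5,
    "grade": ["9th", "9th", "10th", "11th", "12th",
              "9th", "10th", "10th", "11th", "12th"],
    "math_score": [80, 90, 70, 60, 88, 75, 85, 95, 72, 65],
})

grade_order = ["9th", "10th", "11th", "12th"]

# For each grade: filter with a conditional, group by school, average the scores.
# The dict of Series becomes one dataframe with a column per grade.
math_by_grade = pd.DataFrame({
    grade: toy.loc[toy["grade"] == grade].groupby("school_name")["math_score"].mean()
    for grade in grade_order
})
print(math_by_grade)
```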

## Reading Scores by Grade

* Perform the same operations as above for reading scores
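For reading scores, a `pivot_table` is one possible way to get the same school-by-grade layout in a single call. The sketch below uses toy data with made-up values and assumes the grade labels are `"9th"` through `"12th"`. Because pivoted columns come back in alphabetical order (`"10th"` before `"9th"`), the last step restores grade order.

```python
import pandas as pd

# Toy student rows (hypothetical values)
toy = pd.DataFrame({
    "school_name": ["Alpha High"] * 5 + ["Beta High"] * 4,
    "grade": ["9th", "9th", "10th", "11th", "12th",
              "9th", "10th", "11th", "12th"],
    "reading_score": [82, 78, 91, 70, 85, 66, 74, 88, 93],
})

# One row per school, one column per grade, mean reading score in each cell
reading_by_grade = toy.pivot_table(index="school_name", columns="grade",
                                   values="reading_score", aggfunc="mean")

# Reorder the columns from alphabetical order to grade order
grade_order = ["9th", "10th", "11th", "12th"]
reading_by_grade = reading_by_grade[grade_order]
print(reading_by_grade)
```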

## Scores by School Spending

* Create a table that breaks down school performance by per-student spending range. Use four reasonable bins to group school spending. Include each of the following in the table:
  * Average Math Score
  * Average Reading Score
  * % Passing Math
  * % Passing Reading
  * Overall Passing Rate (Average of the above two)
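A minimal sketch of the binning step, assuming a per-school table like the one built in the School Summary. The schools, values, bin edges, and labels below are all made up for illustration, and only two metric columns are carried to keep the example short. The toy `% Overall Passing` column stands for whichever definition of overall passing is chosen.

```python
import pandas as pd

# Hypothetical per-school table; names and values are illustrative only
schools = pd.DataFrame(
    {
        "Per Student Budget": [540.0, 590.0, 600.0, 640.0, 660.0],
        "Average Math Score": [84.0, 82.0, 80.0, 78.0, 76.0],
        "% Overall Passing": [0.90, 0.88, 0.84, 0.70, 0.60],
    },
    index=["School A", "School B", "School C", "School D", "School E"],
)

# Assumed bin edges and labels; pd.cut intervals include the right edge by default
spending_bins = [0, 550, 600, 650, 700]
spending_labels = ["<$550", "$550-600", "$600-650", "$650-700"]
schools["Spending Range (Per Student)"] = pd.cut(
    schools["Per Student Budget"], bins=spending_bins, labels=spending_labels
)

# Unweighted mean of the school-level figures within each spending range
by_spending = schools.groupby("Spending Range (Per Student)", observed=False)[
    ["Average Math Score", "% Overall Passing"]
].mean()
print(by_spending)
```

Averaging school-level figures gives every school equal weight regardless of size. Grouping the student-level data by bin instead would weight each student equally, and the two approaches can give different numbers.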

## Scores by School Size

* Perform the same operations as above, based on school size.
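The same binning idea applies to school size. The sketch below uses made-up sizes, scores, and cut-offs, and passes `right=False` so that each bin includes its lower edge instead of its upper edge. That choice decides where a school sitting exactly on a boundary lands.

```python
import pandas as pd

# Hypothetical per-school table; values are illustrative only
schools = pd.DataFrame(
    {
        "Total Students": [400, 1800, 2000, 4900],
        "Average Reading Score": [84.0, 83.0, 81.0, 79.0],
    },
    index=["School A", "School B", "School C", "School D"],
)

# Assumed cut-offs; right=False makes the bins [0, 1000), [1000, 2000), [2000, 5000)
size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small", "Medium", "Large"]
schools["School Size"] = pd.cut(
    schools["Total Students"], bins=size_bins, labels=size_labels, right=False
)

# Unweighted mean of the school-level figure within each size group
by_size = schools.groupby("School Size", observed=False)["Average Reading Score"].mean()
print(by_size)
```

With these edges a school of exactly 2000 students counts as Large. With the default `right=True` it would count as Medium.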

## Scores by School Type

* Perform the same operations as above, based on school type
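School type is already a category, so no binning is needed and a plain `groupby` is enough. The sketch below assumes a per-school table with a `School Type` column, as in the School Summary. The rows and values are made up.

```python
import pandas as pd

# Hypothetical per-school table; values are illustrative only
schools = pd.DataFrame(
    {
        "School Type": ["Charter", "District", "Charter", "District"],
        "Average Math Score": [84.0, 77.0, 82.0, 75.0],
        "% Overall Passing": [0.90, 0.54, 0.92, 0.52],
    },
    index=["School A", "School B", "School C", "School D"],
)

# Unweighted mean of the school-level figures for each school type
by_type = schools.groupby("School Type")[
    ["Average Math Score", "% Overall Passing"]
].mean()
print(by_type)
```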