In [275]:
# Dependencies and Setup
import pandas as pd

In [276]:
# Files to Load
student_input = "Resources/students_complete.csv"
school_input = "Resources/schools_complete.csv"

# Read School and Student Data File and store into Pandas DataFrames
student_data = pd.read_csv(student_input)
school_data = pd.read_csv(school_input)

# Combine the data into a single dataset, joining each student row to its school's record.
df = pd.merge(student_data, school_data, how="left", on="school_name")


## District Summary

**Create a high-level snapshot, in a DataFrame, of the district's key metrics, including the following:**

- Total schools
- Total students
- Total budget
- Average math score
- Average reading score
- % passing math (the percentage of students who passed math)
- % passing reading (the percentage of students who passed reading)
- % overall passing (the percentage of students who passed math AND reading)

In [277]:
# calculate totals for schools and students
total_schools = df["school_name"].nunique()
total_students = df["Student ID"].count()

# calculate the total budget
total_budget = school_data["budget"].sum()

In [278]:
# calculate average scores
avg_math = df["math_score"].mean()
avg_reading = df["reading_score"].mean()

In [279]:
# calculate % passing for math (a score of 70 or higher counts as passing)
pass_math = (df["math_score"] >= 70).sum()
percent_pass_math = (pass_math / total_students) * 100

In [280]:
# calculate % passing for reading (a score of 70 or higher counts as passing)
pass_reading = (df["reading_score"] >= 70).sum()
percent_pass_reading = (pass_reading / total_students) * 100

In [281]:
# calculate % passing for both (70 or higher in math AND reading)
pass_both = ((df["math_score"] >= 70) & (df["reading_score"] >= 70)).sum()
percent_pass_both = (pass_both / total_students) * 100

In [282]:
# assemble the district summary as a one-row DataFrame
district_df = pd.DataFrame([{
    "Total Schools": total_schools,
    "Total Students": total_students,
    "Total Budget": total_budget,
    "Average Math Score": avg_math,
    "Average Reading Score": avg_reading,
    "% Passing Math": percent_pass_math,
    "% Passing Reading": percent_pass_reading,
    "% Overall Passing": percent_pass_both,
}])

# format the budget as currency and the passing rates as percentages
district_df["Total Budget"] = district_df["Total Budget"].map("${:,.2f}".format)
for col in ["% Passing Math", "% Passing Reading", "% Overall Passing"]:
    district_df[col] = district_df[col].map("{:.5f}%".format)

district_df

| | Total Schools | Total Students | Total Budget | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing |
|---|---|---|---|---|---|---|---|---|
| 0 | 15 | 39170 | $24,649,428.00 | 78.985371 | 81.87784 | 74.98085% | 85.80546% | 65.17233% |


## School Summary

**Create a DataFrame that summarizes key metrics about each school, including the following:**

- School name
- School type
- Total students
- Total school budget
- Per student budget
- Average math score
- Average reading score
- % passing math (the percentage of students who passed math)
- % passing reading (the percentage of students who passed reading)
- % overall passing (the percentage of students who passed math AND reading)

In [311]:
# group by school; groupby makes school_name the (sorted) index of the result
# agg counts the student names and averages the math and reading scores per school
groupby_school_df = df.groupby("school_name").agg({'student_name': 'count', 'math_score': 'mean', 'reading_score': 'mean'})
groupby_school_df.head(15)


| school_name | student_name | math_score | reading_score |
|---|---|---|---|
| Bailey High School | 4976 | 77.048432 | 81.033963 |
| Cabrera High School | 1858 | 83.061895 | 83.97578 |
| Figueroa High School | 2949 | 76.711767 | 81.15802 |
| Ford High School | 2739 | 77.102592 | 80.746258 |
| Griffin High School | 1468 | 83.351499 | 83.816757 |
| Hernandez High School | 4635 | 77.289752 | 80.934412 |
| Holden High School | 427 | 83.803279 | 83.814988 |
| Huang High School | 2917 | 76.629414 | 81.182722 |
| Johnson High School | 4761 | 77.072464 | 80.966394 |
| Pena High School | 962 | 83.839917 | 84.044699 |
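
The groupby above covers only three of the requested metrics. The remaining ones (school type, budget, per student budget, and the passing percentages) could be added along the following lines. This is a minimal sketch on a toy stand-in for `df`: the school names and values are invented, and the `type` column name is an assumption about the merged data, not something shown in the cells above.

```python
import pandas as pd

# Toy stand-in for the merged `df`; the "type" column name and all values are assumptions.
df = pd.DataFrame({
    "school_name": ["A High", "A High", "B High", "B High"],
    "type": ["District", "District", "Charter", "Charter"],
    "budget": [1200, 1200, 1000, 1000],
    "student_name": ["s1", "s2", "s3", "s4"],
    "math_score": [65, 80, 90, 72],
    "reading_score": [75, 68, 85, 95],
})

# Flag passing students (score >= 70, the same threshold as the district summary).
flags = df.assign(
    pass_math=df["math_score"] >= 70,
    pass_reading=df["reading_score"] >= 70,
)
flags["pass_both"] = flags["pass_math"] & flags["pass_reading"]

# One row per school; type and budget repeat on every student row, so take the first.
school_summary = flags.groupby("school_name").agg(
    school_type=("type", "first"),
    total_students=("student_name", "count"),
    total_budget=("budget", "first"),
    avg_math=("math_score", "mean"),
    avg_reading=("reading_score", "mean"),
    pct_pass_math=("pass_math", "mean"),
    pct_pass_reading=("pass_reading", "mean"),
    pct_overall=("pass_both", "mean"),
)

# The mean of a boolean flag is a fraction; scale it to a percentage.
pct_cols = ["pct_pass_math", "pct_pass_reading", "pct_overall"]
school_summary[pct_cols] = school_summary[pct_cols] * 100
school_summary["per_student_budget"] = (
    school_summary["total_budget"] / school_summary["total_students"]
)
print(school_summary)
```

Taking the mean of a boolean column avoids a separate count-and-divide step for each passing rate.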


## Highest-Performing Schools (by % Overall Passing)

Create a DataFrame that highlights the top 5 performing schools based on % Overall Passing. Include the following metrics:

* School name
* School type
* Total students
* Total school budget
* Per student budget
* Average math score
* Average reading score
* % passing math (the percentage of students who passed math)
* % passing reading (the percentage of students who passed reading)
* % overall passing (the percentage of students who passed math AND reading)
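
One way to pick the top five is to sort a per-school summary by `% Overall Passing` in descending order and keep the first five rows. The sketch below assumes a hypothetical `school_summary` DataFrame indexed by school name; the school labels and percentages are invented for illustration.

```python
import pandas as pd

# Hypothetical per-school summary; labels and values are invented.
school_summary = pd.DataFrame(
    {"% Overall Passing": [91.3, 53.5, 90.9, 54.6, 89.2, 90.5, 53.2]},
    index=pd.Index(["S1", "S2", "S3", "S4", "S5", "S6", "S7"], name="school_name"),
)

# Sort from highest to lowest passing rate and keep the first five schools.
top_five = school_summary.sort_values("% Overall Passing", ascending=False).head(5)
print(top_five)
```

The sort needs numeric values, so any percentage formatting (such as the `"{:.5f}%"` strings used in the district summary) would have to be applied after sorting.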

## Lowest-Performing Schools (by % Overall Passing)

Create a DataFrame that highlights the bottom 5 performing schools based on % Overall Passing. Include the following metrics:

* School name
* School type
* Total students
* Total school budget
* Per student budget
* Average math score
* Average reading score
* % passing math (the percentage of students who passed math)
* % passing reading (the percentage of students who passed reading)
* % overall passing (the percentage of students who passed math AND reading)
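
For the bottom five, `DataFrame.nsmallest` is a compact alternative to sorting in ascending order. As before, this is only a sketch: `school_summary` is a hypothetical per-school summary with invented labels and values.

```python
import pandas as pd

# Hypothetical per-school summary; labels and values are invented.
school_summary = pd.DataFrame(
    {"% Overall Passing": [91.3, 53.5, 90.9, 54.6, 89.2, 90.5, 53.2]},
    index=pd.Index(["S1", "S2", "S3", "S4", "S5", "S6", "S7"], name="school_name"),
)

# nsmallest returns the five rows with the lowest values, ordered lowest first.
bottom_five = school_summary.nsmallest(5, "% Overall Passing")
print(bottom_five)
```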

## Math Scores by Grade

Create a DataFrame that lists the average math score for students of each grade level (9th, 10th, 11th, 12th) at each school.
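
A pivot table is one possible way to get schools as rows and grade levels as columns. The sketch below uses a toy stand-in for `df`; the `grade` column name and its `"9th"` to `"12th"` labels are assumptions based on the task description, and the scores are invented.

```python
import pandas as pd

# Toy stand-in for the merged `df`; the "grade" column name is an assumption.
df = pd.DataFrame({
    "school_name": ["A High"] * 5 + ["B High"] * 5,
    "grade": ["9th", "9th", "10th", "11th", "12th",
              "9th", "10th", "11th", "12th", "12th"],
    "math_score": [70, 80, 90, 60, 88, 85, 75, 92, 95, 65],
})

# Average math score for every (school, grade) pair, with grades as columns.
math_by_grade = df.pivot_table(
    index="school_name", columns="grade", values="math_score", aggfunc="mean"
)

# Grade labels sort alphabetically ("10th" before "9th"), so reorder them explicitly.
math_by_grade = math_by_grade.reindex(columns=["9th", "10th", "11th", "12th"])
print(math_by_grade)
```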

## Reading Scores by Grade

Create a DataFrame that lists the average reading score for students of each grade level (9th, 10th, 11th, 12th) at each school.
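
The same table can also be built with a two-level `groupby` followed by `unstack`, shown here for reading scores. Again this is a sketch on toy data: the `grade` column name and labels are assumptions, and the scores are invented.

```python
import pandas as pd

# Toy stand-in for the merged `df`; the "grade" column name is an assumption.
df = pd.DataFrame({
    "school_name": ["A High"] * 5 + ["B High"] * 5,
    "grade": ["9th", "9th", "10th", "11th", "12th",
              "9th", "10th", "11th", "12th", "12th"],
    "reading_score": [80, 90, 70, 85, 75, 95, 65, 88, 78, 82],
})

# Mean reading score per (school, grade), then move the grade level into columns.
reading_by_grade = (
    df.groupby(["school_name", "grade"])["reading_score"].mean().unstack("grade")
)

# Put the grade columns in school order, not alphabetical order.
reading_by_grade = reading_by_grade[["9th", "10th", "11th", "12th"]]
print(reading_by_grade)
```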

## Scores by School Spending

Create a table that breaks down school performance based on average spending ranges (per student). Use your judgment to create four bins with reasonable cutoff values to group school spending. Include the following metrics in the table:

* Average math score
* Average reading score
* % passing math (the percentage of students who passed math)
* % passing reading (the percentage of students who passed reading)
* % overall passing (the percentage of students who passed math AND reading)
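
`pd.cut` can assign each school to a spending range before grouping. In the sketch below, the bin edges and labels are arbitrary choices made for illustration, and `school_summary` is a hypothetical per-school table with invented values and only two of the five metrics.

```python
import pandas as pd

# Hypothetical per-school summary; all values are invented.
school_summary = pd.DataFrame({
    "per_student_budget": [578, 582, 600, 625, 639, 655],
    "avg_math": [83.0, 84.0, 80.0, 82.0, 77.0, 76.0],
    "pct_overall": [90.0, 92.0, 85.0, 87.0, 60.0, 54.0],
})

# Four spending bins; the cutoffs are an assumption, not prescribed by the task.
bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]
school_summary["spending_range"] = pd.cut(
    school_summary["per_student_budget"], bins=bins, labels=labels
)

# Average each metric across the schools that fall in a bin.
by_spending = school_summary.groupby("spending_range", observed=False)[
    ["avg_math", "pct_overall"]
].mean()
print(by_spending)
```

Averaging school-level averages weights every school equally regardless of enrollment. Grouping the student-level rows instead would weight every student equally, which can give different numbers.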

## Scores by School Size

Create a table that breaks down school performance based on school size (small, medium, or large).

In [None]:
#group by size
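
A possible way to fill in this cell is to bin schools by enrollment with `pd.cut`. The sketch below reuses the student counts and average math scores of five schools from the per-school output earlier in the notebook; the size cutoffs (1000 and 2000 students) are assumptions chosen for illustration.

```python
import pandas as pd

# Student counts and average math scores taken from the per-school output above.
schools = pd.DataFrame(
    {
        "total_students": [427, 962, 1468, 1858, 4976],
        "math_score": [83.803279, 83.839917, 83.351499, 83.061895, 77.048432],
    },
    index=pd.Index(
        ["Holden High School", "Pena High School", "Griffin High School",
         "Cabrera High School", "Bailey High School"],
        name="school_name",
    ),
)

# Size cutoffs are an assumption: under 1000, 1000 to 2000, and above 2000 students.
size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small", "Medium", "Large"]
schools["school_size"] = pd.cut(
    schools["total_students"], bins=size_bins, labels=size_labels
)

# Average math score across the schools in each size group.
by_size = schools.groupby("school_size", observed=False)["math_score"].mean()
print(by_size)
```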

## Scores by School Type

Create a table that breaks down school performance based on type of school (district or charter).

In [None]:
#group by type
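
Grouping by school type needs no binning, because the type is already a category. The sketch below works on student-level rows so that each passing rate is a share of students; the `type` column name and every value in the toy frame are assumptions.

```python
import pandas as pd

# Toy student-level rows; the "type" column name and all values are assumptions.
df = pd.DataFrame({
    "type": ["District", "District", "District", "Charter", "Charter"],
    "math_score": [60, 75, 68, 90, 72],
    "reading_score": [80, 65, 90, 85, 70],
})

# Boolean pass flags per student (70 or higher counts as passing).
passed = pd.DataFrame({
    "type": df["type"],
    "% Passing Math": df["math_score"] >= 70,
    "% Passing Reading": df["reading_score"] >= 70,
    "% Overall Passing": (df["math_score"] >= 70) & (df["reading_score"] >= 70),
})

# The mean of each flag is the passing fraction; scale it to a percentage.
by_type = passed.groupby("type").mean() * 100
print(by_type)
```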