# PyCity Schools Analysis

* As a whole, schools with higher budgets did not yield better test results. In fact, schools with higher spending per student (\$645-675) underperformed compared to schools with lower spending per student (<\$585).

* As a whole, small and medium-sized schools dramatically outperformed large schools on math passing rates (89-91% passing vs 67%).

* As a whole, charter schools outperformed the public district schools across all metrics. However, more analysis is required to determine whether the effect is due to school practices or to the fact that charter schools tend to serve smaller student populations per school.
---

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
school_data_to_load = "Resources/schools_complete.csv"
student_data_to_load = "Resources/students_complete.csv"

In [2]:
# Read School and Student Data File and store into Pandas Data Frames
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)

In [3]:
# Combine the data into a single dataset (a left join keeps every student row)
main_merge = pd.merge(student_data, school_data, on="school_name", how="left")
main_merge.head()

Unnamed: 0,Student ID,student_name,gender,grade,school_name,reading_score,math_score,School ID,type,size,budget
0,0,Paul Bradley,M,9th,Huang High School,66,79,0,District,2917,1910635
1,1,Victor Smith,M,12th,Huang High School,94,61,0,District,2917,1910635
2,2,Kevin Rodriguez,M,12th,Huang High School,90,60,0,District,2917,1910635
3,3,Dr. Richard Scott,M,12th,Huang High School,67,58,0,District,2917,1910635
4,4,Bonnie Ray,F,9th,Huang High School,97,84,0,District,2917,1910635
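A left merge like the one above silently duplicates student rows if a school name appears twice in the school table. The sketch below shows one way to guard against that, using the `validate` and `indicator` options of `pd.merge`. The two small frames are toy stand-ins for the CSV files, not the real data.

```python
import pandas as pd

# Toy stand-ins for the two CSVs (hypothetical rows, not the real data)
students = pd.DataFrame({
    "student_name": ["A", "B", "C"],
    "school_name": ["Huang High School", "Huang High School", "Pena High School"],
})
schools = pd.DataFrame({
    "school_name": ["Huang High School", "Pena High School"],
    "type": ["District", "Charter"],
})

# validate="many_to_one" raises MergeError if a school name repeats in `schools`;
# indicator=True adds a `_merge` column showing where each row came from.
merged = pd.merge(students, schools, on="school_name", how="left",
                  validate="many_to_one", indicator=True)

# Students whose school had no match in `schools` are marked "left_only"
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), len(unmatched))
```

If `unmatched` is non-empty, the school-level columns for those students are NaN and would drop out of later averages.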


## District Summary

In [4]:
# Total Schools
total_school = main_merge["school_name"].nunique()
print(f'--- {total_school} schools ---')

# Total Students
total_student = main_merge["student_name"].count()
print(f'--- {total_student} students ---')

--- 15 schools ---
--- 39170 students ---


In [5]:
#Total Budget
total_budget = school_data["budget"].sum()
print(f'Total Budget : ${total_budget}')

Total Budget : $24649428


In [6]:
##### Calculate the Average Scores #####

#Average Math Scores
avg_math = main_merge["math_score"].mean()
print(f'Avg. Math Score : {avg_math}')

#Average Reading Scores
avg_reading = main_merge["reading_score"].mean()
print(f'Avg. Reading Score : {avg_reading}')


##### Calculate the Percentage Pass Rates #####

# Percent pass Math
std_pass_math = (main_merge.loc[main_merge["math_score"] >= 70, "math_score"]).count()
percent_pass_math = (std_pass_math / total_student) * 100
print(f'%STD pass Math : {percent_pass_math}')

# Percent pass Reading
std_pass_reading = (main_merge.loc[main_merge["reading_score"] >= 70, "reading_score"]).count()
percent_pass_reading = (std_pass_reading / total_student) * 100
print(f'%STD pass Reading : {percent_pass_reading}')


Avg. Math Score : 78.98537145774827
Avg. Reading Score : 81.87784018381414
%STD pass Math : 74.9808526933878
%STD pass Reading : 85.80546336482001


In [7]:
# Number of students passing both math and reading
overall_pass = (main_merge.loc[(main_merge["math_score"] >= 70) & (main_merge["reading_score"] >= 70), "student_name"])
print(f'Overall passing : {overall_pass.count()} students.')

# "Overall passing rate" as reported here: the mean of the average math and reading scores.
# Note: this is not the share of students who passed both subjects.
overall_percent_pass = (avg_math + avg_reading) / 2
print(f'Overall Percent Passing : {overall_percent_pass}')

Overall passing : 25528 students.
Overall Percent Passing : 80.43160582078121
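The 80.43 figure above is the mean of the two average scores. The share of students who passed both subjects is a different quantity: from the printed counts, 25528 of 39170 students is roughly 65%. A minimal sketch of that calculation, using the five rows shown in the merge preview as toy input and assuming the same pass mark of 70:

```python
import pandas as pd

# Scores of the five students shown in the merge preview
scores = pd.DataFrame({
    "math_score": [79, 61, 60, 58, 84],
    "reading_score": [66, 94, 90, 67, 97],
})
PASS_MARK = 70  # same threshold as the cells above

passed_math = scores["math_score"] >= PASS_MARK
passed_reading = scores["reading_score"] >= PASS_MARK

# The mean of a boolean Series is the fraction of True values
pct_math = passed_math.mean() * 100
pct_reading = passed_reading.mean() * 100
pct_both = (passed_math & passed_reading).mean() * 100

print(pct_math, pct_reading, pct_both)
```

Taking the mean of a boolean mask avoids dividing by a separately stored student count, so the numerator and denominator always come from the same rows.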


In [8]:
minor_clean_cols = {
                    "Total Schools" : total_school,
                    "Total_Students" : total_student,
                    "Total Budget" : total_budget,
                    "Average Math Score" : avg_math,
                    "Average Reading Score" : avg_reading,
                    "% Passing Math" : percent_pass_math,
                    "% Passing Reading" : percent_pass_reading,
                    "% Overall Passing Rate" : overall_percent_pass
                    }

district_summary = pd.DataFrame([minor_clean_cols])
district_summary.head()

Unnamed: 0,Total Schools,Total_Students,Total Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,% Overall Passing Rate
0,15,39170,24649428,78.985371,81.87784,74.980853,85.805463,80.431606
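For display, the summary can be formatted with thousands separators and fixed decimals. This is only a sketch: it uses three of the values printed above, and the choice of two decimal places is an assumption. Formatting turns the columns into strings, so it is best done as the last step.

```python
import pandas as pd

# Three values copied from the summary output above
summary = pd.DataFrame([{
    "Total Students": 39170,
    "Total Budget": 24649428,
    "Average Math Score": 78.985371,
}])

# Work on a copy so the numeric frame stays usable for calculations
formatted = summary.copy()
formatted["Total Students"] = formatted["Total Students"].map("{:,}".format)
formatted["Total Budget"] = formatted["Total Budget"].map("${:,.2f}".format)
formatted["Average Math Score"] = formatted["Average Math Score"].map("{:.2f}".format)

print(formatted.iloc[0].tolist())
```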


## School Summary

In [9]:
# School Names
school_name = school_data["school_name"].head()
print(school_name)

# Determine the School Type
school_type = school_data["type"].head()
print(school_type)

0        Huang High School
1     Figueroa High School
2      Shelton High School
3    Hernandez High School
4      Griffin High School
Name: school_name, dtype: object
0    District
1    District
2     Charter
3    District
4     Charter
Name: type, dtype: object


In [10]:
### Check for unnecessary columns
main_merge.columns

Index(['Student ID', 'student_name', 'gender', 'grade', 'school_name',
       'reading_score', 'math_score', 'School ID', 'type', 'size', 'budget'],
      dtype='object')

In [11]:
### GroupBy "school_name"
g_byschool = main_merge.groupby("school_name")
g_byschool

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x1206b8a50>

In [12]:
### Total Student per School
total_student = g_byschool["Student ID"].count()
total_student.head()

school_name
Bailey High School      4976
Cabrera High School     1858
Figueroa High School    2949
Ford High School        2739
Griffin High School     1468
Name: Student ID, dtype: int64

In [13]:
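### Total Budget per School
# The budget is constant within a school, so the group mean recovers it.
# Selecting the column first avoids averaging non-numeric columns.
total_budget = g_byschool["budget"].mean()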
total_budget

school_name
Bailey High School       3124928.0
Cabrera High School      1081356.0
Figueroa High School     1884411.0
Ford High School         1763916.0
Griffin High School       917500.0
Hernandez High School    3022020.0
Holden High School        248087.0
Huang High School        1910635.0
Johnson High School      3094650.0
Pena High School          585858.0
Rodriguez High School    2547363.0
Shelton High School      1056600.0
Thomas High School       1043130.0
Wilson High School       1319574.0
Wright High School       1049400.0
Name: budget, dtype: float64

In [14]:
### Budget per Student in Each School
per_std_budget = total_budget / total_student
per_std_budget

school_name
Bailey High School       628.0
Cabrera High School      582.0
Figueroa High School     639.0
Ford High School         644.0
Griffin High School      625.0
Hernandez High School    652.0
Holden High School       581.0
Huang High School        655.0
Johnson High School      650.0
Pena High School         609.0
Rodriguez High School    637.0
Shelton High School      600.0
Thomas High School       638.0
Wilson High School       578.0
Wright High School       583.0
dtype: float64

In [15]:
#### Avg. Scores for each school
# math
avg_math = g_byschool["math_score"].mean()
# reading
avg_reading = g_byschool["reading_score"].mean()
# overall
avg_overall = (avg_math + avg_reading) / 2

In [16]:
### Percent of Students Passing Math per School
# A SeriesGroupBy cannot be compared to a number, so flag passing students
# row by row first, then average the flags within each school
is_pass_math = main_merge["math_score"] >= 70
pass_math = is_pass_math.groupby(main_merge["school_name"]).mean() * 100
pass_math.head()

In [None]:
# Every input is a Series indexed by school_name, so pandas aligns them by school
school_df = pd.DataFrame({
                            "School Type" : school_data.set_index("school_name")["type"],
                            "Total Students" : total_student,
                            "Total School Budget" : total_budget,
                            "Per Student Budget" : per_std_budget,
                            "Average Math Score" : avg_math,
                            "Average Reading Score" : avg_reading
                        })
school_df.head()

In [None]:
# Calculate the passing scores by creating a filtered data frame
#df[(df['year'] > 2012) & (df['reports'] < 30)]
# Convert to data frame

# Minor data munging

# Display the data frame
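One way to fill in the per-school passing rates outlined above is to flag each student as passing or not, then average those flags within each school. The sketch below uses hypothetical schools and scores, and assumes the pass mark of 70 used in the district summary.

```python
import pandas as pd

# Hypothetical schools and scores, for illustration only
df = pd.DataFrame({
    "school_name": ["X High", "X High", "X High", "Y High", "Y High"],
    "math_score": [79, 61, 84, 72, 90],
    "reading_score": [66, 94, 97, 65, 88],
})

# One boolean flag per student and subject
flags = pd.DataFrame({
    "school_name": df["school_name"],
    "% Passing Math": df["math_score"] >= 70,
    "% Passing Reading": df["reading_score"] >= 70,
})
flags["% Overall Passing"] = flags["% Passing Math"] & flags["% Passing Reading"]

# Averaging booleans per school gives the fraction passing; scale to percent
pass_rates = flags.groupby("school_name").mean() * 100
print(pass_rates.round(2))
```

Here "% Overall Passing" means the share of students who passed both subjects, which is a choice of definition rather than the only possible one.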


## Top Performing Schools (By Passing Rate)

In [None]:
# Sort and show top five schools


## Bottom Performing Schools (By Passing Rate)

In [None]:
# Sort and show bottom five schools
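Both rankings (top five and bottom five) reduce to sorting a per-school summary by its passing-rate column. A minimal sketch, assuming a summary frame with a `% Overall Passing` column; the school labels and rates are placeholders, not results from this dataset.

```python
import pandas as pd

# Hypothetical per-school summary; values are illustrative only
school_summary = pd.DataFrame(
    {"% Overall Passing": [91.2, 53.5, 89.9, 54.6, 90.5, 67.0]},
    index=["A", "B", "C", "D", "E", "F"],
)

# Highest passing rates first
top_five = school_summary.sort_values("% Overall Passing", ascending=False).head(5)

# Lowest passing rates first
bottom_five = school_summary.sort_values("% Overall Passing").head(5)

print(list(top_five.index), list(bottom_five.index))
```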


## Math Scores by Grade

In [17]:
main_merge.head()

Unnamed: 0,Student ID,student_name,gender,grade,school_name,reading_score,math_score,School ID,type,size,budget
0,0,Paul Bradley,M,9th,Huang High School,66,79,0,District,2917,1910635
1,1,Victor Smith,M,12th,Huang High School,94,61,0,District,2917,1910635
2,2,Kevin Rodriguez,M,12th,Huang High School,90,60,0,District,2917,1910635
3,3,Dr. Richard Scott,M,12th,Huang High School,67,58,0,District,2917,1910635
4,4,Bonnie Ray,F,9th,Huang High School,97,84,0,District,2917,1910635


In [21]:
# Create data series of scores by grade levels using conditionals
math_series = main_merge.loc[:, ["school_name","math_score", "grade"]]


# Group each by school name

# Combine series into single data frame

# Minor data munging

# Display the data frame
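The steps outlined above (one series per grade, grouped by school, then combined) can also be done in a single call with `pivot_table`. This sketch swaps that technique in for the conditional-series approach. The data is hypothetical, and the grade labels `10th` and `11th` are assumed to exist alongside the `9th` and `12th` seen in the preview.

```python
import pandas as pd

# Hypothetical scores, for illustration only
df = pd.DataFrame({
    "school_name": ["X High", "X High", "X High", "Y High", "Y High", "Y High"],
    "grade": ["9th", "9th", "12th", "9th", "12th", "12th"],
    "math_score": [80, 70, 60, 90, 85, 75],
})

# Rows are schools, columns are grades, cells are mean math scores
by_grade = df.pivot_table(index="school_name", columns="grade",
                          values="math_score", aggfunc="mean")

# Grade labels sort as strings ("10th" comes before "9th"), so reorder explicitly
grade_order = ["9th", "10th", "11th", "12th"]  # assumed set of grade labels
by_grade = by_grade.reindex(columns=[g for g in grade_order if g in by_grade.columns])

print(by_grade)
```

The same call with `values="reading_score"` would produce the reading table.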


Unnamed: 0,school_name,math_score,grade
0,Huang High School,79,9th
1,Huang High School,61,12th
2,Huang High School,60,12th
3,Huang High School,58,12th
4,Huang High School,84,9th


## Reading Scores by Grade

In [None]:
# Create data series of scores by grade levels using conditionals

# Group each by school name

# Combine series into single data frame

# Minor data munging

# Display the data frame


## Scores by School Spending

In [None]:
# Establish the bins -- choose any set of bins you would like, but see below for testing bins
# to test, set your bins as follows: [0, 585, 615, 645, 675]
# ALSO -- Note that the values for `% Passing Math`, `% Passing Reading` and `% Overall Passing Rate`
# were computed using averages of averages -- your results may vary if you use weighted averages 

# Categorize the spending based on the bins

# Assemble into data frame

# Minor data munging

# Display results
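A sketch of the binning step with `pd.cut`, using the test bins suggested above. The per-student budgets are four of the values printed earlier; the passing rates and the label text are placeholders. With the default `right=True`, each bin includes its upper edge, so a school at exactly 585 lands in the first bin.

```python
import pandas as pd

# Budgets come from the per-student output earlier; passing rates are placeholders
per_school = pd.DataFrame(
    {
        "Per Student Budget": [581.0, 609.0, 628.0, 655.0],
        "% Passing Math": [90.0, 92.0, 70.0, 65.0],
    },
    index=["Holden High School", "Pena High School",
           "Bailey High School", "Huang High School"],
)

bins = [0, 585, 615, 645, 675]
labels = ["<$585", "$585-615", "$615-645", "$645-675"]  # hypothetical label text

per_school["Spending Range"] = pd.cut(per_school["Per Student Budget"],
                                      bins=bins, labels=labels)

# Unweighted mean across the schools in each bin (an "average of averages")
by_spending = per_school.groupby("Spending Range", observed=False)["% Passing Math"].mean()
print(by_spending)
```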


## Scores by School Size

In [None]:
# Establish the bins 

# Categorize the schools by size based on the bins

# Calculate the scores based on bins

# Assemble into data frame

# Minor data munging

# Display results
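When schools are binned by size, an average of per-school rates and a student-weighted rate can differ, because larger schools contribute more students. The sketch below computes both. The cut points, labels, school sizes and pass counts are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical schools: sizes and pass counts are made up
per_school = pd.DataFrame({
    "size": [400, 1600, 1800, 3000],
    "passed_math": [360, 1440, 1260, 1800],
})
per_school["% Passing Math"] = per_school["passed_math"] / per_school["size"] * 100

size_bins = [0, 1000, 2000, 5000]           # assumed cut points
size_labels = ["Small", "Medium", "Large"]  # assumed labels
per_school["School Size"] = pd.cut(per_school["size"], bins=size_bins, labels=size_labels)

grouped = per_school.groupby("School Size", observed=False)

# Average of the per-school rates: every school counts equally
unweighted = grouped["% Passing Math"].mean()

# Student-weighted rate: total passers divided by total students in the bin
weighted = grouped["passed_math"].sum() / grouped["size"].sum() * 100

print(unweighted.round(2).to_dict(), weighted.round(2).to_dict())
```

In this toy data the two "Medium" schools have rates of 90% and 70%, so the unweighted figure is 80%, while the weighted figure is slightly lower because the larger school has the lower rate.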


## Scores by School Type

In [None]:
# Type | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate

# Assemble into data frame

# Minor data munging

# Display results
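A per-type table can be built with a single `groupby` and named aggregation. Including the mean school size next to the passing rate makes it easier to see whether school type and school size move together. The values below are illustrative only, and the output column names are hypothetical.

```python
import pandas as pd

# Hypothetical per-school summary; values are illustrative only
per_school = pd.DataFrame({
    "type": ["District", "District", "Charter", "Charter"],
    "size": [3000, 4000, 1000, 2000],
    "% Passing Math": [66.0, 68.0, 93.0, 95.0],
})

# Named aggregation: output column = (input column, aggregation function)
by_type = per_school.groupby("type").agg(
    schools=("size", "count"),
    mean_size=("size", "mean"),
    pct_passing_math=("% Passing Math", "mean"),
)
print(by_type)
```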
