***Observable trends*** 
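1. Students do better in Reading than in Math. District-wide, both the average score (81.88 vs. 78.99) and the pass rate (85.81% vs. 74.98%) are higher for Reading.
2. School performance is related to school size: small schools have an overall pass rate of 95.09%, against 78.97% for medium and 73.76% for large schools. Higher spending per student does not go with better performance: the two lowest spending ranges have the highest overall pass rates (about 95%), and the highest range has the lowest (76.67%).
3. Charter schools perform better (overall pass rate of 95.1% vs. 73.67% for district schools). The top 5 schools by overall pass rate are all charter schools.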

In [1]:
# Import Dependencies
import pandas as pd
import numpy as np

In [2]:
# Create dataframes from the raw data
schools_csv = "raw_data/schools_complete.csv"
students_csv = "raw_data/students_complete.csv"
schools_df = pd.read_csv(schools_csv)
students_df = pd.read_csv(students_csv)

**District Summary**

In [3]:
# Get Math pass rate
math_pass=students_df.loc[students_df["math_score"]>=70]
math_pass_rate=math_pass["Student ID"].count()/students_df["Student ID"].count()

# Get Reading pass rate
reading_pass=students_df.loc[students_df["reading_score"]>=70]
reading_pass_rate=reading_pass["Student ID"].count()/students_df["Student ID"].count()

#Get budget per student
schools_df["budget per student"]=schools_df["budget"]/schools_df["size"]

snapshot={"Total Schools":[schools_df["School ID"].count()],"Total Students":students_df["Student ID"].count(),
          "Total Budget":schools_df["budget"].sum(),"Average Math Score":students_df["math_score"].mean(),
          "Average Reading Score":students_df["reading_score"].mean(),"% Passing Math":math_pass_rate,
          "% Passing Reading":reading_pass_rate,"Overall Passing Rate":(math_pass_rate+reading_pass_rate)*0.5}

snapshot_df=pd.DataFrame(snapshot)

snapshot_df.loc[:,"Total Budget"]=snapshot_df.loc[:,"Total Budget"].apply('${:.0f}'.format)
snapshot_df.loc[:,"Average Math Score"]=snapshot_df.loc[:,"Average Math Score"].apply('{:.02f}'.format)
snapshot_df.loc[:,"Average Reading Score"]=snapshot_df.loc[:,"Average Reading Score"].apply('{:.02f}'.format)


snapshot_df.loc[:,"% Passing Math"]=snapshot_df.loc[:,"% Passing Math"].apply('{:.02%}'.format)
snapshot_df.loc[:,"% Passing Reading"]=snapshot_df.loc[:,"% Passing Reading"].apply('{:.02%}'.format)
snapshot_df.loc[:,"Overall Passing Rate"]=snapshot_df.loc[:,"Overall Passing Rate"].apply('{:.02%}'.format)

# Reorder the columns for display

snapshot_df=snapshot_df[["Total Schools", "Total Students","Total Budget","Average Math Score","Average Reading Score","% Passing Math","% Passing Reading","Overall Passing Rate"]]
snapshot_df


Unnamed: 0,Total Schools,Total Students,Total Budget,Average Math Score,Average Reading Score,% Passing Math,% Passing Reading,Overall Passing Rate
0,15,39170,$24649428,78.99,81.88,74.98%,85.81%,80.39%
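The "Overall Passing Rate" above is the average of the two subject pass rates, which is not the same as the share of students who passed both subjects. A minimal sketch of the difference, using made-up scores (the `students` frame below is hypothetical and not read from `students_complete.csv`):

```python
import pandas as pd

# Made-up scores for illustration only; not taken from the raw data
students = pd.DataFrame({
    "math_score":    [65, 72, 90, 55],
    "reading_score": [80, 68, 95, 75],
})

passed_math = students["math_score"] >= 70
passed_reading = students["reading_score"] >= 70

# Definition used in this notebook: mean of the two subject pass rates
average_of_rates = (passed_math.mean() + passed_reading.mean()) / 2

# Alternative definition: share of students who passed both subjects
passed_both_rate = (passed_math & passed_reading).mean()

print(f"{average_of_rates:.2%} vs. {passed_both_rate:.2%}")
```

The two definitions can differ considerably, so it helps to state which one a table reports.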


**School Summary**

Step 1: Combine all per-school data into a single DataFrame, school_summary.  
Step 2: Sort it to get the top 5 performing schools.


In [4]:
# Get student counts
# pd.groupby() was deprecated and later removed; call groupby on the DataFrame instead
Grouped_schools=students_df.groupby("school")
student_number=Grouped_schools["Student ID"].count()

# Get Average Math Score and Average Reading Score
average_score=Grouped_schools.aggregate({"reading_score":'mean',"math_score":'mean'})
average_score.columns=["Average Reading Score","Average Math Score"]
average_score["name"]=average_score.index

average_score["Average Reading Score"]=average_score["Average Reading Score"].round(2)
average_score["Average Math Score"]=average_score["Average Math Score"].round(2)

# Get pass rates (in percent) per school
# group the students who passed math by school
grouped_math_pass=math_pass.groupby("school")
# group the students who passed reading by school
grouped_reading_pass=reading_pass.groupby("school")
# math, reading, and overall pass rates
grouped_math_pass_rate=grouped_math_pass["Student ID"].count()/student_number*100
grouped_reading_pass_rate=grouped_reading_pass["Student ID"].count()/student_number*100
overall_rate=0.5*(grouped_math_pass_rate+grouped_reading_pass_rate)
# combine three rates into a DataFrame
schools_perf=pd.concat([grouped_reading_pass_rate, grouped_math_pass_rate,overall_rate], axis=1)
schools_perf.columns=["Reading Pass Rate (%)","Math Pass Rate (%)","Overall Pass Rate (%)"]


  


In [5]:
schools_perf["name"]=schools_perf.index
schools_perf.head().round(2)

school,Reading Pass Rate (%),Math Pass Rate (%),Overall Pass Rate (%),name
Bailey High School,81.93,66.68,74.31,Bailey High School
Cabrera High School,97.04,94.13,95.59,Cabrera High School
Figueroa High School,80.74,65.99,73.36,Figueroa High School
Ford High School,79.3,68.31,73.8,Ford High School
Griffin High School,97.14,93.39,95.27,Griffin High School
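The per-school pass rates can also be computed without filtering the students first, by averaging a boolean "passed" flag within each school. This is a sketch of an alternative, not the method used in the cells above; the school names and scores are made up:

```python
import pandas as pd

# Made-up data for illustration only
students = pd.DataFrame({
    "school": ["A High", "A High", "B High", "B High", "B High", "B High"],
    "math_score":    [71, 60, 90, 85, 69, 70],
    "reading_score": [88, 72, 65, 91, 70, 99],
})

# True where the score meets the passing threshold of 70
passed = students[["math_score", "reading_score"]].ge(70)

# The mean of a boolean column is the fraction of True values
rates = passed.groupby(students["school"]).mean().mul(100)
rates.columns = ["Math Pass Rate (%)", "Reading Pass Rate (%)"]

# Same overall definition as in this notebook: average of the two rates
rates["Overall Pass Rate (%)"] = rates.mean(axis=1)
print(rates.round(2))
```

A side benefit of this approach is that a school where no student passes still appears in the result with a rate of 0 instead of dropping out of the filtered group.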


In [6]:
#Combine the performance DataFrames into the school_summary DataFrame
school_summary=pd.merge(schools_df,schools_perf,on="name")
school_summary=pd.merge(school_summary,average_score,on="name")
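The merges above rely on a "name" column copied from the index. As a possible alternative, pandas can join a column against the other frame's index directly, and the `validate` argument can guard against duplicated school names. The frames below are small hypothetical stand-ins for `schools_df` and `schools_perf`:

```python
import pandas as pd

# Hypothetical stand-ins for schools_df and schools_perf
schools = pd.DataFrame({
    "name": ["A High", "B High"],
    "type": ["Charter", "District"],
    "size": [500, 3000],
})
perf = pd.DataFrame(
    {"Overall Pass Rate (%)": [95.0, 74.0]},
    index=pd.Index(["A High", "B High"], name="school"),
)

# Join the "name" column against perf's index; validate raises
# MergeError if either side has duplicate keys
merged = schools.merge(perf, left_on="name", right_index=True,
                       how="left", validate="one_to_one")

# With how="left", unmatched schools stay in the result with NaN rates
missing = int(merged["Overall Pass Rate (%)"].isna().sum())
print(len(merged), missing)
```

Using `how="left"` and counting missing values makes it visible when a school name does not match between the two sources, instead of the row silently disappearing as it would with the default inner join.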


**Top Performing Schools (By Passing Rate)**

In [7]:
#sort the school_summary to get the top 5
#format the DataFrame per requirement
top_schools = school_summary.sort_values("Overall Pass Rate (%)",ascending=False)
top_schools=top_schools[["name","type","size","budget","budget per student","Average Math Score","Average Reading Score","Math Pass Rate (%)","Reading Pass Rate (%)","Overall Pass Rate (%)"]]
top_schools.columns=["School Name","School Type","Total Students","Total School Budget","Per Student Budget","Average Math Score","Average Reading Score","Math Pass Rate","Reading Pass Rate","Overall Pass Rate"]

top_schools.head().round(2)

Unnamed: 0,School Name,School Type,Total Students,Total School Budget,Per Student Budget,Average Math Score,Average Reading Score,Math Pass Rate,Reading Pass Rate,Overall Pass Rate
6,Cabrera High School,Charter,1858,1081356,582.0,83.06,83.98,94.13,97.04,95.59
14,Thomas High School,Charter,1635,1043130,638.0,83.42,83.85,93.27,97.31,95.29
9,Pena High School,Charter,962,585858,609.0,83.84,84.04,94.59,95.95,95.27
4,Griffin High School,Charter,1468,917500,625.0,83.35,83.82,93.39,97.14,95.27
5,Wilson High School,Charter,2283,1319574,578.0,83.27,83.99,93.87,96.54,95.2
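Sorting and then calling `head()` works; `DataFrame.nlargest` is a shorter way to express the same selection. The sketch below uses a handful of overall pass rates copied from the tables in this notebook, in a hypothetical `summary` frame:

```python
import pandas as pd

# Overall pass rates copied from the tables in this notebook (a subset of schools)
summary = pd.DataFrame({
    "name": ["Bailey High School", "Cabrera High School", "Figueroa High School",
             "Ford High School", "Thomas High School"],
    "Overall Pass Rate (%)": [74.31, 95.59, 73.36, 73.80, 95.29],
})

# Rows with the two highest overall pass rates, highest first
top_two = summary.nlargest(2, "Overall Pass Rate (%)")
top_names = top_two["name"].tolist()
print(top_names)
```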


**Math & Reading Scores by Grade**

In [8]:
# Average reading and math scores for each school and grade
students_school=students_df.groupby(["school","grade"])
students_school.aggregate({"reading_score":'mean',"math_score":'mean'}).round(2)



school,grade,reading_score,math_score
Bailey High School,10th,80.91,77.0
Bailey High School,11th,80.95,77.52
Bailey High School,12th,80.91,76.49
Bailey High School,9th,81.3,77.08
Cabrera High School,10th,84.25,83.15
Cabrera High School,11th,83.79,82.77
Cabrera High School,12th,84.29,83.28
Cabrera High School,9th,83.68,83.09
Figueroa High School,10th,81.41,76.54
Figueroa High School,11th,80.64,76.88


**Scores and Passing Rate by School Spending**

In [9]:
# Break down school performance by per-student spending, using 4 equal-width bins
max_budget=max(school_summary["budget per student"])
min_budget=min(school_summary["budget per student"])
bin_gap=(max_budget-min_budget)/4

# Create the bins in which Data will be held
bins = [min_budget-10, min_budget+bin_gap, min_budget+bin_gap*2, min_budget+bin_gap*3, max_budget+10]

# Create the names for the four bins
group_names = ['Low', 'Okay', 'Moderate', 'High']
budget_cut=pd.Series(pd.cut(school_summary["budget per student"], bins, labels=group_names))
school_summary["spending range"]=budget_cut

# Average math and reading scores by spending range
school_summary.groupby("spending range").aggregate({"Average Reading Score":'mean',"Average Math Score":'mean'}).round(2)



spending range,Average Reading Score,Average Math Score
Low,83.94,83.45
Okay,83.88,83.6
Moderate,82.42,80.2
High,81.37,77.87


In [10]:
#   * % Passing Math
#   * % Passing Reading
#   * Overall Passing Rate (Average of the above two)

school_summary.groupby("spending range").aggregate({"Reading Pass Rate (%)":'mean',"Math Pass Rate (%)":'mean',"Overall Pass Rate (%)":'mean'}).round(2)



spending range,Reading Pass Rate (%),Math Pass Rate (%),Overall Pass Rate (%)
Low,96.61,93.46,95.04
Okay,95.9,94.23,95.07
Moderate,89.54,80.04,84.79
High,83.0,70.35,76.67
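The spending bins above are four equal-width intervals built by hand from the minimum and maximum. `pd.cut` can build equal-width bins itself when given a bin count. The sketch below uses only the five per-student budgets from the top-schools table above, so its bin edges differ from the ones computed over all 15 schools; the short index labels are just for readability:

```python
import pandas as pd

# Per-student budgets of the five schools in the top-schools table above
per_student = pd.Series(
    [582.0, 638.0, 609.0, 625.0, 578.0],
    index=["Cabrera", "Thomas", "Pena", "Griffin", "Wilson"],
)
labels = ["Low", "Okay", "Moderate", "High"]

# bins=4 asks for four equal-width bins between min and max;
# pandas widens the range slightly so the minimum value is included
spending_range = pd.cut(per_student, bins=4, labels=labels)
print(spending_range)
```

Equal-width bins can leave some bins nearly empty when the values cluster. `pd.qcut` is the pandas function for bins with roughly equal counts, if that is preferred.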


**Scores by School Size**

In [11]:
# Repeat the breakdown above, this time grouping schools by size
# (Small, Medium, Large) using 3 equal-width bins
max_size=max(school_summary["size"])
min_size=min(school_summary["size"])
bin_gap=int((max_size-min_size)/3)
bin_gap

# Create the bins in which Data will be held
bins = [min_size-10, min_size+bin_gap, min_size+bin_gap*2, max_size+10]

# Create the names for the three bins
group_names = ['Small', 'Medium', 'Large']
size_cut=pd.Series(pd.cut(school_summary["size"], bins, labels=group_names))
school_summary["size range"]=size_cut


In [12]:
# Average math and reading scores by size range
# (assigned to a variable; only the last expression in a cell is displayed)
size_scores=school_summary.groupby("size range").aggregate({"Average Reading Score":'mean',"Average Math Score":'mean'}).round(2)

# Pass rates by size range: % passing math, % passing reading,
# and the overall passing rate (average of the two)
size_passing_rate=school_summary.groupby("size range").aggregate({"Reading Pass Rate (%)":'mean',"Math Pass Rate (%)":'mean',"Overall Pass Rate (%)":'mean'})

size_passing_rate=size_passing_rate.round(2)
size_passing_rate




size range,Reading Pass Rate (%),Math Pass Rate (%),Overall Pass Rate (%)
Small,96.59,93.59,95.09
Medium,84.47,73.46,78.97
Large,81.06,66.46,73.76


In [18]:
# Scores by School Type
# Repeat the breakdown above, this time grouping schools by type (Charter vs. District)
score_type=school_summary.groupby("type").aggregate({"Reading Pass Rate (%)":'mean',"Math Pass Rate (%)":'mean',"Overall Pass Rate (%)":'mean'}).round(2)
score_type.head()



type,Reading Pass Rate (%),Math Pass Rate (%),Overall Pass Rate (%)
Charter,96.59,93.62,95.1
District,80.8,66.55,73.67
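The size and type breakdowns may overlap: if most small schools are also charter schools, these tables alone cannot separate the two effects. One way to check is to cross-tabulate school type against size range. The rows below are hypothetical and only show the shape of the check; in the notebook the same call could presumably be applied to the "type" and "size range" columns of school_summary:

```python
import pandas as pd

# Hypothetical schools, for illustration only
schools = pd.DataFrame({
    "type": ["Charter", "Charter", "Charter", "District", "District"],
    "size range": ["Small", "Small", "Medium", "Large", "Large"],
})

# Count of schools for each (type, size range) combination
type_by_size = pd.crosstab(schools["type"], schools["size range"])
print(type_by_size)
```

If one type fills almost all of one size bin, the size and type tables are largely describing the same schools.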


In [14]:
!jupyter nbconvert --to markdown main.ipynb

[NbConvertApp] Converting notebook main.ipynb to markdown
[NbConvertApp] Writing 21904 bytes to main.md


In [15]:
!rename "main.md" "README.md"

A duplicate file name exists, or the file
cannot be found.


In [16]:
!jupyter nbconvert --to html main.ipynb

[NbConvertApp] Converting notebook main.ipynb to html
[NbConvertApp] Writing 303248 bytes to main.html


In [17]:
!rename "main.html" "README.html"

A duplicate file name exists, or the file
cannot be found.
