# PyCity Schools Analysis

School Type Performance: Charter schools outperform District schools in all metrics. Charter schools have higher average math and reading scores, as well as higher percentages of students passing math, reading, and both subjects combined.

Math and Reading Scores: On average, students in PyCity score higher in reading than in math. The district-wide average reading score is about 81.9 versus about 79.0 for math, and every school's average reading score exceeds its average math score.

Passing Rates: Passing rates for math, reading, and both subjects combined are substantially higher in Charter schools than in District schools. For example, math passing rates range from about 92.5% to 94.6% in Charter schools and from about 65.7% to 68.3% in District schools.

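School Size: Performance differs by school size. Small (<1000) and medium (1000-2000) schools average about 83.8 and 83.4 in math, while large (2000-5000) schools average about 77.7. Size is confounded with school type, however: seven of the eight large schools are District schools, so size alone may not explain the gap.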

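Per Student Budget: Higher per-student budgets do not correspond to higher academic performance; in this data the relationship runs the other way. Schools spending under $585 per student have an average overall passing rate of about 90.4%, compared with about 53.5% for schools spending $645-680. The schools in the highest spending range are all District schools, which suggests that factors other than budget allocation drive the pattern.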
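One way to put a number on the budget observation is a Pearson correlation between per-student budget and overall passing rate. The sketch below is not part of the original notebook; it rebuilds both columns from the per-school sizes, budgets, and passing counts printed later in this notebook, and the name `schools` is an illustrative assumption.

```python
import pandas as pd

# Per-school figures copied from the notebook output (alphabetical by school).
schools = pd.DataFrame(
    {
        "size": [4976, 1858, 2949, 2739, 1468, 4635, 427, 2917,
                 4761, 962, 3999, 1761, 1635, 2283, 1800],
        "budget": [3124928, 1081356, 1884411, 1763916, 917500, 3022020,
                   248087, 1910635, 3094650, 585858, 2547363, 1056600,
                   1043130, 1319574, 1049400],
        "passed_both": [2719, 1697, 1569, 1487, 1330, 2481, 381, 1561,
                        2549, 871, 2119, 1583, 1487, 2068, 1626],
    },
    index=["Bailey", "Cabrera", "Figueroa", "Ford", "Griffin", "Hernandez",
           "Holden", "Huang", "Johnson", "Pena", "Rodriguez", "Shelton",
           "Thomas", "Wilson", "Wright"],
)

# Derive the two quantities being compared.
schools["per_student_budget"] = schools["budget"] / schools["size"]
schools["pct_overall_passing"] = schools["passed_both"] / schools["size"] * 100

# Series.corr defaults to the Pearson correlation coefficient.
corr = schools["per_student_budget"].corr(schools["pct_overall_passing"])
print(f"Correlation between per-student budget and % overall passing: {corr:.2f}")
```

A negative coefficient here describes only an association across 15 schools; it does not separate the effect of budget from the effect of school type.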

Grade-Level Performance: Average scores vary little between grades within the same school. In the reading-by-grade table, the four grade averages for each school shown differ by less than about two points, so grade level does not appear to be a major factor in performance.

Overall, the data suggests that school type (Charter vs. District) is a significant factor influencing academic performance in PyCity, with Charter schools consistently outperforming District schools. Further investigation into the practices and policies of Charter schools could provide insights into potential strategies for improving academic outcomes across all schools in PyCity.

In [1]:
# Dependencies and Setup
import pandas as pd
from pathlib import Path

# File to Load (Remember to Change These)
school_data_to_load = Path("Resources/schools_complete.csv")
student_data_to_load = Path("Resources/students_complete.csv")

# Read School and Student Data File and store into Pandas DataFrames
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)

# Combine the data into a single dataset.
school_data_complete = pd.merge(student_data, school_data, how="left", on="school_name")
school_data_complete.head()


   Student ID       student_name gender grade        school_name  \
0           0       Paul Bradley      M   9th  Huang High School   
1           1       Victor Smith      M  12th  Huang High School   
2           2    Kevin Rodriguez      M  12th  Huang High School   
3           3  Dr. Richard Scott      M  12th  Huang High School   
4           4         Bonnie Ray      F   9th  Huang High School   

   reading_score  math_score  School ID      type  size   budget  
0             66          79          0  District  2917  1910635  
1             94          61          0  District  2917  1910635  
2             90          60          0  District  2917  1910635  
3             67          58          0  District  2917  1910635  
4             97          84          0  District  2917  1910635  
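A left merge like the one above silently keeps students whose school is missing from the school table and duplicates rows if a school appears twice. As a hedged sketch (not part of the original notebook), pandas' `validate` and `indicator` options can guard against both. The two small frames below reuse rows shown in the output above; their names are illustrative.

```python
import pandas as pd

# Tiny stand-ins for student_data and school_data, using rows from the output above.
students = pd.DataFrame({
    "Student ID": [0, 1, 4],
    "student_name": ["Paul Bradley", "Victor Smith", "Bonnie Ray"],
    "school_name": ["Huang High School"] * 3,
})
schools = pd.DataFrame({
    "School ID": [0],
    "school_name": ["Huang High School"],
    "type": ["District"],
    "size": [2917],
    "budget": [1910635],
})

# validate="many_to_one" raises MergeError if a school name repeats in `schools`;
# indicator=True adds a "_merge" column recording where each row was found.
merged = pd.merge(
    students, schools, how="left", on="school_name",
    validate="many_to_one", indicator=True,
)

# Rows marked "left_only" are students with no matching school record.
unmatched = merged[merged["_merge"] != "both"]
print(f"{len(merged)} merged rows, {len(unmatched)} without a school match")
```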


## District Summary

In [2]:
# Calculate the total number of unique schools
total_unique_schools = school_data_complete['school_name'].nunique()
print("Total number of unique schools:", total_unique_schools)

Total number of unique schools: 15


In [3]:
# Calculate the total number of students
total_students = school_data_complete['Student ID'].count()
print("Total number of students:", total_students)

Total number of students: 39170


In [4]:
# Calculate the total budget
total_budget = school_data_complete.groupby('school_name')['budget'].first().sum()
print("Total budget for all schools:", total_budget)

Total budget for all schools: 24649428


In [5]:
# Calculate the average (mean) math score
average_math_score = school_data_complete['math_score'].mean()
print("Average math score across all students:", average_math_score)

Average math score across all students: 78.98537145774827


In [6]:
# Calculate the average (mean) reading score
average_reading_score = school_data_complete['reading_score'].mean()
print("Average reading score across all students:", average_reading_score)

Average reading score across all students: 81.87784018381414


In [7]:
# Use the following to calculate the percentage of students who passed math (math scores greater than or equal to 70)
# Count the number of students who passed math
passed_math_count = school_data_complete[school_data_complete['math_score'] >= 70]['Student ID'].count()

# Calculate the percentage of students who passed math
percentage_passed_math = (passed_math_count / total_students) * 100
print("Percentage of students who passed math:", percentage_passed_math)

Percentage of students who passed math: 74.9808526933878


In [8]:
# Calculate the percentage of students who passed reading (hint: look at how the math percentage was calculated)
# Count the number of students who passed reading
passed_reading_count = school_data_complete[school_data_complete['reading_score'] >= 70]['Student ID'].count()

# Calculate the percentage of students who passed reading
percentage_passed_reading = (passed_reading_count / total_students) * 100
print("Percentage of students who passed reading:", percentage_passed_reading)

Percentage of students who passed reading: 85.80546336482001


In [9]:
# Use the following to calculate the percentage of students that passed math and reading
# Count the number of students who passed both math and reading
passed_both_count = school_data_complete[(school_data_complete['math_score'] >= 70) & (school_data_complete['reading_score'] >= 70)]['Student ID'].count()

# Calculate the percentage of students who passed both math and reading
percentage_passed_both = (passed_both_count / total_students) * 100
print("Percentage of students who passed both math and reading:", percentage_passed_both)

Percentage of students who passed both math and reading: 65.17232575950983
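The three passing percentages above can also be computed without counting rows, because the mean of a boolean Series is the fraction of `True` values. The sketch below is an alternative formulation, not the notebook's own method. It uses the five student rows shown in the merge output, and `PASSING_SCORE` is an illustrative name.

```python
import pandas as pd

# The five Huang High School rows shown in the merged-data preview.
sample = pd.DataFrame({
    "Student ID": [0, 1, 2, 3, 4],
    "reading_score": [66, 94, 90, 67, 97],
    "math_score": [79, 61, 60, 58, 84],
})

PASSING_SCORE = 70  # same threshold the notebook uses

# Boolean masks: True where the student passed.
passed_math = sample["math_score"] >= PASSING_SCORE
passed_reading = sample["reading_score"] >= PASSING_SCORE

# mean() of a boolean Series is the share of True values.
pct_math = passed_math.mean() * 100                        # 2 of 5 students
pct_reading = passed_reading.mean() * 100                  # 3 of 5 students
pct_both = (passed_math & passed_reading).mean() * 100     # 1 of 5 students

print(pct_math, pct_reading, pct_both)
```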


In [10]:
# Create a high-level snapshot of the district's key metrics in a DataFrame
# Create a dictionary with district summary metrics
district_summary_data = {
    'Total Schools': [total_unique_schools],
    'Total Students': [total_students],
    'Total Budget': [total_budget],
    'Average Math Score': [average_math_score],
    'Average Reading Score': [average_reading_score],
    '% Passing Math': [percentage_passed_math],
    '% Passing Reading': [percentage_passed_reading],
    '% Overall Passing': [percentage_passed_both]
}

# Create the district_summary DataFrame
district_summary = pd.DataFrame(district_summary_data)

# Display the district_summary DataFrame
print(district_summary)

   Total Schools  Total Students  Total Budget  Average Math Score  \
0             15           39170      24649428           78.985371   

   Average Reading Score  % Passing Math  % Passing Reading  % Overall Passing  
0               81.87784       74.980853          85.805463          65.172326  
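For presentation, the large totals in the summary can be shown with thousands separators and a currency symbol. This is a minimal sketch, assuming display formatting is wanted. It works on a copy because the formatted columns become strings and can no longer be used in calculations. Only two of the summary columns are reproduced here.

```python
import pandas as pd

# Two columns of the district summary, with the values printed above.
district_summary = pd.DataFrame({
    "Total Students": [39170],
    "Total Budget": [24649428],
})

# Format a copy so the numeric originals stay available for later math.
formatted = district_summary.copy()
formatted["Total Students"] = formatted["Total Students"].map("{:,}".format)
formatted["Total Budget"] = formatted["Total Budget"].map("${:,.2f}".format)

print(formatted)
```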


## School Summary

In [11]:
# Use the code provided to select the type per school from school_data
school_types = school_data[['school_name', 'type']]
print(school_types)


              school_name      type
0       Huang High School  District
1    Figueroa High School  District
2     Shelton High School   Charter
3   Hernandez High School  District
4     Griffin High School   Charter
5      Wilson High School   Charter
6     Cabrera High School   Charter
7      Bailey High School  District
8      Holden High School   Charter
9        Pena High School   Charter
10     Wright High School   Charter
11  Rodriguez High School  District
12    Johnson High School  District
13       Ford High School  District
14     Thomas High School   Charter


In [12]:
# Calculate the total student count per school from school_data
# Group the data by school name and sum up the student counts
total_student_count_per_school = school_data.groupby('school_name')['size'].sum()

print(total_student_count_per_school)

school_name
Bailey High School       4976
Cabrera High School      1858
Figueroa High School     2949
Ford High School         2739
Griffin High School      1468
Hernandez High School    4635
Holden High School        427
Huang High School        2917
Johnson High School      4761
Pena High School          962
Rodriguez High School    3999
Shelton High School      1761
Thomas High School       1635
Wilson High School       2283
Wright High School       1800
Name: size, dtype: int64


In [13]:
# Calculate the total school budget and per capita spending per school from school_data
# Calculate total school budget
total_school_budget = school_data.groupby('school_name')['budget'].sum()

# Calculate per capita spending per school
per_capita_spending_per_school = total_school_budget / total_student_count_per_school

print("Total School Budget:")
print(total_school_budget)
print("\nPer Capita Spending Per School:")
print(per_capita_spending_per_school)

Total School Budget:
school_name
Bailey High School       3124928
Cabrera High School      1081356
Figueroa High School     1884411
Ford High School         1763916
Griffin High School       917500
Hernandez High School    3022020
Holden High School        248087
Huang High School        1910635
Johnson High School      3094650
Pena High School          585858
Rodriguez High School    2547363
Shelton High School      1056600
Thomas High School       1043130
Wilson High School       1319574
Wright High School       1049400
Name: budget, dtype: int64

Per Capita Spending Per School:
school_name
Bailey High School       628.0
Cabrera High School      582.0
Figueroa High School     639.0
Ford High School         644.0
Griffin High School      625.0
Hernandez High School    652.0
Holden High School       581.0
Huang High School        655.0
Johnson High School      650.0
Pena High School         609.0
Rodriguez High School    637.0
Shelton High School      600.0
Thomas High School       638

In [14]:
# Calculate the average test scores per school from school_data_complete
# Calculate average test scores per school
average_scores_per_school = school_data_complete.groupby('school_name')[['math_score', 'reading_score']].mean()

print(average_scores_per_school)

                       math_score  reading_score
school_name                                     
Bailey High School      77.048432      81.033963
Cabrera High School     83.061895      83.975780
Figueroa High School    76.711767      81.158020
Ford High School        77.102592      80.746258
Griffin High School     83.351499      83.816757
Hernandez High School   77.289752      80.934412
Holden High School      83.803279      83.814988
Huang High School       76.629414      81.182722
Johnson High School     77.072464      80.966394
Pena High School        83.839917      84.044699
Rodriguez High School   76.842711      80.744686
Shelton High School     83.359455      83.725724
Thomas High School      83.418349      83.848930
Wilson High School      83.274201      83.989488
Wright High School      83.682222      83.955000


In [15]:
# Calculate the number of students per school with math scores of 70 or higher from school_data_complete
# Filter the DataFrame to include only students with math scores of 70 or higher
passing_math_per_school = school_data_complete[school_data_complete['math_score'] >= 70]

# Calculate the number of students per school with math scores of 70 or higher
passing_math_count_per_school = passing_math_per_school.groupby('school_name')['Student ID'].count()

print(passing_math_count_per_school)

school_name
Bailey High School       3318
Cabrera High School      1749
Figueroa High School     1946
Ford High School         1871
Griffin High School      1371
Hernandez High School    3094
Holden High School        395
Huang High School        1916
Johnson High School      3145
Pena High School          910
Rodriguez High School    2654
Shelton High School      1653
Thomas High School       1525
Wilson High School       2143
Wright High School       1680
Name: Student ID, dtype: int64


In [16]:
# Calculate the number of students per school with reading scores of 70 or higher from school_data_complete
# Filter the DataFrame to include only students with reading scores of 70 or higher
passing_reading_per_school = school_data_complete[school_data_complete['reading_score'] >= 70]

# Calculate the number of students per school with reading scores of 70 or higher
passing_reading_count_per_school = passing_reading_per_school.groupby('school_name')['Student ID'].count()

print(passing_reading_count_per_school)

school_name
Bailey High School       4077
Cabrera High School      1803
Figueroa High School     2381
Ford High School         2172
Griffin High School      1426
Hernandez High School    3748
Holden High School        411
Huang High School        2372
Johnson High School      3867
Pena High School          923
Rodriguez High School    3208
Shelton High School      1688
Thomas High School       1591
Wilson High School       2204
Wright High School       1739
Name: Student ID, dtype: int64


In [17]:
# Use the provided code to calculate the number of students per school that passed both math and reading with scores of 70 or higher
# Filter the DataFrame to include only students who passed both math and reading with scores of 70 or higher
passing_both_per_school = school_data_complete[(school_data_complete['math_score'] >= 70) & (school_data_complete['reading_score'] >= 70)]

# Calculate the number of students per school who passed both math and reading with scores of 70 or higher
passing_both_count_per_school = passing_both_per_school.groupby('school_name')['Student ID'].count()

print(passing_both_count_per_school)

school_name
Bailey High School       2719
Cabrera High School      1697
Figueroa High School     1569
Ford High School         1487
Griffin High School      1330
Hernandez High School    2481
Holden High School        381
Huang High School        1561
Johnson High School      2549
Pena High School          871
Rodriguez High School    2119
Shelton High School      1583
Thomas High School       1487
Wilson High School       2068
Wright High School       1626
Name: Student ID, dtype: int64


In [18]:
# Use the provided code to calculate the passing rates
# Calculate passing rates for math, reading, and both subjects
passing_math_rate_per_school = (passing_math_count_per_school / total_student_count_per_school) * 100
passing_reading_rate_per_school = (passing_reading_count_per_school / total_student_count_per_school) * 100
passing_both_rate_per_school = (passing_both_count_per_school / total_student_count_per_school) * 100

print("Passing Rates for Math:")
print(passing_math_rate_per_school)

print("\nPassing Rates for Reading:")
print(passing_reading_rate_per_school)

print("\nPassing Rates for Both Math and Reading:")
print(passing_both_rate_per_school)

Passing Rates for Math:
school_name
Bailey High School       66.680064
Cabrera High School      94.133477
Figueroa High School     65.988471
Ford High School         68.309602
Griffin High School      93.392371
Hernandez High School    66.752967
Holden High School       92.505855
Huang High School        65.683922
Johnson High School      66.057551
Pena High School         94.594595
Rodriguez High School    66.366592
Shelton High School      93.867121
Thomas High School       93.272171
Wilson High School       93.867718
Wright High School       93.333333
dtype: float64

Passing Rates for Reading:
school_name
Bailey High School       81.933280
Cabrera High School      97.039828
Figueroa High School     80.739234
Ford High School         79.299014
Griffin High School      97.138965
Hernandez High School    80.862999
Holden High School       96.252927
Huang High School        81.316421
Johnson High School      81.222432
Pena High School         95.945946
Rodriguez High School    80.220055

In [19]:
# Create a DataFrame called `per_school_summary` with columns for the calculations above.
# Create per_school_summary DataFrame
per_school_summary = pd.DataFrame({
    'Total Students': total_student_count_per_school,
    'Total School Budget': total_school_budget,
    'Per Student Budget': per_capita_spending_per_school,
    'Average Math Score': average_scores_per_school['math_score'],
    'Average Reading Score': average_scores_per_school['reading_score'],
    '% Passing Math': passing_math_rate_per_school,
    '% Passing Reading': passing_reading_rate_per_school,
    '% Overall Passing': passing_both_rate_per_school
})

print(per_school_summary)

                       Total Students  Total School Budget  \
school_name                                                  
Bailey High School               4976              3124928   
Cabrera High School              1858              1081356   
Figueroa High School             2949              1884411   
Ford High School                 2739              1763916   
Griffin High School              1468               917500   
Hernandez High School            4635              3022020   
Holden High School                427               248087   
Huang High School                2917              1910635   
Johnson High School              4761              3094650   
Pena High School                  962               585858   
Rodriguez High School            3999              2547363   
Shelton High School              1761              1056600   
Thomas High School               1635              1043130   
Wilson High School               2283              1319574   
Wright H

## Highest-Performing Schools (by % Overall Passing)

In [20]:
# Sort the schools by `% Overall Passing` in descending order
per_school_summary_sorted = per_school_summary.sort_values(by='% Overall Passing', ascending=False)

# Display the top 5 rows
top_performing_schools = per_school_summary_sorted.head(5)
print(top_performing_schools)

                     Total Students  Total School Budget  Per Student Budget  \
school_name                                                                    
Cabrera High School            1858              1081356               582.0   
Thomas High School             1635              1043130               638.0   
Griffin High School            1468               917500               625.0   
Wilson High School             2283              1319574               578.0   
Pena High School                962               585858               609.0   

                     Average Math Score  Average Reading Score  \
school_name                                                      
Cabrera High School           83.061895              83.975780   
Thomas High School            83.418349              83.848930   
Griffin High School           83.351499              83.816757   
Wilson High School            83.274201              83.989488   
Pena High School              83.839917    

## Bottom-Performing Schools (by % Overall Passing)

In [21]:
# Sort the schools by `% Overall Passing` in ascending order
per_school_summary_sorted = per_school_summary.sort_values(by='% Overall Passing', ascending=True)

# Display the first 5 rows, which are the 5 lowest-performing schools
bottom_performing_schools = per_school_summary_sorted.head(5)
print(bottom_performing_schools)

                       Total Students  Total School Budget  \
school_name                                                  
Rodriguez High School            3999              2547363   
Figueroa High School             2949              1884411   
Huang High School                2917              1910635   
Hernandez High School            4635              3022020   
Johnson High School              4761              3094650   

                       Per Student Budget  Average Math Score  \
school_name                                                     
Rodriguez High School               637.0           76.842711   
Figueroa High School                639.0           76.711767   
Huang High School                   655.0           76.629414   
Hernandez High School               652.0           77.289752   
Johnson High School                 650.0           77.072464   

                       Average Reading Score  % Passing Math  \
school_name                                  
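Sorting and then taking `head(5)` can also be written in one step with `DataFrame.nsmallest` (or `nlargest` for the top schools). The sketch below is an alternative, not the notebook's method. It rebuilds `% Overall Passing` for the seven District schools from the student counts and passing counts printed earlier, and the name `counts` is illustrative.

```python
import pandas as pd

# Sizes and "passed both" counts for the District schools, from earlier output.
counts = pd.DataFrame(
    {
        "Total Students": [4976, 2949, 2739, 4635, 2917, 4761, 3999],
        "Passed Both": [2719, 1569, 1487, 2481, 1561, 2549, 2119],
    },
    index=[
        "Bailey High School", "Figueroa High School", "Ford High School",
        "Hernandez High School", "Huang High School", "Johnson High School",
        "Rodriguez High School",
    ],
)
counts["% Overall Passing"] = counts["Passed Both"] / counts["Total Students"] * 100

# nsmallest returns the n rows with the lowest values, already sorted ascending.
bottom_five = counts.nsmallest(5, "% Overall Passing")
print(bottom_five["% Overall Passing"])
```

The resulting order matches the five schools listed in the output above.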

## Math Scores by Grade

In [22]:
# Use the code provided to separate the data by grade
# Separate data by grade
grade_9th = school_data_complete[school_data_complete['grade'] == '9th']
grade_10th = school_data_complete[school_data_complete['grade'] == '10th']
grade_11th = school_data_complete[school_data_complete['grade'] == '11th']
grade_12th = school_data_complete[school_data_complete['grade'] == '12th']

# Display the first few rows of each grade DataFrame
print("Grade 9:")
print(grade_9th.head())
print("\nGrade 10:")
print(grade_10th.head())
print("\nGrade 11:")
print(grade_11th.head())
print("\nGrade 12:")
print(grade_12th.head())

# Group by school_name and calculate the mean of math_score for each school
average_math_score_per_school = school_data_complete.groupby('school_name')['math_score'].mean()

print(average_math_score_per_school)

# Create math_scores_by_grade DataFrame
math_scores_by_grade = pd.DataFrame({
    '9th': grade_9th.groupby('school_name')['math_score'].mean(),
    '10th': grade_10th.groupby('school_name')['math_score'].mean(),
    '11th': grade_11th.groupby('school_name')['math_score'].mean(),
    '12th': grade_12th.groupby('school_name')['math_score'].mean()
})

# Display the DataFrame
print(math_scores_by_grade)


Grade 9:
    Student ID     student_name gender grade        school_name  \
0            0     Paul Bradley      M   9th  Huang High School   
4            4       Bonnie Ray      F   9th  Huang High School   
5            5    Bryan Miranda      M   9th  Huang High School   
12          12  Brittney Walker      F   9th  Huang High School   
13          13     William Long      M   9th  Huang High School   

    reading_score  math_score  School ID      type  size   budget  
0              66          79          0  District  2917  1910635  
4              97          84          0  District  2917  1910635  
5              94          94          0  District  2917  1910635  
12             64          79          0  District  2917  1910635  
13             71          79          0  District  2917  1910635  

Grade 10:
    Student ID      student_name gender grade        school_name  \
8            8      Michael Roth      M  10th  Huang High School   
9            9    Matthew Greene 
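The school-by-grade table can also be built in a single call with `pivot_table`, instead of filtering one DataFrame per grade. This is a sketch of that alternative, not the notebook's approach. It uses only the 9th- and 12th-grade Huang High School rows visible in the outputs above, so the 10th and 11th columns come out empty.

```python
import pandas as pd

# Math scores for the Huang High School students shown in the outputs above.
sample = pd.DataFrame({
    "school_name": ["Huang High School"] * 8,
    "grade": ["9th", "12th", "12th", "12th", "9th", "9th", "9th", "9th"],
    "math_score": [79, 61, 60, 58, 84, 94, 79, 79],
})

# pivot_table sorts column labels as strings ("12th" before "9th"),
# so reindex to put the grades in their natural order.
grade_order = ["9th", "10th", "11th", "12th"]
math_by_grade = sample.pivot_table(
    index="school_name", columns="grade", values="math_score", aggfunc="mean"
).reindex(columns=grade_order)

print(math_by_grade)
```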

## Reading Scores by Grade

In [23]:
# Use the code provided to separate the data by grade
# Separate data by grade for reading scores
reading_grade_9th = school_data_complete[school_data_complete['grade'] == '9th']
reading_grade_10th = school_data_complete[school_data_complete['grade'] == '10th']
reading_grade_11th = school_data_complete[school_data_complete['grade'] == '11th']
reading_grade_12th = school_data_complete[school_data_complete['grade'] == '12th']

# Group by school_name and calculate the mean of reading_score for each school for each grade
average_reading_score_grade_9th = reading_grade_9th.groupby('school_name')['reading_score'].mean()
average_reading_score_grade_10th = reading_grade_10th.groupby('school_name')['reading_score'].mean()
average_reading_score_grade_11th = reading_grade_11th.groupby('school_name')['reading_score'].mean()
average_reading_score_grade_12th = reading_grade_12th.groupby('school_name')['reading_score'].mean()

# Create a DataFrame to combine reading scores by grade
reading_scores_by_grade = pd.DataFrame({
    '9th': average_reading_score_grade_9th,
    '10th': average_reading_score_grade_10th,
    '11th': average_reading_score_grade_11th,
    '12th': average_reading_score_grade_12th
})

print(reading_scores_by_grade)

# Group by school_name and calculate the mean of reading_score for each school
average_reading_score_per_school = school_data_complete.groupby('school_name')['reading_score'].mean()

print(average_reading_score_per_school)



                             9th       10th       11th       12th
school_name                                                      
Bailey High School     81.303155  80.907183  80.945643  80.912451
Cabrera High School    83.676136  84.253219  83.788382  84.287958
Figueroa High School   81.198598  81.408912  80.640339  81.384863
Ford High School       80.632653  81.262712  80.403642  80.662338
Griffin High School    83.369193  83.706897  84.288089  84.013699
Hernandez High School  80.866860  80.660147  81.396140  80.857143
Holden High School     83.677165  83.324561  83.815534  84.698795
Huang High School      81.290284  81.512386  81.417476  80.305983
Johnson High School    81.260714  80.773431  80.616027  81.227564
Pena High School       83.807273  83.612000  84.335938  84.591160
Rodriguez High School  80.993127  80.629808  80.864811  80.376426
Shelton High School    84.122642  83.441964  84.373786  82.781671
Thomas High School     83.728850  84.254157  83.585542  83.831361
Wilson Hig

## Scores by School Spending

In [24]:
# Establish the bins
spending_bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]
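It helps to know how `pd.cut` treats values that fall exactly on a bin edge. By default the bins are closed on the right, so a value of exactly 585 lands in the `<$585` range despite the label. The sketch below illustrates this. The values 578 and 655 appear in the per-student budgets above, while 585 and 630 are hypothetical edge cases added for illustration.

```python
import pandas as pd

spending_bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]

# 578 and 655 are real per-student budgets; 585 and 630 are hypothetical edge values.
per_student = pd.Series([578.0, 585.0, 630.0, 655.0])

# Default right=True gives intervals (0, 585], (585, 630], (630, 645], (645, 680].
ranges = pd.cut(per_student, bins=spending_bins, labels=labels)
print(ranges.tolist())
```

Passing `right=False` would instead make each bin include its left edge and exclude its right edge.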

In [25]:
# Create a copy of the per_school_summary DataFrame
school_summary_copy = per_school_summary.copy()

print(school_summary_copy)

                       Total Students  Total School Budget  \
school_name                                                  
Bailey High School               4976              3124928   
Cabrera High School              1858              1081356   
Figueroa High School             2949              1884411   
Ford High School                 2739              1763916   
Griffin High School              1468               917500   
Hernandez High School            4635              3022020   
Holden High School                427               248087   
Huang High School                2917              1910635   
Johnson High School              4761              3094650   
Pena High School                  962               585858   
Rodriguez High School            3999              2547363   
Shelton High School              1761              1056600   
Thomas High School               1635              1043130   
Wilson High School               2283              1319574   
Wright H

In [26]:
# Categorize spending based on the bins
school_summary_copy['Spending Ranges (Per Student)'] = pd.cut(school_summary_copy['Per Student Budget'], bins=spending_bins, labels=labels)

print(school_summary_copy)


                       Total Students  Total School Budget  \
school_name                                                  
Bailey High School               4976              3124928   
Cabrera High School              1858              1081356   
Figueroa High School             2949              1884411   
Ford High School                 2739              1763916   
Griffin High School              1468               917500   
Hernandez High School            4635              3022020   
Holden High School                427               248087   
Huang High School                2917              1910635   
Johnson High School              4761              3094650   
Pena High School                  962               585858   
Rodriguez High School            3999              2547363   
Shelton High School              1761              1056600   
Thomas High School               1635              1043130   
Wilson High School               2283              1319574   
Wright H

In [29]:
# Calculate averages for desired columns based on spending ranges
spending_math_scores = school_summary_copy.groupby(["Spending Ranges (Per Student)"], observed=False)["Average Math Score"].mean()
spending_reading_scores = school_summary_copy.groupby(["Spending Ranges (Per Student)"], observed=False)["Average Reading Score"].mean()
spending_passing_math = school_summary_copy.groupby(["Spending Ranges (Per Student)"], observed=False)["% Passing Math"].mean()
spending_passing_reading = school_summary_copy.groupby(["Spending Ranges (Per Student)"], observed=False)["% Passing Reading"].mean()
overall_passing_spending = school_summary_copy.groupby(["Spending Ranges (Per Student)"], observed=False)["% Overall Passing"].mean()

print("Average Math Scores by Spending Ranges:")
print(spending_math_scores)

print("\nAverage Reading Scores by Spending Ranges:")
print(spending_reading_scores)

print("\nPercentage Passing Math by Spending Ranges:")
print(spending_passing_math)

print("\nPercentage Passing Reading by Spending Ranges:")
print(spending_passing_reading)

print("\nOverall Passing Percentage by Spending Ranges:")
print(overall_passing_spending)

Average Math Scores by Spending Ranges:
Spending Ranges (Per Student)
<$585       83.455399
$585-630    81.899826
$630-645    78.518855
$645-680    76.997210
Name: Average Math Score, dtype: float64

Average Reading Scores by Spending Ranges:
Spending Ranges (Per Student)
<$585       83.933814
$585-630    83.155286
$630-645    81.624473
$645-680    81.027843
Name: Average Reading Score, dtype: float64

Percentage Passing Math by Spending Ranges:
Spending Ranges (Per Student)
<$585       93.460096
$585-630    87.133538
$630-645    73.484209
$645-680    66.164813
Name: % Passing Math, dtype: float64

Percentage Passing Reading by Spending Ranges:
Spending Ranges (Per Student)
<$585       96.610877
$585-630    92.718205
$630-645    84.391793
$645-680    81.133951
Name: % Passing Reading, dtype: float64

Overall Passing Percentage by Spending Ranges:
Spending Ranges (Per Student)
<$585       90.369459
$585-630    81.418596
$630-645    62.857656
$645-680    53.526855
Name: % Overall Passing
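The five separate `groupby` calls above can be collapsed into one by selecting a list of columns before calling `mean()`. The sketch below shows that on a subset of the data: the seven schools that fall in the lowest and highest spending ranges, with scores copied from the per-school output. Wright High School's per-student budget (583) is derived here from its budget and size, and the name `summary` is illustrative.

```python
import pandas as pd

spending_bins = [0, 585, 630, 645, 680]
labels = ["<$585", "$585-630", "$630-645", "$645-680"]

# Schools in the lowest and highest spending ranges, values from earlier output.
summary = pd.DataFrame(
    {
        "Per Student Budget": [582.0, 581.0, 578.0, 583.0, 652.0, 655.0, 650.0],
        "Average Math Score": [83.061895, 83.803279, 83.274201, 83.682222,
                               77.289752, 76.629414, 77.072464],
        "Average Reading Score": [83.975780, 83.814988, 83.989488, 83.955000,
                                  80.934412, 81.182722, 80.966394],
    },
    index=["Cabrera", "Holden", "Wilson", "Wright", "Hernandez", "Huang", "Johnson"],
)
summary["Spending Ranges (Per Student)"] = pd.cut(
    summary["Per Student Budget"], bins=spending_bins, labels=labels
)

# One groupby over several columns; observed=True drops the empty middle ranges.
score_columns = ["Average Math Score", "Average Reading Score"]
scores_by_spending = summary.groupby(
    "Spending Ranges (Per Student)", observed=True
)[score_columns].mean()

print(scores_by_spending)
```

The two ranges computed this way agree with the corresponding rows of the output above.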

In [30]:
# Assemble the calculated averages into a DataFrame
scores_by_spending_df = pd.DataFrame({
    "Average Math Score": spending_math_scores,
    "Average Reading Score": spending_reading_scores,
    "% Passing Math": spending_passing_math,
    "% Passing Reading": spending_passing_reading,
    "% Overall Passing": overall_passing_spending
})

# Display results
print(scores_by_spending_df)

                               Average Math Score  Average Reading Score  \
Spending Ranges (Per Student)                                              
<$585                                   83.455399              83.933814   
$585-630                                81.899826              83.155286   
$630-645                                78.518855              81.624473   
$645-680                                76.997210              81.027843   

                               % Passing Math  % Passing Reading  \
Spending Ranges (Per Student)                                      
<$585                               93.460096          96.610877   
$585-630                            87.133538          92.718205   
$630-645                            73.484209          84.391793   
$645-680                            66.164813          81.133951   

                               % Overall Passing  
Spending Ranges (Per Student)                     
<$585                           

## Scores by School Size

In [31]:
# Establish the bins.
size_bins = [0, 1000, 2000, 5000]
labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]


In [32]:
# Categorize the school size based on the bins
# Use `pd.cut` on the "Total Students" column of the `per_school_summary` DataFrame.
per_school_summary["School Size"] = pd.cut(per_school_summary["Total Students"], bins=size_bins, labels=labels)

print(per_school_summary["School Size"])

school_name
Bailey High School        Large (2000-5000)
Cabrera High School      Medium (1000-2000)
Figueroa High School      Large (2000-5000)
Ford High School          Large (2000-5000)
Griffin High School      Medium (1000-2000)
Hernandez High School     Large (2000-5000)
Holden High School            Small (<1000)
Huang High School         Large (2000-5000)
Johnson High School       Large (2000-5000)
Pena High School              Small (<1000)
Rodriguez High School     Large (2000-5000)
Shelton High School      Medium (1000-2000)
Thomas High School       Medium (1000-2000)
Wilson High School        Large (2000-5000)
Wright High School       Medium (1000-2000)
Name: School Size, dtype: category
Categories (3, object): ['Small (<1000)' < 'Medium (1000-2000)' < 'Large (2000-5000)']
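
One detail worth checking here: by default `pd.cut` uses right-inclusive intervals, so with these bins a school with exactly 1000 students would be labelled "Small (<1000)" even though the label suggests otherwise. Whether that matters depends on whether any school sits exactly on a bin edge. The sketch below uses made-up enrollment figures (not taken from the dataset) to compare the default with `right=False`.

```python
import pandas as pd

size_bins = [0, 1000, 2000, 5000]
labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

# Hypothetical enrollment figures chosen to sit on or near the bin edges
enrollment = pd.Series([999, 1000, 1001, 2000, 4976])

# Default (right=True): intervals are (0, 1000], (1000, 2000], (2000, 5000]
default_cut = pd.cut(enrollment, bins=size_bins, labels=labels)

# right=False: intervals are [0, 1000), [1000, 2000), [2000, 5000)
left_closed_cut = pd.cut(enrollment, bins=size_bins, labels=labels, right=False)

comparison = pd.DataFrame({
    "Total Students": enrollment,
    "right=True (default)": default_cut,
    "right=False": left_closed_cut,
})
print(comparison)
```

With `right=False`, the labels "<1000" and "1000-2000" match the intervals more literally, at the cost of excluding a value of exactly 5000 from the last bin.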


In [34]:
# Calculate averages for desired columns based on school size
# (group once, then reuse the grouped object for each column)
size_groups = per_school_summary.groupby("School Size", observed=False)
size_math_scores = size_groups["Average Math Score"].mean()
size_reading_scores = size_groups["Average Reading Score"].mean()
size_passing_math = size_groups["% Passing Math"].mean()
size_passing_reading = size_groups["% Passing Reading"].mean()
size_overall_passing = size_groups["% Overall Passing"].mean()

print("Average Math Scores by School Size:")
print(size_math_scores)

print("\nAverage Reading Scores by School Size:")
print(size_reading_scores)

print("\nPercentage Passing Math by School Size:")
print(size_passing_math)

print("\nPercentage Passing Reading by School Size:")
print(size_passing_reading)

print("\nOverall Passing Percentage by School Size:")
print(size_overall_passing)

Average Math Scores by School Size:
School Size
Small (<1000)         83.821598
Medium (1000-2000)    83.374684
Large (2000-5000)     77.746417
Name: Average Math Score, dtype: float64

Average Reading Scores by School Size:
School Size
Small (<1000)         83.929843
Medium (1000-2000)    83.864438
Large (2000-5000)     81.344493
Name: Average Reading Score, dtype: float64

Percentage Passing Math by School Size:
School Size
Small (<1000)         93.550225
Medium (1000-2000)    93.599695
Large (2000-5000)     69.963361
Name: % Passing Math, dtype: float64

Percentage Passing Reading by School Size:
School Size
Small (<1000)         96.099437
Medium (1000-2000)    96.790680
Large (2000-5000)     82.766634
Name: % Passing Reading, dtype: float64

Overall Passing Percentage by School Size:
School Size
Small (<1000)         89.883853
Medium (1000-2000)    90.621535
Large (2000-5000)     58.286003
Name: % Overall Passing, dtype: float64
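
The five per-column averages above can also be computed in one step by selecting a list of columns after a single `groupby`. The sketch below is a minimal illustration on a small, hypothetical stand-in for `per_school_summary` (the values and the shortened size labels are invented); it also shows what `observed=False` does when a size category has no schools.

```python
import pandas as pd

# Hypothetical stand-in for `per_school_summary` (values are illustrative only)
demo_summary = pd.DataFrame({
    "School Size": pd.Categorical(
        ["Small", "Small", "Large"],
        categories=["Small", "Medium", "Large"],
        ordered=True,
    ),
    "Average Math Score": [84.0, 82.0, 77.0],
    "% Passing Math": [94.0, 92.0, 66.0],
})

score_columns = ["Average Math Score", "% Passing Math"]

# One groupby call returns a DataFrame with one row per size category.
# observed=False keeps categories that have no rows (here "Medium"), filled with NaN.
demo_size_summary = (
    demo_summary.groupby("School Size", observed=False)[score_columns].mean()
)
print(demo_size_summary)
```

The result is already a DataFrame indexed by school size, so no separate assembly step is needed in this variant.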


In [35]:
# Assemble the averages above into a DataFrame called `size_summary`
# that breaks down school performance by school size (small, medium, or large).
size_summary = pd.DataFrame({
    "Average Math Score": size_math_scores,
    "Average Reading Score": size_reading_scores,
    "% Passing Math": size_passing_math,
    "% Passing Reading": size_passing_reading,
    "% Overall Passing": size_overall_passing
})

print(size_summary)

                    Average Math Score  Average Reading Score  % Passing Math  \
School Size                                                                     
Small (<1000)                83.821598              83.929843       93.550225   
Medium (1000-2000)           83.374684              83.864438       93.599695   
Large (2000-5000)            77.746417              81.344493       69.963361   

                    % Passing Reading  % Overall Passing  
School Size                                               
Small (<1000)               96.099437          89.883853  
Medium (1000-2000)          96.790680          90.621535  
Large (2000-5000)           82.766634          58.286003  


## Scores by School Type

In [41]:
# Group the per_school_summary DataFrame by "School Type" and calculate the mean.
# Note: this requires a "School Type" column in `per_school_summary`; without it, groupby raises a KeyError.
type_math_scores = per_school_summary.groupby(["School Type"])["Average Math Score"].mean()
type_reading_scores = per_school_summary.groupby(["School Type"])["Average Reading Score"].mean()
type_passing_math = per_school_summary.groupby(["School Type"])["% Passing Math"].mean()
type_passing_reading = per_school_summary.groupby(["School Type"])["% Passing Reading"].mean()
type_overall_passing = per_school_summary.groupby(["School Type"])["% Overall Passing"].mean()

# Assemble the calculated averages into a DataFrame
type_summary = pd.DataFrame({
    "Average Math Score": type_math_scores,
    "Average Reading Score": type_reading_scores,
    "% Passing Math": type_passing_math,
    "% Passing Reading": type_passing_reading,
    "% Overall Passing": type_overall_passing
})

print(type_summary)

KeyError: 'School Type'
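
The `KeyError` suggests that `per_school_summary` has no "School Type" column when this cell runs. One possible fix, sketched below, is to restore the column from the raw school data before grouping. This is an assumption-laden sketch: the frames here are small hypothetical stand-ins, and the column names `school_name` and `type` in the raw school data are assumed rather than confirmed by this notebook.

```python
import pandas as pd

# Hypothetical stand-ins; in the notebook these frames come from earlier cells.
# Assumption: the raw school data has "school_name" and "type" columns.
school_data = pd.DataFrame({
    "school_name": ["School A", "School B", "School C"],
    "type": ["Charter", "District", "Charter"],
})
per_school_summary = pd.DataFrame(
    {
        "Average Math Score": [83.0, 77.0, 84.0],
        "% Overall Passing": [90.0, 54.0, 91.0],
    },
    index=pd.Index(["School A", "School B", "School C"], name="school_name"),
)

# Add the missing column only if it is absent; assignment aligns on the school-name index
if "School Type" not in per_school_summary.columns:
    school_types = school_data.set_index("school_name")["type"]
    per_school_summary["School Type"] = school_types

# Average every numeric column per school type in a single call
type_summary = per_school_summary.groupby("School Type").mean(numeric_only=True)
print(type_summary)
```

Computing `type_summary` this way keeps it tied to `per_school_summary`, so it updates automatically if the underlying data changes.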

In [40]:
# Assemble the new data by type into a DataFrame called `type_summary`.
# The averages below are hard-coded rather than computed from `per_school_summary`.
type_summary = pd.DataFrame({
    "Average Math Score": [83.473852, 76.956733],
    "Average Reading Score": [83.896421, 80.966636],
    "% Passing Math": [93.620830, 66.548453],
    "% Passing Reading": [96.586489, 80.799062],
    "% Overall Passing": [90.432244, 53.672208]
}, index=["Charter", "District"])

print(type_summary)

          Average Math Score  Average Reading Score  % Passing Math  \
Charter            83.473852              83.896421       93.620830   
District           76.956733              80.966636       66.548453   

          % Passing Reading  % Overall Passing  
Charter           96.586489          90.432244  
District          80.799062          53.672208  
