## SAT Performance Analysis of American High Schools

An analysis of SAT scores from New York City high schools, examining performance across individual schools and the five boroughs.

The analysis covers the three SAT sections recorded in the dataset: math, reading, and writing.

<img src="../img/sat.jpeg" width="1100" height="330">


In [1]:
# Import Python libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# Loading the sat-results dataset
schools = pd.read_csv("../data/project_datasets/sat-results-analysis/schools.csv")
schools.head()

Unnamed: 0,school_name,borough,building_code,average_math,average_reading,average_writing,percent_tested
0,"New Explorations into Science, Technology and ...",Manhattan,M022,657,601,601,
1,Essex Street Academy,Manhattan,M445,395,411,387,78.9
2,Lower Manhattan Arts Academy,Manhattan,M445,418,428,415,65.1
3,High School for Dual Language and Asian Studies,Manhattan,M445,613,453,463,95.9
4,Henry Street School for International Studies,Manhattan,M056,410,406,381,59.7


## Summary Statistics

In [3]:
schools.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 375 entries, 0 to 374
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   school_name      375 non-null    object 
 1   borough          375 non-null    object 
 2   building_code    375 non-null    object 
 3   average_math     375 non-null    int64  
 4   average_reading  375 non-null    int64  
 5   average_writing  375 non-null    int64  
 6   percent_tested   355 non-null    float64
dtypes: float64(1), int64(3), object(3)
memory usage: 20.6+ KB
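
The `info()` output shows 355 non-null values in `percent_tested` against 375 rows, so 20 schools have no reported test-taking rate. A minimal sketch of counting missing values per column, using made-up rows rather than the real dataset (the `sample` frame and its values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({
    "school_name": ["School A", "School B", "School C"],
    "percent_tested": [78.9, np.nan, 65.1],
})

# isna() marks missing cells; summing the booleans counts them per column
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

Applied to `schools`, the same call would make the gap in `percent_tested` explicit before any statistic that depends on it.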


In [4]:
schools.describe()

Unnamed: 0,average_math,average_reading,average_writing,percent_tested
count,375.0,375.0,375.0,355.0
mean,432.944,424.504,418.458667,64.976338
std,71.952373,61.881069,64.548599,18.747634
min,317.0,302.0,284.0,18.5
25%,386.0,386.0,382.0,50.95
50%,415.0,413.0,403.0,64.8
75%,458.5,445.0,437.5,79.6
max,754.0,697.0,693.0,100.0


In [64]:
# Number of unique boroughs
print(f"Unique boroughs: {schools['borough'].nunique()} : {schools['borough'].unique()}")

Unique boroughs: 5 : ['Manhattan' 'Staten Island' 'Bronx' 'Queens' 'Brooklyn']


In [57]:
# Schools per borough
schools_per_borough = schools.groupby("borough")["school_name"].count()
schools_per_borough

borough
Bronx             98
Brooklyn         109
Manhattan         89
Queens            69
Staten Island     10
Name: school_name, dtype: int64

## Analytics

**1. Which NYC schools have the best math results?**

Here, the best math results are defined as an average math score of at least 80% of the *maximum possible score of 800*, that is, 640 or higher.
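
The cutoff follows from 80% of 800 = 640. A minimal sketch of deriving the cutoff from its definition instead of hard-coding it, shown on made-up rows (the names `MAX_MATH_SCORE`, `THRESHOLD_PERCENT`, and the `sample` frame are illustrative assumptions, not part of the dataset):

```python
import pandas as pd

MAX_MATH_SCORE = 800      # maximum possible score on the math section
THRESHOLD_PERCENT = 80    # "best" means at least 80% of the maximum

# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({
    "school_name": ["School A", "School B", "School C"],
    "average_math": [655, 640, 512],
})

# 800 * 80 / 100 = 640.0
cutoff = MAX_MATH_SCORE * THRESHOLD_PERCENT / 100

# Keep schools at or above the cutoff, highest score first
best = (
    sample.loc[sample["average_math"] >= cutoff, ["school_name", "average_math"]]
    .sort_values("average_math", ascending=False)
    .reset_index(drop=True)
)
print(best)
```

Because the comparison is `>=`, a school averaging exactly 640 is included.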

In [5]:
# Schools with an average math score of at least 640, highest first
best_math_schools = (
    schools.loc[schools["average_math"] >= 640, ["school_name", "average_math"]]
    .sort_values(by="average_math", ascending=False)
    .reset_index(drop=True)
)
best_math_schools

Unnamed: 0,school_name,average_math
0,Stuyvesant High School,754
1,Bronx High School of Science,714
2,Staten Island Technical High School,711
3,Queens High School for the Sciences at York Co...,701
4,"High School for Mathematics, Science, and Engi...",683
5,Brooklyn Technical High School,682
6,Townsend Harris High School,680
7,High School of American Studies at Lehman College,669
8,"New Explorations into Science, Technology and ...",657
9,Eleanor Roosevelt High School,641


**2. What are the top 10 performing schools based on the combined SAT scores?**

In [7]:
schools["total_SAT"] = schools["average_math"] + schools["average_reading"] + schools["average_writing"]

In [13]:
top_10_schools = schools[["school_name", "total_SAT"]].sort_values(by="total_SAT", ascending=False).head(10).reset_index(drop=True)

In [14]:
top_10_schools

Unnamed: 0,school_name,total_SAT
0,Stuyvesant High School,2144
1,Bronx High School of Science,2041
2,Staten Island Technical High School,2041
3,High School of American Studies at Lehman College,2013
4,Townsend Harris High School,1981
5,Queens High School for the Sciences at York Co...,1947
6,Bard High School Early College,1914
7,Brooklyn Technical High School,1896
8,Eleanor Roosevelt High School,1889
9,"High School for Mathematics, Science, and Engi...",1889
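
The last two rows of this table tie at 1889, and `head(10)` cuts the list at exactly ten rows, so any further school with the same score would be dropped without notice. A hedged sketch of one way to keep ties, using `DataFrame.nlargest` with `keep="all"` on made-up rows (the `sample` frame is an illustrative assumption):

```python
import pandas as pd

# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({
    "school_name": ["School A", "School B", "School C", "School D"],
    "total_SAT": [2000, 1900, 1900, 1800],
})

# keep="all" retains every row tied with the last selected value,
# so asking for the top 2 can return more than 2 rows
top_2 = sample.nlargest(2, "total_SAT", keep="all").reset_index(drop=True)
print(top_2)
```

Whether ties should extend the list or be cut off is a design choice; the original `head(10)` approach always returns exactly ten rows.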


**3. Which single borough has the largest standard deviation in the combined SAT score?**

The result is a one-row DataFrame with the following columns:
- "borough" - the name of the NYC borough with the largest standard deviation of "total_SAT".
- "num_schools" - the number of schools in the borough.
- "average_SAT" - the mean of "total_SAT".
- "std_SAT" - the standard deviation of "total_SAT".
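
The four columns listed above can also be produced in a single step. The following is a minimal sketch, assuming pandas named aggregation and `idxmax`, run on made-up rows; it is an alternative formulation, not the notebook's own method, and the `sample` frame is hypothetical:

```python
import pandas as pd

# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({
    "borough": ["X", "X", "X", "Y", "Y", "Y"],
    "total_SAT": [1000, 1200, 1400, 1190, 1200, 1210],
})

# Named aggregation sets the output column names directly
stats = (
    sample.groupby("borough")["total_SAT"]
    .agg(num_schools="count", average_SAT="mean", std_SAT="std")
    .round(2)
)

# idxmax returns the index label (borough) with the largest std_SAT;
# the double brackets keep the result a one-row DataFrame
top = stats.loc[[stats["std_SAT"].idxmax()]].reset_index()
print(top)
```

Note that `idxmax` returns only the first label when several boroughs share the maximum, whereas filtering on `== max()` would keep all of them.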

In [25]:
schools

Unnamed: 0,school_name,borough,building_code,average_math,average_reading,average_writing,percent_tested,total_SAT
0,"New Explorations into Science, Technology and ...",Manhattan,M022,657,601,601,,1859
1,Essex Street Academy,Manhattan,M445,395,411,387,78.9,1193
2,Lower Manhattan Arts Academy,Manhattan,M445,418,428,415,65.1,1261
3,High School for Dual Language and Asian Studies,Manhattan,M445,613,453,463,95.9,1529
4,Henry Street School for International Studies,Manhattan,M056,410,406,381,59.7,1197
...,...,...,...,...,...,...,...,...
370,"Queens High School for Information, Research, ...",Queens,Q465,372,362,352,44.6,1086
371,Rockaway Park High School for Environmental Su...,Queens,Q410,357,381,376,38.5,1114
372,Channel View School for Research,Queens,Q410,427,430,423,76.6,1280
373,Rockaway Collegiate High School,Queens,Q410,399,403,405,46.5,1207


In [None]:
# Per-borough count, standard deviation, and mean of total_SAT

borough = schools.groupby("borough")["total_SAT"].agg(["count", "std","mean"]).round(2)
borough

borough,count,std,mean
Bronx,98,150.39,1202.72
Brooklyn,109,154.87,1230.26
Manhattan,89,230.29,1340.13
Queens,69,195.25,1345.48
Staten Island,10,222.3,1439.0


In [None]:
# Filtering for max std

largest_std = borough[borough["std"] == borough["std"].max()]

In [47]:
largest_std

borough,count,std,mean
Manhattan,89,230.29,1340.13


In [48]:
# Rename the columns

largest_std = largest_std.rename(columns = {"count" : "num_schools", "mean": "average_SAT", "std": "std_SAT"})
largest_std

borough,num_schools,std_SAT,average_SAT
Manhattan,89,230.29,1340.13


In [51]:
# Move "borough" from the index into a regular column
largest_std = largest_std.reset_index()
largest_std

Unnamed: 0,borough,num_schools,std_SAT,average_SAT
0,Manhattan,89,230.29,1340.13
