<a href="https://colab.research.google.com/github/mileribeiro/school-performance/blob/main/school_performance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NYC School Performance Analysis

> In this project, I used Python and the Pandas library to manipulate and analyze a dataset of performance data for New York City schools. The project is divided into the following stages:

* Data Loading: The CSV file containing the school data was loaded from Google Drive in Google Colab.

* Top Math-Performing Schools: I filtered schools with an average math score of 640 or higher and sorted them in descending order. This helped identify which schools excelled in mathematics.

* SAT Total Score Calculation: Using the average scores for math, reading, and writing, I calculated the total SAT score for each school. I then ranked the schools based on this total score, highlighting the top 10 performers.

* Borough Analysis: I grouped schools by borough and calculated the number of schools, average SAT score, and standard deviation to understand performance variations across different regions. The standard deviation analysis helped identify which borough had the highest variability in SAT scores.

In [None]:
# Importing the required libraries
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Loading the dataset
school = pd.read_csv('/content/drive/MyDrive/ARQUIVOS/schools.csv')
school.head()

Unnamed: 0,school_name,borough,building_code,average_math,average_reading,average_writing,percent_tested
0,"New Explorations into Science, Technology and ...",Manhattan,M022,657,601,601,
1,Essex Street Academy,Manhattan,M445,395,411,387,78.9
2,Lower Manhattan Arts Academy,Manhattan,M445,418,428,415,65.1
3,High School for Dual Language and Asian Studies,Manhattan,M445,613,453,463,95.9
4,Henry Street School for International Studies,Manhattan,M056,410,406,381,59.7
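
The first row of the preview above has no value for `percent_tested`, so it can be worth counting missing values per column before filtering or aggregating. The following is a minimal sketch on a hypothetical two-row CSV that only mimics the column layout; the school names and building codes in it are placeholders, not rows from `schools.csv`.

```python
import io

import pandas as pd

# Hypothetical two-row CSV that mimics the column layout of the preview.
# The names and building codes are placeholders, not real records.
csv_text = """school_name,borough,building_code,average_math,average_reading,average_writing,percent_tested
School A,Manhattan,M000,657,601,601,
School B,Manhattan,M001,395,411,387,78.9
"""

toy = pd.read_csv(io.StringIO(csv_text))

# Count missing values in every column; an empty CSV field is read as NaN.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

On the real data, the same check would be `school.isna().sum()`.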


In [None]:
# Which schools are best for math?
best_math_schools = school[school['average_math'] >= 640][['school_name', 'average_math']].sort_values('average_math', ascending=False)
best_math_schools.head()

Unnamed: 0,school_name,average_math
88,Stuyvesant High School,754
170,Bronx High School of Science,714
93,Staten Island Technical High School,711
365,Queens High School for the Sciences at York Co...,701
68,"High School for Mathematics, Science, and Engi...",683
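
The cutoff of 640 happens to equal 80% of 800, the maximum score for a single SAT section. Reading the threshold that way is an assumption, since the notebook only states the number. A small sketch of the same filter on toy data, with the cutoff written as a named constant (`MAX_SECTION_SCORE` and the toy schools are hypothetical):

```python
import pandas as pd

# Toy data for illustration only; not taken from schools.csv.
toy = pd.DataFrame({
    "school_name": ["School A", "School B", "School C", "School D"],
    "average_math": [700, 639, 640, 655],
})

MAX_SECTION_SCORE = 800  # maximum score for one SAT section
threshold = MAX_SECTION_SCORE * 80 // 100  # assumption: cutoff is 80% of the maximum, i.e. 640

# Keep schools at or above the cutoff, best first.
best_math = (
    toy.loc[toy["average_math"] >= threshold, ["school_name", "average_math"]]
    .sort_values("average_math", ascending=False)
)
print(best_math)
```

Note that the comparison is inclusive, so a school scoring exactly 640 is kept while one scoring 639 is not.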


In [None]:
# Who are the top 10 performing schools?
school['total_SAT'] = school[['average_math', 'average_reading', 'average_writing']].sum(axis=1)
top_10_schools = school[['school_name', 'total_SAT']].sort_values('total_SAT', ascending=False).head(10)
top_10_schools.head()

Unnamed: 0,school_name,total_SAT
88,Stuyvesant High School,2144
170,Bronx High School of Science,2041
93,Staten Island Technical High School,2041
174,High School of American Studies at Lehman College,2013
333,Townsend Harris High School,1981
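
As an alternative to sorting the whole table, `DataFrame.nlargest` selects the top rows by a column directly. This is a sketch on toy data (the schools and scores are made up) that picks the top 2 rather than the top 10 to keep the example short:

```python
import pandas as pd

# Toy data for illustration only; not taken from schools.csv.
toy = pd.DataFrame({
    "school_name": ["School A", "School B", "School C", "School D"],
    "average_math": [700, 500, 650, 600],
    "average_reading": [680, 510, 640, 590],
    "average_writing": [690, 505, 630, 610],
})

score_columns = ["average_math", "average_reading", "average_writing"]

# Total SAT score is the row-wise sum of the three section averages.
toy["total_SAT"] = toy[score_columns].sum(axis=1)

# nlargest returns the n rows with the highest total_SAT, already sorted.
top_2 = toy.nlargest(2, "total_SAT")[["school_name", "total_SAT"]]
print(top_2)
```

On the real data, `school.nlargest(10, 'total_SAT')` would give the ten highest totals.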


In [None]:
# Which NYC borough has the highest standard deviation for total_SAT?
boroughs = school.groupby("borough")["total_SAT"].agg(["count", "mean", "std"]).round(2)
boroughs_rename = boroughs.rename(columns={"count": "num_schools", "mean": "average_SAT", "std": "std_SAT"})
boroughs_rename

borough,num_schools,average_SAT,std_SAT
Bronx,98,1202.72,150.39
Brooklyn,109,1230.26,154.87
Manhattan,89,1340.13,230.29
Queens,69,1345.48,195.25
Staten Island,10,1439.0,222.3
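
The aggregation and the rename can also be done in one step with pandas named aggregation. The sketch below uses toy boroughs `X` and `Y` with made-up totals, so its numbers are unrelated to the table above. Note that pandas `std` computes the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Toy data for illustration only; boroughs "X" and "Y" are placeholders.
toy = pd.DataFrame({
    "borough": ["X", "X", "Y", "Y", "Y"],
    "total_SAT": [1200, 1300, 1000, 1500, 1400],
})

# Named aggregation: each keyword becomes an output column name.
summary = (
    toy.groupby("borough")["total_SAT"]
    .agg(num_schools="count", average_SAT="mean", std_SAT="std")
    .round(2)
)
print(summary)
```

With this form, the output columns get their final names directly and no separate `rename` call is needed.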


In [None]:
# Select the borough row(s) whose std_SAT equals the maximum
largest_std_dev = boroughs_rename[boroughs_rename['std_SAT'] == boroughs_rename['std_SAT'].max()]
largest_std_dev

borough,num_schools,average_SAT,std_SAT
Manhattan,89,1340.13,230.29
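
A shorter way to pick the same row is `Series.idxmax`, which returns the index label of the maximum. This is a sketch on a hypothetical two-borough summary; the values are made up. One difference from the equality filter: if two boroughs tie for the maximum, `idxmax` returns only the first, while the filter keeps both.

```python
import pandas as pd

# Hypothetical summary table; boroughs "X" and "Y" and their values are made up.
summary = pd.DataFrame(
    {
        "num_schools": [2, 3],
        "average_SAT": [1250.0, 1300.0],
        "std_SAT": [70.71, 264.58],
    },
    index=pd.Index(["X", "Y"], name="borough"),
)

# idxmax gives the index label (the borough) with the largest std_SAT.
top_borough = summary["std_SAT"].idxmax()

# Double brackets keep the result as a one-row DataFrame.
largest = summary.loc[[top_borough]]
print(largest)
```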
