# Project 1

This project uses the 2010 - 2016 School Safety Report. The goal is to keep only the borough column and the major-crime count column (`major_n`).
In this cell, we load the dataset from a public source.

In [54]:
import pandas as pd
project = pd.read_csv("https://data.cityofnewyork.us/resource/qybk-bjjc.csv")
project.head()

Unnamed: 0,school_year,building_code,dbn,location_name,location_code,address,borough,geographical_district_code,register,building_name,...,borough_name,postcode,latitude,longitude,community_board,community_council,census_tract,bin,bbl,nta
0,2015-16,K415,84K736,New American Academy Charter School,K736,5800 TILDEN AVENUE,K,18.0,261.0,5800 TILDEN AVENUE CONSOLIDATED LOCATION,...,BROOKLYN,11203.0,40.648657,-73.922171,17.0,45.0,936.0,3104598.0,3047420000.0,Rugby-Remsen Village ...
1,2015-16,M488,,231-249 EAST 56 STREET CONSOLIDATED LOCATION,,231-249 EAST 56 STREET,M,2.0,2015.0,231-249 EAST 56 STREET CONSOLIDATED LOCATION,...,MANHATTAN,10022.0,40.75924,-73.967203,6.0,4.0,108.0,1088892.0,1013300000.0,Turtle Bay-East Midtown ...
2,2015-16,K195,22K195,P.S. 195 Manhattan Beach,K195,131 IRWIN STREET,K,22.0,458.0,,...,BROOKLYN,11235.0,40.579451,-73.943819,15.0,48.0,620.0,3246519.0,3087520000.0,Sheepshead Bay-Gerritsen Beach-Manhattan Beach...
3,2015-16,K415,18K629,Cultural Academy for the Arts and Sciences,K629,5800 TILDEN AVENUE,K,18.0,354.0,5800 TILDEN AVENUE CONSOLIDATED LOCATION,...,BROOKLYN,11203.0,40.648657,-73.922171,17.0,45.0,936.0,3104598.0,3047420000.0,Rugby-Remsen Village ...
4,2015-16,M075,03M075,P.S. 075 Emily Dickinson,M075,735 WEST END AVENUE,M,3.0,569.0,735 WEST END AVENUE CONSOLDATED LOCATION,...,MANHATTAN,10025.0,40.795157,-73.973292,7.0,6.0,183.0,1034190.0,1012530000.0,Upper West Side ...


Next, we filter out unnecessary columns to focus on `borough_name` and `major_n`, which hold the borough name and the number of major crimes committed in schools.

In [58]:
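# .copy() makes an independent DataFrame, so the column assignment in the next cell
# does not trigger pandas' SettingWithCopyWarning.
project = project[['borough_name', 'major_n']].copy()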

Convert the crime counts to numeric values and calculate the mean, median, and mode.

In [60]:
project['major_n'] = pd.to_numeric(project['major_n'], errors='coerce')
mean_value = project['major_n'].mean()
median_value = project['major_n'].median()
mode_value = project['major_n'].mode()

print(mean_value)
print(median_value)
print(mode_value)

0.4444444444444444
0.0
0    0.0
Name: major_n, dtype: float64


The mean value of `major_n` is approximately 0.44, so on average a record reports fewer than one major crime.

The median value is 0.0, meaning that at least half of the entries in `major_n` are zero. This shows that records with no major crimes are prevalent in the data.

The mode is also 0.0, so zero is the most frequently occurring value in `major_n`. This further confirms that most entries have no recorded major crimes.

These statistics suggest that a significant portion of the data might be concentrated around zero values, and it could be helpful to examine the distribution further or filter out zeros to focus on more substantial entries.
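
One way to examine the distribution further is to count how often each value occurs. The sketch below uses a small hypothetical `sample` Series in place of `project['major_n']`, so the numbers are illustrative only and not taken from the report.

```python
import pandas as pd

# Hypothetical stand-in for project['major_n']; not the real data.
sample = pd.Series([0, 0, 0, 0, 0, 1, 0, 3, 0, None], name="major_n")

# Frequency of each distinct value (missing values are excluded by default).
counts = sample.value_counts().sort_index()

# Fraction of non-missing entries that are exactly zero.
zero_share = (sample.dropna() == 0).mean()

print(counts)
print(f"share of zeros: {zero_share:.2f}")
```

Running the same two calls on the real column would show how strongly the data is concentrated at zero before deciding whether to filter the zeros out.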

Next, we calculate the mean, median, and mode manually, using only built-in Python operations.
We first convert the column data to a list and then compute each statistic directly. This manual method allows us to see the structure of these calculations more clearly.

In [62]:
values = project['major_n'].dropna().tolist()
mean = sum(values) / len(values)
print(mean)

sorted_values = sorted(values)
n = len(values)
if n % 2 == 0:
    median = (sorted_values[n // 2 - 1] + sorted_values[n // 2]) / 2
else:
    median = sorted_values[n // 2]
print(median)

counts = {}
for value in values:
    counts[value] = counts.get(value, 0) + 1
mode = max(counts, key=counts.get)
print(mode)


0.4444444444444444
0.0
0.0
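
As a cross-check on the manual calculation, Python's `statistics` module offers the same three measures. The sketch below runs on a hypothetical `values` list rather than the real column. Note that `max(counts, key=counts.get)` in the manual version returns only the first value that reaches the highest count, whereas `statistics.multimode` (Python 3.8+) returns every tied value.

```python
import statistics

# Hypothetical values standing in for the cleaned major_n list.
values = [0.0, 0.0, 1.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0]

mean = statistics.mean(values)
median = statistics.median(values)
modes = statistics.multimode(values)  # list of all most-frequent values

print(mean)
print(median)
print(modes)

# With a tie, multimode reports both values in first-seen order.
tied_modes = statistics.multimode([1, 1, 2, 2])
print(tied_modes)
```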


To avoid repetition and better understand each borough's aggregate contribution, we drop rows where `major_n` is missing or zero, then group the data by `borough_name` and sum the `major_n` values. By grouping, we create a cleaner dataset with a single row per borough, reflecting the total of `major_n` across all remaining records for that borough. Boroughs with no positive `major_n` values do not appear in the grouped result. This approach allows for a focused and easily interpreted visualization.

In [64]:
project = project.dropna(subset=['major_n'])
project = project[project['major_n'] > 0]  
project_grouped = project.groupby('borough_name')['major_n'].sum().reset_index()
values = project_grouped['major_n'].tolist()
boroughs = project_grouped['borough_name'].tolist()
max_value = max(values)
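
The filter-then-group step can be seen on a tiny example. The rows below are hypothetical and only illustrate the mechanics; in particular they show that a borough whose rows are all missing or zero drops out of the result entirely.

```python
import pandas as pd

# Hypothetical rows; borough names and counts are illustrative only.
df = pd.DataFrame({
    "borough_name": ["BROOKLYN", "MANHATTAN", "BROOKLYN", "QUEENS", "MANHATTAN"],
    "major_n": [2.0, 1.0, 0.0, None, 3.0],
})

# Drop missing counts, keep only positive counts, then total per borough.
positive = df.dropna(subset=["major_n"])
positive = positive[positive["major_n"] > 0]
totals = positive.groupby("borough_name")["major_n"].sum().reset_index()

print(totals)
```

Here the hypothetical QUEENS row has no usable count, so QUEENS is absent from `totals`. If boroughs with a total of zero should still be listed, the `> 0` filter could be skipped before grouping.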


Finally, we create a sparkline (using stars `*`) to show each borough's total `major_n` relative to the largest borough total. Each star represents one tenth of that maximum, so the borough with the largest total gets 10 stars, and the star count for the others is truncated to a whole number. The more stars a borough has, the higher its `major_n` total relative to other boroughs. This simple visualization allows us to quickly see which boroughs have higher or lower values at a glance.



In [66]:

for borough, value in zip(boroughs, values):
    stars = int((value / max_value) * 10) 
    print(f"{borough}: {'*' * stars}")

BROOKLYN : **********
MANHATTAN: *****


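Among the boroughs with at least one recorded major crime, Brooklyn has the highest total, shown by the maximum of 10 stars.
Manhattan’s total is roughly half of Brooklyn's, represented by 5 stars. Because the star count is truncated, 5 stars means Manhattan's total is at least 50% but less than 60% of Brooklyn's.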
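
The effect of truncation on the star count can be checked with a small helper. This is a minimal sketch: `star_bar` is a hypothetical function name, and the totals are made-up numbers, not the values computed from the report.

```python
def star_bar(value, max_value, width=10):
    """Return a run of stars proportional to value / max_value."""
    if max_value <= 0:
        return ""
    # int() truncates, so a ratio of about 59% still yields 5 of 10 stars.
    return "*" * int((value / max_value) * width)


# Hypothetical borough totals, chosen only to illustrate truncation.
totals = {"BROOKLYN": 22.0, "MANHATTAN": 13.0}
max_total = max(totals.values())

# Pad labels to a common width so the bars line up.
label_width = max(len(name) for name in totals)
lines = [
    f"{name:<{label_width}}: {star_bar(total, max_total)}"
    for name, total in totals.items()
]
print("\n".join(lines))
```

With these made-up totals the second borough is at about 59% of the maximum yet still shows 5 stars, which is why the star count should be read as a range rather than an exact ratio. Replacing `int(...)` with `round(...)` would be one alternative if nearest-star rounding is preferred.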