# Project 1

## Overview

- Load the dataset from a CSV file
- Compute the **mean, median, and mode** using `pandas`
- Repeat the same calculations using only the **Python standard library**
- Create a **text-based visualization** using only the **standard library**

### Dataset & Source

- **Dataset title:** 2018-2019 School Demographic Snapshot
- **Primary source:** NYC Open Data Portal
- **Link to source dataset:** https://data.cityofnewyork.us/Education/2018-2019-School-Demographic-Snapshot/45j8-f6um/about_data

### Step 1 - Save and load dataset


In [16]:
import pandas as pd
df_school = pd.read_csv("school.csv")  # Load the CSV file
df_school.head()  # Show the first 5 rows

| Unnamed: 0 | DBN | School Name | Year | Total Enrollment | Grade PK (Half Day & Full Day) | Grade K | Grade 1 | Grade 2 | Grade 3 | Grade 4 | ... | % Multiple Race Categories Not Represented | # White | % White | # Students with Disabilities | % Students with Disabilities | # English Language Learners | % English Language Learners | # Poverty | % Poverty | Economic Need Index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01M015 | P.S. 015 Roberto Clemente | 2018-19 | 174 | 13 | 20 | 33 | 30 | 30 | 20 | ... | 0.6% | 6 | 3.4% | 38 | 21.8% | 8 | 4.6% | 145 | 83.3% | 88% |
| 1 | 01M019 | P.S. 019 Asher Levy | 2018-19 | 249 | 10 | 30 | 39 | 43 | 41 | 44 | ... | 3.6% | 18 | 7.2% | 92 | 36.9% | 8 | 3.2% | 180 | 72.3% | 66.8% |
| 2 | 01M020 | P.S. 020 Anna Silver | 2018-19 | 481 | 42 | 61 | 69 | 76 | 67 | 90 | ... | 3.3% | 22 | 4.6% | 115 | 23.9% | 63 | 13.1% | 316 | 65.7% | 73.8% |
| 3 | 01M034 | P.S. 034 Franklin D. Roosevelt | 2018-19 | 305 | 17 | 26 | 20 | 16 | 21 | 43 | ... | 1.3% | 9 | 3% | 114 | 37.4% | 22 | 7.2% | 301 | 98.7% | 94.4% |
| 4 | 01M063 | The STAR Academy - P.S.63 | 2018-19 | 230 | 23 | 49 | 41 | 28 | 40 | 25 | ... | 3.9% | 22 | 9.6% | 73 | 31.7% | 4 | 1.7% | 176 | 76.5% | 73.2% |


### Step 2 - Clean dataset
**Convert Total Enrollment from a string to a numeric type to avoid errors; with `errors="coerce"`, values that cannot be parsed become NaN**



In [17]:
df_school["Total Enrollment"] = pd.to_numeric(df_school["Total Enrollment"], errors="coerce") #Convert Total Enrollment to numeric type
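
To make the effect of `errors="coerce"` concrete, here is a minimal sketch on a small made-up Series. The values and the names `raw` and `cleaned` are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical raw values: two numeric strings and one placeholder
raw = pd.Series(["174", "249", "n/a"])

# Unparseable entries become NaN instead of raising an error
cleaned = pd.to_numeric(raw, errors="coerce")

print(cleaned)
print("Missing after conversion:", cleaned.isna().sum())
```

Counting the NaN values after conversion is a quick way to see how many entries were dropped from the statistics, since `mean()` and `median()` skip NaN by default.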

### Step 3 - Analyze Total Enrollment
**Calculate mean, median and mode**

In [18]:
# Calculate summary statistics for Total Enrollment
mean_enrollment = df_school["Total Enrollment"].mean()
median_enrollment = df_school["Total Enrollment"].median()
mode_enrollment = df_school["Total Enrollment"].mode()[0]

print(f"Mean total enrollment:   {mean_enrollment:,.0f}")
print(f"Median total enrollment: {median_enrollment:,.0f}")
print(f"Mode total enrollment:   {mode_enrollment:,.0f}")

Mean total enrollment:   465
Median total enrollment: 433
Mode total enrollment:   414
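
`Series.mode()` returns every value tied for the highest count, sorted in ascending order, so indexing with `[0]` picks the smallest of them. The sketch below shows this on made-up numbers; the name `sample` and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample in which 414 and 249 each appear twice
sample = pd.Series([414, 249, 414, 249, 305])

modes = sample.mode()      # all tied modes, sorted ascending
print(modes.tolist())
print("First mode:", modes[0])
```

If a dataset has several modes, reporting only `modes[0]` hides the others, so it can be worth checking `len(modes)` as well.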


### Step 4 - Analyze Total Enrollment (Using Standard Library)
**Calculate mean, median and mode (No pandas)**

In [19]:
import csv

# Read the "Total Enrollment" values from the CSV
enrollments = []

with open("school.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        value = row["Total Enrollment"].strip()
        if value.isdigit():           # keep only non-negative whole numbers
            enrollments.append(int(value))

# Mean
total = 0
for num in enrollments:
    total += num
mean_val = total / len(enrollments)

# Median
enrollments.sort()
n = len(enrollments)
if n % 2 == 0:
    median_val = (enrollments[n//2 - 1] + enrollments[n//2]) / 2
else:
    median_val = enrollments[n//2]

# Mode (using a dictionary)
counts = {}                     # make an empty dictionary
for num in enrollments:
    counts[num] = counts.get(num, 0) + 1   # add 1 each time we see the number

mode_val = max(counts, key=counts.get)     # find the number with the highest count

# Print results
print("Mean:", round(mean_val))
print("Median:", round(median_val))
print("Mode:", mode_val)


Mean: 465
Median: 433
Mode: 414
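
As a cross-check on the hand-written calculations, the standard library's `statistics` module offers the same three measures. This is a minimal sketch on a hypothetical sample list rather than the full dataset; `min()` over `multimode()` is used here as an assumption, to mirror the smallest-value tie-breaking of pandas' `.mode()[0]`.

```python
import statistics

# Hypothetical sample values, not the full dataset
sample = [174, 249, 481, 305, 230, 249]

mean_check = statistics.mean(sample)
median_check = statistics.median(sample)
# multimode() returns all tied modes; min() picks the smallest one
mode_check = min(statistics.multimode(sample))

print("Mean:", round(mean_check))
print("Median:", round(median_check))
print("Mode:", mode_check)
```

Swapping `sample` for the `enrollments` list built above should reproduce the printed results if the manual code is correct. Note that `statistics.multimode` requires Python 3.8 or later.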


### Step 5 - Create a simple text-based visualization

In [20]:
# Get the first 10 enrollment values from pandas
data = df_school["Total Enrollment"].head(10).tolist()  # convert Series to a list

print("NYC SCHOOL ENROLLMENTS (First 10 Schools)")
print("X-axis: Total enrollment (students)")
print("Y-axis: School number (1–10)")
print("-" * 60)

# Scale bars so they fit on screen
scale = 5  # 1 asterisk = 5 students

for i, num in enumerate(data, start=1):
    try:
        num = int(num)  # int() raises ValueError for NaN (values coerced in Step 2)
        bar = "*" * max(1, num // scale)
        print(f"School {i:>2} | {bar} ({num})")
    except (ValueError, TypeError):
        continue  # skip missing or non-numeric values


NYC SCHOOL ENROLLMENTS (First 10 Schools)
X-axis: Total enrollment (students)
Y-axis: School number (1–10)
------------------------------------------------------------
School  1 | ********************************** (174)
School  2 | ************************************************* (249)
School  3 | ************************************************************************************************ (481)
School  4 | ************************************************************* (305)
School  5 | ********************************************** (230)
School  6 | ******************************************** (223)
School  7 | ************************************************************************** (372)
School  8 | ******************************************************** (281)
School  9 | ***************************************************************************** (385)
School 10 | ********************************************************************** (353)
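
The cell above takes its values from the pandas DataFrame. A variant that relies only on the standard library might look like the sketch below. The helper names `first_enrollments` and `render_bars` are hypothetical, and the in-memory CSV is a made-up stand-in for `school.csv`. Unlike the loop above, this version numbers the schools after skipping non-numeric rows, so the labels are not tied to row positions in the file.

```python
import csv
import io


def first_enrollments(csv_file, limit=10, column="Total Enrollment"):
    """Return up to `limit` integer values from `column`, in file order."""
    values = []
    for row in csv.DictReader(csv_file):
        text = (row.get(column) or "").strip()
        if text.isdigit():  # skip blanks and non-numeric entries
            values.append(int(text))
            if len(values) == limit:
                break
    return values


def render_bars(values, scale=5):
    """Build one text bar per value; one asterisk stands for `scale` students."""
    lines = []
    for i, num in enumerate(values, start=1):
        bar = "*" * max(1, num // scale)
        lines.append(f"School {i:>2} | {bar} ({num})")
    return lines


# Hypothetical in-memory sample standing in for school.csv
sample_csv = io.StringIO(
    "DBN,Total Enrollment\n"
    "01M015,174\n"
    "01M019,249\n"
    "01M020,s\n"
    "01M034,305\n"
)

chart = render_bars(first_enrollments(sample_csv, limit=3))
for line in chart:
    print(line)
```

To run it against the real file, pass an open file handle instead, for example `with open("school.csv", encoding="utf-8") as f: first_enrollments(f)`.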
