# Project 1


We'll be doing calculations on the [Active NYC Health Code Regulated Childcare Programs](https://data.cityofnewyork.us/api/views/gy3q-4tzp/rows.csv?accessType=DOWNLOAD) data using pandas. Specifically, we are looking at the "Children Allowed in Care" column to analyze the capacity of childcare programs in NYC.

## Step 0

The data needs to be available on the machine where Python is running in order to process it, so we will download it from the site directly:

1. Open https://data.cityofnewyork.us/api/views/gy3q-4tzp/rows.csv?accessType=DOWNLOAD in the browser, which should download the CSV document.
2. Move the CSV to the same directory as this notebook.

## Step 1

Read the NYC childcare data into a pandas DataFrame and save a local copy as `childcare.csv`.

In [1]:
import pandas as pd

url = "https://data.cityofnewyork.us/api/views/gy3q-4tzp/rows.csv?accessType=DOWNLOAD"
df = pd.read_csv(url)
df.to_csv("childcare.csv", index=False)


In [24]:
print(df.columns.tolist())

['DCID', 'PERMIT NUMBER', 'PROGRAM NAME', 'FACILITY TYPE', 'PROGRAM TYPE', 'STREET ADDRESS', 'BOROUGH', 'ZIP CODE', 'PHONE NUMBER', 'CHILD AGE RANGE', 'CHILDREN ALLOWED IN CARE', 'ADMINISTER MEDICATION', 'BIN', 'BBL', 'COMMUNITY BOARD', 'COUNCIL DISTRICT', 'CENSUS TRACT', 'NTA CODE', 'LATITUDE', 'LONGITUDE']
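pandas column lookups are case-sensitive, so the exact spelling printed above matters in the steps that follow. As a minimal sketch on a hypothetical toy frame (not the real dataset), this is one way to upper-case headers so that a single spelling can be used throughout:

```python
import pandas as pd

# Hypothetical toy frame with title-case headers, as in a raw CSV header row.
toy = pd.DataFrame(
    {"Program Name": ["A", "B"], "Children Allowed in Care": [18, 30]}
)

# Upper-case every column name so lookups use one consistent spelling.
toy.columns = toy.columns.str.upper()

print(toy.columns.tolist())
print(toy["CHILDREN ALLOWED IN CARE"].sum())
```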


## Step 2

Calculate the mean (average) of the children allowed in care column.

In [30]:
# Calculate the average (mean) children allowed in care 
average_capacity = df['CHILDREN ALLOWED IN CARE'].mean()

# Print the result
print("Average capacity:", average_capacity)

Average capacity: 63.11817026683609
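`Series.mean()` skips missing values (`NaN`) by default, which matters if some programs have no capacity recorded. A minimal sketch on a hypothetical toy Series, not the real data:

```python
import pandas as pd

# Hypothetical toy column: two valid capacities and one missing value.
toy = pd.Series([10.0, 20.0, None], name="CHILDREN ALLOWED IN CARE")

missing_count = int(toy.isna().sum())  # number of missing entries
valid_count = int(toy.count())         # count() excludes NaN
toy_mean = toy.mean()                  # skipna=True by default, so NaN is ignored

print(missing_count, valid_count, toy_mean)
```

Checking `isna().sum()` on the real column first would show how many rows are left out of the average.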


## Step 3

Calculate the median of the children allowed in care column.

In [29]:
# Calculate the median of children allowed in care 
median_capacity = df['CHILDREN ALLOWED IN CARE'].median()

# Print the result
print("Median of capacity:", median_capacity)

Median of capacity: 48.0


## Step 4

Calculate the mode of the children allowed in care column.

In [28]:
# Calculate the mode of children allowed in care 
mode_capacity = df['CHILDREN ALLOWED IN CARE'].mode()[0] #consulted ChatGPT to add [0] to get the first mode

# Print the result
print("Mode of capacity:", mode_capacity)

Mode of capacity: 18.0
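The `[0]` is needed because `Series.mode()` returns a Series rather than a single number: several values can be tied for most frequent. A minimal sketch with hypothetical tied values shows what indexing with `[0]` keeps and what it drops:

```python
import pandas as pd

# Hypothetical capacities where 18 and 30 each appear twice.
tied = pd.Series([18, 18, 30, 30, 45])

all_modes = tied.mode()    # Series holding every tied value, sorted ascending
first_mode = all_modes[0]  # only the smallest of the tied modes

print(all_modes.tolist(), first_mode)
```

If ties matter for the analysis, printing `mode().tolist()` instead of `mode()[0]` would report all of them.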


## Step 5

Calculate the mean, median, and mode of children allowed in care the hard way, using only the `csv` module and plain Python instead of pandas.

In [19]:
import csv
with open("childcare.csv", encoding="utf-8") as f:
    print(next(csv.reader(f)))

['DCID', 'Permit Number', 'Program Name', 'Facility Type', 'Program Type', 'Street Address', 'Borough', 'Zip Code', 'Phone Number', 'Child Age Range', 'Children Allowed in Care', 'Administer Medication', 'BIN', 'BBL', 'Community Board', 'Council District', 'Census Tract', 'NTA Code', 'Latitude', 'Longitude']


In [31]:
import csv

# Step 1: Read the file
filename = "childcare.csv"  

with open(filename, newline='', encoding='utf-8') as f: #consulted ChatGPT to add encoding
    reader = csv.DictReader(f) 
    capacities = []

    # Step 2: collect valid numeric values
    for row in reader:
        value = row.get('Children Allowed in Care', '').strip()  # exact column name
        if value and value.upper() != "NO DATA": #consulted ChatGPT to handle "NO DATA" entries
            try:
                capacities.append(int(value))
            except ValueError:
                continue

# Step 3: calculate mean
mean_val = sum(capacities) / len(capacities)

# Step 4: calculate median
sorted_vals = sorted(capacities)
n = len(sorted_vals)
if n % 2 == 1:
    median_val = sorted_vals[n // 2]
else:
    median_val = (sorted_vals[n // 2 - 1] + sorted_vals[n // 2]) / 2

# Step 5: calculate mode using a dictionary
freq = {}
for val in capacities:
    freq[val] = freq.get(val, 0) + 1
max_freq = max(freq.values())
modes = [k for k, v in freq.items() if v == max_freq]

# Step 6: print results
print(f"Count: {len(capacities)}")
print(f"Average of capacity: {mean_val:.2f}")
print(f"Median of capacity: {median_val}")
print(f"Mode of capacity: {modes}")


Count: 2361
Average of capacity: 63.12
Median of capacity: 48
Mode of capacity: [18]
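One way to sanity-check hand-written formulas like the ones above is to compare them against Python's standard `statistics` module. This is a sketch on a small hypothetical list, not the real capacities; `statistics.multimode` requires Python 3.8 or later:

```python
import statistics

# Hypothetical sample of capacities (even length, so the median is an average).
sample = [18, 18, 24, 48, 60, 120]

# Manual calculations, mirroring the approach used on the CSV data.
manual_mean = sum(sample) / len(sample)
ordered = sorted(sample)
mid = len(ordered) // 2
manual_median = (
    ordered[mid] if len(ordered) % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2
)
counts = {}
for value in sample:
    counts[value] = counts.get(value, 0) + 1
manual_modes = [k for k, v in counts.items() if v == max(counts.values())]

# Library results should agree with the manual ones.
print(manual_mean == statistics.mean(sample))
print(manual_median == statistics.median(sample))
print(manual_modes == statistics.multimode(sample))
```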


## Step 6

Present the mean, median, and mode of children allowed in care in a formatted table.

In [32]:
# Values from previous calculations
mean_val = 63.12
median_val = 48
mode_val = 18

# Build the table data
stats = {
    "Mean (average)": f"{mean_val:.2f}",
    "Median": f"{median_val}",
    "Mode": f"{mode_val}"
}

# Print formatted table (consulted ChatGPT for formatting)
print("\nNYC CHILDCARE CAPACITY STATISTICS")
print("Units: Number of children allowed per facility\n")

# Table header
print(f"{'Statistic':<20} | {'Value (children)':>20}")
print("-" * 45)

# Table rows
for key, value in stats.items():
    print(f"{key:<20} | {value:>20}")

print("\n(Each statistic represents the number of children allowed per childcare program.)")



NYC CHILDCARE CAPACITY STATISTICS
Units: Number of children allowed per facility

Statistic            |     Value (children)
---------------------------------------------
Mean (average)       |                63.12
Median               |                   48
Mode                 |                   18

(Each statistic represents the number of children allowed per childcare program.)
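The table above uses values typed in by hand. As an alternative sketch, the same layout can be filled from computed values so the table cannot drift from the data. The helper name `format_stats_table` and the sample list are hypothetical, chosen only for illustration:

```python
import statistics


def format_stats_table(stats):
    """Return the table as one string; `stats` maps a label to a formatted value."""
    lines = [f"{'Statistic':<20} | {'Value (children)':>20}", "-" * 45]
    for label, value in stats.items():
        lines.append(f"{label:<20} | {value:>20}")
    return "\n".join(lines)


# Hypothetical capacities standing in for the list built from the CSV.
capacities = [18, 18, 24, 48, 60, 120]

stats = {
    "Mean (average)": f"{statistics.mean(capacities):.2f}",
    "Median": f"{statistics.median(capacities)}",
    "Mode": ", ".join(str(m) for m in statistics.multimode(capacities)),
}

table = format_stats_table(stats)
print(table)
```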
