<small> We import the `os` module to handle file system operations like creating directories and managing file paths. We also import `pandas`, the core Python library for data manipulation and analysis, which enables us to read, write, and transform tabular data efficiently. </small>

In [None]:
import os
import pandas as pd

<small> This line ensures that the directory `"data/raw"` exists before saving any files there. The parameter `exist_ok=True` prevents an error if the folder already exists, making the operation safe to run multiple times without interruption. </small>

In [11]:
# Ensure the folder exists before saving files there.
os.makedirs("data/raw", exist_ok=True)
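
<small> To illustrate why `exist_ok=True` makes the cell safe to re-run, here is a minimal sketch that repeats the call inside a temporary directory. The temporary location and the variable names `raw_dir`, `created`, and `raised` are assumptions for illustration only and are not part of the notebook's workflow. </small>

```python
import os
import tempfile

# Work inside a throwaway directory so the sketch does not touch the project tree.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = os.path.join(tmp, "data", "raw")  # hypothetical path for illustration

    # First call creates both "data" and "data/raw".
    os.makedirs(raw_dir, exist_ok=True)
    # Second call does nothing because the directory already exists.
    os.makedirs(raw_dir, exist_ok=True)
    created = os.path.isdir(raw_dir)

    # Without exist_ok=True, repeating the call raises FileExistsError.
    try:
        os.makedirs(raw_dir)
        raised = False
    except FileExistsError:
        raised = True
```

<small> Here `created` and `raised` both end up `True`: the repeated call with `exist_ok=True` succeeds silently, while the call without it fails on an existing folder. </small>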

<small> Here we load two key datasets directly from online sources:

The NYC ZIP code population data provides the population counts needed for the analysis.
The NYC 311 complaints data is requested with `$top=1000`, which limits it to 1,000 records for initial analysis.

These datasets form the foundation for our analysis, linking complaint records to population data by ZIP code.
</small>

In [None]:
# Load NYC 311 complaints data 
data_url1 = "https://data.cityofnewyork.us/api/odata/v4/erm2-nwe9?$top=1000"
# Read JSON, get 'value' list, then flatten to DataFrame
raw_complaints = pd.read_json(data_url1)
complaints_df = pd.json_normalize(raw_complaints["value"])

# Load ZIP Code-level population data 
data_url2 = "https://data.cityofnewyork.us/resource/pri4-ifjk.json"
# This endpoint returns a JSON array directly, so pd.read_json can load it as-is
pop_df = pd.read_json(data_url2)
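
<small> The "get the `value` list, then flatten" step can be tried offline on a small in-memory payload. This is only a sketch: `sample_payload` and its field names (including the nested `location` object) are made up for illustration and are not taken from the real 311 response. </small>

```python
import pandas as pd

# Hypothetical miniature of an OData-style response: the records sit under "value".
sample_payload = {
    "@odata.context": "example-context",
    "value": [
        {"unique_key": "1", "complaint_type": "Noise", "location": {"zip": "10001"}},
        {"unique_key": "2", "complaint_type": "Heat", "location": {"zip": "11201"}},
    ],
}

# Pull out the list of records, then flatten nested objects into dotted column names.
records = sample_payload["value"]
sample_df = pd.json_normalize(records)

print(sample_df.columns.tolist())
```

<small> Nested keys become dotted column names (here `location.zip`), so a flattened table has one row per record and one column per leaf field. </small>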

<small> After loading the datasets, we save each one locally as a CSV file for future use. We define separate file paths for the complaints data and the population data within the `"data/raw"` directory. Using `.to_csv()` with `index=False`, we export each DataFrame without the index column to keep the files clean. Printing the file paths shows where each file was written and provides an easy reference for subsequent processing steps. </small>

In [13]:
# Define separate paths
complaints_path = "data/raw/nyc_311_raw.csv"
pop_path = "data/raw/nyc_zcta_population.csv"

# Save each dataset to its own file
complaints_df.to_csv(complaints_path, index=False)
pop_df.to_csv(pop_path, index=False)

print(f"Saved complaints to: {complaints_path}")
print(f"Saved population data to: {pop_path}")

Saved complaints to: data/raw/nyc_311_raw.csv
Saved population data to: data/raw/nyc_zcta_population.csv
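
<small> When the saved files are read back later, it may help to load ZIP codes as strings, since they are the key that links complaints to population. The sketch below assumes a made-up table (`demo_df`, with illustrative column names and placeholder values) written to a temporary file; it is not the notebook's actual data. </small>

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for a ZIP-level table; names and values are placeholders.
demo_df = pd.DataFrame({"zip_code": ["10001", "00501"], "population": [100, 200]})

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    demo_df.to_csv(demo_path, index=False)

    # Default read: pandas infers integers, so "00501" is parsed as 501.
    inferred = pd.read_csv(demo_path)
    # Reading the ZIP column as strings keeps leading zeros intact.
    preserved = pd.read_csv(demo_path, dtype={"zip_code": str})

# Because the file was saved with index=False, no extra index column appears.
same_columns = list(preserved.columns) == list(demo_df.columns)
```

<small> Passing `dtype` for the key column is one option; whether it is needed depends on how the ZIP codes appear in the saved files. </small>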
