## **Advanced Analytics and Applications - Data Collection Strategy**

Description

#### Team: 
- Robin Reiners
- Saied Farham Nia

##### **Table of Contents**

0. [Notebook Setup](#Notebook-Set-Up-and-Imports)
1. [Introduction](#Introduction)

7. [References](#References)

##### **Notebook Set Up and Imports**

In [1]:
%%html
<style>
.dataframe th {
    font-family: "JetBrainsMono Nerd Font";
}
.dataframe td {
    font-family: "JetBrainsMono Nerd Font";
}
</style>

In [2]:
import importlib
import os
import pickle
import subprocess
import sys
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
import seaborn as sns
import yaml

In [3]:
sys.path.append(str(Path.cwd().parent))
from src.utils.notebook_setup import load_files, setup_notebook

style_manager = setup_notebook()

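# Work from the repository root so that relative paths resolve consistently.
cwd = Path().resolve()
if cwd.name == "AAA":
    repo_root = cwd
    print("already set repo root")
else:
    notebooks_dir = cwd
    repo_root = notebooks_dir.parent
    os.chdir(repo_root)

# Defined outside the branch so it exists even when the cell is re-run.
results_dir = repo_root / "data" / "results"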

## Introduction
[Back to Table of Contents](#Table-of-Contents)

##### **1.1 Primary Dataset: Chicago Taxi Trips 2024**
URL: https://data.cityofchicago.org/Transportation/Taxi-Trips-2024-/ajtu-isnz/about_data

Format: CSV (likely large file)

Key considerations:
- The file may be too large to load directly and may need preprocessing with sed or xsv
- Pickup and dropoff locations are reported at census tract level (privacy protection)
- Trip locations need spatial discretization using H3 hexagons
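
The file-size concern can also be handled in Python instead of sed or xsv by streaming the CSV row by row. The following is a minimal sketch, assuming the export contains a `trip_start_timestamp` column with ISO-formatted values as in the API sample further down; the function name `filter_csv_by_month` and the in-memory sample rows are hypothetical.

```python
import csv
import io


def filter_csv_by_month(src, dst, month_prefix, column="trip_start_timestamp"):
    """Copy rows from src to dst whose `column` value starts with month_prefix.

    Rows are processed one at a time, so memory use stays flat
    regardless of the file size.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if row[column].startswith(month_prefix):
            writer.writerow(row)
            kept += 1
    return kept


# Tiny in-memory stand-in for the real export (made-up rows).
sample = io.StringIO(
    "trip_id,trip_start_timestamp,fare\n"
    "a,2024-01-15T08:00:00.000,10.5\n"
    "b,2024-02-01T09:15:00.000,7.25\n"
    "c,2024-01-31T23:45:00.000,12.0\n"
)
out = io.StringIO()
kept_rows = filter_csv_by_month(sample, out, "2024-01")
print(kept_rows)
```

For the real file, `src` and `dst` would be file handles opened with `newline=""`; if the export uses a different timestamp format, the prefix test would need to be adapted.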


##### **1.2 Weather Data Collection**
URL: https://www.ncei.noaa.gov/access/past-weather/chicago

Target: Hourly weather data for 2024

Key variables: Temperature, precipitation, wind speed, humidity, visibility

Alternative Sources:
- OpenWeatherMap API (historical data)
- Weather Underground
- Kaggle weather datasets
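
Hourly weather observations can be attached to trips by flooring each trip's start timestamp to the full hour and using that as a lookup key. The sketch below illustrates the idea with the standard library only; the helper `hour_key`, the weather field names, and all values are made up for illustration and do not reflect the actual NOAA schema.

```python
from datetime import datetime


def hour_key(timestamp):
    """Floor an ISO 8601 timestamp string to the start of its hour."""
    dt = datetime.fromisoformat(timestamp)
    return dt.replace(minute=0, second=0, microsecond=0)


# Made-up hourly observations keyed by the start of each hour.
weather_by_hour = {
    datetime(2024, 1, 15, 8): {"temperature_c": -3.0, "precipitation_mm": 0.0},
    datetime(2024, 1, 15, 9): {"temperature_c": -2.5, "precipitation_mm": 0.4},
}

# Made-up trips using the timestamp format seen in the taxi data.
trips = [
    {"trip_id": "a", "trip_start_timestamp": "2024-01-15T08:45:00.000"},
    {"trip_id": "b", "trip_start_timestamp": "2024-01-15T09:00:00.000"},
    {"trip_id": "c", "trip_start_timestamp": "2024-01-15T10:15:00.000"},
]

enriched = []
for trip in trips:
    # Trips without a matching observation keep only their own fields.
    weather = weather_by_hour.get(hour_key(trip["trip_start_timestamp"]), {})
    enriched.append({**trip, **weather})
```

With pandas, the same join could be expressed by flooring a datetime column to the hour and merging on it; the dictionary version is shown only to keep the logic explicit.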

##### **1.3 Point of Interest (POI) Data (Optional)**
Source: OpenStreetMap (OSM)

Tools: Overpass API, OSMnx library

Categories: Restaurants, hotels, entertainment, transportation hubs, hospitals, schools
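
To make the POI collection more concrete, the sketch below builds an Overpass QL query for a few of the listed categories. It is only an illustration: the bounding box is a rough approximation of Chicago, the tag selection and the names `CHICAGO_BBOX`, `POI_TAGS`, and `build_overpass_query` are assumptions, and no request is actually sent.

```python
# Approximate bounding box for Chicago as (south, west, north, east); an assumption.
CHICAGO_BBOX = (41.64, -87.94, 42.02, -87.52)

# OSM key/value tags for a subset of the listed categories (illustrative selection).
POI_TAGS = {
    "restaurants": ("amenity", "restaurant"),
    "hotels": ("tourism", "hotel"),
    "hospitals": ("amenity", "hospital"),
    "schools": ("amenity", "school"),
}


def build_overpass_query(tags, bbox, timeout=60):
    """Return an Overpass QL query for all tagged elements inside bbox."""
    south, west, north, east = bbox
    area = f"({south},{west},{north},{east})"
    # `nwr` matches nodes, ways, and relations carrying the given tag.
    clauses = "".join(
        f'nwr["{key}"="{value}"]{area};' for key, value in tags.values()
    )
    # `out center` returns one representative coordinate per element.
    return f"[out:json][timeout:{timeout}];({clauses});out center;"


query = build_overpass_query(POI_TAGS, CHICAGO_BBOX)
print(query)
```

The resulting string would be sent as the `data` parameter of a POST request to an Overpass endpoint; OSMnx offers a higher-level alternative that wraps this step.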

In [4]:
# The working directory is the repo root after the setup cell, so no "../" prefix.
data_dir = Path("data")
raw_data_dir = data_dir / "raw"
processed_data_dir = data_dir / "processed"
weather_data_dir = data_dir / "weather"

for directory in [raw_data_dir, processed_data_dir, weather_data_dir]:
    directory.mkdir(parents=True, exist_ok=True)

print("📁 Data directories created successfully!")

📁 Data directories created successfully!


In [10]:
from src.api.taxi import ChicagoTaxiAPI

In [17]:

api = ChicagoTaxiAPI()

# Get metadata
metadata = api.get_metadata()
if metadata:
    print(f"📊 Dataset: {metadata.get('name')}")
    print(f"📊 Total rows: {metadata.get('totalRows', 'Unknown')}")

api.fetch_data(limit=100, start_date="2025-05-01")

📊 Dataset: Taxi Trips (2024-)
📊 Total rows: Unknown


| | trip_id | taxi_id | trip_start_timestamp | trip_end_timestamp | trip_seconds | trip_miles | pickup_community_area | dropoff_community_area | fare | tips | ... | payment_type | company | pickup_centroid_latitude | pickup_centroid_longitude | pickup_centroid_location | dropoff_centroid_latitude | dropoff_centroid_longitude | dropoff_centroid_location | pickup_census_tract | dropoff_census_tract |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0a11567bd996c51ac32efbb3e60f923c063aabdc | 55ef60184d508809c50a84b8c378a8572fcc0f5bd2ff49... | 2025-05-01T00:00:00.000 | 2025-05-01T00:00:00.000 | 291 | 1.27 | 3.0 | 6.0 | 6.25 | 0.0 | ... | Prcard | Flash Cab | 41.96581197 | -87.655878786 | {'type': 'Point', 'coordinates': [-87.65587878... | 41.944226601 | -87.655998182 | {'type': 'Point', 'coordinates': [-87.65599818... | | |
| 1 | 122e816a1b04575c94de158252e4e3a544b9f900 | 52c3ffa685a3b5ced3d16461deec5a0326086bee3d8f6a... | 2025-05-01T00:00:00.000 | 2025-05-01T00:15:00.000 | 1733 | 14.95 | 76.0 | 3.0 | 38.75 | 8.65 | ... | Credit Card | Sun Taxi | 41.980264315 | -87.913624596 | {'type': 'Point', 'coordinates': [-87.91362459... | 41.96581197 | -87.655878786 | {'type': 'Point', 'coordinates': [-87.65587878... | | |
| 2 | 181bc2b0347f1364d50dc2312b97f48cd686658a | cf278f6c67e799170264672cf78527e136318d96aefd5f... | 2025-05-01T00:00:00.000 | 2025-05-01T00:15:00.000 | 1319 | 15.05 | 76.0 | | 37.25 | 0.0 | ... | Credit Card | Taxicab Insurance Agency Llc | 41.97907082 | -87.903039661 | {'type': 'Point', 'coordinates': [-87.90303966... | | | | 17031980000.0 | |
| 3 | 1d17183a28c299443a40163dc3e6d50e10f054f1 | d511072131b602026bdb9faa5491d15c3af8d62dc00659... | 2025-05-01T00:00:00.000 | 2025-05-01T00:00:00.000 | 420 | 3.1 | 8.0 | 32.0 | 10.75 | 0.0 | ... | Cash | Taxi Affiliation Services | 41.899602111 | -87.633308037 | {'type': 'Point', 'coordinates': [-87.63330803... | 41.878865584 | -87.625192142 | {'type': 'Point', 'coordinates': [-87.62519214... | | |
| 4 | 25264cb08e7db926efd88cab3c35d1df4995e11c | c84c28526a906ef1ad0ea7dc570f97949ecf92dfe156cb... | 2025-05-01T00:00:00.000 | 2025-05-01T00:15:00.000 | 882 | 7.72 | 76.0 | 17.0 | 21.0 | 0.0 | ... | Cash | Globe Taxi | 41.980264315 | -87.913624596 | {'type': 'Point', 'coordinates': [-87.91362459... | 41.94651142 | -87.806020002 | {'type': 'Point', 'coordinates': [-87.80602000... | | |
| 5 | 3592419fe47b3f8b8c6e9df61d50421dd71a32e1 | f32c0c8e63d4e5d4bfc7e9d57e5d0f6dcf28450850245d... | 2025-05-01T00:00:00.000 | 2025-05-01T00:15:00.000 | 1560 | 12.7 | 76.0 | 2.0 | 33.25 | 7.55 | ... | Credit Card | Taxi Affiliation Services | 41.980264315 | -87.913624596 | {'type': 'Point', 'coordinates': [-87.91362459... | 42.001571027 | -87.695012589 | {'type': 'Point', 'coordinates': [-87.69501258... | | |
| 6 | 3662c715b13e717144c2d713c31b28000f3232cc | 51482afe455eeface5c7492f4dc7638fd2c3a7e10f9174... | 2025-05-01T00:00:00.000 | 2025-05-01T00:15:00.000 | 1415 | 13.6 | 56.0 | 8.0 | 35.0 | 5.92 | ... | Mobile | Globe Taxi | 41.79259236 | -87.769615453 | {'type': 'Point', 'coordinates': [-87.76961545... | 41.899602111 | -87.633308037 | {'type': 'Point', 'coordinates': [-87.63330803... | | |
| 7 | 3fdaa95736223ad2fadf303e32519742f8970142 | d744f003d8f56f6a8b53b97c0589575fa9a975e9d12f66... | 2025-05-01T00:00:00.000 | 2025-05-01T00:15:00.000 | 943 | 11.66 | 8.0 | 10.0 | 29.75 | 0.0 | ... | Prcard | 5 Star Taxi | 41.899602111 | -87.633308037 | {'type': 'Point', 'coordinates': [-87.63330803... | 41.985015101 | -87.804532006 | {'type': 'Point', 'coordinates': [-87.80453200... | | |
| 8 | 420d71c9f73c8b157b83e53a36435339acfef112 | 34766262f2e312774b1ad4651b99dc23b780dbd00b658f... | 2025-05-01T00:00:00.000 | 2025-05-01T00:15:00.000 | 1254 | 10.92 | 56.0 | 33.0 | 29.5 | 8.75 | ... | Mobile | Taxicab Insurance Agency Llc | 41.79259236 | -87.769615453 | {'type': 'Point', 'coordinates': [-87.76961545... | 41.857183858 | -87.620334624 | {'type': 'Point', 'coordinates': [-87.62033462... | | |
| 9 | 445512e4c4cedf6a9280375ac86b1ebea1c97c9b | 63d895bf335c522af83a8f2c608e31bb46a5d78cde4df2... | 2025-05-01T00:00:00.000 | 2025-05-01T00:30:00.000 | 1792 | 18.84 | 76.0 | 8.0 | 47.5 | 0.0 | ... | Cash | Blue Ribbon Taxi Association | 41.980264315 | -87.913624596 | {'type': 'Point', 'coordinates': [-87.91362459... | 41.899602111 | -87.633308037 | {'type': 'Point', 'coordinates': [-87.63330803... | | |
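
The internals of `ChicagoTaxiAPI` are not shown here, so the following is not necessarily how `fetch_data` works. As a rough sketch of what a comparable request could look like, the code below builds a query URL for the Socrata SODA interface of the data portal. The endpoint is an assumption derived from the dataset id `ajtu-isnz` in the portal URL, and `build_trip_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed SODA endpoint, derived from the dataset id in the portal URL.
BASE_URL = "https://data.cityofchicago.org/resource/ajtu-isnz.json"


def build_trip_query_url(limit=100, start_date=None, offset=0):
    """Build a SODA query URL with paging and an optional start-date filter."""
    params = {
        "$limit": limit,
        "$offset": offset,
        # A stable sort order keeps paging with $offset consistent.
        "$order": "trip_start_timestamp",
    }
    if start_date is not None:
        # SoQL filter on the trip start timestamp (start_date as YYYY-MM-DD).
        params["$where"] = f"trip_start_timestamp >= '{start_date}T00:00:00'"
    return f"{BASE_URL}?{urlencode(params)}"


url = build_trip_query_url(limit=100, start_date="2025-05-01")
print(url)
```

Larger downloads would repeat the request with an increasing `offset` until fewer than `limit` rows come back; the URL itself can be fetched with `requests.get(url)`.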
