## Notebook documentation
- Analyzes the spatial distribution by borough as well as top-street and intersection hotspots.
- Builds the basis for clusters/heatmaps from coordinates (plausibility check for missing lat/lng).
- Provides ranking tables for local drill-downs.


Loads the crash CSV, casts the coordinates to floats, builds crash_datetime, and filters to cyclist-involved crashes; the result is the LazyFrame scan.

In [2]:
from pathlib import Path
import polars as pl
import folium

pl.Config.set_tbl_rows(500)

SCHEMA = {
    "CRASH DATE": pl.Utf8,
    "CRASH TIME": pl.Utf8,
    "BOROUGH": pl.Utf8,
    "ZIP CODE": pl.Utf8,
    "LATITUDE": pl.Utf8,
    "LONGITUDE": pl.Utf8,
    "LOCATION": pl.Utf8,
    "ON STREET NAME": pl.Utf8,
    "CROSS STREET NAME": pl.Utf8,
    "OFF STREET NAME": pl.Utf8,
    "NUMBER OF PERSONS INJURED": pl.Int64,
    "NUMBER OF PERSONS KILLED": pl.Int64,
    "NUMBER OF PEDESTRIANS INJURED": pl.Int64,
    "NUMBER OF PEDESTRIANS KILLED": pl.Int64,
    "NUMBER OF CYCLIST INJURED": pl.Int64,
    "NUMBER OF CYCLIST KILLED": pl.Int64,
    "NUMBER OF MOTORIST INJURED": pl.Int64,
    "NUMBER OF MOTORIST KILLED": pl.Int64,
    "CONTRIBUTING FACTOR VEHICLE 1": pl.Utf8,
    "CONTRIBUTING FACTOR VEHICLE 2": pl.Utf8,
    "CONTRIBUTING FACTOR VEHICLE 3": pl.Utf8,
    "CONTRIBUTING FACTOR VEHICLE 4": pl.Utf8,
    "CONTRIBUTING FACTOR VEHICLE 5": pl.Utf8,
    "COLLISION_ID": pl.Int64,
    "VEHICLE TYPE CODE 1": pl.Utf8,
    "VEHICLE TYPE CODE 2": pl.Utf8,
    "VEHICLE TYPE CODE 3": pl.Utf8,
    "VEHICLE TYPE CODE 4": pl.Utf8,
    "VEHICLE TYPE CODE 5": pl.Utf8,
}

DATA_PATH = Path("../../raw_data/nypd/Motor_Vehicle_Collisions_Crashes.csv")
scan = pl.scan_csv(DATA_PATH, schema=SCHEMA, infer_schema_length=2000, null_values=[""])
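# Normalize column names to snake_case. collect_schema() avoids the
# PerformanceWarning that LazyFrame.columns emits in recent Polars versions.
rename_map = {name: name.lower().replace(" ", "_") for name in scan.collect_schema().names()}
scan = scan.rename(rename_map)
scan = scan.with_columns(
    [
        # Coordinates are read as strings: convert a decimal comma, then cast.
        # strict=False turns unparseable values into null instead of raising.
        pl.col("latitude").str.replace(",", ".").cast(pl.Float64, strict=False),
        pl.col("longitude").str.replace(",", ".").cast(pl.Float64, strict=False),
        # Combine date and time into one timestamp; bad rows become null.
        pl.concat_str([pl.col("crash_date"), pl.col("crash_time")], separator=" ")
        .str.strptime(pl.Datetime, "%m/%d/%Y %H:%M", strict=False)
        .alias("crash_datetime"),
    ]
)
# Keep rows with a valid timestamp and at least one injured or killed cyclist.
scan = scan.filter(
    pl.col("crash_datetime").is_not_null()
    & (
        (pl.col("number_of_cyclist_injured") > 0)
        | (pl.col("number_of_cyclist_killed") > 0)
    )
)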




Determines the top streets by crash count, including persons injured and killed; output is a table (top 20).

In [5]:
streets = (
    scan.filter(pl.col("on_street_name").is_not_null())
    .group_by("on_street_name")
    .agg(
        [
            pl.len().alias("crashes"),
            pl.sum("number_of_persons_injured").alias("injured"),
            pl.sum("number_of_persons_killed").alias("killed"),
        ]
    )
    .sort("crashes", descending=True)
    .limit(20)
    .collect()
)
streets


| on_street_name | crashes | injured | killed |
| --- | --- | --- | --- |
| str | u32 | i64 | i64 |
| "BROADWAY …" | 906 | 935 | 3 |
| "5 AVENUE …" | 492 | 502 | 1 |
| "BROADWAY" | 442 | 451 | 2 |
| "BEDFORD AVENUE …" | 420 | 433 | 0 |
| "2 AVENUE …" | 389 | 395 | 0 |
| "3 AVENUE …" | 388 | 391 | 5 |
| "1 AVENUE …" | 328 | 337 | 3 |
| "ROOSEVELT AVENUE …" | 298 | 301 | 1 |
| "FULTON STREET …" | 281 | 283 | 0 |
| "MYRTLE AVENUE …" | 275 | 276 | 1 |


Determines the top intersections (on street plus cross street) by crash count and severity; output is a table (top 20).

In [6]:
intersections = (
    scan.filter(pl.col("on_street_name").is_not_null() & pl.col("cross_street_name").is_not_null())
    .group_by(["on_street_name", "cross_street_name"])
    .agg(
        [
            pl.len().alias("crashes"),
            pl.sum("number_of_persons_injured").alias("injured"),
            pl.sum("number_of_persons_killed").alias("killed"),
        ]
    )
    .sort("crashes", descending=True)
    .limit(20)
    .collect()
)
intersections


| on_street_name | cross_street_name | crashes | injured | killed |
| --- | --- | --- | --- | --- |
| str | str | u32 | i64 | i64 |
| "CHRYSTIE STREET …" | "GRAND STREET" | 15 | 18 | 0 |
| "GRAND STREET …" | "LORIMER STREET …" | 14 | 14 | 0 |
| "JAY STREET …" | "TILLARY STREET" | 13 | 14 | 0 |
| "ROOSEVELT AVENUE …" | "126 STREET …" | 13 | 12 | 1 |
| "BEDFORD AVENUE …" | "ATLANTIC AVENUE …" | 12 | 12 | 0 |
| "GRAND STREET …" | "UNION AVENUE …" | 12 | 12 | 0 |
| "DELANCEY STREET …" | "CLINTON STREET …" | 11 | 12 | 0 |
| "ROOSEVELT AVENUE …" | "114 STREET …" | 11 | 11 | 0 |
| "WEST DRIVE …" | "WEST LAKE DRIVE" | 11 | 20 | 0 |
| "FULTON STREET …" | "NOSTRAND AVENUE …" | 10 | 10 | 0 |


Aggregates crash hotspots by lat/lng and builds a Folium map (circles, red where killed > 0, blue otherwise); opens the map in the browser.

In [7]:
hotspots = (
    scan.filter(pl.col("latitude").is_not_null() & pl.col("longitude").is_not_null())
    .group_by(["latitude", "longitude"])
    .agg(
        [
            pl.len().alias("crashes"),
            pl.sum("number_of_persons_injured").alias("injured"),
            pl.sum("number_of_persons_killed").alias("killed"),
        ]
    )
    .sort("crashes", descending=True)
    .collect()
    .to_pandas()
)

m = folium.Map(location=[40.72, -73.98], zoom_start=11, tiles="CartoDB positron")

# Add every hotspot directly to the map (no MarkerCluster), so each
# location stays visible as its own circle at all zoom levels.
for _, row in hotspots.iterrows():
    # iterrows() upcasts mixed int/float rows to float, so cast the counts
    # back to int for display.
    popup = (
        f"Crashes: {int(row['crashes'])} | "
        f"Injured: {int(row['injured'])} | Killed: {int(row['killed'])}"
    )

    folium.CircleMarker(
        location=[row["latitude"], row["longitude"]],
        radius=3,
        # Red if at least one person was killed at this location, else blue.
        color="#d62728" if row["killed"] > 0 else "#3186cc",
        fill=True,
        fill_opacity=0.6,
        popup=popup,
    ).add_to(m)

m.show_in_browser()


Your map should have been opened in your browser automatically.
Press ctrl+c to return.


Computes hotspots per year (lat/lng, year) and builds a Folium map color-coded by year; the map opens in the browser.

In [8]:
hotspots_by_year = (
    scan.with_columns(pl.col("crash_datetime").dt.year().alias("year"))
    .filter(pl.col("latitude").is_not_null() & pl.col("longitude").is_not_null())
    .group_by(["latitude", "longitude", "year"])
    .agg(
        [
            pl.len().alias("crashes"),
            pl.sum("number_of_persons_injured").alias("injured"),
            pl.sum("number_of_persons_killed").alias("killed"),
        ]
    )
    .sort("crashes", descending=True)
    .collect()
    .to_pandas()
)

palette = [
    "#ffa600", "#ffab0e", "#98a02c", "#96d627", "#00ff15",
    "#00e86c", "#0073ff", "#1500ff", "#c109ff", "#ff0000",
]
years = sorted(hotspots_by_year["year"].unique())
color_map = {year: palette[i % len(palette)] for i, year in enumerate(years)}

m_yearly = folium.Map(location=[40.72, -73.98], zoom_start=11, tiles="CartoDB positron")

for _, row in hotspots_by_year.iterrows():
    year = int(row["year"])
    color = color_map[year]
    popup = (
        f"Year: {year} | Crashes: {int(row['crashes'])} | "
        f"Injured: {int(row['injured'])} | Killed: {int(row['killed'])}"
    )
    folium.CircleMarker(
        location=[row["latitude"], row["longitude"]],
        radius=3,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.6,
        popup=popup,
    ).add_to(m_yearly)

m_yearly.show_in_browser()

Your map should have been opened in your browser automatically.
Press ctrl+c to return.
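
The fixed palette above has ten entries and is indexed modulo its length, so if the data spans more than ten years, years that are ten apart share a color. One possible alternative is to derive one evenly spaced hue per year. The helper `year_colors` below is hypothetical, and the year range in the usage line is only an example.

```python
import colorsys


def year_colors(years):
    """Map each distinct year to a hex color with evenly spaced hues."""
    ordered = sorted(set(years))
    colors = {}
    for i, year in enumerate(ordered):
        # Spread hues over the color wheel; fixed lightness and saturation.
        r, g, b = colorsys.hls_to_rgb(i / len(ordered), 0.5, 0.9)
        colors[year] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colors


# Example with an assumed range of thirteen years.
color_map = year_colors(range(2012, 2025))
for year, color in color_map.items():
    print(year, color)
```

The resulting dictionary has the same shape as the notebook's `color_map`, so it could be passed to the marker loop unchanged. Neighboring years get similar hues, which may or may not be desirable on a dense map.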
