## Notebook documentation
- Evaluates the vehicle types involved in each crash and how often they occur.
- Analyzes the `contributing_factor_*` fields and the severity (injuries/fatalities) per factor.
- Helps prioritize frequent risk factors.


Loads the crash CSV, normalizes the column names, builds `crash_datetime`, and exposes the result as the LazyFrame `scan`.

In [1]:
# Load data
from pathlib import Path
import polars as pl

pl.Config.set_tbl_rows(500)

SCHEMA = {
    "CRASH DATE": pl.Utf8,
    "CRASH TIME": pl.Utf8,
    "BOROUGH": pl.Utf8,
    "ZIP CODE": pl.Utf8,
    "LATITUDE": pl.Float64,
    "LONGITUDE": pl.Float64,
    "LOCATION": pl.Utf8,
    "ON STREET NAME": pl.Utf8,
    "CROSS STREET NAME": pl.Utf8,
    "OFF STREET NAME": pl.Utf8,
    "NUMBER OF PERSONS INJURED": pl.Int64,
    "NUMBER OF PERSONS KILLED": pl.Int64,
    "NUMBER OF PEDESTRIANS INJURED": pl.Int64,
    "NUMBER OF PEDESTRIANS KILLED": pl.Int64,
    "NUMBER OF CYCLIST INJURED": pl.Int64,
    "NUMBER OF CYCLIST KILLED": pl.Int64,
    "NUMBER OF MOTORIST INJURED": pl.Int64,
    "NUMBER OF MOTORIST KILLED": pl.Int64,
    "CONTRIBUTING FACTOR VEHICLE 1": pl.Utf8,
    "CONTRIBUTING FACTOR VEHICLE 2": pl.Utf8,
    "CONTRIBUTING FACTOR VEHICLE 3": pl.Utf8,
    "CONTRIBUTING FACTOR VEHICLE 4": pl.Utf8,
    "CONTRIBUTING FACTOR VEHICLE 5": pl.Utf8,
    "COLLISION_ID": pl.Int64,
    "VEHICLE TYPE CODE 1": pl.Utf8,
    "VEHICLE TYPE CODE 2": pl.Utf8,
    "VEHICLE TYPE CODE 3": pl.Utf8,
    "VEHICLE TYPE CODE 4": pl.Utf8,
    "VEHICLE TYPE CODE 5": pl.Utf8,
}

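DATA_PATH = Path("../../raw_data/nypd/Motor_Vehicle_Collisions_Crashes.csv")
# Lazy scan: nothing is read until .collect(); empty strings are treated as null.
scan = pl.scan_csv(DATA_PATH, schema=SCHEMA, infer_schema_length=2000, null_values=[""])
# Normalize column names to snake_case. Iterating over SCHEMA yields the same names
# as scan.columns without resolving the LazyFrame schema (newer Polars warns about that).
rename_map = {name: name.lower().replace(" ", "_") for name in SCHEMA}
scan = scan.rename(rename_map)
# Combine date and time into one timestamp; strict=False maps unparseable values to null.
scan = scan.with_columns(
    pl.concat_str([pl.col("crash_date"), pl.col("crash_time")], separator=" ")
    .str.strptime(pl.Datetime, "%m/%d/%Y %H:%M", strict=False)
    .alias("crash_datetime")
)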



Counts the most frequent `contributing_factor_vehicle_*` values (excluding `Unspecified`) and lists the top 25.

In [None]:
factor_cols = [f"contributing_factor_vehicle_{i}" for i in range(1, 6)]
factors = pl.concat([scan.select(pl.col(c).alias("factor")) for c in factor_cols])
factor_counts = (
    factors.filter(pl.col("factor").is_not_null() & (pl.col("factor") != "Unspecified"))
    .group_by("factor")
    .agg(pl.len().alias("records"))
    .sort("records", descending=True)
    .limit(25)
    .collect()
)
factor_counts


| factor (str) | records (u32) |
| --- | --- |
| Driver Inattention/Distraction | 557570 |
| Failure to Yield Right-of-Way | 152146 |
| Following Too Closely | 143522 |
| Other Vehicular | 108371 |
| Backing Unsafely | 89722 |
| Passing or Lane Usage Improper | 78324 |
| Passing Too Closely | 66286 |
| Turning Improperly | 64131 |
| Fatigued/Drowsy | 59483 |
| Unsafe Lane Changing | 51335 |


Aggregates the factors with the total number of injured and killed, sorted by injured (top 25).

In [3]:
factor_severity = (
    pl.concat(
        [
            scan.select(
                [
                    pl.col("number_of_persons_injured").alias("injured"),
                    pl.col("number_of_persons_killed").alias("killed"),
                    pl.col(c).alias("factor"),
                ]
            )
            for c in factor_cols
        ]
    )
    .filter(pl.col("factor").is_not_null() & (pl.col("factor") != "Unspecified"))
    .group_by("factor")
    .agg(
        [
            pl.len().alias("records"),
            pl.sum("injured").alias("injured"),
            pl.sum("killed").alias("killed"),
        ]
    )
    .sort(["injured", "records"], descending=True)
    .limit(25)
    .collect()
)
factor_severity


| factor (str) | records (u32) | injured (i64) | killed (i64) |
| --- | --- | --- | --- |
| Driver Inattention/Distraction | 557570 | 201199 | 483 |
| Failure to Yield Right-of-Way | 152146 | 80132 | 322 |
| Following Too Closely | 143522 | 63290 | 27 |
| Traffic Control Disregarded | 49412 | 36157 | 328 |
| Other Vehicular | 108371 | 34550 | 53 |
| Unsafe Speed | 41173 | 30001 | 488 |
| Passing or Lane Usage Improper | 78324 | 18798 | 55 |
| Fatigued/Drowsy | 59483 | 16458 | 3 |
| Turning Improperly | 64131 | 15557 | 27 |
| Driver Inexperience | 43623 | 14871 | 80 |


Counts the most frequent `vehicle_type_code_*` values and lists the top 25.

In [4]:
vehicle_cols = [f"vehicle_type_code_{i}" for i in range(1, 6)]
vehicles = pl.concat([scan.select(pl.col(c).alias("vehicle_type")) for c in vehicle_cols])
vehicle_counts = (
    vehicles.filter(pl.col("vehicle_type").is_not_null())
    .group_by("vehicle_type")
    .agg(pl.len().alias("records"))
    .sort("records", descending=True)
    .limit(25)
    .collect()
)
vehicle_counts


| vehicle_type (str) | records (u32) |
| --- | --- |
| Sedan | 1158157 |
| Station Wagon/Sport Utility Ve… | 918594 |
| PASSENGER VEHICLE | 769982 |
| SPORT UTILITY / STATION WAGON | 337507 |
| UNKNOWN | 105463 |
| Taxi | 98650 |
| Pick-up Truck | 76363 |
| 4 dr sedan | 73540 |
| TAXI | 60768 |
| Box Truck | 59403 |


Aggregates the vehicle types with the total number of injured and killed, sorted by injured (top 25).

In [5]:
vehicle_severity = (
    pl.concat(
        [
            scan.select(
                [
                    pl.col(c).alias("vehicle_type"),
                    pl.col("number_of_persons_injured").alias("injured"),
                    pl.col("number_of_persons_killed").alias("killed"),
                ]
            )
            for c in vehicle_cols
        ]
    )
    .filter(pl.col("vehicle_type").is_not_null())
    .group_by("vehicle_type")
    .agg(
        [
            pl.len().alias("records"),
            pl.sum("injured").alias("injured"),
            pl.sum("killed").alias("killed"),
        ]
    )
    .sort(["injured", "records"], descending=True)
    .limit(25)
    .collect()
)
vehicle_severity


| vehicle_type (str) | records (u32) | injured (i64) | killed (i64) |
| --- | --- | --- | --- |
| Sedan | 1158157 | 455806 | 1305 |
| Station Wagon/Sport Utility Ve… | 918594 | 354168 | 1321 |
| PASSENGER VEHICLE | 769982 | 202923 | 610 |
| SPORT UTILITY / STATION WAGON | 337507 | 87267 | 366 |
| Bike | 57438 | 49045 | 207 |
| Taxi | 98650 | 36324 | 83 |
| Pick-up Truck | 76363 | 22178 | 130 |
| 4 dr sedan | 73540 | 21952 | 55 |
| Bus | 47358 | 16126 | 106 |
| BICYCLE | 19331 | 15496 | 59 |
