# Identifying Dangerous Locations (Hot Spots)

Identifying locations with a high number of collisions, especially collisions resulting in injury or death, helps us prioritize locations to study and improve.

### Different Approaches
 - A simple approach to identifying dangerous locations is to find single coordinates that have a large number of injuries or fatalities. These will often (but not always) be higher-volume intersections.
 - Clustering algorithms can be used to identify areas, not just intersections, that have a significant concentration of serious collisions. The clustering maps below use the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm.
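The difference between the two approaches can be seen on a small synthetic example. This is a sketch on made-up coordinates (not real NYC collisions), and the `eps`/`min_samples` values are chosen only for this toy data: an intersection with several collisions at one exact coordinate tops the single-point ranking, while a corridor of collisions at distinct nearby coordinates only shows up once DBSCAN links the points together.

```python
import pandas as pd
from sklearn.cluster import DBSCAN

# Hypothetical toy data, not real NYC collisions:
# - an intersection with 5 collisions recorded at exactly the same coordinate
# - a corridor with 12 collisions, each at its own coordinate ~0.0005 degrees apart
intersection = [(40.7000, -73.9000)] * 5
corridor = [(40.7100 + i * 0.0005, -73.9500) for i in range(12)]
toy = pd.DataFrame(intersection + corridor, columns=["LAT", "LONG"])

# Single-point approach: count collisions per exact coordinate.
# The intersection ranks first; every corridor coordinate has a count of 1.
counts = toy.groupby(["LAT", "LONG"]).size().sort_values(ascending=False)

# Clustering approach: DBSCAN links nearby points, so the corridor becomes
# one cluster even though no single coordinate in it stands out.
labels = DBSCAN(eps=0.0006, min_samples=3).fit(toy[["LAT", "LONG"]]).labels_
toy["CLUSTER"] = labels  # -1 would mark noise points

print(counts.head())
print(toy.groupby("CLUSTER").size())
```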

_Note that the following analysis presupposes some knowledge of NYC geography_

### Definitions
- __Collision:__ A motor vehicle collision involving injuries, deaths, or significant property damage (more than about $1,000), reported on New York State form MV104-AN
- __Serious Collision:__ A collision where at least one person is injured or killed
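As a minimal sketch of the "serious collision" definition, a boolean flag can be derived from per-collision injury and death counts. The `INJURED` and `KILLED` column names are the ones used later in this notebook; whether the processing script builds the `serious` flag exactly this way is an assumption.

```python
import pandas as pd

# Hypothetical per-collision counts. The column names match those used in this
# notebook, but how the real "serious" flag is built is an assumption.
sample = pd.DataFrame({"INJURED": [0, 2, 0, 1], "KILLED": [0, 0, 1, 0]})

# Serious collision: at least one person injured or killed.
sample["serious"] = (sample["INJURED"] + sample["KILLED"]) > 0
print(sample)
```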

### Data Sources

- Collision data obtained from https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95
- Data was processed by running `process_raw.data.py`, which saves the processed data to the local directory specified in that script
- _Note that \~10% of collisions are missing lat-long coordinates. These collisions are excluded from the following geographic analyses._
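One way to build a flag like the `valid_lat_long` column used below is to require both coordinates to fall inside a box around the city. This is a hedged sketch: the bounding box values and the helper name `add_valid_lat_long` are hypothetical, and it assumes unusable coordinates appear either as missing values or as placeholders outside the city (such as 0, 0). The actual rule in the processing script may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical bounding box that comfortably contains the five boroughs.
LAT_RANGE = (40.4, 41.0)
LONG_RANGE = (-74.3, -73.6)


def add_valid_lat_long(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a boolean 'valid_lat_long' column (hypothetical helper)."""
    out = df.copy()
    # Series.between returns False for NaN, so missing coordinates are flagged invalid.
    out["valid_lat_long"] = out["LAT"].between(*LAT_RANGE) & out["LONG"].between(*LONG_RANGE)
    return out


# Usage: one valid point, one missing latitude, one (0, 0) placeholder.
sample = pd.DataFrame({"LAT": [40.75, np.nan, 0.0], "LONG": [-73.98, -73.90, 0.0]})
flagged = add_valid_lat_long(sample)
print(flagged)
```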

```python
import os.path

import folium
import folium.plugins
import pandas as pd
from sklearn.cluster import DBSCAN

from src.constants import NYC_MAP_CENTER
```

### Parameters

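```python
PROCESSED_CRASH_DATA = "data/processed/crashes.pkl"
IMG_DIR = "output/hotspots"
LAT_LONG = ["LAT", "LONG"]

# Make sure the output directory exists before the maps are saved into it.
os.makedirs(IMG_DIR, exist_ok=True)
```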

### Preparing Data

```python
crashes = pd.read_pickle(PROCESSED_CRASH_DATA)
```

```python
# Useful subsets of the data. The helper functions below copy their input
# before adding columns, so these slices are not mutated.
serious = crashes[crashes["valid_lat_long"] & crashes["serious"]]
pedestrian = serious[serious["pedestrian"]]
cyclist = serious[serious["cyclist"]]
```

### Helper Functions to Prepare Maps

```python
def id_worst_points(
    df: pd.DataFrame,
    num_worst: int,
    lat_long_cols: list,
    casualty_cols: list,
    fatality_weight=10,
):
    """
    Return the specified number of worst single points (exact coordinates).

    df: Collisions, one row per collision.
    num_worst: The number of points to identify.
    lat_long_cols: Names of the latitude and longitude columns.
    casualty_cols: Names of the [injured, killed] columns, in that order.
    fatality_weight: Weight of deaths relative to injuries for purposes of
        ranking worst points. Increase the weight to focus on points with deaths.
    """
    bad_df = df.copy()
    bad = bad_df.groupby(by=lat_long_cols)[casualty_cols].sum()
    bad["score"] = bad[casualty_cols[0]] + (bad[casualty_cols[1]] * fatality_weight)
    bad = bad.reset_index()
    worst = bad.sort_values(by="score", ascending=False).iloc[:num_worst]
    return worst


def make_worst_map(df: pd.DataFrame, casualty_names: list[str]):
    """
    Return a folium map with one marker per row of df.

    df: Output of id_worst_points (latitude, longitude, injured, killed, score).
    casualty_names: Labels for the injured and killed counts in the tooltip.
    """
    f_map = folium.Map(location=NYC_MAP_CENTER, zoom_start=10, tiles="OpenStreetMap")
    # itertuples() puts the index at position 0, so positions 1-4 are
    # latitude, longitude, injured and killed.
    for row in df.itertuples():
        info_str = f"{casualty_names[0]}: {int(row[3])}\n{casualty_names[1]}: {int(row[4])}"
        icon = folium.map.Icon(color="red", icon="fa-exclamation", angle=0, prefix="fa")
        folium.map.Marker(
            (row[1], row[2]), icon=icon, tooltip=info_str, popup=info_str
        ).add_to(f_map)
    return f_map


def make_worst_cluster_map(
    df: pd.DataFrame,
    num_worst: int,
    lat_long_cols: list,
    casualty_cols: list,
    fatality_weight=10,
    eps=0.00085,
    min_samples=50,
):
    """
    Return a folium map of the specified number of worst clusters.

    num_worst: The number of clusters to identify.
    fatality_weight: Weight of deaths relative to injuries for purposes of
        ranking worst clusters. Increase the weight to focus on clusters with deaths.
    eps, min_samples: DBSCAN parameters. eps is measured in degrees because the
        raw latitude/longitude values are clustered with a Euclidean metric.
    """
    bad_df = df.copy()
    # Create clusters.
    # eps values of 0.00075-0.0011 seem to produce useful clusters; a higher value gives wider clusters.
    clusters = DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean").fit(bad_df[lat_long_cols])
    bad_df["CLUSTER"] = clusters.labels_  # -1 means the point does not belong to any cluster
    # Assign a score to each cluster (noise points are excluded).
    cluster_groupby = bad_df[bad_df["CLUSTER"] > -1].groupby(by=["CLUSTER"])[casualty_cols].sum()
    cluster_groupby["score"] = cluster_groupby[casualty_cols[0]] + (cluster_groupby[casualty_cols[1]] * fatality_weight)
    # Identify the worst clusters to create a mask.
    cluster_numbers = cluster_groupby.sort_values(by="score", ascending=False)
    cluster_numbers = cluster_numbers.iloc[:num_worst].index.to_list()
    cluster_mask = bad_df["CLUSTER"].isin(cluster_numbers)

    # Create the map. Column order matters: the JavaScript callback reads
    # row[0]-row[1] as lat/long, row[2]-row[5] as DATE, TIME and the two
    # casualty counts, and row[6] as the cluster number.
    cols = lat_long_cols + ["DATE", "TIME"] + casualty_cols
    cols.append("CLUSTER")
    cluster_map_data = bad_df.loc[cluster_mask, cols]
    cluster_map = folium.Map(location=NYC_MAP_CENTER, zoom_start=10, tiles="OpenStreetMap")
    js_callback = (
        "function (row) {"
        "var colors = ['red', 'blue', 'gray', 'orange', 'black'];"
        "var color = colors[row[6] % colors.length];"
        "var marker = L.marker(new L.LatLng(row[0], row[1]));"
        "var icon = L.AwesomeMarkers.icon({"
        "icon: 'fa-exclamation',"
        "iconColor: 'white',"
        "markerColor: color,"
        "prefix: 'fa',"
        "});"
        "marker.setIcon(icon);"
        "var popup = L.popup({maxWidth: '300'});"
        "const display_text = {text:"
        f"'{cols[6]}: ' + row[6] + '<br>' +"
        f"'{cols[2]}: ' + row[2] + '<br>' +"
        f"'{cols[3]}: ' + row[3] + '<br>' +"
        f"'{cols[4]}: ' + row[4] + '<br>' +"
        f"'{cols[5]}: ' + row[5]}};"
        "var poptext = $(`<div id='mytext' style='width: 100.0%; height: 100.0%;'> ${display_text.text}</div>`)[0];"
        "popup.setContent(poptext);"
        "marker.bindPopup(popup);"
        "return marker};"
    )
    folium.plugins.FastMarkerCluster(data=cluster_map_data, callback=js_callback).add_to(
        cluster_map
    )
    return cluster_map
```
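The cluster-ranking step inside `make_worst_cluster_map` can be checked in isolation. The sketch below uses hypothetical rows with DBSCAN labels already attached; it repeats the same scoring and masking logic to show that noise points (label `-1`) never count toward any cluster, even when they carry many casualties, and that only rows in the `num_worst` highest-scoring clusters are kept for mapping.

```python
import pandas as pd

# Hypothetical collisions with DBSCAN labels already attached (-1 = noise).
df = pd.DataFrame(
    {
        "CLUSTER": [0, 0, 1, 1, 1, 2, -1, -1],
        "INJURED": [3, 2, 1, 1, 1, 4, 9, 9],
        "KILLED": [0, 0, 0, 1, 0, 0, 1, 1],
    }
)
casualty_cols = ["INJURED", "KILLED"]
fatality_weight = 10
num_worst = 2

# Score each real cluster; noise rows are dropped before grouping.
by_cluster = df[df["CLUSTER"] > -1].groupby("CLUSTER")[casualty_cols].sum()
by_cluster["score"] = by_cluster["INJURED"] + by_cluster["KILLED"] * fatality_weight

# Keep the num_worst highest-scoring clusters and build a row mask for the map.
worst_ids = by_cluster.sort_values("score", ascending=False).index[:num_worst].to_list()
mask = df["CLUSTER"].isin(worst_ids)
print(by_cluster)
print(df[mask])
```

Here cluster 1 outranks cluster 0 because its single death is weighted ten times an injury, and cluster 2 is left off the map.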

# Dangerous Single Points

___Dangerous single points___ are single coordinates with a large number of injuries or fatalities.

- When we select the 100 most dangerous locations, dangerous single points are spread throughout the city, except on Staten Island, where they are less common
<br><br>
- __The worst single points in NYC are__:
    - Where West Fordham Road crosses the Major Deegan Expressway in the Bronx
    - Two coordinates on the Verrazano Bridge
    - A single coordinate on the Whitestone Bridge
    - The intersection of Guy R. Brewer Blvd. with Rockaway Blvd. next to Kennedy Airport
    - Where East 138th Street intersects Bruckner Blvd. and the Bruckner Expressway in the Bronx
    - The intersection of Linden Blvd. and Pennsylvania Ave. in Brooklyn
<br><br>
- __Several highways stand out as having multiple single coordinates with high numbers of injuries and deaths. For example:__ 
    - The Belt Parkway near Kennedy Airport in Queens
    - The Bruckner Expressway in the Bronx
    - The Cross Bronx Expressway between the Alexander Hamilton Bridge and 3rd Ave (in the Bronx)
<br><br>
- __Several non-highway roadways stand out as having multiple dangerous single coordinates. For example:__
    - Eastern Parkway in Brooklyn
    - Utica Avenue in Brooklyn
    - Woodhaven Blvd. in Queens
    - Fordham Road in the Bronx
    - 125th Street in Manhattan
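The ranking above depends on `fatality_weight`. As a sketch of its effect, the hypothetical helper `rank_points` below repeats the scoring rule of `id_worst_points` (injuries plus `fatality_weight` times deaths, summed per exact coordinate) on made-up data: a point with many injuries but no deaths ranks first when deaths and injuries are weighted equally, while a point with one death ranks first at the notebook's default weight of 10.

```python
import pandas as pd


def rank_points(df: pd.DataFrame, fatality_weight: int) -> pd.DataFrame:
    """Hypothetical helper mirroring id_worst_points' scoring rule."""
    bad = df.groupby(["LAT", "LONG"])[["INJURED", "KILLED"]].sum()
    bad["score"] = bad["INJURED"] + bad["KILLED"] * fatality_weight
    return bad.reset_index().sort_values("score", ascending=False)


# Hypothetical data: point A (40.80, -73.90) has 6 injuries and no deaths,
# point B (40.60, -74.00) has 1 injury and 1 death.
toy = pd.DataFrame(
    {
        "LAT": [40.80, 40.80, 40.80, 40.60],
        "LONG": [-73.90, -73.90, -73.90, -74.00],
        "INJURED": [2, 2, 2, 1],
        "KILLED": [0, 0, 0, 1],
    }
)
equal_weight = rank_points(toy, fatality_weight=1)     # A scores 6, B scores 2
default_weight = rank_points(toy, fatality_weight=10)  # A scores 6, B scores 11
print(equal_weight)
print(default_weight)
```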

```python
worst_serious = id_worst_points(serious, 100, LAT_LONG, ["INJURED", "KILLED"])
serious_map = make_worst_map(worst_serious, ["INJURED", "KILLED"])
serious_map.save(os.path.join(IMG_DIR, "points_serious_map.html"))
serious_map
```

# Dangerous Single Points for Pedestrians

```python
worst_pedestrian = id_worst_points(pedestrian, 100, LAT_LONG, ["PEDESTRIAN INJURED", "PEDESTRIAN KILLED"])
pedestrian_map = make_worst_map(worst_pedestrian, ["PEDESTRIAN INJURED", "PEDESTRIAN KILLED"])
pedestrian_map.save(os.path.join(IMG_DIR, "points_pedestrian_map.html"))
pedestrian_map
```

# Dangerous Single Points for Cyclists

```python
worst_cyclist = id_worst_points(cyclist, 100, LAT_LONG, ["CYCLIST INJURED", "CYCLIST KILLED"])
cyclist_map = make_worst_map(worst_cyclist, ["CYCLIST INJURED", "CYCLIST KILLED"])
cyclist_map.save(os.path.join(IMG_DIR, "points_cyclist_map.html"))
cyclist_map
```

# Clustering Serious Collisions
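- Clustering is performed using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm
- The parameters `eps` and `min_samples` can be tuned to produce clusters of the desired size
    - `eps` is the maximum distance between two points for one to be considered in the neighborhood of the other; because the raw latitude/longitude values are clustered with a Euclidean metric, it is measured in degrees
    - `min_samples` is the number of points (including the point itself) that a neighborhood must contain for a point to be a core point of a cluster
    - See [sklearn documentation](https://scikit-learn.org/stable/modules/clustering.html) for more information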
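Because `eps` is in degrees here, the neighborhood is not circular on the ground: at NYC's latitude (about 40.7°), 0.00085° is roughly 94 m north-south but only about 72 m east-west. A possible alternative, sketched below and not the method used for the maps in this notebook, is scikit-learn's `haversine` metric, which expects `[latitude, longitude]` in radians and returns great-circle distances on a unit sphere, so `eps` can be given in meters and divided by the Earth's radius. The helper name `cluster_latlong_meters` and the example `eps_meters` value are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def cluster_latlong_meters(df: pd.DataFrame, eps_meters: float, min_samples: int,
                           lat_long_cols=("LAT", "LONG")) -> np.ndarray:
    """Hypothetical helper: DBSCAN labels using great-circle distance in meters."""
    # The haversine metric needs [lat, long] in radians.
    coords = np.radians(df[list(lat_long_cols)].to_numpy())
    model = DBSCAN(
        eps=eps_meters / EARTH_RADIUS_M,  # convert meters to radians on the unit sphere
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",
    )
    return model.fit(coords).labels_


# Usage on toy points: three points about 33 m apart, plus one far away.
toy = pd.DataFrame({"LAT": [40.7000, 40.7003, 40.7006, 40.7500], "LONG": [-73.9] * 4})
labels = cluster_latlong_meters(toy, eps_meters=50, min_samples=2)
print(labels)
```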

```python
casualty_cols = ["INJURED", "KILLED"]
serious_cluster_map = make_worst_cluster_map(
    serious, 50, LAT_LONG, casualty_cols, fatality_weight=10, eps=0.00080, min_samples=50
)
serious_cluster_map.save(os.path.join(IMG_DIR, "clusters_serious_map.html"))
serious_cluster_map
```

# Clustering Pedestrian Collisions

```python
casualty_cols = ["PEDESTRIAN INJURED", "PEDESTRIAN KILLED"]
pedestrian_cluster_map = make_worst_cluster_map(
    pedestrian, 50, LAT_LONG, casualty_cols, fatality_weight=10, eps=0.0010, min_samples=25
)
pedestrian_cluster_map.save(os.path.join(IMG_DIR, "clusters_pedestrian_map.html"))
pedestrian_cluster_map
```

# Clustering Cyclist Collisions

```python
casualty_cols = ["CYCLIST INJURED", "CYCLIST KILLED"]
cyclist_cluster_map = make_worst_cluster_map(
    cyclist, 50, LAT_LONG, casualty_cols, fatality_weight=10, eps=0.0010, min_samples=25
)
cyclist_cluster_map.save(os.path.join(IMG_DIR, "clusters_cyclist_map.html"))
cyclist_cluster_map
```

# Conclusions

- Serious collisions, including those in which pedestrians or cyclists are injured or killed, are fairly widespread throughout NYC
- Clustering can reveal multi-block areas with elevated injuries and deaths that would be missed by tracking only high-collision intersections


# Follow-Up Questions
- To what extent have relevant government and community groups identified these hotspots?
- What steps are being taken to reduce collisions at these locations? 