# Identifying Dangerous Locations (Hot Spots)

Identifying locations with a high number of collisions, especially collisions resulting in injury or death, helps us prioritize locations to study and improve.

### Different Approaches
 - A simple approach to identifying dangerous locations is to find single coordinates that have a large number of injuries or fatalities. These will often (but not always) be higher-volume intersections.
 - Clustering algorithms can be used to identify areas, not just intersections, that have a significant concentration of serious collisions. The clustering maps below use the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm.
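As a rough intuition for DBSCAN, here is a toy illustration (the coordinates are made up, and `eps` and `min_samples` are chosen only for this example). Points with enough neighbors within `eps` form clusters. Points that belong to no dense region get the label `-1` ("noise"), which is the same label the cluster cells below filter out.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups of points plus one isolated point (toy values, not real data)
points = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0],  # dense group A
    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],  # dense group B
    [10.0, 0.0],                         # isolated point
])

# eps: neighborhood radius; min_samples: neighbors (counting the point itself)
# required for a point to be a "core" point of a cluster
labels = DBSCAN(eps=0.5, min_samples=3).fit(points).labels_
print(labels.tolist())  # → [0, 0, 0, 1, 1, 1, -1]
```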

_Note that the following analysis presupposes some knowledge of NYC geography_

### Definitions
- __Collision:__ A motor vehicle collision involving injuries, deaths, or a significant amount of property damage (roughly $1,000 or more) reported on New York State form MV104-AN
- __Serious Collision:__ A collision where at least one person is injured or killed
- __Non-Motorist:__ A pedestrian or cyclist (not the driver or passenger of a motor vehicle)
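The processing script that creates the `serious` and `non-motorist` flags is not shown here. The following is a minimal sketch of how the definitions above could map onto the casualty columns used later in this notebook. The helper `add_definition_flags` is hypothetical, and the logic is an assumption based only on the definitions.

```python
import pandas as pd

NON_MOTORIST_COLS = ["PEDESTRIAN INJURED", "CYCLIST INJURED", "PEDESTRIAN KILLED", "CYCLIST KILLED"]

def add_definition_flags(crashes: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical: derive 'serious' and 'non-motorist' flags from casualty counts."""
    out = crashes.copy()
    # Serious collision: at least one person injured or killed
    out["serious"] = (out["INJURED"] + out["KILLED"]) > 0
    # Non-motorist: at least one pedestrian or cyclist injured or killed
    out["non-motorist"] = out[NON_MOTORIST_COLS].sum(axis=1) > 0
    return out

# Toy rows: property damage only, motorist injuries only, one pedestrian injured
example = pd.DataFrame({
    "INJURED": [0, 2, 1], "KILLED": [0, 0, 0],
    "PEDESTRIAN INJURED": [0, 0, 1], "CYCLIST INJURED": [0, 0, 0],
    "PEDESTRIAN KILLED": [0, 0, 0], "CYCLIST KILLED": [0, 0, 0],
})
flags = add_definition_flags(example)
print(flags[["serious", "non-motorist"]])
```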


### Data Sources

- Collision data obtained from https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95
- Data were processed by running `process_raw.data.py`, which saves the processed data to the local directory specified in that script
- _Note that \~12% of collisions are missing lat-long coordinates. These collisions are excluded from the following geographic analyses._
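The `valid_lat_long` flag used below comes from the processing script, which is not shown. A minimal sketch of such a flag follows, assuming missing coordinates appear as `NaN` and, as a further assumption, that a `(0, 0)` placeholder also means "no location". The helper `add_valid_lat_long` is hypothetical.

```python
import numpy as np
import pandas as pd

def add_valid_lat_long(crashes: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical: flag rows whose coordinates are present and non-zero."""
    out = crashes.copy()
    has_coords = out["LAT"].notna() & out["LONG"].notna()
    # Assumption: a (0, 0) placeholder also means "no location recorded"
    non_zero = (out["LAT"] != 0) & (out["LONG"] != 0)
    out["valid_lat_long"] = has_coords & non_zero
    return out

example = pd.DataFrame({"LAT": [40.71, np.nan, 0.0], "LONG": [-73.99, -73.95, 0.0]})
flagged = add_valid_lat_long(example)
share_missing = 1 - flagged["valid_lat_long"].mean()
print(flagged["valid_lat_long"].tolist(), round(share_missing, 2))  # → [True, False, False] 0.67
```

Computing `share_missing` on the full dataset this way is one way to check a figure like the ~12% quoted above.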

```python
import os.path
import folium
import folium.plugins
import pandas as pd
from sklearn.cluster import DBSCAN
```

# Parameters 

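```python
PROCESSED_CRASH_DATA = "data/processed/crashes.pkl"
IMG_DIR = "output/hotspots"
NYC_MAP_CENTER = (40.73, -73.92)

# folium's Map.save() does not create missing directories
os.makedirs(IMG_DIR, exist_ok=True)
```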

# Preparing Data

```python
crashes = pd.read_pickle(PROCESSED_CRASH_DATA)
```

```python
# Useful slices; .copy() because a "cluster" column is added to them later
serious = crashes[crashes["valid_lat_long"] & crashes["serious"]].copy()
non_motorist = serious[serious["non-motorist"]].copy()
```
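The `.copy()` calls matter because the clustering cells later add a `cluster` column to `serious` and `non_motorist`. A small toy demonstration (made-up values) of why an explicit copy is the safe choice:

```python
import pandas as pd

# Toy stand-in for `crashes` (made-up values)
toy_crashes = pd.DataFrame({"serious": [True, False, True], "LAT": [40.70, 40.80, 40.75]})

# Boolean indexing plus .copy() yields an independent DataFrame, so adding a
# column afterwards leaves the source untouched and avoids SettingWithCopyWarning
toy_serious = toy_crashes[toy_crashes["serious"]].copy()
toy_serious["cluster"] = [0, -1]

print("cluster" in toy_crashes.columns)  # → False
print(toy_serious["cluster"].tolist())  # → [0, -1]
```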

# Dangerous Single Points

___Dangerous single points___ are single coordinates with a large number of injuries or fatalities.

- When we select the 250 most dangerous locations, we see that dangerous single points are spread throughout the city, although they are less common on Staten Island
<br><br>
- __The worst single points in NYC are__:
    - Where West Fordham Road crosses the Major Deegan Expressway in the Bronx
    - Two coordinates on the Verrazano Bridge
    - A single coordinate on the Whitestone Bridge
    - The intersection of Guy R. Brewer Blvd. with Rockaway Blvd. next to Kennedy Airport
    - Where East 138th Street intersects Bruckner Blvd. and the Bruckner Expressway in the Bronx
<br><br>
- __Several highways stand out as having multiple single coordinates with high numbers of injuries and deaths. For example:__ 
    - The Belt Parkway near Kennedy Airport in Queens
    - The Bruckner Expressway in the Bronx
    - The Cross Bronx Expressway between the Alexander Hamilton Bridge and 3rd Ave (in the Bronx)
<br><br>
- __Several non-highway roadways stand out as having multiple dangerous single coordinates. For example:__
    - Eastern Parkway in Brooklyn
    - Utica Avenue in Brooklyn
    - Woodhaven Blvd. in Queens
    - Fordham Road in the Bronx
    - 125th Street in Manhattan

```python
MOST_SERIOUS = 250  # number of most dangerous locations to identify
FATALITY_MULTIPLE = 10  # weight assigned to fatalities relative to injuries

# Sum injuries and deaths at each exact coordinate, then score each coordinate
dangerous = serious.groupby(by=["LAT", "LONG"])[["INJURED", "KILLED"]].sum()
dangerous["danger"] = dangerous["INJURED"] + (dangerous["KILLED"] * FATALITY_MULTIPLE)
dangerous = dangerous.reset_index()
most_dangerous = dangerous.sort_values(by="danger", ascending=False).iloc[:MOST_SERIOUS]

danger_map = folium.Map(location=NYC_MAP_CENTER, zoom_start=10, tiles="OpenStreetMap")
for row in most_dangerous.itertuples():
    # Tooltips and popups render HTML, so use <br> rather than "\n" for line breaks
    info_str = f"INJURED: {int(row.INJURED)}<br>KILLED: {int(row.KILLED)}"
    icon = folium.map.Icon(color="red", icon="fa-exclamation", angle=0, prefix="fa")
    folium.map.Marker(
        (row.LAT, row.LONG), icon=icon, tooltip=info_str, popup=info_str
    ).add_to(danger_map)
danger_map.save(os.path.join(IMG_DIR, "points_serious_map.html"))
danger_map
```
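To make the scoring concrete, here is the same group-and-score step on a few made-up records. With `FATALITY_MULTIPLE = 10`, a single death outweighs several injuries, so a coordinate with one fatality can outrank one with more total casualties.

```python
import pandas as pd

FATALITY_MULTIPLE = 10  # same weighting as the cell above

# Toy records: two collisions at point A, one at point B (made-up values)
toy = pd.DataFrame({
    "LAT": [40.70, 40.70, 40.80],
    "LONG": [-73.90, -73.90, -73.95],
    "INJURED": [3, 2, 1],
    "KILLED": [0, 0, 1],
})

scores = toy.groupby(["LAT", "LONG"])[["INJURED", "KILLED"]].sum()
scores["danger"] = scores["INJURED"] + scores["KILLED"] * FATALITY_MULTIPLE
ranked = scores.sort_values("danger", ascending=False)
print(ranked)  # point B (danger 11) ranks above point A (danger 5)
```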

# Dangerous Single Points for Pedestrians and Cyclists

- We see that dangerous single points for pedestrians and cyclists are also spread throughout the city, with fewer on Staten Island
<br><br>
- __The worst single coordinates in NYC for pedestrians and cyclists are the intersections of:__
    - Eastern Parkway and Utica Avenue in Brooklyn
    - West Houston Street and West Street at Pier 40 / Hudson River Park
    - 42nd Street and 7th Avenue in Manhattan
    - Atlantic Avenue and Rockaway Avenue in Brooklyn
    - 125th Street and Malcolm X Blvd. in Manhattan
<br><br>

- __Several roadways stand out as having multiple dangerous single coordinates for pedestrians and cyclists. For example:__
    - Eastern Parkway in Brooklyn
    - Flatbush Avenue in Brooklyn
    - Atlantic Avenue in Brooklyn
    - Ocean Parkway in Brooklyn
    - Main Street in Queens
    - Queens Blvd. in Queens
    - Fordham Road in the Bronx
    - East 149th Street in both the Bronx and Manhattan
    - East 138th Street in the Bronx
    - 2nd Avenue from \~60th Street to 14th Street in Manhattan
    - 125th Street in Manhattan
    - 116th Street in Manhattan
    - 96th Street in Manhattan
    - 42nd Street in Manhattan
    - 34th Street in Manhattan
    - 23rd Street in Manhattan
    - 14th Street in Manhattan
    - Houston Street in Manhattan
    - Delancey Street in Manhattan
    - Canal Street in Manhattan

```python
MOST_NON_MOTORIST_SERIOUS = 250  # number of most dangerous locations to identify
FATALITY_MULTIPLE = 10  # weight assigned to fatalities relative to injuries

dangerous_non_motor = non_motorist.groupby(by=["LAT", "LONG"])[
    ["PEDESTRIAN INJURED", "CYCLIST INJURED", "PEDESTRIAN KILLED", "CYCLIST KILLED"]
].sum()
dangerous_non_motor["danger"] = (
    dangerous_non_motor["PEDESTRIAN INJURED"]
    + dangerous_non_motor["CYCLIST INJURED"]
    + (dangerous_non_motor["PEDESTRIAN KILLED"] * FATALITY_MULTIPLE)
    + (dangerous_non_motor["CYCLIST KILLED"] * FATALITY_MULTIPLE)
)
dangerous_non_motor = dangerous_non_motor.reset_index()
most_dangerous_non_motor = dangerous_non_motor.sort_values(
    by="danger", ascending=False
).iloc[:MOST_NON_MOTORIST_SERIOUS]

danger_non_motor_map = folium.Map(
    location=NYC_MAP_CENTER, zoom_start=10, tiles="OpenStreetMap"
)
# Column names contain spaces, so itertuples() fields are accessed by position:
# row[0] is the index, row[1]/row[2] are LAT/LONG, row[3]..row[6] are the four counts
for row in most_dangerous_non_motor.itertuples():
    info_str = (
        f"PEDESTRIAN INJURED: {int(row[3])}<br>"
        f"CYCLIST INJURED: {int(row[4])}<br>"
        f"PEDESTRIAN KILLED: {int(row[5])}<br>"
        f"CYCLIST KILLED: {int(row[6])}"
    )
    icon = folium.map.Icon(color="red", icon="fa-exclamation", angle=0, prefix="fa")
    folium.map.Marker(
        (row[1], row[2]), icon=icon, tooltip=info_str, popup=info_str
    ).add_to(danger_non_motor_map)
danger_non_motor_map.save(os.path.join(IMG_DIR, "points_non_motor_map.html"))
danger_non_motor_map
```
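The single-point and cluster cells all compute the same weighted sum, only with different column lists. One way to avoid repeating it is a small helper; `danger_score` below is hypothetical and not part of the original notebook, and it is shown on made-up values.

```python
import pandas as pd

def danger_score(df, injured_cols, killed_cols, fatality_multiple=10):
    """Hypothetical helper: per-row injuries plus fatalities weighted by fatality_multiple."""
    injured = df[injured_cols].sum(axis=1)
    killed = df[killed_cols].sum(axis=1)
    return injured + killed * fatality_multiple

toy = pd.DataFrame({
    "PEDESTRIAN INJURED": [2, 0],
    "CYCLIST INJURED": [1, 1],
    "PEDESTRIAN KILLED": [0, 1],
    "CYCLIST KILLED": [0, 0],
})
toy["danger"] = danger_score(
    toy,
    injured_cols=["PEDESTRIAN INJURED", "CYCLIST INJURED"],
    killed_cols=["PEDESTRIAN KILLED", "CYCLIST KILLED"],
)
print(toy["danger"].tolist())  # → [3, 11]
```

The serious-collision score is the same call with `injured_cols=["INJURED"]` and `killed_cols=["KILLED"]`.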

# Clustering Serious Collisions

- Clustering was performed using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm
- The parameters `eps` (neighborhood radius) and `min_samples` (minimum number of points needed to form a dense region) can be tuned to produce clusters of the desired size
- __Clustering reveals concentrations of serious collisions that were not as prominent when focusing on single points (usually intersections). For example__:
    - The Park Avenue and 23rd Street area in Manhattan
    - 6th Avenue between 16th and 29th Streets in Manhattan
    - Malcolm X Blvd. between 122nd and 133rd Streets in Manhattan
    - Pennsylvania Avenue from Liberty Avenue to the Jackie Robinson Parkway in Brooklyn
    - The Linden Blvd. and Pennsylvania Avenue area in Brooklyn
    - The Metropolitan Avenue, Union Avenue, Brooklyn-Queens Expressway area in Brooklyn
    - The area near the Van Wyck Expressway, Queens Blvd., Jamaica Blvd., and Hillside Avenue in Queens
    - Astoria Blvd. between 31st and 33rd Streets and vicinity in Queens
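The clustering below runs DBSCAN with `metric="euclidean"` directly on latitude/longitude degrees. At NYC's latitude a degree of longitude is shorter than a degree of latitude, so `eps=0.0008` corresponds to an elliptical neighborhood of roughly 89 m north-south by 67 m east-west. One alternative, not used in this analysis, is scikit-learn's `haversine` metric, which takes coordinates in radians and lets `eps` be set in meters. The sketch below assumes a spherical Earth radius of 6,371 km; `dbscan_meters` is a hypothetical helper, and the points are made up.

```python
import math

import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000  # spherical-Earth assumption
METERS_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_M / 360  # ~111,195 m

# Approximate size of the degree-based eps=0.0008 at NYC's latitude
ns_m = round(0.0008 * METERS_PER_DEGREE)
ew_m = round(0.0008 * METERS_PER_DEGREE * math.cos(math.radians(40.73)))
print(ns_m, ew_m)  # → 89 67

def dbscan_meters(lat_long_deg, eps_m, min_samples):
    """Hypothetical helper: DBSCAN on (lat, long) degree pairs with eps in meters."""
    coords_rad = np.radians(lat_long_deg)  # haversine expects [lat, long] in radians
    model = DBSCAN(
        eps=eps_m / EARTH_RADIUS_M,  # arc length in meters -> angle in radians
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",
    )
    return model.fit(coords_rad).labels_

# Toy points: three within ~25 m of each other, one about 1.1 km north
pts = np.array([
    [40.7300, -73.9200], [40.7301, -73.9201], [40.7302, -73.9200], [40.7400, -73.9200],
])
labels = dbscan_meters(pts, eps_m=50, min_samples=3).tolist()
print(labels)  # → [0, 0, 0, -1]
```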

```python
SERIOUS_CLUSTERS = 200  # number of most dangerous clusters to identify
FATALITY_MULTIPLE = 10  # weight assigned to fatalities relative to injuries

serious_clusters = DBSCAN(eps=0.0008, min_samples=50, metric="euclidean").fit(
    serious[["LAT", "LONG"]]  # eps=0.0008 seems to produce useful clusters
)
serious["cluster"] = serious_clusters.labels_  # -1 means does not belong to cluster
cluster_groupby = (
    serious[serious["cluster"] > -1]
    .groupby(by=["cluster"])[["INJURED", "KILLED"]]
    .sum()
)
cluster_groupby["danger"] = cluster_groupby["INJURED"] + (
    cluster_groupby["KILLED"] * FATALITY_MULTIPLE
)
cluster_numbers = cluster_groupby.sort_values(by="danger", ascending=False)
cluster_numbers = cluster_numbers.iloc[:SERIOUS_CLUSTERS].index.to_list()

cluster_mask = serious["cluster"].isin(cluster_numbers)
fields_to_use = ["LAT", "LONG", "DATE", "TIME", "INJURED", "KILLED", "cluster"]
cluster_map_data = serious.loc[cluster_mask, fields_to_use]
cluster_map = folium.Map(location=NYC_MAP_CENTER, zoom_start=10, tiles="OpenStreetMap")
js_callback = (
    "function (row) {"
    "var colors = ['red', 'blue', 'gray', 'orange', 'black'];"
    "var color = colors[row[6] % colors.length];"
    "var marker = L.marker(new L.LatLng(row[0], row[1]));"
    "var icon = L.AwesomeMarkers.icon({"
    "icon: 'fa-exclamation',"
    "iconColor: 'white',"
    "markerColor: color,"
    "prefix: 'fa',"
    "});"
    "marker.setIcon(icon);"
    "var popup = L.popup({maxWidth: '300'});"
    "const display_text = {text:"
    "'CLUSTER: ' + row[6] + '<br>' +"
    "'DATE: ' + row[2] + '<br>' +"
    "'TIME: ' + row[3] + '<br>' +"
    "'PEOPLE INJURED: ' + row[4] + '<br>' +"
    "'PEOPLE KILLED: ' + row[5]};"
    "var poptext = $(`<div id='mytext' style='width: 100.0%; height: 100.0%;'> ${display_text.text}</div>`)[0];"
    "popup.setContent(poptext);"
    "marker.bindPopup(popup);"
    "return marker};"
)
folium.plugins.FastMarkerCluster(data=cluster_map_data, callback=js_callback).add_to(
    cluster_map
)
cluster_map.save(os.path.join(IMG_DIR, "clusters_serious_map.html"))
cluster_map
```
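`FastMarkerCluster` hands each row of `cluster_map_data` to the JavaScript callback as an array in column order. As a result, the callbacks hard-code positions such as `row[6]` that must stay in sync with `fields_to_use`. A minimal sketch of deriving those indices from the field list instead is shown below. The helper `popup_js` is hypothetical and only builds the popup text expression.

```python
def popup_js(fields, labels):
    """Hypothetical helper: build a JS popup-text expression from the column order.

    `fields` must be the same list used to select the map data, so each label
    is paired with the array index the callback will receive for that column.
    """
    parts = [f"'{labels[name]}: ' + row[{fields.index(name)}]" for name in labels]
    return " + '<br>' + ".join(parts)

fields_to_use = ["LAT", "LONG", "DATE", "TIME", "INJURED", "KILLED", "cluster"]
text = popup_js(fields_to_use, {"cluster": "CLUSTER", "DATE": "DATE", "INJURED": "PEOPLE INJURED"})
print(text)
# 'CLUSTER: ' + row[6] + '<br>' + 'DATE: ' + row[2] + '<br>' + 'PEOPLE INJURED: ' + row[4]
```

If a column is later added to or removed from `fields_to_use`, the generated indices follow automatically, whereas the hand-written `row[N]` references would silently point at the wrong column.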

# Clustering Collisions with Pedestrians and Cyclists

- Clustering was performed using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm
- The parameters `eps` (neighborhood radius) and `min_samples` (minimum number of points needed to form a dense region) can be tuned to produce clusters of the desired size
- __Clustering reveals concentrations of collisions with pedestrians and cyclists that were not as prominent when focusing on single points (usually intersections). For example:__
    - Near Elmhurst Avenue and Roosevelt Avenue in Queens
    - Near Broadway and Roosevelt Avenue in Queens
    - Near the Rosedale LIRR station where Francis Lewis Blvd. intersects Conduit Avenue in Queens
    - Near the southeast corner of Prospect Park on Parkside Avenue between Ocean Avenue and Parkside Avenue in Brooklyn
    - Along Bedford Avenue from Fulton Street to Atlantic Avenue in Brooklyn
    - Manhattan Avenue from Greenpoint Avenue to Green Street in Brooklyn
    - Near Marcy Avenue and Broadway in Brooklyn
    - Along 5th Avenue from Flatbush Avenue to President Street in Brooklyn
    - Along 1st Avenue from St. Mark's Place to 14th Street in Manhattan
    - Along East Tremont Avenue from Tremont Park to Prospect Avenue in the Bronx
    - Along Westchester Avenue from Bronx River Avenue to Wheeler Avenue in the Bronx
    - Near Gun Hill Road and Jerome Avenue in the Bronx
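The notebook picks `eps` and `min_samples` by inspecting the resulting maps. A common heuristic for choosing `eps` (not the method used here) is the k-distance curve. First, sort each point's distance to its k-th nearest neighbor, with k equal to `min_samples`. Then look for the "elbow" where distances start rising sharply. A minimal sketch on synthetic coordinates follows; `k_distances` is a hypothetical helper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def k_distances(coords, k):
    """Hypothetical helper: sorted distance from each point to its k-th nearest neighbor.

    Querying with the training points includes each point as its own first
    neighbor, matching DBSCAN's min_samples, which also counts the point itself.
    """
    nn = NearestNeighbors(n_neighbors=k).fit(coords)
    distances, _ = nn.kneighbors(coords)
    return np.sort(distances[:, -1])

# Synthetic data: one dense blob plus scattered points (not real collisions)
rng = np.random.default_rng(0)
dense = rng.normal(loc=[40.73, -73.92], scale=0.0002, size=(50, 2))
sparse = rng.uniform(low=[40.60, -74.05], high=[40.85, -73.75], size=(10, 2))
coords = np.vstack([dense, sparse])

kd = k_distances(coords, k=5)
print(np.quantile(kd, [0.5, 0.9, 0.99]))  # an eps near the jump separates dense from sparse
```

Plotting `kd` against its rank makes the elbow easier to see. A point is a DBSCAN core point exactly when its value in this curve is at most `eps`.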

```python
NON_MOTOR_CLUSTERS = 200  # number of most dangerous clusters to identify
FATALITY_MULTIPLE = 10  # weight assigned to fatalities relative to injuries

non_motor_clusters = DBSCAN(eps=0.0005, min_samples=5, metric="euclidean").fit(
    non_motorist[["LAT", "LONG"]]
)
non_motorist["cluster"] = non_motor_clusters.labels_
# -1 means does not belong to cluster
nm_groupby = non_motorist[non_motorist["cluster"] > -1].groupby(by=["cluster"])
nm_cluster_groupby = nm_groupby[
    ["PEDESTRIAN INJURED", "CYCLIST INJURED", "PEDESTRIAN KILLED", "CYCLIST KILLED"]
].sum()
nm_cluster_groupby["danger"] = (
    nm_cluster_groupby["PEDESTRIAN INJURED"]
    + nm_cluster_groupby["CYCLIST INJURED"]
    + (nm_cluster_groupby["PEDESTRIAN KILLED"] * FATALITY_MULTIPLE)
    + (nm_cluster_groupby["CYCLIST KILLED"] * FATALITY_MULTIPLE)
)

cluster_numbers = nm_cluster_groupby.sort_values(by="danger", ascending=False)
cluster_numbers = cluster_numbers.iloc[:NON_MOTOR_CLUSTERS].index.to_list()

cluster_mask = non_motorist["cluster"].isin(cluster_numbers)
fields_to_use = [
    "LAT",
    "LONG",
    "DATE",
    "TIME",
    "INJURED",
    "PEDESTRIAN INJURED",
    "CYCLIST INJURED",
    "KILLED",
    "PEDESTRIAN KILLED",
    "CYCLIST KILLED",
    "cluster",
]
nm_cluster_map_data = non_motorist.loc[cluster_mask, fields_to_use]
nm_cluster_map = folium.Map(
    location=NYC_MAP_CENTER, zoom_start=10, tiles="OpenStreetMap"
)
js_callback = (
    "function (row) {"
    "var colors = ['red', 'blue', 'gray', 'orange', 'black'];"
    "var color = colors[row[10] % colors.length];"
    "var marker = L.marker(new L.LatLng(row[0], row[1]));"
    "var icon = L.AwesomeMarkers.icon({"
    "icon: 'fa-exclamation',"
    "iconColor: 'white',"
    "markerColor: color,"
    "prefix: 'fa',"
    "});"
    "marker.setIcon(icon);"
    "var popup = L.popup({maxWidth: '300'});"
    "const display_text = {text:"
    "'CLUSTER: ' + row[10] + '<br>' +"
    "'DATE: ' + row[2] + '<br>' +"
    "'TIME: ' + row[3] + '<br>' +"
    "'TOTAL INJURED: ' + row[4] + '<br>' +"
    "'PEDESTRIANS INJURED: ' + row[5] + '<br>' +"
    "'CYCLISTS INJURED: ' + row[6] + '<br>' +"
    "'TOTAL KILLED: ' + row[7] + '<br>' +"
    "'PEDESTRIANS KILLED: ' + row[8] + '<br>' +"
    "'CYCLISTS KILLED: ' + row[9]};"
    "var poptext = $(`<div id='mytext' style='width: 100.0%; height: 100.0%;'> ${display_text.text}</div>`)[0];"
    "popup.setContent(poptext);"
    "marker.bindPopup(popup);"
    "return marker};"
)
folium.plugins.FastMarkerCluster(data=nm_cluster_map_data, callback=js_callback).add_to(
    nm_cluster_map
)
nm_cluster_map.save(os.path.join(IMG_DIR, "clusters_non_motor_map.html"))
nm_cluster_map
```
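To turn the cluster maps into a ranked list of areas like the ones above, a per-cluster summary table can help. The following is a minimal sketch, assuming the column names from the serious-collision cell. `summarize_clusters` is a hypothetical helper, and the centroid is a plain mean of degrees, which is adequate at city scale. The toy data are made up.

```python
import pandas as pd

def summarize_clusters(df, fatality_multiple=10):
    """Hypothetical helper: per-cluster counts, casualties, danger score and centroid.

    Noise points (cluster == -1) are excluded; the result is sorted by danger.
    """
    clustered = df[df["cluster"] > -1]
    summary = clustered.groupby("cluster").agg(
        collisions=("LAT", "size"),
        injured=("INJURED", "sum"),
        killed=("KILLED", "sum"),
        center_lat=("LAT", "mean"),
        center_long=("LONG", "mean"),
    )
    summary["danger"] = summary["injured"] + summary["killed"] * fatality_multiple
    return summary.sort_values("danger", ascending=False)

toy = pd.DataFrame({
    "LAT": [40.700, 40.701, 40.800, 40.801, 40.900],
    "LONG": [-73.90, -73.90, -73.95, -73.95, -73.99],
    "INJURED": [1, 2, 0, 1, 5],
    "KILLED": [0, 0, 1, 0, 0],
    "cluster": [0, 0, 1, 1, -1],
})
summary = summarize_clusters(toy)
print(summary)  # cluster 1 (danger 11) ranks above cluster 0 (danger 3); the noise row is dropped
```

The `center_lat`/`center_long` columns give a starting point for looking up the streets around each cluster.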

# Conclusions

- Serious collisions, including those in which pedestrians or cyclists are injured or killed, are fairly widespread throughout NYC
- Clustering can reveal multi-block areas with elevated injuries and deaths that would be missed by tracking only high-collision intersections


# Follow-Up Questions
- To what extent have relevant government and community groups identified these hotspots?
- What steps are being taken to reduce collisions at these locations? 