# UCLA ITS Data Camp, Day 4
## Capstone Exercise

For today's exercise, we are going to wrap up our work with collision data. Yesterday we summarized different types of collisions within the City of Los Angeles. Today we will show _where_ these collisions take place and put together an initial action plan focused on improving intersections with a high incidence of severe and fatal injuries.

### Course Summary Exercise: Identify Intersections for Targeted Improvements
This exercise is intended to summarize all that we've learned in this course. We will use the same collision data that we've become familiar with to put together a short action plan for the City's [Vision Zero](http://visionzero.lacity.org/) effort to reduce traffic fatalities. One way to start focusing our efforts is to look for intersections that have seen a high number of severe and fatal injuries (known as KSIs, for "killed or seriously injured") and prioritize those for engineering improvements.


##### Step 1: Filter for KSIs within 200ft. of an intersection
Most Vision Zero work focuses on both severe and fatal injuries. We broaden the focus to severe injuries because it is often the characteristics of the victim, not of the collision, that determine whether a severe injury becomes fatal. Including severe injuries also gives us a larger dataset for identifying locations to improve.

In [None]:
import os

path = 'output'

# create the output directory; exist_ok avoids an error if it is already there
os.makedirs(path, exist_ok=True)
print('Output directory ready: %s' % path)

In [2]:
#reading data
import pandas as pd

collisions = pd.read_csv('data/raw/Collisions_20092013_SWITRS.csv')

In [3]:
# filter collisions for KSI, severity 1 or 2
ksi_collisions = collisions.query('COLLISION_SEVERITY == 1 or COLLISION_SEVERITY == 2')

In [4]:
# filter collisions for within 200ft distance from intersection
ksi_collisions_200ft = ksi_collisions.query('DISTANCE <= 200')
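
The two filters above can also be written as a single boolean mask. The sketch below runs on a small made-up table (the `toy` DataFrame and its values are assumptions for illustration, not the SWITRS data) and assumes, as in the cells above, that `DISTANCE` is measured in feet.

```python
import pandas as pd

# Toy stand-in for the collisions table (values are made up for illustration)
toy = pd.DataFrame({
    'CASE_ID': [1, 2, 3, 4, 5],
    'COLLISION_SEVERITY': [1, 2, 3, 2, 4],
    'DISTANCE': [0, 150, 50, 600, 10],
})

# KSI means severity 1 or 2
is_ksi = toy['COLLISION_SEVERITY'].isin([1, 2])
# keep only collisions within 200 ft of an intersection
near_intersection = toy['DISTANCE'] <= 200

toy_ksi_200ft = toy[is_ksi & near_intersection]
print(toy_ksi_200ft['CASE_ID'].tolist())
```

Combining the masks with `&` keeps only the rows that satisfy both conditions, which is the same result as chaining the two `query` calls.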

In [5]:
# TODO: Convert it to a GeoDataFrame like this ->
# https://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html

# get the la city intersections shapefile from here ->
# https://geohub.lacity.org/datasets/0372aa1fb42a4e29adb9caadcfb210bb_9

In [6]:
# importing packages for geodataframe and plotting
import geopandas
import matplotlib.pyplot as plt

In [7]:
geo_ksi_col_200 = geopandas.GeoDataFrame(
    ksi_collisions_200ft, geometry=geopandas.points_from_xy(ksi_collisions_200ft.POINT_X, ksi_collisions_200ft.POINT_Y))

In [8]:
LA_intersections = geopandas.read_file('data/Geodata/Intersections/Intersections.shp')

In [55]:
geo_ksi_col_200.head()

Unnamed: 0,X,Y,OBJECTID,CASE_ID,ACCIDENT_YEAR,PROCDATE,JURIS,COLLISION_DATE,COLLISION_TIME,OFFICER_ID,...,MONTH_,CITY,COUNTY,STATE,POINT_X,POINT_Y,Match_addr,m_primaryrd,m_secondrd,geometry
140,-118.579866,34.257172,4141,4128155,2009,2009-10-21T00:00:00.000Z,1942,2009-02-22T00:00:00.000Z,2105,35951,...,2,LOS ANGELES,LOS ANGELES,CA,-118.579866,34.257172,"MASON AVE & DEVONSHIRE ST, LOS ANGELES, CA, 91311",MASON AVE,DEVONSHIRE ST,POINT (-118.579865886753 34.2571716308594)
213,-118.342997,34.032528,4214,4129112,2009,2009-08-21T00:00:00.000Z,1942,2009-01-25T00:00:00.000Z,2315,36630,...,1,LOS ANGELES,LOS ANGELES,CA,-118.342997,34.032528,"W ADAMS BLVD & HILLCREST DR, LOS ANGELES, CA, ...",W ADAMS BLVD,HILLCREST DR,POINT (-118.342997417804 34.032527923584)
290,-118.56231,34.23547,4291,4129876,2009,2009-10-24T00:00:00.000Z,1942,2009-02-09T00:00:00.000Z,1115,34265,...,2,LOS ANGELES,LOS ANGELES,CA,-118.56231,34.23547,"CORBIN AVE & NORDHOFF ST, LOS ANGELES, CA, 91324",CORBIN AVE,NORDHOFF ST,POINT (-118.562310000508 34.2354698181152)
320,-118.60606,34.25721,4321,4129911,2009,2010-01-05T00:00:00.000Z,1942,2009-02-08T00:00:00.000Z,2155,32774,...,2,LOS ANGELES,LOS ANGELES,CA,-118.60606,34.25721,"TOPANGA CANYON BLVD & DEVONSHIRE ST, LOS ANGEL...",TOPANGA CANYON BLVD,DEVONSHIRE ST,POINT (-118.606060000494 34.257209777832)
395,-118.27392,33.970451,4396,4131294,2009,2009-09-23T00:00:00.000Z,1942,2009-02-10T00:00:00.000Z,1250,38633,...,2,LOS ANGELES,LOS ANGELES,CA,-118.27392,33.970451,"S MAIN ST & E 76TH PL, LOS ANGELES, CA, 90003",S MAIN ST,E 76TH PL,POINT (-118.273920000329 33.9704513549805)


In [37]:
LA_intersections.head()

Unnamed: 0,OBJECTID,ASSETID,CL_NODE_ID,X,Y,LAT,LON,TYPE,CRTN_DT,LST_MODF_D,USER_ID,FROM_ST,TO_ST,TOOLTIP,ZIP_CODE,NLA_URL,geometry
0,3001,98966,52918,6374563.0,1895188.0,34.198479,-118.61877,,,,,VOSE ST,D/E,VOSE ST at D/E,91307.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.6187821198503 34.19848360530864)
1,3002,98967,52920,6374037.0,1895146.0,34.198354,-118.620506,,,,,VICKY AVE,D/E,VICKY AVE at D/E,91307.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.6205180126272 34.19835872675433)
2,3003,98968,52924,6364493.0,1894846.0,34.197364,-118.652064,,,,,ST EDENS CIR,D/E,ST EDENS CIR at D/E,91307.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.6520762660379 34.19736878155231)
3,3004,98969,52932,6386270.0,1895673.0,34.200002,-118.580061,,,,,ENADIA WAY,D/E,ENADIA WAY at D/E,91306.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.5800733953557 34.2000069245188)
4,3005,98970,52945,6407912.0,1896310.0,34.202075,-118.5085,,,,,CANTLAY ST,D/E,CANTLAY ST at D/E,91406.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.5085122173155 34.20207949708298)


##### Step 2: Create a function to get the nearest point
Let's start with the following idea. We have one point (A), and we want to find the point (B) among several points in a GeoSeries such that the distance between A and B is shorter than the distance between A and any other point in the series.

The simplest way to calculate the distance between two points given as longitude and latitude is the Haversine formula. However, since we are working with a GeoSeries, we can take advantage of some of the built-in functionality to calculate the distances between a point and many other points. Once we have all these distances, we want to find the point B with the minimum distance. Let's create a function to do just this.
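
For reference, here is a minimal sketch of the Haversine approach using only the standard library. The function names, the Earth radius constant, and the dictionary of candidate points are assumptions made for illustration; the coordinates are taken from the tables earlier in this notebook. This is a brute-force alternative to the GeoPandas route, not the intended solution to the exercise.

```python
import math

EARTH_RADIUS_FT = 20_902_231  # mean Earth radius (about 6,371 km) expressed in feet

def haversine_ft(lon1, lat1, lon2, lat2):
    """Great-circle distance in feet between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_FT * math.asin(math.sqrt(a))

def nearest_by_haversine(point, candidates):
    """Return the key of the candidate closest to `point`.

    `point` is a (lon, lat) tuple; `candidates` maps an ID to a (lon, lat) tuple.
    """
    return min(candidates, key=lambda k: haversine_ft(*point, *candidates[k]))

# One collision point and two intersections from the tables above
collision = (-118.579866, 34.257172)
candidates = {
    40913: (-118.5798337463974, 34.25725501379736),
    40759: (-118.6060657950638, 34.25723236313444),
}
nearest_id = nearest_by_haversine(collision, candidates)
distance_ft = haversine_ft(*collision, *candidates[nearest_id])
print(nearest_id, round(distance_ft))
```

Checking every candidate like this is easy to reason about but slow on the full intersections layer, which is one reason to prefer the built-in geometry operations.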

Hint: Take a look at [this GIS Stack Exchange question](https://gis.stackexchange.com/questions/222315/geopandas-find-nearest-point-in-other-dataframe) for a good place to start.


In [None]:
# TODO: Complete function to find the nearest point
def nearest(point, geoseries):
    pass  # replace with your implementation

##### Step 3: Using the new function, assign each KSI to the nearest intersection
We now have
1. KSI Collisions (GeoSeries)
2. LA City Intersections (GeoSeries)
3. A function that takes a point and finds the nearest point in a GeoSeries

With these pieces, our goal is to run our function on every KSI collision point so that each one is tagged with the ID of its nearest intersection. To do this, we will use the GeoPandas **apply** method, which calls our function once for each row.
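
The row-wise `apply` pattern can be sketched on plain pandas data first. Everything in this example is hypothetical: the toy coordinates, the `nearest_id` helper, and the use of squared planar distance (which assumes projected x/y coordinates rather than longitude and latitude).

```python
import pandas as pd

# Hypothetical toy data: three intersections and two collision points
intersections = pd.DataFrame({
    'CL_NODE_ID': [101, 102, 103],
    'x': [0.0, 10.0, 20.0],
    'y': [0.0, 0.0, 0.0],
})
collisions_toy = pd.DataFrame({'x': [1.0, 18.0], 'y': [1.0, -2.0]})

def nearest_id(x, y, candidates):
    """Return the CL_NODE_ID of the candidate row closest to (x, y)."""
    squared_dist = (candidates['x'] - x) ** 2 + (candidates['y'] - y) ** 2
    return candidates.loc[squared_dist.idxmin(), 'CL_NODE_ID']

# axis=1 passes one row at a time to the lambda
collisions_toy['NearestIntID'] = collisions_toy.apply(
    lambda row: nearest_id(row['x'], row['y'], intersections), axis=1
)
print(collisions_toy)
```

With a GeoDataFrame the shape of the call is the same; only the helper changes, taking `row.geometry` instead of separate x and y values.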

In [None]:
# TODO: Apply 'nearest' function to GeoSeries and save the ID
#       of the nearest intersection to a field called 'NearestIntID' 
ksi_int_collisions = None  # replace with your solution

In [32]:
# work on an explicit copy of the first 10 rows so the new column can be added safely
geo_ksi_col_200_head10 = geo_ksi_col_200[:10].copy()

In [70]:
from shapely.ops import nearest_points
from shapely.geometry import MultiPoint
# unary union of the LA intersection geometries
# LA_int_pts =LA_intersections.geometry.unary_union

# converting intersection geometries to multipoint
LA_int_pts_list = LA_intersections.geometry.tolist()
LA_int_pts = MultiPoint(LA_int_pts_list)

def nearest(point, LA_int_pts):
    # find the nearest intersection point and return the corresponding CL_NODE_ID value
    nearest_int = nearest_points(point, LA_int_pts)[1]
    print(nearest_int)
    # .iloc[0] replaces Series.get_values(), which was removed in pandas 1.0
    int_id = LA_intersections[LA_intersections['geometry'] == nearest_int]['CL_NODE_ID'].iloc[0]
    print(int_id)
    return int_id

geo_ksi_col_200_head10['CL_NODE_ID'] = geo_ksi_col_200_head10.apply(lambda row: nearest(row.geometry, LA_int_pts), axis=1)

POINT (-118.5798337463974 34.25725501379736)
40913
POINT (-118.342879194126 34.03255454251313)
7250
POINT (-118.5623199472765 34.2354829190999)
41357
POINT (-118.6060657950638 34.25723236313444)
40759
POINT (-118.273921843124 33.97046133865431)
1516
POINT (-118.2995433311462 33.83155043381955)
44567
POINT (-118.328246567904 34.0256125102868)
4915
POINT (-118.2739539695735 33.99333807912488)
1156
POINT (-118.2721328626087 34.04637071415952)
8644
POINT (-118.2968053631783 33.72700079093505)
9979


In [71]:
geo_ksi_col_200_head10

Unnamed: 0,X,Y,OBJECTID,CASE_ID,ACCIDENT_YEAR,PROCDATE,JURIS,COLLISION_DATE,COLLISION_TIME,OFFICER_ID,...,COUNTY,STATE,POINT_X,POINT_Y,Match_addr,m_primaryrd,m_secondrd,geometry,NearestIntID,CL_NODE_ID
140,-118.579866,34.257172,4141,4128155,2009,2009-10-21T00:00:00.000Z,1942,2009-02-22T00:00:00.000Z,2105,35951,...,LOS ANGELES,CA,-118.579866,34.257172,"MASON AVE & DEVONSHIRE ST, LOS ANGELES, CA, 91311",MASON AVE,DEVONSHIRE ST,POINT (-118.579865886753 34.2571716308594),40913,40913
213,-118.342997,34.032528,4214,4129112,2009,2009-08-21T00:00:00.000Z,1942,2009-01-25T00:00:00.000Z,2315,36630,...,LOS ANGELES,CA,-118.342997,34.032528,"W ADAMS BLVD & HILLCREST DR, LOS ANGELES, CA, ...",W ADAMS BLVD,HILLCREST DR,POINT (-118.342997417804 34.032527923584),7250,7250
290,-118.56231,34.23547,4291,4129876,2009,2009-10-24T00:00:00.000Z,1942,2009-02-09T00:00:00.000Z,1115,34265,...,LOS ANGELES,CA,-118.56231,34.23547,"CORBIN AVE & NORDHOFF ST, LOS ANGELES, CA, 91324",CORBIN AVE,NORDHOFF ST,POINT (-118.562310000508 34.2354698181152),41357,41357
320,-118.60606,34.25721,4321,4129911,2009,2010-01-05T00:00:00.000Z,1942,2009-02-08T00:00:00.000Z,2155,32774,...,LOS ANGELES,CA,-118.60606,34.25721,"TOPANGA CANYON BLVD & DEVONSHIRE ST, LOS ANGEL...",TOPANGA CANYON BLVD,DEVONSHIRE ST,POINT (-118.606060000494 34.257209777832),40759,40759
395,-118.27392,33.970451,4396,4131294,2009,2009-09-23T00:00:00.000Z,1942,2009-02-10T00:00:00.000Z,1250,38633,...,LOS ANGELES,CA,-118.27392,33.970451,"S MAIN ST & E 76TH PL, LOS ANGELES, CA, 90003",S MAIN ST,E 76TH PL,POINT (-118.273920000329 33.9704513549805),1516,1516
494,-118.299562,33.831982,4495,4131947,2009,2009-10-17T00:00:00.000Z,1942,2009-02-06T00:00:00.000Z,1915,36028,...,LOS ANGELES,CA,-118.299562,33.831982,"NORMANDIE AVE & CARSON ST, LOS ANGELES, CA, 90501",NORMANDIE AVE,CARSON ST,POINT (-118.299562122747 33.8319816589355),44567,44567
499,-118.328229,34.025597,4500,4131956,2009,2009-10-17T00:00:00.000Z,1942,2009-02-02T00:00:00.000Z,1405,38476,...,LOS ANGELES,CA,-118.328229,34.025597,"W JEFFERSON BLVD & 10TH AVE, LOS ANGELES, CA, ...",W JEFFERSON BLVD,10TH AVE,POINT (-118.328229203629 34.0255966186523),4915,4915
631,-118.27396,33.99332,4632,4132378,2009,2009-08-22T00:00:00.000Z,1942,2009-01-16T00:00:00.000Z,635,37724,...,LOS ANGELES,CA,-118.27396,33.99332,"S MAIN ST & E 54TH ST, LOS ANGELES, CA, 90037",S MAIN ST,E 54TH ST,POINT (-118.273959999722 33.9933204650879),1156,1156
652,-118.272115,34.046337,4653,4132449,2009,2009-10-01T00:00:00.000Z,1942,2009-02-12T00:00:00.000Z,755,37952,...,LOS ANGELES,CA,-118.272115,34.046337,"W 11TH ST & ALBANY ST, LOS ANGELES, CA, 90015",W 11TH ST,ALBANY ST,POINT (-118.272114704957 34.0463371276855),8644,8644
675,-118.29682,33.72699,4676,4132647,2009,2009-09-15T00:00:00.000Z,1942,2009-01-15T00:00:00.000Z,1520,35203,...,LOS ANGELES,CA,-118.29682,33.72699,"W 19TH ST & S MEYLER ST, LOS ANGELES, CA, 90731",W 19TH ST,S MEYLER ST,POINT (-118.296819999586 33.7269897460938),9979,9979


##### Step 3: Count KSIs per intersection 
We now have an additional column in our KSI collisions GeoSeries with the ID for the closest intersection. Using the skills you learned yesterday to summarize by an attribute, let's find the count of KSIs for each of the intersection IDs. 

In [75]:
# TODO: Group by intersection ID
intersection_ksi_ct = pd.pivot_table(geo_ksi_col_200_head10, index='CL_NODE_ID', aggfunc='size').to_frame()

In [81]:
intersection_ksi_ct['count'] = intersection_ksi_ct[0]
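
The pivot-table approach above works. An equivalent and somewhat more direct way to count rows per ID is `groupby(...).size()`. The sketch below uses a hypothetical `tagged` table; the IDs are borrowed from the output above, but the repeated rows are made up so that the counts differ.

```python
import pandas as pd

# Hypothetical tagged collisions: one row per KSI, labelled with its nearest intersection
tagged = pd.DataFrame({'CL_NODE_ID': [40913, 7250, 40913, 1516, 40913, 7250]})

ksi_counts = (
    tagged.groupby('CL_NODE_ID')
    .size()                          # number of rows in each group
    .reset_index(name='count')       # back to a DataFrame with a named count column
    .sort_values('count', ascending=False)
)
print(ksi_counts)
```

Naming the column in `reset_index` avoids the intermediate column called `0`, and sorting puts the intersections with the most KSIs first.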

##### Step 4: Join back to Intersections GeoSeries
Let's do an inner join with our original LA City Intersections GeoSeries to get the spatial information for the top intersections we are interested in.

In [82]:
pd.merge(LA_intersections,intersection_ksi_ct, on='CL_NODE_ID', how='inner')

Unnamed: 0,OBJECTID,ASSETID,CL_NODE_ID,X,Y,LAT,LON,TYPE,CRTN_DT,LST_MODF_D,USER_ID,FROM_ST,TO_ST,TOOLTIP,ZIP_CODE,NLA_URL,geometry,0,count
0,7323,103288,4915,6462209.0,1831864.0,34.025608,-118.328234,,,,,JEFFERSON BLVD,10TH AVE,JEFFERSON BLVD at 10TH AVE,90018.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.328246567904 34.0256125102868),1,1
1,10633,106597,1156,6478628.0,1820069.0,33.993333,-118.273942,,,,,54TH ST,MAIN ST,54TH ST at MAIN ST,90037.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.2739539695735 33.99333807912488),1,1
2,10681,106645,1516,6478615.0,1811744.0,33.970456,-118.27391,,,,,MAIN ST,76TH PL,MAIN ST at 76TH PL,90003.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.273921843124 33.97046133865431),1,1
3,2751,98717,44567,6470696.0,1761213.0,33.831546,-118.299531,,,,,NORMANDIE AVE,CARSON ST,NORMANDIE AVE at CARSON ST,90501.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.2995433311462 33.83155043381955),1,1
4,14585,110549,8644,6479232.0,1839367.0,34.046366,-118.272121,,,,,ALBANY ST,11TH ST,ALBANY ST at 11TH ST,90015.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.2721328626087 34.04637071415952),1,1
5,23727,119688,40759,6378537.0,1916544.0,34.257227,-118.606054,,,,,DEVONSHIRE ST,TOPANGA CANYON BLVD,DEVONSHIRE ST at TOPANGA CANYON BLVD,91311.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.6060657950638 34.25723236313444),1,1
6,25101,121061,7250,6457784.0,1834405.0,34.03255,-118.342867,,,,,ADAMS BLVD,HILLCREST DR,ADAMS BLVD at HILLCREST DR,90016.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.342879194126 34.03255454251313),1,1
7,37819,133774,41357,6391711.0,1908552.0,34.235478,-118.562308,,,,,NORDHOFF ST,NORDHOFF PL,NORDHOFF ST at NORDHOFF PL,91324.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.5623199472765 34.2354829190999),1,1
8,39137,135092,9979,6471415.0,1723162.0,33.726996,-118.296793,,,,,MEYLER ST,19TH ST,MEYLER ST at 19TH ST,90731.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.2968053631783 33.72700079093505),1,1
9,48353,144306,40913,6386463.0,1916506.0,34.25725,-118.579822,,,,,DEVONSHIRE ST,MASON AVE,DEVONSHIRE ST at MASON AVE,91311.0,navigatela/reports/intersection_report.cfm?pk=...,POINT (-118.5798337463974 34.25725501379736),1,1
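
The join above returns one row per matched intersection. To narrow that down to the top intersections, the joined table can be ranked by its count column. This is a minimal sketch on made-up data; the street names, counts, and the choice of keeping two rows are all assumptions.

```python
import pandas as pd

# Hypothetical intersections and KSI counts
intersections_toy = pd.DataFrame({
    'CL_NODE_ID': [1, 2, 3, 4],
    'TOOLTIP': ['A ST at B AVE', 'C ST at D AVE', 'E ST at F AVE', 'G ST at H AVE'],
})
counts_toy = pd.DataFrame({'CL_NODE_ID': [2, 3, 4], 'count': [5, 1, 3]})

# inner join keeps only intersections that have at least one KSI
joined = pd.merge(intersections_toy, counts_toy, on='CL_NODE_ID', how='inner')

# keep the two intersections with the highest counts
top2 = joined.nlargest(2, 'count')
print(top2)
```

Intersection 1 has no KSIs in the toy counts, so the inner join drops it before the ranking step.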


##### Step 5: Map our Target Intersections
Using the same method as earlier for plotting points on a Leaflet map with the `folium` package, plot all of our top intersections as markers. Customize the pop-up so that clicking a marker shows the number of KSI collisions at that location.

### Bonus Challenge Exercise
Let's make the number of KSIs at each intersection visible at a glance. Instead of displaying a marker at each intersection, draw a circle whose **_area_** is proportional to the total number of collisions at that intersection. You can choose the scaling ratio to suit the zoom level of the map you want to display.
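
Because area grows with the square of the radius, the radius has to scale with the square root of the count. A minimal sketch of that calculation follows; the function name `radius_for_count` and the default `area_per_ksi` value are assumptions to tune for your map.

```python
import math

def radius_for_count(count, area_per_ksi=2000.0):
    """Radius of a circle whose area equals area_per_ksi * count."""
    return math.sqrt(area_per_ksi * count / math.pi)

for n in (1, 4, 9):
    print(n, round(radius_for_count(n), 1))
```

The result could then be passed as the radius of a circle layer, for example folium's `Circle`, whose radius is in meters. Scaling the radius directly by the count would instead make an intersection with four KSIs look sixteen times larger than one with a single KSI.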