# UCLA ITS Data Camp, Day 3
## Data Summarizing & Plotting

For today's exercise, we will continue re-investigating some of the findings in the [Los Angeles Vision Zero Safety Study](https://view.joomag.com/vision-zero-safety-study/0065798001485405769?short) to see if anything has changed. Beyond summarizing data, we are going to explore turning tables into charts for beautiful visual presentation.

In [2]:
import os

path = 'data/processed'

try:
    os.mkdir(path)
except OSError:
    print('creation of directory %s failed' % path)

else:
    print('successfully created directory %s' % path)

successfully created directory data/processed
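Note that `os.mkdir` raises `OSError` when the folder already exists (for example, when the cell is re-run) or when the parent `data` folder is missing. A minimal alternative sketch uses the standard library's `os.makedirs` with `exist_ok=True`. The temporary base folder below is only an assumption so that the example does not touch your project directory:

```python
import os
import tempfile

# Hypothetical base folder for illustration; in the notebook you would
# simply use path = 'data/processed'.
base = tempfile.mkdtemp()
path = os.path.join(base, 'data', 'processed')

# Creates intermediate folders as needed and does not fail if the folder exists.
os.makedirs(path, exist_ok=True)
os.makedirs(path, exist_ok=True)  # safe to run a second time

print(os.path.isdir(path))
```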


In [3]:
# TODO: Import libraries (pandas)
import pandas as pd

In [4]:
%pwd

'C:\\Users\\Workstation User\\Documents\\GitHub\\ucla-its-data-camp-2019\\Day3\\day3-prj'

In [72]:
# TODO: Read in all the collision/victim tables
collisions = pd.read_csv('data/raw/Collisions_20092013_SWITRS.csv')
victims = pd.read_csv('data/raw/Victim_Tables__Collisions_20092013_SWITRS.csv')

In [14]:
# TODO: Print out the column names
print('Collisions columns:\n',collisions.columns)
print('Victims columns:\n',victims.columns)

Collisions columns:
 Index(['X', 'Y', 'OBJECTID', 'CASE_ID', 'ACCIDENT_YEAR', 'PROCDATE', 'JURIS',
       'COLLISION_DATE', 'COLLISION_TIME', 'OFFICER_ID', 'REPORTING_DISTRICT',
       'DAY_OF_WEEK', 'SHIFT', 'POPULATION', 'CNTY_CITY_LOC', 'SPECIAL_COND',
       'BEAT_TYPE', 'CHP_BEAT_TYPE', 'CITY_DIVISION_LAPD', 'CHP_BEAT_CLASS',
       'BEATNUMB', 'PRIMARY_RD', 'SECONDARY_RD', 'DISTANCE', 'DIRECTION',
       'INTERSECTION', 'WEATHER_1', 'WEATHER_2', 'STATE_HWY_IND',
       'CALTRANS_COUNTY', 'CALTRANS_DISTRICT', 'STATE_ROUTE', 'ROUTE_SUFFIX',
       'POSTMILE_PREFIX', 'POSTMILE', 'LOCATION_TYPE', 'RAMP_INTERSECTION',
       'SIDE_OF_HWY', 'TOW_AWAY', 'COLLISION_SEVERITY', 'NUMBER_KILLED',
       'NUMBER_INJURED', 'PARTY_COUNT', 'PRIMARY_COLL_FACTOR', 'PCF_CODE_VIOL',
       'PCF_VIOL_CATEGORY', 'PCF_VIOLATION', 'PCF_VIOL_SUBSECTION',
       'HIT_AND_RUN', 'TYPE_OF_COLLISION', 'MVIW', 'PED_ACTION',
       'ROAD_SURFACE', 'ROAD_COND_1', 'ROAD_COND_2', 'LIGHTING',
       'CONTROL_DEVICE

##### Prep: Replace Victim Role
For the first task, we are going to create a chart that shows the number of fatalities by year and mode. The mode is stored in the `VICTIM_ROLE` column, but only as a numeric code. Let's replace that code (1-6) with the actual mode type. Consult the [SWITRS Codebook](https://peteraldhous.com/Data/ca_traffic/SWITRS_codebook.pdf) (look for Victim Role) and replace the numeric values with the string representation.

To do this task, consult the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html) for the pandas `replace()` method. _Hint: we want to make sure we are only replacing the values in the `VICTIM_ROLE` column!_  

In [None]:
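# TODO: Replace the VICTIM_ROLE codes with the actual values, e.g. Victim Role 1 = Driver, etc.
# Assign the result back to the column; calling replace(..., inplace=True) on a
# selected column is not guaranteed to update `victims` in newer pandas versions.
victims['VICTIM_ROLE'] = victims['VICTIM_ROLE'].replace(
    [1, 2, 3, 4, 5, 6],
    ['Driver', 'Passenger', 'Pedestrian',
     'Bicyclist', 'Other', 'Non-injured party'])
victims.head()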
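As a side note, `replace()` and `map()` behave differently when a code has no entry in the lookup. The sketch below uses a small made-up table (the code `9` is hypothetical and only there to show an unmapped value): `replace()` leaves unmapped codes untouched, while `map()` turns them into `NaN`.

```python
import pandas as pd

# Toy data for illustration; 9 is a made-up code with no entry in the lookup
toy = pd.DataFrame({'VICTIM_ROLE': [1, 3, 2, 9]})
role_names = {1: 'Driver', 2: 'Passenger', 3: 'Pedestrian'}

replaced = toy['VICTIM_ROLE'].replace(role_names)  # unmapped codes are kept as-is
mapped = toy['VICTIM_ROLE'].map(role_names)        # unmapped codes become NaN

print(replaced.tolist())
print(mapped.isna().sum())
```

Checking `value_counts()` on the column after the replacement is a quick way to spot any codes that were left behind.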

### Exercise 1: Summarize Fatalities by Mode & Year, and Plot
Yesterday we were able to count the number of fatalities by year. Today we are more interested in the question: are we making more progress on reducing fatalities for some modes than for others?

Pages 10 & 11 in the report feature a line chart showing the number of yearly fatalities by mode through 2016 (included below for your reference).

![fatality_trendline](fatality_trendline.png)


##### Step 1: Create a Pivot-Table (like Excel)
Now that we have a general picture of how we are doing against our overall goal of eliminating traffic fatalities, let's drill down to see if there have been any trend changes when grouped by mode. Let's get the data into a format that lets us easily recreate the chart above. 

Yesterday we showed you one way to group data, `groupby()` - here we are going to show you another way to group data using the pandas `pivot_table()` method. Take a look at the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html#pandas.pivot_table) for some guidance. In our case, we want our columns to be the `VICTIM_ROLE` and our `aggfunc='size'`. Give it a shot figuring out the other two arguments.

In [34]:
# TODO: 1) Filter to victims with fatal injuries, 
#       2) Pivot the table: Accident_Year as Index, Victim Role as columns, (take a stab at the other two args)
#       3) Let's drop the 'Other' mode column using the df.drop() method
#       You can try chaining the operations together!
fatal_by_yearmode = pd.pivot_table(victims[victims['VICTIM_DEGREE_OF_INJURY'] == 1],
                       index='ACCIDENT_YEAR', columns='VICTIM_ROLE', aggfunc='size')

fatal_by_yearmode.drop('Other', axis=1, inplace=True)

# Examine the resulting table
fatal_by_yearmode

| ACCIDENT_YEAR | Bicyclist | Driver | Passenger | Pedestrian |
|---|---|---|---|---|
| 2003 | 10.0 | 88.0 | 52.0 | 92.0 |
| 2004 | 10.0 | 85.0 | 37.0 | 87.0 |
| 2005 | 5.0 | 84.0 | 30.0 | 83.0 |
| 2006 | 11.0 | 91.0 | 38.0 | 87.0 |
| 2007 | 8.0 | 92.0 | 31.0 | 79.0 |
| 2008 | 9.0 | 91.0 | 33.0 | 91.0 |
| 2009 | 5.0 | 83.0 | 23.0 | 70.0 |
| 2010 | 10.0 | 61.0 | 24.0 | 88.0 |
| 2011 | 7.0 | 54.0 | 20.0 | 76.0 |
| 2012 | 9.0 | 73.0 | 27.0 | 91.0 |
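To connect this back to yesterday's `groupby()`, here is a minimal sketch on made-up toy data showing that the same table can be built either way. The `fill_value=0` argument is an addition not used in the cell above; counts typically show up as floats when some year/mode combination is missing, and filling with 0 keeps them as integers.

```python
import pandas as pd

# Toy victim records (made up); note there is no Pedestrian row for 2010
toy = pd.DataFrame({
    'CASE_ID': [1, 2, 3, 4],
    'ACCIDENT_YEAR': [2009, 2009, 2009, 2010],
    'VICTIM_ROLE': ['Driver', 'Driver', 'Pedestrian', 'Driver'],
})

# Approach 1: pivot_table, counting rows per year/role combination
by_pivot = pd.pivot_table(toy, index='ACCIDENT_YEAR', columns='VICTIM_ROLE',
                          aggfunc='size', fill_value=0)

# Approach 2: groupby + size, then move VICTIM_ROLE from the index to the columns
by_group = (toy.groupby(['ACCIDENT_YEAR', 'VICTIM_ROLE'])
               .size()
               .unstack(fill_value=0))

print(by_pivot)
print(by_group)
```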


In [46]:
fatal_by_yearmode.columns

Index(['Bicyclist', 'Driver', 'Passenger', 'Pedestrian'], dtype='object', name='VICTIM_ROLE')

##### Step 2: Create HTML-based interactive plot
Let's make a basic line chart from this data using the `plotly` package. There are many interactive charting libraries to choose from (see [this overview](https://mode.com/blog/python-data-visualization-libraries)). I recommend checking out [plotly](https://plot.ly/pandas/getting-started/), which is incredibly easy to use. Make sure you install the package (plotly Anaconda install [here](https://anaconda.org/plotly/plotly)) before using it, and consult the documentation for constructing a chart.

_Hint: When we are using the `pivot_table` output, the x-axis is going to be `fatal_by_yearmode.index`_

In [47]:
# Import package
import plotly.graph_objects as go

# Create the initial figure
fig_fatals = go.Figure()

# TODO: Let's follow the example from here -> https://plot.ly/python/line-charts/
#       We are going to want to 'add_trace' for each of our victim modes
roles = ['Bicyclist', 'Driver', 'Passenger', 'Pedestrian']

for role in roles:
    fig_fatals.add_trace(go.Scatter(x=fatal_by_yearmode.index , y=fatal_by_yearmode[role],
                    mode='lines+markers',
                    name=role))

# TODO: Edit the layout to add a title, xaxis_title, yaxis_title
fig_fatals.update_layout(title='Fatalities by Mode',
                         xaxis_title='Year',
                         yaxis_title='Fatalities')
# Show the final plot
fig_fatals.show()

##### Step 3: Export table
Now that we have our data table, let's export it as a CSV to our `data/processed` folder.

In [49]:
# TODO: Write out the table to CSV - don't drop the index this time
fatal_by_yearmode.to_csv('data/processed/fatal_by_yearmode.csv')
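To see why keeping the index matters here, consider this small round-trip sketch on a two-row excerpt of the table. The temporary output path is an assumption for illustration. Since the years live in the index, writing the index is what preserves them, and `index_col` restores them on read.

```python
import os
import tempfile
import pandas as pd

# Two-row excerpt of the pivot table, with the years as the index
table = pd.DataFrame({'Bicyclist': [5, 10], 'Driver': [83, 61]},
                     index=pd.Index([2009, 2010], name='ACCIDENT_YEAR'))

# Hypothetical output location for this example
out_file = os.path.join(tempfile.mkdtemp(), 'fatal_by_yearmode.csv')

table.to_csv(out_file)  # index=True is the default, so the years are written

# Read the file back, telling pandas which column holds the index
reloaded = pd.read_csv(out_file, index_col='ACCIDENT_YEAR')
print(reloaded)
```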

##### Bonus Step
You will notice that the chart above includes 'motorcycles' as a mode, yet we don't see that mode in our table. Within SWITRS, motorcycle is defined as a type of vehicle. We can get this distinction by joining to the party table and accessing the Vehicle Type attribute. Go ahead and give it a shot (note: you will need to read in the party table, which we have not yet done for this exercise).

In [71]:
parties = pd.read_csv('data/raw/Party_Tables__Collisions_20092013_SWITRS.csv')

In [51]:
parties.columns

Index(['OBJECTID', 'CASE_ID', 'PARTY_NUMBER', 'PARTY_TYPE', 'AT_FAULT',
       'PARTY_SEX', 'PARTY_AGE', 'PARTY_SOBRIETY', 'PARTY_DRUG_PHYSICAL',
       'DIR_OF_TRAVEL', 'PARTY_SAFETY_EQUIP_1', 'PARTY_SAFETY_EQUIP_2',
       'FINAN_RESPONS', 'SP_INFO_1', 'SP_INFO_2', 'SP_INFO_3',
       'OAF_VIOLATION_CODE', 'OAF_VIOL_CAT', 'OAF_VIOL_SECTION',
       'OAF_VIOLATION_SUFFIX', 'OAF_1', 'OAF_2', 'PARTY_NUMBER_KILLED',
       'PARTY_NUMBER_INJURED', 'MOVE_PRE_ACC', 'VEHICLE_YEAR', 'VEHICLE_MAKE',
       'STWD_VEHICLE_TYPE', 'CHP_VEH_TYPE_TOWING', 'CHP_VEH_TYPE_TOWED',
       'RACE', 'INATTENTION', 'SPECIAL_INFO_F', 'SPECIAL_INFO_G',
       'ACCIDENT_YEAR'],
      dtype='object')

In [153]:
victims_parties = pd.merge(victims, parties, on=['CASE_ID','PARTY_NUMBER', 'ACCIDENT_YEAR'])
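The default inner merge silently drops victims that have no matching party record. One way to check a join like this is sketched below on made-up toy tables; `how='left'`, `validate`, and `indicator` are additions that the cell above does not use. `validate='many_to_one'` raises an error if a party key is duplicated, and `indicator=True` adds a `_merge` column that flags unmatched rows.

```python
import pandas as pd

# Toy tables (made up): the last victim points at a party that does not exist
victims_toy = pd.DataFrame({'CASE_ID': [100, 100, 101],
                            'PARTY_NUMBER': [1, 1, 2],
                            'VICTIM_ROLE': ['Driver', 'Passenger', 'Driver']})
parties_toy = pd.DataFrame({'CASE_ID': [100, 101],
                            'PARTY_NUMBER': [1, 1],
                            'STWD_VEHICLE_TYPE': ['A', 'C']})

checked = pd.merge(victims_toy, parties_toy,
                   on=['CASE_ID', 'PARTY_NUMBER'],
                   how='left',              # keep every victim row
                   validate='many_to_one',  # each party key must be unique
                   indicator=True)          # adds the _merge column

print(checked['_merge'].value_counts())
```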

In [161]:
fatal_by_yeartype = pd.pivot_table(victims_parties[victims_parties['VICTIM_DEGREE_OF_INJURY'] == 1],
                                       index='ACCIDENT_YEAR', columns='STWD_VEHICLE_TYPE', aggfunc='size')

In [162]:
fatal_by_yeartype['C']

ACCIDENT_YEAR
2003    15.0
2004    16.0
2005    16.0
2006    16.0
2007    23.0
2008    29.0
2009    24.0
2010    14.0
2011    22.0
2012    34.0
2013    28.0
2014    33.0
Name: C, dtype: float64

In [166]:
fatal_by_yeartype['Motorcycle'] = fatal_by_yeartype['C']

In [167]:
fatal_by_yearmode_motor = pd.merge(fatal_by_yearmode,fatal_by_yeartype['Motorcycle'], on='ACCIDENT_YEAR')

In [168]:
fatal_by_yearmode_motor

| ACCIDENT_YEAR | Bicyclist | Driver | Passenger | Pedestrian | Motorcycle |
|---|---|---|---|---|---|
| 2003 | 10.0 | 88.0 | 52.0 | 92.0 | 15.0 |
| 2004 | 10.0 | 85.0 | 37.0 | 87.0 | 16.0 |
| 2005 | 5.0 | 84.0 | 30.0 | 83.0 | 16.0 |
| 2006 | 11.0 | 91.0 | 38.0 | 87.0 | 16.0 |
| 2007 | 8.0 | 92.0 | 31.0 | 79.0 | 23.0 |
| 2008 | 9.0 | 91.0 | 33.0 | 91.0 | 29.0 |
| 2009 | 5.0 | 83.0 | 23.0 | 70.0 | 24.0 |
| 2010 | 10.0 | 61.0 | 24.0 | 88.0 | 14.0 |
| 2011 | 7.0 | 54.0 | 20.0 | 76.0 | 22.0 |
| 2012 | 9.0 | 73.0 | 27.0 | 91.0 | 34.0 |
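Since both tables are indexed by `ACCIDENT_YEAR`, another way to attach the motorcycle column is `DataFrame.join`, which aligns on the index. This is a sketch on a two-year excerpt, not a replacement for the approach above; the values in the `'A'` column are made up, while the `'C'` values are the 2009 and 2010 counts shown earlier.

```python
import pandas as pd

years = pd.Index([2009, 2010], name='ACCIDENT_YEAR')

# Excerpt of the by-mode table
by_mode = pd.DataFrame({'Bicyclist': [5, 10], 'Driver': [83, 61]}, index=years)

# Excerpt of the by-vehicle-type table ('A' values are made up for illustration)
by_type = pd.DataFrame({'A': [1, 2], 'C': [24, 14]}, index=years)

# Rename the 'C' series and join it on the shared ACCIDENT_YEAR index
with_motor = by_mode.join(by_type['C'].rename('Motorcycle'))

print(with_motor)
```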


### Exercise 2: Map Fatal Collisions
Are fatal collisions concentrated in some areas of the City? Let's find out by plotting all the fatal collisions on a map. 

Rather than creating static image plots, we will create interactive webmaps that we can view in our notebooks here or export for anyone else with a browser. Building these rich webmaps requires JavaScript, the programming language that powers interactivity within browsers. Within JavaScript, Leaflet is the most common library for creating webmaps in browsers. Fortunately for us, the `folium` Python package provides a bridge to Leaflet so that we never have to leave Python. For some example uses of folium see the [QuickStart Jupyter Notebook](https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/Quickstart.ipynb).

Note: Before using folium we will need to install the package, since it is not part of the standard Anaconda distribution. To do so, follow the Anaconda install instructions on the [folium GitHub page](https://github.com/python-visualization/folium).

In [141]:
# TODO: Import folium (after installing)
import folium

##### Step 1: Create the initial map object

In [142]:
# Initialize the map
f_map = folium.Map(
    
    # Initial center of the map - You can change this!
    location=[34.0689, -118.4452],
    
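    # Basemap. See the Quickstart for more basemap ideas.
    # 'CartoDB positron' is a built-in option; older folium releases also accepted 'Stamen Toner'.
    tiles='CartoDB positron',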
    
    # Initial zoom level of the map
    zoom_start=13
)

# Display the map in our notebook
f_map

##### Step 2: Filtering: Fatal Collisions, removing invalid coordinate pairs
One of the problems that you will run into when mapping these points is that not every collision has a valid coordinate pair. In some cases you will see null values, and in other cases you will see coordinates of `[0,0]`.

In [146]:
# TODO: Keep only fatal collisions
#       Get rid of any rows with NaN or 0 values for POINT_X or POINT_Y
collisions_victims = pd.merge(collisions,victims,on='CASE_ID')
fatal_collisions = (collisions_victims
                   .query('VICTIM_DEGREE_OF_INJURY == 1')
                   .query('POINT_X != 0 and POINT_Y != 0')
                   .dropna(subset=['POINT_X','POINT_Y']))


len(fatal_collisions)

888
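Keep in mind that the merged table has one row per victim, so a collision with more than one fatality appears more than once. The sketch below uses made-up toy rows to show the same filtering, followed by an optional `drop_duplicates` step (an addition, not part of the cell above) that keeps one row per `CASE_ID`, which gives one marker per collision.

```python
import pandas as pd

# Toy merged table (made up): case 1 has two fatal victims,
# case 2 has [0, 0] coordinates, case 3 is missing POINT_X, case 4 is not fatal
toy = pd.DataFrame({
    'CASE_ID': [1, 1, 2, 3, 4],
    'VICTIM_DEGREE_OF_INJURY': [1, 1, 1, 1, 2],
    'POINT_X': [-118.44, -118.44, 0.0, None, -118.25],
    'POINT_Y': [34.07, 34.07, 0.0, 34.05, 34.05],
})

fatal = (toy
         .query('VICTIM_DEGREE_OF_INJURY == 1')
         .dropna(subset=['POINT_X', 'POINT_Y'])
         .query('POINT_X != 0 and POINT_Y != 0'))

# Optional: one row per collision instead of one row per victim
one_per_collision = fatal.drop_duplicates(subset='CASE_ID')

print(len(fatal), len(one_per_collision))
```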

##### Step 3: Add additional objects (such as markers) to the map

In [147]:
for ix, row in fatal_collisions.iterrows():
    # TODO: For each row, create a marker object and add it to the map
    # Also, set the popup value to be the date of the collision
    coord = [row.POINT_Y, row.POINT_X]
    collision_date = row['COLLISION_DATE']
    folium.Marker(coord, popup=collision_date).add_to(f_map)

The cell below will display your map. If you are having difficulty loading all of the objects into your browser, reduce the number of points by filtering for a specific year.

In [148]:
# Display the object-rich map in our notebook
f_map

### Bonus Challenge Exercise
Make the map a bit more visually appealing by changing the icons or basemap.