# Exploratory Data Analysis for the Mapbox Open Data Challenge

## Challenge

The [challenge](https://splashthat.com/sites/view/opendata-contest.splashthat.com?cmsPage=1#sfid-199769342) is hosted by [Mapbox](https://www.mapbox.com).

Gaining a better understanding of the causes of crashes can help transportation planners and engineers reduce their frequency and severity. Traffic volume at a given location is often treated as a risk factor. But the contributing causes of the crashes themselves (weather, driver distraction, and so on) also matter, and a high frequency of non-severe or non-fatal crashes can indicate an increased probability of severe or fatal crashes.

* Create a visualization of crash locations around Bloomington, IN by type (vehicle, bicycle, pedestrian).
    * Include accident cause and traffic volume where available.
    * Make sure to normalize traffic volume by converting average daily traffic (ADT) to
      million entering vehicles (MEV).
    * Separate crash types to provide insight on whether patterns in car crashes involving
      bicycles and pedestrians hold true for car-on-car crashes.
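
The ADT-to-MEV normalization in the list above can be sketched as follows. This assumes the commonly used intersection crash-rate convention, in which crashes are divided by the number of vehicles entering over the study period (ADT × 365 × years) and scaled to one million; the challenge text does not spell out the formula. The function name `crashes_per_mev` and the example numbers are hypothetical, not values from the datasets.

```python
def crashes_per_mev(crash_count, adt, years=1.0):
    """Crash rate per million entering vehicles (MEV).

    Assumes the usual convention: rate = crashes * 1,000,000 / (ADT * 365 * years).
    """
    if adt <= 0 or years <= 0:
        raise ValueError("adt and years must be positive")
    # Total vehicles entering the location over the study period
    entering_vehicles = adt * 365 * years
    return crash_count * 1_000_000 / entering_vehicles


# Hypothetical location: 12 crashes over 3 years with an ADT of 15,000
rate = crashes_per_mev(crash_count=12, adt=15000, years=3)
print(f"{rate:.2f} crashes per MEV")
```

Normalizing this way lets a busy intersection and a quiet one be compared on exposure rather than on raw crash counts.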
      
## Data

* [Crash data](https://data.bloomington.in.gov/dataset/117733fb-31cb-480a-8b30-fbf425a690cd/resource/8673744e-53f2-42d1-9d05-4e412bd55c94/download/monroe-county-crash-data2003-to-2015.cs) 
* [Bicycle & Pedestrian Counts](https://data.bloomington.in.gov/dataset/117733fb-31cb-480a-8b30-fbf425a690cd/resource/2b2a4280-964c-4845-b397-3105e227a1ae/download/pedestrian-and-bicyclist-counts.csv)
* [Traffic Counts](https://data.bloomington.in.gov/dataset/117733fb-31cb-480a-8b30-fbf425a690cd/resource/d5ba88f9-5798-46cd-888a-189eb59f7b46/download/traffic-counts2013-2015.csv)

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

plt.style.use('fivethirtyeight')

In [6]:
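# Local path to the downloaded crash CSV; adjust for your own machine
crashPath = '/Users/joshisaacson-work/Desktop/DesktopRoot/Projects/MapboxChallenge/Datasets/crash_data_2003-2015.csv'

# Load the crash records and preview the first five rows
crashDf = pd.read_csv(crashPath)
crashDf.head()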

| | Master Record Number | Year | Month | Day | Weekend? | Hour | Collision Type | Injury Type | Primary Factor | Reported_Location | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 902363382 | 2015 | 1 | 5 | Weekday | 0.0 | 2-Car | No injury/unknown | OTHER (DRIVER) - EXPLAIN IN NARRATIVE | 1ST & FESS | 39.159207 | -86.525874 |
| 1 | 902364268 | 2015 | 1 | 6 | Weekday | 1500.0 | 2-Car | No injury/unknown | FOLLOWING TOO CLOSELY | 2ND & COLLEGE | 39.16144 | -86.534848 |
| 2 | 902364412 | 2015 | 1 | 6 | Weekend | 2300.0 | 2-Car | Non-incapacitating | DISREGARD SIGNAL/REG SIGN | BASSWOOD & BLOOMFIELD | 39.14978 | -86.56889 |
| 3 | 902364551 | 2015 | 1 | 7 | Weekend | 900.0 | 2-Car | Non-incapacitating | FAILURE TO YIELD RIGHT OF WAY | GATES & JACOBS | 39.165655 | -86.575956 |
| 4 | 902364615 | 2015 | 1 | 7 | Weekend | 1100.0 | 2-Car | No injury/unknown | FAILURE TO YIELD RIGHT OF WAY | W 3RD | 39.164848 | -86.579625 |


In [7]:
crashDf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53943 entries, 0 to 53942
Data columns (total 12 columns):
Master Record Number    53943 non-null int64
Year                    53943 non-null int64
Month                   53943 non-null int64
Day                     53943 non-null int64
Weekend?                53875 non-null object
Hour                    53718 non-null float64
Collision Type          53937 non-null object
Injury Type             53943 non-null object
Primary Factor          52822 non-null object
Reported_Location       53908 non-null object
Latitude                53913 non-null float64
Longitude               53913 non-null float64
dtypes: float64(3), int64(4), object(5)
memory usage: 4.9+ MB
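
The non-null counts above imply some gaps: `Primary Factor` has 53943 - 52822 = 1121 missing values, `Hour` has 225, and `Latitude`/`Longitude` have 30 each. A minimal sketch of how those gaps might be tallied per column is shown below. It runs on a small made-up frame (`sample`, hypothetical data) so that it is self-contained; the same two lines should work on `crashDf`.

```python
import pandas as pd

# Tiny made-up frame standing in for crashDf (hypothetical values)
sample = pd.DataFrame({
    "Hour": [0.0, 1500.0, None, 900.0],
    "Primary Factor": ["FOLLOWING TOO CLOSELY", None, None,
                       "FAILURE TO YIELD RIGHT OF WAY"],
    "Latitude": [39.159207, 39.16144, 39.14978, None],
})

# Count missing values per column, largest first
missing = sample.isna().sum().sort_values(ascending=False)

# Express each count as a percentage of all rows
summary = pd.DataFrame({
    "missing": missing,
    "percent": (missing / len(sample) * 100).round(1),
})
print(summary)
```

Rows missing `Latitude` or `Longitude` cannot be placed on a map, so a tally like this is a reasonable first check before any plotting.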
