#Worcester Crash Reports

##This project is based on the vehicular crash reports from the Worcester County. Using the mapping tool from the MASSDOT [crash portal](http://services.massdot.state.ma.us/crashportal/CrashMapPage.aspx?Mode=Mapping). I tried to grab all the data I could get, but since you could only get 8000 results at once, I had to do it multiple times to get the entire city of Worcester's data.

In [114]:
import pandas as pd #For Data Cleaning and Manipulation
import matplotlib as plt #Data analysis by plotting
import glob #Used as a way to open all csv files that were collected
from datetime import time #used to clean and better express the Date and Time features
#import time

##Joining all csv files into one dataframe
###Since I had to download multiple csv files, I needed to combine all of them into one data. I could of open them one by one and assigning them to a dataframe, but I decided to look around for a solution into reading all the csv files and then concatinating it into a bigger csv file. I then decided to remove all the duplicates if there was any, so that we  wouldn't have repeated data.

In [130]:
worcester_crashreports = glob.glob("*.csv") 
df = pd.concat((pd.read_csv(f, header = 0, parse_dates=[['Crash Date', 'Crash Time']]) for f in worcester_crashreports))
df_worcrash = df.drop_duplicates()
#df_worcrash.to_csv("Worces.csv")

  
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=True'.


  


In [131]:
df_worcrash.head(5) #Just looking at what our data looks like.

Unnamed: 0.1,City/Town,Crash Date_Crash Time,Crash Number,Crash Severity,Landmark,Near Intersection Roadway,Police Agency,Roadway,Unnamed: 0
0,WORCESTER,3/24/2015 11:47 AM,4051393,Non-fatal injury,,,Local police,SALISBURY STREET / WESTWOOD DRIVE,
1,WORCESTER,3/24/2015 2:25 PM,4051388,Property damage only (none injured),,,Local police,CAMELOT DRIVE,
2,WORCESTER,2/26/2015 11:10 AM,4054884,Property damage only (none injured),,,Local police,CHESTER STREET,
3,WORCESTER,11/1/2011 8:25 PM,2790347,Property damage only (none injured),,,State police,Rte 190 S,
4,WORCESTER,2/26/2015 8:29 PM,4054867,Property damage only (none injured),,,Local police,MOUNTAIN STREET WEST / BROOKS STREET,


###Cleaning the data a bit more, I decided to drop the Landmark and the Near Intersection or Roadway because it was mostly not available

In [117]:
df_worcrash = df_worcrash[['Crash Number', 'Crash Date', 'Crash Time', 'City/Town', 'Crash Severity','Police Agency','Roadway']]

###Since I did this using the crash portal tool where I had to use the "drawing" kind of tool to select the 
areas I wanted, there were times when I accidentally grabbed some data from other towns. So I tried to find
if I had any other town and only selected WORCESTER.

In [118]:
df_worcrash['City/Town'].unique()

array(['WORCESTER', 'WEST BOYLSTON', 'HOLDEN', 'GRAFTON', nan,
       'SHREWSBURY', 'HUDSON', 'LEICESTER', 'MILLBURY', 'AUBURN',
       'LEOMINSTER'], dtype=object)

In [119]:
df_worcrash = df_worcrash[df_worcrash['City/Town'] == 'WORCESTER'] #Only selecting Worcester from the files

In [120]:
df_worcrash.head(5) #This is what the data looks like now.

Unnamed: 0,Crash Number,Crash Date,Crash Time,City/Town,Crash Severity,Police Agency,Roadway
0,4051393,3/24/2015,11:47 AM,WORCESTER,Non-fatal injury,Local police,SALISBURY STREET / WESTWOOD DRIVE
1,4051388,3/24/2015,2:25 PM,WORCESTER,Property damage only (none injured),Local police,CAMELOT DRIVE
2,4054884,2/26/2015,11:10 AM,WORCESTER,Property damage only (none injured),Local police,CHESTER STREET
3,2790347,11/1/2011,8:25 PM,WORCESTER,Property damage only (none injured),State police,Rte 190 S
4,4054867,2/26/2015,8:29 PM,WORCESTER,Property damage only (none injured),Local police,MOUNTAIN STREET WEST / BROOKS STREET


###Again, since its all gonna be from worcester, I am just going to drop the City/Town feature.

In [121]:
df_worcrash = df_worcrash[['Crash Number', 'Crash Date', 'Crash Time', 'Crash Severity','Police Agency','Roadway']]

In [122]:
df_worcrash.head(5)

Unnamed: 0,Crash Number,Crash Date,Crash Time,Crash Severity,Police Agency,Roadway
0,4051393,3/24/2015,11:47 AM,Non-fatal injury,Local police,SALISBURY STREET / WESTWOOD DRIVE
1,4051388,3/24/2015,2:25 PM,Property damage only (none injured),Local police,CAMELOT DRIVE
2,4054884,2/26/2015,11:10 AM,Property damage only (none injured),Local police,CHESTER STREET
3,2790347,11/1/2011,8:25 PM,Property damage only (none injured),State police,Rte 190 S
4,4054867,2/26/2015,8:29 PM,Property damage only (none injured),Local police,MOUNTAIN STREET WEST / BROOKS STREET


###Now, the data looks pretty good but it seems like it would be better if the crash time is in a 24 hour format
since that would be better to graph.

In [123]:
 #df_worcrash['Crash Date'] = pd.to_datetime(df_worcrash['Crash Date'])

In [124]:
#df_worcrash['Crash Date'].head(5)

0   2015-03-24
1   2015-03-24
2   2015-02-26
3   2011-11-01
4   2015-02-26
Name: Crash Date, dtype: datetime64[ns]

###The pd.to_datetim gave me an error which seems to have found out that there is a time written as 00:98 AM and so I fixed it by changing the specific row. I could have looked at the whole dataset but it seems like there was only one so I just hard coded it.

In [125]:
df_worcrash.loc[df_worcrash['Crash Time'] == '00:98 AM', 'Crash Time' ] = '01:38 AM'

In [126]:
df_worcrash.iloc[[8]]

Unnamed: 0,Crash Number,Crash Date,Crash Time,Crash Severity,Police Agency,Roadway
8,2791027,2010-07-01,01:38 AM,Not Reported,Local police,BLUEBELL ROAD


In [129]:
pd.to_datetime(df_worcrash['Crash Time'])

0       2018-08-27 11:47:00
1       2018-08-27 14:25:00
2       2018-08-27 11:10:00
3       2018-08-27 20:25:00
4       2018-08-27 20:29:00
5       2018-08-27 15:58:00
6       2018-08-27 13:00:00
7       2018-08-27 22:00:00
8       2018-08-27 01:38:00
9       2018-08-27 10:30:00
10      2018-08-27 23:28:00
11      2018-08-27 10:39:00
12      2018-08-27 06:02:00
13      2018-08-27 11:00:00
14      2018-08-27 11:15:00
15      2018-08-27 23:30:00
16      2018-08-27 16:30:00
17      2018-08-27 00:19:00
18      2018-08-27 14:14:00
19      2018-08-27 15:04:00
20      2018-08-27 08:21:00
21      2018-08-27 08:20:00
22      2018-08-27 11:00:00
23      2018-08-27 10:00:00
24      2018-08-27 11:36:00
25      2018-08-27 23:50:00
26      2018-08-27 12:10:00
27      2018-08-27 13:15:00
28      2018-08-27 09:20:00
29      2018-08-27 18:10:00
                ...        
76007   2018-08-27 11:10:00
76008   2018-08-27 21:13:00
76009   2018-08-27 18:05:00
76010   2018-08-27 16:24:00
76011   2018-08-27 0

In [128]:
df_worcrash.head(5)

Unnamed: 0,Crash Number,Crash Date,Crash Time,Crash Severity,Police Agency,Roadway
0,4051393,2015-03-24,2018-08-27 11:47:00,Non-fatal injury,Local police,SALISBURY STREET / WESTWOOD DRIVE
1,4051388,2015-03-24,2018-08-27 14:25:00,Property damage only (none injured),Local police,CAMELOT DRIVE
2,4054884,2015-02-26,2018-08-27 11:10:00,Property damage only (none injured),Local police,CHESTER STREET
3,2790347,2011-11-01,2018-08-27 20:25:00,Property damage only (none injured),State police,Rte 190 S
4,4054867,2015-02-26,2018-08-27 20:29:00,Property damage only (none injured),Local police,MOUNTAIN STREET WEST / BROOKS STREET
