# Toronto Vehicle Serious Collision Data Preparation


---


**Author**: Group 1

**Creation Date**: November 14, 2021

**Revision Date**: November 14, 2021


---


**Data Source**: Toronto Police Service: Public Safety Data Portal

**Data**: https://data.torontopolice.on.ca/datasets/ksi/

**Data Dictionary**: https://torontops.maps.arcgis.com/sharing/rest/content/items/c0b17f1888544078bf650f3b8b04d35d/data 

**Data Licence**: See below for full data licence details.

---



The data captures information about serious vehicle collisions in the City of Toronto, Ontario.

This notebook will prepare the data for analysis.



Note: To run this code, create a shortcut to the shared drive in your Google Drive and set the file_path variable to the path of that shortcut.

# Importing Data Set and Other Configurations

In [1]:
# Importing libraries
import pandas as pd
import numpy as np
from datetime import date, datetime

In [2]:
# Mounting drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
# Main paths and file names
file_path = '/content/drive/MyDrive/Data Files/'  # Requires the Google Drive shortcut described in the note above

# Input file
file_in = 'KSI.csv'

# Output files
file_incidents = 'KSI_Incidents.csv'
file_people = 'KSI_People.csv'

In [4]:
# Functions for cleaning the data during importation

# Converting time format. For instance, '852' will be converted to '8:52'
def time_cleanup(x):
    if len(x) < 3:
        h = '0'
    else:
        h = x[:-2]
    # Pad the minutes so that a value such as '5' becomes '0:05' rather than '0:5'
    t = x[-2:].zfill(2)
    return h + ':' + t

# Cleaning Date attribute. For instance, '2006/03/11 05:00:00+00' will be converted to '2006/03/11' only
def date_cleanup(x):
    return x[:10]
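
The two cleanup helpers above work value by value through the converters argument. As a minimal sketch of the same idea applied to a whole column after loading, the pandas string accessors can be used instead. The sample values below are hypothetical, and this variant zero-pads the hour ('08:52'), which differs slightly from the '8:52' format the notebook produces.

```python
import pandas as pd

# Hypothetical sample of raw values as they might appear in the CSV
raw = pd.DataFrame({
    'TIME': ['852', '1530', '5', '40'],
    'DATE': ['2006/03/11 05:00:00+00'] * 4,
})

# Left-pad every time to four digits, then split into hours and minutes
padded = raw['TIME'].str.zfill(4)
raw['TIME_CLEAN'] = padded.str[:2] + ':' + padded.str[2:]

# Keep only the calendar date (the first 10 characters)
raw['DATE_CLEAN'] = raw['DATE'].str[:10]

print(raw[['TIME_CLEAN', 'DATE_CLEAN']])
```

Padding first removes the need for a special case for times recorded with fewer than three digits.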

In [5]:
# Read all information into a dataframe
file_content = pd.read_csv(file_path + file_in,
                           nrows = None,
                           index_col='INDEX_',
                           converters = {'TIME': time_cleanup,
                                         'DATE': date_cleanup,
                                         }
                           )
file_content.head()

INDEX_,X,Y,ACCNUM,YEAR,DATE,TIME,HOUR,STREET1,STREET2,OFFSET,ROAD_CLASS,DISTRICT,WARDNUM,DIVISION,LATITUDE,LONGITUDE,LOCCOORD,ACCLOC,TRAFFCTL,VISIBILITY,LIGHT,RDSFCOND,ACCLASS,IMPACTYPE,INVTYPE,INVAGE,INJURY,FATAL_NO,INITDIR,VEHTYPE,MANOEUVER,DRIVACT,DRIVCOND,PEDTYPE,PEDACT,PEDCOND,CYCLISTYPE,CYCACT,CYCCOND,PEDESTRIAN,CYCLIST,AUTOMOBILE,MOTORCYCLE,TRUCK,TRSN_CITY_VEH,EMERG_VEH,PASSENGER,SPEEDING,AG_DRIV,REDLIGHT,ALCOHOL,DISABILITY,POLICE_DIVISION,HOOD_ID,NEIGHBOURHOOD,ObjectId
3387730,-8844611.0,5412414.0,892658,2006,2006/03/11,8:52,8,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Turning Left,Failed to Yield Right of Way,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),1
3387731,-8844611.0,5412414.0,892658,2006,2006/03/11,8:52,8,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Pedestrian,65 to 69,Fatal,<Null>,North,Other,<Null>,<Null>,<Null>,Vehicle turns left while ped crosses with ROW ...,Crossing with right of way,Unknown,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),2
3388101,-8816480.0,5434843.0,892810,2006,2006/03/11,9:15,9,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Motorcycle Driver,45 to 49,Fatal,<Null>,East,Motorcycle,Turning Right,Disobeyed Traffic Control,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),3
3388102,-8816480.0,5434843.0,892810,2006,2006/03/11,9:15,9,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Going Ahead,Driving Properly,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),4
3387793,-8822759.0,5424516.0,892682,2006,2006/03/12,2:40,2,EGLINTON AVE E,COMMONWEALTH AVE,<Null>,Major Arterial,Scarborough,2120,41,43.734945,-79.25619,Mid-Block,<Null>,No Control,Clear,Dark,Dry,Fatal,Pedestrian Collisions,Driver,25 to 29,,<Null>,West,"Automobile, Station Wagon",Going Ahead,Other,"Ability Impaired, Alcohol",<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,D41,138,Eglinton East (138),5


In [6]:
# Looking at the obtained data
file_content.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 16860 entries, 3387730 to 81509748
Data columns (total 56 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   X                16860 non-null  float64
 1   Y                16860 non-null  float64
 2   ACCNUM           16860 non-null  int64  
 3   YEAR             16860 non-null  int64  
 4   DATE             16860 non-null  object 
 5   TIME             16860 non-null  object 
 6   HOUR             16860 non-null  int64  
 7   STREET1          16860 non-null  object 
 8   STREET2          16860 non-null  object 
 9   OFFSET           16860 non-null  object 
 10  ROAD_CLASS       16860 non-null  object 
 11  DISTRICT         16860 non-null  object 
 12  WARDNUM          16860 non-null  object 
 13  DIVISION         16860 non-null  object 
 14  LATITUDE         16860 non-null  float64
 15  LONGITUDE        16860 non-null  float64
 16  LOCCOORD         16860 non-null  object 
 17  ACC
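
The info() output reports no missing values because the file encodes missing entries as the literal string '<Null>'. A minimal sketch, assuming one wanted pandas to treat those entries as missing at load time, is to pass na_values to read_csv. The tiny CSV below is hypothetical. Note that the rest of this notebook compares against the '<Null>' string directly (for example in the INVTYPE flags), so this option is shown only as an alternative and is not applied here.

```python
import io
import pandas as pd

# Hypothetical three-column extract with the same '<Null>' convention
sample_csv = io.StringIO(
    "INDEX_,STREET1,OFFSET,INVTYPE\n"
    "1,BLOOR ST W,<Null>,Driver\n"
    "2,BLOOR ST W,<Null>,<Null>\n"
)

# na_values tells read_csv which extra strings should be read as NaN
df = pd.read_csv(sample_csv, index_col='INDEX_', na_values=['<Null>'])

# Missing values are now visible to isna() and info()
print(df.isna().sum())
```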

# DateTime Conversion and Related Information

In [7]:
# The DATE and TIME columns must be combined into a single datetime value.
# The new column 'DATE_TIME' will be the 4th column, close to the original values so they can be compared.

# Creating a function to support the date conversion
def find_date_time(x):
    result = x['DATE'] + ' ' + x['TIME']
    return pd.to_datetime(result)

# Applying the date conversion function
file_content.insert(3, 'DATE_TIME',file_content.apply(find_date_time, axis=1))
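
The row-wise apply above calls pd.to_datetime once per row. As a sketch of an alternative, the two text columns can be concatenated as whole Series and parsed in a single call. The sample frame is hypothetical, and the zero-padding of TIME is a precaution added here so that the explicit format string matches every value.

```python
import pandas as pd

# Hypothetical sample with DATE and TIME as produced by the cleanup functions
sample = pd.DataFrame({
    'DATE': ['2006/03/11', '2006/03/12'],
    'TIME': ['8:52', '2:40'],
})

# Pad '8:52' to '08:52' so every value has the same width
time_padded = sample['TIME'].str.zfill(5)

# Parse the whole column at once with an explicit format
sample['DATE_TIME'] = pd.to_datetime(
    sample['DATE'] + ' ' + time_padded,
    format='%Y/%m/%d %H:%M',
)

print(sample)
```

On a frame of this size the difference is small, but the single call scales better and fails fast if any value does not match the expected format.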

In [8]:
# Function to determine season
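def season_of_date(my_date):

  # Timestamps are reduced to plain dates so they compare cleanly with date() objects
  if isinstance(my_date, datetime):
    my_date = my_date.date()

  Y = my_date.year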

  seasons = [('Winter', (date(Y,  1,  1),  date(Y,  3, 20))),
             ('Spring', (date(Y,  3, 21),  date(Y,  6, 20))),
             ('Summer', (date(Y,  6, 21),  date(Y,  9, 22))),
             ('Autumn', (date(Y,  9, 23),  date(Y, 12, 20))),
             ('Winter', (date(Y, 12, 21),  date(Y, 12, 31)))]

  return next(season for season, (start, end) in seasons if start <= my_date <= end)
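
The season boundaries used above can also be written as comparisons on (month, day) pairs, which avoids building date objects for every call. The function name season_from_month_day is hypothetical. The cut-off dates are the same fixed ones as in season_of_date, not astronomically exact dates for each year.

```python
def season_from_month_day(month, day):
    """Return the season for a (month, day) pair using fixed boundaries."""
    md = (month, day)
    # Winter wraps around the year end: Dec 21 to Mar 20
    if md < (3, 21) or md >= (12, 21):
        return 'Winter'
    if md < (6, 21):      # Mar 21 to Jun 20
        return 'Spring'
    if md < (9, 23):      # Jun 21 to Sep 22
        return 'Summer'
    return 'Autumn'       # Sep 23 to Dec 20


# Boundary dates taken from the table in season_of_date
for month, day in [(3, 20), (3, 21), (6, 21), (9, 23), (12, 21)]:
    print(month, day, season_from_month_day(month, day))
```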

In [9]:
# Including other date-related attributes, to be used in further analysis
file_content.insert(4, 'SEASON', file_content["DATE_TIME"].map(season_of_date))   # Season
file_content.insert(6, 'MONTH', file_content["DATE_TIME"].dt.month)               # Month Number
file_content.insert(7, 'MONTH_NAME', file_content["DATE_TIME"].dt.month_name())   # Month Name
file_content.insert(9, 'DAY_NAME',file_content["DATE_TIME"].dt.day_name() )       # Day of week
file_content['HOUR'] = file_content["DATE_TIME"].dt.strftime('%H')                # Hour in a more appropriate format
file_content.insert(12, 'HOUR_INTERVAL', file_content["DATE_TIME"].dt.strftime('%H') + \
    ':00 to ' + file_content["DATE_TIME"].dt.strftime('%H') + ':59')              # Hour interval
file_content.head()

INDEX_,X,Y,ACCNUM,DATE_TIME,SEASON,YEAR,MONTH,MONTH_NAME,DATE,DAY_NAME,TIME,HOUR,HOUR_INTERVAL,STREET1,STREET2,OFFSET,ROAD_CLASS,DISTRICT,WARDNUM,DIVISION,LATITUDE,LONGITUDE,LOCCOORD,ACCLOC,TRAFFCTL,VISIBILITY,LIGHT,RDSFCOND,ACCLASS,IMPACTYPE,INVTYPE,INVAGE,INJURY,FATAL_NO,INITDIR,VEHTYPE,MANOEUVER,DRIVACT,DRIVCOND,PEDTYPE,PEDACT,PEDCOND,CYCLISTYPE,CYCACT,CYCCOND,PEDESTRIAN,CYCLIST,AUTOMOBILE,MOTORCYCLE,TRUCK,TRSN_CITY_VEH,EMERG_VEH,PASSENGER,SPEEDING,AG_DRIV,REDLIGHT,ALCOHOL,DISABILITY,POLICE_DIVISION,HOOD_ID,NEIGHBOURHOOD,ObjectId
3387730,-8844611.0,5412414.0,892658,2006-03-11 08:52:00,Winter,2006,3,March,2006/03/11,Saturday,8:52,8,08:00 to 08:59,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Turning Left,Failed to Yield Right of Way,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),1
3387731,-8844611.0,5412414.0,892658,2006-03-11 08:52:00,Winter,2006,3,March,2006/03/11,Saturday,8:52,8,08:00 to 08:59,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Pedestrian,65 to 69,Fatal,<Null>,North,Other,<Null>,<Null>,<Null>,Vehicle turns left while ped crosses with ROW ...,Crossing with right of way,Unknown,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),2
3388101,-8816480.0,5434843.0,892810,2006-03-11 09:15:00,Winter,2006,3,March,2006/03/11,Saturday,9:15,9,09:00 to 09:59,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Motorcycle Driver,45 to 49,Fatal,<Null>,East,Motorcycle,Turning Right,Disobeyed Traffic Control,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),3
3388102,-8816480.0,5434843.0,892810,2006-03-11 09:15:00,Winter,2006,3,March,2006/03/11,Saturday,9:15,9,09:00 to 09:59,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Going Ahead,Driving Properly,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),4
3387793,-8822759.0,5424516.0,892682,2006-03-12 02:40:00,Winter,2006,3,March,2006/03/12,Sunday,2:40,2,02:00 to 02:59,EGLINTON AVE E,COMMONWEALTH AVE,<Null>,Major Arterial,Scarborough,2120,41,43.734945,-79.25619,Mid-Block,<Null>,No Control,Clear,Dark,Dry,Fatal,Pedestrian Collisions,Driver,25 to 29,,<Null>,West,"Automobile, Station Wagon",Going Ahead,Other,"Ability Impaired, Alcohol",<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,D41,138,Eglinton East (138),5


# People Involvement

In [10]:
# The attribute INVTYPE represents the main involvement types for each person
file_content.INVTYPE.unique()

array(['Driver', 'Pedestrian', 'Motorcycle Driver', 'Passenger',
       'Vehicle Owner', 'Other Property Owner', 'Other', 'Cyclist',
       'Truck Driver', 'Motorcycle Passenger', '<Null>',
       'Driver - Not Hit', 'In-Line Skater', 'Moped Driver', 'Wheelchair',
       'Pedestrian - Not Hit', 'Trailer Owner', 'Witness',
       'Cyclist Passenger'], dtype=object)

In [11]:
# Objective: count the number of Drivers, Pedestrians, Passengers, Cyclists and Others,
# creating some new numerical attributes

# Flagging each row by involvement type: driver, pedestrian, passenger, cyclist or <Null>
file_content['DRIVERS_COUNT'] = file_content["INVTYPE"].apply(lambda val : 1 if 'Driver' in val else 0)          #Include Driver / Motorcycle Driver / Truck Driver / Moped Driver / Driver - Not Hit
file_content['PEDESTRIAN_COUNT'] = file_content["INVTYPE"].apply(lambda val : 1 if 'Pedestrian' in val else 0)   #Include Pedestrian / Pedestrian - Not Hit 
file_content['PASSENGER_COUNT'] = file_content["INVTYPE"].apply(lambda val : 1 if 'Passenger' in val 
                                                                and val != 'Cyclist Passenger' else 0)           #Include Passenger / Motorcycle Passenger
file_content['CYCLIST_COUNT'] = file_content["INVTYPE"].apply(lambda val : 1 if 'Cyclist' in val  else 0)        #Include Cyclist  / Cyclist Passenger
file_content['NULL_INVTYPE_COUNT'] = file_content["INVTYPE"].apply(lambda val : 1 if val == '<Null>' else 0)     #Include <Null>
file_content['OTHER_INVTYPE_COUNT'] = file_content.apply(lambda val : 1 if val['DRIVERS_COUNT'] == 0 
                                                         and val['PEDESTRIAN_COUNT'] == 0
                                                         and val['PASSENGER_COUNT'] == 0
                                                         and val['CYCLIST_COUNT'] == 0
                                                         and val['NULL_INVTYPE_COUNT'] == 0 else 0, axis=1)      #Other types

file_content.head()

INDEX_,X,Y,ACCNUM,DATE_TIME,SEASON,YEAR,MONTH,MONTH_NAME,DATE,DAY_NAME,TIME,HOUR,HOUR_INTERVAL,STREET1,STREET2,OFFSET,ROAD_CLASS,DISTRICT,WARDNUM,DIVISION,LATITUDE,LONGITUDE,LOCCOORD,ACCLOC,TRAFFCTL,VISIBILITY,LIGHT,RDSFCOND,ACCLASS,IMPACTYPE,INVTYPE,INVAGE,INJURY,FATAL_NO,INITDIR,VEHTYPE,MANOEUVER,DRIVACT,DRIVCOND,PEDTYPE,PEDACT,PEDCOND,CYCLISTYPE,CYCACT,CYCCOND,PEDESTRIAN,CYCLIST,AUTOMOBILE,MOTORCYCLE,TRUCK,TRSN_CITY_VEH,EMERG_VEH,PASSENGER,SPEEDING,AG_DRIV,REDLIGHT,ALCOHOL,DISABILITY,POLICE_DIVISION,HOOD_ID,NEIGHBOURHOOD,ObjectId,DRIVERS_COUNT,PEDESTRIAN_COUNT,PASSENGER_COUNT,CYCLIST_COUNT,NULL_INVTYPE_COUNT,OTHER_INVTYPE_COUNT
3387730,-8844611.0,5412414.0,892658,2006-03-11 08:52:00,Winter,2006,3,March,2006/03/11,Saturday,8:52,8,08:00 to 08:59,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Turning Left,Failed to Yield Right of Way,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),1,1,0,0,0,0,0
3387731,-8844611.0,5412414.0,892658,2006-03-11 08:52:00,Winter,2006,3,March,2006/03/11,Saturday,8:52,8,08:00 to 08:59,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Pedestrian,65 to 69,Fatal,<Null>,North,Other,<Null>,<Null>,<Null>,Vehicle turns left while ped crosses with ROW ...,Crossing with right of way,Unknown,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),2,0,1,0,0,0,0
3388101,-8816480.0,5434843.0,892810,2006-03-11 09:15:00,Winter,2006,3,March,2006/03/11,Saturday,9:15,9,09:00 to 09:59,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Motorcycle Driver,45 to 49,Fatal,<Null>,East,Motorcycle,Turning Right,Disobeyed Traffic Control,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),3,1,0,0,0,0,0
3388102,-8816480.0,5434843.0,892810,2006-03-11 09:15:00,Winter,2006,3,March,2006/03/11,Saturday,9:15,9,09:00 to 09:59,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Going Ahead,Driving Properly,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),4,1,0,0,0,0,0
3387793,-8822759.0,5424516.0,892682,2006-03-12 02:40:00,Winter,2006,3,March,2006/03/12,Sunday,2:40,2,02:00 to 02:59,EGLINTON AVE E,COMMONWEALTH AVE,<Null>,Major Arterial,Scarborough,2120,41,43.734945,-79.25619,Mid-Block,<Null>,No Control,Clear,Dark,Dry,Fatal,Pedestrian Collisions,Driver,25 to 29,,<Null>,West,"Automobile, Station Wagon",Going Ahead,Other,"Ability Impaired, Alcohol",<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,D41,138,Eglinton East (138),5,1,0,0,0,0,0
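
The involvement flags above are built with one lambda per column. A minimal sketch of the same rules using vectorized string matching is shown here. The sample Series is hypothetical but uses values from the INVTYPE list, and the substring rules are copied from the cell above.

```python
import pandas as pd

# Hypothetical sample covering each branch of the flagging rules
inv = pd.Series([
    'Driver', 'Truck Driver', 'Pedestrian - Not Hit', 'Cyclist Passenger',
    'Motorcycle Passenger', '<Null>', 'Witness',
])

# regex=False makes str.contains do a plain substring test, like `in`
flags = pd.DataFrame({
    'DRIVERS_COUNT': inv.str.contains('Driver', regex=False),
    'PEDESTRIAN_COUNT': inv.str.contains('Pedestrian', regex=False),
    'PASSENGER_COUNT': inv.str.contains('Passenger', regex=False)
                       & (inv != 'Cyclist Passenger'),
    'CYCLIST_COUNT': inv.str.contains('Cyclist', regex=False),
    'NULL_INVTYPE_COUNT': inv == '<Null>',
}).astype(int)

# Anything not matched by the five flags falls into the "other" bucket
flags['OTHER_INVTYPE_COUNT'] = (flags.sum(axis=1) == 0).astype(int)

print(flags.assign(INVTYPE=inv))
```

Each sample row ends up with exactly one flag set, which is the property the notebook relies on when it later sums the flags.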


In [12]:
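# Checking that no row was counted in more than one involvement category.
# OTHER_INVTYPE_COUNT guarantees at least one flag per row, so the sum of all flags
# must equal the total number of rows: 16,860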
file_content['DRIVERS_COUNT'].sum()+file_content['PEDESTRIAN_COUNT'].sum()+ \
file_content['PASSENGER_COUNT'].sum()+file_content['CYCLIST_COUNT'].sum()+ \
file_content['NULL_INVTYPE_COUNT'].sum()+file_content['OTHER_INVTYPE_COUNT'].sum()

16860

In [13]:
# Attribute INVOLVED: categorical attribute which combines 7 different columns
# into one to describe who was involved

def who_else(x):
    result = []
    
    if x['PEDESTRIAN'] == 'Yes':
        result.append('Pedestrian')
    if x['CYCLIST'] == 'Yes':
        result.append('Cyclist')
    if x['AUTOMOBILE'] == 'Yes':
        result.append('Automobile')
    if x['MOTORCYCLE'] == 'Yes':
        result.append('Motorcycle')
    if x['TRUCK'] == 'Yes':
        result.append('Truck')
    if x['TRSN_CITY_VEH'] == 'Yes':
        result.append('City Vehicle')
    if x['EMERG_VEH'] == 'Yes':
        result.append('Emergency Vehicle')
    if not result:
        result.append('Not Recorded')

    # The list is converted to a string to avoid issues with the
    # groupby function later (lists are not hashable)
    result_str = ', '.join(result)

    return result_str

file_content['INVOLVED'] = file_content.apply(who_else, axis=1)
file_content.head(7)

INDEX_,X,Y,ACCNUM,DATE_TIME,SEASON,YEAR,MONTH,MONTH_NAME,DATE,DAY_NAME,TIME,HOUR,HOUR_INTERVAL,STREET1,STREET2,OFFSET,ROAD_CLASS,DISTRICT,WARDNUM,DIVISION,LATITUDE,LONGITUDE,LOCCOORD,ACCLOC,TRAFFCTL,VISIBILITY,LIGHT,RDSFCOND,ACCLASS,IMPACTYPE,INVTYPE,INVAGE,INJURY,FATAL_NO,INITDIR,VEHTYPE,MANOEUVER,DRIVACT,DRIVCOND,PEDTYPE,PEDACT,PEDCOND,CYCLISTYPE,CYCACT,CYCCOND,PEDESTRIAN,CYCLIST,AUTOMOBILE,MOTORCYCLE,TRUCK,TRSN_CITY_VEH,EMERG_VEH,PASSENGER,SPEEDING,AG_DRIV,REDLIGHT,ALCOHOL,DISABILITY,POLICE_DIVISION,HOOD_ID,NEIGHBOURHOOD,ObjectId,DRIVERS_COUNT,PEDESTRIAN_COUNT,PASSENGER_COUNT,CYCLIST_COUNT,NULL_INVTYPE_COUNT,OTHER_INVTYPE_COUNT,INVOLVED
3387730,-8844611.0,5412414.0,892658,2006-03-11 08:52:00,Winter,2006,3,March,2006/03/11,Saturday,8:52,8,08:00 to 08:59,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Turning Left,Failed to Yield Right of Way,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),1,1,0,0,0,0,0,"Pedestrian, Automobile"
3387731,-8844611.0,5412414.0,892658,2006-03-11 08:52:00,Winter,2006,3,March,2006/03/11,Saturday,8:52,8,08:00 to 08:59,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Pedestrian,65 to 69,Fatal,<Null>,North,Other,<Null>,<Null>,<Null>,Vehicle turns left while ped crosses with ROW ...,Crossing with right of way,Unknown,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),2,0,1,0,0,0,0,"Pedestrian, Automobile"
3388101,-8816480.0,5434843.0,892810,2006-03-11 09:15:00,Winter,2006,3,March,2006/03/11,Saturday,9:15,9,09:00 to 09:59,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Motorcycle Driver,45 to 49,Fatal,<Null>,East,Motorcycle,Turning Right,Disobeyed Traffic Control,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),3,1,0,0,0,0,0,"Automobile, Motorcycle"
3388102,-8816480.0,5434843.0,892810,2006-03-11 09:15:00,Winter,2006,3,March,2006/03/11,Saturday,9:15,9,09:00 to 09:59,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Going Ahead,Driving Properly,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),4,1,0,0,0,0,0,"Automobile, Motorcycle"
3387793,-8822759.0,5424516.0,892682,2006-03-12 02:40:00,Winter,2006,3,March,2006/03/12,Sunday,2:40,2,02:00 to 02:59,EGLINTON AVE E,COMMONWEALTH AVE,<Null>,Major Arterial,Scarborough,2120,41,43.734945,-79.25619,Mid-Block,<Null>,No Control,Clear,Dark,Dry,Fatal,Pedestrian Collisions,Driver,25 to 29,,<Null>,West,"Automobile, Station Wagon",Going Ahead,Other,"Ability Impaired, Alcohol",<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,D41,138,Eglinton East (138),5,1,0,0,0,0,0,"Pedestrian, Automobile"
3387794,-8822759.0,5424516.0,892682,2006-03-12 02:40:00,Winter,2006,3,March,2006/03/12,Sunday,2:40,2,02:00 to 02:59,EGLINTON AVE E,COMMONWEALTH AVE,<Null>,Major Arterial,Scarborough,2120,41,43.734945,-79.25619,Mid-Block,<Null>,No Control,Clear,Dark,Dry,Fatal,Pedestrian Collisions,Pedestrian,30 to 34,Minor,<Null>,South,Other,<Null>,<Null>,<Null>,Pedestrian hit at mid-block,"Crossing, no Traffic Control",Normal,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,D41,138,Eglinton East (138),6,0,1,0,0,0,0,"Pedestrian, Automobile"
3387795,-8822759.0,5424516.0,892682,2006-03-12 02:40:00,Winter,2006,3,March,2006/03/12,Sunday,2:40,2,02:00 to 02:59,EGLINTON AVE E,COMMONWEALTH AVE,<Null>,Major Arterial,Scarborough,2120,41,43.734945,-79.25619,Mid-Block,<Null>,No Control,Clear,Dark,Dry,Fatal,Pedestrian Collisions,Pedestrian,40 to 44,Fatal,<Null>,South,Other,<Null>,<Null>,<Null>,Pedestrian hit at mid-block,"Crossing, no Traffic Control",Normal,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,D41,138,Eglinton East (138),7,0,1,0,0,0,0,"Pedestrian, Automobile"
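
The chain of if statements in who_else can also be driven by a column-to-label mapping. The sketch below assumes a hypothetical helper named combine_flags and a hypothetical two-row sample. The labels and column names are the ones used in who_else.

```python
import pandas as pd

# Column-to-label mapping copied from who_else
INVOLVED_LABELS = {
    'PEDESTRIAN': 'Pedestrian',
    'CYCLIST': 'Cyclist',
    'AUTOMOBILE': 'Automobile',
    'MOTORCYCLE': 'Motorcycle',
    'TRUCK': 'Truck',
    'TRSN_CITY_VEH': 'City Vehicle',
    'EMERG_VEH': 'Emergency Vehicle',
}


def combine_flags(row, labels, default='Not Recorded'):
    """Join the labels of every column whose value is 'Yes' (hypothetical helper)."""
    hits = [label for col, label in labels.items() if row[col] == 'Yes']
    return ', '.join(hits) if hits else default


# Hypothetical rows mimicking the Yes / <Null> flags in the data
sample = pd.DataFrame(
    [['Yes', '<Null>', 'Yes', '<Null>', '<Null>', '<Null>', '<Null>'],
     ['<Null>'] * 7],
    columns=list(INVOLVED_LABELS),
)

# Extra keyword arguments given to apply are forwarded to the function
sample['INVOLVED'] = sample.apply(combine_flags, axis=1, labels=INVOLVED_LABELS)
print(sample['INVOLVED'].tolist())
```

Because the mapping is passed in as an argument, the same helper could be reused for any other group of Yes/No columns by supplying a different dictionary.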


# Collision Motive

In [14]:
# Attribute MOTIVE: categorical attribute which combines 5 different columns
# into one to describe possible motive of the accident

def possible_motive(x):

    result = []
    
    if x['SPEEDING'] == 'Yes':
        result.append('Speeding Related Collision')
    if x['REDLIGHT'] == 'Yes':
        result.append('Red Light Related Collision')
    if x['ALCOHOL'] == 'Yes':
        result.append('Alcohol Related Collision')
    if x['AG_DRIV'] == 'Yes':
        result.append('Aggressive and Distracted Driving Collision')
    if x['DISABILITY'] == 'Yes':
        result.append('Medical or Physical Disability Related Collision')
    if not result:
        result.append('Not Recorded')

    # The list is converted to a string to avoid issues with the
    # groupby function later (lists are not hashable)
    result_str = ', '.join(result)

    return result_str


file_content['MOTIVE'] = file_content.apply(possible_motive, axis=1)
file_content.head()

INDEX_,X,Y,ACCNUM,DATE_TIME,SEASON,YEAR,MONTH,MONTH_NAME,DATE,DAY_NAME,TIME,HOUR,HOUR_INTERVAL,STREET1,STREET2,OFFSET,ROAD_CLASS,DISTRICT,WARDNUM,DIVISION,LATITUDE,LONGITUDE,LOCCOORD,ACCLOC,TRAFFCTL,VISIBILITY,LIGHT,RDSFCOND,ACCLASS,IMPACTYPE,INVTYPE,INVAGE,INJURY,FATAL_NO,INITDIR,VEHTYPE,MANOEUVER,DRIVACT,DRIVCOND,PEDTYPE,PEDACT,PEDCOND,CYCLISTYPE,CYCACT,CYCCOND,PEDESTRIAN,CYCLIST,AUTOMOBILE,MOTORCYCLE,TRUCK,TRSN_CITY_VEH,EMERG_VEH,PASSENGER,SPEEDING,AG_DRIV,REDLIGHT,ALCOHOL,DISABILITY,POLICE_DIVISION,HOOD_ID,NEIGHBOURHOOD,ObjectId,DRIVERS_COUNT,PEDESTRIAN_COUNT,PASSENGER_COUNT,CYCLIST_COUNT,NULL_INVTYPE_COUNT,OTHER_INVTYPE_COUNT,INVOLVED,MOTIVE
3387730,-8844611.0,5412414.0,892658,2006-03-11 08:52:00,Winter,2006,3,March,2006/03/11,Saturday,8:52,8,08:00 to 08:59,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Turning Left,Failed to Yield Right of Way,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),1,1,0,0,0,0,0,"Pedestrian, Automobile",Aggressive and Distracted Driving Collision
3387731,-8844611.0,5412414.0,892658,2006-03-11 08:52:00,Winter,2006,3,March,2006/03/11,Saturday,8:52,8,08:00 to 08:59,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Pedestrian,65 to 69,Fatal,<Null>,North,Other,<Null>,<Null>,<Null>,Vehicle turns left while ped crosses with ROW ...,Crossing with right of way,Unknown,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),2,0,1,0,0,0,0,"Pedestrian, Automobile",Aggressive and Distracted Driving Collision
3388101,-8816480.0,5434843.0,892810,2006-03-11 09:15:00,Winter,2006,3,March,2006/03/11,Saturday,9:15,9,09:00 to 09:59,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Motorcycle Driver,45 to 49,Fatal,<Null>,East,Motorcycle,Turning Right,Disobeyed Traffic Control,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),3,1,0,0,0,0,0,"Automobile, Motorcycle","Red Light Related Collision, Aggressive and D..."
3388102,-8816480.0,5434843.0,892810,2006-03-11 09:15:00,Winter,2006,3,March,2006/03/11,Saturday,9:15,9,09:00 to 09:59,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Going Ahead,Driving Properly,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),4,1,0,0,0,0,0,"Automobile, Motorcycle","Red Light Related Collision, Aggressive and D..."
3387793,-8822759.0,5424516.0,892682,2006-03-12 02:40:00,Winter,2006,3,March,2006/03/12,Sunday,2:40,2,02:00 to 02:59,EGLINTON AVE E,COMMONWEALTH AVE,<Null>,Major Arterial,Scarborough,2120,41,43.734945,-79.25619,Mid-Block,<Null>,No Control,Clear,Dark,Dry,Fatal,Pedestrian Collisions,Driver,25 to 29,,<Null>,West,"Automobile, Station Wagon",Going Ahead,Other,"Ability Impaired, Alcohol",<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,D41,138,Eglinton East (138),5,1,0,0,0,0,0,"Pedestrian, Automobile",Alcohol Related Collision


In [15]:
# Inspecting records with no recorded motive: 7,581 person-level rows
print(file_content[file_content['MOTIVE']=='Not Recorded'].shape)
file_content[file_content['MOTIVE']=='Not Recorded'].head()

(7581, 70)


INDEX_,X,Y,ACCNUM,DATE_TIME,SEASON,YEAR,MONTH,MONTH_NAME,DATE,DAY_NAME,TIME,HOUR,HOUR_INTERVAL,STREET1,STREET2,OFFSET,ROAD_CLASS,DISTRICT,WARDNUM,DIVISION,LATITUDE,LONGITUDE,LOCCOORD,ACCLOC,TRAFFCTL,VISIBILITY,LIGHT,RDSFCOND,ACCLASS,IMPACTYPE,INVTYPE,INVAGE,INJURY,FATAL_NO,INITDIR,VEHTYPE,MANOEUVER,DRIVACT,DRIVCOND,PEDTYPE,PEDACT,PEDCOND,CYCLISTYPE,CYCACT,CYCCOND,PEDESTRIAN,CYCLIST,AUTOMOBILE,MOTORCYCLE,TRUCK,TRSN_CITY_VEH,EMERG_VEH,PASSENGER,SPEEDING,AG_DRIV,REDLIGHT,ALCOHOL,DISABILITY,POLICE_DIVISION,HOOD_ID,NEIGHBOURHOOD,ObjectId,DRIVERS_COUNT,PEDESTRIAN_COUNT,PASSENGER_COUNT,CYCLIST_COUNT,NULL_INVTYPE_COUNT,OTHER_INVTYPE_COUNT,INVOLVED,MOTIVE
3389258,-8855097.0,5418678.0,893251,2006-03-15 15:35:00,Winter,2006,3,March,2006/03/15,Wednesday,15:35,15,15:00 to 15:59,ISLINGTON AVE,DIXON RD,<Null>,Major Arterial,Etobicoke York,12,23,43.697045,-79.54669,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Driver,40 to 44,,<Null>,East,"Automobile, Station Wagon",Going Ahead,Driving Properly,Normal,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,D23,6,Kingsview Village-The Westway (6),10,1,0,0,0,0,0,"Pedestrian, Automobile",Not Recorded
3389259,-8855097.0,5418678.0,893251,2006-03-15 15:35:00,Winter,2006,3,March,2006/03/15,Wednesday,15:35,15,15:00 to 15:59,ISLINGTON AVE,DIXON RD,<Null>,Major Arterial,Etobicoke York,12,23,43.697045,-79.54669,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Pedestrian,20 to 24,Fatal,<Null>,North,Other,<Null>,<Null>,<Null>,Vehicle is going straight thru inter.while ped...,Crossing without right of way,Normal,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,D23,6,Kingsview Village-The Westway (6),11,0,1,0,0,0,0,"Pedestrian, Automobile",Not Recorded
3400137,-8840024.0,5413768.0,897348,2006-03-17 15:54:00,Winter,2006,3,March,2006/03/17,Friday,15:54,15,15:00 to 15:59,BLOOR ST W,BATHURST ST,<Null>,Major Arterial,Toronto and East York,11,14,43.665145,-79.41129,Intersection,At Intersection,No Control,Clear,Daylight,Dry,Non-Fatal Injury,Pedestrian Collisions,Driver,40 to 44,,<Null>,West,"Automobile, Station Wagon",Going Ahead,Driving Properly,Normal,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,D14,95,Annex (95),25,1,0,0,0,0,0,"Pedestrian, Automobile",Not Recorded
3400138,-8840024.0,5413768.0,897348,2006-03-17 15:54:00,Winter,2006,3,March,2006/03/17,Friday,15:54,15,15:00 to 15:59,BLOOR ST W,BATHURST ST,<Null>,Major Arterial,Toronto and East York,11,14,43.665145,-79.41129,Intersection,At Intersection,No Control,Clear,Daylight,Dry,Non-Fatal Injury,Pedestrian Collisions,Pedestrian,20 to 24,Major,<Null>,North,Other,<Null>,<Null>,<Null>,Vehicle is going straight thru inter.while ped...,Running onto Roadway,Inattentive,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,D14,95,Annex (95),26,0,1,0,0,0,0,"Pedestrian, Automobile",Not Recorded
3391145,-8820043.0,5432052.0,893964,2006-03-20 07:55:00,Winter,2006,3,March,2006/03/20,Monday,7:55,7,07:00 to 07:59,941 PROGRESS AVE,<Null>,<Null>,Minor Arterial,Scarborough,24,43,43.783845,-79.23179,Mid-Block,<Null>,Traffic Signal,Clear,Daylight,Dry,Non-Fatal Injury,Pedestrian Collisions,Driver,20 to 24,,<Null>,East,"Automobile, Station Wagon",Going Ahead,Driving Properly,Normal,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,D43,137,Woburn (137),44,1,0,0,0,0,0,"Pedestrian, Automobile",Not Recorded
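
The 7,581 figure above counts rows, and each row describes one involved person, so a single collision can contribute several rows. A minimal sketch of how the two counts can be separated, using a made-up toy frame in place of `file_content` (the `toy` data and variable names are illustrative assumptions, not part of the notebook):

```python
import pandas as pd

# Toy stand-in for file_content: one row per involved person (values are made up)
toy = pd.DataFrame({
    'ACCNUM': [101, 101, 102, 103, 103, 103],
    'MOTIVE': ['Not Recorded', 'Not Recorded', 'Alcohol Related Collision',
               'Not Recorded', 'Not Recorded', 'Not Recorded'],
})

not_recorded = toy[toy['MOTIVE'] == 'Not Recorded']
person_rows = len(not_recorded)                # rows, i.e. involved people
collisions = not_recorded['ACCNUM'].nunique()  # distinct collisions
print(person_rows, collisions)
```

Applying the same `nunique()` call to the real data would give the number of collisions without a recorded motive, which is likely the more useful figure for incident-level analysis.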


# Injuries

In [16]:
# The attribute INJURY represents the types of injuries for each person
file_content.INJURY.unique()

array(['None', 'Fatal', 'Minor', 'Major', 'Minimal', '<Null>'],
      dtype=object)

In [17]:
# Objective: count the number of Fatal, Major, Minor and No Injury,
# creating some new numerical attributes

# Flagging each row with its injury type
file_content['FATAL_INJURY_COUNT'] = file_content["INJURY"].apply(lambda val : 1 if val == 'Fatal' else 0)                        
file_content['MAJOR_INJURY_COUNT'] = file_content["INJURY"].apply(lambda val : 1 if val == 'Major'  else 0)   
file_content['MINOR_INJURY_COUNT'] = file_content["INJURY"].apply(lambda val : 1 if val == 'Minor' or val == 'Minimal' else 0)   
file_content['NO_INJURY_COUNT'] = file_content["INJURY"].apply(lambda val : 1 if val == 'None' else 0)
file_content['NULL_INJURY_COUNT'] = file_content["INJURY"].apply(lambda val : 1 if val == '<Null>' else 0)
file_content.head()

INDEX_,X,Y,ACCNUM,DATE_TIME,SEASON,YEAR,MONTH,MONTH_NAME,DATE,DAY_NAME,TIME,HOUR,HOUR_INTERVAL,STREET1,STREET2,OFFSET,ROAD_CLASS,DISTRICT,WARDNUM,DIVISION,LATITUDE,LONGITUDE,LOCCOORD,ACCLOC,TRAFFCTL,VISIBILITY,LIGHT,RDSFCOND,ACCLASS,IMPACTYPE,INVTYPE,INVAGE,INJURY,FATAL_NO,INITDIR,VEHTYPE,MANOEUVER,DRIVACT,DRIVCOND,PEDTYPE,PEDACT,PEDCOND,CYCLISTYPE,CYCACT,CYCCOND,PEDESTRIAN,CYCLIST,AUTOMOBILE,MOTORCYCLE,TRUCK,TRSN_CITY_VEH,EMERG_VEH,PASSENGER,SPEEDING,AG_DRIV,REDLIGHT,ALCOHOL,DISABILITY,POLICE_DIVISION,HOOD_ID,NEIGHBOURHOOD,ObjectId,DRIVERS_COUNT,PEDESTRIAN_COUNT,PASSENGER_COUNT,CYCLIST_COUNT,NULL_INVTYPE_COUNT,OTHER_INVTYPE_COUNT,INVOLVED,MOTIVE,FATAL_INJURY_COUNT,MAJOR_INJURY_COUNT,MINOR_INJURY_COUNT,NO_INJURY_COUNT,NULL_INJURY_COUNT
3387730,-8844611.0,5412414.0,892658,2006-03-11 08:52:00,Winter,2006,3,March,2006/03/11,Saturday,8:52,8,08:00 to 08:59,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Turning Left,Failed to Yield Right of Way,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),1,1,0,0,0,0,0,"Pedestrian, Automobile",Aggressive and Distracted Driving Collision,0,0,0,1,0
3387731,-8844611.0,5412414.0,892658,2006-03-11 08:52:00,Winter,2006,3,March,2006/03/11,Saturday,8:52,8,08:00 to 08:59,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Pedestrian,65 to 69,Fatal,<Null>,North,Other,<Null>,<Null>,<Null>,Vehicle turns left while ped crosses with ROW ...,Crossing with right of way,Unknown,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),2,0,1,0,0,0,0,"Pedestrian, Automobile",Aggressive and Distracted Driving Collision,1,0,0,0,0
3388101,-8816480.0,5434843.0,892810,2006-03-11 09:15:00,Winter,2006,3,March,2006/03/11,Saturday,9:15,9,09:00 to 09:59,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Motorcycle Driver,45 to 49,Fatal,<Null>,East,Motorcycle,Turning Right,Disobeyed Traffic Control,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),3,1,0,0,0,0,0,"Automobile, Motorcycle","Red Light Related Collision, Aggressive and D...",1,0,0,0,0
3388102,-8816480.0,5434843.0,892810,2006-03-11 09:15:00,Winter,2006,3,March,2006/03/11,Saturday,9:15,9,09:00 to 09:59,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Going Ahead,Driving Properly,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),4,1,0,0,0,0,0,"Automobile, Motorcycle","Red Light Related Collision, Aggressive and D...",0,0,0,1,0
3387793,-8822759.0,5424516.0,892682,2006-03-12 02:40:00,Winter,2006,3,March,2006/03/12,Sunday,2:40,2,02:00 to 02:59,EGLINTON AVE E,COMMONWEALTH AVE,<Null>,Major Arterial,Scarborough,2120,41,43.734945,-79.25619,Mid-Block,<Null>,No Control,Clear,Dark,Dry,Fatal,Pedestrian Collisions,Driver,25 to 29,,<Null>,West,"Automobile, Station Wagon",Going Ahead,Other,"Ability Impaired, Alcohol",<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,D41,138,Eglinton East (138),5,1,0,0,0,0,0,"Pedestrian, Automobile",Alcohol Related Collision,0,0,0,1,0


In [18]:
# Checking that no record was missed or counted twice
# The sum of all injury flags must equal the total number of rows: 16,860
file_content['FATAL_INJURY_COUNT'].sum()+file_content['MAJOR_INJURY_COUNT'].sum()+ \
file_content['MINOR_INJURY_COUNT'].sum()+file_content['NO_INJURY_COUNT'].sum()+ \
file_content['NULL_INJURY_COUNT'].sum()

16860
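
The total above confirms the overall count. A stricter variant checks each row individually. The sketch below is an illustration on a toy `INJURY` column (the `toy` frame and the `flags` name are assumptions made for this example); it builds the same five flags with vectorized comparisons instead of `apply` and verifies that every row carries exactly one flag:

```python
import pandas as pd

# Toy column containing each INJURY value seen in the data set
toy = pd.DataFrame({'INJURY': ['None', 'Fatal', 'Minor', 'Major', 'Minimal', '<Null>']})

# Vectorized equivalents of the lambda-based flags
flags = pd.DataFrame({
    'FATAL_INJURY_COUNT': (toy['INJURY'] == 'Fatal').astype(int),
    'MAJOR_INJURY_COUNT': (toy['INJURY'] == 'Major').astype(int),
    'MINOR_INJURY_COUNT': toy['INJURY'].isin(['Minor', 'Minimal']).astype(int),
    'NO_INJURY_COUNT': (toy['INJURY'] == 'None').astype(int),
    'NULL_INJURY_COUNT': (toy['INJURY'] == '<Null>').astype(int),
})

# Every row should be flagged exactly once
flags_per_row = flags.sum(axis=1)
all_rows_flagged_once = bool((flags_per_row == 1).all())
print(all_rows_flagged_once)
```

A row-wise check of this kind would also catch an unexpected `INJURY` value, which would leave a row with no flag at all.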

# Vehicle Types

In [19]:
# The attribute VEHTYPE represents the types of vehicle for each person
file_content.VEHTYPE.unique()

array(['Automobile, Station Wagon', 'Other', 'Motorcycle', 'Bicycle',
       '<Null>', 'Municipal Transit Bus (TTC)', 'Truck - Open', 'Taxi',
       'Passenger Van', 'Delivery Van', 'Moped', 'Pick Up Truck',
       'Police Vehicle', 'Truck-Tractor', 'Truck - Closed (Blazer, etc)',
       'Street Car', 'Bus (Other) (Go Bus, Gray Coach)', 'Truck - Dump',
       'Construction Equipment', 'Intercity Bus', 'Truck (other)',
       'Truck - Tank', 'Other Emergency Vehicle', 'School Bus',
       'Tow Truck', 'Off Road - 2 Wheels', 'Fire Vehicle',
       'Truck - Car Carrier'], dtype=object)

In [20]:
# Objective: count the number of four categories: People, Automobile, Recreational
# and Other, creating some new numerical attributes

# List of vehicle types per category
# Note: 'Pedestrians' is not among the VEHTYPE values listed above, so only 'Bicycle' matches
People_List = ['Pedestrians', 'Bicycle']
Automobile_List = ['Automobile, Station Wagon', 'Taxi', 'Pick Up Truck', 'Truck - Closed (Blazer, etc)', 'Passenger Van']
Recreational_List = ['Motorcycle', 'Moped', 'Off Road - 2 Wheels']
Other_List = [x for x in file_content.VEHTYPE.unique().tolist() if x not in People_List and x not in Automobile_List and x not in Recreational_List and x != '<Null>']

# Flagging each row with the vehicle category
# Important: for the Automobile, Recreational and Other categories, vehicles
# are counted only on driver rows, based on the DRIVERS_COUNT attribute
file_content['PEOPLE_VEH_COUNT'] = file_content["VEHTYPE"].apply(lambda val : 1 if val in People_List else 0)    
file_content['AUTOMOBILE_VEH_COUNT'] = file_content.apply(lambda val : 1 if val['VEHTYPE'] in Automobile_List and val['DRIVERS_COUNT'] > 0 else 0, axis=1)
file_content['RECREATIONAL_VEH_COUNT'] = file_content.apply(lambda val : 1 if val['VEHTYPE'] in Recreational_List and val['DRIVERS_COUNT'] > 0 else 0, axis=1)
file_content['OTHER_VEH_COUNT'] = file_content.apply(lambda val : 1 if val['VEHTYPE'] in Other_List and val['DRIVERS_COUNT'] > 0 else 0, axis=1)
file_content['NULL_VEH_COUNT'] = file_content.apply(lambda val : 1 if val['VEHTYPE'] == '<Null>' and val['DRIVERS_COUNT'] > 0 else 0, axis=1)    

file_content.head()

INDEX_,X,Y,ACCNUM,DATE_TIME,SEASON,YEAR,MONTH,MONTH_NAME,DATE,DAY_NAME,TIME,HOUR,HOUR_INTERVAL,STREET1,STREET2,OFFSET,ROAD_CLASS,DISTRICT,WARDNUM,DIVISION,LATITUDE,LONGITUDE,LOCCOORD,ACCLOC,TRAFFCTL,VISIBILITY,LIGHT,RDSFCOND,ACCLASS,IMPACTYPE,INVTYPE,INVAGE,INJURY,FATAL_NO,INITDIR,VEHTYPE,MANOEUVER,DRIVACT,DRIVCOND,PEDTYPE,PEDACT,PEDCOND,CYCLISTYPE,CYCACT,CYCCOND,PEDESTRIAN,CYCLIST,AUTOMOBILE,MOTORCYCLE,TRUCK,TRSN_CITY_VEH,EMERG_VEH,PASSENGER,SPEEDING,AG_DRIV,REDLIGHT,ALCOHOL,DISABILITY,POLICE_DIVISION,HOOD_ID,NEIGHBOURHOOD,ObjectId,DRIVERS_COUNT,PEDESTRIAN_COUNT,PASSENGER_COUNT,CYCLIST_COUNT,NULL_INVTYPE_COUNT,OTHER_INVTYPE_COUNT,INVOLVED,MOTIVE,FATAL_INJURY_COUNT,MAJOR_INJURY_COUNT,MINOR_INJURY_COUNT,NO_INJURY_COUNT,NULL_INJURY_COUNT,PEOPLE_VEH_COUNT,AUTOMOBILE_VEH_COUNT,RECREATIONAL_VEH_COUNT,OTHER_VEH_COUNT,NULL_VEH_COUNT
3387730,-8844611.0,5412414.0,892658,2006-03-11 08:52:00,Winter,2006,3,March,2006/03/11,Saturday,8:52,8,08:00 to 08:59,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Turning Left,Failed to Yield Right of Way,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),1,1,0,0,0,0,0,"Pedestrian, Automobile",Aggressive and Distracted Driving Collision,0,0,0,1,0,0,1,0,0,0
3387731,-8844611.0,5412414.0,892658,2006-03-11 08:52:00,Winter,2006,3,March,2006/03/11,Saturday,8:52,8,08:00 to 08:59,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Pedestrian,65 to 69,Fatal,<Null>,North,Other,<Null>,<Null>,<Null>,Vehicle turns left while ped crosses with ROW ...,Crossing with right of way,Unknown,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),2,0,1,0,0,0,0,"Pedestrian, Automobile",Aggressive and Distracted Driving Collision,1,0,0,0,0,0,0,0,0,0
3388101,-8816480.0,5434843.0,892810,2006-03-11 09:15:00,Winter,2006,3,March,2006/03/11,Saturday,9:15,9,09:00 to 09:59,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Motorcycle Driver,45 to 49,Fatal,<Null>,East,Motorcycle,Turning Right,Disobeyed Traffic Control,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),3,1,0,0,0,0,0,"Automobile, Motorcycle","Red Light Related Collision, Aggressive and D...",1,0,0,0,0,0,0,1,0,0
3388102,-8816480.0,5434843.0,892810,2006-03-11 09:15:00,Winter,2006,3,March,2006/03/11,Saturday,9:15,9,09:00 to 09:59,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Going Ahead,Driving Properly,Unknown,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),4,1,0,0,0,0,0,"Automobile, Motorcycle","Red Light Related Collision, Aggressive and D...",0,0,0,1,0,0,1,0,0,0
3387793,-8822759.0,5424516.0,892682,2006-03-12 02:40:00,Winter,2006,3,March,2006/03/12,Sunday,2:40,2,02:00 to 02:59,EGLINTON AVE E,COMMONWEALTH AVE,<Null>,Major Arterial,Scarborough,2120,41,43.734945,-79.25619,Mid-Block,<Null>,No Control,Clear,Dark,Dry,Fatal,Pedestrian Collisions,Driver,25 to 29,,<Null>,West,"Automobile, Station Wagon",Going Ahead,Other,"Ability Impaired, Alcohol",<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,D41,138,Eglinton East (138),5,1,0,0,0,0,0,"Pedestrian, Automobile",Alcohol Related Collision,0,0,0,1,0,0,1,0,0,0


In [21]:
# Special case: a driver is recorded, but no vehicle type was specified
# Such rows are counted in NULL_VEH_COUNT, not OTHER_VEH_COUNT
file_content.loc[3376951][['ACCNUM','DATE_TIME','VEHTYPE','INVTYPE','OTHER_VEH_COUNT','NULL_VEH_COUNT']]

ACCNUM                          888497
DATE_TIME          2006-02-15 09:35:00
VEHTYPE                         <Null>
INVTYPE                         Driver
OTHER_VEH_COUNT                      0
NULL_VEH_COUNT                       1
Name: 3376951, dtype: object

In [22]:
# Checking the total number of flagged vehicles
# The result must be lower than the total number of rows (16,860),
# because motor vehicles are counted only on driver rows
file_content['PEOPLE_VEH_COUNT'].sum()+file_content['AUTOMOBILE_VEH_COUNT'].sum()+file_content['RECREATIONAL_VEH_COUNT'].sum()+file_content['OTHER_VEH_COUNT'].sum()+file_content['NULL_VEH_COUNT'].sum()

9307

In [23]:
# Checking number of drivers and number of vehicles
# The results must be the same
print('Number of drivers: ' + str(file_content['DRIVERS_COUNT'].sum()))
print('Number of vehicles: ' + str(file_content['AUTOMOBILE_VEH_COUNT'].sum()+file_content['RECREATIONAL_VEH_COUNT'].sum()+file_content['OTHER_VEH_COUNT'].sum()+file_content['NULL_VEH_COUNT'].sum()))

Number of drivers: 8585
Number of vehicles: 8585
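
The categorization above can also be expressed as a single lookup, which keeps the category rules in one place. The following is a minimal sketch on made-up rows; the `category_of` dictionary, the `categorise` helper and the `toy` frame are hypothetical names introduced for illustration, and the dictionary lists only a few of the vehicle types:

```python
import pandas as pd

# Hypothetical lookup; category names follow the lists defined in the notebook
category_of = {
    'Bicycle': 'People',
    'Automobile, Station Wagon': 'Automobile',
    'Taxi': 'Automobile',
    'Motorcycle': 'Recreational',
    'Moped': 'Recreational',
}

# Toy rows: a pedestrian ('Other', not a driver) and a cyclist are included
toy = pd.DataFrame({
    'VEHTYPE': ['Automobile, Station Wagon', 'Other', 'Motorcycle',
                '<Null>', 'Tow Truck', 'Bicycle'],
    'DRIVERS_COUNT': [1, 0, 1, 1, 1, 0],
})

def categorise(vehtype):
    """Return the category of a vehicle type; unlisted types fall into 'Other'."""
    if vehtype == '<Null>':
        return 'Null'
    return category_of.get(vehtype, 'Other')

toy['CATEGORY'] = toy['VEHTYPE'].map(categorise)

# Motor vehicles are counted only on driver rows, as in the cells above
is_driver = toy['DRIVERS_COUNT'] > 0
motor_vehicles = int((is_driver & (toy['CATEGORY'] != 'People')).sum())
drivers = int(toy['DRIVERS_COUNT'].sum())
print(motor_vehicles, drivers)
```

As in the check above, the number of motor vehicles matches the number of drivers, because every driver row contributes exactly one vehicle, including the row with a `<Null>` vehicle type.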


# Dataset 1: Individual Collision Data

This section creates a data set with information about the accidents themselves, in which each row represents one collision.

In [24]:
# Grouping the attributes related to collisions. Note that attributes like
# LATITUDE, LONGITUDE, DATE_TIME, etc. are identical for all rows of the same accident

# Attribute ACCNUM is the accident number and will become
# the index of this data set
incidents = file_content.groupby(['ACCNUM',
                                  'ACCLASS',
                                  'INVOLVED',
                                  'MOTIVE',
                                  'X', 'Y',
                                  'LATITUDE', 'LONGITUDE',
                                  'DATE_TIME', 
                                  'SEASON',
                                  'YEAR',
                                  'MONTH',
                                  'MONTH_NAME',
                                  'DAY_NAME',
                                  'HOUR',
                                  'HOUR_INTERVAL',
                                  'ROAD_CLASS', 'ACCLOC', 'TRAFFCTL', 'VISIBILITY',
                                  'LIGHT', 'RDSFCOND', 'IMPACTYPE', 'DISTRICT',
                                  'NEIGHBOURHOOD', 'HOOD_ID', 'LOCCOORD']).\
                                  aggregate({'INVTYPE': 'count',
                                             'FATAL_INJURY_COUNT': 'sum', 
                                             'MAJOR_INJURY_COUNT': 'sum',
                                             'MINOR_INJURY_COUNT': 'sum', 
                                             'NO_INJURY_COUNT': 'sum', 
                                             'NULL_INJURY_COUNT': 'sum', 
                                             'DRIVERS_COUNT': 'sum',
                                             'PEDESTRIAN_COUNT': 'sum',
                                             'PASSENGER_COUNT': 'sum',
                                             'CYCLIST_COUNT' : 'sum',
                                             'OTHER_INVTYPE_COUNT' : 'sum',
                                             'NULL_INVTYPE_COUNT': 'sum',
                                             'PEOPLE_VEH_COUNT' : 'sum',
                                             'AUTOMOBILE_VEH_COUNT' : 'sum',
                                             'RECREATIONAL_VEH_COUNT' : 'sum',
                                             'OTHER_VEH_COUNT' : 'sum',
                                             'NULL_VEH_COUNT': 'sum'}).\
                                  rename(columns={'INVTYPE': 'PEOPLE_COUNT'}).\
                                  reset_index().set_index('ACCNUM')

incidents.head()

ACCNUM,ACCLASS,INVOLVED,MOTIVE,X,Y,LATITUDE,LONGITUDE,DATE_TIME,SEASON,YEAR,MONTH,MONTH_NAME,DAY_NAME,HOUR,HOUR_INTERVAL,ROAD_CLASS,ACCLOC,TRAFFCTL,VISIBILITY,LIGHT,RDSFCOND,IMPACTYPE,DISTRICT,NEIGHBOURHOOD,HOOD_ID,LOCCOORD,PEOPLE_COUNT,FATAL_INJURY_COUNT,MAJOR_INJURY_COUNT,MINOR_INJURY_COUNT,NO_INJURY_COUNT,NULL_INJURY_COUNT,DRIVERS_COUNT,PEDESTRIAN_COUNT,PASSENGER_COUNT,CYCLIST_COUNT,OTHER_INVTYPE_COUNT,NULL_INVTYPE_COUNT,PEOPLE_VEH_COUNT,AUTOMOBILE_VEH_COUNT,RECREATIONAL_VEH_COUNT,OTHER_VEH_COUNT,NULL_VEH_COUNT
25301,Non-Fatal Injury,"Pedestrian, Automobile",Not Recorded,-8836220.0,5420822.0,43.710967,-79.377116,2020-01-04 18:50:00,Winter,2020,1,January,Saturday,18,18:00 to 18:59,Major Arterial,Intersection Related,Traffic Signal,Rain,Dark,Wet,Pedestrian Collisions,North York,Leaside-Bennington (56),56,Intersection,2,0,1,0,1,0,1,1,0,0,0,0,0,1,0,0,0
26294,Fatal,"Pedestrian, Automobile",Not Recorded,-8836047.0,5412910.0,43.659568,-79.37556,2020-01-04 22:14:00,Winter,2020,1,January,Saturday,22,22:00 to 22:59,Major Arterial,Non Intersection,No Control,Clear,"Dark, artificial",Dry,Pedestrian Collisions,Toronto and East York,Moss Park (73),73,Mid-Block,2,1,0,0,1,0,1,1,0,0,0,0,0,1,0,0,0
37330,Non-Fatal Injury,"Pedestrian, Automobile",Aggressive and Distracted Driving Collision,-8842513.0,5411391.0,43.649699,-79.43365,2020-01-06 15:55:00,Winter,2020,1,January,Monday,15,15:00 to 15:59,Minor Arterial,At Intersection,Traffic Signal,Clear,"Dawn, artificial",Dry,Pedestrian Collisions,Toronto East York,Little Portugal (84),84,Intersection,2,0,1,0,1,0,1,1,0,0,0,0,0,1,0,0,0
45664,Non-Fatal Injury,Automobile,Aggressive and Distracted Driving Collision,-8827355.0,5423072.0,43.725577,-79.297481,2020-01-07 18:50:00,Winter,2020,1,January,Tuesday,18,18:00 to 18:59,Major Arterial,At Intersection,Traffic Signal,Clear,Dark,Dry,Turning Movement,Scarborough,Clairlea-Birchmount (120),120,Intersection,3,0,1,1,1,0,2,0,1,0,0,0,0,2,0,0,0
56815,Non-Fatal Injury,Automobile,"Speeding Related Collision, Red Light Related...",-8858314.0,5419422.0,43.701876,-79.575588,2020-01-09 11:00:00,Winter,2020,1,January,Thursday,11,11:00 to 11:59,Major Arterial,At Intersection,Traffic Signal,Clear,Daylight,Dry,Angle,Etobicoke York,West Humber-Clairville (1),1,Intersection,4,0,1,0,3,0,4,0,0,0,0,0,0,3,0,1,0
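
To make the aggregation pattern easier to follow, here is a reduced sketch of the same group-count-sum-rename chain on a made-up frame with two collisions (the `toy` data and the `toy_incidents` name are assumptions for illustration only):

```python
import pandas as pd

# Toy person-level data: collision 1 has two people, collision 2 has three
toy = pd.DataFrame({
    'ACCNUM': [1, 1, 2, 2, 2],
    'ACCLASS': ['Fatal', 'Fatal', 'Non-Fatal Injury',
                'Non-Fatal Injury', 'Non-Fatal Injury'],
    'INVTYPE': ['Driver', 'Pedestrian', 'Driver', 'Driver', 'Passenger'],
    'DRIVERS_COUNT': [1, 0, 1, 1, 0],
    'FATAL_INJURY_COUNT': [0, 1, 0, 0, 0],
})

toy_incidents = (
    toy.groupby(['ACCNUM', 'ACCLASS'])
       .agg({'INVTYPE': 'count',            # number of people per collision
             'DRIVERS_COUNT': 'sum',
             'FATAL_INJURY_COUNT': 'sum'})
       .rename(columns={'INVTYPE': 'PEOPLE_COUNT'})
       .reset_index()
       .set_index('ACCNUM')
)
print(toy_incidents)
```

Grouping by descriptive attributes in addition to `ACCNUM` only yields one row per collision when those attributes are constant within a collision, which is why the notebook compares the resulting row count with the number of unique `ACCNUM` values.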


In [25]:
# Getting the number of accidents
incidents.shape

(6002, 43)

In [26]:
# Confirming that the number of rows generated in 'incidents' is correct
len(file_content['ACCNUM'].unique())

6002

In [27]:
# Additional check: are there repeated accidents with different ACCNUM values?

# Looking for identical LATITUDE, LONGITUDE and DATE_TIME combinations (which indicate the same accident)
incidents_1 = incidents.reset_index()
incidents_1[['LONGITUDE', 'LATITUDE', 'DATE_TIME']].value_counts().sort_values(ascending=False)

LONGITUDE   LATITUDE   DATE_TIME          
-79.301790  43.688145  2009-03-04 21:41:00    2
-79.403590  43.773145  2011-08-23 19:05:00    2
-79.411381  43.655708  2013-10-23 17:04:00    2
-79.624290  43.753145  2008-11-02 00:50:00    1
-79.614190  43.755445  2012-10-09 17:28:00    1
                                             ..
-79.142290  43.801445  2011-11-25 15:56:00    1
-79.140193  43.788043  2008-04-22 11:11:00    1
-79.139507  43.778929  2014-06-17 10:50:00    1
-79.139377  43.786063  2019-04-07 13:30:00    1
-79.397403  43.651316  2017-04-14 18:05:00    1
Length: 5999, dtype: int64

In [28]:
# The first three rows represent repeated accidents
# After dropping these three repeated records, we should have a total of 6,002 - 3 = 5,999
incidents.drop_duplicates(subset=['LONGITUDE', 'LATITUDE', 'DATE_TIME'], inplace=True)
incidents.shape

(5999, 43)
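
Before dropping repeated accidents, it can help to view both members of each repeated pair, since `drop_duplicates` keeps only the first row by default and any differing counts on the dropped row are lost. A small sketch on made-up data (the `toy` frame and the `key`, `repeated` and `deduplicated` names are illustrative assumptions):

```python
import pandas as pd

# Toy incident-level data: ACCNUM 1001 and 1002 share location and time
toy = pd.DataFrame(
    {
        'LONGITUDE': [-79.30, -79.30, -79.40],
        'LATITUDE': [43.68, 43.68, 43.77],
        'DATE_TIME': pd.to_datetime(['2009-03-04 21:41',
                                     '2009-03-04 21:41',
                                     '2011-08-23 19:05']),
        'PEOPLE_COUNT': [2, 3, 2],
    },
    index=pd.Index([1001, 1002, 1003], name='ACCNUM'),
)

key = ['LONGITUDE', 'LATITUDE', 'DATE_TIME']

# keep=False marks every member of a repeated group, not just the later ones
repeated = toy[toy.duplicated(subset=key, keep=False)]

# Default keep='first' retains the first row of each repeated group
deduplicated = toy.drop_duplicates(subset=key)
print(repeated.index.tolist(), deduplicated.index.tolist())
```

In this toy case the two repeated rows report different `PEOPLE_COUNT` values, so inspecting `repeated` first shows what information would be discarded.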

In [29]:
# Checking whether the repeated values remain
incidents_1 = incidents.reset_index()
incidents_1[['LONGITUDE', 'LATITUDE', 'DATE_TIME']].value_counts().sort_values(ascending=False)

LONGITUDE   LATITUDE   DATE_TIME          
-79.638390  43.749045  2008-09-03 19:00:00    1
-79.624290  43.753145  2008-11-02 00:50:00    1
-79.614190  43.755445  2012-10-09 17:28:00    1
-79.614230  43.745425  2020-01-18 09:55:00    1
-79.634670  43.751242  2015-08-20 09:47:00    1
                                             ..
-79.140193  43.788043  2008-04-22 11:11:00    1
-79.139507  43.778929  2014-06-17 10:50:00    1
-79.139377  43.786063  2019-04-07 13:30:00    1
-79.138290  43.783345  2011-08-20 02:03:00    1
-79.125896  43.790312  2016-01-28 21:30:00    1
Length: 5999, dtype: int64

In [30]:
# Looking at the obtained data
incidents.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5999 entries, 25301 to 9085345312
Data columns (total 43 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   ACCLASS                 5999 non-null   object        
 1   INVOLVED                5999 non-null   object        
 2   MOTIVE                  5999 non-null   object        
 3   X                       5999 non-null   float64       
 4   Y                       5999 non-null   float64       
 5   LATITUDE                5999 non-null   float64       
 6   LONGITUDE               5999 non-null   float64       
 7   DATE_TIME               5999 non-null   datetime64[ns]
 8   SEASON                  5999 non-null   object        
 9   YEAR                    5999 non-null   int64         
 10  MONTH                   5999 non-null   int64         
 11  MONTH_NAME              5999 non-null   object        
 12  DAY_NAME                5999 non-null 

In [31]:
# Write incident data to a new file
incidents.to_csv(file_path + file_incidents)
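A CSV file stores no type information, so the `datetime64[ns]` column and the index need to be restored explicitly when the file is read back. The sketch below uses an in-memory buffer and a toy frame. The index name `ACCNUM` is an assumption for illustration, since the name of the `incidents` index is not shown in this section.

```python
import io
import pandas as pd

# Toy frame (invented) with a named index and a datetime column
toy = pd.DataFrame(
    {
        "DATE_TIME": ["2006-03-11 08:52:00", "2006-03-12 02:40:00"],
        "ACCLASS": ["Fatal", "Fatal"],
    },
    index=pd.Index([892658, 892682], name="ACCNUM"),  # assumed index name
)
toy["DATE_TIME"] = pd.to_datetime(toy["DATE_TIME"])

# Write to an in-memory buffer instead of Google Drive
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

# Restore the index and the datetime dtype while reading
restored = pd.read_csv(buffer, index_col="ACCNUM", parse_dates=["DATE_TIME"])

print(restored.index.name)
print(pd.api.types.is_datetime64_any_dtype(restored["DATE_TIME"]))
```

Without `parse_dates`, `DATE_TIME` would come back as plain text, and without `index_col` the index would become an ordinary column.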

# Dataset 2: Individual Involved Person Data

This section creates a data set with information about the people involved in the accidents. It is similar to the original data set, but keeps only selected and cleaned attributes.

In [32]:
# First, categorize the vehicle on every row (previously this was done only for drivers)

file_content['VEHICLE_CATEGORY'] = file_content['VEHTYPE'].apply(lambda val : 'People' if val in People_List
                                                                 else 'Automobile' if val in Automobile_List
                                                                 else 'Recreational' if val in Recreational_List
                                                                 else 'Other' if val in Other_List and val != '<Null>'
                                                                 else 'Null')

file_content.head()

INDEX_,X,Y,ACCNUM,DATE_TIME,SEASON,YEAR,MONTH,MONTH_NAME,DATE,DAY_NAME,TIME,HOUR,HOUR_INTERVAL,STREET1,STREET2,OFFSET,ROAD_CLASS,DISTRICT,WARDNUM,DIVISION,LATITUDE,LONGITUDE,LOCCOORD,ACCLOC,TRAFFCTL,VISIBILITY,LIGHT,RDSFCOND,ACCLASS,IMPACTYPE,INVTYPE,INVAGE,INJURY,FATAL_NO,INITDIR,VEHTYPE,MANOEUVER,DRIVACT,DRIVCOND,PEDTYPE,...,PEDCOND,CYCLISTYPE,CYCACT,CYCCOND,PEDESTRIAN,CYCLIST,AUTOMOBILE,MOTORCYCLE,TRUCK,TRSN_CITY_VEH,EMERG_VEH,PASSENGER,SPEEDING,AG_DRIV,REDLIGHT,ALCOHOL,DISABILITY,POLICE_DIVISION,HOOD_ID,NEIGHBOURHOOD,ObjectId,DRIVERS_COUNT,PEDESTRIAN_COUNT,PASSENGER_COUNT,CYCLIST_COUNT,NULL_INVTYPE_COUNT,OTHER_INVTYPE_COUNT,INVOLVED,MOTIVE,FATAL_INJURY_COUNT,MAJOR_INJURY_COUNT,MINOR_INJURY_COUNT,NO_INJURY_COUNT,NULL_INJURY_COUNT,PEOPLE_VEH_COUNT,AUTOMOBILE_VEH_COUNT,RECREATIONAL_VEH_COUNT,OTHER_VEH_COUNT,NULL_VEH_COUNT,VEHICLE_CATEGORY
3387730,-8844611.0,5412414.0,892658,2006-03-11 08:52:00,Winter,2006,3,March,2006/03/11,Saturday,8:52,8,08:00 to 08:59,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Turning Left,Failed to Yield Right of Way,Unknown,<Null>,...,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),1,1,0,0,0,0,0,"Pedestrian, Automobile",Aggressive and Distracted Driving Collision,0,0,0,1,0,0,1,0,0,0,Automobile
3387731,-8844611.0,5412414.0,892658,2006-03-11 08:52:00,Winter,2006,3,March,2006/03/11,Saturday,8:52,8,08:00 to 08:59,BLOOR ST W,DUNDAS ST W,<Null>,Major Arterial,Toronto and East York,4,11,43.656345,-79.45249,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Pedestrian Collisions,Pedestrian,65 to 69,Fatal,<Null>,North,Other,<Null>,<Null>,<Null>,Vehicle turns left while ped crosses with ROW ...,...,Unknown,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,<Null>,<Null>,D11,88,High Park North (88),2,0,1,0,0,0,0,"Pedestrian, Automobile",Aggressive and Distracted Driving Collision,1,0,0,0,0,0,0,0,0,0,Other
3388101,-8816480.0,5434843.0,892810,2006-03-11 09:15:00,Winter,2006,3,March,2006/03/11,Saturday,9:15,9,09:00 to 09:59,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Motorcycle Driver,45 to 49,Fatal,<Null>,East,Motorcycle,Turning Right,Disobeyed Traffic Control,Unknown,<Null>,...,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),3,1,0,0,0,0,0,"Automobile, Motorcycle","Red Light Related Collision, Aggressive and D...",1,0,0,0,0,0,0,1,0,0,Recreational
3388102,-8816480.0,5434843.0,892810,2006-03-11 09:15:00,Winter,2006,3,March,2006/03/11,Saturday,9:15,9,09:00 to 09:59,MORNINGSIDE AVE,SHEPPARD AVE E,<Null>,Major Arterial,Scarborough,25,42,43.801943,-79.199786,Intersection,At Intersection,Traffic Signal,Clear,Daylight,Dry,Fatal,Turning Movement,Driver,unknown,,<Null>,South,"Automobile, Station Wagon",Going Ahead,Driving Properly,Unknown,<Null>,...,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,Yes,<Null>,<Null>,D42,131,Rouge (131),4,1,0,0,0,0,0,"Automobile, Motorcycle","Red Light Related Collision, Aggressive and D...",0,0,0,1,0,0,1,0,0,0,Automobile
3387793,-8822759.0,5424516.0,892682,2006-03-12 02:40:00,Winter,2006,3,March,2006/03/12,Sunday,2:40,2,02:00 to 02:59,EGLINTON AVE E,COMMONWEALTH AVE,<Null>,Major Arterial,Scarborough,2120,41,43.734945,-79.25619,Mid-Block,<Null>,No Control,Clear,Dark,Dry,Fatal,Pedestrian Collisions,Driver,25 to 29,,<Null>,West,"Automobile, Station Wagon",Going Ahead,Other,"Ability Impaired, Alcohol",<Null>,...,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,Yes,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,<Null>,Yes,<Null>,D41,138,Eglinton East (138),5,1,0,0,0,0,0,"Pedestrian, Automobile",Alcohol Related Collision,0,0,0,1,0,0,1,0,0,0,Automobile
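The chained conditional can also be expressed as a lookup table, which is a possible alternative and not the notebook's method. The sketch below assumes category lists shaped like `People_List`, `Automobile_List`, `Recreational_List` and `Other_List`. The list contents here are placeholders: three values are taken from the output above and `'Bicycle'` is invented, so the real lists defined earlier in the notebook should be substituted.

```python
import pandas as pd

# Placeholder lists (assumed contents, not the notebook's actual definitions)
category_lists = {
    "People": ["Bicycle"],
    "Automobile": ["Automobile, Station Wagon"],
    "Recreational": ["Motorcycle"],
    "Other": ["Other"],
}

# Invert to a flat {vehicle type: category} lookup
lookup = {
    veh_type: category
    for category, members in category_lists.items()
    for veh_type in members
}

veh = pd.Series(
    ["Automobile, Station Wagon", "Other", "Motorcycle", "<Null>", "Bicycle"],
    name="VEHTYPE",
)

# Values missing from the lookup (including '<Null>') fall through to 'Null'
categories = veh.map(lookup).fillna("Null")
print(categories.tolist())
# ['Automobile', 'Other', 'Recreational', 'Null', 'People']
```

One difference to keep in mind: if a vehicle type appeared in two lists, the dictionary would keep the category listed last, whereas the chained conditional keeps the first match.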


In [33]:
# Creating the data set
people = file_content[['DATE_TIME','SEASON','YEAR','MONTH','MONTH_NAME','DAY_NAME',
                       'HOUR','HOUR_INTERVAL','ACCNUM','INVTYPE','INVAGE','INJURY',
                       'DISTRICT','NEIGHBOURHOOD','VEHTYPE','VEHICLE_CATEGORY', 'ALCOHOL']]
people.head()

INDEX_,DATE_TIME,SEASON,YEAR,MONTH,MONTH_NAME,DAY_NAME,HOUR,HOUR_INTERVAL,ACCNUM,INVTYPE,INVAGE,INJURY,DISTRICT,NEIGHBOURHOOD,VEHTYPE,VEHICLE_CATEGORY,ALCOHOL
3387730,2006-03-11 08:52:00,Winter,2006,3,March,Saturday,8,08:00 to 08:59,892658,Driver,unknown,,Toronto and East York,High Park North (88),"Automobile, Station Wagon",Automobile,<Null>
3387731,2006-03-11 08:52:00,Winter,2006,3,March,Saturday,8,08:00 to 08:59,892658,Pedestrian,65 to 69,Fatal,Toronto and East York,High Park North (88),Other,Other,<Null>
3388101,2006-03-11 09:15:00,Winter,2006,3,March,Saturday,9,09:00 to 09:59,892810,Motorcycle Driver,45 to 49,Fatal,Scarborough,Rouge (131),Motorcycle,Recreational,<Null>
3388102,2006-03-11 09:15:00,Winter,2006,3,March,Saturday,9,09:00 to 09:59,892810,Driver,unknown,,Scarborough,Rouge (131),"Automobile, Station Wagon",Automobile,<Null>
3387793,2006-03-12 02:40:00,Winter,2006,3,March,Sunday,2,02:00 to 02:59,892682,Driver,25 to 29,,Scarborough,Eglinton East (138),"Automobile, Station Wagon",Automobile,Yes
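Because each row of this data set is one person, rows that share an `ACCNUM` belong to the same collision. A short sketch of how the person-level rows could be summarized per collision is shown below. It uses a toy frame built only from the five rows displayed above, so the counts are illustrative and do not reflect the full data. The names `people_toy` and `per_collision` are hypothetical.

```python
import pandas as pd

# Toy frame copied from the five displayed rows (not the full data set)
people_toy = pd.DataFrame(
    {
        "ACCNUM": [892658, 892658, 892810, 892810, 892682],
        "INVTYPE": ["Driver", "Pedestrian", "Motorcycle Driver", "Driver", "Driver"],
        "INJURY": [None, "Fatal", "Fatal", None, None],
    },
    index=pd.Index([3387730, 3387731, 3388101, 3388102, 3387793], name="INDEX_"),
)

# One row per collision: number of people listed and number of fatal injuries
per_collision = people_toy.groupby("ACCNUM").agg(
    persons=("INVTYPE", "size"),
    fatalities=("INJURY", lambda s: int((s == "Fatal").sum())),
)

print(per_collision)
```

The same `ACCNUM` column is what would let this person-level file be related back to collision-level records, assuming those records carry the same identifier.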


In [34]:
# Write people data to a new file
people.to_csv(file_path + file_people)

Open Government Licence – Ontario

You are encouraged to use the Information that is available under this licence with only a few conditions.

Using Information under this licence

Use of any Information indicates your acceptance of the terms below.
The Information Provider grants you a worldwide, royalty-free, perpetual, non-exclusive licence to use the Information, including for commercial purposes, subject to the terms below.
You are free to:

Copy, modify, publish, translate, adapt, distribute or otherwise use the Information in any medium, mode or format for any lawful purpose.
You must, where you do any of the above:

Acknowledge the source of the Information by including any attribution statement specified by the Information Provider(s) and, where possible, provide a link to this licence.
If the Information Provider does not provide a specific attribution statement, or if you are using Information from several Information Providers and multiple attributions are not practical for your product or application, you must use the following attribution statement:

Contains information licensed under the Open Government Licence – Ontario.

The terms of this licence are important, and if you fail to comply with any of them, the rights granted to you under this licence, or any similar licence granted by the Information Provider, will end automatically.

Exemptions

This licence does not grant you any right to use:
Personal Information;
Information or Records not accessible under the Freedom of Information and Protection of Privacy Act (Ontario);
third party rights the Information Provider is not authorized to license;
the names, crests, logos, or other official symbols of the Information Provider; and
Information subject to other intellectual property rights, including patents, trade-marks and official marks.

Non-endorsement

This licence does not grant you any right to use the Information in a way that suggests any official status or that the Information Provider endorses you or your use of the Information.

No warranty

The Information is licensed "as is", and the Information Provider excludes all representations, warranties, obligations, and liabilities, whether express or implied, to the maximum extent permitted by law.
The Information Provider is not liable for any errors or omissions in the Information, and will not under any circumstances be liable for any direct, indirect, special, incidental, consequential, or other loss, injury or damage caused by its use or otherwise arising in connection with this licence or the Information, even if specifically advised of the possibility of such loss, injury or damage.

Governing Law

This licence is governed by the laws of the Province of Ontario and the applicable laws of Canada. 
Legal proceedings related to this licence may only be brought in the courts of Ontario.

Definitions

In this licence, the terms below have the following meanings:
"Information"

means information resources or Records protected by copyright or other information or Records that are offered for use under the terms of this licence.

"Information Provider"

means Her Majesty the Queen in right of Ontario.

"Personal Information"

has the meaning set out in section 2(1) of Freedom of Information and Protection of Privacy Act (Ontario).

"Records"

has the meaning of "record" as set out in the Freedom of Information and Protection of Privacy Act (Ontario).

"You"

means the natural or legal person, or body of persons corporate or incorporate, acquiring rights under this licence.

Versioning

This is version 1.0 of the Open Government Licence – Ontario. The Information Provider may make changes to the terms of this licence from time to time and issue a new version of the licence. Your use of the Information will be governed by the terms of the licence in force as of the date you accessed the information.
Updated: September 14, 2016

Published: June 18, 2013