# Chicago, IL: Car Crash Analysis & Predictive Modeling


### *Predicting car crashes with Machine Learning Models*

Authors: [Christos Maglaras](mailto:Christo111M@gmail.com), [Marcos Panyagua](mailto:marcosvppfernandes@gmail.com), [Jamie Dowat](mailto:jamie_dowat44@yahoo.com)

![chicago](img/chicago_night_drive.jpg)

## Stakeholder: Chicago Department of Transportation

![cdot](img/cdot.png)

## Business Understanding

Put text here

******

## Predictive Modeling Preview

Put text here


******

## Data: [Chicago City Data Portal](https://data.cityofchicago.org/)

![ccdp](img/chicagocitydataportal.jpg)

### [Crashes](https://data.cityofchicago.org/Transportation/Traffic-Crashes-Crashes/85ca-t3if):

##### Number of Rows: 

*Shows crash data from the Chicago Police Department's **E-Crash** system.*

**"All crashes are recorded as per the format specified in the Traffic Crash Report, SR1050, of the Illinois Department of Transportation."**

| Column Name                 | Description                |
| --------------------------- | -------------------------- |
| crash_record_id  |  Can be used to link to the same crash in the Vehicles and People datasets. |
| rd_no | Chicago Police Department report number|
| crash_date | Date and time of crash as entered by the reporting officer |
| posted_speed_limit  | Posted speed limit, as determined by reporting officer |
| traffic_control_device | Traffic control device present at crash location, as determined by reporting officer |
| device_condition  | Condition of traffic control device, as determined by reporting officer |
| weather_condition | Weather condition at time of crash, as determined by reporting officer |
| lighting_condition | Light condition at time of crash, as determined by reporting officer |
| first_crash_type | Type of first collision in crash |
| trafficway_type  | Trafficway type, as determined by reporting officer |
| lane_ct | Total number of through lanes in either direction, excluding turn lanes, as determined by reporting officer (0 = intersection)|
| alignment | Street alignment at crash location, as determined by reporting officer |
| roadway_surface_cond        | Road surface condition, as determined by reporting officer |
| road_defect | Road defects, as determined by reporting officer |
| crash_type | A general severity classification for the crash: either "Injury and/or Tow Due to Crash" or "No Injury / Drive Away" |
| damage | A field observation of estimated damage. |
| prim_contributory_cause   | The factor which was most significant in causing the crash, as determined by officer judgment |
| sec_contributory_cause | The factor which was second most significant in causing the crash, as determined by officer judgment |
| street_name | Street address name of crash location, as determined by reporting officer|
| num_units | Number of units involved in the crash. A unit can be a motor vehicle, a pedestrian, a bicyclist, or another non-passenger roadway user. Each unit represents a mode of traffic with an independent trajectory. |
| most_severe_injury | Most severe injury sustained by any person involved in the crash |
| injuries_total | Total persons sustaining fatal, incapacitating, non-incapacitating, and possible injuries as determined by the reporting officer |
| injuries_fatal | Total persons sustaining fatal injuries in the crash |
| injuries_incapacitating | Total persons sustaining incapacitating/serious injuries in the crash as determined by the reporting officer. Any injury other than fatal injury, which prevents the injured person from walking, driving, or normally continuing the activities they were capable of performing before the injury occurred. Includes severe lacerations, broken limbs, skull or chest injuries, and abdominal injuries. |
| injuries_non_incapacitating | Total persons sustaining non-incapacitating injuries in the crash as determined by the reporting officer. Any injury, other than fatal or incapacitating injury, which is evident to observers at the scene of the crash. Includes lump on head, abrasions, bruises, and minor lacerations. |
| crash_hour | The hour of the day component of CRASH_DATE. |
| crash_day_of_week | The day of the week component of CRASH_DATE. Sunday=1 |
| latitude | The latitude of the crash location, as determined by reporting officer, as derived from the reported address of crash |
| longitude | The longitude of the crash location, as determined by reporting officer, as derived from the reported address of crash |
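
The table defines `crash_hour` and `crash_day_of_week` as components of `crash_date`, with Sunday coded as 1. pandas numbers weekdays differently (`Series.dt.dayofweek` returns Monday=0 through Sunday=6), so rebuilding or checking these columns needs a shift. The sketch below assumes the remaining days follow in order (Monday=2 through Saturday=7) and that the uppercase column names match the CSV export; the helper `add_time_components` and the sample timestamps are hypothetical.

```python
import pandas as pd


def add_time_components(df: pd.DataFrame, date_col: str = "CRASH_DATE") -> pd.DataFrame:
    """Hypothetical helper: rebuild CRASH_HOUR and CRASH_DAY_OF_WEEK from a date column."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    out["CRASH_HOUR"] = dates.dt.hour
    # pandas numbers weekdays Monday=0 ... Sunday=6; the portal uses Sunday=1 (assumed ... Saturday=7).
    out["CRASH_DAY_OF_WEEK"] = (dates.dt.dayofweek + 1) % 7 + 1
    return out


# 2021-03-07 was a Sunday and 2021-03-08 a Monday.
sample = pd.DataFrame({"CRASH_DATE": ["2021-03-07 14:30:00", "2021-03-08 02:15:00"]})
print(add_time_components(sample)[["CRASH_HOUR", "CRASH_DAY_OF_WEEK"]])
```

Comparing the rebuilt columns against the published ones is a cheap sanity check that the dates were parsed as intended.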


### [People](https://data.cityofchicago.org/Transportation/Traffic-Crashes-People/u6pd-qa9d):

##### Number of Rows: 

*Information about people involved in a crash and if any injuries were sustained.*

| Column Name                 | Description                |
| --------------------------- | -------------------------- |
| crash_record_id | This number can be used to link to the same crash in the Crashes and Vehicles datasets. This number also serves as a unique ID in the Crashes dataset. |
| person_type | Type of roadway user involved in crash |
| rd_no | Chicago Police Department report number. For privacy reasons, this column is blank for recent crashes. |
| crash_date | Date and time of crash as entered by the reporting officer |
| seat_no | Code for seating position of motor vehicle occupant: 1 = driver, 2 = center front, 3 = front passenger, 4 = second row left, 5 = second row center, 6 = second row right, 7 = enclosed passengers, 8 = exposed passengers, 9 = unknown position, 10 = third row left, 11 = third row center, 12 = third row right |
| city | City of residence of person involved in crash |
| state | State of residence of person involved in crash |
| zipcode | ZIP Code of residence of person involved in crash |
| sex | Gender of person involved in crash, as determined by reporting officer |
| age | Age of person involved in crash |
| drivers_license_state | State issuing driver's license of person involved in crash |
| drivers_license_class | Class of driver's license of person involved in crash |
| safety_equipment | Safety equipment used by vehicle occupant in crash, if any |
| airbag_deployed | Whether vehicle occupant airbag deployed as result of crash |
| ejection | Whether vehicle occupant was ejected or extricated from the vehicle as a result of crash |
| injury_classification | Severity of injury person sustained in the crash |
| driver_action | Driver action that contributed to the crash, as determined by reporting officer |
| driver_vision | What, if any, objects obscured the driver’s vision at time of crash |
| physical_condition | Driver’s apparent physical condition at time of crash, as observed by the reporting officer |
| pedpedal_action | Action of pedestrian or cyclist at the time of crash |
| pedpedal_visibility | Visibility of pedestrian or cyclist safety equipment in use at time of crash |
| pedpedal_location | Location of pedestrian or cyclist at the time of crash |
| bac_result | Status of blood alcohol concentration testing for driver or other person involved in crash |
| bac_result value | Driver’s blood alcohol concentration test result (fatal crashes may include pedestrian or cyclist results) |
| cell_phone_use | Whether the person was using a cell phone at the time of the crash, as determined by the reporting officer |
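
`seat_no` stores seating position as an integer code. For EDA it can help to attach readable labels with a lookup dictionary; the sketch below copies the codes and labels from the `seat_no` description in the table, while the `SEAT_LABELS` name, the uppercase `SEAT_NO` column name, and the sample frame are hypothetical.

```python
import pandas as pd

# Codes and labels transcribed from the seat_no description in the People table.
SEAT_LABELS = {
    1: "driver", 2: "center front", 3: "front passenger",
    4: "second row left", 5: "second row center", 6: "second row right",
    7: "enclosed passengers", 8: "exposed passengers", 9: "unknown position",
    10: "third row left", 11: "third row center", 12: "third row right",
}

# Hypothetical sample; 99 is deliberately not a listed code.
people_sample = pd.DataFrame({"SEAT_NO": [1, 3, 12, 99]})

# Series.map returns NaN for any code missing from the dictionary.
people_sample["SEAT_LABEL"] = people_sample["SEAT_NO"].map(SEAT_LABELS)
print(people_sample)
```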

### [Vehicles](https://data.cityofchicago.org/Transportation/Traffic-Crashes-Vehicles/68nd-jvt3):

##### Number of Rows: 

*Information about vehicles ("units") involved in a traffic crash.*

| Column Name                 | Description                |
| --------------------------- | -------------------------- |
| crash_record_id | This number can be used to link to the same crash in the Crashes and People datasets. This number also serves as a unique ID in the Crashes dataset. |
| rd_no | Chicago Police Department report number. For privacy reasons, this column is blank for recent crashes. |
| crash_date | Date and time of crash as entered by the reporting officer |
| unit_type | The type of unit |
| num_passengers | Number of passengers in the vehicle. The driver is not included. More information on passengers is in the People dataset. |
| make | The make (brand) of the vehicle, if relevant |
| model | The model of the vehicle, if relevant |
| lic_plate_state | The state issuing the license plate of the vehicle, if relevant |
| vehicle_year | The model year of the vehicle, if relevant |
| vehicle_defect |  |
| vehicle_type | The type of vehicle, if relevant |
| vehicle_use | The normal use of the vehicle, if relevant |
| maneuver | The action the unit was taking prior to the crash, as determined by the reporting officer |
| towed_I | Indicator of whether the vehicle was towed |
| occupant_cnt | The number of people in the unit, as determined by the reporting officer |
| exceed_speed_limit_I | Indicator of whether the unit was speeding, as determined by the reporting officer |
| first_contact_point |  |
| vehicle_config |  |
| carrier_name |  |
| carrier_state |  |
| carrier_city |  |
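
All three datasets carry `crash_record_id`, which is unique in Crashes, while People and Vehicles can hold several rows per crash. One way to link them, sketched below with `pandas.merge`, is to collapse the many-rows-per-crash tables to one row per crash first so the Crashes table keeps one row per crash. The toy frames and the derived `N_PEOPLE` / `N_VEHICLES` columns are hypothetical, and the uppercase column names are assumed to match the CSV exports.

```python
import pandas as pd

# Hypothetical toy frames standing in for the three portal exports.
toy_crashes = pd.DataFrame({"CRASH_RECORD_ID": ["a", "b"], "NUM_UNITS": [2, 1]})
toy_people = pd.DataFrame({
    "CRASH_RECORD_ID": ["a", "a", "a", "b"],
    "PERSON_TYPE": ["DRIVER", "DRIVER", "PASSENGER", "DRIVER"],
})
toy_vehicles = pd.DataFrame({
    "CRASH_RECORD_ID": ["a", "a", "b"],
    "UNIT_TYPE": ["DRIVER", "DRIVER", "DRIVER"],
})

# Collapse People and Vehicles to one row per CRASH_RECORD_ID before joining.
n_people = toy_people.groupby("CRASH_RECORD_ID").size().reset_index(name="N_PEOPLE")
n_vehicles = toy_vehicles.groupby("CRASH_RECORD_ID").size().reset_index(name="N_VEHICLES")

linked = (
    toy_crashes
    # validate= raises a MergeError if either side has duplicate keys
    .merge(n_people, on="CRASH_RECORD_ID", how="left", validate="one_to_one")
    .merge(n_vehicles, on="CRASH_RECORD_ID", how="left", validate="one_to_one")
)
print(linked)
```

Merging the raw People or Vehicles rows directly would instead duplicate each crash once per person or vehicle, which matters if crashes are the unit being modeled.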



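```python
import pandas as pd
import numpy as np
import warnings

# Silence library warnings in the notebook output
warnings.filterwarnings('ignore')

# Show up to 500 rows and 700 columns when displaying DataFrames
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 700)
```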

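```python
# Load the three Chicago City Data Portal datasets.
# Hypothetical placeholder paths: point these at local CSV exports of each dataset.
crashes = pd.read_csv('crashes.csv')
people = pd.read_csv('people.csv')
vehicles = pd.read_csv('vehicles.csv')
```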

## Exploratory Data Analysis (EDA)

```python
# Drop columns excluded from the analysis: commercial-vehicle and hazmat details,
# trailer dimensions, towing fields, and damage-area (AREA_*_I) indicators, among others
vehicles = vehicles.drop(columns=['NUM_PASSENGERS', 'CMRC_VEH_I', 'TOWED_I', 'FIRE_I', 'EXCEED_SPEED_LIMIT_I', 
                                  'TOWED_BY', 'TOWED_TO', 'AREA_00_I', 'AREA_01_I', 'AREA_02_I', 'AREA_03_I', 
                                  'AREA_04_I', 'AREA_05_I', 'AREA_06_I', 'AREA_07_I', 'AREA_08_I', 'AREA_09_I', 
                                  'AREA_10_I', 'AREA_11_I', 'AREA_12_I', 'AREA_99_I', 'CMV_ID', 'USDOT_NO', 'CCMC_NO', 
                                  'ILCC_NO', 'COMMERCIAL_SRC', 'GVWR', 'CARRIER_NAME', 'CARRIER_STATE', 'CARRIER_CITY',
                                  'HAZMAT_PLACARDS_I', 'HAZMAT_NAME', 'UN_NO', 'HAZMAT_PRESENT_I', 'HAZMAT_REPORT_I',
                                  'HAZMAT_REPORT_NO', 'MCS_REPORT_I', 'MCS_REPORT_NO', 'HAZMAT_VIO_CAUSE_CRASH_I',
                                  'MCS_VIO_CAUSE_CRASH_I', 'IDOT_PERMIT_NO', 'WIDE_LOAD_I', 'TRAILER1_WIDTH',
                                  'TRAILER2_WIDTH', 'TRAILER1_LENGTH', 'TRAILER2_LENGTH', 'TOTAL_VEHICLE_LENGTH',
                                  'AXLE_CNT', 'VEHICLE_CONFIG', 'CARGO_BODY_TYPE', 'LOAD_TYPE', 'HAZMAT_OUT_OF_SERVICE_I',
                                  'MCS_OUT_OF_SERVICE_I', 'HAZMAT_CLASS', 'LIC_PLATE_STATE'])
```

```python
# Remove every row that still has a missing value in any remaining column
vehicles.dropna(inplace=True)
```
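
`vehicles.dropna(inplace=True)` removes a row if *any* remaining column is missing, so the number of rows kept depends on the sparsest column left after the drop step. A quick check before committing is to count missing values per column and compare a blanket `dropna()` with `dropna(subset=...)` restricted to the columns a model actually needs. The sketch below runs on a hypothetical toy frame, and the `subset` choice is only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in different columns.
toy = pd.DataFrame({
    "UNIT_TYPE": ["DRIVER", "DRIVER", "PARKED", "DRIVER"],
    "MAKE": ["FORD", np.nan, "TOYOTA", "HONDA"],
    "VEHICLE_YEAR": [2015, 2018, np.nan, np.nan],
})

# Missing values per column, most-missing first.
print(toy.isna().sum().sort_values(ascending=False))

all_cols = toy.dropna()                                # drops a row if ANY column is missing
some_cols = toy.dropna(subset=["UNIT_TYPE", "MAKE"])   # only checks the listed columns
print(len(toy), len(all_cols), len(some_cols))
```

Here the blanket `dropna()` keeps one of four rows because `VEHICLE_YEAR` is sparse, while the subset version keeps three.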

## Predictive Modeling