# Phase 4 Project.

# Table of Contents.
## 1. Business Problem Overview.
## 2. Proposed Solution.
## 3. Data Download Links.
## 4. Data Loading and Overview.
## 5. Data Cleaning.
## 6. Data Transformation.
## 7. Modelling.
## 8. Results.
## 9. Conclusion.

## 1. Business Problem Overview.

Your task is to:

Build a model that can predict the primary contributory cause of a car accident, given information about the car, the people in the car, the road conditions, etc. You might imagine your audience as a Vehicle Safety Board that is interested in reducing traffic accidents, or as the City of Chicago, which is interested in becoming aware of any interesting patterns.


## 2. Proposed Solution.

## 3. Data Download Links.

* [Traffic Crashes: Crashes](https://data.cityofchicago.org/Transportation/Traffic-Crashes-Crashes/85ca-t3if/about_data).
* [Traffic Crashes: People](https://data.cityofchicago.org/Transportation/Traffic-Crashes-People/u6pd-qa9d/about_data).
* [Traffic Crashes: Vehicles](https://data.cityofchicago.org/Transportation/Traffic-Crashes-Vehicles/68nd-jvt3/about_data).

## 4. Data Loading and Overview.

In this section of the notebook we load the three datasets and do a brief exploratory data analysis (EDA).
For each dataset we view the first rows, list the columns, and list the unique values of the categorical columns. This gives us a better understanding of each dataset's structure and of the amount of cleaning it may need.

### Loading datasets. 

In [7]:
# importing modules
import pandas as pd
import numpy as np
import os

# Data directory
dir_data = './Data'

# filenames
data_crashes = 'Traffic_Crashes_-_Crashes_20240719.csv'
data_people = 'Traffic_Crashes_-_People_20240719.csv'
data_vehicles = 'Traffic_Crashes_-_Vehicles_20240719.csv'

# path to files
path_crashes = os.path.join(dir_data, data_crashes)
path_people = os.path.join(dir_data, data_people)
path_vehicles = os.path.join(dir_data, data_vehicles)

# loading datasets
df_crashes = pd.read_csv(path_crashes, low_memory=False)
df_people = pd.read_csv(path_people, low_memory=False)
df_vehicles = pd.read_csv(path_vehicles, low_memory=False)

#### Supplemental function to display unique categorical column values in a dataframe.

In [21]:
# function will display unique values for all categorical columns in a dataframe.
def display_categorical_vals(df):
    # select categorical columns
    categorical_columns = df.select_dtypes(include=['object', 'category']).columns

    # print categorical columns and their unique values
    for col in categorical_columns:
        unique_values = df[col].unique()
        print(f"Column '{col}' has unique values: {unique_values}")
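
Printing every unique value works well for low-cardinality columns, but identifier-like columns such as `CRASH_RECORD_ID` flood the output. The following is a minimal sketch of a more compact alternative, not part of the original notebook: the function name `summarize_categorical_vals` and the `max_values` cutoff are assumptions made for illustration, and the toy frame only stands in for `df_crashes`.

```python
import pandas as pd

def summarize_categorical_vals(df, max_values=10):
    """Return one row per categorical column: cardinality, missing count, sample values.

    `max_values` is a hypothetical cutoff on how many values are shown per column.
    """
    rows = []
    for col in df.select_dtypes(include=['object', 'category']).columns:
        uniques = df[col].dropna().unique()
        rows.append({
            'column': col,
            'n_unique': len(uniques),
            'n_missing': int(df[col].isna().sum()),
            'sample_values': list(uniques[:max_values]),
        })
    return pd.DataFrame(rows, columns=['column', 'n_unique', 'n_missing', 'sample_values'])

# Toy frame standing in for df_crashes (values are illustrative only).
toy = pd.DataFrame({
    'CRASH_RECORD_ID': pd.Series(['a1', 'b2', 'c3'], dtype='object'),
    'CRASH_DATE_EST_I': pd.Series([None, 'Y', 'N'], dtype='object'),
    'POSTED_SPEED_LIMIT': pd.Series([15, 30, 30]),
})
summary = summarize_categorical_vals(toy)
print(summary)
```

Returning a DataFrame instead of printing makes it easy to sort by `n_unique` and spot columns that behave like identifiers rather than categories.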

#### Overview of Traffic Crashes: Crashes.
Viewing the first 5 rows, listing the columns, and listing the unique values in categorical columns.

In [8]:
# view datasets
df_crashes.head()

Unnamed: 0,CRASH_RECORD_ID,CRASH_DATE_EST_I,CRASH_DATE,POSTED_SPEED_LIMIT,TRAFFIC_CONTROL_DEVICE,DEVICE_CONDITION,WEATHER_CONDITION,LIGHTING_CONDITION,FIRST_CRASH_TYPE,TRAFFICWAY_TYPE,...,INJURIES_NON_INCAPACITATING,INJURIES_REPORTED_NOT_EVIDENT,INJURIES_NO_INDICATION,INJURIES_UNKNOWN,CRASH_HOUR,CRASH_DAY_OF_WEEK,CRASH_MONTH,LATITUDE,LONGITUDE,LOCATION
0,6c1659069e9c6285a650e70d6f9b574ed5f64c12888479...,,08/18/2023 12:50:00 PM,15,OTHER,FUNCTIONING PROPERLY,CLEAR,DAYLIGHT,REAR END,OTHER,...,1.0,0.0,1.0,0.0,12,6,8,,,
1,5f54a59fcb087b12ae5b1acff96a3caf4f2d37e79f8db4...,,07/29/2023 02:45:00 PM,30,TRAFFIC SIGNAL,FUNCTIONING PROPERLY,CLEAR,DAYLIGHT,PARKED MOTOR VEHICLE,DIVIDED - W/MEDIAN (NOT RAISED),...,0.0,0.0,1.0,0.0,14,7,7,41.85412,-87.665902,POINT (-87.665902342962 41.854120262952)
2,61fcb8c1eb522a6469b460e2134df3d15f82e81fd93e9c...,,08/18/2023 05:58:00 PM,30,NO CONTROLS,NO CONTROLS,CLEAR,DAYLIGHT,PEDALCYCLIST,NOT DIVIDED,...,1.0,0.0,1.0,0.0,17,6,8,41.942976,-87.761883,POINT (-87.761883496974 41.942975745006)
3,004cd14d0303a9163aad69a2d7f341b7da2a8572b2ab33...,,11/26/2019 08:38:00 AM,25,NO CONTROLS,NO CONTROLS,CLEAR,DAYLIGHT,PEDESTRIAN,ONE-WAY,...,0.0,0.0,1.0,0.0,8,3,11,,,
4,a1d5f0ea90897745365a4cbb06cc60329a120d89753fac...,,08/18/2023 10:45:00 AM,20,NO CONTROLS,NO CONTROLS,CLEAR,DAYLIGHT,FIXED OBJECT,OTHER,...,0.0,0.0,1.0,0.0,10,6,8,,,


In [11]:
# print column names
df_crashes.columns

Index(['CRASH_RECORD_ID', 'CRASH_DATE_EST_I', 'CRASH_DATE',
       'POSTED_SPEED_LIMIT', 'TRAFFIC_CONTROL_DEVICE', 'DEVICE_CONDITION',
       'WEATHER_CONDITION', 'LIGHTING_CONDITION', 'FIRST_CRASH_TYPE',
       'TRAFFICWAY_TYPE', 'LANE_CNT', 'ALIGNMENT', 'ROADWAY_SURFACE_COND',
       'ROAD_DEFECT', 'REPORT_TYPE', 'CRASH_TYPE', 'INTERSECTION_RELATED_I',
       'NOT_RIGHT_OF_WAY_I', 'HIT_AND_RUN_I', 'DAMAGE', 'DATE_POLICE_NOTIFIED',
       'PRIM_CONTRIBUTORY_CAUSE', 'SEC_CONTRIBUTORY_CAUSE', 'STREET_NO',
       'STREET_DIRECTION', 'STREET_NAME', 'BEAT_OF_OCCURRENCE',
       'PHOTOS_TAKEN_I', 'STATEMENTS_TAKEN_I', 'DOORING_I', 'WORK_ZONE_I',
       'WORK_ZONE_TYPE', 'WORKERS_PRESENT_I', 'NUM_UNITS',
       'MOST_SEVERE_INJURY', 'INJURIES_TOTAL', 'INJURIES_FATAL',
       'INJURIES_INCAPACITATING', 'INJURIES_NON_INCAPACITATING',
       'INJURIES_REPORTED_NOT_EVIDENT', 'INJURIES_NO_INDICATION',
       'INJURIES_UNKNOWN', 'CRASH_HOUR', 'CRASH_DAY_OF_WEEK', 'CRASH_MONTH',
       'LATITUDE', 

In [24]:
# unique values for categorical columns.
display_categorical_vals(df_crashes)

Column 'CRASH_RECORD_ID' has unique values: ['6c1659069e9c6285a650e70d6f9b574ed5f64c12888479093dfeef179c0344ec6d2057eae224b5c0d5dfc278c0a237f8c22543f07fdef2e4a95a3849871c9345'
 '5f54a59fcb087b12ae5b1acff96a3caf4f2d37e79f8db4106558b34b8a6d2b81af02cf91b576ecd7ced08ffd10fcfd940a84f7613125b89d33636e6075064e22'
 '61fcb8c1eb522a6469b460e2134df3d15f82e81fd93e9cafd3dc7e631b9e1ba8b450a63af12bd90d1d2d9b127ea287f88d32e138a4eeba17799f536c08048934'
 ...
 '61c8dcd63fae60613bc9ec526fa901420cbe99a6d35840052c27bbd0cf1f8d6af74ff575276d3795f26878601232f6b9297b250a3499b62a96373e068134d21a'
 'bc2876dcd7c4098806301cb646232eb8f65c86a4f418b70f90391ab5dee6c889b5511da3e2c3666f19cc4e30df20e078ac9234cca189952b3dd18716dacaafd4'
 '6dee8823d4ae96624b741428681d19f50b5960418b6d790275e76ec34ee74f1b85ea13fab7248a863dc3761b4c1a7d96a18e6c8b1bd5777d665971ec3ab5598c']
Column 'CRASH_DATE_EST_I' has unique values: [nan 'Y' 'N']
Column 'CRASH_DATE' has unique values: ['08/18/2023 12:50:00 PM' '07/29/2023 02:45:00 PM'
 '08/18/2

#### Takeaways after initial overview:
* The dataset `Traffic Crashes: Crashes` has 48 columns and 854910 data entries.
* The dataset has a primary key `CRASH_RECORD_ID`.
* The dataset has a target column `PRIM_CONTRIBUTORY_CAUSE`.
* The dataset contains information about the road conditions, traffic devices, road type. 
* Proposed columns to drop:

In [25]:
# columns proposed for removal from df_crashes
cols_to_drop_df_crashes = [
    'CRASH_DATE_EST_I', 'REPORT_TYPE', 'HIT_AND_RUN_I', 'DAMAGE',
    'DATE_POLICE_NOTIFIED', 'STREET_NO', 'STREET_DIRECTION', 'STREET_NAME',
    'BEAT_OF_OCCURRENCE', 'PHOTOS_TAKEN_I', 'STATEMENTS_TAKEN_I', 'DOORING_I',
    'NUM_UNITS', 'MOST_SEVERE_INJURY', 'INJURIES_TOTAL', 'INJURIES_FATAL',
    'INJURIES_INCAPACITATING', 'INJURIES_NON_INCAPACITATING',
    'INJURIES_REPORTED_NOT_EVIDENT', 'INJURIES_NO_INDICATION',
    'INJURIES_UNKNOWN', 'LATITUDE', 'LONGITUDE', 'LOCATION',
]
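
Since `PRIM_CONTRIBUTORY_CAUSE` is the target column, it may be worth checking how its classes are distributed before any cleaning. The sketch below is one possible way to do that; the helper name `target_distribution` is an assumption, and the class labels in the toy frame are placeholders rather than real values from the dataset.

```python
import pandas as pd

def target_distribution(df, target='PRIM_CONTRIBUTORY_CAUSE'):
    """Return the count and share of each target class, most frequent first."""
    counts = df[target].value_counts(dropna=False)
    return pd.DataFrame({'count': counts, 'share': counts / counts.sum()})

# Toy frame standing in for df_crashes; 'CLASS_A' and 'CLASS_B' are placeholder labels.
toy_crashes = pd.DataFrame({
    'PRIM_CONTRIBUTORY_CAUSE': ['CLASS_A', 'CLASS_A', 'CLASS_A', 'CLASS_B'],
})
dist = target_distribution(toy_crashes)
print(dist)
```

On the real data the call would be `target_distribution(df_crashes)`. A heavily skewed result would suggest grouping rare causes or using class weights later in the modelling step.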

#### Overview of Traffic Crashes: People.
Viewing the first 5 rows, listing the columns, and listing the unique values in categorical columns.

In [9]:
df_people.head()

Unnamed: 0,PERSON_ID,PERSON_TYPE,CRASH_RECORD_ID,VEHICLE_ID,CRASH_DATE,SEAT_NO,CITY,STATE,ZIPCODE,SEX,...,EMS_RUN_NO,DRIVER_ACTION,DRIVER_VISION,PHYSICAL_CONDITION,PEDPEDAL_ACTION,PEDPEDAL_VISIBILITY,PEDPEDAL_LOCATION,BAC_RESULT,BAC_RESULT VALUE,CELL_PHONE_USE
0,O749947,DRIVER,81dc0de2ed92aa62baccab641fa377be7feb1cc47e6554...,834816.0,09/28/2019 03:30:00 AM,,CHICAGO,IL,60651.0,M,...,,UNKNOWN,UNKNOWN,UNKNOWN,,,,TEST NOT OFFERED,,
1,O871921,DRIVER,af84fb5c8d996fcd3aefd36593c3a02e6e7509eeb27568...,827212.0,04/13/2020 10:50:00 PM,,CHICAGO,IL,60620.0,M,...,,NONE,NOT OBSCURED,NORMAL,,,,TEST NOT OFFERED,,
2,O10018,DRIVER,71162af7bf22799b776547132ebf134b5b438dcf3dac6b...,9579.0,11/01/2015 05:00:00 AM,,,,,X,...,,IMPROPER BACKING,UNKNOWN,UNKNOWN,,,,TEST NOT OFFERED,,
3,O10038,DRIVER,c21c476e2ccc41af550b5d858d22aaac4ffc88745a1700...,9598.0,11/01/2015 08:00:00 AM,,,,,X,...,,UNKNOWN,UNKNOWN,UNKNOWN,,,,TEST NOT OFFERED,,
4,O10039,DRIVER,eb390a4c8e114c69488f5fb8a097fe629f5a92fd528cf4...,9600.0,11/01/2015 10:15:00 AM,,,,,X,...,,UNKNOWN,UNKNOWN,UNKNOWN,,,,TEST NOT OFFERED,,


In [12]:
df_people.columns

Index(['PERSON_ID', 'PERSON_TYPE', 'CRASH_RECORD_ID', 'VEHICLE_ID',
       'CRASH_DATE', 'SEAT_NO', 'CITY', 'STATE', 'ZIPCODE', 'SEX', 'AGE',
       'DRIVERS_LICENSE_STATE', 'DRIVERS_LICENSE_CLASS', 'SAFETY_EQUIPMENT',
       'AIRBAG_DEPLOYED', 'EJECTION', 'INJURY_CLASSIFICATION', 'HOSPITAL',
       'EMS_AGENCY', 'EMS_RUN_NO', 'DRIVER_ACTION', 'DRIVER_VISION',
       'PHYSICAL_CONDITION', 'PEDPEDAL_ACTION', 'PEDPEDAL_VISIBILITY',
       'PEDPEDAL_LOCATION', 'BAC_RESULT', 'BAC_RESULT VALUE',
       'CELL_PHONE_USE'],
      dtype='object')

In [23]:
display_categorical_vals(df_people)

Column 'PERSON_ID' has unique values: ['O749947' 'O871921' 'O10018' ... 'P411567' 'P411568' 'P411569']
Column 'PERSON_TYPE' has unique values: ['DRIVER' 'PASSENGER' 'PEDESTRIAN' 'BICYCLE' 'NON-MOTOR VEHICLE'
 'NON-CONTACT VEHICLE']
Column 'CRASH_RECORD_ID' has unique values: ['81dc0de2ed92aa62baccab641fa377be7feb1cc47e6554932773284e51271e820d7a3c2398fa53636ac3b5b9004d27ee725ff26cfe65ce9b7869b67572e8f17d'
 'af84fb5c8d996fcd3aefd36593c3a02e6e7509eeb27568963d242f4b42bd91abfe2e5a046370df528b37a30d594ab2921c37f38f59f7db8863e2e2fa3e9dfa1f'
 '71162af7bf22799b776547132ebf134b5b438dcf3dac6b1ccc744e423da96652f8b51ba415f677df81aa76b994674618744fa0547e1216f32cf320324dcc732e'
 ...
 '5b94999fd7e8057a65771acc3a7811ce625a6715196aa8099efb0c76cf4b5f3c8fcdb967c8800f451b6bddbc6445d4f7002be951f1b21ac60112939c89675009'
 '4795d2b77a415c8f891bcf60f64eaacfc70b7b4d3b38f7514083e3894d86f297c8377ee7e2a7b53b843514233deca1e94cc9495a7e2ca0bb7d7e7bc42c044023'
 'aedab45aa3253b963b77823a1e5a20cbdce25cf524486d9d5623cd0d6

#### Takeaways after initial overview:
* The dataset `Traffic Crashes: People` has 29 columns and 1877321 data entries.
* The dataset has a primary key `PERSON_ID` and two non-primary keys: `CRASH_RECORD_ID`, `VEHICLE_ID`.
* The dataset contains information about the vehicle occupants. 
* Proposed columns to drop:

In [None]:
# columns proposed for removal from df_people
cols_to_drop_df_people = [
    'CITY', 'STATE', 'ZIPCODE', 'SEX', 'AGE',
    'DRIVERS_LICENSE_STATE', 'SAFETY_EQUIPMENT',
    'AIRBAG_DEPLOYED', 'EJECTION', 'INJURY_CLASSIFICATION', 'HOSPITAL',
    'EMS_AGENCY', 'EMS_RUN_NO',
]
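
The key structure noted in the takeaways can be verified rather than assumed. The sketch below checks that a primary key is unique and counts rows whose linking key has no match in another table. The helper `check_keys` and the toy frames are hypothetical, and treating `CRASH_RECORD_ID` as the link between the People and Crashes tables is an assumption based on the shared column name.

```python
import pandas as pd

def check_keys(child, parent, primary_key, link_key):
    """Report primary-key uniqueness in `child` and unmatched `link_key` values.

    `orphan_rows` counts rows of `child` whose `link_key` is absent from `parent`.
    """
    pk_is_unique = bool(child[primary_key].is_unique)
    orphan_mask = ~child[link_key].isin(parent[link_key])
    return {'pk_is_unique': pk_is_unique, 'orphan_rows': int(orphan_mask.sum())}

# Toy frames standing in for df_crashes and df_people (IDs are illustrative only).
toy_crashes = pd.DataFrame({'CRASH_RECORD_ID': ['c1', 'c2']})
toy_people = pd.DataFrame({
    'PERSON_ID': ['O1', 'O2', 'P3'],
    'CRASH_RECORD_ID': ['c1', 'c1', 'c9'],
})
report = check_keys(toy_people, toy_crashes, 'PERSON_ID', 'CRASH_RECORD_ID')
print(report)  # {'pk_is_unique': True, 'orphan_rows': 1}
```

On the real data the equivalent call would be `check_keys(df_people, df_crashes, 'PERSON_ID', 'CRASH_RECORD_ID')`.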

#### Overview of Traffic Crashes: Vehicles.
Viewing the first 5 rows, listing the columns, and listing the unique values in categorical columns.

In [10]:
df_vehicles.head()

Unnamed: 0,CRASH_UNIT_ID,CRASH_RECORD_ID,CRASH_DATE,UNIT_NO,UNIT_TYPE,NUM_PASSENGERS,VEHICLE_ID,CMRC_VEH_I,MAKE,MODEL,...,TRAILER1_LENGTH,TRAILER2_LENGTH,TOTAL_VEHICLE_LENGTH,AXLE_CNT,VEHICLE_CONFIG,CARGO_BODY_TYPE,LOAD_TYPE,HAZMAT_OUT_OF_SERVICE_I,MCS_OUT_OF_SERVICE_I,HAZMAT_CLASS
0,1727162,f5943b05f46b8d4148a63b7506a59113eae0cf1075aabc...,12/21/2023 08:57:00 AM,2,PEDESTRIAN,,,,,,...,,,,,,,,,,
1,1717556,7b1763088507f77e0e552c009a6bf89a4d6330c7527706...,12/06/2023 03:24:00 PM,1,DRIVER,,1634931.0,,NISSAN,SENTRA,...,,,,,,,,,,
2,1717574,2603ff5a88f0b9b54576934c5ed4e4a64e8278e005687b...,12/06/2023 04:00:00 PM,2,DRIVER,,1634978.0,,CHRYSLER,SEBRING,...,,,,,,,,,,
3,1717579,a52ef70e33d468b855b5be44e8638a564434dcf99c0edf...,12/06/2023 04:30:00 PM,1,DRIVER,,1634948.0,,SUBARU,OUTBACK,...,,,,,,,,,,
4,1720118,609055f4b1a72a44d6ec40ba9036cefd7c1287a755eb6c...,12/10/2023 12:12:00 PM,1,DRIVER,,1637401.0,,TOYOTA,RAV4,...,,,,,,,,,,


In [13]:
df_vehicles.columns

Index(['CRASH_UNIT_ID', 'CRASH_RECORD_ID', 'CRASH_DATE', 'UNIT_NO',
       'UNIT_TYPE', 'NUM_PASSENGERS', 'VEHICLE_ID', 'CMRC_VEH_I', 'MAKE',
       'MODEL', 'LIC_PLATE_STATE', 'VEHICLE_YEAR', 'VEHICLE_DEFECT',
       'VEHICLE_TYPE', 'VEHICLE_USE', 'TRAVEL_DIRECTION', 'MANEUVER',
       'TOWED_I', 'FIRE_I', 'OCCUPANT_CNT', 'EXCEED_SPEED_LIMIT_I', 'TOWED_BY',
       'TOWED_TO', 'AREA_00_I', 'AREA_01_I', 'AREA_02_I', 'AREA_03_I',
       'AREA_04_I', 'AREA_05_I', 'AREA_06_I', 'AREA_07_I', 'AREA_08_I',
       'AREA_09_I', 'AREA_10_I', 'AREA_11_I', 'AREA_12_I', 'AREA_99_I',
       'FIRST_CONTACT_POINT', 'CMV_ID', 'USDOT_NO', 'CCMC_NO', 'ILCC_NO',
       'COMMERCIAL_SRC', 'GVWR', 'CARRIER_NAME', 'CARRIER_STATE',
       'CARRIER_CITY', 'HAZMAT_PLACARDS_I', 'HAZMAT_NAME', 'UN_NO',
       'HAZMAT_PRESENT_I', 'HAZMAT_REPORT_I', 'HAZMAT_REPORT_NO',
       'MCS_REPORT_I', 'MCS_REPORT_NO', 'HAZMAT_VIO_CAUSE_CRASH_I',
       'MCS_VIO_CAUSE_CRASH_I', 'IDOT_PERMIT_NO', 'WIDE_LOAD_I',
       'TRAILER1_

In [27]:
display_categorical_vals(df_vehicles)

Column 'CRASH_RECORD_ID' has unique values: ['f5943b05f46b8d4148a63b7506a59113eae0cf1075aabca4408e733f6f9de735fb237fe5c15f032ea3b70e89a3c445f750f576fba089f421c9e99f17bdacf0d9'
 '7b1763088507f77e0e552c009a6bf89a4d6330c75277067fe4c9c8081d86be3db49c1f283c6bcf64b690e03175db497c49aaa3456b5e3c858b3e013dabc1228e'
 '2603ff5a88f0b9b54576934c5ed4e4a64e8278e005687b0edb576f26e0d17decfa6d65506e65b16db5ef12628f2a02cbfcfcbfc97782d28c8f7fdda5a34e9f75'
 ...
 'd49cd71c0149c0e957f252725ed0cd5e8b1f10a03aa9db3596867207dc4788ce4e29f90b65dde6d549776a7970ef06037761bd9e7c72bcfb770d2989213a227c'
 'f83b83bab36c775a70a45aae757daa7193dfa024009bf0cdebdf98c2ebc552f79073a99065ce254f5b4c88ccafd6e7b9c6b470b106fdaab8239d9ac7cee0a9d5'
 '6509398fcca80bc2af582f0ad4a1be2eb803df27d35d2fbefb22e13ca601f889dd454f1d2dd9b6bd92c766f25e1011b17dc8f7a70c8a2f2c0dfb988f1114ce64']
Column 'CRASH_DATE' has unique values: ['12/21/2023 08:57:00 AM' '12/06/2023 03:24:00 PM'
 '12/06/2023 04:00:00 PM' ... '07/18/2024 11:09:00 PM'
 '07/18/2024 

Column 'HAZMAT_NAME' has unique values: [nan 'NITROGEN' 'DICHLOROMETHANE' 'GASOLINE' 'GASOLINE/PETROL'
 'FLAMMABLE LIQUID' 'PROPANE' 'HOT' 'TETRAFLUOROETHANE' 'MEDICAL WASTE'
 'PETROLEUM' 'DIESEL' 'FLAMMABLE' 'CLASS 2.2' 'CORROSIVE' 'PETROLUEM'
 'FLAMMABLE GAS' 'DIESEL FUEL' 'CARBON DIOXIDE' 'OXIDIZER'
 'FLAMMABLE TREE' 'COROSIVE' 'PETRO GAS' 'NAPHTHALENE' 'FLAMMABLE LIQUIDS'
 'OXYGEN LIQUID' 'FLAME' 'GASOLINE/DIESEL FUEL' 'FUEL OIL' 'FLAMABLES' 'C'
 'INTEGRAL SLEEPER' 'ACIDIC' 'GAS AND DIESEL' 'TAR COMPOUND' 'FLAMABLE'
 'SUBSTANCES GUIDE 171' 'NONE' '1203' 'LIQUID NITROGEN']
Column 'UN_NO' has unique values: [nan '7289' '2021' '1621' '5671' '6545' '4651' '0607' '698' '2203' '2271'
 '0434' '1993' '2716' '4751' '237' 'FT48' '1496' 'FDXU' '4387' '8330'
 '6967' '5440' '1977' '2533' '0141' '0000' '3188' '1827' '1908' '1153'
 '6953' '1203' '2907' '9430' 'UNK' '7800' '8126' '1930' '1945' '1772'
 'B476' '1909' '0230' 'NONE' 'H41' '1075' '5971' '1473' '4967' '4984'
 'B295' '669' '3065' '1195' 

Column 'VEHICLE_CONFIG' has unique values: [nan 'TRUCK/TRACTOR' 'TRACTOR/SEMI-TRAILER' 'BUS'
 'SINGLE UNIT TRUCK, 2 AXLES, 6 TIRES' 'TRUCK/TRAILER'
 'SINGLE UNIT TRUCK, 3 OR MORE AXLES' 'UNKNOWN HEAVY TRUCK'
 'TRACTOR/DOUBLES']
Column 'CARGO_BODY_TYPE' has unique values: [nan 'VAN/ENCLOSED BOX' 'BUS' 'FLATBED' 'OTHER' 'DUMP' 'TANK'
 'GARBAGE/REFUSE' 'AUTO TRANSPORTER' 'CONCRETE MIXER']
Column 'LOAD_TYPE' has unique values: [nan 'OTHER' 'UNKNOWN' 'CONSTRUCTION EQUIPMENT' 'BUILDING MATERIALS'
 'FARM EQUIPMENT' 'STEEL COILS']
Column 'HAZMAT_OUT_OF_SERVICE_I' has unique values: [nan 'N' 'Y']
Column 'MCS_OUT_OF_SERVICE_I' has unique values: [nan 'N' 'Y']
Column 'HAZMAT_CLASS' has unique values: [nan 'MISCELLANEOUS' 'GASES' 'FLAMMABLE OR COMBUSTIBLE LIQUID'
 'POISON (TOXIC) OR POISON INHALATION HAZARD'
 'OXIDIZER OR ORGANIC PEROXIDE' 'EXPLOSIVES'
 'FLAMMABLE SOLID, SPONTANEOUSLY COMBUSTIBLE, OR DANGEROUS WHEN WET'
 'CORROSIVE']


#### Takeaways after initial overview:
* The dataset `Traffic Crashes: Vehicles` has 71 columns and 1743922 data entries.
* The dataset has a primary key `CRASH_UNIT_ID` and two non-primary keys: `CRASH_RECORD_ID`, `VEHICLE_ID`.
* The dataset contains information about the vehicle, such as make, model, defect. 
* Proposed columns to drop:

In [29]:
# columns proposed for removal from df_vehicles
cols_to_drop_df_vehicle = [
    'LIC_PLATE_STATE', 'TRAVEL_DIRECTION',
    'TOWED_I', 'FIRE_I', 'OCCUPANT_CNT', 'EXCEED_SPEED_LIMIT_I', 'TOWED_BY',
    'TOWED_TO', 'AREA_00_I', 'AREA_01_I', 'AREA_02_I', 'AREA_03_I',
    'AREA_04_I', 'AREA_05_I', 'AREA_06_I', 'AREA_07_I', 'AREA_08_I',
    'AREA_09_I', 'AREA_10_I', 'AREA_11_I', 'AREA_12_I', 'AREA_99_I',
    'CMV_ID', 'USDOT_NO', 'CCMC_NO', 'ILCC_NO',
    'COMMERCIAL_SRC', 'CARRIER_NAME', 'CARRIER_STATE',
    'CARRIER_CITY', 'HAZMAT_PLACARDS_I', 'HAZMAT_NAME', 'UN_NO',
    'HAZMAT_REPORT_I', 'HAZMAT_REPORT_NO',
    'MCS_REPORT_I', 'MCS_REPORT_NO', 'HAZMAT_VIO_CAUSE_CRASH_I',
    'MCS_VIO_CAUSE_CRASH_I', 'IDOT_PERMIT_NO', 'WIDE_LOAD_I',
    'TRAILER1_WIDTH', 'TRAILER2_WIDTH', 'TRAILER1_LENGTH',
    'TRAILER2_LENGTH',
    'CARGO_BODY_TYPE', 'LOAD_TYPE', 'HAZMAT_OUT_OF_SERVICE_I',
    'MCS_OUT_OF_SERVICE_I',
]
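
The drop lists are only proposals at this point. One way they might be applied later is sketched below, under the assumption that a misspelled or already-removed column name should be skipped rather than raise an error; the helper name `drop_columns` and the toy frame are hypothetical.

```python
import pandas as pd

def drop_columns(df, cols_to_drop):
    """Return a copy of `df` without `cols_to_drop`; names absent from `df` are skipped."""
    return df.drop(columns=cols_to_drop, errors='ignore')

# Toy frame standing in for df_vehicles (values are illustrative only).
toy_vehicles = pd.DataFrame({
    'CRASH_UNIT_ID': [1, 2],
    'MAKE': ['NISSAN', 'TOYOTA'],
    'TOWED_I': [None, 'Y'],
    'FIRE_I': [None, None],
})
cleaned = drop_columns(toy_vehicles, ['TOWED_I', 'FIRE_I', 'NOT_A_COLUMN'])
print(list(cleaned.columns))  # ['CRASH_UNIT_ID', 'MAKE']
```

Because `DataFrame.drop` returns a new frame by default, the original `df_vehicles` would stay available for comparison, for example via `drop_columns(df_vehicles, cols_to_drop_df_vehicle)`.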

### Observations after loading the datasets.

* `Traffic Crashes: Crashes` -- 48 columns, 854910 data entries;
* `Traffic Crashes: People` -- 29 columns, 1877321 data entries;
* `Traffic Crashes: Vehicles` -- 71 columns, 1743922 data entries;
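
The figures above could be produced programmatically so they stay in sync when the CSV files are refreshed. The following is a minimal sketch: the helper name `overview` and the extra `missing_share` column (the fraction of empty cells) are assumptions added for illustration, and the toy frames stand in for the real datasets.

```python
import pandas as pd

def overview(frames):
    """Build a table with row count, column count, and share of missing cells per dataset."""
    rows = []
    for name, df in frames.items():
        n_rows, n_cols = df.shape
        n_cells = n_rows * n_cols
        n_missing = int(df.isna().sum().sum())
        rows.append({
            'dataset': name,
            'rows': n_rows,
            'columns': n_cols,
            'missing_share': n_missing / n_cells if n_cells else 0.0,
        })
    return pd.DataFrame(rows).set_index('dataset')

# Toy frames standing in for the real datasets (values are illustrative only).
toy_frames = {
    'crashes': pd.DataFrame({'a': [1, 2], 'b': [None, 'x']}),
    'people': pd.DataFrame({'a': [1, 2, 3]}),
}
table = overview(toy_frames)
print(table)
```

With the notebook's variables, the call might look like `overview({'Crashes': df_crashes, 'People': df_people, 'Vehicles': df_vehicles})`.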


## 5. Data Cleaning.

## 6. Data Transformation.

## 7. Modelling.

## 8. Results.

## 9. Conclusion.