# Omdena Liverpool Chapter: Predicting RTC Severity using Machine Learning

## *THE BACKGROUND*:

UK road traffic collisions (RTCs) that result in a person's death have been on a downward trend since the 1960s; even so, 1,516 people lost their lives on UK roads in 2020. The UK road system, especially in Liverpool, is dated: it has not been upgraded to reflect the increase in cars on the road. There are therefore still preventative measures that could be implemented to reduce deaths on UK roads.

The UK government compiles and publishes extensive data about road incidents across the nation (usually once per year). This data is particularly rich for analysis and research because it includes, among other things, geographic areas, weather conditions, vehicle types, casualty numbers, and vehicle manoeuvres.
______________
## *THE PROBLEM*:

In this 4-5 week project, the team will harness Machine Learning to predict the severity of RTCs and identify RTC hotspots, which would allow the local authority to implement further traffic safety measures.

____________________
## *TIMELINE*:
The project is split into two parts of 2 weeks each. In part 1, our goal is to pre-process the data by transforming the raw data into an understandable format (numerical, categorical, etc.). We then perform exploratory data analysis (EDA) to visualize the data and make statistical decisions. Lastly, we carry out feature engineering to find the features that have the highest impact on RTC severity. In part 2, the predictive model will be developed using several supervised learning techniques, including ANN and SVM. The model will then be evaluated on metrics (MSE, MAE, etc.) and finally deployed to a dashboard or an app.




___
## *Data Overview*:
The data come from the Open Data website of the UK government, where they have been published by the Department for Transport.

The dataset comprises two CSV files:

`accident_data.csv`: every line in the file represents a unique traffic accident (identified by the `Accident_Index` column), with various properties of the accident as columns. Date range: 2005-2017. ~1 million records.

`vechile_data.csv`: every line in the file represents the involvement of a unique vehicle in a unique traffic accident, with various vehicle and passenger properties as columns. Date range: 2004-2016. ~1.6 million records.

The two above-mentioned files/datasets can be linked through the unique traffic accident identifier (Accident_Index column). 
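As a minimal sketch of that link, the two tables can be joined on `Accident_Index` with pandas. The toy rows below are made up for illustration, and `Vehicle_Type` is an assumed column name that is not taken from the feature list; in practice the frames would be loaded from the two CSV files.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (values are made up for illustration).
accidents = pd.DataFrame({
    "Accident_Index": ["A1", "A2", "A3"],
    "Number_of_Vehicles": [2, 1, 1],
})
vehicles = pd.DataFrame({
    "Accident_Index": ["A1", "A1", "A2", "A9"],
    "Vehicle_Type": ["Car", "Motorcycle", "Car", "Van"],  # assumed column name
})

# Left join: keep every vehicle row and attach the accident-level columns.
# validate="many_to_one" raises if Accident_Index is not unique in `accidents`.
# indicator=True adds a `_merge` column showing where each row was found.
merged = vehicles.merge(
    accidents,
    on="Accident_Index",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Vehicle rows with no matching accident record.
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print(f"Vehicle rows without an accident record: {len(unmatched)}")
```

Because the two files cover slightly different date ranges, some unmatched rows are to be expected, so it is worth counting them before deciding between a left and an inner join.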

**Features**

> - ```Accident_Index``` accident ID

> - ```1st_Road_Class``` road class of 1st road the accident happened on.  For more information on UK road classes,  [click here.](https://www.gov.uk/government/publications/guidance-on-road-classification-and-the-primary-route-network/guidance-on-road-classification-and-the-primary-route-network) 

> - ```1st_Road_Number``` the road number of the 1st road the accident happened on. 

> - ```2nd_Road_Class``` road class of 2nd road the accident happened on.

> - ```2nd_Road_Number``` road number of 2nd road the accident happened on.

> - ```Accident_Severity``` the **target variable**.  Indicates 3 classes of severity:  "slight," "serious" and "fatal."

> - ```Carriageway_Hazards``` an observation of any hazards in the road at the time of the accident, e.g. animals or pedestrians in the road.

> - ```Date``` date of the accident. 

> - ```Did_Police_Officer_Attend_Scene_of_Accident``` whether a police officer attended the scene. 3 options:
* 1 - Yes.
* 2 - No.
* 3 - No, the accident was reported by a self-completion form. 

> - ```Junction_Control``` what controls are in place to control traffic at the junction.

> - ```Junction_Detail``` the type of junction at the location of the accident.  [Click here](https://www.intensive-driving-school.co.uk/types-of-road-junctions-in-the-uk) for UK junction types.

> - ```Latitude``` latitude of where the accident took place

> - ```Light_Conditions``` the light condition at the time of the accident. 

> - ```Local_Authority_(District)``` the district council in whose jurisdiction the accident occurred.

> - ```Local_Authority_(Highway)``` who is the highway authority for the area the accident took place.

> - ```Location_Easting_OSGR``` easting grid reference. [Click here](https://gridreferencefinder.com/) for more info.

> - ```Location_Northing_OSGR``` northing grid reference.

> - ```Longitude``` longitude of where the accident took place

> - ```LSOA_of_Accident_Location``` Lower Layer Super Output Area of accident.  [Click here](https://www.data.gov.uk/dataset/c481f2d3-91fc-4767-ae10-2efdf6d58996/lower-layer-super-output-areas-lsoas) for more info. 

> - ```Number_of_Casualties``` number of those killed or injured in the accident

> - ```Number_of_Vehicles``` number of vehicles involved in the accident

> - ```Pedestrian_Crossing-Human_Control``` was there a human controlled crossing present at the scene of the accident.
* 0. None within 50 metres
* 1. Control by school crossing patrol
* 2. Control by other authorised person 

> - ```Pedestrian_Crossing-Physical_Facilities``` the physical pedestrian crossing facilities present at the scene of the accident.
* 0. No physical crossing facility within 50 metres
* 1. Zebra crossing
* 4. Pelican, puffin, toucan or similar non-junction pedestrian light crossing
* 5. Pedestrian phase at traffic signal junction
* 7. Footbridge or subway
* 8. Central refuge - no other controls 

> - ```Police_Force``` the police force responsible for the area.

> - ```Road_Surface_Conditions``` condition of the road when the accident took place.

> - ```Special_Conditions_at_Site``` were there any other factors which could have caused the accident, e.g. oil on the road, faulty traffic lights etc.

> - ```Speed_Limit``` speed limit where the accident took place. 

> - ```Time``` the time the accident took place.

> - ```Urban_or_Rural_Area``` was the location rural or urban.

> - ```Weather_Conditions``` what was the weather like at the time of the accident.

> - ```Year``` the year the accident took place.

> - ```InScotland``` did the accident take place in Scotland

There is more detailed information on these definitions [here](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/995423/stats20-2011.pdf). 
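Several of the features above are described by numeric codes. The sketch below shows one way to turn such codes into readable labels with pandas, using the code lists given above. It assumes the columns are stored as integer codes, which may not hold for every export of the data; the sample rows are made up.

```python
import pandas as pd

# Code-to-label mappings copied from the feature list above.
POLICE_ATTENDED = {
    1: "Yes",
    2: "No",
    3: "No, the accident was reported by a self-completion form",
}
HUMAN_CONTROL = {
    0: "None within 50 metres",
    1: "Control by school crossing patrol",
    2: "Control by other authorised person",
}

# Made-up sample rows; assumes the columns hold integer codes.
sample = pd.DataFrame({
    "Did_Police_Officer_Attend_Scene_of_Accident": [1, 3, 2, 1],
    "Pedestrian_Crossing-Human_Control": [0, 1, 0, 2],
})

decoded = sample.copy()
decoded["Did_Police_Officer_Attend_Scene_of_Accident"] = (
    decoded["Did_Police_Officer_Attend_Scene_of_Accident"].map(POLICE_ATTENDED)
)
decoded["Pedestrian_Crossing-Human_Control"] = (
    decoded["Pedestrian_Crossing-Human_Control"].map(HUMAN_CONTROL)
)

# Codes missing from a mapping become NaN, so this count flags unexpected values.
unmapped = decoded.isna().sum()

print(decoded)
print(unmapped)
```

Keeping the mappings in plain dictionaries makes it easy to check them against the official definitions linked above.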









___
# 1- Data Cleaning

Data cleaning is a critically important step in any machine learning project. Many of its steps overlap with **exploratory data analysis**, **feature selection**, **feature engineering**, **data transformation**, and **dimensionality reduction**. Our goal at this step is to get the data tidy, i.e. fix missing values, check for duplicates, correct data types, and fix dates and times. We will break this section down into two sub-sections, ***Data Insight*** and ***Data Preprocessing***.
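The tidying steps named here could look roughly like the sketch below. It is an illustration on made-up rows, not the project's actual pipeline: the `clean` helper and the `Datetime` column are hypothetical names, and the date and time formats are assumptions that should be checked against the real files.

```python
import pandas as pd

# Made-up rows with one exact duplicate, one missing time, one missing speed limit.
raw = pd.DataFrame({
    "Accident_Index": ["A1", "A1", "A2", "A3"],
    "Date": ["2015-01-31", "2015-01-31", "2015-02-01", "2015-02-03"],
    "Time": ["08:15", "08:15", None, "23:40"],
    "Speed_Limit": ["30", "30", "60", None],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop duplicates, fix types, build a timestamp."""
    out = df.drop_duplicates().copy()

    # Correct data types: speed limits arrive as text in this toy example.
    out["Speed_Limit"] = pd.to_numeric(out["Speed_Limit"], errors="coerce")

    # Fix dates and times: join the two text columns and parse them.
    # The format string is an assumption; a missing Time yields NaT.
    combined = out["Date"].str.cat(out["Time"], sep=" ")
    out["Datetime"] = pd.to_datetime(
        combined, format="%Y-%m-%d %H:%M", errors="coerce"
    )
    return out


cleaned = clean(raw)

# Count missing values per column to decide how to handle them later.
missing = cleaned.isna().sum()

print(cleaned)
print(missing)
```

Missing values are only counted here, not filled or dropped, since the right treatment depends on the feature and is better decided after the data insight step.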
___
## 1.1- Data Insight

**The common steps for data insight are:**

- Check out the **head** and **tail** of the data.
- **Study** and **understand** every feature in the data: what exactly it represents and how it is linked to every other feature and to the target variable.
- **Check the data types** of every feature, because machine learning models need the data in numerical form; i.e. if we have categorical variables, we need a strategy to encode them that keeps the model's complexity and the bias-variance trade-off in check.
- **Look at the summary statistics** of the data to eyeball the variables that have odd distributions.
- **Visually look at the distribution** of the data to get some insight about each feature, and take notes on all the features we have to fix in our next step, i.e. **Data Preprocessing**.
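
A minimal sketch of these checks with pandas, on a small made-up frame, is shown below. The column names follow the feature list, but the values are invented, and `value_counts` is used as a text-based stand-in for a distribution plot.

```python
import pandas as pd

# Made-up rows; in practice this would be the loaded accident table.
df = pd.DataFrame({
    "Accident_Severity": ["Slight", "Slight", "Serious", "Slight", "Fatal", "Slight"],
    "Number_of_Vehicles": [2, 1, 2, 3, 1, 2],
    "Speed_Limit": [30, 30, 60, 40, 70, 30],
})

# Head and tail of the data.
print(df.head(3))
print(df.tail(3))

# Data types: non-numeric columns will need an encoding strategy later.
print(df.dtypes)
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Summary statistics for the numeric columns.
summary = df.describe()
print(summary)

# Distribution of the target: the share of each severity class.
class_share = df["Accident_Severity"].value_counts(normalize=True)
print(class_share)
```

Looking at the class shares early is useful because an imbalanced target affects both the choice of model and the choice of evaluation metric.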