# Task overview

**Goal: Reduce the number of accidents in 2017 based on data from 2016**

1. Question 1. Data Understanding
    * Report how many rows and columns of data have been loaded for each one of the files
    * Report a general description of the different features that can be found in Accidents.csv
    * Select and justify what features will be kept and which ones will not be used from Accidents.csv
2. Question 2. Exploratory Analysis
    * Plot histograms and other exploratory graphs for insights
    * Using the roadsafetykey file, explain any patterns found and any hypotheses you would have
    * Get simple statistics that could help understand the distribution of the data
    * Get a rough sense of which areas or types of road have the most accidents (map plot)
3. Question 3. Decide Approach
    * Explain what different models could be used to reach the goal of reducing the number of traffic accidents
    * Identify pros, cons and how the effectiveness of this approach could be measured
4. Question 4. Data Transformation
    * Select and justify features used for the approach and the size of data sample
    * Perform transformation required and explain them
5. Question 5. Modelling and results
    * Build supervised, unsupervised or semi-supervised model and explain the results

Datasets have been moved to the 'data' folder.

In [3]:
# Imports
import pandas as pd

In [22]:
def createDataFrame(name:str) -> pd.DataFrame:
    return pd.read_csv('data/{}'.format(name))

def getNumberColsRows(df:pd.DataFrame):
    print('Number of rows: {}'.format(df.shape[0]))
    print('Number of columns: {}'.format(df.shape[1]))

def getBasicInformation(df:pd.DataFrame):
    print(df.shape)
    # info() prints its own report and returns None, so it is not wrapped in print()
    df.info()
    print(df.describe())
    print('Duplicated rows')
    print(df.duplicated().sum())
    print('Missing values per column')
    print(df.isna().sum())

In [12]:
accidents = createDataFrame('Accidents2016.csv')
casualties = createDataFrame('Casualties2016.csv')
makemodel = createDataFrame('MakeModel2016.csv')
vehicles = createDataFrame('Vehicles2016.csv')

In [23]:
getNumberColsRows(accidents)

Number of rows: 136621
Number of columns: 32
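
The task asks for row and column counts for every file, while the cell above reports only `accidents`. A minimal sketch of how the counts for several frames could be collected into one table follows. `summarize_shapes` is a hypothetical helper (not part of the original notebook), and the toy frames only stand in for the real ones loaded from the 'data' folder.

```python
import pandas as pd


def summarize_shapes(frames: dict) -> pd.DataFrame:
    """Return one row per DataFrame with its row and column counts."""
    records = [
        {"file": name, "rows": df.shape[0], "columns": df.shape[1]}
        for name, df in frames.items()
    ]
    return pd.DataFrame(records).set_index("file")


# Toy frames for illustration only (assumption: the real frames are already loaded).
demo_frames = {
    "a": pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}),
    "b": pd.DataFrame({"z": [0.5]}),
}
shape_summary = summarize_shapes(demo_frames)
print(shape_summary)
```

In the notebook, the same helper could be called with the loaded frames, for example `summarize_shapes({'accidents': accidents, 'casualties': casualties, 'makemodel': makemodel, 'vehicles': vehicles})`.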


In [24]:
accidents.columns

Index(['Accident_Index', 'Location_Easting_OSGR', 'Location_Northing_OSGR',
       'Longitude', 'Latitude', 'Police_Force', 'Accident_Severity',
       'Number_of_Vehicles', 'Number_of_Casualties', 'Date', 'Day_of_Week',
       'Time', 'Local_Authority_(District)', 'Local_Authority_(Highway)',
       '1st_Road_Class', '1st_Road_Number', 'Road_Type', 'Speed_limit',
       'Junction_Detail', 'Junction_Control', '2nd_Road_Class',
       '2nd_Road_Number', 'Pedestrian_Crossing-Human_Control',
       'Pedestrian_Crossing-Physical_Facilities', 'Light_Conditions',
       'Weather_Conditions', 'Road_Surface_Conditions',
       'Special_Conditions_at_Site', 'Carriageway_Hazards',
       'Urban_or_Rural_Area', 'Did_Police_Officer_Attend_Scene_of_Accident',
       'LSOA_of_Accident_Location'],
      dtype='object')
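
The `Date` and `Time` columns listed above are separate, which makes hour-of-day or month-level exploration awkward. The sketch below combines them into a single timestamp. It assumes dates are formatted as `dd/mm/yyyy` and times as `HH:MM`, which should be checked against the actual file. `add_timestamp` and the new `Timestamp` and `Hour` columns are hypothetical names introduced for illustration.

```python
import pandas as pd


def add_timestamp(df: pd.DataFrame,
                  date_col: str = "Date",
                  time_col: str = "Time",
                  fmt: str = "%d/%m/%Y %H:%M") -> pd.DataFrame:
    """Return a copy of df with a combined Timestamp column and the hour of day."""
    out = df.copy()
    combined = out[date_col].astype(str) + " " + out[time_col].astype(str)
    # Rows that do not match the assumed format become NaT instead of raising.
    out["Timestamp"] = pd.to_datetime(combined, format=fmt, errors="coerce")
    out["Hour"] = out["Timestamp"].dt.hour
    return out


# Toy data for illustration only; the second row has a missing time.
demo = pd.DataFrame({"Date": ["01/11/2016", "25/12/2016"],
                     "Time": ["02:30", None]})
stamped = add_timestamp(demo)
print(stamped)
```

Using `errors="coerce"` keeps rows with a missing or malformed time in the frame, so they can be counted and inspected rather than silently dropped.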

In [31]:
accidents['Junction_Detail'].unique()

array([ 0,  9,  1,  3,  7,  6,  2,  5,  8, -1])
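
The `-1` in the output above is not one of the Junction_Detail codes described in this notebook. Assuming it is a placeholder for missing or out-of-range data (an assumption to confirm against the roadsafetykey file), it could be converted to a proper missing value so that `isna()` counts it. `mark_missing` is a hypothetical helper, shown here on toy data.

```python
import numpy as np
import pandas as pd


def mark_missing(df: pd.DataFrame, columns: list, sentinel: int = -1) -> pd.DataFrame:
    """Return a copy of df where `sentinel` in the given columns becomes NaN."""
    out = df.copy()
    out[columns] = out[columns].replace(sentinel, np.nan)
    return out


# Toy data for illustration only.
demo = pd.DataFrame({"Junction_Detail": [0, 9, 1, -1, 3, -1]})
cleaned = mark_missing(demo, ["Junction_Detail"])
print(cleaned["Junction_Detail"].isna().sum())
```

Note that replacing integers with NaN turns the column into a float column, and that the original frame is left unchanged.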

## Accidents.csv columns description: 
1. Location_Easting/Northing: \
10-digit Ordnance Survey grid reference. The first digit defines the national 100 kilometre square, the second the 10 kilometre square, the third the 1 kilometre square, the fourth the 100 metre square, and the fifth the 10 metre square.
2. Longitude/Latitude:\
Geographic coordinates of the accident location.
3. Police_Force:\
Code number of the police force in whose area the accident occurred.
4. Accident_Severity:\
1 - Fatal\
2 - Serious\
3 - Slight
5. Number_of_Vehicles:\
Number of vehicle records submitted for the accident
6. Number_of_Casualties:\
Number of casualty records submitted for the accident
7. Date:\
Date of the accident
8. Day_of_Week:\
Day of the week on which the accident occurred
9. Time:\
Time of day of the accident
10. Local_Authority_(District):\
Number of the local authority in whose area the accident occurred
11. Local_Authority_(Highway):\
?
12. 1st_Road_Class:\
1 - Motorway\
2 - A(M)\
3 - A\
4 - B\
5 - C\
6 - Unclassified
13. 1st_Road_Number:\
Number of the road whose class was entered
14. Road_Type:\
1 - Roundabout\
2 - One way street\
3 - Dual carriageway\
6 - Single carriageway\
7 - Slip road\
9 - Unknown
15. Speed_limit:\
General speed limit, in mph, applicable to the road on which the accident occurred.
16. Junction_Detail:\
00 - Not at or within 20 metres of junction\
01 - Roundabout\
02 - Mini roundabout\
03 - T or staggered junction\
05 - Slip road\
06 - Crossroads\
07 - Multiple junction\
08 - Using private drive or entrance\
09 - Other junction
17. 2nd_Road_Class:\
For junction accidents only\
1 - Motorway\
2 - A(M)\
3 - A\
4 - B\
5 - C\
6 - Unclassified
18. 2nd_Road_Number:\
Number of the road whose class was entered
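
The code tables above can be turned into lookup dictionaries so that plots and summaries show labels instead of integers. The following is a minimal sketch: the dictionaries copy the Accident_Severity and Road_Type codes listed in this section, `decode_column` is a hypothetical helper, and the frame is toy data rather than the real accidents file.

```python
import pandas as pd

# Codes copied from the column descriptions in this notebook.
ACCIDENT_SEVERITY = {1: "Fatal", 2: "Serious", 3: "Slight"}
ROAD_TYPE = {
    1: "Roundabout",
    2: "One way street",
    3: "Dual carriageway",
    6: "Single carriageway",
    7: "Slip road",
    9: "Unknown",
}


def decode_column(series: pd.Series, labels: dict) -> pd.Series:
    """Map integer codes to labels; codes missing from `labels` become NaN."""
    return series.map(labels)


# Toy data for illustration only.
demo = pd.DataFrame({"Accident_Severity": [3, 3, 2, 1],
                     "Road_Type": [6, 3, 6, -1]})
demo["Severity_Label"] = decode_column(demo["Accident_Severity"], ACCIDENT_SEVERITY)
demo["Road_Type_Label"] = decode_column(demo["Road_Type"], ROAD_TYPE)
print(demo["Severity_Label"].value_counts())
```

Because `Series.map` returns NaN for any code that is not in the dictionary, values such as `-1` show up as missing labels, which makes undocumented codes easy to spot.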