# Study of road accidents 🚗

> Author: Luiza Kuze <br> Date: 27/01/2024

### Source

`https://www.kaggle.com/datasets/nextmillionaire/car-accident-dataset/data`

### Database Information

- These are accidents recorded in January 2021;
- Location: Kensington and Chelsea.
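Both statements can be checked directly on the loaded data. The sketch below is minimal and builds a small hypothetical `sample` frame from the preview rows further down so that it runs on its own. In the notebook, use `accidents` instead. It assumes `Accident Date` is stored as month/day/year text.

```python
import pandas as pd

# Hypothetical sample built from the preview rows; in the notebook, use `accidents`.
sample = pd.DataFrame({
    "Accident Date": ["1/1/2021", "1/5/2021", "1/4/2021", "1/5/2021", "1/6/2021"],
    "Local_Authority_(District)": ["Kensington and Chelsea"] * 5,
})

# Assumption: dates are month/day/year strings.
dates = pd.to_datetime(sample["Accident Date"], format="%m/%d/%Y")

# Period covered by the data.
print(dates.min().date(), dates.max().date())

# Number of accidents per district.
print(sample["Local_Authority_(District)"].value_counts())
```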

```python
#@title importing libraries
import numpy as np
import pandas as pd
```

```python
#@title file reading
uri = './RoadAccidentData.csv'
accidents = pd.read_csv(uri)
```

```python
#@title initial data view
accidents.head()
```

| | Accident_Index | Accident Date | Day_of_Week | Junction_Control | Junction_Detail | Accident_Severity | Latitude | Light_Conditions | Local_Authority_(District) | Carriageway_Hazards | ... | Number_of_Casualties | Number_of_Vehicles | Police_Force | Road_Surface_Conditions | Road_Type | Speed_limit | Time | Urban_or_Rural_Area | Weather_Conditions | Vehicle_Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200901BS70001 | 1/1/2021 | Thursday | Give way or uncontrolled | T or staggered junction | Serious | 51.512273 | Daylight | Kensington and Chelsea | NaN | ... | 1 | 2 | Metropolitan Police | Dry | One way street | 30 | 15:11 | Urban | Fine no high winds | Car |
| 1 | 200901BS70002 | 1/5/2021 | Monday | Give way or uncontrolled | Crossroads | Serious | 51.514399 | Daylight | Kensington and Chelsea | NaN | ... | 11 | 2 | Metropolitan Police | Wet or damp | Single carriageway | 30 | 10:59 | Urban | Fine no high winds | Taxi/Private hire car |
| 2 | 200901BS70003 | 1/4/2021 | Sunday | Give way or uncontrolled | T or staggered junction | Slight | 51.486668 | Daylight | Kensington and Chelsea | NaN | ... | 1 | 2 | Metropolitan Police | Dry | Single carriageway | 30 | 14:19 | Urban | Fine no high winds | Taxi/Private hire car |
| 3 | 200901BS70004 | 1/5/2021 | Monday | Auto traffic signal | T or staggered junction | Serious | 51.507804 | Daylight | Kensington and Chelsea | NaN | ... | 1 | 2 | Metropolitan Police | Frost or ice | Single carriageway | 30 | 8:10 | Urban | Other | Motorcycle over 500cc |
| 4 | 200901BS70005 | 1/6/2021 | Tuesday | Auto traffic signal | Crossroads | Serious | 51.482076 | Darkness - lights lit | Kensington and Chelsea | NaN | ... | 1 | 2 | Metropolitan Police | Dry | Single carriageway | 30 | 17:25 | Urban | Fine no high winds | Car |
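The `...` column in the preview is pandas collapsing columns to fit the display. Judging by the dtypes listing, the hidden column is `Longitude`. The sketch below lifts that limit with the standard `display.max_columns` option. `sample` is a hypothetical wide frame used only for illustration.

```python
import pandas as pd

# Show every column instead of collapsing the middle ones into "...".
pd.set_option("display.max_columns", None)

# Hypothetical 25-column frame, used only to illustrate the setting.
sample = pd.DataFrame({f"col_{i}": [i] for i in range(25)})
print(sample.head())
```

To change the setting for a single output only, `pd.option_context("display.max_columns", None)` can be used as a context manager instead.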


```python
#@title analyzing data types in columns
accidents.dtypes
```

```
Accident_Index                 object
Accident Date                  object
Day_of_Week                    object
Junction_Control               object
Junction_Detail                object
Accident_Severity              object
Latitude                      float64
Light_Conditions               object
Local_Authority_(District)     object
Carriageway_Hazards            object
Longitude                     float64
Number_of_Casualties            int64
Number_of_Vehicles              int64
Police_Force                   object
Road_Surface_Conditions        object
Road_Type                      object
Speed_limit                     int64
Time                           object
Urban_or_Rural_Area            object
Weather_Conditions             object
Vehicle_Type                   object
dtype: object
```
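`Accident Date` and `Time` are stored as text (`object`), so they cannot yet be sorted or grouped as dates and times. The sketch below converts them. It assumes the dates are month/day/year, which fits the January 2021 rows in the preview, and that times are `H:MM` or `HH:MM`. `sample` and the derived `Hour` column are illustrative names, not part of the dataset.

```python
import pandas as pd

# Hypothetical sample with the same text formats as the preview rows.
sample = pd.DataFrame({
    "Accident Date": ["1/1/2021", "1/5/2021", "1/4/2021"],
    "Time": ["15:11", "8:10", None],
})

# Assumption: month/day/year, matching the January 2021 dates in the preview.
sample["Accident Date"] = pd.to_datetime(sample["Accident Date"], format="%m/%d/%Y")

# errors="coerce" turns missing or malformed times into NaT instead of raising.
times = pd.to_datetime(sample["Time"], format="%H:%M", errors="coerce")
sample["Hour"] = times.dt.hour  # missing times give NaN

print(sample.dtypes)
```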

```python
#@title number of rows and columns
accidents.shape
```

```
(307973, 21)
```

## Initial Treatment 🛠

```python
#@title checking for duplications
accidents.duplicated().any()
```

```
True
```
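Before dropping anything, it can help to inspect the duplicated rows themselves. The sketch below does this on a hypothetical `sample` frame; the IDs are made up.

```python
import pandas as pd

# Hypothetical sample: the second and third rows are identical.
sample = pd.DataFrame({
    "Accident_Index": ["ID_1", "ID_2", "ID_2", "ID_3"],
    "Speed_limit": [30, 30, 30, 40],
})

# keep=False flags every copy of a duplicated row, not just the later ones,
# so all occurrences can be compared side by side.
dupes = sample[sample.duplicated(keep=False)]
print(dupes)

# Default keep="first": counts only the extra copies, i.e. what drop_duplicates() removes.
print(sample.duplicated().sum())
```

Passing `subset=["Accident_Index"]` to `duplicated()` would instead flag rows that reuse an accident ID even when other columns differ.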

```python
#@title removing duplications and keeping only one occurrence
accidents_treated = accidents.drop_duplicates()
```

```python
#@title number of rows and columns after data processing
accidents_treated.shape
```

```
(307972, 21)
```
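Comparing the two shapes gives 307973 - 307972 = 1, so exactly one duplicated row was removed. The sketch below turns that bookkeeping into an explicit check on a hypothetical `sample` frame. By default, `drop_duplicates()` keeps the first occurrence.

```python
import pandas as pd

# Hypothetical sample with one repeated row.
sample = pd.DataFrame({"x": [1, 2, 2, 3], "y": ["a", "b", "b", "c"]})

treated = sample.drop_duplicates()  # keeps the first occurrence (keep="first")
removed = len(sample) - len(treated)

# The rows dropped should match the rows flagged as duplicates.
assert removed == sample.duplicated().sum()
print(removed)  # 1
```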

```python
#@title checking missing values
accidents.isnull().sum()
```

```
Accident_Index                     0
Accident Date                      0
Day_of_Week                        0
Junction_Control                   0
Junction_Detail                    0
Accident_Severity                  0
Latitude                           0
Light_Conditions                   0
Local_Authority_(District)         0
Carriageway_Hazards           302549
Longitude                          0
Number_of_Casualties               0
Number_of_Vehicles                 0
Police_Force                       0
Road_Surface_Conditions          317
Road_Type                       1534
Speed_limit                        0
Time                              17
Urban_or_Rural_Area                0
Weather_Conditions              6057
Vehicle_Type                       0
dtype: int64
```
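Raw counts are easier to judge as shares of the total. `Carriageway_Hazards` is missing in 302549 of 307973 rows (roughly 98%), while `Time` is missing in only 17. The sketch below computes the percentage per column on a hypothetical `sample` frame; the hazard label is illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the hazard label is illustrative.
sample = pd.DataFrame({
    "Carriageway_Hazards": [np.nan, np.nan, np.nan, "Other object on road"],
    "Time": ["15:11", None, "8:10", "17:25"],
})

# isnull() is True/False per cell; its column mean is the share of missing values.
missing_pct = sample.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct.round(2))
```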

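### About these NaN values

✅: a NaN is expected here (most accidents have no carriageway hazard); ❌: a NaN means the information is genuinely missing.
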
- **Carriageway_Hazards**: Describes any hazards present on the carriageway at the time of the accident. NaN: ✅
- **Road_Surface_Conditions**: Describes the surface conditions of the road at the time of the accident. NaN: ❌
- **Road_Type**: Specifies the type of road where the accident occurred. NaN: ❌
- **Time**: The time of day when the accident happened (format: HH:MM). NaN: ❌
- **Weather_Conditions**: Describes the weather conditions at the time of the accident. NaN: ❌
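One possible treatment assumes that ✅ marks a column where NaN simply means no hazard was recorded and that ❌ marks genuinely missing data. Under that reading, label the ✅ column explicitly and drop rows missing any ❌ value. This is a sketch, not the notebook's chosen method. The `"None"` label and the `sample` frame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mixing the five columns discussed above.
sample = pd.DataFrame({
    "Carriageway_Hazards": [np.nan, "Other object on road", np.nan],
    "Road_Surface_Conditions": ["Dry", np.nan, "Wet or damp"],
    "Road_Type": ["Single carriageway", "One way street", "Single carriageway"],
    "Time": ["15:11", "10:59", "8:10"],
    "Weather_Conditions": ["Fine no high winds", "Other", np.nan],
})

# ✅ column: NaN read as "no hazard", replaced by an explicit (assumed) label.
sample["Carriageway_Hazards"] = sample["Carriageway_Hazards"].fillna("None")

# ❌ columns: NaN is missing information; here those rows are dropped.
required = ["Road_Surface_Conditions", "Road_Type", "Time", "Weather_Conditions"]
clean = sample.dropna(subset=required)
print(clean)
```

Dropping is only one option. Filling a ❌ column with its most frequent value (`sample[col].fillna(sample[col].mode()[0])`) would keep the rows, at the cost of some bias.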