<h1>UK Road Accidents Data Analysis</h1>
<hr>
<h2>Data Analyst: Kim Andrei D. Lugatoc</h2>

<h2>Import Libraries</h2>

In [1]:
import numpy as np 
import matplotlib.pyplot as plt
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

<h2>Load dataset</h2>

In [3]:
accidents = pd.read_csv('datasets\\uk_road_accident.csv')
accidents

Unnamed: 0,Index,Accident_Severity,Accident Date,Latitude,Light_Conditions,District Area,Longitude,Number_of_Casualties,Number_of_Vehicles,Road_Surface_Conditions,Road_Type,Urban_or_Rural_Area,Weather_Conditions,Vehicle_Type
0,200701BS64157,Serious,5/6/2019,51.506187,Darkness - lights lit,Kensington and Chelsea,-0.209082,1,2,Dry,Single carriageway,Urban,Fine no high winds,Car
1,200701BS65737,Serious,2/7/2019,51.495029,Daylight,Kensington and Chelsea,-0.173647,1,2,Wet or damp,Single carriageway,Urban,Raining no high winds,Car
2,200701BS66127,Serious,26-08-2019,51.517715,Darkness - lighting unknown,Kensington and Chelsea,-0.210215,1,3,Dry,,Urban,,Taxi/Private hire car
3,200701BS66128,Serious,16-08-2019,51.495478,Daylight,Kensington and Chelsea,-0.202731,1,4,Dry,Single carriageway,Urban,Fine no high winds,Bus or coach (17 or more pass seats)
4,200701BS66837,Slight,3/9/2019,51.488576,Darkness - lights lit,Kensington and Chelsea,-0.192487,1,2,Dry,,Urban,,Other vehicle
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
660674,201091NM01760,Slight,18-02-2022,57.374005,Daylight,Highland,-3.467828,2,1,Dry,Single carriageway,Rural,Fine no high winds,Car
660675,201091NM01881,Slight,21-02-2022,57.232273,Darkness - no lighting,Highland,-3.809281,1,1,Frost or ice,Single carriageway,Rural,Fine no high winds,Car
660676,201091NM01935,Slight,23-02-2022,57.585044,Daylight,Highland,-3.862727,1,3,Frost or ice,Single carriageway,Rural,Fine no high winds,Car
660677,201091NM01964,Serious,23-02-2022,57.214898,Darkness - no lighting,Highland,-3.823997,1,2,Wet or damp,Single carriageway,Rural,Fine no high winds,Motorcycle over 500cc


<h3>Checking for Null Values</h3>

In [None]:
accidents.isnull().sum()

In [None]:
accidents['Latitude'] = accidents['Latitude'].fillna(accidents['Latitude'].mean())
accidents['Longitude'] = accidents['Longitude'].fillna(accidents['Longitude'].mean())
accidents['Road_Surface_Conditions'] = accidents['Road_Surface_Conditions'].fillna('unaccounted')
accidents['Road_Type'] = accidents['Road_Type'].fillna(accidents['Road_Type'].mode()[0])
accidents['Urban_or_Rural_Area'] = accidents['Urban_or_Rural_Area'].fillna(accidents['Urban_or_Rural_Area'].mode()[0])
accidents['Weather_Conditions'] = accidents['Weather_Conditions'].fillna(accidents['Weather_Conditions'].mode()[0])

accidents.isnull().sum()
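
The fill strategy above can be tried on a small frame before running it on the full dataset. This is a minimal sketch, assuming a hypothetical `toy` DataFrame; only the column names come from the dataset, the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the dataset.
toy = pd.DataFrame({
    "Latitude": [51.5, np.nan, 57.5],
    "Road_Type": ["Single carriageway", None, "Single carriageway"],
    "Road_Surface_Conditions": ["Dry", None, "Wet or damp"],
})

missing_before = toy.isnull().sum()

# Numeric column: fill with the column mean.
toy["Latitude"] = toy["Latitude"].fillna(toy["Latitude"].mean())
# Categorical column: fill with the most frequent value (the mode).
toy["Road_Type"] = toy["Road_Type"].fillna(toy["Road_Type"].mode()[0])
# Categorical column with no sensible default: use an explicit placeholder.
toy["Road_Surface_Conditions"] = toy["Road_Surface_Conditions"].fillna("unaccounted")

missing_after = toy.isnull().sum()
print(missing_before.to_dict())
print(missing_after.to_dict())
```

Comparing the null counts before and after confirms that the fill cell actually ran.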

<h3>Cleaning inconsistencies in the dataset</h3>

In [None]:
# Normalise the date strings: cast to str, trim whitespace, use '-' as the only separator
accidents['Accident Date'] = accidents['Accident Date'].astype('str').str.strip()
accidents['Accident Date'] = accidents['Accident Date'].str.replace('/', '-')

In [11]:
accidents['Accident Date'] = pd.to_datetime(accidents['Accident Date'], dayfirst= True, errors= 'coerce')
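
With `errors='coerce'`, any string that does not match the expected layout silently becomes `NaT`, so it is worth counting the failures right after parsing. This is a minimal sketch, assuming a hypothetical sample `raw` that mixes the two separators seen in the raw table. The explicit `format` is a choice made for this sketch, not something the notebook uses.

```python
import pandas as pd

# Hypothetical sample mixing the two separators seen in the raw table.
raw = pd.Series(["5/6/2019", "2/7/2019", "26-08-2019", "16-08-2019", "not a date"])

# Normalise to one separator, then parse with an explicit day-first format
# so every row is read the same way.
normalised = raw.str.strip().str.replace("/", "-", regex=False)
parsed = pd.to_datetime(normalised, format="%d-%m-%Y", errors="coerce")

# Rows that failed to parse become NaT; count them before analysing dates.
n_failed = int(parsed.isna().sum())
print(parsed)
print("unparsed rows:", n_failed)
```

If the number of unparsed rows is large, the date-based counts later in the notebook only describe part of the data.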

<h3>Changing data types to category</h3>

In [12]:
accidents.dtypes

Index                              object
Accident_Severity                  object
Accident Date              datetime64[ns]
Latitude                          float64
Light_Conditions                   object
District Area                      object
Longitude                         float64
Number_of_Casualties                int64
Number_of_Vehicles                  int64
Road_Surface_Conditions            object
Road_Type                          object
Urban_or_Rural_Area                object
Weather_Conditions                 object
Vehicle_Type                       object
dtype: object

<h3>Categorical Data Fields</h3>

In [14]:
accidents['Accident_Severity'] = accidents['Accident_Severity'].astype('category')
accidents['Light_Conditions'] = accidents['Light_Conditions'].astype('category')
accidents['District Area'] = accidents['District Area'].astype('category')
accidents['Road_Surface_Conditions'] = accidents['Road_Surface_Conditions'].astype('category')
accidents['Road_Type'] = accidents['Road_Type'].astype('category')
accidents['Urban_or_Rural_Area'] = accidents['Urban_or_Rural_Area'].astype('category')
accidents['Weather_Conditions'] = accidents['Weather_Conditions'].astype('category')
accidents['Vehicle_Type'] = accidents['Vehicle_Type'].astype('category')

# check for changes
accidents.dtypes

Index                              object
Accident_Severity                category
Accident Date              datetime64[ns]
Latitude                          float64
Light_Conditions                 category
District Area                    category
Longitude                         float64
Number_of_Casualties                int64
Number_of_Vehicles                  int64
Road_Surface_Conditions          category
Road_Type                        category
Urban_or_Rural_Area              category
Weather_Conditions               category
Vehicle_Type                     category
dtype: object
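
The same conversion can be written once for a list of columns. This is a minimal sketch on a hypothetical `toy` frame with made-up values, which also compares memory use before and after the conversion.

```python
import pandas as pd

# Hypothetical toy frame; column names follow the dataset, values are made up.
toy = pd.DataFrame({
    "Accident_Severity": ["Slight", "Serious", "Slight", "Fatal"] * 250,
    "Urban_or_Rural_Area": ["Urban", "Rural", "Urban", "Urban"] * 250,
    "Number_of_Casualties": [1, 2, 1, 3] * 250,
})

categorical_columns = ["Accident_Severity", "Urban_or_Rural_Area"]

bytes_before = int(toy.memory_usage(deep=True).sum())
# Convert every listed column in one call instead of one line per column.
toy = toy.astype({column: "category" for column in categorical_columns})
bytes_after = int(toy.memory_usage(deep=True).sum())

print(toy.dtypes)
print(bytes_before, bytes_after)
```

Columns with few distinct values, like the ones converted above, generally take less memory as `category` than as strings.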

<h3>Adding date components to the dataset so we can extract date information</h3>

In [66]:
accidents['Year'] = accidents['Accident Date'].dt.year
accidents['Month'] = accidents['Accident Date'].dt.month
accidents['Day'] = accidents['Accident Date'].dt.day
accidents['DayOfWeek'] = accidents['Accident Date'].dt.dayofweek #MONDAY=0 #SUNDAY=6

accidents.dtypes

Index                              object
Accident_Severity                category
Accident Date              datetime64[ns]
Latitude                          float64
Light_Conditions                 category
District Area                    category
Longitude                         float64
Number_of_Casualties                int64
Number_of_Vehicles                  int64
Road_Surface_Conditions          category
Road_Type                        category
Urban_or_Rural_Area              category
Weather_Conditions               category
Vehicle_Type                     category
Year                              float64
Month                             float64
Day                               float64
DayOfWeek                         float64
dtype: object
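
The new columns are `float64` because some dates are missing (`NaT`), and NumPy integer columns cannot hold missing values. This is a minimal sketch, on hypothetical dates, of keeping whole numbers with the nullable `Int64` dtype and adding readable weekday names with `dt.day_name()`.

```python
import pandas as pd

# Hypothetical dates, including one missing value (NaT).
dates = pd.Series(pd.to_datetime(["2019-06-05", "2019-08-26", None]))

parts = pd.DataFrame({
    # Missing dates force float64; the nullable Int64 dtype keeps whole numbers.
    "Year": dates.dt.year.astype("Int64"),
    "Month": dates.dt.month.astype("Int64"),
    "DayOfWeek": dates.dt.dayofweek.astype("Int64"),  # Monday=0, Sunday=6
    # Readable weekday labels; missing where the date is missing.
    "DayName": dates.dt.day_name(),
})
print(parts)
```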

<h1>Questions</h1>

<h3>1. How many accidents are recorded in the dataset?</h3>

In [15]:
accidents['Index'].count()

np.int64(660679)

<h3>Insight 1: As shown above, there are 660,679 recorded accidents in this dataset.</h3>

<h3>2. What is the most common accident severity?</h3>

In [18]:
accidents['Accident_Severity'].mode()

0    Slight
Name: Accident_Severity, dtype: category
Categories (3, object): ['Fatal', 'Serious', 'Slight']

<h3>Insight 2: As shown above, the most common accident severity is "Slight".</h3>
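
`mode()` only names the most frequent category. This is a minimal sketch, on a hypothetical `severity` Series with made-up values, of using `value_counts()` to see each category's count and share as well.

```python
import pandas as pd

# Hypothetical severity values, for illustration only.
severity = pd.Series(
    ["Slight", "Slight", "Serious", "Slight", "Fatal"], dtype="category"
)

counts = severity.value_counts()                # absolute counts per category
shares = severity.value_counts(normalize=True)  # fraction of all rows
most_common = counts.idxmax()                   # label with the highest count

print(counts)
print(shares.round(3))
print(most_common)
```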

<h3>3. What is the most common road surface condition during accidents?</h3>

In [19]:
accidents['Road_Surface_Conditions'].mode()

0    Dry
Name: Road_Surface_Conditions, dtype: category
Categories (5, object): ['Dry', 'Flood over 3cm. deep', 'Frost or ice', 'Snow', 'Wet or damp']

<h3>Insight 3: The most common road surface condition during accidents is Dry. This suggests that accidents do not only happen on wet or icy roads; normal road conditions can be risky too.</h3>

<h3>4. What are the most common vehicle types involved in accidents?</h3>

In [21]:
vehicle_counts = accidents.groupby('Vehicle_Type')['Index'].count()
vehicle_counts

Vehicle_Type
Agricultural vehicle                       1947
Bus or coach (17 or more pass seats)      25878
Car                                      497992
Data missing or out of range                  6
Goods 7.5 tonnes mgw and over             17307
Goods over 3.5t. and under 7.5t            6096
Minibus (8 - 16 passenger seats)           1976
Motorcycle 125cc and under                15269
Motorcycle 50cc and under                  7603
Motorcycle over 125cc and up to 500cc      7656
Motorcycle over 500cc                     25657
Other vehicle                              5637
Pedal cycle                                 197
Ridden horse                                  4
Taxi/Private hire car                     13294
Van / Goods 3.5 tonnes mgw or under       34160
Name: Index, dtype: int64

<h3>Insight 4: Cars are by far the most common vehicle type involved in accidents (497,992 records). Vans / goods vehicles of 3.5 tonnes or under (34,160), buses or coaches (25,878), motorcycles over 500cc (25,657), goods vehicles of 7.5 tonnes and over (17,307), motorcycles of 125cc and under (15,269) and taxis / private hire cars (13,294) are also frequently recorded.</h3>
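
Sorting the counts makes the ranking easier to read than the alphabetical `groupby` output. This is a minimal sketch using a few of the counts shown in the output for question 4; the Series is typed in by hand here rather than computed from the dataset.

```python
import pandas as pd

# A handful of counts copied from the groupby output for question 4.
vehicle_counts = pd.Series({
    "Bus or coach (17 or more pass seats)": 25878,
    "Car": 497992,
    "Motorcycle over 500cc": 25657,
    "Taxi/Private hire car": 13294,
    "Van / Goods 3.5 tonnes mgw or under": 34160,
})

# Largest first, so the most common vehicle types appear at the top.
top_three = vehicle_counts.nlargest(3)
print(top_three)
```

On the real data, `vehicle_counts.sort_values(ascending=False)` would give the full ranking in the same way.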

<h3>5. How many accidents happened in Urban vs Rural areas?</h3>

In [24]:
urban_rural = accidents.groupby('Urban_or_Rural_Area')['Index'].count()
urban_rural

Urban_or_Rural_Area
Rural          238990
Unallocated        11
Urban          421663
Name: Index, dtype: int64

<h3>Insight 5: As shown above, Urban areas have 421,663 recorded accidents, while Rural areas have 238,990; a further 11 are Unallocated. Most accidents occur in Urban areas rather than Rural ones.</h3>
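
Percentages make the urban/rural split easier to compare than raw counts. This is a minimal sketch that re-enters the three counts from the output above by hand and converts them to shares of the rows that have an area recorded.

```python
import pandas as pd

# Counts copied from the groupby output above.
urban_rural = pd.Series({"Rural": 238990, "Unallocated": 11, "Urban": 421663})

total = int(urban_rural.sum())
# Share of each area among rows with an area value, in percent.
share_pct = urban_rural / total * 100

print(total)
print(share_pct.round(1))
```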

<h3>6. Which districts recorded the most accidents?</h3>

In [45]:
accidents.groupby('District Area')['Index'].count().nlargest(10)

District Area
Birmingham          13491
Leeds                8898
Manchester           6720
Bradford             6212
Sheffield            5710
Westminster          5706
Liverpool            5587
Glasgow City         4942
Bristol, City of     4819
Kirklees             4690
Name: Index, dtype: int64

<h3>Insight 6: Birmingham had the most accidents, with 13,491 recorded, well ahead of Leeds (8,898) and Manchester (6,720).</h3>
<h3>Insight 7: Most of the top districts are big cities, which makes sense: more cars, more people and heavier traffic mean accidents are more likely to happen there.</h3>

<h3>7. Which road surface condition has the highest number of accidents?</h3>

In [52]:
accidents.groupby('Road_Surface_Conditions')['Index'].count()

Road_Surface_Conditions
Dry                     447821
Flood over 3cm. deep      1017
Frost or ice             18517
Snow                      5890
Wet or damp             186708
Name: Index, dtype: int64

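<h3>Insight 8: Most accidents happen on Dry roads (447,821), followed by Wet or damp roads (186,708). This suggests that accidents are not only caused by bad road conditions; other factors such as driver behavior likely play a role as well.</h3>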

<h3>8. How many accidents happened under different light conditions?</h3>

In [55]:
accidents.groupby('Light_Conditions').size()

Light_Conditions
Darkness - lighting unknown      6484
Darkness - lights lit          129335
Darkness - lights unlit          2543
Darkness - no lighting          37437
Daylight                       484880
dtype: int64

<h3>Insight 9: Most accidents happen in daylight (484,880), which may be due to higher traffic volumes during the day rather than poor visibility at night.</h3>

<h3>9. What is the most common weather condition during accidents?</h3>

In [59]:
accidents.groupby('Weather_Conditions')['Index'].count()

Weather_Conditions
Fine + high winds          8554
Fine no high winds       520885
Fog or mist                3528
Other                     17150
Raining + high winds       9615
Raining no high winds     79696
Snowing + high winds        885
Snowing no high winds      6238
Name: Index, dtype: int64

<h3>Insight 10: The most common weather condition during accidents is fine weather with no high winds, with 520,885 recorded accidents. This shows that good weather does not guarantee safety; accident counts may be more strongly linked to traffic and driver behavior than to poor weather.</h3>
<h3>Insight 11: The second most common weather condition is Raining no high winds (79,696). This may reflect the added risk of reduced visibility and slippery roads.</h3>

<h3>10. What is the average number of casualties per accident?</h3>

In [63]:
accidents['Number_of_Casualties'].mean()

np.float64(1.357040257068864)

<h3>Insight 12: On average, each accident involves about 1.36 casualties, showing that while a typical accident results in few casualties, the numbers add up significantly across all incidents.</h3>

<h3>11. What is the average number of vehicles involved in accidents across all records?</h3>

In [65]:
accidents['Number_of_Vehicles'].mean()

np.float64(1.8312554205597575)

<h3>Insight 13: On average, each accident involves about 1.83 vehicles, which suggests that a typical accident involves one or two vehicles.</h3>
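
A mean alone hides the shape of the distribution. This is a small sketch, using made-up values in a hypothetical `toy` frame, of summarising both count columns with the mean, median and maximum in one call.

```python
import pandas as pd

# Hypothetical values; only the column names come from the dataset.
toy = pd.DataFrame({
    "Number_of_Casualties": [1, 1, 2, 1, 3],
    "Number_of_Vehicles": [2, 1, 2, 3, 2],
})

# One row per statistic, one column per variable.
summary = toy.agg(["mean", "median", "max"])
print(summary)
```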

<h3>12. Which year had the most reported accidents?</h3>

In [70]:
accidents.groupby('Year')['Index'].count()

Year
2019.0    71867
2020.0    70163
2021.0    66172
2022.0    56805
Name: Index, dtype: int64

<h3>Insight 14: 2019 had the highest number of reported accidents (71,867), suggesting road safety issues were most pressing before the pandemic. Note that the yearly counts sum to 265,007, well below the 660,679 records in the dataset, so rows whose date failed to parse (NaT under errors='coerce') are excluded from this and the other date-based counts.</h3>
<h3>Insight 15: Accidents decreased in 2020 (70,163), likely influenced by COVID-19 lockdowns.</h3>
<h3>Insight 16: 2021 continued the downward trend (66,172), which may indicate that even with partial reopening, road usage was still lower than before the pandemic.</h3>
<h3>Insight 17: 2022 recorded the fewest accidents (56,805), which could indicate lasting effects of pandemic lifestyle changes or improved safety measures.</h3>
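
By default `groupby` drops rows whose key is missing, so the yearly counts only cover rows with a parsed date. This is a minimal sketch, on a hypothetical `toy` frame, of making the missing group visible with `dropna=False` and measuring how much of the data the date-based counts cover.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two of the six rows have no parsed year.
toy = pd.DataFrame({
    "Index": list("abcdef"),
    "Year": [2019.0, 2019.0, 2020.0, np.nan, np.nan, 2022.0],
})

# Default behaviour: rows with a missing Year are silently left out.
dated_only = toy.groupby("Year")["Index"].count()
# dropna=False keeps a separate NaN group, so the total matches the row count.
with_missing = toy.groupby("Year", dropna=False)["Index"].count()

n_missing = int(toy["Year"].isna().sum())
coverage = float(toy["Year"].notna().mean())  # fraction of rows with a year

print(dated_only)
print(with_missing)
print(n_missing, round(coverage, 3))
```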

<h3>13. Which month across all years recorded the highest number of accidents?</h3>

In [71]:
accidents.groupby('Month')['Index'].count()

Month
1.0     18252
2.0     22264
3.0     21824
4.0     19787
5.0     21723
6.0     22196
7.0     22939
8.0     21106
9.0     22558
10.0    23962
11.0    24240
12.0    24156
Name: Index, dtype: int64

<h3>Insight 18: November (24,240 accidents) recorded the highest number of accidents. This could be linked to seasonal changes such as early winter conditions, shorter daylight hours, or increased traffic due to holiday preparations.</h3>

<h3>Insight 19: January (18,252 accidents) had the lowest number of accidents. This might be because of reduced travel after the holiday season or colder weather discouraging long-distance trips.</h3>

<h3>14. Which day of the week had the most reported accidents?</h3>

In [73]:
accidents.groupby('DayOfWeek')['Index'].count() # MONDAY = 0, SUNDAY = 6

DayOfWeek
0.0    28564
1.0    38714
2.0    40037
3.0    39641
4.0    39822
5.0    43164
6.0    35065
Name: Index, dtype: int64

<h3>Insight 20: Saturday (5) recorded the highest number of accidents with 43,164 cases. This suggests that weekends, when more people may be traveling for leisure, social activities, or long drives, are riskier times on the road.</h3>

<h3>Insight 21: Monday (0) had the lowest number of accidents with 28,564 cases. This could mean that at the start of the work week, people may be more cautious, or traffic patterns are more predictable compared to weekends.</h3>
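
Weekday numbers are easier to read as names. This is a minimal sketch that re-enters the day-of-week counts from the output above by hand and relabels them with the standard library's `calendar.day_name`. It assumes the default English locale, where index 0 is Monday.

```python
import calendar

import pandas as pd

# Counts copied from the groupby output above (Monday=0 ... Sunday=6).
by_day = pd.Series([28564, 38714, 40037, 39641, 39822, 43164, 35065], index=range(7))

# Replace the numeric index with weekday names.
by_day.index = [calendar.day_name[day_number] for day_number in by_day.index]

busiest = by_day.idxmax()
quietest = by_day.idxmin()
print(by_day)
print(busiest, quietest)
```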