<h5>Importing libraries</h5>

In [None]:
import pandas as pd
import numpy as np
from scipy.stats import f_oneway
import warnings
warnings.filterwarnings('ignore')

<h3>Importing the dataset and loading it into a DataFrame</h3>

In [None]:
uk = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/datasets/uk_road_accident.csv')

In [None]:
uk

In [None]:
uk.info()

<h3>Changing the data types of the columns</h3>

In [None]:
uk.dtypes

In [None]:
uk['Index'] = uk['Index'].astype('category')
uk['Accident_Severity'] = uk['Accident_Severity'].astype('category')
uk['Accident Date'] = uk['Accident Date'].astype('category')
uk['Light_Conditions'] = uk['Light_Conditions'].astype('category')
uk['District Area'] = uk['District Area'].astype('category')
uk['Road_Surface_Conditions'] = uk['Road_Surface_Conditions'].astype('category')
uk['Road_Type'] = uk['Road_Type'].astype('category')
uk['Urban_or_Rural_Area'] = uk['Urban_or_Rural_Area'].astype('category')
uk['Weather_Conditions'] = uk['Weather_Conditions'].astype('category')
uk['Vehicle_Type'] = uk['Vehicle_Type'].astype('category')

In [None]:
uk.dtypes
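
<p>The repeated <code>astype('category')</code> calls above can also be written as a single call. The following is a minimal sketch on a made-up toy frame; the name <code>categorical_cols</code> and the sample values are assumptions for illustration, not part of the dataset.</p>

```python
import pandas as pd

# Toy frame standing in for the accident data (values are made up).
df = pd.DataFrame({
    "Accident_Severity": ["Slight", "Serious", "Slight"],
    "Road_Type": ["Single carriageway", "Roundabout", "Single carriageway"],
    "Number_of_Casualties": [1, 2, 1],
})

# Hypothetical list of columns to treat as categorical.
categorical_cols = ["Accident_Severity", "Road_Type"]

# astype accepts a {column: dtype} mapping; other columns keep their dtype.
df = df.astype({col: "category" for col in categorical_cols})

print(df.dtypes)
```

<p>Passing a mapping keeps the list of categorical columns in one place, which makes it easier to add or remove a column later.</p>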

<h3>Cleaning missing data</h3>

In [None]:
uk.isna().sum()

<h5>Filling missing numerical data</h5>

In [None]:
uk['Latitude'] = uk['Latitude'].fillna(uk['Latitude'].mean())
uk['Longitude'] = uk['Longitude'].fillna(uk['Longitude'].mean())

In [None]:
uk.isna().sum()

<h5>Filling missing non-numerical data</h5>

In [None]:
uk['Road_Surface_Conditions'] = uk['Road_Surface_Conditions'].fillna(uk['Road_Surface_Conditions'].mode()[0])
uk['Road_Type'] = uk['Road_Type'].fillna(uk['Road_Type'].mode()[0])
uk['Urban_or_Rural_Area'] = uk['Urban_or_Rural_Area'].fillna(uk['Urban_or_Rural_Area'].mode()[0])
uk['Weather_Conditions'] = uk['Weather_Conditions'].fillna(uk['Weather_Conditions'].mode()[0])

In [None]:
uk.isna().sum()
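
<p>The two filling steps above (mean for numerical columns, mode for categorical columns) can be combined in one loop. This is a sketch on made-up toy data, assuming every non-numeric column should receive its most frequent value.</p>

```python
import numpy as np
import pandas as pd

# Toy frame with one numeric and one categorical column (values are made up).
df = pd.DataFrame({
    "Latitude": [51.5, np.nan, 52.5],
    "Road_Type": pd.Series(
        ["Single carriageway", None, "Single carriageway"], dtype="category"
    ),
})

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # Numeric column: fill gaps with the column mean.
        df[col] = df[col].fillna(df[col].mean())
    else:
        # Non-numeric column: fill gaps with the most frequent value.
        df[col] = df[col].fillna(df[col].mode()[0])

remaining = int(df.isna().sum().sum())
print(remaining)
```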

<h1>Exploratory Data Analysis</h1>

<h1>1. What percentage of the accident reports does each accident severity represent, and which severity has the highest percentage?</h1>

In [None]:
uk['Accident_Severity'].unique()

In [None]:
severity = uk['Accident_Severity']

In [None]:
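# value_counts() sorts by count in descending order, so this unpacking assumes Slight > Serious > Fatal
Slight, Serious, Fatal = severity.value_counts()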

<p>Percentage of Slight category</p>

In [None]:
(Slight / len(uk)) * 100

<p>Slight severity accounts for <strong>85%</strong> of overall reports</p>

<p>Percentage of Serious category</p>

In [None]:
(Serious / len(uk)) * 100

<p>Serious severity accounts for <strong>13%</strong> of overall reports</p>

<p>Percentage of Fatal category</p>

In [None]:
(Fatal / len(uk)) * 100

<p>Fatal severity accounts for <strong>1%</strong> of overall reports</p>

<h1>Insight #1: With the calculation above, we found that Slight accident severity accounts for 85.33% of the overall reports, while Serious and Fatal account for 13.35% and 1.31% respectively</h1>
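
<p>The percentages can also be looked up by label instead of by position. The following is a minimal sketch on a made-up toy Series; the counts are assumptions chosen for illustration and are not the dataset's figures.</p>

```python
import pandas as pd

# Toy severity column (counts are made up for illustration).
severity = pd.Series(
    ["Slight"] * 7 + ["Serious"] * 2 + ["Fatal"] * 1,
    name="Accident_Severity",
)

# normalize=True returns proportions; multiply by 100 for percentages.
shares = severity.value_counts(normalize=True).mul(100).round(2)

# Label-based lookup does not depend on the sort order of value_counts().
slight_pct = shares["Slight"]
print(shares)
```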

<h1>2. Do the weather condition and vehicle type affect the number of casualties? If so, which weather condition and vehicle type recorded the highest number of casualties?</h1>

In [None]:
pd.set_option('display.max_colwidth', 500)

In [None]:
# .size() counts the reports in each group; it does not add up the casualties
weatherandvehicle = uk.groupby(['Weather_Conditions', 'Vehicle_Type'])['Number_of_Casualties'].size()

In [None]:
uk['Weather_Conditions'].value_counts()

<h3>Breakdown</h3>

In [None]:
# Fine and no high winds
weatherandvehicle['Fine no high winds'].sort_values(ascending = False)

In [None]:
# raining no high winds
weatherandvehicle['Raining no high winds'].sort_values(ascending = False)

In [None]:
# raining+ high winds
weatherandvehicle['Raining + high winds'].sort_values(ascending = False)

In [None]:
# Fine + high winds
weatherandvehicle['Fine + high winds'].sort_values(ascending = False)

In [None]:
# snowing no high winds
weatherandvehicle['Snowing no high winds'].sort_values(ascending = False)

In [None]:
# Fog or mist
weatherandvehicle['Fog or mist'].sort_values(ascending = False)

In [None]:
# Snowing + high winds
weatherandvehicle['Snowing + high winds'].sort_values(ascending = False)

<h1>Insight #2:</h1>
<h5>Cars consistently dominated the vehicle types in the reports. They remained the most frequently involved vehicle type regardless of the weather condition. Surprisingly, contrary to the popular belief that wet roads cause more accidents, the data shows that 500000 of the overall reports occurred in dry weather.</h5>
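
<p>The groupby above uses <code>.size()</code>, which counts reports per group. To compare that with the total number of casualties per group, both can be computed side by side. This is a sketch on made-up toy data; the result column names <code>reports</code> and <code>casualties</code> are assumptions.</p>

```python
import pandas as pd

# Toy data (values are made up for illustration).
df = pd.DataFrame({
    "Weather_Conditions": ["Fine", "Fine", "Raining", "Raining", "Raining"],
    "Vehicle_Type": ["Car", "Car", "Car", "Van", "Car"],
    "Number_of_Casualties": [1, 4, 2, 1, 3],
})

# Named aggregation: one column for the number of reports,
# one for the summed casualties in each group.
summary = df.groupby(["Weather_Conditions", "Vehicle_Type"]).agg(
    reports=("Number_of_Casualties", "size"),
    casualties=("Number_of_Casualties", "sum"),
)

print(summary.sort_values("casualties", ascending=False))
```

<p>When most reports have a single casualty the two columns rank groups similarly, but they answer different questions, so it helps to state which one an insight refers to.</p>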

<h1>Question #3: Which is safer, Urban or Rural areas? Determine whether the light condition and road type of an area affect the number of casualties it records.</h1>

In [None]:
uk.info()

In [None]:
urbanorrural = uk.groupby(['Urban_or_Rural_Area','Light_Conditions','Road_Type'])['Number_of_Casualties'].count()

<h3>Breakdown</h3>

In [None]:
# Rural
urbanorrural['Rural']

In [None]:
urbanorrural['Urban']

In [None]:
# Checking, group by group, whether Rural recorded a higher count than Urban
(urbanorrural['Rural'] > urbanorrural['Urban']).value_counts()

<p>We could say here that Rural is safer than Urban, but we still need to compare the totals for each area</p>

In [None]:
urbanorrural['Rural'].sum()

In [None]:
urbanorrural['Urban'].sum()
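
<p>The Urban and Rural figures can also be placed side by side so that each light condition and road type is compared on one row. This is a sketch on made-up toy data; the column name <code>Rural_higher</code> is an assumption, and <code>fill_value=0</code> assumes a missing combination means zero casualties.</p>

```python
import pandas as pd

# Toy data (values are made up for illustration).
df = pd.DataFrame({
    "Urban_or_Rural_Area": ["Urban", "Urban", "Rural", "Rural", "Urban"],
    "Light_Conditions": [
        "Daylight", "Daylight", "Daylight",
        "Darkness - no lighting", "Darkness - lights lit",
    ],
    "Road_Type": ["Single carriageway"] * 5,
    "Number_of_Casualties": [1, 2, 3, 1, 1],
})

# Sum casualties per group, then pivot the area level into columns.
side_by_side = (
    df.groupby(["Urban_or_Rural_Area", "Light_Conditions", "Road_Type"])
      ["Number_of_Casualties"]
      .sum()
      .unstack("Urban_or_Rural_Area", fill_value=0)
)

# Row-wise comparison and overall totals per area.
side_by_side["Rural_higher"] = side_by_side["Rural"] > side_by_side["Urban"]
totals = side_by_side[["Rural", "Urban"]].sum()
print(side_by_side)
print(totals)
```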

<p>Now, let's determine which road type and light condition in each area recorded the highest number of casualties</p>

In [None]:
urbanorrural['Rural'].sort_values(ascending=False).head(n=5)

In [None]:
urbanorrural['Urban'].sort_values(ascending=False).head(n=5)

<h1>Insight #3:</h1>
<p>Daylight recorded the highest number of casualties. This may suggest that drivers are more careful at night and do not take extra care during the daytime. Also, single carriageway road types recorded the highest number of casualties.</p>

<h1>Question #4: Traffic is one of the main sources of accidents. Determine which district areas recorded accidents involving more than 5 cars</h1>

In [None]:
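# Note: this filter uses Number_of_Casualties; use 'Number_of_Vehicles' to filter by vehicles involved
more_than_5 = uk[uk['Number_of_Casualties'] > 5]
# Count the matching reports per district, highest first
more_than_5.groupby('District Area')['Number_of_Casualties'].count().sort_values(ascending=False)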

<h1>Insight #4:</h1>
<p>Birmingham district recorded 70 reports matching the more-than-5 filter used above, the highest of any district</p>
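
<p>The cell above filters on <code>Number_of_Casualties</code>, while the question is phrased in terms of vehicles. The sketch below shows both filters on made-up toy data so the two readings can be compared; the district names and values are assumptions for illustration only.</p>

```python
import pandas as pd

# Toy data (districts and values are made up for illustration).
df = pd.DataFrame({
    "District Area": ["Birmingham", "Birmingham", "Leeds", "Leeds"],
    "Number_of_Vehicles": [6, 2, 7, 1],
    "Number_of_Casualties": [1, 8, 2, 1],
})

# Reports involving more than 5 vehicles, counted per district.
many_vehicles = df[df["Number_of_Vehicles"] > 5]["District Area"].value_counts()

# Reports with more than 5 casualties, counted per district.
many_casualties = df[df["Number_of_Casualties"] > 5]["District Area"].value_counts()

print(many_vehicles)
print(many_casualties)
```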

<h1>Question #5: Since Birmingham district has the highest number of casualties involving more than 5 cars, determine what type of road surface condition was recorded there.</h1>

In [None]:
birmingham = uk[uk['District Area'] == 'Birmingham']
birmingham_road_surface = birmingham[birmingham['Number_of_Casualties'] > 5]['Road_Surface_Conditions']
birmingham_road_surface.value_counts()

<h1>Insight #5:</h1>
<p>This strengthens our earlier conclusion that a wet surface is not the main source of accidents</p>

<h1>Question #6: Which district areas have the highest number of fatal accidents?</h1>

In [None]:
Fatal_per_district = uk[uk['Accident_Severity'] == 'Fatal']['District Area'].value_counts()

In [None]:
Fatal_per_district.sort_values(ascending=False).head(n=10)

<h1>Insight #6:</h1>
<p>Birmingham district has the highest number of Fatal accidents</p>

<h1>Question #7: What are the most common vehicle types involved in Fatal accidents?</h1>

In [None]:
Vehicle_fatal = uk[uk['Accident_Severity'] == 'Fatal']['Vehicle_Type'].value_counts()

In [None]:
Vehicle_fatal.sort_values(ascending=False).head(n=10)

<h1>Insight #7:</h1>
<p>Car remains the number one vehicle type involved in fatal accidents</p>

<h1>Question #8: Is there a relationship between latitude and longitude and the number of casualties?</h1>

In [None]:
uk['Number_of_Casualties'].corr(uk['Latitude'])

In [None]:
uk['Number_of_Casualties'].corr(uk['Longitude'])

<h1>Insight #8:</h1>
<p>There is no meaningful linear relationship between longitude or latitude and the number of casualties</p>
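
<p><code>Series.corr</code> returns only the correlation coefficient. To also get a p-value for it, <code>scipy.stats.pearsonr</code> can be used. This is a sketch on randomly generated toy data (an assumption, not the dataset), so its output says nothing about the real accident records.</p>

```python
import numpy as np
from scipy.stats import pearsonr

# Randomly generated stand-ins for the two columns (not real data).
rng = np.random.default_rng(0)
latitude = rng.uniform(50, 58, size=500)
casualties = rng.integers(1, 5, size=500)

# pearsonr returns the coefficient and a two-sided p-value.
r, p_value = pearsonr(latitude, casualties)
print(f"r = {r:.3f}, p = {p_value:.3f}")
```

<p>Pearson correlation only measures linear association, so a value near zero does not rule out other kinds of dependence.</p>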

<h1>Question #9: Is there a correlation between the number of vehicles involved and the number of casualties?</h1>

In [None]:
uk['Number_of_Casualties'].corr(uk['Number_of_Vehicles'])

<h1>Insight #9: No, there is no correlation between Number of vehicles involved and the number of casualties</h1>

<h1>Question #10: Which road surface conditions are prevalent during serious accidents?</h1>

In [None]:
serious_accident = uk[uk['Accident_Severity'] == 'Serious']
surface_condition_serious = serious_accident['Road_Surface_Conditions'].value_counts()
surface_condition_serious.sort_values(ascending=False)

<h1>Insight #10:</h1>
<p>Dry surfaces are prevalent during serious accidents</p>

<h1>Question #11: What are the most common combinations of weather conditions and light conditions during accidents?</h1>

In [None]:
weather_and_light = uk.groupby(['Weather_Conditions','Light_Conditions'])['Number_of_Casualties'].count()
weather_and_light.sort_values(ascending=False).head(n=10)

<h1>Insight #11:</h1>
<p>Fine no high winds and daylight is the most common combination of weather and light conditions during accidents</p>

<h1>Question #12: What are the most frequent 'Vehicle_Type' involved in accidents under 'Darkness - lights lit' conditions?</h1>

In [None]:
darkness_lights_lit = uk[uk['Light_Conditions'] == 'Darkness - lights lit']
darkness_lights_lit['Vehicle_Type'].value_counts()

<h1>Insight #12: </h1>
<p>Car is the most common vehicle under Darkness - Lights lit condition</p>

<h1>Question #13: What is the distribution of Road types within Urban Areas compared to Rural Areas?</h1>

In [None]:
urban_road_types = uk[uk['Urban_or_Rural_Area']== "Urban"]['Road_Type']

In [None]:
urban_road_types.value_counts()

In [None]:
rural_road_types = uk[uk['Urban_or_Rural_Area'] == 'Rural']['Road_Type']

In [None]:
rural_road_types.value_counts()

<h1>Insight #13:</h1>
<p>According to the data, Urban areas have twice as many reports on Single carriageway roads as Rural areas. While there is no big difference for Dual carriageway, the gaps between Urban and Rural areas in the number of roundabout, slip road, and one way street reports are pronounced.</p>
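
<p>Because Urban and Rural areas have different numbers of reports, comparing shares can be easier than comparing raw counts. The following is a minimal sketch on made-up toy data using <code>pd.crosstab</code> with row-wise normalization.</p>

```python
import pandas as pd

# Toy data (values are made up for illustration).
df = pd.DataFrame({
    "Urban_or_Rural_Area": ["Urban", "Urban", "Urban", "Urban", "Rural", "Rural"],
    "Road_Type": [
        "Single carriageway", "Single carriageway", "Roundabout",
        "Dual carriageway", "Single carriageway", "Dual carriageway",
    ],
})

# normalize="index" makes each row (area) sum to 1; scale to percent.
road_share = pd.crosstab(
    df["Urban_or_Rural_Area"], df["Road_Type"], normalize="index"
) * 100

print(road_share.round(1))
```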

<h1>Question #14: Is there a significant difference in the number of vehicles involved in accidents between Urban and Rural areas considering the road type?</h1>

In [None]:
urban_vehicles_involved = uk[uk['Urban_or_Rural_Area'] == 'Urban']['Number_of_Vehicles']

In [None]:
rural_vehicles_involved = uk[uk['Urban_or_Rural_Area'] == 'Rural']['Number_of_Vehicles']

<p>Combining and creating a new column for percentage difference</p>

In [None]:
vehicle_counts = pd.DataFrame({'Urban': urban_vehicles_involved.value_counts(), 'Rural': rural_vehicles_involved.value_counts()})
vehicle_counts['Percentage_Difference'] = ((vehicle_counts['Urban'] - vehicle_counts['Rural']) / vehicle_counts['Rural']) * 100

In [None]:
vehicle_counts

<h1>Insight #14:</h1>
<p>According to the data above, there is a significant difference when the number of vehicles is below 3, but the difference starts to decline once the number of vehicles goes above 4</p>
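
<p>A percentage difference describes the gap but does not test whether it is statistically significant. One possible approach, offered here as an assumption rather than the notebook's method, is a chi-square test of independence on the area by vehicle-count table. The sketch uses made-up toy data; with counts this small the test is not reliable and only illustrates the call.</p>

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data (values are made up for illustration).
df = pd.DataFrame({
    "Urban_or_Rural_Area": ["Urban"] * 6 + ["Rural"] * 6,
    "Number_of_Vehicles": [1, 1, 2, 2, 2, 3, 1, 2, 2, 3, 3, 3],
})

# Contingency table: rows are areas, columns are vehicle counts.
table = pd.crosstab(df["Urban_or_Rural_Area"], df["Number_of_Vehicles"])

# Returns the statistic, p-value, degrees of freedom and expected counts.
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, dof = {dof}")
```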


<h1>Question #15: Which district areas have the highest number of accidents under "Wet or damp" surface conditions?</h1>

In [None]:
Accident_per_district_area = uk[uk['Road_Surface_Conditions'] == 'Wet or damp']['District Area'].value_counts()

In [None]:
Accident_per_district_area.sort_values(ascending=False).head(n=10)

<h1>Insight #15:</h1>
<p>Birmingham district recorded the highest number of accidents under Wet or damp surface conditions</p>

<h1>Question #16: What are the weather conditions in the top 10 district areas with the highest number of fatal accidents?</h1>

In [None]:
top_ten_district_area_fatal = uk[uk['Accident_Severity'] == 'Fatal']

In [None]:
top_ten_district_area_fatal.groupby(['District Area','Weather_Conditions'])['Number_of_Casualties'].count().sort_values(ascending=False).head(n=10)

<h1>Insight #16:</h1>
<p>Fine no high winds is the weather condition in the top ten district areas with the most fatal accidents</p>
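
<p>The cell above ranks district and weather pairs across all fatal accidents. To first restrict to the districts with the most fatal accidents and then break those down by weather, the steps could look like the sketch below. The toy data, the placeholder district names and the variable <code>top_n</code> are assumptions; the notebook's question would use 10.</p>

```python
import pandas as pd

# Toy data (placeholder districts and made-up values).
df = pd.DataFrame({
    "District Area": ["A", "A", "A", "B", "B", "C"],
    "Accident_Severity": ["Fatal", "Fatal", "Slight", "Fatal", "Fatal", "Fatal"],
    "Weather_Conditions": [
        "Fine no high winds", "Raining no high winds", "Fine no high winds",
        "Fine no high winds", "Fine no high winds", "Fog or mist",
    ],
})

fatal = df[df["Accident_Severity"] == "Fatal"]

# Step 1: districts with the most fatal accidents.
top_n = 2  # hypothetical; set to 10 for the question as asked
top_districts = fatal["District Area"].value_counts().head(top_n).index

# Step 2: weather breakdown within those districts only.
weather_in_top = (
    fatal[fatal["District Area"].isin(top_districts)]
    .groupby(["District Area", "Weather_Conditions"])
    .size()
    .sort_values(ascending=False)
)
print(weather_in_top)
```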

<h1>Question #17: Is there a relation between Road surface condition and weather condition when it comes to the number of casualties?</h1>

In [None]:
from scipy.stats import f_oneway
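
<p>One way <code>f_oneway</code> could be applied here is a one-way ANOVA of casualties across road surface groups; the same pattern would apply to weather conditions. This is a sketch on made-up toy data and an assumption about the intended analysis, not a statement of the notebook's method.</p>

```python
import pandas as pd
from scipy.stats import f_oneway

# Toy data (values are made up for illustration).
df = pd.DataFrame({
    "Road_Surface_Conditions": ["Dry"] * 4 + ["Wet or damp"] * 4 + ["Snow"] * 4,
    "Number_of_Casualties": [1, 1, 2, 1, 2, 3, 2, 3, 1, 2, 1, 2],
})

# One array of casualty counts per road surface condition.
groups = [
    g["Number_of_Casualties"].to_numpy()
    for _, g in df.groupby("Road_Surface_Conditions")
]

# One-way ANOVA: tests whether the group means are all equal.
f_stat, p_value = f_oneway(*groups)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

<p>A one-way ANOVA looks at one factor at a time, so it does not capture an interaction between road surface and weather.</p>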