<h1>UK ROAD ACCIDENTS DATA ANALYSIS</h1>
<hr>
<h2>Inclusive Years: 2019-2022</h2>
<h3>Analyst: Maria Louizza B. Pajarillon</h3>

In [1]:
import numpy as np
import pandas as pd
from scipy.stats import f_oneway
from scipy import stats

In [2]:
pip install scipy



<h2>Converting CSV file into a Pandas DataFrame</h2>

In [3]:
accident = pd.read_csv('datasets\\uk_road_accident.csv')
accident

Unnamed: 0,Index,Accident_Severity,Accident_Date,Latitude,Light_Conditions,District_Area,Longitude,Number_of_Casualties,Number_of_Vehicles,Road_Surface_Conditions,Road_Type,Urban_or_Rural_Area,Weather_Conditions,Vehicle_Type
0,200701BS64157,Serious,05/06/2019,51.506187,Darkness - lights lit,Kensington and Chelsea,-0.209082,1,2,Dry,Single carriageway,Urban,Fine no high winds,Car
1,200701BS65737,Serious,02/07/2019,51.495029,Daylight,Kensington and Chelsea,-0.173647,1,2,Wet or damp,Single carriageway,Urban,Raining no high winds,Car
2,200701BS66127,Serious,26/08/2019,51.517715,Darkness - lighting unknown,Kensington and Chelsea,-0.210215,1,3,Dry,,Urban,,Taxi/Private hire car
3,200701BS66128,Serious,16/08/2019,51.495478,Daylight,Kensington and Chelsea,-0.202731,1,4,Dry,Single carriageway,Urban,Fine no high winds,Bus or coach (17 or more pass seats)
4,200701BS66837,Slight,03/09/2019,51.488576,Darkness - lights lit,Kensington and Chelsea,-0.192487,1,2,Dry,,Urban,,Other vehicle
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
660674,201091NM01760,Slight,18/02/2022,57.374005,Daylight,Highland,-3.467828,2,1,Dry,Single carriageway,Rural,Fine no high winds,Car
660675,201091NM01881,Slight,21/02/2022,57.232273,Darkness - no lighting,Highland,-3.809281,1,1,Frost or ice,Single carriageway,Rural,Fine no high winds,Car
660676,201091NM01935,Slight,23/02/2022,57.585044,Daylight,Highland,-3.862727,1,3,Frost or ice,Single carriageway,Rural,Fine no high winds,Car
660677,201091NM01964,Serious,23/02/2022,57.214898,Darkness - no lighting,Highland,-3.823997,1,2,Wet or damp,Single carriageway,Rural,Fine no high winds,Motorcycle over 500cc


<h2>Checking and Filling the Missing Values</h2>

In [4]:
accident.isnull().sum()

Index                          0
Accident_Severity              0
Accident_Date                  0
Latitude                      25
Light_Conditions               0
District_Area                  0
Longitude                     26
Number_of_Casualties           0
Number_of_Vehicles             0
Road_Surface_Conditions      726
Road_Type                   4520
Urban_or_Rural_Area           15
Weather_Conditions         14128
Vehicle_Type                   0
dtype: int64

In [5]:
# Latitude and Longitude are numeric; their few missing values are filled with the most frequent value (.mode())
accident['Latitude'] = accident['Latitude'].fillna(accident['Latitude'].mode()[0])
accident['Longitude'] = accident['Longitude'].fillna(accident['Longitude'].mode()[0])

# Categorical columns with many missing values: label them explicitly instead of imputing
accident['Road_Surface_Conditions'] = accident['Road_Surface_Conditions'].fillna('unknown road condition')
accident['Road_Type'] = accident['Road_Type'].fillna('unaccounted')
accident['Weather_Conditions'] = accident['Weather_Conditions'].fillna('unaccounted')

# Urban_or_Rural_Area has only 15 missing values, so fill them with the most frequent category
accident['Urban_or_Rural_Area'] = accident['Urban_or_Rural_Area'].fillna(accident['Urban_or_Rural_Area'].mode()[0])
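The mode of a continuous column such as Latitude is simply the single most repeated coordinate, so filling with it places every missing record at one real location. A minimal sketch of one alternative, on a small made-up frame (the name `toy` and all values are illustrative, not taken from the dataset): keep mode imputation for the categorical column, and drop the few rows that lack coordinates.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the accident table (values are made up)
toy = pd.DataFrame({
    'Latitude': [51.50, np.nan, 57.37, 52.45],
    'Longitude': [-0.20, -0.17, np.nan, -1.90],
    'Urban_or_Rural_Area': ['Urban', np.nan, 'Rural', 'Urban'],
})

# Categorical column: filling with the most frequent label is a common choice
area_mode = toy['Urban_or_Rural_Area'].mode()[0]
toy['Urban_or_Rural_Area'] = toy['Urban_or_Rural_Area'].fillna(area_mode)

# Coordinates: drop rows where either value is missing instead of inventing a location
toy_clean = toy.dropna(subset=['Latitude', 'Longitude'])
print(toy_clean)
```

Whether dropping or imputing is preferable depends on how the coordinates are used later; with 25 and 26 missing values out of 660,679 rows, either choice has little effect on the totals.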

In [6]:
#Missing value counts
accident.isnull().sum()

Index                      0
Accident_Severity          0
Accident_Date              0
Latitude                   0
Light_Conditions           0
District_Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
dtype: int64

<h2>Extracting date information using pandas date time</h2>

In [35]:
accident['Accident_Date'] = pd.to_datetime(accident['Accident_Date'], dayfirst = True, errors = 'coerce') 

In [36]:
accident['Accident_Date'] = accident['Accident_Date'].astype('str')
accident['Accident_Date'] = accident['Accident_Date'].str.strip()
accident['Accident_Date'] = accident['Accident_Date'].str.replace('/', '-')

In [38]:
accident['Accident_Date'] = pd.to_datetime(accident['Accident_Date'], dayfirst = True, errors = 'coerce') 
accident['Year'] = accident['Accident_Date'].dt.year
accident['Month'] = accident['Accident_Date'].dt.month
accident['Day'] = accident['Accident_Date'].dt.day
accident['DayofWeek'] = accident['Accident_Date'].dt.dayofweek
accident['Accident_Date'].value_counts()

Accident_Date
2021-11-02    685
2021-06-10    680
2019-06-12    678
2019-01-02    676
2021-04-12    667
             ... 
2022-09-01    236
2022-02-12    236
2022-05-04    231
2022-07-02    213
2022-10-01    123
Name: count, Length: 576, dtype: int64
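The raw dates are day-first strings such as 05/06/2019. A minimal sketch of a single-pass parse, assuming the raw file may mix `/` and `-` separators (that mix is an assumption, and the series `raw` below is made up): unify the separator on the raw strings first, then parse once with an explicit format so that ambiguous dates are always read day-first.

```python
import pandas as pd

# Hypothetical raw strings; the mix of separators is an assumption about the file
raw = pd.Series(['05/06/2019', '26-08-2019', '13/02/2022', 'not a date'])

# Unify the separator first, then parse once with an explicit day-first format
cleaned = raw.str.strip().str.replace('-', '/', regex=False)
parsed = pd.to_datetime(cleaned, format='%d/%m/%Y', errors='coerce')

print(parsed)
print('Unparsed rows:', parsed.isna().sum())
```

Counting the `NaT` values after parsing shows how many rows `errors='coerce'` silently discarded, which is worth checking before deriving Year, Month and Day columns.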

<h2>Changing Datatypes into Categorical Fields</h2>

In [8]:
accident.dtypes

Index                              object
Accident_Severity                  object
Accident_Date              datetime64[ns]
Latitude                          float64
Light_Conditions                   object
District_Area                      object
Longitude                         float64
Number_of_Casualties                int64
Number_of_Vehicles                  int64
Road_Surface_Conditions            object
Road_Type                          object
Urban_or_Rural_Area                object
Weather_Conditions                 object
Vehicle_Type                       object
Year                                int32
Month                               int32
Day                                 int32
DayofWeek                           int32
dtype: object

In [9]:
accident['Index'] = accident['Index'].astype('category')
accident['Accident_Severity'] = accident['Accident_Severity'].astype('category')
accident['Latitude'] = accident['Latitude'].astype('category')
accident['Light_Conditions'] = accident['Light_Conditions'].astype('category')
accident['District_Area'] = accident['District_Area'].astype('category')
accident['Longitude'] = accident['Longitude'].astype('category')
accident['Road_Surface_Conditions'] = accident['Road_Surface_Conditions'].astype('category')
accident['Road_Type'] = accident['Road_Type'].astype('category')
accident['Urban_or_Rural_Area'] = accident['Urban_or_Rural_Area'].astype('category')
accident['Weather_Conditions'] = accident['Weather_Conditions'].astype('category')
accident['Vehicle_Type'] = accident['Vehicle_Type'].astype('category')

accident.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 660679 entries, 0 to 660678
Data columns (total 18 columns):
 #   Column                   Non-Null Count   Dtype         
---  ------                   --------------   -----         
 0   Index                    660679 non-null  category      
 1   Accident_Severity        660679 non-null  category      
 2   Accident_Date            660679 non-null  datetime64[ns]
 3   Latitude                 660679 non-null  category      
 4   Light_Conditions         660679 non-null  category      
 5   District_Area            660679 non-null  category      
 6   Longitude                660679 non-null  category      
 7   Number_of_Casualties     660679 non-null  int64         
 8   Number_of_Vehicles       660679 non-null  int64         
 9   Road_Surface_Conditions  660679 non-null  category      
 10  Road_Type                660679 non-null  category      
 11  Urban_or_Rural_Area      660679 non-null  category      
 12  Weather_Conditio

In [10]:
accident.describe(include='all')

Unnamed: 0,Index,Accident_Severity,Accident_Date,Latitude,Light_Conditions,District_Area,Longitude,Number_of_Casualties,Number_of_Vehicles,Road_Surface_Conditions,Road_Type,Urban_or_Rural_Area,Weather_Conditions,Vehicle_Type,Year,Month,Day,DayofWeek
count,660679.0,660679,660679,660679.0,660679,660679,660679.0,660679.0,660679.0,660679,660679,660679,660679,660679,660679.0,660679.0,660679.0,660679.0
unique,421020.0,3,,511618.0,5,422,529766.0,,,6,6,3,9,16,,,,
top,2010000000000.0,Slight,,52.458798,Daylight,Birmingham,-0.977611,,,Dry,Single carriageway,Urban,Fine no high winds,Car,,,,
freq,239478.0,563801,,75.0,484880,13491,71.0,,,447821,492143,421678,520885,497992,,,,
mean,,,2020-11-30 08:30:32.761749760,,,,,1.35704,1.831255,,,,,,2020.40909,6.607965,15.58135,3.111195
min,,,2019-01-01 00:00:00,,,,,1.0,1.0,,,,,,2019.0,1.0,1.0,0.0
25%,,,2019-11-27 00:00:00,,,,,1.0,1.0,,,,,,2019.0,4.0,8.0,1.0
50%,,,2020-11-13 00:00:00,,,,,1.0,2.0,,,,,,2020.0,7.0,16.0,3.0
75%,,,2021-11-17 00:00:00,,,,,1.0,2.0,,,,,,2021.0,10.0,23.0,5.0
max,,,2022-12-31 00:00:00,,,,,68.0,32.0,,,,,,2022.0,12.0,31.0,6.0


In [11]:
accident

Unnamed: 0,Index,Accident_Severity,Accident_Date,Latitude,Light_Conditions,District_Area,Longitude,Number_of_Casualties,Number_of_Vehicles,Road_Surface_Conditions,Road_Type,Urban_or_Rural_Area,Weather_Conditions,Vehicle_Type,Year,Month,Day,DayofWeek
0,200701BS64157,Serious,2019-06-05,51.506187,Darkness - lights lit,Kensington and Chelsea,-0.209082,1,2,Dry,Single carriageway,Urban,Fine no high winds,Car,2019,6,5,2
1,200701BS65737,Serious,2019-07-02,51.495029,Daylight,Kensington and Chelsea,-0.173647,1,2,Wet or damp,Single carriageway,Urban,Raining no high winds,Car,2019,7,2,1
2,200701BS66127,Serious,2019-08-26,51.517715,Darkness - lighting unknown,Kensington and Chelsea,-0.210215,1,3,Dry,unaccounted,Urban,unaccounted,Taxi/Private hire car,2019,8,26,0
3,200701BS66128,Serious,2019-08-16,51.495478,Daylight,Kensington and Chelsea,-0.202731,1,4,Dry,Single carriageway,Urban,Fine no high winds,Bus or coach (17 or more pass seats),2019,8,16,4
4,200701BS66837,Slight,2019-09-03,51.488576,Darkness - lights lit,Kensington and Chelsea,-0.192487,1,2,Dry,unaccounted,Urban,unaccounted,Other vehicle,2019,9,3,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
660674,201091NM01760,Slight,2022-02-18,57.374005,Daylight,Highland,-3.467828,2,1,Dry,Single carriageway,Rural,Fine no high winds,Car,2022,2,18,4
660675,201091NM01881,Slight,2022-02-21,57.232273,Darkness - no lighting,Highland,-3.809281,1,1,Frost or ice,Single carriageway,Rural,Fine no high winds,Car,2022,2,21,0
660676,201091NM01935,Slight,2022-02-23,57.585044,Daylight,Highland,-3.862727,1,3,Frost or ice,Single carriageway,Rural,Fine no high winds,Car,2022,2,23,2
660677,201091NM01964,Serious,2022-02-23,57.214898,Darkness - no lighting,Highland,-3.823997,1,2,Wet or damp,Single carriageway,Rural,Fine no high winds,Motorcycle over 500cc,2022,2,23,2


In [12]:
accident.shape 

(660679, 18)

<h1>Q1. What is the average number of casualties per accident?</h1>

In [13]:
mean_casualties = accident['Number_of_Casualties'].mean()
mean_casualties

np.float64(1.357040257068864)

<H3>Insight: The average number of casualties per accident is approximately 1.36. The median and the 75th percentile are both 1, while the maximum is 68, so most accidents involve a single casualty and the distribution is right-skewed by a small number of high-casualty accidents.</H3>
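The mean alone says little about the shape of the distribution. A minimal sketch of a few complementary summaries, using made-up casualty counts (the series `casualties` is illustrative, not the notebook's data):

```python
import pandas as pd

# Made-up casualty counts for illustration only
casualties = pd.Series([1, 1, 1, 1, 2, 1, 3, 1, 1, 8])

summary = {
    'mean': casualties.mean(),
    'median': casualties.median(),
    'share_single': (casualties == 1).mean(),  # fraction of accidents with exactly one casualty
    'skew': casualties.skew(),                 # a positive value indicates a right-skewed distribution
}
print(summary)
```

On the real column the same calls would be applied to `accident['Number_of_Casualties']`.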

<H1>Q2. Is there a correlation between number of vehicles and casualties?</H1>

In [14]:
cor1 = accident['Number_of_Casualties'].corr(accident['Number_of_Vehicles'])
cor1

np.float64(0.2288888612692756)

<h3>Insight: There is a weak positive correlation of about 0.23 between the number of vehicles involved and the number of casualties. Multi-vehicle accidents tend to result in slightly more casualties, but vehicle count alone accounts for only about 5% of the variance in casualties (r² ≈ 0.05).</h3>
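Both columns are small, heavily tied counts, so it can help to report a rank-based correlation next to Pearson's r, along with r², the share of variance linearly associated with the other variable (0.229² is about 0.05). A minimal sketch on made-up data (the frame `toy` is illustrative):

```python
import pandas as pd

# Made-up vehicle and casualty counts for illustration only
toy = pd.DataFrame({
    'Number_of_Vehicles':   [1, 2, 2, 3, 1, 2, 4, 2],
    'Number_of_Casualties': [1, 1, 2, 2, 1, 1, 3, 1],
})

pearson = toy['Number_of_Vehicles'].corr(toy['Number_of_Casualties'])

# Spearman's rho is the Pearson correlation of the ranks (ties receive average ranks)
spearman = toy['Number_of_Vehicles'].rank().corr(toy['Number_of_Casualties'].rank())

# r squared: share of variance in one variable linearly associated with the other
r_squared = pearson ** 2
print(pearson, spearman, r_squared)
```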

<h1>Q3. Do different road surface conditions affect the number of casualties?</h1>

In [15]:
groups = [group['Number_of_Casualties'].dropna()
    for _, group in accident.groupby('Road_Surface_Conditions')]
stats.f_oneway(*groups)


F_onewayResult(statistic=np.float64(236.43804250579612), pvalue=np.float64(3.579703854070146e-253))

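<h3>Insight: The one-way ANOVA shows a statistically significant difference in the mean number of casualties across road surface conditions (F ≈ 236.4, p ≈ 3.6e-253). The test compares casualty counts, not accident severity, and with roughly 660,000 records even small differences between group means reach significance, so the result does not by itself show that the effect is large.</h3>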
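A p-value says whether group means differ, not by how much. One common effect-size measure for one-way ANOVA is eta squared, the between-group sum of squares divided by the total sum of squares. A minimal sketch on made-up data (the frame `toy` is illustrative; this is not a result from the dataset):

```python
import pandas as pd

# Made-up data: casualties under two road surface conditions
toy = pd.DataFrame({
    'Road_Surface_Conditions': ['Dry', 'Dry', 'Dry', 'Wet or damp', 'Wet or damp', 'Wet or damp'],
    'Number_of_Casualties':    [1, 1, 2, 1, 2, 3],
})

y = toy['Number_of_Casualties']
grand_mean = y.mean()
grouped = toy.groupby('Road_Surface_Conditions')['Number_of_Casualties']

# Between-group sum of squares: group size times squared distance of the group mean from the grand mean
ss_between = (grouped.count() * (grouped.mean() - grand_mean) ** 2).sum()
ss_total = ((y - grand_mean) ** 2).sum()

# Eta squared: proportion of total variance attributable to the grouping
eta_squared = ss_between / ss_total
print(round(eta_squared, 3))
```

Values near 0 mean the grouping explains little of the variation even when the F-test is highly significant.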

<h1>Q4. Which months have the highest number of accidents?</h1>

In [16]:
accident['Month'] = accident['Accident_Date'].dt.month 
monthly_counts = accident['Month'].value_counts().sort_index()
monthly_counts

Month
1     52872
2     49491
3     54086
4     51744
5     56352
6     56481
7     57445
8     53913
9     56455
10    59580
11    60424
12    51836
Name: count, dtype: int64

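<h2>Insight: Accidents peak in November (60,424) and October (59,580), with a secondary high in July (57,445). February (49,491) has the fewest, followed by April (51,744) and December (51,836). These are totals over 2019-2022 and are not adjusted for the number of days in each month, so part of February's low count reflects its length; seasonal factors and traffic patterns may explain the rest.</h2>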
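The busiest and quietest months can be read off programmatically rather than by eye. A minimal sketch that rebuilds the monthly counts from the output above and ranks them:

```python
import pandas as pd

# Monthly counts copied from the output above
monthly_counts = pd.Series(
    [52872, 49491, 54086, 51744, 56352, 56481, 57445, 53913, 56455, 59580, 60424, 51836],
    index=range(1, 13), name='count',
)

top3 = monthly_counts.nlargest(3)   # busiest months
lowest = monthly_counts.idxmin()    # quietest month
share = monthly_counts / monthly_counts.sum() * 100  # each month's percentage of all accidents

print(top3)
print('Lowest month:', lowest)
print(share.round(2))
```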

<h1> Q5. Which district has the highest average accident severity?</h1>

In [17]:
severity_counts = accident.groupby(['District_Area', 'Accident_Severity']).size().unstack(fill_value=0)
print(severity_counts.head())


Accident_Severity  Fatal  Serious  Slight
District_Area                            
Aberdeen City         12      239    1072
Aberdeenshire         66      463    1401
Adur                   8      101     510
Allerdale             24      143     961
Alnwick                6       33     193



In [18]:
# Total accidents per district
severity_counts['Total'] = severity_counts.sum(axis=1)

# Percentage of each severity type
severity_percentages = severity_counts.div(severity_counts['Total'], axis=0) * 100
severity_percentages = severity_percentages.drop(columns='Total')

print(severity_percentages.head())


Accident_Severity     Fatal    Serious     Slight
District_Area                                    
Aberdeen City      0.907029  18.065004  81.027967
Aberdeenshire      3.419689  23.989637  72.590674
Adur               1.292407  16.316640  82.390953
Allerdale          2.127660  12.677305  85.195035
Alnwick            2.586207  14.224138  83.189655


<H2>Insight: The cells above print only the first five districts in alphabetical order, so they do not yet identify the district with the highest severity. Among those five, slight accidents dominate everywhere (73% to 85%); Aberdeenshire has the largest share of fatal (3.4%) and serious (24.0%) accidents, while Allerdale has the largest share of slight accidents (85.2%).</H2>
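To answer the question for all districts, the severity table can be reduced to one number per district and sorted. A minimal sketch using only the five rows printed above; the column names `ksi_share` (share of fatal or serious accidents) and `mean_score` are illustrative, and the scoring Slight=1, Serious=2, Fatal=3 is the mapping the notebook defines for Q6.

```python
import pandas as pd

# The five rows printed above; the full table has one row per district
severity_counts = pd.DataFrame(
    {'Fatal': [12, 66, 8, 24, 6],
     'Serious': [239, 463, 101, 143, 33],
     'Slight': [1072, 1401, 510, 961, 193]},
    index=['Aberdeen City', 'Aberdeenshire', 'Adur', 'Allerdale', 'Alnwick'],
)

total = severity_counts.sum(axis=1)

# Share of accidents that were fatal or serious, in percent
ksi_share = (severity_counts['Fatal'] + severity_counts['Serious']) / total * 100

# Mean severity with the scoring Slight=1, Serious=2, Fatal=3
mean_score = (severity_counts['Slight'] + 2 * severity_counts['Serious']
              + 3 * severity_counts['Fatal']) / total

ranking = pd.DataFrame({'ksi_share': ksi_share, 'mean_score': mean_score})
print(ranking.sort_values('mean_score', ascending=False))
```

On the full table it may be sensible to exclude districts with very few accidents before ranking, since small counts make the shares unstable.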

<h1>Q6. What combination of light and weather is associated with the highest average severity?</h1>

In [19]:
severity_map = {
    'Slight': 1,
    'Serious': 2,
    'Fatal': 3
}

accident['Severity_Score'] = accident['Accident_Severity'].map(severity_map)
# Ensure Severity_Score is numeric
accident['Severity_Score'] = pd.to_numeric(accident['Severity_Score'], errors='coerce')


In [20]:
combo = accident.groupby(
    ['Light_Conditions', 'Weather_Conditions'], observed=True
)['Severity_Score'].mean().unstack()
combo

Light_Conditions / Weather_Conditions,Fine + high winds,Fine no high winds,Fog or mist,Other,Raining + high winds,Raining no high winds,Snowing + high winds,Snowing no high winds,unaccounted
Darkness - lighting unknown,1.220339,1.161553,1.092308,1.177419,1.162162,1.122383,1.0,1.085106,1.109697
Darkness - lights lit,1.215443,1.187567,1.18241,1.130752,1.167645,1.152122,1.141079,1.089168,1.124066
Darkness - lights unlit,1.254902,1.181765,1.162162,1.146853,1.205479,1.159041,1.2,1.166667,1.155556
Darkness - no lighting,1.278894,1.303776,1.227378,1.175457,1.236121,1.237436,1.163743,1.159063,1.320312
Daylight,1.163561,1.151852,1.164942,1.110165,1.135168,1.122492,1.11479,1.09554,1.088927


<h2>Insight: Accidents in darkness with no lighting show the highest average severity in six of the nine weather categories. The highest cells in that row are "unaccounted" weather (1.32) and fine weather without high winds (1.30), followed by fine weather with high winds (1.28). Severity in unlit darkness is therefore elevated even in fine weather, which points to poor visibility rather than weather as the main associated factor.</h2>
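The highest-severity combination can be found directly by flattening the table and sorting. A minimal sketch on a few cells copied from the table above (the full table is 5 x 9):

```python
import pandas as pd

# A few cells copied from the table above; the full table is 5 x 9
combo = pd.DataFrame(
    {'Fine + high winds': [1.278894, 1.163561],
     'Fine no high winds': [1.303776, 1.151852],
     'unaccounted': [1.320312, 1.088927]},
    index=['Darkness - no lighting', 'Daylight'],
)
combo.index.name = 'Light_Conditions'
combo.columns.name = 'Weather_Conditions'

# DataFrame.unstack() flattens to a Series indexed by (weather, light) pairs
ranked = combo.unstack().sort_values(ascending=False)
weather, light = ranked.index[0]

print(ranked.head(3))
print('Highest:', light, '+', weather)
```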

<h1>Q7. What proportion of accidents occur in urban areas compared to rural areas?</h1>

In [21]:
urban_rural_counts = accident['Urban_or_Rural_Area'].value_counts()
urban_rural_counts

Urban_or_Rural_Area
Urban          421678
Rural          238990
Unallocated        11
Name: count, dtype: int64

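<h2>Insight #7: About 63.8% of accidents (421,678 of 660,679) occurred in urban areas and about 36.2% (238,990) in rural areas; 11 records are unallocated. The counts include the 15 records whose missing area was filled with the mode earlier.</h2>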
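The question asks for proportions, while the cell above returns counts. A minimal sketch that converts the counts from the output above into percentages:

```python
import pandas as pd

# Counts copied from the output above
urban_rural_counts = pd.Series(
    {'Urban': 421678, 'Rural': 238990, 'Unallocated': 11}, name='count'
)

# Convert counts to percentages of all accidents
urban_rural_pct = (urban_rural_counts / urban_rural_counts.sum() * 100).round(2)
print(urban_rural_pct)
```

On the DataFrame itself, `accident['Urban_or_Rural_Area'].value_counts(normalize=True)` returns the same shares as fractions in one step.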

<h1>Q8. How does the average number of vehicles involved vary by accident severity?</h1>

In [22]:
avg_vehicles_by_severity = accident.groupby('Accident_Severity')['Number_of_Vehicles'].mean()
avg_vehicles_by_severity


Accident_Severity
Fatal      1.786976
Serious    1.678327
Slight     1.855864
Name: Number_of_Vehicles, dtype: float64

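<h2>Insight #8: Slight accidents involve the most vehicles on average (1.86), followed by fatal (1.79) and serious (1.68) accidents. The differences are small, and more severe accidents do not involve more vehicles on average.</h2>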

<h1>Q9. Is there a correlation between the number of vehicles involved and the number of casualties? </h1>

In [23]:
corr2 = accident[['Number_of_Vehicles', 'Number_of_Casualties']].corr().iloc[0,1]
corr2

np.float64(0.22888886126926722)

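<h2>Insight #9: The correlation is about 0.23, the same value obtained in Q2, because both cells compute the Pearson correlation on the same two columns. The relationship is positive but weak.</h2>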

<h1>Q10. Does weather condition significantly influence accident severity?</h1>

In [24]:
groups = [accident[accident['Weather_Conditions'] == condition]['Severity_Score'].dropna() 
          for condition in accident['Weather_Conditions'].unique()]

f_stat, p_val = f_oneway(*groups)
print(f"ANOVA F-statistic: {f_stat:.3f}, p-value: {p_val:.3f}")


ANOVA F-statistic: 113.486, p-value: 0.000


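<h2>Insight #10: The ANOVA gives F = 113.486 with a p-value that prints as 0.000 at three decimals (that is, below 0.001), so the mean severity score differs significantly across weather conditions. Because Severity_Score is an ordinal 1-3 coding and the sample is very large, this indicates a detectable difference rather than necessarily a large one.</h2>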
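Since severity is a category rather than a measurement, a row-normalised cross-tabulation is a useful companion to the ANOVA: it shows how each weather condition's accidents split across severity levels. A minimal sketch on made-up records (the frame `toy` and the placeholder `p_val` are illustrative, not results from the dataset), which also shows a format that keeps the magnitude of very small p-values:

```python
import pandas as pd

# Made-up records for illustration only
toy = pd.DataFrame({
    'Weather_Conditions': ['Fog or mist', 'Fog or mist', 'Fine no high winds',
                           'Fine no high winds', 'Fine no high winds', 'Fog or mist'],
    'Accident_Severity': ['Fatal', 'Slight', 'Slight', 'Slight', 'Serious', 'Serious'],
})

# Row-normalised table: each row shows how that weather condition's accidents split by severity
severity_share = pd.crosstab(
    toy['Weather_Conditions'], toy['Accident_Severity'], normalize='index'
) * 100
print(severity_share.round(1))

# Very small p-values print as 0.000 in fixed-point format; scientific notation keeps the magnitude
p_val = 3.2e-12  # placeholder value, not a result from the dataset
print(f"p-value: {p_val:.3e}")
```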

<h1>Q11. Which districts experience the highest volume of accidents?</h1>

In [25]:
top_districts = accident['District_Area'].value_counts().head(10)
print(top_districts)

District_Area
Birmingham          13491
Leeds                8898
Manchester           6720
Bradford             6212
Sheffield            5710
Westminster          5706
Liverpool            5587
Glasgow City         4942
Bristol, City of     4819
Kirklees             4690
Name: count, dtype: int64


<h2>Insight #11: The highest accident volumes occur in major urban districts like Birmingham and Leeds, likely due to higher traffic density.</h2>

<h1> Q12. Which districts have the highest number of fatal accidents?</h1>

In [26]:
fatal_districts = accident[accident['Accident_Severity'] == 'Fatal']['District_Area'].value_counts().head(10)
print(fatal_districts)


District_Area
Birmingham                  105
Leeds                        93
Highland                     88
East Riding of Yorkshire     85
Bradford                     71
Aberdeenshire                66
Powys                        59
Wakefield                    56
Doncaster                    56
Herefordshire, County of     51
Name: count, dtype: int64


<h2>Insight 12: Fatal accidents are highest in both major cities like Birmingham and Leeds, and rural areas such as Highland and Aberdeenshire, indicating risks exist across diverse environments.</h2>

<h1>Q13. What is the fatality rate by weather condition?</h1>

In [27]:
fatal_counts = accident[accident['Accident_Severity'] == 'Fatal']['Weather_Conditions'].value_counts()
total_counts = accident['Weather_Conditions'].value_counts()
fatality_rate = (fatal_counts / total_counts).sort_values(ascending=False)
print(fatality_rate)

Weather_Conditions
Fog or mist              0.023243
Fine + high winds        0.020458
Raining + high winds     0.015081
Fine no high winds       0.013631
Raining no high winds    0.010640
Other                    0.009621
unaccounted              0.007574
Snowing no high winds    0.005771
Snowing + high winds     0.003390
Name: count, dtype: float64


<h2>Insight 13: The fatality rate is highest in fog or mist (2.3%), consistent with reduced visibility raising the risk of a fatal outcome. High winds are associated with higher fatality rates in fine weather (2.0% versus 1.4%) and in rain (1.5% versus 1.1%), but not in snow, where the rate is lower with high winds (0.3% versus 0.6%).</h2>
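Rates for rare weather categories rest on few accidents, so it helps to show the number of accidents behind each rate. A minimal sketch on made-up records (the frame `toy` and the column `is_fatal` are illustrative): the mean of a boolean column gives the fatality rate, and a count alongside it shows the group size.

```python
import pandas as pd

# Made-up records for illustration only
toy = pd.DataFrame({
    'Weather_Conditions': ['Fog or mist'] * 4 + ['Snowing + high winds'] * 2,
    'Accident_Severity': ['Fatal', 'Slight', 'Slight', 'Slight', 'Slight', 'Slight'],
})

toy['is_fatal'] = toy['Accident_Severity'] == 'Fatal'

# The mean of a boolean column is the fatality rate; count shows how many accidents back each rate
rates = (toy.groupby('Weather_Conditions')['is_fatal']
            .agg(fatality_rate='mean', accidents='count')
            .sort_values('fatality_rate', ascending=False))
print(rates)
```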

<h1>Q14. How does accident frequency vary across days of the week?</h1>

In [28]:
accident['Day_of_Week'] = pd.to_datetime(accident['Accident_Date']).dt.day_name()
day_counts = accident['Day_of_Week'].value_counts().reindex(
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
)
print(day_counts)


Day_of_Week
Monday        72680
Tuesday       94550
Wednesday     99558
Thursday      99511
Friday        97900
Saturday     107178
Sunday        89302
Name: count, dtype: int64


<h2> Insight 14: Accident frequency peaks on Saturday (107,178) and is lowest on Monday (72,680). Sunday (89,302) is the second-lowest day, so the data show a Saturday peak rather than a general weekend effect; increased leisure travel on Saturdays is one possible explanation.</h2>

<h1>Q15. Does accident severity vary by the hour of the day?</h1>

In [29]:
# Note: Accident_Date holds dates without a time of day, so .dt.hour is 0 for every row
accident['Hour'] = pd.to_datetime(accident['Accident_Date']).dt.hour
severity_hour = accident.groupby('Hour')['Accident_Severity'].value_counts().unstack().fillna(0)
print(severity_hour)


Accident_Severity  Fatal  Serious  Slight
Hour                                     
0                   8661    88217  563801


<h2>Insight 15: This question cannot be answered from the dataset as loaded. Accident_Date holds dates only, so every record is assigned hour 0 and the table above simply repeats the overall severity totals (8,661 fatal, 88,217 serious, 563,801 slight). A time-of-day column would be needed to compare severity by hour.</h2>
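Before grouping by hour, it is worth checking whether a datetime column carries any time-of-day information at all. A minimal sketch on a made-up date-only series (the name `dates` is illustrative): if every timestamp equals its own midnight, the hour is 0 throughout and an hourly breakdown is meaningless.

```python
import pandas as pd

# Hypothetical date-only column, parsed day-first like Accident_Date
dates = pd.to_datetime(pd.Series(['05/06/2019', '02/07/2019']), format='%d/%m/%Y')

# If every timestamp equals its own midnight, the column holds no time of day
has_time_component = not (dates == dates.dt.normalize()).all()
hours = dates.dt.hour

print('Has time component:', has_time_component)
print(hours.unique())
```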

<h1>Q16. What are the most common road surface conditions in fatal accidents?</h1>

In [30]:
fatal_conditions = accident[accident['Accident_Severity'] == 'Fatal']['Road_Surface_Conditions'].value_counts()
print(fatal_conditions)


Road_Surface_Conditions
Dry                       5788
Wet or damp               2620
Frost or ice               193
Snow                        35
Flood over 3cm. deep        23
unknown road condition       2
Name: count, dtype: int64


<h2> Insight 16: Dry roads account for 5,788 of the 8,661 fatal accidents (about 67%), followed by wet or damp roads (2,620, about 30%). Dry is also the most common surface condition overall (447,821 of 660,679 accidents, about 68%), so these counts largely reflect how often each condition occurs rather than a higher fatality risk on dry roads.</h2>

<h1>Q17. Is there a correlation between the number of vehicles and accident severity?</h1>

In [31]:
accident[['Number_of_Vehicles', 'Severity_Score']].corr()


Unnamed: 0,Number_of_Vehicles,Severity_Score
Number_of_Vehicles,1.0,-0.075324
Severity_Score,-0.075324,1.0


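<h2>Insight 17: The correlation between the number of vehicles and the severity score is -0.075, a very weak negative relationship. Accidents involving more vehicles are not more severe on average; if anything, they are marginally less severe.</h2>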

<h1>Q18. Is there a correlation between the number of vehicles and the number of casualties in rural vs urban areas?</h1>

In [32]:
urban_corr = accident[accident['Urban_or_Rural_Area'] == 'Urban'][['Number_of_Vehicles', 'Number_of_Casualties']].corr()
rural_corr = accident[accident['Urban_or_Rural_Area'] == 'Rural'][['Number_of_Vehicles', 'Number_of_Casualties']].corr()

print("Urban correlation:\n", urban_corr)
print("Rural correlation:\n", rural_corr)


Urban correlation:
                       Number_of_Vehicles  Number_of_Casualties
Number_of_Vehicles              1.000000              0.217615
Number_of_Casualties            0.217615              1.000000
Rural correlation:
                       Number_of_Vehicles  Number_of_Casualties
Number_of_Vehicles              1.000000              0.236078
Number_of_Casualties            0.236078              1.000000


<h2>Insight 18: Both urban and rural areas show a weak positive correlation between the number of vehicles involved and the number of casualties, with rural areas slightly higher (0.236 versus 0.218). This suggests that as the number of vehicles increases, casualties tend to rise modestly in both settings.</h2>

<h1>Q19. Does road type significantly affect accident severity?</h1>

In [33]:
anova_road = [accident[accident['Road_Type'] == r]['Severity_Score'].dropna() 
              for r in accident['Road_Type'].unique()]
f_oneway(*anova_road)


F_onewayResult(statistic=np.float64(372.51892751473656), pvalue=np.float64(0.0))

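<h2> Insight 19: The ANOVA test shows a statistically significant effect of road type on the severity score (F ≈ 372.5). The reported p-value of 0.0 means the value is too small to be represented as a floating-point number, not that it is exactly zero. With roughly 660,000 records, significance alone does not show how large the differences between road types are.</h2>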

<h1>Q20. Is there a relationship between vehicle type and the average number of vehicles involved in an accident?</h1>

In [34]:
vehicle_avg = accident.groupby('Vehicle_Type')['Number_of_Vehicles'].mean().sort_values()
print(vehicle_avg)


Vehicle_Type
Ridden horse                             1.500000
Data missing or out of range             1.666667
Minibus (8 - 16 passenger seats)         1.804150
Agricultural vehicle                     1.805855
Motorcycle 125cc and under               1.815247
Van / Goods 3.5 tonnes mgw or under      1.824180
Motorcycle over 125cc and up to 500cc    1.825888
Other vehicle                            1.825971
Goods 7.5 tonnes mgw and over            1.830935
Bus or coach (17 or more pass seats)     1.831015
Car                                      1.832134
Goods over 3.5t. and under 7.5t          1.833497
Motorcycle over 500cc                    1.834743
Taxi/Private hire car                    1.836468
Pedal cycle                              1.837563
Motorcycle 50cc and under                1.839537
Name: Number_of_Vehicles, dtype: float64



<h2>Insight #20: The average number of vehicles per accident is nearly constant across vehicle types. All but two categories fall between 1.80 and 1.84; the exceptions are ridden horse (1.50) and "Data missing or out of range" (1.67). Agricultural vehicles (1.81) do not differ from the other motorized types.</h2>