# **Using machine learning algorithms to predict the severity of traffic accidents in London**

## Introduction

Road traffic accidents contribute significantly to mortality in both developing and developed countries. The World Health Organization reports that road traffic injuries are the leading cause of death among children and young people aged 5-29 (World Health Organization, 2023). About 1.19 million people die in road traffic accidents every year, which is equivalent to roughly 3,300 deaths per day. Moreover, the number of deaths from road traffic accidents continues to increase, and road traffic injuries are projected to become the seventh leading cause of death globally by 2030 (Ahmed et al., 2023). Although the number of traffic accident fatalities in the UK has been decreasing overall since 1979, the reported number of road injuries has increased in recent years, particularly between 2021 and 2022 (Department for Transport, 2023). Studies have shown that the locations of road accidents are not random and that accidents are often highly concentrated in urban areas (Curiel, Ramírez and Bishop, 2018). As the largest city in the UK, London experiences heavy traffic every day, which makes it a suitable study area. Therefore, we will predict the severity of road traffic accidents in London. The results may help hospitals and traffic police departments optimize the deployment of emergency resources such as medical rescue and police units.

## Literature review

Several studies have already addressed the prediction of road traffic accidents. Selecting variables is a prerequisite for predicting the severity of traffic accidents, because these variables are the inputs to the severity prediction model. Yu et al. (2021) proposed a deep spatio-temporal graph convolutional network (DSTGCN) to predict the risk of future traffic accidents on specific road sections. They collected data related to traffic accidents, including weather conditions, traffic flow, road structure, and traffic accident records, and used them as inputs to the neural network model to explore spatial correlation and temporal dependence. Individual environmental factors have also been examined. For example, adverse weather conditions such as rain and snow can worsen the severity of traffic accidents (El-Basyouny et al., 2014). Accidents that occur during the day cause less harm than those that occur at night, because reduced visibility can slow the driver's reaction, leading to accidents and even fatalities (Behnood and Al-Bdairi, 2020). Similarly, Liu et al. (2019) identified lighting conditions as one of the important factors affecting the severity of night-time traffic accidents in China. Although these studies inform the present work, drivers, as the main participants in traffic accidents, should not be ignored. Studies have shown that age and gender can also affect the severity of accidents (Chen, Song and Ma, 2019). Therefore, this study will consider environmental conditions and driver characteristics together.

In terms of research methods, a decision tree is a machine learning model that makes decisions using a tree structure, recursively dividing the data into different categories (Quinlan, 1986). Moral-García et al. (2019) proposed a decision-tree-based method called Information Root Node Variation (IRNV) to predict the severity of accidents involving novice drivers in urban areas. Kononen et al. (2011) developed a multivariate logistic regression model to predict the probability that a vehicle involved in a collision contains one or more occupants with serious or incapacitating injuries. Random Forest (RF) has also been shown to be useful for predicting the severity of traffic accidents: Mallahi et al. (2022) compared the prediction accuracy of RF, Support Vector Machine (SVM), and Artificial Neural Network (ANN) models and found that RF performed better in classifying and predicting accident severity. These studies indicate that machine learning models can predict the severity of accidents from accident-related data. However, when using machine learning for prediction, several different models should be compared so that the most suitable method for this study can be chosen on the basis of accuracy.
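
To make the recursive partitioning behind decision trees more concrete, the following is a minimal sketch of a single split chosen by information gain, the criterion used in Quinlan's (1986) ID3. It is an illustration only and not the method of any of the cited studies; the `speed_limit` feature, the toy records, and the function names `entropy` and `best_split` are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels; 0 for a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def best_split(rows, labels, feature):
    """Return (threshold, information gain) of the best binary split on one numeric feature."""
    parent = entropy(labels)
    best = (None, 0.0)
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue  # a split must send at least one record to each side
        children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = parent - children
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical toy records, invented for illustration (not taken from the dataset).
rows = [{"speed_limit": 20}, {"speed_limit": 30}, {"speed_limit": 30},
        {"speed_limit": 60}, {"speed_limit": 70}]
labels = ["slight", "slight", "slight", "serious", "serious"]

threshold, gain = best_split(rows, labels, "speed_limit")
print(f"split at speed_limit <= {threshold}, information gain = {gain:.3f}")
```

On this toy data the split at 30 separates the two classes completely, so the gain equals the entropy of the parent node. A full tree would apply the same step recursively to each resulting subset.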

## Research question

## Data Preprocessing

All the data used in this study were sourced from the Department for Transport (2023), an official UK government source, which makes the data reliable. The dataset records the severity, road conditions, driver information, and relevant environmental information of each traffic accident in the UK from 2018 to 2023. In this study, we will only select accidents within the London area for analysis.

In [None]:
import pandas as pd

# Read in the road safety data (one CSV file per year).
acc2022 = pd.read_csv("https://raw.githubusercontent.com/luzhao2000/CASA0006/main/accidents_2022.csv")
# skiprows=1 skips the first line of the 2021 file before the header is parsed.
acc2021 = pd.read_csv("https://raw.githubusercontent.com/luzhao2000/CASA0006/main/accidents_2021.csv", skiprows=1)
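
The restriction to London described above could be sketched as follows. This is a hedged example rather than the study's actual preprocessing: it uses small in-memory tables in place of the downloaded files, assumes the district column is named `local_authority_ons_district` (as in the published STATS19 schema), and relies on London boroughs having ONS codes that begin with `E09`. The helper `select_london` is a hypothetical name.

```python
import pandas as pd

# Toy stand-ins for the yearly accident tables (the real files have many more columns).
# Column names are assumptions based on the published STATS19 schema.
acc2021 = pd.DataFrame({
    "accident_index": ["2021A", "2021B"],
    "local_authority_ons_district": ["E09000033", "E08000003"],
    "accident_severity": [3, 2],
})
acc2022 = pd.DataFrame({
    "accident_index": ["2022A", "2022B", "2022C"],
    "local_authority_ons_district": ["E09000001", "E06000023", "E09000007"],
    "accident_severity": [1, 3, 3],
})


def select_london(accidents: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose ONS district code belongs to a London borough (prefix E09)."""
    codes = accidents["local_authority_ons_district"].astype(str)
    return accidents[codes.str.startswith("E09")].reset_index(drop=True)


# Stack both years into one table, then keep only the London records.
accidents = pd.concat([acc2021, acc2022], ignore_index=True)
london = select_london(accidents)
print(london)
```

Filtering on the code prefix avoids maintaining a list of borough names by hand, but the column name and coding should be checked against the data guide that accompanies the downloaded files.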

## References

Ahmed, S. K., Mohammed, M. G., Abdulqadir, S. O., El‐Kader, R. G. A., El‐Shall, N. A., Chandran, D., Rehman, M. E. U. and Dhama, K. (2023). *Road traffic accidental injuries and deaths: A neglected global health issue*. Health Science Reports, 6(5), e1240. https://doi.org/10.1002/hsr2.1240

Behnood, A. and Al-Bdairi, N. S. S. (2020). *Determinant of injury severities in large truck crashes: A weekly instability analysis*. Safety Science, 131, 104911. https://doi.org/10.1016/j.ssci.2020.104911

Chen, F., Song, M. and Ma, X. (2019). *Investigation on the injury severity of drivers in rear-end collisions between cars using a random parameters bivariate ordered probit model*. International Journal of Environmental Research and Public Health, 16(14), 2632. https://doi.org/10.3390/ijerph16142632

Curiel, R. P., Ramírez, H. G. and Bishop, S. R. (2018). *A novel rare event approach to measure the randomness and concentration of road accidents*. PloS One, 13(8), e0201890. https://doi.org/10.1371/journal.pone.0201890

Department for Transport (2023). *Reported road casualties Great Britain, annual report: 2022*. Available at: https://www.gov.uk/government/statistics/reported-road-casualties-great-britain-annual-report-2022/reported-road-casualties-great-britain-annual-report-2022 (Accessed: 15 March 2024).

Department for Transport (2023). *Road Safety Data*. Available at: https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data (Accessed: 15 March 2024).

El-Basyouny, K., Barua, S., Islam, M. T. and Li, R. (2014). *Assessing the Effect of Weather States on Crash Severity and Type by Use of Full Bayesian Multivariate Safety Models*. Transportation Research Record, 2432(1), 65–73. https://doi.org/10.3141/2432-08

Kononen, D. W., Flannagan, C. A. C. and Wang, S. C. (2011). *Identification and validation of a logistic regression model for predicting serious injuries associated with motor vehicle crashes*. Accident Analysis and Prevention, 43(1), 112–122. https://doi.org/10.1016/j.aap.2010.07.018

Liu, J., Li, J., Wang, K., Zhao, J., Cong, H. and He, P. (2019). *Exploring factors affecting the severity of night-time vehicle accidents under low illumination conditions*. Advances in Mechanical Engineering, 11(4), 168781401984094. https://doi.org/10.1177/1687814019840940

Mallahi, I. E., Dlia, A., Riffi, J., Mahraz, M. A. and Tairi, H. (2022). *Prediction of Traffic Accidents using Random Forest Model*. 2022 International Conference on Intelligent Systems and Computer Vision (ISCV), 1–7. https://doi.org/10.1109/ISCV54655.2022.9806099

Moral-García, S., Castellano, J. G., Mantas, C. J., Montella, A. and Abellán, J. (2019). *Decision tree ensemble method for analyzing traffic accidents of novice drivers in urban areas*. Entropy (Basel, Switzerland), 21(4), 360. https://doi.org/10.3390/e21040360

Quinlan, J. R. (1986). *Induction of decision trees*. Machine Learning, 1(1), 81–106.

World Health Organization (2023). *Road traffic injuries*. Available at: https://www.who.int/en/news-room/fact-sheets/detail/road-traffic-injuries (Accessed: 15 March 2024).

Yu, L., Du, B., Hu, X., Sun, L., Han, L. and Lv, W. (2021). *Deep spatio-temporal graph convolutional network for traffic accident prediction*. Neurocomputing (Amsterdam), 423, 135–147. https://doi.org/10.1016/j.neucom.2020.09.043
