University of Virginia
Saarthak Gupta, Agi Luong, & Joshua Seiden
Abstract: In this paper, we formulate a classification problem on car crash data from the Virginia Department of Transportation (VDOT), which contains over 1 million crash records from 2015 to 2023. With the rising number of tragic losses from car accidents, analysis of this data can help save lives. We predict the severity level of accidents from environmental, social, and geographic factors and identify the major characteristics that contribute to lethal crashes. For this purpose, we explore machine learning algorithms such as random forests, logistic regression, and artificial neural networks (ANNs). We describe data encoding and cleaning methods for this mostly categorical dataset and employ oversampling techniques to handle its highly imbalanced classes. The goal is to provide actionable insight for drivers and the Virginia Department of Transportation to increase road safety. We also use state-of-the-art time series forecasting methods to identify trends in the number of accidents per day at the state and county levels. Using a Recurrent Neural Network (RNN)-based approach, we predict trends in daily crashes up to a year into the future, and we describe the sequence generation methods used for this approach.
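As a rough illustration of what sequence generation for an RNN forecaster can look like, the sketch below slices a daily count series into sliding input and target windows. It is a minimal example under stated assumptions, not the method used in the report: the function name `make_windows`, the `lookback` and `horizon` parameters, and the sample counts are all hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    daily_counts: Sequence[float], lookback: int, horizon: int = 1
) -> Tuple[List[List[float]], List[List[float]]]:
    """Slice a daily series into (input window, target window) pairs.

    Each input holds `lookback` consecutive days; its target holds the
    `horizon` days that immediately follow. Hypothetical helper.
    """
    if lookback < 1 or horizon < 1:
        raise ValueError("lookback and horizon must be positive")
    inputs: List[List[float]] = []
    targets: List[List[float]] = []
    # Last start index that still leaves room for a full input and target.
    last_start = len(daily_counts) - lookback - horizon
    for start in range(last_start + 1):
        split = start + lookback
        inputs.append(list(daily_counts[start:split]))
        targets.append(list(daily_counts[split:split + horizon]))
    return inputs, targets


# Made-up daily crash counts, for illustration only (not VDOT data).
counts = [310, 295, 402, 388, 350, 299, 410]
X, y = make_windows(counts, lookback=3, horizon=2)
```

Each pair in `X` and `y` could then be fed to a sequence model as one training example. A series shorter than `lookback + horizon` yields no windows.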
Read the full report here.