# COGS 108 - Project Proposal

## Authors

> Analysis, Background research, Conceptualization, Data curation, Experimental investigation, Methodology, Project administration, Software, Visualization, Writing – original draft, Writing – review & editing

- Prerana Gowda: Background Research, Writing
- Zhamilya Shakirova: Data curation, Analysis
- Arya Mohammadi: Methodology, Project administration
- Chia Lee: Ethics, Experimental investigation
- Jia Aneja: Conceptualization, Software

## Research Question

How do weather conditions affect the rate of traffic violations involving autonomous vehicles compared to human-driven vehicles in California?

Specifically, we examine whether autonomous vehicles operating in California incur traffic violations at a lower or higher rate than human-driven vehicles under different weather conditions (clear, rain, fog, and snow). The independent variable is weather condition, the dependent variable is the rate of traffic violations (including speeding, running red lights, and improper lane changes), and the primary comparison variable is vehicle type (autonomous vs. human-driven).

Using California-specific traffic and weather datasets, this study applies statistical inference methods to compare violation rates across weather conditions and vehicle types. By focusing on California, where autonomous vehicle testing and deployment are heavily regulated and documented, we aim to determine whether autonomous vehicles demonstrate greater safety than human drivers in adverse weather conditions or whether their performance declines under certain environmental factors.
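To make the dependent variable concrete, the sketch below shows one way a violation rate could be computed for each combination of vehicle type and weather condition. It assumes that exposure is measured in miles driven, and the record layout, the function name `violation_rates`, and all numbers are hypothetical placeholders rather than real data.

```python
from collections import defaultdict

# Hypothetical records: (vehicle_type, weather, violations, miles_driven).
# All numbers are placeholders for illustration, not real data.
records = [
    ("autonomous", "clear", 12, 400_000),
    ("autonomous", "rain", 5, 100_000),
    ("human", "clear", 90, 1_500_000),
    ("human", "rain", 40, 500_000),
]


def violation_rates(rows, per_miles=100_000):
    """Return violations per `per_miles` miles for each (vehicle_type, weather) cell."""
    violations = defaultdict(int)
    miles = defaultdict(float)
    for vehicle_type, weather, n_violations, n_miles in rows:
        key = (vehicle_type, weather)
        violations[key] += n_violations
        miles[key] += n_miles
    # Skip cells with no recorded exposure to avoid dividing by zero.
    return {
        key: violations[key] / miles[key] * per_miles
        for key in violations
        if miles[key] > 0
    }


rates = violation_rates(records)
for (vehicle_type, weather), rate in sorted(rates.items()):
    print(f"{vehicle_type:>10} | {weather:<5} | {rate:.2f} violations per 100,000 miles")
```

Dividing by exposure matters because human-driven vehicles log far more miles than autonomous vehicles, so raw counts alone would not be comparable.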



## Background and Prior Work

Autonomous vehicles (AVs) are increasingly deployed on public roads, raising important questions about their safety and compliance with traffic laws relative to human-driven vehicles. While much public discussion has focused on whether AVs reduce crash risk, less attention has been paid to how AVs behave with respect to traffic violations, such as speeding, red-light running, or improper lane changes. Weather conditions present a particularly relevant context for this comparison. Rain, fog, and snow degrade visibility and road friction, increasing driving difficulty for humans and simultaneously stressing the perception and control systems used by autonomous vehicles. Understanding how traffic violation rates vary by weather and vehicle type is therefore critical for evaluating whether autonomous vehicles maintain safer or more lawful behavior under adverse conditions.

**CRASH-FOCUSED RESEARCH:**

Most existing research comparing autonomous and human-driven vehicles focuses on crash outcomes rather than traffic violations. A recent large-scale matched case–control study published in Nature Communications compared crash involvement between autonomous and human-driven vehicles while controlling for exposure factors such as location and time of day. This study found that AVs exhibited lower crash rates in certain contexts but emphasized that conclusions depend strongly on how exposure and confounding variables are handled.<sup><a href="#fn1" id="ref1">1</a></sup>

***Limitation:*** Crash-based analyses face several limitations: crashes are relatively rare events, are often underreported or inconsistently categorized, and may not capture everyday rule compliance or near-miss behavior. As noted by the National Highway Traffic Safety Administration (NHTSA), crash datasets are structured around reportable incidents and do not systematically capture non-crash traffic violations or routine enforcement actions.<sup><a href="#fn7" id="ref7">7</a></sup>

Similarly, Kusano et al. analyzed Waymo’s rider-only crash data and compared it to human-driver benchmarks, concluding that AV crash rates were comparable to or lower than human baselines under many conditions, though the authors noted limitations related to operating domains and reporting standards.<sup><a href="#fn2" id="ref2">2</a></sup> These studies establish methodological precedents, such as rate normalization and contextual matching, that are directly relevant for analyzing traffic violation rates, even though violations themselves were not the primary outcome. Because violations occur far more frequently than crashes, analyzing traffic citations may provide a higher-resolution measure of everyday driving behavior and regulatory compliance, particularly in varied environmental conditions.

**WEATHER AND SENSOR RESEARCH:**

A parallel body of work examines how adverse weather conditions affect autonomous vehicle perception and performance, providing insight into mechanisms that may influence traffic violations. Empirical studies of on-road autonomous systems have demonstrated that LiDAR and camera-based detection performance degrades measurably under rain, fog, and snow, leading to reduced object detection range and increased uncertainty.<sup><a href="#fn3" id="ref3">3</a></sup> Comprehensive review articles further document how adverse weather challenges sensor fusion, lane detection, and object tracking, and describe mitigation strategies such as radar integration and weather-robust perception models.<sup><a href="#fn4" id="ref4">4</a></sup> These findings suggest that even if AVs are conservative by design, degraded perception in poor weather may lead to behaviors that increase the likelihood of certain violations (e.g., improper lane positioning or failure to proceed through intersections efficiently).

***Limitation:*** These studies are largely conducted in controlled experimental settings or simulation environments. They typically measure detection range, object classification accuracy, or system disengagement frequency rather than legally defined traffic violations. As the California Department of Motor Vehicles (DMV) notes in its autonomous vehicle disengagement reporting program, disengagements reflect system limitations but do not necessarily indicate illegal driving behavior.<sup><a href="#fn8" id="ref8">8</a></sup> Thus, while perception degradation is documented, its real-world impact on enforceable traffic violations remains understudied.

**ENFORCEMENT AND REPORTING:**

Evidence that autonomous vehicles accumulate real-world traffic citations further motivates direct study of violations rather than crashes alone. In 2024, Waymo vehicles operating in San Francisco received hundreds of traffic tickets, primarily related to parking and obstruction, demonstrating that AVs are subject to, and do incur, formal violations under current enforcement regimes.<sup><a href="#fn5" id="ref5">5</a></sup>

***Limitation:*** While many of these citations are not moving violations, they highlight that AV behavior can conflict with traffic regulations in practice. Reporting has also noted ambiguity in how law enforcement handles moving violations committed by driverless vehicles, with California introducing new legislation (effective in 2026) to clarify enforcement procedures for autonomous systems.<sup><a href="#fn6" id="ref6">6</a></sup> This regulatory context suggests that violation data may be incomplete or unevenly recorded across vehicle types, an important consideration for statistical inference.
This regulatory transition creates a methodological challenge: comparisons between AVs and human drivers must account for potential enforcement bias, differences in citation practices, and uneven exposure across geographic regions.

**RESEARCH GAP:**

Finally, existing data infrastructures shape what types of comparative analyses are currently feasible. The National Highway Traffic Safety Administration’s Standing General Order (SGO) requires standardized reporting of crashes involving automated driving systems, providing a structured national dataset for AV safety analysis but excluding non-crash violations.<sup><a href="#fn7" id="ref7">7</a></sup> At the state level, California DMV disengagement reports offer detailed information about AV testing and deployment conditions, which researchers often use to study system limitations and environmental challenges, though these reports do not directly capture violations.<sup><a href="#fn8" id="ref8">8</a></sup> Together, prior work reveals a gap in the literature: while crashes and disengagements have been studied extensively, comparative traffic violation rates across weather conditions and vehicle types remain largely unexplored, motivating the present project.

**This project fills a distinct gap by:**

* Shifting the outcome variable from crashes to traffic violations (a higher-frequency behavioral metric).
* Incorporating weather as a central explanatory variable rather than a background control.
* Using California-specific enforcement data to leverage the state’s unique concentration of AV deployment.

By focusing on violation rates rather than crashes alone, this study provides a complementary perspective on autonomous vehicle safety, one centered on lawful behavior and regulatory compliance in everyday driving contexts, especially under adverse weather conditions.


1. <a name="fn1"></a>Abdel-Aty, M. et al. (2024). A matched case-control analysis of autonomous vs. human-driven vehicle accidents. Nature Communications. https://www.nature.com/articles/s41467-024-48526-4 <a href="#ref1">^</a>
2. <a name="fn2"></a>Kusano, K. D. et al. (2024). Comparison of Waymo rider-only crash data to human benchmarks. Traffic Injury Prevention. https://www.tandfonline.com/doi/full/10.1080/15389588.2024.2380786 <a href="#ref2">^</a>
3. <a name="fn3"></a>Kim, J. et al. (2023). Empirical analysis of an autonomous vehicle’s LiDAR detection performance under adverse weather. https://pmc.ncbi.nlm.nih.gov/articles/PMC10051412/ <a href="#ref3">^</a>
4. <a name="fn4"></a>Xu, C. et al. (2024). A comprehensive review of autonomous driving algorithms. Algorithms (MDPI). https://www.mdpi.com/1999-4893/17/11/526 <a href="#ref4">^</a>
5. <a name="fn5"></a>Washington Post (2025). Waymo robotaxis received hundreds of tickets in San Francisco. https://www.washingtonpost.com/technology/2025/03/13/waymo-robotaxis-parking-tickets/ <a href="#ref5">^</a>
6. <a name="fn6"></a>Associated Press (2025). Police pull over driverless Waymo as California updates AV enforcement laws. https://apnews.com/article/0a0dffb19bf38c5ee85681a6f83591ff <a href="#ref6">^</a>
7. <a name="fn7"></a>National Highway Traffic Safety Administration. Standing General Order on Crash Reporting for Automated Driving Systems. https://www.nhtsa.gov/laws-regulations/standing-general-order-crash-reporting <a href="#ref7">^</a>
8. <a name="fn8"></a>California Department of Motor Vehicles. Autonomous Vehicle Disengagement Reports. https://www.dmv.ca.gov/portal/vehicle-industry-services/autonomous-vehicles/disengagement-reports/ <a href="#ref8">^</a>


## Hypothesis


We hypothesize that autonomous vehicles will have lower traffic violation rates than human-driven vehicles under clear and rainy conditions, but that the gap will narrow or reverse in more challenging conditions such as fog and snow. This expectation is based on the idea that autonomous systems excel at rule-based driving and sensor fusion in normal or moderately adverse conditions, but their performance may degrade in severe weather due to sensor limitations (e.g., reduced visibility for cameras and LiDAR). In contrast, human drivers may adapt more flexibly in extreme conditions despite generally higher baseline violation rates.
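One way to express this hypothesis numerically is as a rate ratio (autonomous rate divided by human rate) within each weather condition. The sketch below is a minimal illustration, assuming that violation counts can be treated as Poisson and that exposure is available in miles. The function name `rate_ratio_ci` and every number are hypothetical placeholders, and this is only one of several inference methods the project could adopt.

```python
import math
from statistics import NormalDist


def rate_ratio_ci(av_count, av_miles, human_count, human_miles, confidence=0.95):
    """Rate ratio (AV / human) with a Wald interval on the log scale.

    Treats the counts as Poisson, so SE(log RR) = sqrt(1/av_count + 1/human_count).
    """
    if min(av_count, human_count) <= 0:
        raise ValueError("both counts must be positive for a log-scale interval")
    ratio = (av_count / av_miles) / (human_count / human_miles)
    se = math.sqrt(1 / av_count + 1 / human_count)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ratio, ratio * math.exp(-z * se), ratio * math.exp(z * se)


# Placeholder (av_count, av_miles, human_count, human_miles) per condition; not real data.
cells = {
    "clear": (12, 400_000, 90, 1_500_000),
    "fog": (6, 50_000, 30, 300_000),
}
for weather, args in cells.items():
    rr, low, high = rate_ratio_ci(*args)
    print(f"{weather}: rate ratio = {rr:.2f} (95% CI {low:.2f} to {high:.2f})")
```

A ratio below 1 would indicate a lower violation rate for autonomous vehicles. Under the hypothesis, the ratio would sit below 1 in clear and rainy conditions and move toward or above 1 in fog and snow.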

## Data

## Ideal Dataset

The ideal dataset for this project would consist of detailed traffic violation and crash records that include both autonomous and human-driven vehicles across a variety of weather conditions. Each row in the dataset would represent a single traffic incident or violation, while columns would describe the characteristics of that incident.

Key variables would include:
- Vehicle type (autonomous vs. human-driven)
- Level of automation (e.g., Level 2 ADAS, Level 4/5 ADS, if available)
- Weather conditions at the time of the incident (clear, rain, fog, snow)
- Type of traffic violation (e.g., speeding, red-light running, improper lane change)
- Date and time of the incident
- Location information (e.g., ZIP code or city)
- Road conditions (wet, icy, dry, if available)

To achieve sufficient statistical power, we would ideally want thousands of observations, particularly because adverse weather conditions such as fog and snow are less frequent. A large sample size would allow us to meaningfully compare violation rates across both vehicle types and multiple weather categories.
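As a rough planning aid, the sketch below estimates how many observations per group a two-sided comparison of two proportions would need. It simplifies the outcome to the share of observation units (for example, trips) with at least one violation, which is an assumption and not the rate measure defined in the research question. The function name and the example proportions are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Placeholder proportions: 10% vs. 5% of units with a violation; not real data.
n_large_gap = sample_size_two_proportions(0.10, 0.05)
# A smaller gap between groups requires many more observations.
n_small_gap = sample_size_two_proportions(0.10, 0.08)
print(n_large_gap, n_small_gap)
```

Because the required sample grows quickly as the expected difference shrinks, rare conditions such as fog and snow are the likely bottleneck for this analysis.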

These data would ideally be collected through official government reporting systems, such as state DMV crash reports or federal transportation safety databases, to ensure consistency, reliability, and standardized reporting practices. The data would be stored in a tabular format (e.g., CSV files), where each row corresponds to one incident and each column represents a variable. This structure would allow for efficient data cleaning, merging with weather datasets if necessary, and statistical analysis.
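The tabular layout and the weather merge described above could look like the following sketch. It uses only the Python standard library, and the column names, the join key of date and city, and all rows are hypothetical placeholders. The real datasets may need a different key, such as a timestamp plus the nearest weather station.

```python
import csv
import io

# Hypothetical CSV contents; column names and values are placeholders, not real data.
incidents_csv = """date,city,vehicle_type,violation_type
2024-01-05,San Francisco,autonomous,improper lane change
2024-01-05,Los Angeles,human,speeding
2024-01-06,San Francisco,human,red-light running
"""
weather_csv = """date,city,weather
2024-01-05,San Francisco,rain
2024-01-05,Los Angeles,clear
"""


def merge_weather(incident_rows, weather_rows):
    """Left-join weather onto incidents by (date, city); unmatched rows get weather=None."""
    lookup = {(w["date"], w["city"]): w["weather"] for w in weather_rows}
    return [
        {**row, "weather": lookup.get((row["date"], row["city"]))}
        for row in incident_rows
    ]


incidents = list(csv.DictReader(io.StringIO(incidents_csv)))
weather = list(csv.DictReader(io.StringIO(weather_csv)))
merged = merge_weather(incidents, weather)
for row in merged:
    print(row)
```

Keeping unmatched incidents with a missing weather value, rather than dropping them, makes it possible to report how many rows lack weather information before any analysis.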

---

## Potential Real Datasets

### California DMV Autonomous Vehicle Collision Reports  
**Source:** https://www.dmv.ca.gov/portal/vehicle-industry-services/autonomous-vehicles/autonomous-vehicle-collision-reports/

The California DMV publishes collision reports involving autonomous vehicles. Reports from 2019 through 2026 are accessible online, while older reports can be obtained by request. Important variables include vehicle manufacturer, whether the vehicle was operating in autonomous mode at the time of the incident, date and time of the collision, location, and weather conditions. This dataset is especially useful for analyzing autonomous vehicle performance under different environmental conditions.

### California Crash Reporting System (CCRS)  
**Source:** https://test.lab.data.ca.gov/dataset?name=ccrs

The CCRS dataset contains statewide traffic crash data for California and primarily covers incidents involving human-driven vehicles. This dataset would serve as a comparison or control group when analyzing whether autonomous vehicles exhibit different violation or crash patterns in adverse weather. Relevant variables include crash type, road and weather conditions, time and location of the crash, and vehicle information. The data is publicly available and can be downloaded directly for analysis.

### NHTSA Standing General Order (SGO) Crash Reporting  
**Source:** https://www.nhtsa.gov/laws-regulations/standing-general-order-crash-reporting#83381

The National Highway Traffic Safety Administration provides crash reports involving vehicles equipped with automated driving systems (ADS). These data distinguish between different levels of automation and include detailed descriptions of crash circumstances. Key variables include automation level, environmental conditions, vehicle behavior prior to the crash, and location. This dataset would allow for more nuanced analysis of autonomous vehicle performance across varying weather conditions and system capabilities.


## Ethics

Instructions: Keep the contents of this cell. For each item on the checklist
-  put an X there if you've considered the item
-  IF THE ITEM IS RELEVANT place a short paragraph after the checklist item discussing the issue.
  
Items on this checklist are meant to provoke discussion among good-faith actors who take their ethical responsibilities seriously. Your teams will document these discussions and decisions for posterity using this section.  You don't have to solve these problems, you just have to acknowledge any potential harm no matter how unlikely.

Here is a [list of real world examples](https://deon.drivendata.org/examples/) for each item in the checklist that you can refer to.

[![Deon badge](https://img.shields.io/badge/ethics%20checklist-deon-brightgreen.svg?style=popout-square)](http://deon.drivendata.org/)

### A. Data Collection
 - [ ] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?
 - [ ] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?
 - [X] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?
> We will not collect any information about the individuals in a vehicle at the time of an incident. Our analysis concerns the performance of autonomous vehicles, and failures should not be attributed to any driver or passenger. Excluding such data from our models avoids the risk of leaking personal data.

 - [ ] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

### B. Data Storage
 - [ ] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?
 - [ ] **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed?
 - [ ] **B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed?

### C. Analysis
 - [ ] **C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)?
 - [X] **C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)?
> We can analyze ZIP codes to determine whether demographic or economic factors correlate with autonomous vehicle performance. Because road infrastructure varies (for example, asphalt quality and the clarity of lane markings), we should be careful not to attribute autonomous vehicle failures to weather alone.

 - [X] **C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?
> We will normalize the data to prevent misleading visualizations, for example by reporting incidents per mile driven, so that a high number of failures in clear weather is not simply the result of more driving in clear weather. We will also choose color palettes that do not distort the perception of risk, and we will label all data clearly and accurately.

 - [X] **C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis?
> Any PII will be removed, as it is not necessary for determining the performance of autonomous vehicles in various weather conditions.

 - [X] **C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future?
> Yes, we will provide a comprehensive walkthrough of our analysis, with code and detailed documentation, to show exactly how we obtained our results.


### D. Modeling
 - [ ] **D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?
 - [X] **D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?
> We want to ensure that the model relies strictly on atmospheric conditions rather than on location-based proxies that correlate with marginalized communities.

 - [X] **D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics?
> Where possible, we should verify that incident rates in low-contrast conditions, such as stormy environments, do not vary significantly with pedestrians' physical characteristics, such as skin tone, which some vision systems detect less reliably.

 - [ ] **D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed?
 - [ ] **D.5 Communicate limitations**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?

### E. Deployment
 - [ ] **E.1 Monitoring and evaluation**: Do we have a clear plan to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)?
 - [ ] **E.2 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?
 - [ ] **E.3 Roll back**: Is there a way to turn off or roll back the model in production if necessary?
 - [ ] **E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?


## Team Expectations

* Communication: We will use iMessage as our primary form of communication. Team members should respond within 24 hours.
* Meetings: We will meet twice a week on weekdays. Exact times will be determined based on everyone's availability.
* Task Tracking: We will use a shared Google Doc to assign tasks and track responsibilities. GitHub Issues will be used for code-related work.
* Deadlines: All tasks should be completed one to two days before the official deadline to allow time for group review and edits.
* Tone & Respect: All communication should be respectful and professional. Feedback should be constructive and focused on the work, not the individual.
* Equal Contribution: All members are expected to contribute equally across all aspects of the project, including research, coding, writing, and editing. Tasks will be divided fairly so that no single person carries the majority of the workload.
* If You're Struggling: If a team member is struggling or falling behind, they should notify the group as soon as possible so we can redistribute tasks or offer support.
* Decision-Making: Decisions will be made by majority vote. If a team member does not respond within 24 hours and a decision needs to be made, the remaining members will proceed.
* Accountability: If a team member is not meeting expectations, we will address it directly with them first through written communication. If there is no improvement within one week, we will escalate the issue to the professor.
* Conflict Resolution: If disagreements arise, we will discuss them as a group and work toward a solution that everyone can agree on. All discussions will remain respectful and focused on the success of the project.


## Project Timeline Proposal

| Meeting Date | Meeting Time      | Completed Before Meeting                         | Discuss at Meeting                                                                 |
|-------------|-------------------|--------------------------------------------------|------------------------------------------------------------------------------------|
| 2/4         | TBD               | Finalize and submit Project Proposal             | Review proposal; start searching for datasets on autonomous vehicles, weather, and traffic violations |
| 2/9         | TBD               | Find and review potential datasets               | Discuss which datasets to use; assign data cleaning and wrangling tasks to team members |
| 2/16        | TBD               | Import and clean data                            | Review data wrangling progress; prepare for Checkpoint #1 submission               |
| 2/18        | TBD               | Submit Checkpoint #1: Data                       | Assign EDA tasks; discuss initial analysis plan                                     |
| 2/25        | TBD               | Begin EDA; create visualizations                 | Review EDA progress; refine analysis approach                                       |
| 3/4         | TBD               | Finalize EDA; submit Checkpoint #2: EDA           | Assign final report sections; begin analysis                                        |
| 3/11        | TBD               | Complete analysis; draft results, conclusion, and discussion | Review and edit full project; begin working on final video                          |
| 3/16        | TBD               | Finalize report and video                        | Submit Final Project, Final Video, and Team Evaluation Survey                       |
| 3/18        | Before 11:59 PM   | N/A                                              | Submit Final Project, Final Video, and Team Evaluation Survey                       |
