# COGS 188 - Project Proposal

# Names

- Katie Chung
- Jiawei Gao
- Grace Ortiz
- Hsiang-An Pao

# Abstract

Autonomous driving has the potential to improve road safety, enhance transportation efficiency, and reduce human driving errors. However, the reliability of decision-making in dynamic environments remains a major challenge. This study aims to evaluate AI models for autonomous vehicle control using Webots, a high-fidelity simulation platform. The selected models will be tested under diverse driving scenarios to determine the most effective strategies for safe and efficient navigation. Each model will process sensor data, including RGB images, LiDAR point clouds, GPS coordinates, proximity sensor readings, and motion data, which will be preprocessed and stored as NumPy arrays to ensure consistency and stability in training. The models will be assessed on four key performance metrics: collision avoidance, lane adherence, traffic rule compliance, and travel efficiency. Success will be measured by each model's ability to navigate complex scenarios with minimal infractions and efficient route completion times. By systematically testing AI-driven decision-making strategies, this project contributes to the development of robust, interpretable, and high-performing autonomous vehicle systems.

# Background
Autonomous vehicles have the potential to revolutionize transportation by reducing human driving errors and improving traffic efficiency. According to the National Highway Traffic Safety Administration (NHTSA), about 94% of road accidents are caused by human error, which indicates the potential safety benefits of self-driving technology. However, despite advances in AI algorithms, ensuring that self-driving technology is safe and reliable and makes sound decisions remains a significant challenge, especially in complex, multi-agent environments where vehicles interact with many unpredictable human drivers, cyclists, and pedestrians<a name="shalev"></a><sup>[1]</sup>. Algorithms must dynamically adjust to situations such as uncontrolled intersections, lane merging, or sudden braking by other vehicles. Therefore, a robust decision-making system is crucial for the success of autonomous driving.

One promising approach to enhancing the reliability of decision-making is deep reinforcement learning (RL). Unlike traditional rule-based systems, which rely on handcrafted heuristics, RL allows vehicles to learn from experience and optimize their actions over time. El Sallab et al. propose an RL-based framework that integrates recurrent neural networks (RNNs) to improve decision-making in partially observable environments<a name="sallab"></a><sup>[2]</sup>. This framework helps autonomous vehicles remember past information, which is crucial in complicated scenarios such as reacting to hidden pedestrians who suddenly emerge from behind parked cars or anticipating the acceleration patterns of nearby vehicles. Furthermore, RL enables autonomous vehicles to optimize their driving policies for multiple objectives, for example, minimizing energy consumption while maintaining safety and efficiency.

In addition, autonomous vehicles must be designed to navigate real-world traffic interactions in which human unpredictability plays a major role. Shalev-Shwartz et al. emphasize that multi-agent RL techniques are essential to equip autonomous vehicles with the ability to interact with human drivers dynamically<a name="shalev"></a><sup>[1]</sup>. For instance, when merging onto a highway, a self-driving car must balance confidence and caution: it has to decide whether to yield or proceed based on the behavior of nearby vehicles. Similarly, in urban settings, a self-driving car must determine when it is appropriate to make a left turn based on the flow of oncoming traffic. Traditional models struggle with these dynamic interactions, which makes RL-based methods a promising direction for achieving human-like driving behavior.

While RL has shown potential, one of its key limitations is the lack of interpretability in deep learning models. Human drivers can explain their decisions based on rules and prior experience. RL-based self-driving cars, by contrast, operate like black-box systems, so it is difficult to justify their actions in critical situations. Kim et al. address this issue by developing a method for generating textual explanations for self-driving decisions. They use attention-based deep learning models to highlight the visual cues that influenced the vehicle’s choices<a name="kim"></a><sup>[3]</sup>. For example, an autonomous vehicle might explain its decision to slow down by stating: "The vehicle is slowing down because a pedestrian is approaching the crosswalk", which makes AI-driven decisions more transparent and trustworthy to human passengers.

Beyond technical challenges, public perception is also a crucial barrier to the widespread adoption of autonomous vehicles. Schoettle and Sivak conducted a cross-national survey to examine public attitudes toward self-driving technology in the U.S., U.K., and Australia<a name="schoettle"></a><sup>[4]</sup>. Their findings indicate that even though many individuals support advances in self-driving cars, a significant portion of respondents express concerns about safety, cybersecurity, and reliability. Notably, people are more comfortable with partial automation (e.g., lane-keeping assistance) but remain skeptical of fully autonomous systems, which remove human control entirely. This skepticism is driven largely by high-profile incidents, such as Tesla’s Autopilot-related crashes, which raise questions about the readiness of this technology for real-world scenarios.

Another critical issue is the transition period, during which human-driven cars and autonomous vehicles must share the road. Sivak and Schoettle warn that during this phase, safety may initially decline because human drivers might struggle to anticipate self-driving cars' behavior, and autonomous vehicles might likewise fail to predict erratic human driving patterns<a name="sivak"></a><sup>[5]</sup>. For example, human drivers often rely on eye contact or subtle hand gestures to negotiate the right of way, and self-driving cars currently struggle to interpret these cues. Additionally, autonomous vehicles may adhere too strictly to traffic rules, potentially causing disruptions in environments where human drivers often follow informal rules, such as how they handle stops at empty intersections or at the roadside. These challenges highlight the need for gradual integration strategies that combine self-driving cars with V2X (vehicle-to-everything) communication systems to facilitate smoother coordination between human-driven and autonomous vehicles.

Our study builds on prior work by testing AI models under dynamic driving conditions using Webots, a simulation platform for autonomous vehicle research. We aim to identify the most effective strategies for ensuring the safety and efficiency of autonomous navigation. By integrating reinforcement learning techniques with explainable models, this project contributes to the development of autonomous vehicles that can handle real-world driving challenges while maintaining interpretability and public trust.

# Problem Statement

Autonomous driving systems must make real-time decisions in dynamic environments while balancing safety, efficiency, and adherence to traffic rules. Traditional rule-based approaches struggle to generalize across diverse driving conditions, which makes reinforcement learning (RL) a promising alternative for self-driving applications. However, training RL models to drive safely and effectively remains a significant challenge due to the need for reliable evaluation metrics and the complexity of real-world driving scenarios.

In this project, we aim to develop and compare reinforcement learning models for self-driving car simulations using Webots. Our goal is to identify the model that achieves the highest overall performance across multiple key metrics, including:

- Safety: Minimizing collisions with obstacles, pedestrians, and other vehicles
- Lane Adherence: Ensuring the vehicle stays within lane boundaries and follows lane discipline
- Traffic Rule Compliance: Obeying traffic lights, stop signs, and yielding rules
- Efficiency: Reaching the intended destination within a reasonable time frame while maintaining safe driving behavior

The vehicle's actions can be evaluated using numerical metrics such as collision count, lane deviation, and time to destination. Additionally, each episode in the Webots simulation can be analyzed for performance using well-defined criteria. The evaluation is also replicable, as the experiment can be conducted multiple times with different RL models and configurations to assess their effectiveness under various driving conditions.

Through this project, we seek to determine which RL algorithm and model architecture yield the best trade-off between safety, rule adherence, and efficiency, contributing to the broader field of autonomous vehicle research.

# Data

As this is a reinforcement learning project, the agent will generate its own data through its interaction with the Webots environment and will not use a pre-existing static dataset. Each observation from the agent will consist of various sensor readings <a name="webots_sensors"></a>[<sup>[6]</sup>](#webots_sensors_note):
- **RGB image frames**: 1D byte array 
- **Point cloud distance data (LiDAR)**: 1D float array
- **Proximity sensor data**: Float
- **GPS coordinates**: 3-element float array (x, y, z)
- **Acceleration**: 3-element float array (x, y, z)
- **Angular velocity**: 3-element float array (x, y, z)
- **Cardinal direction**: 3-element float array (x, y, z)
- **Wheel rotation**: Float (in radians) 
- Control commands <a name="webots_carlib"></a>[<sup>[7]</sup>](#webots_carlib_note)
    - **Steering**: Float (in radians)
    - **Throttle**: Float ([0, max_speed])
    - **Braking**: Float ([0, 1])
- **Time step**: Float

To ensure the vehicle's safety, lane adherence, and traffic rule compliance, the most critical variables are:
- RGB image frames for detecting pedestrians, other vehicles, traffic signs/signals, and lanes
- Point cloud distance data for determining following distance and preventing collisions
- Proximity sensor data for accurate close range obstacle and collision detection

To ensure the vehicle's efficiency, the most critical variables are time step and GPS coordinates, which are used to minimize drive time and verify the correct final destination.

Webots by default runs at 32ms per time step, meaning approximately 31 observations will be recorded per 1 second of simulation time. Observations will be stored as NumPy arrays for optimal reinforcement learning training. To ensure consistency and prevent feature bias, the sensor data will be preprocessed. All sensor readings will be normalized to a common scale to improve stability and convergence speed. In addition, RGB images that are returned as 1D arrays will be reshaped into 3D arrays (height x width x channels) and pixel values will be normalized. GPS coordinates will be converted from absolute to relative positioning to simplify state representation. Lastly, null values returned by LiDAR sensors will be replaced with the maximum range of the sensor. 
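
The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration and not the project's final code: the function names are hypothetical, and the sketch assumes the camera buffer stores four bytes per pixel in B, G, R, A order (adjust `channels` and the channel slice if the simulator's layout differs) and that missing LiDAR returns show up as NaN or infinity.

```python
import numpy as np


def reshape_image(raw: bytes, width: int, height: int, channels: int = 4) -> np.ndarray:
    """Turn a flat byte buffer into a (height, width, 3) float array in [0, 1].

    Assumption: `raw` holds `channels` bytes per pixel in row-major order,
    ordered B, G, R(, A). This layout is assumed, not taken from the proposal.
    """
    flat = np.frombuffer(raw, dtype=np.uint8)
    image = flat.reshape(height, width, channels)
    rgb = image[:, :, 2::-1]  # pick channels 2, 1, 0 so B, G, R becomes R, G, B
    return rgb.astype(np.float32) / 255.0


def clean_lidar(ranges, max_range: float) -> np.ndarray:
    """Replace missing (NaN or infinite) returns with max_range, then scale to [0, 1]."""
    arr = np.asarray(ranges, dtype=np.float32)
    arr = np.where(np.isfinite(arr), arr, max_range)
    return np.clip(arr, 0.0, max_range) / max_range


def relative_position(gps_xyz, origin_xyz) -> np.ndarray:
    """Express a GPS reading relative to a reference point such as the start pose."""
    return np.asarray(gps_xyz, dtype=np.float32) - np.asarray(origin_xyz, dtype=np.float32)
```

Dividing LiDAR ranges by the sensor's maximum range puts them on the same [0, 1] scale as the pixel values, which is one simple way to meet the common-scale goal stated above.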

# Proposed Solution

The proposed solution combines **Reinforcement Learning (RL)** and **Convolutional Neural Networks (CNNs)** to develop a self-driving car system in a simulated environment created using **Webots**. The system learns to navigate autonomously by processing visual inputs from a front-facing camera and optimizing driving policies through trial and error.

### Model
- **Webots** provides a realistic simulation environment with customizable tracks and driving scenarios. The car is equipped with a front-facing camera to capture visual input, simulating real-world driving conditions.
- A **CNN** processes raw image data to extract meaningful features (e.g., lane markings, obstacles, traffic signs).
- Implemented using **PyTorch**, the CNN serves as the perception module, transforming visual inputs into a state space for the RL agent<a name="self_driving_sim"></a>[<sup>[8]</sup>](#self_driving_sim_note).
- The RL agent may use **Proximal Policy Optimization (PPO)** to learn an optimal driving policy.
- The state space consists of CNN-extracted features, and the action space includes controls like steering, throttle, and brake.
- The reward function gives positive rewards for staying within lanes and maintaining safe speeds, and negative rewards for collisions, going off-road, or violating traffic rules.
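
To make the reward design above concrete, here is one possible per-step reward, written as a sketch. The `StepInfo` fields, the default lane half-width and speed limit, and all reward magnitudes are hypothetical placeholders that would need tuning; the proposal does not fix any of these values.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Per-step signals a simulator wrapper is assumed to provide (hypothetical)."""
    lane_offset_m: float          # lateral distance from the lane center, in meters
    speed_mps: float              # current speed, in meters per second
    collided: bool = False
    off_road: bool = False
    rule_violation: bool = False


def step_reward(info: StepInfo,
                lane_half_width_m: float = 1.75,
                speed_limit_mps: float = 13.9) -> float:
    """Illustrative shaped reward; every weight is a placeholder, not a tuned value."""
    if info.collided:
        return -100.0  # a collision outweighs every other term
    reward = 0.0
    if info.off_road:
        reward -= 10.0
    elif abs(info.lane_offset_m) <= lane_half_width_m:
        # In lane: bonus shrinks linearly from 1 at the center to 0 at the lane edge.
        reward += 1.0 - abs(info.lane_offset_m) / lane_half_width_m
    if 0.0 < info.speed_mps <= speed_limit_mps:
        reward += 0.5  # moving at a safe speed
    elif info.speed_mps > speed_limit_mps:
        reward -= 1.0  # speeding
    if info.rule_violation:
        reward -= 5.0
    return reward
```

Making the collision penalty much larger than the per-step bonuses is a common way to keep the agent from trading safety for speed, but the right ratio is an empirical question.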

### Training Pipeline
1. The car collects image data from the Webots environment.
2. The CNN processes the images and extracts features.
3. The RL agent selects actions based on the features and receives rewards.
4. The agent updates its policy iteratively using collected experiences.
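
If PPO is the algorithm chosen for step 4, the policy update maximizes a clipped surrogate objective. The sketch below computes that objective for a small batch in plain Python to show the arithmetic; the function name and the `clip_eps` default of 0.2 are illustrative assumptions, and a real implementation would compute the same quantity on PyTorch tensors so that gradients can flow through it.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to be maximized)."""
    if not advantages:
        raise ValueError("batch is empty")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the new policy far from the one that collected the data, which is why PPO can reuse each batch of experience for several update steps.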

### Testing and Evaluation
- The trained model is tested on unseen tracks or scenarios to evaluate generalization.
- Evaluation metrics are described in the Evaluation Metrics section.
- A **rule-based controller** serves as the **benchmark**. It follows predefined rules (e.g., stay in the center of the lane, stop at obstacles) without learning capabilities.
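
A rule-based benchmark of the kind described above could be as simple as the following sketch. The function name, gains, thresholds, and sign convention are all hypothetical; the throttle value is a placeholder whose units should follow whatever the simulator's driver interface expects.

```python
def rule_based_control(lane_offset_m, obstacle_distance_m,
                       stop_distance_m=5.0, steer_gain=0.5,
                       max_steer_rad=0.5, cruise_throttle=0.4):
    """Return (steering, throttle, brake) from two hand-written rules.

    Assumed sign convention: a positive offset means the car is right of the
    lane center, and a positive steering angle steers to the right.
    """
    # Rule 1: steer back toward the lane center, limited to the steering range.
    steering = max(-max_steer_rad, min(max_steer_rad, -steer_gain * lane_offset_m))
    # Rule 2: brake fully when an obstacle is within the stopping distance.
    if obstacle_distance_m <= stop_distance_m:
        return steering, 0.0, 1.0
    return steering, cruise_throttle, 0.0
```

Because this controller has no learned parameters, its scores give a fixed reference point against which the RL agent's metrics can be compared.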

---

### Why This Solution Works

- CNNs excel at processing image data and have been successfully used in autonomous driving tasks like lane detection and object recognition. By extracting meaningful features from raw images, the CNN enables the RL agent to interpret complex visual inputs effectively.
- RL allows the agent to learn optimal policies through trial and error, making it well-suited for dynamic driving scenarios. The reward-based learning process ensures the agent improves over time by maximizing safe and efficient driving behaviors.
- Webots provides a realistic and customizable environment for training and testing, enabling the simulation of diverse driving scenarios.

---


# Evaluation Metrics

In this project, we evaluate the performance of both the **benchmark model** and the **solution model** using key safety and efficiency metrics. These metrics ensure that the self-driving agent follows safe driving behavior while effectively navigating to its destination.

#### **1. Collision Rate (Safety)**
**Definition:** Measures the frequency of collisions per episode.  
- Lower values indicate better performance in avoiding obstacles and other vehicles.  
- Derived from the number of collisions detected during the simulation.  

**Mathematical Representation:**  
$$
\text{Collision Rate} = \frac{\text{Total Collisions}}{\text{Total Episodes}}
$$
Where:  
- **Total Collisions** is the number of times the agent collides with an object
- **Total Episodes** is the number of completed simulation runs

**Example Interpretation:**  
- If the agent crashes 10 times in 50 episodes, the collision rate is 0.2 (or 20%)  
- A safer model should minimize this rate

#### **2. Lane Adherence (Safety & Rule Compliance)**
**Definition:** Measures how well the vehicle stays within lane boundaries  
- Calculated as the deviation from the center of the assigned lane over time

**Mathematical Representation:**  
$$
\text{Lane Deviation} = \frac{1}{T} \sum_{t=1}^{T} |d_t|
$$
Where:  
- $d_t$ is the lateral distance from the lane center at time $t$
- $T$ is the total number of time steps in an episode

**Example Interpretation:**  
- A **higher deviation** means the vehicle frequently strays out of its lane
- The goal is to **minimize lane deviation** for better lane-keeping performance
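
As a small worked example of the lane deviation formula, the sketch below averages the absolute lateral offsets logged over one episode. The function name and the sample offsets are made up for illustration.

```python
def mean_lane_deviation(lateral_offsets):
    """Average absolute lateral offset over one episode: (1/T) * sum(|d_t|)."""
    if not lateral_offsets:
        raise ValueError("episode has no time steps")
    return sum(abs(d) for d in lateral_offsets) / len(lateral_offsets)


# Four time steps with offsets in meters (negative = left of center).
offsets = [0.1, -0.3, 0.2, 0.0]
print(mean_lane_deviation(offsets))  # roughly 0.15 m
```

Taking the absolute value keeps drift to the left and drift to the right from canceling each other out.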

#### **3. Traffic Rule Compliance (Safety & Legal Adherence)**
**Definition:** Tracks the number of violations related to red lights, stop signs, and illegal lane changes
- Lower values indicate better adherence to traffic laws

**Mathematical Representation:**  
$$
\text{Violation Rate} = \frac{\text{Total Violations}}{\text{Total Episodes}}
$$

**Example Interpretation:**  
- If a model runs 100 episodes and violates traffic rules 15 times, the violation rate is 0.15 (or 15%)
- A safer model will have a near-zero violation rate

#### **4. Time to Destination (Efficiency)**
**Definition:** Measures the time taken to successfully reach the goal
- A balance is needed: the car should not drive recklessly fast but also should not drive too slowly

**Mathematical Representation:**  
$$
\text{Time Efficiency} = \frac{\text{Total Distance Traveled}}{\text{Total Time Taken}}
$$

**Example Interpretation:**  
- If an agent takes 100 seconds to reach a destination 500 m away, its time efficiency score is 5 m/s (its average speed)
- A good model should balance speed while following safety rules
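
The episode-level rates defined in this section can be computed from simple per-episode logs. The sketch below is one way to do that; the `EpisodeLog` record and its field names are hypothetical, and time efficiency is aggregated here as total distance over total time across all episodes, which is an assumption about how the single-episode formula extends to a batch.

```python
from dataclasses import dataclass


@dataclass
class EpisodeLog:
    """Hypothetical per-episode record written by the evaluation harness."""
    collisions: int
    violations: int
    distance_m: float
    duration_s: float


def summarize(episodes):
    """Compute collision rate, violation rate, and time efficiency over episodes."""
    n = len(episodes)
    if n == 0:
        raise ValueError("no episodes to summarize")
    total_time = sum(e.duration_s for e in episodes)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    return {
        "collision_rate": sum(e.collisions for e in episodes) / n,
        "violation_rate": sum(e.violations for e in episodes) / n,
        "time_efficiency_mps": sum(e.distance_m for e in episodes) / total_time,
    }
```

Running the same `summarize` function on logs from the benchmark controller and from the RL agent keeps the comparison between the two models like-for-like.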


# Ethics & Privacy

Ethical concerns in this project are complex, with safety being a primary consideration. Self-driving vehicles rely on ML models to interpret their surroundings and make real-time decisions, but these models may encounter unpredictable scenarios that lead to accidents. Unlike human drivers, AI systems lack personal accountability, making it difficult to determine who is responsible when failures occur. Questions of liability (manufacturer, software engineers, or the vehicle owner) will become even more complicated if our model contributes to such issues. Another ethical dilemma arises in unavoidable accident scenarios, where the system may need to choose between different harmful outcomes. Should the vehicle prioritize the safety of its passengers over pedestrians or other drivers? This challenge highlights the difficulty engineers face in programming and defining ethical guidelines.

Bias in machine learning models is another important consideration, as training data may not always represent the full diversity of real-world driving conditions. If the dataset lacks a wide range of pedestrian appearances or road environments, the AI may struggle to make fair and accurate decisions. Without comprehensive testing across diverse environments, the system could unintentionally discriminate against certain groups, leading to unsafe or unfair outcomes.

Privacy is also a key concern, as self-driving vehicles collect vast amounts of data, including location, passenger behavior, and even personal or vehicle information. This data is essential for improving AI performance, so we will make sure that any data we include has been anonymized and is not at risk of privacy leaks.


# Team Expectations 


* Meet once a week via Zoom, more as needed closer to the end of the quarter
* Respond in group chat within 12 hours
* Conflicts are resolved by majority vote; any conflict should be brought up within the group before seeking TA assistance
* Work should be divided evenly to the best of the group's ability
* Be aware of deadlines; each member's portion of work should be completed at least a couple of hours before the deadline to allow for revision


# Project Timeline Proposal


| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 2/12  |  5 PM |  Brainstorm topics/questions and split parts for research (all)  | Determine best form of communication; Discuss and decide on final project topic; discuss hypothesis; begin background research | 
| 2/14  |  12 PM |  Finalize Project Proposal (all) | Complete background, Discuss datasets and metrics, finalize questions, Turn in proposal | 
| 2/21  | 5 PM  | Import and Wrangle Data (Hsiang-An), EDA (Grace)  | Discuss and finalize datasets and metrics for EDA, Review if work is divided in meaningful manner   |
| 2/28  |  5 PM  | Finalize data, Continue on EDA (Grace), Programming start for RL (Katie) | Review if EDA and data wrangling is completed, Discuss possible algorithm, validations, model selection |
| 3/5  | 6 PM  | Finalize EDA, continue programming (Katie), Start Analysis (Jiawei, Hsiang-An) | Review project code, Analyze algorithms and model performance, Split work based on what’s lacking |
| 3/12  | 12 PM  | Complete Analysis ; Start results/conclusion/discussion (Grace, Katie)| Discuss and complete project, Plan for extra meeting if needed |
| 3/16  | 5 PM  | Complete and Edit project (all)| Discuss and review report |
| 3/19  | Before 11:59 PM  | Finalize project (all) | Turn in Final Project  |

# Footnotes
<a name="shalevnote"></a>1.^: Shalev-Shwartz, S., Shammah, S., & Shashua, A. (2016). Safe, multi-agent reinforcement learning for autonomous driving. arXiv preprint arXiv:1610.03295. https://arxiv.org/abs/1610.03295<br>
<a name="sallabnote"></a>2.^: El Sallab, A., Abdou, M., Perot, E., & Yogamani, S. (2017). Deep reinforcement learning framework for autonomous driving. arXiv preprint arXiv:1704.02532. https://arxiv.org/abs/1704.02532<br>
<a name="kimnote"></a>3.^: Kim, J., Rohrbach, A., Darrell, T., Canny, J., & Akata, Z. (2018). Textual explanations for self-driving vehicles. European Conference on Computer Vision (ECCV). https://openaccess.thecvf.com/content_ECCV_2018/html/Jinkyu_Kim_Textual_Explanations_for_ECCV_2018_paper.html<br>
<a name="schoettlenote"></a>4.^: Schoettle, B., & Sivak, M. (2014). A survey of public opinion about autonomous and self-driving vehicles in the U.S., the U.K., and Australia. University of Michigan Transportation Research Institute. https://deepblue.lib.umich.edu/handle/2027.42/108384<br>
<a name="sivaknote"></a>5.^: Sivak, M., & Schoettle, B. (2015). Road safety with self-driving vehicles: General limitations and road sharing with conventional vehicles. University of Michigan Transportation Research Institute. https://deepblue.lib.umich.edu/handle/2027.42/110789<br>
<a name="webots_sensors_note"></a>6.[^](#webots_sensors): Cyberbotics API Reference: doc. https://cyberbotics.com/doc/reference/nodes-and-api-functions?tab-language=python  <br> 
<a name="webots_carlib_note"></a>7.[^](#webots_carlib): Cyberbotics Car & Driver Library Reference: doc. https://cyberbotics.com/doc/automobile/car-and-driver-libraries <br> 
<a name="self_driving_sim_note"></a>8.[^](#self_driving_sim): Aryan Jha, "Creating a Self-Driving Car Simulation," Medium, [https://aryanjha.medium.com/creating-a-self-driving-car-simulation-977bed8f49b4](https://aryanjha.medium.com/creating-a-self-driving-car-simulation-977bed8f49b4).  