# Evaluating RL Algorithms for Autonomous Navigation in Webots

[GitHub Repository](https://github.com/grace-ortiz/COGS188_Final)

## Group members

- Katie Chung
- Jiawei Gao
- Grace Ortiz
- Hsiang-An Pao

# Abstract
Autonomous driving has the potential to enhance road safety, reduce human error, and improve traffic efficiency. However, reliable decision-making in dynamic environments remains challenging. This study evaluates the performance of four reinforcement learning (RL) algorithms, SARSA, Deep Q-Network (DQN), Proximal Policy Optimization (PPO), and Monte Carlo (MC), in the high-fidelity driving simulator Webots. The agents process sensor data, including camera images, LiDAR data, GPS coordinates, and speed measurements, with the goal of navigating safely while optimizing driving performance. Our comparison focuses on four aspects: handling of continuous vs. discrete state spaces, suitability for large vs. small state representations, reward function effectiveness, and hyperparameter sensitivity. Results are evaluated by collision avoidance rate. Initial findings suggest that although value-based methods like SARSA and DQN perform well in discrete state spaces, PPO's policy optimization is better suited to continuous control, and Monte Carlo suffers from high variance in returns. This work provides insight into the trade-offs between RL approaches and their applicability to autonomous driving, which can contribute to safer and more interpretable decision-making models in the future.

# Background

Self-driving cars could improve transportation by reducing human driving errors, raising safety standards, and making travel more time-efficient. However, making sound decisions in fast-moving, ever-changing situations is difficult. Traditional rule-based AI systems do not adapt well to varied driving conditions. Handling such conditions calls for more advanced techniques, in particular reinforcement learning (RL) algorithms, which enable a vehicle to learn from experience<a name="duguleana"></a>[<sup>[1]</sup>](#duguleananote). This research analyzes the effectiveness of several RL algorithms, including SARSA, Deep Q-Networks (DQN), Monte Carlo (MC), and Proximal Policy Optimization (PPO), on one essential self-driving task: obstacle detection and avoidance.

One fundamental barrier in autonomous vehicle decision-making is finding a good policy when the state and action spaces mix continuous and discrete dimensions. RL algorithms differ in how efficiently they handle these representations. Tabular methods such as SARSA and Monte Carlo (MC) work well on a discretized state space, but discretization discards much of the detail in the sensor readings. In comparison, DQN and PPO use function approximation to operate on continuous states, and PPO can additionally output continuous actions, which allows finer control but is more computationally expensive and requires more sophisticated function approximators<a name="pandey"></a>[<sup>[3]</sup>](#pandeynote).
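To make the discretization trade-off concrete, here is a minimal sketch of how continuous readings could be binned into a tabular state. The bin edges and the helper names (`discretize`, `to_tabular_state`) are hypothetical and chosen only for illustration; the project's actual discretization may differ.

```python
from bisect import bisect_right

# Hypothetical bin edges (assumptions, not the project's actual values).
SPEED_EDGES_KMH = [10.0, 20.0, 30.0]      # 3 edges -> 4 speed bins
LIDAR_EDGES_M = [2.0, 5.0, 10.0, 20.0]    # 4 edges -> 5 distance bins


def discretize(value, edges):
    """Return the index of the bin that `value` falls into."""
    return bisect_right(edges, value)


def to_tabular_state(speed_kmh, lidar_dist_m):
    """Map a continuous (speed, distance) pair to a discrete state tuple."""
    return (
        discretize(speed_kmh, SPEED_EDGES_KMH),
        discretize(lidar_dist_m, LIDAR_EDGES_M),
    )


# Two different sensor readings collapse to the same tabular state,
# which is the loss of detail a tabular method has to accept.
state_a = to_tabular_state(12.3, 4.9)
state_b = to_tabular_state(19.0, 2.1)
```

With these edges a tabular agent only distinguishes 4 x 5 = 20 states, which keeps a Q-table small at the cost of treating `state_a` and `state_b` as identical.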

Another critical aspect of autonomous driving is collision avoidance, a fundamental requirement for safe navigation. One established approach is the collision cone method, which defines safe movement regions around moving obstacles<a name="chakravarthy"></a>[<sup>[4]</sup>](#chakravarthynote). Our project uses live LiDAR data to detect obstacles in real time, and the RL models are compared on how effectively they avoid collisions while carrying out the driving task. Neural network-based RL techniques have been reported to improve obstacle avoidance over traditional controllers<a name="choi"></a>[<sup>[5]</sup>](#choinote), which motivates a closer look at policy optimization for this task.

In addition, autonomous vehicles have to follow lanes while driving, which requires them to find effective routes. Prior work indicates that reward shaping deserves particular attention in reinforcement learning, especially when optimizing for speed must be balanced against rule-following<a name="almazrouei"></a>[<sup>[2]</sup>](#almazroueinote). This project examines how reward structure affects learning, with particular focus on the balance between exploration and exploitation within an RL model. Moreover, the hyperparameters (discount factor γ, learning rate α, and exploration rate ε) also determine model performance and convergence, and therefore the reliability of the decisions made.
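As an illustration of where γ, α, and ε enter, the sketch below shows the standard tabular SARSA update with ε-greedy action selection. It is a generic textbook form, not the project's implementation; the function names, the state and action labels, and the numeric values are assumptions made for the example.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """Explore with probability epsilon; otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, reward, s_next, a_next, alpha, gamma):
    """On-policy TD update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


# Illustrative values only (hypothetical states, actions, and reward).
Q = defaultdict(float)
Q[("s1", "left")] = 2.0
updated = sarsa_update(Q, "s0", "left", 1.0, "s1", "left", alpha=0.5, gamma=0.9)
greedy = epsilon_greedy(Q, "s1", ["left", "right"], epsilon=0.0)
```

Here α scales how far each estimate moves toward the TD target, γ discounts the value of the next state-action pair, and ε controls how often the agent explores instead of exploiting its current estimates.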

By systematically testing and comparing SARSA, DQN, MC, and PPO, this study aims to see which RL model provides the most effective trade-off between safety, efficiency, and interpretability. The insights gained can contribute to the ongoing development of reliable and scalable reinforcement learning frameworks for self-driving applications.

# Problem Statement

Clearly describe the problem that you are solving. Avoid ambiguous words. The problem described should be well defined and should have at least one ML-relevant potential solution. Additionally, describe the problem thoroughly such that it is clear that the problem is quantifiable (the problem can be expressed in mathematical or logical terms), measurable (the problem can be measured by some metric and clearly observed), and replicable (the problem can be reproduced and occurs more than once).

# Data

The agent generated its own data through its interaction with the Webots environment. Each observation from the agent consisted of the following sensor readings:

| Sensor | Data Description | Raw Data Type | 
|---|---|---|
| Camera | BGRA image frame | 1D byte array |
| GPS | XYZ coordinates | 3D float array |
| LiDAR | SICK LMS 291 point cloud distance | 1D float array | 
| Gyro | 3-axis angular velocity| 3D float array |

The raw input data was processed into usable observations for the state space. The BGRA image frames were processed by creating a lane mask with a CNN and applying a sliding window algorithm to track the lane as the car's position changed. The lane center was then calculated, and the deviation from the lane center was derived from the camera image and lane mask. The GPS x and y coordinates were extracted and passed to the state space, and the speed was read from the GPS and converted from m/s to km/h. The minimum value of the LiDAR data was passed to the state space as the distance to the nearest obstacle. The approximate angle of the nearest obstacle was also calculated from the LiDAR field of view (FOV) and converted from radians to degrees. The gyroscope data was not passed to the state space; however, its x component was extracted and used to determine whether the agent had collided.
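The LiDAR and speed processing described above could look roughly like the following sketch. It assumes the range readings are ordered evenly across the field of view with the scan center at 0 degrees, which matches the ±90° bounds on `lidar_angle` for a 180° scanner; the function names are hypothetical and the repository's code may differ.

```python
import math


def nearest_obstacle(ranges, fov_rad):
    """Return (distance, angle_deg) of the closest LiDAR return.

    Assumes `ranges` is ordered from one edge of the field of view to the
    other, evenly spaced, with the middle of the scan at 0 degrees.
    """
    n = len(ranges)
    idx = min(range(n), key=lambda i: ranges[i])
    # Position of the reading across the scan, as a fraction in [0, 1].
    frac = idx / (n - 1) if n > 1 else 0.5
    angle_rad = (frac - 0.5) * fov_rad
    return ranges[idx], math.degrees(angle_rad)


def ms_to_kmh(speed_ms):
    """Convert a speed from metres per second to kilometres per hour."""
    return speed_ms * 3.6


# Illustrative readings only: the closest return is in the middle of the scan.
dist, angle = nearest_obstacle([5.0, 4.0, 1.5, 4.0, 5.0], math.pi)
edge_dist, edge_angle = nearest_obstacle([0.8, 3.0, 3.0], math.pi)
```

Readings with no return (often reported as infinity) are ignored by the minimum as long as at least one finite value is present.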

```python
# Excerpt from the environment class. Assumes `import numpy as np`, a
# Gym-style `spaces` module, and the MIN_/MAX_ constants defined elsewhere.

# action space: [steering, speed]
self.action_space = spaces.Box(
    low=np.array([MIN_STEER_ANGLE, MIN_SPEED]),
    high=np.array([MAX_STEER_ANGLE, MAX_SPEED]),
    dtype=np.float32
)

# state space
self.state_space = spaces.Dict({
    "speed": spaces.Box(low=0, high=MAX_SPEED, shape=(1,), dtype=np.float32),  # gps speed
    "gps": spaces.Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32),  # (x, y) gps coordinates
    "lidar_dist": spaces.Box(low=0, high=100, shape=(1,), dtype=np.float32),  # distance to nearest obstacle
    "lidar_angle": spaces.Box(low=-90, high=90, shape=(1,), dtype=np.float32),  # angle to nearest obstacle
    "lane_deviation": spaces.Box(low=0, high=np.inf, shape=(1,), dtype=np.float32),  # pixels away from lane center
    "lane_mask": spaces.Box(low=0, high=1, shape=(64, 128, 1), dtype=np.uint8)  # binary mask for lane line (yellow line only)
})
```

The state space had six variables: `speed`, `gps`, `lidar_dist`, `lidar_angle`, `lane_deviation`, and `lane_mask`. The number of observations depended on the number of training episodes and the number of steps in each episode. Webots by default runs at 32 ms per time step, so approximately 31 observations are recorded per second of simulation time.
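The arithmetic behind the observation count can be sketched as follows, assuming one observation is recorded per simulation step. The episode lengths in the example are made up for illustration and are not results from this project.

```python
TIME_STEP_MS = 32  # Webots default basic time step, as stated above


def observations_per_second(time_step_ms=TIME_STEP_MS):
    """Observations recorded per second of simulated time."""
    return 1000 / time_step_ms


def simulated_seconds(n_steps, time_step_ms=TIME_STEP_MS):
    """Simulated time covered by `n_steps` observations."""
    return n_steps * time_step_ms / 1000


def observations_in_run(steps_per_episode):
    """Total observations for a run, given the step count of each episode."""
    return sum(steps_per_episode)


rate = observations_per_second()            # 1000 / 32
total = observations_in_run([100, 250])     # hypothetical episode lengths
duration = simulated_seconds(total)
```

The exact rate is 1000 / 32 = 31.25 observations per simulated second, which rounds to the figure of about 31 given above.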


# Proposed Solution

In this section, clearly describe a solution to the problem. The solution should be applicable to the project domain and appropriate for the dataset(s) or input(s) given. Provide enough detail (e.g., algorithmic description and/or theoretical properties) to convince us that your solution is applicable. Make sure to describe how the solution will be tested.  

If you know details already, describe how (e.g., library used, function calls) you plan to implement the solution in a way that is reproducible.

If it is appropriate to the problem statement, describe a benchmark model<a name="sota"></a>[<sup>[3]</sup>](#sotanote) against which your solution will be compared.

# Evaluation Metrics

Propose at least one evaluation metric that can be used to quantify the performance of both the benchmark model and the solution model. The evaluation metric(s) you propose should be appropriate given the context of the data, the problem statement, and the intended solution. Describe how the evaluation metric(s) are derived and provide an example of their mathematical representations (if applicable). Complex evaluation metrics should be clearly defined and quantifiable (can be expressed in mathematical or logical terms).

# Results

You may have done tons of work on this. Not all of it belongs here.

Reports should have a __narrative__. Once you've looked through all your results over the quarter, decide on one main point and 2-4 secondary points you want us to understand. Include the detailed code and analysis results of those points only; you should spend more time/code/plots on your main point than the others.

If you went down any blind alleys that you later decided to not pursue, please don't abuse the TAs time by throwing in 81 lines of code and 4 plots related to something you actually abandoned.  Consider deleting things that are not important to your narrative.  If its slightly relevant to the narrative or you just want us to know you tried something, you could keep it in by summarizing the result in this report in a sentence or two, moving the actual analysis to another file in your repo, and providing us a link to that file.

### Subsection 1

You will likely have different subsections as you go through your report. For instance you might start with an analysis of the dataset/problem and from there you might be able to draw out the kinds of algorithms that are / aren't appropriate to tackle the solution.  Or something else completely if this isn't the way your project works.

### Subsection 2

Another likely section is if you are doing any feature selection through cross-validation or hand-design/validation of features/transformations of the data

### Subsection 3

Probably you need to describe the base model and demonstrate its performance.  Probably you should include a learning curve to demonstrate how much better the model gets as you increase the number of trials

### Subsection 4

Perhaps some exploration of the model selection (hyper-parameters) or algorithm selection task. Generally reinforcement learning tasks may require a huge amount of training, so extensive grid search is unlikely to be possible. However, exploring a few reasonable hyper-parameters may still be possible.  Validation curves, plots showing the variability of performance across folds of the cross-validation, etc. If you're doing one, the outcome of the null hypothesis test or parsimony principle check to show how you are selecting the best model.

### Subsection 5

Maybe you do model selection again, but using a different kind of metric than before?  Or you compare a completely different approach/algorithm to the problem? Whatever, this stuff is just serving suggestions.



# Discussion

### Interpreting the result

OK, you've given us quite a bit of tech information above, now it's time to tell us what to pay attention to in all that.  Think clearly about your results, decide on one main point and 2-4 secondary points you want us to understand. Highlight HOW your results support those points.  You probably want 2-5 sentences per point.


### Limitations

Are there any problems with the work?  For instance would more data change the nature of the problem? Would it be good to explore more hyperparams than you had time for?   


### Future work
Looking at the limitations and/or the toughest parts of the problem and/or the situations where the algorithm(s) did the worst... is there something you'd like to try to make these better.

### Ethics & Privacy

If your project has obvious potential concerns with ethics or data privacy discuss that here.  Almost every ML project put into production can have ethical implications if you use your imagination. Use your imagination.

Even if you can't come up with an obvious ethical concern that should be addressed, you should know that a large number of ML projects that go into production have unintended consequences and ethical problems once in production. How will your team address these issues?

Consider a tool to help you address the potential issues such as https://deon.drivendata.org

### Conclusion

Reiterate your main point and in just a few sentences tell us how your results support it. Mention how this work would fit in the background/context of other work in this field if you can. Suggest directions for future work if you want to.

# Footnotes
<a name="lorenznote"></a>1.[^](#lorenz): Lorenz, T. (9 Dec 2021) Birds Aren’t Real, or Are They? Inside a Gen Z Conspiracy Theory. *The New York Times*. https://www.nytimes.com/2021/12/09/technology/birds-arent-real-gen-z-misinformation.html<br>
<a name="admonishnote"></a>2.[^](#admonish): Also refs should be important to the background, not some randomly chosen vaguely related stuff. Include a web link if possible in refs as above.<br>
<a name="sotanote"></a>3.[^](#sota): Perhaps the current state of the art solution such as you see on [Papers with code](https://paperswithcode.com/sota). Or maybe not SOTA, but rather a standard textbook/Kaggle solution to this kind of problem


<a name="duguleananote"></a>1.^: Duguleana, M., & Mogan, G. (2016). Neural networks based reinforcement learning for mobile robots obstacle avoidance. Expert Systems With Applications, 62, 104–115. https://www.sciencedirect.com/science/article/pii/S0957417416303001?casa_token=rMm9DFyZgCMAAAAA:EczvI-ohrg_NX6XsU-PF3BgEKKIJDEX26VUfTIkIFhP_RTSpk_pvSx-1HMyGwq--Boh_o9bMWgg<br>

<a name="almazroueinote"></a>2.^: Almazrouei, K., Kamel, I., & Rabie, T. (2023). Dynamic Obstacle Avoidance and Path Planning through Reinforcement Learning. Applied Sciences, 13(8174). https://www.mdpi.com/2076-3417/13/14/8174<br>

<a name="pandeynote"></a>3.^: Pandey, A., Pandey, S., & Parhi, D. R. (2017). Mobile Robot Navigation and Obstacle Avoidance Techniques: A Review. International Robotics & Automation Journal, 2(3), 00023. https://www.researchgate.net/profile/Dr-Anish-Pandey-2/publication/317101750_Mobile_Robot_Navigation_and_Obstacle_Avoidance_Techniques_A_Review/links/59266dad458515e3d45393b3/Mobile-Robot-Navigation-and-Obstacle-Avoidance-Techniques-A-Review.pdf<br>

<a name="chakravarthynote"></a>4.^: Chakravarthy, A., & Ghose, D. (1998). Obstacle avoidance in a dynamic environment: A collision cone approach. IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans, 28(5), 562–574. https://ieeexplore.ieee.org/abstract/document/709600?casa_token=GcKiffriAj0AAAAA:GgeA985jE4U0L_1R9n8tkh6-RjT_j60BvuiVUeYw-yTfG2uElm3qF85BI8eJULafBhxp977v6Ik<br>

<a name="choinote"></a>5.^: Choi, J., Lee, G., & Lee, C. (2021). Reinforcement learning-based dynamic obstacle avoidance and integration of path planning. Intelligent Service Robotics, 14, 663–677. https://link.springer.com/article/10.1007/s11370-021-00387-2<br>