
# TuIT_DeepRL - DRL Project
![OST](./resources/ost_logo.png)
## MiniF1RL: My pygame 2D racing environment for DRL
Author: Lars Herrmann    
Date: 12.04.2024  
Repository: [GitHub - miniF1RL](https://github.com/lherrman/miniF1RL)   
Python Version: 3.11

## Introduction
After a lot of experimentation and consideration of which environment to use (playing with, e.g., TrackManiaRL, the environments integrated in Gymnasium, and Unity environments), I decided to implement my own 2D racing environment. The main reason is that I wanted an environment for which no pre-trained models are available, which motivates me to train my own models. As I started watching Formula One this season, I wanted to go with a racing environment.

## Implementing the environment

### The CarModel
As a starting point, I was able to use a Python class of a simple 2D top-down car model that I implemented for another project about a year ago. The class already had methods to draw a simple 2D box car and its wheels onto a pygame screen. As it was intended as a geometrical model rather than a racing game, I had to implement a new method for updating the car's position and velocity to make it more fun to drive. The original CarModel class can be found in my GitHub repository for the [SlamCar Project](https://github.com/lherrman/slamcar-controller) in the file `base_model.py`.

To make this into a racing environment, some parts were still missing, which I started to implement, beginning with a track to drive on.

### The Track
In order to have a track to drive on, I drew a black-and-white image of a track in Photoshop. When the car model is initialized, a Python method uses `cv2.findContours` to find the track boundaries in the image. The track is then represented as two lists of points, one for the inner and one for the outer boundary.

![Track Image](./resources/MakeTrack.png)


Python Code:
```python
    def _get_track_boundaries_from_image(self, image_path) -> dict:
        '''
        Using opencv to read the image and find the contours of the track boundaries.
        The boundaries are represented as list of points. ([x1, y1], [x2, y2], ...)
        '''
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        contours, _ = cv2.findContours(image.astype('uint8'), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        assert len(contours) == 2, "There should be exactly 2 contours in the image"

        # Transform and downsample the contours
        downsample_rate = 10
        inner_boundary = contours[0][:, 0, :][::downsample_rate]
        outer_boundary = contours[1][:, 0, :][::downsample_rate]

        # Append the first point to the end to close the loop
        inner_boundary = np.append(inner_boundary, [inner_boundary[0]], axis=0)
        outer_boundary = np.append(outer_boundary, [outer_boundary[0]], axis=0)

        # Scale the contours to make them suitable for the simulation
        inner_boundary, outer_boundary = inner_boundary / 60, outer_boundary / 60

        return {
            'inner': inner_boundary,
            'outer': outer_boundary
        }   
```

#### Track progress
To approximate the progress the car has made on the track, I implemented a method `_calculate_track_progress` that finds the point on the inner boundary nearest to the car. The first point on the inner boundary is set as the starting point. The progress is then calculated as the index of the nearest point divided by the total number of points on the inner boundary. This could be improved by using the distance along the boundary from the starting point to the nearest point as the progress.
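
A minimal sketch of this index-based progress measure, assuming the boundary is available as a list of `[x, y]` points. The function name `track_progress` and the example points are hypothetical and only illustrate the idea; this is not the code from the repository.

```python
import numpy as np

def track_progress(car_position, inner_boundary):
    """Progress as index of the nearest boundary point / number of points."""
    boundary = np.asarray(inner_boundary, dtype=float)
    position = np.asarray(car_position, dtype=float)
    # Euclidean distance from the car to every point on the inner boundary
    distances = np.linalg.norm(boundary - position, axis=1)
    nearest_index = int(np.argmin(distances))
    return nearest_index / len(boundary)

# Hypothetical square "track" with four boundary points; index 0 is the start
square = [[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
progress = track_progress([9.0, 8.5], square)  # nearest point is index 2
```

Because the measure only counts point indices, its resolution depends on how strongly the contour was downsampled and on how evenly the points are spaced.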

### The observation space

I wanted to keep the observation space simple, so I decided to use 'lidar sensors' that give the car only a few distance measurements in front of it. I decided to use 3 sensors: one pointing straight ahead and one on each side at a 45-degree angle. The sensors are implemented as a method of the CarModel class that returns the distances to the track boundaries.

I prompted ChatGPT for the implementation of the `_segment_intersection` and `_raycast` methods, and it returned a working implementation after 2-3 iterations. The methods are implemented in pure Python, so the performance isn't optimal, but I guess it's good enough for my purposes. As the raycast method has to iterate over each segment of the boundaries, a more performant implementation would be beneficial.
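
To illustrate the idea, here is a minimal pure-Python sketch of a ray/segment intersection and a raycast over a closed boundary. It is not the code from the repository: the function names, the `max_range` parameter and the example boundary are assumptions made for this sketch.

```python
import math

def ray_segment_distance(origin, angle, p1, p2):
    """Distance along the ray to the segment p1-p2, or None if there is no hit."""
    dx, dy = math.cos(angle), math.sin(angle)   # ray direction (unit length)
    sx, sy = p2[0] - p1[0], p2[1] - p1[1]       # segment direction
    denom = dx * sy - dy * sx                   # 2D cross product
    if abs(denom) < 1e-12:
        return None                             # ray and segment are parallel
    qx, qy = p1[0] - origin[0], p1[1] - origin[1]
    t = (qx * sy - qy * sx) / denom             # position along the ray
    u = (qx * dy - qy * dx) / denom             # position along the segment
    if t >= 0.0 and 0.0 <= u <= 1.0:
        return t
    return None

def raycast(origin, angle, boundary, max_range=10.0):
    """Nearest hit over all segments of a polyline, capped at max_range."""
    nearest = max_range
    for p1, p2 in zip(boundary[:-1], boundary[1:]):
        hit = ray_segment_distance(origin, angle, p1, p2)
        if hit is not None and hit < nearest:
            nearest = hit
    return nearest

def lidar_distances(origin, heading, boundary, offsets_deg=(-45.0, 0.0, 45.0)):
    """One reading per sensor; heading in radians, sensor offsets in degrees."""
    return [raycast(origin, heading + math.radians(offset), boundary)
            for offset in offsets_deg]

# Hypothetical closed rectangular boundary (the last point repeats the first)
box = [(-1.0, -1.0), (3.0, -1.0), (3.0, 1.0), (-1.0, 1.0), (-1.0, -1.0)]
readings = lidar_distances((0.0, 0.0), 0.0, box)
```

Every ray is tested against every segment, so the cost grows with the number of boundary points. This is one reason why downsampling the contours helps, and why a vectorized version would be the natural next step for better performance.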

For easier validation, and because it looks cool, I decided to draw the lidar sensors as lines on the pygame screen.

![Lidar Sensors](./resources/lidars.png)


### The action space

The action space is also very simple. To make the problem even easier for the RL algorithm, I decided to always have the car accelerate, as slowing down isn't necessary in this environment. The car can steer left or right, or go straight. To make it a little more interesting, I added one more action, boost, that allows the car to go faster on the straights.
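
One way to picture such an action space is as four discrete actions. The sketch below is only an illustration under assumptions: the action numbering, the steering sign convention, the boost factor of 2.0, and the choice that boost goes straight ahead are all hypothetical and not taken from the environment's code.

```python
from enum import IntEnum

class Action(IntEnum):
    # Hypothetical numbering of the discrete actions
    STRAIGHT = 0
    LEFT = 1
    RIGHT = 2
    BOOST = 3

def action_to_controls(action):
    """Map a discrete action to a (steering, throttle) pair."""
    action = Action(action)
    # Steering: -1.0 is left, +1.0 is right, 0.0 is straight (assumed convention)
    steering = {Action.LEFT: -1.0, Action.RIGHT: 1.0}.get(action, 0.0)
    # The car always accelerates; boost uses an assumed higher throttle
    throttle = 2.0 if action is Action.BOOST else 1.0
    return steering, throttle

controls = [action_to_controls(index) for index in range(len(Action))]
```

Keeping the throttle always positive means the agent only has to learn when to steer and when to boost, which keeps the exploration problem small.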

### The reward function





# Training

```python
# Notebook-only IPython magics: reload edited modules automatically
%load_ext autoreload
%autoreload 2

from datetime import datetime
import threading
import torch
import pygame
from minif1env import MiniF1RLEnv
from stable_baselines3 import A2C, PPO
from stable_baselines3.common.env_util import make_vec_env
from torch.utils.tensorboard import SummaryWriter

# Helper functions

def get_writer(texts: dict):
    """Create a TensorBoard writer and log the run metadata as text."""
    writer = SummaryWriter()
    for key, value in texts.items():
        writer.add_text(str(key), str(value))
    return writer

# Use a separate thread to render the environment
# Avoids crashing the pygame window during training
def render_loop(env, close_event):
    while True:
        env.render()
        pygame.time.wait(10)
        # Process events so the window stays responsive
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                close_event.set()
                return
        if close_event.is_set():
            return

def make_env():
    env = MiniF1RLEnv(render_mode='no')
    return env

# Vectorized environment with 10 copies (created here, but not passed to the model below)
vec_env = make_vec_env(make_env, n_envs=10)

# Create the environment
env = MiniF1RLEnv(render_mode='no')
# Label used in the run name (still says a2c, although PPO is trained below)
model_description = "a2c_minif1rl"
run_name = datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + "_" + model_description

# Create and start the rendering thread
# (the target function is named render_loop so the thread object does not shadow it)
close_event = threading.Event()
render_thread = threading.Thread(target=render_loop, args=(env, close_event))
render_thread.start()

# Set the total timesteps for training
total_timesteps = 1_000_000

# Tensorboard writer
writer = get_writer({"run_name": run_name,
                     "model_description": model_description,
                     "total_timesteps": total_timesteps,
                     "env_name": "MiniF1RLEnv",
                     "reward_weights": env.get_reward_weights()})

# Policy network and device settings (only used by the commented-out A2C variant)
policy_kwargs = dict(activation_fn=torch.nn.ReLU, net_arch=[64, 64])
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Alternative: A2C with the custom policy network on the selected device
#model = A2C("MlpPolicy", env, policy_kwargs=policy_kwargs, device=device, verbose=1)
# Create the PPO model with default settings
model = PPO("MlpPolicy", env, verbose=1)
# Train the model
model.learn(total_timesteps=total_timesteps, progress_bar=False, tb_log_name=run_name)
model.save(f"models/{run_name}")

# Close the rendering thread and environment
close_event.set()
render_thread.join()
env.close()

# Close the tensorboard writer
writer.close()
```


Output:
```
pygame 2.5.2 (SDL 2.28.3, Python 3.11.5)
Hello from the pygame community. https://www.pygame.org/contribute.html
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 135      |
|    ep_rew_mean     | 368      |
| time/              |          |
|    fps             | 148      |
|    iterations      | 1        |
|    time_elapsed    | 13       |
|    total_timesteps | 2048     |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 122         |
|    ep_rew_mean          | 407         |
| time/                   |             |
|    fps                  | 136         |
|    iterations           | 2           |
|    time_elapsed         | 29          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.01

KeyboardInterrupt:
```

# Training Log

## 01
Model: 2024-04-12_14-00-00  
Timesteps: 10'000   
Training Time: 1h 10min  
Result: The car learned to always drive directly into the right border.   
Explanation: After checking the rewards that were given to the model, I realized that the reward function was not working as intended. The weight for the progress reward was far too low, so the model did not learn to follow the track. The penalty for hitting the track boundaries was probably also too low.
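
A small sketch of how such a check can be done, assuming the reward is a weighted sum of a progress term and a collision term. The function name, the term names, and all numbers are hypothetical and are not the weights used in the environment. Printing the contribution of each term makes it easy to see which one dominates a step.

```python
def weighted_reward(terms, weights):
    """Return the total reward and each term's contribution (weight * raw value)."""
    contributions = {name: weights[name] * value for name, value in terms.items()}
    return sum(contributions.values()), contributions

# Hypothetical raw values for a step in which the car makes some progress
# and then hits a track boundary
crash_step = {"progress": 0.25, "collision": 1.0}

# Two hypothetical weight sets: both terms weighted low vs. weighted high
low_weights = {"progress": 0.5, "collision": -0.5}
high_weights = {"progress": 4.0, "collision": -8.0}

low_total, low_parts = weighted_reward(crash_step, low_weights)
high_total, high_parts = weighted_reward(crash_step, high_weights)

print("low weights: ", low_total, low_parts)
print("high weights:", high_total, high_parts)
```

With the low weights, neither the progress made nor the crash changes the total by much, so the learning signal is weak. With the higher weights, both following the track and avoiding the boundary matter noticeably more.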