In [1]:
from IPython.display import Image

- make hyperparams explicitly configurable

# AI Car Racer

## Introduction

Welcome to the AI Car Racer competition! You're about to teach a car to race around a track using reinforcement learning (RL). This is a competition: **complete a full lap in the fastest time to win**.


## What You're Building

You'll train an AI agent using **Deep Q-Learning (DQN)** to control a car racing around a track. The model "sees" an image of the track and learns to choose actions (turn left, turn right, accelerate, brake, or do nothing) to maximize its score.


## Competition Rules

- **Goal**: Complete a full lap around the track
- **Winner**: Fastest lap time
- **Track**: Fixed seed (everyone gets the same track)
- **Starting point**: Train from scratch
- **You can**: Tune any hyperparameters and modify wrappers


In [2]:
Image(url='https://gymnasium.farama.org/_images/car_racing.gif')

## Google Colab

### What is Colab?

Google Colab is like Google Docs for code—it's a free Jupyter notebook environment that runs in your browser.

### Key Concepts

- **Cells**: Blocks of code or text. Run them with `Shift + Enter` or click the ▶️ button
- **Code cells**: Contain Python code you can execute
- **Text cells**: Contain formatted text (like this README)
- **Runtime**: The virtual computer running your code
  - Go to `Runtime → Change runtime type` to select GPU
  - Free GPUs speed up training significantly
- **Session timeout**: After 12 hours or if idle, your runtime disconnects. Your code remains but variables reset

### Essential Shortcuts

- `Shift + Enter`: Run current cell and move to next
- `Ctrl + Enter`: Run current cell and stay on it
- `Ctrl + S`: Save notebook
- `Ctrl + /`: Comment/uncomment code

### Colab File System

- Files you create live in the `/content/` directory
- **Important**: Files are temporary! They disappear when runtime disconnects
- Download important files (models, logs) to your local machine



## Reinforcement Learning

In [3]:
Image(url='https://media.geeksforgeeks.org/wp-content/uploads/20220214110501/ImagefromiOS1-660x296.jpg')

### Environment

This is the game world and all of its components: the track, the car, the physics, and how they interact.

### Observations (state)
This is what the model can 'see': a top-down 96×96 RGB image of the car and the track.

https://gymnasium.farama.org/environments/box2d/car_racing/#observation-space


### Actions
Based on the observations, the model picks the action it expects to work best.

**Continuous**

This is like having a real steering wheel, with a gas and brake pedal.

| Index | Control  | Range        | Meaning                            |
|-------|----------|--------------|------------------------------------|
| 0     | Steering | [-1.0, +1.0] | -1 = full left, +1 = full right    |
| 1     | Gas      | [0.0, 1.0]   | 0 = no throttle, 1 = full throttle |
| 2     | Brake    | [0.0, 1.0]   | 0 = no brake, 1 = full brake       |


_Example action_:
[0.3, 0.8, 0.0] → "Turn slightly right and 80% throttle, no brake."

Note the `continuous=True`

```python
import gymnasium
env = gymnasium.make("CarRacing-v3", continuous=True)
```

**Discrete**


Discrete actions are like buttons where the model chooses a button to press at each timestep.

| Action | Meaning     |
|--------|-------------|
| 0      | Do nothing  |
| 1      | Steer right |
| 2      | Steer left  |
| 3      | Gas         |
| 4      | Brake       |

Note the `continuous=False`

```python
import gymnasium
env = gymnasium.make("CarRacing-v3", continuous=False)
```


**Choosing Between Continuous and Discrete**

| Mode           | Pros                              | Cons                                                          |
|----------------|-----------------------------------|---------------------------------------------------------------|
| **Continuous** | Realistic, smooth control         | Takes longer to train; not supported by Stable-Baselines3 `DQN` |
| **Discrete**   | Simple controls, quicker to learn | Less precise control                                          |


### Reward

The model receives a reward for each action it takes. The reward is feedback that tells the model whether it should take more actions like that or fewer.
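To make the reward concrete: according to the Gymnasium documentation for CarRacing, the agent receives -0.1 every frame and +1000/N for each track tile it visits, where N is the number of tiles in the track. The sketch below is plain Python arithmetic, not the environment's own code; `episode_return` and its arguments are hypothetical names, and the tile count of 300 is made up for illustration.

```python
def episode_return(tiles_visited: int, total_tiles: int, frames: int) -> float:
    """Approximate CarRacing return: +1000/N per tile visited, -0.1 per frame."""
    tile_reward = 1000.0 * tiles_visited / total_tiles
    time_penalty = 0.1 * frames
    return tile_reward - time_penalty


# A full lap finished in 732 frames (the example used in the Gymnasium docs)
fast_lap = episode_return(tiles_visited=300, total_tiles=300, frames=732)
# Only half of the track covered before the 1000-frame limit
half_lap = episode_return(tiles_visited=150, total_tiles=300, frames=1000)

print(round(fast_lap, 1))  # 926.8
print(round(half_lap, 1))  # 400.0
```

Because every frame costs 0.1, finishing the lap in fewer frames gives a higher return, which is why a high reward and a fast lap time tend to go together.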

## Hyperparameters Explained

These are the knobs you can turn to improve performance. **This is where you'll win the competition!**

In Python, `5e-4` is just a shorthand for writing `0.0005`. The `e-4` means 'shift the decimal 4 places left' or '5 times 10 to the power of -4'.
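A quick check of the notation in plain Python (the variable names are only for illustration):

```python
learning_rate = 5e-4
total_timesteps = int(5e5)  # e5 shifts the decimal point 5 places to the right

print(learning_rate == 0.0005)  # True
print(total_timesteps)          # 500000
print(f"{0.00005:.0e}")         # 5e-05
```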

### Core Training Parameters

#### `total_timesteps` (Default: 500,000)
**What it does**: Total number of actions the agent takes during training. Start small (10k) and build up to see how your model reacts.  
**Think of it as**: How many practice laps your car gets.  
- **Lower** (10k): Faster training but may not learn completely
- **Higher** (500k): Better final performance but takes longer

#### `learning_rate` (Default: 1e-4)
**What it does**: How big each update step is when learning.  
**Think of it as**: How quickly the car adjusts its strategy after each mistake.  
**Range to try**: 5e-5 to 5e-4
- **Lower** (5e-5): More stable, slower learning, less likely to "forget"
- **Higher** (5e-4): Faster learning but can be unstable, might overshoot
- **Sweet spot**: 1e-4 is a solid default
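The trade-off between small and large steps can be seen in a toy problem. This is not DQN: it is plain gradient descent on f(x) = x², and `descend` is a hypothetical helper. The step sizes are exaggerated to make the effect visible and are not sensible values for `learning_rate`.

```python
def descend(step_size: float, steps: int = 20, x: float = 1.0) -> float:
    """Minimise f(x) = x**2 by gradient descent and return the final x."""
    for _ in range(steps):
        grad = 2 * x              # derivative of x**2
        x = x - step_size * grad  # move against the gradient
    return x


small = abs(descend(0.01))    # stable but slow: still far from the minimum at 0
medium = abs(descend(0.4))    # converges quickly
too_large = abs(descend(1.1)) # every update overshoots, so x grows instead of shrinking

print(small, medium, too_large)
```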

#### `gamma` (Default: 0.98)
**What it does**: Discount factor for future rewards.  
**Think of it as**: How much the car values long-term success vs immediate rewards.  
**Range to try**: 0.95 - 0.995
- **Lower** (0.95): Car focuses on immediate rewards, more aggressive
- **Higher** (0.995): Car plans ahead more, smoother driving
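A small sketch of what the discount factor does to a reward that lies in the future. `discounted_return` and `effective_horizon` are hypothetical helpers, and the reward sequence is made up; the `1 / (1 - gamma)` horizon is a common rule of thumb rather than a hard cutoff.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward t steps ahead is weighted by gamma**t."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


def effective_horizon(gamma):
    """Rule of thumb: rewards beyond about 1 / (1 - gamma) steps count for little."""
    return 1.0 / (1.0 - gamma)


# A single reward of 10 that arrives 50 steps in the future
rewards = [0.0] * 50 + [10.0]

for gamma in (0.95, 0.98, 0.995):
    value = discounted_return(rewards, gamma)
    print(f"gamma={gamma}: value now={value:.2f}, horizon~{effective_horizon(gamma):.0f} steps")
```

With `gamma=0.95` the distant reward is worth much less today than with `gamma=0.995`, which is why a higher value encourages planning further ahead.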





### Exploration Parameters

#### `exploration_fraction` (Default: 0.3)
**What it does**: The fraction of training over which the chance of taking a random action is gradually reduced to its final value.  
**Think of it as**: How long the car experiments before settling on a strategy.  
**Range to try**: 0.2 - 0.5
- **Lower** (0.2): Commits to learned strategy sooner
- **Higher** (0.5): Explores longer, might find better solutions
- **Sweet spot**: 0.3 for most cases
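The schedule can be sketched as a linear decay. This is an illustration, not the library's code: `epsilon_at` is a hypothetical helper, and the start and end values of 1.0 and 0.05 are assumed here to match the Stable-Baselines3 DQN defaults (`exploration_initial_eps`, `exploration_final_eps`).

```python
def epsilon_at(step, total_timesteps, exploration_fraction,
               initial_eps=1.0, final_eps=0.05):
    """Linearly anneal epsilon from initial_eps to final_eps, then hold it there."""
    decay_steps = exploration_fraction * total_timesteps
    progress = min(step / decay_steps, 1.0)
    return initial_eps + progress * (final_eps - initial_eps)


# With 2000 timesteps and exploration_fraction=0.3, the decay ends after 600 steps
for step in (0, 300, 600, 1000, 2000):
    eps = epsilon_at(step, total_timesteps=2000, exploration_fraction=0.3)
    print(step, round(eps, 3))
```

Under these assumptions epsilon is already at 0.05 by step 1000, which agrees with the `exploration_rate` of 0.05 in the training log further down.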


## Experimentation Strategy

### Phase 1: Quick Iteration (45 min)
Try faster training runs to test ideas:
- Reduce `total_timesteps` to 50k for quick tests
- Try 2-3 different hyperparameter combinations
- Focus on `learning_rate`, `gamma`, and exploration parameters

### Phase 2: Final Training (60 min)
Once you find promising settings, do a full training run



## Monitoring Training

### Watch the Logs

Key metrics to monitor in the training output:

```python
# After training starts, you'll see:
ep_rew_mean: -50 → -20 → 100 → 300 → 500+  # Getting better!
ep_len_mean: 50 → 100 → 200 → 500+          # Driving for longer!
```

**Good signs:**
- `ep_rew_mean` increasing over time
- `ep_len_mean` increasing (car survives longer)
- Fewer negative rewards

**Bad signs:**
- `ep_rew_mean` stuck or decreasing
- Very short episodes throughout training
- Loss values exploding (> 100)
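If you copy a few `ep_rew_mean` values out of the log, a rough trend check could look like the sketch below. `is_improving` is a hypothetical helper and the numbers are made up; comparing the first and last few values is only a crude heuristic.

```python
from statistics import mean


def is_improving(ep_rew_means, window=3):
    """Return True if the average of the last `window` values beats the first `window`."""
    if len(ep_rew_means) < 2 * window:
        raise ValueError("need at least 2 * window values to compare")
    return mean(ep_rew_means[-window:]) > mean(ep_rew_means[:window])


print(is_improving([-50, -45, -20, 10, 100, 300]))   # True
print(is_improving([-50, -48, -52, -51, -49, -50]))  # False
```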


## Training

In [4]:
!pip install "swig>=4.3.1.post0"
!pip install "gymnasium[box2d]==1.2.0"
!pip install "stable-baselines3[extra]==2.7.0"
!pip install "pyvirtualdisplay"
!sudo apt-get install -y xvfb ffmpeg


In [5]:
# Standard library
import base64
import glob
import io
from datetime import datetime

# Environment and wrappers
import gymnasium
from gymnasium.wrappers import RecordVideo, ResizeObservation

# Training and evaluation
from stable_baselines3 import DQN
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

# Notebook display helpers
from IPython import display as ipythondisplay
from IPython.display import HTML
from pyvirtualdisplay import Display


track_seed = 69

### Training Parameters

In [6]:
## Change these !
continuous = False  # Stable-Baselines3 DQN supports discrete actions only, so keep this False
learning_rate = 1e-4
gamma = 0.95
exploration_fraction = 0.3
## Change these !

In [7]:
train_env = gymnasium.make("CarRacing-v3", render_mode=None, continuous=continuous)



In [8]:
train_env.reset(seed=track_seed)

(array([[[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0],
         ...,
         [0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]],

        ...,

        [[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0],
         ...,
         [0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]], shape=(96, 96, 3), dtype=uint8),
 {})

In [9]:
model = DQN(
    policy="CnnPolicy",
    env=train_env,
    verbose=1,
    learning_rate=learning_rate,
    gamma=gamma,
    exploration_fraction=exploration_fraction,
    seed=track_seed,
)



Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.


In [10]:
# Wrap in Monitor so EvalCallback can read episode rewards and lengths
eval_env = Monitor(gymnasium.make("CarRacing-v3", render_mode=None, continuous=continuous))

In [11]:
eval_callback = EvalCallback(
    eval_env,
    best_model_save_path="./models/",
    log_path="./logs/",
    eval_freq=1_000,  # evaluate every n steps
    deterministic=True,
    render=False,
)

## Monitoring

In [12]:
# 2000 steps is only a smoke test; raise total_timesteps (e.g. 50_000) for real runs
model.learn(total_timesteps=2000, callback=eval_callback)

run_id = datetime.now().strftime("%Y%m%d_%H%M%S")

model.save(f"models/{run_id}_DQN_CarRacingv3")




Eval num_timesteps=1000, episode_reward=-51.89 +/- 4.17
Episode length: 1000.00 +/- 0.00
----------------------------------
| eval/               |          |
|    mean_ep_length   | 1e+03    |
|    mean_reward      | -51.9    |
| rollout/            |          |
|    exploration_rate | 0.05     |
| time/               |          |
|    total_timesteps  | 1000     |
| train/              |          |
|    learning_rate    | 0.0001   |
|    loss             | 0.0011   |
|    n_updates        | 224      |
----------------------------------
New best mean reward!
Eval num_timesteps=2000, episode_reward=-28.17 +/- 46.53
Episode length: 1000.00 +/- 0.00
----------------------------------
| eval/               |          |
|    mean_ep_length   | 1e+03    |
|    mean_reward      | -28.2    |
| rollout/            |          |
|    exploration_rate | 0.05     |
| time/               |          |
|    total_timesteps  | 2000     |
| train/              |          |
|    learning_rate    | 0.000

# How good is my model?

This evaluates your trained model over 10 episodes and reports the mean reward.  

A top score would be in the 800s. At around 200 you will start to see some driving ability, such as slow turns on corners.

In [13]:
check_env = gymnasium.make("CarRacing-v3", render_mode="rgb_array", continuous=continuous)
check_env = Monitor(check_env)
mean_reward, std_reward = evaluate_policy(model, check_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")

mean_reward=-42.45 +/- 30.75
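The two numbers above summarise the per-episode rewards. A minimal sketch of that summary, assuming it is the mean and the population standard deviation of the episode rewards; `summarise` is a hypothetical helper and the rewards are made up for illustration:

```python
from statistics import mean, pstdev


def summarise(episode_rewards):
    """Mean and population standard deviation of per-episode rewards."""
    return mean(episode_rewards), pstdev(episode_rewards)


# Made-up rewards from five evaluation episodes
rewards = [-60.0, -20.0, -45.0, -75.0, -10.0]
mean_reward, std_reward = summarise(rewards)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

A large spread relative to the mean suggests the model drives well on some episodes and badly on others, so compare models on both numbers rather than on the mean alone.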


# See your model drive round the track



In [2]:
import os

# Start a virtual display so the environment can render on a headless machine
display = Display(visible=0, size=(1400, 900))
display.start()

# Wrap the environment so that every episode is recorded to ./videos
check_env = RecordVideo(check_env, video_folder="./videos", episode_trigger=lambda episode_id: True)
obs, info = check_env.reset(seed=track_seed)  # same track seed as training

# Run one episode until it terminates or is truncated
terminated = False
truncated = False
while not (terminated or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = check_env.step(action)

check_env.close()

# Display the most recently written video
video_path = max(glob.glob("videos/*.mp4"), key=os.path.getmtime)
with open(video_path, "rb") as f:
    encoded = base64.b64encode(f.read())
ipythondisplay.display(HTML(data='''
    <video width="640" height="480" controls>
        <source src="data:video/mp4;base64,{0}" type="video/mp4" />
    </video>
'''.format(encoded.decode('ascii'))))


# Save and submit your model

In [3]:
run_id = datetime.now().strftime("%Y%m%d_%H%M%S")

model.save(f"models/{run_id}_DQN_CarRacingv3")
