# Racing Car Project

In this project we'll see how to train a neural network with reinforcement learning so that it can drive a car in a virtual game environment, OpenAI Gym's `CarRacing-v0`, using the PPO implementation from Stable-Baselines3.

In [1]:
!pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

Looking in links: https://download.pytorch.org/whl/cu113/torch_stable.html




In [9]:
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.evaluation import evaluate_policy
import os
import torch

## Test environment

First, we create the environment and render it. Then we play one episode without training the model: at every step the agent takes a random action from `env.action_space.sample()`. Since the actions are random, we expect a low score.
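To see why a random policy scores low, it helps to know how `CarRacing` assigns reward. According to Gym's description of this environment, every frame costs -0.1 and every newly visited track tile is worth +1000/N, where N is the number of tiles in the track (the "N-tiles track" figure printed on each `Track generation` line). The `car_racing_score` helper below is a hypothetical sketch of that formula, not part of Gym, and it ignores the -100 penalty for driving far off the playfield. The 1000-frame episode length used in the second example is an assumption based on the time limit registered for `CarRacing-v0`.

```python
def car_racing_score(tiles_visited: int, total_tiles: int, frames: int) -> float:
    """Approximate CarRacing episode return (hypothetical helper).

    Assumes the reward scheme described in Gym's docs:
    -0.1 per frame, +1000/N per newly visited tile (N = total_tiles).
    Ignores the -100 penalty for leaving the playfield.
    """
    if total_tiles <= 0:
        raise ValueError("total_tiles must be positive")
    if not 0 <= tiles_visited <= total_tiles:
        raise ValueError("tiles_visited must be between 0 and total_tiles")
    tile_reward = 1000.0 * tiles_visited / total_tiles
    frame_penalty = 0.1 * frames
    return tile_reward - frame_penalty


# Visiting every tile of a 289-tile track in 732 frames:
print(round(car_racing_score(289, 289, 732), 1))  # 926.8
# A random agent that touches only 10 tiles during an assumed 1000-frame episode:
print(round(car_racing_score(10, 289, 1000), 1))  # -65.4
```

The frame penalty dominates whenever the car wanders instead of covering tiles, which is exactly what random actions tend to do.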

In [5]:
env = gym.make("CarRacing-v0")

In [6]:
env.render()



In [14]:
episodes = 1
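for episode in range(1, episodes + 1):
    obs = env.reset()  # start a new episode on a freshly generated track
    done = False
    score = 0

    while not done:
        env.render()
        action = env.action_space.sample()  # random [steering, gas, brake] action
        obs, reward, done, info = env.step(action)
        score += reward  # accumulate the episode return
    print("Episode:{} Score:{}".format(episode, score))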

Track generation: 1140..1429 -> 289-tiles track
Episode:1 Score:-34.02777777777822


In [15]:
env.close()

## Training

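Here we train the model with PPO from Stable-Baselines3, which runs on PyTorch. The cell below checks whether PyTorch can see a CUDA-capable GPU. (Disclaimer: the PyTorch build installed above targets CUDA 11.3, so GPU training requires a compatible NVIDIA graphics card; without one, training falls back to the CPU and is much slower.)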

In [16]:
use_cuda = torch.cuda.is_available()
if use_cuda:
    print('__CUDNN VERSION:', torch.backends.cudnn.version())
    print('__Number CUDA Devices:', torch.cuda.device_count())
    print('__CUDA Device Name:', torch.cuda.get_device_name(0))
    print('__CUDA Device Total Memory [GB]:', torch.cuda.get_device_properties(0).total_memory / 1e9)
else:
    print('CUDA not available: training will run on the CPU')

__CUDNN VERSION: 8200
__Number CUDA Devices: 1
__CUDA Device Name: NVIDIA GeForce RTX 2060
__CUDA Device Total Memory [GB]: 6.442123264
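Stable-Baselines3 picks the device automatically: `PPO`'s `device` argument defaults to `"auto"`, which uses CUDA when PyTorch reports it as available and the CPU otherwise (hence the `Using cuda device` message when the model is created). If you'd rather choose the device yourself, here is a minimal sketch; `pick_device` is a hypothetical helper, and it also falls back to the CPU when PyTorch isn't installed at all.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" if a CUDA GPU is usable, else "cpu" (hypothetical helper)."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # imported lazily so the helper still works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Training device:", device)
# The result can then be passed explicitly, e.g.:
# model = PPO("CnnPolicy", env, verbose=1, device=device)
```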


In [17]:
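# SB3 expects a vectorized env; build a fresh CarRacing env inside the factory
# rather than reusing the environment that was closed above
env = DummyVecEnv([lambda: gym.make("CarRacing-v0")])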
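`DummyVecEnv` turns a list of environment factories into one vectorized environment: it calls each factory once, steps the sub-environments one after another in the same process, collects their results into batches, and automatically resets a sub-environment when its episode ends (keeping the final observation in `info["terminal_observation"]`). The toy classes below (`CountdownEnv`, `TinyVecEnv`) are hypothetical and only mimic that behaviour with plain Python lists; they are not Stable-Baselines3's implementation.

```python
from typing import Callable, List, Tuple


class CountdownEnv:
    """Toy env with the classic Gym API: the episode ends after `length` steps."""

    def __init__(self, length: int = 3):
        self.length = length
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: float) -> Tuple[int, float, bool, dict]:
        self.t += 1
        reward = -0.1  # constant per-step cost, loosely echoing CarRacing
        done = self.t >= self.length
        return self.t, reward, done, {}


class TinyVecEnv:
    """Sequentially steps several envs and batches their outputs (sketch)."""

    def __init__(self, env_fns: List[Callable[[], CountdownEnv]]):
        self.envs = [fn() for fn in env_fns]  # each factory is called once, here

    def reset(self) -> List[int]:
        return [env.reset() for env in self.envs]

    def step(self, actions: List[float]):
        obs, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d, info = env.step(action)
            if d:
                # Keep the last observation, then auto-reset the finished env.
                info["terminal_observation"] = o
                o = env.reset()
            obs.append(o)
            rewards.append(r)
            dones.append(d)
            infos.append(info)
        return obs, rewards, dones, infos


vec_env = TinyVecEnv([lambda: CountdownEnv(2), lambda: CountdownEnv(3)])
print(vec_env.reset())              # [0, 0]
print(vec_env.step([0.0, 0.0])[0])  # [1, 1]
obs, rewards, dones, infos = vec_env.step([0.0, 0.0])
print(obs, dones)                   # [0, 2] [True, False]
```

Because the factories are called inside the constructor, each slot gets its own environment instance even when only one environment is used, as in this notebook.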

In [18]:
model = PPO("CnnPolicy", env, verbose=1)

Using cuda device
Wrapping the env in a VecTransposeImage.
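The `Wrapping the env in a VecTransposeImage` message appears because CarRacing returns 96×96 RGB frames in channels-last order (height, width, channels), while PyTorch convolution layers expect channels-first order (channels, height, width). Stable-Baselines3's wrapper performs the equivalent transpose on NumPy arrays; the pure-Python `hwc_to_chw` function below is a hypothetical illustration of the same reordering on nested lists.

```python
from typing import List

Image = List[List[List[int]]]  # nested lists indexed as img[row][col][channel]


def hwc_to_chw(img: Image) -> Image:
    """Reorder an (H, W, C) image into (C, H, W) (hypothetical helper)."""
    height, width, channels = len(img), len(img[0]), len(img[0][0])
    return [
        [[img[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# A 2x2 RGB image: each pixel is [R, G, B].
frame = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [10, 20, 30]],
]
chw = hwc_to_chw(frame)
print(chw[0])  # red channel:  [[255, 0], [0, 10]]
print(chw[2])  # blue channel: [[0, 0], [255, 30]]
```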


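The cell below trains the model. Training can take anywhere from a few minutes to well over ten, depending on your PC's specs (the run logged below ran at between 94 and 113 frames per second on an RTX 2060). In general, the longer we train the model, the higher the score tends to be.
Note that the log below reaches at least 81,920 timesteps, so it comes from a run with a larger `total_timesteps` than the 10000 in the cell: with `total_timesteps=10000`, training stops after 5 iterations of 2,048 steps.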
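The iteration count follows from PPO's rollout size: each iteration collects `n_steps` transitions per environment (2048 by default in Stable-Baselines3, matching the `total_timesteps` increments in the log), and `learn` keeps collecting until it reaches `total_timesteps`, so the last rollout can overshoot the target. The helpers below are a hypothetical sketch of that arithmetic, plus a rough wall-clock estimate that assumes a steady frame rate.

```python
import math


def ppo_iterations(total_timesteps: int, n_steps: int = 2048, n_envs: int = 1) -> int:
    """Rollout/update iterations needed to reach total_timesteps (hypothetical helper).

    Assumes each iteration collects n_steps * n_envs transitions and that
    training stops at the first iteration boundary at or past total_timesteps.
    """
    return math.ceil(total_timesteps / (n_steps * n_envs))


def estimated_seconds(total_timesteps: int, fps: float) -> float:
    """Rough wall-clock estimate, assuming a steady frames-per-second rate."""
    return total_timesteps / fps


for requested in (10_000, 81_920):
    iters = ppo_iterations(requested)
    print(f"{requested} requested -> {iters} iterations, {iters * 2048} steps collected")
# 10000 requested -> 5 iterations, 10240 steps collected
# 81920 requested -> 40 iterations, 81920 steps collected

print(round(estimated_seconds(81_920, fps=94)))  # 871, close to the 869 s in the log
```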

In [21]:
model.learn(total_timesteps=10000)

Track generation: 1075..1348 -> 273-tiles track
Track generation: 1145..1443 -> 298-tiles track
Track generation: 1163..1458 -> 295-tiles track
-----------------------------
| time/              |      |
|    fps             | 113  |
|    iterations      | 1    |
|    time_elapsed    | 17   |
|    total_timesteps | 2048 |
-----------------------------
Track generation: 1136..1424 -> 288-tiles track
Track generation: 1078..1358 -> 280-tiles track
-----------------------------------------
| time/                   |             |
|    fps                  | 104         |
|    iterations           | 2           |
|    time_elapsed         | 39          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.009157797 |
|    clip_fraction        | 0.0938      |
|    clip_range           | 0.2         |
|    entropy_loss         | -4.25       |
|    explained_variance   | 0.181       |
|    learning_rate        | 0.0003      |
-----------------------------------------

Track generation: 1047..1323 -> 276-tiles track
Track generation: 1094..1372 -> 278-tiles track
----------------------------------------
| time/                   |            |
|    fps                  | 98         |
|    iterations           | 11         |
|    time_elapsed         | 229        |
|    total_timesteps      | 22528      |
| train/                  |            |
|    approx_kl            | 0.02131917 |
|    clip_fraction        | 0.23       |
|    clip_range           | 0.2        |
|    entropy_loss         | -4         |
|    explained_variance   | 0.348      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.0165     |
|    n_updates            | 110        |
|    policy_gradient_loss | -0.0324    |
|    std                  | 0.915      |
|    value_loss           | 0.305      |
----------------------------------------
Track generation: 1192..1494 -> 302-tiles track
Track generation: 1133..1427 -> 294-tiles track
----------------------------------------
| time/                   |            |
|    fps                  | 96         |
|    iterations           | 20         |
|    time_elapsed         | 426        |
|    total_timesteps      | 40960      |
| train/                  |            |
|    approx_kl            | 0.06382162 |
|    clip_fraction        | 0.344      |
|    clip_range           | 0.2        |
|    entropy_loss         | -3.64      |
|    explained_variance   | 0.919      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.0571     |
|    n_updates            | 200        |
|    policy_gradient_loss | -0.0367    |
|    std                  | 0.813      |
|    value_loss           | 0.471      |
----------------------------------------
Track generation: 1119..1403 -> 284-tiles track
Track generation: 1395..1751 -> 356-tiles track
Track generation: 1023..1287 -> 264-tiles track
retry to generate track (normal if there are not manyinstances of this messag

Track generation: 994..1256 -> 262-tiles track
Track generation: 1356..1699 -> 343-tiles track
-----------------------------------------
| time/                   |             |
|    fps                  | 94          |
|    iterations           | 30          |
|    time_elapsed         | 649         |
|    total_timesteps      | 61440       |
| train/                  |             |
|    approx_kl            | 0.040270522 |
|    clip_fraction        | 0.327       |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.38       |
|    explained_variance   | 0.973       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.096       |
|    n_updates            | 300         |
|    policy_gradient_loss | -0.0259     |
|    std                  | 0.751       |
|    value_loss           | 0.824       |
-----------------------------------------
Track generation: 1227..1538 -> 311-tiles track
Track generation: 1242..1556 -> 314-tiles track
Track generation: 1272..1594 -> 322-tiles track
Track generation: 868..1095 -> 227-tiles track
----------------------------------------
| time/                   |            |
|    fps                  | 94         |
|    iterations           | 40         |
|    time_elapsed         | 869        |
|    total_timesteps      | 81920      |
| train/                  |            |
|    approx_kl            | 0.07894936 |
|    clip_fraction        | 0.408      |
|    clip_range           | 0.2        |
|    entropy_loss         | -3.1       |
|    explained_variance   | 0.94       |
|    learning_rate        | 0.0003     |
|    loss                 | 0.0713     |
|    n_updates            | 400        |
|    policy_gradient_loss | -0.0145    |
|    std                  | 0.691      |
|    value_loss           | 1.04       |
----------------------------------------
Track generation: 1097..1376 -> 279-tiles track
Track generation: 1180..1479 -> 299-tiles track

<stable_baselines3.ppo.ppo.PPO at 0x19e1d006250>
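Several `train/` entries in the log come from PPO's clipped surrogate objective. For each sample, PPO compares the updated policy with the one that collected the data through the probability ratio r = exp(new_log_prob - old_log_prob) and optimizes min(r * A, clip(r, 1 - eps, 1 + eps) * A), where A is the advantage and eps is `clip_range` (0.2 here). `clip_fraction` is the share of samples whose ratio falls outside [1 - eps, 1 + eps], and Stable-Baselines3 estimates `approx_kl` as the mean of (r - 1) - log r. The stdlib sketch below computes these quantities for a small hand-made batch; the function name and the numbers are illustrative, not taken from the run.

```python
import math
from typing import Dict, List


def ppo_clip_stats(
    new_log_probs: List[float],
    old_log_probs: List[float],
    advantages: List[float],
    clip_range: float = 0.2,
) -> Dict[str, float]:
    """PPO clipped-objective diagnostics for one batch (hypothetical sketch)."""
    n = len(advantages)
    surrogate, kl_terms, n_clipped = [], [], 0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        log_ratio = new_lp - old_lp
        ratio = math.exp(log_ratio)
        clipped_ratio = min(max(ratio, 1 - clip_range), 1 + clip_range)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        surrogate.append(min(ratio * adv, clipped_ratio * adv))
        if abs(ratio - 1) > clip_range:
            n_clipped += 1
        kl_terms.append((ratio - 1) - log_ratio)
    return {
        "policy_gradient_loss": -sum(surrogate) / n,  # negated: the loss is minimized
        "clip_fraction": n_clipped / n,
        "approx_kl": sum(kl_terms) / n,
    }


stats = ppo_clip_stats(
    new_log_probs=[0.0, math.log(1.5), math.log(0.5)],  # ratios 1.0, 1.5, 0.5
    old_log_probs=[0.0, 0.0, 0.0],
    advantages=[1.0, 1.0, -1.0],
)
print({k: round(v, 3) for k, v in stats.items()})
# {'policy_gradient_loss': -0.467, 'clip_fraction': 0.667, 'approx_kl': 0.096}
```

In this run, `clip_fraction` grows from 0.0938 at iteration 2 to 0.408 at iteration 40, so a growing share of samples hit the clip boundary during each update.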

In [23]:
evaluate_policy(model, env, n_eval_episodes=3, render=True)

Track generation: 1267..1588 -> 321-tiles track
Track generation: 1163..1458 -> 295-tiles track
Track generation: 1145..1435 -> 290-tiles track
Track generation: 1199..1503 -> 304-tiles track


(182.5498843540748, 115.31087818476664)
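`evaluate_policy` returns the mean and standard deviation of the total reward per episode over the `n_eval_episodes` runs, so the output above reads as roughly 182.5 ± 115.3 over 3 episodes. With only three episodes and a spread that large, the estimate is noisy. The sketch below computes the same kind of summary with the standard library; `summarize_returns` is a hypothetical helper, the episode returns in it are made-up placeholders, and it assumes a population standard deviation, which is what `numpy.std` computes by default.

```python
import statistics
from typing import List, Tuple


def summarize_returns(episode_returns: List[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of per-episode returns."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)  # ddof=0, like numpy.std's default
    return mean, std


# Hypothetical returns for three evaluation episodes (not from the run above).
mean_reward, std_reward = summarize_returns([100.0, 200.0, 300.0])
print(f"{mean_reward:.1f} +/- {std_reward:.1f}")  # 200.0 +/- 81.6
```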

In [22]:
env.close()