# CarRacing game


##### General description 
- It is one of the easiest **continuous control tasks** to learn from pixels: a top-down racing
  environment.
- The state consists of STATE_W x STATE_H pixels (96 x 96 RGB, as printed in section 3).
- The episode finishes when all the tiles are visited. The car can also go
  outside of the PLAYFIELD, that is, far off the track; it then receives -100
  and the episode ends.
- Some indicators are shown at the bottom of the window along with the state RGB
  buffer. From left to right:
  - the true speed
  - four ABS sensors
  - the steering wheel position
  - the gyroscope
  
<img src="./res/car_racing_indicators.png" width="400px" height="500px" />


*Indicators example*


##### Reward
- The reward is -0.1 every frame and +1000/N for every track tile visited, where
  N is the total number of tiles in the track. For example, if you have
  finished in 732 frames, your reward is 1000 - 0.1*732 = 926.8 points.
- The game is solved when the agent consistently gets 900+ points. The generated
  track is random every episode.
  

*Taken from the official **gym** game description ([GitHub repo](https://github.com/openai/gym/blob/master/gym/envs/box2d/car_racing.py))*
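The reward and termination rules above can be modeled in a few lines of plain Python. This is an illustrative sketch, not gym's implementation: the function names and the `off_playfield` flag are hypothetical, and the track is assumed to have N = 300 tiles (close to the track sizes in the evaluation log further down). Because the N tile bonuses always add up to 1000, the episode return does not depend on N, which is why the description can quote 926.8 without naming a track length.

```python
from typing import Tuple

FRAME_PENALTY = -0.1        # charged on every frame
OFF_TRACK_PENALTY = -100.0  # replaces the frame reward when leaving the PLAYFIELD


def step_reward(new_tiles: int, total_tiles: int, tiles_visited: int,
                off_playfield: bool) -> Tuple[float, bool]:
    """Hypothetical model of one frame; returns (reward, done).

    new_tiles:     tiles visited for the first time on this frame
    total_tiles:   N, the number of tiles in the track
    tiles_visited: running count of distinct tiles visited so far
    off_playfield: whether the car is far off the track (outside the PLAYFIELD)
    """
    if off_playfield:
        return OFF_TRACK_PENALTY, True
    reward = new_tiles * 1000.0 / total_tiles + FRAME_PENALTY
    return reward, tiles_visited == total_tiles


def episode_return(frames: int, total_tiles: int) -> float:
    """Total reward of an episode that reaches the last tile on its final frame.

    Assumes frames >= total_tiles and that the car never leaves the PLAYFIELD.
    """
    total, visited = 0.0, 0
    for frame in range(frames):
        # One new tile per frame during the last `total_tiles` frames,
        # so the final tile is reached exactly on the last frame.
        new = 1 if frame >= frames - total_tiles else 0
        visited += new
        reward, done = step_reward(new, total_tiles, visited, off_playfield=False)
        total += reward
        if done:
            break
    return total


print(round(episode_return(frames=732, total_tiles=300), 1))  # 926.8
print(round(episode_return(frames=732, total_tiles=258), 1))  # 926.8
print(step_reward(0, 300, 42, off_playfield=True))            # (-100.0, True)
```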


##### Rendered env example

<img src="./res/car_racing_rendered_env.png" width="400px" height="500px" />

## 1. Install necessary packages/libs

In [None]:
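# The notebook uses CarRacing-v0 and the 4-value env.step() API, so it assumes
# an older gym release that still provides both
!pip install 'stable-baselines3[extra]'
!pip install tensorboard==1.15.0
!pip install 'gym[box2d]'
!pip install pyglet==1.5.11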

## 2. Imports

In [1]:
import gym 
import os

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecFrameStack
from stable_baselines3.common.evaluation import evaluate_policy

## 3. Create & Explore the environment

In [2]:
def make_environment(env_name):
    return gym.make(env_name)
    
env = make_environment("CarRacing-v0")

In [3]:
print(f"Action space type -> {env.action_space}")
print(f"Action space sample -> {env.action_space.sample()}")

Action space type -> Box([-1.  0.  0.], [1. 1. 1.], (3,), float32)
Action space sample -> [-0.43569368  0.54446155  0.7904896 ]
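The three action components are, in order, steering in [-1, 1] (negative is left, positive is right), gas in [0, 1] and brake in [0, 1], matching the `Box` bounds printed above. As a sketch, a hand-written action can be built and clipped to those bounds with plain Python; `make_action` is a hypothetical helper, not part of gym.

```python
from typing import Tuple

# Bounds copied from the printed action space: Box([-1, 0, 0], [1, 1, 1]).
LOW = (-1.0, 0.0, 0.0)   # steer, gas, brake
HIGH = (1.0, 1.0, 1.0)


def make_action(steer: float, gas: float, brake: float) -> Tuple[float, ...]:
    """Hypothetical helper: build a (steer, gas, brake) action clipped to the bounds.

    steer < 0 turns left, steer > 0 turns right; gas and brake are pedal amounts.
    """
    raw = (steer, gas, brake)
    return tuple(min(max(value, lo), hi) for value, lo, hi in zip(raw, LOW, HIGH))


print(make_action(0.0, 1.0, 0.0))    # (0.0, 1.0, 0.0): straight ahead, full gas
print(make_action(-2.0, 0.5, -0.3))  # (-1.0, 0.5, 0.0): out-of-range values clipped
```

To feed such an action to `env.step`, it can be converted to the space's dtype, for example with `np.array(action, dtype=np.float32)`.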


In [4]:
print(f"Observation space type -> {env.observation_space}")
print(f"Observation space sample -> {env.observation_space.sample()}")

Observation space type -> Box([[[0 0 0] ... [0 0 0]]], [[[255 255 255] ... (output truncated)

In [5]:
print(f"Observation space shape -> {env.observation_space.sample().shape}")

Observation space shape -> (96, 96, 3)
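The observation is a 96 x 96 RGB image stored as height x width x channel, with values from 0 to 255. The loading log further down shows SB3 wrapping the env in `VecTransposeImage`, which moves the channel axis first for PyTorch, and `CnnPolicy` by default also scales pixels to [0, 1]. The pure-Python sketch below reproduces those two steps on a tiny 2 x 2 image for illustration only; it is not SB3's code, and `hwc_to_chw_scaled` is a hypothetical name.

```python
from typing import List

Image = List[List[List[int]]]  # indexed [row][col][channel], values 0..255


def hwc_to_chw_scaled(img: Image) -> List[List[List[float]]]:
    """Move the channel axis first and scale pixel values to [0, 1]."""
    height, width, channels = len(img), len(img[0]), len(img[0][0])
    return [[[img[r][c][ch] / 255.0 for c in range(width)]
             for r in range(height)]
            for ch in range(channels)]


tiny = [[[255, 0, 0], [0, 255, 0]],
        [[0, 0, 255], [255, 255, 255]]]
chw = hwc_to_chw_scaled(tiny)
print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 2 2
print(chw[0])  # red channel: [[1.0, 0.0], [0.0, 1.0]]
```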


## 4. Random dummy agent

In [None]:
EPISODES = 10

In [None]:
for ep in range(EPISODES):
    state = env.reset()
    is_done = False
    score = 0

    while not is_done:
        env.render()

        # take a random action sampled from the action space
        action = env.action_space.sample()
        state, reward, is_done, additional_info = env.step(action)
        score += reward
    print(f'Episode -> {ep+1} | Score -> {score}')
env.close()

## 5. Train Model

In [6]:
log_path = os.path.join('car_racing_training', 'logs')

In [None]:
# Stop early at 800 points (the official "solved" threshold is 900)
REWARD_THRESHOLD = 800

from stable_baselines3.common.callbacks import (
    EvalCallback,
    StopTrainingOnRewardThreshold
)

best_model_save_path = os.path.join('car_racing_training', 'trained_models')

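# Evaluate on a separate env instance: sharing `env` with the training loop
# would let evaluation episodes reset the environment the model is training on
eval_env = make_environment("CarRacing-v0")

stop_callback = StopTrainingOnRewardThreshold(reward_threshold=REWARD_THRESHOLD,
                                              verbose=1)
evaluation_callback = EvalCallback(eval_env,
                                   callback_on_new_best=stop_callback,
                                   eval_freq=6000,
                                   best_model_save_path=best_model_save_path,
                                   verbose=1)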
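When the callback is passed to `model.learn`, `EvalCallback` runs an evaluation every `eval_freq` callback calls (with a single environment, every 6000 environment steps), and `StopTrainingOnRewardThreshold` is only invoked when a new best mean reward is found; training stops once that best mean reaches `REWARD_THRESHOLD`. The sketch below replays this rule on a made-up list of evaluation means to show when training would halt. The function `stop_step` and the numbers are hypothetical, not results from this notebook.

```python
from typing import Optional, Sequence


def stop_step(eval_means: Sequence[float], eval_freq: int,
              threshold: float) -> Optional[int]:
    """Return the env step at which training would stop, or None.

    eval_means[i] is the mean evaluation reward measured at step
    (i + 1) * eval_freq. The threshold is only checked on a new best.
    """
    best = float("-inf")
    for i, mean in enumerate(eval_means):
        if mean > best:  # new best: callback_on_new_best would fire here
            best = mean
            if best >= threshold:
                return (i + 1) * eval_freq
    return None


# Hypothetical evaluation means, one per 6000 steps.
print(stop_step([120.0, 450.0, 430.0, 810.0, 870.0], eval_freq=6000, threshold=800))  # 24000
```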

In [None]:
model = PPO("CnnPolicy", env, verbose=1, tensorboard_log=log_path)

In [None]:
model.learn(total_timesteps=500000, callback=evaluation_callback)
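This notebook does not override SB3's PPO defaults (`n_steps=2048`, `batch_size=64`, `n_epochs=10`), so with a single environment `learn` keeps collecting 2048-step rollouts until the step counter reaches `total_timesteps`, unless a callback stops it earlier. The arithmetic below is a back-of-the-envelope sketch of how many rollouts and minibatch gradient steps that implies; `ppo_schedule` is a hypothetical helper.

```python
import math
from typing import Tuple


def ppo_schedule(total_timesteps: int, n_steps: int = 2048, batch_size: int = 64,
                 n_epochs: int = 10, n_envs: int = 1) -> Tuple[int, int, int]:
    """Estimate (rollouts, env steps actually collected, gradient steps)."""
    steps_per_rollout = n_steps * n_envs
    rollouts = math.ceil(total_timesteps / steps_per_rollout)
    collected = rollouts * steps_per_rollout
    minibatches_per_epoch = math.ceil(steps_per_rollout / batch_size)
    gradient_steps = rollouts * n_epochs * minibatches_per_epoch
    return rollouts, collected, gradient_steps


print(ppo_schedule(500_000))  # (245, 501760, 78400)
```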

## 6. Save Model

In [7]:
save_trained_models_path = os.path.join('car_racing_training', 'trained_models', '300k_trained_model')

In [None]:
model.save(save_trained_models_path)
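SB3 stores models as zip archives, and as far as its save logic goes, `.zip` is appended when the given path has no extension, so this cell should write `300k_trained_model.zip`. `EvalCallback` saves its best model as `best_model` inside `best_model_save_path`, which here is the same `trained_models` folder. The hypothetical helper below predicts both on-disk paths, which can help when locating the files later.

```python
import os
from pathlib import Path


def expected_zip_path(save_path: str) -> Path:
    """Hypothetical helper: where model.save(save_path) should put the archive.

    '.zip' is appended only when the path has no extension of its own.
    """
    path = Path(save_path)
    return path if path.suffix else Path(f"{save_path}.zip")


final_model = expected_zip_path(
    os.path.join('car_racing_training', 'trained_models', '300k_trained_model'))
best_model = expected_zip_path(
    os.path.join('car_racing_training', 'trained_models', 'best_model'))
print(final_model.name)  # 300k_trained_model.zip
print(best_model.name)   # best_model.zip
```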

### Reload

In [None]:
del model

In [8]:
model = PPO.load(save_trained_models_path, env)

Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.


## 7. Evaluate model performance

In [9]:
evaluate_policy(model, env, n_eval_episodes=10, render=True)



Track generation: 1207..1513 -> 306-tiles track
Track generation: 1255..1573 -> 318-tiles track
Track generation: 1043..1313 -> 270-tiles track
Track generation: 1320..1654 -> 334-tiles track
Track generation: 1180..1479 -> 299-tiles track
Track generation: 1013..1271 -> 258-tiles track
Track generation: 1123..1408 -> 285-tiles track
Track generation: 1043..1308 -> 265-tiles track
Track generation: 905..1139 -> 234-tiles track
retry to generate track (normal if there are not manyinstances of this message)
Track generation: 1056..1324 -> 268-tiles track
Track generation: 1118..1402 -> 284-tiles track
Track generation: 1197..1508 -> 311-tiles track


(239.85585741400718, 149.00814733178092)
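`evaluate_policy` returns the mean and the standard deviation of the episode rewards, so over these 10 episodes the trained agent averaged about 240 points with a spread of about 149, well below the 900-point "solved" bar. SB3 computes these with NumPy's `mean` and `std` (a population standard deviation); the stdlib sketch below does the same on made-up scores. The `is_solved` check is hypothetical and reads "consistently gets 900+" as an average of at least 900 over the evaluated episodes, which is an assumption since the description does not define "consistently".

```python
import statistics
from typing import Sequence, Tuple


def summarize(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation, as evaluate_policy reports them."""
    return statistics.fmean(scores), statistics.pstdev(scores)


def is_solved(scores: Sequence[float], threshold: float = 900.0) -> bool:
    """Hypothetical 'solved' check: average episode score >= threshold."""
    return statistics.fmean(scores) >= threshold


made_up_scores = [100.0, 300.0, 500.0]  # hypothetical, not this notebook's episodes
print(summarize(made_up_scores))  # mean 300.0, std about 163.3
print(is_solved(made_up_scores))  # False
```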

##### Training took about 2.5 hrs for 300k steps
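From this single measurement, the throughput and the wall-clock cost of other budgets (such as the 500,000 steps requested in `model.learn`) can be estimated by simple proportion. The sketch assumes the throughput stays constant, which ignores evaluation overhead from `EvalCallback` and hardware differences; both function names are hypothetical.

```python
def steps_per_second(steps: int, hours: float) -> float:
    """Average environment steps per second over a training run."""
    return steps / (hours * 3600.0)


def estimated_hours(target_steps: int, steps: int = 300_000, hours: float = 2.5) -> float:
    """Scale the measured run (300k steps in 2.5 h) linearly to another budget."""
    return target_steps / steps_per_second(steps, hours) / 3600.0


print(round(steps_per_second(300_000, 2.5), 1))  # 33.3
print(round(estimated_hours(500_000), 2))        # 4.17
```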

### Explore logs

In [None]:
!tensorboard --logdir={log_path}

#### How reward and episode length grew:

<img src="./res/racing_results.png" width="800px" height="1000px" />

## 8. Test env with trained model

In [None]:
def make_environment(env_name):
    return gym.make(env_name)
    
env = make_environment("CarRacing-v0")

for ep in range(10):
    state = env.reset()
    is_done = False
    score = 0
    while not is_done:
        env.render()

        # let the trained model choose the action;
        # predict() returns (action, hidden_state), so keep only the action
        action, _states = model.predict(state)
        state, reward, is_done, additional_info = env.step(action)
        score += reward
    print(f'Episode -> {ep+1} | Score -> {score}')
env.close()

## 9. Result with untrained / trained model

#### Random agent
<img src="./res/dummy_example.gif" width="400px" height="500px" />

#### Trained agent
<img src="./res/trained_300k.gif" width="400px" height="500px" />