<h2>1. Importing Dependencies</h2>

In [None]:
!pip install gym[box2d] pyglet

In [1]:
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.evaluation import evaluate_policy
import os

<h2>2. Loading and Testing Environment</h2>

In [2]:
environment_name = 'CarRacing-v0'
env = gym.make(environment_name)

In [None]:
env.reset()

In [48]:
env.close()

In [None]:
env.reset()

In [None]:
env.render('human')

In [None]:
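episodes = 5
for episode in range(1, episodes + 1):
  state = env.reset()  # start a new episode on a freshly generated track
  done = False
  score = 0

  while not done:
    env.render('human')
    action = env.action_space.sample()  # random action: [steering, gas, brake]
    n_state, reward, done, info = env.step(action)  # pre-0.26 gym API returns a 4-tuple
    score += reward  # accumulate the episode return
  print('Episode:{} Score:{}'.format(episode, score))
env.close()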
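The random-action loop above only needs `reset()` and `step()` from the environment, so its control flow can be dry-run without Box2D or a display. The following is a minimal sketch under that assumption: `StubEnv`, `sample_action`, `run_episode`, and the constant reward are hypothetical stand-ins, not part of gym.

```python
import random


class StubEnv:
    """Hypothetical stand-in mimicking the reset()/step() 4-tuple interface used above."""

    def __init__(self, max_steps=10, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0.0  # placeholder observation

    def sample_action(self):
        # Same layout as a CarRacing action: [steering, gas, brake]
        return [self.rng.uniform(-1, 1), self.rng.uniform(0, 1), self.rng.uniform(0, 1)]

    def step(self, action):
        self.steps += 1
        reward = -0.1  # constant per-step reward, chosen only for illustration
        done = self.steps >= self.max_steps
        return 0.0, reward, done, {}


def run_episode(env):
    """Run one episode with random actions and return the accumulated score."""
    env.reset()
    done = False
    score = 0.0
    while not done:
        action = env.sample_action()
        _, reward, done, _ = env.step(action)
        score += reward
    return score


score = run_episode(StubEnv(max_steps=10))
print('Score:{:.1f}'.format(score))
```

Swapping `StubEnv` for the real environment brings back rendering and real rewards; the loop structure stays the same.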

<h2>3. Train Model</h2>

In [3]:
# Build the env inside the factory so the lambda does not close over a name that is reassigned
env = DummyVecEnv([lambda: gym.make(environment_name)])

In [4]:
log_path = os.path.join('Training', 'Logs')
# CnnPolicy because CarRacing observations are images (96x96 RGB frames)
model = PPO('CnnPolicy', env, verbose=1, tensorboard_log=log_path)

Using cpu device
Wrapping the env in a VecTransposeImage.


In [30]:
model.learn(total_timesteps=200000)

Track generation: 1102..1389 -> 287-tiles track
Logging to Training\Logs\PPO_21
Track generation: 1156..1449 -> 293-tiles track
Track generation: 1255..1573 -> 318-tiles track
-----------------------------
| time/              |      |
|    fps             | 80   |
|    iterations      | 1    |
|    time_elapsed    | 25   |
|    total_timesteps | 2048 |
-----------------------------
Track generation: 1111..1393 -> 282-tiles track
Track generation: 1211..1518 -> 307-tiles track
----------------------------------------
| time/                   |            |
|    fps                  | 28         |
|    iterations           | 2          |
|    time_elapsed         | 142        |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.12089412 |
|    clip_fraction        | 0.552      |
|    clip_range           | 0.2        |
|    entropy_loss         | -3.18      |
|    explained_variance   | 0.893      |
|    learning_rate        

<stable_baselines3.ppo.ppo.PPO at 0x283d83184c0>
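The log above advances `total_timesteps` by 2048 per iteration, so the number of iterations needed for a given budget can be estimated ahead of time. This is a rough sketch: it assumes one environment, a rollout size of 2048 as seen in the log, and that training stops after the first full rollout that reaches the budget. `ppo_iterations` and `estimate_seconds` are hypothetical helper names.

```python
import math


def ppo_iterations(total_timesteps, n_steps=2048, n_envs=1):
    """Estimate rollout iterations and the timesteps actually collected."""
    rollout = n_steps * n_envs  # timesteps gathered per iteration
    iterations = math.ceil(total_timesteps / rollout)
    return iterations, iterations * rollout


def estimate_seconds(timesteps, fps):
    """Rough wall-clock estimate from a steady-state fps reading."""
    return timesteps / fps


iterations, collected = ppo_iterations(200000)
print(iterations, collected)  # 98 200704

# fps=28 is the value logged at iteration 2; treat the result as a ballpark only
print(estimate_seconds(collected, fps=28) / 3600)
```

Under these assumptions a 200000-step budget overshoots slightly, because the last rollout is collected in full.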

<h2>4. Save Model</h2>

In [42]:
model_name = "PPO_DrivingX_" + str(5)
ppo_path = os.path.join('Training','Saved Models',model_name)

In [43]:
ppo_path

'Training\\Saved Models\\PPO_DrivingX_5'

In [44]:
ppo_path = os.path.join('Training','Saved Models','PPO_DrivingX_400k')

In [32]:
model.save(ppo_path)

In [46]:
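for training in range(5, 21):
    # Train another 100k steps, then checkpoint under a numbered name
    model.learn(total_timesteps=100000)
    model_name = "PPO_DrivingX_" + str(training)
    ppo_path = os.path.join('Training', 'Saved Models', model_name)
    model.save(ppo_path)
    # Drop the in-memory model and reload the checkpoint, which also confirms it loads
    del model
    model = PPO.load(ppo_path, env)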

Track generation: 1053..1321 -> 268-tiles track
Logging to Training\Logs\PPO_22
Track generation: 1291..1618 -> 327-tiles track
Track generation: 1192..1494 -> 302-tiles track
-----------------------------
| time/              |      |
|    fps             | 81   |
|    iterations      | 1    |
|    time_elapsed    | 25   |
|    total_timesteps | 2048 |
-----------------------------
Track generation: 1160..1454 -> 294-tiles track
Track generation: 1314..1656 -> 342-tiles track
-----------------------------------------
| time/                   |             |
|    fps                  | 28          |
|    iterations           | 2           |
|    time_elapsed         | 143         |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.093476474 |
|    clip_fraction        | 0.5         |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.17       |
|    explained_variance   | 0.894       |
|    learning_

KeyboardInterrupt: 
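Since the checkpoint loop above was interrupted, a restart needs to know which numbered checkpoint was written last. The sketch below picks the highest index from a list of file names. It assumes checkpoints are stored as `<name>.zip`; `latest_checkpoint_index` and the example file names are hypothetical.

```python
import re


def latest_checkpoint_index(filenames, prefix='PPO_DrivingX_'):
    """Return the highest numeric checkpoint index, or None if there is none."""
    pattern = re.compile(re.escape(prefix) + r'(\d+)\.zip$')
    indices = []
    for name in filenames:
        match = pattern.match(name)
        if match:  # skips non-numeric names such as PPO_DrivingX_400k.zip
            indices.append(int(match.group(1)))
    return max(indices) if indices else None


# Hypothetical directory listing, e.g. from os.listdir on the save folder
files = ['PPO_DrivingX_5.zip', 'PPO_DrivingX_6.zip', 'PPO_DrivingX_400k.zip']
last = latest_checkpoint_index(files)
print(last)  # 6

# The loop could then resume at last + 1 instead of starting again at 5
start = 5 if last is None else last + 1
print(start)  # 7
```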

In [33]:
del model

In [4]:
ppo_path = os.path.join('Training','Saved Models', 'PPO_Driving_5M')

In [45]:
model = PPO.load(ppo_path,env)

Wrapping the env in a VecTransposeImage.


<h2>5. Evaluate and Test</h2>

In [36]:
# Returns (mean_reward, std_reward) over the evaluation episodes
evaluate_policy(model, env, n_eval_episodes=2, render=True)

Track generation: 1259..1583 -> 324-tiles track
Track generation: 1190..1492 -> 302-tiles track
Track generation: 1183..1483 -> 300-tiles track


(218.00087825208902, 110.57053758949041)
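The tuple above is the mean and standard deviation of the per-episode returns. With only two evaluation episodes the spread is large relative to the mean, so the estimate is noisy. The sketch below shows how such a pair is derived. The episode returns are hypothetical values chosen for illustration (not the actual returns from the run), and the use of the population standard deviation is an assumption.

```python
import statistics

# Hypothetical per-episode returns, NOT the real values behind the output above
episode_rewards = [328.5, 107.5]

mean_reward = statistics.mean(episode_rewards)
std_reward = statistics.pstdev(episode_rewards)  # population std (assumed convention)

print((mean_reward, std_reward))  # (218.0, 110.5)
```

Raising `n_eval_episodes` is the usual way to get a tighter estimate, at the cost of a longer evaluation.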

In [7]:
env.close()

In [None]:
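episodes = 5
for episode in range(1, episodes + 1):
  obs = env.reset()
  done = False
  score = 0

  while not done:
    env.render()
    action, _ = model.predict(obs)  # second return value (recurrent state) is unused here
    obs, reward, done, info = env.step(action)  # vectorized env: obs, reward, done have a batch dimension of 1
    score += reward  # score becomes a length-1 array
  print('Episode:{} Score:{}'.format(episode, score))
env.close()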