<a href="https://colab.research.google.com/github/kuds/rl-car-racing/blob/main/%5BCar%20Racing%5D%20Proximal%20Policy%20Optimization%20(PPO).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install swig moviepy



In [2]:
!pip install "gymnasium[box2d]" stable_baselines3 rl_zoo3

Collecting rl_zoo3
  Downloading rl_zoo3-2.3.0-py3-none-any.whl.metadata (1.8 kB)
Collecting sb3-contrib<3.0,>=2.3.0 (from rl_zoo3)
  Downloading sb3_contrib-2.3.0-py3-none-any.whl.metadata (3.6 kB)
Collecting huggingface-sb3<4.0,>=3.0 (from rl_zoo3)
  Downloading huggingface_sb3-3.0-py3-none-any.whl.metadata (6.3 kB)
Collecting optuna>=3.0 (from rl_zoo3)
  Downloading optuna-3.6.1-py3-none-any.whl.metadata (17 kB)
Collecting pytablewriter~=1.2 (from rl_zoo3)
  Downloading pytablewriter-1.2.0-py3-none-any.whl.metadata (37 kB)
Collecting alembic>=1.5.0 (from optuna>=3.0->rl_zoo3)
  Downloading alembic-1.13.2-py3-none-any.whl.metadata (7.4 kB)
Collecting colorlog (from optuna>=3.0->rl_zoo3)
  Downloading colorlog-6.8.2-py3-none-any.whl.metadata (10 kB)
Collecting DataProperty<2,>=1.0.1 (from pytablewriter~=1.2->rl_zoo3)
  Downloading DataProperty-1.0.1-py3-none-any.whl.metadata (11 kB)
Collecting mbstrdecoder<2,>=1.0.0 (from pytablewriter~=1.2->rl_zoo3)
  Downloading mbstrdecoder-1.1.3-p

In [1]:
import gymnasium
from gymnasium.wrappers import RecordVideo

import stable_baselines3
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecFrameStack, VecTransposeImage, VecVideoRecorder
from stable_baselines3.common.callbacks import BaseCallback, EvalCallback
from stable_baselines3.common.evaluation import evaluate_policy

import platform

import torch
import time
torch.backends.cudnn.benchmark = True

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

In [4]:
print("Is CUDA Available: {}".format(torch.cuda.is_available()))
print("Torch Version: {}".format(torch.__version__))
print("CUDA Version: {}".format(torch.version.cuda))
print("Stable-Baselines3 Version: {}".format(stable_baselines3.__version__))
print("Gymnasium Version: {}".format(gymnasium.__version__))
print("Python Version: {}".format(platform.python_version()))

Is CUDA Available: True
Torch Version: 2.3.1+cu121
CUDA Version: 12.1
Stable-Baselines3 Version: 2.3.2
Gymnasium Version: 0.29.1
Python Version: 3.10.12
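
The version report above fails with an ImportError if one of the packages is missing. As a minimal sketch, the same information can be collected without importing the packages, using the standard library's `importlib.metadata`. The helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report the packages this notebook relies on
for name, version in installed_versions(["torch", "stable_baselines3", "gymnasium"]).items():
    print(f"{name}: {version or 'not installed'}")
```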


In [5]:
# Create the vectorized CarRacing training environment (32 parallel copies, 4 stacked frames)
env = make_vec_env("CarRacing-v2", n_envs=32)
env = VecFrameStack(env, n_stack=4)
# PPO wraps image environments in VecTransposeImage automatically (see the log output below)

# Separate single-environment copy used for periodic evaluation
env_val = make_vec_env("CarRacing-v2", n_envs=1)
env_val = VecFrameStack(env_val, n_stack=4)
env_val = VecTransposeImage(env_val)

# Evaluate periodically and save the best model to ./logs/best_model.zip
# Note: eval_freq counts callback calls (one per vectorized step), not total timesteps
eval_callback = EvalCallback(env_val, best_model_save_path="./logs/", log_path="./logs/", eval_freq=1000, deterministic=True, render=False)

# Initialize PPO
model = PPO('CnnPolicy', env, verbose=1)

# Train the model
model.learn(total_timesteps=1_000_000, progress_bar=True, callback=eval_callback)

# Save the model
model.save("ppo_car_racing")

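# Evaluate the final policy on the separate evaluation environment rather than the training one
mean_reward, std_reward = evaluate_policy(model, env_val, n_eval_episodes=1)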
print(f"Mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")

Using cuda device
Wrapping the env in a VecTransposeImage.


Output()

---------------------------------
| eval/              |          |
|    mean_ep_length  | 1e+03    |
|    mean_reward     | -31.2    |
| time/              |          |
|    total_timesteps | 3200     |
---------------------------------


---------------------------------
| eval/              |          |
|    mean_ep_length  | 1e+03    |
|    mean_reward     | -19.7    |
| time/              |          |
|    total_timesteps | 6400     |
---------------------------------


---------------------------------
| eval/              |          |
|    mean_ep_length  | 1e+03    |
|    mean_reward     | -15.8    |
| time/              |          |
|    total_timesteps | 9600     |
---------------------------------


---------------------------------
| eval/              |          |
|    mean_ep_length  | 1e+03    |
|    mean_reward     | -18.7    |
| time/              |          |
|    total_timesteps | 12800    |
---------------------------------


KeyboardInterrupt: 
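
The evaluation logs above are spaced 3,200 timesteps apart. In Stable-Baselines3, `EvalCallback` counts callback calls, and each call advances every parallel environment by one step, so the number of timesteps between evaluations is `eval_freq` multiplied by `n_envs`. The sketch below works through that arithmetic. The function names are hypothetical, and it assumes the callback fires exactly once per vectorized step.

```python
def timesteps_between_evals(eval_freq: int, n_envs: int) -> int:
    """Total environment timesteps that elapse between two evaluations."""
    return eval_freq * n_envs


def eval_freq_for(target_timesteps: int, n_envs: int) -> int:
    """eval_freq that evaluates roughly every `target_timesteps` total timesteps."""
    return max(target_timesteps // n_envs, 1)


# With the settings in the training cell (eval_freq=1000, n_envs=32)
print(timesteps_between_evals(1000, 32))  # 32000

# eval_freq that would give the 3,200-timestep spacing seen in the logs
print(eval_freq_for(3200, 32))  # 100
```

Under these assumptions, the 3,200-timestep spacing would correspond to `eval_freq=100` with 32 environments, so the logged run may have used a different setting than the cell shown here.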

In [3]:
# Load the best model, run it on CarRacing, and save the rollout as an MP4 video

# Load the best model
env = make_vec_env("CarRacing-v2", n_envs=1)
env = VecFrameStack(env, n_stack=4)
env = VecTransposeImage(env)
best_model_path = "./logs/best_model.zip"
best_model = PPO.load(best_model_path, env=env)

# Record video of the best model playing CarRacing
env = VecVideoRecorder(env, "./videos/", record_video_trigger=lambda x: x == 0, video_length=1000, name_prefix="best_model_car_racing")

obs = env.reset()
for _ in range(1000):
    action, _states = best_model.predict(obs, deterministic=True)
    # VecVideoRecorder captures a frame on every step, so no explicit render call is needed
    obs, rewards, dones, info = env.step(action)

env.close()

Saving video to /content/videos/best_model_car_racing-step-0-to-step-1000.mp4
Moviepy - Building video /content/videos/best_model_car_racing-step-0-to-step-1000.mp4.
Moviepy - Writing video /content/videos/best_model_car_racing-step-0-to-step-1000.mp4





Moviepy - Done !
Moviepy - video ready /content/videos/best_model_car_racing-step-0-to-step-1000.mp4
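
The playback environment repeats the training wrappers (`VecFrameStack`, then `VecTransposeImage`) so that the loaded model sees observations of the shape it was trained on. As a rough sketch of the shapes involved, assuming CarRacing's 96x96 RGB frames and channel-last stacking, the helpers below (hypothetical names, pure shape arithmetic, no environment required) trace an observation through both wrappers.

```python
def stacked_shape(frame_shape, n_stack):
    """Shape after stacking n_stack channel-last frames along the channel axis."""
    height, width, channels = frame_shape
    return (height, width, channels * n_stack)


def transposed_shape(shape):
    """Shape after moving channels first (HxWxC -> CxHxW), as a CNN policy expects."""
    height, width, channels = shape
    return (channels, height, width)


frame = (96, 96, 3)                 # one CarRacing RGB frame (assumed)
stacked = stacked_shape(frame, 4)   # after VecFrameStack(n_stack=4)
print(stacked)                      # (96, 96, 12)
print(transposed_shape(stacked))    # (12, 96, 96)
```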
