<a href="https://colab.research.google.com/github/sczopek/spaceInvadersAtariRl/blob/main/copy_of_atari_games.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Stable Baselines3 - Train on Atari Games

GitHub repo: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)


[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a training framework for Reinforcement Learning (RL), using Stable Baselines3.

It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

Documentation is available online: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)

## Install Dependencies and Stable Baselines Using Pip


```
pip install stable-baselines3[extra]
```

In [None]:
!pip install "stable-baselines3[extra]<=2.3.1"

Collecting stable-baselines3>=2.0.0a4 (from stable-baselines3[extra]>=2.0.0a4)
  Downloading stable_baselines3-2.4.0a7-py3-none-any.whl.metadata (5.1 kB)
Collecting gymnasium<0.30,>=0.28.1 (from stable-baselines3>=2.0.0a4->stable-baselines3[extra]>=2.0.0a4)
  Downloading gymnasium-0.29.1-py3-none-any.whl.metadata (10 kB)
Collecting shimmy~=1.3.0 (from shimmy[atari]~=1.3.0; extra == "extra"->stable-baselines3[extra]>=2.0.0a4)
  Downloading Shimmy-1.3.0-py3-none-any.whl.metadata (3.7 kB)
Collecting autorom~=0.6.1 (from autorom[accept-rom-license]~=0.6.1; extra == "extra"->stable-baselines3[extra]>=2.0.0a4)
  Downloading AutoROM-0.6.1-py3-none-any.whl.metadata (2.4 kB)
Collecting AutoROM.accept-rom-license (from autorom[accept-rom-license]~=0.6.1; extra == "extra"->stable-baselines3[extra]>=2.0.0a4)
  Downloading AutoROM.accept-rom-license-0.6.1.tar.gz (434 kB)

## Import policy, RL agent, ...

In [None]:
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

In [None]:
from google.colab import files



## Training on Atari

We will use the Atari wrapper, which downsamples each frame and converts it to grayscale.

About Atari preprocessing: [Frame Skipping and Pre-Processing for Deep Q-Networks on Atari 2600 Games](https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/)

![Pong](https://cdn-images-1.medium.com/max/800/1*UHYJE7lF8IDZS_U5SsAFUQ.gif)
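To make the preprocessing concrete, here is a minimal NumPy sketch of the two operations mentioned above: grayscale conversion and downsampling to 84x84. It is an illustration only, not the wrapper's actual code. The function name `preprocess_frame` is hypothetical, and the nearest-neighbour resize is a simplification (as far as I know, the real wrapper resizes with OpenCV area interpolation).

```python
import numpy as np


def preprocess_frame(frame_rgb: np.ndarray, size: int = 84) -> np.ndarray:
    """Convert an RGB frame (H, W, 3) to a grayscale (size, size) uint8 image."""
    # Luminance-weighted grayscale conversion (ITU-R BT.601 weights).
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame_rgb.astype(np.float32) @ weights
    # Nearest-neighbour downsampling: pick evenly spaced rows and columns.
    # This is a simplification chosen to keep the sketch dependency-free.
    height, width = gray.shape
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    small = gray[np.ix_(rows, cols)]
    return np.clip(np.rint(small), 0, 255).astype(np.uint8)


# A random 210x160 RGB image standing in for a raw Atari frame.
frame = np.random.default_rng(0).integers(0, 256, size=(210, 160, 3), dtype=np.uint8)
obs = preprocess_frame(frame)
print(obs.shape, obs.dtype)
```

Shrinking the observation this way reduces the input from 210x160x3 values to 84x84 values per frame, which keeps the convolutional policy small.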

In [None]:
# make_atari_env creates the Atari environments and applies the standard preprocessing wrappers.
env = make_atari_env("SpaceInvaders-v4", n_envs=8, seed=0)
# Stack 4 frames
vec_env = VecFrameStack(env, n_stack=4)
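Frame stacking gives the policy a short history, so it can infer motion that a single frame cannot show. The sketch below illustrates the idea for one environment with NumPy. The `FrameStack` class is hypothetical and is not how `VecFrameStack` is implemented: the real wrapper works on batched observations from all environments and also handles episode resets.

```python
import numpy as np


class FrameStack:
    """Keep the last n_stack frames, concatenated along the channel axis."""

    def __init__(self, n_stack: int, frame_shape: tuple):
        height, width, channels = frame_shape
        self.channels = channels
        # Start with an all-zero history.
        self.stacked = np.zeros((height, width, channels * n_stack), dtype=np.uint8)

    def push(self, frame: np.ndarray) -> np.ndarray:
        # Shift older frames toward the front, then write the newest frame last.
        self.stacked = np.roll(self.stacked, shift=-self.channels, axis=-1)
        self.stacked[..., -self.channels:] = frame
        return self.stacked


stack = FrameStack(n_stack=4, frame_shape=(84, 84, 1))
for value in (1, 2, 3, 4, 5):
    out = stack.push(np.full((84, 84, 1), value, dtype=np.uint8))

print(out.shape)   # (84, 84, 4)
print(out[0, 0])   # [2 3 4 5]
```

After five pushes, the oldest frame (value 1) has dropped out and the four most recent frames remain, ordered oldest to newest.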

In [None]:
import os

models_dir = "models/PPO"
# exist_ok=True makes this a no-op if the directory already exists
os.makedirs(models_dir, exist_ok=True)

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
from stable_baselines3.common.callbacks import CheckpointCallback

# Use CheckpointCallback to save a copy of the model to Google Drive during training.
# A checkpoint is written every 2_000_000 frames of training.
# save_freq is counted per environment, and 8 environments run simultaneously,
# so save_freq = 2_000_000 / 8 = 250_000.
checkpoint_callback = CheckpointCallback(save_freq=250000, save_path="/content/gdrive/MyDrive/models/PPO",
                                         name_prefix="ppo_spaceInvadersFrameSkipWinningParams_136000000_steps")
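The `save_freq` arithmetic in the comments above can be written as a small helper. This is a sketch under the assumption that each callback step advances all parallel environments by one step (what the comments call frames). The function name `save_freq_per_env` is hypothetical.

```python
def save_freq_per_env(steps_between_checkpoints: int, n_envs: int) -> int:
    """Convert a total-step checkpoint interval into a per-environment save_freq."""
    # Each vectorized step collects n_envs transitions, so divide by n_envs.
    # max(..., 1) guards against a zero frequency for very small intervals.
    return max(steps_between_checkpoints // n_envs, 1)


save_freq = save_freq_per_env(2_000_000, n_envs=8)
print(save_freq)  # 250000
```

Computing the value from `n_envs` avoids having to update a hard-coded number when the number of parallel environments changes.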

In [None]:
# If not loading a saved model, create a new one here to train from scratch.
# The Space Invaders hyperparameters are passed as arguments.
# model = PPO("CnnPolicy", vec_env, verbose=1, tensorboard_log="/content/gdrive/MyDrive/models/ppo_spaceinvaders_tensorboard/",
#             batch_size=256,
#             clip_range=0.001,
#             ent_coef=0.01,
#             learning_rate=0.0001,
#             n_epochs=4,
#             n_steps=128,
#             vf_coef=0.5
#             )

Using cuda device
Wrapping the env in a VecTransposeImage.
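As a rough guide to what these hyperparameters imply, the following sketch works out how much data each PPO iteration collects and how many minibatch updates it performs. It uses only the values from the cell above (`n_steps`, `batch_size`, `n_epochs`) and the 8 environments created earlier. The variable names are illustrative.

```python
n_envs = 8        # parallel environments from make_atari_env
n_steps = 128     # steps collected per environment per rollout
batch_size = 256  # minibatch size
n_epochs = 4      # passes over each rollout

# Transitions collected before each policy update.
rollout_size = n_envs * n_steps
# Minibatches needed to cover the rollout once.
minibatches_per_epoch = rollout_size // batch_size
# Total minibatch gradient steps per rollout.
gradient_steps_per_rollout = minibatches_per_epoch * n_epochs

print(rollout_size, minibatches_per_epoch, gradient_steps_per_rollout)  # 1024 4 16
```

Because 1024 is an exact multiple of 256, every minibatch is full-sized. Choosing a `batch_size` that does not divide `n_envs * n_steps` would leave a smaller final minibatch.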


In [None]:
model = PPO.load("/content/gdrive/MyDrive/models/PPO/ppo_spaceInvadersFrameSkipWinningParams_136000000_steps",
                 verbose=1,
                 tensorboard_log="/content/gdrive/MyDrive/models/ppo_spaceinvaders_tensorboard/",
                 force_reset=False)
model.set_env(vec_env)

Wrapping the env in a VecTransposeImage.
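When resuming across several Colab sessions, it can help to locate the most recent checkpoint rather than hard-coding its name. The sketch below assumes checkpoint files are named `<prefix>_<num_timesteps>_steps.zip`. The helper `latest_checkpoint` and the `ppo_demo` prefix are hypothetical, and the demo uses empty placeholder files in a temporary directory rather than real models.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def latest_checkpoint(directory: str, prefix: str) -> Optional[Path]:
    """Return the checkpoint with the highest step count, or None if none match."""
    # Assumed naming scheme: <prefix>_<num_timesteps>_steps.zip
    pattern = re.compile(re.escape(prefix) + r"_(\d+)_steps\.zip")
    best_path, best_steps = None, -1
    for path in Path(directory).glob("*.zip"):
        match = pattern.fullmatch(path.name)
        if match and int(match.group(1)) > best_steps:
            best_path, best_steps = path, int(match.group(1))
    return best_path


# Demo with empty placeholder files instead of real checkpoints.
with tempfile.TemporaryDirectory() as tmp:
    for steps in (250000, 2000000, 1000000):
        (Path(tmp) / f"ppo_demo_{steps}_steps.zip").touch()
    found_name = latest_checkpoint(tmp, "ppo_demo").name

print(found_name)  # ppo_demo_2000000_steps.zip
```

Comparing the parsed step counts as integers matters here: sorting the file names as strings would rank `250000` above `2000000`.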


In [None]:
# Colab allows about 20 million frames of training before it disconnects the session
# for excessive connection time.
model.learn(total_timesteps=20000000, callback=[checkpoint_callback], reset_num_timesteps=False)

Streaming output truncated to the last 5000 lines.
|    value_loss           | 0.87         |
------------------------------------------
-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 9.79e+03      |
|    ep_rew_mean          | 2.39e+03      |
| time/                   |               |
|    fps                  | 555           |
|    iterations           | 29060         |
|    time_elapsed         | 53593         |
|    total_timesteps      | 63712528      |
| train/                  |               |
|    approx_kl            | 0.00065765623 |
|    clip_fraction        | 0.861         |
|    clip_range           | 0.001         |
|    entropy_loss         | -1.34         |
|    explained_variance   | 0.943         |
|    learning_rate        | 0.0001        |
|    loss                 | 0.0635        |
|    n_updates            | 266444        |
|    policy_gradient_loss | 0.0013        |
|    value_lo

<stable_baselines3.ppo.ppo.PPO at 0x7894d3d9bdf0>
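The `fps` value in the log gives a quick way to estimate how long a training run will take. The sketch below assumes throughput stays near the logged 555 steps per second for the whole run, which may not hold on shared Colab hardware. The helper name `training_hours` is hypothetical.

```python
def training_hours(total_timesteps: int, fps: float) -> float:
    """Estimate wall-clock training time in hours at a constant throughput."""
    return total_timesteps / fps / 3600


hours = training_hours(20_000_000, fps=555)
print(f"{hours:.1f} hours")  # 10.0 hours
```

At that rate, the 20 million step budget passed to `model.learn` corresponds to roughly ten hours of continuous training.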

Save and download trained model