<a href="https://colab.research.google.com/github/jeffheaton/app_deep_learning/blob/main/t81_558_class_12_4_atari.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-558: Applications of Deep Neural Networks
**Module 12: Reinforcement Learning**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 12 Video Material

* Part 12.1: Introduction to Gymnasium [[Video]](https://www.youtube.com/watch?v=FvuyrpzvwdI&list=PLjy4p-07OYzuy_lHcRW8lPTLPTTOmUpmi) [[Notebook]](t81_558_class_12_1_reinforcement.ipynb)
* Part 12.2: Introduction to Q-Learning [[Video]](https://www.youtube.com/watch?v=VKuqvbG_KAw&list=PLjy4p-07OYzuy_lHcRW8lPTLPTTOmUpmi) [[Notebook]](t81_558_class_12_2_qlearningreinforcement.ipynb)
* Part 12.3: Stable Baselines Q-Learning [[Video]](https://www.youtube.com/watch?v=kl7zsCjULN0&list=PLjy4p-07OYzuy_lHcRW8lPTLPTTOmUpmi) [[Notebook]](t81_558_class_12_3_pytorch_reinforce.ipynb)
* **Part 12.4: Atari Games with Stable Baselines Neural Networks** [[Video]](https://www.youtube.com/watch?v=maLA1_d4pzQ&list=PLjy4p-07OYzuy_lHcRW8lPTLPTTOmUpmi) [[Notebook]](t81_558_class_12_4_atari.ipynb)
* Part 12.5: Future of Reinforcement Learning [[Video]](https://www.youtube.com/watch?v=-euo5pTjP8E&list=PLjy4p-07OYzuy_lHcRW8lPTLPTTOmUpmi) [[Notebook]](t81_558_class_12_5_rl_future.ipynb)


# Google CoLab Instructions

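The following code checks whether the notebook is running in Google CoLab. If it is, the code installs the Python libraries and system packages this module needs: Stable Baselines3, Gymnasium with the Atari ROMs, a virtual display, and ffmpeg.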

In [1]:
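# HIDE OUTPUT
try:
    from google.colab import drive
    COLAB = True
    print("Note: using Google CoLab")
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

if COLAB:
    # Stable Baselines3 (with extras), Gymnasium, and the Atari ROMs
    !pip install "stable-baselines3[extra]" gymnasium
    !pip install "gymnasium[accept-rom-license,atari]"
    # Virtual display so environments can render without a monitor
    !pip install pyvirtualdisplay
    # python3-opengl replaces the python-opengl package dropped from newer Ubuntu releases
    !sudo apt-get install -y python3-opengl ffmpeg
    !sudo apt-get install -y xvfb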

Note: using Google CoLab
Collecting stable-baselines3[extra]
  Downloading stable_baselines3-2.1.0-py3-none-any.whl (178 kB)
Collecting gymnasium
  Downloading gymnasium-0.29.1-py3-none-any.whl (953 kB)
Collecting shimmy[atari]~=1.1.0 (from stable-baselines3[extra])
  Downloading Shimmy-1.1.0-py3-none-any.whl (37 kB)
Collecting autorom[accept-rom-license]~=0.6.1 (from stable-baselines3[extra])
  Downloading AutoROM-0.6.1-py3-none-any.whl (9.4 kB)
Collecting farama-notifications>=0.0.1 (from gymnasium)
  Downloading Farama_Notifications-0.0.4-py3-none-any.whl (2.5 kB)
Collecting AutoROM.accept-rom-license (from autorom[accept-rom-license]~=0.6.1->stable-baselines3[extra])
  Downloading AutoROM.accept-rom-license-0.6.1.tar.gz (434 kB)

# Part 12.4: Atari Games with Stable Baselines Neural Networks

The Atari 2600 is a home video game console from Atari, Inc., released on September 11, 1977. It is widely credited with popularizing microprocessor-based hardware and games stored on ROM cartridges, instead of dedicated hardware with games built into the unit. Atari bundled the console with two joystick controllers, a conjoined pair of paddle controllers, and a game cartridge: initially [Combat](https://en.wikipedia.org/wiki/Combat_(Atari_2600)), and later [Pac-Man](https://en.wikipedia.org/wiki/Pac-Man_(Atari_2600)).

Atari emulators are popular and let gamers play many old Atari video games on modern computers. Some of these emulators are even written in JavaScript and run directly in a web browser.

* [Virtual Atari](http://www.virtualatari.org/listP.html)

Atari games have become popular benchmarks for AI systems, particularly in reinforcement learning. Gymnasium, the successor to OpenAI Gym, runs Atari games through the Arcade Learning Environment (ALE), which is built on the [Stella Atari Emulator](https://stella-emu.github.io/). You can see the Atari 2600 in Figure 12.ATARI.

**Figure 12.ATARI: The Atari 2600**
![Atari 2600 Console](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/atari-1.png "Atari 2600 Console")

## Actual Atari 2600 Specs

* CPU: 1.19 MHz MOS Technology 6507
* Audio + Video processor: Television Interface Adapter (TIA)
* Playfield resolution: 40 x 192 pixels (NTSC). It uses a 20-pixel register that is mirrored or copied, left side to right side, to achieve the width of 40 pixels.
* Player sprites: 8 x 192 pixels (NTSC). Player, ball, and missile sprites use pixels 1/4 the width of playfield pixels (unless stretched).
* Ball and missile sprites: 1 x 192 pixels (NTSC).
* Maximum resolution: 160 x 192 pixels (NTSC). Max resolution is achievable only with programming tricks that combine sprite pixels with playfield pixels.
* 128 colors (NTSC). 128 possible on screen. Max of 4 per line: background, playfield, player0 sprite, and player1 sprite. Palette switching between lines is common. Palette switching mid-line is possible but not common due to resource limitations.
* 2 channels of 1-bit monaural sound with 4-bit volume control.
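To make the playfield bullets more concrete, here is a minimal Python sketch. It shows how a 20-pixel half row becomes a 40-pixel row by being either copied or mirrored. It also shows how each playfield pixel spans four of the 160 horizontal positions. The function names are hypothetical. On real hardware, the 20 bits are split across three registers (PF0, PF1, PF2) with differing bit orders, and this sketch ignores that detail.

```python
def expand_playfield(half_row, mirror=False):
    """Build a 40-pixel playfield row from a 20-pixel half row.

    half_row: sequence of 20 ints (0 = background, 1 = playfield color).
    mirror:   if True, the right half is the left half reversed;
              otherwise the right half is a straight copy.
    """
    if len(half_row) != 20:
        raise ValueError("a playfield half row has exactly 20 pixels")
    right = list(reversed(half_row)) if mirror else list(half_row)
    return list(half_row) + right


def to_color_clocks(playfield_row, width_per_pixel=4):
    """Stretch each playfield pixel across 4 of the 160 horizontal positions."""
    return [p for p in playfield_row for _ in range(width_per_pixel)]


# A half row with a single lit pixel at the far left.
half = [1] + [0] * 19

copied = expand_playfield(half)                 # lit at positions 0 and 20
mirrored = expand_playfield(half, mirror=True)  # lit at positions 0 and 39
print([i for i, p in enumerate(copied) if p])
print([i for i, p in enumerate(mirrored) if p])
print(len(to_color_clocks(mirrored)))           # full-width row in color clocks
```

Mirroring is what makes symmetric playfields (like the walls in many 2600 games) cheap: the program only has to write 20 bits per line.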

## Gymnasium Atari Breakout

You can use OpenAI Gym with Windows; however, it requires a special [installation procedure](https://towardsdatascience.com/how-to-install-openai-gym-in-a-windows-environment-338969e24d30).

This chapter demonstrates playing [Atari Breakout](https://en.wikipedia.org/wiki/Breakout_(video_game)). Atari Breakout is a classic arcade game released by Atari, Inc. in 1976. The player controls a paddle at the bottom of the screen and uses it to bounce a ball against a wall of bricks at the top. The objective is to destroy all the bricks by hitting them with the ball, which the player deflects with the paddle. As play progresses, the ball moves faster. The player loses a turn when the ball misses the paddle and falls off the bottom of the screen. Breakout's simple gameplay, combined with its rising difficulty, has made it a quintessential example of the easy-to-learn, hard-to-master design ethos that characterized many early video games.

In artificial intelligence research, and particularly in reinforcement learning, Atari Breakout is available as an environment in OpenAI's Gym toolkit and its maintained successor, Gymnasium. Both are collections of environments that provide a standardized interface for algorithm development and benchmarking. Stable Baselines3 is a set of high-quality implementations of reinforcement learning algorithms. It offers a simple way to train and evaluate agents on many tasks, including Atari games like Breakout.

Gym registers several Breakout variants, such as `Breakout-v0`, `BreakoutDeterministic-v4`, and `BreakoutNoFrameskip-v4` (the variant used in this part). Each variant abstracts the game's mechanics into observations, actions, and rewards that an agent can interact with. The agent observes the game state (typically the screen's pixel data), selects actions (such as moving the paddle left or right), and receives rewards (the points scored for breaking bricks). Researchers and enthusiasts can use this setup to apply and test reinforcement learning algorithms with Stable Baselines3. The goal is to develop agents that learn to play Breakout at a superhuman level, which makes Breakout a useful playground for advancing machine learning.
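The observation/action/reward loop described above follows the Gymnasium API:

* `reset()` returns `(observation, info)`.
* `step(action)` returns `(observation, reward, terminated, truncated, info)`.

Running the real Breakout environment requires the Atari ROMs. To keep the example self-contained, the sketch below substitutes a hypothetical, heavily simplified `ToyBreakout` class that is not part of Gymnasium and only mimics that interface. Its four actions use the same order as ALE Breakout (`NOOP`, `FIRE`, `RIGHT`, `LEFT`), but the game logic is invented for illustration.

```python
import random


class ToyBreakout:
    """Hypothetical stand-in that mimics the Gymnasium reset/step API.

    A ball drops one row per step from the top of a width x height grid.
    The agent earns 1 point if the paddle is under the ball when it reaches
    the bottom row and loses a life otherwise; 3 lives per episode.
    """

    ACTIONS = ["NOOP", "FIRE", "RIGHT", "LEFT"]  # same order as ALE Breakout

    def __init__(self, width=8, height=5, max_steps=200):
        self.width, self.height, self.max_steps = width, height, max_steps

    def _obs(self):
        return (self.paddle, self.ball_x, self.ball_y)

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.paddle, self.steps, self.lives = self.width // 2, 0, 3
        self.ball_x, self.ball_y = self.rng.randrange(self.width), 0
        return self._obs(), {"lives": self.lives}

    def step(self, action):
        name = self.ACTIONS[action]
        if name == "RIGHT":
            self.paddle = min(self.paddle + 1, self.width - 1)
        elif name == "LEFT":
            self.paddle = max(self.paddle - 1, 0)
        # NOOP and FIRE leave the paddle where it is in this toy version.
        self.ball_y += 1
        reward = 0.0
        if self.ball_y == self.height:  # ball reached the paddle's row
            if self.ball_x == self.paddle:
                reward = 1.0
            else:
                self.lives -= 1
            self.ball_x, self.ball_y = self.rng.randrange(self.width), 0
        self.steps += 1
        terminated = self.lives == 0              # game over
        truncated = self.steps >= self.max_steps  # time limit reached
        return self._obs(), reward, terminated, truncated, {"lives": self.lives}


def run_episode(env, policy, seed=0):
    """Generic agent-environment loop; returns the episode's total reward."""
    obs, info = env.reset(seed=seed)
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total


def chase_ball(obs):
    """Hand-written policy: move the paddle toward the ball's column."""
    paddle, ball_x, _ = obs
    if ball_x > paddle:
        return 2  # RIGHT
    if ball_x < paddle:
        return 3  # LEFT
    return 0      # NOOP


rng = random.Random(1)
print("random policy:", run_episode(ToyBreakout(), lambda obs: rng.randrange(4)))
print("chase policy: ", run_episode(ToyBreakout(), chase_ball))
```

Because `run_episode` only relies on `reset` and `step`, it would work unchanged with any environment that follows the same interface. That interchangeability is the point of the standardized API. A reinforcement learning algorithm replaces the hand-written `chase_ball` with a policy learned from the rewards.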


## Training the Agent

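We are now ready to train the agent. The code below uses Proximal Policy Optimization (PPO) from Stable Baselines3 and runs four copies of the game in parallel. Depending on how many timesteps (`TIMESTEPS`) you train for, this process can take many hours. During training, Stable Baselines3 periodically logs rollout statistics, such as the mean episode length (`ep_len_mean`) and mean episode reward (`ep_rew_mean`). After each policy update, it also logs training statistics in the `train/` section, such as `approx_kl` and `clip_fraction`. As training becomes more successful, the mean episode reward should increase.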
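Training progress is reported once per PPO iteration. The code does not override PPO's `n_steps`, so this sketch assumes the Stable Baselines3 default of `n_steps=2048`. Each iteration then collects 2048 transitions from each of the 4 parallel environments, or 8192 in total. That figure matches the `total_timesteps` shown after the first iteration in the log. The helper name below is hypothetical. It estimates how many iterations a given budget produces. Training stops only at an iteration boundary, so the final timestep count can slightly overshoot `TIMESTEPS`.

```python
import math


def ppo_iterations(total_timesteps, n_steps=2048, n_envs=4):
    """Return (iterations, timesteps actually collected) for an on-policy run.

    Each iteration gathers n_steps transitions from each of n_envs
    environments; training stops at the first iteration boundary at
    or beyond total_timesteps.
    """
    per_iteration = n_steps * n_envs
    iterations = math.ceil(total_timesteps / per_iteration)
    return iterations, iterations * per_iteration


print(ppo_iterations(100_000))  # (13, 106496)
```

Doubling `n_envs` halves the number of iterations for the same budget, but each iteration then does more work per update.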

In [2]:
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Set this constant to either 'Breakout' or 'Atlantis' to choose the game
GAME_NAME = 'Breakout'  # Or 'Atlantis'

# Create 4 parallel Atari environments. make_atari_env applies the standard
# Atari preprocessing (frame skipping, 84x84 grayscale frames, and more), and
# VecFrameStack stacks the 4 most recent frames into each observation.
env_id = f"{GAME_NAME}NoFrameskip-v4"
env = make_atari_env(env_id, n_envs=4, seed=0)
env = VecFrameStack(env, n_stack=4)

# Initialize the agent, here we use Proximal Policy Optimization (PPO)
# with a convolutional policy that reads the stacked frames
model = PPO('CnnPolicy', env, verbose=1, tensorboard_log="./atari_ppo_tensorboard/")

# Train the agent; total_timesteps should be an integer
TIMESTEPS = 100_000
model.learn(total_timesteps=TIMESTEPS)

# Save the model (Stable Baselines3 adds the .zip extension)
model.save(f"{GAME_NAME}_ppo_model")

# Evaluate the trained agent
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)

print(f"Mean reward: {mean_reward} +/- {std_reward}")

# Don't forget to close the environment when you are done
env.close()




Using cuda device
Wrapping the env in a VecTransposeImage.
Logging to ./atari_ppo_tensorboard/PPO_1
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 774      |
|    ep_rew_mean     | 1.52     |
| time/              |          |
|    fps             | 336      |
|    iterations      | 1        |
|    time_elapsed    | 24       |
|    total_timesteps | 8192     |
---------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 745          |
|    ep_rew_mean          | 1.4          |
| time/                   |              |
|    fps                  | 318          |
|    iterations           | 2            |
|    time_elapsed         | 51           |
|    total_timesteps      | 16384        |
| train/                  |              |
|    approx_kl            | 0.0089585865 |
|    clip_fraction        | 0.0609       |
|    clip_range           | 0.2          |
|
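The `train/` block in the log reports statistics from PPO's clipped objective. The sketch below shows roughly what `clip_range`, `clip_fraction`, and `approx_kl` measure. It is plain Python on made-up log-probabilities, not SB3's actual implementation, and the function name is hypothetical.

* The probability ratio between the new and old policy is clipped to `[1 - clip_range, 1 + clip_range]`.
* PPO keeps the more pessimistic of the clipped and unclipped objectives.
* `clip_fraction` is the share of samples whose ratio fell outside the band.
* `approx_kl` uses the estimator `mean((r - 1) - log r)`, which, as far as we know, is the one SB3 logs.

```python
import math


def ppo_clip_stats(new_logp, old_logp, advantages, clip_range=0.2):
    """Clipped surrogate loss plus clip_fraction and approx_kl diagnostics."""
    n = len(new_logp)
    surrogate, clipped_count, kl_sum = 0.0, 0, 0.0
    for new, old, adv in zip(new_logp, old_logp, advantages):
        log_ratio = new - old
        ratio = math.exp(log_ratio)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1 - clip_range), 1 + clip_range)
        # Keep the more pessimistic (smaller) of the two objectives.
        surrogate += min(ratio * adv, clipped_ratio * adv)
        if abs(ratio - 1) > clip_range:
            clipped_count += 1
        kl_sum += (ratio - 1) - log_ratio  # always >= 0 per sample
    policy_loss = -surrogate / n  # negated because optimizers minimize
    return policy_loss, clipped_count / n, kl_sum / n


# Made-up log-probabilities for four sampled actions.
old = [math.log(0.25)] * 4
new = [math.log(p) for p in (0.25, 0.26, 0.40, 0.15)]
adv = [1.0, 1.0, 1.0, -1.0]
loss, clip_fraction, approx_kl = ppo_clip_stats(new, old, adv)
print(f"policy_loss={loss:.4f} clip_fraction={clip_fraction} approx_kl={approx_kl:.4f}")
```

A `clip_fraction` near 0.06 and an `approx_kl` near 0.009, as in the log above, indicate that each update moved the policy only slightly. That cautious, small-step behavior is what the clipping is designed to enforce.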

## Videos

Perhaps the most compelling way to view an Atari agent's results is a video of the agent playing the game. We now have a trained model and have watched its training progress in the log output. The following code loads the saved model and records a video of it playing. The cell after it displays that video in the notebook.

In [3]:
import os

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack, VecVideoRecorder

# Set the game name here
GAME_NAME = 'Breakout'  # Can be 'Atlantis' as well

# Load your previously trained model
model_path = f"{GAME_NAME}_ppo_model.zip"
model = PPO.load(model_path)

# Create the Atari environment and apply the same wrappers used in training
env_id = f"{GAME_NAME}NoFrameskip-v4"
env = make_atari_env(env_id, n_envs=1, seed=0)
env = VecFrameStack(env, n_stack=4)

# Create the output folder for the video if needed
video_folder = '/content/videos'
os.makedirs(video_folder, exist_ok=True)

# Start recording at step 0 and capture up to 500 steps
env = VecVideoRecorder(env, video_folder,
                       record_video_trigger=lambda step: step == 0,
                       video_length=500,
                       name_prefix=f"{GAME_NAME}-agent")

# Reset the environment and observe the initial observation shape
obs = env.reset()
print("Initial observation shape:", obs.shape)  # (1, 84, 84, 4): envs, height, width, frames

# Run until the single environment reports done. The Atari preprocessing
# ends an episode when a life is lost.
done = False
while not done:
    action, _states = model.predict(obs, deterministic=True)
    obs, rewards, dones, info = env.step(action)
    done = dones[0]  # VecEnv returns one done flag per environment
    # No env.render() call is needed; VecVideoRecorder captures a frame on each step

# Close the environment which should also save the video
env.close()




Initial observation shape: (1, 84, 84, 4)
Moviepy - Building video /content/videos/Breakout-agent-step-0-to-step-500.mp4.
Moviepy - Writing video /content/videos/Breakout-agent-step-0-to-step-500.mp4



                                                               

Moviepy - Done !
Moviepy - video ready /content/videos/Breakout-agent-step-0-to-step-500.mp4
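The printed shape `(1, 84, 84, 4)` reads as (number of environments, height, width, stacked frames). `make_atari_env` reduces each frame to an 84 x 84 grayscale image. `VecFrameStack(env, n_stack=4)` then places the four most recent frames along the last (channel) axis. A single frame cannot show which way the ball is moving, but four consecutive frames can. The sketch below reproduces only the stacking idea, using a `deque` and tiny 2 x 2 "frames". The class and helper names are hypothetical. SB3's real wrapper also handles batching across environments and per-environment resets.

```python
from collections import deque


class FrameStacker:
    """Hypothetical minimal frame stacker (channels-last, like the obs above)."""

    def __init__(self, n_stack, height, width):
        self.n_stack, self.height, self.width = n_stack, height, width
        self.frames = deque(maxlen=n_stack)  # oldest frame drops out automatically

    def _blank(self):
        return [[0] * self.width for _ in range(self.height)]

    def reset(self, first_frame):
        # Zero-fill the history so the observation shape is fixed from step 0.
        self.frames.clear()
        for _ in range(self.n_stack - 1):
            self.frames.append(self._blank())
        self.frames.append(first_frame)
        return self.observation()

    def step(self, frame):
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        # obs[y][x] is a list of n_stack pixel values, oldest to newest.
        return [[[f[y][x] for f in self.frames] for x in range(self.width)]
                for y in range(self.height)]


def shape(obs):
    return (len(obs), len(obs[0]), len(obs[0][0]))


stacker = FrameStacker(n_stack=4, height=2, width=2)
obs = stacker.reset([[1, 0], [0, 0]])  # "ball" in the top-left pixel
obs = stacker.step([[0, 1], [0, 0]])   # ball moved right
print(shape(obs))                      # (2, 2, 4)
print(obs[0][0], obs[0][1])            # [0, 0, 1, 0] [0, 0, 0, 1]
```

Reading the last axis shows the motion. The top-left pixel was lit one frame ago, and the top-right pixel is lit now.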




In [4]:
import os
from base64 import b64encode

from IPython.display import HTML

# Folder where VecVideoRecorder saved the videos; make sure this matches the cell above
video_path = '/content/videos'
video_files = [f for f in os.listdir(video_path) if f.endswith('.mp4')]

if video_files:
    # os.listdir order is arbitrary, so pick the most recently written video
    video_filename = max(video_files, key=lambda name: os.path.getmtime(os.path.join(video_path, name)))
    full_video_filename = os.path.join(video_path, video_filename)
    with open(full_video_filename, 'rb') as fh:
        mp4 = fh.read()
    # Embed the video in the page as a base64 data URI
    encoded = b64encode(mp4).decode('ascii')
    html = HTML(data=f'<video width="640" height="480" controls><source src="data:video/mp4;base64,{encoded}" type="video/mp4"></video>')
else:
    html = HTML(data="Error: No video found")

html
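The cell above embeds the MP4 directly in the notebook as a base64 `data:` URI, so the output plays without a separate file. The trade-off is size. Base64 encodes every 3 input bytes as 4 ASCII characters, so the embedded video is about a third larger than the file on disk. The sketch below uses only the standard library, and its function names are hypothetical. It builds the same kind of URI from stand-in bytes and checks the `4 * ceil(n / 3)` length formula.

```python
import base64
import math


def data_uri(payload: bytes, mime: str = "video/mp4") -> str:
    """Build a data: URI like the one in the <video> tag above."""
    return f"data:{mime};base64," + base64.b64encode(payload).decode("ascii")


def base64_length(n_bytes: int) -> int:
    """Length of standard (padded) base64 output for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)


payload = bytes(range(256)) * 4          # 1,024 bytes of stand-in "video" data
encoded_part = data_uri(payload).split(",", 1)[1]
print(len(payload), len(encoded_part))   # 1024 1368
print(base64_length(len(payload)) == len(encoded_part))  # True
```

For short clips like the 500-step recording, the overhead is usually acceptable. For long recordings, it is often better to link to the file than to embed it.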


