# DQN for the Pong environment

> Enable the GPU runtime before running this notebook (in Colab: Runtime > Change runtime type > GPU).

## [Check whether PyTorch and TensorFlow can use the GPU](https://stackoverflow.com/a/60338745)

In [1]:
# Check that a GPU is attached to this runtime
!nvidia-smi

Fri Jul 23 14:46:52 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   59C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
# Check whether PyTorch can use the GPU
import torch
torch.cuda.is_available()
# Returns True if PyTorch can use a GPU, otherwise False.

True

In [3]:
# Check whether TensorFlow can use the GPU
import tensorflow as tf
tf.test.gpu_device_name()
# Returns '/device:GPU:0' when a GPU is available, or an empty string otherwise.

'/device:GPU:0'
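
Once the check above returns `True`, a common next step is to pick a device object once and create tensors on it. The snippet below is a minimal sketch of that pattern, not the code from `train.py`; the `batch` tensor is a hypothetical stand-in for real observations.

```python
import torch

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for a batch of observations; any tensor works here.
batch = torch.zeros(4, 3, device=device)
print(device.type, batch.device.type)
```

Falling back to the CPU keeps the same code runnable when no GPU is attached, at the cost of much slower training.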

## Mount Google Drive and change to working directory

In [4]:
from google.colab import drive
import os

drive.mount('/gdrive')
os.chdir('/gdrive/MyDrive/earth-7/')

Mounted at /gdrive


In [5]:
# Check that the current directory contains the ROMS and models folders
!ls

CartPole-v0	    DQN_Pong.ipynb  plots    results
DQN_CartPole.ipynb  models	    Pong-v0  ROMS


In [6]:
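# Install a virtual display and OpenGL dependencies (output suppressed)
!apt-get install -y xvfb python-opengl x11-utils > /dev/null 2>&1
!pip install gym pyvirtualdisplay scikit-video > /dev/null 2>&1
# Import the Atari ROMs from the ROMS folder into atari_py
!python -m atari_py.import_roms ROMS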

copying pong.bin from ROMS/Video Olympics - Pong Sports (Paddle) (1977) (Atari, Joe Decuir - Sears) (CX2621 - 99806, 6-99806, 49-75104) ~.bin to /usr/local/lib/python3.7/dist-packages/atari_py/atari_roms/pong.bin


In [7]:
import gym

import matplotlib.pyplot as plt
%matplotlib inline

In [8]:
# Create the environment
env = gym.make('Pong-v0')
env.seed(1)  # fix the environment seed for reproducibility
print('State shape: ', env.observation_space.shape)
print('Number of actions: ', env.action_space.n)
env.close()

State shape:  (210, 160, 3)
Number of actions:  6
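
The training runs that follow are labelled by `target_update_frequency`. In DQN, this kind of setting usually controls how often the weights of the online Q-network are copied into a separate target network. The sketch below illustrates that idea under stated assumptions: the tiny `nn.Linear` network, the `maybe_sync_target` helper, and counting the frequency in learning steps are all hypothetical, since `train.py` is not shown here.

```python
import copy

import torch
import torch.nn as nn

# Hypothetical tiny Q-network; the real architecture in train.py is not shown.
online_net = nn.Linear(8, 6)  # 6 outputs, one per Pong action
target_net = copy.deepcopy(online_net)
target_net.eval()

TARGET_UPDATE_FREQUENCY = 10  # assumed to be counted in learning steps


def maybe_sync_target(step: int) -> bool:
    """Copy online weights into the target network every N steps."""
    if step % TARGET_UPDATE_FREQUENCY == 0:
        target_net.load_state_dict(online_net.state_dict())
        return True
    return False


# Simulate a few learning steps that change the online network.
synced_at = []
for step in range(1, 26):
    with torch.no_grad():
        online_net.weight.add_(0.01)  # stand-in for a gradient update
    if maybe_sync_target(step):
        synced_at.append(step)

print(synced_at)
```

A small frequency keeps the target close to the online network, while a large one holds the bootstrap targets fixed for longer.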


$$
target\_update\_frequency = 10
$$

In [None]:
!python '/gdrive/MyDrive/earth-7/Pong-v0/train.py' --env Pong-v0 --evaluate_freq 25 --evaluation_episodes 5

  return torch.tensor(obs, device=device).float()
Best reward -20.6 so far in episode 0/2000
Saving model for epsilon: 0.9930597908090933
Best reward -20.6 so far in episode 25/2000
Saving model for epsilon: 0.9586357910545258
Best reward -20.6 so far in episode 50/2000
Saving model for epsilon: 0.925476727383189
Best reward -20.4 so far in episode 75/2000
Saving model for epsilon: 0.8925751518881533
Best reward -20.4 so far in episode 100/2000
Saving model for epsilon: 0.8605727474755062
Best reward -20.4 so far in episode 225/2000
Saving model for epsilon: 0.7198246325573551
Best reward -20.4 so far in episode 250/2000
Saving model for epsilon: 0.6918658537591907
Best reward -19.2 so far in episode 275/2000
Saving model for epsilon: 0.6623935232540206
Best reward -18.2 so far in episode 300/2000
Saving model for epsilon: 0.6333037393835275
Best reward -17.8 so far in episode 425/2000
Saving model for epsilon: 0.492009557710373
Best reward -17.4 so far in episode 475/2000
Saving model
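
The epsilon values in the log above shrink as training progresses, which is consistent with epsilon-greedy exploration. The sketch below shows only the action-selection rule; the `select_action` helper and the Q-values are made up for illustration, and the decay schedule used by `train.py` is not reproduced because it is not shown.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_values = [0.1, -0.3, 0.7, 0.0, 0.2, -0.1]  # made-up values for 6 actions
rng = random.Random(0)

# With epsilon = 0 the agent always exploits the highest Q-value.
greedy = select_action(q_values, epsilon=0.0, rng=rng)

# With epsilon = 1 every action is drawn uniformly at random.
explore = [select_action(q_values, epsilon=1.0, rng=rng) for _ in range(100)]
```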

$$
target\_update\_frequency = 500
$$

In [None]:
!python '/gdrive/MyDrive/earth-7/Pong-v0/train.py' --env Pong-v0 --evaluate_freq 25 --evaluation_episodes 5

  return torch.tensor(obs, device=device).float()
Best reward -20.6 so far in episode 0/2000
Saving model for epsilon: 0.9930951815985928
Best reward -20.2 so far in episode 25/2000
Saving model for epsilon: 0.9571731747276139
Best reward -20.2 so far in episode 75/2000
Saving model for epsilon: 0.8906602789630467
Best reward -20.2 so far in episode 125/2000
Saving model for epsilon: 0.8295061831055492
Best reward -19.2 so far in episode 150/2000
Saving model for epsilon: 0.7989878670743874
Best reward -18.6 so far in episode 225/2000
Saving model for epsilon: 0.7035310024826478
Best reward -18.4 so far in episode 250/2000
Saving model for epsilon: 0.6696734521609922
Best reward -18.4 so far in episode 275/2000
Saving model for epsilon: 0.6362320210774698
Best reward -17.6 so far in episode 300/2000
Saving model for epsilon: 0.6048111398414758
Best reward -17.0 so far in episode 350/2000
Saving model for epsilon: 0.5444450629921226
Best reward -15.8 so far in episode 425/2000
Saving mo

$$
target\_update\_frequency = 1000
$$

In [9]:
!python '/gdrive/MyDrive/earth-7/Pong-v0/train.py' --env Pong-v0 --evaluate_freq 25 --evaluation_episodes 5

  return torch.tensor(obs, device=device).float()
Best reward -19.8 so far in episode 0/2000
Saving model for epsilon: 0.9923267016813924
Best reward -19.8 so far in episode 25/2000
Saving model for epsilon: 0.9561384800113071
Best reward -19.6 so far in episode 75/2000
Saving model for epsilon: 0.8889086286941142
Best reward -19.6 so far in episode 225/2000
Saving model for epsilon: 0.696706473896143
Best reward -19.2 so far in episode 250/2000
Saving model for epsilon: 0.6629398052200947
Best reward -18.6 so far in episode 275/2000
Saving model for epsilon: 0.6305200663537617
Best reward -17.8 so far in episode 300/2000
Saving model for epsilon: 0.5979847093940603
Best reward -17.4 so far in episode 350/2000
Saving model for epsilon: 0.534898517091558
Best reward -16.6 so far in episode 425/2000
Saving model for epsilon: 0.4512351455163386
Best reward -15.4 so far in episode 450/2000
Saving model for epsilon: 0.4240351787861123
Best reward -14.6 so far in episode 625/2000
Saving mode
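
To compare runs, the `Best reward ... so far in episode ...` lines can be pulled out of the captured output. The helper below is a minimal sketch that assumes the log format shown above; `parse_best_rewards` is a hypothetical name, and the sample text is copied from the last run.

```python
import re

# Matches lines such as "Best reward -19.8 so far in episode 0/2000".
LOG_LINE = re.compile(
    r"Best reward (-?\d+(?:\.\d+)?) so far in episode (\d+)/(\d+)"
)


def parse_best_rewards(log_text):
    """Return (episode, best_reward) pairs found in training output."""
    return [
        (int(m.group(2)), float(m.group(1)))
        for m in LOG_LINE.finditer(log_text)
    ]


sample = """Best reward -19.8 so far in episode 0/2000
Saving model for epsilon: 0.9923267016813924
Best reward -19.6 so far in episode 75/2000
"""
pairs = parse_best_rewards(sample)
print(pairs)
```

The resulting pairs can be passed to `plt.plot` to draw one curve per `target_update_frequency` setting.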