<a href="https://colab.research.google.com/github/laurelkeys/machine-learning/blob/master/assignment-4/Trajectories.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
# from google.colab import drive
# drive.mount('/content/drive', force_remount=True)
# PATH_TO_DATA = os.path.join("drive", "My Drive", "unicamp", "MC886", "atari")

import os
PATH_TO_DATA = ""

In [2]:
SAVE_DIR = os.path.join(PATH_TO_DATA, "data")
os.makedirs(SAVE_DIR, exist_ok=True)

SAVE_DIR

'data'

## Generate a dataset of trajectories from pre-trained RL agents on [Atari](https://gym.openai.com/envs/#atari) [environments](https://github.com/openai/gym/wiki/Table-of-environments).
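That is, by the end of this notebook we will have $observation \rightarrow action$ mappings, where each $observation$ is an image of shape `(84, 84, 4)` and each $action$ is an integer. The full Atari action set has 18 actions, indexed in the range $[0, 18)$ as listed below. Note that a game may only expose a subset of them (the agents loaded later in this notebook report `Discrete(4)` for Breakout and `Discrete(6)` for Pong), in which case the saved $action$ values index that game's own action set rather than this full table: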

| 0 | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| NOOP | FIRE | UP | RIGHT | LEFT | DOWN |


| 6 | 7 | 8 | 9 |
| --- | --- | --- | --- |
| UPRIGHT | UPLEFT | DOWNRIGHT | DOWNLEFT |


| 10 | 11 | 12 | 13 |
| --- | --- | --- | --- |
| UPFIRE | RIGHTFIRE | LEFTFIRE | DOWNFIRE |


| 14 | 15 | 16 | 17 |
| --- | --- | --- | --- |
| UPRIGHTFIRE | UPLEFTFIRE | DOWNRIGHTFIRE | DOWNLEFTFIRE |
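
When an agent acts in a smaller, per-game action space, its integer outputs index that game's own list of actions, not the table above. The sketch below shows one possible way to translate such an index back to the full table. The names `to_full_action_index` and `EXAMPLE_REDUCED_NAMES` are hypothetical, and the example list of four actions is an assumption for illustration only; the real list should be read from the environment (e.g. with `env.unwrapped.get_action_meanings()`).

```python
# Names of the 18 actions in the full Atari action set, in index order
# (same order as the tables above).
FULL_ACTION_NAMES = [
    "NOOP", "FIRE", "UP", "RIGHT", "LEFT", "DOWN",
    "UPRIGHT", "UPLEFT", "DOWNRIGHT", "DOWNLEFT",
    "UPFIRE", "RIGHTFIRE", "LEFTFIRE", "DOWNFIRE",
    "UPRIGHTFIRE", "UPLEFTFIRE", "DOWNRIGHTFIRE", "DOWNLEFTFIRE",
]


def to_full_action_index(reduced_action, reduced_action_names):
    """Map an index into a game's reduced action set to its index in the full set."""
    return FULL_ACTION_NAMES.index(reduced_action_names[reduced_action])


# ASSUMPTION: an illustrative reduced action set for a 4-action game;
# query the actual one from the environment instead of hard-coding it.
EXAMPLE_REDUCED_NAMES = ["NOOP", "FIRE", "RIGHT", "LEFT"]

full_indices = [to_full_action_index(a, EXAMPLE_REDUCED_NAMES) for a in range(4)]
print(full_indices)
```

Storing the full-table index (or the action name) alongside each saved action would make trajectories from different games directly comparable.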

In [0]:
# number of trajectories to generate
N_OF_TRAJECTORIES = 20

# number of steps per trajectory
N_OF_STEPS = 1000

# list of string tuples in the format (RL Algorithm, Game Environment)
GAMES = [
    ("PPO2", "BreakoutNoFrameskip-v4"),
    ("PPO2", "PongNoFrameskip-v4"),
]

In [4]:
[env_id for algo, env_id in GAMES]

['BreakoutNoFrameskip-v4', 'PongNoFrameskip-v4']

[RL Baselines Zoo](https://github.com/araffin/rl-baselines-zoo) currently has the following environments with `PPO2` pre-trained agents:  
`BeamRiderNoFrameskip-v4`, `BreakoutNoFrameskip-v4`, `EnduroNoFrameskip-v4`, `MsPacmanNoFrameskip-v4`, `PongNoFrameskip-v4`, `QbertNoFrameskip-v4`, `SeaquestNoFrameskip-v4`, `SpaceInvadersNoFrameskip-v4`

## Install dependencies

Note that we're not installing MPI, so the following algorithms will probably not work: `DDPG`, `GAIL`, `PPO1`, `TRPO`.
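
As a minimal sketch, the check below filters a list of requested algorithms depending on whether MPI support is present. It assumes that the MPI-dependent algorithms rely on the `mpi4py` package being importable; `mpi_available` and `usable_algos` are hypothetical helper names, not part of Stable Baselines.

```python
import importlib.util

# Algorithms that the note above lists as depending on MPI.
MPI_ALGOS = {"DDPG", "GAIL", "PPO1", "TRPO"}


def mpi_available():
    """Return True if the mpi4py package can be found (without importing it)."""
    # ASSUMPTION: MPI support is detected through the presence of mpi4py.
    return importlib.util.find_spec("mpi4py") is not None


def usable_algos(requested):
    """Drop MPI-dependent algorithms from `requested` when mpi4py is missing."""
    if mpi_available():
        return list(requested)
    return [algo for algo in requested if algo not in MPI_ALGOS]


print(usable_algos(["PPO2", "TRPO", "ACER"]))
```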

In [0]:
!apt-get update                                                  > /dev/null 2>&1
!apt-get install swig cmake zlib1g-dev ffmpeg freeglut3-dev xvfb > /dev/null 2>&1
!pip install pyyaml pytablewriter optuna scikit-optimize         > /dev/null 2>&1

In [6]:
#### Stable Baselines only supports TF 1.x for now ####
try:
    # Colab only
    # %tensorflow_version 2.x
    %tensorflow_version 1.x
except Exception:
    pass

import tensorflow as tf
from tensorflow import keras
print(tf.__version__)

1.15.0


In [0]:
import os
from time import time

import cv2
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import gym
from gym.envs.atari.atari_env import ACTION_MEANING

### Install [Stable Baselines](https://github.com/hill-a/stable-baselines) and clone [RL Baselines Zoo](https://github.com/araffin/rl-baselines-zoo)

In [8]:
!pip list | grep baselines

stable-baselines         2.2.1      


In [0]:
!yes | pip uninstall stable-baselines                           > /dev/null 2>&1
!pip install git+https://github.com/hill-a/stable-baselines.git > /dev/null 2>&1

In [10]:
!pip list | grep baselines

stable-baselines         2.9.0a0    


In [11]:
!git clone https://github.com/araffin/rl-baselines-zoo.git

Cloning into 'rl-baselines-zoo'...

In [12]:
from stable_baselines.common.cmd_util import make_atari_env

from stable_baselines.common.vec_env import VecFrameStack, DummyVecEnv

# NOTE add more algorithms here
from stable_baselines import PPO2, ACER, ACKTR
ALGO_IMPL = {
    'PPO2': PPO2,
    'ACER': ACER,
    'ACKTR': ACKTR,
}

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



## Load the pre-trained agents

In [13]:
!ls rl-baselines-zoo/trained_agents/

a2c  acer  acktr  ddpg	dqn  her  ppo2	sac  td3  trpo


In [0]:
PATH_TO_AGENTS = os.path.join("rl-baselines-zoo", "trained_agents")

In [15]:
# check the available pre-trained models
algorithms = ["PPO2"]
for algo in algorithms:
    algo_path = os.path.join(PATH_TO_AGENTS, algo.lower())
    print(algo_path + '/')
    for f in sorted(os.listdir(algo_path), key=lambda x: x[::-1]):
        # sort by the reverse filename, so env types get grouped together
        if f.endswith(".pkl"):
            print("|___", f)
            # try:
            #     model = ALGO_IMPL[algo].load(os.path.join(algo_path, f), verbose=0)
            #     print("     observation_space:", model.observation_space)
            #     print("     action_space:", model.action_space)
            # except:
            #     print("     ERROR: couldn't load model")

rl-baselines-zoo/trained_agents/ppo2/
|___ Pendulum-v0.pkl
|___ MountainCar-v0.pkl
|___ MountainCarContinuous-v0.pkl
|___ MinitaurBulletDuckEnv-v0.pkl
|___ Walker2DBulletEnv-v0.pkl
|___ HumanoidBulletEnv-v0.pkl
|___ HalfCheetahBulletEnv-v0.pkl
|___ InvertedDoublePendulumBulletEnv-v0.pkl
|___ InvertedPendulumSwingupBulletEnv-v0.pkl
|___ ReacherBulletEnv-v0.pkl
|___ HopperBulletEnv-v0.pkl
|___ MinitaurBulletEnv-v0.pkl
|___ AntBulletEnv-v0.pkl
|___ CartPole-v1.pkl
|___ Acrobot-v1.pkl
|___ BipedalWalkerHardcore-v2.pkl
|___ LunarLander-v2.pkl
|___ BipedalWalker-v2.pkl
|___ LunarLanderContinuous-v2.pkl
|___ PongNoFrameskip-v4.pkl
|___ MsPacmanNoFrameskip-v4.pkl
|___ EnduroNoFrameskip-v4.pkl
|___ BeamRiderNoFrameskip-v4.pkl
|___ SpaceInvadersNoFrameskip-v4.pkl
|___ QbertNoFrameskip-v4.pkl
|___ SeaquestNoFrameskip-v4.pkl
|___ BreakoutNoFrameskip-v4.pkl


In [16]:
for algo, env_id in GAMES:
    print(f"('{algo}', '{env_id}')")
    agent_path = os.path.join(PATH_TO_AGENTS, algo.lower(), env_id + '.pkl')
    model = ALGO_IMPL[algo].load(agent_path, verbose=0)
    print("observation_space:", model.observation_space)
    print("action_space:", model.action_space)
    print()

('PPO2', 'BreakoutNoFrameskip-v4')

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where

observation_space: Box(84, 84, 4)
action_space: Discrete(4)

('PPO2', 'PongNoFrameskip-v4')
observation_space: Box(84, 84, 4)
action_space: Discrete(6)



## Generate trajectories

Note that we use `make_atari_env` + `VecFrameStack` for `NoFrameskip-v4` environments, so each frame is converted to grayscale and downscaled from 210x160 to 84x84. Therefore, the $observation$ shape is `(84, 84, 4)` (four stacked frames), and **not** `(210, 160, 3)`, nor `(84, 84, 1)`.
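
To make the `(84, 84, 4)` shape concrete, here is a minimal NumPy sketch of frame stacking. It is not the `VecFrameStack` implementation: `FrameStacker` is a hypothetical class, the frames are assumed to be already grayscale and resized to 84x84, and the newest frame is assumed to go in the last channel.

```python
from collections import deque

import numpy as np

N_STACK = 4
FRAME_SHAPE = (84, 84)  # a single grayscale, downscaled frame


class FrameStacker:
    """Keep the last `n_stack` frames and expose them as one stacked observation."""

    def __init__(self, n_stack=N_STACK, frame_shape=FRAME_SHAPE):
        # start from all-zero frames, so the stack is always full
        blank = np.zeros(frame_shape, dtype=np.uint8)
        self.frames = deque([blank] * n_stack, maxlen=n_stack)

    def push(self, frame):
        """Add the newest frame (dropping the oldest) and return the stacked observation."""
        self.frames.append(frame)
        # stack along a new last axis: (84, 84) x 4 -> (84, 84, 4)
        return np.stack(list(self.frames), axis=-1)


stacker = FrameStacker()
stacked_obs = stacker.push(np.ones(FRAME_SHAPE, dtype=np.uint8))
print(stacked_obs.shape)
```

Stacking consecutive frames lets a feed-forward policy see motion (e.g. the ball's direction), which a single frame cannot convey.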

In [0]:
PRINT_EARLY_DONE = False
PRINT_ACTIONS_TAKEN = False

PRINT_EVERY_N_STEPS = N_OF_STEPS + 1
PRINT_EVERY_N_TRAJECTORIES = N_OF_TRAJECTORIES // 10

# uncomment the lines below to disable progress printing
# PRINT_EVERY_N_STEPS = N_OF_STEPS + 1
# PRINT_EVERY_N_TRAJECTORIES = N_OF_TRAJECTORIES + 1

In [31]:
from tqdm import tqdm

time_start = time()
print("PRINT_EVERY_N_TRAJECTORIES:", PRINT_EVERY_N_TRAJECTORIES)
print("N_OF_TRAJECTORIES:", N_OF_TRAJECTORIES)
print("N_OF_STEPS:", N_OF_STEPS)
print("================")
for algo, env_id in GAMES:
    time_start_env = time()

    env = make_atari_env(env_id, num_env=1, seed=0)
    env = VecFrameStack(env, n_stack=4)
    agent_path = os.path.join(PATH_TO_AGENTS, algo.lower(), env_id + '.pkl')
    
    print(f"('{algo}', '{env_id}')")
    print(f"Getting pre-trained agent from: '{agent_path}'\n")
    
    model = ALGO_IMPL[algo].load(agent_path, env)
    
    for trajectory in tqdm(range(N_OF_TRAJECTORIES), position=0, leave=True):
        # store the "obs -> action" mapping
        observed_states, actions_taken = [], []

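        obs = env.reset() # shape (1, 84, 84, 4): one stacked observation per vectorized env
        for step in range(N_OF_STEPS):
            # predict() returns a (actions, states) tuple, so we unpack it
            # (states is only used by recurrent policies)
            action, _states = model.predict(obs)
            observed_states.append(obs)
            actions_taken.append(action)

            obs, reward, done, info = env.step(action)
            if done:
                # NOTE vectorized envs already reset themselves when an episode ends,
                # so this explicit reset is likely redundant
                obs = env.reset()
                if PRINT_EARLY_DONE:
                    print(f"done at step {step + 1} (resetting env)")
            
            if (step + 1) % PRINT_EVERY_N_STEPS == 0:
                print(f"step {step+1}")
        
        # NOTE obs, action, reward and done are arrays since we're using a vectorized env,
        # so we keep only the entry of our single env (index 0)
        observed_states = [obs[0] for obs in observed_states]
        actions_taken = [action[0] for action in actions_taken]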
        
        np.savez_compressed(file=os.path.join(SAVE_DIR, f"{env_id}_{algo}_t{trajectory+1}_{N_OF_STEPS}s"), 
                            observations=observed_states, actions=actions_taken)
        
        if PRINT_EVERY_N_TRAJECTORIES and (trajectory + 1) % PRINT_EVERY_N_TRAJECTORIES == 0:
            print(f"Saved trajectory {trajectory+1} (of {N_OF_TRAJECTORIES})")

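        if PRINT_ACTIONS_TAKEN and trajectory == N_OF_TRAJECTORIES - 1:
            # NOTE the agent's actions index the game's own (possibly reduced) action set,
            # so we look their names up in the env instead of in the full ACTION_MEANING table
            action_names = env.env_method("get_action_meanings")[0]
            print("\nActions taken:", ", ".join(action_names[a] for a in sorted(set(actions_taken))))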

    del observed_states
    del actions_taken
    env.close()
    print(f"Δt = {(time() - time_start_env):.2f}s")
    print("================")

print(f"Total Δt = {(time() - time_start):.2f}s")

PRINT_EVERY_N_TRAJECTORIES: 2
N_OF_TRAJECTORIES: 20
N_OF_STEPS: 1000
('PPO2', 'BreakoutNoFrameskip-v4')
Getting pre-trained agent from: 'rl-baselines-zoo/trained_agents/ppo2/BreakoutNoFrameskip-v4.pkl'



 25%|██▌       | 5/20 [00:22<01:06,  4.41s/it]

KeyboardInterrupt: ignored

In [0]:
trajectory_filenames = []
for r, ds, fs in os.walk(SAVE_DIR): # r=root, d=directories, f=files
    print(r + '/')
    for f in fs:
        print("|___", f)
        trajectory_filenames.append(f)
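
Since the file names encode the environment, algorithm, trajectory number and step count, they can be parsed back into metadata. The sketch below assumes the naming scheme used when saving, `{env_id}_{algo}_t{trajectory}_{steps}s`, plus the `.npz` extension that `np.savez_compressed` appends; `parse_trajectory_filename` is a hypothetical helper.

```python
import re

# Pattern mirroring the save name: "{env_id}_{algo}_t{trajectory}_{steps}s.npz"
# ASSUMPTION: the algorithm name contains no underscore.
TRAJECTORY_FILENAME = re.compile(
    r"^(?P<env_id>.+)_(?P<algo>[^_]+)_t(?P<trajectory>\d+)_(?P<steps>\d+)s\.npz$"
)


def parse_trajectory_filename(filename):
    """Return a dict with the metadata encoded in `filename`, or None if it doesn't match."""
    match = TRAJECTORY_FILENAME.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    info["trajectory"] = int(info["trajectory"])
    info["steps"] = int(info["steps"])
    return info


example = parse_trajectory_filename("PongNoFrameskip-v4_PPO2_t7_1000s.npz")
print(example)
```

This makes it possible, for instance, to select only the trajectories of one game before loading them.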

In [0]:
test_trajectory_filename = trajectory_filenames[0]
print(f"Loading from '{test_trajectory_filename}'\n")

test_trajectory_load = np.load(os.path.join(SAVE_DIR, test_trajectory_filename), 
                               allow_pickle=True)

print("observations shape:", test_trajectory_load['observations'].shape)
print("actions shape:", test_trajectory_load['actions'].shape)
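
To use the saved trajectories as a supervised $observation \rightarrow action$ dataset, the per-trajectory files can be concatenated into two arrays. This is only a sketch: `load_dataset` is a hypothetical helper, and the demo writes tiny fake trajectories to a temporary directory instead of reading the real `SAVE_DIR`.

```python
import os
import tempfile

import numpy as np


def load_dataset(data_dir):
    """Concatenate every .npz trajectory in `data_dir` into (observations, actions) arrays."""
    observations, actions = [], []
    for filename in sorted(os.listdir(data_dir)):
        if not filename.endswith(".npz"):
            continue
        with np.load(os.path.join(data_dir, filename)) as trajectory:
            observations.append(trajectory["observations"])
            actions.append(trajectory["actions"])
    return np.concatenate(observations), np.concatenate(actions)


# demo with two fake 3-step trajectories (all-zero observations)
with tempfile.TemporaryDirectory() as demo_dir:
    for t in range(2):
        np.savez_compressed(os.path.join(demo_dir, f"demo_t{t+1}_3s"),
                            observations=np.zeros((3, 84, 84, 4), dtype=np.uint8),
                            actions=np.array([0, 1, 2]))
    X, y = load_dataset(demo_dir)

print(X.shape, y.shape)
```

Note that loading many long trajectories at once can use a lot of memory, so loading them lazily (file by file) may be preferable for larger datasets.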

In [0]:
# https://github.com/araffin/rl-baselines-zoo/blob/master/utils/record_video.py
# https://github.com/araffin/rl-baselines-zoo/blob/master/enjoy.py
# https://github.com/hill-a/stable-baselines#try-it-online-with-colab-notebooks-

## Old

In [0]:
# def save_as_image(observation, save_dir, img_name, prefix="img_", downscale=False):
#     # downscaling the image
#     if downscale:
#         im_array = cv2.resize(observation, INP_IMAGE_SHAPE) # TODO test tf.image.resize
#         im_array = np.array(im_array, dtype='float32')
#         im_array = (im_array/127.5) - 1
#         im = PIL.Image.fromarray(im_array, 'RGB')
#     else:
#         try:
#             im = PIL.Image.fromarray(observation, 'RGB')
#         except:
#             print(type(observation))
#     imname = "{}{}.png".format(prefix, img_name)
#     im.save(os.path.join(save_dir, imname))

In [0]:
# # you can change the default values here
# save_dir = SAVE_DIR
# num_images = IMAGES_TO_GENERATE

In [0]:
# os.makedirs(save_dir, exist_ok=True)

In [0]:
# envs = [gym.make(env_id) for env_id in ENV_IDS]

In [0]:
# for env_id, env in zip(ENV_IDS, envs):
#     print(env_id)
#     env_dir = os.path.join(save_dir, f"{env_id}_{IMAGES_TO_GENERATE}")
#     os.makedirs(env_dir, exist_ok=True)
    
#     env.reset()
#     i, current_env_images = 0, 0
    
#     actions_taken = []
#     while i < num_images:
#         # take a random action (sampled from the action space)
#         action = env.action_space.sample()
#         actions_taken.append(action)
#         assert 0 <= action < 18, f"action = {action}"
#         obs, _, done, _ = env.step(action)
#         if np.mean(obs) > 0.01:
#             save_as_image(obs, env_dir, str(i))
#             i += 1
#         else:
#             print("should I have been reached?")
#             continue
#         if done:
#             print(f"resetting {env_id} at i={i}")
#             env.reset()
    
#     actions_taken = np.asarray(actions_taken, dtype='int8')
#     print(actions_taken.shape, actions_taken.size, actions_taken.dtype)
#     np.save(os.path.join(save_dir, f"{env_id}_{IMAGES_TO_GENERATE}_actions"), actions_taken)

In [0]:
# IMG_SIZE = 160 # All images will be resized to 160x160

# def load_image(image_path):
#     image = tf.io.read_file(image_path)
#     image = tf.image.decode_png(image, channels=3)
#     image = tf.cast(image, tf.float32)
#     image = (image/127.5) - 1
#     image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
#     return image, image_path

# IMG_SHAPE = (IMG_SIZE, IMG_SIZE, 3)

# # Create the base model from the pre-trained model MobileNet V2
# base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
#                                                include_top=False,
#                                                weights='imagenet')

# s = time()
# # Get unique images
# encode_train = img_name_vector

# # Feel free to change batch_size according to your system configuration
# image_dataset = tf.data.Dataset.from_tensor_slices(encode_train)
# image_dataset = image_dataset.map(
#   load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE).batch(16)

# print((time()-s)/1000)

# for img, path in image_dataset:
#   batch_features = image_features_extract_model(img)
#   batch_features = tf.reshape(batch_features,
#                               (batch_features.shape[0], -1, batch_features.shape[3]))

#   for bf, p in zip(batch_features, path):
#     path_of_feature = p.numpy().decode("utf-8")
#     np.save(path_of_feature, bf.numpy())