# Transformer model + RPG testing pipeline

This document lays out the procedure for:
- Training the forward model to learn the RPG environment and saving it to disk
- Using the trained forward model to generate expert memories and saving them
- Loading the expert memories into the transformer model
- Training the transformer model

## Table of Contents
1. [Train the forward model](#Train-the-forward-model)
2. [Locate the saved model](#Locate-the-saved-model)
3. [Generate memories and save to disk](#Generate-memories-and-save-to-disk)
4. [Evaluate the transformer model](#Evaluate-the-transformer-model)

In [None]:
# --------------- #
# region: Imports #
# --------------- #

# Import base packages
import os
import sys
import argparse

# region: Fix path nonsense
module_path = os.path.abspath("../../..")
if module_path not in sys.path:
    sys.path.insert(0, module_path)
# endregion

# --------------- #
# endregion       #
# --------------- #

## Train the forward model

To give the transformer model something to learn, we need to first train a model to solve the RPG task.

To change details of the model training regime, edit the configuration file (by default, stored in `../configs/config.yaml` relative to this Python notebook). Some settings need to be changed together for the model to run successfully.

Each epoch prints a simple summary to the console. More detailed data is stored in TensorBoard if `log` in the configuration file is set to `True`.
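
Because saving fails when the checkpoint directory is missing, it can help to create that directory before training starts. The following is a minimal sketch, not part of the repository: `make_checkpoint_path` is a hypothetical helper, and the directory layout and file prefix simply mirror the model path used in this notebook.

```python
import os
from datetime import datetime


def make_checkpoint_path(root, prefix="iRainbowModel", now=None):
    """Create the checkpoint directory if needed and return a timestamped file path.

    Hypothetical helper: the directory layout mirrors the path used in this notebook.
    """
    checkpoint_dir = os.path.join(root, "examples", "RPG", "models", "checkpoints")
    # exist_ok=True makes this a no-op when the directory is already there
    os.makedirs(checkpoint_dir, exist_ok=True)
    # %H%M%S is hour, minute, second (lowercase %m would be the month)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return os.path.join(checkpoint_dir, f"{prefix}_{stamp}.pkl")
```

The returned path could then be passed as `save_weights` when running the forward model.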

In [None]:
from datetime import datetime
from examples.RPG.main import run, load_config

# Specify configuration path
config_path = "../configs/config.yaml"
# Load configuration
cfg = load_config(argparse.Namespace(config=config_path))
# Specify the model path to save your model to (cfg.root specifies the root module directory in the configuration file)
# NOTE: %H%M%S is hour, minute, second (lowercase %m is the month and %s is not portable)
model_path = f'{cfg.root}/examples/RPG/models/checkpoints/iRainbowModel_{datetime.now().strftime("%Y%m%d-%H%M%S")}.pkl'

# Run the model. The run function accepts **kwargs that let you specify two model conditions:
#   - load_weights lets you specify a file path to start the model with pretrained model weights
#   - save_weights lets you specify a file path to save your model weights to.
#     NOTE: The model directory must already exist, or the save function will fail!

# Run the forward model with the specified configuration
run(cfg, save_weights=model_path)

## Locate the saved model

Once a trained model is saved to disk, using it again is just a matter of creating a new Python object with the same configuration and loading the saved weights.

```py
model_path = '/path/to/model.pkl'

model = iRainbowModel(
    ...  # pass in parameters
)

model.load(model_path) # Now the model is ready to use again.
```
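
If several checkpoints have accumulated, the most recent one can be located by modification time. This is a minimal sketch; `latest_checkpoint` is a hypothetical helper, not part of the repository, and it assumes checkpoints are `.pkl` files kept in a single directory.

```python
import glob
import os


def latest_checkpoint(checkpoint_dir, pattern="*.pkl"):
    """Return the most recently modified file matching pattern, or None if there is none."""
    candidates = glob.glob(os.path.join(checkpoint_dir, pattern))
    if not candidates:
        return None
    # os.path.getmtime returns the last-modification time in seconds since the epoch
    return max(candidates, key=os.path.getmtime)
```

Returning `None` for an empty directory leaves it to the caller to decide whether a missing checkpoint is an error.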

## Generate memories and save to disk

Memories of game trajectories, replays of games, and model scores on games are generated with the `eval_model()` function. This function plays a specified number of games (by default, 1) and returns a dictionary of output variables. The memories file is fairly large (about 1.5 GB for 1024 games), so make sure enough disk space is available before saving.
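
As a rough planning aid, the figure quoted above works out to about 1.5 MB per game. The sketch below extrapolates from that figure and compares it with the free space on the target drive. It assumes file size grows linearly with the number of games, which this notebook does not confirm, and `estimate_memories_gb` and `has_room_for` are hypothetical helpers.

```python
import shutil

# Reference point quoted in this notebook: ~1.5 GB for 1024 games
REFERENCE_GB = 1.5
REFERENCE_GAMES = 1024


def estimate_memories_gb(n_games):
    """Estimate the memories file size in GB, assuming linear scaling (an assumption)."""
    return REFERENCE_GB * n_games / REFERENCE_GAMES


def has_room_for(n_games, folder="."):
    """Return True if the drive holding folder has enough free space for the estimate."""
    free_gb = shutil.disk_usage(folder).free / 1e9
    return free_gb >= estimate_memories_gb(n_games)
```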

Flags that can be used:
- `'memories'` returns a stored memory buffer of size (n_games x max_turns)
- `'frames'` returns a list of size (n_games x max_turns) with images of each turn.
- `'scores'` returns a record of the model's reward on each turn.
- `'jupyter-mode'` should be added when you are using `'frames'` from a Python notebook rather than from the command line.

**NOTE**: Generating the frames takes a while, so it is generally faster to generate animated replays and memories separately.

In [None]:
from examples.RPG.test import eval_model
from gem.utils import animate

results = eval_model(
    "memories", model_path=model_path, config_path=config_path, n_games=1024
)

# Save the stored memories
memories = results["memories"]
memories_path = "../data/memories.pkl"
memories.save(file_path=memories_path)

# # Replay one game
# results = eval_model(
#     'frames', 'jupyter-mode',
#     model_path=model_path,
#     config_path=config_path,
#     n_games=1
# )

# data_folder = '../data/'
# game = results['frames'][0]
# animate(
#     game,
#     filename='model_test',
#     folder=data_folder
# )

## Evaluate the transformer model

The transformer model uses a separate configuration file (by default, `../configs/transformer.yaml`). This file is shorter and handles only the details of the transformer model itself, since all of the details of the environment were already established when training the forward model. Make sure that the input parameters are compatible with those used by the forward model.

**NOTE**: At this stage, some configurations are not compatible with the transformer model. For example, the transformer model requires the state height and width to be evenly divisible by the patch size. Since the agent vision results in an odd-numbered state H x W, the state size in the forward model must have an H x W of, e.g., 9 x 9, 15 x 15, or 21 x 21 in order to have a patch size of 3.
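
The divisibility constraint can be checked before training. The following is a minimal sketch; `is_patch_compatible` and `valid_odd_sizes` are hypothetical helpers, not functions from the repository, and they assume a square patch applied to both state dimensions.

```python
def is_patch_compatible(height, width, patch_size=3):
    """Return True if both state dimensions divide evenly by the patch size."""
    return height % patch_size == 0 and width % patch_size == 0


def valid_odd_sizes(max_size, patch_size=3):
    """List odd side lengths up to max_size that divide evenly by patch_size."""
    return [n for n in range(1, max_size + 1, 2) if n % patch_size == 0]


print(valid_odd_sizes(21))  # [3, 9, 15, 21]
print(is_patch_compatible(11, 11))  # False
```

Running a check like this against the forward model's state size catches an incompatible configuration before any training time is spent.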

In [None]:
from examples.RPG.test import train_transformer_model

# Specify the transformer configuration path and load it
transformer_config_path = "../configs/transformer.yaml"
cfg = load_config(argparse.Namespace(config=transformer_config_path))

# Train the transformer model
train_transformer_model(cfg, memories_path=memories_path)