In [21]:
import PretrainedCNNFeatureExtractor
import TrainableCustomCNN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack
from utils import hyperparam_search

## Setting up the Gymnasium environment

We'll set up a game of Breakout, run 4 copies of it in parallel as a vectorized environment, and wrap it in a frame-stacking wrapper so that each observation holds the last 4 frames, which gives the agent some temporal context.

In [22]:
environment = "BreakoutNoFrameskip-v4"
vec_env = make_atari_env(environment, n_envs=4)
fs_vec_env = VecFrameStack(vec_env, 4, channels_order='first')
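
To see what the frame stacking does to the observation shape, here is a small shape-only sketch. It assumes each Atari frame arrives as `(84, 84, 1)` (height, width, channel), which is the usual output of the Stable Baselines3 Atari wrapper; the helper names `stacked_shape` and `hwc_to_chw` are made up for illustration and are not part of any library.

```python
def stacked_shape(frame_shape, n_stack, channels_order):
    """Shape of one observation after stacking n_stack frames.

    'first' stacks along axis 0 and 'last' along the final axis,
    mirroring how channels_order is interpreted by the frame stacker.
    """
    shape = list(frame_shape)
    axis = 0 if channels_order == "first" else len(shape) - 1
    shape[axis] *= n_stack
    return tuple(shape)


def hwc_to_chw(shape):
    """Move the last axis to the front (what an image transpose wrapper does)."""
    return (shape[-1],) + tuple(shape[:-1])


frame = (84, 84, 1)  # assumed per-frame shape (H, W, C)

first = hwc_to_chw(stacked_shape(frame, 4, "first"))
last = hwc_to_chw(stacked_shape(frame, 4, "last"))
print(first)  # (1, 336, 84)
print(last)   # (4, 84, 84)
```

Under these assumptions, `channels_order='first'` on channel-last frames concatenates the four frames along the height axis, which matches the `obs space: (1, 336, 84)` line printed in the training log. With `channels_order='last'` the four frames would instead end up as four channels.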

## Training a CNN subclass of [BaseFeaturesExtractor](https://stable-baselines3.readthedocs.io/en/v0.11.1/guide/custom_policy.html)

The provided `TrainableCustomCNN` class lets us train a CNN feature extractor jointly with the fully connected network that determines actions and values. In the example below, we pass the `TrainableCustomCNN` class in the `policy_kwargs` dictionary, along with `1024` as the number of output features we would like the CNN to produce.

In [18]:
policy_kwargs = dict(features_extractor_class=TrainableCustomCNN.TrainableCustomCNN,
                     features_extractor_kwargs=dict(features_dim=1024))
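
To get a feel for what `features_dim=1024` has to compress, the sketch below works through the convolution output-size arithmetic for a `(1, 336, 84)` observation. The layer stack is an assumption: it borrows the kernel sizes and strides of the default Stable Baselines3 `NatureCNN` (8x8 stride 4, 4x4 stride 2, 3x3 stride 1), and the actual `TrainableCustomCNN` layers may differ.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output length of one spatial dimension after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed (kernel, stride, out_channels) per layer, NatureCNN-style.
# This is an illustration, not the layout of TrainableCustomCNN.
LAYERS = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]


def flattened_size(height, width, layers=LAYERS):
    """Number of values left after the conv stack is flattened."""
    channels = None
    for kernel, stride, out_channels in layers:
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
        channels = out_channels
    return channels * height * width


n_flatten = flattened_size(336, 84)
print(n_flatten)  # 17024
```

With these assumed layers, a final linear layer would map 17024 flattened values down to the 1024 features that the policy and value networks consume.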

The `hyperparam_search` function can help us train with different values of hyperparameters. In this case we'll use just one value each for the learning rate, the batch size, and the fully connected network architecture.
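
The implementation of `hyperparam_search` lives in `utils` and is not shown here. As a rough sketch of what such a search might do, the code below enumerates every combination of the value lists and builds a run name in the same format as the `Training PPO_lr...` lines in the logs. The function names `run_name` and `grid` are hypothetical, and the training call itself is left out.

```python
from itertools import product


def run_name(lr, net_arch, batch_size, timesteps):
    """Build a run label in the format seen in the training logs."""
    return f"PPO_lr{lr}_netarch{net_arch}_batchsize{batch_size}_timesteps{timesteps}"


def grid(lr_values, batch_size_values, net_arch_values, timesteps):
    """Yield one config dict per combination of hyperparameter values."""
    for lr, batch_size, net_arch in product(lr_values, batch_size_values, net_arch_values):
        yield dict(
            learning_rate=lr,
            batch_size=batch_size,
            net_arch=net_arch,
            name=run_name(lr, net_arch, batch_size, timesteps),
        )


configs = list(grid([2.5e-4], [128], [[128, 128]], 3_000_000))
print(len(configs))        # 1
print(configs[0]["name"])  # PPO_lr0.00025_netarch[128, 128]_batchsize128_timesteps3000000
```

Adding a second learning rate or batch size to the lists would double the number of runs, so the total training cost grows with the product of the list lengths.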

In [None]:
timesteps = 3_000_000
lr_values = [2.5e-4]
net_arch_values = [[128, 128]]
batch_size_values = [128]

hyperparam_search(fs_vec_env, lr_values, batch_size_values, net_arch_values, policy_kwargs, timesteps)

Training PPO_lr0.00025_netarch[128, 128]_batchsize128_timesteps3000000...
Using cuda device
Wrapping the env in a VecTransposeImage.
weights is None. Currently unsupported behavior.
setting model
Defaulting to basic trainable CNN.
obs space: (1, 336, 84)
model is set
Logging to runs/PPO_lr0.00025_netarch[128, 128]_batchsize128_timesteps3000000_3
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 530      |
|    ep_rew_mean     | 0        |
| time/              |          |
|    fps             | 840      |
|    iterations      | 1        |
|    time_elapsed    | 0        |
|    total_timesteps | 512      |
---------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 579          |
|    ep_rew_mean          | 0.5          |
| time/                   |              |
|    fps                  | 967          |
|    iterations           | 2            |
|    time_e

## Using a pre-trained CNN

We can also try using a pre-trained CNN as a feature extractor. `PretrainedCNNFeatureExtractor.py` contains helper functions that load PyTorch computer vision models and remove their classifier heads. Currently there is support for EfficientNet and ResNet50. In the example below we use the EfficientNet model. The helper returns the model, its weights, and its number of output features. The weights are needed to run observations through the same preprocessing that was used to train the model, and the number of output features is needed to size the input of the policy network.

We add these as key-value pairs to the `policy_kwargs` variable. Note that there is an additional variable, `preprocessing_function`, that combines a `Grayscale` layer with the pre-trained network's preprocessing. This is necessary because the `make_atari_env` helper function from Stable Baselines provides single-channel observations, while the pre-trained networks expect 3-channel data. The preprocessing simply copies the single channel across three channels to keep things compatible.
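
The channel copy itself is simple. The sketch below shows the idea on plain nested lists rather than tensors; `gray_to_rgb` is a hypothetical name, and the real `create_grayscale_preprocessing` helper (not shown here) works on PyTorch tensors and also applies the network's own preprocessing.

```python
def gray_to_rgb(image):
    """Turn a (1, H, W) nested list into (3, H, W) by repeating the channel."""
    if len(image) != 1:
        raise ValueError("expected a single-channel image")
    plane = image[0]
    # Copy each row so the three channels do not share storage.
    return [[row[:] for row in plane] for _ in range(3)]


obs = [[[0, 255], [128, 64]]]  # toy 1 x 2 x 2 observation
rgb = gray_to_rgb(obs)
print(len(rgb), len(rgb[0]), len(rgb[0][0]))  # 3 2 2
print(rgb[0] == rgb[1] == rgb[2])             # True
```

The three output channels carry identical values, so the pre-trained network sees a gray image in the 3-channel layout it was trained on.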

In [24]:
efficientnet_model, efficientnet_weights, efficientnet_num_features = PretrainedCNNFeatureExtractor.efficientnet()
preprocessing_function = PretrainedCNNFeatureExtractor.create_grayscale_preprocessing(efficientnet_weights)

policy_kwargs = dict(features_extractor_class=PretrainedCNNFeatureExtractor.PretrainedCNNFeatureExtractor,
                     features_extractor_kwargs=dict(features_dim=efficientnet_num_features,
                                                    base_model=efficientnet_model,
                                                    weights=efficientnet_weights,
                                                    preprocessing_function=preprocessing_function))

efficient net num features: 1280


In [None]:
timesteps = 3_000_000
lr_values = [2.5e-4]
net_arch_values = [[128, 128]]
batch_size_values = [128]

hyperparam_search(fs_vec_env, lr_values, batch_size_values, net_arch_values, policy_kwargs, timesteps)

Training PPO_lr0.00025_netarch[128, 128]_batchsize128_timesteps3000000...
Using cuda device
Wrapping the env in a VecTransposeImage.
Logging to runs/PPO_lr0.00025_netarch[128, 128]_batchsize128_timesteps3000000_4




---------------------------------
| rollout/           |          |
|    ep_len_mean     | 522      |
|    ep_rew_mean     | 0        |
| time/              |          |
|    fps             | 476      |
|    iterations      | 1        |
|    time_elapsed    | 1        |
|    total_timesteps | 512      |
---------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 577          |
|    ep_rew_mean          | 0.333        |
| time/                   |              |
|    fps                  | 171          |
|    iterations           | 2            |
|    time_elapsed         | 5            |
|    total_timesteps      | 1024         |
| train/                  |              |
|    approx_kl            | 0.0003730537 |
|    clip_fraction        | 0            |
|    clip_range           | 0.1          |
|    entropy_loss         | -1.39        |
|    explained_variance   | -1.67e-06    |
|    learning_r