MERLIn, short for modular extensible reinforcement learning interface, lets you easily define and run reinforcement learning experiments on top of PyTorch and Gym.
This project started as a homework assignment for a reinforcement learning module during my Master's studies. I made it public in the hope that you find it useful or interesting.
MERLIn uses poetry for dependency management.
To install all dependencies, run:

```sh
poetry install
```
Experiments can be defined as YAML files merged with the default
configuration before being passed into the main training loop. Parameters are
identical to the attributes of the Config
class, and a table of all parameters is
given further down.
Example: `experiments/experiment_one.yaml`

```yaml
---
max_episodes: 1000
agent_name: dueling_dqn
alpha: 0.05
```
This will train the `dueling_dqn` agent for 1000 episodes at a learning rate `alpha` of 0.05, while all other parameters fall back to their default values as defined in the `Config` class.
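For intuition, merging boils down to loading the YAML file and overlaying its keys on the defaults. A minimal sketch, assuming `Config` is a dataclass; the field names and defaults shown here are illustrative, and the actual loading code may differ:

```python
# Minimal sketch: overlay a YAML experiment file on top of default Config values.
# Assumes Config is a dataclass; field names/defaults here are illustrative.
from dataclasses import dataclass, replace

import yaml


@dataclass
class Config:
    max_episodes: int = 5000
    agent_name: str = "double_dqn"
    alpha: float = 5e-6


def load_experiment(path: str) -> Config:
    with open(path) as f:
        overrides = yaml.safe_load(f) or {}
    return replace(Config(), **overrides)  # YAML keys win over defaults


config = load_experiment("experiments/experiment_one.yaml")
# -> Config(max_episodes=1000, agent_name='dueling_dqn', alpha=0.05)
```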
Using the `variants` array, different flavors of the same base configuration can be defined as objects in that array. Deeper-nested parameters override those defined higher up. Variants can be nested.
```yaml
---
max_episodes: 1000
variants:
  - {}
  - alpha: 0.01243
  - max_episodes: 333
    variants:
      - gamma: 0.5
        memory_size: 99000
      - batch_size: 64
```
The above configuration defines the following experiments:
- `max_episodes: 1000`
- `max_episodes: 1000` and `alpha: 0.01243`
- `max_episodes: 333`, `gamma: 0.5`, and `memory_size: 99000`
- `max_episodes: 333` and `batch_size: 64`
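One way to picture this expansion: each variant is merged over its parent configuration, recursively, until only leaf configurations remain. A rough sketch of such an expansion (not MERLIn's actual implementation):

```python
# Rough sketch: flatten nested "variants" into a list of concrete experiment dicts.
def expand_variants(config: dict) -> list[dict]:
    base = {k: v for k, v in config.items() if k != "variants"}
    variants = config.get("variants")
    if not variants:
        return [base]
    expanded = []
    for variant in variants:
        # Deeper-nested keys override those defined higher up.
        expanded.extend(expand_variants({**base, **variant}))
    return expanded
```

Applied to the YAML above, this yields exactly the four experiments listed.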
After defining at least one experiment as described in the previous section, start training by simply invoking the following command:
```sh
poetry run train
```
To start training in the background, so that training can proceed beyond the current shell session, run the following script:

```sh
./scripts/traing_bg.sh
```

The script will also watch the generated log statements to provide continuous console output.
During training, the following outputs are continuously logged to the console:
- episode index
- epsilon
- reward
- train loss
- episode steps
- total episode time
Special events like model saving or video recording will also be logged if they occur.
Each experiment will generate a subfolder in the results/
directory. Within
that subfolder, the following files will be placed:
- `experiment.yaml`: the exact parameters the experiment was run with
- A log file holding the training logs, as printed to the console (see the section above)
- Model checkpoints.
- Video files of selected episode runs.
- Images of the preprocessed state (optional).
MERLIn will automatically conduct some crude statistical analysis of the experimental results post-training.
You can manually trigger the analysis by running:

```sh
poetry run analyze <path/to/experiment/results>
```
Analysis results will be written to an `analysis/` subfolder of the results directory.
As of v1.0.0, only the last 2,000 episodes of each run (a hard-coded assumption of where rewards plateau) are used to compare different algorithms.
The statistical analysis will aggregate all runs of each variant and calculate the following:
- mean reward
- std reward
- lower bound of the confidence interval for mean reward
- mean steps
- std steps
Line plots of rewards over episodes and histograms showing the reward distribution of all variants are produced.
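As a rough illustration of that aggregation, the sketch below pools the assumed plateau window of all runs of a variant and computes the mean, standard deviation, and a lower confidence bound. It uses NumPy/SciPy; the actual analysis code and confidence level may differ:

```python
# Sketch only: aggregate the plateau window of a variant's runs.
import numpy as np
from scipy import stats

PLATEAU_EPISODES = 2_000  # last episodes assumed to have plateaued


def summarize_rewards(rewards_per_run: list[np.ndarray], confidence: float = 0.95) -> dict:
    pooled = np.concatenate([run[-PLATEAU_EPISODES:] for run in rewards_per_run])
    mean = pooled.mean()
    std = pooled.std(ddof=1)
    sem = std / np.sqrt(pooled.size)
    # One-sided lower bound of the confidence interval for the mean reward.
    lower = mean - stats.t.ppf(confidence, df=pooled.size - 1) * sem
    return {"mean_reward": mean, "std_reward": std, "ci_lower_reward": lower}
```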
Below is an overview of the parameters to configure experiments.
Parameter Name | Description | Optional | Default |
---|---|---|---|
experiment | Unique id of the experiment. | No | |
variant | Unique id of the variant of an experiment. | No | |
run | Unique id of the run of a variant. | Yes | 0 |
run_count | The number of independent runs of an experiment. | Yes | 3 |
env_name | The environment to be used. | Yes | 'pong' |
frame_skip | The number of frames to skip per action. | Yes | 4 |
input_dim | The input dimension of the model. | Yes | 64 |
num_stacked_frames | The number of frames to stack. | Yes | 4 |
step_penalty | Penalty given to the agent per step. | Yes | 0.0 |
agent_name | The agent to be used. | Yes | 'double_dqn' |
net_name | The neural network to be used. | Yes | 'linear_deep_net' |
target_net_update_interval | The number of steps after which the target network should be updated. | Yes | 1024 |
episodes | The number of episodes to train for. | Yes | 5000 |
alpha | The learning rate of the agent. | Yes | 5e-6 |
epsilon_decay_start | The episode to start epsilon decay on. | Yes | 1000 |
epsilon_step | The absolute value to decrease epsilon by per episode. | Yes | 1e-3 |
epsilon_min | The minimum epsilon value for epsilon-greedy exploration. | Yes | 0.1 |
gamma | The discount factor for future rewards. | Yes | 0.99 |
memory_size | The size of the replay memory. | Yes | 500,000 |
batch_size | The batch size for learning. | Yes | 32 |
model_save_interval | The number of steps after which the model should be saved. If None, model will be saved at the end of epoch only. | Yes | None |
video_record_interval | Steps between video recordings. | Yes | 2500 |
save_state_img | Whether to save images of the preprocessed state during training. | Yes | False |
use_amp | Whether to use automatic mixed precision. | Yes | True |
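To illustrate how the epsilon parameters in the table interact: epsilon stays at its starting value until `epsilon_decay_start`, then drops by `epsilon_step` per episode until it reaches `epsilon_min`. A minimal sketch (the starting value of 1.0 is an assumption, and this is not MERLIn's actual decay code):

```python
# Minimal sketch: linear epsilon decay as implied by the parameters above.
def epsilon_for_episode(
    episode: int,
    epsilon_start: float = 1.0,      # assumed initial value, not listed in the table
    epsilon_decay_start: int = 1000,
    epsilon_step: float = 1e-3,
    epsilon_min: float = 0.1,
) -> float:
    if episode < epsilon_decay_start:
        return epsilon_start
    decayed = epsilon_start - (episode - epsilon_decay_start) * epsilon_step
    return max(decayed, epsilon_min)
```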
MERLIn prides itself on being modular and extensible, meaning you can quickly implement new agents, environments, and neural networks. All you need to do to extend any of these is derive a new class from the respective abstract base class and register it in the corresponding registry.
Create a new Python module, `app/nets/new_net.py`, holding a new class that derives from `BaseNet`.
You must provide a unique name via the `name` property.
```python
from torch import nn

from app.nets._base_net import BaseNet


class NewNet(BaseNet):
    @classmethod
    @property
    def name(cls) -> str:
        return "new_net"  # give it a unique name here

    def _define_net(
        self, state_shape: tuple[int, int, int], num_actions: int
    ) -> nn.Sequential:
        # your PyTorch network definition goes here, e.g.:
        return nn.Sequential(
            nn.Flatten(),
            nn.Linear(state_shape[0] * state_shape[1] * state_shape[2], num_actions),
        )
```
Add `NewNet` to the registry of neural networks in `app/nets/__init__.py` to make it automatically available to the `make_net` factory function.
```python
...
net_registry = [
    ...
    NewNet,  # register here
]
...
```
That's it. That simple. From now on, you can use the new network in your experiment definitions:

```yaml
---
net_name: new_net
```
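Under the hood, the `make_net` factory presumably resolves this name against the registry. A rough sketch of how such a lookup could work (the actual `make_net` signature and behavior may differ):

```python
# Rough sketch: resolve a registered net class by its unique name.
def make_net(net_name: str, *args, **kwargs):
    for net_cls in net_registry:
        if net_cls.name == net_name:
            return net_cls(*args, **kwargs)
    raise ValueError(f"Unknown net_name: {net_name!r}")
```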
The application comes with several Bash scripts that provide helper functionality.

Prints information about the system's current CUDA installation and GPU usage, for sanity checking and troubleshooting.
Installs the Atari ROMs used by Gym
into the virtual environment.
Typically, you will want to offload the training workload to a cloud virtual machine. For this purpose, `sync_up.sh` uploads sources and experiments to that machine. Afterward, the training results can be downloaded to your local system using `sync_down.sh`.

Connection data for both sync scripts is configured in the `sync.cfg` file.
This project is now more of a didactic exercise than an attempt to topple established reinforcement learning frameworks such as RLlib.

As of v1.0.0, the most important limitations of MERLIn are:
- A single environment is implemented, namely Pong.
- A single class of agents is implemented, namely variations of DQN.
- Statistical analysis is rudimentary and does not happen in parallel with training.
If you like MERLIn and want to develop it further, feel free to fork it and open a pull request. 🤓