
Quick Start

Vladimir Egorov edited this page Jul 16, 2023 · 28 revisions

Getting Started

Installation

This section describes the available installation methods.

Requirements

OS requirements

The library was tested on the following OS versions:

  • Windows 11
  • Ubuntu 22.04
  • macOS 12 Monterey

Minimal working hardware parameters:

  • CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  • 8GB RAM

Python 3.9 or higher is required.

The list of dependency versions can be found here: https://github.com/jbr-ai-labs/marlben/blob/main/requirements.txt

GPU usage

To use GPU accelerators during training, simply specify the number of GPUs available in the config:

# get_config, run_tune_experiment and rllib_wrapper come from the
# project's training utilities (see train.py)
EnvConfig = get_config("Corridor")
EnvConfig.NUM_GPUS = 1
run_tune_experiment(EnvConfig(), 'Corridor', rllib_wrapper.PPOCustom)

Minimal working hardware parameters:

  • NVidia RTX 2060

Automatic installation via Pip

  1. [Optional] If you are using conda or another Python environment manager, you may want to create a separate environment first.
  2. Run pip install marlben in your Terminal. Wait for installation to complete.
  3. [Optional] Install RLLib integration: pip install marlben[rllib]

Manual installation via Pip

  1. [Optional] If you are using conda or another Python environment manager, you may want to create a separate environment first.
  2. Clone the repository: git clone https://github.com/jbr-ai-labs/marlben
  3. Move to the root folder of the package: cd marlben
  4. Run pip install -r requirements.txt in your Terminal. Wait for installation to complete.
  5. Run pip install . in your Terminal. Wait for installation to complete.

Available Integrations

RLLib

An RLLib integration example can be seen in train.py.

To run train.py, a full installation is required:

pip install marlben[rllib]

Troubleshooting

If you encounter the following error:

undefined symbol: cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11

Run pip3 uninstall nvidia_cublas_cu11

PettingZoo

All environment classes inherit from the base Env class, which implements the ParallelEnv API from PettingZoo.
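As a rough illustration, the ParallelEnv interaction pattern works like this. Note that StubEnv below is a hypothetical stand-in used only to show the shape of the API, not a marlben class:

```python
# Sketch of the PettingZoo ParallelEnv interaction pattern that the base
# Env class follows: reset() and step() both return per-agent dictionaries
# keyed by agent ID. StubEnv is an illustrative stand-in, not a real class.

class StubEnv:
    """Minimal stand-in mimicking the ParallelEnv API shape."""
    def __init__(self):
        self.agents = [1, 2]

    def reset(self):
        # One observation per agent ID
        return {agent_id: {"Entity": {}, "Tile": {}} for agent_id in self.agents}

    def step(self, actions):
        obs = {agent_id: {"Entity": {}, "Tile": {}} for agent_id in self.agents}
        rewards = {agent_id: 0.0 for agent_id in self.agents}
        dones = {agent_id: False for agent_id in self.agents}
        infos = {agent_id: {} for agent_id in self.agents}
        return obs, rewards, dones, infos

env = StubEnv()
observations = env.reset()
actions = {agent_id: {} for agent_id in env.agents}  # empty action dicts
observations, rewards, dones, infos = env.step(actions)
```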

WandB

To use the WandB integration for logging agents' performance, create a wandb_api_key file in the project root folder containing your WandB token.
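For example, the key file can be created from Python. The token string below is a placeholder you must replace with your own:

```python
# Sketch: create the wandb_api_key file that the WandB integration reads
# from the project root. Replace the placeholder with your real token.
from pathlib import Path

project_root = Path(".")           # assumed to be the repository root
token = "YOUR_WANDB_TOKEN"         # placeholder, not a real token
(project_root / "wandb_api_key").write_text(token)
```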

Launching an environment

Common information

All environments are compatible with the OpenAI Gym interface. Hence, you may create an environment instance using the gym.make() method. This is a simple way to get started with MARLBEN, as it does not require you to specify any additional parameters for the environment.

A simple example:

import gymnasium as gym
import marlben

env = gym.make("MARLBEN-BossFight-v1")

Alternatively, you may create an environment instance by directly calling the constructor of the corresponding class. This way, you will be able to create an environment with a customized configuration. Use this method if you want to simplify a task or make it harder to solve.

A simple example:

from marlben.envs import BossFight, BossFightConfig

env = BossFight(BossFightConfig())

For the full list of available environments, their descriptions, and additional configuration suggestions, please refer to the list of environments page.

Environment API

A basic API of all environments implements the OpenAI Gym environment API:

  • env.reset() - resets the environment to its initial state. Returns a dictionary mapping each agent ID to its observation, a reward, and a done flag.
  • env.step(action) - takes as an argument a dictionary mapping each agent ID to its action.

Note that MARLBEN environments do not support the default env.render() method. If you want to visualize an environment, see the section "Rendering" below.

Observation description

An observation for a single agent is a dictionary that contains information about entities (the "Entity" key) and about the visible part of the map (the "Tile" key).

For entities, there are a number of available continuous ("Continuous" key) and discrete ("Discrete" key) features. By default, up to 100 agents may be represented, with the current agent always going first.

For tiles, there are also a number of available continuous ("Continuous" key) and discrete ("Discrete" key) features. For simplicity, the visible part of the map is flattened to a vector with a dimensionality of n_tiles × n_features. Feel free to reshape it back if you need to.
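Reshaping back might look like the following sketch. The 15×15 vision window and 4 features here are assumed values for illustration, not marlben's actual dimensions:

```python
# Sketch: the "Tile" features arrive flattened to shape
# (n_tiles, n_features); reshape back into a 2D map grid if needed.
# The window size and feature count below are illustrative assumptions.
import numpy as np

n_rows, n_cols, n_features = 15, 15, 4
flat_tiles = np.zeros((n_rows * n_cols, n_features))  # stand-in for obs["Tile"]["Continuous"]
tile_grid = flat_tiles.reshape(n_rows, n_cols, n_features)
```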

The simplest way to get specific information from an observation is to use the marlben.scripting.Observation class, which allows you to retrieve the required information from the observation dictionary using the Observation.attribute method and the fields of the marlben.io.stimulus.Serialized.Entity and marlben.io.stimulus.Serialized.Tile classes.
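If you prefer to index the raw dictionary directly, the structure looks roughly like the sketch below. The feature values and column layout here are made up for illustration; the Observation helper and Serialized field classes give you named access instead of raw indices:

```python
# Sketch: pulling a feature out of the raw observation dictionary directly.
# The arrays and their column meanings are illustrative placeholders.
import numpy as np

obs = {
    "Entity": {
        "Continuous": np.array([[10.0, 3.0], [7.5, 1.0]]),  # one row per entity
        "Discrete": np.array([[1], [2]]),
    },
}

current_agent_row = obs["Entity"]["Continuous"][0]  # the current agent is always first
first_feature = current_agent_row[0]
```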

Action description

Depending on the environment type, different subsets of actions may be available. Please refer to the list of environments for more information.

In MARLBEN, each agent's action is represented as a dictionary. In this dictionary, an agent may declare multiple types of actions:

  • marlben.io.action.Move - allows an agent to move within the map. Enabled for all environments by default.
  • marlben.io.action.Attack - allows an agent to attack another entity with a specified attack type. Requires the Combat system to be enabled.
  • marlben.io.action.Build - allows an agent to build an impassable rock at its previous position. Requires the Building system to be enabled.
  • marlben.io.action.Plant - allows an agent to plant a Food resource by spending a Water resource. Requires the Planting system to be enabled.
  • marlben.io.action.Share - allows an agent to share a given amount of a specified resource with another entity. Requires the Sharing system to be enabled.

Note that an agent does not necessarily need to perform all available action types each turn, so some of these actions may be absent from the action dictionary. For each desired action type, the agent must specify a dictionary of that action type's parameters. For example, the Move action requires a Direction to be specified.

An example action dictionary for a single agent:

from marlben.io import action

env.step(
    {
        <agent_id>:
            {
                action.Move: {action.Direction: action.North.index},
                action.Attack: {action.Style: action.Range.index, action.Target: 1}
            }
    }
)

Comparing to heuristic baselines

For most of the provided environments, we implemented heuristic baselines. These baselines may be used to measure the efficiency of your own agents.

Scripted agents

For the Boss Fight and Raid environments, it's highly recommended to use BossFightTankAgent, BossRaidFighterAgent, and BossRaidHealerAgent as scripted baselines. Definitions of these scripted agents can be found here; an example of usage can be found here.

For the Gathering, Exploring, Spying, and Colors environments, it's highly recommended to use ObscuredAndExclusiveGatheringAgent, which is defined here. An example of its usage can be found here.

You can also use basic NMMO scripted agents. However, these agents may provide weaker baselines for the environments listed above because of their general-purpose policies.