# MAVA Quickstart Notebook
<img src="https://raw.githubusercontent.com/instadeepai/Mava/develop/docs/images/mava.png" />

### A guide to installing Mava and creating and training your first multi-agent system.

For more details about Mava and an overview of its design/features, please visit our [repo](https://github.com/instadeepai/Mava). 

<a href="https://colab.research.google.com/github/instadeepai/Mava/blob/develop/examples/quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## 1. Installation

In [None]:
#@title Install Mava and Some Supported Environments (Run Cell)
%%capture
!pip install git+https://github.com/instadeepai/Mava#egg=id-mava[reverb,tf,launchpad,envs]

In [None]:
#@title Installs and Imports for Agent Visualization (Run Cell)
%%capture
!pip install git+https://github.com/instadeepai/Mava#egg=id-mava[record_episode]
!apt-get update -y && apt-get install -y xvfb python-opengl ffmpeg && pip install pyvirtualdisplay

import os
from IPython.display import HTML
from pyvirtualdisplay import Display

display = Display(visible=0, size=(1024, 768))
display.start()
os.environ["DISPLAY"] = ":" + str(display.display)

## 2. Import Modules

In [None]:
#@title Import Modules (Run Cell)
import functools
from datetime import datetime
from typing import Any, Dict, Mapping, Sequence, Union

import launchpad as lp
import numpy as np
import sonnet as snt
import tensorflow as tf
from absl import app, flags
from acme import types
from mava.components.tf import networks
from acme.tf import utils as tf2_utils


from mava import specs as mava_specs
from mava.systems.tf import maddpg
from mava.utils import lp_utils
from mava.utils.environments import debugging_utils
from mava.wrappers import MonitorParallelEnvironmentLoop
from mava.components.tf import architectures
from mava.utils.loggers import logger_utils

## 3. Train a Multi-Agent Reinforcement Learning (MARL) `DDPG` System

### Define Agent Networks
We will use the default agent networks for the `maddpg` system.

In [None]:
network_factory = lp_utils.partial_kwargs(maddpg.make_default_networks)

### Select Environment
We will use our [debug environment](https://github.com/instadeepai/Mava#debugging).

In [None]:
env_name = "simple_spread"
action_space = "continuous"

environment_factory = functools.partial(
    debugging_utils.make_environment,
    env_name=env_name,
    action_space=action_space,
)
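
The factory above uses `functools.partial` to pre-fill the arguments that stay fixed, so that new environment instances can be built on demand later. Here is a minimal, standard-library-only sketch of that pattern. `make_toy_environment` and its `evaluation` argument are hypothetical stand-ins for illustration and are not Mava functions.

```python
import functools


def make_toy_environment(evaluation=False, env_name="simple_spread", action_space="continuous"):
    """Hypothetical stand-in for an environment constructor (not a Mava function)."""
    return {"env_name": env_name, "action_space": action_space, "evaluation": evaluation}


# Pre-fill the arguments that stay fixed for the whole run.
toy_environment_factory = functools.partial(
    make_toy_environment,
    env_name="simple_spread",
    action_space="continuous",
)

# Each call builds a new, independent environment description.
train_env = toy_environment_factory(evaluation=False)
eval_env = toy_environment_factory(evaluation=True)
print(train_env)
print(eval_env)
```

Because the factory is a callable rather than an already-built object, each caller can create its own copy instead of sharing one instance.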

### Create MARL System

#### Specify logging and checkpointing config. 

In [None]:
# Directory to store checkpoints and log data. 
base_dir = "~/mava"

# File name 
mava_id = datetime.now().strftime("%Y-%m-%d_%H:%M:%S")

# Log every [log_every] seconds
log_every = 15
logger_factory = functools.partial(
    logger_utils.make_logger,
    directory=base_dir,
    to_terminal=True,
    to_tensorboard=True,
    time_stamp=mava_id,
    time_delta=log_every,
)

# Checkpointer appends "Checkpoints" to checkpoint_dir
checkpoint_dir = f"{base_dir}/{mava_id}"
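
To make the run layout concrete, the sketch below rebuilds the same kind of path with the standard library only and expands `~` explicitly. The `os.path.expanduser` call is an assumption added for illustration (it is useful when passing the path to tools such as `glob` that do not expand `~`), and the `example_*` names are hypothetical.

```python
import os
from datetime import datetime

example_base_dir = "~/mava"
example_id = datetime.now().strftime("%Y-%m-%d_%H:%M:%S")

# Same composition as checkpoint_dir above: one sub-directory per run.
example_run_dir = f"{example_base_dir}/{example_id}"

# Expand "~" to an absolute path, e.g. for use with glob or open().
example_run_dir_abs = os.path.expanduser(example_run_dir)

print(example_run_dir)
print(example_run_dir_abs)
```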

#### Create Multi-Agent DDPG System.

In [None]:
system = maddpg.MADDPG(
    environment_factory=environment_factory,
    network_factory=network_factory,
    logger_factory=logger_factory,
    num_executors=1,
    policy_optimizer=snt.optimizers.Adam(learning_rate=1e-4),
    critic_optimizer=snt.optimizers.Adam(learning_rate=1e-4),
    checkpoint_subpath=checkpoint_dir,
    max_gradient_norm=40.0,
    checkpoint=False,
    batch_size=1024,

    # Record agents in environment. 
    eval_loop_fn=MonitorParallelEnvironmentLoop,
    eval_loop_fn_kwargs={"path": checkpoint_dir, "record_every": 10, "fps": 5},
).build()

### Run Multi-Agent DDPG System.

In [None]:
# Ensure only trainer runs on gpu, while other processes run on cpu. 
local_resources = lp_utils.to_device(program_nodes=system.groups.keys(),nodes_on_gpu=["trainer"])

lp.launch(
    system,
    lp.LaunchType.LOCAL_MULTI_PROCESSING,
    terminal="output_to_files",
    local_resources=local_resources,
)
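
The idea behind the device assignment can be sketched without Launchpad: every node except those listed for the GPU gets an environment that hides CUDA devices. The helper `assign_devices`, the node names, and the use of `CUDA_VISIBLE_DEVICES` are assumptions for illustration. This is not the implementation of `lp_utils.to_device`.

```python
def assign_devices(program_nodes, nodes_on_gpu):
    """Hypothetical helper: map each node name to per-process environment variables."""
    resources = {}
    for node in program_nodes:
        if node in nodes_on_gpu:
            # Leave the environment untouched so the process can see the GPU.
            resources[node] = {}
        else:
            # Setting CUDA_VISIBLE_DEVICES to "-1" hides all GPUs from the process.
            resources[node] = {"CUDA_VISIBLE_DEVICES": "-1"}
    return resources


# Illustrative node names only.
example_resources = assign_devices(
    program_nodes=["trainer", "executor", "evaluator"],
    nodes_on_gpu=["trainer"],
)
print(example_resources)
```

Keeping the GPU for the trainer alone avoids several processes competing for the same device memory.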

### Logs and Outputs

#### View outputs from the evaluator process.
*You might need to wait a few moments after launching the run.*
The `CUDA_ERROR_NO_DEVICE` error is expected since the GPU is only used by the trainer. 

In [None]:
!cat /tmp/launchpad_out/evaluator/0

#### View Stored Data 
*You might need to wait a few moments after launching the run.*

In [None]:
! ls ~/mava/$mava_id

### Tensorboard
*You might need to wait a few moments after launching the run.*

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

To view training results, start TensorBoard and filter for the `evaluator/RawEpisodeReturn` tag.

A good score is a `RawEpisodeReturn` between 30 and 40. Although this system is stochastic, it should reach that score within 100 evaluator episodes.

In [None]:
%tensorboard --logdir ~/mava/$mava_id/tensorboard/evaluator

### View Agent Recording
Once a good score is reached, you can watch the learned multi-agent behaviour in the agent recordings.

#### Check if any agent recordings are available. 

In [None]:
! ls ~/mava/$mava_id/recordings

#### View the latest agent recording. 

In [None]:
import glob
import os 
import IPython

# Recordings
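# expanduser resolves "~" to the home directory (/root on Colab).
list_of_files = glob.glob(os.path.expanduser(f"~/mava/{mava_id}/recordings/*.html"))

# An empty list means no recording has been written yet.
if not list_of_files:
  print("No recordings are available yet. Please wait or run the 'Run Multi-Agent DDPG System.' cell if you haven't already done this.")
else:
  # Pick the most recently created recording.
  latest_file = max(list_of_files, key=os.path.getctime)
  print("Run the next cell to visualize your agents!")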

If the agents are trained (*usually around agents_200_eval...*), they should move to assigned landmarks.

<img src="https://raw.githubusercontent.com/instadeepai/Mava/develop/docs/images/simple_spread.png" width="250" height="250" />

In [None]:
# Latest file needs to point to the latest recording
IPython.display.HTML(filename=latest_file)

## 4. What's next?
- Run MARL System with custom agent networks.
- Try Different Architectures.
- Scaling. 

### Run MARL System with custom agent networks

#### Build your own custom networks

In [None]:
def make_custom_network(environment_spec, agent_net_keys):
  """Creates networks used by the agents."""
  specs = environment_spec.get_agent_specs()

  # Create agent_type specs
  specs = {agent_net_keys[key]: specs[key] for key in specs.keys()}

  observation_networks = {}
  policy_networks = {}
  critic_networks = {}

  for agent in specs.keys():
    
    agent_act_spec = specs[agent].actions

    # Get total number of action dimensions from action spec.
    num_dimensions = np.prod(agent_act_spec.shape, dtype=int)
    
    # Create policy network
    policy_network = snt.Sequential([
        snt.Linear(output_size=100),
        tf.nn.relu,
        snt.Linear(output_size=num_dimensions),
        tf.nn.relu,
        networks.TanhToSpec(agent_act_spec)
    ])

    # Create the critic network.
    critic_network = snt.Sequential([
         # The multiplexer concatenates the observations/actions.
        networks.CriticMultiplexer(),
        snt.Linear(output_size=256),
        tf.nn.relu,
        snt.Linear(output_size=256),
        tf.nn.relu,
        snt.Linear(1)
    ])

    # An optional network to process observations
    observation_network = tf2_utils.to_sonnet_module(tf.identity)

    observation_networks[agent] = observation_network
    policy_networks[agent] = policy_network
    critic_networks[agent] = critic_network

  return {
      "policies": policy_networks,
      "critics": critic_networks,
      "observations": observation_networks,
  }

network_factory = lp_utils.partial_kwargs(make_custom_network)
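
Two steps in `make_custom_network` are easy to check in isolation: re-keying the per-agent specs by network key, and flattening an action shape into a single dimension count. The sketch below uses plain dictionaries and `math.prod` in place of real specs and `np.prod`. The agent names, the shared `"agent"` key, and the shapes are made-up values for illustration.

```python
import math

# Hypothetical mapping from agent IDs to the network each agent uses.
toy_agent_net_keys = {"agent_0": "agent", "agent_1": "agent", "agent_2": "agent"}

# Hypothetical per-agent action shapes standing in for full specs.
toy_specs = {"agent_0": (2,), "agent_1": (2,), "agent_2": (2,)}

# Re-key by network key: agents that share a key collapse into one entry,
# so the loop over the result runs once per network key, not once per agent.
toy_type_specs = {toy_agent_net_keys[key]: toy_specs[key] for key in toy_specs}

# Total number of action dimensions for each network key.
toy_num_dimensions = {key: math.prod(shape) for key, shape in toy_type_specs.items()}

print(toy_type_specs)
print(toy_num_dimensions)
```

If every agent had its own network key instead, the re-keyed dictionary would keep one entry per agent.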


#### Run System with custom networks
Let's run the system with the custom agent networks defined above.

##### Run System

In [None]:
%%capture
#@title Kill old runs. (Run Cell)
!ps aux  |  grep -i launchpad  |  awk '{print $2}'  |  xargs sudo kill -9

In [None]:
#@title Logging config. (Run Cell)
# Directory to store checkpoints and log data. 
base_dir = "~/mava"

# File name 
mava_id = datetime.now().strftime("%Y-%m-%d_%H:%M:%S")

# Log every [log_every] seconds
log_every = 15
logger_factory = functools.partial(
    logger_utils.make_logger,
    directory=base_dir,
    to_terminal=True,
    to_tensorboard=True,
    time_stamp=mava_id,
    time_delta=log_every,
)

# Checkpointer appends "Checkpoints" to checkpoint_dir
checkpoint_dir = f"{base_dir}/{mava_id}"

In [None]:
#@title Run system with custom networks. (Run Cell)

# System
system = maddpg.MADDPG(
    environment_factory=environment_factory,
    network_factory=network_factory,
    logger_factory=logger_factory,
    num_executors=1,
    policy_optimizer=snt.optimizers.Adam(learning_rate=1e-4),
    critic_optimizer=snt.optimizers.Adam(learning_rate=1e-4),
    checkpoint_subpath=checkpoint_dir,
    max_gradient_norm=40.0,
    checkpoint=False,

    # Record agents in environment. 
    eval_loop_fn=MonitorParallelEnvironmentLoop,
    eval_loop_fn_kwargs={"path": checkpoint_dir, "record_every": 10, "fps": 5},
).build()

# Ensure only trainer runs on gpu, while other processes run on cpu. 
local_resources = lp_utils.to_device(program_nodes=system.groups.keys(),nodes_on_gpu=["trainer"])

lp.launch(
    system,
    lp.LaunchType.LOCAL_MULTI_PROCESSING,
    terminal="output_to_files",
    local_resources=local_resources,
)

##### View logs
*You might need to wait a few moments after launching the run.*

In [None]:
!cat /tmp/launchpad_out/evaluator/0

#### Tensorboard
*You might need to wait a few moments after launching the run.*

In [None]:
%tensorboard --logdir ~/mava/$mava_id/tensorboard/evaluator 

### Try Different Architectures
Mava provides several components to support the design of MARL systems, such as different system architectures and modules. For more information on the available architectures, see our [components](https://github.com/instadeepai/Mava#components) overview, browse the [architectures source directory](https://github.com/instadeepai/Mava/tree/develop/mava/components/tf/architectures), or view our [examples](https://github.com/instadeepai/Mava/tree/develop/examples).



In [None]:
%%capture
#@title Kill old runs. (Run Cell)
!ps aux  |  grep -i launchpad  |  awk '{print $2}'  |  xargs sudo kill -9

In [None]:
#@title Logging config. (Run Cell)
# Directory to store checkpoints and log data. 
base_dir = "~/mava"

# File name 
mava_id = datetime.now().strftime("%Y-%m-%d_%H:%M:%S")

# Log every [log_every] seconds
log_every = 15
logger_factory = functools.partial(
    logger_utils.make_logger,
    directory=base_dir,
    to_terminal=True,
    to_tensorboard=True,
    time_stamp=mava_id,
    time_delta=log_every,
)

# Checkpointer appends "Checkpoints" to checkpoint_dir
checkpoint_dir = f"{base_dir}/{mava_id}"

Let's try switching from the **Decentralised** (default) architecture to the **Centralised** architecture.

In [None]:
# networks
network_factory = lp_utils.partial_kwargs(maddpg.make_default_networks)

# distributed program
system = maddpg.MADDPG(
    environment_factory=environment_factory,
    network_factory=network_factory,
    logger_factory=logger_factory,
    num_executors=1,
    policy_optimizer=snt.optimizers.Adam(learning_rate=1e-4),
    critic_optimizer=snt.optimizers.Adam(learning_rate=1e-4),
    checkpoint_subpath=checkpoint_dir,
    max_gradient_norm=40.0,
    checkpoint=False,

    # Record agents in environment. 
    eval_loop_fn=MonitorParallelEnvironmentLoop,
    eval_loop_fn_kwargs={"path": checkpoint_dir, "record_every": 10, "fps": 5},

    # Centralised architecture and training. 
    architecture=architectures.CentralisedQValueCritic,
    trainer_fn=maddpg.MADDPGCentralisedTrainer,
).build()

# Ensure only trainer runs on gpu, while other processes run on cpu. 
local_resources = lp_utils.to_device(program_nodes=system.groups.keys(),nodes_on_gpu=["trainer"])

lp.launch(
    system,
    lp.LaunchType.LOCAL_MULTI_PROCESSING,
    terminal="output_to_files",
    local_resources=local_resources,
)

##### View logs
*You might need to wait a few moments after launching the run.*

In [None]:
!cat /tmp/launchpad_out/evaluator/0

#### Tensorboard
*You might need to wait a few moments after launching the run.*

In [None]:
%tensorboard --logdir ~/mava/$mava_id/tensorboard/evaluator 

### Scaling
Mava allows for simple scaling of MARL systems. 



In [None]:
%%capture
#@title Kill old runs. (Run Cell)
!ps aux  |  grep -i launchpad  |  awk '{print $2}'  |  xargs sudo kill -9

In [None]:
#@title Logging config. (Run Cell)
# Directory to store checkpoints and log data. 
base_dir = "~/mava"

# File name 
mava_id = datetime.now().strftime("%Y-%m-%d_%H:%M:%S")

# Log every [log_every] seconds
log_every = 15
logger_factory = functools.partial(
    logger_utils.make_logger,
    directory=base_dir,
    to_terminal=True,
    to_tensorboard=True,
    time_stamp=mava_id,
    time_delta=log_every,
)

# Checkpointer appends "Checkpoints" to checkpoint_dir
checkpoint_dir = f"{base_dir}/{mava_id}"

Simply increase the **num_executors**. 

In [None]:
# networks
network_factory = lp_utils.partial_kwargs(maddpg.make_default_networks)

# distributed program
system = maddpg.MADDPG(
    environment_factory=environment_factory,
    network_factory=network_factory,
    logger_factory=logger_factory,
    num_executors=4,
    policy_optimizer=snt.optimizers.Adam(learning_rate=1e-4),
    critic_optimizer=snt.optimizers.Adam(learning_rate=1e-4),
    checkpoint_subpath=checkpoint_dir,
    max_gradient_norm=40.0,
    checkpoint=False,

    # Record agents in environment. 
    eval_loop_fn=MonitorParallelEnvironmentLoop,
    eval_loop_fn_kwargs={"path": checkpoint_dir, "record_every": 10, "fps": 5},
).build()

# Ensure only trainer runs on gpu, while other processes run on cpu. 
local_resources = lp_utils.to_device(program_nodes=system.groups.keys(),nodes_on_gpu=["trainer"])

lp.launch(
    system,
    lp.LaunchType.LOCAL_MULTI_PROCESSING,
    terminal="output_to_files",
    local_resources=local_resources,
)

##### View logs
*You might need to wait a few moments after launching the run.*

In [None]:
!cat /tmp/launchpad_out/evaluator/0

#### Tensorboard
*You might need to wait a few moments after launching the run.*

In [None]:
%tensorboard --logdir ~/mava/$mava_id/tensorboard/evaluator 

## For more examples using different systems, environments and architectures, visit our [GitHub page](https://github.com/instadeepai/Mava/tree/develop/examples).