# MAVA Quickstart Notebook
<img src="https://raw.githubusercontent.com/instadeepai/Mava/develop/docs/images/mava.png" />

### A guide to installing Mava, then creating and training your first multi-agent system on Flatland

For more details about Mava and an overview of its design/features, please visit our [repo](https://github.com/instadeepai/Mava). 

<a href="https://colab.research.google.com/github/instadeepai/amld-africa-2021/blob/main/Part-I/mava_flatland_quickstart.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## 1. Installation

In [None]:
#@title Install Mava and Some Supported Environments (Run Cell)
%%capture
!pip install git+https://github.com/instadeepai/Mava#egg=id-mava[reverb,tf,launchpad,envs]

In [None]:
#@title Installs and Imports for Agent Visualization (Run Cell)
%%capture
! pip install git+https://github.com/instadeepai/Mava#egg=id-mava[record_episode]
! apt-get update -y && apt-get install -y xvfb python-opengl ffmpeg && pip install pyvirtualdisplay

import os
from IPython.display import HTML
from pyvirtualdisplay import Display

display = Display(visible=0, size=(1024, 768))
display.start()
os.environ["DISPLAY"] = ":" + str(display.display)

In [None]:
#@title Install Flatland (Run Cell)
%%capture
!pip install flatland-rl

## 2. Import Modules

In [None]:
#@title Import Modules (Run Cell)
import functools
from datetime import datetime
from typing import Any, Dict, Mapping, Sequence, Union

import glob
import os 
import IPython

import launchpad as lp
import numpy as np
import sonnet as snt
import tensorflow as tf
from absl import app, flags
from acme import types
from mava.components.tf import networks
from acme.tf import utils as tf2_utils

from flatland.envs.observations import TreeObsForRailEnv
from flatland.envs.predictions import ShortestPathPredictorForRailEnv
from flatland.envs.rail_generators import sparse_rail_generator
from flatland.envs.schedule_generators import sparse_schedule_generator

from mava import specs as mava_specs
from mava.systems.tf import madqn
from mava.utils import lp_utils
from mava.utils.environments.flatland_utils import flatland_env_factory
from mava.wrappers import MonitorParallelEnvironmentLoop
from mava.components.tf import architectures
from mava.utils.loggers import logger_utils
from mava.components.tf.modules.exploration.exploration_scheduling import (
    LinearExplorationScheduler,
)

## 3. Launch a Multi-Agent Reinforcement Learning (MARL) `DQN` System

### Define Agent Networks
We will use the default agent networks for the `madqn` system: a simple feedforward neural network with a single hidden layer of 128 units.

In [None]:
network_factory = lp_utils.partial_kwargs(
    madqn.make_default_networks,
    policy_networks_layer_sizes=(128,)
)
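
To make the shape of such a network concrete, the sketch below runs a forward pass through a one-hidden-layer network in plain Python. It is an illustration only, not Mava's implementation: the observation size (231), the number of actions (5), the ReLU activation, and the helper names `init_layer`, `dense`, and `q_values` are all assumptions made for this example.

```python
import random

OBS_DIM = 231     # assumed observation size, for illustration only
HIDDEN = 128      # one hidden layer of 128 units, as in the text above
NUM_ACTIONS = 5   # assumed number of discrete actions


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Apply a dense layer: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def q_values(obs, hidden_layer, output_layer):
    """Observation -> hidden layer (ReLU) -> one Q-value per action."""
    hidden = [max(0.0, h) for h in dense(obs, hidden_layer)]
    return dense(hidden, output_layer)


rng = random.Random(0)
hidden_layer = init_layer(OBS_DIM, HIDDEN, rng)
output_layer = init_layer(HIDDEN, NUM_ACTIONS, rng)

obs = [rng.random() for _ in range(OBS_DIM)]
q = q_values(obs, hidden_layer, output_layer)

# A greedy agent picks the action with the highest Q-value.
greedy_action = max(range(NUM_ACTIONS), key=lambda a: q[a])
print(len(q), greedy_action)
```

In a DQN-style system each agent's network produces one value per action, and the agent acts greedily on those values except when it explores.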

### Select Environment
We will use Flatland with the following configuration:
* 3 trains.
* A map that is 25 cells wide and 25 cells high.
* Infrastructure generated with the `sparse_rail_generator`. The infrastructure consists of cities, each containing at most 3 rails, connected to one another by at most 2 rails.
* Scenario generated with the `sparse_schedule_generator`. It creates a task for each agent by selecting a starting city and a target city.
* Tree observation with `max_depth=2`, which bounds how deep the branches are explored.

<img src="https://i.imgur.com/sGBBhzJ.png">
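
The depth of the tree observation directly controls the size of each agent's observation vector. The sketch below estimates that size. It assumes a branching factor of 4 and 11 features per node, which may differ between Flatland versions, so treat the numbers as indicative and check the Flatland documentation for your installed release. The function names are hypothetical helpers, not part of Flatland.

```python
def tree_node_count(max_depth, branching=4):
    """Number of nodes in a full tree: branching**d nodes at each depth d, root included."""
    return sum(branching ** depth for depth in range(max_depth + 1))


def tree_observation_size(max_depth, features_per_node=11, branching=4):
    """Length of the flattened observation vector for one agent (assumed layout)."""
    return tree_node_count(max_depth, branching) * features_per_node


for depth in (1, 2, 3):
    print(depth, tree_node_count(depth), tree_observation_size(depth))
```

Under these assumptions, a depth of 2 gives 21 nodes and 231 features per agent, and each extra level roughly quadruples the observation size.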


In [None]:
# flatland environment config
rail_gen_cfg: Dict = {
    "max_num_cities": 3,
    "max_rails_between_cities": 2,
    "max_rails_in_city": 3,
    "grid_mode": True,
    "seed": 0,
}

flatland_env_config: Dict = {
    "number_of_agents": 3,
    "width": 25,
    "height": 25,
    "rail_generator": sparse_rail_generator(**rail_gen_cfg),
    "schedule_generator": sparse_schedule_generator(),
    "obs_builder_object": TreeObsForRailEnv(
        max_depth=2, predictor=ShortestPathPredictorForRailEnv()
    ),
}

environment_factory = functools.partial(
    flatland_env_factory, env_config=flatland_env_config, include_agent_info=False
)

### Specify Logging and Checkpointing 
We will log to TensorBoard so that we can monitor training progress.

In [None]:
# Directory to store checkpoints and log data. 
base_dir = "~/mava"

# File name 
mava_id = datetime.now().strftime("%Y-%m-%d_%H:%M:%S")

# Log every [log_every] seconds
log_every = 10
logger_factory = functools.partial(
    logger_utils.make_logger,
    directory=base_dir,
    to_terminal=True,
    to_tensorboard=True,
    time_stamp=mava_id,
    time_delta=log_every,
)

# Checkpointer appends "Checkpoints" to checkpoint_dir
checkpoint_dir = f"{base_dir}/{mava_id}"
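
Each run gets its own directory named after a timestamp. Shell cells expand `~` automatically, but plain Python path strings do not, so code that reads these files from Python has to expand it explicitly. The sketch below shows one way to build the run directory; `run_directory` is a hypothetical helper for this example, and the fixed date is used only to make the output reproducible.

```python
import os
from datetime import datetime


def run_directory(base_dir, run_id):
    """Join base directory and run id, expanding a leading '~' to the home directory."""
    return os.path.join(os.path.expanduser(base_dir), run_id)


# A fixed timestamp, formatted the same way as the run id in the cell above.
run_id = datetime(2021, 9, 1, 12, 30, 0).strftime("%Y-%m-%d_%H:%M:%S")
run_dir = run_directory("~/mava", run_id)

print(run_id)   # 2021-09-01_12:30:00
print(run_dir)
```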

### Create the Multi-Agent DQN System

In [None]:
system = madqn.MADQN(
    environment_factory=environment_factory,
    network_factory=network_factory,
    logger_factory=logger_factory,
    num_executors=2,
    exploration_scheduler_fn=LinearExplorationScheduler,
    epsilon_min=0.05,
    epsilon_decay=3e-4,
    max_replay_size=100_000,
    executor_variable_update_period=100,
    # NOTE (Claude) Uncomment the line below to enable prioritized experience replay;
    # comment it out again to disable it.
    # importance_sampling_exponent=0.2,
    optimizer=snt.optimizers.Adam(learning_rate=1e-2),
    checkpoint=True,
    batch_size=256,
    # Record agents in environment. 
    eval_loop_fn=MonitorParallelEnvironmentLoop,
    eval_loop_fn_kwargs={"path": checkpoint_dir, "record_every": 10, "fps": 5},
).build()
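
The `epsilon_min` and `epsilon_decay` arguments shape how quickly the agents move from random exploration to acting on their learned values. The sketch below shows a generic linear schedule with those two values. It is not Mava's `LinearExplorationScheduler` itself: the starting value of 1.0 and the assumption that epsilon is reduced by `epsilon_decay` once per step are made for illustration, and `linear_epsilon` is a hypothetical function.

```python
def linear_epsilon(step, epsilon_start=1.0, epsilon_min=0.05, epsilon_decay=3e-4):
    """Exploration rate after `step` decrements of a linear schedule, floored at epsilon_min."""
    return max(epsilon_min, epsilon_start - epsilon_decay * step)


# Number of decrements needed to go from the start value down to the floor.
steps_to_min = (1.0 - 0.05) / 3e-4

for step in (0, 1000, 2000, 4000):
    print(step, round(linear_epsilon(step), 3))
print(round(steps_to_min))
```

Under these assumptions the floor of 0.05 is reached after roughly 3,200 decrements, after which the agents still take a random action 5% of the time.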

### Launch the Distributed Program

In [None]:
# Ensure only trainer runs on gpu, while other processes run on cpu. 
local_resources = lp_utils.to_device(
    program_nodes=system.groups.keys(), nodes_on_gpu=["trainer"]
)

lp.launch(
    system,
    lp.LaunchType.LOCAL_MULTI_PROCESSING,
    terminal="output_to_files",
    local_resources=local_resources,
)

## 4. Logs and Outputs

### View outputs from the evaluator process.
*You might need to wait a few moments after launching the run.*
The `CUDA_ERROR_NO_DEVICE` error is expected since the GPU is only used by the trainer. 

In [None]:
!cat /tmp/launchpad_out/evaluator/0
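
The evaluator log can grow long over a training run. As an alternative to printing the whole file, the sketch below shows only the most recent lines using the standard library. The helper name `tail` and the 20-line default are choices made for this example; the log path is the one used in the cell above.

```python
import os
from collections import deque


def tail(path, num_lines=20):
    """Return the last `num_lines` lines of a text file, or [] if the file does not exist yet."""
    if not os.path.isfile(path):
        return []
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the final lines while streaming the file.
        return [line.rstrip("\n") for line in deque(handle, maxlen=num_lines)]


for line in tail("/tmp/launchpad_out/evaluator/0"):
    print(line)
```

If nothing is printed, the evaluator has probably not written any output yet, so wait a little and run the cell again.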

### View Stored Data 
*You might need to wait a few moments after launching the run.*
You should see a directory for `tensorboard` logs and another directory for storing `recordings` of the agents.

In [None]:
! ls ~/mava/$mava_id

### Tensorboard
*You might need to wait a few moments after launching the run.*

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

To view training results, start TensorBoard and filter for the `RawEpisodeReturn` tag.

In [None]:
%tensorboard --logdir ~/mava/

### View Agent Recording
Once the agents reach a good score, you can watch their learned behaviour in the agent recordings.

#### First check if any agent recordings are available.
If no entries like `agents_11_eval_episode.html` appear below, then just wait a few more moments before trying again.

In [None]:
! ls ~/mava/$mava_id/recordings

#### Choose an agent recording to view. 
You can set the variable `RECORDING_NAME` in the cell below to any of the file names listed above to view a specific recording, e.g. `agents_11_eval_episode.html`. If the file name is invalid, we will try to choose the latest recording for you.

In [None]:
RECORDING_NAME = "agents_11_eval_episode.html"

# Directory where the evaluator stores recordings for this run.
recordings_dir = os.path.expanduser(f"~/mava/{mava_id}/recordings")

# Check whether the requested recording exists.
matches = glob.glob(os.path.join(recordings_dir, RECORDING_NAME))
latest_file = None

if matches:
    latest_file = matches[0]
    print("Running user defined recording.")
else:
    # Otherwise fall back to the most recent recording, if there is one.
    list_of_files = glob.glob(os.path.join(recordings_dir, "*.html"))
    if list_of_files:
        latest_file = max(list_of_files, key=os.path.getctime)
        print("Running latest recording.")
    else:
        print("No recordings are available yet. Please wait, or run the 'Launch the Distributed Program' cell if you haven't already done so.")

# Display the recording, if one was found.
if latest_file is not None:
    IPython.display.display(IPython.display.HTML(filename=latest_file))

## 5. Kill the Launchpad Program
When you are done training your DQN system or would like to restart a run, you can run the cell below to kill the Launchpad process.

In [None]:
# Kill old runs. (Run Cell)
%%capture
!ps aux  |  grep -i launchpad  |  awk '{print $2}'  |  xargs sudo kill -9

## For more examples using different systems, environments and architectures, visit our [GitHub page](https://github.com/instadeepai/Mava/tree/develop/examples).