Google Colab Setup
---

Make sure to select a GPU under Runtime > Change runtime type > Hardware accelerator.

In [None]:
#@title << Setup Google Colab by running this cell {display-mode: "form"}
import sys
if 'google.colab' in sys.modules:
    # Clone GitHub repository
    !git clone https://github.com/pacm/rl-workshop.git
        
    # Copy files required to run the code
    !cp -r "rl-workshop/agents" "rl-workshop/env" "rl-workshop/helpers" "rl-workshop/videos" .
    
    # Install packages via pip
    !pip install -r "rl-workshop/colab-requirements.txt"
    
    # Restart Runtime
    import os
    os.kill(os.getpid(), 9)

File structure
---

Take a moment to look at the repo structure

```
├── Notebooks, Readme, packages ..
├── agents: RL agents implementation
│   ├── curiosity.py
│   ├── dqn.py
│   ├── qlearning.py
│   └── random.py
├── env: Workshop RL environment
│   ├── 16ShipCollection.png
│   ├── Inconsolata-Bold.ttf, ..
│   └── env.py
├── helpers: Helpers to train, test, inspect agents
│   └── rl_helpers.py
└── videos: Save videos of your best agents here!
    └── video.mp4
```

In [None]:
%run env/env.py
%run helpers/rl_helpers.py
%run agents/dqn.py
%run agents/qlearning.py
%run agents/random.py

In [None]:
# You might want to import other libraries
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from PIL import Image

Presentation of the RL environment
---

After creating the environment, you will need to call `reset()` to initialize it

In [None]:
# Create environment
env = DeliveryDrones()
env.env_params.update({
    'n_drones': 3, 'rgb_render_rescale': 1.0,
    'packets_factor': 3, 'dropzones_factor': 2, 'stations_factor': 2, 'skyscrapers_factor': 3
})
states = env.reset()

# Render in text
print(env.render(mode='ainsi'))

In [None]:
# Render as an RGB image
Image.fromarray(env.render(mode='rgb_array'))

Presentation of the observation spaces
---

By default, the environment returns `ground` and `air` grids as observations

In [None]:
# Observations are returned after env.reset() or env.step() calls
print(states)

In [None]:
# We can inspect what's on the ground
states['ground'].grid

Use **observation wrappers** to produce states that can be used with RL agents. See OpenAI Gym code [here](https://github.com/openai/gym/blob/c6a97e17ee392b5bbfd297fb3b49ab86b6d94836/gym/core.py#L252)
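
As a rough illustration of the wrapper pattern, here is a minimal sketch that does not depend on Gym or on the workshop code. `ToyEnv`, `ObservationWrapper` (a simplified stand-in for the Gym class of the same name) and `ManhattanDistance` are hypothetical names invented for this example. The idea is that the wrapper intercepts whatever `reset()` and `step()` return and passes it through `observation()`.

```python
class ToyEnv:
    """Hypothetical stand-in environment: one (x, y) observation per drone index."""

    def reset(self):
        self.t = 0
        return {0: (0, 0), 1: (2, 3)}

    def step(self, actions):
        # Drone 0 moves one cell along x per step, drone 1 stays put
        self.t += 1
        obs = {0: (self.t, 0), 1: (2, 3)}
        rewards = {0: 0.0, 1: 0.0}
        dones = {0: False, 1: False}
        return obs, rewards, dones, {}


class ObservationWrapper:
    """Simplified wrapper base class: only observations are transformed."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.observation(self.env.reset())

    def step(self, actions):
        obs, rewards, dones, info = self.env.step(actions)
        return self.observation(obs), rewards, dones, info

    def observation(self, obs):
        raise NotImplementedError


class ManhattanDistance(ObservationWrapper):
    """Replace each (x, y) position by its Manhattan distance to a fixed target."""

    TARGET = (2, 3)  # Arbitrary target cell, chosen for the example

    def observation(self, obs):
        tx, ty = self.TARGET
        return {drone: abs(x - tx) + abs(y - ty) for drone, (x, y) in obs.items()}


wrapped = ManhattanDistance(ToyEnv())
first = wrapped.reset()
second, _, _, _ = wrapped.step({0: 'right', 1: 'stay'})
print(first)   # {0: 5, 1: 0}
print(second)  # {0: 4, 1: 0}
```

The agent only ever sees the transformed observations, so the same underlying environment can feed a Q-table agent or a neural network just by swapping the wrapper.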

In [None]:
# Observation wrappers for Q-table RL methods:
# - CompassQTable, CompassChargeQTable, LidarCompassQTable, LidarCompassChargeQTable
env = LidarCompassChargeQTable(DeliveryDrones())
env.env_params.update({
    'n_drones': 3, 'rgb_render_rescale': 1.0,
    'packets_factor': 3, 'dropzones_factor': 2, 'stations_factor': 2, 'skyscrapers_factor': 3
})
states = env.reset()
print('states:', env.observation(states))
Image.fromarray(env.render(mode='rgb_array'))

In [None]:
{drone: env.format_state(state) for drone, state in states.items()}

Presentation of WindowedGridView
---

This is the "official" wrapper for the competition

```
Observation wrapper: (N, N, 6) numerical arrays with location of
(0) drones         marked with                   1 / 0 otherwise
(1) packets        marked with                   1 / 0 otherwise
(2) dropzones      marked with                   1 / 0 otherwise
(3) stations       marked with                   1 / 0 otherwise
(4) drones charge  marked with   charge level 0..1 / 0 otherwise
(5) obstacles      marked with                   1 / 0 otherwise
Where N is the size of the window, i the number of drones
```
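
To make the layout above more concrete, here is a minimal pure-Python sketch of cutting one square window out of a single layer. The `window` helper, the window size `N = 2 * radius + 1`, and the choice to mark cells outside the grid as obstacles are assumptions made for illustration, not the workshop's implementation.

```python
def window(grid, center, radius, fill=0):
    """Return the (2 * radius + 1) square view of `grid` centered on `center`.

    Cells that fall outside the grid are set to `fill`.
    """
    cy, cx = center
    rows = []
    for y in range(cy - radius, cy + radius + 1):
        row = []
        for x in range(cx - radius, cx + radius + 1):
            inside = 0 <= y < len(grid) and 0 <= x < len(grid[0])
            row.append(grid[y][x] if inside else fill)
        rows.append(row)
    return rows


# Made-up obstacle layer: a single skyscraper at row 1, column 1
obstacles = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

# View of a drone sitting in the top-left corner; outside cells count as obstacles
view = window(obstacles, center=(0, 0), radius=1, fill=1)
for row in view:
    print(row)
# [1, 1, 1]
# [1, 0, 0]
# [1, 0, 1]
```

Stacking one such view per layer (drones, packets, dropzones, ...) would give an array shaped like the `(N, N, 6)` observation described above.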

In [None]:
env = WindowedGridView(DeliveryDrones(), radius=2)
states = env.reset()
Image.fromarray(env.render(mode='rgb_array'))

In [None]:
{drone: env.format_state(state) for drone, state in states.items()}

In [None]:
states[0][:, :, 5] # Obstacles from the perspective of drone 0

Create and run agents
---

After creating your agents, you can run them with the `test_agents()` function
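
For intuition, an evaluation loop of this kind usually looks like the sketch below. This is not the code of `test_agents()`: `CoinEnv`, `ConstantAgent`, `run_agents` and the `act()` method name are all hypothetical, chosen only to show how a per-drone rewards log can be built.

```python
class CoinEnv:
    """Hypothetical two-drone environment: action 1 earns +1, action 0 earns 0."""

    drones = (0, 1)

    def reset(self):
        return {d: 0 for d in self.drones}

    def step(self, actions):
        rewards = {d: float(a) for d, a in actions.items()}
        obs = {d: 0 for d in self.drones}
        dones = {d: False for d in self.drones}
        return obs, rewards, dones, {}


class ConstantAgent:
    """Hypothetical agent that always picks the same action."""

    def __init__(self, action):
        self.action = action

    def act(self, state):
        return self.action


def run_agents(env, agents, n_steps):
    """Step the environment n_steps times and log each agent's rewards."""
    rewards_log = {key: [] for key in agents}
    states = env.reset()
    for _ in range(n_steps):
        # Each agent picks an action from its own observation
        actions = {key: agent.act(states[key]) for key, agent in agents.items()}
        states, rewards, dones, _ = env.step(actions)
        for key in agents:
            rewards_log[key].append(rewards[key])
    return rewards_log


log = run_agents(CoinEnv(), {0: ConstantAgent(1), 1: ConstantAgent(0)}, n_steps=3)
print(log)  # {0: [1.0, 1.0, 1.0], 1: [0.0, 0.0, 0.0]}
```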

In [None]:
# Create and setup the environment
env = CompassQTable(DeliveryDrones())
env.env_params.update({
    'n_drones': 3, 'rgb_render_rescale': 1.0,
    'pickup_reward': 0, 'delivery_reward': 1, 'crash_reward': -1, 'charge_reward': -0.1
})
states = env.reset()

# Create the agents
agents = {drone.index: RandomAgent(env) for drone in env.drones}
agents

In [None]:
# Run agents
rewards_log = test_agents(env, agents, n_steps=1000, seed=0)

# Print rewards
for drone_index, rewards in rewards_log.items():
    print('Drone {} rewards: {} ..'.format(drone_index, rewards[:10]))

And visualize the rewards with the helper functions

In [None]:
plot_cumulative_rewards(
    rewards_log,
    events={'pickup': [1], 'crash': [-1]}, # Optional, default: pickup/crash ±1
    drones_labels={0: 'My drone'}, # Optional, default: drone index 
)

Train agents
---

To train your agents, you will need to use the `MultiAgentTrainer` class
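
The Q-learning agent trained in this section takes `gamma`, `alpha` and three epsilon parameters. As a reminder of what those typically control, here is a minimal tabular Q-learning sketch. `TabularQLearner` and its `act()` and `learn()` methods are hypothetical and are not the workshop's `QLearningAgent`. In particular, the multiplicative per-update epsilon decay and the choice of 5 actions are assumptions.

```python
import random
from collections import defaultdict


class TabularQLearner:
    """Minimal tabular Q-learning with an epsilon-greedy policy (illustrative only)."""

    def __init__(self, n_actions, gamma=0.9, alpha=0.1,
                 epsilon_start=1.0, epsilon_decay=0.99, epsilon_end=0.01, seed=0):
        self.n_actions = n_actions
        self.gamma = gamma    # Discount factor for future rewards
        self.alpha = alpha    # Learning rate
        self.epsilon = epsilon_start
        self.epsilon_decay = epsilon_decay
        self.epsilon_end = epsilon_end
        self.q = defaultdict(lambda: [0.0] * n_actions)  # Q-table: state -> action values
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore with probability epsilon, otherwise take the greedy action
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return values.index(max(values))

    def learn(self, state, action, reward, next_state, done):
        # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
        # Assumed schedule: multiply epsilon after each update, never below epsilon_end
        self.epsilon = max(self.epsilon_end, self.epsilon * self.epsilon_decay)


learner = TabularQLearner(n_actions=5)
learner.learn(state='s0', action=2, reward=1.0, next_state='s1', done=False)
print(learner.q['s0'][2])  # 0.1
print(learner.epsilon)     # 0.99
```

With these values, epsilon reaches its floor of 0.01 after roughly 460 updates, so most of a long training run would be spent acting greedily.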

In [None]:
# Create and setup the environment
env = CompassQTable(DeliveryDrones())
env.env_params.update({'n_drones': 3, 'skyscrapers_factor': 0, 'charge_reward': 0, 'discharge': 0})
states = env.reset()

# Create the agents
agents = {drone.index: RandomAgent(env) for drone in env.drones}
agents[0] = QLearningAgent(env, gamma=0.9, alpha=0.1, epsilon_start=1, epsilon_decay=0.99, epsilon_end=0.01)
agents

In [None]:
# Create trainer
trainer = MultiAgentTrainer(env, agents, reset_agents=True, seed=0)

# Train with different grids
trainer.train(5000)

# Print rewards
for drone_index, rewards in trainer.rewards_log.items():
    print('Drone {} rewards: {} ..'.format(drone_index, rewards[:10]))

And visualize training with the helper functions

In [None]:
plot_rolling_rewards(
    trainer.rewards_log,
    events={'pickup': [1], 'crash': [-1]}, # Optional, default: pickup/crash ±1
    drones_labels={0: 'My drone'}, # Optional, default: drone index 
)

In [None]:
plot_cumulative_rewards(trainer.rewards_log)
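
The two kinds of curves plotted above can be derived from a plain list of per-step rewards. The sketch below is a guess at the underlying arithmetic and not the helpers' actual code: `cumulative` and `rolling_mean` are hypothetical names, the rewards are made up, and using a mean (rather than a sum) over the rolling window is an assumption.

```python
from itertools import accumulate


def cumulative(rewards):
    """Running total of the rewards, one value per step."""
    return list(accumulate(rewards))


def rolling_mean(rewards, window):
    """Mean of the last `window` rewards (fewer at the start of the log)."""
    out = []
    total = 0.0
    for i, r in enumerate(rewards):
        total += r
        if i >= window:
            total -= rewards[i - window]  # Drop the reward that left the window
        out.append(total / min(i + 1, window))
    return out


rewards = [0, 1, 0, -1, 1, 1]  # Made-up log for one drone
print(cumulative(rewards))       # [0, 1, 1, 0, 1, 2]
print(rolling_mean(rewards, 2))  # [0.0, 0.5, 0.5, -0.5, 0.0, 1.0]
```

The cumulative curve shows the overall score, while the rolling curve makes it easier to see whether the agent is still improving late in training.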

Test agents
---

In [None]:
rewards_log = test_agents(env, agents, n_steps=1000, seed=0)
plot_cumulative_rewards(rewards_log, drones_labels={0: 'My drone'})

Visualize a "run"
---

Share videos of your best agents! `#AMLD2020`

In [None]:
import os  # Only imported inside the Colab setup cell so far, so import it here

path = os.path.join('videos', 'intro-run.mp4')
render_video(env, agents, video_path=path, n_steps=120, fps=1, seed=None)

In [None]:
ColabVideo(path)