# Introduction to Flatland Rail for Multi-Agent Reinforcement Learning  

In this notebook, we will learn how to:  
   * Create a Flatland railway environment.  
   * Build a random agent to take actions in the environment.
   * Visualize a rollout of actions.

In [None]:
import numpy as np
from flatland.envs.observations import GlobalObsForRailEnv
from flatland.envs.rail_env import RailEnv
from environments.custom_rail_generator import simple_rail_generator
from environments.custom_schedule_generator import sparse_schedule_generator
from environments.observations import TreeObsForRailEnv
from environments.visualization_utils import animate_env, get_patch, render_env

# Start virtual display before importing RenderTool
from pyvirtualdisplay import Display
display = Display(visible=0, size=(1024, 768))
display.start()
    
from flatland.utils.rendertools import RenderTool, AgentRenderVariant

# Observations

First, we will set up the environment and discuss the observations available.

In [None]:
# 1. Let's see how the environment looks with 3 trains
n_trains = 3

# 2. Set all trains to have the same constant speed
speed_ratio_map = {1.: 1.}

# 3. Rail generator creates railway track in environment
rail_generator = simple_rail_generator(n_trains=n_trains, seed=42)

# 4. Schedule generator assigns starting positions and targets to trains
schedule_generator = sparse_schedule_generator(speed_ratio_map)

# 5. Build the observation vectors for agents in the RailEnv environment - more on this later
obs_builder_object = TreeObsForRailEnv(max_depth=2, predictor=None)

env = RailEnv(
            width=20,
            height=8,
            number_of_agents=n_trains,
            rail_generator=rail_generator,
            schedule_generator=schedule_generator,
            obs_builder_object=obs_builder_object,
            remove_agents_at_target=True,  # Removes agents at the end of their journey to make space for others
        )

In [None]:
# Instantiate Renderer
env_renderer = RenderTool(env, gl="PILSVG",
                          agent_render_variant=AgentRenderVariant.AGENT_SHOWS_OPTIONS_AND_BOX,
                          show_debug=False,
                          screen_height=726,
                          screen_width=1240)
env.reset()
env_renderer.reset()

Let's see what a single snapshot of the environment looks like.

In [None]:
render_env(env_renderer, show_observations=False)

Observations in Flatland form a tree structure.  



<img src="https://storage.cloud.google.com/gtc-2020/images/flatland_obs.png" align="left">

[Image from Flatland-rl-docs]

# Features

Each node is filled with information gathered along the path to the node:

1: If the agent’s own target lies on the explored branch, the current distance from the agent (in number of cells) is stored.

2: If another agent’s target is detected, the distance (in number of cells) from the current agent position is stored.

3: If another agent is detected, the distance (in number of cells) from the current agent position is stored.

4: Possible conflict detected (using a predictor).

5: Distance to an unusable switch (for this agent), if detected. An unusable switch is a switch where the agent does not have any choice of path, but other agents coming from different directions might.

6: Distance (in number of cells) to the next node (e.g. switch, target, or dead-end).

7: Minimum remaining travel distance from this node to the agent’s target, given the direction of the agent, if this path is chosen.

8: Number of agents travelling in the same direction found on the path to the node.

9: Number of agents travelling in the opposite direction found on the path to the node.
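
As a rough illustration, the nine features above could be held in one small record per node. The field names below are assumptions chosen to mirror the list, and the use of `inf` for "nothing detected" is likewise an assumption; neither is taken from the `TreeObsForRailEnv` implementation imported in this notebook.

```python
import math
from typing import NamedTuple


class NodeFeatures(NamedTuple):
    """Hypothetical container for the nine per-node features listed above."""
    dist_own_target: float             # 1: distance to own target on this branch
    dist_other_target: float           # 2: distance to another agent's target
    dist_other_agent: float            # 3: distance to another agent
    dist_potential_conflict: float     # 4: distance to a predicted conflict
    dist_unusable_switch: float        # 5: distance to an unusable switch
    dist_to_next_node: float           # 6: distance to the next node
    dist_min_to_target: float          # 7: minimum remaining distance to target
    n_agents_same_direction: int       # 8: agents heading the same way
    n_agents_opposite_direction: int   # 9: agents heading the opposite way


def to_vector(node, missing=-1.0):
    """Flatten a node to a list of floats, replacing inf (assumed to mean
    'nothing detected') with the `missing` placeholder."""
    return [missing if math.isinf(v) else float(v) for v in node]


node = NodeFeatures(
    dist_own_target=math.inf,
    dist_other_target=4,
    dist_other_agent=math.inf,
    dist_potential_conflict=math.inf,
    dist_unusable_switch=2,
    dist_to_next_node=3,
    dist_min_to_target=12,
    n_agents_same_direction=0,
    n_agents_opposite_direction=1,
)
print(to_vector(node))
```

Flattening every node of the tree in a fixed order would then give a single fixed-length vector per train.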

Run the cell below to visualize the observations available to each train in our environment.

In [None]:
render_env(env_renderer, show_observations=True)

Just as in real life, no train can see the entire rail network or the whole fleet of other trains.  
Instead, each train observes the track ahead only up to a prescribed tree depth (`max_depth=2` in the environment built above).
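
To get a feel for how quickly the observation grows with depth, here is a back-of-the-envelope calculation. It assumes every node expands into four children (left, forward, right, back), as in Flatland's stock tree observation, and that each node carries the nine features listed earlier. The custom `TreeObsForRailEnv` imported in this notebook may differ on both counts.

```python
def tree_obs_size(max_depth, branching=4, features_per_node=9):
    """Number of values in a flattened tree observation.

    Assumes a full tree: the root plus `branching` children per node,
    down to `max_depth` levels below the root.
    """
    n_nodes = sum(branching ** depth for depth in range(max_depth + 1))
    return n_nodes * features_per_node


# Depth 2: 1 + 4 + 16 = 21 nodes, each with 9 features.
print(tree_obs_size(2))  # 189
```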

# Actions

  * We have introduced the observation space.
  * Now we will look at the action space.

In [None]:
print("Number of actions available: ", env.action_space[0])

As we can see, there are up to 5 actions available at each step.  
These are:  

    0 Do Nothing: 
        If the train is already moving, it continues moving.  
        If it is already stopped, it remains stopped.  
        
    1 Deviate Left: 
        If the train is at a switch with a transition to its left, the train will choose the left path.  
        Otherwise this action has no effect.  
        If the train is stopped, this action will start train movement again if allowed by the transitions.  
        
    2 Go Forward:
        This action will start the train when stopped.  
        This will move the train forward and choose the straight path at switches.
        
    3 Deviate Right: 
        Same as deviate left but for right turns.
        
    4 Stop: 
        Causes the train to stop moving.
        
The figure below shows how the available actions lead to a decision tree when considering the next state into which the train can transition.

<img src="https://storage.cloud.google.com/gtc-2020/images/flatland_tree.png" align="left">

[Image from Flatland-rl-docs]
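
For readability, the integer action codes described in this section can be given names. Flatland provides a comparable enum of its own; the one below is a standalone stand-in, with member names chosen here for illustration, so that it runs without the library.

```python
from enum import IntEnum


class TrainAction(IntEnum):
    """Illustrative names for the five action codes (not Flatland's own enum)."""
    DO_NOTHING = 0
    DEVIATE_LEFT = 1
    GO_FORWARD = 2
    DEVIATE_RIGHT = 3
    STOP = 4


def describe(action_dict):
    """Translate {train index: action code} into readable action names."""
    return {train: TrainAction(code).name for train, code in action_dict.items()}


print(describe({0: 2, 1: 4, 2: 0}))
# {0: 'GO_FORWARD', 1: 'STOP', 2: 'DO_NOTHING'}
```

Because `IntEnum` members compare equal to plain integers, a dictionary built from them has the same integer action codes as the one produced by sampling raw integers.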

# Rewards

The following reward function is used in Flatland to give feedback to our agents:

   * 'step_penalty' = -1: for every time-step taken in the environment, regardless of the action taken by the agent. Intuitively, this encourages the agent to finish as quickly as possible by taking the shortest path to its target.
   * 'global_reward' = +1: awarded every time an agent reaches its target destination.
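
The two terms can be combined into a per-step reward and an episode return, as in the sketch below. It follows the description above literally and adds one assumption of its own: the step penalty is still charged on the step in which the train arrives. Flatland's actual bookkeeping may differ.

```python
STEP_PENALTY = -1   # charged on every time step
GLOBAL_REWARD = 1   # granted when the train reaches its target


def step_reward(reached_target_this_step):
    """Reward for a single train on a single step."""
    reward = STEP_PENALTY
    if reached_target_this_step:
        reward += GLOBAL_REWARD
    return reward


def episode_return(steps_taken, reached_target):
    """Undiscounted return for one train; arrival, if any, is on the last step."""
    return sum(
        step_reward(reached_target and t == steps_taken - 1)
        for t in range(steps_taken)
    )


print(episode_return(10, reached_target=True))   # -9
print(episode_return(10, reached_target=False))  # -10
```

Under these assumptions, a train that arrives sooner always earns a higher return, which is the incentive the step penalty is meant to create.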

In the rest of this notebook, we are going to create a simple agent that chooses a random action for each of the trains in the environment.

In [None]:
def random_agent(n_trains, n_actions):
    """Generates actions from a random policy."""
    action_dict = {}
    for idx in range(n_trains):
        action_dict[idx] = np.random.randint(0, n_actions)
    return action_dict

In [None]:
# Visualize a random rollout in the environment
env.reset()
env_renderer.reset()

frames = []
n_actions = env.action_space[0]
for step in range(10):
    # 1. Sample actions from agent
    action_dict = random_agent(n_trains, n_actions)
    
    # 2. Each train takes a step in the environment at the same time
    obs, rewards, done, info = env.step(action_dict)
    
    # 3. Render results. Change show_observations=True to see observations during the rollout
    env_renderer.render_env(show=False, frames=False, show_observations=False)
    frames.append(env_renderer.gl.get_image())
    
animate_env(frames)
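
The loop above always runs for ten steps and discards the rewards. One possible variation, sketched below, accumulates each train's return and stops early once every train is done. It relies on the `done` dictionary carrying an `"__all__"` key, as returned by Flatland's `RailEnv.step`. `ToyEnv` is a hypothetical stand-in that only imitates that return shape so the snippet runs without Flatland; it is not part of the library.

```python
import random
from collections import defaultdict


class ToyEnv:
    """Hypothetical stand-in for RailEnv: train i arrives after a fixed number of steps."""

    def __init__(self, steps_to_finish):
        self.steps_to_finish = dict(steps_to_finish)
        self.t = 0

    def _reward(self, arrival_step):
        if self.t > arrival_step:
            return 0                      # already arrived, nothing more to earn
        return -1 + (1 if self.t == arrival_step else 0)

    def step(self, action_dict):
        # Actions are ignored in this toy model; only the clock advances.
        self.t += 1
        rewards = {i: self._reward(n) for i, n in self.steps_to_finish.items()}
        done = {i: self.t >= n for i, n in self.steps_to_finish.items()}
        done["__all__"] = all(done.values())
        return None, rewards, done, {}


def run_rollout(env, policy, max_steps):
    """Step the environment until all trains are done or max_steps is reached."""
    returns = defaultdict(int)
    steps_run = 0
    for _ in range(max_steps):
        obs, rewards, done, info = env.step(policy())
        steps_run += 1
        for idx, reward in rewards.items():
            returns[idx] += reward
        if done["__all__"]:
            break
    return dict(returns), steps_run


rng = random.Random(0)


def random_policy():
    # One random action code (0-4) per train, keyed by train index.
    return {idx: rng.randrange(5) for idx in range(3)}


returns, steps_run = run_rollout(ToyEnv({0: 2, 1: 3, 2: 4}), random_policy, max_steps=10)
print(returns, steps_run)
```

With the real environment, the same `run_rollout` function could be called with `env` and a policy wrapping `random_agent`, provided the rendering calls are added back inside the loop.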

### Up Next: TensorFlow Models...

In this section, we built a Flatland rail environment, investigated the observation and action spaces, and ran a random agent.

In the next section, we will discuss how to use TensorFlow to build a neural network as a function approximator for mapping states to actions.