# Flatland Javascript Renderer

The JavaScript renderer is in active development and is intended to replace the tkinter / PIL renderer.
This notebook shows how the JS renderer can be started from a notebook and displayed in an "iframe".  It starts a Flask server in a background thread, listening on localhost, port 8080.
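
The background-thread pattern mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the renderer's code: it swaps Flask for `http.server`, and `PlaceholderHandler`, `start_background_server` and `fetch` are hypothetical names introduced here.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PlaceholderHandler(BaseHTTPRequestHandler):
    """Hypothetical handler; the real renderer serves its own routes."""

    def do_GET(self):
        body = b"renderer placeholder"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging so notebook output stays readable.
        pass


def start_background_server(host="127.0.0.1", port=8080):
    """Start the server in a daemon thread and return (server, thread)."""
    server = HTTPServer((host, port), PlaceholderHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


def fetch(host, port, path="/"):
    """Issue one GET request and return (status, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Because the thread is a daemon, it does not keep the kernel alive on shutdown. Passing `port=0` asks the operating system for any free port, which avoids clashing with a server that already listens on 8080.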

In [1]:
import time
import os
import numpy as np 
import threading
import webbrowser
from IPython.display import HTML

In Flatland you can use custom observation builders and predictors.<br>
Observation builders generate the observations needed by the controller.<br>
Predictors can be used for short-term prediction, which can help avoid conflicts in the network.

In [2]:
from flatland.envs.malfunction_generators import malfunction_from_params, MalfunctionParameters
from flatland.envs.observations import GlobalObsForRailEnv
# First of all we import the Flatland rail environment
from flatland.envs.rail_env import RailEnv
from flatland.envs.rail_env import RailEnvActions
from flatland.envs.rail_generators import sparse_rail_generator
from flatland.envs.schedule_generators import sparse_schedule_generator
# We also include a renderer because we want to visualize what is going on in the environment
from flatland.utils.rendertools import RenderTool, AgentRenderVariant
#from flatland.utils.flask_util import simple_flask_server

Flask static folder:  /home/jeremy/projects/aicrowd/rl-trains/flatland/notebooks/static


This is an introduction example for the Flatland 2.1.* version.<br>
Changes and highlights of this version include<br>
- Stochastic events (malfunctions)<br>
- Different travel speeds for different agents<br>
- Levels are generated using a novel generator to reflect more realistic railway networks<br>
- Agents start outside of the environment and enter at their own time<br>
- Agents leave the environment after they have reached their goal<br>
Use the new sparse_rail_generator to generate feasible network configurations with corresponding tasks<br>
Training on simple small tasks is the best way to get familiar with the environment<br>
We start by importing the necessary rail and schedule generators<br>
The rail generator will generate the railway infrastructure<br>
The schedule generator will assign tasks to all the agents within the railway network

The railway infrastructure can be built using any of the provided generators in flatland/envs/rail_generators.py<br>
Here we use the sparse_rail_generator with the following parameters

In [3]:
in_notebook = True
in_browser = True
in_native = False

In [4]:
nSize = 3

if nSize == 1: 
    width = 30  # Width of map
    height = 30  # Height of map
    nr_trains = 5  # Number of trains that have an assigned task in the env
    cities_in_map = 2  # Number of cities where agents can start or end
elif nSize == 2:
    width = 8 * 7  # Width of map
    height = 8 * 7  # Height of map
    nr_trains = 20  # Number of trains that have an assigned task in the env
    cities_in_map = 10  # Number of cities where agents can start or end
else:
    width = 16 * 7  # Width of map
    height = 9 * 7  # Height of map
    nr_trains = 50  # Number of trains that have an assigned task in the env
    cities_in_map = 20  # Number of cities where agents can start or end
    
seed = 14  # Random seed
grid_distribution_of_cities = False  # Type of city distribution, if False cities are randomly placed
max_rails_between_cities = 2  # Max number of tracks allowed between cities. This is the number of entry points to a city
max_rail_in_cities = 6  # Max number of parallel tracks within a city, representing a realistic train station
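
The size selection above can also be written as a lookup table, which keeps the three presets side by side. This is only a sketch of an alternative layout; `MapPreset`, `PRESETS` and `preset_for` are hypothetical names, and the values are copied from the cell above.

```python
from typing import NamedTuple


class MapPreset(NamedTuple):
    width: int          # Width of map
    height: int         # Height of map
    nr_trains: int      # Number of trains with an assigned task
    cities_in_map: int  # Number of cities where agents can start or end


PRESETS = {
    1: MapPreset(30, 30, 5, 2),
    2: MapPreset(8 * 7, 8 * 7, 20, 10),
    3: MapPreset(16 * 7, 9 * 7, 50, 20),
}


def preset_for(n_size):
    """Return the preset for n_size; any other value falls back to the large map."""
    return PRESETS.get(n_size, PRESETS[3])
```

With this in place, `width, height, nr_trains, cities_in_map = preset_for(nSize)` would yield the same four values as the `if` / `elif` / `else` chain.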

In [5]:
rail_generator = sparse_rail_generator(max_num_cities=cities_in_map,
                                       seed=seed,
                                       grid_mode=grid_distribution_of_cities,
                                       max_rails_between_cities=max_rails_between_cities,
                                       max_rails_in_city=max_rail_in_cities,
                                       )

The schedule generator can make very basic schedules with a start point, end point and a speed profile for each agent.<br>
The speed profiles can also be adjusted directly, as shown later on. We start by introducing a statistical<br>
distribution of speed profiles

Different agent types (trains) with different speeds.

In [6]:
speed_ration_map = {1.: 0.25,  # Fast passenger train
                    1. / 2.: 0.25,  # Fast freight train
                    1. / 3.: 0.25,  # Slow commuter train
                    1. / 4.: 0.25}  # Slow freight train
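
To make the meaning of such a map concrete, the sketch below draws one speed per agent from a map of speed to probability. It only illustrates the idea of a statistical distribution of speed profiles; how Flatland samples internally is not shown in this notebook, and `speed_ratio_map` and `sample_speeds` are hypothetical names.

```python
import random

# Same shape as the map above: speed -> share of agents with that speed.
speed_ratio_map = {1.0: 0.25, 1.0 / 2.0: 0.25, 1.0 / 3.0: 0.25, 1.0 / 4.0: 0.25}


def sample_speeds(speed_map, n_agents, seed=None):
    """Draw one speed per agent according to the probabilities in speed_map."""
    if abs(sum(speed_map.values()) - 1.0) > 1e-9:
        raise ValueError("speed probabilities must sum to 1")
    rng = random.Random(seed)
    speeds = list(speed_map.keys())
    weights = list(speed_map.values())
    return rng.choices(speeds, weights=weights, k=n_agents)
```

The check on the sum is a design choice of this sketch: a map whose shares do not add up to 1 is most likely a typo, so it fails early instead of silently renormalizing.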

We can now initiate the schedule generator with the given speed profiles

In [7]:
schedule_generator = sparse_schedule_generator(speed_ration_map)

We can furthermore pass stochastic data to the RailEnv constructor which will allow for stochastic malfunctions<br>
during an episode.

In [8]:
stochastic_data = MalfunctionParameters(malfunction_rate=10000,  # Rate of malfunction occurrence
                                        min_duration=15,  # Minimal duration of malfunction
                                        max_duration=50  # Max duration of malfunction
                                        )
# Custom observation builder without predictor
observation_builder = GlobalObsForRailEnv()

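Custom observation builder with predictor. To try it, use the line below instead; this also requires importing `TreeObsForRailEnv` from `flatland.envs.observations` and `ShortestPathPredictorForRailEnv` from `flatland.envs.predictions`.<br>
`observation_builder = TreeObsForRailEnv(max_depth=2, predictor=ShortestPathPredictorForRailEnv())`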

Construct the environment with the given observation builder, generators, predictors, and stochastic data

In [9]:
env = RailEnv(width=width,
              height=height,
              rail_generator=rail_generator,
              schedule_generator=schedule_generator,
              number_of_agents=nr_trains,
              obs_builder_object=observation_builder,
              malfunction_generator_and_process_data=malfunction_from_params(stochastic_data),
              remove_agents_at_target=True)
env.reset()
pass

Initiate the renderer

In [10]:
env_renderer = RenderTool(env, gl="BROWSER", # gl="PILSVG",
                          agent_render_variant=AgentRenderVariant.ONE_STEP_BEHIND,
                          show_debug=False,
                          screen_height=600,  # Adjust these parameters to fit your resolution
                          screen_width=800)  # Adjust these parameters to fit your resolution

url = env_renderer.get_endpoint_URL()
HTML('<a href="{}">Right-click Open in New Window - {}</a>'.format(url, url))

WebSocket transport not available. Install eventlet or gevent and gevent-websocket for improved performance.


 * Serving Flask app "flatland.utils.flask_util" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


 * Running on http://127.0.0.1:8080/ (Press CTRL+C to quit)


The first thing we notice is that some agents don't have feasible paths to their target.<br>
We first look at the map we have created

env_renderer.render_env(show=True)<br>
time.sleep(2)<br>
Import your own Agent or use RLlib to train agents on Flatland<br>
As an example we use a random agent instead

In [11]:
class RandomAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
    def act(self, state):
        """
        :param state: input is the observation of the agent
        :return: returns an action
        """
        return np.random.choice([RailEnvActions.MOVE_FORWARD, RailEnvActions.MOVE_RIGHT, RailEnvActions.MOVE_LEFT,
                                 RailEnvActions.STOP_MOVING])
    def step(self, memories):
        """
        Step function to improve agent by adjusting policy given the observations
        :param memories: SARS Tuple to be
        :return:
        """
        return
    def save(self, filename):
        # Store the current policy
        return
    def load(self, filename):
        # Load a policy
        return

Initialize the agent with the parameters corresponding to the environment and observation_builder

In [12]:
controller = RandomAgent(218, env.action_space[0])

We start by looking at the information of each agent<br>
We can see the task assigned to each agent by looking at its initial position, direction and target

In [13]:
print("\n Agents in the environment have to solve the following tasks: \n")
for agent_idx, agent in enumerate(env.agents):
    print(
        ("The agent with index {} has the task to go from its initial position {}," +
        "facing in the direction {} to its target at {}.").format(
            agent_idx, agent.initial_position, agent.direction, agent.target))


 Agents in the environment have to solve the following tasks: 

The agent with index 0 has the task to go from its initial position (22, 74),facing in the direction 2 to its target at (19, 93).
The agent with index 1 has the task to go from its initial position (7, 12),facing in the direction 0 to its target at (44, 16).
The agent with index 2 has the task to go from its initial position (19, 92),facing in the direction 2 to its target at (21, 9).
The agent with index 3 has the task to go from its initial position (42, 71),facing in the direction 3 to its target at (18, 52).
The agent with index 4 has the task to go from its initial position (39, 16),facing in the direction 3 to its target at (19, 95).
The agent with index 5 has the task to go from its initial position (32, 90),facing in the direction 2 to its target at (43, 102).
The agent with index 6 has the task to go from its initial position (6, 93),facing in the direction 0 to its target at (53, 64).
The agent with index 7 has 

Each agent always has a status indicating whether it is waiting to depart, active in the environment, or done.<br>
For example we see that agent with index 0 is currently not active

In [14]:
print("\n Their current statuses are:")
print("============================")


 Their current statuses are:


In [15]:
for agent_idx, agent in enumerate(env.agents):
    print("Agent {} status is: {} with its current position being {}".format(agent_idx, str(agent.status),
                                                                             str(agent.position)))

Agent 0 status is: RailAgentStatus.READY_TO_DEPART with its current position being None
Agent 1 status is: RailAgentStatus.READY_TO_DEPART with its current position being None
Agent 2 status is: RailAgentStatus.READY_TO_DEPART with its current position being None
Agent 3 status is: RailAgentStatus.READY_TO_DEPART with its current position being None
Agent 4 status is: RailAgentStatus.READY_TO_DEPART with its current position being None
Agent 5 status is: RailAgentStatus.READY_TO_DEPART with its current position being None
Agent 6 status is: RailAgentStatus.READY_TO_DEPART with its current position being None
Agent 7 status is: RailAgentStatus.READY_TO_DEPART with its current position being None
Agent 8 status is: RailAgentStatus.READY_TO_DEPART with its current position being None
Agent 9 status is: RailAgentStatus.READY_TO_DEPART with its current position being None
Agent 10 status is: RailAgentStatus.READY_TO_DEPART with its current position being None
Agent 11 status is: RailAgentSt

To enter the level, an agent must take one of the movement actions [1, 2, 3]; do_nothing and stop do not trigger entry.<br>
If the starting cell is free, the agent will enter the level.<br>
If multiple agents want to enter the same cell at the same time, the agent with the lowest index will enter first.

Let's check if there are any agents with the same start location

In [16]:
agents_with_same_start = set()
print("\n The following agents have the same initial position:")
print("=====================================================")
for agent_idx, agent in enumerate(env.agents):
    for agent_2_idx, agent2 in enumerate(env.agents):
        if agent_idx != agent_2_idx and agent.initial_position == agent2.initial_position:
            print("Agent {} as the same initial position as agent {}".format(agent_idx, agent_2_idx))
            agents_with_same_start.add(agent_idx)


 The following agents have the same initial position:
Agent 0 as the same initial position as agent 34
Agent 2 as the same initial position as agent 24
Agent 4 as the same initial position as agent 14
Agent 4 as the same initial position as agent 37
Agent 6 as the same initial position as agent 41
Agent 7 as the same initial position as agent 40
Agent 7 as the same initial position as agent 49
Agent 8 as the same initial position as agent 25
Agent 8 as the same initial position as agent 44
Agent 10 as the same initial position as agent 28
Agent 10 as the same initial position as agent 38
Agent 10 as the same initial position as agent 48
Agent 11 as the same initial position as agent 18
Agent 12 as the same initial position as agent 17
Agent 12 as the same initial position as agent 27
Agent 13 as the same initial position as agent 29
Agent 14 as the same initial position as agent 4
Agent 14 as the same initial position as agent 37
Agent 15 as the same initial position as agent 42
Agent
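
The nested loop above compares every pair of agents. As a hedged alternative, the sketch below groups agent indices by start cell in a single pass; `group_by_start` is a hypothetical helper, and it works on a plain list of positions rather than on `env.agents`.

```python
from collections import defaultdict


def group_by_start(initial_positions):
    """Map each shared start cell to the indices of the agents starting there."""
    groups = defaultdict(list)
    for agent_idx, position in enumerate(initial_positions):
        groups[position].append(agent_idx)
    # Keep only cells that are shared by at least two agents.
    return {pos: idxs for pos, idxs in groups.items() if len(idxs) > 1}
```

With the environment from this notebook, the input could be built as `[agent.initial_position for agent in env.agents]`.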

Let's try to enter with all of these agents at the same time

In [17]:
action_dict = dict()

In [18]:
for agent_id in agents_with_same_start:
    action_dict[agent_id] = 1  # Try to move with the agents

Do a step in the environment to see what agents entered:

In [19]:
env.step(action_dict)
pass

Current state and position of the agents after all agents with same start position tried to move

In [20]:
print("\n This happened when all tried to enter at the same time:")
print("========================================================")
for agent_id in agents_with_same_start:
    print(
        "Agent {} status is: {} with the current position being {}.".format(
            agent_id, str(env.agents[agent_id].status),
            str(env.agents[agent_id].position)))


 This happened when all tried to enter at the same time:
Agent 0 status is: RailAgentStatus.ACTIVE with the current position being (22, 74).
Agent 2 status is: RailAgentStatus.ACTIVE with the current position being (19, 92).
Agent 4 status is: RailAgentStatus.ACTIVE with the current position being (39, 16).
Agent 6 status is: RailAgentStatus.ACTIVE with the current position being (6, 93).
Agent 7 status is: RailAgentStatus.ACTIVE with the current position being (41, 29).
Agent 8 status is: RailAgentStatus.ACTIVE with the current position being (50, 44).
Agent 10 status is: RailAgentStatus.ACTIVE with the current position being (41, 16).
Agent 11 status is: RailAgentStatus.ACTIVE with the current position being (19, 96).
Agent 12 status is: RailAgentStatus.ACTIVE with the current position being (32, 86).
Agent 13 status is: RailAgentStatus.ACTIVE with the current position being (53, 63).
Agent 14 status is: RailAgentStatus.READY_TO_DEPART with the current position being None.
Agent 15 

As you see only the agents with lower indexes moved. As soon as the cell is free again the agents can attempt<br>
to start again.
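
The entry rule described above (lowest index first, one agent per free cell) can be sketched as follows. This is a simplified model of the rule as stated in this notebook, not Flatland's implementation; `resolve_entries` is a hypothetical helper.

```python
def resolve_entries(requests, occupied):
    """Decide which agents enter in one step.

    requests: dict mapping agent index -> requested start cell
    occupied: set of cells that are already taken
    Returns the sorted list of agent indices that are allowed to enter.
    """
    taken = set(occupied)
    entered = []
    # Lower indices are served first, so they win contested cells.
    for agent_idx in sorted(requests):
        cell = requests[agent_idx]
        if cell not in taken:
            taken.add(cell)
            entered.append(agent_idx)
    return entered
```

For example, if agents 4, 14 and 37 all request cell (39, 16), only agent 4 enters, which matches the statuses printed above where agent 4 is ACTIVE and agent 14 is still READY_TO_DEPART.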

You will also notice that the agents move at different speeds once they are on the rail.<br>
The agents will always move at full speed when moving, never a speed in between.<br>
The fastest an agent can go is 1, meaning that it moves to the next cell at every time step.<br>
All slower speeds indicate the fraction of a cell that is moved at each time step.<br>
Let's look at the current speed data of the agents:

In [21]:
print("\n The speed information of the agents are:")
print("=========================================")


 The speed information of the agents are:


In [22]:
for agent_idx, agent in enumerate(env.agents):
    print(
        "Agent {} speed is: {:.2f} with the current fractional position being {}".format(
            agent_idx, agent.speed_data['speed'], agent.speed_data['position_fraction']))

Agent 0 speed is: 0.25 with the current fractional position being 0.0
Agent 1 speed is: 0.33 with the current fractional position being 0.0
Agent 2 speed is: 0.25 with the current fractional position being 0.0
Agent 3 speed is: 0.25 with the current fractional position being 0.0
Agent 4 speed is: 1.00 with the current fractional position being 0.0
Agent 5 speed is: 0.50 with the current fractional position being 0.0
Agent 6 speed is: 0.25 with the current fractional position being 0.0
Agent 7 speed is: 0.25 with the current fractional position being 0.0
Agent 8 speed is: 0.25 with the current fractional position being 0.0
Agent 9 speed is: 0.50 with the current fractional position being 0.0
Agent 10 speed is: 0.33 with the current fractional position being 0.0
Agent 11 speed is: 0.33 with the current fractional position being 0.0
Agent 12 speed is: 0.25 with the current fractional position being 0.0
Agent 13 speed is: 0.50 with the current fractional position being 0.0
Agent 14 speed i
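
The relation between speed and fractional position described above can be sketched as a simple accumulator: each step adds the speed to the fractional position, and the agent reaches the next cell once the fraction reaches 1. This is a simplified model under that assumption, not Flatland's code; `steps_to_cross_cell` is a hypothetical helper, and `Fraction` is used to avoid floating-point rounding.

```python
from fractions import Fraction


def steps_to_cross_cell(speed):
    """Count the time steps needed to traverse one cell at the given speed."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    position_fraction = Fraction(0)
    steps = 0
    while position_fraction < 1:
        position_fraction += speed
        steps += 1
    return steps
```

Under this model an agent with speed 1/4 needs four time steps per cell, while an agent with speed 1 moves to the next cell at every step.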

Now the agents can also have stochastic malfunctions, which leave them unable to move<br>
for a certain number of time steps. The malfunction data of the agents can easily be accessed as follows

In [23]:
print("\n The malfunction data of the agents are:")
print("========================================")


 The malfunction data of the agents are:


In [24]:
for agent_idx, agent in enumerate(env.agents):
    print(
        "Agent {} is OK = {}".format(
            agent_idx, agent.malfunction_data['malfunction'] < 1))

Agent 0 is OK = True
Agent 1 is OK = True
Agent 2 is OK = True
Agent 3 is OK = True
Agent 4 is OK = True
Agent 5 is OK = True
Agent 6 is OK = True
Agent 7 is OK = True
Agent 8 is OK = True
Agent 9 is OK = True
Agent 10 is OK = True
Agent 11 is OK = True
Agent 12 is OK = True
Agent 13 is OK = True
Agent 14 is OK = True
Agent 15 is OK = True
Agent 16 is OK = True
Agent 17 is OK = True
Agent 18 is OK = True
Agent 19 is OK = True
Agent 20 is OK = True
Agent 21 is OK = True
Agent 22 is OK = True
Agent 23 is OK = True
Agent 24 is OK = True
Agent 25 is OK = True
Agent 26 is OK = True
Agent 27 is OK = True
Agent 28 is OK = True
Agent 29 is OK = True
Agent 30 is OK = True
Agent 31 is OK = True
Agent 32 is OK = True
Agent 33 is OK = True
Agent 34 is OK = True
Agent 35 is OK = True
Agent 36 is OK = True
Agent 37 is OK = True
Agent 38 is OK = True
Agent 39 is OK = True
Agent 40 is OK = True
Agent 41 is OK = True
Agent 42 is OK = True
Agent 43 is OK = True
Agent 44 is OK = True
Agent 45 is OK = Tru
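
The check above treats an agent as OK when its malfunction counter is below 1. The sketch below models such a counter for one agent. It rests on loud assumptions: `malfunction_rate` is read here as the mean number of steps between malfunctions (so the per-step chance is roughly `1 / rate`), and the duration is drawn uniformly between the minimum and maximum. Flatland's actual formula may differ between versions and is not confirmed by this notebook; `step_malfunction` is a hypothetical helper.

```python
import random


def step_malfunction(remaining, rng, rate=10000, min_duration=15, max_duration=50):
    """Advance one agent's malfunction counter by one time step.

    remaining: steps the agent is still broken (0 means OK)
    rng: a random.Random instance
    Returns the new counter value.
    """
    if remaining > 0:
        # Still broken: count down towards recovery.
        return remaining - 1
    # Assumption: per-step breakdown chance of 1 / rate.
    if rate > 0 and rng.random() < 1.0 / rate:
        return rng.randint(min_duration, max_duration)
    return 0
```

With a rate of 10000 and only a handful of steps, breakdowns are rare under this model, which is consistent with every agent reporting OK above.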

Now that you have seen these novel concepts, you will realize that agents don't need to take<br>
an action at every time step, as an action only changes the outcome when it is chosen at cell entry.<br>
Therefore the environment provides information about which agents need to provide an action in the next step.<br>
You can access this in the following way.

Choose an action for each agent

In [25]:
for a in range(env.get_num_agents()):
    action = controller.act(0)
    action_dict.update({a: action})
# Do the environment step
observations, rewards, dones, information = env.step(action_dict)
print("\n The following agents can register an action:")
print("========================================")
for info in information['action_required']:
    print("Agent {} needs to submit an action.".format(info))


 The following agents can register an action:
Agent 0 needs to submit an action.
Agent 1 needs to submit an action.
Agent 2 needs to submit an action.
Agent 3 needs to submit an action.
Agent 4 needs to submit an action.
Agent 5 needs to submit an action.
Agent 6 needs to submit an action.
Agent 7 needs to submit an action.
Agent 8 needs to submit an action.
Agent 9 needs to submit an action.
Agent 10 needs to submit an action.
Agent 11 needs to submit an action.
Agent 12 needs to submit an action.
Agent 13 needs to submit an action.
Agent 14 needs to submit an action.
Agent 15 needs to submit an action.
Agent 16 needs to submit an action.
Agent 17 needs to submit an action.
Agent 18 needs to submit an action.
Agent 19 needs to submit an action.
Agent 20 needs to submit an action.
Agent 21 needs to submit an action.
Agent 22 needs to submit an action.
Agent 23 needs to submit an action.
Agent 24 needs to submit an action.
Agent 25 needs to submit an action.
Agent 26 needs to submit an

We recommend that you monitor the malfunction data and the action required in order to optimize your training<br>
and controlling code.
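
One way to act on this recommendation is to query the controller only for agents that currently need an action. The sketch below assumes that `information['action_required']` is a dict mapping each agent index to a boolean; that shape is an assumption and is not confirmed by the output above, which prints every key. `collect_actions` is a hypothetical helper.

```python
def collect_actions(action_required, policy):
    """Build an action dict for the agents that need an action.

    action_required: dict mapping agent index -> bool (assumed shape)
    policy: callable taking an agent index and returning an action
    """
    return {
        agent_idx: policy(agent_idx)
        for agent_idx, required in action_required.items()
        if required
    }
```

In the notebook this could be called as `collect_actions(information['action_required'], lambda a: controller.act(observations[a]))`, skipping the controller call for agents that are between cells.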

Let us now look at an episode playing out with random actions performed

In [26]:
print("\nStart episode...")


Start episode...


Reset the rendering system

In [27]:
env_renderer.reset()


Here you can also further enhance the provided observation by means of normalization<br>
See training navigation example in the baseline repository

## Javascript Renderer
The cell below should create an HTML "iframe", which is like a new window within a window.
It loads from the Flask server created above, which should now be running in the background, listening on localhost:8080.  The server should have been created with the Flatland RailEnv.  When the JS renderer connects to the server, it should load the env data, i.e. the rails and agent positions.

Note that only one server can be running on the host / machine / VM because it is configured to listen on a fixed port (8080).  Only one JS renderer can connect to the server.

If you start more than one, only one will work, and it may be glitchy.  However, it can be convenient to open a new window pointing to the server at http://localhost:8080/index.html; you can then position the viewer elsewhere and run steps from the cells below.
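
Because the port is fixed, it can help to check whether something is already listening on it before creating another renderer. The sketch below uses only the standard library; `port_in_use` is a hypothetical helper and not part of Flatland.

```python
import socket


def port_in_use(host="127.0.0.1", port=8080, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

A `True` result only says that some process accepts connections on that port; it does not prove that the process is the Flatland server.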

In [28]:
#%%html
#<iframe src="http://127.0.0.1:8080/index.html" height=500 width=600 />


In [29]:
score = 0
# Run episode
frame_step = 0

In [30]:
nSteps = 1000

In [31]:
for step in range(3):
    # Choose an action for each agent in the environment
    for a in range(env.get_num_agents()):
        action = controller.act(observations[a])
        action_dict.update({a: action})

    # Environment step which returns the observations for all agents, their corresponding
    # rewards and whether they are done
    next_obs, all_rewards, done, _ = env.step(action_dict)
    env_renderer.render_env(show=not in_notebook, show_observations=False, show_predictions=False)
    
    # env_renderer.gl.save_image('./misc/Fames2/flatland_frame_{:04d}.png'.format(step))
    frame_step += 1
    # Update replay buffer and train agent
    for a in range(env.get_num_agents()):
        controller.step((observations[a], action_dict[a], all_rewards[a], next_obs[a], done[a]))
        score += all_rewards[a]
    observations = next_obs.copy()
    if done['__all__']:
        print("All done!")
        break
    print('Episode: Steps {}\t Score = {}'.format(step, score))
    
    time.sleep(0.01)
    

Waiting for browser to signal that rendering complete

In [32]:
env_renderer.get_endpoint_URL()

'http://localhost:8080'



In [33]:
if False:
    tuple_dummy = env.reset()
    server.send_env()