# Flow Hands-On Tutorial

The best way to get started with reinforcement learning... is to get started with reinforcement learning! In this tutorial, we'll use Flow, our tool for optimizing traffic networks through control, to train autonomous vehicles to stabilize a network.

## Question 0: Installing Flow

We begin this tutorial by installing Flow and all its necessary dependencies, namely SUMO and some machine learning and Python-based modules.

### a. Using Windows? (If not, continue to part b)

Not all the software packages we wish to install work natively on Windows. Instead, if you are using Windows 10, we recommend you install the Windows Subsystem for Linux (WSL) on your device. In order to do so:

- Go to the Windows Store and download “Ubuntu 18.04”
- Download the Xming X Server for Windows: https://sourceforge.net/projects/xming/
- Run WSL from the start menu by typing “Ubuntu 18.04”
    - The first time you open an Ubuntu terminal, type: `echo "export DISPLAY=:0" >> ~/.bashrc && source ~/.bashrc`
    - In order for the graphical user interface to work properly, make sure Xming is also running whenever you open a new terminal

If you are using an earlier version of Windows, your only other option is to install a virtual machine (e.g. [VirtualBox](https://www.virtualbox.org/wiki/Downloads)), set up an [Ubuntu](https://www.ubuntu.com/download/desktop) virtual environment, and install everything you need onto it. If you are in this situation and need some help setting up a virtual environment, please talk to one of the assistants.

### b. Installed Anaconda/Miniconda? (If yes, continue to part c)

Conda is an ideal way to create, export, list, remove, and update environments that have different versions of Python and/or packages installed in them. If you have not installed [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://conda.io/miniconda.html) before, we highly recommend you do so now. Since Anaconda takes a long time to install, we recommend installing Miniconda. In order to do so, run the following commands from your terminal (**Note**: update the second command to include the correct URL for your distribution):

    cd ~
    wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh;
    bash miniconda.sh -b -p $HOME/miniconda
    echo 'export PATH="$HOME/miniconda/bin:$PATH"' >> ~/.bashrc

### c. Installation instructions

You are now prepared to install Flow and its dependencies, e.g. SUMO. In order to do so, follow the setup instructions located at: https://flow.readthedocs.io/en/latest/flow_setup.html. When installing SUMO, follow the instructions located in [section e](https://flow.readthedocs.io/en/latest/flow_setup.html#e-easy-install-sumo-optional).

### d. Unable to install Flow?

If you are unable to install Flow, or if the installation is taking too long, we recommend using an EC2 instance provided by us (as you will in question 2c). Please see that section to understand how to do so.

## Question 1: Simulating Traffic in a Ring Road

In this problem, we will simulate traffic instabilities on a ring road network. The formation of traffic instabilities (often referred to as traffic congestion, stop-and-go traffic, traffic waves, etc.) in ring roads is a widely studied problem: it is analytically tractable, and field studies (see [this video](https://www.youtube.com/watch?v=7wm-pZp_mi0)) have shown that ring roads produce instabilities similar to those witnessed in real network settings such as highways (see [this video](https://www.youtube.com/watch?v=6ZC9h8jgSj4)). We will simulate the performance of vehicles in a ring road using the microscopic traffic simulator SUMO (see the figure below).

<img src="img/ring_road.png" width="400">

### a. Modeling microscopic car-following dynamics

We begin by implementing a car-following model in Flow that can recreate the types of traffic instabilities experienced in reality. Several car-following models exist to realistically depict the longitudinal (acceleration) behavior of vehicles in a network. One such model is the Intelligent Driver Model (IDM), in which the acceleration $a_{IDM}$ of a vehicle is defined as:

\begin{equation}
a_{IDM}(v, v_l, h) = a \bigg[ 1 - \bigg( \frac{v}{v_0} \bigg) ^\delta - \bigg( \frac{s^* (v, v_l)}{h} \bigg)^2 \bigg]
\end{equation}

where $v$ is the vehicle's speed, $h$ is its bumper-to-bumper headway, $v_l$ is the speed of the vehicle ahead of it, and $s^*$ is the desired headway of the vehicle, given by:

\begin{equation}
s^*(v, v_l)  = s_0 + \max \bigg( 0, v T + \frac{v (v - v_l)}{2 \sqrt{ab}}  \bigg)
\end{equation}

where $s_0$, $v_0$, $T$, $\delta$, $a$, $b$ are given parameters that may be calibrated to model highway traffic.

In order to create an acceleration, or car-following, model in Flow, we will use the `BaseController` class. This class can be inherited, and its `get_accel` method can be overridden to return the desired acceleration at every time step (see the cell below).

Using the `BaseController` class in Flow, design a controller called *IDM* that can recreate the behavior of this model in simulation. You can create this controller class by filling in the below script. Use the following values for each of the model parameters:

- $s_0$: 2 m
- $v_0$: 30 m/s
- $T$: 1 s
- $\delta$: 4
- $a$: 1 m/s$^2$
- $b$: 1.5 m/s$^2$

For more information on designing controllers in Flow, we recommend you review this [tutorial](https://github.com/berkeleyflow/flow/blob/master/tutorials/tutorial07_controllers.ipynb). **Note**: You are allowed to import any module you find valuable.

In [1]:
import numpy as np  # needed for np.sqrt in get_accel
from flow.controllers import BaseController


class IDM(BaseController):

    def get_accel(self, env):
        # bumper-to-bumper headway
        h = env.vehicles.get_headway(self.veh_id)

        # speed of the current vehicle
        v = env.vehicles.get_speed(self.veh_id)

        # speed of the lead vehicle
        v_l = env.vehicles.get_speed(env.vehicles.get_leader(self.veh_id))

        ######################################
        ###### your implementation here ######

        s0 = 2
        v0 = 30
        T = 1
        delta = 4
        a = 1
        b = 1.5

        s_star = s0 + max(0, v*T + v*(v-v_l) / (2 * np.sqrt(a*b)))

        acceleration = a * (1 - (v/v0)**delta - (s_star/h)**2)

        ######################################
        ######################################

        # return the acceleration of the current vehicle
        return acceleration
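
The formula can also be checked outside of Flow. The following is a minimal sketch in plain Python, not part of the Flow API: the function name `idm_accel` and the `h_min` guard against a vanishing headway are assumptions added for illustration, while the default parameter values are the ones listed above.

```python
import math


def idm_accel(v, v_l, h, s0=2.0, v0=30.0, T=1.0, delta=4, a=1.0, b=1.5,
              h_min=1e-3):
    """Return the IDM acceleration (m/s^2) for one vehicle.

    v: own speed (m/s), v_l: leader speed (m/s), h: bumper-to-bumper
    headway (m). h_min is a hypothetical guard that avoids dividing by a
    zero headway; it is not part of the model itself.
    """
    h = max(h, h_min)
    # desired headway s*(v, v_l)
    s_star = s0 + max(0.0, v * T + v * (v - v_l) / (2 * math.sqrt(a * b)))
    return a * (1 - (v / v0) ** delta - (s_star / h) ** 2)


# A stopped vehicle with a distant leader accelerates at nearly a = 1 m/s^2.
free_road = idm_accel(v=0.0, v_l=0.0, h=1000.0)

# A vehicle closing in quickly on a much slower leader brakes.
closing_in = idm_accel(v=20.0, v_l=5.0, h=10.0)

print(free_road, closing_in)
```

The two limiting cases (near-maximal acceleration on a free road, negative acceleration when approaching a slow leader) are a quick way to catch sign or exponent mistakes before running a full simulation.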

### b. Executing the simulation

Next, we will run the simulation in SUMO using the *SumoExperiment* class in Flow. This class allows us to specify the type of scenario we would like to simulate as well as the longitudinal and lateral dynamics of vehicles in the simulation. Referring to the [tutorial in Flow on simulating traffic](https://github.com/berkeleyflow/flow/blob/master/tutorials/tutorial01_sumo.ipynb), fill in the below parameters in order to produce an experiment with a single lane ring road network of length 230 m with a total of 22 vehicles following the IDM model from part a), where the vehicles are initially perturbed from equal spacing by an additive random normal term with standard deviation 1.0 m.
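
To make the initial condition concrete, here is a rough sketch of what "equal spacing plus an additive normal perturbation" could look like on a ring. It is written in plain Python for illustration only and is not how Flow generates starting positions internally; the function name `perturbed_uniform_positions` and the `seed` argument are hypothetical.

```python
import random


def perturbed_uniform_positions(num_vehicles, length, std, seed=None):
    """Return starting positions (m) along a ring of the given length.

    Vehicles are first spaced equally, then each position is shifted by a
    zero-mean normal term with standard deviation `std` and wrapped back
    onto the ring.
    """
    rng = random.Random(seed)
    spacing = length / num_vehicles
    positions = [(i * spacing + rng.gauss(0.0, std)) % length
                 for i in range(num_vehicles)]
    return sorted(positions)


# 22 vehicles on a 230 m ring, perturbed with a 1.0 m standard deviation
positions = perturbed_uniform_positions(num_vehicles=22, length=230.0,
                                        std=1.0, seed=0)
print(len(positions), min(positions), max(positions))
```

With a nominal spacing of 230 / 22 ≈ 10.5 m between vehicle fronts, a 1.0 m perturbation is small but enough to break the symmetry of the uniform state.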

In [2]:
# some objects we will use to define the parameters of the simulation
from flow.core.params import EnvParams, SumoParams, NetParams, InitialConfig
from flow.core.vehicles import Vehicles

# this is the scenario class for the ring road
# (this does not need to be modified)
from flow.scenarios import LoopScenario

# this is the environment we will be using during the simulation 
# (it does not need to be modified)
from flow.envs import TestEnv

# the EnvParams object is left empty for the above environment 
# (this does not need to be modified)
env_params = EnvParams()

######################################################
############ modify everything below here ############
######################################################

# add 22 vehicles with the "IDM" acceleration controller from section a)
from flow.controllers import ContinuousRouter
vehicles = Vehicles()
vehicles.add(
    "human",
    acceleration_controller=(IDM, {}),
    routing_controller=(ContinuousRouter, {}),
    num_vehicles=22
)  ### modify this function call ###

# modify the NetParams object to support a ring road of length 230 m
net_params = NetParams(
    additional_params={
        'length': 230,
        'lanes': 1,
        'speed_limit': 30,
        'resolution': 40
    }
)  ### modify this class instantiation ###

# start all vehicles with perturbation standard deviation of 1.0 m
initial_config = InitialConfig(
    spacing="uniform", 
    perturbation=1
)  ### modify this class instantiation ###

# run the simulation with a simulation step of 0.1s and activate the GUI for visualization purposes
sumo_params = SumoParams(
    sim_step=0.1, 
    render=True,
)  ### modify this class instantiation ###

Once the above parameters are ready, we can start the simulation using the code snippet below to see how well the network performs when the vehicles are initially perturbed. If your model and network are designed correctly, then after some time the vehicles should begin bunching together and accelerating quickly when they are at the front of the backwards propagating queue. This is known as a "stop-and-go wave".

In [3]:
from flow.core.experiment import SumoExperiment
import numpy as np

scenario = LoopScenario(name="ring_road",
                        vehicles=vehicles,
                        net_params=net_params,
                        initial_config=initial_config)

env = TestEnv(env_params, sumo_params, scenario)

exp = SumoExperiment(env, scenario)
info_dict = exp.run(1, 3000)

print("------------------")
print("Average speed in final time step: {} m/s".format(info_dict["velocities"][0][-1]))

Round 0, return: 0
Average, std return: 0.0, 0.0
Average, std speed: 2.8690173915964916, 0.0
Closing connection to TraCI and stopping simulation.
Note, this may print an error message when it closes.
------------------
Average speed in final time step: 2.3813660760450355 m/s
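
For comparison, the speed that the ring would settle at if traffic stayed uniform can be estimated from the IDM equations: setting $v = v_l$ and $a_{IDM} = 0$ gives the equilibrium headway $h = (s_0 + vT) / \sqrt{1 - (v/v_0)^\delta}$. The sketch below inverts this relation by bisection. It assumes a vehicle length of 5 m (an assumption, not stated in this tutorial), and the function names are hypothetical.

```python
import math


def equilibrium_headway(v, s0=2.0, v0=30.0, T=1.0, delta=4):
    """Headway (m) at which the IDM acceleration is zero when v == v_l."""
    return (s0 + v * T) / math.sqrt(1 - (v / v0) ** delta)


def equilibrium_speed(headway, v0=30.0, tol=1e-9):
    """Invert equilibrium_headway by bisection on 0 <= v < v0.

    Valid for headways of at least s0, where a solution exists.
    """
    lo, hi = 0.0, v0 * (1 - 1e-12)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if equilibrium_headway(mid) < headway:
            lo = mid  # headway too small at this speed: speed up
        else:
            hi = mid
    return 0.5 * (lo + hi)


ring_length = 230.0   # m, from the exercise
num_vehicles = 22     # from the exercise
vehicle_length = 5.0  # m, ASSUMED; check the value used by your simulator

headway = ring_length / num_vehicles - vehicle_length
v_eq = equilibrium_speed(headway)
print("uniform-flow headway: {:.2f} m, speed: {:.2f} m/s".format(headway, v_eq))
```

Under these assumptions the uniform-flow speed comes out at about 3.45 m/s, somewhat above the average speed reported in the run above, which is consistent with stop-and-go waves lowering throughput. The exact gap depends on the assumed vehicle length.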


### c. Additional exercises (optional)

In this optional section, we provide you with additional exercises to further familiarize yourself with the workings of Flow. While the components of the previous section closely followed the content available within the tutorial, these questions are meant to encourage you to further explore the various parametrizations of a scenario and/or environment.

To begin with, in the below cell, modify the relevant components of the cell at the start of section 1.b) to replace the ring road scenario with a figure eight with 14 vehicles (we will actually use this scenario in question 2).

In [4]:
from flow.scenarios import Figure8Scenario ### FILL IN ###

### specify the new vehicles component
vehicles_figureeight = Vehicles()
vehicles_figureeight.add(
    veh_id="human",
    acceleration_controller=(IDM, {}),
    routing_controller=(ContinuousRouter, {}),
    speed_mode="no_collide",
    num_vehicles=14
)

### specify the new net_params component
### Note: you should set no_internal_links to False to include intersections
net_params_figureeight = NetParams(
    no_internal_links=False,
    additional_params={
        'radius_ring': 30, 
        'lanes': 1, 
        'speed_limit': 30, 
        'resolution': 40
    }
)

### recreate the scenario ###
scenario = Figure8Scenario(
    name="figure8",
    vehicles=vehicles_figureeight,
    net_params=net_params_figureeight,
    initial_config=InitialConfig()   
)

# everything else is the same as before
env = TestEnv(env_params, sumo_params, scenario)
exp = SumoExperiment(env, scenario)
_ = exp.run(1, 3000)

Round 0, return: 0
Average, std return: 0.0, 0.0
Average, std speed: 4.033795536601443, 0.0
Closing connection to TraCI and stopping simulation.
Note, this may print an error message when it closes.


Next, modify the cell below to generate random initial positions for vehicles in the ring road.

In [5]:
initial_config_random = InitialConfig(spacing="random")  ### modify this line ###

# add the new parameter
scenario = LoopScenario(
    name="ring_road",
    vehicles=vehicles,
    net_params=net_params,
    initial_config=initial_config_random,  # use the new random initial_config
)

# everything else is the same as before
env = TestEnv(env_params, sumo_params, scenario)
exp = SumoExperiment(env, scenario)
_ = exp.run(1, 3000)

Round 0, return: 0
Average, std return: 0.0, 0.0
Average, std speed: 2.4571605975246738, 0.0
Closing connection to TraCI and stopping simulation.
Note, this may print an error message when it closes.


Finally, add a traffic light at the node of the ring road called "right" using the `add` method from the `TrafficLights` class.

In [6]:
from flow.core.traffic_lights import TrafficLights

# place a traffic light at the node called "right"
traffic_lights = TrafficLights()
traffic_lights.add(node_id="right")  ### modify this line ###

# add the new parameter
scenario = LoopScenario(
    name="ring_road",
    vehicles=vehicles,
    net_params=net_params,
    initial_config=initial_config,
    traffic_lights=traffic_lights,
)

# everything else is the same as before
env = TestEnv(env_params, sumo_params, scenario)
exp = SumoExperiment(env, scenario)
_ = exp.run(1, 3000)

Round 0, return: 0
Average, std return: 0.0, 0.0
Average, std speed: 2.551052639338779, 0.0
Closing connection to TraCI and stopping simulation.
Note, this may print an error message when it closes.


## Question 2: Training RL Experiments with Flow

Having walked through how traffic can be simulated in Flow, we will now walk through how an MDP representing a traffic-motivated task can be generated in Flow, and will then use reinforcement learning techniques to train autonomous vehicles to mitigate traffic inefficiencies. For this question, we will consider the toy problem of coordinating vehicles through an intersection represented as a figure eight (see the figure below).

<img src="img/figure_eight.png" width="400">

### a. Creating custom environments

We begin by designing an MDP that is representative of our problem. Using the Flow computational framework, this can be done by creating an environment object, similar to the `TestEnv` environment we used during simulation in question 1.

In the below cell, design an environment that can be used to coordinate vehicles through a figure eight intersection. In order to do so, perform the following tasks:

1. Modify the `observation_space` and `get_state` methods so that your state is the speed and position of every vehicle in the network. When collecting the positions of vehicles, use the `self.get_x_by_id` method.
2. Modify the `action_space` and `_apply_rl_actions` methods so that your policy's actions are converted to desired accelerations by the RL vehicle in the environment. The actions as defined in the `action_space` method should *not* be bounded.
3. Modify the `compute_reward` function to return the average speed of vehicles in the network.

**Hints**:

- For a review of creating custom environments in Flow, please see the following [tutorial](https://github.com/flow-project/flow/blob/master/tutorials/tutorial06_environments.ipynb).
- Individual vehicle state information can be collected from the Vehicles class within an environment (accessed via `self.vehicles`). Refer to [this file](https://github.com/flow-project/flow/blob/master/flow/core/vehicles.py) for what sort of information can be collected. The same can be done for scenario/network information using the variable `self.scenario`, with the associated get methods available [here](https://github.com/flow-project/flow/blob/master/flow/scenarios/base_scenario.py).
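
The three tasks above can be sketched independently of Flow. The toy class below is a hypothetical stand-in written in plain Python (none of its names come from the Flow API); it only illustrates how the state, action, and reward definitions fit together, and in particular that the state has two entries per vehicle.

```python
class ToyRingEnv:
    """Hypothetical stand-in for a Flow environment (illustration only)."""

    def __init__(self, speeds, positions, rl_indices, sim_step=0.1):
        self.speeds = list(speeds)          # m/s, one per vehicle
        self.positions = list(positions)    # m, one per vehicle
        self.rl_indices = list(rl_indices)  # which vehicles are RL-controlled
        self.sim_step = sim_step            # s

    @property
    def observation_dim(self):
        # one speed and one position per vehicle
        return 2 * len(self.speeds)

    @property
    def action_dim(self):
        # one acceleration per RL vehicle
        return len(self.rl_indices)

    def get_state(self):
        # speeds first, then positions
        return self.speeds + self.positions

    def apply_rl_actions(self, rl_actions):
        # interpret each action as an acceleration (m/s^2); speeds stay >= 0
        for idx, accel in zip(self.rl_indices, rl_actions):
            self.speeds[idx] = max(0.0, self.speeds[idx] + accel * self.sim_step)

    def compute_reward(self):
        # average speed of all vehicles
        return sum(self.speeds) / len(self.speeds)


env = ToyRingEnv(speeds=[0.0, 2.0, 4.0], positions=[0.0, 10.0, 20.0],
                 rl_indices=[0])
env.apply_rl_actions([1.0])
state = env.get_state()
reward = env.compute_reward()
print(len(state), reward)
```

In the real environment, the simulator advances vehicle positions and the `Box` spaces carry these dimensions; the sketch leaves both out to keep the bookkeeping visible.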

In [7]:
from flow.envs import Env
import numpy as np
from gym.spaces import Box

class MyEnv(Env):  # create a new environment
    """Environment used to train vehicles to coordinate through
    an intersection.

    States
        The states are the speeds and positions of all vehicles.

    Actions
        The actions are an acceleration for each automated vehicle.

    Rewards
        The reward function is the average speed of all vehicles in 
        the network.

    Termination
        The rollout is terminated if the time horizon is met.
    """

    @property
    def action_space(self):
        ##############################################################
        # specify dimensions and properties of the action space here #
        ##############################################################
        return Box(low=-float("inf"), high=float("inf"), 
                   shape=(self.vehicles.num_rl_vehicles,),
                   dtype=np.float32)

    @property
    def observation_space(self):
        #############################################################
        # specify dimensions and properties of the state space here #
        #############################################################
        # one speed and one position per vehicle, matching get_state below
        return Box(low=-float("inf"), high=float("inf"), 
                   shape=(2 * self.vehicles.num_vehicles,),
                   dtype=np.float32)

    def get_state(self, **kwargs):
        ####################################
        # specify desired state space here #
        ####################################
        ids = self.vehicles.get_ids()
        speeds = self.vehicles.get_speed(ids)
        pos = np.array([self.get_x_by_id(veh_id) for veh_id in ids])
        return np.concatenate((speeds, pos))
        
    def _apply_rl_actions(self, rl_actions):
        #####################################
        # specify desired action space here #
        #####################################
        self.apply_acceleration(self.vehicles.get_rl_ids(), rl_actions)

    def compute_reward(self, rl_actions, **kwargs):
        ########################################
        # specify desired reward function here #
        ########################################
        return np.mean(self.vehicles.get_speed(self.vehicles.get_ids()))

### b. Testing the Environment

In order to test whether your environment is working properly, modify and run the below cell. If all is well, it will print "Sucess!" at the end.

In [15]:
from flow.core.params import EnvParams, SumoParams, InitialConfig
from flow.core.vehicles import Vehicles
from flow.controllers import RLController

env_params = EnvParams()
sumo_params = SumoParams(render=False)
initial_config = InitialConfig()
vehicles = Vehicles()
vehicles.add("rl", acceleration_controller=(RLController, {}), num_vehicles=1)

#################################
# Create the scenario file here #
#################################
from flow.scenarios import Figure8Scenario
from flow.core.params import NetParams

net_params_figureeight = NetParams(
    no_internal_links=False,
    additional_params={
        'radius_ring': 30, 
        'lanes': 1, 
        'speed_limit': 30, 
        'resolution': 40
    }
)

scenario = Figure8Scenario(
    name="figure8",
    vehicles=vehicles,
    net_params=net_params_figureeight,
    initial_config=initial_config   
)

env = MyEnv(env_params=env_params, 
            sumo_params=sumo_params, 
            scenario=scenario)

if all(env.get_state() != np.array([0, 0])):
    print("get_state failed.")
elif env.compute_reward([]) != 0:
    print("compute_reward failed")
else:
    env.step(rl_actions=[0.1])

    if abs(env.vehicles.get_speed(env.vehicles.get_rl_ids())[0] - 0.1) > 10e-2:
        print("RL action failed", env.vehicles.get_speed(env.vehicles.get_rl_ids())[0])
    else:
        print("Sucess!")

Sucess!


### c. Training and Visualizing on RLlib

The environment we created in section a) is compatible with OpenAI gym, a popular standardization of MDP tasks in the RL community, and accordingly can be trained with a variety of different off-the-shelf RL libraries. For this, we will utilize the RL library RLlib.
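
As a reminder of what this compatibility means: in the classic gym interface, an environment exposes `reset()`, which returns an initial observation, and `step(action)`, which returns `(observation, reward, done, info)`. Newer gym releases changed these signatures, so treat this as a sketch of the classic form. The example below uses a hypothetical one-vehicle stand-in, `OneCarEnv`, so that it runs without gym, Flow, or RLlib installed.

```python
import random


class OneCarEnv:
    """Hypothetical one-vehicle environment following the classic gym API."""

    def __init__(self, horizon=10, sim_step=0.1):
        self.horizon = horizon
        self.sim_step = sim_step
        self.speed = 0.0
        self.t = 0

    def reset(self):
        self.speed, self.t = 0.0, 0
        return [self.speed]

    def step(self, action):
        # the action is an acceleration (m/s^2); the reward is the speed
        self.speed = max(0.0, self.speed + action * self.sim_step)
        self.t += 1
        done = self.t >= self.horizon
        return [self.speed], self.speed, done, {}


def rollout(env, policy):
    """Run one episode and return the sum of rewards."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


rng = random.Random(0)
random_return = rollout(OneCarEnv(), lambda obs: rng.uniform(-1.0, 1.0))
full_throttle_return = rollout(OneCarEnv(), lambda obs: 1.0)
print(random_return, full_throttle_return)
```

An RL library automates exactly this loop: it collects rollouts from the environment and updates the policy to increase the return.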

In order to avoid the installation procedure for RLlib (which is described in the installation instructions), for this workshop we will provide each user with an EC2 instance that has RLlib and Flow preinstalled, and a Jupyter notebook titled "rl_exercise.ipynb" that contains the code you will need to run your experiment. In order to open this notebook, type in a web browser:

        http://[url]:8888/?token=fd0720594f408865c7c0185bc209a531c16480c91d18bad2

where an appropriate URL will be provided by one of the tutorial assistants upon request.

The new Jupyter notebook will walk you through running an RL experiment in RLlib and then visualizing the performance of the learned control strategy.