# Exercise 03: Running rllab Experiments

This tutorial walks you through the process of running traffic simulations in Flow with rllab-powered agents. Autonomous agents learn to maximize a reward over simulation rollouts using the **rllab** library. Simulations of this form illustrate the ability of RL agents to influence the traffic of a human-driven fleet so that the whole fleet becomes more efficient (with respect to some given metrics).

In this exercise, we simulate an initially perturbed single-lane ring road into which we introduce a single autonomous vehicle. We will see that, after some training, the autonomous vehicle learns to avoid the "phantom jams" that form when only human driving dynamics are involved.

## 1. Components of a Simulation
All simulations, both in the presence and absence of RL, require two components: a *scenario* and an *environment*. Scenarios describe the features of the transportation network used in simulation. This includes the positions and properties of the nodes and edges constituting the lanes and junctions, as well as the properties of the vehicles, traffic lights, inflows, etc. in the network. Environments, on the other hand, initialize, reset, and advance simulations, and act as the primary interface between the reinforcement learning algorithm and the scenario. Moreover, custom environments may be used to modify the dynamical features of a scenario. Finally, in the RL case, it is in the *environment* that the state/action spaces and the reward function are defined.

## 2. Setting up a Scenario
Flow contains a plethora of pre-designed scenarios used to replicate highways, intersections, and merges in both closed and open settings. All of these scenarios are located in `flow/scenarios`. In order to recreate a ring road network, we begin by importing the scenario `LoopScenario`.

In [1]:
from flow.scenarios.loop.loop_scenario import LoopScenario

This scenario, like all other scenarios in Flow, is parameterized by the following arguments:
* name
* generator_class
* vehicles
* net_params
* initial_config
* traffic_lights

These parameters allow a single scenario to be reused for a multitude of different network settings. For example, `LoopScenario` may be used to create ring roads of variable length with a variable number of lanes and vehicles.

### 2.1 Name
The `name` argument is a string variable depicting the name of the scenario. This has no effect on the type of network created.

In [2]:
name = "ring_example"

### 2.2 Generator Class
Generator classes are used to create the configuration and net files needed to initialize a simulation instance. The methods of this class are called by the base scenario class. Every scenario in Flow comes with an analogous generator that is located in the same directory as the scenario. In the case of `LoopScenario`, this generator is called `CircleGenerator`.

In [3]:
from flow.scenarios.loop.gen import CircleGenerator

### 2.3 Vehicles
The `Vehicles` class stores state information on all vehicles in the network. This class is used to identify the dynamical features of a vehicle and whether it is controlled by a reinforcement learning agent. Moreover, information pertaining to the observations and reward function can be collected from various `get` methods within this class.

The initial configuration of this class describes the number of vehicles in the network at the start of every simulation, and specifies the characteristic features of these vehicles. We begin by creating an empty `Vehicles` object.

In [4]:
from flow.core.vehicles import Vehicles

vehicles = Vehicles()

Once this object is created, vehicles may be introduced using the `add` method. This method specifies the types and quantities of vehicles at the start of a simulation rollout. For a description of the various arguments associated with the `add` method, we refer the reader to the following documentation (reference readthedocs).

When adding vehicles, their dynamical behaviors may be specified either by the simulator (the default) or by a user-generated model. For longitudinal (acceleration) dynamics, several prominent car-following models are implemented in Flow. For this example, the acceleration behavior of all human-driven vehicles will be defined by the Intelligent Driver Model (IDM) [2]. As such, we import the `IDMController` to specify the longitudinal dynamics of human-controlled cars.

In [5]:
from flow.controllers.car_following_models import IDMController
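
For intuition about what a car-following model computes, the IDM acceleration law can be sketched in a few lines of plain Python. This is a standalone illustration, not Flow's `IDMController`: the function name and the default parameter values (`v0`, `T`, `a`, `b`, `delta`, `s0`) are assumptions chosen for the example.

```python
import math


def idm_acceleration(v, v_lead, gap, v0=30.0, T=1.0, a=1.0, b=1.5,
                     delta=4, s0=2.0):
    """Intelligent Driver Model acceleration in m/s^2.

    v      -- speed of the follower (m/s)
    v_lead -- speed of the vehicle ahead (m/s)
    gap    -- bumper-to-bumper distance to the vehicle ahead (m)
    The default parameters are illustrative assumptions, not Flow's values.
    """
    # Desired gap: standstill distance plus a time-headway term and a
    # braking term that grows when closing in on the leader.
    s_star = s0 + max(0.0, v * T + v * (v - v_lead) / (2 * math.sqrt(a * b)))
    # Free-road term (v / v0) ** delta and interaction term (s_star / gap) ** 2.
    return a * (1 - (v / v0) ** delta - (s_star / gap) ** 2)


# A stopped car with a long free gap accelerates at close to the maximum rate.
free_road = idm_acceleration(v=0.0, v_lead=0.0, gap=1000.0)
# A car closing in quickly on a slow leader brakes.
closing_in = idm_acceleration(v=10.0, v_lead=2.0, gap=5.0)
print(free_road, closing_in)
```

The interaction term is what lets small perturbations amplify into stop-and-go waves on a dense ring, which is the behavior the RL vehicle is trained to counteract.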

Another controller we will need to define is for the vehicle's routing behavior. For closed networks, where the route of any vehicle is repeated, the `ContinuousRouter` controller is used to perpetually reroute all vehicles to the initial set route.

In [6]:
from flow.controllers.routing_controllers import ContinuousRouter
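
The idea behind perpetual rerouting can be sketched as follows. This is a simplified stand-in and not the `ContinuousRouter` implementation: the edge names and both helper functions are hypothetical, and we assume that a new route is issued whenever a vehicle reaches the last edge of its current route.

```python
# Illustrative edge names for a four-edge ring (an assumption for this sketch).
RING_EDGES = ["bottom", "right", "top", "left"]


def initial_route(start_edge):
    """One full lap of the ring, starting from start_edge."""
    i = RING_EDGES.index(start_edge)
    return RING_EDGES[i:] + RING_EDGES[:i]


def continuous_route(current_edge, current_route):
    """Return a fresh lap if the vehicle is on the last edge of its route.

    Returns None when no rerouting is needed.
    """
    if current_edge == current_route[-1]:
        return initial_route(current_edge)
    return None


route = initial_route("top")
# On the last edge of the route, the vehicle receives a new lap.
print(continuous_route("right", route))
# Anywhere else, the current route is kept.
print(continuous_route("left", route))
```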

We add 13 vehicles of type "human" with the above longitudinal and routing behavior into the `Vehicles` class.

In [7]:
vehicles.add("human",
             acceleration_controller=(IDMController, {}),
             routing_controller=(ContinuousRouter, {}),
             num_vehicles=13)

Now, we want to add RL-powered vehicles (only one in this case). We do something similar to before, with one change: we import Flow's `RLController`, which applies actions returned by the reinforcement learning algorithm (as specified in the environment's action space and `apply_rl_actions` method). 

In [8]:
from flow.controllers.rlcontroller import RLController
    
vehicles.add(veh_id="rl",
             acceleration_controller=(RLController, {}),
             routing_controller=(ContinuousRouter, {}),
             num_vehicles=1)

### 2.4 NetParams

`NetParams` are network-specific parameters used to define the shape and properties of a network. Unlike most other parameters, `NetParams` may vary drastically depending on the specific network configuration, and accordingly most parameters are stored in the `additional_params` attribute. To determine which `additional_params` entries are needed for a specific scenario, we refer to the `ADDITIONAL_NET_PARAMS` variable located in the scenario file.

In [9]:
from flow.scenarios.loop.loop_scenario import ADDITIONAL_NET_PARAMS

ADDITIONAL_NET_PARAMS

{'lanes': 1, 'length': 230, 'resolution': 40, 'speed_limit': 30}

Importing the `ADDITIONAL_NET_PARAMS` dict from the ring road scenario, we see that the required parameters are:

* **length**: length of the ring road
* **lanes**: number of lanes
* **speed_limit**: speed limit for all edges
* **resolution**: resolution of the curves on the ring. Setting this value to 1 converts the ring to a diamond.


At times, other inputs (for example `no_internal_links`) may be needed by the scenario to recreate proper network features/behavior. These requirements can be found in the scenario's documentation. For this exercise, we use the scenario's default parameters when creating the `NetParams` object.

In [10]:
from flow.core.params import NetParams

net_params = NetParams(additional_params=ADDITIONAL_NET_PARAMS)
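
To use a ring of a different size, one reasonable pattern is to copy the defaults and override individual entries rather than mutating the shared dict. The sketch below is self-contained: it re-declares the default values printed above, and `with_overrides` is a hypothetical helper, not part of Flow.

```python
import copy

# Default values as printed above (re-declared so the sketch runs on its own).
ADDITIONAL_NET_PARAMS = {"lanes": 1, "length": 230,
                         "resolution": 40, "speed_limit": 30}


def with_overrides(defaults, **overrides):
    """Return a copy of defaults with some entries replaced.

    Rejects keys that are not in defaults, which catches typos such as
    'speed' instead of 'speed_limit'.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError("unknown network parameter(s): %s" % sorted(unknown))
    params = copy.deepcopy(defaults)
    params.update(overrides)
    return params


additional_net_params = with_overrides(ADDITIONAL_NET_PARAMS, length=260)
print(additional_net_params)
```

The resulting dict would then be passed as `NetParams(additional_params=additional_net_params)`.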

### 2.5 InitialConfig

`InitialConfig` specifies parameters that affect the positioning of vehicles in the network at the start of a simulation. These parameters can be used to limit the edges and number of lanes vehicles originally occupy, and provide a means of adding randomness to the starting positions of vehicles. In order to introduce a small initial disturbance to the system of vehicles in the network, we set the `perturbation` term in `InitialConfig` to 1.

In [11]:
from flow.core.params import InitialConfig

initial_config = InitialConfig(spacing="uniform", perturbation=1)
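
As a rough picture of what uniform spacing with a perturbation means, the sketch below places vehicles evenly on a ring and shifts each one by a small random amount. It is an illustration only: we assume the perturbation acts as the standard deviation of Gaussian noise on each starting position, and `uniform_positions` is a hypothetical function rather than Flow's placement routine.

```python
import random


def uniform_positions(num_vehicles, ring_length, perturbation=0.0, seed=None):
    """Evenly spaced start positions on a ring, optionally perturbed.

    perturbation is treated as the standard deviation (in meters) of
    Gaussian noise added to each position (an assumption of this sketch).
    """
    rng = random.Random(seed)
    spacing = ring_length / num_vehicles
    positions = []
    for i in range(num_vehicles):
        noise = rng.gauss(0.0, perturbation) if perturbation > 0 else 0.0
        # Wrap around so that every position stays on the ring.
        positions.append((i * spacing + noise) % ring_length)
    return positions


# 13 human vehicles plus 1 RL vehicle on the default 230 m ring.
start = uniform_positions(num_vehicles=14, ring_length=230,
                          perturbation=1, seed=0)
print(start)
```

Without a perturbation, identical IDM vehicles that start perfectly evenly spaced and at equal speeds have no disturbance to amplify, so the small initial noise is what triggers the waves.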

### 2.6 TrafficLights

`TrafficLights` are used to describe the positions and types of traffic lights in the network. These inputs are outside the scope of this tutorial, and are instead covered in `exercise06_traffic_lights.ipynb`. For our example, we create an empty `TrafficLights` object, ensuring that none are placed on any nodes.

In [12]:
from flow.core.traffic_lights import TrafficLights

traffic_lights = TrafficLights()

## 3. Setting up an Environment

Several environments in Flow exist to train RL agents of different forms (e.g. autonomous vehicles, traffic lights) to perform a variety of different tasks. These environments are often scenario- or task-specific; however, some can be deployed on an arbitrary set of scenarios as well. One such environment, `AccelEnv`, may be used to train a variable number of vehicles in a fully observable network with a *static* number of vehicles.

In [13]:
from flow.envs.loop.loop_accel import AccelEnv

The use of an environment allows us to view the cumulative reward that simulation rollouts receive, and to specify the state/action spaces.

Environments in Flow are parameterized by three components:
* env_params
* sumo_params
* scenario

### 3.1 SumoParams
`SumoParams` specifies simulation-specific variables. These variables include the length of any simulation step and whether to render the GUI when running the experiment. For this example, we use a simulation step length of 0.1s. The GUI is activated with `sumo_binary="sumo-gui"` (the commented-out line below); the cell as written runs without it.

**Note** For training purposes, it is highly recommended to deactivate the GUI in order to avoid slowing down the simulation. To do so, one just needs to specify the following: `sumo_binary="sumo"`

In [14]:
from flow.core.params import SumoParams

# sumo_params = SumoParams(sim_step=0.1, sumo_binary="sumo-gui")
sumo_params = SumoParams(sim_step=0.1, sumo_binary="sumo")

### 3.2 EnvParams

`EnvParams` specifies environment- and experiment-specific parameters that affect either the training process or the dynamics of various components within the scenario. Much like `NetParams`, an experiment requires some additional parameters that are specific to it. In this case, we specify the scenario used (`LoopScenario`) and the `target_velocity`, which is the velocity that the autonomous vehicles will strive to impose on the whole fleet. The only actions available to the autonomous vehicles are to decelerate or accelerate (this comes from the action space in `AccelEnv`), so it is also here that we specify the range of accelerations they are capable of (`max_decel` and `max_accel`). Finally, we specify the *HORIZON* of the experiment, which is the length of one episode in simulation steps (during which the RL agent acquires data).

In [15]:
from flow.core.params import EnvParams

HORIZON = 100

additional_env_params = {"target_velocity": 8,
                         "scenario_type": LoopScenario,
                         "ring_length": [220, 270],
                         "max_decel": 1,
                         "max_accel": 1}

env_params = EnvParams(horizon=HORIZON, additional_params=additional_env_params)
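
To make the role of these parameters concrete, the sketch below computes the simulated time covered by one episode, clips a requested acceleration to the allowed range, and evaluates one plausible reward built around a target velocity. All three pieces are illustrative assumptions: the helper names are hypothetical, and the reward shown here is not necessarily the one used by the environment trained later in this exercise.

```python
import math

HORIZON = 100   # steps per episode, as above
SIM_STEP = 0.1  # seconds per step, from SumoParams
additional_env_params = {"target_velocity": 8, "max_decel": 1, "max_accel": 1}

# Simulated time covered by one episode, in seconds.
episode_seconds = HORIZON * SIM_STEP


def clip_acceleration(action, params):
    """Restrict a requested acceleration to [-max_decel, max_accel]."""
    return max(-params["max_decel"], min(params["max_accel"], action))


def target_velocity_reward(speeds, target):
    """Reward in [0, 1] that peaks when every vehicle drives at target.

    Assumed form: one minus the normalized Euclidean distance between the
    speed vector and the all-target vector, floored at zero.
    """
    max_cost = math.sqrt(len(speeds)) * target
    cost = math.sqrt(sum((v - target) ** 2 for v in speeds))
    return max(max_cost - cost, 0.0) / max_cost


print(episode_seconds)
print(clip_acceleration(2.5, additional_env_params))
print(target_velocity_reward([8, 8, 8], additional_env_params["target_velocity"]))
```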

## 4. Setting up an Experiment

First, we need to define the scenario that we'll use: 

In [16]:
# create the scenario object
scenario = LoopScenario(name="ring_example",
                        generator_class=CircleGenerator,
                        vehicles=vehicles,
                        net_params=net_params,
                        initial_config=initial_config,
                        traffic_lights=traffic_lights)

Now, we have to specify our Gym environment and the algorithm that our RL agents will use.
To specify the environment, one uses the environment's name (a simple string). A list of all environment names is located in `flow/envs/__init__.py`.
To create the Gym environment, the only necessary parameters are the environment name plus the previously defined variables.
In this experiment, we use a Gaussian MLP policy: we just need to specify its hidden layer sizes `(32,32)` and the environment specification (`env.spec`). We'll use a linear feature baseline and the Trust Region Policy Optimization (TRPO) algorithm (see https://arxiv.org/abs/1502.05477):
- The `batch_size` parameter specifies the number of samples (environment steps) collected for each training iteration.
- The `max_path_length` parameter indicates the maximum length of a single rollout.
- The `n_itr` parameter gives the number of iterations used in training the agent.
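
A quick back-of-the-envelope calculation shows how these settings interact. The sketch assumes that `batch_size` counts environment steps collected per iteration and that every rollout runs for the full `max_path_length`; the variable names and the `discounted_return` helper are illustrative.

```python
import math

batch_size = 3600 * 72 * 2  # value passed to TRPO in the next cell
max_path_length = 100       # equals HORIZON in this exercise
n_itr = 5
discount = 0.999

# Minimum number of rollouts needed to fill one batch.
min_rollouts_per_itr = math.ceil(batch_size / max_path_length)
# Environment steps simulated over the whole training run.
total_samples = batch_size * n_itr


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t, accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


print(min_rollouts_per_itr, total_samples)
# With a discount this close to 1, rewards at the end of a 100-step
# rollout still carry most of their weight.
print(discount ** max_path_length)
```

For quick local tests, a much smaller `batch_size` keeps each iteration short, at the cost of noisier gradient estimates.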

In [17]:
from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy
from rllab.envs.normalized_env import normalize
from rllab.envs.gym_env import GymEnv

env_name = "WaveAttenuationPOEnv"
pass_params = (env_name, sumo_params, vehicles, env_params, net_params,
               initial_config, scenario)

env = GymEnv(env_name, record_video=False, register_params=pass_params)
horizon = env.horizon
env = normalize(env)

policy = GaussianMLPPolicy(
        env_spec=env.spec,
        hidden_sizes=(32, 32)
    )

baseline = LinearFeatureBaseline(env_spec=env.spec)

algo = TRPO(
        env=env,
        policy=policy,
        baseline=baseline,
        batch_size=3600 * 72 * 2,
        discount=0.999,
        max_path_length=horizon,
        n_itr=5,
        # whole_paths=True,
        # step_size=v["step_size"],
    )
algo.train()

Traceback (most recent call last):
  File "/Users/nishant/Development/research/rllab-multiagent/rllab/envs/gym_env.py", line 11, in <module>
    from gym.wrappers.monitoring import logger as monitor_logger
ImportError: cannot import name 'logger'


2018-05-04 13:42:13.661876 PDT | observation space: Box(4,)
2018-05-04 13:42:13.663054 PDT | action space: Box(1,)
2018-05-04 13:42:14.710899 PDT | Populating workers...
2018-05-04 13:42:14.712704 PDT | Populated

-----------------------
ring length: 249
v_max: 11.613727166217824
-----------------------


0% [###                           ] 100% | ETA: 00:00:25


-----------------------
ring length: 258
v_max: 12.25853090863283
-----------------------


0% [######                        ] 100% | ETA: 00:00:22


-----------------------
ring length: 235
v_max: 10.593260524824172
-----------------------


0% [#########                     ] 100% | ETA: 00:00:19


-----------------------
ring length: 257
v_max: 12.187368194492763
-----------------------


0% [############                  ] 100% | ETA: 00:00:16


-----------------------
ring length: 247
v_max: 11.469173128137612
-----------------------


0% [###############               ] 100% | ETA: 00:00:13


-----------------------
ring length: 244
v_max: 11.251538419220173
-----------------------


0% [##################            ] 100% | ETA: 00:00:11


-----------------------
ring length: 221
v_max: 9.555454406750316
-----------------------


0% [#####################         ] 100% | ETA: 00:00:08


-----------------------
ring length: 241
v_max: 11.03297886088859
-----------------------


0% [########################      ] 100% | ETA: 00:00:05


-----------------------
ring length: 255
v_max: 12.044671048467864
-----------------------


0% [###########################   ] 100% | ETA: 00:00:02


-----------------------
ring length: 270
v_max: 13.102160898070554
-----------------------


0% [##############################] 100% | ETA: 00:00:00

2018-05-04 13:42:42.345154 PDT | itr #0 | fitting baseline...
2018-05-04 13:42:42.385677 PDT | itr #0 | fitted



Total time elapsed: 00:00:27


=: Compiling function f_loss
done in 11.401 seconds
=: Compiling function constraint
done in 2.219 seconds
2018-05-04 13:42:56.023931 PDT | itr #0 | computing loss before
2018-05-04 13:42:56.031837 PDT | itr #0 | performing update
2018-05-04 13:42:56.032869 PDT | itr #0 | computing descent direction
=: Compiling function f_grad
done in 16.651 seconds
=: Compiling function f_Hx_plain
done in 18.989 seconds
2018-05-04 13:43:32.012524 PDT | itr #0 | descent direction computed
=: Compiling function f_loss_constraint
done in 2.866 seconds
2018-05-04 13:43:34.946434 PDT | itr #0 | backtrack iters: 10
2018-05-04 13:43:34.947732 PDT | itr #0 | computing loss after
2018-05-04 13:43:34.948722 PDT | itr #0 | optimization finished
2018-05-04 13:43:34.966746 PDT | itr #0 | saving snapshot...
2018-05-04 13:43:34.968953 PDT | itr #0 | saved
2018-05-04 13:43:34.972359 PDT | -----------------------  --------------


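## 5. The whole code

In the following, we regroup all the previous commands into one single cell. Note that this script uses slightly different settings from the walkthrough above (for example, `HORIZON = 10`, 21 IDM vehicles, and a ring length of 260).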

In [None]:
import logging

from rllab.envs.normalized_env import normalize
from rllab.misc.instrument import stub, run_experiment_lite
from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

# recurrent stuff
from rllab.policies.gaussian_gru_policy import GaussianGRUPolicy
from rllab.optimizers.conjugate_gradient_optimizer import ConjugateGradientOptimizer, FiniteDifferenceHvp

from flow.scenarios.loop.gen import CircleGenerator
from flow.scenarios.loop.loop_scenario import LoopScenario
from flow.controllers.rlcontroller import RLController
from flow.controllers.lane_change_controllers import *
from flow.controllers.car_following_models import *
from flow.controllers.routing_controllers import *
from flow.core.vehicles import Vehicles
from flow.core.params import *
from rllab.envs.gym_env import GymEnv

import numpy as np
import sys

HORIZON = 10


def run_task(v):
    logging.basicConfig(level=logging.INFO)

    sumo_params = SumoParams(sim_step=0.1, sumo_binary="sumo-gui", seed=0)

    vehicles = Vehicles()
    vehicles.add(veh_id="rl",
                 acceleration_controller=(RLController, {}),
                 routing_controller=(ContinuousRouter, {}),
                 num_vehicles=1)
    vehicles.add(veh_id="idm",
                 acceleration_controller=(IDMController, {}),
                 routing_controller=(ContinuousRouter, {}),
                 num_vehicles=21)


    additional_env_params = {"target_velocity": 8,
                             "scenario_type": LoopScenario,
                             "ring_length": [220, 270],
                             "max_decel": 1,
                             "max_accel": 1}
    env_params = EnvParams(horizon=HORIZON,
                           additional_params=additional_env_params)

    additional_net_params = {"length": 260, "lanes": 1, "speed_limit": 30,
                             "resolution": 40}
    net_params = NetParams(additional_params=additional_net_params)

    initial_config = InitialConfig(spacing="uniform", bunching=50)

    # exp_tag is a module-level variable defined in the cell of Section 6
    print("XXX name", exp_tag)
    scenario = LoopScenario(exp_tag, CircleGenerator, vehicles, net_params,
                            initial_config=initial_config)

    env_name = "WaveAttenuationPOEnv"
    pass_params = (env_name, sumo_params, vehicles, env_params, net_params,
                   initial_config, scenario)

    env = GymEnv(env_name, record_video=False, register_params=pass_params)
    horizon = env.horizon
    env = normalize(env)

    policy = GaussianMLPPolicy(
        env_spec=env.spec,
        hidden_sizes=(32, 32)
    )

    baseline = LinearFeatureBaseline(env_spec=env.spec)

    algo = TRPO(
        env=env,
        policy=policy,
        baseline=baseline,
        batch_size=3600 * 72 * 2,
        max_path_length=horizon,
        discount=0.999,
        n_itr=5,
        # whole_paths=True,
        # step_size=v["step_size"],
    )
    algo.train()

## 6. Running the experiment

The last few parameters to specify are:
- `n_parallel`: the number of cores you want to use for your experiment. If you set `n_parallel=2`, two processes will sample rollouts in parallel, which results in a roughly linear speed-up of sampling.
- `mode`: set it to `local` if you want to run the experiment locally, or to `ec2` to launch the experiment on an Amazon Web Services instance.
- `seed`: fixes the seed of the random number generators, which makes the randomness in the experiment reproducible.
- `exp_prefix`: the tag for your experiment.

In [None]:
exp_tag = "car-stabilizing-the-ring-local"

for seed in [5]:  # , 20, 68]:
    run_experiment_lite(
        run_task,
        # Number of parallel workers for sampling
        n_parallel=1,
        # Keeps the snapshot parameters for all iterations
        snapshot_mode="all",
        # Specifies the seed for the experiment. If this is not provided, a
        # random seed will be used
        seed=seed,
        mode="local",
        exp_prefix=exp_tag,
        # plot=True,
    )
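
The effect of the `seed` argument above can be illustrated without rllab: a seeded random number generator always produces the same draws, so two runs with the same seed see the same sequence of random events. The sketch below samples a ring length from the `[220, 270]` range used in `additional_env_params`; `sample_ring_length` is a hypothetical helper, and the environment's own sampling code may differ.

```python
import random


def sample_ring_length(seed, low=220, high=270):
    """Draw one ring length from [low, high] with a dedicated seeded RNG."""
    rng = random.Random(seed)
    return rng.randint(low, high)


# The seeds listed in the loop above, including the commented-out ones.
seeds = [5, 20, 68]
first_run = [sample_ring_length(s) for s in seeds]
second_run = [sample_ring_length(s) for s in seeds]

# The same seeds give the same draws on every run.
print(first_run == second_run)
```

Running the experiment over several seeds, as the commented-out values in the loop suggest, is a common way to check that a training result does not depend on one lucky initialization.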