Stochasticity
=============

An important aspect of Flatland scenarios is their **stochasticity**: how often, and for how long, trains malfunction. Malfunctions force the
agents to reconsider their plans, which can be costly.







## Effects Generators

As described in the <a href="../../key-concepts/key_concepts.html">Key Concepts</a> section, effects generators can modify the environment state through two hooks:
* `on_episode_step_start`
* `on_episode_step_end`
  
This makes it possible to introduce stochasticity into a Flatland run.

In contrast to `FlatlandCallbacks`, which are called by a trajectory runner/evaluator, effects generators are called by the simulation itself, directly before and after the env's `step`.
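The hook ordering can be illustrated with a small, self-contained toy. This is only a sketch: `ToyEnv`, `ToyBlockingEffects` and `run` are hypothetical stand-ins invented for illustration, and the hook signatures shown here are an assumption, not the Flatland API.

```python
class ToyEnv:
    """Stand-in for an environment; NOT the Flatland RailEnv."""

    def __init__(self):
        self.elapsed_steps = 0
        self.blocked = False

    def step(self):
        self.elapsed_steps += 1


class ToyBlockingEffects:
    """Hypothetical effects generator: marks the env as blocked on even steps."""

    def on_episode_step_start(self, env):
        # Modify the environment state before the step is executed.
        env.blocked = env.elapsed_steps % 2 == 0
        return env

    def on_episode_step_end(self, env):
        # Nothing to do after the step in this toy example.
        return env


def run(env, effects, num_steps):
    """Call the two hooks directly before and after each step."""
    history = []
    for _ in range(num_steps):
        env = effects.on_episode_step_start(env)
        env.step()
        env = effects.on_episode_step_end(env)
        history.append(env.blocked)
    return history


history = run(ToyEnv(), ToyBlockingEffects(), num_steps=4)
```

The point of the sketch is only the call order: the effect is applied to the state before the step runs, so the step already sees the modified environment.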

## Malfunctions
Malfunctions are implemented by `MalfunctionEffectsGenerator`, which simulates delays by stopping agents at random times for random durations. A train that malfunctions cannot move for a random, but
known, number of steps, and it blocks the trains following it 😬.

Stochastic events are common in railway networks. The initial plan often needs to be rescheduled during operations as minor events such as delayed departure
from train stations, various malfunctions on trains or infrastructure, or even problematic weather lead to delayed trains.

Malfunction onsets are drawn from a [Poisson process](https://en.wikipedia.org/wiki/Poisson_point_process), so agents are stopped at random
times. Each malfunction lasts for a random, but known, number of steps.

The parameters necessary for the stochastic events are provided as a `NamedTuple` called `MalfunctionParameters`:


```python
from flatland.envs.malfunction_generators import MalfunctionParameters

stochastic_data = MalfunctionParameters(
    malfunction_rate=1 / 10000,  # Rate of malfunction occurrence
    min_duration=15,  # Minimal duration of malfunction
    max_duration=50  # Maximal duration of malfunction
)
stochastic_data
```

The parameters are as follows:

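- `malfunction_rate` is the mean rate of the Poisson process, expressed per environment step. Its inverse is the mean interval between malfunctions, so `1 / 10000` corresponds to one malfunction every 10,000 steps on average.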
- `min_duration` and `max_duration` set the range of malfunction durations. They are sampled uniformly.
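To make the role of these parameters concrete, here is a minimal sketch of how a Poisson rate and a uniform duration range can be turned into per-step draws. It uses only the standard library; `malfunction_probability` and `sample_malfunction` are illustrative names, and treating `max_duration` as inclusive is an assumption. This is not Flatland's implementation.

```python
import math
import random


def malfunction_probability(rate: float) -> float:
    """Probability of at least one Poisson event within a single step."""
    return 1.0 - math.exp(-rate)


def sample_malfunction(rng, rate, min_duration, max_duration):
    """Return a duration in steps, or 0 if no malfunction starts this step."""
    if rng.random() < malfunction_probability(rate):
        # Uniform integer duration; both bounds inclusive (assumption).
        return rng.randint(min_duration, max_duration)
    return 0


rng = random.Random(42)
p = malfunction_probability(1 / 10000)  # very close to 1 / 10000 for small rates
# A deliberately high rate so that some malfunctions show up in a short run.
draws = [sample_malfunction(rng, 1.0, 15, 50) for _ in range(1000)]
```

For small rates the per-step probability is almost equal to the rate itself, which is why the rate can be read as "malfunctions per step".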

You can then introduce stochasticity in an environment by using the `malfunction_generator` parameter of the `RailEnv` constructor:



```python
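from flatland.envs.malfunction_generators import ParamMalfunctionGen
from flatland.envs.rail_env import RailEnv

RailEnv(
    # ... other constructor arguments ...
    malfunction_generator=ParamMalfunctionGen(stochastic_data),
    # ...
)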
```

In your controller, you can then check whether an agent is malfunctioning:

```python
from flatland.envs.rail_env import RailEnv
from flatland.env_generation.env_generator import env_generator

env, _, _ = env_generator(
    # Insanely low interval to show the effect. Inverse of rate of malfunction
    # occurrence. Goes into `ParamMalfunctionGen`.
    malfunction_interval=1,
    malfunction_duration_min=15,  # Minimal duration of malfunction
    malfunction_duration_max=50  # Maximal duration of malfunction
)
env

action_dict = dict()
obs, rew, done, info = env.step(action_dict)

for a in range(env.get_num_agents()):
    if info['malfunction'][a] > 0:
        # info['malfunction'][a] contains the number of steps this agent will still be blocked
        print(f"agent {a} is malfunctioning and can't move for {info['malfunction'][a]} steps!")
```

Malfunctions quickly lead to unforeseen difficulties, so your controller needs to observe the environment at every step to be
able to react to stochastic events!
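As a small illustration of reacting to malfunctions, a controller could collect the blocked agents after each step and replan around them. The helper below is a hypothetical sketch (`blocked_agents` is not part of Flatland); it only relies on `info['malfunction'][a]` holding the remaining blocked steps, as in the loop above, and is exercised here on a hand-written `info` dictionary rather than on a real environment.

```python
def blocked_agents(info, num_agents):
    """Map each malfunctioning agent handle to its remaining blocked steps."""
    remaining = info["malfunction"]
    return {a: remaining[a] for a in range(num_agents) if remaining[a] > 0}


# Hand-written example in the shape used above: agents 1 and 2 are blocked.
example_info = {"malfunction": {0: 0, 1: 12, 2: 3}}
blocked = blocked_agents(example_info, num_agents=3)
```

With a real environment, the same call would be `blocked_agents(info, env.get_num_agents())` after every `env.step(...)`.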

<!-- [Check out the starter kit](https://gitlab.aicrowd.com/flatland/neurips2020-flatland-starter-kit/blob/master/reinforcement_learning/multi_agent_training.py#L55) for a complete example of how to train a model using malfunctions. -->

## Conditional Malfunctions

`ConditionalMalfunctionEffectsGenerator` generates agent malfunctions conditionally, with a conditional rate and duration. Its parameters include:
* `earliest_malfunction`
* `max_num_malfunctions`
* `condition`: a `MalfunctionCondition` evaluated on an `EnvAgent`

A `MalfunctionCondition` takes an `EnvAgent` and decides whether a malfunction should be drawn from the Poisson process. There are three predefined conditions:

* `on_map_state_condition`: is the agent on the map?
* `condition_stopped_intermediate_and_range`: is the agent stopped at an intermediate waypoint and within a range of timesteps?
* `condition_stopped_cells_and_range`: is the agent stopped on any of the given cells and within a range of timesteps?

This feature makes it possible to generate minor disruptions with faster recovery times at train stations.
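The gating idea can be sketched without Flatland as follows. Everything here is hypothetical: `ToyAgent`, `toy_on_map_condition` and `maybe_malfunction` are illustrative names, and reading `earliest_malfunction` as "no malfunction before this step" is an assumption about its meaning, not a statement about the library.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyAgent:
    """Stand-in for an agent; NOT the Flatland EnvAgent."""
    on_map: bool


def toy_on_map_condition(agent: ToyAgent) -> bool:
    """Condition in the spirit of 'is the agent on the map?'."""
    return agent.on_map


def maybe_malfunction(agent, elapsed_steps, condition, rng, probability,
                      earliest_malfunction, min_duration, max_duration):
    """Return a malfunction duration in steps, or 0 if none is drawn."""
    if elapsed_steps < earliest_malfunction:
        return 0  # too early in the episode (assumed semantics)
    if not condition(agent):
        return 0  # the condition vetoes the draw entirely
    if rng.random() >= probability:
        return 0  # the random draw did not trigger a malfunction
    return rng.randint(min_duration, max_duration)


rng = random.Random(7)
# probability=1.0 so that only the gating logic decides the outcome.
on_map = maybe_malfunction(ToyAgent(True), 10, toy_on_map_condition, rng, 1.0, 5, 2, 5)
off_map = maybe_malfunction(ToyAgent(False), 10, toy_on_map_condition, rng, 1.0, 5, 2, 5)
too_early = maybe_malfunction(ToyAgent(True), 3, toy_on_map_condition, rng, 1.0, 5, 2, 5)
```

Short duration ranges combined with a station-specific condition are one way to model the minor disruptions mentioned above.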

> This feature was introduced in [4.2.1](https://github.com/flatland-association/flatland-rl/pull/263)


## Observation Perturbations

Another stochastic feature is "noisy observations". They do not change the behaviour of the simulation/environment but add noise to the observation passed to the agent or controller; see Figure 3.1, "The agent–environment interaction in reinforcement learning", in [Sutton and Barto: Reinforcement Learning, 2nd edition](http://incompleteideas.net/book/RLbook2020.pdf).

Find more on observation perturbations in the
<a href="../observation_builder.html">Observations</a> section.
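The idea that only the observation is perturbed, not the underlying state, can be sketched in a few lines. `perturb_observation` and the Gaussian noise model are assumptions made for illustration; they do not describe how Flatland's observation perturbations are implemented.

```python
import random


def perturb_observation(observation, rng, sigma):
    """Return a noisy copy of the observation; the input is left untouched."""
    return [value + rng.gauss(0.0, sigma) for value in observation]


true_state = [1.0, 2.0, 3.0]
noisy = perturb_observation(true_state, random.Random(0), sigma=0.1)
# The controller would receive `noisy`, while the simulation keeps using `true_state`.
```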