Actions
=======

Trains in Flatland have strongly constrained movements, as you would expect from a railway simulation. As a result, only a few actions are valid in most
cases.

Here are the possible actions:

- **`DO_NOTHING`**: If the agent is already moving, it continues moving. If it is stopped, it stays stopped. The train keeps the same speed. Special case: if the agent is at a dead-end, this action results in the train turning around. Special case: a symmetric switch in facing direction cannot be entered by this action.
- **`MOVE_LEFT`**: This action is only valid at cells where the agent can change direction towards the left. If chosen, the left transition is taken and the agent's orientation is rotated to the left. If the agent is stopped, this action will cause it to start moving in any cell where forward or left is allowed! The train keeps the same speed.
- **`MOVE_FORWARD`**: The agent will move forward. This action will start the agent when stopped. At switches, this will choose the forward direction. In addition, the train accelerates by `RailEnv.acceleration_delta`. Special case: a symmetric switch in facing direction cannot be entered by this action.
- **`MOVE_RIGHT`**: The same as `MOVE_LEFT`, but for right turns. The train keeps the same speed.
- **`STOP_MOVING`**: This action causes the agent to decelerate (brake), reducing its speed by `RailEnv.braking_delta`. Special case: a symmetric switch in facing direction cannot be entered by this action; the train is stopped immediately instead of braking.

Flatland is a discrete-time simulation, i.e. it performs all actions with a constant time step. A single simulation step synchronously moves time forward by a
constant increment, enacting exactly one action per agent per time step.
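
To make the one-action-per-agent-per-step idea concrete, here is a minimal, self-contained sketch that does not use Flatland itself. The `Action` enum is a stand-in for `RailEnvActions`, and `step` is a hypothetical helper. It only tracks whether each agent is moving, ignores the track layout, and assumes that an agent without an explicit action behaves as if it had received `DO_NOTHING`.

```python
from enum import IntEnum


class Action(IntEnum):
    """Stand-in for RailEnvActions so that this sketch runs without Flatland."""
    DO_NOTHING = 0
    MOVE_LEFT = 1
    MOVE_FORWARD = 2
    MOVE_RIGHT = 3
    STOP_MOVING = 4


def step(moving, actions):
    """Advance a toy world by one time step (hypothetical helper).

    moving:  dict mapping agent handle -> bool (is the agent moving?)
    actions: dict mapping agent handle -> Action; agents without an entry
             are treated as DO_NOTHING (an assumption of this sketch).
    """
    new_moving = {}
    for handle, is_moving in moving.items():
        action = actions.get(handle, Action.DO_NOTHING)
        if action == Action.STOP_MOVING:
            new_moving[handle] = False      # stop (no gradual braking here)
        elif action == Action.DO_NOTHING:
            new_moving[handle] = is_moving  # keep the current state
        else:
            new_moving[handle] = True       # any MOVE_* action starts the agent
    return new_moving


state = {0: False, 1: True, 2: True}
state = step(state, {0: Action.MOVE_FORWARD, 1: Action.STOP_MOVING})
print(state)
```

Every agent is updated exactly once per call, and the new state is computed from the old state only, which is what "synchronous" means here.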

```{admonition} Code reference
The actions are defined in [flatland.envs.rail_env.RailEnvActions](https://github.com/flatland-association/flatland-rl/blob/main/flatland/envs/rail_env.py#L69).

You can refer to the actions in your code using e.g., `RailEnvActions.MOVE_FORWARD`, `RailEnvActions.MOVE_RIGHT`...
```

In [None]:
from flatland.envs.rail_env import RailEnv, RailEnvActions
for a in RailEnvActions:
    print(f"{a.name}: {a.value}")

The following diagram shows the interplay of agent position/direction and actions.

The agent (red triangle) is in a left switch cell with direction `W`. The left neighbor cell is a left switch, too.
Upon entering the new cell, the `MOVE_LEFT` action will update the agent's direction to `S`, and the `MOVE_FORWARD` action will keep the agent's direction at
`W`.

![Flatland_3_Update.drawio.png](../../assets/images/Flatland_3_Update.drawio.png)

> *Pro memoria*
>
> **current position and direction** determine **next cell**
>
> **action** determines **next direction**
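
The rule of thumb above can be sketched in a few lines of plain Python. This is an illustration only, not Flatland's implementation: `next_cell` and `next_direction` are hypothetical helpers, directions are assumed to be encoded clockwise as `N=0, E=1, S=2, W=3`, positions are assumed to be `(row, column)` with row 0 at the top, and the sketch does not check whether the cell actually allows the transition.

```python
# Directions as integers, clockwise (assumed encoding for this sketch).
N, E, S, W = range(4)

# (row, column) offset of the neighbouring cell in each direction;
# row 0 is assumed to be the top of the grid.
OFFSETS = {N: (-1, 0), E: (0, 1), S: (1, 0), W: (0, -1)}


def next_cell(position, direction):
    """Current position and direction determine the next cell."""
    row, col = position
    d_row, d_col = OFFSETS[direction]
    return (row + d_row, col + d_col)


def next_direction(direction, action):
    """The action determines the next direction (validity is not checked)."""
    turn = {"MOVE_LEFT": -1, "MOVE_FORWARD": 0, "MOVE_RIGHT": 1}[action]
    return (direction + turn) % 4


# The situation from the diagram: an agent facing W.
position, direction = (2, 5), W
print(next_cell(position, direction))                 # left neighbour: (2, 4)
print(next_direction(direction, "MOVE_LEFT") == S)    # True
print(next_direction(direction, "MOVE_FORWARD") == W) # True
```

The modular arithmetic is the same relation that the assertions in the following cell verify against the real transition map.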


In [None]:
from collections import defaultdict
import warnings

from flatland.core.grid.grid4 import Grid4Transitions, Grid4TransitionsEnum
from flatland.core.grid.grid4_utils import get_new_position
from flatland.envs.fast_methods import fast_argmax, fast_count_nonzero
from flatland.envs.grid.rail_env_grid import RailEnvTransitionsEnum
from flatland.envs.rail_grid_transition_map import RailGridTransitionMap

# Single-cell environment: the transitions of this one cell are overwritten in the loop below.
env = RailEnv(1,1)
env.rail = RailGridTransitionMap(1,1)
position = (0, 0)
# Number of transition patterns in the enum.
print(len(RailEnvTransitionsEnum))
pairs = set()
tuples = set()
for t in RailEnvTransitionsEnum:
    print(f"{t.name}:\t{t.value:016b} ")
    env.rail.set_transitions(position,t)
    for direction in range(4):
        for a in RailEnvActions:
            possible_transitions = env.rail.get_transitions((position, direction))
            num_transitions = fast_count_nonzero(possible_transitions)
            check = env.rail.check_action_on_agent(a, (position,direction))
            if num_transitions > 0:
                s = f" - facing {Grid4TransitionsEnum.to_char(direction)}, action {a} --> {check}"
                print(s)
                pairs.add((a,check[3]))
                new_cell_valid, (new_position,new_direction), transition_valid, preprocessed_action = check
                if RailEnvActions.is_moving_action(preprocessed_action) and num_transitions > 1:
                    # todo stop_moving may mean braking, so might be invalid
                    assert env.rail.get_transitions((position, direction))[new_direction]
                    tup = (direction,preprocessed_action,new_direction)
                    tuples.add(tup)
print("action preprocessing pairs")
# verify 
# - R can only come from R
# - L can only come from L
# - F can only come from L/F/R
# - STOP can come from STOP or DO_NOTHING/F at symmetric switch
# - DO_NOTHING can only come from DO_NOTHING
for p in pairs:
    print(f"  {p}")
print("verify preprocessed action reflects direction change whenever num transitions > 1")
for t in list(tuples):
    print(f"  {t}")
    direction, action, new_direction = t
    if action == RailEnvActions.MOVE_FORWARD:
        assert direction == new_direction
    elif action == RailEnvActions.MOVE_LEFT:
        assert ((direction-1)%4) == new_direction
    elif action == RailEnvActions.MOVE_RIGHT:
        assert ((direction+1)%4) == new_direction

Variable Speeds
---------------

> This feature was introduced in [4.0.6](https://github.com/flatland-association/flatland-rl/pull/136)

In Flatland, agents partially reflect both the Infrastructure Manager's decisions (route choice) and the train drivers' decisions (stop/go).

Variable speeds make Flatland more realistic. Agents can now run slower than their maximum speed (speed profile): the `MOVE_FORWARD` action is re-interpreted as acceleration and the `STOP_MOVING` action as braking.

Variable speeds reflect real-world trains running slower than their maximum allowed speed, either due to their physical properties (cargo vs. passenger trains) or due to reduced speed signalled by the Infrastructure Manager's safety system in order to ensure trains can stop within their allocated paths. Train drivers' decisions to accelerate or brake also reflect resource optimisation (e.g. trains should wait and then accelerate so that they run at maximum speed through infrastructure bottlenecks) or energy optimisation ("eco drive").

`RailEnv` takes two options:
```
        acceleration_delta : float
            Determines how much speed is increased by MOVE_FORWARD action up to max_speed set by train's Line (sampled from `speed_ratios` by `LineGenerator`).
            As speed is between 0.0 and 1.0, acceleration_delta=1.0 restores to previous constant speed behaviour
            (i.e. MOVE_FORWARD always sets to max speed allowed for train).
        braking_delta : float
            Determines how much speed is decreased by STOP_MOVING action.
            As speed is between 0.0 and 1.0, braking_delta=-1.0 restores to previous full stop behaviour.
```
If `acceleration_delta < 1.0`, then `MOVE_FORWARD` is re-interpreted as acceleration, i.e. the train's speed is increased by `acceleration_delta` instead of jumping to the full speed given by the train's speed profile. Hence, a train's speed profile is re-interpreted as its maximum speed.

Similarly, if `braking_delta > -1.0`, then `STOP_MOVING` is re-interpreted as braking, i.e. the train's speed is reduced by `braking_delta` (a negative value) until it reaches `0.0`. When the train reaches speed `0.0`, it goes into state `STOPPED`, and a new moving action is required to get it moving again.
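
The two rules above can be sketched as a small speed-update function. This is an illustration under stated assumptions, not Flatland's implementation: `next_speed` is a hypothetical helper, actions are plain strings, `braking_delta` is assumed to be negative and added to the speed, and the sketch does not model state changes such as `STOPPED` or starting a stopped train with a turn action. The default values are chosen here to reproduce the previous full-speed and full-stop behaviour.

```python
def next_speed(speed, action, max_speed,
               acceleration_delta=1.0, braking_delta=-1.0):
    """Return the speed after one step (hypothetical helper).

    speed and max_speed lie between 0.0 and 1.0;
    braking_delta is assumed to be negative.
    """
    if action == "MOVE_FORWARD":
        # Accelerate, but never beyond the train's maximum speed.
        return min(max_speed, speed + acceleration_delta)
    if action == "STOP_MOVING":
        # Brake, but never below standstill.
        return max(0.0, speed + braking_delta)
    # DO_NOTHING, MOVE_LEFT and MOVE_RIGHT keep the current speed.
    return speed


# Gradual acceleration and braking.
speed = 0.0
for action in ["MOVE_FORWARD"] * 5 + ["STOP_MOVING"] * 3:
    speed = next_speed(speed, action, max_speed=1.0,
                       acceleration_delta=0.25, braking_delta=-0.5)
    print(action, speed)
```

With `acceleration_delta=1.0` and `braking_delta=-1.0`, a single `MOVE_FORWARD` reaches `max_speed` and a single `STOP_MOVING` reaches `0.0`, matching the constant-speed behaviour described in the option docstrings.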

The rewards configuration (`from flatland.envs.rewards import Rewards`) takes an additional option:
```
        crash_penalty_factor = 0.0  # penalty for stopping train in conflict
```
This makes it possible to penalize agents that cause trains to run into each other. The Flatland env sets the affected trains to the `STOPPED` state and adds a negative reward in that step.

In [None]:
import inspect

from tests.test_variable_speed import test_variablespeed_actions_no_malfunction_no_blocking

print("".join(inspect.getsourcelines(test_variablespeed_actions_no_malfunction_no_blocking)[0]))