# Why and When Self-Attention Matters in Reinforcement Learning
> "Using RLlib and AttentionNet to master environments with stateless observations, here, stateless CartPole."

- hide: true
- toc: true
- branch: master
- badges: true
- comments: true
- categories: [python, ray, rllib, tensorflow, machine learning, reinforcement learning, attention]
- image: images/cartpole.jpg

TODOs:
* Image and short explanation of (self-)attention.
* Proper description of different options


In reinforcement learning (RL), the RL agent typically selects a suitable action based on the last observation.
Since many practical environments are stateful, this state should be taken into account when selecting an action.
As an example, consider the popular [OpenAI Gym CartPole environment](https://gym.openai.com/envs/CartPole-v1/),
where the task is to move a cart left or right in order to balance a pole on the cart as long as possible.

![OpenAI Gym CartPole-v1 Environment](attention/cartpole.gif "OpenAI Gym CartPole-v1 Environment")

Whether the cart should be moved left or right clearly depends on how the pole is currently moving,
i.e., in which direction it is swinging and at what velocity.
In this example, the pole's movement and velocity are an important part of the state,
which should determine the selected action (left or right).
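
To make the point concrete, here is a minimal sketch of what gets lost when velocities are missing from the observation. It assumes the observation layout documented for Gym's CartPole (cart position, cart velocity, pole angle, pole angular velocity). The helper `drop_velocities` and the example values are hypothetical and only serve as illustration.

```python
from typing import List, Sequence

# Assumed CartPole observation layout (from the Gym documentation):
# [cart position, cart velocity, pole angle, pole angular velocity]
POSITION_INDICES = (0, 2)


def drop_velocities(full_obs: Sequence[float]) -> List[float]:
    """Keep only cart position and pole angle (hypothetical helper)."""
    return [full_obs[i] for i in POSITION_INDICES]


# Two made-up states: same positions, but the pole swings in opposite directions.
swinging_right = [0.0, 0.1, 0.05, 0.8]
swinging_left = [0.0, -0.1, 0.05, -0.8]

# Without velocities, the agent cannot tell the two states apart.
print(drop_velocities(swinging_right) == drop_velocities(swinging_left))
```

Both states map to the same partial observation, although they call for different actions.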

There are several ways to deal with this state:

* Simplest case: ignore it. If the RL agent only observes the raw pixels or the current position of the pole,
but not the pole's movement and velocity (i.e., not the full state), it is very hard to learn a useful policy.
* Observation with state, i.e., an observation that explicitly contains the pole's movement and velocity.
* Sequence of the last observations.
* Sequence of the last observations plus attention.
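
The "sequence of last observations" option can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not how RLlib implements it: the class `ObservationHistory`, the function `estimate_velocity`, and the time step `dt=0.02` are hypothetical names and values chosen for this example. The idea is that two consecutive position-only observations already let the agent approximate the missing velocities by a finite difference.

```python
from collections import deque
from typing import Deque, List, Sequence


class ObservationHistory:
    """Keeps the last `num_steps` observations (hypothetical helper)."""

    def __init__(self, num_steps: int, obs_size: int) -> None:
        # Pad with zeros so the stacked observation has a fixed length from the start.
        self._buffer: Deque[List[float]] = deque(
            ([0.0] * obs_size for _ in range(num_steps)), maxlen=num_steps
        )

    def add(self, obs: Sequence[float]) -> None:
        # The deque drops the oldest observation automatically once it is full.
        self._buffer.append(list(obs))

    def stacked(self) -> List[float]:
        # Concatenate the stored observations, oldest first.
        return [value for obs in self._buffer for value in obs]


def estimate_velocity(
    previous: Sequence[float], current: Sequence[float], dt: float
) -> List[float]:
    """Finite-difference estimate of the rate of change between two observations."""
    return [(c - p) / dt for p, c in zip(previous, current)]


history = ObservationHistory(num_steps=2, obs_size=2)
history.add([0.0, 0.05])   # [cart position, pole angle] at step t-1
history.add([0.01, 0.07])  # [cart position, pole angle] at step t
print(history.stacked())
print(estimate_velocity([0.0, 0.05], [0.01, 0.07], dt=0.02))
```

A policy that receives the stacked observation can, in principle, learn such a difference on its own. Attention additionally lets it weight which of the past observations matter most.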


## Setup

Install Ray RLlib and TensorFlow (also works with PyTorch):

In [None]:
!pip install "ray[rllib]==1.8.0"
!pip install tensorflow==2.7.0

I am using Python 3.8 on Windows 10.


## Solving the Default CartPole with Explicit State

In [1]:
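import ray
import ray.tune
from ray.rllib.agents import ppo
from ray.rllib.examples.env.stateless_cartpole import StatelessCartPole
from ray.tune.registry import register_env


ray.init()
# Register the environment under a name that the config can refer to.
register_env("StatelessCartPole", lambda _: StatelessCartPole())

# Start from PPO's default config and point it at the registered environment.
config = ppo.DEFAULT_CONFIG.copy()
config["env"] = "StatelessCartPole"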



In [None]:
# Train for 2 iterations ("training_iteration" is the metric Tune reports).
results = ray.tune.run("PPO", config=config, stop={"training_iteration": 2})