# Gymnasium integration
[Gymnasium](https://gymnasium.farama.org/) is an API standard for reinforcement learning with a diverse collection of reference environments. Stormvogel supports some integration with Gymnasium. In particular, you can construct explicit models from the Gymnasium environments under [ToyText](https://gymnasium.farama.org/environments/toy_text/) (except Blackjack).

## FrozenLake
Let us create one of these environments, called FrozenLake.
Our agent wants to get to the present. Currently, it just chooses a random action.

In [5]:
import gymnasium as gym
from stormvogel.extensions.gym_grid import *

env = gym.make("FrozenLake-v1", render_mode="rgb_array", is_slippery=False)  # Set `is_slippery=True` for stochastic behavior
#embed_video(create_video(env, "./video", None))

We can convert it into an explicit MDP as follows. Each state has a label that relates to the coordinates of the tile.

In [6]:
from stormvogel.show import show
from stormvogel.layout import Layout
sv_model = gymnasium_to_stormvogel(env)
vis = show(sv_model, layout=Layout("layouts/frozenlake.json"))

Output()
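
As a rough illustration of how a tile's coordinates relate to a state, Gymnasium's FrozenLake numbers its observations as `row * ncols + col`. The helper names below are hypothetical and not part of stormvogel, and the exact label text produced by `gymnasium_to_stormvogel` is not shown here, so this sketch only covers the Gymnasium side of the mapping.

```python
# Hypothetical helpers (not part of stormvogel or Gymnasium).
# Assumes the default 4x4 FrozenLake map, where the observation
# is the tile index row * n_cols + col.

def index_to_coords(state: int, n_cols: int = 4) -> tuple[int, int]:
    """Return (row, col) of the tile with the given observation index."""
    return divmod(state, n_cols)


def coords_to_index(row: int, col: int, n_cols: int = 4) -> int:
    """Return the observation index of the tile at (row, col)."""
    return row * n_cols + col


print(index_to_coords(0))     # start tile in the top-left corner
print(index_to_coords(15))    # goal tile in the bottom-right corner
print(coords_to_index(1, 2))  # tile in the second row, third column
```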


Now, let's do some model checking to calculate a strategy to solve the puzzle. We will tell the model checker to maximize the probability of getting to the target state (the present).

In [7]:
from stormvogel.model_checking import model_checking
target = get_target_state(env)
res = model_checking(sv_model, 'Pmax=? [F "target"]')
vis2 = show(sv_model, result=res, layout=Layout("layouts/frozenlake.json"))

Output()
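
To make concrete what a query like `Pmax=? [F "target"]` asks for, here is a minimal pure-Python sketch that computes the maximal probability of eventually reaching the goal on the default 4x4 FrozenLake map with `is_slippery=False`. It uses plain value iteration and is only an illustration under those assumptions. It is not how stormvogel or its model checker is implemented, and every name in it is hypothetical.

```python
# Illustrative sketch only: value iteration for maximal reachability
# on the default 4x4 FrozenLake map, non-slippery (deterministic) moves.
MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]  # S start, F frozen, H hole, G goal
N_ROWS, N_COLS = len(MAP), len(MAP[0])
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up


def tile(state: int) -> str:
    """Return the map character of the tile with this state index."""
    row, col = divmod(state, N_COLS)
    return MAP[row][col]


def step(state: int, action: int) -> int:
    """Deterministic successor; moving into a wall leaves the agent in place."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = MOVES[action]
    new_row = min(max(row + d_row, 0), N_ROWS - 1)
    new_col = min(max(col + d_col, 0), N_COLS - 1)
    return new_row * N_COLS + new_col


def max_reach_probability(sweeps: int = 100) -> list[float]:
    """Approximate Pmax of eventually reaching the goal from every state."""
    n_states = N_ROWS * N_COLS
    # The goal has probability 1; everything else starts at 0.
    values = [1.0 if tile(s) == "G" else 0.0 for s in range(n_states)]
    for _ in range(sweeps):
        for s in range(n_states):
            if tile(s) in ("G", "H"):
                continue  # goal and holes are absorbing
            # Pick the action whose successor has the highest value.
            values[s] = max(values[step(s, a)] for a in range(len(MOVES)))
    return values


values = max_reach_probability()
for row in range(N_ROWS):
    print(values[row * N_COLS:(row + 1) * N_COLS])
```

Because the moves are deterministic here, every value is either 0 or 1. With `is_slippery=True` each action would lead to several successors with some probability, and the update would take a probability-weighted sum over them instead of a single successor value.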


Let's highlight the path to see what the scheduler is doing.

In [8]:
from stormvogel.simulator import simulate_path
path = simulate_path(sv_model, scheduler=res.scheduler, steps=20)
#vis2.highlight_path(path, color="orange")

Alternatively, we can show what our scheduler does in the frozen lake environment itself.

In [11]:
from stormvogel.extensions.gym_grid import *
embed_video(create_video_scheduler(sv_model, env, "./video", res.scheduler))

We can also define a function to act as the scheduler.

In [12]:
import stormvogel.model


def my_scheduler(s: stormvogel.model.State):
    # Always move right, regardless of the state.
    # Available actions: "←" "↓" "→" "↑"
    return stormvogel.model.Action.create("→")

embed_video(create_video_scheduler(sv_model, env, "./video", my_scheduler))

## CliffWalking
CliffWalking is a slightly more boring version of FrozenLake. You can apply the same principles that we just applied to FrozenLake.

In [13]:
import gymnasium as gym
from stormvogel.extensions.gym_grid import *

env = gym.make("CliffWalking-v0", render_mode="rgb_array")
#embed_video(create_video(env, "./video", None))

In [14]:
from stormvogel.show import show
from stormvogel.layout import Layout

sv_model = gymnasium_to_stormvogel(env)
vis = show(sv_model, layout=Layout("layouts/cliffwalking.json"))

Output()


In [15]:
from stormvogel.model_checking import model_checking
target = get_target_state(env)
res = model_checking(sv_model, 'Pmax=? [F "target"]')
embed_video(create_video_scheduler(sv_model, env, "./video", res.scheduler))

## Taxi
In the Taxi scenario, a taxi has to pick up passengers and transport them to the hotel.

In [24]:
import gymnasium as gym
from stormvogel.extensions.gym_grid import *

env = gym.make("Taxi-v3", render_mode="rgb_array")
initial_state, _ = env.reset()
print(initial_state)

168
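
The printed observation is a single integer. Gymnasium's Taxi documentation describes it as the encoding `((taxi_row * 5 + taxi_col) * 5 + passenger_location) * 4 + destination`. The sketch below decodes such a value in plain Python; the function name is hypothetical and stormvogel is not involved.

```python
# Hypothetical helper: decode a Taxi observation into its components,
# assuming the encoding from the Gymnasium Taxi documentation:
# ((taxi_row * 5 + taxi_col) * 5 + passenger_location) * 4 + destination
LOCATIONS = ["Red", "Green", "Yellow", "Blue"]  # passenger index 4 means "in taxi"


def decode_taxi_state(state: int) -> tuple[int, int, int, int]:
    """Return (taxi_row, taxi_col, passenger_location, destination)."""
    state, destination = divmod(state, 4)
    state, passenger_location = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, passenger_location, destination


taxi_row, taxi_col, passenger, destination = decode_taxi_state(168)
print(taxi_row, taxi_col)      # taxi position on the 5x5 grid
print(LOCATIONS[passenger])    # where the passenger is waiting
print(LOCATIONS[destination])  # where the passenger wants to go
```

For the value 168 shown above, this gives a taxi at row 1, column 3, a passenger waiting at Yellow, and Red as the destination.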


For the Taxi scenario, the initial state is randomly chosen. It is important that it is the same for the stormvogel model and the video.

In [25]:
sv_model = gymnasium_to_stormvogel(env, initial_state=initial_state)
# This model is so big that it is better not to display it.
sv_model.summary()

'ModelType.MDP model with name None, 126 states, 6 actions, and 252 distinct labels.'

In [26]:
from stormvogel.model_checking import model_checking

target = get_target_state(env)
res = model_checking(sv_model, 'Rmax=? [S]')
embed_video(create_video_scheduler(sv_model, env, "./video", res.scheduler, limit=20, initial_state=initial_state))