# Frozen Lake Domain Description
## https://gymnasium.farama.org/environments/toy_text/frozen_lake/

Frozen Lake involves crossing a frozen lake from the start (S) to the goal (G) by walking over the frozen tiles without falling into any holes. Because the ice is slippery, the player may not always move in the intended direction.

## Observation Space
The observation is a value representing the player’s current position as current_row * ncols + current_col (where both the row and col start at 0).

For example, the goal position in the 4x4 map can be calculated as follows: 3 * 4 + 3 = 15. The number of possible observations is dependent on the size of the map.

The observation is returned as an int().
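The encoding is easy to invert with `divmod`. A minimal sketch, assuming a map with `ncols` columns; the helper names `to_state` and `to_row_col` are hypothetical and not part of Gymnasium:

```python
def to_state(row: int, col: int, ncols: int = 4) -> int:
    """Flatten a (row, col) grid position into a FrozenLake observation."""
    return row * ncols + col


def to_row_col(state: int, ncols: int = 4) -> tuple:
    """Recover the (row, col) grid position from an observation."""
    return divmod(state, ncols)


goal = to_state(3, 3)               # bottom-right corner of the 4x4 map
print(goal)                         # 15
print(to_row_col(goal))             # (3, 3)
print(to_state(7, 7, ncols=8))      # 63, the goal of the 8x8 map
```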

The map is built from four tile types:

* **S**: Starting position of the agent
* **F**: Frozen surface, safe to walk on
* **H**: Hole, falling into one ends the episode with a reward of 0
* **G**: Goal, reaching it ends the episode with a reward of 1
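The tiles can be located programmatically. This sketch hard-codes the default 4x4 layout (the same grid that appears in the rendered output later in the notebook); the names `MAP_4X4` and `tile_states` are illustrative assumptions:

```python
# Default 4x4 FrozenLake layout, one string per row (assumed here, not read from Gymnasium).
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]


def tile_states(grid, letter):
    """Return the observation indices of every cell holding `letter`."""
    ncols = len(grid[0])
    return [
        row * ncols + col
        for row, line in enumerate(grid)
        for col, tile in enumerate(line)
        if tile == letter
    ]


start = tile_states(MAP_4X4, "S")
holes = tile_states(MAP_4X4, "H")
goal = tile_states(MAP_4X4, "G")
print("start:", start)  # start: [0]
print("holes:", holes)  # holes: [5, 7, 11, 12]
print("goal:", goal)    # goal: [15]
```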

## Starting State
The episode starts with the player in state [0] (location [0, 0]).

## Rewards

* **Reach goal**: +1
* **Reach hole**: 0
* **Reach frozen**: 0

## Action Space
The action space is Discrete(4): each action is a single integer in {0, 1, 2, 3} indicating which direction to move the player:

* **0: Left**
* **1: Down**
* **2: Right**
* **3: Up**
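On a non-slippery lake each action moves the agent one cell, and a move that would leave the grid keeps the agent where it is. A minimal sketch of that deterministic move rule, assuming a 4x4 grid; `ACTION_DELTAS` and `move` are hypothetical helpers, not Gymnasium functions:

```python
# (row change, column change) for each action index.
ACTION_DELTAS = {
    0: (0, -1),  # Left
    1: (1, 0),   # Down
    2: (0, 1),   # Right
    3: (-1, 0),  # Up
}


def move(state, action, nrows=4, ncols=4):
    """Apply one deterministic move, clamping the agent to the grid."""
    row, col = divmod(state, ncols)
    d_row, d_col = ACTION_DELTAS[action]
    row = min(max(row + d_row, 0), nrows - 1)
    col = min(max(col + d_col, 0), ncols - 1)
    return row * ncols + col


print(move(0, 2))  # 1: Right from the start
print(move(0, 0))  # 0: Left from the start hits the edge and stays put
print(move(1, 1))  # 5: Down from state 1 lands in a hole
```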

## Episode End

The episode ends if the following happens:

* **Termination:**
    * The player moves into a hole.
    * The player reaches the goal at state nrow * ncol - 1 (location [nrow-1, ncol-1]).
* **Truncation (when using the time_limit wrapper):**
    * The episode reaches 100 steps for the 4x4 environment or 200 steps for the FrozenLake8x8-v1 environment.


In [2]:
import gymnasium as gym
import numpy as np
# Create the environment ("human" mode draws the lake in a pygame window)
env = gym.make('FrozenLake-v1', render_mode="human", is_slippery=True)
#env = gym.make('FrozenLake-v1', render_mode="ansi", is_slippery=False)  # 'ansi' mode for text-based rendering

# Reset the environment to the initial state
observation, info = env.reset(seed=42)
# policy[state] is the action to take in that state (used here on the slippery lake)
policy = [0, 3, 0, 3, 0, 0, 0, 0, 3, 1, 0, 0, 0, 2, 1, 0]
# Alternative policy for the non-slippery lake (used in the next cell)
policy2 = [
    1, 0, 0, 0,
    1, 0, 1, 0,
    2, 1, 1, 0,
    0, 2, 2, 0
]
for _ in range(25):
    # Follow the policy; env.action_space.sample() would choose a random action instead
    observation, reward, terminated, truncated, info = env.step(policy[observation])

    # Render the environment
    env.render()

    if terminated or truncated:
        observation, info = env.reset()
env.close()

In [3]:
""" "ansi" rendering mode in Gymnasium produces a text-based representation of the environment, but the output won't be automatically printed when you call env.render(). Instead, it returns a string, and you need to explicitly print it. """
import gymnasium as gym
# Create the environment
env = gym.make('FrozenLake-v1', render_mode="ansi", is_slippery=False)  # 'ansi' mode for text-based rendering

# Reset the environment to the initial state
observation, info = env.reset(seed=42)
policy = [
    1, 0, 0, 0, 
    1, 0, 1, 0, 
    2, 1, 1, 0, 
    0, 2, 2, 0
]
for _ in range(25):
    # Follow the deterministic policy for the non-slippery lake
    observation, reward, terminated, truncated, info = env.step(policy[observation])

    # Render the environment
    print(env.render())  # Print the ANSI string representation of the environment

    if terminated or truncated:
        observation, info = env.reset()
env.close()

  (Down)
SFFF
[41mF[0mHFH
FFFH
HFFG

  (Down)
SFFF
FHFH
[41mF[0mFFH
HFFG

  (Right)
SFFF
FHFH
F[41mF[0mFH
HFFG

  (Down)
SFFF
FHFH
FFFH
H[41mF[0mFG

  (Right)
SFFF
FHFH
FFFH
HF[41mF[0mG

  (Right)
SFFF
FHFH
FFFH
HFF[41mG[0m

  (Down)
SFFF
[41mF[0mHFH
FFFH
HFFG




In [4]:
import gymnasium as gym
import numpy as np
# Create the environment
env = gym.make('FrozenLake-v1', render_mode="ansi", is_slippery=False)

The main API methods that users of this class need to know are:

Methods:

reset() - Resets the environment to an initial state; it must be called before step(). Returns the first agent observation for an episode and an info dict, e.g. metrics or debug info.

In [5]:
env.reset()

(0, {'prob': 1})

step() - Updates the environment with an action and returns the next agent observation, the reward for taking that action, whether the environment has terminated or been truncated as a result of that action, and information from the environment about the step, e.g. metrics or debug info.

In [6]:
env.reset()

(0, {'prob': 1})

In [7]:
env.step(2)

(1, 0.0, False, False, {'prob': 1.0})

In [8]:
env.step(env.action_space.sample())

(5, 0.0, True, False, {'prob': 1.0})

In [9]:
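# The episode already terminated in the hole (state 5); stepping again without reset() leaves the agent there
observation, reward, terminated, truncated, info = env.step(1)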

In [10]:
print("Observation:",observation," Reward:", reward, " Terminated: ",terminated, " Truncated: ",truncated," Info:", info)

Observation: 5  Reward: 0  Terminated:  True  Truncated:  False  Info: {'prob': 1.0}


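render() - Renders the environment to help visualise what the agent sees; example modes are “human”, “rgb_array”, and “ansi” for text. For instance, an environment created with gym.make('FrozenLake-v1', render_mode="rgb_array", is_slippery=False) returns an RGB image as a NumPy array from render(). The cell below still uses the “ansi” environment created in In [4], so render() returns a string.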

In [11]:
print(env.render())  # Print the ANSI string representation of the environment

  (Down)
SFFF
F[41mH[0mFH
FFFH
HFFG



close() - Closes the environment; this matters when external software is used, e.g. pygame for rendering or databases.

In [12]:
env.close()

Environments also expose attributes that help users understand the implementation:

Attributes:

action_space - The Space object corresponding to valid actions, all valid actions should be contained within the space.

In [13]:
env.action_space

Discrete(4)

In [14]:
env.action_space.sample()

0

In [15]:
actions = list(range(env.action_space.n))
actions

[0, 1, 2, 3]

In [16]:
for action in actions:
    print(action)

0
1
2
3


In [17]:
actions + [action]

[0, 1, 2, 3, 3]

In [33]:
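# env.unwrapped.P[state][action] -> list of (prob, next_state, reward, terminated)
# The three 1/3 outcomes below come from the slippery environment (is_slippery=True);
# with is_slippery=False the list would be [(1.0, 15, 1.0, True)]
env.unwrapped.P[14][2]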

[(0.3333333333333333, 14, 0.0, False),
 (0.3333333333333333, 15, 1.0, True),
 (0.3333333333333333, 10, 0.0, False)]
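With `is_slippery=True`, the agent moves in the intended direction with probability 1/3 and in each of the two perpendicular directions with probability 1/3, which is why three outcomes are listed. The sketch below re-derives such lists from the map alone; `slippery_transitions` is a hypothetical helper, and the hard-coded 4x4 map and tuple order are assumptions chosen to match the listing above:

```python
# Default 4x4 layout; "G" and "H" end the episode.
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]
NROWS, NCOLS = 4, 4
ACTION_DELTAS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # Left, Down, Right, Up


def move(state, action):
    """Deterministic move, clamped to the grid edges."""
    row, col = divmod(state, NCOLS)
    d_row, d_col = ACTION_DELTAS[action]
    row = min(max(row + d_row, 0), NROWS - 1)
    col = min(max(col + d_col, 0), NCOLS - 1)
    return row * NCOLS + col


def slippery_transitions(state, action):
    """List (prob, next_state, reward, terminated) for one slippery step."""
    row, col = divmod(state, NCOLS)
    if MAP_4X4[row][col] in "GH":
        # Terminal cells loop back to themselves with no reward.
        return [(1.0, state, 0.0, True)]
    outcomes = []
    # Perpendicular on one side, intended direction, perpendicular on the other side.
    for actual in ((action - 1) % 4, action, (action + 1) % 4):
        nxt = move(state, actual)
        tile = MAP_4X4[nxt // NCOLS][nxt % NCOLS]
        outcomes.append((1 / 3, nxt, float(tile == "G"), tile in "GH"))
    return outcomes


for prob, nxt, reward, done in slippery_transitions(14, 2):
    print(round(prob, 3), nxt, reward, done)
# 0.333 14 0.0 False
# 0.333 15 1.0 True
# 0.333 10 0.0 False
```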

observation_space - The Space object corresponding to valid observations, all valid observations should be contained within the space.

In [19]:
env.observation_space

Discrete(16)

In [20]:
env.observation_space.n

16

In [21]:
list(range(env.observation_space.n))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

spec - An environment spec that contains the information used to initialize the environment from gymnasium.make()

In [22]:
env = gym.make('FrozenLake-v1', render_mode="ansi", is_slippery=True)

metadata - The metadata of the environment, e.g. {“render_modes”: [“rgb_array”, “human”], “render_fps”: 30}. For Jax or Torch, this can be indicated to users with “jax”=True or “torch”=True.

In [23]:
env.metadata

{'render_modes': ['human', 'ansi', 'rgb_array'], 'render_fps': 4}

In [24]:
env.render_mode

'ansi'

In [25]:
env.unwrapped

<gymnasium.envs.toy_text.frozen_lake.FrozenLakeEnv at 0x2584777f020>

np_random - The random number generator for the environment. It is automatically assigned during super().reset(seed=seed) and when accessing np_random.

In [26]:
observation, info = env.reset()
test = env.step(1)
print(test)

(0, 0.0, False, False, {'prob': 0.3333333333333333})


In [27]:
observation, info = env.reset(seed=42)
test = env.step(1)
print(test)

(4, 0.0, False, False, {'prob': 0.3333333333333333})


In [28]:
""" import gymnasium as gym
import numpy as np
from gymnasium.utils.play import play
play(gym.make('FrozenLake-v1', render_mode="rgb_array", is_slippery=False),  
    keys_to_action={
        "w": 3,
        "a": 0,
        "s": 1,
        "d": 2,
    },
    noop= 1
) """

' import gymnasium as gym\nimport numpy as np\nfrom gymnasium.utils.play import play\nplay(gym.make(\'FrozenLake-v1\', render_mode="rgb_array", is_slippery=False),  \n    keys_to_action={\n        "w": 3,\n        "a": 0,\n        "s": 1,\n        "d": 2,\n    },\n    noop= 1\n) '

In [29]:
policy = [0, 3, 0, 3, 0, 0, 0, 0, 3, 1, 0, 0, 0, 2, 1, 0]

In [30]:
V = np.zeros(env.observation_space.n)
V

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [31]:
# actions = list(range(env.action_space.n))
# list(range(env.observation_space.n))
# observation, reward, terminated, truncated, info = env.step(1)
# env.P[14][2] # prob, next_state, reward, done
# env.reset()

In [32]:
actions = list(range(env.action_space.n))
for a in actions:
    print("State 0 action:", a)
    print(env.unwrapped.P[0][a])

State 0 action: 0
[(0.3333333333333333, 0, 0.0, False), (0.3333333333333333, 0, 0.0, False), (0.3333333333333333, 4, 0.0, False)]
State 0 action: 1
[(0.3333333333333333, 0, 0.0, False), (0.3333333333333333, 4, 0.0, False), (0.3333333333333333, 1, 0.0, False)]
State 0 action: 2
[(0.3333333333333333, 4, 0.0, False), (0.3333333333333333, 1, 0.0, False), (0.3333333333333333, 0, 0.0, False)]
State 0 action: 3
[(0.3333333333333333, 1, 0.0, False), (0.3333333333333333, 0, 0.0, False), (0.3333333333333333, 0, 0.0, False)]
