
# The Frozen Lake Environment (Mendes 2019)
In this article, we learn how to create and explore the 
Frozen Lake environment using the Gym library, an open-source project created by OpenAI 
for reinforcement learning experiments. The Gym library defines a uniform interface for environments, 
which makes it easier for developers to integrate algorithms with environments. 
Among many ready-to-use environments, the default installation includes a text-mode version of 
the Frozen Lake game, used as an example in our last post.


In [None]:
# Import the gym library and create the environment
import gym

env = gym.make("FrozenLake-v1")
env.reset()    # put the player on the start tile
env.render()   # print the grid to the console


The output is a grid of tiles showing the player's current position. Some tiles are holes and 
others are frozen, and there is a Goal tile that the agent should try to reach. 
We can also inspect the possible actions to perform in the environment, 
as well as the possible states of the game.





In [None]:
print("Action space: ", env.action_space)
print("Observation space: ", env.observation_space)

Action space:  Discrete(4)
Observation space:  Discrete(16)


In the code above, we print the fields action_space and observation_space to the console. 
The returned objects are of the type Discrete, which describes a discrete space of size n. 
For example, the action_space for the Frozen Lake environment is a discrete space of 4 values, 
which means that the possible values for this space are 0 (zero), 1, 2 and 3. 
Similarly, the observation_space is a discrete space of 16 values, which range from 0 to 15. 
These objects also offer some utility methods, such as sample(), which returns a random value 
from the space. With this method, 
we can easily create a dummy agent that plays the game randomly:

In [None]:
MAX_ITERATIONS = 10

env = gym.make("FrozenLake-v1")
env.reset()
env.render()
for i in range(MAX_ITERATIONS):
    # Pick a random action and apply it to the environment
    random_action = env.action_space.sample()
    new_state, reward, done, info = env.step(random_action)
    env.render()
    # Stop as soon as the episode reaches a terminal state
    if done:
        break

The code above executes the game for a maximum of 10 iterations using the method sample() from the action_space object to select a random action. Then the env.step() method takes the action as input, executes the action on the environment and returns a tuple of four values:

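*   **new_state:** The new state of the environment after the action is executed
*   **reward:** The reward obtained for the transition (in Frozen Lake, 1 for reaching the goal and 0 otherwise)
*   **done:** A boolean flag indicating if the returned state is a terminal state
*   **info:** An object with additional information for debugging purposes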

Finally, we use the method env.render() to print the grid on the console and use the returned “done” flag to 
break the loop. Notice that the selected action is printed together with the grid.
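
The integer returned as new_state can be read as a position on the grid. As a minimal sketch, assuming the row-major numbering that Frozen Lake uses for its states (state = row × number of columns + column), the helpers below convert a state index of the default 4×4 map into a (row, column) pair and look up the tile at that position. The names decode_state and tile_at are hypothetical and not part of the Gym API, and the default layout is written out by hand here.

```python
# Default 4x4 layout of Frozen Lake, written out by hand
# (S = start, F = frozen, H = hole, G = goal).
DEFAULT_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]


def decode_state(state, n_cols=4):
    """Hypothetical helper: map a state index to (row, col), row-major."""
    return divmod(state, n_cols)


def tile_at(state, grid=DEFAULT_MAP):
    """Hypothetical helper: return the tile letter for a state index."""
    row, col = decode_state(state, len(grid[0]))
    return grid[row][col]


for s in (0, 5, 15):
    print(s, decode_state(s), tile_at(s))
```

Under these assumptions, state 0 is the start tile in the top-left corner and state 15 is the goal in the bottom-right corner.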

# **Stochastic vs Deterministic**
In the previous output, notice the cases in which the player moves in a different direction from the one chosen by the agent. This behavior is completely normal in the Frozen Lake environment because it simulates a slippery surface. It also represents an important characteristic of real-world environments: the transitions from one state to another, for a given action, are probabilistic. For example, if we shoot an arrow from a bow, there is a chance of hitting the target as well as of missing it. The split between these two outcomes depends on our skill and on other factors, such as the direction of the wind. Due to this probabilistic nature, the final result of a state transition does not depend entirely on the action taken.

By default, the Frozen Lake environment provided in Gym has probabilistic transitions between states. In other words, even when our agent chooses to move in one direction, the environment can execute a movement in another direction:

In [None]:
actions = {
    'Left': 0,
    'Down': 1,
    'Right': 2, 
    'Up': 3
}
 
print('---- winning sequence ------ ')
winning_sequence = (2 * ['Right']) + (3 * ['Down'])+ ['Right']
print(winning_sequence)
 
env = gym.make("FrozenLake-v1")
env.reset()
env.render()
 
for a in winning_sequence:
    new_state, reward, done, info = env.step(actions[a])
    print()
    env.render()
    print("Reward: {:.2f}".format(reward))
    print(info)
    if done:
        break  
 
print()

---- winning sequence ------ 
['Right', 'Right', 'Down', 'Down', 'Down', 'Right']

Reward: 0.00
{'prob': 0.3333333333333333}

Reward: 0.00
{'prob': 0.3333333333333333}

Reward: 0.00
{'prob': 0.3333333333333333}

Reward: 0.00
{'prob': 0.3333333333333333}

Reward: 0.00
{'prob': 0.3333333333333333}

Reward: 0.00
{'prob': 0.3333333333333333}



Executing the code above, we can observe different results and paths at each execution. Also, using the info object returned by the step method we can inspect the probability used by the environment to choose the executed movement.
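
The value 0.333… reported in the info object can be understood with a small model of the slippery surface. The sketch below is a plain-Python approximation, not Gym's actual implementation. It assumes that on a slippery lake the environment executes either the chosen action or one of the two perpendicular actions, each with probability 1/3. The names slippery_outcomes and sample_executed_action are hypothetical.

```python
import random
from collections import Counter

# Same action encoding as in the article: 0 = Left, 1 = Down, 2 = Right, 3 = Up.
ACTIONS = {0: "Left", 1: "Down", 2: "Right", 3: "Up"}


def slippery_outcomes(action):
    """Hypothetical model: the intended action and its two perpendicular
    neighbours, each executed with probability 1/3."""
    candidates = [(action - 1) % 4, action, (action + 1) % 4]
    return [(1.0 / 3.0, a) for a in candidates]


def sample_executed_action(action, rng=random):
    """Draw the action that is actually executed under the model above."""
    probs, candidates = zip(*slippery_outcomes(action))
    return rng.choices(candidates, weights=probs, k=1)[0]


# Choosing "Right" (2) can actually result in Down, Right, or Up.
rng = random.Random(0)
counts = Counter(ACTIONS[sample_executed_action(2, rng)] for _ in range(3000))
print(counts)
```

In this model the opposite direction is never executed, which is why a chosen "Right" never turns into "Left".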


However, the Frozen Lake environment can also be used in deterministic mode. By passing 
the argument is_slippery=False when creating the environment, the slippery surface is turned 
off and the environment always executes the action chosen by the agent:

In [None]:
env = gym.make("FrozenLake-v1", is_slippery=False)
env.reset()
env.render()
 
for a in winning_sequence:
    new_state, reward, done, info = env.step(actions[a])
    print()
    env.render()
    print("Reward: {:.2f}".format(reward))
    print(info)
    if done:
        break
print()


Reward: 0.00
{'prob': 1.0}

Reward: 0.00
{'prob': 1.0}

Reward: 0.00
{'prob': 1.0}

Reward: 0.00
{'prob': 1.0}

Reward: 0.00
{'prob': 1.0}

Reward: 1.00
{'prob': 1.0}
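
The deterministic run above can be reproduced without Gym. The following sketch is a simplified stand-in for the environment, not its real implementation. It assumes the default 4×4 layout and that a move against the edge of the grid leaves the player in place. The names DEFAULT_MAP, MOVES and run_sequence are illustrative.

```python
# Default 4x4 layout, written out by hand (S = start, F = frozen, H = hole, G = goal).
DEFAULT_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
# Row/column offsets for each named move.
MOVES = {"Left": (0, -1), "Down": (1, 0), "Right": (0, 1), "Up": (-1, 0)}


def run_sequence(sequence, grid=DEFAULT_MAP):
    """Follow the moves deterministically; return (tile, reward, steps)."""
    n_rows, n_cols = len(grid), len(grid[0])
    row, col = 0, 0  # the start tile S is in the top-left corner
    steps = 0
    for name in sequence:
        d_row, d_col = MOVES[name]
        # Clamp to the grid so moves against an edge keep the player in place.
        row = min(max(row + d_row, 0), n_rows - 1)
        col = min(max(col + d_col, 0), n_cols - 1)
        steps += 1
        tile = grid[row][col]
        if tile in "HG":  # holes and the goal end the episode
            return tile, (1.0 if tile == "G" else 0.0), steps
    return grid[row][col], 0.0, steps


winning_sequence = (2 * ["Right"]) + (3 * ["Down"]) + ["Right"]
print(run_sequence(winning_sequence))
```

With the winning sequence, the walk ends on the goal tile after six moves with a reward of 1.0, matching the last reward printed in the deterministic run.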



# **Map sizes and custom maps** 
The default 4×4 map is not the only option for playing the Frozen Lake game. 
There is also an 8×8 version that we can create in two different ways. 
The first one is to use the specific environment id for the 8×8 map:


In [None]:
env = gym.make("FrozenLake8x8-v1")
env.reset()
env.render()


The second option is to call the make method 
passing the value "8x8" as an argument to the map_name parameter:

In [None]:
env = gym.make('FrozenLake-v1', map_name='8x8')
env.reset()
env.render()


Finally, we can create our own custom map of the Frozen Lake game by passing 
a list of strings representing the map as an argument to the parameter desc. 
Each character is one tile: S is the start, F is a frozen tile, H is a hole and G is the goal:

In [None]:

custom_map = [
    'SFFHF',
    'HFHFF',
    'HFFFH',
    'HHHFH',
    'HFFFG'
]
 
env = gym.make('FrozenLake-v1', desc=custom_map)
env.reset()
env.render()
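
A custom map is only useful if the goal can actually be reached. As a rough sketch, independent of Gym, the hypothetical function below runs a breadth-first search over the map strings and returns the minimum number of moves from S to G while avoiding holes, or None if no safe path exists. It assumes the tile letters used in the map above, with S as the start, H as holes and G as the goal.

```python
from collections import deque


def shortest_safe_path(grid):
    """Hypothetical helper: BFS from 'S' to 'G' that never steps on 'H'."""
    n_rows, n_cols = len(grid), len(grid[0])
    # Locate the start tile.
    start = next((r, c) for r in range(n_rows) for c in range(n_cols)
                 if grid[r][c] == "S")
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (row, col), dist = queue.popleft()
        if grid[row][col] == "G":
            return dist
        # Try the four moves: left, down, right, up.
        for d_row, d_col in ((0, -1), (1, 0), (0, 1), (-1, 0)):
            nxt = (row + d_row, col + d_col)
            inside = 0 <= nxt[0] < n_rows and 0 <= nxt[1] < n_cols
            if inside and nxt not in seen and grid[nxt[0]][nxt[1]] != "H":
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the goal cannot be reached without falling into a hole


custom_map = ["SFFHF", "HFHFF", "HFFFH", "HHHFH", "HFFFG"]
print(shortest_safe_path(custom_map))
```

For the custom map above, the search finds a safe route of 8 moves, so the map is solvable in deterministic mode.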


# **Conclusion**
In this post, we learned how to use the Gym library to create an environment to train a 
reinforcement learning agent. We focused on the Frozen Lake environment, a text-mode game with 
simple rules that nonetheless allows us to explore the fundamental concepts of reinforcement learning.

# Research Questions

## Question 1:

The Gym library is a collection of simulated environments that can be used to develop and test reinforcement learning algorithms through a simple interface. The framework provides an "Environment" object, and the intelligent agent's actuators interact with the Environment object through the "step" function. This is a description of the return values (OpenAI 2016):

* observation (object): an environment-specific object representing your observation of the environment. For example, pixel data from a camera, joint angles and joint velocities of a robot, or the board state in a board game.
* reward (float): amount of reward achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward.
* done (boolean): whether it’s time to reset the environment again. Most (but not all) tasks are divided up into well-defined episodes, and done being True indicates the episode has terminated. (For example, perhaps the pole tipped too far, or you lost your last life.)
* info (dict): diagnostic information useful for debugging. It can sometimes be useful for learning (for example, it might contain the raw probabilities behind the environment’s last state change).

This is a useful abstraction of the classic "Agent-Environment" interaction: the agent acts on the environment, and the environment returns feedback about the action (observation and reward) that the agent uses to inform its next decision, according to the algorithm employed.
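
The interaction loop described here can be sketched with a toy environment written in plain Python. CorridorEnv is a hypothetical example, not a Gym environment. It only mimics the reset and step methods and the four return values listed above, and a random choice stands in for the agent's decision rule.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: reach the right end of a short corridor."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode at the left end and return the first observation.
        self.position = 0
        return self.position

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the ends block movement.
        delta = 1 if action == 1 else -1
        self.position = min(max(self.position + delta, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


env = CorridorEnv()
rng = random.Random(0)
observation = env.reset()
total_reward = 0.0
for _ in range(100):
    action = rng.choice([0, 1])  # a random "agent"
    observation, reward, done, info = env.step(action)
    total_reward += reward
    if done:
        break
print(observation, total_reward, done)
```

A learning agent would replace the random choice with a policy that uses the observation and the accumulated reward to pick its next action.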

The Environment interface also provides both the set of possible actions and the set of all possible states in the game through the "action_space" and "observation_space" attributes respectively.

These can be useful for checking if a given action is valid or sampling from the action_space to simulate random trials.
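
To make the role of these two attributes concrete, here is a minimal stand-in for a discrete space, written in plain Python. It is not Gym's Discrete class. The class name TinyDiscrete and its constructor are assumptions that merely mimic the two uses mentioned above: sampling a random value and checking whether a value is valid.

```python
import random


class TinyDiscrete:
    """Hypothetical stand-in for a discrete space with values 0..n-1."""

    def __init__(self, n, seed=None):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        # Draw one value uniformly at random from {0, ..., n-1}.
        return self._rng.randrange(self.n)

    def contains(self, value):
        # A value is valid if it is an integer inside the range.
        return isinstance(value, int) and 0 <= value < self.n


action_space = TinyDiscrete(4, seed=42)        # Frozen Lake has 4 actions
observation_space = TinyDiscrete(16, seed=42)  # and 16 states on the 4x4 map

random_action = action_space.sample()
print(random_action, action_space.contains(random_action), action_space.contains(7))
```

Here the value 7 is rejected by the action space because only 0 to 3 are valid actions, while it would be accepted as a state by the observation space.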

Fundamentally, the difference between the environment and the agent is that the former is the objective situation and the latter is the subjective surrogate. The environment is set up so that the agent can interact with it. The environment is largely static: it has a number of parameters such as "state," "action," and "reward." The agent then has a responsibility to interact with the environment in its own specific way. It will try to analyze the landscape to find the *optimal policy* for its own personal objectives.

## Question 2:

Markdown is a powerful and simple tool to style your text according to your needs, offering great support across a range of devices. The main principle for using Markdown when sharing your code is to make sure that your code is easily readable by other humans. Indenting lines with tabs, leaving empty lines between paragraphs, and bolding, italicizing, and underlining important information all serve to convey your information to the reader more professionally. First, in order to have principled Markdown, one must learn how to use the tool. Two references for Markdown tutorials are markdownguide.org and markdowntutorial.com. As for the principles of writing lucid communication as part of your code, a useful guide is the Engineer’s Guide to Writing Code Comments available [here](https://www.stepsize.com/blog/the-engineers-guide-to-writing-code-comments). In it, the author emphasizes the importance of writing comments as you go, so as to not get bogged down in increased complexity and to keep your references contained in the current document.


## Question 3:

For my assignment, I used Colab. Colab was a great tool as it allowed me to have cells for code and cells for text, with a different language for each use (Python for the code and Markdown for the text).
The principle for effective storytelling using the Frozen Lake environment in OpenAI Gym is to explain the intuition of the algorithm at every step of the way. The reader should be introduced, through a text cell, to the what, the how and the why of each code cell. In this way, the logical reasoning of the author is transmitted to the reader through logical steps, without any jumps or holes in mutual understanding. This is very useful when the story is complex, because otherwise the reader can get lost or lose interest, and lucid communication is not achieved. A good example in The Frozen Lake article (Mendes 2019) is the Stochastic vs Deterministic section. The act of contrasting the two different settings gracefully elucidates the probabilistic aspect of the random trials.

# References:

Mendes, R. (2019, June 16). Gym Tutorial: The Frozen Lake. Reinforcement Learning for Fun. Retrieved April 3, 2022, from https://reinforcement-learning4.fun/2019/06/16/gym-tutorial-frozen-lake/

Gym: A toolkit for developing and comparing reinforcement learning algorithms. (2016). OpenAI. Retrieved April 3, 2022, from https://gym.openai.com/docs/

