In [1]:
import gym
import matplotlib.pyplot as plt
import helper
import imageio
import os
import time

A typical Gym program goes through three stages:

1. Initialisation: Create and initialise the environment.

2. Execution: Take repeated actions in the environment. At each step the environment provides information to describe its new state and the reward received as a consequence of taking the specified action. This continues until the environment signals that the episode is complete.

3. Termination: Clean up and close the environment (`env.close()`).
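The three stages map directly onto code. The sketch below uses a hypothetical `CountdownEnv` class (not part of gym) that follows the same `reset`/`step`/`close` interface as the old Gym API used in this notebook, so it runs without gym installed.

```python
class CountdownEnv:
    """Hypothetical toy environment that mimics the old (4-tuple) Gym step API."""

    def __init__(self, start=5):
        self.start = start
        self.counter = None

    def reset(self):
        # Put the environment in its initial state and return the first observation
        self.counter = self.start
        return self.counter

    def step(self, action):
        # Action 1 decrements the counter; any other action leaves it unchanged
        if action == 1:
            self.counter -= 1
        reward = 1.0 if action == 1 else 0.0
        done = self.counter <= 0
        info = {}
        return self.counter, reward, done, info

    def close(self):
        # Release resources (nothing to release in this toy example)
        self.counter = None


# 1. Initialisation
env = CountdownEnv(start=5)
obs = env.reset()

# 2. Execution: keep acting until the environment reports that the episode is done
total_reward = 0.0
steps = 0
done = False
while not done:
    obs, reward, done, info = env.step(1)
    total_reward += reward
    steps += 1

# 3. Termination
env.close()
print(steps, total_reward)  # 5 5.0
```

A real agent would replace the constant action `1` with a policy that looks at `obs`.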

In [2]:
gym.__version__

'0.25.1'

## Introduction
* This example was created with `gym` version `0.25.1`.
* Ignore the deprecation warnings: all the examples I found use `new_step_api=False`, and I couldn't find documentation that uses the new step API.
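For reference, with `new_step_api=True` gym's `step` returns five values, `(observation, reward, terminated, truncated, info)`, instead of four; the old `done` flag corresponds to `terminated or truncated`. The hypothetical helper `to_old_step_api` below (not a gym function) sketches that conversion, so the loops in this notebook could be reused with the new API.

```python
def to_old_step_api(step_result):
    """Collapse a new-style 5-tuple into the old 4-tuple used in this notebook."""
    if len(step_result) == 5:
        obs, reward, terminated, truncated, info = step_result
        # The old `done` is true if the episode ended for either reason
        return obs, reward, terminated or truncated, info
    if len(step_result) == 4:
        # Already in the old format
        return step_result
    raise ValueError(f"expected a 4- or 5-tuple, got {len(step_result)} items")


# Example: an episode cut off by a time limit (truncated=True) counts as done
obs, reward, done, info = to_old_step_api(([-0.5, 0.0], -1.0, False, True, {}))
print(done)  # True
```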

In [3]:
env = gym.make('MountainCar-v0', new_step_api=False, render_mode='human')


The basic structure of the environment is described by the `observation_space` and the `action_space` attributes of the Gym `Env` class.
The `observation_space` defines the structure and the legitimate values of the observations of the environment's state. Observations differ between environments: in many games an observation is a screenshot of the game, but it can also be a vector of characteristics of the environment, as in `MountainCar`, where it holds the car's position and velocity.

Similarly, the `Env` class defines an attribute called `action_space`, which describes the numerical structure of the legitimate actions that can be applied to the environment.

In [4]:
# Observation and action space 
obs_space = env.observation_space
action_space = env.action_space
print("The observation space: {}".format(obs_space))
print("The action space: {}".format(action_space))

The observation space: Box([-1.2  -0.07], [0.6  0.07], (2,), float32)
The action space: Discrete(3)


In this section, we cover functions of the Env class that help the agent interact with the environment. Two such important functions are:

* `reset`: This function resets the environment to its initial state, and returns the observation of the environment corresponding to the initial state.
* `step`: This function takes an action as input and applies it to the environment, which then transitions to a new state. The `step` function returns four things:
    * `observation`: The observation of the new state of the environment.
    * `reward`: The reward received for executing the action passed to `step`.
    * `done`: Whether the episode has ended. If true, end the simulation or reset the environment to start a new episode.
    * `info`: Additional environment-specific information, such as the number of lives left, or general information that may be useful for debugging.
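To make these return values concrete, here is a pure-Python sketch of what `step` computes for MountainCar. The constants and update rule are my transcription of gym's classic-control `MountainCarEnv` source and should be treated as assumptions; check the source of your installed version. The function name `mountain_car_step` is hypothetical, and unlike gym it returns plain floats rather than a `float32` NumPy array.

```python
import math

# Constants assumed to match gym's MountainCarEnv (verify against your gym version)
MIN_POSITION = -1.2
MAX_POSITION = 0.6
MAX_SPEED = 0.07
GOAL_POSITION = 0.5
GOAL_VELOCITY = 0.0
FORCE = 0.001
GRAVITY = 0.0025


def mountain_car_step(state, action):
    """Return (observation, reward, done, info) for one step, in the old 4-tuple style."""
    position, velocity = state
    # Action 0 pushes left, 1 applies no force, 2 pushes right;
    # gravity pulls along the slope of the hill, y = sin(3x)
    velocity += (action - 1) * FORCE + math.cos(3 * position) * (-GRAVITY)
    velocity = min(max(velocity, -MAX_SPEED), MAX_SPEED)
    position += velocity
    position = min(max(position, MIN_POSITION), MAX_POSITION)
    # Hitting the left boundary stops the car
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0
    done = position >= GOAL_POSITION and velocity >= GOAL_VELOCITY
    reward = -1.0  # every step costs -1
    return (position, velocity), reward, done, {}


obs, reward, done, info = mountain_car_step((-0.47815958, 0.0), action=2)
print(obs, reward, done)
```

Starting from the initial observation printed below and pushing right (action `2`), this gives approximately `(-0.4774993, 0.00066026)`, the same as the new observation printed below, which suggests the randomly sampled action in that run was `2`. Because the reward is `-1` on every step, the sooner the car reaches the flag (position `>= 0.5`), the higher the total reward.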

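With `render_mode='human'`, `env.render()` shows the environment in a separate window. This works when the notebook runs locally, for example in Visual Studio Code; a hosted notebook without a display typically can't open the window.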

In [5]:
env = gym.make('MountainCar-v0', new_step_api=False, render_mode='human')

# reset the environment and see the initial observation
obs = env.reset()
print("The initial observation is {}".format(obs))

# show the environment
env.render(mode="human")

# Sample a random action from the entire action space
random_action = env.action_space.sample()

# Take the action and get the new observation, reward, done flag and info
new_obs, reward, done, info = env.step(random_action)
print("The new observation is {}".format(new_obs))

env.close()

The initial observation is [-0.47815958  0.        ]
The new observation is [-0.4774993   0.00066026]


See here for more information: https://www.gymlibrary.ml/content/api/

In [None]:
# To get the rendered frame as an image instead of a window:
env = gym.make('MountainCar-v0', new_step_api=False, render_mode='rgb_array')

# reset the environment and see the initial observation
obs = env.reset()
print("The initial observation is {}".format(obs))

# render the environment into RGB arrays
env_screen = env.render(mode='rgb_array')

# Sample a random action from the entire action space
random_action = env.action_space.sample()

# Take the action and get the new observation, reward, done flag and info
new_obs, reward, done, info = env.step(random_action)
print("The new observation is {}".format(new_obs))

env.close()

# With render_mode='rgb_array', gym 0.25 returns a list of frames; display the first one
plt.imshow(env_screen[0])

## Different methods to render and display environment

In [None]:
# Can work but too slow
# from IPython import display
# %matplotlib inline
# env = gym.make('MountainCar-v0')
# num_episodes=40
# for i_episode in range(num_episodes):
#     observation = env.reset()
#     for t in range(500):
#         plt.imshow(env.render(mode='rgb_array'))
#         display.display(plt.gcf())
#         display.clear_output(wait=True)
#         env.step(env.action_space.sample()) # take a random action
# env.close()

In [None]:
# Number of steps you run the agent for 
num_steps = 1500
env = gym.make('MountainCar-v0')

num_episodes = 40
for i_episode in range(num_episodes):
    obs = env.reset()
    for step in range(num_steps):
        # take random action, but you can also do something more intelligent
        # action = my_intelligent_agent_fn(obs) 
        action = env.action_space.sample()
        
        # apply the action
        obs, reward, done, info = env.step(action)
        
        # Render the env
        env.render(mode='human')

        # Wait a bit before the next frame unless you want to see a crazy fast video
        time.sleep(0.001)
        
        # If the episode is over, start the next one
        if done:
            print("\rEpisode {}/{} finished after {} timesteps".format(i_episode+1, num_episodes, step+1), end="")
            break

# Close the env
env.close()
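`MountainCar-v0` is registered with a 200-step limit (`max_episode_steps=200`), so with `num_steps = 1500` a random agent will almost always see `done` become true at step 200 because of the time limit, not because it reached the flag. The sketch below shows how such a limit works; `TimeLimitSketch`, `EndlessEnv` and the `truncated_by_time_limit` info key are hypothetical names, not gym's actual `TimeLimit` wrapper.

```python
class EndlessEnv:
    """Hypothetical environment whose episodes never terminate on their own."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, -1.0, False, {}


class TimeLimitSketch:
    """Minimal sketch of a wrapper that ends episodes after a fixed number of steps."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self):
        self.elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed_steps += 1
        if self.elapsed_steps >= self.max_episode_steps:
            # Mark that the episode was cut off, not that a goal state was reached
            info = dict(info, truncated_by_time_limit=True)
            done = True
        return obs, reward, done, info


env = TimeLimitSketch(EndlessEnv(), max_episode_steps=200)
obs = env.reset()
for step in range(1500):
    obs, reward, done, info = env.step(0)
    if done:
        break
print(step + 1)  # 200
```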

In [None]:
## Rendering on new pygame window
env = gym.make('MountainCar-v0')
num_episodes = 40
for i_episode in range(num_episodes):
    observation = env.reset()
    cum_reward = 0
    for t in range(500):
        action = env.action_space.sample()  # random action
        # Draw the current frame in a pygame window (default 'human' mode)
        env.render()
        observation, reward, done, info = env.step(action)
        cum_reward += reward
        if done:
            print("\rEpisode {}/{} finished after {} timesteps, total reward {}".format(i_episode+1, num_episodes, t+1, cum_reward), end="")
            break
env.close()

In [None]:
## Saving to a gif (can take a while)

In [None]:
env = gym.make('MountainCar-v0')
frames = []
num_episodes = 40
for i_episode in range(num_episodes):
    observation = env.reset()
    for t in range(500):
        action = env.action_space.sample()  # random action
        # Render into an RGB array and label it with the episode number (helper is a local module)
        frame = env.render(mode='rgb_array')
        frames.append(helper._label_with_episode_number(frame, episode_num=i_episode))
        observation, reward, done, info = env.step(action)
        if done:
            print("\rEpisode {}/{} finished after {} timesteps".format(i_episode+1, num_episodes, t+1), end="")
            break
env.close()
# Make sure the output directory exists before writing the GIF
os.makedirs('./videos', exist_ok=True)
imageio.mimwrite(os.path.join('./videos/', 'random_agent.gif'), frames, fps=60)
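Writing the GIF is slow mainly because of the number of frames: 40 episodes of up to 200 steps each is roughly 8000 frames. One option, not used in this notebook, is to keep only every k-th frame and lower the frame rate by the same factor so playback speed stays the same. `subsample_frames` is a hypothetical helper; its outputs could be passed to `imageio.mimwrite` in place of `frames` and `fps=60`.

```python
def subsample_frames(frames, every, fps):
    """Keep every `every`-th frame and divide fps by the same factor to preserve playback speed."""
    if every < 1:
        raise ValueError("every must be >= 1")
    return frames[::every], fps / every


frames = list(range(8000))  # stand-in for 40 episodes x 200 rendered frames
kept, new_fps = subsample_frames(frames, every=3, fps=60)
print(len(kept), new_fps)  # 2667 20.0
```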

## Spaces
The `observation_space` for our environment was a `Box` of shape `(2,)`, and the `action_space` was `Discrete(3)`. What do these mean? Both `Box` and `Discrete` are data structures called "spaces", which Gym provides to describe the legitimate values of an environment's observations and actions.

All of these data structures are derived from the `gym.spaces.Space` base class.

In [None]:
type(env.observation_space)

`Box(n,)` corresponds to an n-dimensional continuous space. In our case `n=2`, so the observation space is two-dimensional: the first component is the car's position and the second is its velocity. The space is bounded by upper and lower limits, which describe the legitimate values an observation can take. These limits are available through the `high` and `low` attributes of the observation space, which hold the maximum and minimum position and velocity, respectively.
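To illustrate what the bounds mean, the sketch below checks membership in the box and samples from it using only NumPy. `box_contains` and `box_sample` are hypothetical helpers written for this example, not gym's `Box` implementation (which, for instance, also handles unbounded dimensions); the bounds are copied from the printed observation space.

```python
import numpy as np

# Bounds from the printed observation space: Box([-1.2 -0.07], [0.6 0.07], (2,), float32)
low = np.array([-1.2, -0.07], dtype=np.float32)
high = np.array([0.6, 0.07], dtype=np.float32)


def box_contains(x, low, high):
    """A point is in the box if it has the right shape and every component lies in [low, high]."""
    x = np.asarray(x, dtype=np.float32)
    return x.shape == low.shape and bool(np.all(x >= low) and np.all(x <= high))


def box_sample(low, high, rng):
    """Draw a point uniformly from the bounded box."""
    return rng.uniform(low, high).astype(np.float32)


rng = np.random.default_rng(0)
sample = box_sample(low, high, rng)
print(box_contains(sample, low, high))               # True
print(box_contains([-0.47815958, 0.0], low, high))   # True
print(box_contains([0.7, 0.0], low, high))           # False: position above 0.6
```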

In [None]:
print("Upper Bound for Env Observation", env.observation_space.high)
print("Lower Bound for Env Observation", env.observation_space.low)

The discrete `action_space` lists the actions the agent can take. `Discrete(3)` means the valid actions are the integers `0`, `1` and `2`; in `MountainCar` these are `0` (accelerate left), `1` (don't accelerate) and `2` (accelerate right).
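A `Discrete(n)` space is simply the integers `0` to `n - 1`. The sketch below mimics that with a hypothetical `DiscreteSketch` class (not gym's `Discrete`) and maps MountainCar's action ids to their meanings.

```python
import random

# Meanings of MountainCar's three discrete actions
ACTION_MEANINGS = {0: "accelerate left", 1: "don't accelerate", 2: "accelerate right"}


class DiscreteSketch:
    """Sketch of a Discrete(n) space: the valid actions are the integers 0, ..., n - 1."""

    def __init__(self, n):
        self.n = n

    def contains(self, action):
        return isinstance(action, int) and 0 <= action < self.n

    def sample(self, rng):
        # Uniformly random valid action
        return rng.randrange(self.n)


space = DiscreteSketch(3)
rng = random.Random(0)
action = space.sample(rng)
print(action, ACTION_MEANINGS[action])
print(space.contains(2), space.contains(3))  # True False
```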

In [None]:
action_space = env.action_space
print("The action space: {}".format(action_space))