# RL Environments

The availability of simulation environments for training RL agents is of utmost importance. Fortunately, OpenAI Gym provides plenty of them!


Let's see how we can load and use one!


You can safely ignore the following code; it is only needed to render the environments inside a Jupyter notebook.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from JSAnimation.IPython_display import display_animation
from matplotlib import animation
from IPython.display import display

def display_in_jupyter(frames):
    """
    Thanks to http://mckinziebrandon.me/TensorflowNotebooks/2016/12/21/openai.html
    Install JSAnimation: pip3 install git+https://github.com/jakevdp/JSAnimation.git --user
    """
    patch = plt.imshow(frames[0])
    plt.axis('off')

    def animate(i):
        patch.set_data(frames[i])
    anim = animation.FuncAnimation(plt.gcf(), animate, frames=len(frames), interval=50)
    display(display_animation(anim, default_mode='loop'))
    

In [2]:
import gym

Let's examine an environment! First we will create the environment and get its initial state (observation):

In [3]:
env = gym.make('CartPole-v1')
observation = env.reset()
print(observation)

WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
[ 0.00813606 -0.00037589 -0.02479676 -0.01750932]
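The four numbers are easier to read with labels. The sketch below reuses the values printed above and pairs them with the component names that the Gym documentation gives for CartPole. The names and their order are an assumption taken from that documentation, not something the notebook output states.

```python
# Values copied from the observation printed above.
observation = [0.00813606, -0.00037589, -0.02479676, -0.01750932]

# Component names for CartPole, in index order
# (assumed from the Gym documentation for this environment).
names = [
    "cart position",
    "cart velocity",
    "pole angle (radians)",
    "pole angular velocity",
]

# Pair each name with its value and print an aligned listing.
labeled = dict(zip(names, observation))
for name, value in labeled.items():
    print(f"{name:>22}: {value: .8f}")
```

All four values start close to zero, which matches an episode that begins with the cart near the centre and the pole almost upright.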


What are the available actions?


In [4]:
print(env.action_space)

Discrete(2)
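`Discrete(2)` means the agent chooses between two integer actions, 0 and 1. The sketch below illustrates what sampling from such a space amounts to. It uses the standard library's `random` module as a stand-in for `env.action_space.sample()` so that it runs without Gym. The helper `sample_action` and the `ACTION_MEANINGS` table are hypothetical names introduced here, and the action meanings are assumed from the Gym documentation for CartPole.

```python
import random

# Assumed meaning of each CartPole action (from the Gym documentation).
ACTION_MEANINGS = {0: "push cart to the left", 1: "push cart to the right"}

def sample_action(n_actions, rng):
    """Stand-in for env.action_space.sample(): a uniform draw from 0..n_actions-1."""
    return rng.randrange(n_actions)

rng = random.Random(0)  # fixed seed so the run is repeatable
sampled = [sample_action(2, rng) for _ in range(10)]

# Show the first few sampled actions together with their meaning.
for action in sampled[:3]:
    print(action, "->", ACTION_MEANINGS[action])
```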


Let's sample some random actions and render the environment. To do so, we use the step() function, which returns the next state of the environment (the observation), the reward, whether the episode has finished, and additional info (if available).

In [5]:
def run_environment_random(env):
    env.reset()
    frames = []
    total_reward = 0
    for t in range(2500):
        # Keep the frames of the simulator
        frames.append(env.render(mode='rgb_array'))
        # Sample a random action
        action = env.action_space.sample()
        # Perform the action and get the reward and the new state (observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break

    return frames, total_reward

frames, total_reward = run_environment_random(env)
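To make the `reset()`/`step()` contract concrete without a simulator, here is a minimal sketch of a toy environment that follows the same four-value return used in this notebook. `CoinFlipEnv` and `run_episode` are hypothetical names invented for illustration. They are not part of Gym, and the reward rule (guess the coin) is arbitrary.

```python
import random

class CoinFlipEnv:
    """Hypothetical toy environment (not part of Gym) with a reset()/step() interface."""

    def __init__(self, max_steps=10, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.t = 0
        return 0

    def step(self, action):
        # Flip a coin; the agent is rewarded when its action matches the outcome.
        self.t += 1
        coin = self.rng.randrange(2)
        reward = 1.0 if action == coin else 0.0
        done = self.t >= self.max_steps  # the episode ends after max_steps steps
        info = {"coin": coin}            # extra diagnostics, like Gym's info dict
        return coin, reward, done, info

def run_episode(env, policy):
    """Run one episode and return the accumulated reward."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward

toy_env = CoinFlipEnv(max_steps=10, seed=0)
total = run_episode(toy_env, policy=lambda obs: 1)  # a policy that always picks action 1
print("Total reward:", total)
```

The loop in `run_episode` has the same shape as `run_environment_random` above, minus the rendering: act, receive `(observation, reward, done, info)`, accumulate the reward, and stop when `done` is true.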


Let's examine the saved frames and the total reward:

In [6]:
display_in_jupyter(frames)
print("Total reward: ", total_reward)

Total reward:  15.0


OpenAI Gym provides a large variety of environments, ranging from simple ones to more complex ones.

#### MountainCar-v0

In [7]:
env = gym.make('MountainCar-v0')
frames, total_reward = run_environment_random(env)
display_in_jupyter(frames)
print("Total reward: ", total_reward)

WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.


Total reward:  -200.0


#### CarRacing-v0


In [8]:
env = gym.make('CarRacing-v0')
frames, total_reward = run_environment_random(env)
display_in_jupyter(frames)
print("Total reward: ", total_reward)

WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
Track generation: 1114..1397 -> 283-tiles track


Total reward:  -29.078014184397606


#### Breakout-v0

In [9]:
env = gym.make('Breakout-v0')
frames, total_reward = run_environment_random(env)
display_in_jupyter(frames)
print("Total reward: ", total_reward)

Total reward:  2.0
