# OpenAI Gym Intro

OpenAI Gym is a standard Python API for RL environments. Its primary use is to baseline and measure the effectiveness of different RL algorithms. In this course, we will use it a bit differently: we will build a custom environment and use well-known RL algorithms to analyze the environment we modeled!

Why do we use OpenAI Gym for this? It provides a lot of helpful tools for building out environments in a systematic way. It also includes many built-in components that make the job faster and easier.

In this notebook, we'll show some basic OpenAI Gym functionality and hit on the major elements of the API.

Examples adapted from: 
- https://stable-baselines.readthedocs.io/en/master/guide/custom_env.html
- https://ai-mrkogao.github.io/reinforcement%20learning/openaigymtutorial/

## Basic Logic Flow
In the example below, we can see the major elements coming together:
1. First, we instantiate our environment (here we use an out-of-the-box environment called Taxi).
2. We reset the environment and obtain the initial state.
3. We begin taking steps (in this case, a maximum of 1000).
4. We render the environment so we can see the current state.
5. Our agent (human or machine) chooses an action, which we capture and pass into the step function.
6. The step function executes the game logic, consuming the action and returning a new state (observation), reward, done, and info.
7. We check whether we are done: stop if so, continue if not.
8. This ends the "episode". For training, we would want to run many episodes.

In [1]:
import gym
env = gym.make("Taxi-v2")  # newer Gym releases register this environment as "Taxi-v3"
observation = env.reset()
for _ in range(1000):
    # env.render()  # usually we render but we'll skip in jupyter
    # your agent here (this takes random actions)
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    if done:
        # env.render()  # usually we render but we'll skip in jupyter
        break
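
The loop above runs a single episode. As a rough sketch of what running many episodes could look like, the example below repeats the same reset/step cycle several times and records the total reward of each episode. To stay independent of any particular Gym version, it uses `CorridorEnv`, a hypothetical stand-in written for this illustration (it is not part of Gym); it only mimics the `reset()` and four-value `step()` interface shown above, and its `sample_action()` method plays the role of `env.action_space.sample()`.

```python
import random


class CorridorEnv:
    """Hypothetical stand-in with a Gym-style reset/step interface (not part of Gym).

    The agent starts at position 0 and must reach position `length`.
    Action 0 moves left, action 1 moves right.
    """

    def __init__(self, length=5, seed=0):
        self.length = length
        self.rng = random.Random(seed)
        self.position = 0

    def sample_action(self):
        # Plays the role of env.action_space.sample()
        return self.rng.randrange(2)

    def reset(self):
        # Return only the initial observation
        self.position = 0
        return self.position

    def step(self, action):
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)  # cannot move left of 0
        done = self.position >= self.length
        reward = 10 if done else -1  # small cost per step, bonus at the goal
        return self.position, reward, done, {}


def run_episodes(env, n_episodes=5, max_steps=1000):
    """Run several episodes with random actions; return each episode's total reward."""
    episode_returns = []
    for _episode in range(n_episodes):
        observation = env.reset()
        total_reward = 0
        for _step in range(max_steps):
            action = env.sample_action()
            observation, reward, done, info = env.step(action)
            total_reward += reward
            if done:
                break
        episode_returns.append(total_reward)
    return episode_returns


episode_returns = run_episodes(CorridorEnv(length=5, seed=0))
print(episode_returns)
```

With a real Gym environment, the same structure should apply: replace `CorridorEnv` with the environment returned by `gym.make(...)` and `env.sample_action()` with `env.action_space.sample()` or a trained agent's policy.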

## Spaces
OpenAI Gym provides 4 major space datatypes. They are:
1. Discrete
2. MultiDiscrete
3. Box
4. Tuple

### Discrete Space
A fixed, finite set of points that map directly to actions or states. This is mostly used for action spaces, as state spaces tend to be much richer.

In [2]:
discrete_space = gym.spaces.Discrete(10)
discrete_space.sample()

0

### MultiDiscrete Space
Contains k dimensions, each of which is its own discrete space. This is mostly used to capture state spaces and one-hot encodings of actions.

In [3]:
multidiscrete_space = gym.spaces.MultiDiscrete([5, 2, 2])
multidiscrete_space.sample()

array([0, 1, 0], dtype=int64)
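
To make the "k discrete dimensions" idea concrete, the sketch below treats the sizes `[5, 2, 2]` as a mixed-radix number system: the space has 5 * 2 * 2 distinct points, and every sample corresponds to exactly one integer index. The helpers `flatten_index` and `unflatten_index` are hypothetical names written for this illustration; they are not Gym functions.

```python
def flatten_index(sample, nvec):
    """Convert a MultiDiscrete-style sample into a single integer index."""
    index = 0
    for value, size in zip(sample, nvec):
        if not 0 <= value < size:
            raise ValueError("component out of range")
        index = index * size + value
    return index


def unflatten_index(index, nvec):
    """Inverse of flatten_index: recover the per-dimension values."""
    values = []
    for size in reversed(nvec):
        index, value = divmod(index, size)
        values.append(value)
    return list(reversed(values))


nvec = [5, 2, 2]  # same sizes as MultiDiscrete([5, 2, 2]) above

n_points = 1
for size in nvec:
    n_points *= size  # total number of distinct combinations

flat = flatten_index([0, 1, 0], nvec)
restored = unflatten_index(flat, nvec)
print(n_points, flat, restored)
```

This is one way to see why a MultiDiscrete space can stand in for a single large Discrete space while keeping each dimension separately interpretable.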

### Box Space
This is similar to the MultiDiscrete space, except that each dimension takes a continuous value between a lower and an upper bound.

In [4]:
import numpy as np
box_space = gym.spaces.Box(np.array((-1.0, -2.0)), np.array((1.0, 2.0)))
box_space.sample()

array([ 0.59720695, -1.6172327 ], dtype=float32)
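
The bounds are what define a Box: a point belongs to the space only if every coordinate lies between the matching lower and upper bound. The plain-Python sketch below illustrates that rule for the same bounds used above; `box_contains` and `clip_to_box` are hypothetical helpers written for this illustration, not Gym functions.

```python
def box_contains(point, low, high):
    """True if every coordinate lies within its [low, high] interval."""
    if len(point) != len(low):
        return False
    return all(lo <= x <= hi for x, lo, hi in zip(point, low, high))


def clip_to_box(point, low, high):
    """Move each out-of-range coordinate to its nearest bound."""
    return [min(max(x, lo), hi) for x, lo, hi in zip(point, low, high)]


low = [-1.0, -2.0]   # same bounds as the Box example above
high = [1.0, 2.0]

inside = box_contains([0.5, -1.5], low, high)
outside = box_contains([1.5, 0.0], low, high)
clipped = clip_to_box([1.5, -3.0], low, high)
print(inside, outside, clipped)
```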

### Tuple Space
While not used frequently, it can be very helpful. The Tuple space allows you to combine different simple spaces into a single space. In the example below, we combine two different discrete spaces into a single tuple space.

In [5]:
space_1 = gym.spaces.Discrete(2)
space_2 = gym.spaces.Discrete(3)
tuple_space = gym.spaces.Tuple((space_1, space_2))
tuple_space.sample()

(0, 2)

## Custom Environments
Finally, we will show a bare-bones example of a custom environment.

### Defining the Custom Environment
This shows the basic environment structure with the mandatory functions overridden. In this example, we use a discrete action space and a box state space that captures RGB channel images.

In [6]:
import gym
import numpy as np  # needed for np.uint8 below
from gym import spaces

class CustomEnv(gym.Env):

    def __init__(self):
        super(CustomEnv, self).__init__()
        # Define action and observation space
        N_DISCRETE_ACTIONS = 5
        HEIGHT = 100
        WIDTH = 100
        N_CHANNELS = 3
        # They must be gym.spaces objects
        # Example when using discrete actions:
        self.action_space = spaces.Discrete(N_DISCRETE_ACTIONS)
        # Example for using image as input:
        self.observation_space = spaces.Box(low=0, 
                                            high=255,
                                            shape=(HEIGHT, WIDTH, N_CHANNELS), 
                                            dtype=np.uint8)

    def step(self, action):
        # Placeholder game logic: a real environment would update its state here
        observation = 'observation'
        reward = 1
        done = False
        info = 'test'
        return observation, reward, done, info

    def reset(self):
        # Return only the initial observation
        observation = 'observation'
        return observation  # reward, done, info can't be included

    def render(self, mode='human'):
        pass

    def close(self):
        pass

### Running the Custom Environment
We can then use this environment like the built-in OpenAI Gym environments. While we usually set up human play and RL training / playback, this example just shows how we can interrogate the action and state spaces of a given OpenAI Gym environment.

In [7]:
env = CustomEnv()
sample_action = env.action_space.sample()
sample_state = env.observation_space.sample()
print('Action Sample: {}'.format(sample_action))
print('State Sample: {}'.format(sample_state))

Action Sample: 3
State Sample: [[[ 35 131 160]
  [217 183 230]
  [173 160 135]
  ...
  [163 218  93]
  [177  70 191]
  [162  92 140]]

 [[204 239  73]
  [234  15 247]
  [130 166  96]
  ...
  [ 35  55  67]
  [173 163  96]
  [ 95 190 226]]

 [[ 55 135  79]
  [ 17 185  16]
  [243 229 156]
  ...
  [151  25  74]
  [ 89 195   8]
  [ 50  76 135]]

 ...

 [[145  78 245]
  [156 151   2]
  [ 78  12 140]
  ...
  [229 141 113]
  [254  80  49]
  [196 143 209]]

 [[231 243 231]
  [ 13  48  30]
  [ 37 145 134]
  ...
  [141 113 119]
  [240  75 244]
  [ 19 146  56]]

 [[184 214 151]
  [ 70 215 103]
  [152 193 194]
  ...
  [ 85 200  42]
  [152  47 176]
  [208  96 162]]]
