
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 1200px">
</div>

# Introduction to OpenAI gym

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) In this lesson you will learn:<br>
 - What OpenAI gym is
 - How to run your first RL code
  
## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Library Requirements

<img alt="Caution" title="Caution" style="vertical-align: text-bottom; position: relative; height:1.3em; top:0.0em" src="https://files.training.databricks.com/static/images/icon-warning.svg"/> Additional libraries must be attached to your cluster for this lesson to work.

We will use the PyPI library **`gym==0.15.4`**.
* This library is used to develop and compare reinforcement learning algorithms

For more information on how to create and/or install PyPI libraries see:
* <a href="https://www.databricks.training/step-by-step/creating-pypi-libraries" target="_blank">Creating a Workspace Library</a>
* <a href="https://www.databricks.training/step-by-step/installing-libraries-from-pypi" target="_blank">Installing a Cluster Library</a> (recommended)

### What is OpenAI gym?

OpenAI gym provides a collection of environments for RL problems. You can either use an already-built environment or build your own environment on top of the existing ones. The library is agnostic to the algorithm your agent uses. To write your own environment, you need to define 4 methods:
0. `__init__()`: initialize the class
0. `reset()`: how is the environment reset?
0. `step()`: how does the environment respond to an action?
0. `render()`: how are the results rendered?
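
The four methods above can be sketched with a toy environment. This is a minimal sketch, not a gym environment: `CoinGuessEnv` and its parameters are hypothetical names made up for illustration, and the class deliberately avoids importing gym so it runs anywhere. A real custom environment would typically subclass `gym.Env` and also declare `action_space` and `observation_space`.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment: guess the outcome of a coin flip.

    It mirrors the four-method layout (init, reset, step, render).
    It does not subclass gym.Env, so it stays dependency-free.
    """

    def __init__(self, max_steps=10, seed=None):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps_taken = 0
        self.last_flip = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.steps_taken = 0
        self.last_flip = self.rng.randint(0, 1)
        return self.last_flip

    def step(self, action):
        """Apply an action and return (observation, reward, done, info)."""
        flip = self.rng.randint(0, 1)
        reward = 1.0 if action == flip else 0.0  # reward a correct guess
        self.steps_taken += 1
        self.last_flip = flip
        done = self.steps_taken >= self.max_steps
        return flip, reward, done, {}

    def render(self):
        """Return a short text summary of the current state."""
        return f"step={self.steps_taken} last_flip={self.last_flip}"


env = CoinGuessEnv(max_steps=3, seed=0)
first_observation = env.reset()
observation, reward, done, info = env.step(1)  # guess "1"
print(env.render())
```

The return value of `step()` follows the same four-element layout that the gym environments in this lesson use, which is why agent code written against one environment can usually be pointed at another.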

<br><br>
In the following demo, we will focus on the CartPole-v0 environment. For details, see the [CartPole-v0 wiki page](https://github.com/openai/gym/wiki/CartPole-v0).

In [4]:
# import required library
import gym

# import an environment  
env = gym.make('CartPole-v0')

# reset the environment. In this case all state variables are sampled uniformly from [-0.05, +0.05] 
env.reset()

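# run for 100 time steps; the loop does not check `done`, so it keeps
# stepping after the episode ends (gym logs a warning when that happens)
for i in range(100):
  print(f"Observation, reward, done, info are: {env.step(0)}") # push cart to left (value is 0)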


### Observations

0. The `step()` function advances the agent-environment loop by one time step
0. It applies an action and returns the resulting observation
0. `observation`, `reward`, `done`, and `info` are the 4 elements of the tuple returned by `step()`
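
To illustrate how an agent might use the observation, here is a minimal sketch of a hand-written policy. It assumes the CartPole observation layout described on the linked wiki page (cart position, cart velocity, pole angle, pole velocity at tip) and the action encoding 0 = left, 1 = right. `lean_policy` is a hypothetical helper, not part of gym, and the sample observations are made up for illustration rather than recorded output.

```python
def lean_policy(observation):
    """Push the cart toward the side the pole is leaning to.

    Assumes observation = [cart position, cart velocity,
    pole angle, pole velocity at tip] and actions 0 = left, 1 = right.
    """
    pole_angle = observation[2]
    return 1 if pole_angle > 0 else 0


# Hypothetical observations, made up for illustration (not recorded output).
leaning_right = [0.0, 0.0, 0.03, 0.0]
leaning_left = [0.0, 0.0, -0.03, 0.0]
print(lean_policy(leaning_right), lean_policy(leaning_left))
```

In a notebook, the value returned by `lean_policy(observation)` could be passed to `env.step()` in place of the constant action `0`.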

In [6]:
# reset the environment  
env.reset()
# Run for 100 time steps
for i in range(100):
  action = env.action_space.sample() # sample once so the printed action is the one that is applied
  print(f"Random action is: {action}")
  print(f"Observation, reward, done, info are: {env.step(action)}") # take the sampled action


To structure this properly, we define episodes: each episode starts with a reset and then moves through time steps until the environment reports `done`. The following cell shows the result.

In [8]:
# Move through episodes 
for i_episode in range(100):
    # reset the environment for each episode
    observation = env.reset()
    print(f"Initial observation is: {observation}") # show initial observation
    for i in range(100):
        action = env.action_space.sample() # take a random action
        observation, reward, done, info = env.step(action)
        print(f"Random action is: {action}")
        print(f"Observation, reward, done, info are: {observation}, {reward}, {done}, {info}")
        if done:
            print("Episode finished after {} timesteps".format(i+1))
            break
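
The episode loop above can also be wrapped in a reusable function that returns the total reward of one episode. This is a minimal sketch: `run_episode`, `random_policy`, and the stand-in `CountdownEnv` are hypothetical names introduced here for illustration. The stand-in only mimics the `reset()`/`step()` interface so the block runs without gym; in the notebook you could pass the CartPole `env` instead.

```python
import random


class CountdownEnv:
    """Hypothetical stand-in that ends an episode after a fixed number of steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}  # reward of 1.0 for every step survived


def run_episode(env, policy, max_steps=100):
    """Run one episode and return (total_reward, number_of_steps)."""
    observation = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
        if done:  # stop as soon as the environment ends the episode
            break
    return total_reward, steps


def random_policy(observation):
    """Ignore the observation and pick action 0 or 1 at random."""
    return random.choice([0, 1])


returns = [run_episode(CountdownEnv(horizon=5), random_policy)[0] for _ in range(3)]
print(returns)
```

Tracking the total reward per episode is a common way to compare policies, since a single step's reward says little about how long the agent lasted.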


### Spaces

0. Action space: the set of valid actions
0. Observation space: the set of valid observations
0. You can create a space, sample from it, check membership, etc.

In [10]:
# Print action and observation space
print(env.action_space)
print(env.observation_space)

In [11]:
from gym import spaces
# Set with 8 elements {0, 1, 2, ..., 7}
space = spaces.Discrete(8) # 8 different actions
x = space.sample()
assert space.contains(x), f"{x} does not belong to the space"
assert space.n == 8, "The space must have 8 elements"

### Available Environments
[Click here to see all available environments](https://gym.openai.com/envs/#classic_control)

&copy; 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>