## Chapter 2: OpenAI Gym

This chapter introduces OpenAI Gym and builds familiarity with its environment interface.

#### First environment: CartPole

The environment is a 2D simulation of an upright pole balanced on a cart that slides horizontally. Each observation consists of four numbers describing the position and motion of the cart and the pole. The challenge is to keep the pole upright by pushing the cart left or right, without the agent being told what the numbers mean (i.e. no cheating by working out the physics!).
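For reference, the four numbers are the cart position, cart velocity, pole angle (in radians) and pole angular velocity. The sketch below shows how one step updates them. It assumes the constants and explicit Euler update found in Gym's classic-control `cartpole.py` (gravity 9.8, cart mass 1.0, pole mass 0.1, pole half-length 0.5, push force 10, time step 0.02 s, episode ending beyond ±12° or ±2.4); check these against your installed version. `cartpole_step` is a name made up for this illustration, and the agent itself never gets to see any of this.

```python
import math

# Constants assumed to match Gym's classic-control CartPole implementation
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5                      # half the pole's length
POLE_MASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG = 10.0                       # magnitude of the push on the cart
TAU = 0.02                             # seconds between state updates
THETA_LIMIT = 12 * 2 * math.pi / 360   # episode ends beyond +/-12 degrees
X_LIMIT = 2.4                          # episode ends if the cart leaves +/-2.4


def cartpole_step(state, action):
    """Advance (x, x_dot, theta, theta_dot) by one time step.

    Action 0 pushes the cart left, action 1 pushes it right.
    Returns the new state and whether the episode has ended.
    """
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Equations of motion for a pole hinged on a frictionless cart
    temp = (force + POLE_MASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS

    # Explicit Euler integration: positions are updated with the old velocities
    x += TAU * x_dot
    x_dot += TAU * x_acc
    theta += TAU * theta_dot
    theta_dot += TAU * theta_acc

    done = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    return (x, x_dot, theta, theta_dot), done


# The reset observation from the next cell, followed by action 0 ("push left")
example_state = (0.03292125, 0.01402482, 0.03537324, -0.03157429)
print(cartpole_step(example_state, 0))
```

Feeding in the reset observation shown in the next cell with action `0` reproduces, up to float32 rounding, the observation that `env.step(0)` returns further down.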

```python
import gym

# make() initialises predefined environments
env = gym.make('CartPole-v0')

# Our first observation after resetting the env (must always do this before starting)
obs = env.reset()
obs
```

```
array([ 0.03292125,  0.01402482,  0.03537324, -0.03157429], dtype=float32)
```

We could have seen the shape and type of the action and observation spaces ahead of time by inspecting `env.action_space` and `env.observation_space`.
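For CartPole, `env.action_space` is a `Discrete(2)` space (the integers 0 and 1) and `env.observation_space` is a continuous `Box` of four floats. As a rough illustration of what a discrete space provides, here is a minimal sketch. `DiscreteSpace` is a hypothetical stand-in written for this example, modelled on the `n`, `sample()` and `contains()` members of `gym.spaces.Discrete`; it is not the library class itself.

```python
import random


class DiscreteSpace:
    """Hypothetical stand-in mirroring gym.spaces.Discrete: the integers 0..n-1."""

    def __init__(self, n, seed=None):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        # A uniformly random valid element, like env.action_space.sample()
        return self._rng.randrange(self.n)

    def contains(self, x):
        # True only for integers in the range [0, n)
        return isinstance(x, int) and 0 <= x < self.n


cartpole_actions = DiscreteSpace(2)  # 0 = push left, 1 = push right
print([cartpole_actions.sample() for _ in range(10)])
print(cartpole_actions.contains(1), cartpole_actions.contains(2))  # True False
```

Sampling from the action space is what the random agent below relies on.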

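Now we can take our first step by sending action `0`, i.e. "push the cart left" (`1` would push it right). In the classic Gym API used in this chapter (versions before 0.26), `step()` returns a 4-tuple: the new observation, the reward, a `done` flag saying whether the episode has ended, and a dict of extra diagnostic info.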

```python
env.step(0)
```

```
(array([ 0.03320175, -0.18158607,  0.03474176,  0.27205607], dtype=float32),
 1.0,
 False,
 {})
```

#### Random CartPole agent

```python
total_reward = 0.0
total_steps = 0
obs = env.reset()

while True:
    # This won't be a great agent as we're just going to choose a random action at each step
    action = env.action_space.sample()
    obs, reward, done, _ = env.step(action)
    total_reward += reward
    total_steps += 1
    if done:
        break

f"Episode done in {total_steps} steps, total reward {total_reward:.2f}"
```

```
'Episode done in 12 steps, total reward 12.00'
```
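A single random episode is noisy, so one way to get a steadier baseline is to average the reward over many episodes. In the sketch below, `run_random_episode` should work with any environment following the pre-0.26 Gym API used in this chapter (`reset()` returning an observation, `step()` returning a 4-tuple), e.g. `gym.make('CartPole-v0')` on an older Gym. So that the snippet runs without Gym installed, it also defines `MiniCartPole`, a hypothetical stand-in that assumes Gym's CartPole dynamics, a uniform ±0.05 reset and CartPole-v0's 200-step episode limit. It is not the library's class.

```python
import math
import random


class _TwoActions:
    """Hypothetical stand-in for a Discrete(2) action space."""

    def __init__(self, rng):
        self._rng = rng

    def sample(self):
        return self._rng.randrange(2)


class MiniCartPole:
    """Hypothetical CartPole-v0 stand-in using the old Gym reset/step API.

    Assumes Gym's CartPole dynamics (Euler integration, tau = 0.02),
    a uniform(-0.05, 0.05) reset and a 200-step episode limit.
    """

    def __init__(self, max_steps=200, seed=None):
        self.rng = random.Random(seed)
        self.action_space = _TwoActions(self.rng)
        self.max_steps = max_steps
        self.state = None
        self.steps = 0

    def reset(self):
        self.state = [self.rng.uniform(-0.05, 0.05) for _ in range(4)]
        self.steps = 0
        return list(self.state)

    def step(self, action):
        x, x_dot, theta, theta_dot = self.state
        force = 10.0 if action == 1 else -10.0
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        # Same equations of motion as Gym's CartPole (masses 1.0 and 0.1, half-length 0.5)
        temp = (force + 0.05 * theta_dot ** 2 * sin_t) / 1.1
        theta_acc = (9.8 * sin_t - cos_t * temp) / (
            0.5 * (4.0 / 3.0 - 0.1 * cos_t ** 2 / 1.1))
        x_acc = temp - 0.05 * theta_acc * cos_t / 1.1
        # Tuple assignment keeps the update explicit Euler (old velocities used)
        x, x_dot = x + 0.02 * x_dot, x_dot + 0.02 * x_acc
        theta, theta_dot = theta + 0.02 * theta_dot, theta_dot + 0.02 * theta_acc
        self.state = [x, x_dot, theta, theta_dot]
        self.steps += 1
        done = (abs(x) > 2.4 or abs(theta) > 12 * 2 * math.pi / 360
                or self.steps >= self.max_steps)
        return list(self.state), 1.0, done, {}  # +1 reward for every step


def run_random_episode(env):
    """Play one episode with uniformly random actions; return the total reward."""
    env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()
        _, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward


stand_in_env = MiniCartPole(seed=0)
rewards = [run_random_episode(stand_in_env) for _ in range(1000)]
print(f"Mean reward over {len(rewards)} random episodes: "
      f"{sum(rewards) / len(rewards):.2f}")
```

The stand-in uses its own seeded `random.Random`, so repeated runs give the same average; with the real environment the result will vary from run to run.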