# Introduction to OpenAI Gym
wiki link: https://github.com/openai/gym/wiki

In [1]:
import gym

## Environment
Example: CartPole

Category: Classic Control

<img src="CartPole.png">

### Description
A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pendulum starts upright, and the goal is to prevent it from falling over by increasing and reducing the cart's velocity.

### Source
This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson.

In [8]:
# CartPole
env = gym.make('CartPole-v0')
print(env)

<TimeLimit<CartPoleEnv<CartPole-v0>>>


## State

In [3]:
# State/Observation Space (Fully Observable)

env.observation_space

Box(4,)

In [4]:
print('State Vector Shape:', env.observation_space.shape)
print('Low Limits:', env.observation_space.low)
print('High Limits:', env.observation_space.high)
print('Draw a Sample:', env.observation_space.sample())

State Vector Shape: (4,)
Low Limits: [-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]
High Limits: [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]
Draw a Sample: [-4.1203725e-01  1.2965357e+38 -3.5606518e-01 -1.0758932e+38]
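
The four components of the state vector are, in order, cart position, cart velocity, pole angle (in radians), and pole angular velocity. The sketch below labels them with a namedtuple; `CartPoleState` and its field names are illustrative and not part of gym. Using only the standard library, it also checks that the printed angle limit of about 0.419 rad corresponds to 24°, and that the ±3.4028235e+38 entries match the largest finite float32 value, which effectively means "unbounded".

```python
import math
from collections import namedtuple

# Illustrative container; gym itself returns a plain NumPy array.
CartPoleState = namedtuple(
    "CartPoleState",
    ["cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity"],
)

# The sampled vector printed above, copied by hand.
sample = [-4.1203725e-01, 1.2965357e+38, -3.5606518e-01, -1.0758932e+38]
state = CartPoleState(*sample)
print(state.cart_position, state.pole_angle)

# Angle limit of the observation space: 24 degrees expressed in radians.
angle_limit = math.radians(24)
print(round(angle_limit, 7))

# Largest finite float32 value: (2 - 2**-23) * 2**127.
float32_max = (2 - 2**-23) * 2.0**127
print(float32_max)
```

Note that `sample()` draws from the whole box, so sampled velocities can be astronomically large; such samples are not physically reachable states.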


## Action

env.action_space

In [5]:
print('Number of Actions:', env.action_space.n)
print('Draw an Action:', env.action_space.sample())

Number of Actions: 2
Draw an Action: 1
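
In CartPole the two discrete actions are 0 (push the cart to the left) and 1 (push the cart to the right). As a minimal sketch of how an agent might map an observation to one of them, the function below pushes the cart toward the side the pole is leaning. The name `lean_policy` is hypothetical, the heuristic is not taken from the gym documentation, and it assumes the observation layout [cart position, cart velocity, pole angle, pole angular velocity].

```python
LEFT, RIGHT = 0, 1  # CartPole's two discrete actions


def lean_policy(observation):
    """Push the cart toward the side the pole is leaning (hypothetical heuristic)."""
    pole_angle = observation[2]  # assumed index of the pole angle, in radians
    return RIGHT if pole_angle > 0 else LEFT


print(lean_policy([0.0, 0.0, 0.05, 0.0]))   # pole leaning one way
print(lean_policy([0.0, 0.0, -0.05, 0.0]))  # pole leaning the other way
```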


## Act and Observe

The core gym interface is Env, which is the unified environment interface. There is no interface for agents; that part is left to you. The following are the Env methods you should know:

* reset(self): Reset the environment's state. Returns observation.
* step(self, action): Step the environment by one timestep. Returns observation, reward, done, info.
* render(self, mode='human'): Render one frame of the environment. The default mode will do something human-friendly, such as pop up a window.

Starting State: Each of the four observation values is initialized to a uniform random value in the range [-0.05, 0.05].

<img src="RL.png">

In [6]:
# reset environment
env.reset()

array([ 0.01003332, -0.03823305, -0.01277908,  0.02764704])

In [7]:
# take an action
observation, reward, done, info = env.step(0)
print('Observation:', observation)
print('Reward:', reward)
print('Done?:', done)

Observation: [ 0.00926866 -0.23316943 -0.01222614  0.31627079]
Reward: 1.0
Done?: False
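
Repeating the reset/step pattern until `done` is true gives a full episode. The sketch below wraps that loop in a helper. The names `run_episode`, `random_policy`, and the `CountdownEnv` stand-in are hypothetical and introduced here only so the example runs without gym installed. It assumes the interface described above, where `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`.

```python
import random


class CountdownEnv:
    """Toy stand-in (not part of gym) that ends after a fixed number of steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.steps = 0

    def reset(self):
        self.steps = 0
        return [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        self.steps += 1
        observation = [0.0, 0.0, 0.0, 0.0]
        reward = 1.0  # mirrors CartPole's reward of 1 per step
        done = self.steps >= self.horizon
        return observation, reward, done, {}


def random_policy(observation):
    """Ignore the observation and pick one of the two actions at random."""
    return random.choice([0, 1])


def run_episode(env, policy):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


print(run_episode(CountdownEnv(horizon=5), random_policy))
```

With gym installed, the same helper should accept `gym.make('CartPole-v0')` in place of `CountdownEnv`, provided the installed version uses the four-value `step` return shown above.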


## Reward
A reward of 1 is given for every step taken, including the termination step.

## Episode Termination
* Pole angle is more than 12° from vertical in either direction
* Cart position is beyond ±2.4 (the center of the cart reaches the edge of the display)
* Episode length is greater than 200
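
The three conditions above can be sketched as a single predicate. The function name, argument names, and constants below are illustrative rather than gym's own. The angle threshold is converted to radians because the observation reports the pole angle in radians. The sketch ends the episode once 200 steps have been taken, which is an assumption about how the step limit is applied.

```python
import math

ANGLE_LIMIT_RAD = math.radians(12)  # 12 degrees, about 0.2094 rad
POSITION_LIMIT = 2.4
MAX_EPISODE_STEPS = 200  # value for CartPole-v0


def episode_over(cart_position, pole_angle, steps_taken):
    """Return True if any of the three termination conditions holds."""
    return (
        abs(pole_angle) > ANGLE_LIMIT_RAD
        or abs(cart_position) > POSITION_LIMIT
        or steps_taken >= MAX_EPISODE_STEPS  # assumption: ends on reaching the limit
    )


print(episode_over(0.0, 0.1, 10))    # within all limits
print(episode_over(0.0, 0.25, 10))   # pole angle too large
print(episode_over(2.5, 0.0, 10))    # cart too far from the center
print(episode_over(0.0, 0.0, 200))   # step limit reached
```

These thresholds are half the observation-space limits printed earlier (4.8 and about 0.419 rad), so the bounds of `env.observation_space` should not be read as the termination limits.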

**Please check these limits, as gym is updated from time to time.**

In [10]:
print(env.spec.max_episode_steps)

200
