# Reinforcement learning with OpenAI Gym

Let us list all available gym environments.

In [2]:
from gym import envs
print(envs.registry.all())

dict_values([EnvSpec(Copy-v0), EnvSpec(RepeatCopy-v0), EnvSpec(ReversedAddition-v0), EnvSpec(ReversedAddition3-v0), EnvSpec(DuplicatedInput-v0), EnvSpec(Reverse-v0), EnvSpec(CartPole-v0), EnvSpec(CartPole-v1), EnvSpec(MountainCar-v0), EnvSpec(MountainCarContinuous-v0), EnvSpec(Pendulum-v0), EnvSpec(Acrobot-v1), EnvSpec(LunarLander-v2), EnvSpec(LunarLanderContinuous-v2), EnvSpec(BipedalWalker-v3), EnvSpec(BipedalWalkerHardcore-v3), EnvSpec(CarRacing-v0), EnvSpec(Blackjack-v0), EnvSpec(KellyCoinflip-v0), EnvSpec(KellyCoinflipGeneralized-v0), EnvSpec(FrozenLake-v0), EnvSpec(FrozenLake8x8-v0), EnvSpec(CliffWalking-v0), EnvSpec(NChain-v0), EnvSpec(Roulette-v0), EnvSpec(Taxi-v3), EnvSpec(GuessingGame-v0), EnvSpec(HotterColder-v0), EnvSpec(Reacher-v2), EnvSpec(Pusher-v2), EnvSpec(Thrower-v2), EnvSpec(Striker-v2), EnvSpec(InvertedPendulum-v2), EnvSpec(InvertedDoublePendulum-v2), EnvSpec(HalfCheetah-v2), EnvSpec(HalfCheetah-v3), EnvSpec(Hopper-v2), EnvSpec(Hopper-v3), EnvSpec(Swimmer-v2), EnvSp
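
The raw `dict_values` dump is hard to scan. As a minimal sketch, assuming every environment id has the form `Name-vN` (as the ids printed above do), the versions can be grouped by base name. `group_env_ids` is a hypothetical helper written for this illustration, not part of gym.

```python
from collections import defaultdict


def group_env_ids(env_ids):
    """Group environment ids such as 'CartPole-v1' by base name.

    Hypothetical helper for illustration; assumes every id ends in '-v<number>'.
    """
    grouped = defaultdict(list)
    for env_id in env_ids:
        # Split on the last '-v' so that names containing dashes stay intact.
        name, _, version = env_id.rpartition("-v")
        grouped[name].append(int(version))
    # Sort both the names and the version numbers for a stable, readable listing.
    return {name: sorted(versions) for name, versions in sorted(grouped.items())}


# A few ids copied from the listing above.
sample_ids = ["CartPole-v0", "CartPole-v1", "MountainCar-v0",
              "HalfCheetah-v3", "HalfCheetah-v2"]
for name, versions in group_env_ids(sample_ids).items():
    print(name, versions)
```

With gym installed, the ids could be collected with `[spec.id for spec in envs.registry.all()]` and passed to the helper.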

## Cart-pole

Classic cart-pole system implemented by Rich Sutton et al.
Copied from http://incompleteideas.net/sutton/book/code/pole.c
permalink: https://perma.cc/C9ZM-652R

A pole is attached by an un-actuated joint to a **cart**, which moves along a *frictionless* track. The system is controlled by applying a force of +1 or -1 to the cart. The pole starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 12 degrees from vertical (half the angle bound of about 0.419 rad reported by `observation_space.high`), or the cart moves more than 2.4 units from the center.
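
The termination rule can be sketched as a plain function of the cart position and the pole angle. This is an illustration only: `cartpole_episode_over` is a hypothetical helper, and its default limits (12 degrees, 2.4 units) are assumptions based on gym's CartPole implementation, so they are exposed as parameters in case a different limit is wanted.

```python
import math


def cartpole_episode_over(cart_position, pole_angle,
                          max_angle_degrees=12.0, max_position=2.4):
    """Return True once the pole has tilted or the cart has moved too far.

    Hypothetical helper for illustration. `pole_angle` is in radians;
    the default limits are assumptions based on gym's CartPole source.
    """
    max_angle = math.radians(max_angle_degrees)
    return abs(pole_angle) > max_angle or abs(cart_position) > max_position


print(cartpole_episode_over(cart_position=0.0, pole_angle=0.05))  # within both limits
print(cartpole_episode_over(cart_position=0.0, pole_angle=0.25))  # pole tilted too far
print(cartpole_episode_over(cart_position=2.5, pole_angle=0.0))   # cart left the track
```

In the environment itself this check happens inside `env.step`, which reports the result through the `done` flag.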

In [1]:
import gym
import time
env = gym.make('CartPole-v0')
print(env.action_space)

print(env.observation_space) # an environment-specific object representing your observation of the environment. 
print(env.observation_space.high)
print(env.observation_space.low)
env.close()

Discrete(2)
Box(4,)
[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]
[-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]


https://github.com/openai/gym/wiki/CartPole-v0 describes what the four observations consist of.
In order, the four values are the cart position, the cart velocity, the pole angle, and the pole velocity at the tip.
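
To make the ordering easier to remember, an observation can be unpacked into named fields. This is a small sketch: the `CartPoleObservation` type and its field names are hypothetical labels chosen here, since gym itself returns a plain array. The sample values are copied from one observation printed in this notebook.

```python
from collections import namedtuple

# Hypothetical field names for illustration; gym returns an unlabeled array.
CartPoleObservation = namedtuple(
    "CartPoleObservation",
    ["cart_position", "cart_velocity", "pole_angle", "pole_tip_velocity"],
)

# Any length-4 sequence works, including the array returned by env.reset().
raw_observation = [-0.03888316, 0.01903183, 0.00865326, 0.04548348]
obs = CartPoleObservation(*raw_observation)
print(obs.cart_position, obs.pole_angle)
```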

In [3]:
env = gym.make('CartPole-v1')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        time.sleep(0.1)
        print(observation) # each observation is a box of size 4
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action) # done is True when game is finished
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()

[-0.03888316  0.01903183  0.00865326  0.04548348]
[-0.03850252  0.21402863  0.00956293 -0.24445676]
[-0.03422195  0.4090127   0.00467379 -0.53410802]
[-0.02604169  0.60406861 -0.00600837 -0.82531461]
[-0.01396032  0.79927222 -0.02251466 -1.11988118]
[ 0.00202512  0.60445273 -0.04491228 -0.83434477]
[ 0.01411418  0.80015857 -0.06159918 -1.14080727]
[ 0.03011735  0.60589351 -0.08441532 -0.86806107]
[ 0.04223522  0.41201532 -0.10177654 -0.60306798]
[ 0.05047552  0.60840246 -0.1138379  -0.92599497]
[ 0.06264357  0.80486243 -0.1323578  -1.25217364]
[ 0.07874082  1.00140814 -0.15740128 -1.58321325]
[ 0.09876899  0.80847001 -0.18906554 -1.34346885]
Episode finished after 13 timesteps
[-0.02320371  0.00793118  0.02505184  0.04060254]
[-0.02304509 -0.18754088  0.02586389  0.34108311]
[-0.0267959   0.00720371  0.03268555  0.05666705]
[-0.02665183 -0.18837128  0.03381889  0.35948068]
[-0.03041925 -0.38395725  0.04100851  0.66263274]
[-0.0380984  -0.18942912  0.05426116  0.38313899]
[-0.04188698 -

[ 0.03638487 -0.17110826 -0.02039774  0.32693417]
[ 0.03296271  0.02429807 -0.01385906  0.02788913]
[ 0.03344867 -0.17062242 -0.01330128  0.31616735]
[ 0.03003622 -0.36555241 -0.00697793  0.60462601]
[ 0.02272517 -0.17033357  0.00511459  0.30975339]
[ 0.0193185  -0.36552802  0.01130966  0.60404491]
[ 0.01200794 -0.17056605  0.02339056  0.31494563]
[0.00859662 0.02421504 0.02968947 0.02973012]
[ 0.00908092  0.21889892  0.03028407 -0.25343959]
[ 0.0134589   0.41357566  0.02521528 -0.53641864]
[ 0.02173041  0.60833416  0.01448691 -0.82105101]
[ 0.03389709  0.413017   -0.00193411 -0.52384702]
[ 0.04215743  0.21792232 -0.01241105 -0.23177418]
[ 0.04651588  0.41321939 -0.01704654 -0.52834598]
[ 0.05478027  0.21834137 -0.02761346 -0.24108285]
[ 0.05914709  0.41384666 -0.03243511 -0.54234632]
[ 0.06742403  0.6094091  -0.04328204 -0.84506994]
[ 0.07961221  0.4149036  -0.06018344 -0.56630588]
[ 0.08791028  0.22067528 -0.07150956 -0.29317412]
[ 0.09232379  0.02664187 -0.07737304 -0.02387374]
[ 0.

[-0.10596183 -1.00227641  0.12899095  1.49903994]
[-0.12600736 -0.808936    0.15897175  1.24925695]
[-0.14218608 -1.0056978   0.18395689  1.58721915]
Episode finished after 20 timesteps
[0.02593761 0.04692749 0.04308151 0.0064879 ]
[ 0.02687616 -0.14878499  0.04321127  0.31244627]
[0.02390046 0.04569558 0.0494602  0.03369788]
[ 0.02481437  0.24007465  0.05013415 -0.24297878]
[0.02961586 0.04427377 0.04527458 0.06508678]
[ 0.03050134 -0.15146707  0.04657631  0.37170335]
[ 0.027472   -0.34721871  0.05401038  0.67870086]
[ 0.02052762 -0.15288702  0.0675844   0.40349972]
[0.01746988 0.04121458 0.07565439 0.1328667 ]
[ 0.01829418 -0.15490495  0.07831173  0.4484257 ]
[0.01519608 0.03902698 0.08728024 0.18141947]
[ 0.01597662 -0.15722843  0.09090863  0.50030984]
[0.01283205 0.03650227 0.10091483 0.2376041 ]
[ 0.01356209  0.23004857  0.10566691 -0.02161971]
[0.01816306 0.03358242 0.10523451 0.30244411]
[ 0.01883471 -0.16286965  0.1112834   0.62637393]
[ 0.01557732 -0.35935461  0.12381088  0.95

## Mountain-Car

A car is on a one-dimensional track, positioned between two "mountains". The goal is to drive up the mountain on the right; however, the car's engine is not strong enough to scale the mountain in a single pass. Therefore, the only way to succeed is to drive back and forth to build up momentum.
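
The back-and-forth idea can be sketched as a simple hand-written policy that always pushes in the direction the car is already moving. This is a heuristic for illustration, not a learned or optimal controller. It assumes the MountainCar-v0 conventions (observation is `(position, velocity)`; action 0 pushes left, 1 does nothing, 2 pushes right). `momentum_policy` and `run_episode` are hypothetical names introduced here.

```python
def momentum_policy(observation):
    """Push in the direction the car is already moving.

    Assumes observation = (position, velocity) and the action encoding
    0 = push left, 1 = no push, 2 = push right.
    """
    _position, velocity = observation
    return 2 if velocity >= 0 else 0


def run_episode(env, policy, max_steps=200):
    """Run one episode using the 4-tuple step API used in this notebook.

    Returns the number of steps taken before `done` or `max_steps`.
    """
    observation = env.reset()
    for step in range(1, max_steps + 1):
        observation, _reward, done, _info = env.step(policy(observation))
        if done:
            return step
    return max_steps


print(momentum_policy((-0.5, 0.01)))   # moving right, so push right
print(momentum_policy((-0.5, -0.01)))  # moving left, so push left
```

With gym installed, `run_episode(gym.make('MountainCar-v0'), momentum_policy)` would run the heuristic against the real environment.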

In [8]:
import gym
env = gym.make('MountainCar-v0')
env.reset()
for _ in range(500):
    env.render()
    observation, reward, done, info = env.step(env.action_space.sample()) # take a random action
    if done: # stop when the episode ends (goal reached or step limit hit)
        break
env.close()

## Atari Space Invaders

In [9]:
import gym
env = gym.make('SpaceInvaders-v0')
env.reset()
for _ in range(500):
    env.render()
    break # stop after rendering the first frame
env.close()

## Atari Breakout

In [7]:
import gym
import time
env = gym.make('Breakout-v0')
env.reset()
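for _ in range(1000):
    env.render()
    time.sleep(0.1)
    observation, reward, done, info = env.step(env.action_space.sample()) # take a random action
    if done: # start a new game once the current one is over
        env.reset()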
env.close()