# CartPole-v0
A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 12 degrees from vertical, or the cart moves more than 2.4 units from the center.

More information about this environment can be found at [OpenAI Gym](https://gym.openai.com/envs/CartPole-v0/).

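The goal is to balance the pole for as long as possible. For simplicity, the target is capped at 500 timesteps, so an episode lasts at most 500 timesteps and ends earlier if the pole falls over. Note that the stock `CartPole-v0` registration itself truncates episodes at 200 timesteps, which is the limit listed under Episode Termination and used in the code in this section.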

## Environment

**State / Observation**

Num| Observation| Minimum Value| Maximum Value|
---|---|---|---|
0| Cart Position| -4.8| 4.8|
1| Cart Velocity| -Inf | Inf|
2| Pole Angle| $\approx -0.418$ rad ($-24^\circ$)| $\approx 0.418$ rad ($24^\circ$)|
3| Pole velocity at tip| -Inf| Inf|

**Actions**

Num| Action|
---|---|
0| Push cart to the left|
1| Push cart to the right|

**Reward**

The reward is 1 for every step taken, including the terminating step.

**Initial state**

The state is represented by 4 values (Cart Position, Cart Velocity, Pole Angle, Pole Velocity at Tip). At the start of an episode, each of the four values is drawn uniformly at random from the range -0.05 to 0.05.
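The initialization described above can be sketched without Gym. This is only an illustration: the helper name `sample_initial_state` is hypothetical, and the ±0.05 range is taken from Gym's CartPole implementation rather than from anything configurable here.

```python
import random


def sample_initial_state(rng):
    """Draw (cart position, cart velocity, pole angle, pole tip velocity).

    Assumption: each value is uniform in [-0.05, 0.05], as in Gym's CartPole.
    """
    return [rng.uniform(-0.05, 0.05) for _ in range(4)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
state = sample_initial_state(rng)
print("Sampled initial state:", state)
```

Every sampled value lies close to zero, so each episode starts with the cart near the center and the pole nearly upright.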

**Episode Termination**

An episode ends when any of the following holds:

1. The pole angle exceeds $\pm{12}^\circ$.
2. The cart position exceeds $\pm{2.4}$.
3. The episode length exceeds 200 timesteps.
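The three termination conditions can be written as a small check. This is a minimal sketch, not Gym's own code: the function name `is_done` and the constants are hypothetical, and it assumes the step cap triggers once 200 steps have been taken.

```python
import math

X_THRESHOLD = 2.4                   # cart position limit, in units from the center
THETA_THRESHOLD = math.radians(12)  # pole angle limit, converted to radians
MAX_STEPS = 200                     # assumed cap on the number of steps per episode


def is_done(observation, steps_taken):
    """Return True if any of the three termination conditions holds."""
    cart_position, _cart_velocity, pole_angle, _tip_velocity = observation
    return (
        abs(pole_angle) > THETA_THRESHOLD
        or abs(cart_position) > X_THRESHOLD
        or steps_taken >= MAX_STEPS
    )


# A state with a small angle and a centered cart is not terminal.
print(is_done([0.02354544, 0.62746262, -0.0268198, -0.84762625], steps_taken=1))
# A pole angle of 0.25 rad (about 14.3 degrees) exceeds the 12 degree limit.
print(is_done([0.0, 0.0, 0.25, 0.0], steps_taken=5))
```

The angle is compared in radians because the pole-angle observation is reported in radians, while the limit is stated in degrees.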


## Inspect CartPole-v0 from OpenAI Gym

In [1]:
import numpy as np
import gym

# Create the CartPole-v0 environment
env = gym.make('CartPole-v0')

Check the observation ranges of the environment

In [4]:
print("Maximum allowed (Cart position, Cart Velocity, Pole Angle, Angular Velocity) ", env.observation_space.high)
print("Minimum allowed (Cart position, Cart Velocity, Pole Angle, Angular Velocity) ", env.observation_space.low)

Maximum allowed (Cart position, Cart Velocity, Pole Angle, Angular Velocity)  [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]
Minimum allowed (Cart position, Cart Velocity, Pole Angle, Angular Velocity)  [-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]
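The printed bounds can be reproduced by hand. The sketch below assumes, as in Gym's CartPole implementation, that the position and angle bounds are twice the termination thresholds (2.4 units and 12 degrees) and that the unbounded velocities are reported as the largest finite float32 value; the variable names are illustrative only.

```python
import math
import struct

x_threshold = 2.4                   # termination limit for the cart position
theta_threshold = math.radians(12)  # termination limit for the pole angle, in radians

# Assumption: observation bounds are twice the termination thresholds.
position_bound = 2 * x_threshold
angle_bound = 2 * theta_threshold

# Largest finite float32, decoded from its bit pattern 0x7f7fffff (little-endian bytes).
float32_max = struct.unpack("<f", bytes.fromhex("ffff7f7f"))[0]

print("Position bound:", position_bound)
print("Angle bound (rad):", round(angle_bound, 8))
print("Angle bound (deg):", round(math.degrees(angle_bound), 1))
print("float32 max:", float32_max)
```

This explains why the angle bound of about 0.419 rad corresponds to roughly 24 degrees, and why the velocity bounds appear as 3.4028235e+38 rather than as infinity.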


Action spaces

In [9]:
print("Allowed actions ", env.action_space.n)

Allowed actions  2


A randomly chosen state

In [11]:
random_observation = env.reset()
print("One random observation (Cart position, Cart Velocity, Pole Angle, Angular Velocity) ", random_observation)

One random observation (Cart position, Cart Velocity, Pole Angle, Angular Velocity)  [ 0.00933478  0.04156223 -0.01158811  0.04236648]


Take an action in the environment

In [29]:
action = env.action_space.sample()
print("Random action ", action)

Random action  1


Calculate the next state, the reward, whether a terminal state was reached, and info

In [32]:
observation, reward, done, info = env.step(action)
print("observation ", observation)
print("reward ", reward)
print("Reached terminal state ", done)

observation  [ 0.02354544  0.62746262 -0.0268198  -0.84762625]
reward  1.0
Reached terminal state  False


### Run for a few episodes


In [36]:
for episode in range(20):

    # Reset the environment and the reward counter at the start of every episode
    observation = env.reset()
    total_reward = 0

    for timesteps in range(200):
        # Uncomment to render the movement of the cart pole
        # env.render()

        # Take a random action to generate the next observation, reward, etc.
        action = env.action_space.sample()
        obs, rwd, done, info = env.step(action)

        # Accumulate the reward returned by this step
        total_reward += rwd

        if done:
            print("Episode finished after {0} timesteps with total reward {1}".format(timesteps+1, total_reward))
            break

# Release the environment's resources once all episodes have run
env.close()



Episode finished after 15 timesteps with total reward 15.0
Episode finished after 20 timesteps with total reward 20.0
Episode finished after 11 timesteps with total reward 11.0
Episode finished after 21 timesteps with total reward 21.0
Episode finished after 16 timesteps with total reward 16.0
Episode finished after 18 timesteps with total reward 18.0
Episode finished after 24 timesteps with total reward 24.0
Episode finished after 12 timesteps with total reward 12.0
Episode finished after 25 timesteps with total reward 25.0
Episode finished after 24 timesteps with total reward 24.0
Episode finished after 9 timesteps with total reward 9.0
Episode finished after 44 timesteps with total reward 44.0
Episode finished after 13 timesteps with total reward 13.0
Episode finished after 17 timesteps with total reward 17.0
Episode finished after 18 timesteps with total reward 18.0
Episode finished after 19 timesteps with total reward 19.0
Episode finished after 20 timesteps with total reward 20.0
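To put the random-policy results in perspective, the episode lengths printed above can be summarized. This is a small sketch using only the 17 values shown in the output; the list is copied by hand from that output.

```python
import statistics

# Episode lengths copied from the printed output of the random-action run above.
episode_lengths = [15, 20, 11, 21, 16, 18, 24, 12, 25, 24, 9, 44, 13, 17, 18, 19, 20]

mean_length = statistics.mean(episode_lengths)
median_length = statistics.median(episode_lengths)

print("Episodes:", len(episode_lengths))
print("Mean length: {:.2f}".format(mean_length))
print("Median length:", median_length)
print("Shortest / longest:", min(episode_lengths), "/", max(episode_lengths))
```

With random actions the pole stays up for roughly 19 timesteps on average, far short of the 200-step limit, which gives a baseline that a learned policy should beat.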