# Cart Pole Environment
The goal of the Cart Pole problem is to keep a pole balanced upright on a cart that moves along a frictionless track.

Detailed documentation on the Cart Pole environment: [https://gymnasium.farama.org/environments/classic_control/cart_pole/]

In [1]:
# Import the Gymnasium library
import gymnasium as gym

# Seed used to make environment resets reproducible
SEED = 42

In [2]:
# Create the Cart Pole environment
env = gym.make('CartPole-v1')

In [3]:
# Reset the environment to an initial state; passing a seed makes the reset reproducible
obs, info = env.reset(seed=SEED)

# Print the initial observation and the auxiliary info dictionary
print("The initial observation is: {}".format(obs))
print("The information is: {}".format(info))

The initial observation is: [ 0.0273956  -0.00611216  0.03585979  0.0197368 ]
The information is: {}
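Gymnasium's CartPole draws each of the four state components uniformly from [-0.05, 0.05] at reset. The sketch below reproduces that draw with NumPy alone, assuming the environment's generator is seeded the same way as `numpy.random.default_rng(seed)`; with seed 42 it should yield the same values as the observation printed above, up to float32 rounding. The helper name `initial_state` is hypothetical and not part of Gymnasium.

```python
import numpy as np


def initial_state(seed: int) -> np.ndarray:
    """Sketch of CartPole's reset: 4 values drawn uniformly from [-0.05, 0.05].

    Assumption: the environment uses a PCG64 generator seeded like
    np.random.default_rng(seed).
    """
    rng = np.random.default_rng(seed)
    state = rng.uniform(low=-0.05, high=0.05, size=(4,))
    # Observations are returned to the agent as float32
    return state.astype(np.float32)


print(initial_state(42))
```

This is why seeding `reset` matters: with the same seed, every reset produces the same starting state.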


In [4]:
# Print the observation space (or state space)
print("The observation space: {}".format(env.observation_space))

The observation space: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)


The observation space specifies the minimum and maximum values of:
1. Cart position (-4.8 to 4.8),
2. Cart velocity (-Inf to Inf; stored as ±3.4028235e+38, the largest float32 value),
3. Pole angle (about -0.418 rad to 0.418 rad), and
4. Pole angular velocity (-Inf to Inf; stored as ±3.4028235e+38).
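Since an observation is just a 4-element array, it can help to give each entry a name. The sketch below does that and converts the pole-angle bound from radians to degrees (0.418 rad is 24 degrees). The `CartPoleObs` type and `label` helper are illustrative names, not part of Gymnasium.

```python
import math
from typing import NamedTuple, Sequence


class CartPoleObs(NamedTuple):
    """Hypothetical helper that names the four entries of a CartPole observation."""
    cart_position: float
    cart_velocity: float
    pole_angle: float             # radians, 0 means upright
    pole_angular_velocity: float


def label(obs: Sequence[float]) -> CartPoleObs:
    # Observation order: position, velocity, angle, angular velocity
    return CartPoleObs(*(float(v) for v in obs))


obs = label([0.0273956, -0.00611216, 0.03585979, 0.0197368])
print(obs)
print("Pole angle in degrees:", math.degrees(obs.pole_angle))
# Upper bound of the pole angle in the observation space, in degrees
print("Angle bound in degrees:", math.degrees(0.41887903))
```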

`Box` indicates that the state space consists of continuous (real-valued) rather than discrete values. The upper and lower bounds of each component are available through the `high` and `low` attributes:

In [5]:
print(env.observation_space.high, env.observation_space.low)

[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38] [-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]
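These bounds describe the range an observation may take, not the conditions that end an episode. In Gymnasium's implementation, an episode terminates once the cart position leaves ±2.4 or the pole angle leaves ±12 degrees (about ±0.2095 rad); the bounds of 4.8 and 0.418 printed above are exactly twice those thresholds. A minimal sketch of the termination check, with the threshold values taken from the Gymnasium documentation (the function name `is_terminated` is hypothetical):

```python
import math
from typing import Sequence

# Thresholds as documented for Gymnasium's CartPole
X_THRESHOLD = 2.4                          # cart position limit
THETA_THRESHOLD = 12 * 2 * math.pi / 360   # 12 degrees in radians, ~0.2095


def is_terminated(obs: Sequence[float]) -> bool:
    """True if the cart left the track or the pole tilted past 12 degrees."""
    x, _, theta, _ = obs
    return abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD


print(is_terminated([0.0273956, -0.00611216, 0.03585979, 0.0197368]))
# Twice the thresholds gives the observation-space bounds for x and theta
print(2 * X_THRESHOLD, 2 * THETA_THRESHOLD)
```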


The action space in Cart Pole is discrete and contains two actions:
1. 0: Push the cart to the left
2. 1: Push the cart to the right
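Each action applies a fixed horizontal force to the cart, and the environment then integrates the cart-pole equations of motion for one time step. The pure-Python sketch below reimplements a single transition, assuming the constants used by Gymnasium's `CartPoleEnv` (force magnitude 10.0, time step 0.02 s, cart mass 1.0, pole mass 0.1, pole half-length 0.5) and its default explicit Euler integration. `step_dynamics` is a hypothetical helper, not Gymnasium API. Applied to the seeded initial observation with action 0, it should match the first `Next State` in the episode log below, up to float32 rounding.

```python
import math
from typing import List, Sequence

# Physical constants, assumed to match Gymnasium's CartPole implementation
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5                    # half of the pole's length
POLEMASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG = 10.0
TAU = 0.02                           # seconds between state updates


def step_dynamics(state: Sequence[float], action: int) -> List[float]:
    """Advance the state one Euler step for action 0 (push left) or 1 (push right)."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    temp = (force + POLEMASS_LENGTH * theta_dot**2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t**2 / TOTAL_MASS)
    )
    x_acc = temp - POLEMASS_LENGTH * theta_acc * cos_t / TOTAL_MASS

    # Explicit Euler: positions are updated with the old velocities
    return [
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    ]


initial = [0.0273956, -0.00611216, 0.03585979, 0.0197368]
print(step_dynamics(initial, 0))  # push left
```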

In [6]:
print("The action space: {}".format(env.action_space))


The action space: Discrete(2)


# Cart Pole Balancing with a Random Policy over Multiple Episodes

In [7]:
# Map each discrete action to a readable label
action_map = {
    0: 'left',
    1: 'right',
}

# Maximum number of timesteps per episode
num_timestep = 50

# Number of episodes
num_episodes = 100

for e in range(num_episodes):
    print("-----------------------------------")
    print("Episode {}/{}".format(e, num_episodes))
    print("-----------------------------------")
    
    # Initialize the return (sum of rewards) for this episode
    RETURN = 0
    
    # Reset the environment; seed only the first reset so that later
    # episodes start from different initial states
    state, info = env.reset(seed=SEED if e == 0 else None)
    
    # Take a random action at each timestep of the episode
    for t in range(num_timestep):
        print("timestep: {} of episode: {}".format(t+1, e))
        print("-----------------------------------------------------")
        
        # Select a random action (0 or 1)
        random_action = env.action_space.sample()
        
        # Apply the action; Gymnasium's step() returns
        # (observation, reward, terminated, truncated, info)
        next_state, reward, terminated, truncated, info = env.step(random_action)
        
        print("Action: {}".format(action_map[random_action]))
        print("Next State: {}".format(next_state))
        print("Reward: {}".format(reward))
        print("")
        
        RETURN = RETURN + reward
        state = next_state
        
        # Stop if the pole fell or the cart left the track (terminated),
        # or the environment's time limit was reached (truncated)
        if terminated or truncated:
            break
    
    if e % 10 == 0:
        print("****************************************************")
        print("Episode: {}, Return: {}".format(e, RETURN))
        print("****************************************************")

# Release the environment's resources once all episodes are finished
env.close()
    

-----------------------------------
Episode 0/100
-----------------------------------
timestep: 1 of episode: 0
-----------------------------------------------------
Action: left
Next State: [ 0.02727336 -0.20172954  0.03625453  0.32351476]
Reward: 1.0

timestep: 2 of episode: 0
-----------------------------------------------------
Action: right
Next State: [ 0.02323877 -0.00714208  0.04272482  0.04248186]
Reward: 1.0

timestep: 3 of episode: 0
-----------------------------------------------------
Action: right
Next State: [ 0.02309593  0.187342    0.04357446 -0.23642075]
Reward: 1.0

timestep: 4 of episode: 0
-----------------------------------------------------
Action: right
Next State: [ 0.02684277  0.3818152   0.03884605 -0.51504683]
Reward: 1.0

timestep: 5 of episode: 0
-----------------------------------------------------
Action: right
Next State: [ 0.03447907  0.57636917  0.02854511 -0.79523975]
Reward: 1.0

timestep: 6 of episode: 0
--------------------------------------------

In [8]:

print("****************************************************")
print("Return of the final episode (out of {}): {}".format(num_episodes, RETURN))
print("****************************************************")

****************************************************
Return of the final episode (out of 100): 46.0
****************************************************
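CartPole-v1 gives a reward of +1 on every step, including the step on which the episode terminates, so the undiscounted `RETURN` equals the number of steps the pole stayed up (capped here at `num_timestep = 50`). `RETURN` is also reset at the start of each episode, so the cell above reports only the last episode. The sketch below, with hypothetical helper names, computes a return from a list of rewards and summarizes returns collected across episodes. The discount factor `gamma` is an addition the notebook does not use; `gamma=1.0` reproduces the plain sum from the loop.

```python
from statistics import mean
from typing import Dict, Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Sum of gamma**t * r_t; gamma=1.0 gives the plain sum of rewards."""
    g = 0.0
    # Accumulate backwards using G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def summarize(returns: Sequence[float]) -> Dict[str, float]:
    """Mean, minimum and maximum of per-episode returns."""
    return {"mean": mean(returns), "min": min(returns), "max": max(returns)}
```

To use it with the random-policy loop, append `RETURN` to a list at the end of every episode and call `summarize` on that list after the loop finishes.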
