# Cart Pole Balancing with a Random Policy

Let's create an agent with a random policy, that is, an agent that ignores the state of the environment and selects a random action at every time step as it tries to balance the pole. The agent receives a reward of +1 for every time step the pole stays upright on the cart. The episode ends once the pole tilts too far or the cart moves off the track. We will run 100 episodes and look at the return (sum of rewards) obtained in each episode. Let's go through this step by step.
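The whole idea fits in a few lines of plain Python if we re-create the environment ourselves. The sketch below is illustrative only. `SimpleCartPole` and `run_random_episode` are hypothetical names, not part of gym. The physical constants and limits (1.0 kg cart, 0.1 kg pole with a 0.5 m half-length, 10 N push, 0.02 s time step, 12-degree and 2.4 m limits) are assumed to mirror gym's classic-control CartPole. Each step pushes the cart left or right, advances the physics by one Euler step, pays +1, and ends the episode once the pole tilts or the cart drifts too far.

```python
import math
import random


class SimpleCartPole:
    """Hypothetical, simplified re-creation of CartPole (not the gym class).

    Constants are assumed to mirror gym's classic-control defaults.
    """

    gravity = 9.8
    mass_cart = 1.0
    mass_pole = 0.1
    total_mass = mass_cart + mass_pole
    half_length = 0.5                        # half the pole's length (m)
    pole_mass_length = mass_pole * half_length
    force_mag = 10.0                         # magnitude of each push (N)
    tau = 0.02                               # seconds between state updates
    theta_limit = 12 * 2 * math.pi / 360     # 12 degrees, in radians
    x_limit = 2.4                            # cart position limit (m)

    def reset(self, rng):
        # state = (cart position, cart velocity, pole angle, pole angular velocity)
        self.state = [rng.uniform(-0.05, 0.05) for _ in range(4)]
        return tuple(self.state)

    def step(self, action):
        x, x_dot, theta, theta_dot = self.state
        force = self.force_mag if action == 1 else -self.force_mag
        cos_t, sin_t = math.cos(theta), math.sin(theta)

        # equations of motion of the cart-pole system
        temp = (force + self.pole_mass_length * theta_dot ** 2 * sin_t) / self.total_mass
        theta_acc = (self.gravity * sin_t - cos_t * temp) / (
            self.half_length * (4.0 / 3.0 - self.mass_pole * cos_t ** 2 / self.total_mass)
        )
        x_acc = temp - self.pole_mass_length * theta_acc * cos_t / self.total_mass

        # one explicit Euler step of length tau
        x += self.tau * x_dot
        x_dot += self.tau * x_acc
        theta += self.tau * theta_dot
        theta_dot += self.tau * theta_acc
        self.state = [x, x_dot, theta, theta_dot]

        # the episode ends when the pole tilts too far or the cart leaves the track
        done = abs(x) > self.x_limit or abs(theta) > self.theta_limit
        return tuple(self.state), 1.0, done  # +1 reward for every step taken


def run_random_episode(env, rng, num_timesteps=50):
    """Roll out one episode with a uniformly random policy and return its return."""
    env.reset(rng)
    episode_return = 0.0
    for _ in range(num_timesteps):
        # random policy: ignore the state, push left (0) or right (1) with equal probability
        _, reward, done = env.step(rng.choice([0, 1]))
        episode_return += reward
        if done:
            break
    return episode_return


rng = random.Random(0)  # seeded so the rollouts are reproducible
env = SimpleCartPole()
returns = [run_random_episode(env, rng) for _ in range(100)]
print('Lowest return:', min(returns), 'Highest return:', max(returns))
```

Because the random policy never looks at the pole's angle, episodes typically end well before the 50-step cap, just as they do with the real environment.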

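First, create the CartPole environment. The code in this section uses the API of gym 0.26 and later: the render mode is chosen in `gym.make`, `reset` returns an `(observation, info)` pair, and `step` returns five values. Gymnasium, gym's successor, uses the same API, so `import gymnasium as gym` also works. Drawing the environment in a window requires pygame.

```python
import gym

# CartPole-v0: keep a pole balanced on a moving cart (episodes are capped at 200 steps).
# render_mode='human' draws every step in a window.
env = gym.make('CartPole-v0', render_mode='human')
```

Set the number of episodes and the maximum number of time steps per episode. An episode ends as soon as the pole falls or the cart leaves the track, or after `num_timesteps` steps, whichever comes first. With a reward of +1 per step, the highest return an episode can reach here is therefore 50:

```python
num_episodes = 100
num_timesteps = 50
```

Now run the episodes. At every step, the agent samples a random action, performs it, and adds the reward to the episode's return:

```python
# for each episode
for i in range(num_episodes):

    # set the return (sum of rewards) to 0
    episode_return = 0
    # initialize the state by resetting the environment;
    # reset() returns the initial observation and an info dict
    state, info = env.reset()

    # for each step in the episode
    for t in range(num_timesteps):
        # with render_mode='human', every step is drawn automatically,
        # so there is no separate env.render() call

        # randomly select an action (0 = push cart left, 1 = push cart right)
        random_action = env.action_space.sample()

        # perform the randomly selected action
        next_state, reward, terminated, truncated, info = env.step(random_action)

        # update the return and move to the next state
        episode_return += reward
        state = next_state

        # end the episode if the pole fell or the cart left the track (terminated)
        # or the environment's own time limit was reached (truncated)
        if terminated or truncated:
            break

    # print the return of every 10th episode
    if i % 10 == 0:
        print('Episode: {}, Return: {}'.format(i, episode_return))
```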

Running the loop prints output like the following. Because the actions are random, the returns differ from run to run:

```
Episode: 0, Return: 23.0
Episode: 10, Return: 12.0
Episode: 20, Return: 23.0
Episode: 30, Return: 15.0
Episode: 40, Return: 19.0
Episode: 50, Return: 10.0
Episode: 60, Return: 16.0
Episode: 70, Return: 10.0
Episode: 80, Return: 22.0
Episode: 90, Return: 38.0
```
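The printed returns are only a sample (every 10th episode), but they already show how weak the random policy is. The short sketch below averages the ten values from the run above and compares them with the 50-step cap set by `num_timesteps`. To average all 100 episodes instead, you could append each episode's return to a list inside the loop.

```python
# Returns printed by the run above (episodes 0, 10, ..., 90).
printed_returns = [23.0, 12.0, 23.0, 15.0, 19.0, 10.0, 16.0, 10.0, 22.0, 38.0]

# Highest possible return per episode: one +1 reward per step, capped by num_timesteps.
num_timesteps = 50

mean_return = sum(printed_returns) / len(printed_returns)
print('Mean return: {:.1f} out of a possible {}'.format(mean_return, num_timesteps))
print('Best return: {}'.format(max(printed_returns)))
```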


Finally, close the environment, which also closes the render window:

```python
env.close()
```