<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Functions" data-toc-modified-id="Functions-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Functions</a></span></li><li><span><a href="#Car-Pole-Example" data-toc-modified-id="Car-Pole-Example-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Car Pole Example</a></span><ul class="toc-item"><li><span><a href="#Pure-Randomness" data-toc-modified-id="Pure-Randomness-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Pure Randomness</a></span></li><li><span><a href="#Pole-Velocity-Error" data-toc-modified-id="Pole-Velocity-Error-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Pole Velocity Error</a></span></li></ul></li></ul></div>

# Car Pole Example

The main idea of this notebook is to interact with the `CartPole-v0` environment from [OpenAI Gym](https://gym.openai.com/) and try different algorithms on this problem.

In [2]:
#Loading the required libraries
import gym
import numpy as np
import xgboost
import itertools
import math
import pandas as pd
import io
import base64
import os

from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.model_selection import cross_val_score, StratifiedKFold
from gym import wrappers
from IPython.display import HTML

## Functions 

This section contains all the helper functions used in the notebook.

In [3]:
#Build an expand_grid (all combinations of the inputs) for cross-validation
def expand_grid(*itrs):
   product = list(itertools.product(*itrs))
   return pd.DataFrame({'Var{}'.format(i+1):[x[i] for x in product] for i in range(len(itrs))})
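
As a quick illustration of what `expand_grid` produces, the sketch below builds the same Cartesian product with `itertools.product` alone, without pandas. The names `max_depth` and `learning_rate` and their values are hypothetical, chosen only for the example; they are not settings used elsewhere in this notebook.

```python
import itertools

# Hypothetical hyperparameter candidates (illustrative only).
max_depth = [2, 4]
learning_rate = [0.1, 0.3, 0.5]

# Every combination of the inputs, in the order itertools.product yields them
# (the last input varies fastest).
grid = list(itertools.product(max_depth, learning_rate))

for depth, rate in grid:
    print(depth, rate)
```

Calling `expand_grid(max_depth, learning_rate)` on the same inputs should give a DataFrame with one row per tuple and the columns `Var1` and `Var2`.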

## Car Pole Example

This section covers the `CartPole-v0` example.

In [4]:
#My environment
game = 'Car Pole'
my_env = gym.make('CartPole-v0')
my_env = wrappers.Monitor(my_env, '../Gym Videos/' + game, force=True)

In [22]:
#Reset the game, the cart in this case
my_env.reset()

array([ 0.00537611,  0.0208571 , -0.04964598,  0.01920917])

### Pure Randomness

Here we run 10 episodes, picking an action (0 or 1) at random at every step, and average the number of steps per episode.

In [30]:
#Exercise: Determine how many steps, on average, are taken, when
#actions are randomly sampled
steps_array = []
for i in range(10):
    my_env.reset()
    done = False
    steps = 0
    while not done:
#         my_env.render()
        obs, rew, done, _ = my_env.step(action=my_env.action_space.sample())
        steps += 1
    steps_array.append(steps)
my_env.close()

print('AVERAGE OF STEPS:', np.mean(steps_array))
print('STD OF STEPS:', np.round(np.std(steps_array), 2))

AVERAGE OF STEPS: 21.6
STD OF STEPS: 8.88


### Pole Velocity Error

This section picks an action based on the error function $E = p^2$, where $p$ is the pole velocity. I define the error this way because we want the pole velocity to stay low. We run 1000 episodes and average the results.

In [37]:
#Define an error function using the pole velocity as the unique parameter to penalize the model
def error_function(obs):
    pos, vel, angle, pole_vel = obs
    error = (pole_vel**2)
    return error

In [9]:
#Exercise: Determine how many steps, on average, are taken when
#the action is flipped every time the error stops decreasing
steps_array = []
for i in range(1000):
    my_env.reset()
    action = my_env.action_space.sample()
    previous_error = -math.inf
    done = False
    steps = 0
    while not done:
#         my_env.render()
        obs, rew, done, _ = my_env.step(action=action)
        steps += 1
        new_error = error_function(obs)
        if not new_error < previous_error:
            #The error did not decrease: flip the action (0 <-> 1)
            action = abs(action-1)
        previous_error = new_error
            
    steps_array.append(steps)
# my_env.close()
print('AVERAGE OF STEPS:', np.round(np.mean(steps_array), 2))
print('STD OF STEPS:', np.round(np.std(steps_array), 2))

AVERAGE OF STEPS: 157.64
STD OF STEPS: 21.66
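
The switching rule used in the loop above can be isolated as a small pure function, which makes it easier to check by hand. This is only a minimal sketch: the name `next_action` and the error sequence are made up for illustration, and no environment is involved.

```python
import math

def next_action(action, new_error, previous_error):
    """Keep the current action if the error decreased, otherwise flip 0 <-> 1."""
    if new_error < previous_error:
        return action
    return 1 - action

# Synthetic sequence of errors (illustrative values, not taken from the environment).
errors = [0.04, 0.02, 0.03, 0.01]

action = 0
previous_error = -math.inf  # same initial value as in the notebook loop
history = []
for new_error in errors:
    action = next_action(action, new_error, previous_error)
    previous_error = new_error
    history.append(action)

print(history)  # [1, 1, 0, 0]
```

Because `previous_error` starts at `-math.inf`, the first comparison can never succeed, so the initial action is always flipped after the first step.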


With this error function, the average number of steps is clearly much greater than with the random policy (157.64 versus 21.6). Now, how can we improve on this result?

As the video shows, the cart drifts to one edge of the track and the episode ends. So my idea is to use the `position` in a new error function.

In [58]:
#Define an error function that penalizes the product of the cart position and the pole velocity
def error_function(obs):
    pos, vel, angle, pole_vel = obs
    error = ((pos*pole_vel)**2)
    return error
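
To see how the two error functions differ, the sketch below evaluates both on the observation returned by `my_env.reset()` earlier in the notebook. The names `error_pole_vel` and `error_pos_pole_vel` are introduced here only to keep the two definitions apart; they are not used elsewhere.

```python
# Observation printed by my_env.reset() earlier, unpacked the same way
# as in error_function: (pos, vel, angle, pole_vel)
obs = (0.00537611, 0.0208571, -0.04964598, 0.01920917)

def error_pole_vel(obs):
    """First error function: squared pole velocity."""
    pos, vel, angle, pole_vel = obs
    return pole_vel ** 2

def error_pos_pole_vel(obs):
    """Second error function: squared product of position and pole velocity."""
    pos, vel, angle, pole_vel = obs
    return (pos * pole_vel) ** 2

e1 = error_pole_vel(obs)
e2 = error_pos_pole_vel(obs)
print(e1, e2)
```

The second error equals the first one multiplied by `pos**2`, so it shrinks toward zero whenever the cart is near the center of the track, whatever the pole velocity is.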

In [63]:
#Exercise: Determine how many steps, on average, are taken when
#the action is flipped every time the new error stops decreasing
steps_array = []
for i in range(1000):
    my_env.reset()
    action = my_env.action_space.sample()
    previous_error = -math.inf
    done = False
    steps = 0
    while not done:
#         my_env.render()
        obs, rew, done, _ = my_env.step(action=action)
        steps += 1
        new_error = error_function(obs)
        if not new_error < previous_error:
            #The error did not decrease: flip the action (0 <-> 1)
            action = abs(action-1)
        previous_error = new_error
            
    steps_array.append(steps)
# my_env.close()
print('AVERAGE OF STEPS:', np.round(np.mean(steps_array), 2))
print('STD OF STEPS:', np.round(np.std(steps_array), 2))

AVERAGE OF STEPS: 162.86
STD OF STEPS: 24.92


In [7]:
#Show the videos inside the Jupyter cell
mp4_files = os.listdir('../Gym Videos/' + game)
mp4_files = [f for f in mp4_files if f.endswith('mp4')]

#PICK ONE OF THE VIDEOS
f = mp4_files[1]
#Example of video: read the file in binary mode and close it afterwards
with io.open('../Gym Videos/' + game + '/' + f, 'rb') as video_file:
    video = video_file.read()
encoded = base64.b64encode(video)
HTML(data='''
    <video width="360" height="auto" alt="test" controls><source src="data:video/mp4;base64,{0}" type="video/mp4" /></video>'''
.format(encoded.decode('ascii')))