<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Functions" data-toc-modified-id="Functions-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Functions</a></span></li><li><span><a href="#Car-Pole-Example" data-toc-modified-id="Car-Pole-Example-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Mountain Car Example</a></span><ul class="toc-item"><li><span><a href="#Pure-Randomness" data-toc-modified-id="Pure-Randomness-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Pure Randomness</a></span></li><li><span><a href="#Intelligent-Symstem" data-toc-modified-id="Intelligent-Symstem-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Intelligent System</a></span></li><li><span><a href="#Random-Search" data-toc-modified-id="Random-Search-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Random Search</a></span></li></ul></li></ul></div>

# Car Mountain Example

This notebook interacts with the `MountainCar-v0` environment from [OpenAI Gym](https://gym.openai.com/) and tries several algorithms on it.

In [1]:
# Load the required libraries
import gym
import numpy as np
import xgboost
import itertools
import math
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.model_selection import cross_val_score, StratifiedKFold
from random import randint

## Functions

This section contains all the functions that I will use in the notebook.

In [2]:
# Build an expand_grid (every combination of the given values) for cross-validation
def expand_grid(*itrs):
    product = list(itertools.product(*itrs))
    return pd.DataFrame({'Var{}'.format(i+1): [x[i] for x in product] for i in range(len(itrs))})

## Mountain Car Example

This section covers the `MountainCar-v0` environment.

In [3]:
# My environment
game = 'Car Mountain'
my_env = gym.make('MountainCar-v0')

In [4]:
#Reset the game, the car in this case
obs = my_env.reset()

In [5]:
# See the action space: 3 actions (0 = push left, 1 = no push, 2 = push right)
my_env.action_space

Discrete(3)

### Pure Randomness

Here we run 1000 episodes, sampling an action uniformly from {0, 1, 2} at every step, and average the episode length. As the output shows, random actions never won the game in these runs: every episode hit the maximum of 200 steps.

In [10]:
#Exercise: Determine how many steps, on average, are taken, when
#actions are randomly sampled
steps_array = []
for i in range(1000):
    my_env.reset()
    done = False
    steps = 0
    while not done:
        my_env.render()
        obs, rew, done, _ = my_env.step(action=my_env.action_space.sample())
        steps += 1
    steps_array.append(steps)
my_env.close()
print('AVERAGE OF STEPS:', np.mean(steps_array))
print('STD OF STEPS:', np.round(np.std(steps_array), 2))

AVERAGE OF STEPS: 200.0
STD OF STEPS: 0.0


### Intelligent System

This section uses a simple hand-written rule to decide which action to take:

1. If the car has positive velocity and previously had positive velocity: push it to the right.
2. If the car has positive velocity and previously had negative velocity: push it to the right.
3. If the car has negative velocity and previously had positive velocity: push it to the left.
4. If the car has negative velocity and previously had negative velocity: push it to the left.

Pushing in the direction the car is already moving increases its `momentum`.

To minimize the number of steps, the `init_action` is chosen as a function of the initial position, which takes values in $[-0.6,-0.4]$.

If `init_pos` > -0.475, first push the car to the left (let it roll back down). Otherwise, first push it to the right. After that, the rule above takes over.
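The rule can also be tried without Gym. The sketch below re-implements the mountain-car update in plain Python. The constants (force 0.001, gravity 0.0025, speed limit 0.07, positions in [-1.2, 0.6], goal at 0.5) are my understanding of the `MountainCar-v0` dynamics and should be read as assumptions. The helper names `step`, `momentum_action` and `run_episode` are illustrative and not part of this notebook.

```python
import math

# Assumed MountainCar-v0 constants (not taken from this notebook).
FORCE, GRAVITY = 0.001, 0.0025
MAX_SPEED = 0.07
MIN_POS, MAX_POS, GOAL_POS = -1.2, 0.6, 0.5
MAX_STEPS = 200


def step(pos, vel, action):
    """Advance the car one step; action is 0 (left), 1 (no push) or 2 (right)."""
    vel += (action - 1) * FORCE - GRAVITY * math.cos(3 * pos)
    vel = max(-MAX_SPEED, min(MAX_SPEED, vel))
    pos = max(MIN_POS, min(MAX_POS, pos + vel))
    if pos == MIN_POS and vel < 0:
        vel = 0.0  # the car stops when it hits the left wall
    return pos, vel


def momentum_action(vel):
    """Push in the direction the car is already moving."""
    return 2 if vel >= 0 else 0


def run_episode(init_pos, policy):
    """Return the number of steps until the goal is reached (capped at MAX_STEPS)."""
    pos, vel = init_pos, 0.0
    for n in range(1, MAX_STEPS + 1):
        pos, vel = step(pos, vel, policy(vel))
        if pos >= GOAL_POS:
            return n
    return MAX_STEPS


steps_momentum = run_episode(-0.5, momentum_action)
steps_idle = run_episode(-0.5, lambda vel: 1)  # never push
print('momentum rule:', steps_momentum, 'steps; no push:', steps_idle, 'steps')
```

Under these assumed dynamics the velocity rule should finish well inside the 200-step limit, while a car that never pushes keeps rocking near the valley floor until the episode times out.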

In [9]:
# Choose the next action from the signs of the current and previous velocity
def assing_action(obs, pre_obs, pre_action):
    pos, vel = obs
    pre_pos, pre_vel = pre_obs
    if (pre_vel > 0) and (vel <= 0):
        return 0
    elif (pre_vel > 0) and (vel > 0):
        return 2
    elif (pre_vel <= 0) and (vel >= 0):
        return 2
    else:
        return 0

In [10]:
# Average number of steps per episode with the velocity rule,
# compared across several choices of the initial action
steps_array = []
for i in range(1000):
    obs = my_env.reset()
    init_action = 2
    if obs[0] > -0.475:
        init_action = 0
    action = init_action
    done = False
    steps = 0
    while not done:
#         my_env.render()
        new_obs, rew, done, _ = my_env.step(action=action)
        steps += 1
        action = assing_action(obs=new_obs, pre_obs=obs, pre_action=action)
        obs = new_obs  # keep this observation as the previous one for the next step
    steps_array.append(steps)
# my_env.close()
print('AVERAGE OF STEPS:', np.round(np.mean(steps_array), 2), 'WITH INIT ACTION CUSTOMIZED')
print('STD OF STEPS:', np.round(np.std(steps_array), 2), 'WITH INIT ACTION CUSTOMIZED')
print('MINIMUM OF STEPS:', np.round(np.min(steps_array), 2), 'WITH INIT ACTION CUSTOMIZED')
print('MAXIMUM OF STEPS:', np.round(np.max(steps_array), 2), 'WITH INIT ACTION CUSTOMIZED')

print('\n')

steps_array = []
for i in range(1000):
    obs = my_env.reset()
    init_action = 1
    action = init_action
    done = False
    steps = 0
    while not done:
#         my_env.render()
        new_obs, rew, done, _ = my_env.step(action=action)
        steps += 1
        action = assing_action(obs=new_obs, pre_obs=obs, pre_action=action)
        obs = new_obs  # keep this observation as the previous one for the next step
    steps_array.append(steps)
# my_env.close()
print('AVERAGE OF STEPS:', np.round(np.mean(steps_array), 2), 'WITH INIT ACTION:', init_action)
print('STD OF STEPS:', np.round(np.std(steps_array), 2), 'WITH INIT ACTION:', init_action)
print('MINIMUM OF STEPS:', np.round(np.min(steps_array), 2), 'WITH INIT ACTION:', init_action)
print('MAXIMUM OF STEPS:', np.round(np.max(steps_array), 2), 'WITH INIT ACTION:', init_action)

print('\n')

steps_array = []
for i in range(1000):
    obs = my_env.reset()
    init_action = 2
    action = init_action
    done = False
    steps = 0
    while not done:
#         my_env.render()
        new_obs, rew, done, _ = my_env.step(action=action)
        steps += 1
        action = assing_action(obs=new_obs, pre_obs=obs, pre_action=action)
        obs = new_obs  # keep this observation as the previous one for the next step
    steps_array.append(steps)
# my_env.close()
print('AVERAGE OF STEPS:', np.round(np.mean(steps_array), 2), 'WITH INIT ACTION:', init_action)
print('STD OF STEPS:', np.round(np.std(steps_array), 2), 'WITH INIT ACTION:', init_action)
print('MINIMUM OF STEPS:', np.round(np.min(steps_array), 2), 'WITH INIT ACTION:', init_action)
print('MAXIMUM OF STEPS:', np.round(np.max(steps_array), 2), 'WITH INIT ACTION:', init_action)

print('\n')


steps_array = []
for i in range(1000):
    obs = my_env.reset()
    init_action = 0
    action = init_action
    done = False
    steps = 0
    while not done:
#         my_env.render()
        new_obs, rew, done, _ = my_env.step(action=action)
        steps += 1
        action = assing_action(obs=new_obs, pre_obs=obs, pre_action=action)
        obs = new_obs  # keep this observation as the previous one for the next step
    steps_array.append(steps)
# my_env.close()
print('AVERAGE OF STEPS:', np.round(np.mean(steps_array), 2), 'WITH INIT ACTION:', init_action)
print('STD OF STEPS:', np.round(np.std(steps_array), 2), 'WITH INIT ACTION:', init_action)
print('MINIMUM OF STEPS:', np.round(np.min(steps_array), 2), 'WITH INIT ACTION:', init_action)
print('MAXIMUM OF STEPS:', np.round(np.max(steps_array), 2), 'WITH INIT ACTION:', init_action)

AVERAGE OF STEPS: 107.88 WITH INIT ACTION CUSTOMIZED
STD OF STEPS: 13.81 WITH INIT ACTION CUSTOMIZED
MINIMUM OF STEPS: 86 WITH INIT ACTION CUSTOMIZED
MAXIMUM OF STEPS: 125 WITH INIT ACTION CUSTOMIZED


AVERAGE OF STEPS: 115.93 WITH INIT ACTION: 1
STD OF STEPS: 26.3 WITH INIT ACTION: 1
MINIMUM OF STEPS: 86 WITH INIT ACTION: 1
MAXIMUM OF STEPS: 189 WITH INIT ACTION: 1


AVERAGE OF STEPS: 119.3 WITH INIT ACTION: 2
STD OF STEPS: 3.72 WITH INIT ACTION: 2
MINIMUM OF STEPS: 113 WITH INIT ACTION: 2
MAXIMUM OF STEPS: 125 WITH INIT ACTION: 2


AVERAGE OF STEPS: 130.67 WITH INIT ACTION: 0
STD OF STEPS: 32.57 WITH INIT ACTION: 0
MINIMUM OF STEPS: 86 WITH INIT ACTION: 0
MAXIMUM OF STEPS: 192 WITH INIT ACTION: 0


Watching the rendered episodes made it clear that there is no need to climb all the way to the left corner. We now also use the `position` parameter: once the car is far enough to the left (position below -0.85), we push it to the right even if it is still moving left.
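To illustrate the effect of the position cutoff without Gym, the sketch below runs the plain velocity rule and the cutoff variant on a plain-Python copy of the mountain-car update. The dynamics constants are my assumption about `MountainCar-v0`, the function names `step`, `rule` and `run` are hypothetical, and `rule` is a simplified form that only looks at the sign of the current velocity.

```python
import math


def step(pos, vel, action):
    """One update of the assumed MountainCar-v0 dynamics."""
    vel += (action - 1) * 0.001 - 0.0025 * math.cos(3 * pos)
    vel = max(-0.07, min(0.07, vel))
    pos = max(-1.2, min(0.6, pos + vel))
    if pos == -1.2 and vel < 0:
        vel = 0.0  # the car stops at the left wall
    return pos, vel


def rule(pos, vel, cutoff=None):
    """Velocity rule; with a cutoff, push right once the car is left of it."""
    if vel >= 0:
        return 2
    if cutoff is not None and pos < cutoff:
        return 2
    return 0


def run(init_pos, cutoff=None, max_steps=200):
    """Return (steps taken, leftmost position visited)."""
    pos, vel = init_pos, 0.0
    action = 0 if init_pos > -0.475 else 2  # same initial action as the notebook
    leftmost = pos
    for n in range(1, max_steps + 1):
        pos, vel = step(pos, vel, action)
        leftmost = min(leftmost, pos)
        if pos >= 0.5:
            return n, leftmost
        action = rule(pos, vel, cutoff)
    return max_steps, leftmost


results = {}
for start in (-0.58, -0.5, -0.42):
    results[start] = (run(start), run(start, cutoff=-0.85))
    (n_plain, left_plain), (n_cut, left_cut) = results[start]
    print(f'start {start:+.2f}: plain {n_plain} steps (leftmost {left_plain:.2f}), '
          f'cutoff {n_cut} steps (leftmost {left_cut:.2f})')
```

Comparing the leftmost position of the two variants shows how much of the left slope the cutoff skips. The step counts here come from the assumed dynamics, so they need not match the Gym results reported in this notebook exactly.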

In [11]:
# Same velocity rule, plus a position cutoff: push right once the car is left of -0.85
def assing_action_alt(obs, pre_obs, pre_action):
    pos, vel = obs
    pre_pos, pre_vel = pre_obs
    if (pre_vel > 0) and (vel <= 0):
        return 0
    elif (pre_vel > 0) and (vel > 0):
        return 2
    elif (pre_vel <= 0) and (vel >= 0):
        return 2
    else:
        if pos < -0.85:
            return 2
        return 0

In [12]:
# Average number of steps per episode with the velocity rule
# plus the position cutoff
steps_array = []
for i in range(1000):
    obs = my_env.reset()
    init_action = 2
    if obs[0] > -0.475:
        init_action = 0
    action = init_action
    done = False
    steps = 0
    while not done:
#         my_env.render()
        new_obs, rew, done, _ = my_env.step(action=action)
        steps += 1
        action = assing_action_alt(obs=new_obs, pre_obs=obs, pre_action=action)
        obs = new_obs  # keep this observation as the previous one for the next step
    steps_array.append(steps)
# my_env.close()
print('AVERAGE OF STEPS:', np.round(np.mean(steps_array), 2), 'WITH INIT ACTION CUSTOMIZED')
print('STD OF STEPS:', np.round(np.std(steps_array), 2), 'WITH INIT ACTION CUSTOMIZED')
print('MINIMUM OF STEPS:', np.round(np.min(steps_array), 2), 'WITH INIT ACTION CUSTOMIZED')
print('MAXIMUM OF STEPS:', np.round(np.max(steps_array), 2), 'WITH INIT ACTION CUSTOMIZED')

AVERAGE OF STEPS: 106.79 WITH INIT ACTION CUSTOMIZED
STD OF STEPS: 11.03 WITH INIT ACTION CUSTOMIZED
MINIMUM OF STEPS: 85 WITH INIT ACTION CUSTOMIZED
MAXIMUM OF STEPS: 116 WITH INIT ACTION CUSTOMIZED


### Random Search

Now we implement a random search over the weights of a linear model of the car's state, where $c_v$ is the car's velocity and $c_p$ its position, i.e.

$$a = \sigma(w_0 + w_1c_v + w_2c_p)$$

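In this example there is no bias term, i.e. $w_0=0$, and $\sigma$ is a hard threshold: the car is pushed to the right (action 2) when the weighted sum is positive and to the left (action 0) otherwise.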

In [7]:
def assign_action(obs, weights):
    y = np.dot(obs, weights)
    if y > 0:
        return 2
    return 0

In [8]:
#Vectors and values relevant for final analysis
steps_best_sample = None
steps_best_avg = math.inf
step_avgs = []
best_weights = None

#init my_env
obs = my_env.reset()

for r_i in range(1000):
    
    #Init random weights between [-1,1] for each one of the parameters of the models
    random_weights = np.random.random(len(obs))*2 - 1 

    steps_reps = []
    for i in range(100):
        obs = my_env.reset()
        done = False
        steps = 0
        while not done:
            steps += 1
            action = assign_action(obs, weights=random_weights)
            obs, rew, done, _ = my_env.step(action=action)
        steps_reps.append(steps)
    
    step_avgs.append(np.mean(steps_reps))
    if np.mean(steps_reps) < steps_best_avg:
        steps_best_avg = np.mean(steps_reps)
        steps_best_sample = steps_reps
        best_weights = random_weights
    
    if (r_i+1) % 50 == 0:
        print('ITERATION NUMBER', r_i+1)

ITERATION NUMBER 50
ITERATION NUMBER 100
ITERATION NUMBER 150
ITERATION NUMBER 200
ITERATION NUMBER 250
ITERATION NUMBER 300
ITERATION NUMBER 350
ITERATION NUMBER 400
ITERATION NUMBER 450
ITERATION NUMBER 500
ITERATION NUMBER 550
ITERATION NUMBER 600
ITERATION NUMBER 650
ITERATION NUMBER 700
ITERATION NUMBER 750
ITERATION NUMBER 800
ITERATION NUMBER 850
ITERATION NUMBER 900
ITERATION NUMBER 950
ITERATION NUMBER 1000


In [9]:
print('BEST AVERAGE', steps_best_avg, 'WITH WEIGHTS', best_weights)

BEST AVERAGE 137.79 WITH WEIGHTS [0.0041712  0.94332336]


Wow! Random search over the weights of a linear model does not work very well, at least not as well as our intelligent system.
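One way to read the best weight vector is to look at where its decision flips. Assuming the observation is ordered (position, velocity), so that the first weight multiplies the position, the policy pushes right when `w_pos * pos + w_vel * vel > 0`. The sketch below plugs in the weights printed above. `linear_action` and `velocity_threshold` are illustrative names, not part of this notebook.

```python
# Best weights reported above, assumed to be ordered (position, velocity).
w_pos, w_vel = 0.0041712, 0.94332336


def linear_action(pos, vel):
    """Push right (2) if the weighted sum is positive, otherwise left (0)."""
    return 2 if w_pos * pos + w_vel * vel > 0 else 0


def velocity_threshold(pos):
    """Velocity above which the policy pushes right (valid because w_vel > 0)."""
    return -w_pos * pos / w_vel


for pos in (-1.0, -0.5, 0.0, 0.4):
    print(f'pos {pos:+.1f}: pushes right when vel > {velocity_threshold(pos):+.5f}')

print(linear_action(-0.5, 0.01))    # clearly moving right
print(linear_action(-0.5, -0.01))   # clearly moving left
print(linear_action(-0.5, 0.001))   # moving right, but still below the threshold
```

Under that assumption the policy is close to the velocity rule, except that at negative positions it keeps pushing left until the car is already moving right at a small positive speed. In particular it pushes left from the standstill start, which may be part of the reason it needs more steps than the hand-written rule.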