### Details [1]

* Name: MountainCar
* Framework: OpenAI Gym
* [Webpage](https://gym.openai.com/envs/MountainCar-v0)

### Description

Get an underpowered car to the top of a hill (the top is at position 0.5). A car is on a one-dimensional track, positioned between two "mountains". The goal is to drive up the mountain on the right; however, the car's engine is not strong enough to scale the mountain in a single pass. Therefore, the only way to succeed is to drive back and forth to build up momentum.


## Source

Andrew Moore in his PhD thesis [Moore90].

### Environment

## Observation

Type: Box(2)

Num | Observation  | Min  | Max  
----|--------------|------|----   
0   | position     | -1.2 | 0.6
1   | velocity     | -0.07| 0.07


## Actions

Type: Discrete(3)

Num | Action
----|-----------
0   | push left
1   | no push
2   | push right

## Reward

-1 for each time step, until the goal position of 0.5 is reached.

## Starting State

A random position between -0.6 and -0.4, with zero velocity.

## Episode Termination

The episode ends when the car reaches position 0.5, or when the 200-step limit on episode length is reached.
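To make the reward, starting state and termination rules concrete, here is a minimal pure-Python sketch of the environment. The physical constants (`FORCE`, `GRAVITY`, `MAX_SPEED`) and the order of the updates are assumptions based on my reading of Gym's `MountainCarEnv`, not taken from this document, so check them against the installed Gym source. `reset`, `step` and `run_episode` are hypothetical simplified helpers, not Gym's `Env` API.

```python
import math
import random

# Assumed constants (mirroring Gym's MountainCarEnv as far as I know)
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POSITION = 0.5
FORCE, GRAVITY = 0.001, 0.0025
MAX_EPISODE_STEPS = 200  # default time limit per episode


def reset(rng):
    """Starting state: random position in [-0.6, -0.4] with zero velocity."""
    return rng.uniform(-0.6, -0.4), 0.0


def step(state, action):
    """Action 0 = push left, 1 = no push, 2 = push right. Returns (state, reward, done)."""
    position, velocity = state
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POSITION, min(MAX_POSITION, position + velocity))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the car stops dead against the left boundary
    # Reward is -1 on every step; the episode ends once the goal is reached
    return (position, velocity), -1.0, position >= GOAL_POSITION


def run_episode(policy, rng, max_steps=MAX_EPISODE_STEPS):
    """Return (total reward, reached goal) for one episode under `policy`."""
    state, total = reset(rng), 0.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state))
        total += reward
        if done:
            return total, True
    return total, False
```

For example, `run_episode(lambda s: 2, random.Random(1))` always pushes right and runs out of time with a return of -200.0, which is the "single pass" failure described above.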

## Solution Requirements

An average reward of at least -110.0 over 100 consecutive trials.
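The success criterion is a sliding-window average. A minimal sketch, assuming "over 100 consecutive trials" means any window of 100 consecutive episodes whose mean return is at least -110.0; `is_solved` is a hypothetical helper, not part of Gym.

```python
from collections import deque

SOLVED_THRESHOLD = -110.0
WINDOW = 100


def is_solved(episode_returns, threshold=SOLVED_THRESHOLD, window=WINDOW):
    """True if some run of `window` consecutive episodes averages >= threshold."""
    recent = deque(maxlen=window)
    total = 0.0
    for ret in episode_returns:
        if len(recent) == window:
            total -= recent[0]  # this return is about to leave the window
        recent.append(ret)
        total += ret
        if len(recent) == window and total / window >= threshold:
            return True
    return False
```

Keeping a running total makes the check linear in the number of episodes instead of re-summing every window.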

![Mountain Car](media/mountaincar.png)

## Solution Design

* Linear function approximation
* Incremental
* Model-free

### Feature Representation
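Linear in the parameters, with a separate set of parameters for each action. The features are built from the position $s$ and the velocity $v$:
\begin{align}
\vec{\phi_s} = [\, s,\ v,\ s v,\ s^2,\ v^2,\ s^2 v,\ s v^2 \,]^T
\end{align}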
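A minimal sketch of this feature map in Python, assuming `s` is the position and `v` the velocity from the observation table; `features` is a hypothetical name and need not match what `MountainCar.py` uses.

```python
import numpy as np


def features(s, v):
    """Polynomial features [s, v, s*v, s^2, v^2, s^2*v, s*v^2] of position s and velocity v."""
    return np.array([s, v, s * v, s**2, v**2, s**2 * v, s * v**2])
```

Because $|v| \le 0.07$, the terms containing $v^2$ never exceed about 0.006 in magnitude, so rescaling the velocity before building the features may be worth trying; this is a suggestion, not something the source reports.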

### State Action Value Estimate
\begin{align}
Q(s, a)' = \vec{\theta_t}^T \vec{\phi_s} = \sum_{i=1}^{n} \vec{\theta_t}(i)\, \vec{\phi_s}(i)
\end{align}
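A sketch of the value estimate, assuming one weight vector per action stored as the rows of a `(3, 7)` array `theta` (an assumed layout, not confirmed by the source). `q_value` and `greedy_action` are hypothetical helpers.

```python
import numpy as np

N_ACTIONS = 3   # 0: push left, 1: no push, 2: push right
N_FEATURES = 7


def features(s, v):
    """Polynomial features of position s and velocity v."""
    return np.array([s, v, s * v, s**2, v**2, s**2 * v, s * v**2])


def q_value(theta, s, v, a):
    """Q(s, a)' = theta_a^T phi_s: a dot product with the weights of action a."""
    return float(theta[a] @ features(s, v))


def greedy_action(theta, s, v):
    """Action with the largest estimated value (ties go to the lowest index)."""
    return int(np.argmax(theta @ features(s, v)))
```

Computing `theta @ features(s, v)` evaluates all three actions in one matrix-vector product, which is convenient for the max over actions.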

### Weight update quantity (gradient)
\begin{align}
\nabla_{\vec{\theta_t}} Q(s, a)' = \vec{\phi_s}
\end{align}
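The source does not say which TD target `MountainCar.py` uses. This sketch assumes a semi-gradient Q-learning target, $r + \gamma \max_{a'} Q(s', a')'$, with a step size `alpha` and an undiscounted default `gamma=1.0`; `td_update` is a hypothetical helper. Since the gradient of $Q(s, a)'$ is just $\vec{\phi_s}$, only the weights of the action taken change.

```python
import numpy as np


def features(s, v):
    """Polynomial features of position s and velocity v."""
    return np.array([s, v, s * v, s**2, v**2, s**2 * v, s * v**2])


def td_update(theta, state, action, reward, next_state, done, alpha=0.01, gamma=1.0):
    """One semi-gradient Q-learning step on theta[action], in place; returns the TD error."""
    phi = features(*state)
    q_sa = theta[action] @ phi
    if done:
        target = reward  # no bootstrapping from a terminal state
    else:
        target = reward + gamma * np.max(theta @ features(*next_state))
    td_error = target - q_sa
    # Gradient of the linear estimate w.r.t. theta[action] is phi
    theta[action] += alpha * td_error * phi
    return td_error
```

Swapping the `max` for the value of the next action actually chosen would give SARSA instead; either fits the "incremental, model-free" design described above.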

The solution design is completely model-free, and the same set of functions can be reused for other environments as well. The design assumed no prior knowledge of how the rewards and observations would be presented or what they meant.

The code for the program is in the `MountainCar.py` file in the same directory as this notebook. Below are some videos recorded during the run.

```python
import io
import base64
from IPython.display import HTML
```

```python
filename = 'media/beginning.mp4'

# Read the video (read-only) and embed it inline as a base64 data URI
with io.open(filename, 'rb') as f:
    encoded = base64.b64encode(f.read())
HTML(data='''<video alt="Beginning" controls>
           <source src="data:video/mp4;base64,{0}" type="video/mp4" />
           </video>'''.format(encoded.decode('ascii')))
```

```python
filename = 'media/dramatic.mp4'

# Read the video (read-only) and embed it inline as a base64 data URI
with io.open(filename, 'rb') as f:
    encoded = base64.b64encode(f.read())
HTML(data='''<video alt="Dramatic" controls>
           <source src="data:video/mp4;base64,{0}" type="video/mp4" />
           </video>'''.format(encoded.decode('ascii')))
```

```python
filename = 'media/first end.mp4'

# Read the video (read-only) and embed it inline as a base64 data URI
with io.open(filename, 'rb') as f:
    encoded = base64.b64encode(f.read())
HTML(data='''<video alt="First End" controls>
           <source src="data:video/mp4;base64,{0}" type="video/mp4" />
           </video>'''.format(encoded.decode('ascii')))
```

The entire first run took about 23 minutes to complete. Without rendering and recording the log, it takes less than a minute; the processes and computations are deliberately slowed down so that the videos are watchable by a human.

```python
filename = 'media/last end.mp4'

# Read the video (read-only) and embed it inline as a base64 data URI
with io.open(filename, 'rb') as f:
    encoded = base64.b64encode(f.read())
HTML(data='''<video alt="Last End" controls>
           <source src="data:video/mp4;base64,{0}" type="video/mp4" />
           </video>'''.format(encoded.decode('ascii')))
```

The later run ends far less dramatically, which suggests the agent has learned something.

## Challenges
The number of steps allowed per episode is only 200. Sample runs on the environment showed that a naive stochastic search can take thousands, or even hundreds of thousands, of steps in an episode to reach the goal state, so a limit of 200 is far too small for it. The restriction is deliberate: OpenAI wants to put forward challenging problems for the world to solve, and one of the problems here is how to be smart about exploration.

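The gap between naive search and momentum-aware behaviour can be reproduced with a few lines. This sketch reuses the assumed Gym-style dynamics (constants are my reading of Gym's `MountainCarEnv`, not from this document) and compares a uniformly random policy with a hand-coded heuristic that pushes in the direction the car is already moving. The heuristic is only an illustration of "building up momentum"; it is not the learned agent, and `steps_to_goal` and both policies are hypothetical helpers.

```python
import math
import random


def step(position, velocity, action):
    """One step of the assumed dynamics; action 0/1/2 = push left/none/right."""
    velocity += (action - 1) * 0.001 - math.cos(3 * position) * 0.0025
    velocity = max(-0.07, min(0.07, velocity))
    position = max(-1.2, min(0.6, position + velocity))
    if position == -1.2 and velocity < 0:
        velocity = 0.0  # stop dead at the left boundary
    return position, velocity


def steps_to_goal(policy, position=-0.5, max_steps=100_000):
    """Steps until position >= 0.5, or None if the goal is not reached in max_steps."""
    velocity = 0.0
    for t in range(1, max_steps + 1):
        position, velocity = step(position, velocity, policy(position, velocity))
        if position >= 0.5:
            return t
    return None


rng = random.Random(0)


def random_policy(position, velocity):
    """Naive stochastic search: a uniformly random action on every step."""
    return rng.randrange(3)


def momentum_policy(position, velocity):
    """Heuristic: push along the current velocity to pump energy into the swing."""
    return 2 if velocity >= 0 else 0


print("random:  ", steps_to_goal(random_policy))
print("momentum:", steps_to_goal(momentum_policy))
```

Note that `max_steps` here is deliberately far above 200, mirroring the lifted step limit described in the Solution section.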
## Solution
I had to manually edit the OpenAI Gym source code to allow my agent to take many more than 200 steps per episode. However, OpenAI discourages uploading results from such modified environments, as there is no novelty in such implementations.

### Room for improvement on the horizon
* Try coarse coding for the features.
* Try tile coding for the features.
* Use radial basis functions (RBFs) for feature encoding.
* Use a neural network, or even a deep neural network, as the function approximator.
* Use planning, as in the Dyna architecture, for faster convergence.
* Use batch gradient descent with experience replay.
* Use fixed Q-targets.
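As an illustration of the tile-coding idea from the list above, here is a minimal sketch that replaces the polynomial features with several offset grids over (position, velocity). The number of tilings, tiles per dimension, and the asymmetric offset pattern (stride 1 for position, 3 for velocity) are assumptions chosen for the example; `active_tiles` and `tile_features` are hypothetical helpers. The linear estimate $Q(s, a)' = \vec{\theta_t}^T \vec{\phi_s}$ and its gradient $\vec{\phi_s}$ carry over unchanged, with $\vec{\phi_s}$ now a sparse binary vector.

```python
import numpy as np

# Observation bounds from the table above
POS_MIN, POS_MAX = -1.2, 0.6
VEL_MIN, VEL_MAX = -0.07, 0.07


def active_tiles(position, velocity, n_tilings=8, tiles_per_dim=8):
    """Index of the single active tile in each of `n_tilings` offset grids."""
    # Scale both state variables to [0, tiles_per_dim]
    p = (position - POS_MIN) / (POS_MAX - POS_MIN) * tiles_per_dim
    v = (velocity - VEL_MIN) / (VEL_MAX - VEL_MIN) * tiles_per_dim
    side = tiles_per_dim + 1  # one extra tile per dimension absorbs the offsets
    tiles = []
    for t in range(n_tilings):
        # Shift each tiling by a different fraction of a tile width;
        # the velocity shift uses stride 3 so tilings are not all on the diagonal
        i = int(p + t / n_tilings)
        j = int(v + (3 * t % n_tilings) / n_tilings)
        tiles.append(t * side * side + i * side + j)
    return tiles


def tile_features(position, velocity, n_tilings=8, tiles_per_dim=8):
    """Binary feature vector with exactly `n_tilings` ones."""
    phi = np.zeros(n_tilings * (tiles_per_dim + 1) ** 2)
    phi[active_tiles(position, velocity, n_tilings, tiles_per_dim)] = 1.0
    return phi
```

Nearby states share most of their active tiles, so an update at one state generalizes locally, while distant states share none; this is the main advantage over global polynomial features.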

### References:
* https://github.com/openai/gym/wiki/MountainCar-v0
* https://gym.openai.com/envs/MountainCar-v0