<a href="https://colab.research.google.com/github/nunival/462-Computer-Vision/blob/main/Building_an_AI_game_bot_in_OpenAi_Gym.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Building an AI Game Bot
***
In this notebook I attempt to solve the MountainCar game in OpenAI Gym.<br>
<i>
"A car is on a one-dimensional track, positioned between two "mountains". The goal is to drive up the mountain on the right; however, the car's engine is not strong enough to scale the mountain in a single pass. Therefore, the only way to succeed is to drive back and forth to build up momentum."
</i>
<br>[Source](https://gym.openai.com/envs/MountainCar-v0/)

My approach for solving this is to train a TensorFlow model based on data from game play. This model will tell the car what to do based on the observations.

The data for this model will be generated by running 10,000 games with random moves. Based on the score of the game we can train the model to see which are good or bad moves.


**Resources:** 
* [Link to details on MountainCar game](https://github.com/openai/gym/wiki/MountainCar-v0)
* [This blog post](https://blog.tanka.la/2018/10/19/build-your-first-ai-game-bot-using-openai-gym-keras-tensorflow-in-python/) uses this approach to solve the CartPole game. I attempt the same approach on MountainCar.

# Install dependencies

In [None]:
#remove " > /dev/null 2>&1" to see what is going on under the hood
!pip install gym pyvirtualdisplay > /dev/null 2>&1
!apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1

In [None]:
!apt-get update > /dev/null 2>&1
!apt-get install cmake > /dev/null 2>&1
!pip install --upgrade setuptools 2>&1
!pip install ez_setup > /dev/null 2>&1
!pip install gym[atari] > /dev/null 2>&1

# Imports and Helper functions


In [None]:
import gym
from gym import logger as gymlogger
from gym.wrappers import Monitor
gymlogger.set_level(40) #error only
import tensorflow as tf
import numpy as np
import random
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import math
import glob
import io
import base64
from IPython.display import HTML

from IPython import display as ipythondisplay

In [None]:
from pyvirtualdisplay import Display
display = Display(visible=0, size=(1400, 900))
display.start()

In [None]:
"""
Utility functions to enable video recording of gym environment and displaying it
To enable video, just do "env = wrap_env(env)"
"""

def show_video():
  mp4list = glob.glob('video/*.mp4')
  if len(mp4list) > 0:
    mp4 = mp4list[0]
    video = open(mp4, 'rb').read()  # read-only binary mode is enough here
    encoded = base64.b64encode(video)
    ipythondisplay.display(HTML(data='''<video alt="test" autoplay 
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
  else: 
    print("Could not find video")
    

def wrap_env(env):
  env = Monitor(env, './video', force=True)
  return env

# Mountain Car

In [None]:
env = wrap_env(gym.make("MountainCar-v0"))

In [None]:
#check out the action space!
print(env.action_space)

There are 3 actions the computer can take:<br>
0 - Go left<br>
1 - Do nothing<br>
2 - Go right
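
To make the effect of each action concrete, here is a minimal sketch of a single step of the car's physics. The constants and the update rule are written from my understanding of the Gym source for MountainCar-v0, so treat them as assumptions rather than something this notebook verifies. The function name `mountain_car_step` is hypothetical and is not part of Gym.

```python
import math

# Constants as I understand them from Gym's MountainCar-v0 source (assumptions).
FORCE = 0.001        # push added by the engine per step
GRAVITY = 0.0025     # scale of the slope's pull
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07


def mountain_car_step(position, velocity, action):
    """Return (position, velocity) after one step; action is 0, 1 or 2."""
    # action - 1 maps 0/1/2 to -1/0/+1: push left, no push, push right.
    velocity += (action - 1) * FORCE - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POSITION, min(MAX_POSITION, position + velocity))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0   # the car stops dead at the left wall
    return position, velocity


# Car at rest part-way up the right-hand slope: try all three actions.
for action in (0, 1, 2):
    print(action, mountain_car_step(-0.3, 0.0, action))
```

Under these assumed constants, the velocity comes out negative for all three actions at this position, even when pushing right. That is the "engine is not strong enough" behaviour described in the quote at the top.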

## Understanding the data

Viewing a game where random actions are taken

In [None]:
env = wrap_env(gym.make("MountainCar-v0"))
observation = env.reset()

while True:
  
    env.render()
    
    #your agent goes here
    action = env.action_space.sample()
         
    observation, reward, done, info = env.step(action) 
        
    if done:
        break
            
env.close()
show_video()

### Viewing the data that the game produces
We see that each observation has two data points: position and velocity. We can also see the reward and which action was taken (go left, do nothing, or go right).

In [None]:
observation = env.reset()

while True:
  
    env.render()
    
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    #print("Step {}:".format(step_index))
    print("action: {}".format(action))
    print("reward: {}".format(reward))
    print("done: {}".format(done))
    print("info: {}".format(info))
    print("observation: {}".format(observation))
    print("***********************************************************")
        
    if done:
        break
            
env.close()
show_video()

## Creating Data for a Model

In [None]:
import gym
import random
import numpy as np
from keras.models     import Sequential
from keras.layers     import Dense
from keras.optimizers import Adam

To train a model we need data. We'll run 10,000 games (`initial_games`) with random moves, save the results, and use them to teach a model which actions are good and which aren't.

In [None]:
env.reset()
goal_steps = 200            # Game ends after 200 moves

# As described in the MountainCar documentation linked at the top, every move
# costs one point. A score of -200 means the car never reached the goal, so
# we only keep games that score better than that.
score_requirement = -199    

# We're going to run 10,000 games
initial_games = 10000

In [None]:
def model_data_preparation():
    training_data = []
    accepted_scores = []
    for game_index in range(initial_games):
        score = 0
        game_memory = []
        previous_observation = []
        for step_index in range(goal_steps):
            action = random.randrange(0, 3) #Pick a random move: 0, 1, 2
            observation, reward, done, info = env.step(action)
            
            if len(previous_observation) > 0:
                game_memory.append([previous_observation, action])
                
            previous_observation = observation
            score += reward
            if done:
                break
            
        if score >= score_requirement:
            accepted_scores.append(score)
            for data in game_memory:
                if data[1] == 1:        # One hot encoding the move
                    output = [0, 1, 0]
                elif data[1] == 0:
                    output = [1, 0, 0]
                elif data[1] == 2:
                    output = [0, 0, 1]
                training_data.append([data[0], output])
        if game_index % 100 == 0:             # allows us to monitor how many games have progressed
          print("Games Completed: ", game_index)
        env.reset()

    print(accepted_scores)
    
    return training_data

In [None]:
training_data = model_data_preparation()

In [None]:
len(training_data)

So we have a problem here. Not a single one of our games wins with random moves. Every game scores -200: since none of them wins, each one uses all 200 moves and loses 1 point per move.

With every game scoring the same, we can't teach a model which choices to make; they all look equally bad. No game passes the `score_requirement` filter either, so `training_data` is empty and the training cells below have nothing to fit.

This code will need to be tweaked to use a different measure of success. Perhaps I can record the position and use the highest position reached in each game as the indicator. That would identify which games performed better than others, even if none of them won.
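
The idea of ranking games by the highest position reached could be sketched as follows. This is only an illustration under my own assumptions: the helper names `max_position` and `select_best_games` are hypothetical, "highest position" is read as the largest x coordinate (the goal is on the right), and each game is stored as a list of `[observation, action]` pairs like `game_memory` above.

```python
def max_position(game_memory):
    """Largest position seen in one game; each entry is [observation, action]."""
    return max(observation[0] for observation, _action in game_memory)


def select_best_games(games, keep_fraction=0.1):
    """Keep the fraction of games whose car got furthest right (hypothetical helper)."""
    games = [game for game in games if game]          # ignore empty games
    ranked = sorted(games, key=max_position, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction)) if ranked else 0
    return ranked[:keep]


# Made-up data: three short games of [(position, velocity), action] pairs.
games = [
    [[(-0.50, 0.00), 2], [(-0.45, 0.01), 2]],
    [[(-0.50, 0.00), 0], [(-0.20, 0.03), 2]],
    [[(-0.50, 0.00), 1], [(-0.49, 0.00), 1]],
]
best = select_best_games(games, keep_fraction=0.34)
print(len(best), max_position(best[0]))
```

Because the ranking is relative, some games are always kept, even when none of them reaches the goal. The cut-off fraction is a free choice and would need tuning.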

In [None]:
# Building a sequential model

def build_model(input_size, output_size):
    model = Sequential()
    model.add(Dense(128, input_dim=input_size, activation='relu'))
    model.add(Dense(52, activation='relu'))
    model.add(Dense(output_size, activation='linear'))
    model.compile(loss='mse', optimizer=Adam())
    return model

In [None]:
# Training the model 

def train_model(training_data):
    X = np.array([i[0] for i in training_data]).reshape(-1, len(training_data[0][0]))
    y = np.array([i[1] for i in training_data]).reshape(-1, len(training_data[0][1]))
    model = build_model(input_size=len(X[0]), output_size=len(y[0]))
    
    model.fit(X, y, epochs=10)
    return model

In [None]:
trained_model = train_model(training_data)

Below we run 100 games, using the model to decide which action to take at each step, and aggregate the results at the end.

In [None]:
scores = []
choices = []
for each_game in range(100):
    score = 0
    prev_obs = []
    for step_index in range(goal_steps):
        env.render()
        if len(prev_obs) == 0:
            action = random.randrange(0, 3)
        else:
            action = np.argmax(trained_model.predict(prev_obs.reshape(-1, len(prev_obs)))[0])

        choices.append(action)
        new_observation, reward, done, info = env.step(action)
        prev_obs = new_observation
        score += reward
        if done:
            break

    env.reset()
    scores.append(score)

print(scores)
print('Average Score:', sum(scores)/len(scores))
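# MountainCar has three actions, so report the share of each one
print('choice 0:{}  choice 1:{}  choice 2:{}'.format(choices.count(0)/len(choices), choices.count(1)/len(choices), choices.count(2)/len(choices)))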
show_video()
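
The `np.argmax` call in the loop above is the inverse of the one-hot encoding used when building the training data: the model outputs one score per action, and the index of the largest score becomes the move. A small dependency-free sketch of that round trip follows. The names `one_hot` and `decode` are hypothetical, and the scores are made up.

```python
def one_hot(action, n_actions=3):
    """Encode an action index as a list such as [0, 1, 0]."""
    encoded = [0] * n_actions
    encoded[action] = 1
    return encoded


def decode(scores):
    """Index of the largest score (first one on ties), like np.argmax on a 1-D array."""
    return max(range(len(scores)), key=lambda index: scores[index])


# Encoding and then decoding gives back the original action.
for action in range(3):
    print(action, one_hot(action), decode(one_hot(action)))

# Made-up model output: the middle score is largest, so action 1 is chosen.
print(decode([0.1, 0.7, 0.2]))
```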

# A Simpler Solution

As noted above, the process breaks down because of the reward structure. There is no way to gain points; the best you can do is lose as few as possible by winning as soon as possible.

I took a step back and thought about this problem. If I were playing the game, not the computer, how would I play? I would go up one hill and, as soon as the car started rolling back down, hit the key to drive in that direction.

It occurred to me that this logic is very easy to write: if the car is moving left, drive left; if it is moving right, drive right.

As you can see below, four lines of code are enough to win the game.

This isn't transferable: the code won't work for other games. But if the goal is simply to solve this game, a complicated deep learning model isn't necessary. I still plan to play around with the TensorFlow approach, but it's a good reminder that sometimes the simplest solution is the best.

In [None]:
env = wrap_env(gym.make("MountainCar-v0"))

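observation = env.reset()
total_reward = 0

while True:
    env.render()

    # observation[1] is the velocity: push in the direction the car is already moving
    if observation[1] < 0:   # moving left, so push left
        action = 0
    else:                    # moving right (or at rest), so push right
        action = 2

    observation, reward, done, info = env.step(action)
    total_reward += reward   # each step costs 1 point

    if done:
        break

env.close()
show_video()
print("Total reward:", total_reward)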
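
As a sanity check that needs neither Gym nor a display, the same rule can be run against a hand-written copy of the car's physics. The constants, the 200-step limit and the update rule below are assumptions based on my reading of the Gym source for MountainCar-v0. The names `step`, `follow_velocity` and `play` are hypothetical, and the fixed start position of -0.5 stands in for Gym's random start.

```python
import math

# Constants as I understand them from Gym's MountainCar-v0 source (assumptions).
FORCE, GRAVITY = 0.001, 0.0025
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED, GOAL_POSITION = 0.07, 0.5
MAX_STEPS = 200


def step(position, velocity, action):
    """One step of the assumed dynamics; action is 0 (left), 1 (none) or 2 (right)."""
    velocity += (action - 1) * FORCE - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POSITION, min(MAX_POSITION, position + velocity))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0   # the left wall stops the car
    return position, velocity


def follow_velocity(velocity):
    """The rule from the cell above: push in the direction of travel."""
    return 0 if velocity < 0 else 2


def play(start_position=-0.5):
    """Play one game; return (reached_goal, steps_taken)."""
    position, velocity = start_position, 0.0
    for steps_taken in range(1, MAX_STEPS + 1):
        position, velocity = step(position, velocity, follow_velocity(velocity))
        if position >= GOAL_POSITION:
            return True, steps_taken
    return False, MAX_STEPS


reached_goal, steps_taken = play()
print("reached goal:", reached_goal, "| steps:", steps_taken, "| score:", -steps_taken)
```

The rule works because the engine always pushes in the direction the car is already moving, so every push adds energy to the swing until the car clears the right-hand hill.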

# Victory!