# OpenAI CarRacing with Behavioral Cloning

In this homework, you will train an agent to drive on a race track in a video-game style simulator. The agent has a neural network controller that you will train using example data of a car racing around the track. At each timestep, the neural network takes in the *state* of the car as an image and outputs which *action* to take. 

This system is known as a *Markov Decision Process (MDP)* because at each discrete timestep, the agent makes a decision using only the current state, with no memory of previous states (this is called the Markov property). In the context of Reinforcement Learning, this training strategy is known as *behavioral cloning* because we are learning by copying the actions of another agent.
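
As a rough formalization (the notation here is an assumption, not taken from the assignment): if $\pi_\theta(a \mid s)$ denotes the probability that the network with weights $\theta$ assigns to action $a$ in state $s$, then behavioral cloning on $N$ example pairs $(s_i, a_i)$ can be written as minimizing the average negative log-likelihood of the demonstrated actions. For a classifier, this is the categorical cross-entropy loss:

$$
\min_\theta \; -\frac{1}{N} \sum_{i=1}^{N} \log \pi_\theta(a_i \mid s_i)
$$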

The simulator is the CarRacing-v0 environment from OpenAI. In this environment, a *state* is a (96,96,3) color image which shows the position of the car along with the current speed, steering position, and braking status at the bottom of the image. The *actions* that are available to the agent are steer (between -1 and 1), accelerate (0 to 1), and brake (0 to 1). To simplify this assignment, I have converted this into a classification problem with only seven discrete actions:

0. Do nothing
1. Left
2. Left+Brake
3. Right
4. Right+Brake
5. Accelerate!
6. Brake
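
To make the mapping concrete, here is a minimal sketch of how a class index (or a vector of predicted class probabilities) could be turned into the length-3 command the simulator expects. The list mirrors the `ACTION_SPACE` defined later in the setup cell; the helper names `index_to_action` and `most_likely_action` are hypothetical and only for illustration.

```python
# Discrete action set, mirroring ACTION_SPACE from the setup cell.
# Each entry is [steer, accelerate, brake].
ACTION_SPACE = [[0, 0, 0],    # 0: do nothing
                [-1, 0, 0],   # 1: left
                [-1, 0, 1],   # 2: left + brake
                [1, 0, 0],    # 3: right
                [1, 0, 1],    # 4: right + brake
                [0, 1, 0],    # 5: accelerate
                [0, 0, 1]]    # 6: brake


def index_to_action(class_index):
    """Return the [steer, accelerate, brake] command for a class index (hypothetical helper)."""
    return ACTION_SPACE[class_index]


def most_likely_action(probabilities):
    """Return the command for the class with the highest predicted probability."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return index_to_action(best_index)
```

For example, `index_to_action(2)` gives `[-1, 0, 1]`, that is, full left steering with the brake applied.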

A dataset of 11,132 example (state, action) pairs is provided below for training. These were sampled from simulations of a highly skilled AI agent. The first cell downloads the data and installs many of the dependencies needed to run the simulations and generate videos in Google Colab. You should be able to train your agent and view videos of your agent within Colab.

## Tasks:
1. Create a class called `Agent` with methods `train` and `act`.
2. Train the agent to drive. Optimize hyperparameters such as the learning rate, network architecture, etc. You can do this by hand (you don't need to do anything fancy).
3. Create a video of your agent driving.

## To turn in:
1. Your code as a Jupyter notebook.
2. A description of your agent model and its performance. Include this description after your code in the Jupyter notebook, following the [Guide to Describing ML Methods](https://laulima.hawaii.edu/access/content/group/MAN.XLSIDIN35ps.202230/Guide_to_Describing_ML_Methods.pdf). I don't expect you to do extensive hyperparameter tuning, but you **must** describe the performance of your model on a validation set using the appropriate metrics so that you know when you are overfitting.
3. Upload a video of your best agent to [this Google Drive folder](https://drive.google.com/drive/folders/1Hk4PTqfr5A3BeW2m3mgAuQmbxo_Z-8AK?usp=sharing). (Feel free to also upload any funny or interesting behavior.)
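
As one possible reading of "appropriate metrics", the sketch below computes overall accuracy plus a confusion matrix and per-class recall with NumPy. Overall accuracy can hide poor performance on rare actions if some classes are much more common than others, which is plausible for driving data but not verified here. The function names and the toy labels are illustrative assumptions, not part of the assignment.

```python
import numpy as np


def confusion_matrix(true_classes, predicted_classes, num_classes=7):
    """Count examples by (true class, predicted class)."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for true_label, predicted_label in zip(true_classes, predicted_classes):
        matrix[true_label, predicted_label] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class predicted correctly (NaN if the class is absent)."""
    totals = matrix.sum(axis=1)
    correct = np.diag(matrix)
    return np.where(totals > 0, correct / np.maximum(totals, 1), np.nan)


# Toy validation labels for illustration only (not real results).
true_classes = np.array([5, 5, 5, 1, 3, 6])
predicted_classes = np.array([5, 5, 1, 1, 5, 6])

matrix = confusion_matrix(true_classes, predicted_classes)
accuracy = np.trace(matrix) / matrix.sum()
recall = per_class_recall(matrix)
```

With a trained Keras classifier, `predicted_classes` would typically come from taking `np.argmax(..., axis=1)` over the predicted probabilities for the validation states.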


In [None]:
# NO NEED TO MODIFY THIS CELL
# Dependencies for rendering openai gym in colab and enable video recording.
# Remove " > /dev/null 2>&1" to see what is going on under the hood
!pip install gym[box2d] pyvirtualdisplay piglet > /dev/null 2>&1
!apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1
import gym
from gym import logger as gymlogger
gymlogger.set_level(40) #error only
from gym.wrappers import Monitor
import tensorflow as tf
import numpy as np
import random, math, glob, io, base64
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import HTML
from IPython import display as ipythondisplay
from pyvirtualdisplay import Display
display = Display(visible=0, size=(1400, 900))
display.start()

def show_video():
  mp4list = glob.glob('video/*.mp4')
  if len(mp4list) > 0:
    mp4 = mp4list[0]
    video = io.open(mp4, 'r+b').read()
    encoded = base64.b64encode(video)
    ipythondisplay.display(HTML(data='''<video alt="test" autoplay 
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
  else: 
    print("Could not find video")
    
def wrap_env(env):
  """
  Utility function to enable video recording of a gym environment and display it.
  To enable video, just do `env = wrap_env(env)`.
  """
  return Monitor(env, './video', force=True)

# Download example data for training.
import gzip, os, pickle, random
import matplotlib.pyplot as plt
!gdown --id 1AQnMFSRU3qQcHA-ruS8Ahcz-00FmYoi0 # File shared on Peter's gdrive 6MB.
with gzip.open('carracing_behavior.gzip', 'rb') as f:
    states, action_classes = pickle.load(f)

print('\nState data shape (examples, x, y, color):', states.shape)
print('Action data shape (examples, action idx):', action_classes.shape)

# Plot an example state. This is the model input.
print('\nExample state (this is the input to your neural network):')
plt.imshow(states[0, :, :, :])

# The simulator expects a length-3 array corresponding to steer,
# accelerate, and brake. But I converted the training data actions into a
# discrete set to frame the problem as classification. This is the set of
# possible actions. The indices in training data targets (action_classes)
# correspond to this set of actions. Your agent's act method should
# return one of these, not an integer index.

ACTION_SPACE = [[0, 0, 0],  # no action
                [-1, 0, 0],  # left
                [-1, 0, 1],  # left+brake
                [1, 0, 0],  # right
                [1, 0, 1],  # right+brake
                [0, 1, 0],  # accelerate
                [0, 0, 1], ]  # brake

# Create, Train, and Simulate Agent

Create your agent class below. The code provided should help get you started. Then test your agent in the racing environment.
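
Since the write-up must report performance on a validation set, one preparatory step is to hold out part of the data before training. The sketch below shuffles example indices with NumPy before splitting; the function name `shuffled_split`, the default fraction and seed, and the toy arrays are illustrative assumptions rather than part of the provided code. Shuffling may matter if the examples are stored in the order they were recorded (the dataset's ordering is not documented here), because Keras's `validation_split` argument holds out the last fraction of the arrays without shuffling.

```python
import numpy as np


def shuffled_split(inputs, targets, validation_fraction=0.15, seed=0):
    """Return (train_x, train_y, val_x, val_y) after randomly permuting the examples."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(inputs))
    num_val = int(round(len(inputs) * validation_fraction))
    val_idx, train_idx = order[:num_val], order[num_val:]
    return inputs[train_idx], targets[train_idx], inputs[val_idx], targets[val_idx]


# Toy stand-ins for the real arrays: 20 tiny "images" and class labels 0..6.
toy_states = np.arange(20 * 4).reshape(20, 2, 2, 1)
toy_actions = np.arange(20) % 7

train_x, train_y, val_x, val_y = shuffled_split(toy_states, toy_actions)
```

With the real data, the held-out arrays could then be passed to Keras as `model.fit(..., validation_data=(val_x, val_y))` instead of using `validation_split`.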






In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split

In [None]:
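# One-hot encode the action class indices.
# OneHotEncoder expects a 2D array, so reshape to (n_examples, 1) first.

ohe = OneHotEncoder()
y = action_classes
y = y.reshape(-1, 1)
y = ohe.fit_transform(y).toarray()  # dense array, one column per action class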

In [None]:
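class Agent():
  def __init__(self):
    self.action_space = ACTION_SPACE
    # Flatten each (96, 96, 3) image into a vector for the dense network.
    self.train_states = states.reshape(states.shape[0], 96*96*3)
    self.train_action = y  # one-hot encoded action classes

  def train(self):
    # Fully connected network: three hidden ReLU layers and a 7-way softmax output.
    model = Sequential()
    model.add(Dense(32, input_dim=(96*96*3), activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(7, activation='softmax'))

    opt = SGD(learning_rate=1e-6, momentum=0.8)
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
    # Stop once the validation loss has not improved for 6 epochs.
    early_stopping = [EarlyStopping(patience=6, monitor='val_loss')]
    # validation_split holds out the last 15% of the examples for validation.
    model.fit(self.train_states, self.train_action, epochs=100, batch_size=32,
              validation_split=0.15, callbacks=early_stopping)

    self.model = model

  def act(self, state):
    # Flatten the observation to match the training input shape.
    state = state.reshape(-1, 96*96*3)
    # Return the action vector for the most probable class, not the class index.
    return self.action_space[np.argmax(self.model.predict(state))]

agent = Agent()
agent.train()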

# Simulate Agent

In [None]:
# NO NEED TO MODIFY THIS CELL
# Run simulation for t timesteps.
NUM_TIMESTEPS = 1000  # Increase this to run simulation longer.
with wrap_env(gym.make("CarRacing-v0")) as env: # Exits env when done.
  observation = env.reset()  # Restarts car at the starting line.
  for t in range(NUM_TIMESTEPS):
    env.render() 
    action = agent.act(observation)
    observation, reward, done, info = env.step(action)
    if done:
      print("Episode finished after {} timesteps".format(t+1))
      break
show_video()  # Video can be downloaded by clicking option in bottom right.

### Description of Agent Model

1. I am trying to train a neural network agent to drive on a race track in a 2D video-game style simulator. The input is image data with dimensions (96,96,3) and the goal is to produce the correct (output) action to take. This is a classification problem with 7 different outputs for the agent (brake, accelerate, left, right, etc.).

2. Our training data comes from the output of a highly skilled AI agent, image by image. It consists of states (images of the car, the track, and its position) and the actions that the AI agent took.

3. I am using a fully connected neural network with 4 layers, rectified linear unit (ReLU) activation functions for the hidden layers, softmax activation for the output layer, categorical cross-entropy for the loss function, and stochastic gradient descent for the optimizer. I chose the activation function as a result of parameter tuning, the loss function based on my research into what to use for classification problems, and the architecture and optimizer because I wanted to replicate my earlier attempt at building this kind of network without libraries. The network is implemented with Keras.

4. Data was given to us cleaned and structured. I only needed to reshape the states and reshape and one-hot encode the actions. 

5. The total dataset contained 11,132 examples that were divided into two subsets: 85% training and 15% validation. The split was made with Keras's `validation_split=0.15`, which holds out the last 15% of the examples rather than a random sample. The network was trained on the training set, while the validation set was used for early stopping and hyperparameter optimization.

6. I optimized the hyperparameters of the neural network by hand, using the validation loss (also monitored by `EarlyStopping`) as the deciding factor. For the network architecture, I tried 32, 64, and 128 neurons for each layer (except the output layer). For the activation functions, I tried sigmoid, tanh, and ReLU. For the training process, I adjusted the learning rate, momentum, and batch size using the following values:
   - learning rate: 1e-4, 1e-5, 1e-6, 1e-7
   - momentum: 0.5, 0.8, 1
   - batch size: 16, 32, 64

7. I exhaustively tried every combination of parameters for the network architecture (number of neurons and activation function) and then did the same for the training parameters with the already chosen architecture. The model with the lowest validation loss was chosen.

8. The model achieved a validation loss of 1.3 and an accuracy of about 62% on the 15% held-out validation set.

9. There does not seem to be an obvious difference between the training data and the data the agent sees at test time, assuming that the AI agent was indeed highly skilled and could have kept driving on the track indefinitely.
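
The exhaustive search over training parameters described in items 6 and 7 can be sketched with `itertools.product`. The value lists are the ones given in item 6; the function names and the placeholder scoring function are assumptions for illustration. In a real run, the scoring function would train a model with the given settings and return its validation loss.

```python
import itertools

# Candidate values listed in item 6.
learning_rates = [1e-4, 1e-5, 1e-6, 1e-7]
momenta = [0.5, 0.8, 1]
batch_sizes = [16, 32, 64]


def best_configuration(validation_loss_fn):
    """Return the (learning_rate, momentum, batch_size) with the lowest validation loss."""
    grid = itertools.product(learning_rates, momenta, batch_sizes)
    return min(grid, key=lambda config: validation_loss_fn(*config))


def toy_validation_loss(learning_rate, momentum, batch_size):
    # Placeholder score for illustration only, not a measured result.
    # A real implementation would train with these settings and return val_loss.
    return (abs(learning_rate - 1e-6) * 1e6
            + abs(momentum - 0.8)
            + abs(batch_size - 32) / 32)


num_configurations = len(list(itertools.product(learning_rates, momenta, batch_sizes)))
best = best_configuration(toy_validation_loss)
```

The grid has 4 × 3 × 3 = 36 training configurations, so each one requires a separate training run.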
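
To put the validation loss and accuracy reported in item 8 in context, it can help to compare them with a classifier that assigns equal probability to all seven actions. The short calculation below is a sketch of that baseline; the names are illustrative. A predictor that always outputs the most common action could score higher than uniform guessing if the classes are imbalanced, so this is only a lower reference point.

```python
import math

NUM_CLASSES = 7


def cross_entropy(predicted_probabilities, true_class):
    """Categorical cross-entropy for one example: -log p(true class)."""
    return -math.log(predicted_probabilities[true_class])


# Uniform guess: every action gets probability 1/7, whatever the true class is.
uniform_guess = [1.0 / NUM_CLASSES] * NUM_CLASSES
chance_loss = cross_entropy(uniform_guess, true_class=5)  # equals ln(7), about 1.946
chance_accuracy = 1.0 / NUM_CLASSES                        # about 0.143 for balanced classes
```

A validation loss of 1.3 is below this uniform-guess value of roughly 1.95, and 62% accuracy is well above 14%.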