<a href="https://colab.research.google.com/github/nabilfrancis/sit796-task1.2P/blob/main/Task12P.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **AI Gym**

The OpenAI library is a Python suite that contains many classical pattern recognition, computer vision, machine learning and robotics tools and algorithmns for AI. We will be using this library for the practicals since it also includes a number of environments that are standard for the simulation of reinforcement learning techniques.

First, we will install the necessary X11 dependencies, in particular Xvfb, which is an X server that can run on machines with no display hardware and no physical input devices. 

After Xvfb is installed, the Python wrapper pyvirtualdisplay has to be setup in order to interact with Xvfb virtual displays from within Python. We will also install the Python bindings for OpenGL (PyOpenGL) and optional set of C (Cython) extensions providing acceleration of common operations in PyOpenGL 3.x (PyOpenGL-accelerate). Finally, we will install the OpenAI Gym package. 


In [None]:
# install the required AI Gym system dependencies
!apt-get install -y xvfb x11-utils  
!apt-get install x11-utils > /dev/null 2>&1
!pip install PyVirtualDisplay==2.0.* \
            PyOpenGL==3.1.* \
            PyOpenGL-accelerate==3.1.* \
            gym[box2d]==0.17.*
!pip install pyglet

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libxxf86dga1
Suggested packages:
  mesa-utils
The following NEW packages will be installed:
  libxxf86dga1 x11-utils xvfb
0 upgraded, 3 newly installed, 0 to remove and 29 not upgraded.
Need to get 993 kB of archives.
After this operation, 2,981 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/main amd64 libxxf86dga1 amd64 2:1.1.4-1 [13.7 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/main amd64 x11-utils amd64 7.7+3build1 [196 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 xvfb amd64 2:1.19.6-1ubuntu4.8 [784 kB]
Fetched 993 kB in 1s (1,332 kB/s)
Selecting previously unselected package libxxf86dga1:amd64.
(Reading database ... 160975 files and directories currently installed.)
Preparing to unpack .../libxxf86dga1_2%3a1.1.4-1_amd64.deb ...
Unpacking libxxf86dga1:amd64 (2:

Once installed, the Gym, and other libraries, can be imported. The code in the cell below imports the Gym, NumPy, MatPlot and Display libraries. It also creates a virtual display in the background that your Gym Envs can connect to for rendering. 

In [None]:
import gym
import numpy as np
import base64
import io
import IPython

Depending on which Gym environment we will be working on in each practical, we will be working with different environments. In this practical, we will work with the MountainCar-v0 environment. 

In the code below, we import the Gym wrappers as Monitor, and simulate a mountain car for 200 steps using the MountainCar-v0 environment. The environment corresponds to a car between two mountains. The car's engine is not strong enough to scale the mountain in a single pass. 

To show the behaivour of the car, we will produce a video that then we can play. This is since Gym assumes a monitor. Here, we are working on a Jupiter notebook and, hence, we will create a "Display" for rendering so we can use a video player instead. We start by cleaning up the directory we will store the video on with the following cell

In [None]:
!rm ./vid/*.*

rm: cannot remove './vid/*.*': No such file or directory


We then define the import names and the environment. Once this is done, we can then sample the action space. This is example, the observation o is two numbers: car position and velocity, r is the reward, d is true if the simulation is finished and i is the "info" 

In [None]:
from gym.wrappers import Monitor
from IPython import display
from pyvirtualdisplay import Display

d = Display()
d.start()

env = gym.make('MountainCar-v0')
env = Monitor(env,'./vid',force=True)


o = env.reset()

for _ in range(200):
    o, r, d, i = env.step(env.action_space.sample()) # Take action from DNN in actual training.

    if d:
        env.reset()

for f in env.videos:
    video = io.open(f[0], 'r+b').read()
    encoded = base64.b64encode(video)

    display.display(display.HTML(data="""
        <video alt="test" controls>
        <source src="data:video/mp4;base64,{0}" type="video/mp4" />
        </video>
        """.format(encoded.decode('ascii'))))

# **Policy Formulation**

Note the above code uses a random set of actions. Since the motor of the car has not enough power, the only way to succeed in the task (reach the cusp of the hill) is to drive back and forth to build up momentum. To complete achieve the goal, we need a suitable polciy.

In [None]:
import time
from random import randint
from random import seed

def policy(obs, t):
    # Write the code for your policy here. You can use the observation
    # (a tuple of position and velocity), the current time step, or both,
    # if you want.
    position, velocity = obs
    
    # This is an example policy. You can try running it, but it will not work 
    # since its just using random actions. 
    # The actions are
    #    0      Accelerate to the Left
    #    1      Don't accelerate
    #    2      Accelerate to the Right

    # seed random number generator with the system clock
    seed(time.clock())
        
    # generate random integers between zero and two
    actions = randint(0,2)
   
    return actions

 Once we define our policy, we can run it using the code below

In [None]:
!rm ./vid/*.* # Clean up the video before starting

TIME_LIMIT = 1200 # Set time limit

d = Display()
d.start()

env = gym.make('MountainCar-v0')
env = Monitor(env,'./vid',force=True)

o = env.reset()

for t in range(TIME_LIMIT):
    
    action = policy(o,t)  # Call your policy
    o, r, d, _ = env.step(action)  # Pass the action chosen by the policy to the environment
    
    # We don't do anything with reward here because MountainCar is a very simple environment,
    # and reward is a constant -1. Therefore, your goal is to end the episode as quickly as possible.

    if d and t<TIME_LIMIT-1:
        print("Task completed in", t, "time steps")
        break
else:
    print("Time limit exceeded. Try again.")

env.reset()



Task completed in 199 time steps


array([-0.52730119,  0.        ])

and produce a video using the cells below.

In [None]:
for f in env.videos:
    video = io.open(f[0], 'r+b').read()
    encoded = base64.b64encode(video)

    display.display(display.HTML(data="""
        <video alt="test" controls>
        <source src="data:video/mp4;base64,{0}" type="video/mp4" />
        </video>
        """.format(encoded.decode('ascii'))))



```
# This is formatted as code
```

# Testing Section

In [1]:
from gym import envs
print(envs.registry.all())

dict_values([EnvSpec(Copy-v0), EnvSpec(RepeatCopy-v0), EnvSpec(ReversedAddition-v0), EnvSpec(ReversedAddition3-v0), EnvSpec(DuplicatedInput-v0), EnvSpec(Reverse-v0), EnvSpec(CartPole-v0), EnvSpec(CartPole-v1), EnvSpec(MountainCar-v0), EnvSpec(MountainCarContinuous-v0), EnvSpec(Pendulum-v0), EnvSpec(Acrobot-v1), EnvSpec(LunarLander-v2), EnvSpec(LunarLanderContinuous-v2), EnvSpec(BipedalWalker-v3), EnvSpec(BipedalWalkerHardcore-v3), EnvSpec(CarRacing-v0), EnvSpec(Blackjack-v0), EnvSpec(KellyCoinflip-v0), EnvSpec(KellyCoinflipGeneralized-v0), EnvSpec(FrozenLake-v0), EnvSpec(FrozenLake8x8-v0), EnvSpec(CliffWalking-v0), EnvSpec(NChain-v0), EnvSpec(Roulette-v0), EnvSpec(Taxi-v3), EnvSpec(GuessingGame-v0), EnvSpec(HotterColder-v0), EnvSpec(Reacher-v2), EnvSpec(Pusher-v2), EnvSpec(Thrower-v2), EnvSpec(Striker-v2), EnvSpec(InvertedPendulum-v2), EnvSpec(InvertedDoublePendulum-v2), EnvSpec(HalfCheetah-v2), EnvSpec(HalfCheetah-v3), EnvSpec(Hopper-v2), EnvSpec(Hopper-v3), EnvSpec(Swimmer-v2), EnvSp

# Custom Environment

In [2]:
import gym
from gym import spaces

class CustomEnv(gym.Env):
  """Custom Environment that follows gym interface"""
  metadata = {'render.modes': ['human']}

  def __init__(self, arg1, arg2, ...):
    super(CustomEnv, self).__init__()
    # Define action and observation space
    # They must be gym.spaces objects
    # Example when using discrete actions:
    self.action_space = spaces.Discrete(N_DISCRETE_ACTIONS)
    # Example for using image as input:
    self.observation_space = spaces.Box(low=0, high=255, shape=
                    (HEIGHT, WIDTH, N_CHANNELS), dtype=np.uint8)

  def step(self, action):
    # Execute one time step within the environment
    ...
  def reset(self):
    # Reset the state of the environment to an initial state
    ...
  def render(self, mode='human', close=False):
    # Render the environment to the screen
    ...

SyntaxError: ignored