# Reinforcement Learning using OpenAI Gym

Links:
https://pypi.org/project/gym-super-mario-bros/


## Import packages and install extras.
Note: You will have to restart the kernel twice to fully install everything.

In [2]:
# Imports
!pip install gym_super_mario_bros==7.3.0 nes_py
!pip install gym==0.23.1
!pip install numpy==1.23.5
import numpy as np
import gym
# Import the game
import gym_super_mario_bros
# Import the Joypad wrapper
from nes_py.wrappers import JoypadSpace
# Import the SIMPLIFIED controls
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

# NEED TO DECIDE IF I WANT ALL IMPORTS IN ONE CELL OR WANT IT BROKEN UP FOR EASIER UNDERSTANDING
# Displaying game in Colab environment
#from gym.wrappers import RecordVideo # Import RecordVideo
#import glob
#import io
#import base64
#from IPython.display import HTML
#from pyvirtualdisplay import Display
#from IPython import display as ipythondisplay




### Installations for Rendering OpenAI Gym Environments in Colab

In [3]:
# Installs for displaying video in colab environment
!pip install gym pyvirtualdisplay > /dev/null 2>&1
!apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1

# Requirements for displaying an Atari game
!apt-get update > /dev/null 2>&1
!apt-get install cmake > /dev/null 2>&1
!pip install --upgrade setuptools 2>&1
!pip install ez_setup > /dev/null 2>&1
!pip install gym[atari] > /dev/null 2>&1



Next, we define helper functions that record the game to a video file and embed that video in the Colab notebook.

In [4]:
from gym.wrappers import RecordVideo # Import RecordVideo
import glob
import io
import base64
from IPython.display import HTML
from pyvirtualdisplay import Display
from IPython import display as ipythondisplay
display = Display(visible=0, size=(1400, 900))
display.start()

"""
Utility functions to enable video recording of a gym environment
and to display the recording.
To enable video, just do "env = wrap_env(env)".
"""


def show_video():
    # Look for recordings written by RecordVideo in ./video
    mp4list = glob.glob('video/*.mp4')
    if len(mp4list) > 0:
        mp4 = mp4list[0]
        # Read the file as bytes and embed it in the page as base64
        with open(mp4, 'rb') as f:
            video = f.read()
        encoded = base64.b64encode(video)
        ipythondisplay.display(HTML(data='''<video alt="test" autoplay
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
    else:
        print("Could not find video")


def wrap_env(env):
    env = RecordVideo(env, './video')
    return env


### Helper functions


*   query_environment: prints the action space, observation space, and spec details of a named environment



In [5]:
# Helper functions

# Query game environment
def query_environment(name):
    env = gym.make(name) # create environment
    spec = gym.spec(name) # Get game specs
    # Print other details about the environment
    print(f"Action Space: {env.action_space}")
    print(f"Observation Space: {env.observation_space}")
    print(f"Max Episode Steps: {spec.max_episode_steps}")
    print(f"Nondeterministic: {spec.nondeterministic}")
    print(f"Reward Range: {env.reward_range}")
    print(f"Reward Threshold: {spec.reward_threshold}")

query_environment("SuperMarioBros-v3")

Action Space: Discrete(256)
Observation Space: Box(0, 255, (240, 256, 3), uint8)
Max Episode Steps: 9999999
Nondeterministic: True
Reward Range: (-15, 15)
Reward Threshold: 9999999


## Create the Super Mario Bros Gym environment

In [9]:
# Create env
env = gym_super_mario_bros.make('SuperMarioBros-v3') # Simplified "rectangle" graphics
#env = gym_super_mario_bros.make('SuperMarioBros-v0') # Standard graphics
# Apply the simplified movements to the game environment
env = JoypadSpace(env, SIMPLE_MOVEMENT)
# Wrap env in video recorder
env = wrap_env(env)

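The raw environment reports `Discrete(256)` because an NES controller has 8 buttons, so one byte can describe every combination of pressed buttons. Wrapping the environment with `SIMPLE_MOVEMENT` reduces this to 7 actions. The sketch below illustrates the idea in plain Python. The `BUTTON_BITS` table is an assumption meant to mirror the mapping inside `nes_py`'s `JoypadSpace`, and `simple_movement` and `encode_action` are illustrative names, so check the installed version before relying on the exact values.

```python
# Illustrative button-to-bit table; assumed to mirror nes_py's JoypadSpace.
BUTTON_BITS = {
    "right": 0b10000000,
    "left": 0b01000000,
    "down": 0b00100000,
    "up": 0b00010000,
    "start": 0b00001000,
    "select": 0b00000100,
    "B": 0b00000010,
    "A": 0b00000001,
    "NOOP": 0b00000000,
}

# Local copy of the action list this notebook prints for SIMPLE_MOVEMENT.
simple_movement = [
    ["NOOP"], ["right"], ["right", "A"], ["right", "B"],
    ["right", "A", "B"], ["A"], ["left"],
]


def encode_action(buttons):
    """Combine a list of button names into one controller byte with bitwise OR."""
    byte = 0
    for button in buttons:
        byte |= BUTTON_BITS[button]
    return byte


# Index in the reduced action space -> raw controller byte.
action_table = [encode_action(buttons) for buttons in simple_movement]
print(action_table)  # [0, 128, 129, 130, 131, 1, 64]
```

With a table like this, the agent chooses an index from 0 to 6 and the wrapper forwards the matching byte to the emulator.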


Now we are ready to play the game.  We use a simple random agent.

In [10]:
# Create a boolean flag - restart game or not
done = True

# Loop through game frames. Think of these as screenshots representing a single moment in time within the game.
for step in range(1000):
    # Check if we should start/restart the game
    if done:
        # Start/Restart the game
        state = env.reset()
    # Perform a random action in the game. Each step returns the current game state,
    # a reward, whether the episode is over, and additional info
    state, reward, done, info = env.step(env.action_space.sample())

# Close the game
env.close()

# Display the video of the played game
show_video()

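The random-agent loop above discards the rewards it receives. A small variation, sketched here, adds them up per episode so that runs can be compared. To keep the example runnable without the emulator, it uses `StubEnv` and `RandomActionSpace`, two hypothetical toy classes that only imitate the `reset()` / `step()` / `action_space.sample()` shape used in this notebook. The helper `run_random_agent` is also an illustrative name, not part of gym.

```python
import random


class RandomActionSpace:
    """Hypothetical stand-in for gym's Discrete space: only sample() is provided."""

    def __init__(self, n, seed=0):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        return self._rng.randrange(self.n)


class StubEnv:
    """Hypothetical toy environment with the same reset/step shape as the Mario env."""

    def __init__(self, episode_length=5, n_actions=7):
        self.action_space = RandomActionSpace(n_actions)
        self.episode_length = episode_length
        self._t = 0

    def reset(self):
        self._t = 0
        return self._t  # dummy observation

    def step(self, action):
        self._t += 1
        reward = 1.0 if action != 0 else 0.0  # toy rule: any non-NOOP action scores
        done = self._t >= self.episode_length
        return self._t, reward, done, {}


def run_random_agent(env, n_steps):
    """Take n_steps random actions; return the total reward of each finished episode."""
    episode_returns = []
    running_total = 0.0
    done = True
    for _ in range(n_steps):
        if done:
            env.reset()
            running_total = 0.0
        _, reward, done, _ = env.step(env.action_space.sample())
        running_total += reward
        if done:
            episode_returns.append(running_total)
    return episode_returns


returns = run_random_agent(StubEnv(episode_length=5), n_steps=12)
print(len(returns))  # 2: episodes end at steps 5 and 10, the third is unfinished
```

Because `run_random_agent` only relies on `reset()`, `step()`, and `action_space.sample()`, the same function should also accept the wrapped Mario environment under the gym version pinned above.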


### Explain the game
We use a simplified version of Super Mario Bros. In this version the game graphics are drawn as pixelated, boxy shapes, which makes each frame simpler to process.


In [13]:
# Print details about the game
print(f"Available actions: {SIMPLE_MOVEMENT}")
print(f"This is the shape of the game that is returned from an observation: {env.observation_space.shape}")

# Each observation is a color frame 240 pixels high and 256 pixels wide, with 3 color channels. We can think of it as a snapshot of the game at a specific moment in time.

print("Keys & values from the info dictionary returned during an observation:")
for key, value in info.items():
    print(f"{key}: {value}")


Available actions: [['NOOP'], ['right'], ['right', 'A'], ['right', 'B'], ['right', 'A', 'B'], ['A'], ['left']]
This is the shape of the game that is returned from an observation: (240, 256, 3)
Keys & values from the info dictionary returned during an observation:
coins: 0
flag_get: False
life: 1
score: 0
stage: 1
status: small
time: 367
world: 1
x_pos: 594
x_pos_screen: 101
y_pos: 115


We can ignore most of the info dictionary for now. We care about the x_pos variable because it relates to the reward equation.
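As a rough illustration of how `x_pos` can be used, the sketch below summarizes a list of info dictionaries collected during a run. The helper `summarize_progress` and the sample values are hypothetical; only the keys `x_pos` and `flag_get` come from the output above. It assumes the dictionaries belong to a single life, since the position presumably starts over after a reset.

```python
def summarize_progress(infos):
    """Summarize horizontal progress from a list of per-step info dicts."""
    if not infos:
        return {"max_x_pos": 0, "net_x_change": 0, "flag_get": False}
    x_positions = [info["x_pos"] for info in infos]
    return {
        "max_x_pos": max(x_positions),                      # furthest point reached
        "net_x_change": x_positions[-1] - x_positions[0],   # last minus first position
        "flag_get": any(info["flag_get"] for info in infos),
    }


# Hypothetical info dicts, trimmed to the two keys used here.
sample_infos = [
    {"x_pos": 40, "flag_get": False},
    {"x_pos": 43, "flag_get": False},
    {"x_pos": 41, "flag_get": False},
]
print(summarize_progress(sample_infos))
# {'max_x_pos': 43, 'net_x_change': 1, 'flag_get': False}
```

In the random-agent loop, the dictionaries could be collected by appending `info` to a list after each call to `env.step`.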

### Reward Function (Details taken from https://pypi.org/project/gym-super-mario-bros/)

The reward function assumes the objective of the game is to move as far right as possible (increase the agent's *x* value), as fast as possible, without dying. To model this game, three separate variables compose the reward:

***r = v + c + d***

1.   *v*: the difference in agent x values between states.
  *   in this case this is instantaneous velocity for the given step
  *   *v = x1 - x0*
      *    *x0* is the x position before the step
      *    *x1* is the x position after the step
  *   moving right ⇔ *v* > *0*
  *   moving left ⇔ *v* < *0*
  *   not moving ⇔ *v* = *0*



2.   *c*: the difference in the game clock between frames
  *   the penalty prevents the agent from standing still
  *   *c = c0 - c1*
      *    *c0* is the clock reading before the step
      *    *c1* is the clock reading after the step
  *   no clock tick ⇔ *c* = *0*
  *   clock tick ⇔ *c* < *0*


3.   *d*: a death penalty that penalizes the agent for dying in a state
  *   this penalty encourages the agent to avoid death
  *   alive ⇔ *d* = *0*
  *   dead ⇔ *d* = *-15*

**Note: The reward is clipped into the range (-15, 15).**
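The three terms and the clipping step can be sketched in a few lines of Python. This is an illustration of the description above, not the library's own code, and `step_reward` is a hypothetical name. One assumption matters: the in-game clock counts down, so the sketch computes the clock term as the reading after the step minus the reading before it, which is the sign convention that makes a clock tick negative as the list states.

```python
def step_reward(x_before, x_after, clock_before, clock_after, died):
    """Sketch of r = v + c + d, clipped to [-15, 15]."""
    v = x_after - x_before          # positive when moving right
    c = clock_after - clock_before  # assumed sign: a countdown tick gives -1
    d = -15 if died else 0          # death penalty
    r = v + c + d
    return max(-15, min(15, r))     # clip into the documented range


print(step_reward(100, 103, 367, 367, died=False))  # 3: moved right, no clock tick
print(step_reward(100, 100, 367, 366, died=False))  # -1: stood still during a tick
print(step_reward(100, 102, 367, 366, died=True))   # -14: 2 - 1 - 15
print(step_reward(100, 90, 367, 366, died=True))    # -15: -26 clipped to the lower bound
```

The second call shows why the clock term discourages standing still: with no movement, the only contribution is the negative tick.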

## Preprocessing the Environment