# ENN585 - Advanced Machine Learning - Week 1

In the first half of this semester, you will be working with the [Gymnasium](https://gymnasium.farama.org/#) package, an API standard for reinforcement learning with a diverse collection of reference environments. It is commonly referred to as 'gym' for short.

In addition to classical RL problems like the Cart Pole or a double inverted pendulum that are contained in the base Gymnasium package, [Gymnasium-Robotics](https://robotics.farama.org/) contains environments specifically for robot experiments.

This notebook helps you to familiarise yourself with Gymnasium and specifically its robotics environments. We strongly encourage you to explore further and run your own experiments with the help of their documentation.

## Install and Setup

You can run this notebook on [Google Colab](https://colab.google/) or locally on your computer.

This first code cell takes care of some installation and setup. If you run this locally on your own machine, we recommend setting up a conda environment and installing ```pip install gymnasium-robotics renderlab```.



In [1]:
#@title Install packages - (Run this once at the start)

### Install gym-robotics and renderlab
try:
  import gymnasium as gym
  gym.spec('FetchSlide-v2')
except:
  !pip install gymnasium-robotics
  import gymnasium as gym

try:
  import renderlab as rl
except:
  !pip install renderlab


## Are we on Google Colab?
try:
  import google.colab
  IN_COLAB = True
except:
  IN_COLAB = False


### If on Colab, we have to setup gym's rendering. Otherwise we are ok to proceed.
if IN_COLAB:

  from google.colab import files
  import distutils.util
  import os
  import subprocess
  if subprocess.run('nvidia-smi').returncode:
    raise RuntimeError(
        'Cannot communicate with GPU. '
        'Make sure you are using a GPU Colab runtime. '
        'Go to the Runtime menu and select Choose runtime type.')

  # Add an ICD config so that glvnd can pick up the Nvidia EGL driver.
  # This is usually installed as part of an Nvidia driver package, but the Colab
  # kernel doesn't install its driver via APT, and as a result the ICD is missing.
  # (https://github.com/NVIDIA/libglvnd/blob/master/src/EGL/icd_enumeration.md)
  NVIDIA_ICD_CONFIG_PATH = '/usr/share/glvnd/egl_vendor.d/10_nvidia.json'
  if not os.path.exists(NVIDIA_ICD_CONFIG_PATH):
    with open(NVIDIA_ICD_CONFIG_PATH, 'w') as f:
      f.write("""{
      "file_format_version" : "1.0.0",
      "ICD" : {
          "library_path" : "libEGL_nvidia.so.0"
      }
  }
  """)

  # Configure MuJoCo to use the EGL rendering backend (requires GPU)
  print('Setting environment variable to use GPU rendering:')
  %env MUJOCO_GL=egl

  try:
    print('Checking that the installation succeeded:')
    import mujoco
    mujoco.MjModel.from_xml_string('<mujoco/>')
  except Exception as e:
    raise e from RuntimeError(
        'Something went wrong during installation. Check the shell output above '
        'for more information.\n'
        'If using a hosted Colab runtime, make sure you enable GPU acceleration '
        'by going to the Runtime menu and selecting "Choose runtime type".')

  print('Installation successful.')


# Let's Explore Some Environments
Gymnasium offers a number of interesting environments we can use for imitation learning and reinforcement learning experiments.
Let's have a look at some of them.

In [2]:
import gymnasium as gym
import renderlab as rl

# we'll try these environments
# if we want to see more, try running gym.
env_names = ['CartPole-v1', 'FetchSlide-v2', 'FrankaKitchen-v1', 'Humanoid-v4']

for name in env_names:
  print(f'============ Environment: {name} ============')

  # create an environment
  env = gym.make(name, render_mode='rgb_array')

  # this wraps the environment so we can record a video of its outputs and watch it later
  env = rl.RenderFrame(env, "./output")

  # reset the environment, this needs to be called at least once in the beginning
  observation, info = env.reset()

  # do a loop
  while True:
    # sample a random action to be executed
    action = env.action_space.sample()

    # this executes the action and returns observation and reward etc
    observation, reward, terminated, truncated, info = env.step(action)

    # we stop the loop if we terminate (e.g. the pole falls over) or run out of time (truncated after 50 steps)
    if terminated or truncated:
      break

  # show the recorded video
  env.play()




OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'


Moviepy - Building video temp-{start}.mp4.
Moviepy - Writing video temp-{start}.mp4



                                                   

Moviepy - Done !
Moviepy - video ready temp-{start}.mp4






OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'


Moviepy - Building video temp-{start}.mp4.
Moviepy - Writing video temp-{start}.mp4



                                                   

Moviepy - Done !
Moviepy - video ready temp-{start}.mp4






OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'


Moviepy - Building video temp-{start}.mp4.
Moviepy - Writing video temp-{start}.mp4



                                                               

Moviepy - Done !
Moviepy - video ready temp-{start}.mp4




OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'


Moviepy - Building video temp-{start}.mp4.
Moviepy - Writing video temp-{start}.mp4



                                                   

Moviepy - Done !
Moviepy - video ready temp-{start}.mp4




## Explore other Environments

Gym comes with support for many other environments. Have a look at the documentation https://gymnasium.farama.org/ 
The code cell below prints all the available environments.


In [None]:
# print all the available environments in gym
gym.pprint_registry()

# feel free to create and explore other environments, using the code block from above!

## Explore the Action Space of the Fetch Environment

Let's examine the Fetch-Slide environment. This is the environment you will be using for your Assessment 1 project.

We will start with looking at the action space. Let's make the robot move!


### Your Turn!
 - Change the code so that the robot moves upwards, downwards, left, right, forward, backwards.
 - Try different speeds.

In [21]:
env = gym.make('FetchSlide-v2', render_mode='rgb_array')

# this wraps the environment so we can record a video of its outputs and watch it later
env = rl.RenderFrame(env, "./output")

# reset the environment, this needs to be called at least once in the beginning
observation, info = env.reset()

# let's see what the actions looks like
# see https://robotics.farama.org/envs/fetch/slide/#action-space for more details
random_action = env.action_space.sample()
print(f'sampeled action:{action}')


# do a loop
while True:
    # always excecute a specific action 
    # change this to make the robot do something else, e.g. move up or left or forward/backward. Use different speeds.
    action = [0,0,0,0]    # [dx, dy, dz, gripper]

    # this executes the action and returns observation and reward etc
    observation, reward, terminated, truncated, info = env.step(action)

    # we stop the loop if we terminate or run out of time (truncated)
    if terminated or truncated:
        break

print(f'Episode ended because it was terminated: {terminated} or truncated: {truncated}')
env.play()

OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'


sampeled action:[0, 0, 0, 0]
Episode ended because it was terminated: False or truncated: True
Moviepy - Building video temp-{start}.mp4.
Moviepy - Writing video temp-{start}.mp4



                                                   

Moviepy - Done !
Moviepy - video ready temp-{start}.mp4




## Explore the Observation Space of the Fetch Environment
Read the documentation for more details: https://robotics.farama.org/envs/fetch/slide/#observation-space

### Your Turn! 
 - Use the observations to implement a simple hand-written controller that pushes the block towards the direction of the goal.
 - Save and plot the position of the robot's end effector, the puck and the desired goal throughout the episode.

In [None]:
import numpy as np

# a little helpe function to implement a simple P controller
# feel free to make this more sophisticated, e.g. a PI or PID controller
def p_controller(pos, goal, kp=1.0):
    error = goal - pos
    dist = np.linalg.norm(error)
    action = error * kp * dist
    return action, dist < 0.01

# create the environment
env = gym.make('FetchSlide-v2', render_mode='rgb_array',  max_episode_steps=200)    # notice how we set the max_episode_steps to 200 to have more time in the simulation

# this wraps the environment so we can record a video of its outputs and watch it later
env = rl.RenderFrame(env, "./output")

# reset the environment, this needs to be called at least once in the beginning
observation, info = env.reset()


# Here is a simple idea of how to push the puck to a goal position:
# 1. lift the robot's end effector up
# 2. move the robot's end effector to a pre-defined start position
# 3. move the robot's end effector behind the puck, so that it is aligned with the desired goal
# 4. push the puck towards the goal position

# We will implement this as a simple state machine, where we have different phases
phases=[' lift', 'start', 'move_behind', 'push']
phase = phases[0]

# do a loop
while True:
    # we will implement a simple state machine to control the robot
        
    if phase == 'lift':
        # the current position of the gripper
        ef = observation['observation'][:3]
        # the goal position of the gripper is where it currently is, but with a higher z value
        goal = ef.copy(); goal[2] = 0.6
        # use the p controller to move the gripper to the goal position
        action, converged = p_controller(ef, goal, kp=100.0)        
        # if we are close enough to the goal position, we move to the next phase
        if converged:
            phase = phases[1]
    
    # the next phases are similar, we just have different goals
    elif phase == 'start':
        ef = observation['observation'][:3]
        goal = [0.8, 0.75,0.42] # this is a pre-defined start position
        
        # Again, use the p controller to move the gripper to the goal position. Try different kp values.
        action, converged = p_controller(ef, goal, kp=100.0)        
        
        # if we are close enough to the goal position, we move to the next phase
        if converged:
            phase = phases[2]
    
    # the next phases are similar, we just have different goals
    elif phase == 'move_behind':        
        # TODO: implement this phase
        action = [0,0,0]

    elif phase == 'push':     
        # TODO: implement this phase
        action = [0,0,0]
    else:
        action = [0,0,0]
    
    # add one more element to the action vector (gripper open/close)
    action = np.concatenate([action, [0]]) 
    
    # this executes the action and returns observation and reward etc
    observation, reward, terminated, truncated, info = env.step(action)

    # we stop the loop if we terminate or run out of time (truncated)
    if terminated or truncated:
        break
    
print(f'Episode ended because it was terminated: {terminated} or truncated: {truncated}')
env.play()

OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'


Episode ended because it was terminated: False or truncated: True
Moviepy - Building video temp-{start}.mp4.
Moviepy - Writing video temp-{start}.mp4



                                                               

Moviepy - Done !
Moviepy - video ready temp-{start}.mp4
