# **2023 NeurIPS - MyoChallenge: Towards Human-Level Dexterity and Agility**


The challenge consists of developing controllers for physiologically realistic musculoskeletal models to solve dexterous manipulation and locomotion tasks:

- A) Manipulation task -- Interact with an object and relocate it (`myoChallengeRelocateP1-v0`).

- B) Locomotion/Chase-Tag task -- Chase an opponent (`myoChallengeChaseTagP1-v0`).

## **1. Installing Dependencies**

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%cd '/content/drive/MyDrive/myochallenge2023'
!pip install -r requirements.txt

In [3]:
import os
import sys
import logging
from math import inf
from base64 import b64encode

import numpy as np
import skvideo.io
import pprint
%env MUJOCO_GL=egl
import mujoco
import gym
from gym.envs.registration import register
from IPython.display import HTML
from tqdm.notebook import tqdm  # tqdm_notebook is deprecated
import warnings

import myosuite
import deprl

warnings.filterwarnings("ignore", category=DeprecationWarning)

env: MUJOCO_GL=egl
MyoSuite:> Registering Myo Envs


## **2. Manipulation Task**

For this project, we focus on the **Manipulation** task. The goal is to find a good reward function. We use OpenAI's `gym` interface and `myosuite` to create the `myoChallengeRelocateP1-v0` environment.

**NOTE**: This task requires `myosuite==1.7.0`.

### **2.1 Creating the Environment**

In [4]:
env = gym.make('myoChallengeRelocateP1-v0')

    MyoSuite: A contact-rich simulation suite for musculoskeletal motor control
        Vittorio Caggiano, Huawei Wang, Guillaume Durandau, Massimo Sartori, Vikash Kumar
        L4DC-2019 | https://sites.google.com/view/myosuite
    


### **2.2 Running the Task**

In [5]:
# utils.py
class SuppressOutput():
  def __enter__(self):
    self.stdout = sys.stdout
    sys.stdout = open(os.devnull, 'w')


  def __exit__(self, exc_type, exc_val, exc_tb):
    sys.stdout.close()
    sys.stdout = self.stdout

In [6]:
# myochallenge.py
class MyoChallengeEnvV0():

  def __init__(self, env, policy):
    self.env = env
    self.policy = policy


  def run_episodes(self, n_eps, policy=True):
    frames = []
    paths = []

    with tqdm(total=n_eps, desc='Episodes') as pbar_total:
      for ep in range(n_eps):
        path = dict()

        with SuppressOutput():
          state = self.env.reset()

        while True:
          if policy:
            action = self.policy(state)
          else:
            action = self.env.action_space.sample()

          frame = self.env.sim.renderer.render_offscreen(width=400,
                                                        height=400,
                                                        camera_id=0)
          frames.append(frame)

          # Use the wrapped environment rather than the global `env`.
          obs_dict = self.env.get_obs_dict(self.env.sim)
          rwd_dict = dict(self.env.get_reward_dict(obs_dict))
          path['env_infos'] = {'obs_dict': obs_dict,
                               'rwd_dict': rwd_dict}
          paths.append(path)

          next_state, reward, done, info = self.env.step(action)
          state = next_state

          if done:
            break
        pbar_total.update(1)

    return paths, frames


In [7]:
mani_task = MyoChallengeEnvV0(env=env, policy=None)
paths, frames = mani_task.run_episodes(n_eps=5, policy=False)

Episodes:   0%|          | 0/5 [00:00<?, ?it/s]

### **2.3 Evaluation Metric**

The manipulation track uses the negative distance error at the end of the task horizon $H$ as the performance metric, where $X_{t}$ is the object position at step $t$ and $X_{goal}$ is the goal position:

$$ D_{t=H} = - |X_{t=H} - X_{goal}| $$

Additionally, a physiological metric expressed as total muscle activation will be used to monitor metabolic power. An agent will be evaluated based on its performance over both these metrics on multiple episodes with varying conditions.
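As a rough illustration of these two metrics, the sketch below computes a negative distance error at the final step of an episode and a mean muscle activation as an effort proxy. The function names, the data layout, and the exact effort definition are assumptions made for this example; the official scoring code may differ.

```python
import math


def negative_distance_error(object_positions, goal_position):
    """Negative Euclidean distance between the final object position and the goal."""
    final_position = object_positions[-1]
    return -math.dist(final_position, goal_position)


def mean_muscle_activation(activations):
    """Average activation over all steps and muscles (hypothetical effort proxy)."""
    values = [a for step in activations for a in step]
    return sum(values) / len(values)


# Toy episode: three steps of 3D object positions, two steps of two muscles.
object_positions = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.3, 0.4, 0.0)]
goal_position = (0.0, 0.0, 0.0)
activations = [[0.0, 0.2], [0.4, 0.2]]

error = negative_distance_error(object_positions, goal_position)
effort = mean_muscle_activation(activations)
print(f"Error: {error:.3f}, Effort: {effort:.3f}")
```

An error closer to zero means the object ended nearer to the goal, and a lower effort means less overall muscle activation.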

In [32]:
class MyoChallengeEvalV0():

  def show_eval_video(self, frames, video_path):
    skvideo.io.vwrite(video_path,
                      np.asarray(frames),
                      outputdict={"-pix_fmt": "yuv420p"})
    # Read the encoded video back and close the file handle afterwards.
    with open(video_path, "rb") as f:
      video_file = f.read()
    video_url = f"data:video/mp4;base64,{b64encode(video_file).decode()}"

    return HTML(f"""<video autoplay width={400} controls><source src="{video_url}"></video>""")


  def get_eval_metrics(self, n_eps, paths):
    # Get score and effort metrics (relies on the global `env` from section 2.1).
    metrics = env.get_metrics(paths)

    # Calculate neg distance metric.
    p_errors = []
    errors = []
    for p in paths:
      p_errors.append(p['env_infos']['rwd_dict']['pos_dist'])
    ep_size = max(1, len(p_errors) // n_eps)
    for i in range(0, len(p_errors), ep_size):
        subarray = p_errors[i:i + ep_size]
        errors.append(subarray[-1])
    metrics['error'] = np.mean(errors)

    # Report all metrics.
    print(f'Score: {metrics["score"]}')
    print(f'Effort: {metrics["effort"]}')
    print(f'Error: {metrics["error"]}')


In [33]:
n_eps=5
mani_eval = MyoChallengeEvalV0()
mani_eval.get_eval_metrics(n_eps, paths)
mani_eval.show_eval_video(frames, 'demos/running_task.mp4')

Score: 0.0
Effort: 0.05539180381289714
Error: -0.35603070916545904

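`get_eval_metrics` splits the per-step errors into `n_eps` equal chunks, so it recovers each episode's final error only when all episodes have the same number of steps. If episode lengths can differ, one possible alternative is to record the length of each episode and index the final step explicitly. The helper below is a hypothetical sketch: `final_values_per_episode` and `episode_lengths` are not part of the notebook or of `myosuite`.

```python
def final_values_per_episode(step_values, episode_lengths):
    """Return the last per-step value of each episode.

    step_values: flat list of per-step values, episodes concatenated in order.
    episode_lengths: number of steps in each episode (hypothetical bookkeeping).
    """
    if sum(episode_lengths) != len(step_values):
        raise ValueError("episode_lengths must sum to len(step_values)")
    finals = []
    start = 0
    for length in episode_lengths:
        if length <= 0:
            raise ValueError("each episode must contain at least one step")
        end = start + length
        finals.append(step_values[end - 1])
        start = end
    return finals


# Toy data: one episode of three steps followed by one of two steps.
step_errors = [-0.5, -0.4, -0.3, -0.9, -0.2]
final_errors = final_values_per_episode(step_errors, [3, 2])
mean_final_error = sum(final_errors) / len(final_errors)
print(final_errors, mean_final_error)
```

To use this with `run_episodes`, the per-episode step counts would need to be collected alongside `paths`.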

## **4. Solution**

### **4.1 Modify the Reward Function**

The rewards given by the environment are not the final evaluation metrics; you have to design a good reward function yourself. Please refer to [EvalAI](https://eval.ai/web/challenges/challenge-page/2105/evaluation) for the evaluation details.
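One common way to experiment with reward design is to recombine named reward terms with custom weights. The sketch below assumes the terms are available as a dictionary, like the `rwd_dict` collected in `run_episodes`. The helper `shaped_reward`, the weights, and the `act_reg` key are illustrative assumptions; only `pos_dist` appears in this notebook.

```python
def shaped_reward(rwd_dict, weights):
    """Weighted sum of named reward terms; terms missing from rwd_dict count as zero."""
    return sum(w * float(rwd_dict.get(key, 0.0)) for key, w in weights.items())


# Hypothetical weights: emphasize distance to goal, lightly penalize effort.
example_weights = {"pos_dist": 10.0, "act_reg": 1.0}

# Hypothetical reward terms for a single step.
example_terms = {"pos_dist": -0.25, "act_reg": -0.5}

example_reward = shaped_reward(example_terms, example_weights)
print(example_reward)
```

Keeping the weights in a plain dictionary makes it easy to compare several reward variants across training runs.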

### **4.2 Training New Policy**
You are free to change observations, episode length, and other details during training, but during evaluation everything is fixed to the original environments.

In [None]:
%cd '/content/drive/MyDrive/myochallenge2023/depRL/'
!python -m deprl.main experiments/myosuite_training_files/myoRelocate.yaml

### **4.3 Evaluating Trained Policy**

In [46]:
path = '/content/drive/MyDrive/myochallenge2023/baselines_DEPRL/relocatep1-v1/240509.175459/'
policy = deprl.load(path, env)

mani_deprl = MyoChallengeEnvV0(env=env, policy=policy)
paths, frames = mani_deprl.run_episodes(n_eps=5, policy=True)

Loading experiment from /content/drive/MyDrive/myochallenge2023/baselines_DEPRL/relocatep1-v1/240509.175459/checkpoints
Stochastic Switch-DEP. Paper version.

Loading weights from /content/drive/MyDrive/myochallenge2023/baselines_DEPRL/relocatep1-v1/240509.175459/checkpoints/step_1000000.pt


Episodes:   0%|          | 0/5 [00:00<?, ?it/s]

In [37]:
n_eps=5
mani_eval.get_eval_metrics(n_eps, paths)
mani_eval.show_eval_video(frames, 'demos/deprl_trained.mp4')

Score: 0.0
Effort: 0.0760883481138287
Error: -0.40258385670583197
