# Lunar Lander Upgraded: Migrating from CMA-ME to CMA-MAE

In the [previous tutorial](https://docs.pyribs.org/en/latest/tutorials/lunar_lander.html), we showed how to implement the CMA-ME algorithm in pyribs to tackle the lunar lander problem. CMA-ME enabled us to search for a diverse collection of high-performing lunar lander agents, including agents which landed like a space shuttle as shown below.

In [1]:
from IPython.display import display, HTML
display(HTML("""<video width="360" height="auto" controls><source src="https://raw.githubusercontent.com/icaros-usc/pyribs/master/docs/_static/imgs/lunar-lander-left.mp4" type="video/mp4" /></video>"""))

Recent work introduced [Covariance Matrix Adaptation MAP-Annealing (CMA-MAE)](https://arxiv.org/abs/2205.10752), an algorithm which builds on and improves CMA-ME. CMA-MAE not only has strong theoretical guarantees; it also empirically outperforms CMA-ME in a variety of domains. In this tutorial, we show how to implement this more advanced algorithm in pyribs.

_Below: CMA-MAE vs CMA-ME on the 100-dimensional sphere linear projection benchmark described in [Fontaine 2020](https://arxiv.org/abs/1912.02400). We can see how CMA-MAE does a much better job of populating the archive with high-performing solutions._

In [2]:
display(HTML("""<video width="720" height="auto" autoplay muted playsinline loop><source src="https://raw.githubusercontent.com/icaros-usc/pyribs/master/docs/_static/imgs/cma-mae-vs-cma-me-imp.mp4" type="video/mp4" /></video>"""))

## Setup

As in the previous tutorial, let's begin by installing and importing the necessary dependencies.

In [None]:
%pip install ribs[visualize] gymnasium[box2d]==0.27.0 "moviepy>=1.0.0"

# An uninstalled version of decorator is occasionally loaded. This loads the
# newly installed version of decorator so that moviepy works properly -- see
# https://github.com/Zulko/moviepy/issues/1625
import importlib
import decorator
importlib.reload(decorator)

import gymnasium as gym
import time
import numpy as np
import matplotlib.pyplot as plt

## Problem Description

We adopt the same lunar lander problem as described in the first lunar lander tutorial. To recap, in this problem, the objective is provided by the lunar lander environment, which rewards the agent for making a smooth landing (albeit not necessarily on the landing pad in the environment). Meanwhile, there are two measures: the $x$-position when the lunar lander first impacts the ground, and the $y$-velocity when the lunar lander first impacts the ground.

In [13]:
env = gym.make("LunarLander-v2")
seed = 52
action_dim = env.action_space.n
obs_dim = env.observation_space.shape[0]

def simulate(env, model, seed=None):
    """Simulates the lunar lander model.

    Args:
        env (gym.Env): A lunar lander environment.
        model (np.ndarray): The array of weights for the linear policy. The
            weights are passed in as a 1D array and reshaped into a matrix.
        seed (int): The seed for the environment.
    Returns:
        total_reward (float): The reward accrued by the lander throughout its
            trajectory.
        impact_x_pos (float): The x position of the lander when it touches the
            ground for the first time.
        impact_y_vel (float): The y velocity of the lander when it touches the
            ground for the first time.
    """
    action_dim = env.action_space.n
    obs_dim = env.observation_space.shape[0]
    model = model.reshape((action_dim, obs_dim))

    total_reward = 0.0
    impact_x_pos = None
    impact_y_vel = None
    all_y_vels = []
    obs, _ = env.reset(seed=seed)
    done = False

    while not done:
        action = np.argmax(model @ obs)  # Linear policy.
        obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        total_reward += reward

        # Refer to the definition of state here:
        # https://gymnasium.farama.org/environments/box2d/lunar_lander/
        x_pos = obs[0]
        y_vel = obs[3]
        leg0_touch = bool(obs[6])
        leg1_touch = bool(obs[7])
        all_y_vels.append(y_vel)

        # Check if the lunar lander is impacting for the first time.
        if impact_x_pos is None and (leg0_touch or leg1_touch):
            impact_x_pos = x_pos
            impact_y_vel = y_vel

    # If the lunar lander did not land, set the x-pos to the one from the final
    # timestep, and set the y-vel to the fastest downward velocity seen during
    # the episode (this is the min, since downward velocities are negative).
    if impact_x_pos is None:
        impact_x_pos = x_pos
        impact_y_vel = min(all_y_vels)

    return total_reward, impact_x_pos, impact_y_vel
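
To make the linear policy inside `simulate` concrete, here is a minimal sketch that uses only NumPy and no environment. It assumes the 4 discrete actions and 8-dimensional observation of the lunar lander environment; the weights and the observation are made-up illustrative values, not taken from any trained agent.

```python
import numpy as np

# Assumed sizes for the lunar lander: 4 discrete actions, 8 observation entries.
action_dim, obs_dim = 4, 8

# A flat solution vector, in the same 1D layout that simulate() receives.
flat_model = np.zeros(action_dim * obs_dim)
flat_model[2 * obs_dim + 3] = -1.0  # Row 2 (action 2) reacts to obs[3] (y-velocity).

# Reshape into one row of weights per action, as done in simulate().
weights = flat_model.reshape((action_dim, obs_dim))

# Hypothetical observation: the lander is falling (negative y-velocity).
obs = np.zeros(obs_dim)
obs[3] = -0.5

scores = weights @ obs            # One score per action.
action = int(np.argmax(scores))   # Pick the highest-scoring action.
print(scores, action)
```

Only action 2 receives a positive score here, so the policy selects it. Every solution in the archive is simply a different setting of these 32 weights.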

In [14]:
from ribs.archives import GridArchive
initial_model = np.zeros((action_dim, obs_dim))

archive = GridArchive(
    solution_dim=initial_model.size,  # Dimensionality of solutions in the archive.
    dims=[50, 50],  # 50 cells along each dimension.
    ranges=[(-1.0, 1.0), (-3.0, 0.0)],  # (-1, 1) for x-pos and (-3, 0) for y-vel.
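    qd_score_offset=-300,  # Offset subtracted from each objective when computing the QD score.
    learning_rate=0.01,  # Learning rate for the threshold updates of CMA-MAE.
    threshold_min=-300,  # Initial threshold value for every cell.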
)

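# Secondary archive that stores the best-performing solution found in each
# cell, since the archive above does not necessarily retain it.
result_archive = GridArchive(
    solution_dim=initial_model.size,  # Dimensionality of solutions in the archive.
    dims=[50, 50],  # 50 cells along each dimension.
    ranges=[(-1.0, 1.0), (-3.0, 0.0)],  # (-1, 1) for x-pos and (-3, 0) for y-vel.
    qd_score_offset=-300,  # Offset subtracted from each objective when computing the QD score.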
)
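
The `learning_rate` and `threshold_min` arguments are what distinguish the CMA-MAE archive from the one used for CMA-ME. As described in the CMA-MAE paper, each cell keeps an acceptance threshold that starts at `threshold_min` and moves toward the objective of each accepted solution at a rate set by the learning rate. The following is a minimal sketch of that rule for a single cell in plain Python. The function name `update_threshold` is hypothetical, and this is not the pyribs implementation (pyribs processes solutions in batches, which the sketch ignores).

```python
def update_threshold(threshold, objective, learning_rate):
    """Returns (accepted, new_threshold) for one cell and one candidate.

    Hypothetical helper illustrating the CMA-MAE rule: a candidate is accepted
    only if it beats the cell's threshold, and the threshold then moves a
    fraction `learning_rate` of the way toward the candidate's objective.
    """
    if objective <= threshold:
        return False, threshold
    new_threshold = (1.0 - learning_rate) * threshold + learning_rate * objective
    return True, new_threshold


threshold = -300.0  # Same value as threshold_min above.
for objective in [100.0, 100.0, 50.0]:
    accepted, threshold = update_threshold(threshold, objective, learning_rate=0.01)
    print(accepted, round(threshold, 4))
```

With a learning rate of 0.01, all three candidates are accepted because the threshold rises slowly, so the emitters keep being rewarded for revisiting the same cell. With a learning rate of 1.0, the threshold would jump straight to 100 after the first candidate and the next two would be rejected, which corresponds to the CMA-ME behavior.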

In [15]:
from ribs.emitters import EvolutionStrategyEmitter

emitters = [
    EvolutionStrategyEmitter(
        archive=archive,
        x0=initial_model.flatten(),
        sigma0=1.0,  # Initial step size.
        ranker="imp",
        selection_rule="mu",
        restart_rule="basic",
        batch_size=30,  # If we do not specify a batch size, the emitter will
                        # automatically use a batch size equal to the default
                        # population size of CMA-ES.
    ) for _ in range(5)  # Create 5 separate emitters.
]
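
The `ranker="imp"` and `selection_rule="mu"` settings control which sampled solutions guide the next CMA-ES update. As a rough illustration, assume that the improvement of a solution is its objective minus the threshold of the cell it lands in (as in the CMA-MAE paper), and that the "mu" rule keeps the top half of each batch. Under those assumptions, the selection can be sketched with NumPy as below. The arrays hold made-up values, and the sketch is a simplification rather than the pyribs implementation.

```python
import numpy as np

# Made-up objectives for a batch of 6 solutions and the thresholds of the
# cells they fall into (-300 marks a cell that has not been visited yet).
objectives = np.array([120.0, -50.0, 200.0, 10.0, 150.0, -250.0])
thresholds = np.array([100.0, -300.0, 210.0, -300.0, -300.0, -300.0])

# Improvement over the cell threshold; larger is better.
improvements = objectives - thresholds

# Rank the batch from largest to smallest improvement.
ranking = np.argsort(-improvements)

# Assumed "mu" rule: the top half of the batch become parents.
num_parents = len(objectives) // 2
parents = ranking[:num_parents]
print(improvements, parents)
```

Note that the solution with the highest raw objective (200, at index 2) ranks last because it does not beat the threshold of its cell, while solutions that reach unvisited cells rank first. This is how the ranking steers the search toward new or under-optimized regions of the archive.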

In [16]:
from ribs.schedulers import Scheduler

scheduler = Scheduler(archive, emitters, result_archive=result_archive)
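
The coverage and QD score logged during the run below can be worked out by hand. As a sketch, assuming that coverage is the fraction of occupied cells and that the QD score sums each elite's objective after subtracting `qd_score_offset`, the following uses three made-up objective values on the same 50 x 50 grid.

```python
import numpy as np

num_cells = 50 * 50        # Same grid size as the archives above.
qd_score_offset = -300.0   # Same offset as the archives above.

# Made-up objectives of the elites in three occupied cells.
elite_objectives = np.array([250.0, -120.0, 10.0])

coverage = len(elite_objectives) / num_cells
qd_score = float(np.sum(elite_objectives - qd_score_offset))
print(coverage, qd_score)
```

The elite with objective -120 still contributes 180 to the QD score, so filling a new cell does not lower the score as long as objectives stay above the offset. The same arithmetic applies to the log: the first entry reports 623 elites, and 623 / 2500 = 0.2492 matches the logged coverage.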

In [17]:
start_time = time.time()
total_itrs = 500

for itr in range(1, total_itrs + 1):
    # Request models from the scheduler.
    sols = scheduler.ask()

    # Evaluate the models and record the objectives and measures.
    objs, meas = [], []
    for model in sols:
        obj, impact_x_pos, impact_y_vel = simulate(env, model, seed)
        objs.append(obj)
        meas.append([impact_x_pos, impact_y_vel])

    # Send the results back to the scheduler.
    scheduler.tell(objs, meas)

    # Logging.
    if itr % 25 == 0:
        print(f"> {itr} itrs completed after {time.time() - start_time:.2f}s")
        # Number of elites in the archive.
        print(f"  - Size: {archive.stats.num_elites}")
        # Proportion of archive cells which have an elite.
        print(f"  - Coverage: {archive.stats.coverage}")
        # QD score, i.e., the sum of the objective values of all elites in the
        # archive, with qd_score_offset subtracted from each objective.
        print(f"  - QD Score: {archive.stats.qd_score}")
        # Maximum objective value in the archive.
        print(f"  - Max Score: {archive.stats.obj_max}")

> 25 itrs completed after 479.12s
  - Size: 623
  - Coverage: 0.2492
  - QD Score: 140586.6048369406
  - Max Score: 294.2905210971211
> 50 itrs completed after 1720.96s
  - Size: 796
  - Coverage: 0.3184
  - QD Score: 188022.81781493407
  - Max Score: 310.3286522183843
> 75 itrs completed after 2969.42s
  - Size: 886
  - Coverage: 0.3544
  - QD Score: 236476.54234859342
  - Max Score: 314.951865182828
> 100 itrs completed after 4609.08s
  - Size: 946
  - Coverage: 0.3784
  - QD Score: 265138.7602530137
  - Max Score: 314.951865182828
> 125 itrs completed after 5820.52s
  - Size: 980
  - Coverage: 0.392
  - QD Score: 277482.19482221134
  - Max Score: 314.951865182828
> 150 itrs completed after 6952.63s
  - Size: 1005
  - Coverage: 0.402
  - QD Score: 296550.4362170957
  - Max Score: 314.951865182828
> 175 itrs completed after 7978.13s
  - Size: 1032
  - Coverage: 0.4128
  - QD Score: 309914.090188502
  - Max Score: 314.951865182828
> 200 itrs completed after 8973.91s
  - Size: 1073
  - 