# Lunar Lander Upgraded: Migrating from CMA-ME to CMA-MAE

In the [previous tutorial](https://docs.pyribs.org/en/stable/tutorials/lunar_lander.html), we showed how to implement the CMA-ME algorithm in pyribs. Recent work introduced [Covariance Matrix Adaptation MAP-Annealing (CMA-MAE)](https://arxiv.org/abs/2205.10752), an algorithm that builds on and improves upon CMA-ME. CMA-MAE not only has strong theoretical guarantees but also empirically outperforms CMA-ME in a variety of domains. In this tutorial, we show how to implement this more advanced algorithm in pyribs.
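The central change in CMA-MAE is how each archive cell decides whether a new solution counts as an improvement. Below is a minimal sketch of the per-cell threshold update described in the CMA-MAE paper. It is written in plain Python and is independent of pyribs. Each cell keeps an acceptance threshold, which this sketch initializes to an assumed minimum objective of 0.0. A new solution's improvement is its objective minus that threshold. When the objective exceeds the threshold, the threshold moves toward the objective at a learning rate α. The helper `mae_update` and its dictionary of thresholds are hypothetical illustrations, not pyribs API.

```python
def mae_update(thresholds, cell, objective, learning_rate):
    """Sketch of a CMA-MAE-style acceptance-threshold update for one cell.

    Hypothetical helper, not part of pyribs.

    Args:
        thresholds: Mutable mapping from cell index to that cell's threshold.
        cell: Index of the archive cell the solution falls into.
        objective: Objective value of the new solution.
        learning_rate: The annealing rate alpha, between 0 and 1.
    Returns:
        (improvement, cleared): improvement is the objective minus the cell's
        threshold before the update; cleared says whether it exceeded it.
    """
    threshold = thresholds[cell]
    improvement = objective - threshold
    cleared = objective > threshold
    if cleared:
        # Move the threshold a fraction `learning_rate` toward the objective.
        thresholds[cell] = (
            (1.0 - learning_rate) * threshold + learning_rate * objective
        )
    return improvement, cleared


# Two solutions land in the same cell: objective 5.0, then objective 3.0.
# Thresholds start at an assumed minimum objective of 0.0.
for alpha in (1.0, 0.1, 0.0):
    thresholds = {0: 0.0}
    first = mae_update(thresholds, 0, 5.0, alpha)
    second = mae_update(thresholds, 0, 3.0, alpha)
    print(f"alpha={alpha}: first={first}, second={second}, "
          f"threshold={thresholds[0]:.2f}")
```

With α = 1, the threshold jumps straight to the new objective. This approximately reproduces CMA-ME, which only rewards solutions that beat the current occupant of a cell. With α = 0, the threshold never moves, so the improvement is just the objective shifted by a constant and the search reduces to optimizing the objective. Intermediate values anneal between these two extremes.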


## Setup

As in the previous tutorial, let's begin by installing and importing the necessary dependencies.

In [None]:
%pip install ribs[visualize] gymnasium[box2d]==0.27.0 "moviepy>=1.0.0"

# An uninstalled version of decorator is occasionally loaded. This loads the
# newly installed version of decorator so that moviepy works properly -- see
# https://github.com/Zulko/moviepy/issues/1625
import importlib
import decorator
importlib.reload(decorator)

import gymnasium as gym
import time
import numpy as np
import matplotlib.pyplot as plt
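The install command above pins Gymnasium to 0.27.0, and the `LunarLander-v2` environment ID used below may not be available in other releases. It can therefore help to confirm which versions are actually installed. Here is a small sketch that uses only the standard library's `importlib.metadata`. The helper name `installed_version` is our own.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as they appear in the %pip command above.
for package in ("ribs", "gymnasium", "moviepy"):
    print(f"{package}: {installed_version(package)}")
```

This reports what is installed on disk. If a package was upgraded after it had already been imported, the notebook kernel may need a restart before the new version is the one in use.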

## Problem Description

We adopt the same lunar lander problem as described in the first lunar lander tutorial. To recap, the objective is provided by the lunar lander environment, which rewards the agent for making a smooth landing (though not necessarily on the landing pad). Meanwhile, there are two measures: the $x$-position and the $y$-velocity of the lunar lander when it first impacts the ground.
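The two measures can also be computed from a recorded trajectory after the fact. This sketch assumes that each row of `observations` is one observation returned by `env.step`, using the LunarLander observation layout. In that layout, index 0 is the x-position, index 3 is the y-velocity, and indices 6 and 7 are the leg-contact flags. The helper `impact_measures` is hypothetical. It reproduces the rule in the `simulate` function below, including the fallback for when the lander never touches the ground.

```python
import numpy as np


def impact_measures(observations):
    """Compute (impact_x_pos, impact_y_vel) from a sequence of observations.

    Hypothetical helper mirroring the rule in `simulate`: use the x-position
    and y-velocity at the first step where either leg touches the ground; if
    no leg ever touches, use the final x-position and the minimum (most
    negative, i.e. fastest downward) y-velocity.
    """
    observations = np.asarray(observations, dtype=float)
    touching = (observations[:, 6] != 0) | (observations[:, 7] != 0)
    if touching.any():
        first = int(np.argmax(touching))  # Index of the first True entry.
        return observations[first, 0], observations[first, 3]
    return observations[-1, 0], observations[:, 3].min()


# A made-up three-step trajectory where the left leg touches down at step 2.
traj = np.zeros((3, 8))
traj[:, 0] = [0.1, 0.2, 0.3]     # x-position
traj[:, 3] = [-0.5, -1.0, -0.2]  # y-velocity
traj[2, 6] = 1.0                 # left leg contact
print(impact_measures(traj))
```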

In [None]:
env = gym.make("LunarLander-v2")
seed = 52
action_dim = env.action_space.n
obs_dim = env.observation_space.shape[0]

def simulate(env, model, seed=None):
    """Simulates the lunar lander model.

    Args:
        env (gym.Env): A lunar lander environment.
        model (np.ndarray): The array of weights for the linear policy. The
            weights are passed in as a 1D array and reshaped into a matrix.
        seed (int): The seed for the environment.

    Returns:
        total_reward (float): The reward accrued by the lander throughout its
            trajectory.
        impact_x_pos (float): The x position of the lander when it touches the
            ground for the first time.
        impact_y_vel (float): The y velocity of the lander when it touches the
            ground for the first time.
    """
    action_dim = env.action_space.n
    obs_dim = env.observation_space.shape[0]
    model = model.reshape((action_dim, obs_dim))

    total_reward = 0.0
    impact_x_pos = None
    impact_y_vel = None
    all_y_vels = []
    obs, _ = env.reset(seed=seed)
    done = False

    while not done:
        action = np.argmax(model @ obs)  # Linear policy.
        obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        total_reward += reward

        # Refer to the definition of state here:
        # https://gymnasium.farama.org/environments/box2d/lunar_lander/
        x_pos = obs[0]
        y_vel = obs[3]
        leg0_touch = bool(obs[6])
        leg1_touch = bool(obs[7])
        all_y_vels.append(y_vel)

        # Check if the lunar lander is impacting for the first time.
        if impact_x_pos is None and (leg0_touch or leg1_touch):
            impact_x_pos = x_pos
            impact_y_vel = y_vel

    # If the lunar lander never touched the ground, use the x-position from the
    # final timestep, and use the fastest downward speed seen during the
    # episode as the y-velocity (this is the minimum y-velocity, since
    # downward velocity is negative).
    if impact_x_pos is None:
        impact_x_pos = x_pos
        impact_y_vel = min(all_y_vels)

    return total_reward, impact_x_pos, impact_y_vel
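The policy in `simulate` is linear. The flat parameter vector is reshaped so that row `i` holds the weights for action `i`, and the agent takes the action with the highest score in `model @ obs`. The standalone snippet below illustrates only that step, using random weights and a made-up observation. It assumes LunarLander-v2's 4 discrete actions and 8-dimensional observation, and it does not run the environment.

```python
import numpy as np

# LunarLander-v2 has 4 discrete actions and an 8-dimensional observation.
action_dim, obs_dim = 4, 8

# A flat parameter vector like the ones passed to simulate() (random here).
rng = np.random.default_rng(42)
flat_model = rng.normal(size=action_dim * obs_dim)

# Reshape into one row of weights per action, as simulate() does.
weights = flat_model.reshape((action_dim, obs_dim))

# A made-up observation; in simulate() it comes from env.reset() / env.step().
obs = np.array([0.0, 1.4, 0.0, -0.5, 0.0, 0.0, 0.0, 0.0])

scores = weights @ obs           # One score per action.
action = int(np.argmax(scores))  # Pick the highest-scoring action.
print(weights.shape, scores.shape, action)
```

Because `reshape` uses row-major order, the first `obs_dim` entries of the flat vector become the weights for action 0, the next `obs_dim` entries become the weights for action 1, and so on.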