# Examples

## Example #1
In this example we use **Fictitious Play** to solve the **Beach Bar** environment. Passing ``verbose=True`` together with ``print_every=5`` prints a status line every five iterations.

In [None]:
from mfglib.env import Environment
from mfglib.alg import FictitiousPlay

env = Environment.beach_bar()
alg = FictitiousPlay(alpha=0.13)

_ = alg.solve(env, verbose=True, print_every=5)
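
For readers unfamiliar with the algorithm, the sketch below illustrates the averaging idea behind fictitious play on a made-up, static two-location game. It is not MFGLib's implementation: the game, the helper names (`rewards`, `best_response`, `exploitability`, `fictitious_play`), and the reading of `alpha` as a fixed averaging weight are all assumptions made for illustration.

```python
def rewards(mu, base=(1.0, 0.5)):
    """Crowd-averse reward: a location is worth less the more crowded it is."""
    return [b - m for b, m in zip(base, mu)]


def best_response(mu):
    """Distribution that puts all mass on the currently most rewarding location."""
    r = rewards(mu)
    best = r.index(max(r))
    return [1.0 if s == best else 0.0 for s in range(len(mu))]


def exploitability(mu):
    """Best achievable reward minus the population's average reward."""
    r = rewards(mu)
    return max(r) - sum(m * x for m, x in zip(mu, r))


def fictitious_play(n_iter=200, alpha=None):
    """Average best responses; alpha=None uses the decaying weight 1 / (n + 1)."""
    mu = [0.5, 0.5]  # hypothetical starting distribution over the two locations
    expls = [exploitability(mu)]
    for n in range(1, n_iter + 1):
        weight = alpha if alpha is not None else 1.0 / (n + 1)
        br = best_response(mu)
        # Move the population distribution a small step toward the best response
        mu = [(1 - weight) * m + weight * b for m, b in zip(mu, br)]
        expls.append(exploitability(mu))
    return mu, expls


mu, expls = fictitious_play()
print([round(m, 3) for m in mu], round(expls[-1], 4))
```

In this toy game the two rewards are equal when three quarters of the population sits at the first location, so the iterates should settle near that split while the exploitability shrinks toward zero.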

## Example #2

In this example we demonstrate how to use the tuning API to search for good hyperparameters for **MF-OMI** (the `OccupationMeasureInclusion` class) on the **Rock Paper Scissors** environment. Although we use **MF-OMI** here, **all** algorithms can benefit from tuning.

In [None]:
import matplotlib.pyplot as plt
import optuna

from mfglib.env import Environment
from mfglib.alg import OccupationMeasureInclusion
from mfglib.tuning import GeometricMean

# By default, optuna displays logs. This silences them.
optuna.logging.set_verbosity(optuna.logging.WARNING)

env = Environment.rock_paper_scissors()
# Initialize the algorithm with an arbitrary, untuned alpha
alg_orig = OccupationMeasureInclusion(alpha=0.09)

# atol=None and rtol=None turn off tolerance-based early stopping
_, expls_orig, _ = alg_orig.solve(env, atol=None, rtol=None)

# tune() returns an optuna.Study object
study = alg_orig.tune(metric=GeometricMean(shift=0.5), envs=[env], n_trials=80)

# which we can use to initialize a new instance
alg_tuned = alg_orig.from_study(study)

_, expls_tuned, _ = alg_tuned.solve(env, atol=None, rtol=None)

plt.xlabel("Iteration")
plt.ylabel("Exploitability")
plt.plot(expls_orig, label="Original")
plt.plot(expls_tuned, label="Tuned")
# Use a base-2 logarithmic y-axis
plt.yscale("log", base=2)
plt.grid()
plt.legend();
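
The metric passed to `tune()` above is `GeometricMean(shift=0.5)`. As background, the sketch below computes the standard shifted geometric mean of a list of non-negative numbers. The helper name `shifted_geometric_mean` is hypothetical, and this section does not state which quantity MFGLib aggregates with this metric or exactly how it applies `shift`, so treat the code as an illustration of the general statistic only.

```python
import math


def shifted_geometric_mean(values, shift=0.0):
    """Return exp(mean(log(v + shift))) - shift for a non-empty list of values."""
    if not values:
        raise ValueError("values must be non-empty")
    if any(v + shift <= 0 for v in values):
        raise ValueError("every shifted value must be strictly positive")
    mean_log = sum(math.log(v + shift) for v in values) / len(values)
    return math.exp(mean_log) - shift


# Made-up numbers, purely for illustration
samples = [0.5, 1.5]
print(shifted_geometric_mean(samples, shift=0.5))
```

A positive shift keeps the statistic defined when some values are zero and reduces the influence of values very close to zero, which is the usual reason for preferring it to the plain geometric mean.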

## Example #3
In this example we use **Online Mirror Descent** to solve the **Building Evacuation** environment and visualize how the population behaves under the best policy found.

In [None]:
import numpy as np

from mfglib.env import Environment
from mfglib.alg import OnlineMirrorDescent
from mfglib.mean_field import mean_field

N_TIMESTEPS = 20
N_FLOORS = 3
SIZE = 8

env = Environment.building_evacuation(
    T=N_TIMESTEPS,
    n_floor=N_FLOORS,
    floor_l=SIZE,
    floor_w=SIZE,
    eta=0.1,
)
alg = OnlineMirrorDescent(alpha=5e-3)

pis, expls, _ = alg.solve(env, atol=None, rtol=None, max_iter=2000)

# Identify the index of the "best" policy, as measured by exploitability
i = np.argmin(expls)

# Compute the corresponding mean-field and state marginal
L_i = mean_field(env, pis[i])
mu_i = L_i.sum(dim=-1)

For any state $s = (x, y, z) \in \mathbb{Z}_+^3$ representing a position in the building (with $z$ the height, i.e. the floor), let

$$\mu_t^i(s) = \mu_t^i(x, y, z) = \sum_{a \in \mathcal{A}} L_t^{\pi^i}(x, y, z, a)$$

denote the population state marginal at time $t$ of the induced mean-field $L^{\pi^i}$, where $\pi^i$ is the $i$-th policy iterate. Below we plot the population's state distribution at several times $t$ as heatmaps.
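
Before plotting, the marginalization in the formula above can be checked on a toy example. The sketch below uses a made-up mean-field stored as nested lists `L[t][s][a]` with a flat state index, which is an assumption for illustration. The notebook itself works with tensors and uses `L_i.sum(dim=-1)` for the same operation.

```python
# Toy mean-field L[t][s][a]: 2 time steps, 3 states, 2 actions (made-up numbers).
# At each time step the entries form a joint distribution over (state, action).
L = [
    [[0.2, 0.1], [0.3, 0.1], [0.2, 0.1]],
    [[0.0, 0.1], [0.4, 0.2], [0.1, 0.2]],
]

# Sum out the action axis: mu[t][s] = sum over a of L[t][s][a]
mu = [[sum(actions) for actions in states] for states in L]

for t, mu_t in enumerate(mu):
    # Each state marginal should itself be a probability distribution
    print(t, [round(m, 3) for m in mu_t], "total mass:", round(sum(mu_t), 3))
```

Because each time slice of the mean-field sums to one, each state marginal sums to one as well, which is a quick sanity check to run on `mu_i` before plotting.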

In [None]:
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

T_PNTS = [0, N_TIMESTEPS // 2, N_TIMESTEPS]

vmin = mu_i[T_PNTS].min().item()
vmax = mu_i[T_PNTS].max().item()

# Offset vmin slightly so the log scale stays defined when the minimum is exactly zero
norm = LogNorm(vmin=vmin + 1e-12, vmax=vmax, clip=True)

fig, axs = plt.subplots(
    nrows=len(T_PNTS), ncols=N_FLOORS, sharex=True, sharey=True, layout="constrained"
)
ims = []
for j, t in enumerate(T_PNTS):
    for z in range(N_FLOORS):
        im = axs[j, z].imshow(mu_i[t, z], norm=norm)
        ims.append(im)
        axs[j, z].set_xlabel("x")
        axs[j, z].set_ylabel("y")
        axs[j, z].set_title(rf"$\mu_{{{t}}}^i(x, y, {z})$")

fig.colorbar(ims[0], ax=axs.ravel().tolist())

plt.show()

In the top row the population is evenly distributed throughout the building. In the middle row the population is making its way to the stairs. The bottom row shows that the building is fully evacuated at time $t = 20$.