# Demonstrations of Single-Player Simulations for Non-Stationary Bandits

This notebook shows how to 1) **define**, 2) **launch**, and 3) **plot the results** of numerical simulations of piecewise-stationary multi-armed bandit problems using my framework [SMPyBandits](https://github.com/SMPyBandits/SMPyBandits).
For details on the mathematics behind this problem, see the documentation page [SMPyBandits.GitHub.io/NonStationaryBandits.html](https://smpybandits.github.io/NonStationaryBandits.html).

First, make sure you are in the main folder or have [SMPyBandits](https://github.com/SMPyBandits/SMPyBandits) installed, then import `Evaluator` from the `Environment` package.

<span style="color:red">WARNING</span>
If you are running this notebook locally from the [`notebooks`](https://github.com/SMPyBandits/SMPyBandits/tree/master/notebooks) folder of the [`SMPyBandits`](https://github.com/SMPyBandits/SMPyBandits/) source, you first need to run:

In [7]:
import sys
import os
sys.path.insert(0, '..')
try:
    import SMPyBandits
except ImportError:
    !pip3 install SMPyBandits
import numpy as np
FIGSIZE = (19.80, 10.80)
DPI = 160
# Large figures for pretty notebooks
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = FIGSIZE
mpl.rcParams['figure.dpi'] = DPI
# Local imports
from SMPyBandits.Environment import Evaluator, tqdm
# Import arms
from SMPyBandits.Arms import Op
from SMPyBandits.Arms.cma import Op_cmaes, generate_arm_pic, generate_Op_cmaes_problem
# Import algorithms
from SMPyBandits.Policies import *

from multiprocessing import cpu_count
CPU_COUNT = cpu_count()
N_JOBS = CPU_COUNT if CPU_COUNT <= 4 else CPU_COUNT - 4

print("Using {} jobs in parallel...".format(N_JOBS))
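
The rule above uses every core on small machines and leaves four cores free on larger ones. Here is a minimal sketch of the same rule as a reusable function. `pick_n_jobs` and its `reserve` parameter are hypothetical names introduced for illustration; they are not part of SMPyBandits.

```python
from multiprocessing import cpu_count


def pick_n_jobs(n_cpus, reserve=4):
    """Return the number of parallel jobs for a machine with n_cpus cores.

    Hypothetical helper: with at most `reserve` cores, use all of them;
    otherwise leave `reserve` cores free for the rest of the system.
    """
    if n_cpus <= reserve:
        return n_cpus
    return n_cpus - reserve


if __name__ == "__main__":
    print("Using {} jobs in parallel...".format(pick_n_jobs(cpu_count())))
```

Note that with this rule a machine with 5 to 7 cores runs fewer jobs than a 4-core machine, which may or may not be what you want.
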
problem = 'F18'
dimension = 10
n_arms = 100
HORIZON = n_arms * 30
n_horizon = HORIZON
REPETITIONS = 100

print("Using T = {}, and N = {} repetitions".format(HORIZON, REPETITIONS))
from SMPyBandits.Arms.Sampler import uniform_sampler
from SMPyBandits.Arms.Problem import Problem
# Output folder for this problem, created before anything is generated or saved
path = f'./saved_problems/{problem}/{dimension}/{n_arms}/'
os.makedirs(path, exist_ok=True)
# Build the arms of the chosen problem and draw a picture of them
arms = generate_Op_cmaes_problem(problem, dimension, n_arms, n_horizon, load=True, save=True)
generate_arm_pic(problem, dimension, n_arms)
# A list keeps the arms in order; `{*arms}` would build an unordered set
ENVIRONMENT_0 = list(arms)
ENVIRONMENTS = [
    ENVIRONMENT_0,
]

NB_ARMS = n_arms
# Create record.txt, or empty it if it already exists
open('record.txt', 'w').close()
POLICIES = [
    # Regular stochastic bandit algorithms
    {"archtype": UCBH, "params": {"horizon": HORIZON}},
    {"archtype": UCBalpha, "params": {}},
    # Experimental variant of the sliding-window algorithm that knows the horizon
    {"archtype": SWUCBPlus, "params": {"horizon": HORIZON, "alpha": 4.0}},
    {"archtype": EpsilonGreedy, "params": {}},
    {"archtype": MaxMedian, "params": {"budget": HORIZON}},
    {"archtype": Qomax_SDA, "params": {"budget": HORIZON, "q": 0.5}},
    {"archtype": MaximumBandit, "params": {"budget": HORIZON}},
    {"archtype": Uniform, "params": {}},
]
# POLICIES = [
#         { "archtype": MaximumBandit, "params": {"budget": HORIZON,} }
#     ]
# POLICIES = [
#         { "archtype": Qomax_SDA, "params": {"budget": HORIZON, "q": 0.5} }
#     ]
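
Each entry of `POLICIES` pairs a class (`"archtype"`) with its keyword arguments (`"params"`). Here is a minimal sketch of how such a list can be turned into policy objects, assuming each class takes the number of arms as its first positional argument. `DummyPolicy` and `build_policies` are hypothetical stand-ins for illustration, not SMPyBandits code.

```python
class DummyPolicy:
    """Hypothetical stand-in for a bandit policy class."""

    def __init__(self, nb_arms, **params):
        self.nb_arms = nb_arms
        self.params = params


def build_policies(policy_configs, nb_arms):
    """Instantiate one policy per {"archtype": cls, "params": kwargs} entry."""
    return [cfg["archtype"](nb_arms, **cfg["params"]) for cfg in policy_configs]


if __name__ == "__main__":
    configs = [
        {"archtype": DummyPolicy, "params": {}},
        {"archtype": DummyPolicy, "params": {"budget": 3000, "q": 0.5}},
    ]
    for policy in build_policies(configs, nb_arms=100):
        print(type(policy).__name__, policy.nb_arms, policy.params)
```

Keeping the class and its parameters as plain data makes it easy to comment entries in or out, as done with the alternative `POLICIES` lists above.
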

configuration = {
    # --- Duration of the experiment
    "horizon": HORIZON,
    # --- Number of repetitions of the experiment (to average the results)
    "repetitions": REPETITIONS,
    # --- Parameters for the use of joblib.Parallel
    "n_jobs": N_JOBS,    # number of parallel jobs (at most the number of CPU cores)
    "verbosity": 0,      # joblib verbosity (0 is silent)
    # --- Arms
    "environment": ENVIRONMENTS,
    # --- Algorithms
    "policies": POLICIES,
    # --- Random events
    "nb_break_points": 0,
    # --- Should we plot the lower-bounds or not?
    "plot_lowerbound": False,  # XXX Default
    "path": path,
}
# (almost) unique hash from the configuration
hashvalue = abs(hash((tuple(configuration.keys()), tuple([(len(k) if isinstance(k, (dict, tuple, list)) else k) for k in configuration.values()]))))
print("This configuration has a hash value = {}".format(hashvalue))
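
Python's built-in `hash` is salted per process for strings unless `PYTHONHASHSEED` is set, so the value printed above generally differs from one run to the next. If a reproducible identifier is wanted, for example in figure names, one option is a digest from `hashlib`. Here is a minimal sketch. `stable_config_hash` is a hypothetical helper, and it summarizes containers by their length, as the expression above does.

```python
import hashlib


def stable_config_hash(configuration, length=16):
    """Return a hex digest of the configuration that is stable across runs."""
    summary = []
    for key, value in configuration.items():
        # Containers are summarized by their length, other values are kept as-is
        if isinstance(value, (dict, tuple, list)):
            value = len(value)
        summary.append((key, value))
    digest = hashlib.sha256(repr(summary).encode("utf-8")).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    example = {"horizon": 3000, "repetitions": 100, "policies": [1, 2, 3]}
    print("This configuration has a stable hash = {}".format(stable_config_hash(example)))
```

Like the original expression, this sketch cannot tell apart two configurations whose lists have the same length but different contents.
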

import os, os.path

subfolder = "SP__K{}_T{}_N{}__{}_algos".format(NB_ARMS, HORIZON, REPETITIONS, len(POLICIES))
PLOT_DIR = "plots"
plot_dir = os.path.join(PLOT_DIR, subfolder)

# Create the sub folder
if os.path.isdir(plot_dir):
    print("{} is already a directory here...".format(plot_dir))
elif os.path.isfile(plot_dir):
    raise ValueError("[ERROR] {} is a file, cannot use it as a directory !".format(plot_dir))
else:
    os.makedirs(plot_dir)

print("Using sub folder = '{}' and plotting in '{}'...".format(subfolder, plot_dir))
mainfig = os.path.join(plot_dir, "main")
print("Using main figure name as '{}_{}'...".format(mainfig, hashvalue))

evaluation = Evaluator(configuration)

def printAll(evaluation, envId):
    print("\nGiving the vector of final regrets ...")
    evaluation.printLastRegrets(envId)
    print("\nGiving the final ranks ...")
    evaluation.printFinalRanking(envId)
    print("\nGiving the mean and std running times ...")
    evaluation.printRunningTimes(envId)
    print("\nGiving the mean and std memory consumption ...")
    evaluation.printMemoryConsumption(envId)

envId = 0
env = evaluation.envs[envId]

# Evaluate just that env
evaluation.startOneEnv(envId, env)

def plotAll(evaluation, envId, mainfig=None):
    savefig = mainfig
    # if savefig is not None: savefig = "{}__LastRegrets__env{}-{}".format(mainfig, envId+1, len(evaluation.envs))
    # print("\nPlotting a boxplot of the final regrets ...")
    # evaluation.plotLastRegrets(envId, boxplot=True, savefig=savefig)

    # if savefig is not None: savefig = "{}__RunningTimes__env{}-{}".format(mainfig, envId+1, len(evaluation.envs))
    # print("\nPlotting the mean and std running times ...")
    # evaluation.plotRunningTimes(envId, savefig=savefig)

    # if savefig is not None: savefig = "{}__MemoryConsumption__env{}-{}".format(mainfig, envId+1, len(evaluation.envs))
    # print("\nPlotting the mean and std memory consumption ...")
    # evaluation.plotMemoryConsumption(envId, savefig=savefig)

    # if savefig is not None: savefig = "{}__Regrets__env{}-{}".format(mainfig, envId+1, len(evaluation.envs))
    # print("\nPlotting the mean regrets ...")
    # evaluation.plotRegrets(envId, savefig=savefig)

    if savefig is not None: savefig = "{}__MeanReward__env{}-{}".format(mainfig, envId+1, len(evaluation.envs))
    print("\nPlotting the mean rewards ...")
    fig = evaluation.plotRegrets(envId, meanReward=True, savefig=savefig)
    fig.savefig(f'{path}mean.pdf')

    if savefig is not None: savefig = "{}__MaxReward__env{}-{}".format(mainfig, envId+1, len(evaluation.envs))
    print("\nPlotting the max rewards ...")
    fig = evaluation.plotRegrets(envId, maxReward=True, savefig=savefig)
    fig.savefig(f'{path}max.pdf')

    # if savefig is not None: savefig = "{}__LastRegrets__env{}-{}".format(mainfig, envId+1, len(evaluation.envs))
    # print("\nPlotting a histogram of the final regrets ...")
    # evaluation.plotLastRegrets(envId, subplots=True, sharex=True, sharey=False, savefig=savefig)

envId = 0
_ = plotAll(evaluation, envId, mainfig=mainfig)

Using 4 jobs in parallel...
Using T = 3000, and N = 100 repetitions


FileNotFoundError: [Errno 2] No such file or directory: './saved_problems/F20/10/pickle/0.pickle'
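
The traceback above reports a missing pickle file under `./saved_problems/`. A common way to avoid this kind of failure is to load a cached file only when it exists, and to generate and save it otherwise. Here is a minimal sketch of that pattern. `load_or_generate` and `make_problem` are hypothetical names, and this is not how `generate_Op_cmaes_problem` is implemented.

```python
import os
import pickle
import tempfile


def load_or_generate(pickle_path, generate):
    """Load an object from pickle_path, or build it with generate() and cache it."""
    if os.path.isfile(pickle_path):
        with open(pickle_path, "rb") as handle:
            return pickle.load(handle)
    obj = generate()
    # Create the parent folder first, so the save cannot fail on a missing directory
    os.makedirs(os.path.dirname(pickle_path) or ".", exist_ok=True)
    with open(pickle_path, "wb") as handle:
        pickle.dump(obj, handle)
    return obj


if __name__ == "__main__":
    def make_problem():
        # Hypothetical placeholder for an expensive problem generator
        return {"n_arms": 100, "dimension": 10}

    with tempfile.TemporaryDirectory() as tmp:
        cache = os.path.join(tmp, "pickle", "0.pickle")
        first = load_or_generate(cache, make_problem)   # generated, then saved
        second = load_or_generate(cache, make_problem)  # loaded from disk
        print(first == second)
```
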