## Purpose
This notebook helps analyze the results produced by TeachMyAgent's experiments. It can be used to analyze the information logged during training, including the performance on test tasks for each seed, and to generate the curriculum monitoring GIFs shown on our [website](https://developmentalsystems.org/TeachMyAgent/). 

## How to use this notebook
This notebook is broken down into 4 sections:
- **Imports**: import needed packages.
- **Load Data**: load results produced by experiments and format them (e.g. calculate best seed of each experiment).
- **Plot definitions**: define all the plot functions we provide.
- **Experiment graphs**: use the previously defined functions to generate the different figures.

## Add our paper's results to your plots
In order to add the results we provide in our paper to your plots, make sure you have downloaded them:
1. Go to the `notebooks` folder
2. Make the `download_baselines.sh` script executable: `chmod +x download_baselines.sh`
3. Download results: `./download_baselines.sh`
> **_WARNING:_**  This will download a zip file weighing approximately 4.5GB. Our script then extracts it into `TeachMyAgent/data`. Once extracted, the results take up approximately 15GB. 
----

# Imports

In [None]:
import sys
import os
import random
import math
import pylab
import copy
import re
from enum import Enum
from collections import OrderedDict

import numpy as np

import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
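import matplotlib.colorbar as cbar
from matplotlib.patches import Ellipse, Rectangle  # used by draw_ellipse and the ADR curriculum plot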
import seaborn as sns
import imageio

DIV_LINE_WIDTH = 50
print(np.__version__)
print(sys.executable)
sns.set()

In [None]:
module_path = os.path.abspath(os.path.join('../'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
from TeachMyAgent.run_utils.environment_args_handler import EnvironmentArgsHandler
import TeachMyAgent.students.test_policy as test_policy
from TeachMyAgent.students.run_logs_util import get_run_logs
from TeachMyAgent.teachers.teacher_controller import param_vec_to_param_dict, param_dict_to_param_vec

# Load Data

In [None]:
def get_datasets(rootdir, name_filter=None, rename_labels=False):
    """
        Loads results of experiments.
 
        Results to load can be filtered by their name and each experiment can be associated to a label (usually ACL method's name)
 
        :param rootdir: Directory containing experiments to load (do not forget '/' at the end of the path)
        :param name_filter: String experiments to load must contain
        :param rename_labels: If True, each experiment will be associated to a label (see below). Labels are the names that will appear in plots.
        :type rootdir: str
        :type name_filter: str (or None)
        :type rename_labels: boolean
    """
    _, models_list, _ = next(os.walk(rootdir))
    print(models_list)
    for dir_name in models_list.copy():
        # Remove each directory at most once, even if it matches both exclusion rules
        if "ignore" in dir_name or (name_filter is not None and name_filter not in dir_name):
            models_list.remove(dir_name)
        
    for i,m_name in enumerate(models_list):           
        print("extracting data for {}...".format(m_name))
        m_id = m_name
        models_saves[m_id] = OrderedDict()
        models_saves[m_id]['data'] = get_run_logs(os.path.join(rootdir, m_name), book_keeping_keys='*', min_len=0)
        print("done")
        if m_name not in labels:
            if not rename_labels:
                labels[m_name] = m_name
            else:
                ##### MODIFY THIS IF YOU ADD A NEW METHOD #####
                if 'ADR' in m_name:
                    labels[m_name] = 'ADR'
                elif 'ALP-GMM' in m_name:
                    labels[m_name] = 'ALP-GMM'
                elif 'Random' in m_name:
                    labels[m_name] = 'Random'
                elif 'Covar-GMM' in m_name:
                    labels[m_name] = 'Covar-GMM'
                elif 'RIAC' in m_name:
                    labels[m_name] = 'RIAC'
                elif 'GoalGAN' in m_name:
                    labels[m_name] = 'GoalGAN'
                elif 'Self-Paced' in m_name:
                    labels[m_name] = 'Self-Paced'
                elif 'Setter-Solver' in m_name:
                    labels[m_name] = 'Setter-Solver'
                elif 'UPPER_BASELINE' in m_name:
                    labels[m_name] = 'UPPER_BASELINE'
                else:
                    labels[m_name] = m_name
                ##### MODIFY THIS IF YOU ADD A NEW METHOD #####
labels = OrderedDict()
models_saves = OrderedDict()

##### MODIFY THIS TO POINT TO YOUR DATA FOLDER #####
data_folder = "../TeachMyAgent/data/BENCHMARK/"
##### MODIFY THIS TO POINT TO YOUR DATA FOLDER #####

get_datasets(data_folder, rename_labels=True)
# get_datasets(data_folder, rename_labels=True, name_filter="parkour_RIAC_walker_type_fish") # You can also add filters

## Compute mastered tasks percentage

Compute "% of Mastered tasks" metric: percentage of test tasks (over a test set of 100 tasks) on which the agent obtained an episodic reward greater than a threshold (230).
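
As a minimal, self-contained sketch of this metric (toy rewards, not results from the paper): the flat list of test rewards is assumed to hold one block of rewards per evaluation, all on the same test set, as in the cell below. The helper name `mastered_percentages` is hypothetical, and unlike the cell below it requires the list length to be an exact multiple of the number of evaluations.

```python
import numpy as np

def mastered_percentages(flat_rewards, nb_evaluations, threshold=230):
    """Return, for each evaluation, the percentage of test tasks whose reward exceeds `threshold`."""
    rewards = np.asarray(flat_rewards, dtype=float)
    # Assumption: rewards are stored evaluation after evaluation, each on the same test set.
    per_evaluation = rewards.reshape(nb_evaluations, -1)
    return (per_evaluation > threshold).mean(axis=1) * 100

# Toy example: 2 evaluations on a 4-task test set.
toy_rewards = [100, 240, 250, 10,   # first evaluation: 2 of 4 tasks above 230
               231, 300, 229, 260]  # second evaluation: 3 of 4 tasks above 230
print(mastered_percentages(toy_rewards, nb_evaluations=2))
```

Note that the comparison is strict: a reward of exactly 230 does not count as mastered.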

In [None]:
mastered_thr = 230
for i,(m_id,label) in enumerate(labels.items()):
    print(m_id)
    runs_data = models_saves[m_id]['data']
    #collect raw perfs
    print("Seeds : " + str(len(runs_data)))
    for r,run in enumerate(runs_data):
        models_saves[m_id]['data'][r]['nb_mastered'] = []
        models_saves[m_id]['data'][r]['avg_pos_rewards'] = []
        models_saves[m_id]['data'][r]['local_rewards'] = []
        if 'env_test_rewards' in run:
            size_test_set = int(len(run['env_test_rewards'])/len(run['evaluation return']))
            for j in range(len(run['evaluation return'])):#max_epoch):
                test_data = np.array(run['env_test_rewards'][j*size_test_set:(j+1)*(size_test_set)])
                nb_mastered = len(np.where(test_data > mastered_thr)[0])
                models_saves[m_id]['data'][r]['nb_mastered'].append((nb_mastered/size_test_set)*100)
        else:
            print("Skipping seed {}".format(r))

## Compute best seeds

Get the best seed of each experiment, i.e. the seed with the highest final value of the chosen metric. It is then used to analyze test set performance and to show curricula.
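
The selection rule can be illustrated with a small sketch on toy values. It assumes the final metric value of each seed has already been extracted into a dictionary; the helper `pick_best_seed` and the numbers are hypothetical and only mirror the logic of `get_best_seed` defined below.

```python
import numpy as np

def pick_best_seed(final_values_by_seed):
    """Return (best_seed, best_value, mean, std) from a {seed: final metric value} mapping."""
    # The best seed is the one with the highest final value of the metric.
    best_seed = max(final_values_by_seed, key=final_values_by_seed.get)
    values = list(final_values_by_seed.values())
    return best_seed, final_values_by_seed[best_seed], float(np.mean(values)), float(np.std(values))

# Toy final "% of mastered tasks" for three seeds.
toy_final_values = {0: 40.0, 1: 55.0, 2: 25.0}
best_seed, best_value, mean, std = pick_best_seed(toy_final_values)
print(best_seed, best_value, mean)
```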

In [None]:
def get_best_seed(expe_name, metric="evaluation return"):
    """
        Calculate best seed of an experiment.
 
        :param expe_name: Experiment's name
        :param metric: Metric to use to calculate best seed
        :type expe_name: str
        :type metric: str
        :return: best seed, its metric value, mean of all seeds, std over seeds
    """
    best_seed = -1
    best_seed_value = -1000
    runs_data = models_saves[expe_name]['data']
    all_values = []
    for run in runs_data:
        if len(run[metric]) > 0:
            data = run[metric][-1]
            all_values.append(data)
            if data > best_seed_value:
                best_seed_value = data
                best_seed = run["config"]["seed"]
        else:
            print("Skipping seed {}: no data".format(run["config"]["seed"]))
    return best_seed, best_seed_value, np.mean(all_values), np.std(all_values)

In [None]:
best_seeds = {}
for i,(m_id,label) in enumerate(labels.items()):
    best_seed, best_seed_value, mean, std = get_best_seed(m_id, metric="nb_mastered")
    best_seeds[m_id] = best_seed
    print("Expe {0} : {1} ({2}) - Mean: {3} ({4})".format(m_id, best_seed, best_seed_value, mean, std))

# Plot definitions

## Test set plots

In [None]:
def dict_to_args_str(dictionary):
    args_str = []
    for key in dictionary:
        args_str.append("--{}".format(key))
        if dictionary[key] is not None:
            args_str.append("{}".format(dictionary[key]))

    return args_str

In [None]:
def round_values(values):
    if isinstance(values, np.ndarray):
        for i in range(len(values)):
            values[i] = round(values[i], 3)
    else:
        values = round(values, 3)
    return values

In [None]:
def params_to_str(params_dict, line_width=116):
    result = str(params_dict)
    # Wrap the string representation into lines of at most `line_width` characters
    chunks = [result[i:i+line_width] for i in range(0, len(result), line_width)]
    return "\n".join(chunks)

In [None]:
def plot_test_tasks_results(env, env_params_list, env_rewards_list, fig_name, nb_env_test_to_check=None):
    """
        Plot test tasks and associated reward obtained.
 
        :param env: An instance of one of the two TeachMyAgent's environments
        :param env_params_list: List of tasks (i.e. vector controlling PCG)
        :param env_rewards_list: List of associated reward
        :param fig_name: Name of the figure
        :param nb_env_test_to_check: Plot only the N first tasks (if None plot all tasks)
    """
    nb_env = len(env_params_list) if nb_env_test_to_check is None else nb_env_test_to_check
    nb_plots_per_row = 2
    nb_rows = math.ceil(nb_env/nb_plots_per_row)
    f = plt.figure()
    f.set_figwidth(25)
    f.set_figheight(6*nb_rows)
        
    for i in range(nb_env):
        fig = plt.subplot(nb_rows, nb_plots_per_row, i+1)
        rounded_current_params = {k: round_values(v) for k, v in env_params_list[i].items()}
        fig.text(-0.05, 1.03, 
                 "Test env nb {0} \nScore performed: {1} \nEnv params: {2}".format(i, env_rewards_list[i], params_to_str(rounded_current_params)), 
                     ha="left", transform=fig.transAxes)
        
        env.set_environment(**env_params_list[i])
        env.reset()
        
        plt.imshow(env.render(mode='rgb_array'))
        plt.axis('off')
        
    plt.savefig('../TeachMyAgent/graphics/{}.png'.format(fig_name), bbox_inches='tight', dpi=100)

In [None]:
def perform_test_sets_analysis(dataset_folder, settings, nb_tasks=-1, test_set=None, ep_returns=None):
    """
        Load results obtained by the best seed of chosen experiments and plot their performance on test set at the end of the experiment.
 
        :param dataset_folder: Directory containing experiments to load (do not forget '/' at the end of the path)
        :param settings: List of dictionaries, each defining one experiment to load 
        :param nb_tasks: Number of tasks to load per experiment (-1 means all)
        :param test_set: Whether another test set than the one used during the experiments must be loaded. If so, specify its name
        :param ep_returns: If another test set is used, specify the new list(s) of rewards obtained. You can use `test_policy_perf` of the `Policies_visualization` notebook
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    parser = test_policy.get_parser()
    parser.add_argument('--expe_name', type=str)
    
    i = 0
    result = {}
    for setting in settings:
        current_expe_best_seed = best_seeds[setting["expe_name"]]
        data_path = os.path.join(dataset_folder, setting["expe_name"], setting["expe_name"] + "_s" + str(current_expe_best_seed))
        setting["fpath"] = data_path
    
        args_str = dict_to_args_str(setting)

        args = parser.parse_args(args_str)
        env_fn, param_bounds, _, _ = EnvironmentArgsHandler.get_object_from_arguments(args)
        env = env_fn()
        env._SET_RENDERING_VIEWPORT_SIZE(4000, 2000, keep_ratio=True)
        
        if test_set is None:
            test_set_params, rewards = test_policy.load_training_test_set(data_path, order_by_best_rewards=args.bests)
        else:
            test_set_params = test_policy.load_fixed_test_set(data_path, test_set)
            rewards = ep_returns[i]
        result[setting["expe_name"]] = [param_dict_to_param_vec(param_bounds, param) for param in test_set_params]
        
        # Use a per-experiment count so that the default (-1) is re-evaluated for each setting
        current_nb_tasks = len(test_set_params) if nb_tasks == -1 else nb_tasks
        
        ordering_name = ""
        if args.bests is None:
            ordering_name = "firsts"
        elif args.bests:
            ordering_name = "top"
        else:
            ordering_name = "worse"
        fig_name = "{0}_s{1}_{2}test-set-analysis_{3}_{4}".format(args.expe_name, 
                                                               current_expe_best_seed, 
                                                               "fixed-" if test_set is not None else "",
                                                               ordering_name,
                                                               current_nb_tasks)
        plot_test_tasks_results(env, test_set_params, rewards, fig_name, nb_env_test_to_check=current_nb_tasks)
        env.close()
        i+=1
    return result

## Plot Curriculum

### Stump Tracks

In [None]:
def truncate_colormap(cmap, minval=0.0, maxval=1.0, n=100):
    new_cmap = mcolors.LinearSegmentedColormap.from_list(
        'trunc({n},{a:.2f},{b:.2f})'.format(n=cmap.name, a=minval, b=maxval),
        cmap(np.linspace(minval, maxval, n)))
    return new_cmap

def draw_ellipse(position, covariance, ax=None, edge_color=None, face_color=None, **kwargs):
    """Draw an ellipse with a given position and covariance"""
    ax = ax or plt.gca()
    
    covariance = covariance[0:2,0:2]
    position = position[0:2]

    # Convert covariance to principal axes
    if covariance.shape == (2, 2):
        U, s, Vt = np.linalg.svd(covariance)
        angle = np.degrees(np.arctan2(U[1, 0], U[0, 0]))
        width, height = 2 * np.sqrt(s)
    else:
        angle = 0
        width, height = 2 * np.sqrt(covariance)

    # Draw the Ellipse
    for nsig in range(2, 3):
        # Pass `angle` by keyword: recent Matplotlib versions no longer accept it positionally
        ax.add_patch(Ellipse(position, nsig * width, nsig * height,
                             angle=angle, **kwargs, edgecolor=edge_color, facecolor=face_color))

def get_colorbar(cmap, ax):
    """Attach a colorbar for the mappable passed as `cmap` to the right of `ax` and return it"""
    from mpl_toolkits.axes_grid1 import make_axes_locatable
    fig = ax.figure
    divider = make_axes_locatable(ax)
    cax = divider.append_axes("right", size="5%", pad=0.05)
    colorbar = fig.colorbar(cmap, cax=cax)
    plt.sca(ax)
    return colorbar

def plot_gmm(weights, means, covariances, X=None, ax=None, xlim=[0,1], ylim=[0,1], xlabel='', ylabel='',
             bar=True, bar_side='right'):
    """Draw distributions of a GMM"""
    from mpl_toolkits.axes_grid1 import make_axes_locatable
    ft_off = 15

    ax = ax or plt.gca()
    colormap = sns.color_palette("coolwarm", as_cmap=True)
    cmap = truncate_colormap(colormap, minval=0.0,maxval=1.0)
    for pos, covar, w in zip(means, covariances, weights):
        draw_ellipse(pos, covar, alpha=0.6, ax=ax, edge_color=cmap(pos[-1]), face_color=cmap(pos[-1]))

    if bar:
#         divider = make_axes_locatable(ax)
#         cax = divider.append_axes("right", size="5%", pad=0.5)
        cax, _ = cbar.make_axes(ax, location=bar_side, shrink=0.8)
        cb = cbar.ColorbarBase(cax, cmap=cmap)
        cb.set_label('Absolute Learning Progress', fontsize=ft_off + 5)
        cax.tick_params(labelsize=ft_off + 0)
        cax.yaxis.set_ticks_position(bar_side)
        cax.yaxis.set_label_position(bar_side)

In [None]:
def generate_stumps_curriculum(dataset_folder, settings, frequency=1, initial_frequency=250000,
                           stump_height_dims=[-1, 4], stump_spacing_dims=[-1, 7]):
    """
        Generate a gif visualization of the curriculum of best seeds of chosen experiments.
        
        This function uses the 100 tasks sampled by teachers and logged every `initial_frequency` steps. 
 
        :param dataset_folder: Directory containing experiments to load (do not forget '/' at the end of the path)
        :param settings: List of dictionaries, each defining one experiment to load 
        :param frequency: Plot only one out of every `frequency` sampled distributions 
        :param initial_frequency: Initial frequency (steps) at which the tasks were sampled
        :param stump_height_dims: Bounds of the stump height axis
        :param stump_spacing_dims: Bounds of the stump spacing axis
    """
    parser = test_policy.get_parser()
    parser.add_argument('--expe_name', type=str)
    
    result = {}
    for setting in settings:
        current_expe_best_seed = best_seeds[setting["expe_name"]]
        
        args_str = dict_to_args_str(setting)
        args = parser.parse_args(args_str)
        _, param_bounds, _, _ = EnvironmentArgsHandler.get_object_from_arguments(args)
        
        current_label = labels[setting["expe_name"]]
        data = models_saves[setting["expe_name"]]['data'][current_expe_best_seed]
        task_samples = data["periodical_samples"]
        associated_infos = data["periodical_infos"]
        filenames = []
        
        for i in range(0, len(task_samples), frequency):
            if len(task_samples[i]) == 0:
                continue
                
            f, ax = plt.subplots(1,1,figsize=(20,20))
            current_data = task_samples[i]
            infos = associated_infos[i]
            hue = None
            ax.set_ylabel('stump_spacing', fontsize=25)
            ax.set_xlabel('stump_height', fontsize=25)
            plt.xticks(fontsize=18)
            plt.yticks(fontsize=18)
            plt.xlim(stump_height_dims[0], stump_height_dims[1])
            plt.ylim(stump_spacing_dims[0], stump_spacing_dims[1])
            set_legend = lambda: None
            bk_index = infos[0]["bk_index"]
            
            # CHANGE THIS IF YOU ADD A NEW TEACHER #
            if current_label in ["ALP-GMM", "Covar-GMM"]:
                if bk_index > 0:
                    plot_gmm(data["weights"][bk_index], data["means"][bk_index], data["covariances"][bk_index], 
                             ax=ax, xlim=stump_height_dims, ylim=stump_spacing_dims)
            elif current_label == "Self-Paced":
                draw_ellipse(data["mean"][bk_index], data["covariance"][bk_index], ax=ax, alpha=0.5)
            elif current_label == "ADR":
                x1 = data["task_space"][bk_index][0][0]
                x2 = data["task_space"][bk_index][1][0]
                y1 = data["task_space"][bk_index][0][1]
                y2 = data["task_space"][bk_index][1][1]
                ax.add_patch(Rectangle((x1, y1), x2-x1, y2-y1, alpha=0.5))
            elif current_label == "Setter-Solver":
                set_legend = lambda: ax.legend(title="Feasibility", fontsize=25)
                hue = [_info["task_infos"][0][0] for _info in infos]
            elif current_label == "RIAC":
#                 for box in data["all_boxes"][bk_index]:
#                     ax.add_patch(Rectangle((box.low[0], box.high[1]), 
#                                            box.low[1]-box.low[0], 
#                                            box.high[1]-box.high[0], 
#                                            alpha=0.5, color=))     
                set_legend = lambda: ax.legend(title="Region ALP", fontsize=25)
                hue = [data["all_alps"][bk_index][_info["task_infos"]] for _info in infos]
            # CHANGE THIS IF YOU ADD A NEW TEACHER #
            
            g = sns.scatterplot(x=current_data[:, 0], y=current_data[:, 1], ax=ax, hue=hue, s=100)
            legend = set_legend()
            if legend is not None:
                legend.get_title().set_fontsize('25')
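                # `legendHandles` was renamed `legend_handles` in recent Matplotlib versions
                handles = legend.legend_handles if hasattr(legend, "legend_handles") else legend.legendHandles
                for legobj in handles: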
                    legobj.set_linewidth(5.0)
            f_name = "../TeachMyAgent/graphics/gifs/scatter_{}.png".format(i)
            plt.suptitle('Step {}'.format(math.ceil(initial_frequency/frequency) * i), fontsize=25)
            plt.savefig(f_name, bbox_inches='tight')
            plt.close(f)
            filenames.append(f_name)
        
        images = []
        for filename in filenames:
            images.append(imageio.imread(filename))
        imageio.mimsave('../TeachMyAgent/graphics/{}.gif'.format(setting["expe_name"] + "_" + str(current_expe_best_seed)), images, duration=0.3)

In [None]:
def generate_parkour_curriculum(dataset_folder, settings, frequency=1, initial_frequency=250000):
    """
        Generate a gif visualization of the curriculum of best seeds of chosen experiments.
        
        This function uses the 100 tasks sampled by teachers and logged every `initial_frequency` steps. 
 
        :param dataset_folder: Directory containing experiments to load (do not forget '/' at the end of the path)
        :param settings: List of dictionaries, each defining one experiment to load 
        :param frequency: Plot only one out of every `frequency` sampled distributions 
        :param initial_frequency: Initial frequency (steps) at which the tasks were sampled
    """
    parser = test_policy.get_parser()
    parser.add_argument('--expe_name', type=str)
    
    result = {}
    for setting in settings:
        current_expe_best_seed = best_seeds[setting["expe_name"]]
        data_path = os.path.join(dataset_folder, setting["expe_name"], setting["expe_name"] + "_s" + str(current_expe_best_seed))
        setting["fpath"] = data_path
    
        args_str = dict_to_args_str(setting)

        args = parser.parse_args(args_str)
        env_fn, param_bounds, _, _ = EnvironmentArgsHandler.get_object_from_arguments(args)
        env = env_fn()
        env._SET_RENDERING_VIEWPORT_SIZE(4000, 2000, keep_ratio=True)
        
        fig_name = "{0}_s{1}_curriculum-analysis".format(args.expe_name, 
                                                               current_expe_best_seed)
        data = models_saves[setting["expe_name"]]['data'][current_expe_best_seed]
        tasks = data["periodical_samples"]
        associated_infos = data["periodical_infos"]
        
        nb_env = math.ceil(len(tasks) / frequency)
        
        filenames = []
        
        for i in range(0, nb_env-1, frequency):
            if len(tasks[i]) == 0:
                continue
            current_tasks = tasks[i]
            current_infos = associated_infos[i]
            index = random.randint(0, len(current_tasks)-1)
            task = param_vec_to_param_dict(param_bounds, current_tasks[index])
            associated_info = current_infos[index]
            f, ax = plt.subplots(1,1,figsize=(12,10))

            env.set_environment(**task)
            env.reset()

            plt.imshow(env.render(mode='rgb_array'))
            plt.axis('off')
            f_name = "../TeachMyAgent/graphics/gifs/{}_{}.png".format(fig_name, i)
            plt.suptitle('Step {}'.format(math.ceil(initial_frequency/frequency) * i), fontsize=20)
            plt.savefig(f_name, bbox_inches='tight')
            plt.close(f)
            filenames.append(f_name)

        images = []
        for filename in filenames:
            images.append(imageio.imread(filename))
        imageio.mimsave('../TeachMyAgent/graphics/{}.gif'.format(setting["expe_name"] + "_" + str(current_expe_best_seed)), images, duration=0.3)
        env.close()

# Experiment graphs

## Curriculum analysis

Change settings below for each experiment: 
- for Stump Tracks, you only need to specify the experiment's name
- for the Parkour, you must specify the experiment's name as well as the agent's type and its lidars 
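
Each settings dictionary is converted into command-line style arguments (see `dict_to_args_str` above) and then parsed. The sketch below reproduces that conversion in isolation. The parser here is a hypothetical stand-in built with `argparse`, since the real one comes from `test_policy.get_parser()`, and the experiment name is a placeholder.

```python
import argparse

def dict_to_args(setting):
    """Turn {"key": value} into ["--key", "value"]; a None value yields a bare flag."""
    args = []
    for key, value in setting.items():
        args.append("--{}".format(key))
        if value is not None:
            args.append(str(value))
    return args

# Hypothetical stand-in parser: the notebook uses `test_policy.get_parser()` instead.
parser = argparse.ArgumentParser()
for name in ("env", "embodiment", "lidars_type", "expe_name"):
    parser.add_argument("--" + name, type=str)

setting = {
    "env": "parametric-continuous-parkour-v0",
    "embodiment": "fish",
    "lidars_type": "full",
    "expe_name": "my_parkour_experiment",  # placeholder name
}
args = parser.parse_args(dict_to_args(setting))
print(args.embodiment, args.lidars_type)
```

Because every key becomes a `--key` option, a misspelled key in a settings dictionary is rejected by the parser rather than silently ignored.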

### Parkour

In [None]:
generate_parkour_curriculum(data_folder, settings = [
    {
        "env": "parametric-continuous-parkour-v0",
        "embodiment": "old_classic_bipedal",
        "lidars_type": "down",
        "expe_name" : "14-12_benchmark_parkour_Setter-Solver_walker_type_old_classic_bipedal",
    }])

In [None]:
generate_parkour_curriculum(data_folder, settings = [
    {
        "env": "parametric-continuous-parkour-v0",
        "embodiment": "climbing_profile_chimpanzee",
        "lidars_type": "up",
        "expe_name" : "14-12_benchmark_parkour_Setter-Solver_walker_type_climbing_profile_chimpanzee",
    }])

In [None]:
generate_parkour_curriculum(data_folder, settings = [
    {
        "env": "parametric-continuous-parkour-v0",
        "embodiment": "fish",
        "lidars_type": "full",
        "expe_name" : "14-12_benchmark_parkour_Setter-Solver_walker_type__fish",
    }])

### Stump Tracks

In [None]:
generate_stumps_curriculum(data_folder, settings = [
    {
        "expe_name" : "14-12_profiling_benchmark_stumps_Setter-Solver_criteria_1_allow_expert_knowledge_maximal",
    }], stump_height_dims=[-1, 10], stump_spacing_dims=[-1, 7])

In [None]:
generate_stumps_curriculum(data_folder, settings = [
    {
        "expe_name" : "14-12_profiling_benchmark_stumps_Setter-Solver_criteria_2_allow_expert_knowledge_maximal",
    }], stump_height_dims=[-4, 4], stump_spacing_dims=[-1, 7])

In [None]:
generate_stumps_curriculum(data_folder, settings = [
    {
        "expe_name" : "14-12_profiling_benchmark_stumps_Setter-Solver_criteria_3_allow_expert_knowledge_maximal",
    }], stump_height_dims=[-1, 4], stump_spacing_dims=[-1, 7])