**Copyright © 2018 University of Stirling**

# Plotting Output from Experiments on 2 Objectives

Prior to running this script, the experiment data must be obtained in Pickle file format (`pkl`). If working with raw experiment output, the script `processing-two-objectives.ipynb` is used to convert the data to the appropriate format for use in this script.

This script generates the plots for the two-objective runs. It loads the data, which contains the Pareto fronts output by each run (30 in our experiments). Note that because each evaluation uses a small sample, sampling error means that some points which appear Pareto-optimal in the original data are not part of the Pareto front when re-sampled.

The Pareto fronts are combined and a new, single Pareto front is computed for each experiment. Points in the dataframes that are not Pareto-optimal are not plotted. This means that far fewer points (~300) will be plotted than are present in the dataframes (~6000).

If re-evaluation data is not available, there is a flag in the script (`USING_REEVAL`) to ignore re-evaluation and plot only the original evaluations. This is not recommended, as it will make candidate solutions appear to have a lower failure rate than they really do.
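
The effect of sampling error on the original evaluations can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the true failure rate, the number of candidates and the number of trials per evaluation below are hypothetical values, not taken from the experiments. Every candidate has the same true failure rate, yet the one that looks best in the original evaluation tends to look better than it really is, while a fresh re-evaluation is not subject to that selection bias.

```python
import numpy as np

rng = np.random.default_rng(0)

TRUE_FAILURE_RATE = 0.05   # hypothetical: every candidate is equally good
N_CANDIDATES = 100         # hypothetical number of candidate solutions
N_TRIALS = 20              # hypothetical number of trials per evaluation

# Original evaluation: observed failure rate of each candidate
original = rng.binomial(N_TRIALS, TRUE_FAILURE_RATE, size=N_CANDIDATES) / N_TRIALS

# An optimiser keeps whichever candidate looks best on the observed values
best = int(np.argmin(original))

# Re-evaluation: a fresh, independent sample for the selected candidate
reeval = rng.binomial(N_TRIALS, TRUE_FAILURE_RATE) / N_TRIALS

print("true rate:          ", TRUE_FAILURE_RATE)
print("best original value:", original[best])
print("re-evaluated value: ", reeval)
```

The selected original value is the minimum of many noisy estimates, so it is biased low; this is why plotting only the original evaluations makes solutions look more reliable than they are.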

In [0]:
from __future__ import absolute_import, division, print_function

import pandas as pd
import numpy as np

Here we configure the relative path and filename information for the data to be processed.

In [0]:
DATA_PATH = "/content/antibiotic"
USING_REEVAL = True
FAILURE_RATE_COL = "failurerate_reeval" if USING_REEVAL else "failurerate"
EXPERIMENTS = ({"title" : "Total Antibiotic vs Failure Rate",
                "series" : ("totalantibiotic-unconstrained.pkl",
                            "totalantibiotic-constrained.pkl"),
                "objectives" : (FAILURE_RATE_COL,
                                "totalantibiotic")
               },
               {"title" : "Maximum Concentration vs Failure Rate",
                "series" : ("maximumconcentration-unconstrained.pkl",
                            "maximumconcentration-constrained.pkl"),
                "objectives" : (FAILURE_RATE_COL,
                                "maximumconcentration")
               }
              )

CONSTRAINED_LIMIT = 0.01 # In the range [0.0, 1.0]
MAX_FAILURE_RATES = (None, CONSTRAINED_LIMIT)

This section loads the Pandas dataframes from the Pickle (`pkl`) files.

The first few rows of the objective columns are printed. Other columns are omitted.

In [0]:
for experiment in EXPERIMENTS:
    
    print(experiment['title'])
    
    objectives = list(experiment['objectives'])
    
    dataframes = [None] * len(experiment['series'])
    
    for index, series in enumerate(experiment['series']):
    
        load_from = DATA_PATH + "/" + series
        df = pd.read_pickle(load_from)
        
        # Add a percentage-scaled copy of the failure rate column for plotting
        failure_rate = FAILURE_RATE_COL
        df[failure_rate + "_x100"] = df[failure_rate] * 100.0
        
        dataframes[index] = df
        
        print(df[objectives].head(5))
        print("...")
        
    experiment['dataframes'] = dataframes
    
    print()
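
To smoke-test the loading step without the real experiment output, a small synthetic Pickle file can be written and read back. The sketch below assumes only that each file holds a Pandas dataframe containing the objective columns; the file name `toy-unconstrained.pkl` and all values are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data: the column names mirror the objectives used in this
# notebook, but the values are made up.
toy = pd.DataFrame({
    "failurerate":        [0.00, 0.10, 0.20],
    "failurerate_reeval": [0.02, 0.12, 0.18],
    "totalantibiotic":    [60.0, 40.0, 30.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy-unconstrained.pkl")  # hypothetical name
    toy.to_pickle(path)
    loaded = pd.read_pickle(path)

# Same derived percentage column as in the loading cell
loaded["failurerate_reeval_x100"] = loaded["failurerate_reeval"] * 100.0

print(loaded[["failurerate_reeval", "totalantibiotic"]].head(5))
```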

Due to sampling error, the constrained optimisation may produce some candidate solutions whose failure rate is above the constraint. Here we remove these rows from the dataframes in memory.

In [0]:
for experiment in EXPERIMENTS:
    
    dataframes = experiment['dataframes']
    
    for index, df in enumerate(dataframes):
        
        max_failure_rate = MAX_FAILURE_RATES[index]
        
        failure_rate = 'failurerate_reeval' if USING_REEVAL else 'failurerate'
        
        if max_failure_rate is not None:
       
            dataframes[index] = df[df[failure_rate] <= max_failure_rate]
    
    experiment['dataframes'] = dataframes
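
The filtering step is an ordinary boolean mask on the failure rate column. A minimal sketch on made-up data (the four rows below are hypothetical) shows which rows survive a limit of 0.01:

```python
import pandas as pd

CONSTRAINED_LIMIT = 0.01  # same value as in the configuration cell

# Hypothetical re-evaluated failure rates for four candidate solutions
toy = pd.DataFrame({
    "failurerate_reeval": [0.000, 0.010, 0.015, 0.200],
    "totalantibiotic":    [70.0,  55.0,  50.0,  30.0],
})

# Keep only rows whose failure rate does not exceed the limit
mask = toy["failurerate_reeval"] <= CONSTRAINED_LIMIT
filtered = toy[mask]

print(filtered)
```

Note that the comparison is inclusive, so a candidate sitting exactly on the limit is kept.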

Since each experiment was run multiple times, we have multiple Pareto fronts for each experiment. These functions are used to find a new Pareto front from the combined data.

In [0]:
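# Returns True if a is no worse than b in every objective (all objectives are
# minimised), i.e. a dominates b or a equals b on the objectives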
def dominates_or_equal(a, b, objectives):
    for o in objectives:
        if b[o] < a[o]:
            return False
    return True

# Finds the Pareto front in the given dataframe and returns two new dataframes,
# one with the Pareto-optimal points and one with the dominated points. If
# two or more points are equal, only one of these will be selected for the new
# front.
def find_pareto_front(df, objectives):
    pareto_front = {}
    removed_rows = {}
    for i, candidate in df.iterrows():
        indices_to_delete = set()
        for j, existing in pareto_front.items():
            if existing is not None:
                if dominates_or_equal(existing, candidate, objectives):
                    removed_rows[i] = candidate
                    break
                elif dominates_or_equal(candidate, existing, objectives):
                    removed_rows[j] = existing
                    indices_to_delete.add(j)
        else:
            pareto_front[i] = candidate
        # Use a separate name so the outer loop variable i is not shadowed
        for j in indices_to_delete:
            del pareto_front[j]
    df1 = pd.DataFrame(columns=list(df), dtype=np.float64)
    for index, row in pareto_front.items():
        df1.loc[index] = row
    df2 = pd.DataFrame(columns=list(df), dtype=np.float64)
    for index, row in removed_rows.items():
        df2.loc[index] = row
    return df1, df2
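
As a cross-check on small inputs, the same idea can be sketched with a NumPy mask. This is an illustrative alternative, not the implementation used above: `pareto_mask` is a hypothetical helper, the toy data is made up, and both objectives are assumed to be minimised. As in `find_pareto_front`, only one of several equal points is kept.

```python
import numpy as np
import pandas as pd


def pareto_mask(values):
    """Return a boolean mask selecting the non-dominated rows (minimisation)."""
    values = np.asarray(values, dtype=float)
    keep = np.ones(len(values), dtype=bool)
    for i in range(len(values)):
        if not keep[i]:
            continue
        # Rows that row i weakly dominates: row i is no worse in every objective
        weakly_dominated = np.all(values[i] <= values, axis=1)
        weakly_dominated[i] = False  # a row does not remove itself
        keep &= ~weakly_dominated
    return keep


objectives = ["failurerate_reeval", "totalantibiotic"]

# Hypothetical points; rows 3 and 4 are duplicates
toy = pd.DataFrame({
    "failurerate_reeval": [0.00, 0.01, 0.02, 0.05, 0.05, 0.10],
    "totalantibiotic":    [50.0, 40.0, 45.0, 20.0, 20.0, 30.0],
})

front = toy[pareto_mask(toy[objectives].to_numpy())]
print(front)
```

Row 2 is removed because row 1 is better on both objectives, row 5 because row 3 is, and row 4 because it duplicates row 3.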

Here we find the combined Pareto front for each experiment and replace the old dataframes.

This operation may take a few minutes.

In [0]:
for experiment in EXPERIMENTS:
    
    objectives = list(experiment['objectives'])
    
    combined_dataframes = [None] * len(experiment['series'])
    
    for index, df in enumerate(experiment['dataframes']):
        
        old_len = len(df)
        
        print("Combining {0} data points ...".format(old_len), end="")
        
        combined_dataframes[index], _ = find_pareto_front(df, objectives)
        
        new_len = len(combined_dataframes[index])
        
        print(" {0} Pareto-optimal points found".format(new_len))
    
    experiment['combined_dataframes'] = combined_dataframes

Here we configure the display settings for the output plot.

In [0]:
# Series for color and legend
SERIES_TITLES     = ("Unconstrained", "Constrained (1%)")
SERIES_COLORS     = ("red",           "blue")
SERIES_SIZES      = (10,              20)

# Size of the generated figures
FIG_SIZE             = (8, 4)
FIG_SIZE_CONSTRAINED = FIG_SIZE

# Display text for columns
OBJECTIVE_TITLES = {"failurerate"             : "Failure Rate",
                    "failurerate_reeval"      : "Failure Rate - Re-Eval",
                    "failurerate_x100"        : "Failure Rate (%)",
                    "failurerate_reeval_x100" : "Failure Rate (%) - Re-Eval",
                    "maximumconcentration"    : "Maximum Concentration (µg/ml)",
                    "totalantibiotic"         : "Total Antibiotic (µg/ml)"}

This section creates the plots for the combined Pareto fronts found for each experiment, scaled so that the entirety of the unconstrained experiment's Pareto front is shown.

In [0]:
from matplotlib.patches import Rectangle

for experiment in EXPERIMENTS:
    
    objectives = list(experiment['objectives'])
    
    ax = None
    
    for index, df in enumerate(experiment['combined_dataframes']):
        
        failure_rate = 'failurerate_reeval_x100' \
                if USING_REEVAL else 'failurerate_x100'
        other_objective = objectives[1]
        
        ax = df.plot(figsize=FIG_SIZE,
                     kind = 'scatter',
                     x    = other_objective,
                     y    = failure_rate,
                     ax   = ax,
                     s    = SERIES_SIZES[index],
                     c    = SERIES_COLORS[index])
        ax.set_xlabel(OBJECTIVE_TITLES[other_objective])
        ax.set_ylabel(OBJECTIVE_TITLES[failure_rate])  
        ax.set_title(experiment["title"])
    
    legend_proxy = [Rectangle((0, 0), 1, 1, fc=fc) for fc in SERIES_COLORS]
    ax.legend(legend_proxy, SERIES_TITLES)
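
The cell above builds the legend from proxy rectangles. An alternative, sketched here with Matplotlib called directly rather than through `DataFrame.plot`, is to pass a `label` to each scatter call and let `ax.legend()` collect the entries. The two small dataframes are hypothetical stand-ins for the combined Pareto fronts, and the `Agg` backend is selected only so that the sketch runs without a display; inside a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

titles = ("Unconstrained", "Constrained (1%)")
colors = ("red", "blue")
sizes = (10, 20)

# Hypothetical stand-ins for the combined Pareto fronts
fronts = (
    pd.DataFrame({"totalantibiotic": [30.0, 50.0, 70.0],
                  "failurerate_reeval_x100": [20.0, 5.0, 0.5]}),
    pd.DataFrame({"totalantibiotic": [55.0, 75.0],
                  "failurerate_reeval_x100": [1.0, 0.0]}),
)

fig, ax = plt.subplots(figsize=(8, 4))
for df, title, color, size in zip(fronts, titles, colors, sizes):
    # The label attached here becomes the legend entry for this series
    ax.scatter(df["totalantibiotic"], df["failurerate_reeval_x100"],
               s=size, c=color, label=title)

ax.set_xlabel("Total Antibiotic (µg/ml)")
ax.set_ylabel("Failure Rate (%) - Re-Eval")
legend = ax.legend()
```

With labels the legend markers match the plotted points; the proxy approach gives uniform colour swatches regardless of marker size.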

This section creates a zoomed version of the above plots so that the combined Pareto fronts for the constrained optimisation experiments may be more clearly compared with those of the unconstrained experiments.

In [0]:
for experiment in EXPERIMENTS:
    
    objectives = list(experiment['objectives'])
    
    ax = None
    
    for index, df in enumerate(experiment['combined_dataframes']):
        
        failure_rate = 'failurerate_reeval_x100' \
                if USING_REEVAL else 'failurerate_x100'
        other_objective = objectives[1]
        
        df = df[df[failure_rate] <= (CONSTRAINED_LIMIT * 100)]
        
        ax = df.plot(figsize=FIG_SIZE_CONSTRAINED,
                     kind = 'scatter',
                     x    = other_objective,
                     y    = failure_rate,
                     ax   = ax,
                     s    = SERIES_SIZES[index],
                     c    = SERIES_COLORS[index])
        ax.set_xlabel(OBJECTIVE_TITLES[other_objective])
        ax.set_ylabel(OBJECTIVE_TITLES[failure_rate])  
        ax.set_title(experiment["title"])
    
    legend_proxy = [Rectangle((0, 0), 1, 1, fc=fc) for fc in SERIES_COLORS]
    ax.legend(legend_proxy, SERIES_TITLES)