[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jrkasprzyk/CVEN5393/blob/main/epsilon_nondominance_from_file.ipynb)

*This notebook is part of course notes for CVEN 5393: Water Resource Systems and Management, by Prof. Joseph Kasprzyk at CU Boulder.*

In this notebook, we will perform epsilon non-dominated sorting of solutions, using the Platypus Python library. Generic solution data is included.

# Install platypus-opt and load packages

In [1]:
!pip install platypus-opt

Collecting platypus-opt
  Downloading Platypus_Opt-1.4.1-py3-none-any.whl.metadata (44 kB)
Downloading Platypus_Opt-1.4.1-py3-none-any.whl (124 kB)
Installing collected packages: platypus-opt
Successfully installed platypus-opt-1.4.1


In [2]:
# import only the Platypus classes this notebook uses
from platypus import Problem, Solution, EpsilonBoxArchive
import numpy as np
import pandas as pd

# Functions to perform the sorting

In [24]:
def df_to_pt(df, objective_directions, nobjs, nvars=0, nconstrs=0):
  # note: objective_names is read from the notebook's global scope,
  # so it must be defined before this function is called
  problem = Problem(nvars=nvars, nobjs=nobjs, nconstrs=nconstrs)
  pt = []
  # iterate over the dataframe that was passed in, not a global one
  for index, row in df.iterrows():
    # create solution object
    solution = Solution(problem)

    # save an id for which row of the original
    # dataframe this solution came from. really important
    # for cross-referencing things later!
    solution.id = index

    # populate the objective values into platypus, correcting
    # the maximized objectives by multiplying by -1
    for j in range(nobjs):
      if objective_directions[j] == 'minimize':
        solution.objectives[j] = row[objective_names[j]]
      elif objective_directions[j] == 'maximize':
        solution.objectives[j] = -1.0*row[objective_names[j]]
      else:
        raise ValueError(f"unknown direction: {objective_directions[j]}")

    # add the solution to the list
    pt.append(solution)
  return pt

In [19]:
def label_eps_nd(df, label_col, objective_directions, epsilons, nobjs, nvars, nconstrs=0):

  # reset the label column
  df[label_col] = False

  # convert to platypus format
  pt = df_to_pt(df, objective_directions, nobjs, nvars, nconstrs)

  # save the epsilon non-dominated solutions to a new list of platypus solutions
  eps_pt = EpsilonBoxArchive(epsilons)
  for solution in pt:
    eps_pt.add(solution)

  # save which ids ended up being epsilon non-dominated
  eps_ids = [sol.id for sol in eps_pt]

  # add labels to the epsilon non-dominated solutions
  # (sol_id avoids shadowing Python's built-in id function)
  for sol_id in eps_ids:
    df.at[sol_id, label_col] = True

  return df

# Prepare list of all solutions

In this step, we need to prepare the solutions that will be analyzed in the sorting process.

A solution to a multi-objective problem comprises the following types of data:


*   Decision Variables (defining actions)
*   Objectives (multiple measures of the solution's performance)
*   *Constraint Violations (not included in this example)*
*   *Extra Metrics (other measures of the solution's performance, not included in this example)*

In the dataframe, each solution occupies its own row, and the decision variables, objectives, and any other variables occupy the columns. We will call the dataframe `all_solutions_df`. Later, when we do the sorting, we will add columns to store flags that say whether or not a solution is epsilon non-dominated.





In [9]:
# Create dataframe from a dict
# https://builtin.com/data-science/dictionary-to-dataframe

decision_variable_names = [
    'Conservation Amt',
    'Old Res Added Capacity',
    'New Res Added Capacity'
    ]
num_decs = len(decision_variable_names)

objective_names = [
    'Cost',
    'Reservoir Capacity',
    'Reliability',
    'Worst-Case Shortfall',
    'Average Length of Shortfall'
]
num_objs = len(objective_names)

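# one direction per objective, in the same order as objective_names
# (the original list had a sixth entry that was never read)
objective_directions = [
    'minimize',  # Cost
    'maximize',  # Reservoir Capacity
    'maximize',  # Reliability
    'minimize',  # Worst-Case Shortfall
    'minimize'   # Average Length of Shortfall
]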

solutions = {
    'A': [0.15, 0.3, 2.0, 110.1, 3.3, 0.95,  0.1,  2],
    'B': [0.0,  0.3, 0.8, 85,    2.1, 0.9,   0.3,  3],
    'C': [0.15, 0.0, 0.0, 0.1,   1.0, 0.6,   0.4,  6],
    'D': [0.0,  0.0, 0.0, 0.0,   0.0, 1.0,   0.45, 6],
    'E': [0.2,  0.0, 0.0, 0.4,   1.0, 0.65,  0.33, 6],
    'F': [0.15, 1.0, 3.0, 200.0, 5.0, 0.951, 0.09, 2]
}

We will save the data in a dataframe. We will add a new column that will store the results of the sorting. In other words, when we do an epsilon non-domination sort, each solution will be labeled with `True` when it is epsilon non-dominated, and `False` if not.

In [None]:
all_solutions_df = pd.DataFrame.from_dict(
    solutions,
    orient='index',
    columns=decision_variable_names+objective_names)

all_solutions_df["Eps Nd"] = False

Next, we populate a list of solutions in the Platypus format.

In [25]:
all_solutions_pt = df_to_pt(
    df=all_solutions_df,
    objective_directions=objective_directions,
    nobjs=num_objs,
    nvars=num_decs,
    nconstrs=0)
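The sign convention used in `df_to_pt` can be checked by hand on a single row. The sketch below is a plain-Python illustration, not part of Platypus: `to_minimization` is a hypothetical helper name, and it assumes the five objective values of solution `A` are paired with the first five entries of `objective_directions`, which is what a loop over `nobjs` objectives reads.

```python
def to_minimization(values, directions):
    """Return a copy of `values` with every maximized objective negated."""
    if len(values) != len(directions):
        raise ValueError("need exactly one direction per objective")
    converted = []
    for value, direction in zip(values, directions):
        if direction == 'minimize':
            converted.append(value)
        elif direction == 'maximize':
            # a larger value is better, so its negative is smaller (better)
            converted.append(-1.0 * value)
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    return converted


# assumed pairing: Cost, Reservoir Capacity, Reliability,
# Worst-Case Shortfall, Average Length of Shortfall
directions = ['minimize', 'maximize', 'maximize', 'minimize', 'minimize']

# objective values of solution 'A' (the last five entries of its row)
solution_a = [110.1, 3.3, 0.95, 0.1, 2]

converted_a = to_minimization(solution_a, directions)
print(converted_a)  # [110.1, -3.3, -0.95, 0.1, 2]
```

After this conversion every objective is treated as "smaller is better", which is the form the sorting step expects.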

# Perform Epsilon Non-dominated Sort

In [29]:
epsilons = [10,    #cost
            0.1,   #capacity
            0.05,  #reliability
            0.01,  #worst-case shortfall
            1,     #length of shortfall
            ]

The next code snippet performs the sorting itself.

Once the sort is complete, you have two collections of Platypus objects: `all_solutions_pt`, a list of all solutions, which is not used further here; and `eps_solutions_pt`, a Platypus `EpsilonBoxArchive`, which can be iterated like a list and contains only the Platypus `Solution` objects that are epsilon non-dominated.
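To make "epsilon non-dominated" concrete, here is a simplified plain-Python sketch of the epsilon-box idea: each solution is assigned to a box whose side lengths are the epsilons, boxes are compared by ordinary dominance, and at most one solution is kept per box. This is an illustration under stated assumptions, not Platypus's implementation; the function names and the two-objective data are hypothetical, all objectives are assumed to be minimized, and `EpsilonBoxArchive` may differ in details such as tie-breaking and constraint handling.

```python
import math


def box_index(objectives, epsilons):
    """Index of the epsilon box containing a (minimized) objective vector."""
    return tuple(math.floor(f / eps) for f, eps in zip(objectives, epsilons))


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def corner_distance(objectives, epsilons):
    """Squared distance from a point to the lower corner of its own box."""
    box = box_index(objectives, epsilons)
    return sum((f - i * eps) ** 2 for f, i, eps in zip(objectives, box, epsilons))


def eps_nondominated(points, epsilons):
    """Return the ids kept after adding every point to an epsilon-box archive."""
    archive = {}  # id -> objectives, at most one entry per box
    for new_id, new_obj in points.items():
        new_box = box_index(new_obj, epsilons)
        rejected = False
        to_remove = []
        for old_id, old_obj in archive.items():
            old_box = box_index(old_obj, epsilons)
            if dominates(old_box, new_box):
                rejected = True  # an archived box is better
                break
            if dominates(new_box, old_box):
                to_remove.append(old_id)  # the new box is better
            elif new_box == old_box:
                # same box: keep whichever point is closer to the box corner
                if corner_distance(new_obj, epsilons) < corner_distance(old_obj, epsilons):
                    to_remove.append(old_id)
                else:
                    rejected = True
                    break
        if not rejected:
            for old_id in to_remove:
                del archive[old_id]
            archive[new_id] = new_obj
    return list(archive)


# hypothetical two-objective data, both objectives minimized
epsilons_2d = [1.0, 1.0]
points = {
    'P': [0.2, 3.7],  # box (0, 3)
    'Q': [0.6, 3.1],  # box (0, 3), closer to the corner than P
    'R': [2.5, 1.5],  # box (2, 1)
    'S': [2.4, 2.2],  # box (2, 2), dominated by R's box
    'T': [4.1, 0.3],  # box (4, 0)
}

kept = eps_nondominated(points, epsilons_2d)
print(kept)  # ['Q', 'R', 'T']
```

Note that `S` is not dominated by `R` in the ordinary Pareto sense (it has a slightly better first objective), yet it is removed because its box is dominated. Larger epsilons therefore thin out the set more aggressively.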

The last cell here pulls out the solution ids for the epsilon archive and populates labels in the original dataframe that indicate whether a solution is epsilon non-dominated. It also creates a new dataframe that only contains the epsilon solutions, for completeness.

Converting all of our work into a label on the original dataset facilitates many different experiments on that dataset. In other words, you can see which of the ‘original’ solutions survived the test. This is especially helpful when you perform multiple ‘tests’ on your solutions. For example, imagine that you had labels indicating which optimization experiment each row of the big dataframe came from. You could then show which solutions are epsilon non-dominated across all experiments, within one experiment, and so on. You simply repeat the same procedure, assigning different labels to the original set.
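The idea of repeating the procedure with different labels can be sketched in plain Python. The example below is only an illustration: the experiment names, ids, and objective values are hypothetical, and for brevity it uses an ordinary Pareto non-dominance check as a stand-in for the epsilon sort. The labeling pattern is the same either way.

```python
def nondominated_ids(rows):
    """Ids of rows not Pareto-dominated by any other row (objectives minimized)."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return {
        row_id
        for row_id, obj in rows.items()
        if not any(
            dominates(other, obj)
            for other_id, other in rows.items()
            if other_id != row_id
        )
    }


# hypothetical table: id -> (experiment label, minimized objectives)
table = {
    'a1': ('exp1', [1.0, 5.0]),
    'a2': ('exp1', [2.0, 6.0]),
    'b1': ('exp2', [0.5, 4.0]),
    'b2': ('exp2', [3.0, 1.0]),
}

# test 1: sort all rows together
all_rows = {row_id: obj for row_id, (_, obj) in table.items()}
nd_all = nondominated_ids(all_rows)

# test 2: sort each experiment separately and pool the survivors
nd_within = set()
for experiment in sorted({exp for exp, _ in table.values()}):
    subset = {row_id: obj for row_id, (exp, obj) in table.items() if exp == experiment}
    nd_within |= nondominated_ids(subset)

# one label per test, stored against the original ids
labels = {
    row_id: {'Nd All': row_id in nd_all, 'Nd Within': row_id in nd_within}
    for row_id in table
}

for row_id, row_labels in labels.items():
    print(row_id, row_labels)
```

Here `a1` survives within its own experiment but not across all experiments, because `b1` from the other experiment dominates it. Keeping both labels on the original rows makes that kind of comparison a simple filter.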