[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jrkasprzyk/CVEN5393/blob/main/epsilon_nondominance_from_file.ipynb)

*This notebook is part of course notes for CVEN 5393: Water Resource Systems and Management, by Prof. Joseph Kasprzyk at CU Boulder.*

In this notebook, we will perform epsilon non-dominated sorting of solutions in a text file, using the Platypus Python library.

TODO: Upload an example text file. Right now this is hard-coded for the CRB problem

In [None]:
!pip install platypus-opt



In [None]:
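# import only the Platypus classes used below (avoids a wildcard import)
from platypus import Problem, Solution, EpsilonBoxArchive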
import numpy as np
import pandas as pd

`Archive.txt` is output from BorgRW in 'readable' format. In Colab, you must upload a new copy of the file every time you start a runtime!
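
Because the file disappears with each new runtime, a small guard at the top of the notebook can save a confusing `FileNotFoundError` later. The following is a minimal sketch, not part of the original notebook: the helper name `ensure_archive` is hypothetical, and it assumes that `google.colab.files.upload()` saves the uploaded file into the current working directory. Outside Colab it simply reports whether the file is present.

```python
from pathlib import Path


def ensure_archive(path="Archive.txt"):
    """Return True if `path` exists, prompting for an upload when run in Colab.

    Hypothetical helper: assumes the uploaded file keeps the name `path`
    and lands in the current working directory.
    """
    if Path(path).exists():
        return True
    try:
        # only available inside a Colab runtime
        from google.colab import files
    except ImportError:
        # not in Colab: nothing to prompt with, just report the file is missing
        return False
    files.upload()  # opens the browser upload widget
    return Path(path).exists()
```

Calling `ensure_archive()` before `pd.read_csv` gives you a chance to upload the file before the read fails.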

Here we start with a dataframe of all solutions, `all_solutions_df`, and create a label column that will be `True` if a solution ends up being epsilon non-dominated. Some benefits of doing this:



*   Epsilon non-dominated sorting is not a simple accumulation: the archive can shrink as well as grow. If a new solution epsilon-dominates several archive members, it can **delete multiple previously epsilon non-dominated solutions** at once.
*   Therefore, assigning labels **at the end of the process** ensures you don't mistake a solution for epsilon non-dominated just because it was in the archive early in the analysis!
*   Saving the output as labels in the original dataset is useful because you can create **multiple labels**. For example, you could determine whether a solution survived multiple sorts (different values of epsilon, different subsets of objectives, etc.)
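
To make the first two points concrete, here is a minimal pure-Python sketch of an epsilon-box archive update. It is an illustration only, not the Platypus implementation: it assumes every objective is minimized, the function names are made up, and the same-box tie-break (keep the point closest to the box's lower corner) is an assumption modeled on common epsilon-box archives.

```python
import math


def box_index(objectives, epsilons):
    """Map an objective vector to its epsilon-box index (minimization assumed)."""
    return tuple(math.floor(f / e) for f, e in zip(objectives, epsilons))


def dominates(a, b):
    """Pareto dominance for minimization: no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def corner_distance(objectives, epsilons):
    """Squared distance from a point to the lower corner of its own box."""
    return sum((f - math.floor(f / e) * e) ** 2 for f, e in zip(objectives, epsilons))


def add_to_archive(archive, candidate, epsilons):
    """Try to add `candidate`; return evicted members, or None if it is rejected."""
    cand_box = box_index(candidate, epsilons)
    evicted = []
    for member in archive:
        member_box = box_index(member, epsilons)
        if dominates(member_box, cand_box):
            return None  # candidate's box is dominated: archive unchanged
        if member_box == cand_box:
            # same box: keep whichever point is closer to the lower corner
            if corner_distance(candidate, epsilons) < corner_distance(member, epsilons):
                evicted.append(member)
            else:
                return None
        elif dominates(cand_box, member_box):
            evicted.append(member)
    archive[:] = [m for m in archive if m not in evicted]
    archive.append(candidate)
    return evicted


# toy two-objective data, made up for illustration
epsilons = (1.0, 1.0)
archive = []
for point in [(0.5, 3.5), (1.5, 2.5), (3.5, 0.5)]:
    add_to_archive(archive, point, epsilons)  # all three enter the archive

# a late arrival whose box dominates every box currently in the archive
evicted = add_to_archive(archive, (0.2, 0.4), epsilons)
```

After the last call, `evicted` holds all three earlier points and `archive` contains only `(0.2, 0.4)`. Had we labeled solutions as they entered the archive, three rows would wrongly be marked as epsilon non-dominated.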

In [None]:
# import archive (in 'readable' format)
all_solutions_df = pd.read_csv("Archive.txt", delimiter=" ")
all_solutions_df["Eps Nd"] = False

We'll index the dataframe using a list of the objective names. This lets us use only a subset of columns for steps like epsilon non-dominated sorting (since all we're after is a label anyway). When analyzing the data, we always take the entire row, so other information such as decision variables, metrics, and constraint violations is preserved.
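
The idea can be sketched on a toy dataframe. The column names and values below are made up for illustration and are not from `Archive.txt`; the hand-picked "surviving" rows stand in for the result of a real sort.

```python
import pandas as pd

# toy stand-in for all_solutions_df (hypothetical columns and values)
toy_df = pd.DataFrame({
    "decision_1": [0.1, 0.7, 0.4],
    "objective_a": [10.0, 12.0, 9.0],
    "objective_b": [3.0, 1.0, 4.0],
})
toy_objective_names = ["objective_a", "objective_b"]

# the sort only ever needs to look at this subset of columns
objectives_only = toy_df[toy_objective_names]

# the result of the sort is written back as a label on the full dataframe
toy_df["Eps Nd"] = False
toy_df.loc[[0, 2], "Eps Nd"] = True  # pretend rows 0 and 2 survived

# filtering on the label keeps every column, including decision variables
survivors = toy_df[toy_df["Eps Nd"]]
```

Here `objectives_only` has just the two objective columns, while `survivors` still carries `decision_1` alongside the objectives and the label.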

In [None]:
objective_names = ["Objectives.Objective_Powell_3490",
                   "Objectives.Objective_Mead_1000",
                   "Objectives.Objective_LB_Shortage_Volume",
                   "Objectives.Objective_Max_Delta_Annual_Shortage"]
num_objs = len(objective_names)

epsilons = [5,
            1,
            10000,
            10000]

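# create a Platypus Problem object. Right now, the Platypus analysis only
# uses objectives, so that's all I'm populating. But future analyses may
# need to copy other information too.
# Note: Platypus treats every objective as minimized by default; set
# problem.directions if any objective should be maximized.
problem = Problem(nvars=0, nobjs=num_objs, nconstrs=0)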

# pt stands for platypus format: these two items will be
# lists of Platypus Solution objects. One for all the solutions
# and another for only the epsilon non-dominated solutions
all_solutions_pt = []
eps_solutions_pt = EpsilonBoxArchive(epsilons)

# go through all the solutions, and continually update
# the epsilon archive. Note that as you go along, the
# archive might grow or shrink
for index, row in all_solutions_df.iterrows():

  # create solution object
  solution = Solution(problem)

  # save an id for which row of the original
  # dataframe this solution came from. really important
  # for cross-referencing things later!
  solution.id = index

  for j in range(num_objs):
    solution.objectives[j] = row[objective_names[j]]

  # save every solution you look at
  all_solutions_pt.append(solution)

  # calling the 'add' function on an EpsilonBoxArchive
  # orchestrates the archive update algorithm: it only
  # puts a solution into the archive if it's epsilon non-dominated
  # (and subsequently deletes solutions that end up being dominated!)
  eps_solutions_pt.add(solution)

After the cell above has run, you have two collections of Platypus objects: `all_solutions_pt`, a list of every solution (which we're not really using here), and `eps_solutions_pt`, a Platypus `EpsilonBoxArchive`, which behaves like a list of Platypus `Solution` objects that are guaranteed to be epsilon non-dominated.

The last cell pulls the solution ids out of the epsilon archive and populates the labels in the original dataframe that indicate whether a solution is epsilon non-dominated. It also creates a new dataframe that contains only the epsilon non-dominated solutions, for completeness.

In [None]:
# save a list of the ids of the epsilon non-dominated solutions
eps_ids = [sol.id for sol in eps_solutions_pt]

# earlier, we initialized this flag to False. We set it to True for
# every row whose index appears in the list of ids (done in one step,
# which also avoids shadowing Python's built-in `id`)
all_solutions_df.loc[eps_ids, "Eps Nd"] = True

# create a new dataframe that only contains the rows that were
# epsilon non-dominated
eps_solutions_df = all_solutions_df[all_solutions_df["Eps Nd"]].copy(deep=True)

Converting all of our work into a label on the original dataset makes it easy to run many different experiments on that dataset: you can see which of the 'original' solutions survived each test. This is especially helpful when you apply several tests to the same solutions. For example, if each row of the big dataframe also carried a label for the optimization experiment it came from, you could show which solutions are epsilon non-dominated across all experiments, within one experiment, and so on. You repeat the same procedure each time, assigning a different label to the original set.
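
As a rough sketch of what such multi-label queries could look like, the toy dataframe below carries an experiment column and two label columns. All names and values are made up, and the labels are hand-assigned here rather than computed by a sort.

```python
import pandas as pd

# toy stand-in for a labeled all_solutions_df (hypothetical columns and values)
labeled_df = pd.DataFrame({
    "experiment": ["A", "A", "B", "B"],
    "Eps Nd (all)": [True, False, False, True],
    "Eps Nd (within experiment)": [True, False, True, True],
})

# how many solutions from each experiment survive the sort across all experiments
survivors_per_experiment = labeled_df.groupby("experiment")["Eps Nd (all)"].sum()

# solutions that survive within their own experiment but not in the combined sort
only_local = labeled_df[
    labeled_df["Eps Nd (within experiment)"] & ~labeled_df["Eps Nd (all)"]
]
```

Because every label lives in the same dataframe, questions like these reduce to ordinary boolean indexing and group-by operations.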