# Exercise 4A - Pareto Fronts
Introduced during Tutorial 4

In Exercise 4A we will begin by using a simple <span style="color:orange;"> synthetic dataset </span>. In Exercise 4B we will generate a more complex synthetic dataset that requires more detailed analysis. Exercise 4C serves as a template for your Coursework Question 4, in which you will use some pre-compiled EnergyPlus data to build the Pareto front.

In this exercise, students will be guided on how to determine the min-min Pareto front using two algorithms: an inefficient but intuitive algorithm based on the manual method shown in class, and an efficient method implemented in the *paretoset* package.

### Colour codes

<span style="color:orange;"> Orange text is for emphasis and definitions </span>

<span style="color:lime;"> Green text is for tasks to be completed by the student </span>

<span style="color:dodgerblue;"> Blue text is for Python coding tricks and references </span>

## Load all the necessary Python packages
All packages should work with a Conda environment if one is installed on your machine. Otherwise, all necessary packages can be installed in a virtual environment (.venv) in VS Code using: Ctrl+Shift+P > Python: Create Environment > Venv > Python 3.12.x > requirements.txt

<span style="color:orange;"> NOTE: we will be using the **paretoset** package. You may need to install this package using pip.</span>

In [None]:
import matplotlib.pyplot as plt

import numpy as np
import pandas as pd
from paretoset import paretoset


## 1. Creating a synthetic dataset

We will begin by generating a sample of *n* random points based on the equation for an ellipse.

$$
(x, y) = (a \cos\theta,\; b \sin\theta)
$$

where

$$
\begin{aligned}
\pi &< \theta < \frac{3\pi}{2} \\
1 &< a < 2 \\
0.5 &< b < 1
\end{aligned}
$$

Here we are restricting points to the lower left quadrant of the ellipse.
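
As a quick sanity check of this claim, the short sketch below evaluates the cosine and sine at a handful of angles between π and 3π/2. The sample angles are arbitrary choices for illustration and are not part of the exercise data; both functions should come out negative, which places the points in the lower left quadrant.

```python
import math

# Five evenly spaced angles strictly between pi and 3*pi/2 (arbitrary samples)
angles = [math.pi + k * math.pi / 12 for k in range(1, 6)]

for theta in angles:
    # Both cos(theta) and sin(theta) are negative in this range
    print(f"theta = {theta:.3f}  cos = {math.cos(theta):+.3f}  sin = {math.sin(theta):+.3f}")

# True if every sampled angle lands in the lower left quadrant
all_lower_left = all(math.cos(t) < 0 and math.sin(t) < 0 for t in angles)
print(all_lower_left)
```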

### 1.1 Enter the general parameters for this exercise
<span style="color:limegreen;"> Select the number of random samples and the random seed you want to use </span>

In [None]:
n = 50
random_seed = 27

### 1.2 Set up a fixed random number generator with a random seed
<span style="color:dodgerblue;"> Setting up a random number generator (RNG) object with a fixed random seed is best practice, especially if you want to ensure consistency on each run. In this step we create a *generator* instance of Numpy's random number generator with the desired input seed.</span>
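
To see why a fixed seed matters, the sketch below draws a few numbers from two generators created with the same seed and from one created with a different seed. The variable names are illustrative only; the seed value mirrors the `random_seed` chosen above.

```python
import numpy as np

seed = 27  # same value as random_seed in this exercise

# Two independent generators created from the same seed
first = np.random.default_rng(seed).uniform(0, 1, 3)
second = np.random.default_rng(seed).uniform(0, 1, 3)

# A generator created from a different seed
different = np.random.default_rng(seed + 1).uniform(0, 1, 3)

# Same seed: identical draws. Different seed: a different sequence.
print(np.array_equal(first, second))
print(np.array_equal(first, different))
```

Re-running the notebook with the same seed therefore reproduces the same data set, which makes results easier to compare and debug.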

In [None]:
rng = np.random.default_rng(random_seed)

### 1.3 Generate the data points
Using the RNG, create *n* (x, y) coordinate pairs based on the equation of the ellipse.

In [None]:
theta = rng.uniform(np.pi, 3/2 * np.pi, n)
a = rng.uniform(1, 2, n)
b = rng.uniform(0.5, 1, n)

x = a * np.cos(theta) + 2 # Add 2 and 1 to ensure all data points are positive
y = b * np.sin(theta) + 1

print("The first ten points:")
for i in range(min(10, n)):  # Guard against choosing n < 10
    print(f"{i} ({x[i]:.4f}, {y[i]:.4f})")

### 1.4 Plot the graph

In [None]:
fig, ax = plt.subplots()
ax.scatter (x, y, color = "red", linewidths = 0.5, edgecolor = "black")

ax.set_xlim(0, 2)
ax.set_ylim(0, 1)

ax.set_xlabel ("x")
ax.set_ylabel ("y")

ax.set_aspect("equal", "box")
fig.set_figwidth(7)
fig.set_figheight(5)
fig.tight_layout()


## 2. Determine the Pareto Front
### 2.1 The Inefficient Way
Determine the Pareto front in a for loop, following the same methodology as the manual example.
* Loop through each coordinate pair (*i*) and compare it to every other coordinate pair (*j*).
* If both of *i*'s x and y coordinates are greater than *j*'s, then point *i* is dominated.
    * Append its index to the list of dominated points and break out of the inner for loop.
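
Before applying this rule to the full data set, it can help to trace it on a few hand-picked points. The sketch below uses four hypothetical points (not taken from the exercise data) and the same strict comparison as the bullets above.

```python
# Four hand-picked points (hypothetical values for illustration)
points = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)]

dominated_toy = []
for i, (xi, yi) in enumerate(points):
    for j, (xj, yj) in enumerate(points):
        # Point i is dominated if some point j is smaller in both x and y
        if xi > xj and yi > yj:
            dominated_toy.append(i)
            break  # One dominating point is enough

print(dominated_toy)  # [3]
```

Only the point (3, 3) is dominated, here by (2, 2); the other three points each win on at least one coordinate, so they form the min-min Pareto front of this toy set.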

In [None]:
# Initialize a list for dominated points
dominated = [] # Empty list to be appended to

for i in range(n):
    for j in range (n):
        if x[i] > x[j] and y[i] > y[j]:
            print (f"{i} is dominated by {j}.")
            dominated.append(i)
            break

# Get the non-dominated points with a set difference, sorted for readability
nonDominated = sorted(set(range(n)) - set(dominated))

print ("The dominated points are:")
print (dominated)

print ("The non-dominated points on the Pareto Front are:")
print (nonDominated)

### 2.2 The Efficient way
While the method above is simple (7 lines of code), it is inefficient (O(*n*<sup>2</sup>)) and not very robust (what if we wanted to find a max-max Pareto front, or a trivariate Pareto front?).
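
To make the robustness concern concrete, here is a minimal sketch of how the loop could be generalised to any number of columns and to mixed min/max objectives. The helper name `pareto_mask` and its `sense` argument are assumptions made for this illustration; this is not how *paretoset* is implemented, and the approach is still O(*n*<sup>2</sup>). It uses the usual dominance definition (no worse in every column and strictly better in at least one), which agrees with the strict comparison above whenever no two points share a coordinate value.

```python
import numpy as np


def pareto_mask(values, sense):
    """Return a Boolean mask that is True for non-dominated rows.

    Hypothetical helper for illustration only; not part of paretoset.
    """
    values = np.asarray(values, dtype=float)
    # Flip the sign of "max" columns so that every column is minimised
    signs = np.array([1.0 if s == "min" else -1.0 for s in sense])
    costs = values * signs

    # Entry [i, j] asks: is row j no worse than row i in every column ...
    no_worse = (costs[None, :, :] <= costs[:, None, :]).all(axis=2)
    # ... and strictly better than row i in at least one column?
    better = (costs[None, :, :] < costs[:, None, :]).any(axis=2)

    dominated = (no_worse & better).any(axis=1)
    return ~dominated


# Trivariate toy data (hypothetical values), minimising all three columns
toy_3d = [[1, 5, 2], [2, 4, 1], [3, 3, 3], [3, 5, 3]]
mask_3d = pareto_mask(toy_3d, ["min", "min", "min"])
print(mask_3d)  # the last row is dominated by [3, 3, 3]

# Bivariate toy data with mixed senses: minimise x, maximise y
toy_2d = [[1, 1], [1, 3], [2, 2], [3, 3]]
mask_2d = pareto_mask(toy_2d, ["min", "max"])
print(mask_2d)  # only [1, 3] is non-dominated
```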

The Python package [*paretoset*](https://pypi.org/project/paretoset/) has a more efficient and more robust implementation of this algorithm that takes advantage of NumPy arrays. The package is relatively straightforward to use: you give it the columns of the dataframe and identify the *sense* of each column, that is, whether that column should be minimised or maximised.

To use the paretoset function here, we first place the data into a dataframe.

In [None]:
df = pd.DataFrame({"x" : x, "y" : y})

The paretoset function has the *sense* argument, which specifies whether we want to determine the Pareto front for the min-min, max-max, or min-max of the two variables. It returns a Boolean mask that is True for the rows on the Pareto front.

We will begin by doing the min-min Pareto front.

In [None]:
paretoFront = paretoset(df, sense = ["min", "min"])

# Return the non-dominated values in a list and print
paretoFront_nonDominated = list(df[paretoFront].index)

print ("The non-dominated points on the Pareto Front are:")
print (paretoFront_nonDominated)


Do a quick check that both methods found the same list of points.

In [None]:
print(set(nonDominated) == set(paretoFront_nonDominated))

## 3. Visualization
To visualise the Pareto front, we can use scatter plots with different colours for the points on the Pareto front and those that are not.

This can be done by using masks to create two separate dataframes and then plotting the two sets of points separately.
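
As a small illustration of the masking step, the sketch below splits a four-row dataframe using a hand-written Boolean mask. The values and the mask are hypothetical and chosen so that the last row is dominated in the min-min sense.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): the last point is dominated by (0.5, 0.4)
toy = pd.DataFrame({"x": [0.2, 0.5, 0.9, 0.7], "y": [0.8, 0.4, 0.1, 0.9]})

# Hand-written mask: True for rows on the min-min Pareto front
mask = np.array([True, True, True, False])

on_front = toy[mask]    # Rows where the mask is True
off_front = toy[~mask]  # The tilde inverts the mask

print(on_front)
print(off_front)
```

Every row ends up in exactly one of the two dataframes, so plotting them one after the other covers the whole data set.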

In [None]:
# Create separate dataframes for those on the Pareto Front and those which are not.
# Those on the Pareto Front will be plotted separately with colour emphasized
onPareto = df[paretoFront]

offPareto = df[~paretoFront] # Note the tilde denotes 'not' 

fig, ax = plt.subplots()
# First the points off of the Pareto front
ax.scatter(offPareto.x, offPareto.y, c = "red",  linewidths = 0.5, edgecolors = "black")
# Second those on the Pareto Front with colour green for emphasis
ax.scatter(onPareto.x, onPareto.y,  c = "lime",  linewidths = 0.5, edgecolors = "black")

ax.set_xlim(0, 2)
ax.set_ylim(0, 1)

ax.set_xlabel ("x")
ax.set_ylabel ("y")

ax.set_aspect("equal", "box")
fig.set_figwidth(7)
fig.set_figheight(5)
fig.tight_layout()

plt.show()

## 4. Determining the max-max Pareto front

<span style="color:limegreen;"> On your own, find the max-max Pareto front from the data points already generated and plot the results. <b> Before you begin, which points do you expect to be on the Pareto front? </b></span>

### 4.1 Find the Pareto front

In [None]:
# Add code here

### 4.2 Visualization

In [None]:
# Add code here