# Multi Objective Optimization & NSGA-II
Many problems require weighing multiple costs and benefits and arriving at an outcome with the greatest overall satisfaction. However, decreasing the cost in one objective often increases it in another.

## I. Formal Definition
* Let there be $M$ functions to be optimized (minimized or maximized): $F_1(x), F_2(x), \ldots, F_M(x)$
* subject to $J$ inequality constraints: $g_j(x)\geq 0, \forall j\in\{1,\ldots,J\}$
* and $K$ equality constraints: $h_k(x)=0, \forall k\in\{1,\ldots,K\}$.
* $x\in D$, that is, the domain of the decision space is $D$. If $A_i\leq x_i\leq B_i, \forall i\in\{1,\ldots,n\}$, then $D$ is a hyperrectangle (a box) defined by the "box constraints" $(A_i, B_i)$.

### Dominated vs. Non-dominated solutions
Assume we want to *minimize* both $F_1(x)$ and $F_2(x)$. Then a solution $x_1$ is said to **dominate** a solution $x_2$ if both of the following are true:
1. $F_1(x_1)\leq F_1(x_2)$ and $F_2(x_1)\leq F_2(x_2)$
2. $F_1(x_1)< F_1(x_2)$ or $F_2(x_1)< F_2(x_2)$

Hence, $x_1$ dominates $x_2$ if it is no worse than $x_2$ in every objective and strictly better in at least one of them.
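The two conditions above translate almost directly into code. Below is a minimal sketch, assuming every objective is minimized and that each solution is represented only by its vector of objective values; the function name `dominates` is chosen here for illustration.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if objective vector `a` dominates `b` (all objectives minimized)."""
    # Condition 1: `a` is no worse than `b` in every objective.
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    # Condition 2: `a` is strictly better than `b` in at least one objective.
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


print(dominates((1.0, 2.0), (2.0, 2.0)))  # True: better in F1, equal in F2
print(dominates((1.0, 3.0), (2.0, 2.0)))  # False: better in F1 but worse in F2
print(dominates((1.0, 2.0), (1.0, 2.0)))  # False: identical vectors
```

Note that dominance is only a partial order: for the second pair, neither vector dominates the other.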

**Pareto Optimality**: A solution $x^*$ is pareto optimal if there does not exist a solution $x'\in D$ which dominates $x^*$.

**Pareto Front**: The set of all pareto optimal solutions is called the pareto front. Hence the aim of a multi objective optimization algorithm is to find the pareto front, or a close approximation of it. Below is an example of a pareto front:

![A pareto front](Images/pareto_front.png)
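For a finite set of candidate solutions, the non-dominated subset can be found by brute force. The sketch below assumes minimization of every objective and uses a small made-up list of `(F1, F2)` values; `pareto_front` is an illustrative name, and the quadratic-time filter is only meant for small sets.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # one objective vector, all objectives minimized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep every point that no other point in `points` dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective values (F1, F2) for five candidate solutions.
points = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 2)]
print(pareto_front(points))  # [(1, 5), (2, 3), (4, 1)]
```

Here `(3, 4)` is dominated by `(2, 3)` and `(5, 2)` is dominated by `(4, 1)`, so only three points remain.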

## II. Solving MOO Problems
Solving Multi Objective Optimization (MOO) problems has two goals:
* Convergence: Finding solutions as close as possible to the pareto front
* Diversity: Finding a good spread of solutions along the front

The techniques can be broadly divided into two approaches:

![Taxonomy of approaches to solving MOO problems](Images/MOO_techniques.png)

1. **Classical approach**:
  * <u>Weighted method</u>: Use higher level information to estimate an "importance vector" that ranks which objectives are more important than others. Using this importance vector, convert the multi-objective problem into a single objective optimization problem (e.g. a weighted sum). Mathematically, the problem becomes: minimize $r_k F_1(x)+F_2(x)$, subject to $g_j(x)\geq 0, \forall j\in\{1,\ldots,J\}, x\in D$. Here, $r_k = w_1/w_2$ is the ratio of the weights on $F_1(x)$ and $F_2(x)$. This problem is then solved using single objective optimization methods.
  
    Disadvantages:
    * You must solve a single-objective problem many times, once for each ratio $w_1/w_2$.
    * There is no control over the area of objective space searched.
    * The approach will not work on non-convex parts of the tradeoff curve. (The solution is always a point where the tangent to the tradeoff curve has slope $-w_1/w_2$. If there are two such points, the optimum is the lower of the two.)
    ![Missed pareto optimal solutions in non-convex objective space](Images/weighted_moo.png)
    Here, you can change the ratio of the weights to move from A to B, but you will still miss the non-convex part of the pareto front.
  * <u>Constraint Method</u>: Optimize one objective and constrain all the others. For example, minimize $F_2$ subject to $F_1\leq R_i$, and then solve the problem for many values of $R_i$. This approach **will work** on non-convex tradeoff curves.
    
    Disadvantages:
    * You must solve a single-objective optimization problem many times.
    * Solution highly depends on the values of $R_i$ chosen. It has to be carefully chosen so that it lies within the feasible objective space.

    ![Selecting the ideal $R_i$ for constrained MOO](Images/constrained_moo.png)

    Here, it can be seen that the solution depends entirely on the value of $R_i$. For $R_i = \epsilon_4$, the solution will be point C. Similarly, $R_i = \epsilon_3 \to B$ and $R_i = \epsilon_2 \to A$. If $R_i = \epsilon_1$, we will not be able to find any solution at all.
2. **Multi Objective Approach**: Using an ideal multi-objective optimizer, achieve **multiple trade-off solutions**. Choose one among these trade off solutions using some higher level knowledge. These methods usually apply evolutionary algorithms.
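The difference between the two classical methods can be seen on a tiny discrete example. The sketch below is illustrative only: the candidate set is made up, the helper names `weighted_sum_best` and `constraint_best` are hypothetical, and each "single objective solve" is replaced by a plain `min` over the candidates. The point `(3, 4)` is pareto optimal but lies on a non-convex part of the front.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (F1, F2), both minimized

# Hypothetical candidates: (1, 5), (3, 4) and (5, 1) are pareto optimal,
# (4, 5) is dominated. (3, 4) sits above the line joining (1, 5) and (5, 1).
candidates: List[Point] = [(1.0, 5.0), (3.0, 4.0), (5.0, 1.0), (4.0, 5.0)]


def weighted_sum_best(points: List[Point], w1: float, w2: float) -> Point:
    """Weighted method: minimize w1*F1 + w2*F2 over the candidates."""
    return min(points, key=lambda p: w1 * p[0] + w2 * p[1])


def constraint_best(points: List[Point], r: float) -> Optional[Point]:
    """Constraint method: minimize F2 subject to F1 <= r (None if infeasible)."""
    feasible = [p for p in points if p[0] <= r]
    if not feasible:
        return None
    # Ties in F2 are broken by the smaller F1.
    return min(feasible, key=lambda p: (p[1], p[0]))


# Sweep the weights: (3, 4) is never selected, whatever the ratio w1/w2.
weighted_found = {weighted_sum_best(candidates, i / 20, 1 - i / 20) for i in range(1, 20)}
print(sorted(weighted_found))  # [(1.0, 5.0), (5.0, 1.0)]

# Sweep the bound R: (3, 4) is found, and a bound that is too tight gives nothing.
constraint_found = [constraint_best(candidates, r) for r in (0.5, 1.0, 3.0, 5.0)]
print(constraint_found)  # [None, (1.0, 5.0), (3.0, 4.0), (5.0, 1.0)]
```

The weighted sweep only ever returns the two extreme points, while the constraint sweep recovers the non-convex point for $R = 3$ and fails for $R = 0.5$, mirroring the $\epsilon_1$ case described above.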

## III. Non Dominated Sorting Genetic Algorithm (NSGA II)
Developed by [K. Deb et al. (2000)](https://www.iitk.ac.in/kangal/Deb_NSGA-II.pdf), NSGA-II has several important ideas:
* It divides the population into multiple fronts.
* It uses the front to determine fitness.
* It examines the distance between neighboring points on a front to determine fitness.

Below is the overall framework of NSGA-II.

![NSGA-II Framework](Images/NSGA_Framework.png)

### Sorting the population into "Fronts"
1. Sort the population into "non-dominated fronts". The first front is the non-dominated set of the current population, i.e., no point in Front 1 is dominated by any other point in the population.
2. Generate the 2nd front by removing all points of the 1st front and re-computing the non-dominated set. The points in this new non-dominated set form the 2nd front.
3. Repeat until the $n$th front, each time removing all the points of the $1$st to $(n-1)$th fronts and finding the non-dominated set of the remaining points.

For example, if we are trying to minimize two functions $f_1$ and $f_2$, then below could be the different fronts obtained.

![Example of Pareto optimal fronts in NSGA](Images/nsga_fronts.png)
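The peeling procedure described in the steps above can be sketched as follows. This is a deliberately simple version, assuming minimization and a made-up set of objective vectors; it is not the bookkeeping-based "fast" non-dominated sort from the NSGA-II paper, although it should produce the same fronts.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # one objective vector, all objectives minimized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points: Sequence[Point]) -> List[List[Point]]:
    """Repeatedly peel off the non-dominated set of the remaining points."""
    remaining = list(points)
    fronts: List[List[Point]] = []
    while remaining:
        # Current front: points not dominated by anything still remaining.
        front = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        # Remove the front and repeat on what is left.
        remaining = [p for p in remaining if p not in front]
    return fronts


# Hypothetical population, given as (f1, f2) values.
population = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 2), (5, 5)]
for rank, front in enumerate(non_dominated_sort(population), start=1):
    print(rank, front)
# 1 [(1, 5), (2, 3), (4, 1)]
# 2 [(3, 4), (5, 2)]
# 3 [(5, 5)]
```

The index of the front a point lands in is the rank later used for selection.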

### NSGA II - Handling Diversity
To maintain **diversity**, NSGA II gives individuals from any of the fronts a chance of being selected as a parent. However, this probability is lower if the individual is located on a "worse" front. NSGA II also tries to avoid parents that are too similar to each other, so it has a method of computing "crowding". The likelihood of selecting a point as a parent is reduced if it is crowded.

### Parameters in NSGA II
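We have the following parameters to be selected in NSGA II (the first three are common to all genetic algorithms):
* Population size
* Crossover location probability
* Mutation rate
* $\epsilon$, the difference in base fitness between two adjacent fronts
* $\sigma_{share}$, the maximum Euclidean distance between two points on the same "front" to consider them in the "crowding" calculations.

Note that $\epsilon$ and $\sigma_{share}$ appear to come from the fitness-sharing scheme of the original NSGA. NSGA II replaces sharing with the crowding distance described below, which does not need a user-specified $\sigma_{share}$.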

### Elitism in NSGA
Do the following at every iteration:
1. Apply crossover to the parent population $P_t$ (of size $N$) to generate a set $Q_t$ of $N$ offspring, where $t$ is the generation number.
2. Combine the parents and offspring into a new set $R_t = P_t \cup Q_t$.
3. Sort $R_t$ as shown below and select from it a new population of parents $P_{t+1}$ of size $N$.

![Elitism in NSGA](Images/nsga_elitism_algo.png)

This approach has the following advantages:
* A good population is maintained by the elitism, which saves the best parents by letting them compete with the offspring.
* Diversity among non-dominated solutions is introduced by the crowding comparison procedure, which is used with the tournament selection and during the population reduction phase.
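The population reduction step can be sketched as below. This is a minimal illustration under several assumptions: individuals are represented only by their objective vectors (all minimized), fronts are found by simple peeling, the crowding distance follows the description in the next subsection, and the names `next_parents`, `parents` and `offspring` as well as the data are made up for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points: Sequence[Point]) -> List[List[Point]]:
    remaining, fronts = list(points), []
    while remaining:
        front = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts


def crowding_distance(front: Sequence[Point]) -> List[float]:
    n = len(front)
    dist = [0.0] * n
    for m in range(len(front[0]) if n else 0):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = math.inf  # boundary points are always kept
        if hi == lo:
            continue
        for k in range(1, n - 1):
            dist[order[k]] += (front[order[k + 1]][m] - front[order[k - 1]][m]) / (hi - lo)
    return dist


def next_parents(parents: Sequence[Point], offspring: Sequence[Point], n: int) -> List[Point]:
    """Build P_{t+1} of size n from R_t = parents + offspring."""
    combined = list(parents) + list(offspring)  # R_t
    new_parents: List[Point] = []
    for front in non_dominated_sort(combined):
        if len(new_parents) + len(front) <= n:
            new_parents.extend(front)  # the whole front fits
        else:
            # Last allowable front: keep the least crowded individuals.
            dist = crowding_distance(front)
            order = sorted(range(len(front)), key=lambda i: dist[i], reverse=True)
            new_parents.extend(front[i] for i in order[: n - len(new_parents)])
            break
    return new_parents


parents = [(1.0, 4.0), (2.0, 9.0), (4.0, 5.0), (9.0, 9.0)]
offspring = [(4.0, 1.0), (3.0, 6.0), (9.0, 2.0), (10.0, 10.0)]
print(next_parents(parents, offspring, 4))
# [(1.0, 4.0), (4.0, 1.0), (2.0, 9.0), (9.0, 2.0)]
```

In this example the first front has two members and the second has four, so only two of the second front survive: its two boundary points, which have infinite crowding distance.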

### Crowded Tournament Selection
Each individual has the following metrics:
* Rank $r_i$ that depends on the rank of the front it belongs to.
* Crowding distance $d_i$ (described later)

**Crowded Tournament Selection Operator:** A solution $i$ wins a tournament against another solution $j$ if either of the following conditions is true:
1. Solution $i$ has a better rank, that is, $r_i < r_j$
2. They have the same rank but solution $i$ has a better crowding distance than solution $j$, that is, $r_i = r_j$ and $d_i > d_j$

The `crowding_distance_assignment()` function, defined below, takes as input a non-dominated set (a front) $\mathcal{I}$:

![Crowd distance algorithm for NSGA II](Images/nsga_crowding_algo.png)

The above function is equivalent to calculating, for the $i$th solution in its front (marked with solid circles), the average side length of the cuboid formed by its nearest neighbors on either side.

![Crowd distance cuboid](Images/nsga_crowd_cuboid.png)
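A minimal Python sketch of the crowding distance calculation is given below. It assumes a front is passed as a list of objective vectors and returns one distance per vector in the same order; the function name and the example data are illustrative. For each objective, the front is sorted, the two boundary points receive an infinite distance, and every interior point accumulates the normalized gap between its two neighbors.

```python
import math
from typing import List, Sequence


def crowding_distance(front: Sequence[Sequence[float]]) -> List[float]:
    """Crowding distance of each objective vector in one non-dominated front."""
    n = len(front)
    distance = [0.0] * n
    if n == 0:
        return distance
    for m in range(len(front[0])):
        # Indices of the front sorted by the m-th objective.
        order = sorted(range(n), key=lambda i: front[i][m])
        f_min, f_max = front[order[0]][m], front[order[-1]][m]
        # Boundary points are always preferred.
        distance[order[0]] = distance[order[-1]] = math.inf
        if f_max == f_min:
            continue  # no spread in this objective, nothing to add
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            distance[order[k]] += gap / (f_max - f_min)
    return distance


# Hypothetical front: (1, 6) and (2, 5) are close together, (10, 0) is far away.
front = [(0.0, 10.0), (1.0, 6.0), (2.0, 5.0), (10.0, 0.0)]
print(crowding_distance(front))
```

For this front the two end points get an infinite distance, and `(2, 5)` scores higher (about 1.5) than `(1, 6)` (about 0.7) because its neighbor on one side is far away, so it is considered less crowded.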

We keep in mind that crowding is used to determine BOTH:
1. which individuals to keep from the last allowable front. This happens when the last allowable front has, for example, 10 individuals but only 4 can be carried forward into the new generation; AND
2. which individual wins in tournament selection when generating new offspring.
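The crowded tournament operator itself is only a two-level comparison. The sketch below assumes each individual already carries its rank and crowding distance; the `Individual` type, the function names and the use of a binary tournament with a seeded `random.Random` are choices made for this example.

```python
import random
from typing import NamedTuple, Sequence


class Individual(NamedTuple):
    name: str        # label used only for this example
    rank: int        # index of the front (1 is best)
    crowding: float  # crowding distance within that front


def crowded_better(i: Individual, j: Individual) -> bool:
    """True if `i` wins against `j` under the crowded comparison."""
    if i.rank != j.rank:
        return i.rank < j.rank       # a better front wins
    return i.crowding > j.crowding   # same front: the less crowded one wins


def binary_tournament(population: Sequence[Individual], rng: random.Random) -> Individual:
    """Pick two distinct individuals at random and return the winner."""
    a, b = rng.sample(list(population), 2)
    return a if crowded_better(a, b) else b


a = Individual("a", rank=1, crowding=0.2)
b = Individual("b", rank=1, crowding=1.5)
c = Individual("c", rank=2, crowding=float("inf"))

print(crowded_better(a, c))  # True: rank 1 beats rank 2 regardless of crowding
print(crowded_better(a, b))  # False: same rank, but b is less crowded
print(binary_tournament([a, b], random.Random(0)).name)  # b
```

When two individuals tie on both rank and crowding distance, this sketch simply returns the second one drawn.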

## IV. Visualization of Multi-Objective Solutions for more than 2 Objectives
For MOO problems, we have two desired properties: convergence and an even distribution. Therefore, for any solution, we can have two types of errors:

![Types of errors in MOO problems](Images/moo_errors.png)

As seen above, these errors can be easily visualized for the case of 2 objective functions. However, with more than 2 objective functions, this becomes difficult. Below are some methods that can be used to visualize solutions for more than 2 objectives:
1. Scatter plot visualization: In each box, two objectives are plotted at a time. The diagonal indicates which objective is plotted along each row and column.

![Scatterplot visualization](Images/moo_scatter.png)

2. Value Path visualization: Each line represents a single solution on the Pareto front. The objective values are plotted after normalization. From the example below, you can see that objectives 1 and 3 are related in the same direction, whereas objective 2 is a competing objective.

![Value Path visualization](Images/moo_valuepath.png)

3. Bar Chart method: Every pareto optimal solution is plotted as a bar graph for each of the objective functions separately.
4. Star coordinate visualization: Each solution has a circle, and each radial line shows the value of one objective for that solution.

![Star coordinate visualization](Images/moo_star.png)
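The value path visualization above relies on normalized objective values. The text does not say which normalization is used, so the sketch below assumes simple min-max scaling of each objective to $[0, 1]$; the function name and the three example solutions are made up. Each returned row would be drawn as one line across the objectives.

```python
from typing import List, Sequence


def normalize_objectives(solutions: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max scale every objective (column) to [0, 1] across all solutions."""
    columns = list(zip(*solutions))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant objective has no spread, so it is mapped to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in solutions
    ]


# Hypothetical solutions with three objectives on very different scales.
solutions = [(1.0, 10.0, 100.0), (2.0, 8.0, 200.0), (3.0, 6.0, 300.0)]
for path in normalize_objectives(solutions):
    print(path)
# [0.0, 1.0, 0.0]
# [0.5, 0.5, 0.5]
# [1.0, 0.0, 1.0]
```

In this made-up data, objectives 1 and 3 rise together while objective 2 falls, which is the kind of pattern a value path plot makes visible.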