# Errata of moea-benchmark data
> Discussing the fixed tuned MOEA/D data for 10-objective experimental scenarios.

- toc: true 
- badges: true
- comments: true
- categories: [jupyter]
- image: images/errata.png

This notebook has been rendered as an HTML page for easier navigation. The notebook itself is also available for cloning, or for online execution using Binder or Colab.

Below, you will find an analysis of the difference between the original (`indicators.csv.gz`) and errata (`indicators-errata.csv.gz`) datasets. 

Specifically, the fix affects only the tuned MOEA/D results for 10-objective experimental scenarios. The error was caused by a typo that prevented MOEA/D from correctly reading its input decomposition (weight) vectors.
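
The notebook does not describe the typo itself or the file format of the decomposition vectors. As a hedged illustration of how this kind of input error can be caught early, the sketch below loads weight vectors and checks them before use. It assumes a hypothetical plain-text format with one whitespace-separated vector per line and normalised weights; `load_weight_vectors` is an illustrative name, not part of MOEA/D or of the moea-benchmark code.

```python
import numpy as np

def load_weight_vectors(source, nobj, atol=1e-6):
    """Load decomposition weight vectors and fail loudly if they look malformed.

    Assumes a hypothetical plain-text format: one vector per line,
    components separated by whitespace. `source` can be a path or a file object.
    """
    weights = np.loadtxt(source, ndmin=2)
    # Every vector must have one component per objective
    if weights.shape[1] != nobj:
        raise ValueError(f"expected {nobj} components per vector, got {weights.shape[1]}")
    # Assuming normalised weights: non-negative components that sum to 1
    if (weights < 0).any():
        raise ValueError("weight vectors must be non-negative")
    if not np.allclose(weights.sum(axis=1), 1.0, atol=atol):
        raise ValueError("every weight vector must sum to 1")
    return weights
```

For example, `load_weight_vectors("weights.txt", nobj=10)` (hypothetical file name) raises a `ValueError` when the file does not hold 10-component vectors on the unit simplex, rather than passing them on to the algorithm.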

---
## Setup

For simplicity, the rows that differ between the two datasets have been provided as a separate CSV file (`moead-errata.csv.gz`) in the moea-benchmark repository. The file can be read using the `pandas` data analysis library for Python, and includes a `version` feature that indicates whether each row comes from the original or from the errata dataset.

```python
#collapse-hide
import pandas as pd

# Rows that differ between the original and errata datasets
df_errata = pd.read_csv("https://github.com/leobezerra/moea-benchmark/raw/master/moead-errata.csv.gz")
```

```python
#hide_input
df_errata.head()
```

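| | setup | FE | algo | indicator | nobj | problem | nvar | seed | value | version |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | tuned | 2500 | moead | rpd | 10 | DTLZ2 | 31 | 1 | 0.110215 | original |
| 1 | tuned | 2500 | moead | rpd | 10 | DTLZ2 | 31 | 2 | 0.101172 | original |
| 2 | tuned | 2500 | moead | rpd | 10 | DTLZ2 | 31 | 3 | 0.101156 | original |
| 3 | tuned | 2500 | moead | rpd | 10 | DTLZ2 | 31 | 4 | 0.099056 | original |
| 4 | tuned | 2500 | moead | rpd | 10 | DTLZ2 | 31 | 5 | 0.080091 | original |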
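
Because every row carries a `version` label, the original and errata values of each run can be placed side by side. The sketch below assumes the errata rows are labelled `errata` (only `original` is visible in the preview above), so both labels are parameters; `pair_versions` is a hypothetical helper. It also checks the scope stated earlier: every row should be a tuned MOEA/D run on a 10-objective scenario.

```python
import pandas as pd

# Columns that identify a single run, taken from the header shown above
RUN_KEYS = ["setup", "FE", "algo", "indicator", "nobj", "problem", "nvar", "seed"]

def pair_versions(df, old="original", new="errata"):
    """One row per run, with the old and new indicator values side by side."""
    # The errata should only touch tuned MOEA/D runs with 10 objectives
    in_scope = (df["setup"] == "tuned") & (df["algo"] == "moead") & (df["nobj"] == 10)
    if not in_scope.all():
        raise ValueError("found rows outside the documented scope of the errata")
    wide = df.pivot_table(index=RUN_KEYS, columns="version", values="value")
    # Negative delta means a lower indicator value in the errata dataset
    wide["delta"] = wide[new] - wide[old]
    return wide
```

With this helper, `pair_versions(df_errata)["delta"].describe()` summarises how much individual runs changed.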


Besides `pandas`, we will also use the Plotly interactive data visualization library.

```python
#collapse-hide
import re

import plotly.express as px
import plotly.graph_objects as go
```

## Boxplots

We start the discussion with boxplots comparing the original and errata data. 

Overall, the fix improves results, though the size of the improvement varies with the indicator and the problem.
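
To put numbers on how the effect of the fix varies with indicator and problem, the median value of each version can be compared per indicator, budget (FE) and problem. This is a hedged sketch: `median_change` is a hypothetical helper, and the label of the errata rows is assumed to be `errata`, since only `original` is visible in the preview above.

```python
import pandas as pd

def median_change(df, old="original", new="errata"):
    """Median value per version and relative change, per indicator, FE and problem."""
    med = (
        df.groupby(["indicator", "FE", "problem", "version"])["value"]
        .median()
        .unstack("version")
    )
    # Relative change of the median; negative means a lower median after the fix
    med["rel_change"] = (med[new] - med[old]) / med[old]
    return med.sort_values("rel_change")
```

For instance, `median_change(df_errata).groupby(level="indicator")["rel_change"].describe()` summarises the spread of the changes within each indicator.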

A few Plotly features can be useful for navigation:
- showing or hiding a dataset version, by clicking on its name in the legend
- switching between problems, using the animation slider
- zooming into a given range of a given plot, by selecting an area of the plot

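```python
#collapse-hide
# One box per (version, nvar); facet rows are indicators, facet columns are
# budgets (FE), and the animation slider switches between problems
fig_moead = px.box(
    df_errata,
    x="nvar",
    y="value",
    color="version",
    facet_col="FE",
    facet_row="indicator",
    animation_frame="problem",
    height=1000,
    category_orders={"indicator": ["rpd", "eps", "igd"]}
)

# Adjust the ranges: unlink the facet y-axes and give each facet row
# its own upper bound
ymax = [10, 4, 0.4]
for k in fig_moead.layout:
    if re.search('yaxis[1-9]*', k):
        # 'yaxis' has no numeric suffix and corresponds to subplot 1
        matches = re.findall(r'(\d+)', k)
        idx = int(matches[0]) if len(matches) else 1
        # Three FE columns per row: axes 1-3, 4-6 and 7-9 share a bound
        ymax_idx = (idx-1) // 3
        fig_moead.layout[k].update(matches=None, range=(0,ymax[ymax_idx]))
```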

```python
#hide_input
fig_moead.show()
```

## Rank sum

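An updated version of Table 8 from the paper, restricted to the 10-objective scenarios, is given below. Each table lists, for one budget (FE) and indicator, how far each algorithm's rank sum lies from the lowest (best) rank sum, with algorithms sorted from best to worst.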

Overall, MOEA/D:
- greatly improves its relative ranking according to the $\textit{HV}_\textit{rd}$
- improves its rank sum according to the remaining indicators, but not its relative ranking

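```python
#collapse-hide
# Auxiliary procedure to compute rank sums
def rank_sum(df, columns=["algo"]):
    # One row per run (indicator, problem, nvar, seed), one column per algorithm
    df_wide = df.pivot_table(
        index=["indicator", "problem", "nvar", "seed"],
        columns=columns, 
        values=["value"]
    )
    # Rank the algorithms within each run (rank 1 = lowest value),
    # then add up the ranks per indicator
    return df_wide.rank(axis=1).groupby("indicator").sum()

# Compute the rank sums for every setup, budget (FE) and number of objectives;
# droplevel removes the 'value' column level left by pivot_table
df = pd.read_csv("https://github.com/leobezerra/moea-benchmark/raw/master/indicators-errata.csv.gz")
df_rs = df.groupby(["setup","FE", "nobj"]).apply(rank_sum).droplevel(0, axis=1)
```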
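
The `rank_sum` helper ranks the algorithms within each run, giving rank 1 to the lowest indicator value, and then adds up the ranks per indicator. The toy example below uses made-up values, not benchmark data, to show these mechanics step by step, including how tied values receive the average of the tied ranks. Averaged ties are why some entries in the tables below end in .5.

```python
import pandas as pd

# Made-up values for illustration only: one indicator, two seeds, three algorithms
toy = pd.DataFrame({
    "indicator": ["eps"] * 6,
    "problem": ["DTLZ2"] * 6,
    "nvar": [31] * 6,
    "seed": [1, 1, 1, 2, 2, 2],
    "algo": ["a", "b", "c"] * 2,
    "value": [0.3, 0.1, 0.2, 0.5, 0.4, 0.4],
})

# Same steps as rank_sum: one row per run, one column per algorithm
wide = toy.pivot_table(index=["indicator", "problem", "nvar", "seed"], columns="algo", values="value")
ranks = wide.rank(axis=1)  # seed 1: a=3, b=1, c=2; seed 2: a=3, b=c=1.5 (tie)
rank_sums = ranks.groupby("indicator").sum()
print(rank_sums)  # a=6.0, b=2.5, c=3.5

# Distance to the best rank sum, as reported in the tables
print(rank_sums.loc["eps"] - rank_sums.loc["eps"].min())  # a=3.5, b=0.0, c=1.0
```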

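```python
#hide_input
from IPython.display import display

# For each budget and indicator, show the tuned 10-objective rank sums
# relative to the best (lowest) one, sorted from best to worst
for FE in [2500, 10000, 40000]:
    for indicator in ["rpd", "eps", "igd"]:
        idx = ("tuned", FE, 10, indicator)
        rs_diff = (df_rs.loc[idx] - df_rs.loc[idx].min())
        display(rs_diff.sort_values().to_frame().T)
```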

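| setup | FE | nobj | indicator | sms | ibea | moead | cma | nsga | nsga3 | spea | hype | moga |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tuned | 2500 | 10 | rpd | 0.0 | 909.0 | 1835.0 | 2145.5 | 3429.0 | 4018.0 | 4862.5 | 6113.0 | 7447.0 |

| setup | FE | nobj | indicator | moead | ibea | sms | nsga | cma | nsga3 | spea | hype | moga |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tuned | 2500 | 10 | eps | 0.0 | 1630.0 | 3195.0 | 3590.0 | 3850.5 | 4303.0 | 4519.5 | 5818.0 | 7228.0 |

| setup | FE | nobj | indicator | ibea | nsga3 | spea | cma | sms | nsga | hype | moead | moga |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tuned | 2500 | 10 | igd | 0.0 | 875.0 | 1324.5 | 1346.5 | 2128.0 | 2245.0 | 3353.0 | 3610.0 | 5968.0 |

| setup | FE | nobj | indicator | ibea | sms | moead | cma | nsga3 | spea | nsga | hype | moga |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tuned | 10000 | 10 | rpd | 0.0 | 429.0 | 1102.0 | 1311.0 | 2804.0 | 2971.0 | 3876.0 | 5419.0 | 6826.0 |

| setup | FE | nobj | indicator | moead | ibea | sms | cma | nsga3 | nsga | spea | hype | moga |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tuned | 10000 | 10 | eps | 0.0 | 1421.0 | 3015.0 | 3438.0 | 3499.0 | 4179.0 | 4701.0 | 5359.0 | 7064.0 |

| setup | FE | nobj | indicator | nsga3 | ibea | spea | nsga | sms | cma | hype | moead | moga |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tuned | 10000 | 10 | igd | 0.0 | 92.0 | 674.0 | 1744.0 | 2152.0 | 2294.0 | 2563.0 | 2724.0 | 5493.0 |

| setup | FE | nobj | indicator | ibea | sms | spea | moead | cma | nsga3 | nsga | hype | moga |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tuned | 40000 | 10 | rpd | 0.0 | 1053.0 | 1588.0 | 1880.0 | 2001.0 | 2444.0 | 2929.0 | 5289.0 | 6825.0 |

| setup | FE | nobj | indicator | moead | ibea | nsga3 | sms | nsga | spea | cma | hype | moga |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tuned | 40000 | 10 | eps | 0.0 | 1368.0 | 2855.0 | 3414.0 | 4009.0 | 4262.0 | 4777.0 | 5494.0 | 7424.0 |

| setup | FE | nobj | indicator | ibea | nsga3 | spea | nsga | hype | moead | cma | sms | moga |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tuned | 40000 | 10 | igd | 0.0 | 967.0 | 978.0 | 2731.0 | 3027.0 | 3895.0 | 4185.0 | 4599.0 | 6732.0 |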
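
The claims about relative ranking can be checked by turning each table into a position. A minimal sketch, with the hypothetical helper name `algo_positions`, assuming `df_rs` is indexed by `setup`, `FE`, `nobj` and `indicator` with one column per algorithm, as produced by the rank-sum code above:

```python
import pandas as pd

def algo_positions(df_rs, algo="moead", setup="tuned", nobj=10):
    """1-based position of `algo` by ascending rank sum, for each FE and indicator."""
    # Keep only the requested setup and number of objectives
    sub = df_rs.xs((setup, nobj), level=["setup", "nobj"])
    # method="min": algorithms with equal rank sums share the best position
    positions = sub.rank(axis=1, method="min")[algo]
    # Rows: FE budgets; columns: indicators
    return positions.unstack("indicator")
```

Reading the tables above directly, MOEA/D is in position 3, 3 and 4 for `rpd` (2500, 10000 and 40000 FE), first for `eps` at every budget, and in position 8, 8 and 6 for `igd`.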


## Concluding remarks

In the original dataset, MOEA/D was run on 10-objective scenarios using weight vectors that limited its spread across the objective space. As a result, MOEA/D achieved competitive results according to the binary $\epsilon$ indicator, which values convergence, but performed poorly according to the $\textit{HV}_\textit{rd}$ and the $\textit{IGD}$, since these indicators also require a well-spread and well-distributed approximation set.
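
The notebook does not say how the corrected weight vectors were generated. One common way to obtain a well-spread set is the simplex-lattice design of Das and Dennis, sketched below as an assumption rather than as the procedure used in the benchmark: it enumerates every vector whose components are multiples of 1/h and sum to 1. The function name `simplex_lattice` is illustrative.

```python
from itertools import combinations

import numpy as np

def simplex_lattice(nobj, h):
    """Das-Dennis simplex-lattice: all weight vectors with components in
    {0, 1/h, 2/h, ..., 1} that sum to 1."""
    vectors = []
    slots = h + nobj - 1
    # "Stars and bars": place nobj - 1 bars among h + nobj - 1 slots;
    # the number of free slots between consecutive bars gives each component
    for bars in combinations(range(slots), nobj - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(slots - prev - 1)
        vectors.append(counts)
    return np.array(vectors, dtype=float) / h
```

For 10 objectives and h = 3, `simplex_lattice(10, 3)` yields C(12, 9) = 220 vectors, each with at most 3 non-zero components, so a single coarse lattice leaves the interior of the simplex uncovered.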

In the errata dataset, a well-spread and well-distributed set of weight vectors enables MOEA/D to improve its $\textit{HV}_\textit{rd}$ performance. However, its $\textit{IGD}$ performance does not improve enough to change its relative ranking. This is a rather surprising result, given that MOEA/D was originally designed with the $\textit{IGD}$ as its target metric. We conjecture that these results are an effect of tuning for the binary $\epsilon$ indicator on 10-objective scenarios, as discussed in the paper.