---
title: "Solving QAP"
author: "Łukasz Andryszewski 151930"
date: "May 2025"
subtitle: "Using a heuristic, local search, random algorithms, simulated annealing and tabu search."
titlepage: true
lof: true
lot: true
---

\newpage

# Problem and solver description

## Quadratic Assignment Problem (QAP)

The tackled problem is NP-hard, meaning that no polynomial-time algorithm is known that guarantees finding the optimum. The goal of the optimization is:

$$ \min_{\pi} \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} a_{ij} \cdot b_{\pi_{i}\pi_{j}}$$

The problem can be thought of as assigning 'departments' to 'locations'. The matrix $a$ holds the flow of communication between the 'departments' and $b$ is the distance matrix between the 'locations'. The permutation $\pi$ describes which 'location' each 'department' is assigned to. The aim is to minimise the total cost of communication between all 'departments' by choosing their 'locations'.
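As an illustration of the objective above, here is a minimal Python sketch (not the C++ solver's code). The function name `qap_cost` and the tiny 3x3 matrices are made up for the example; the permutation is zero-indexed.

```python
import numpy as np

def qap_cost(a, b, perm):
    """Sum of a[i, j] * b[perm[i], perm[j]] over all pairs (i, j)."""
    perm = np.asarray(perm)
    # b[np.ix_(perm, perm)][i, j] equals b[perm[i], perm[j]]
    return int(np.sum(a * b[np.ix_(perm, perm)]))

# Hypothetical toy instance: flows between departments and distances between locations
a = np.array([[0, 3, 1],
              [3, 0, 2],
              [1, 2, 0]])
b = np.array([[0, 5, 4],
              [5, 0, 6],
              [4, 6, 0]])

identity_cost = qap_cost(a, b, [0, 1, 2])  # department i placed at location i
swapped_cost = qap_cost(a, b, [1, 0, 2])   # departments 0 and 1 exchange locations
```

On this toy instance, exchanging the locations of the first two departments lowers the cost, which is exactly the kind of move the local searches described later look for.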

The problem has real-world applications. The $\texttt{KraXXx}$ instances contain real-world data that was used to plan the Klinikum Regensburg in Germany, and the $\texttt{EscXXx}$ instances "...stem from an application in computer science, from the testing of self-testable sequential circuits".

### Chosen instances:

Larger instances were selected to stress-test the solver. For each instance type, as described in the QAPLIB description, several sizes were chosen so that the impact of instance size could be analysed. It was also important for the optimum of each instance to be known.

- $\texttt{Lipa20a}$, $\texttt{Lipa50a}$ and $\texttt{Lipa90a}$ were chosen because they are generated using the same method, their optima are known and they are asymmetric, which makes them somewhat interesting.

- $\texttt{Esc16a}$, $\texttt{Esc32g}$ and $\texttt{Esc128}$, because they come from real life problems and are of different sizes.

- $\texttt{tai50b}$, $\texttt{tai100b}$ and $\texttt{tai150b}$, because they are all asymmetric and generated the same way

- $\texttt{tai64c}$, $\texttt{tai256c}$ because they 'occur in the generation of grey patterns' and also contain the largest instance

## Solver implementation

The solver used to analyse the problem was written in C++ and compiled using g++. It handles asymmetric instances correctly. It implements:

- Local search (in steepest and greedy version) $\text{---}$ local_search
- Random walk $\text{---}$ random_walk
- Random search $\text{---}$ random_search
- Construction heuristic $\text{---}$ heuristic
- Tabu search $\text{---}$ tabu_search
- Simulated Annealing $\text{---}$ simulated_annealing

The construction heuristic builds the solution one assignment at a time: it calculates the value of adding each possible new assignment at the end and keeps the best one. An additional boolean array is used to track which 'locations' are already assigned. The first assignment is random, which makes the heuristic non-deterministic.
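The idea can be sketched in Python as follows. This is an interpretation of the description above, not the solver's C++ code: the function name `construct`, the order in which departments are placed and the exact cost of an added assignment are assumptions made for the example.

```python
import numpy as np

def construct(a, b, rng):
    """Greedy construction: place departments 0..n-1 one by one (assumed order)."""
    n = len(a)
    perm = np.full(n, -1)
    used = np.zeros(n, dtype=bool)      # True once a location is taken
    perm[0] = rng.integers(n)           # random first assignment
    used[perm[0]] = True
    for dept in range(1, n):
        placed = perm[:dept]            # locations of already placed departments
        best_loc, best_added = -1, None
        for loc in np.flatnonzero(~used):
            # cost contributed only by the newly added assignment
            added = (a[dept, :dept] @ b[loc, placed]
                     + a[:dept, dept] @ b[placed, loc]
                     + a[dept, dept] * b[loc, loc])
            if best_added is None or added < best_added:
                best_loc, best_added = loc, added
        perm[dept] = best_loc
        used[best_loc] = True
    return perm

# Hypothetical random asymmetric instance
rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=(6, 6))
b = rng.integers(0, 10, size=(6, 6))
solution = construct(a, b, rng)
```

Only the terms involving the newly placed department are evaluated, so each candidate costs time proportional to the number of departments already placed rather than to the size of the whole partial solution.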

The program is used as follows:

```bat
bio_solver.exe (instance:str) (solver_name:str) (repetitions:int) (solver_args...)
```

Local search takes ```steepest``` or ```greedy``` as an argument. Random walk and random search take ```duration``` in nanoseconds as an argument.

The program outputs:

- instance_name instance_size optimal_value
- optimal_solution
- number_of_repetitions

Then, for each repetition:

- initial_solution
- final_solution
- starting_value final_value
- evaluations steps execution_time

The ```run_all.bat``` script takes instances as arguments and runs all the solvers on them with a predefined number of repetitions. The execution time for RS and RW is set to the average of the execution times of greedy and steepest local search.

### Simulated Annealing

Simulated annealing arguments:

- number of cycles $P$ without improvement $\text{---}$ multiplied by the Markov chain length $L$, so the algorithm stops after $P\cdot L$ iterations without improvement
- initial acceptance $\text{---}$ percentage of neighbours to be accepted at the initial temperature
- temperature decrease $\text{---}$ ratio $(\alpha)$ by which the temperature decreases after each chain $\text{---}$ $c(k+1) = c(k) \cdot \alpha$
- chain length ratio $\text{---}$ percentage of the neighbourhood size, which determines the length of the Markov chains

The algorithm is inspired by the natural process of annealing, in which a material is slowly cooled to achieve a crystalline structure. In theory, with the right parameters and a globally convex fitness landscape, the algorithm can find the optimal value.

In this implementation the algorithm:

1. Selects the initial temperature by scanning deltas randomly sampled from the solution space. The average acceptance over these deltas is calculated, and the secant method is used to find the temperature that yields the desired acceptance. The secant method is a numerical algorithm for finding a root of a function. It is iterative and uses the following recurrence:

   $$ x_n = x_{n-1} - f(x_{n-1}) \cdot \frac{x_{n-1} - x_{n-2}}{f(x_{n-1}) - f(x_{n-2})}$$

   The starting points $x_0$ and $x_1$ are set to $10^{-3}$ and $1$ respectively, to keep the method from diverging. The algorithm is still able to find solutions outside that range. The function $f$ whose root is sought is the average acceptance at a given temperature minus the desired acceptance.

2. Keeps the temperature constant for the whole length of a Markov chain ($L$). Within a chain, the next move in the neighbourhood is picked and applied if accepted. If a move is applied, the neighbourhood is shuffled. After each evaluation the stopping condition is verified.

3. Lowers the temperature $c$ after the chain ends, as specified earlier:

   $$c(k+1) = c(k) \cdot \alpha $$

4. Ends the current chain after $P \cdot L$ iterations with no improvement (an iteration here is a single evaluation, not a "step"). The temperature is then lowered to $10^{-4}$ and the algorithm runs until the no-improvement condition is triggered again. If the temperature is already that low, the algorithm ends.
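The initial temperature selection from the first step can be sketched in Python as below. This is a hedged illustration, not the solver's code: the names `mean_acceptance` and `initial_temperature` are hypothetical, the acceptance rule is assumed to be the usual Metropolis rule $\exp(-\delta / c)$ for worsening moves, and the four sampled deltas are made up.

```python
import numpy as np

def mean_acceptance(deltas, c):
    # Assumed Metropolis rule for minimisation: improving moves (delta <= 0)
    # are always accepted, worsening moves with probability exp(-delta / c)
    return float(np.mean(np.exp(-np.maximum(deltas, 0.0) / c)))

def initial_temperature(deltas, target, x0=1e-3, x1=1.0, tol=1e-9, max_iter=100):
    """Secant iteration on f(c) = mean_acceptance(c) - target."""
    f = lambda c: mean_acceptance(deltas, c) - target
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) < tol or f1 == f0:   # converged, or the secant is flat
            break
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        f0, f1 = f1, f(x1)
    return x1

# Hypothetical sampled deltas; the target is the 0.95 initial acceptance
sampled_deltas = np.array([1.0, 2.0, 3.0, 4.0])
c0 = initial_temperature(sampled_deltas, target=0.95)
achieved = mean_acceptance(sampled_deltas, c0)
```

The sketch has no safeguard against the iterate becoming non-positive; a real implementation would need one, since a temperature of zero or below makes the acceptance formula meaningless.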

### Tabu Search

Tabu search arguments:

- number of steps without improvement 
- tabu tenure $\text{---}$ number of steps when a move is tabu, given as a percentage of size of instance
- top percent $\text{---}$ used to calculate the $k$ best solutions to consider in elite candidate list strategy
- max quality drop $\text{---}$ how much the quality of a move can drop in relation to the current solution value for the candidate list to be reconstructed

The algorithm introduces exploration by accepting non-improving neighbours. To avoid cycles, a tabu list is kept, which marks a move as *tabu* for *tabu tenure* iterations.

The procedure applies two aspiration criteria:

- aspiration by default $\text{---}$ if all moves are tabu, accept the least tabu move.

- aspiration by optimization objective $\text{---}$ accept a move, even if it is tabu, when applying it would give an objective value better than the best seen so far.

In this implementation an elite candidate list is introduced, to lower the computation cost.

1. At the start of the iteration, all solutions in the neighbourhood are evaluated and sorted.

2. Then the best $k$ moves become the elite candidates. In each iteration two moves are found: the move with the best value, $m_b$, and the best-value move that is also the least tabu, $m_t$.

3. If $m_b$ would produce a better solution than any known so far, the move is applied even if it is tabu, which satisfies the aspiration by optimization objective. Otherwise $m_t$ is applied, which satisfies aspiration by default.
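One possible reading of the selection rule above, as a Python sketch. The data layout is an assumption made for the example: candidates are `(delta, move)` pairs, and `tabu_until` maps a move to the first iteration at which it is free again. The function name `select_move` is hypothetical, and "least tabu" is interpreted here as "tabu status expires soonest".

```python
def select_move(candidates, tabu_until, iteration, current_value, best_value):
    """Pick a move from the elite candidates using both aspiration criteria."""
    best_delta, best_move = min(candidates)          # m_b: best value overall
    # Aspiration by objective: accept even a tabu move if it beats the best seen
    if current_value + best_delta < best_value:
        return best_move
    # Otherwise prefer the best move that is not tabu at this iteration
    free = [(d, m) for d, m in candidates if tabu_until.get(m, 0) <= iteration]
    if free:
        return min(free)[1]
    # Aspiration by default: everything is tabu, take the least tabu move
    return min(candidates, key=lambda dm: tabu_until[dm[1]])[1]

# Hypothetical elite candidate list: (delta, (i, j)) for swaps of positions i and j
elite = [(-5, (0, 1)), (2, (1, 2)), (3, (0, 2))]

blocked = select_move(elite, {(0, 1): 10}, 3, current_value=100, best_value=90)
aspired = select_move(elite, {(0, 1): 10}, 3, current_value=100, best_value=96)
all_tabu = select_move(elite, {(0, 1): 10, (1, 2): 5, (0, 2): 7}, 3,
                       current_value=100, best_value=90)
```

In the first call the improving swap is tabu and does not beat the best known value, so the best free move is taken; in the second it beats the best known value and is applied despite being tabu; in the third every move is tabu and the one that becomes free soonest is chosen.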


In both algorithms, if the value of the current solution is better than the best known and the solution is about to be degraded $(\delta > 0)$, the current solution is saved as the best known. This way, if the solution improves multiple times in a row, it is not copied each time, but only when it is about to get worse (like making a checkpoint).

## Neighbourhood

The defining operator in local search is how the neighbourhood of a solution is defined. The local search implemented here uses the 2-OPT neighbourhood. For this operator the neighbourhood size is:

$$ N = \dfrac{n^2 - n}{2} $$

This can be thought of as the strictly upper triangular part of a square matrix. The neighbourhood is initialized, for each instance of local search, by creating every combination of two positions. Then it is shuffled. A random offset is initialized to help randomize the ordering without spending much time on additional reshuffling. This is especially important for greedy, because the first improvement found is selected in each iteration.
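A small Python sketch of this neighbourhood follows. The function names `make_neighbourhood` and `scan_order` are hypothetical, and how the solver actually applies the random offset is an assumption: here the shuffled list is simply read starting from a random position and wrapped around.

```python
import random
from itertools import combinations

def make_neighbourhood(n, rng):
    """All unordered pairs of positions, i.e. n * (n - 1) / 2 swap moves, shuffled once."""
    moves = list(combinations(range(n), 2))
    rng.shuffle(moves)
    return moves

def scan_order(moves, rng):
    """Cheap re-randomisation: start scanning at a random offset and wrap around."""
    offset = rng.randrange(len(moves))
    return moves[offset:] + moves[:offset]

rng = random.Random(0)
n = 8
moves = make_neighbourhood(n, rng)
order = scan_order(moves, rng)
```

Rotating the list touches no more elements than a single scan would, whereas a full reshuffle before every scan would add a pass over all $N$ moves.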

# Comparison of algorithms

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import pandas as pd
from util import pandify
from itertools import product
from IPython.display import Markdown

In [2]:
def quality_over(start_val, final_val, opt):
    # relative change from start_val to final_val, assuming a minimized objective
    return (start_val - final_val) / opt

def quality_to_opt(final_val, opt):
    return quality_over(final_val,opt,opt)

def similarity(sol1, sol2):
    return np.mean(sol1==sol2)

In [3]:
instances = ["lipa20a","lipa50a","lipa90a","esc16a","esc32g","esc128","tai50b","tai100b", "tai150b","tai64c","tai256c"]
#print(*instances)

In [4]:
#instances = list("esc128 tai256c wil100 tho150 lipa90b chr25a bur26a rou12".split())
solvers = ["ls_greedy", "ls_steepest", "rs", "rw", "heuristic", "tabu_search", "simulated_annealing"]
shorter_solvers = ["G","S","RS","RW", "H", "TS", "SA"]
full_solvers = ["local_search_greedy","local_search_steepest","random_search","random_walk", "heuristic", "tabu_search", "simulated_annealing"]
solver_map = {short:full for short,full in zip(solvers,full_solvers)}
solver_to_short_map = {full:short for short,full in zip(shorter_solvers,full_solvers)}
colors = ["seagreen", "mediumblue", "red", "orange", "hotpink", "turquoise", "darkviolet"]
color_map = {solver:color for solver,color in zip(full_solvers, colors)}
img = "img"
img_size = (8.3,11.7)
img_format = ".pdf"

In [5]:
data = pandify(instances,solvers,solver_map)
#data.head(100)

## Parameters selected

Advanced metaheuristic algorithms (TS and SA) were run 50 times on each instance. The rest of the methods were run 300 times. The figures however only show 50 runs for each.

### Simulated Annealing

The chosen parameters were:

- $P$ = 20
- initial acceptance = 0.95
- temperature decrease = 0.9
- ratio of chain length = 2.0 (of the neighbourhood size)

The temperature was chosen to decrease slowly to incentivise exploration. To match, $P$ and the chain length ratio were increased so that the algorithm does not end prematurely. The chain length ratio can exceed $1.0$, because the neighbourhood is shuffled and the solution changes whenever a move is accepted.

The values were tuned so that the algorithm achieves results at least as good as the local searches while performing more evaluations.

### Tabu Search

The chosen parameters were:

- no improvement iterations = 250
- tabu tenure = 0.25 (of the instance size)
- top percent = 0.2 (of the instance size, which calculates to k)
- max quality drop = 0.1

Tabu search was tuned to run in a timeframe as similar as possible to that of simulated annealing. Its parameters were also tuned so that it achieves results at least as good as the local searches.


## Running times

The time was measured in nanoseconds with the C++ ```chrono``` library using a ```high_resolution_clock```.

In [6]:
flierprops = dict(marker='o', markerfacecolor='black', markersize=5,markeredgecolor='none',alpha=0.2)
X = data.pivot(columns=["instance","solver"], index="repetition")["time"]
fig, axs = plt.subplots(ncols=3,nrows=4,layout="tight",sharex=True,sharey='row')
fig.set_size_inches(*img_size)
fig.supylabel("Running time")
for instance, ax in zip(instances,fig.get_axes()):
    ax.set_title(instance)
    vplot = ax.violinplot(X[instance].dropna(),showextrema=False)
    bplot = ax.boxplot(X[instance].dropna(),widths=0.2,medianprops=dict(color="black"),flierprops=flierprops)
    ax.set_yscale('log')
    ax.grid(True,which="major",ls='-')
    ax.grid(True,which="minor",ls='dotted')
    ax.yaxis.set_major_locator(mpl.ticker.LogLocator())
    ax.yaxis.set_major_formatter(mpl.ticker.FuncFormatter(lambda x, pos: f"{x/1e6:.2f} ms"))
    ax.set_xticks(np.arange(len(solvers))+1,shorter_solvers[:len(solvers)])
    for vp, color in zip(vplot["bodies"],colors):
        vp.set_color(color)
    ax.set_facecolor("whitesmoke")
fig.delaxes(axs[-1, -1])
axs[-2,-1].xaxis.set_tick_params(labelbottom=True)
img_path = f"{img}/run_times{img_format}"
plt.savefig(img_path)
plt.close()
display(Markdown(f"![Running times of algorithms]({img_path}){{#fig:run_times}}"))

![Running times of algorithms](img/run_times.pdf){#fig:run_times}

As seen in @fig:run_times, steepest is slower than greedy on every tested instance. The running times of random search and random walk are identical in each repetition, because their duration was set to the average of the mean running times of both local searches. As expected, the construction heuristic was by far the fastest. Tabu Search and Simulated Annealing run longer, as they were parametrized to do so.

## Quality

The quality is measured as a relative gap according to the following formula:

$$ Q(v_z) = \dfrac{v_z-v_o}{v_o} = \dfrac{v_z}{v_o} - 1$$

where $v_z$ is the value of the solution and $v_o$ is the value of the optimum. It describes the distance to the optimum relative to the optimum itself, which makes it comparable between instances. Lower values are more desirable.

In [7]:
X = data.assign(quality=lambda df: quality_to_opt(df.final_value,df.optimal_value)).pivot(columns=["instance","solver"], index="repetition")["quality"]
X = X.fillna(X.max())
fig, axs = plt.subplots(nrows=4,ncols=3,layout="tight",sharex=True)
fig.set_size_inches(*img_size)
fig.supylabel("Quality")
for instance, ax in zip(instances,fig.get_axes()):
    ax.set_title(instance)
    vplot = ax.violinplot(X[instance].dropna(),showextrema=False)
    bplot = ax.boxplot(X[instance].dropna(),widths=0.2,medianprops=dict(color="black"),flierprops=flierprops)
    ax.yaxis.set_major_formatter(mpl.ticker.PercentFormatter(xmax=1))
    ax.set_xticks(np.arange(len(solvers))+1,shorter_solvers[:len(solvers)])
    for vp, color in zip(vplot["bodies"],colors):
        vp.set_color(color)
    ax.set_facecolor("whitesmoke")
    ax.grid()
fig.delaxes(axs[-1, -1])
axs[-2,-1].xaxis.set_tick_params(labelbottom=True)
img_path = f"{img}/quality{img_format}"
plt.savefig(img_path)
plt.close()
display(Markdown(f"![Quality of algorithms]({img_path}){{#fig:quality}}"))

![Quality of algorithms](img/quality.pdf){#fig:quality}

As expected, @fig:quality shows that random walk and random search usually have the largest variance in solution values. However, in some cases the heuristic performed worse than them. Both local searches had very similar performance and similar distributions, although greedy seems to be slightly better. For a few instances, like $\texttt{esc32g}$, the quality of the heuristic reaches $0\%$, which means it achieves the optimal value. This could suggest that these instances are trivial. Overall the best solutions belong to either Tabu Search or Simulated Annealing. For Simulated Annealing the variance of the solutions can be huge, as in $\texttt{Tai256c}$, whereas in cases like $\texttt{Lipa50a}$ Tabu Search is the less stable of the two.

## Efficiency of algorithms

The quality over time is measured as the difference of relative gaps between the initial and final solution, divided by the running time:

$$ V(v_i,v_z,t) = \dfrac{Q(v_i)-Q(v_z)}{t} = \dfrac{(v_i-v_o)-(v_z-v_o)}{v_ot} = \dfrac{v_i-v_z}{v_ot}$$

where $v_i$ is the value of the initial solution, $v_z$ is the value of the final solution, $v_o$ is the value of the optimum and $t$ is the running time. Intuitively it can be thought of as the 'speed' of the algorithm: how much it improves the initial solution per unit of time, relative to the optimum. Higher values are more desirable here.

In [8]:
X_raw = data.assign(quality=lambda df: quality_over(df.initial_value,df.final_value,df.optimal_value)).assign(q_over_t=lambda df: df.quality/df.time)
X = X_raw.pivot(columns=["instance","solver"], index="repetition")["q_over_t"]
fig, axs = plt.subplots(nrows=4,ncols=3,layout="tight",sharex=True,sharey='row')
fig.set_size_inches(*img_size)
fig.supylabel("Quality over time")
for instance, ax in zip(instances,fig.get_axes()):
    ax.set_title(instance)
    vplot = ax.violinplot(X[instance].dropna(),showextrema=False)
    bplot = ax.boxplot(X[instance].dropna(),widths=0.2,medianprops=dict(color="black"),flierprops=flierprops)
    ax.set_yscale('log')
    ax.grid(True,which="major",ls='-')
    ax.grid(True,which="minor",ls='dotted')
    ax.yaxis.set_major_locator(mpl.ticker.LogLocator())
    ax.yaxis.set_major_formatter(mpl.ticker.FuncFormatter(lambda x, pos: f"{x*100:.0e} %/ns"))
    ax.set_xticks(np.arange(len(solvers))+1,shorter_solvers[:len(solvers)])
    # ax.xaxis.set_major_locator(mpl.ticker.FixedLocator(np.arange(1,len(selected)+1)))
    # ax.set_xticklabels(shorter_solvers)
    for vp, color in zip(vplot["bodies"],colors):
        vp.set_color(color)
    ax.set_facecolor("whitesmoke")
fig.delaxes(axs[-1, -1])
axs[-2,-1].xaxis.set_tick_params(labelbottom=True)
img_path = f"{img}/efficiency{img_format}"
plt.savefig(img_path)
plt.close()
display(Markdown(f"![Efficiency of algorithms]({img_path}){{#fig:efficiency}}"))

![Efficiency of algorithms](img/efficiency.pdf){#fig:efficiency}

@fig:efficiency shows that although greedy and steepest achieve similar quality, greedy is much faster and therefore achieves much better efficiency overall. In fact, in efficiency steepest performs similarly to random search and random walk. Overall Tabu Search and Simulated Annealing are less efficient than the local searches, as they run for much longer while there is not much quality left to gain. In some cases the heuristic produces a worse solution than the random initial solution it is compared against. This cannot be shown on a logarithmic scale, because the efficiency becomes negative. It can be seen in @tbl:heur-stats.

In [32]:
table = X_raw[X_raw.solver == "heuristic"].pivot(columns="instance",index="repetition")["q_over_t"].describe().loc[["min","mean","max"]]

display(Markdown(table.to_markdown(floatfmt=".2e")+"\n\n: Heuristic efficiency (quality over time) {#tbl:heur-stats}"))

|      |   esc128 |    esc16a |   esc32g |   lipa20a |   lipa50a |   lipa90a |   tai100b |   tai150b |   tai256c |    tai50b |   tai64c |
|:-----|---------:|----------:|---------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|---------:|
| min  | 3.77e-06 | -1.02e-04 | 1.57e-04 | -4.34e-06 |  5.89e-09 |  4.05e-09 | -4.94e-07 | -1.20e-07 |  1.29e-08 | -1.43e-05 | 3.35e-06 |
| mean | 8.53e-06 |  1.54e-04 | 7.36e-04 |  2.45e-06 |  1.74e-07 |  1.79e-08 |  2.39e-07 | -5.02e-08 |  4.31e-08 | -4.26e-06 | 9.70e-06 |
| max  | 1.15e-05 |  3.31e-04 | 1.27e-03 |  8.74e-06 |  3.51e-07 |  3.37e-08 |  9.83e-07 |  3.83e-08 |  7.30e-08 |  2.89e-06 | 2.18e-05 |

: Heuristic efficiency (quality over time) {#tbl:heur-stats}

## Number of steps of algorithms

Here the number of steps performed by the local searches, tabu search and simulated annealing is compared. Each step is defined as a single move to a neighbour. For steepest, the entire neighbourhood is checked and the move giving the best improvement is performed. For greedy, the first improving move found is performed.
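The two step rules can be contrasted with a short Python sketch. It is only an illustration under simplifying assumptions: every neighbour is evaluated by recomputing the full cost (the solver uses deltas), the neighbourhood is scanned in a fixed order, and all function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def cost(a, b, perm):
    return int(np.sum(a * b[np.ix_(perm, perm)]))

def swapped(perm, i, j):
    out = perm.copy()
    out[i], out[j] = out[j], out[i]
    return out

def steepest_step(a, b, perm, moves):
    """Check the whole neighbourhood and take the best move, if it improves."""
    best = min(moves, key=lambda m: cost(a, b, swapped(perm, *m)))
    candidate = swapped(perm, *best)
    return candidate if cost(a, b, candidate) < cost(a, b, perm) else None

def greedy_step(a, b, perm, moves):
    """Take the first improving move found."""
    current = cost(a, b, perm)
    for i, j in moves:
        candidate = swapped(perm, i, j)
        if cost(a, b, candidate) < current:
            return candidate
    return None

def local_search(a, b, perm, step):
    """Repeat steps until no neighbour improves; return the optimum and step count."""
    moves = list(combinations(range(len(perm)), 2))
    steps = 0
    while True:
        nxt = step(a, b, perm, moves)
        if nxt is None:
            return perm, steps
        perm, steps = nxt, steps + 1

# Hypothetical random instance
rng = np.random.default_rng(1)
a = rng.integers(0, 10, size=(6, 6))
b = rng.integers(0, 10, size=(6, 6))
start = rng.permutation(6)
greedy_perm, greedy_steps = local_search(a, b, start, greedy_step)
steepest_perm, steepest_steps = local_search(a, b, start, steepest_step)
```

Both variants stop in a local optimum of the same neighbourhood; they differ only in how much of the neighbourhood is examined before a move is made.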

In [10]:
selected = full_solvers[:2] + full_solvers[-2:]
mask_ls = data["solver"].isin(selected)
X = data[mask_ls].pivot(columns=["instance","solver"], index="repetition")["iterations"]
fig, axs = plt.subplots(nrows=4,ncols=3,layout="tight",sharex=True)
fig.set_size_inches(*img_size)
fig.supylabel("Steps")
for instance, ax in zip(instances,fig.get_axes()):
    ax.set_title(instance)
    vplot = ax.violinplot(X[instance].dropna(),showextrema=False)
    bplot = ax.boxplot(X[instance].dropna(),widths=0.2,medianprops=dict(color="black"))
    #ax.yaxis.set_major_formatter(mpl.ticker.FuncFormatter(lambda x, pos: f"{x*100:.3e} %/ns"))
    ax.set_xticks(np.arange(len(selected))+1,[solver_to_short_map[slv] for slv in selected])
    ax.set_yscale('log')
    ax.grid(True,which="major",ls='-')
    ax.grid(True,which="minor",ls='dotted')
    ax.yaxis.set_major_locator(mpl.ticker.LogLocator())
    for vp, slv in zip(vplot["bodies"],selected):
        vp.set_color(color_map[slv])
    ax.set_facecolor("whitesmoke")
    # ax.grid()
fig.delaxes(axs[-1, -1])
axs[-2,-1].xaxis.set_tick_params(labelbottom=True)
img_path = f"{img}/steps{img_format}"
plt.savefig(img_path)
plt.close()
display(Markdown(f"![Number of steps of algorithms]({img_path}){{#fig:steps}}"))

![Number of steps of algorithms](img/steps.pdf){#fig:steps}

As can be seen in @fig:steps, greedy performs more steps than steepest, as expected, because it accepts the first improving move instead of checking the entire neighbourhood. Because of that it does not converge in as few steps as steepest. The more advanced metaheuristics perform more steps than the local searches, as they explore more of the fitness landscape.

## Number of evaluations

An evaluation is counted each time an algorithm either evaluates an entire solution or performs a partial evaluation (calculating the delta of a swap).
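For reference, the delta of a swap can be computed in $O(n)$ instead of re-evaluating the whole $O(n^2)$ objective. The Python sketch below uses the standard formula for possibly asymmetric matrices; whether the solver computes its deltas in exactly this way is an assumption, and the names `qap_cost` and `swap_delta` are made up for the example.

```python
import numpy as np

def qap_cost(a, b, perm):
    return int(np.sum(a * b[np.ix_(perm, perm)]))

def swap_delta(a, b, perm, r, s):
    """Change of the objective when positions r and s of perm are exchanged."""
    pr, ps = perm[r], perm[s]
    # terms where both indices belong to {r, s}
    delta = (a[r, r] * (b[ps, ps] - b[pr, pr])
             + a[r, s] * (b[ps, pr] - b[pr, ps])
             + a[s, r] * (b[pr, ps] - b[ps, pr])
             + a[s, s] * (b[pr, pr] - b[ps, ps]))
    # terms that pair r or s with every other position k
    for k in range(len(perm)):
        if k == r or k == s:
            continue
        pk = perm[k]
        delta += (a[k, r] * (b[pk, ps] - b[pk, pr])
                  + a[k, s] * (b[pk, pr] - b[pk, ps])
                  + a[r, k] * (b[ps, pk] - b[pr, pk])
                  + a[s, k] * (b[pr, pk] - b[ps, pk]))
    return int(delta)

# Hypothetical asymmetric instance used to compare against full re-evaluation
rng = np.random.default_rng(2)
a = rng.integers(0, 10, size=(7, 7))
b = rng.integers(0, 10, size=(7, 7))
perm = rng.permutation(7)
after = perm.copy()
after[[1, 4]] = after[[4, 1]]
full_difference = qap_cost(a, b, after) - qap_cost(a, b, perm)
delta_1_4 = swap_delta(a, b, perm, 1, 4)
```

Here `delta_1_4` and `full_difference` are two ways of computing the same number; only the first avoids touching every pair of departments.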

In [11]:
# selected = full_solvers[:-1]
# mask_GSRSRW = data["solver"].isin(selected)
X = data.pivot(columns=["instance","solver"], index="repetition")["evaluations"]
fig, axs = plt.subplots(nrows=4,ncols=3,layout="tight",sharex=True)
fig.set_size_inches(8.3,11.7)
fig.supylabel("Evaluations")
for instance, ax in zip(instances,fig.get_axes()):
    ax.set_title(instance)
    vplot = ax.violinplot(X[instance].dropna(),showextrema=False)
    bplot = ax.boxplot(X[instance].dropna(),widths=0.2,medianprops=dict(color="black"))
    #ax.yaxis.set_major_formatter(mpl.ticker.FuncFormatter(lambda x, pos: f"{x*100:.3e} %/ns"))
    ax.set_xticks(np.arange(len(shorter_solvers))+1,shorter_solvers)
    ax.set_yscale('log')
    ax.grid(True,which="major",ls='-')
    ax.grid(True,which="minor",ls='dotted')
    ax.yaxis.set_major_locator(mpl.ticker.LogLocator())
    for vp, color in zip(vplot["bodies"],colors):
        vp.set_color(color)
    ax.set_facecolor("whitesmoke")
    # ax.grid()
fig.delaxes(axs[-1, -1])
axs[-2,-1].xaxis.set_tick_params(labelbottom=True)
img_path = f"{img}/evals{img_format}"
plt.savefig(img_path)
plt.close()
display(Markdown(f"![Number of evaluations of algorithms]({img_path}){{#fig:evals}}"))

![Number of evaluations of algorithms](img/evals.pdf){#fig:evals}

@fig:evals shows that despite greedy performing more steps, it performs fewer evaluations than steepest. This is precisely because it does not check the entire neighbourhood. It is also expected that random walk performs more evaluations than random search: both run for the same amount of time and random walk does much less work in a single 'step', so it is able to evaluate more solutions. In some cases, like $\texttt{Esc32g}$, the heuristic performs more evaluations than random search. As they run for longer, the metaheuristics evaluate more solutions than the local searches on most of the instances. On some, however, like $\texttt{tai150b}$, some runs of Simulated Annealing evaluate fewer solutions than steepest, which could suggest either premature convergence or quick convergence to an optimal value, although the first case is much more likely.

## Quality of initial vs final solution

The dependence between the initial and the final solution quality is checked for the local searches.

In [12]:
X = data[mask_ls].assign(
    quality_fin=lambda df: quality_to_opt(df.final_value,df.optimal_value)
    ).assign(
    quality_start=lambda df: quality_to_opt(df.initial_value,df.optimal_value)
    ).pivot(columns=["instance","solver"], index="repetition")
corr_data = {col:[] for col in ["instance","solver","correlation"]}
fig, axs = plt.subplots(nrows=4,ncols=3,layout="constrained")
fig.set_size_inches(*img_size)
fig.supxlabel("Initial quality")
fig.supylabel("Final quality")
for instance, ax in zip(instances, fig.get_axes()):
    ax.set_title(instance)
    for solver, color in zip(full_solvers[:2], colors):
        x = X["quality_start"][instance][solver]
        y = X["quality_fin"][instance][solver]
        ax.scatter(x, y, s=3, label=solver_to_short_map[solver],color=color)
        corr = np.corrcoef(x,y)[0,1]
        corr_data["instance"].append(instance)
        corr_data["solver"].append(solver)
        corr_data["correlation"].append(corr)
        #ax.text(0.9,0.1,f"{corr:2f}",transform=ax.transAxes)
    ax.yaxis.set_major_formatter(mpl.ticker.PercentFormatter(xmax=1))
    ax.xaxis.set_major_formatter(mpl.ticker.PercentFormatter(xmax=1))
    #ax.tick_params(axis="x", labelrotation=15)
    ax.grid()
    ax.set_facecolor("whitesmoke")
corr_data = pd.DataFrame.from_dict(corr_data)
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, loc='lower right')
fig.delaxes(axs[-1, -1])
img_path = f"{img}/final_v_initial{img_format}"
plt.savefig(img_path)
plt.close()
display(Markdown(f"![Comparison of quality between initial and final solution]({img_path}){{#fig:final-v-initial}}"))


![Comparison of quality between initial and final solution](img/final_v_initial.pdf){#fig:final-v-initial}

Additionally, the correlation between the final and initial solution value is presented in @tbl:correlations. The measure used is the Pearson correlation coefficient.

In [19]:
table = corr_data.copy()
table["solver"] = table["solver"].apply(lambda s: solver_to_short_map[s])
table = table.pivot(columns="instance",index="solver")
display(Markdown(table["correlation"].to_markdown(floatfmt=".2e")+"\n\n: Correlation between initial and final solution in Greedy and Steepest for selected instances {#tbl:correlations}"))

| solver   |   esc128 |    esc16a |   esc32g |   lipa20a |   lipa50a |   lipa90a |   tai100b |   tai150b |   tai256c |    tai50b |    tai64c |
|:---------|---------:|----------:|---------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|
| G        | 4.16e-02 | -6.97e-02 | 2.86e-02 | -1.14e-01 | -7.87e-03 |  7.82e-02 |  8.99e-03 | -4.82e-02 | -2.54e-03 | -2.26e-02 | -3.40e-02 |
| S        | 5.15e-02 | -4.40e-02 | 5.02e-02 |  2.16e-02 | -7.28e-02 |  1.86e-03 | -6.01e-03 |  7.92e-02 |  3.33e-02 | -9.91e-02 | -1.01e-01 |

: Correlation between initial and final solution in Greedy and Steepest for selected instances {#tbl:correlations}

It would be expected that better initial solutions produce better final solutions, since they may start in a better region of the solution space. However, there is seemingly no significant correlation between the quality of the initial solution and the quality of the final solution, at least for the selected instances, as presented in @fig:final-v-initial.

## Multi-random start local search

In this section the 300 repetitions of each local search are treated as a single multi-random-start local search. It works similarly to random search, but each solution sampled at a restart is a local optimum.

In [14]:
X = data[mask_ls].assign(quality=lambda df: quality_to_opt(df.final_value,df.optimal_value)).pivot(columns=["instance","solver"], index="repetition")["quality"]
best = X.cummin()
mean = X.expanding().mean()
std = X.expanding().std().fillna(0)
fig, axs = plt.subplots(nrows=4,ncols=3,layout="constrained",sharex=True)
fig.set_size_inches(*img_size)
fig.supxlabel("Repetitions")
fig.supylabel("Quality")
for instance, ax in zip(instances, fig.get_axes()):
    ax.set_title(instance)
    ax.yaxis.set_major_formatter(mpl.ticker.PercentFormatter(xmax=1))
    for solver, color in zip(full_solvers[:2], colors):
        u = mean[instance][solver]
        ro = std[instance][solver]
        ax.plot(best[instance][solver],label=solver_to_short_map[solver]+" best",color=color)
        ax.plot(u,'--',label=solver_to_short_map[solver]+" mean",color=color)
        ax.fill_between(np.arange(len(u)),u+ro,(u-ro).clip(0), label=solver_to_short_map[solver]+" std",alpha=0.1,color=color)
    ax.grid()
    ax.set_facecolor("whitesmoke")
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, loc='lower right',ncols=2)
fig.delaxes(axs[-1, -1])
axs[-2,-1].xaxis.set_tick_params(labelbottom=True)
img_path = f"{img}/multistart{img_format}"
plt.savefig(img_path)
plt.close()
display(Markdown(f"![Multistart Local search in Greedy and Steepest version]({img_path}){{#fig:multi-start}}"))


![Multistart Local search in Greedy and Steepest version](img/multistart.pdf){#fig:multi-start}

There is value to be gained from running the local search algorithm multiple times, as can be seen in @fig:multi-start. However, the right number of repetitions is highly dependent on the instance, and it is an additional parameter. For most of the selected instances $100$ repetitions seem to be enough, as the best value usually flatlines by then. The mean stabilizes and stays roughly the same. The standard deviation is also stable and its value ultimately depends on the instance.

## Similarity of local optima

The similarity between solutions is measured as the complement of the normalized Hamming distance:

$$ {Sim}(x_1, x_2) = 1 - H(x_1,x_2) = 1 - \dfrac{1}{n}\sum_{i=0}^{n-1} \begin{cases} 1 & \text{if } x_1[i] \neq x_2[i] \\ 0 & \text{otherwise}\end{cases}$$

This measure seems appropriate for the nature of the problem, as a position in a solution $x$ identifies the department, while the value at that position identifies the location that department is assigned to.

The similarities are calculated between all pairs of local optima and then averaged, as well as between each local optimum and the global optimum.
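The averaging can be sketched with NumPy broadcasting, as below. This is an alternative formulation for illustration, not the notebook's own cell; the function names and the three toy solutions are made up.

```python
import numpy as np

def pairwise_similarity(solutions):
    """sims[p, q] = fraction of positions at which solutions p and q agree."""
    sols = np.asarray(solutions)
    return (sols[:, None, :] == sols[None, :, :]).mean(axis=2)

def mean_similarity_to_others(solutions):
    """Average similarity of each solution to all the others (self excluded)."""
    sims = pairwise_similarity(solutions)
    m = len(sims)
    return (sims.sum(axis=1) - 1.0) / (m - 1)   # subtract the self-similarity of 1

# Hypothetical local optima of a size-4 instance
optima = [[0, 1, 2, 3],
          [0, 1, 3, 2],
          [3, 2, 1, 0]]
sims = pairwise_similarity(optima)
mean_sims = mean_similarity_to_others(optima)
```

The first two solutions agree on two of four positions, while the third shares no position with either, which illustrates how quickly this measure drops even for permutations that are only a few swaps apart.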

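The plotted instances are $\texttt{lipa20a}$, $\texttt{lipa90a}$, $\texttt{esc16a}$ and $\texttt{tai100b}$. $\texttt{Lipa90a}$ and $\texttt{tai100b}$ were selected because the heuristic performs badly on them.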

In [15]:
subset = ["lipa20a","lipa90a","esc16a", "tai100b"]
selected = full_solvers[:2]
to = ["optimal_solution","final_solution"]
to_titles = ["Wrt optimum", "Wrt eachother"]
to_map = {wrt:title for wrt,title in zip(to,to_titles)}
X = data[data["repetition"]<100].assign(
    quality=lambda df: quality_over(df.initial_value,df.final_value,df.optimal_value)
    ).pivot(columns=["instance","solver"], index="repetition")[to+["quality"]]
fig, ax = plt.subplots(nrows=4,ncols=2,layout="constrained")
fig.set_size_inches(*img_size)
for (instance, wrt), ax in zip(product(subset,to),fig.get_axes()):
    for solver, color in zip(selected,colors):
        y = np.array(list(map(lambda sols: similarity(*sols),product(X[wrt][instance][solver],X["final_solution"][instance][solver]))))
        n = np.ceil(np.sqrt(y.shape[0])).astype(int)
        y = y.reshape((n,n))
        np.fill_diagonal(y,0)
        mean_y = (y.mean(0))*(n/(n-1))
        x = X["quality"][instance][solver]
        ax.scatter(x,mean_y,label=solver_to_short_map[solver],color=color,s=4)
        if wrt == to[1]:
            np.fill_diagonal(y,mean_y)
            std_y = y.std(0)*np.sqrt((n)/(n-1)) 
            ax.errorbar(x,mean_y,std_y,color=color,linestyle="none",marker="none",alpha=0.1)

        ax.xaxis.set_major_formatter(mpl.ticker.PercentFormatter(xmax=1))
        ax.yaxis.set_major_formatter(mpl.ticker.PercentFormatter(xmax=1))
        ax.set_title(to_map[wrt]+" in "+instance)
        ax.grid(True)
        ax.set_facecolor("whitesmoke")
fig.supxlabel("Quality")
fig.supylabel("Similarity")
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, loc='outside lower right',ncols=2)
img_path = f"{img}/simils{img_format}"
plt.savefig(img_path)
plt.close()
display(Markdown(f"![Quality of a solution compared with similarity to optimum(left side) or each solution(right side)]({img_path}){{#fig:simils}}"))


![Quality of a solution compared with similarity to optimum(left side) or each solution(right side)](img/simils.pdf){#fig:simils}

If the problem is globally convex, the expectation is that locally optimal solutions are very similar to each other. However, @fig:simils shows that the similarity is low overall, which can stem from the nature of the Hamming distance and the size of the instances. Simply put, high values are not easy to achieve, even on the smaller instances.

# Conclusions

Overall it was quite surprising to see that the greedy version of local search performed better than steepest, as I have not seen that often in practice. Although the final quality was similar, the greedy version was much more efficient. For instances like $\texttt{Esc32g}$ the heuristic was able to find the optimum, as were the local searches, and it found competitive solutions for instances like $\texttt{Tai256c}$.

No significant correlation between the quality of initial solution and final solution in local searches was found. No significant relation between the similarity and the final quality was found, but this could be partly due to how similarity is calculated.

Lastly, there is value to be gained from running local search multiple times from different starting points. The best result improves for around 100 repetitions, depending on the instance, after which it stagnates.

# Difficulties

Coming up with an appropriate construction heuristic is not easy, as it depends on the problem. In this case it is not entirely intuitive how to do it. The final solution is one that could probably be implemented for most problems, but that generality does not imply good performance. In comparison, a heuristic for TSP would add the closest city each time, which in a way is very similar to this solution!

There were some difficulties in how to properly pass the specific arguments to the solvers in the program. Finally it was achieved by implementing an ```Experiment``` class, which first parses all command-line arguments and then, using a ```switch```, launches the appropriate solver with its arguments.

Overall, implementing the metaheuristics correctly and picking proper initial parameters was much harder than in the case of the local searches.

# Introduced improvements

Initially the heuristic evaluated the whole incomplete solution. The improvement was to evaluate only the assignment of the last 'location', which can be seen as analogous to evaluating just the delta of a move in the local search algorithms. This way it was much faster, as one would expect from a construction heuristic, while not changing the final result.