# Assignment 8: Similarity measures

- Łukasz Andryszewski 151930
- Filip Firkowski 151946

The repository is available at: https://github.com/lucapl/Evolutionary-Computations.

In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
import pandas as pd
import numpy as np
import scipy

from utils import *
from plotting import *
from similarity import nodes_similarity, edges_similarity, similarity_matrix, add_edges, similarity_to_best, similarity_average
pd.set_option('display.max_colwidth', None)

## Problem description

The problem is to select exactly 50% of the nodes and form a Hamiltonian cycle through them that minimizes the sum of the total path length and the total cost of the selected nodes.
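
The objective is the length of the closed cycle plus the summed costs of the selected nodes. A minimal sketch, assuming Euclidean distances, an even number of nodes, and hypothetical per-node inputs `coords` and `costs` (the instance format used in the repository is not shown here):

```python
import math


def objective(order, coords, costs):
    """Length of the closed cycle through `order` plus the costs of the nodes in it.

    `coords` (x, y pairs) and `costs` are hypothetical per-node inputs indexed by node id.
    """
    # Exactly 50% of the nodes must be selected (an even node count is assumed here)
    if len(order) != len(coords) // 2:
        raise ValueError("exactly 50% of the nodes must be selected")
    n = len(order)
    length = 0.0
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]  # (i + 1) % n closes the cycle
        length += math.dist(coords[a], coords[b])
    return length + sum(costs[v] for v in order)


coords = [(0, 0), (0, 3), (4, 3), (4, 0), (9, 9), (8, 1), (2, 7), (6, 6)]
costs = [1, 2, 3, 4, 50, 60, 70, 80]
print(objective([0, 1, 2, 3], coords, costs))  # 24.0: cycle 3 + 4 + 3 + 4 = 14, costs 10
```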

In this report we measure the similarity between different local optima.

## Pseudocode of similarity measures

<style>
  .no-page-break {
    page-break-inside: avoid;
  }
</style>

<div class="no-page-break">
  <h3>Nodes similarity</h3>
  <pre>
  function node_similarity(solution1, solution2):
    common = set(solution1.cityOrder).intersection(solution2.cityOrder)
    return len(common) / len(solution1.cityOrder)
  </pre>
</div>
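
The node measure is a plain set intersection. A runnable Python sketch, assuming each solution is given as a list of the selected node ids in visiting order (the name `node_similarity` here is illustrative, not the repository's `nodes_similarity`):

```python
def node_similarity(order1, order2):
    """Fraction of the cities selected in solution 1 that are also selected in solution 2."""
    # Visiting order is irrelevant here; only the sets of selected cities matter
    common = set(order1) & set(order2)
    return len(common) / len(order1)


print(node_similarity([0, 1, 2, 3], [3, 2, 5, 6]))  # 0.5: cities 2 and 3 are shared
```

Because every solution contains exactly 50% of the nodes, both lists have the same length, so the measure is symmetric.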

<div class="no-page-break">
  <h3>Edges similarity</h3>
  <pre>
  function edge_similarity(solution1, solution2):
    createEdges(solution1)
    createEdges(solution2)
    common = solution1.edges.intersection(solution2.edges)
    return len(common) / len(solution1.edges)
  </pre>

  <pre>
  function createEdges(solution):
    n = len(solution.cityOrder)
    solution.edges = set()
    for i in range(n):
      city_1 = solution.cityOrder[i]
      city_2 = solution.cityOrder[(i + 1) % n]    # wrap around to close the cycle
      solution.edges.add((min(city_1, city_2), max(city_1, city_2)))
  </pre>
</div>


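When computing edge similarity, edges are treated as undirected: the edge between cities a and b counts as the same edge whether the tour traverses it as (a, b) or (b, a). This is why `createEdges` stores every edge as the pair (min, max).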
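
The effect of storing edges as (min, max) pairs can be checked directly. A minimal Python sketch, assuming solutions are lists of node ids in visiting order; `undirected_edges` and `edge_similarity` are hypothetical names, not the repository's `add_edges` or `edges_similarity`:

```python
def undirected_edges(order):
    """Edges of the closed cycle, each stored as (smaller id, larger id) so direction is ignored."""
    n = len(order)
    edges = set()
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]  # the last city connects back to the first
        edges.add((min(a, b), max(a, b)))
    return edges


def edge_similarity(order1, order2):
    """Fraction of the edges of solution 1 that also appear in solution 2."""
    edges1 = undirected_edges(order1)
    return len(edges1 & undirected_edges(order2)) / len(edges1)


print(edge_similarity([0, 1, 2, 3], [3, 2, 1, 0]))  # 1.0: same cycle, opposite direction
print(edge_similarity([0, 1, 2, 3], [0, 2, 1, 3]))  # 0.5: same cities, only 2 of 4 edges shared
```

The second call also illustrates how two solutions can have node similarity 1.0 while sharing only half of their edges.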

<style>
  table {
    width: 100%;
    table-layout: fixed;
    word-wrap: break-word;
  }
</style>

## Results of the computational experiments

The greedy local search with the edges neighbourhood, started from random solutions, was run 1000 times on each instance.
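
We assume here that the edges neighbourhood contains the intra-route two-edge exchange (2-opt) move; the solver in the repository may combine it with other moves, such as exchanging a selected node for an unselected one, which this sketch does not cover. Because a 2-opt move keeps the same set of cities, node costs cancel out and only the four affected edges enter the change in objective. The names `two_opt_delta`, `apply_two_opt` and `cycle_length` are hypothetical:

```python
import math


def cycle_length(order, dist):
    """Length of the closed cycle visiting `order`."""
    return sum(dist[order[i]][order[(i + 1) % len(order)]] for i in range(len(order)))


def two_opt_delta(order, i, j, dist):
    """Objective change when edges (order[i], order[i+1]) and (order[j], order[j+1])
    are replaced by (order[i], order[j]) and (order[i+1], order[j+1]); requires i < j."""
    n = len(order)
    a, a_next = order[i], order[(i + 1) % n]
    b, b_next = order[j], order[(j + 1) % n]
    return dist[a][b] + dist[a_next][b_next] - dist[a][a_next] - dist[b][b_next]


def apply_two_opt(order, i, j):
    """Perform the exchange by reversing the segment order[i+1..j]."""
    return order[: i + 1] + order[i + 1 : j + 1][::-1] + order[j + 1 :]


# Unit square; the tour [0, 2, 1, 3] crosses itself along both diagonals
coords = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(p, q) for q in coords] for p in coords]
tour = [0, 2, 1, 3]
delta = two_opt_delta(tour, 0, 2, dist)  # negative, so a greedy search would accept it
new_tour = apply_two_opt(tour, 0, 2)     # [0, 1, 2, 3], the square's perimeter
```

Evaluating the delta from four edges instead of recomputing the whole cycle is what keeps each local search step cheap.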

In [2]:
solver_types = ["-".join(["localSearch","Greedy","Edges","Random"])]
instances = ['A', 'B']

all_json_data = load_all_json_data(solver_types,folder_path='../out8/')
add_edges(all_json_data)

table, best_solutions = get_best_solutions_and_vertical_table(solver_types,instances,all_json_data)

In [3]:
display_html(table,False)

Objective function value, given as average (minimum-maximum):

| Method | Instance A | Instance B |
|---|---|---|
| localSearch-Greedy-Edges-Random | 73849.4 (70523.0-78365.0) | 48465.2 (45609.0-53093.0) |


In [4]:
similarities_names, similarities = ('nodes','edges'), (nodes_similarity, edges_similarity)

similarity_matrices = {}
for instance in instances:
    if instance not in similarity_matrices:
        similarity_matrices[instance] = {}
    for name, similarity_measure in zip(similarities_names, similarities):
        similarity_matrices[instance][name] = similarity_matrix(all_json_data, instance, similarity_measure)
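
The helpers `similarity_matrix`, `similarity_to_best` and `similarity_average` live in the repository's `similarity` module and are not reproduced here. The following is a hypothetical NumPy sketch of the same idea, assuming that "to the best solution" means similarity to the local optimum with the lowest objective (left out of its own comparison) and that "average to each" means the mean similarity to all other local optima. Like the plotting code, it returns rows of (objective, similarity):

```python
import numpy as np


def build_similarity_matrix(solutions, measure):
    """Symmetric matrix of pairwise similarities; the diagonal is 1 (a solution vs itself)."""
    k = len(solutions)
    matrix = np.ones((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            # Assumes a symmetric measure, which holds when all solutions have the same size
            matrix[i, j] = matrix[j, i] = measure(solutions[i], solutions[j])
    return matrix


def to_best(objectives, matrix):
    """(objective, similarity to the best solution) rows; the best solution itself is left out."""
    objectives = np.asarray(objectives, dtype=float)
    best = int(np.argmin(objectives))
    others = np.arange(len(objectives)) != best
    return np.column_stack((objectives[others], matrix[best, others]))


def average_to_others(objectives, matrix):
    """(objective, mean similarity to every other solution) rows."""
    objectives = np.asarray(objectives, dtype=float)
    off_diagonal = matrix.sum(axis=1) - np.diag(matrix)  # drop self-similarity
    return np.column_stack((objectives, off_diagonal / (len(objectives) - 1)))


def shared_nodes(a, b):
    """Node similarity: fraction of shared cities."""
    return len(set(a) & set(b)) / len(a)


solutions = [[0, 1, 2, 3], [0, 1, 2, 4], [0, 5, 6, 7]]
objectives = [100, 110, 150]
matrix = build_similarity_matrix(solutions, shared_nodes)
print(to_best(objectives, matrix))            # rows (110, 0.75) and (150, 0.25)
print(average_to_others(objectives, matrix))  # mean similarities 0.5, 0.5 and 0.25
```

Building the full matrix once costs O(k²) measure calls for k local optima, after which both reductions are cheap array operations.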

In [None]:
simils_to, simils_to_funcs = ("to the best solution", "average to each"), (similarity_to_best, similarity_average)

for instance in instances:
    for simil_name in similarities_names:
        for simil_to, simils_to_func in zip(simils_to, simils_to_funcs):
            data = simils_to_func(all_json_data,instance,similarity_matrices[instance][simil_name])
            plot_similarity_data(
                data,
                f"{simil_name.capitalize()} similarity - {simil_to} for instance {instance}"
            )

## Pearson correlation coefficients

In [6]:
for instance in instances:
    for simil_name in similarities_names:
        for simil_to, simils_to_func in zip(simils_to, simils_to_funcs):
            data = simils_to_func(all_json_data,instance,similarity_matrices[instance][simil_name])
            x, y = data[:, 0], data[:, 1]
            print(f"{simil_name.capitalize()} similarity {simil_to} for instance {instance}:\t{scipy.stats.pearsonr(x,y).statistic:.3f}")

Nodes similarity to the best solution for instance A:	-0.627
Nodes similarity average to each for instance A:	-0.571
Edges similarity to the best solution for instance A:	-0.559
Edges similarity average to each for instance A:	-0.712
Nodes similarity to the best solution for instance B:	-0.583
Nodes similarity average to each for instance B:	-0.524
Edges similarity to the best solution for instance B:	-0.576
Edges similarity average to each for instance B:	-0.745
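
`scipy.stats.pearsonr` returns the Pearson product-moment coefficient, r = Σ(xᵢ - x̄)(yᵢ - ȳ) / √(Σ(xᵢ - x̄)² · Σ(yᵢ - ȳ)²). A negative r, as in every line above, means that similarity tends to decrease as the objective function value increases. The hypothetical `pearson_r` is a minimal NumPy sketch of that formula, useful for checking the reported values by hand:

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Hypothetical data: similarity falling linearly as the objective grows gives r close to -1
print(pearson_r([100, 110, 120, 130], [0.8, 0.6, 0.4, 0.2]))
```

On the same data it should agree with `np.corrcoef(x, y)[0, 1]` and with the `statistic` attribute of the result of `scipy.stats.pearsonr`.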


In [None]:
import matplotlib.pyplot as plt

short_to = ("best", "average")
for instance in instances:
    for simil_name in similarities_names:
        for simil_to, simils_to_func in zip(short_to, simils_to_funcs):
            data = simils_to_func(all_json_data, instance, similarity_matrices[instance][simil_name])
            x, y = data[:, 0], data[:, 1]
            plt.scatter(x, y, label=f"{simil_name}-{simil_to}-{instance}")

# Labels, title and legend are set once, after all series have been drawn
plt.xlabel("Objective function")
plt.ylabel("Similarity value")
plt.title("Entire data")
plt.legend()
plt.show()

## Conclusions

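The most consistent trend in the data is a negative correlation between the objective function value and similarity: all Pearson coefficients lie between -0.524 and -0.745. The lower (better) the objective function value of a local optimum, the more similar it is to the best solution and to the other local optima. This means that good local optima tend to resemble each other, especially in the choice of cities.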

Similarity drops more when it is measured by shared edges rather than shared nodes. This means that local optima often select the same cities but connect them in quite different orders.

A straight line fitted to the data would be much steeper for similarity to the best solution than for the average similarity to the other local optima.
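
The steepness can be quantified with a least-squares line fit. A minimal sketch using `np.polyfit`, assuming input rows of (objective, similarity) as produced for the plots; `fitted_slope` is a hypothetical helper:

```python
import numpy as np


def fitted_slope(data):
    """Slope of the least-squares line through rows of (objective, similarity)."""
    x, y = data[:, 0], data[:, 1]
    slope, _intercept = np.polyfit(x, y, deg=1)  # coefficients come highest power first
    return float(slope)


# Hypothetical points lying exactly on y = 2x + 1
print(fitted_slope(np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]])))
```

Applied to the arrays returned by `similarity_to_best` and `similarity_average` for the same instance and measure, this gives the two slopes to compare. The comparison is only meaningful within one instance, because the objective values of instances A and B lie in different ranges.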