# Constructive heuristics comparison

The objective of the $\alpha$-neighbor $p$-center problem can be thought of as "equally" distributing the facilities among the clients to cover them efficiently.

Spreading facilities evenly is precisely the goal of the $p$-dispersion problem (PDP), so we test a constructive heuristic that uses the PDP objective function and compare it against a greedy heuristic that directly uses the objective function of the $\alpha$-neighbor $p$-center problem.
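
To make the two objectives concrete, here is a minimal sketch of how each could be computed on a set of planar points. It is not the code behind `eval_obj_func`, `pdp_based` or `greedy`. The names `alpha_neighbor_objective` and `dispersion_greedy` are hypothetical, and the sketch assumes Euclidean distances, that facilities are chosen among the points themselves, and that open facilities are not counted as clients.

```python
from math import dist
from typing import Sequence, Set, Tuple

Point = Tuple[float, float]


def alpha_neighbor_objective(points: Sequence[Point], facilities: Set[int], alpha: int) -> float:
    """Largest distance from any client to its alpha-th closest facility (lower is better)."""
    worst = 0.0
    for i, client in enumerate(points):
        if i in facilities:
            continue  # assumption: an open facility is not a client
        distances = sorted(dist(client, points[f]) for f in facilities)
        worst = max(worst, distances[alpha - 1])
    return worst


def dispersion_greedy(points: Sequence[Point], p: int) -> Set[int]:
    """Max-min construction (assumes p >= 2): start from the farthest pair,
    then repeatedly add the point whose nearest chosen point is farthest away."""
    n = len(points)
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    chosen = {first, second}
    while len(chosen) < p:
        best = max(
            (i for i in range(n) if i not in chosen),
            key=lambda i: min(dist(points[i], points[c]) for c in chosen),
        )
        chosen.add(best)
    return chosen


# Four corners of a square plus its center: the dispersion rule picks the corners.
square = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
facilities = dispersion_greedy(square, 4)
value = alpha_neighbor_objective(square, facilities, alpha=2)
```

In this toy case the only client is the center point, so the objective value is its distance to the second-closest corner. A construction that spreads facilities out never looks at the clients, which is why it is cheap to run but gives no guarantee on the $\alpha$-neighbor objective.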

We use 20 random instances of size $n = 50$, $p = 5$ and another 20 of size $n = 400$, $p = 20$; each one is tested with both $\alpha = 2$ and $\alpha = 3$. The point coordinates lie between 0 and 1000 on both axes.

In [1]:
from copy import deepcopy
from typing import List

from models.instance import Instance


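def generate_instances(amount: int, n: int, p: int) -> List[Instance]:
    """Return `amount` random instances with alpha = 2, followed by copies of them with alpha = 3."""
    alpha2 = [
        Instance.random(n, p, 2, 1000, 1000)
        for _ in range(amount)
    ]
    # Copy the same instances so that both values of alpha are tested on identical points.
    alpha3 = deepcopy(alpha2)
    for instance in alpha3:
        instance.alpha = 3
    return alpha2 + alpha3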

In [2]:
instances = generate_instances(20, 50, 5) + generate_instances(20, 400, 20)

The following code measures the running time of each heuristic and the objective function (OF) value of the solution it returns, and collects the results in a pandas DataFrame.

In [3]:
import timeit

import pandas as pd

from heuristics.constructive import pdp_based, greedy
from utils import eval_obj_func

def measure(instance, heuristic):
    """Run `heuristic` once on `instance`; return its name, solution, OF value and elapsed seconds."""
    start = timeit.default_timer()
    solution = heuristic(instance)
    elapsed = timeit.default_timer() - start
    # The objective function is evaluated outside the timed region.
    of = eval_obj_func(instance, solution)
    return heuristic.__name__, solution, of, elapsed

def get_dataframe(data):
    """Build a DataFrame with one row per tuple in `data`."""
    columns = ['n', 'p', 'a', 'heuristic', 'solution', 'OF', 'seconds']
    return pd.DataFrame(data, columns=columns)
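
A single timed run can be noisy, especially for calls that finish in under a millisecond. As a hedged alternative, the sketch below repeats a call several times and keeps the fastest run. The helper name `best_of` is hypothetical and is not part of this notebook. It assumes the timed function is deterministic, so repeating it does not change the solution.

```python
import time
from typing import Any, Callable, Tuple


def best_of(func: Callable[[], Any], repeats: int = 5) -> Tuple[Any, float]:
    """Call `func` `repeats` times; return its last result and the fastest elapsed time in seconds."""
    best = float('inf')
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func()
        best = min(best, time.perf_counter() - start)
    return result, best


# Example with a cheap stand-in for a heuristic call.
total, seconds = best_of(lambda: sum(range(1000)), repeats=3)
```

With the functions above, a call could look like `best_of(lambda: pdp_based(instance))`. For the greedy heuristic on the large instances, repeating each run multiplies an already long running time, so a single run may be the more practical choice there.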

## Comparing data

Let's create the dataframe of PDP-based evaluations, or get it from a file if it already exists:

In [4]:
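import os

# Build the path with os.path.join so it works on any operating system.
OUT_FOLDER = os.path.join('nb_results', 'constructive')
os.makedirs(OUT_FOLDER, exist_ok=True)

filepath = os.path.join(OUT_FOLDER, 'pdp_df.csv')
if os.path.exists(filepath):
    # Reuse cached results instead of rerunning the heuristic.
    pdp_df = pd.read_csv(filepath)
else:
    pdp_data = [(*i.get_parameters(), *measure(i, pdp_based)) for i in instances]
    pdp_df = get_dataframe(pdp_data)
    pdp_df.to_csv(filepath, index=False)
pdp_df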

index,n,p,a,heuristic,solution,OF,seconds
0,50,5,2,pdp_based,"{33, 5, 37, 40, 28}",628,0.000760
1,50,5,2,pdp_based,"{33, 2, 34, 13, 14}",607,0.000535
2,50,5,2,pdp_based,"{35, 5, 8, 9, 42}",794,0.000527
3,50,5,2,pdp_based,"{18, 19, 13, 45, 14}",611,0.000617
4,50,5,2,pdp_based,"{1, 7, 27, 29, 47}",643,0.000525
...,...,...,...,...,...,...,...
75,400,20,3,pdp_based,"{66, 68, 198, 74, 395, 397, 399, 16, 81, 209, ...",406,0.074006
76,400,20,3,pdp_based,"{261, 8, 201, 202, 13, 22, 281, 155, 222, 289,...",390,0.093924
77,400,20,3,pdp_based,"{1, 259, 325, 198, 134, 201, 204, 77, 78, 208,...",375,0.073400
78,400,20,3,pdp_based,"{391, 10, 13, 77, 271, 16, 17, 274, 144, 84, 2...",372,0.064396


Now the dataframe of the greedy results:

In [5]:
filepath = os.path.join(OUT_FOLDER, 'greedy_df.csv')
if os.path.exists(filepath):
    greedy_df = pd.read_csv(filepath)
else:
    greedy_data = [(*i.get_parameters(), *measure(i, greedy)) for i in instances]
    greedy_df = get_dataframe(greedy_data)
    greedy_df.to_csv(filepath, index=False)
greedy_df

index,n,p,a,heuristic,solution,OF,seconds
0,50,5,2,greedy,"{2, 3, 37, 40, 28}",600,0.216955
1,50,5,2,greedy,"{33, 1, 13, 14, 15}",607,0.067130
2,50,5,2,greedy,"{1, 35, 8, 44, 31}",627,0.021097
3,50,5,2,greedy,"{18, 3, 4, 43, 13}",591,0.050969
4,50,5,2,greedy,"{20, 38, 7, 9, 29}",610,0.020568
...,...,...,...,...,...,...,...
75,400,20,3,greedy,"{1, 2, 3, 4, 5, 198, 6, 8, 7, 9, 10, 14, 16, 2...",369,32.796715
76,400,20,3,greedy,"{1, 2, 3, 4, 5, 6, 7, 8, 201, 10, 12, 14, 16, ...",401,36.384875
77,400,20,3,greedy,"{1, 2, 3, 67, 4, 6, 201, 9, 11, 12, 77, 78, 14...",434,33.087493
78,400,20,3,greedy,"{1, 2, 3, 4, 5, 6, 7, 8, 9, 138, 10, 12, 13, 1...",404,29.960488


We now have two DataFrames, one per heuristic. Let's split each of them by $n$ and $\alpha$:

In [15]:
filtered_data = {
    heuristic: {
        f'n{n}': {
            # Keep every column except 'solution'.
            f'a{alpha}': df.loc[
                (df['n'] == n) & (df['a'] == alpha),
                ['n', 'p', 'a', 'heuristic', 'OF', 'seconds']
            ]
            for alpha in (2, 3)
        }
        for n in (50, 400)
    }
    for heuristic, df in (('pdp', pdp_df), ('greedy', greedy_df))
}

Now we can access the data with keys that refer to the heuristic, the instance size $n$ and $\alpha$:

In [16]:
filtered_data['pdp']['n50']['a2']

index,n,p,a,heuristic,OF,seconds
0,50,5,2,pdp_based,628,0.00076
1,50,5,2,pdp_based,607,0.000535
2,50,5,2,pdp_based,794,0.000527
3,50,5,2,pdp_based,611,0.000617
4,50,5,2,pdp_based,643,0.000525
5,50,5,2,pdp_based,688,0.000509
6,50,5,2,pdp_based,608,0.000518
7,50,5,2,pdp_based,743,0.000526
8,50,5,2,pdp_based,779,0.000502
9,50,5,2,pdp_based,627,0.000532


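To compute some basic statistics, let's define a function that takes $n$ and $\alpha$ and compares the results of the two heuristics. The difference is computed as PDP-based minus greedy, so, because the objective is minimized, a negative value means that the PDP-based heuristic found the better solution: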

In [7]:
def calc_stats(n, a):
    ncol = f'n{n}'
    acol = f'a{a}'
    # compare() places both tables side by side under 'self'/'other' sub-columns.
    stats = (filtered_data['pdp'][ncol][acol]
        .compare(filtered_data['greedy'][ncol][acol], keep_equal=True)
        .rename(columns={'self': 'pdp', 'other': 'greedy'})
        .drop(columns='heuristic'))

    # A positive difference means greedy found the lower (better) OF value.
    stats['OF', 'absolute'] = stats['OF', 'pdp'] - stats['OF', 'greedy']
    stats['OF', '%'] = (stats['OF', 'absolute'] / stats['OF', 'pdp']) * 100

    order = ['pdp', 'greedy', 'absolute', '%']
    tops = ('OF', 'seconds')
    stats = stats.loc[:, (tops, order)]

    # Instances won by each heuristic; ties count for the PDP-based one.
    diff = stats['OF', 'absolute']
    winnings = [
        (diff <= 0).sum(),
        (diff > 0).sum(),
        '', '', '', ''
    ]
    average = [
        stats[col].mean()
        for col in stats.columns
    ]

    stats.loc['winnings'] = winnings
    stats.loc['average'] = average

    return stats

In [17]:
from itertools import product

from IPython.display import display

for n, a in product((50, 400), (2, 3)):
    print('------------------------')
    print(f'n = {n}, alpha = {a}')
    stats = calc_stats(n, a)
    filepath = os.path.join(OUT_FOLDER, f'stats_n{n}_a{a}.csv')
    stats.to_csv(filepath)
    display(stats)

------------------------
n = 50, alpha = 2



index,OF pdp,OF greedy,OF absolute,OF %,seconds pdp,seconds greedy
0,628.0,600.0,28.0,4.458599,0.00076,0.216955
1,607.0,607.0,0.0,0.0,0.000535,0.06713
2,794.0,627.0,167.0,21.032746,0.000527,0.021097
3,611.0,591.0,20.0,3.273322,0.000617,0.050969
4,643.0,610.0,33.0,5.132193,0.000525,0.020568
5,688.0,548.0,140.0,20.348837,0.000509,0.032279
6,608.0,577.0,31.0,5.098684,0.000518,0.016759
7,743.0,558.0,185.0,24.899058,0.000526,0.019974
8,779.0,582.0,197.0,25.288832,0.000502,0.017632
9,627.0,579.0,48.0,7.655502,0.000532,0.020344


------------------------
n = 50, alpha = 3



index,OF pdp,OF greedy,OF absolute,OF %,seconds pdp,seconds greedy
20,799.0,659.0,140.0,17.521902,0.000595,0.029482
21,924.0,688.0,236.0,25.541126,0.000536,0.016152
22,832.0,794.0,38.0,4.567308,0.001752,0.015544
23,786.0,611.0,175.0,22.264631,0.001203,0.015619
24,918.0,653.0,265.0,28.867102,0.000576,0.01683
25,856.0,761.0,95.0,11.098131,0.000534,0.017954
26,758.0,617.0,141.0,18.601583,0.000977,0.013202
27,762.0,665.0,97.0,12.729659,0.000547,0.018028
28,903.0,786.0,117.0,12.956811,0.000509,0.019016
29,738.0,712.0,26.0,3.523035,0.000521,0.012126


------------------------
n = 400, alpha = 2



index,OF pdp,OF greedy,OF absolute,OF %,seconds pdp,seconds greedy
40,307.0,304.0,3.0,0.977199,0.07164,45.30798
41,292.0,350.0,-58.0,-19.863014,0.102789,42.235033
42,310.0,309.0,1.0,0.322581,0.12521,50.214393
43,294.0,426.0,-132.0,-44.897959,0.051584,45.909697
44,317.0,297.0,20.0,6.309148,0.057793,48.009938
45,297.0,358.0,-61.0,-20.538721,0.049126,51.167755
46,328.0,336.0,-8.0,-2.439024,0.05111,45.846795
47,323.0,354.0,-31.0,-9.597523,0.050484,45.359293
48,275.0,362.0,-87.0,-31.636364,0.052707,49.758392
49,339.0,341.0,-2.0,-0.589971,0.062176,50.711559


------------------------
n = 400, alpha = 3



index,OF pdp,OF greedy,OF absolute,OF %,seconds pdp,seconds greedy
60,409.0,425.0,-16.0,-3.91198,0.06933,30.42869
61,353.0,480.0,-127.0,-35.977337,0.056991,29.794342
62,400.0,389.0,11.0,2.75,0.057283,34.427897
63,386.0,431.0,-45.0,-11.658031,0.055634,30.758092
64,350.0,395.0,-45.0,-12.857143,0.059793,32.295813
65,377.0,511.0,-134.0,-35.543767,0.053938,33.588127
66,357.0,407.0,-50.0,-14.005602,0.057638,30.638826
67,422.0,361.0,61.0,14.454976,0.053892,34.784959
68,372.0,444.0,-72.0,-19.354839,0.059268,99.633402
69,471.0,414.0,57.0,12.101911,0.058515,64.141011


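Recall that the objective is minimized, so a lower OF value is better. On the small instances ($n = 50$), the greedy heuristic finds solutions at least as good as the PDP-based ones in every result shown above, with OF values up to about 29% lower. It is slower (about 0.01 to 0.2 seconds against less than 2 milliseconds), but both heuristics finish in well under a second.
On the large instances ($n = 400$), however, the greedy heuristic takes roughly 30 to 50 seconds per instance, with two runs taking about 64 and 100 seconds, and it returns worse solutions than the PDP-based heuristic in most of the results shown.
The PDP-based heuristic also slows down on large instances, but it still needs only about 0.05 to 0.13 seconds.

As a result, we can conclude that the greedy heuristic is only worth using on small instances.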
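
As a quick check of this conclusion, the sketch below tallies wins from the OF values displayed in the four tables above. It covers only the rows shown there, and the helper name `tally` is hypothetical. Unlike `calc_stats`, which credits ties to the PDP-based heuristic, it reports ties separately.

```python
# (pdp, greedy) OF pairs copied from the rows displayed above, keyed by (n, alpha).
shown = {
    (50, 2): [(628, 600), (607, 607), (794, 627), (611, 591), (643, 610),
              (688, 548), (608, 577), (743, 558), (779, 582), (627, 579)],
    (50, 3): [(799, 659), (924, 688), (832, 794), (786, 611), (918, 653),
              (856, 761), (758, 617), (762, 665), (903, 786), (738, 712)],
    (400, 2): [(307, 304), (292, 350), (310, 309), (294, 426), (317, 297),
               (297, 358), (328, 336), (323, 354), (275, 362), (339, 341)],
    (400, 3): [(409, 425), (353, 480), (400, 389), (386, 431), (350, 395),
               (377, 511), (357, 407), (422, 361), (372, 444), (471, 414)],
}


def tally(pairs):
    """Count (PDP wins, greedy wins, ties); the lower OF value wins."""
    pdp_wins = sum(1 for pdp, greedy in pairs if pdp < greedy)
    greedy_wins = sum(1 for pdp, greedy in pairs if greedy < pdp)
    ties = len(pairs) - pdp_wins - greedy_wins
    return pdp_wins, greedy_wins, ties


summary = {key: tally(pairs) for key, pairs in shown.items()}
for (n, alpha), (pdp_wins, greedy_wins, ties) in summary.items():
    print(f'n = {n}, alpha = {alpha}: PDP {pdp_wins}, greedy {greedy_wins}, ties {ties}')
```

On these rows the greedy heuristic wins or ties every small instance, while the PDP-based heuristic wins 7 of the 10 displayed large instances for each value of $\alpha$.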