Copyright (c) 2023 Ivan Magistro Contenta, Matteo Pietro Pillitteri and Francesca Zafonte (<s314356@studenti.polito.it> , <s314404@studenti.polito.it>, <s319331@studenti.polito.it>)
https://github.com/ivanmag22/computational-intelligence
https://github.com/Matteo-Pietro-Pillitteri/Computational-Intelligence
https://github.com/Zafonte/computational-intelligence

In [148]:
from random import random
from functools import reduce
from collections import namedtuple
from queue import PriorityQueue, SimpleQueue, LifoQueue

import numpy as np

**General case:**

In [149]:
PROBLEM_SIZE = 5  # number of elements to cover
NUM_SETS = 10  # number of tiles (candidate sets) in SETS
SETS = tuple(
    np.array([random() < 0.3 for _ in range(PROBLEM_SIZE)])
    for _ in range(NUM_SETS)
)
State = namedtuple("State", ["taken", "not_taken"])

**Goal**: cover all the elements with as few sets as possible, that is, minimize the number of arrays taken from SETS.

In [150]:
SETS

(array([ True, False, False, False, False]),
 array([False, False,  True, False, False]),
 array([False,  True, False, False, False]),
 array([ True,  True,  True, False, False]),
 array([False, False,  True, False, False]),
 array([False, False,  True, False, False]),
 array([ True,  True,  True, False, False]),
 array([False,  True,  True,  True,  True]),
 array([False,  True,  True, False,  True]),
 array([False, False, False, False, False]))

**Specific case**: a hand-built **SETS** showing that, when several states have the same cost (**number of tiles + distance to the goal**), the algorithm chooses the state with the lowest overlap.

In [151]:
PROBLEM_SIZE = 5
NUM_SETS = 5

SETS = tuple(
    np.array(
        [
            [True, True, False, False, True],
            [True, False, False, True, False],
            [True, True, True, False, False],
            [False, False, False, False, False],
            [False, False, True, False, False],
        ]
    )
)
State = namedtuple("State", ["taken", "not_taken"])

In [152]:
SETS

(array([ True,  True, False, False,  True]),
 array([ True, False, False,  True, False]),
 array([ True,  True,  True, False, False]),
 array([False, False, False, False, False]),
 array([False, False,  True, False, False]))
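
As a quick check of why this instance is useful, the sketch below compares two covers of size three, {0, 1, 2} and {0, 1, 4}. The names `example_sets`, `overlap_fraction` and `covers_everything` are hypothetical helpers introduced only for this illustration; the normalisation (extra coverage divided by problem size times number of taken sets) is assumed to match the one used by `overlap_cost` in this notebook.

```python
import numpy as np

# Hand-built sets from the specific case above (copied here so the sketch is self-contained).
example_sets = (
    np.array([True, True, False, False, True]),
    np.array([True, False, False, True, False]),
    np.array([True, True, True, False, False]),
    np.array([False, False, False, False, False]),
    np.array([False, False, True, False, False]),
)
problem_size = 5


def overlap_fraction(taken):
    """Extra coverage per element, normalised by problem_size * len(taken)."""
    counts = np.sum([example_sets[i] for i in taken], axis=0)  # sets covering each element
    extra = np.maximum(counts - 1, 0)  # coverage beyond the first set
    return extra.sum() / (problem_size * len(taken))


def covers_everything(taken):
    """True when the union of the taken sets covers every element."""
    return bool(np.all(np.any([example_sets[i] for i in taken], axis=0)))


for taken in ((0, 1, 2), (0, 1, 4)):
    print(taken, covers_everything(taken), len(taken) + overlap_fraction(taken))
```

Both candidates cover every element with three sets, so only the fractional overlap term separates them, and {0, 1, 4} has the smaller one.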

The following code can be used for both cases.

In [153]:
def goal_check(state):
    return np.all(
        reduce(
            np.logical_or,
            [SETS[i] for i in state.taken],
            np.array([False for _ in range(PROBLEM_SIZE)]),
        )
    )

def covered(state):
    return reduce(
            np.logical_or,
            [SETS[i] for i in state.taken],
            np.array([False for _ in range(PROBLEM_SIZE)]),
        )

def heuristic(state):
    """Distance to the goal, measured as the number of elements still uncovered."""
    return PROBLEM_SIZE - np.sum(covered(state))


def overlap_cost(state):
    """Cost due to overlap among the arrays of SETS that have been taken."""
    if len(state.taken) < 2:
        # nothing can overlap with fewer than two sets (this also avoids dividing by zero)
        return 0
    set_overlap = np.sum([SETS[i] for i in state.taken], axis=0)
    res = [x - 1 if x > 1 else 0 for x in set_overlap]
    return np.sum(res) / (PROBLEM_SIZE * len(state.taken))


def g(state):
    return len(state.taken) + overlap_cost(state)

def f_cost(state):    # f(n) = h(n) + g(n)
    return heuristic(state) + g(state)

This heuristic is pessimistic: it returns the number of elements still uncovered, d, as if d more sets were needed to reach the goal state. Since a single set can cover several missing elements at once, the goal can often be reached with fewer than d sets, so the estimate can exceed the true remaining cost (it is not admissible).
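
The overestimate can be seen on the hand-built instance. The sketch below is only an illustration: it rewrites the five sets as Python sets of covered indices, and the names `sets_as_indices`, `uncovered_count` and `true_min_sets` are hypothetical. It compares the estimate for the empty state with the size of the smallest cover, found by exhaustive search.

```python
from itertools import combinations

# Specific-case instance, written as the indices each set covers (assumed equivalent encoding).
sets_as_indices = [{0, 1, 4}, {0, 3}, {0, 1, 2}, set(), {2}]
problem_size = 5


def uncovered_count(taken):
    """The pessimistic estimate: number of elements not yet covered."""
    covered = set().union(*(sets_as_indices[i] for i in taken))
    return problem_size - len(covered)


def true_min_sets():
    """Size of the smallest cover, found by trying every combination."""
    for k in range(len(sets_as_indices) + 1):
        for combo in combinations(range(len(sets_as_indices)), k):
            if uncovered_count(combo) == 0:
                return k
    return None  # the instance cannot be covered


print("estimate from the empty state:", uncovered_count(()))
print("sets actually needed:", true_min_sets())
```

The exhaustive search is practical only for tiny instances like this one; it is used here just to obtain the true cost for comparison.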

An optimistic alternative is a heuristic that returns the minimum number of sets that could still cover all the missing elements (as implemented by Prof. Squillero in set-covering_path-search.ipynb).

In [154]:
def goal_check(state):
    return np.all(
        reduce(
            np.logical_or,
            [SETS[i] for i in state.taken],
            np.array([False for _ in range(PROBLEM_SIZE)]),
        )
    )

def covered(state):
    return reduce(
            np.logical_or,
            [SETS[i] for i in state.taken],
            np.array([False for _ in range(PROBLEM_SIZE)]),
        )

def h3(state):
    """Heuristic cost: the minimum number of sets that, added to those already taken, could cover the missing elements."""
    already_covered = covered(state)
    if np.all(already_covered):
        return 0
    missing_size = PROBLEM_SIZE - sum(already_covered)
    candidates = sorted((sum(np.logical_and(s, np.logical_not(already_covered))) for s in SETS), reverse=True)
    print("\th3(): print candidates of ",state.taken,": ",candidates)
    taken = 1
    while sum(candidates[:taken]) < missing_size:
        taken += 1
    print("\th3(): print heuristic cost of ",state.taken,": ",taken)
    return taken


def overlap_cost(state):
    """Cost due to overlap among the arrays of SETS that have been taken."""
    # With no taken set np.sum returns a 0-dimensional value, and a single set cannot overlap with anything: both cases cost 0.
    if len(state.taken) > 1:
        set_overlap = np.array(
            np.sum([SETS[i] for i in state.taken], axis=0)
        )
        res = [x - 1 if x > 1 else 0 for x in set_overlap]
        return np.sum(res) / (PROBLEM_SIZE * len(state.taken))
    else:
        return 0


def g(state):
    return len(state.taken) + overlap_cost(state)

def f_cost(state):  # f(n) = h(n) + g(n)
    return h3(state) + g(state)

In [155]:
assert goal_check(
    State(set(range(NUM_SETS)), set())
), "Problem not solvable"

Frontier elements are tuples: index 0 is the cost, index 1 is the state.
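
With (cost, state) tuples, two entries with equal cost are ordered by comparing the states themselves, and for sets `<` means "proper subset", which is not a meaningful priority. A minimal sketch of one common workaround follows, assuming an extra insertion index is acceptable in the tuple; `tie_breaker` and `push` are hypothetical names and are not part of this notebook.

```python
from collections import namedtuple
from itertools import count
from queue import PriorityQueue

State = namedtuple("State", ["taken", "not_taken"])

frontier = PriorityQueue()
tie_breaker = count()  # hypothetical addition: increasing insertion index


def push(cost, state):
    # Entries are (cost, insertion index, state); the index is unique,
    # so two states are never compared with each other.
    frontier.put((cost, next(tie_breaker), state))


push(3.0, State({0}, {1, 2}))
push(3.0, State({1}, {0, 2}))
push(2.5, State({2}, {0, 1}))

cost, _, state = frontier.get()
print(cost, state.taken)
```

Entries with equal cost then leave the queue in insertion order.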

In [156]:
frontier = PriorityQueue()
state = State(set(), set(range(NUM_SETS)))
frontier.put((f_cost(state), state))

counter = 0
current_cost, current_state = frontier.get()
while not goal_check(current_state):
    print("current state: ", current_state,"; current cost: ",current_cost)
    counter += 1
    for action in current_state.not_taken:
        new_state = State(
            current_state.taken ^ {action},
            current_state.not_taken ^ {action},
        )
        frontier.put((f_cost(new_state), new_state))
    current_cost, current_state = frontier.get()
print("solution: ", current_state,"\n")

print(
    f"Solved in {counter:,} steps, tiles: ({len(current_state.taken)})"
)

	h3(): print candidates of  set() :  [3, 3, 2, 1, 0]
	h3(): print heuristic cost of  set() :  2
current state:  State(taken=set(), not_taken={0, 1, 2, 3, 4}) ; current cost:  2
	h3(): print candidates of  {0} :  [1, 1, 1, 0, 0]
	h3(): print heuristic cost of  {0} :  2
	h3(): print candidates of  {1} :  [2, 2, 1, 0, 0]
	h3(): print heuristic cost of  {1} :  2
	h3(): print candidates of  {2} :  [1, 1, 0, 0, 0]
	h3(): print heuristic cost of  {2} :  2
	h3(): print candidates of  {3} :  [3, 3, 2, 1, 0]
	h3(): print heuristic cost of  {3} :  2
	h3(): print candidates of  {4} :  [3, 2, 2, 0, 0]
	h3(): print heuristic cost of  {4} :  2
current state:  State(taken={0}, not_taken={1, 2, 3, 4}) ; current cost:  3
	h3(): print candidates of  {0, 1} :  [1, 1, 0, 0, 0]
	h3(): print heuristic cost of  {0, 1} :  1
	h3(): print candidates of  {0, 2} :  [1, 0, 0, 0, 0]
	h3(): print heuristic cost of  {0, 2} :  1
	h3(): print candidates of  {0, 3} :  [1, 1, 1, 0, 0]
	h3(): print heuristic cost of  {0, 3

In [157]:
overlap_cost(current_state)

0.06666666666666667

In [158]:
current_state

State(taken={0, 1, 4}, not_taken={2, 3})

The following code tests how PriorityQueue behaves when integer and decimal costs are mixed. A decimal cost can be used to give lower priority to a state with a higher overlap. This decimal contribution to the overall cost of a state is due to **overlapping**: it is 0 when nothing overlaps and grows with the number of overlapped elements, while always staying below 1. A state with less overlap therefore gets a lower decimal value and is extracted before states with otherwise equal cost.

In [159]:
prova = PriorityQueue()
prova.put((2.2, "ciao"))
prova.put((2, "ivan"))
prova.put((3, "matteo"))

In [160]:
prova.get()

(2, 'ivan')
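
The overlap term can be bounded with a small worked case. With k taken sets, an element is covered at most k times, so it contributes at most k - 1 overlaps, and the normalised term is at most (k - 1) / k. The sketch below evaluates that worst case; `worst_case_overlap` is a hypothetical helper that assumes the same normalisation as `overlap_cost`.

```python
def worst_case_overlap(problem_size, num_taken):
    """Overlap term when every taken set covers every element (the worst case)."""
    counts = [num_taken] * problem_size         # each element is covered num_taken times
    extra = sum(max(c - 1, 0) for c in counts)  # coverage beyond the first set
    return extra / (problem_size * num_taken)


for k in range(1, 6):
    print(k, worst_case_overlap(5, k))
```

Since the term never reaches 1, it only ranks states with the same integer part of the cost and cannot make a state with fewer sets look more expensive than one with more.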

The following code shows how the overlap can be calculated.

In [161]:
PROVA_SETS = tuple(
    np.array([random() < 0.6 for _ in range(PROBLEM_SIZE)])
    for _ in range(2)
)



In [162]:
PROVA_SETS

(array([False,  True, False,  True, False]),
 array([ True,  True,  True, False, False]))

In [163]:
res = np.array(np.sum(PROVA_SETS, axis=0))
res

array([1, 2, 1, 1, 0])

If there are 3 arrays, for instance

    [1, 1, 0, ..]
    [1, 0, 0, ..]
    [1, 1, 1, ..]

then for the first element there is one overlap between the first and the second array, and another between the second and the third array. In total, the first element counts an overlap of 2.

In [164]:
res = [x - 1 if x > 1 else 0 for x in res]

In [165]:
res

[0, 1, 0, 0, 0]
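
The three-array example above can be checked the same way. The sketch uses only the three columns that are written out (the remaining ones are omitted in the example), and `np.maximum` is used here as an assumed equivalent of the list comprehension.

```python
import numpy as np

# First three columns of the three arrays from the example; later columns are not given.
arrays = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
])

counts = arrays.sum(axis=0)          # how many arrays cover each element
overlap = np.maximum(counts - 1, 0)  # every array beyond the first counts as one overlap
print(counts.tolist(), overlap.tolist())
```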

To obtain a decimal cost, sum the values of the array computed in the previous step and divide the result by the total number of elements in the taken sets. In this small example, PROBLEM_SIZE * len(PROVA_SETS) is that total.

In [166]:
x = np.sum(res) / (PROBLEM_SIZE * len(PROVA_SETS))

In [167]:
x

0.1