## Introduction

First of all, I'm not an expert in Python: I'm learning the language for the first time in this course. I will try my best to solve the set-covering problem with my own (hopefully original) solution, both to reach the goal and to improve my knowledge of the language.

### The problem, the goal, the strategy
Here I try to give a 'formal' description of the set-covering problem.

Given a number N and a list X of sets Si of integers, X = (S0, S1, S2, ..., Sn), determine, if possible, a list Y of chosen sets Ti, Y = (T0, T1, ..., Tm), each taken from X, such that every integer between 0 and N-1 appears in the union of the sets in Y, and the number of sets in Y is minimum.
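
To make the covering condition concrete, here is a minimal sketch of a check for it. The helper name `is_cover` and the toy instance are my own assumptions for illustration; they are not part of the notebook code further down.

```python
def is_cover(N, chosen_sets):
    """Return True if every integer in 0..N-1 appears in at least one chosen set."""
    covered = set()
    for s in chosen_sets:
        covered |= set(s)
    return set(range(N)) <= covered


# Hypothetical toy instance with N = 5
X = [[0, 1], [1, 2, 3], [3, 4], [0, 4]]
print(is_cover(5, [X[0], X[1], X[2]]))  # True: the union is {0, 1, 2, 3, 4}
print(is_cover(5, [X[0], X[3]]))        # False: 2 and 3 are missing
```

A solution Y is valid when this check passes, and it is optimal when no shorter list of sets from X also passes it.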

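To reach the solution we are going to use the A* strategy: it explores nodes in order of their f(n) value, where *f(n) = g(n) + h(n)*. Here g(n) is the cost paid so far to reach node n and h(n) is an estimate of the cost still needed to reach the goal.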

It seems logical, for the time being, to evaluate first the candidate nodes that **minimize** *f(n)*.

The search starts at the initial node, evaluates its neighbors, and selects the one with the lowest f(n) value to explore next. The process continues until the goal node is reached or the open set of nodes to be evaluated is empty.

A* is considered "optimal" because it guarantees finding the shortest path as long as the heuristic function h(n) is admissible (never overestimates the true cost) and consistent (satisfies the triangle inequality). Common heuristics include the Manhattan distance or Euclidean distance for grid-based and Euclidean spaces, respectively.
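
Admissibility is easy to test on a small case. The sketch below compares the heuristic used later in this notebook (the number of uncovered elements) with a hypothetical alternative of my own, `ceil_ratio`, which divides the uncovered count by the largest number of uncovered elements any single set can add. Both function names and the toy instance are assumptions for illustration only.

```python
import math


def count_uncovered(N, covered):
    """Heuristic used in this notebook: number of elements still uncovered."""
    return len(set(range(N)) - covered)


def ceil_ratio(N, covered, all_sets):
    """Hypothetical alternative: no set can add more than `largest` new elements,
    so at least ceil(uncovered / largest) more sets are needed."""
    uncovered = set(range(N)) - covered
    if not uncovered:
        return 0
    largest = max(len(uncovered & set(s)) for s in all_sets)
    if largest == 0:
        return math.inf  # nothing can cover the rest: unsolvable from here
    return math.ceil(len(uncovered) / largest)


# Toy instance: the true minimum cover uses 2 sets ([0, 1, 2] and [3, 4, 5])
N = 6
sets = [[0, 1, 2], [3, 4, 5], [0, 3]]
print(count_uncovered(N, set()))   # 6, larger than the true cost of 2
print(ceil_ratio(N, set(), sets))  # 2, never above the true cost
```

On this instance the plain count overestimates the true remaining cost, so it is not admissible, while the ratio stays at or below it. I have not tested how the two compare in running time on the full problem.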

## Implementation

`Imports of libraries`:
- heapq is a priority queue; it seems logical to use something like this given the nature of A*
- random is used to generate the sets; fixing the seed makes the problem reproducible (pseudorandom)
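
As a quick illustration of both points, this small sketch shows how `heapq` always pops the entry with the smallest first element, and how resetting the seed reproduces the same random numbers. The node labels and values are made up for the example.

```python
import heapq
import random

# heapq keeps the smallest tuple in front, so (f, label) pairs pop in order of f
open_set = []
heapq.heappush(open_set, (5, "node C"))
heapq.heappush(open_set, (2, "node A"))
heapq.heappush(open_set, (3, "node B"))
order = [heapq.heappop(open_set)[1] for _ in range(3)]
print(order)  # ['node A', 'node B', 'node C']

# The same seed gives the same sequence of pseudorandom numbers
random.seed(42)
first = [random.randint(0, 9) for _ in range(5)]
random.seed(42)
second = [random.randint(0, 9) for _ in range(5)]
print(first == second)  # True
```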

In [13]:
import heapq
import random
#import logging


`Problem Generation`: generates the list of sets X. The goal state (all the integers from 0 to N-1) is built later, inside astar.

In [14]:
def problem(N, fixed_seed=42):
    """Creates an instance of the problem with a fixed seed"""
    random.seed(fixed_seed)
    
    p = [
        list(set(random.randint(0, N - 1) for n in range(random.randint(N // 5, N // 2))))
        for _ in range(random.randint(N, N * 5))
    ]
    
    return p

`State Representation`: Define a state representation that keeps track of the current solution, the elements covered so far, and the list of sets that can still be considered for covering.
At the beginning I used a dictionary, which was a bad idea since it does not go well with priority queues. In the end each state is a tuple, and to remember explored states I turn it into a string: the selected sets and the covered integers, separated by a semicolon.
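
The problem with dictionaries is presumably that they cannot be ordered, which a heap needs whenever two entries tie. The sketch below shows that, plus a possible alternative to the string key: a tuple with a `frozenset`, which is hashable and does not depend on the order of the covered elements. The name `state_key` is my own assumption, not something used in the code below.

```python
# Dictionaries have no ordering, so a heap cannot compare them
try:
    {"cost": 1} < {"cost": 2}
except TypeError as exc:
    print("dicts cannot be ordered:", exc)


def state_key(solution, covered):
    """Hashable key: chosen sets as a tuple of tuples, covered elements as a frozenset."""
    return (tuple(tuple(s) for s in solution), frozenset(covered))


explored = set()
explored.add(state_key([[0, 1], [2, 3]], {0, 1, 2, 3}))
print(state_key([[0, 1], [2, 3]], {3, 2, 1, 0}) in explored)  # True
print(state_key([[0, 1]], {0, 1}) in explored)                # False
```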

`Heuristic`: takes a state as an argument and returns an estimate of how far that state is from the goal state. Here the estimate is the number of uncovered elements, obtained by subtracting the covered elements from the goal set. The two components of f(n) are:
- **len(new_solution)**: the actual cost g(n), i.e. the number of sets that have been chosen so far to cover the required elements.

- **heuristic(new_state)**: the heuristic value h(n), i.e. the estimated remaining cost to reach the goal state from new_state. It counts the elements that still need to be covered. Since a single set can cover several of them at once, this count can overestimate the number of sets still needed, so the solution found is not guaranteed to be minimum.
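
Here is a small worked example of the two components, computed by hand for a partial solution made of two of the sets that appear in the sample output (Set 35 and Set 39) with N = 10. The variable names are only for illustration.

```python
N = 10
goal = set(range(N))

# Partial solution: two sets chosen so far
solution = [[0, 1, 3, 4, 5], [8, 2, 7]]
covered = set().union(*solution)

g = len(solution)        # 2 sets chosen so far
h = len(goal - covered)  # 6 and 9 are still uncovered, so h = 2
f = g + h
print(g, h, f)  # 2 2 4
```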

`Main Loop`: the astar function runs until the goal state is reached or there are no more states to explore.

`State Evaluation`: For each new state, the algorithm checks whether it has been explored before and, if not, computes its f(n) value and pushes the state onto the priority queue:

In [15]:
def astar(N, all_lists):
    goal = set(range(N))
    initial_state = ([], set(), all_lists[:])

    def heuristic(state):
        return len(goal - state[1])

    open_set = [(heuristic(initial_state), 0, initial_state)]
    explored_states = set()

    while open_set:
        _, step, current_state = heapq.heappop(open_set)

        if current_state[1] == goal:
            return current_state[0]

        state_str = ",".join(map(str, current_state[0])) + ";" + ",".join(map(str, current_state[1]))
        explored_states.add(state_str)

        for i, next_set in enumerate(current_state[2]):
            new_solution = current_state[0] + [next_set]
            new_covered = current_state[1] | set(next_set)
            new_remaining = current_state[2][i+1:]
            new_state = (new_solution, new_covered, new_remaining)

            # Check if the new state has been explored
            state_str = ",".join(map(str, new_solution)) + ";" + ",".join(map(str, new_covered))
            if state_str not in explored_states:
                f_value = len(new_solution) + heuristic(new_state)
                heapq.heappush(open_set, (f_value, step + 1, new_state))

    return None  # No solution found
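
To sanity-check the number of sets A* returns on small instances, one option is to compare it with an exhaustive search. This is only a sketch under my own assumptions: `brute_force_min_cover` is a hypothetical helper, it tries every combination of sets by increasing size, and it is far too slow for anything but tiny inputs.

```python
from itertools import combinations


def brute_force_min_cover(N, all_lists):
    """Try combinations of 0, 1, 2, ... sets and return the first one that covers 0..N-1."""
    goal = set(range(N))
    for k in range(len(all_lists) + 1):
        for combo in combinations(all_lists, k):
            if set().union(*combo) >= goal:
                return list(combo)
    return None  # no combination covers the goal


# Hypothetical toy instance
toy = [[0, 1], [1, 2, 3], [3, 4], [0, 4], [2]]
print(brute_force_min_cover(5, toy))  # [[1, 2, 3], [0, 4]]
```

If astar returns more sets than this function does on the same instance, the heuristic has led the search to a non-minimal cover.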

## Usage

In [16]:
N = 10  # set N as needed
problem_instance = problem(N)
print("Generated Problem Instance:")
for i, sub_list in enumerate(problem_instance):
    print(f"Set {i + 1}: {sub_list}")

solution = astar(N, problem_instance)

if solution:
    print("A* Solution:")
    for i, selected_set in enumerate(solution):
        set_number = problem_instance.index(selected_set) + 1  # Find the index of the selected set
        print(f"Selected Set {set_number}: {selected_set}")
else:
    print("No solution found.")

Generated Problem Instance:
Set 1: [0, 4]
Set 2: [1, 2, 3]
Set 3: [9, 6]
Set 4: [0, 1]
Set 5: [8, 9, 3]
Set 6: [8, 3]
Set 7: [0, 3, 4, 7, 9]
Set 8: [4, 5, 6]
Set 9: [1, 3, 5]
Set 10: [1, 6]
Set 11: [0, 9, 4, 5]
Set 12: [8, 1, 6]
Set 13: [9, 3, 5]
Set 14: [0, 3]
Set 15: [1, 3, 6]
Set 16: [2, 5, 7]
Set 17: [1, 3, 4, 9]
Set 18: [8, 2, 3]
Set 19: [3, 4, 5, 6, 8]
Set 20: [0, 3]
Set 21: [1, 3, 4, 6]
Set 22: [3, 6, 7]
Set 23: [2, 3, 4]
Set 24: [9, 6]
Set 25: [8, 2, 3, 7]
Set 26: [0, 1]
Set 27: [9, 2, 6]
Set 28: [6]
Set 29: [8, 0, 4, 1]
Set 30: [1, 4, 5, 6]
Set 31: [0, 4, 7]
Set 32: [8, 1, 4]
Set 33: [2, 5]
Set 34: [9, 5]
Set 35: [0, 1, 3, 4, 5]
Set 36: [9, 3]
Set 37: [1, 7]
Set 38: [8, 2]
Set 39: [8, 2, 7]
Set 40: [8, 9, 3, 6]
Set 41: [4, 5, 6]
Set 42: [8, 1, 3, 7]
Set 43: [0, 5]
Set 44: [0, 9, 3]
Set 45: [0, 3]
Set 46: [0, 5]
Set 47: [8, 3]
Set 48: [8, 2, 3, 7]
Set 49: [1, 3, 6, 7]
Set 50: [5, 6]
A* Solution:
Selected Set 35: [0, 1, 3, 4, 5]
Selected Set 39: [8, 2, 7]
Selected Set 40: [8, 9,