## Task

Given a number $N$ and some lists of integers $P = (L_0, L_1, L_2, ..., L_n)$, 
determine, if possible, a selection $S = (L_{s_0}, L_{s_1}, L_{s_2}, ..., L_{s_n})$
such that each number between $0$ and $N-1$ appears in at least one list

$$\forall n \in [0, N-1] \ \exists i : n \in L_{s_i}$$

and that the total number of elements in all $L_{s_i}$ is minimum. 
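
As a concrete reading of the two conditions above, a candidate selection can be checked with a small helper. This is only a sketch: the names `is_cover` and `total_elements` are illustrative and are not part of the lab material, and the toy instance is hand-made rather than generated.

```python
def is_cover(S, N):
    """Return True if every integer in [0, N-1] appears in at least one list of S."""
    covered = set()
    for lst in S:
        covered.update(lst)
    return covered >= set(range(N))


def total_elements(S):
    """Total number of elements over all selected lists (the quantity to minimize)."""
    return sum(len(lst) for lst in S)


# Hand-made toy selection for N = 5
S = [[0, 1], [2, 3], [1, 4]]
print(is_cover(S, 5))      # True
print(total_elements(S))   # 6
```

The number 1 appears twice in this selection, so its total is 6 even though only 5 distinct numbers are covered.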

## Instructions

* Create the directory `lab1` inside the course repo (the one you registered with Andrea)
* Put a `README.md` and your solution (all the files, code and auxiliary data if needed) inside it
* Use `problem` to generate the problems with different $N$
* In the `README.md`, report the total number of elements in $L_{s_i}$ for problems with $N \in [5, 10, 20, 100, 500, 1000]$ and the total number of nodes visited during the search. Use `seed=42`.
* Use `GitHub Issues` to peer review others' lab

In [4]:
import random
import logging
import itertools

In [3]:
def problem(N, seed=None):
    random.seed(seed)
    return [
        list(set(random.randint(0, N - 1) for n in range(random.randint(N // 5, N // 2))))
        for n in range(random.randint(N, N * 5))
    ]
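
To make the shape of a generated instance explicit, the sketch below checks a few properties that follow from the generator's code. The generator is repeated only so the snippet runs on its own; the checks themselves are illustrative and not part of the lab.

```python
import random


def problem(N, seed=None):
    # Same generator as in the cell above, repeated so the snippet is self-contained
    random.seed(seed)
    return [
        list(set(random.randint(0, N - 1) for n in range(random.randint(N // 5, N // 2))))
        for n in range(random.randint(N, N * 5))
    ]


N = 10
P = problem(N, seed=42)

# Between N and 5*N lists are generated
assert N <= len(P) <= 5 * N
# Each list holds distinct integers in [0, N-1]; set() removes repeated draws,
# so a list can be shorter than the number of draws but never longer than N // 2
assert all(len(lst) == len(set(lst)) for lst in P)
assert all(0 <= x < N for lst in P for x in lst)
assert all(1 <= len(lst) <= N // 2 for lst in P)
# The same seed reproduces the same instance
assert P == problem(N, seed=42)
```

Note that for $N < 5$ the lower bound `N // 5` is 0, so empty lists become possible.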

## Greedy solution
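At each step, picks the list with the highest fraction of not-yet-covered elements. With `seed=42` it does not find the optimal solution for N greater than 5, but the computation stays feasible in terms of both visited nodes and computation time.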

In [5]:
def set_cover(P, N):
    """Greedy cover: repeatedly pick the list with the largest share of new elements."""
    goal = set(range(N))
    len_P = len(P)
    n_nodes = 0
    P_sets = [set(x) for x in P]
    elements = set(e for s in P for e in s)
    if elements != goal:
        # Some number in [0, N-1] appears in no list: no cover exists
        return None

    covered = set()
    S = []
    while covered != elements:
        # Score = fraction of the list that is not covered yet (empty lists score 0)
        subset = max(P_sets, key=lambda s: len(s - covered) / len(s) if s else 0)
        n_nodes += len_P  # every list is evaluated once per iteration
        S.append(subset)
        covered |= subset
    w = sum(len(s) for s in S)
    logging.getLogger().setLevel(logging.INFO)
    logging.info(f" Optimized solution for N={N}: w={w} (bloat={int((w-N)*100/N)}%), nodes visited={n_nodes}")
    return S, n_nodes
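
A small hand-made instance shows why this rule can miss the optimum. The sketch below is a compact restatement of the selection rule (the name `greedy_cover` and the instance are illustrative, and the instance is assumed to be coverable, otherwise the loop would not terminate).

```python
def greedy_cover(P, N):
    """Compact restatement of the selection rule in `set_cover` (illustrative only)."""
    sets = [set(x) for x in P]
    covered, chosen = set(), []
    while covered != set(range(N)):
        # Highest fraction of not-yet-covered elements; ties go to the earliest list
        best = max(sets, key=lambda s: len(s - covered) / len(s))
        chosen.append(best)
        covered |= best
    return chosen


# Hand-made instance for N = 4
P = [[0, 1, 2], [0, 1], [2, 3], [1, 3]]
S = greedy_cover(P, 4)
print(S)                       # [{0, 1, 2}, {2, 3}]
print(sum(len(s) for s in S))  # 5, while [0, 1] + [2, 3] covers everything with 4
```

On the first iteration nothing is covered yet, so every list scores 1.0 and the first one in `P` is chosen regardless of its size.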

In [6]:
N=10
SEED=42
solution, n_nodes=set_cover(problem(N,SEED),N)
print(solution)
print(n_nodes)

INFO:root: Optimized solution for N=10: w=12 (bloat=20%), nodes visited=250


[{0, 4}, {1, 2, 3}, {9, 6}, {2, 5, 7}, {8, 3}]
250


In [8]:
SEED=42
for n in [5,10,20,100,500,1000]:
    set_cover(problem(n,SEED),n)

INFO:root: Optimized solution for N=5: w=5 (bloat=0%), nodes visited=125
INFO:root: Optimized solution for N=10: w=12 (bloat=20%), nodes visited=250
INFO:root: Optimized solution for N=20: w=30 (bloat=50%), nodes visited=204
INFO:root: Optimized solution for N=100: w=171 (bloat=71%), nodes visited=3416
INFO:root: Optimized solution for N=500: w=1256 (bloat=151%), nodes visited=21708
INFO:root: Optimized solution for N=1000: w=2913 (bloat=191%), nodes visited=47047
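
The `bloat` figure in the log is the relative excess of `w` over `N`. Since each of the $N$ numbers must appear at least once, no cover can have fewer than $N$ elements, so a bloat of 0% is the best possible value. The sketch below recomputes the percentages from the logged `w` values (the helper name `bloat` is illustrative).

```python
def bloat(w, N):
    # Same expression as in the logging call of `set_cover`
    return int((w - N) * 100 / N)


# (N, w) pairs copied from the log above
results = {5: 5, 10: 12, 20: 30, 100: 171, 500: 1256, 1000: 2913}
for N, w in results.items():
    print(N, w, f"{bloat(w, N)}%")
```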


## Combinations
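Finds the optimal solution for N=[5,10,20] with `seed=42`, searching only combinations of 2 up to `avg_subsets - 1` lists. For N greater than 20 it results in a memory allocation error, because all combinations of a given size are stored in a list before being checked.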

In [7]:
def combinations_search(P,N):
    """Check every combination of lists in P; store each one that covers the universe, then return the one with the fewest elements."""
    n_nodes=0
    solutions=dict()
    universe=set(range(0,N))
    avg_len=sum(len(el) for el in P)/len(P) #average length of the lists generated by problem
    avg_subsets=int(N//avg_len+N/5) #estimate that the optimal solution will be found considering a number of lists close to N/avg_len
    for i in range(2,avg_subsets):
        temp=list(itertools.combinations(P,i))
        temp=list(list(el) for el in temp)  
        n_nodes+=len(temp)
        for el in temp:
            current_elements=set(e for l in el for e in l)
            if current_elements==universe:
                solutions[sum(len(_) for _ in el)]=el
    len_sol=min(list(solutions.keys()))
    logging.getLogger().setLevel(logging.INFO)
    logging.info(f" Optimized solution for N={N}: w={len_sol}, nodes visited={n_nodes}")
    return solutions[len_sol], len_sol, n_nodes
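
One possible way to keep memory usage flat is to consume `itertools.combinations` as an iterator instead of converting it to a list. The sketch below is a hypothetical variant, not the function used for the results in this notebook: the name `lazy_combinations_search` and the `max_size` parameter are assumptions, and it also considers single lists. It reduces memory only; the number of combinations to visit still grows combinatorially.

```python
import itertools


def lazy_combinations_search(P, N, max_size):
    """Hypothetical variant: scan combinations one at a time, keeping only the best cover."""
    universe = set(range(N))
    best, best_w, n_nodes = None, None, 0
    for i in range(1, max_size + 1):
        # itertools.combinations is a lazy iterator: no list of all tuples is built
        for combo in itertools.combinations(P, i):
            n_nodes += 1
            w = sum(len(lst) for lst in combo)
            if best_w is not None and w >= best_w:
                continue  # cannot improve on the best cover found so far
            if set().union(*combo) == universe:
                best, best_w = list(combo), w
    return best, best_w, n_nodes


# Hand-made instance for N = 4
P = [[0, 1, 2], [0, 1], [2, 3], [1, 3]]
print(lazy_combinations_search(P, 4, max_size=3))
# ([[0, 1], [2, 3]], 4, 14)
```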

In [10]:
N=10
SEED=42
solution, len_sol, n_nodes=combinations_search(problem(N,SEED),N)
print(solution)
print(n_nodes)

INFO:root: Optimized solution for N=10: w=10, nodes visited=251125


[[1, 7], [8, 2], [4, 5, 6], [0, 9, 3]]
251125


In [6]:
SEED=42
for n in [5,10,20]:
    combinations_search(problem(n,SEED),n)

INFO:root: Optimized solution for N=5: w=5, nodes visited=2600
INFO:root: Optimized solution for N=10: w=10, nodes visited=251125
INFO:root: Optimized solution for N=20: w=23, nodes visited=1676081
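
The node counts are sums of binomial coefficients, since `combinations_search` visits every combination of each size it tries. As a sanity check, the sketch below assumes 25 lists with sizes 2 to 3 for N=5, and 50 lists with sizes 2 to 4 for N=10. These values are inferred from the logged counts, not printed by the notebook, and `count_combinations` is an illustrative helper.

```python
import math


def count_combinations(m, sizes):
    """Number of ways to choose i lists out of m, summed over the given sizes."""
    return sum(math.comb(m, i) for i in sizes)


print(count_combinations(25, range(2, 4)))  # 2600
print(count_combinations(50, range(2, 5)))  # 251125
```

Both totals match the logged `nodes visited` values, which illustrates how quickly the search space grows with the number of lists.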
