## Minimum Spanning Tree (MST)

Given an undirected, connected, weighted graph $G(V,E)$, we want to find an acyclic subset $T\subseteq E$ that connects all vertices.
$T$ is by definition a tree, called a spanning tree of $G$. If we also require the sum of the edge weights in $T$ to be minimum, $T$ is called a minimum spanning tree.
$$w(T) = \sum\limits_{(u,v)\in T} w(u,v) $$
To determine the minimum spanning tree two main algorithms exist:
- Prim's Algorithm
- Kruskal's Algorithm  
Both of these algorithms are greedy: at every step they make the locally best choice, and for the MST problem this is guaranteed to yield a globally optimal solution.

#### Given an undirected, weighted graph $G(V,E)$
- a cut $(S, V-S)$ partitions the vertices of $G$
- an edge $(u,v)$ crosses the cut if one endpoint is in $S$ and the other in $V-S$, e.g. $u\in S\,,\,v\in V-S$
- a cut respects a set of edges $A\subseteq E$ if no edge in $A$ crosses the cut
- an edge is called a light edge if it has the minimum weight among the edges that cross the cut
- if $A$ is acyclic, $G_A = (V,A)$ forms a forest  
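
The definitions above can be made concrete with a small sketch. The helper names `crosses`, `respects` and `light_edge`, and the toy edge list, are illustrative assumptions and not part of the notebook's graph class.

```python
from typing import Iterable, Optional, Set, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight)


def crosses(edge: Edge, S: Set[int]) -> bool:
    """True if exactly one endpoint of the edge lies in S."""
    u, v, _ = edge
    return (u in S) != (v in S)


def respects(A: Iterable[Edge], S: Set[int]) -> bool:
    """True if no edge of A crosses the cut (S, V - S)."""
    return not any(crosses(e, S) for e in A)


def light_edge(edges: Iterable[Edge], S: Set[int]) -> Optional[Edge]:
    """Return a minimum-weight edge crossing the cut, or None if none crosses."""
    crossing = [e for e in edges if crosses(e, S)]
    return min(crossing, key=lambda e: e[2]) if crossing else None


# Hypothetical toy graph on the vertices {0, 1, 2, 3}
edges = [(0, 1, 4), (1, 2, 8), (2, 3, 7), (0, 3, 5), (1, 3, 2)]
S = {0, 1}
A = [(0, 1, 4)]

print(respects(A, S))        # True: both endpoints of (0, 1) are in S
print(light_edge(edges, S))  # (1, 3, 2): the lightest edge leaving S
```

Here three edges cross the cut, with weights 8, 5 and 2, so the light edge is the one of weight 2.
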
#### Given the above we can prove the following theorem and corollary:
- Given $G(V,E)$ an undirected, weighted graph, let $A\subseteq E$ be a subset of edges in some minimum spanning tree $T$ and $(S, V-S)$ a cut that respects $A$. Then a light edge $(u,v)$ crossing the cut can be safely added to $A$, i.e. $A\cup \{(u,v)\}$ is a subset of some minimum spanning tree (not necessarily $T$ itself).
- Given $G(V,E)$ an undirected, weighted graph, let $A\subseteq E$ be a subset of edges in some minimum spanning tree $T$ and $C=(V_C,E_C)$ be a connected component in the forest $G_A = (V,A)$. If $(u,v)$ is a light edge connecting $C$ to some other connected component of $G_A$, then $(u,v)$ is safe for $A$.

### Prim's Algorithm
Given a connected, undirected, weighted graph $G(V,E)$ and a starting vertex $v_0$ that will be the root of the MST, Prim's algorithm operates as follows:
- Initialize all vertices as unvisited 
- Initialize the parent of all vertices to None
- Initialize the minimum weight of an edge connecting each vertex to some vertex in the MST to Infinity, i.e. no vertex is adjacent to the MST yet
- maintain a data structure (DS) holding all vertices that have not yet entered the MST
- starting at the root, set its minimum_weight to 0 and extract it from the DS, i.e. add it to the MST with parent None
- consider the neighbours of the node last added to the MST: if a neighbour is not visited, i.e. still in the DS, and its current minimum weight is larger than the weight of its edge to the last added node, update its minimum weight to that edge weight and its parent to the last added node
- extract from the DS the node with the current minimum weight and add it to the MST
- terminate when there are no more nodes in the DS, i.e. all nodes have been added to the MST with some parent and the corresponding weight.  
To extract, at every iteration, the node outside the MST with the smallest connecting weight, we can use an array ($\mathcal{O}(|V|)$ per extraction), a binary min-heap ($\mathcal{O}(\log(|V|))$ per extraction and per key decrease), or a Fibonacci heap, whose extraction is $\mathcal{O}(\log(|V|))$ amortized but whose key decrease is $\mathcal{O}(1)$ amortized, giving $\mathcal{O}(|E|+|V|\log(|V|))$ overall.  
#### Here is how the pseudocode works:  
- for $i,v\in \text{enumerate}(V)$:    $\quad \mathcal{O}(|V|)$
  - v.parent = None
  - v.visited = False
  - v.min_w = Infinity  
  - v.heap_pos = i  # also store the position of each node in the heap, so we can locate it in $\mathcal{O}(1)$
- root.min_w = 0
- min_heap = [v for $v\in V$], ordered by the min_w attribute
- while min_heap:   $\quad \mathcal{O}(|V|)$
  - last_node_added_mst = min_heap.extract_min()  $\quad \mathcal{O}(\log(|V|))$
  - last_node_added_mst.visited = True
  - for adj_node in AdjList(last_node_added_mst):  $\quad \mathcal{O}(|E|)$ in total
    - if adj_node.visited == False and adj_node.min_w > weight(last_node_added_mst, adj_node):  $\quad \mathcal{O}(1)$    (since we store the attributes)
      - adj_node.parent = last_node_added_mst
      - adj_node.min_w = weight(last_node_added_mst, adj_node)
      - min_heap.bubble_up(adj_node.heap_pos)  $\quad \mathcal{O}(\log(|V|))$

- The total time complexity is (using $|E|\ge |V|-1$ for a connected graph):
$$ \mathcal{O}(|V|\log(|V|)+|E|\log(|V|)) = \mathcal{O}(|E|\log(|V|)) $$
- the space complexity, beyond the graph itself, is $\mathcal{O}(|V|)$ since we keep a min_heap and the visited, parent and min_w attributes, all of size $|V|$
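
The pseudocode above assumes a heap with a decrease-key (`bubble_up`) operation, which Python's `heapq` does not provide. A minimal sketch of a common workaround, lazy deletion, is shown below: instead of updating a key, a new entry is pushed and stale entries are skipped when popped. The function name `prim_heap` and the dictionary-based adjacency list are assumptions made for this illustration; the heap can then hold up to $\mathcal{O}(|E|)$ entries, so the running time is $\mathcal{O}(|E|\log(|E|)) = \mathcal{O}(|E|\log(|V|))$.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Adjacency list: vertex -> list of (neighbour, weight)
Adj = Dict[int, List[Tuple[int, int]]]


def prim_heap(adj: Adj, root: int) -> Tuple[Dict[int, Optional[int]], int]:
    """Return (parent map, total weight) of an MST of a connected graph."""
    parent: Dict[int, Optional[int]] = {}
    visited = set()
    total = 0
    # Heap entries are (edge weight, vertex, candidate parent);
    # the root is pushed with itself as a placeholder parent.
    heap: List[Tuple[int, int, int]] = [(0, root, root)]
    while heap:
        w, u, p = heapq.heappop(heap)
        if u in visited:
            continue  # stale entry: u already joined the tree via a lighter edge
        visited.add(u)
        parent[u] = None if u == root else p
        total += w
        for v, w_uv in adj[u]:
            if v not in visited:
                heapq.heappush(heap, (w_uv, v, u))
    return parent, total


# Same 9-vertex example graph that is built with add_edge in this notebook
edges = [(0, 1, 4), (0, 7, 8), (1, 2, 8), (2, 3, 7), (2, 8, 2), (2, 5, 4),
         (3, 5, 14), (3, 4, 9), (4, 5, 10), (5, 6, 2), (6, 8, 6), (6, 7, 1),
         (7, 8, 7)]
adj: Adj = {v: [] for v in range(9)}
for u, v, w in edges:
    adj[u].append((v, w))
    adj[v].append((u, w))

parent, total = prim_heap(adj, 0)
print(total)  # 37
```

Because of ties between equal-weight edges, the parent map may differ from the one produced by another implementation, but the total weight of the MST is the same.
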

In [1]:
class GraphAdjList:
    
    def __init__(self, size: int, directed: bool = False, weighted: bool = False):
        self.size = size
        self.adj_list = [set() for _ in range(size)]
        self.vertex_data = [None]*self.size
        self.directed = directed
        self.weighted = weighted
        
    def add_edge(self, u:int, v:int, w = 1):
        if not self.weighted:
            w = 1
        # valid vertex indices are 0..size-1
        if 0 <= u < self.size and 0 <= v < self.size:
            self.adj_list[u].add((v, w))
            if not self.directed:
                self.adj_list[v].add((u, w))
                
    def add_vertex_data(self, vertex: int, data = None):
        if data is None:
            data = vertex

        if 0 <= vertex < self.size:
            self.vertex_data[vertex] = data
            
    def print_graph(self):
        for vertex, nbrhs in enumerate(self.adj_list):
            print(f"Vertex {self.vertex_data[vertex]} is connected to {','.join([str(self.vertex_data[v]) + '-'+str(w) for (v, w) in nbrhs])}") 

In [16]:
import heapq
arr = [[2, float('infinity'), 0], [1, float('infinity'), 0], [0, float('infinity'), None]]
heapq.heapify(arr)
arr

[[0, inf, None], [1, inf, 0], [2, inf, 0]]

In [2]:
g = GraphAdjList(9, weighted= True)
for i in range(9):
    g.add_vertex_data(i, str(i))

g.add_edge(0, 1, 4)
g.add_edge(0, 7, 8)
g.add_edge(1, 2, 8)
g.add_edge(2, 3, 7)
g.add_edge(2, 8, 2)
g.add_edge(2, 5, 4)
g.add_edge(3, 5, 14)
g.add_edge(3, 4, 9)
g.add_edge(4, 5, 10)
g.add_edge(5, 6, 2)
g.add_edge(6, 8, 6)
g.add_edge(6, 7, 1)
g.add_edge(7, 8, 7)
g.print_graph()

Vertex 0 is connected to 7-8,1-4
Vertex 1 is connected to 0-4,2-8
Vertex 2 is connected to 3-7,5-4,1-8,8-2
Vertex 3 is connected to 5-14,4-9,2-7
Vertex 4 is connected to 5-10,3-9
Vertex 5 is connected to 4-10,2-4,3-14,6-2
Vertex 6 is connected to 7-1,8-6,5-2
Vertex 7 is connected to 8-7,6-1,0-8
Vertex 8 is connected to 6-6,7-7,2-2


In [7]:
def prim_alg(g: GraphAdjList, root: int):
    visited = [False]*g.size
    parent = [None]*g.size
    min_weights = [float('inf')]*g.size
    min_weights[root] = 0
    for _ in range(g.size):
        # get the node with the min_weight that is not on the MST, i.e. not visited yet
        node_added = min([node for node in range(g.size) if not visited[node]], key= lambda x: min_weights[x])
        visited[node_added] = True
        for (adj_node, w) in g.adj_list[node_added]:
            if not visited[adj_node] and min_weights[adj_node] > w:
                parent[adj_node] = node_added
                min_weights[adj_node] = w
    
    for i, p in enumerate(parent):
        print(f'the parent of node {i} is {p} in the MST')
    print(f'Total weight of the MST is {sum(min_weights)}')

In [8]:
prim_alg(g, 0)

the parent of node 0 is None in the MST
the parent of node 1 is 0 in the MST
the parent of node 2 is 1 in the MST
the parent of node 3 is 2 in the MST
the parent of node 4 is 3 in the MST
the parent of node 5 is 2 in the MST
the parent of node 6 is 5 in the MST
the parent of node 7 is 6 in the MST
the parent of node 8 is 2 in the MST
Total weight of the MST is 37


### Kruskal's Algorithm
Kruskal's algorithm takes an undirected, weighted graph $G(V,E)$, which, in contrast with Prim's algorithm, may be disconnected (in which case the result is a minimum spanning forest), and determines an MST using a disjoint-sets data structure.

#### A disjoint-sets data structure

- A forest of trees $\mathcal{F} = \{T_i, i=1,\dots,k\le n \}$ in which each of the $n$ elements $n_i$ belongs to exactly one tree
- all elements in a tree are represented by the same element, the root, i.e.  $\forall n_j\in T_i, \text{repr}(n_j) = r_i = \text{root}(T_i)$
- the rank of a node is the height of that node in its tree; the rank of the tree is the height of its root
- we can join two elements belonging to two different trees by attaching the root with the smaller rank to the other root
- if the ranks of the two trees are the same, the rank of the resulting root increases by one 
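
The properties above can be sketched as a small standalone class. The class name `DisjointSet` is an assumption for this illustration, and the path compression step in `find` is an addition that the list above does not mention: it re-points every node visited during a lookup directly at the root, which keeps later lookups short.

```python
class DisjointSet:
    """Union-find with union by rank and path compression."""

    def __init__(self, n: int):
        self.parent = list(range(n))  # every element starts as its own root
        self.rank = [0] * n           # upper bound on the height of each tree

    def find(self, x: int) -> int:
        # First pass: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x: int, y: int) -> bool:
        """Merge the sets of x and y; return False if they were already merged."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx  # make rx the root with the larger rank
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1  # equal ranks: the merged tree grows by one
        return True


ds = DisjointSet(5)
ds.union(0, 1)
ds.union(3, 4)
print(ds.find(0) == ds.find(1))  # True
print(ds.find(1) == ds.find(3))  # False
ds.union(1, 4)
print(ds.find(0) == ds.find(3))  # True
```

Returning a boolean from `union` is a design choice for this sketch: it lets a caller such as Kruskal's loop test for a cycle and merge in a single call.
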

#### Kruskal pseudocode
- A = set()

- for $v\in V$: $\quad \mathcal{O}(|V|)$
  - MakeSet(v)
- sort the edges in ascending order of their weights $\quad \mathcal{O}(|E|\log(|E|))$
- for $(u,v)\in E$: $\quad \mathcal{O}(|E|)$
  - if FindSet($u$) != FindSet($v$): $\quad \mathcal{O}(\log(|V|))$
    - A.add($(u,v)$) $\quad \mathcal{O}(1)$
    - Union($u,v$) $\quad \mathcal{O}(1)$ once the two roots are known
- return A
- the total time complexity is dominated by the sort; since $|E|\le |V|^2$ we have $\log(|E|)\le 2\log(|V|)$, so
$$ \mathcal{O}(|E|\log(|E|)) = \mathcal{O}(|E|\log(|V|)) $$

In [4]:
class GraphAdjList:
    
    def __init__(self, size: int, directed: bool = False, weighted: bool = False):
        self.size = size
        self.adj_list = [set() for _ in range(size)]
        self.vertex_data = [None]*self.size
        self.directed = directed
        self.weighted = weighted
        
    def add_edge(self, u:int, v:int, w = 1):
        if not self.weighted:
            w = 1
        # valid vertex indices are 0..size-1
        if 0 <= u < self.size and 0 <= v < self.size:
            self.adj_list[u].add((v, w))
            if not self.directed:
                self.adj_list[v].add((u, w))
                
    def add_vertex_data(self, vertex: int, data = None):
        if data is None:
            data = vertex

        if 0 <= vertex < self.size:
            self.vertex_data[vertex] = data
    
    
    def find_root(self, parents: list, i: int):
        '''finds and returns the representative of the node i
        '''
        
        if parents[i] == i:
            return i
        
        i = parents[i]
        
        return self.find_root(parents, i)
        
    
    def union(self, parents: list, rank: list, i: int, j: int):
        '''joins the tree of node i with the tree of node j
        '''
        
        # find the root of each node
        root_i = self.find_root(parents, i)
        root_j = self.find_root(parents, j)
        
        # if they already belong to the same tree there is nothing to do
        if root_i == root_j:
            return 
        
        # attach the root of smaller rank under the root of larger rank;
        # the rank of the joined tree increases by one only if both ranks are equal
        if rank[root_i]>rank[root_j]:
            parents[root_j] = root_i
        elif rank[root_i] == rank[root_j]:
            parents[root_j] = root_i
            rank[root_i]+=1
        else:
            parents[root_i] = root_j
            
    
    def kruskal(self):
        
        # store the edges of the MST
        A = []
        
        # initialize each node as its own disjoint set, i.e. each node is its own parent
        # initialize the rank of each node to 1
        ranks = [1]*self.size
        parents = [i for i in range(self.size)]
        
        # collect the edges as (u, v, w) triples; in an undirected graph each edge
        # appears twice, and the second copy is rejected by the cycle check below
        edges = []
        for vertex in range(self.size):
            for (adj_vertex, w) in self.adj_list[vertex]:
                edges.append((vertex, adj_vertex, w))
                
        # sort the edges in ascending order of their weights
        edges = sorted(edges, key= lambda edge: edge[2])
        
        for (u, v, w) in edges:
            if self.find_root(parents, u)!=self.find_root(parents, v):
                A.append((u,v,w))
                self.union(parents, ranks, u,v)
                
        return A    
    
    def print_graph(self):
        for vertex, nbrhs in enumerate(self.adj_list):
            print(f"Vertex {self.vertex_data[vertex]} is connected to {','.join([str(self.vertex_data[v]) + '-'+str(w) for (v, w) in nbrhs])}") 

In [5]:
g = GraphAdjList(9, weighted= True)
for i in range(9):
    g.add_vertex_data(i, str(i))

g.add_edge(0, 1, 4)
g.add_edge(0, 7, 8)
g.add_edge(1, 2, 8)
g.add_edge(2, 3, 7)
g.add_edge(2, 8, 2)
g.add_edge(2, 5, 4)
g.add_edge(3, 5, 14)
g.add_edge(3, 4, 9)
g.add_edge(4, 5, 10)
g.add_edge(5, 6, 2)
g.add_edge(6, 8, 6)
g.add_edge(6, 7, 1)
g.add_edge(7, 8, 7)
g.print_graph()

Vertex 0 is connected to 7-8,1-4
Vertex 1 is connected to 0-4,2-8
Vertex 2 is connected to 3-7,5-4,1-8,8-2
Vertex 3 is connected to 5-14,4-9,2-7
Vertex 4 is connected to 5-10,3-9
Vertex 5 is connected to 4-10,2-4,3-14,6-2
Vertex 6 is connected to 7-1,8-6,5-2
Vertex 7 is connected to 8-7,6-1,0-8
Vertex 8 is connected to 6-6,7-7,2-2


In [11]:
A = g.kruskal()
print(f'Kruskal Algorithm outputs the mst \n{A}')
print(f'The sum of the weights in the MST using Kruskal algorithm is {sum([w for (_,_,w) in A])}')

Kruskal Algorithm outputs the mst 
[(6, 7, 1), (2, 8, 2), (5, 6, 2), (0, 1, 4), (2, 5, 4), (2, 3, 7), (0, 7, 8), (3, 4, 9)]
The sum of the weights in the MST using Kruskal algorithm is 37
