**CS560 - Algorithms and Their Analysis**
<br>

Title: **Lecture 9**
<br>
Speaker: **Dr. Shota Tsiskaridze**
<br>

Bibliography:
<br> 
 **Chapter 16.4**. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, *Introduction to Algorithms, 3rd Edition*, MIT Press, 2009
 


<h1 align="center">Matroids</h1>

- The word **matroid** is due to **Hassler Whitney**. 


- He was studying **matric matroids**, in which the **elements** of $S$ are the **rows** of a given matrix and a **set of rows** is **independent** if they are **linearly independent** in the usual sense.



- More generally, a **matroid** can be defined as an ordered pair $M = (S, I)$ satisfying the following conditions:


1. $S$ is a **finite set**.


2. $I$ is a **nonempty** family of subsets of $S$, called the **independent** subsets of $S$, such that:
  
   If $B \in I$ and $A \subseteq B$, then $A \in I$. 
   
   We say that $I$ is **hereditary** if it satisfies this property. 
   
   Note that the set $\varnothing$  is necessarily a member of $I$.


3. If $A \in I$, $B \in I$, and $|A| < |B|$, then there exists some element $x \in (B-A)$ such that $(A \cup \{x\}) \in I$.

   Here $|A| < |B|$ means that $A$ has fewer elements than $B$.
   
   We say that $M$ satisfies the **exchange property**.
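
For small examples, the three conditions above can be checked by brute force. The following is a minimal sketch, not part of the lecture's own code: the helper name `is_matroid` and both sample families are illustrative assumptions, and the check is exponential in the size of the sets, so it is only meant for toy inputs.

```python
from itertools import combinations

def is_matroid(S, I):
    """Brute-force check of the matroid conditions for a finite ground set S
    and a family I of subsets of S (hypothetical helper, small inputs only)."""
    S = frozenset(S)
    family = {frozenset(A) for A in I}
    # Condition 2, first part: I is a nonempty family of subsets of S.
    if not family or any(not A <= S for A in family):
        return False
    # Condition 2, hereditary part: every subset of an independent set is independent.
    for B in family:
        for r in range(len(B) + 1):
            for A in combinations(B, r):
                if frozenset(A) not in family:
                    return False
    # Condition 3 (exchange): if |A| < |B|, some x in B - A keeps A ∪ {x} independent.
    for A in family:
        for B in family:
            if len(A) < len(B) and not any(A | {x} in family for x in B - A):
                return False
    return True

# All subsets of {1, 2, 3} with at most 2 elements: a matroid.
S = {1, 2, 3}
uniform = [set(c) for r in range(3) for c in combinations(S, r)]
print(is_matroid(S, uniform))        # True

# Hereditary, but the exchange property fails for A = {3}, B = {1, 2}.
not_matroid = [set(), {1}, {2}, {3}, {1, 2}]
print(is_matroid(S, not_matroid))    # False
```

The second family shows that the hereditary property alone is not enough: neither $\{1, 3\}$ nor $\{2, 3\}$ is in the family, so $\{3\}$ cannot be extended from $\{1, 2\}$.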

<h3 align="center">Matrix Example</h3>

- Let $M$ be an $m \times n$ **matrix** over some **field** $\mathcal{F}$.
  
  Then the **ordered pair** $(S, I)$ is a **matroid**, where $S$ is the **set of columns** of $M$ and $A \in I$ **if and only if** the **columns in $A$ are linearly independent**.
  
  In other words, suppose $M = [c_1, c_2, c_3, ..., c_n]$, where $c_i$ denotes the $i$-th **column** of the matrix $M$.
  
  Then $A = \{c_{i_1}, c_{i_2}, ..., c_{i_k}\} \in I$ if the equality $\alpha_1 c_{i_1} + \alpha_2 c_{i_2} + \cdots + \alpha_k c_{i_k} = 0$ with $\alpha_1, \alpha_2, ..., \alpha_k \in \mathcal{F}$ implies that $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$.


- Let's **prove** it:


1. $S$ consists of the $n$ columns $\{c_1, c_2, c_3, ..., c_n\}$, so it is a **finite set**.
  

2. $I$ is **nonempty**, since the empty set of columns is linearly independent. To show that $I$ is **hereditary**, suppose $B = \{c_{b_1}, c_{b_2}, ..., c_{b_l}\} \in I$ and $A = \{c_{a_1}, c_{a_2}, ..., c_{a_k} \}\subseteq B$.
 
   Without loss of generality, we can assume that $c_{a_i} = c_{b_i}$ for $i = 1, 2, ..., k$.
   
   Now, assume that the columns in $A$ are **linearly dependent**. This means that there exist $\alpha_1, \alpha_2, ..., \alpha_k$, not all zero, such that $\sum_{i=1}^k \alpha_i c_{a_i} = 0$.
   
   But then, setting $\beta_1 = \alpha_1, \beta_2 = \alpha_2, ..., \beta_k = \alpha_k, \beta_{k+1} = 0, ..., \beta_{l} = 0$, we get $\sum_{i=1}^l \beta_i c_{b_i} = \sum_{i = 1}^k \alpha_i c_{a_i} + \sum_{i = k+1}^l 0 \cdot c_{b_i} = 0$ with not all $\beta_i = 0$, thus $c_{b_1}, c_{b_2}, ..., c_{b_l}$ are **not linearly independent**. We have a contradiction.
   
   Thus $c_{a_1}, c_{a_2}, ..., c_{a_k}$ are linearly independent and $A \in I$.
   
   
3. Assume that $A = \{c_{a_1}, c_{a_2}, ..., c_{a_k} \}\in I$, $B = \{c_{b_1}, c_{b_2}, ..., c_{b_l}\} \in I$, and $|A| < |B|$, i.e. $k < l$.

   We need to show that we can find a column $c \in B$ such that $c \notin A$ and $(A \cup \{c\}) \in I$, i.e. $c_{a_1}, c_{a_2}, ..., c_{a_k}, c$ are **linearly independent**.
  
   Assume the opposite, i.e. for every $c \in B$ such that $c \notin A$ the columns $c_{a_1}, c_{a_2}, ..., c_{a_k}, c$ are **linearly dependent**.
   
   Take such a $c$ and a nontrivial relation $\alpha_1 c_{a_1} + \cdots + \alpha_k c_{a_k} + \gamma c = 0$. If $\gamma = 0$, this would be a nontrivial relation among the columns of $A$, which is impossible because $A \in I$. Hence $\gamma \neq 0$, and $c$ is a **linear combination** of $c_{a_1}, c_{a_2}, ..., c_{a_k}$.
   
   Every column of $B$ that also belongs to $A$ is trivially such a combination, so **all** columns $c_{b_1}, c_{b_2}, ..., c_{b_l}$ lie in the space spanned by $c_{a_1}, c_{a_2}, ..., c_{a_k}$.
   
   By the exchange argument of linear algebra, we can now replace the columns of $A$ one at a time by $c_{b_1}, c_{b_2}, ..., c_{b_k}$ without changing the spanned space.
   
   Since the number of columns in $B$ is larger than the number of columns in $A$, after replacing all $c_{a_i}$ the columns $c_{b_{k+1}}, ..., c_{b_l}$ of $B$ are still left over.
   
   In particular, $c_{b_{k+1}}$ is a linear combination of $c_{b_1}, c_{b_2}, ..., c_{b_k}$, so the columns $c_{b_1}, c_{b_2}, ..., c_{b_{k+1}}$ are linearly dependent, which contradicts the assumption that $B$ is linearly independent. Q.E.D.
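
The independence test and the exchange property can also be tried out numerically. The sketch below is an illustration under stated assumptions: the field is taken to be the rationals, the helpers `rank` and `is_independent` are hypothetical names, and the three sample columns are made up for the example.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    rank_found = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Look for a row at or below position rank_found with a nonzero entry in this column.
        pivot = next((r for r in range(rank_found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank_found], rows[pivot] = rows[pivot], rows[rank_found]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank_found + 1, len(rows)):
            factor = rows[r][col] / rows[rank_found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank_found])]
        rank_found += 1
    return rank_found

def is_independent(columns):
    """A set of columns is linearly independent iff its rank equals its size."""
    return rank(columns) == len(columns)

# Columns of a 2 x 3 matrix over the rationals (illustrative data).
c1, c2, c3 = (1, 0), (0, 1), (1, 1)
print(is_independent([c1, c2]))        # True
print(is_independent([c1, c2, c3]))    # False: three vectors in a 2-dimensional space

# Exchange property with A = {c3}, B = {c1, c2}: list the columns of B that extend A.
A, B = [c3], [c1, c2]
extensions = [c for c in B if c not in A and is_independent(A + [c])]
print(extensions)                      # [(1, 0), (0, 1)]
```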
   

   

<h3 align="center">Graph Example</h3>

- Let's consider the **graph** $G = (V, E)$, which is an ordered pair of **vertices** ($V$) and **edges** ($E$).


- Let's define the **ordered pair** $M_G = (S_G, I_G)$ as follows:
  
  - $S_G = E$, i.e. the set $S_G$ is the **set of all edges**.
  
  - $I_G$ is a family of subsets of $E$: $A \in I_G$ if and only if $A \subseteq E$ and $A$ is **acyclic**, i.e. $A \in I_G$ if and only if the subgraph $G_A = (V, A)$ forms a **forest**.
  
  A forest is an **undirected**, **acyclic graph** that is not necessarily connected. Each connected component of a **forest** is a **tree**, i.e. a **disjoint** collection of **trees** is known as a **forest**.
  
  <img src="images/L9_Forest.png" width="400" alt="Example" />


- **Theorem**:  If $G = (V,E)$ is an **undirected graph**, then $M_G = (S_G, I_G)$ is a **matroid**.

  To prove this theorem, we will need the following **lemma**:
  
  
- **Lemma**:  If $F = (V_F,E_F)$ is a **forest**, then it contains exactly $|V_F| - |E_F|$ **trees**.
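
The independence test of the graphic matroid and the lemma can be illustrated with a small disjoint-set sketch. The function name `count_trees_if_forest` and the sample edge lists are assumptions made for this example; vertices are assumed to be labelled $0, 1, ..., |V| - 1$.

```python
def count_trees_if_forest(num_vertices, edges):
    """Return the number of trees if (V, edges) is a forest, otherwise None.
    Vertices are assumed to be labelled 0 .. num_vertices - 1."""
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent pointers to the representative, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_vertices          # with no edges, every vertex is its own tree
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return None                # u and v are already connected: the edge closes a cycle
        parent[ru] = rv
        components -= 1                # each acyclic edge merges two trees into one
    return components

forest_edges = [(0, 1), (1, 2), (3, 4)]
print(count_trees_if_forest(5, forest_edges))              # 2, which equals |V| - |E| = 5 - 3
print(count_trees_if_forest(5, forest_edges + [(0, 2)]))   # None: the edge (0, 2) creates a cycle
```

Each accepted edge reduces the number of trees by exactly one, which is the counting idea behind the lemma.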

<h3 align="center">The Maximal Independent Subsets</h3>

- Given a matroid $M = (S, I)$, we call an element $x \notin A$ an **extension** of $A \in I$ if we **can add** $x$ to $A$ while **preserving independence**, i.e. if $(A \cup \{x\}) \in I$.


- As an example, let's consider a **graphic matroid** $M_G$. 

  If $A$ is an **independent** set of edges, then edge $e$ is an **extension** of $A$ **if and only if** $e \notin A$ and the addition of $e$ to $A$ does **not create a cycle**.
  
  
- If $A$ is an **independent** subset in a matroid $M$, we say that $A$ is **maximal** if it has **no extensions**. 

  That is, $A$ is **maximal** if it is **not contained** in any **larger independent** subset of $M$.
  
  
- **Theorem:** All **maximal independent subsets** in a matroid have the **same size**.
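
The theorem can be observed on a small graphic matroid by enumerating every independent set. This is a brute-force sketch for illustration only: the graph (a 4-cycle with one chord) and the helper `is_acyclic` are assumptions of the example, not taken from the lecture.

```python
from itertools import combinations

def is_acyclic(num_vertices, edges):
    """True iff the given edges form a forest on vertices 0 .. num_vertices - 1."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

# Illustrative graph: the 4-cycle 0-1-2-3 together with the chord (0, 2).
V = 4
E = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

# All independent sets of the graphic matroid: the acyclic subsets of E.
independent = [set(A) for r in range(len(E) + 1)
               for A in combinations(E, r) if is_acyclic(V, A)]

# A is maximal if it has no extension, i.e. no edge outside A can be added.
maximal = [A for A in independent
           if not any(is_acyclic(V, A | {e}) for e in E if e not in A)]

sizes = {len(A) for A in maximal}
print(len(maximal), sizes)   # 8 {3}
```

All 8 maximal independent sets have $|V| - 1 = 3$ edges; they are exactly the spanning trees of this graph.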

<h3 align="center">A Weighted Matroid</h3>

- We say that a **matroid** $M = (S, I)$ is **weighted** if it is associated with a **weight function** $w$ that assigns a **strictly positive weight** $w(x)$ to each element $x \in S$. 


- The **weight function** $w$ extends to subsets of $S$ by summation:

  $$w(A) = \sum_{x \in A} w(x)$$
  
  for any $A \subseteq S$.
  
  
- Many problems for which a **greedy approach provides optimal solutions** can be formulated in terms of **finding a maximum-weight independent subset** in a **weighted matroid**:

  Given a **weighted matroid** $M = (S, I)$, we wish to **find** an **independent set** $A \in I$ such that $w(A)$ is **maximized**. 
  
  We call such a subset that is **independent** and has **maximum possible weight** an **optimal subset** of the matroid. 
  
  
- Because the **weight** $w(x)$ of **any element** $x \in S$ is **positive**, an **optimal subset** is always a **maximal independent subset**.

<h3 align="center">The Minimum-Spanning-Tree Problem</h3>

- Given a **connected undirected graph** $G = (V, E)$ and a **length function** $w$ such that $w(e)$ is the (**positive**) **length of edge** $e$.


- **Find** a **subset** of the edges that **connects all of the vertices** together and has **minimum total length**.

  <img src="images/L9_Minimum_Spanning_Tree.png" width="600" alt="Example" />


- To **solve** this problem, we can **view** it as a problem of **finding an optimal subset of a matroid**:

  Instead of considering $M_G$ with weights $w(e)$, we consider the weights $w'(e) = w_{sup} - w(e)$, where $w_{sup} = \max_{e \in E} {w(e)} +1$.
  
  In this weighted matroid, **all weights** are **positive** and an **optimal subset** is a **spanning tree of minimum total length** in the original graph.
  
  More specifically, each **maximal independent subset** $A$ corresponds to a **spanning tree** with $|V|-1$ edges, and since:
  
  $$w'(A) = \sum_{e \in A} w'(e) = \sum_{e\in A}(w_{sup} - w(e)) = (|V| - 1) w_{sup} - \sum_{e \in A} w(e) = (|V|-1) w_{sup} - w(A)$$
  
  for **any maximal independent subset** $A$, an **independent subset** that **maximizes the quantity** $w'(A)$ must **minimize** $w(A)$.
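
The weight transformation above can be checked on a concrete graph. The sketch below uses the edge list that appears in the Python cell later in this lecture; the function name `greedy_mst` and the disjoint-set independence test are assumptions of this example rather than the lecture's implementation.

```python
def greedy_mst(num_vertices, weighted_edges):
    """Matroid greedy on the graphic matroid with weights w'(e) = w_sup - w(e).
    weighted_edges is a list of (u, v, w) triples; vertices are 0 .. num_vertices - 1."""
    w_sup = max(w for _, _, w in weighted_edges) + 1
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    A = []
    # Decreasing order of w'(e), which is the same as increasing order of w(e).
    for u, v, w in sorted(weighted_edges, key=lambda e: w_sup - e[2], reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:                        # A ∪ {e} stays acyclic, so e is an extension of A
            parent[ru] = rv
            A.append((u, v, w))
    return A, w_sup

edges = [(0, 1, 4), (0, 7, 8), (1, 2, 8), (1, 7, 11), (2, 3, 7), (2, 8, 2), (2, 5, 4),
         (3, 4, 9), (3, 5, 14), (4, 5, 10), (5, 6, 2), (6, 7, 1), (6, 8, 6), (7, 8, 7)]

A, w_sup = greedy_mst(9, edges)
w_A = sum(w for _, _, w in A)
w_prime_A = sum(w_sup - w for _, _, w in A)
print(len(A), w_A, w_prime_A)   # 8 37 83
```

Here $w_{sup} = 15$, and the output agrees with the identity $w'(A) = (|V| - 1) w_{sup} - w(A) = 8 \cdot 15 - 37 = 83$.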
  

The *pseudocode* for the **greedy algorithm** looks as follows:

In [None]:
def greedy(M, w):
    A = set()
    # Assumes M.S is the ground set, M.I(X) tests whether X is independent, and w(x) is the weight of x
    for x in sorted(M.S, key=w, reverse=True):  # taken in monotonically decreasing order by weight w(x)
        if M.I(A | {x}):
            A = A | {x}
    return A

- The **running time** of `greedy` is easy to analyze. 

  Let $n$ denote $|S|$. 
  
  The **sorting phase** of `greedy` takes time $O(n \lg n)$. 
  
  **Line 5** executes exactly $n$ **times**, once for each element of $S$. 
  
  Each execution of **line 5** requires a check on whether or not the set $A \cup \{x\}$ is **independent**. 
  
  If each such check takes time $O(f(n))$ the entire algorithm runs in time $O(n \lg n + n f(n) )$.
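
As a worked instance of this bound, consider the graphic matroid $M_G$, where $n = |E|$. Assuming (this is not stated in the lecture) that the independence check is done with a disjoint-set forest using union by rank and path compression, each check costs $O(\alpha(|V|))$ amortized time, where $\alpha$ is the inverse Ackermann function, so

$$O(n \lg n + n f(n)) = O(|E| \lg |E| + |E| \, \alpha(|V|)) = O(|E| \lg |E|),$$

i.e. the sorting phase dominates the running time.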

- Let's consider the following **graph** and write **Python code** for a **greedy algorithm**:

  <img src="images/L9_Graph.jpg" width="600" alt="Example" />


In [2]:
class Graph: 
    def __init__(self, v): 
  
        # number of vertices 
        self.v = v 
        self.adj = [0] * v 
        self.edges = [] 
        for i in range(v): 
            self.adj[i] = [] 
  
    # Function to add an undirected edge (u, v) with weight w to the graph 
    def addEdge(self, u: int, v: int, w: int): 
        self.adj[u].append(v) # Add v to u's list. 
        self.adj[v].append(u) # Add u to v's list. 
        self.edges.append((w, (u, v))) 
  
    def dfs(self, v: int, visited: list): 
  
        # Mark the current node as visited 
        visited[v] = True
  
        # Recur for all the vertices adjacent to this vertex 
        for i in self.adj[v]: 
            if not visited[i]: 
                self.dfs(i, visited) 
  

    # Returns true if given graph is connected, else false 
    def connected(self): 
        visited = [False] * self.v 
  
        # Find all reachable vertices from first vertex 
        self.dfs(0, visited) 
  
        # If set of reachable vertices includes all, return true. 
        for i in range(1, self.v): 
            if not visited[i]: 
                return False
  
        return True
  
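    # Our greedy algorithm (a reverse-delete variant: scan the edges from heaviest
    # to lightest and drop every edge whose removal keeps the graph connected)
    def greedy(self): 
  
        # Sort edges into increasing order by weight w(x); the loop below walks
        # the list backwards, i.e. in monotonically decreasing order by weight
        self.edges.sort(key=lambda a: a[0]) 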
  
        mst_wt = 0 # Initialize weight of MST 
  
        print("Edges in MST") 
  
        # Iterate through all sorted edges in decreasing order by weight w(x)
        for i in range(len(self.edges) - 1, -1, -1): 
            u = self.edges[i][1][0] 
            v = self.edges[i][1][1] 
  
            # Tentatively remove the edge from the undirected graph 
            self.adj[u].remove(v) 
            self.adj[v].remove(u) 
  
            # Add the edge back if removing it disconnects the graph. 
            # In this case the edge becomes part of the MST. 
            if not self.connected(): 
                self.adj[u].append(v) 
                self.adj[v].append(u) 
  
                # This edge is part of MST 
                print("( %d, %d )" % (u, v)) 
                mst_wt += self.edges[i][0] 
        print("Total weight of MST is", mst_wt)

if __name__ == "__main__": 
  
    # Create the graph given in the figure above 
    V = 9
    g = Graph(V) 
  
    # making above shown graph 
    g.addEdge(0, 1, 4) 
    g.addEdge(0, 7, 8) 
    g.addEdge(1, 2, 8) 
    g.addEdge(1, 7, 11) 
    g.addEdge(2, 3, 7) 
    g.addEdge(2, 8, 2) 
    g.addEdge(2, 5, 4) 
    g.addEdge(3, 4, 9) 
    g.addEdge(3, 5, 14) 
    g.addEdge(4, 5, 10) 
    g.addEdge(5, 6, 2) 
    g.addEdge(6, 7, 1) 
    g.addEdge(6, 8, 6) 
    g.addEdge(7, 8, 7) 
  
    g.greedy() 

Edges in MST
( 3, 4 )
( 0, 7 )
( 2, 3 )
( 2, 5 )
( 0, 1 )
( 5, 6 )
( 2, 8 )
( 6, 7 )
Total weight of MST is 37


- Let's consider the following **graph** and write the **Python code** for the **greedy algorithm**:

  <img src="images/L9_Graph2.jpg" width="600" alt="Example" />


<h1 align="center">End of Lecture</h1>