Informal Goal: Connect a bunch of points together as cheaply as possible.
 - EX: connecting servers in a network, or something more abstract
 - Applications: Clustering (more later)
     - Networking
 - Very Fast Greedy Algorithms
     - Prim's Algorithm (1957, similar to Dijkstra's), aka Jarník's (discovered by Jarník in 1930, 27 years earlier); Use Heaps
     - Kruskal's Algorithm (1956); Union-Find data structure
     - O(mlogn); m = edges, n = vertices w/proper data structures

Minimum Spanning Tree (MST) Problem:
 - Input: Undirected Graph G = (V,E); (optimal branching is for directed graphs)
     - Assume graph rep as adjacency list
     - Cost Ce for each edge e; OK if costs are negative
 - Output: Minimum Cost Tree that spans all vertices.
     - Cost of tree i.e. sum of edge costs. 
     - Tree means no cycles. Spans all vertices means subgraph (V,T) is connected, i.e. there is a path between each pair of vertices. From any v, can get to any other vertex u in V. 
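
To make the problem statement concrete, here is a minimal sketch of the input representation and of the two things the output has to satisfy. The function names (`build_adjacency`, `tree_cost`, `is_spanning_tree`) and the `(u, v, cost)` edge format are assumptions for illustration, not something fixed by the notes.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Adjacency list for an undirected graph given (u, v, cost) triples."""
    adj = defaultdict(list)
    for u, v, cost in edges:
        adj[u].append((v, cost))
        adj[v].append((u, cost))
    return adj

def tree_cost(tree_edges):
    """Cost of a tree = sum of its edge costs (negative costs are fine)."""
    return sum(cost for _, _, cost in tree_edges)

def is_spanning_tree(vertices, tree_edges):
    """True if tree_edges connects all vertices and contains no cycle.

    Assumes vertices is non-empty.
    """
    vertices = set(vertices)
    # A connected graph on n vertices is acyclic iff it has exactly n - 1 edges.
    if len(tree_edges) != len(vertices) - 1:
        return False
    adj = build_adjacency(tree_edges)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:  # DFS over the candidate tree's edges only
        u = stack.pop()
        for v, _ in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices
```

For example, with vertices `"abcd"`, the edges `a-b` (1), `b-c` (2), `c-d` (-4) form a spanning tree of cost -1, while `a-b`, `b-c`, `a-c` do not (they contain a cycle and miss `d`).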
     
Assumptions (mostly just to make things easier, nothing too critical):
 - Input graph G is connected (can run DFS or BFS to preprocess and check)
    - Else no spanning trees; can be modified to find MSTs for each connected component if G not completely connected (Minimum Spanning Forest)
 - Edge costs are distinct:
     - Prim's and Kruskal's remain correct with ties (which can be broken arbitrarily)
     - Correctness proof a bit more annoying if accounting for ties
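
The connectivity preprocessing mentioned above can be sketched with a BFS. This is only an illustration: `connected_components` and `is_connected` are hypothetical names, and the sketch assumes `adj` has every vertex as a key (isolated vertices map to an empty list) with `(neighbor, cost)` pairs as values.

```python
from collections import deque

def connected_components(adj):
    """Return a list of vertex sets, one per connected component."""
    seen = set()
    components = []
    for s in adj:
        if s in seen:
            continue
        comp = {s}
        seen.add(s)
        queue = deque([s])
        while queue:  # BFS from s collects everything reachable from s
            u = queue.popleft()
            for v, _ in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        components.append(comp)
    return components

def is_connected(adj):
    """True if the graph has exactly one connected component."""
    return len(connected_components(adj)) == 1
```

If `is_connected` is false, one option is to run the MST algorithm once per component returned by `connected_components`, which gives a minimum spanning forest.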

# Prim's Algorithm

Grow tree one edge at a time, suck up new vertex with each iteration (similar to Dijkstra's). Pick arbitrary starting vertex. With each iteration, select cheapest edge to span an additional vertex. Does not look at edges that do not connect to any new vertices. 

PrimMST:
 - Initialize X = [s], s is starting vertex chosen arbitrarily
 - T = Null [empty tree; invariant: X = vertices spanned by tree-so-far T]
 - While X != V:
     - Let e = (u,v) be the cheapest edge of G with u in X, v not in X. 
     - Add e to T
     - Add v to X. These 3 steps are increasing # of spanned vertices
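
A direct, unoptimized translation of the pseudocode above might look like the following sketch. The name `prim_mst_naive` and the `(u, v, cost)` edge triples are assumptions for illustration; the graph is assumed connected and undirected.

```python
def prim_mst_naive(vertices, edges, start=None):
    """Straightforward Prim's algorithm; returns the list of tree edges."""
    vertices = set(vertices)
    s = start if start is not None else next(iter(vertices))
    X = {s}   # vertices spanned so far
    T = []    # tree edges chosen so far
    while X != vertices:
        # Scan all edges for the cheapest one with exactly one endpoint in X.
        best = None
        for u, v, cost in edges:
            if (u in X) != (v in X):
                if best is None or cost < best[2]:
                    best = (u, v, cost)
        if best is None:
            # Empty cut (X, V - X): the input was not connected after all.
            raise ValueError("graph is not connected")
        u, v, cost = best
        T.append(best)
        X.add(v if u in X else u)  # add whichever endpoint is new
    return T
```

On the graph with edges `(1,2,1), (2,4,2), (1,4,3), (1,3,4), (3,4,5)` and start vertex 1, this picks `(1,2,1)`, then `(2,4,2)`, then `(1,3,4)`.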

Theorem: Prim's algorithm always computes an MST:
 - Part 1: Computes a spanning tree T*
     - Will use basic properties of graphs and spanning trees
 - Part 2: T* is an MST, uses the "Cut Property"

## Correctness:

Claim - Prim's algorithm outputs a spanning tree

Definition - a cut of a graph G = (V, E) is a partition of V into two non-empty sets A and B. Crossing edges go between A and B.
 - A graph with n vertices can have roughly 2^n cuts. Each vertex can be in either A or B, so 2 options per vertex. Exactly 2^n - 2 tbh, bc neither A nor B can be empty (half that if (A,B) and (B,A) count as the same cut). 
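
As a sanity check on the definition, the sketch below enumerates every cut of a small vertex set and lists the crossing edges of a given cut. `all_cuts` and `crossing_edges` are hypothetical helper names; cuts are treated as ordered pairs here, so (A,B) and (B,A) are counted separately.

```python
from itertools import product

def all_cuts(vertices):
    """Yield every ordered cut (A, B) with both A and B non-empty."""
    vertices = list(vertices)
    for sides in product("AB", repeat=len(vertices)):
        A = {v for v, side in zip(vertices, sides) if side == "A"}
        B = {v for v, side in zip(vertices, sides) if side == "B"}
        if A and B:  # skip the two assignments that leave one side empty
            yield A, B

def crossing_edges(edges, A, B):
    """Edges (u, v, cost) with one endpoint in A and the other in B."""
    return [(u, v, c) for u, v, c in edges
            if (u in A and v in B) or (u in B and v in A)]
```

For 4 vertices this yields 2^4 - 2 = 14 ordered cuts, which is why brute force over cuts is only viable for tiny graphs.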
 
Empty-Cut Lemma - a graph is not connected iff there exists a cut (A,B) with no crossing edges. 
 - 2 part proof, bc it is an iff statement: assume the right-hand side and prove the left, then assume the left-hand side and prove the right.
 - (<=) Assume there is a cut (A,B) with no crossing edges. Pick any u in A and v in B. 
     - Any u to v path would have to use an edge going from A to B. Since no edges cross (A,B), there is no u to v path in G. Thus, G is not connected. 
 - (=>) Assume G is not connected, i.e. G has no u to v path for some pair u, v. 
     - Define A = vertices reachable from u. u is in A by definition. A is u's connected component basically.
     - B = all other vertices (i.e. all other connected components). v is in B, so B is non-empty. 
     - Note, no edges cross (A,B). If some edge (x,y) had x in A and y in B, then bc u can reach x, u could also reach y, so y would be in A. That's a contradiction. 
 - So, graph is disconnected iff some cut is empty; connected iff every cut (A,B) has a crossing edge. 
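
The second half of the proof is constructive, so it can be sketched in code: take everything reachable from some vertex u as A and the rest as B. The name `find_empty_cut` is hypothetical, and `adj` is assumed to be non-empty and to map every vertex to a list of `(neighbor, cost)` pairs.

```python
def find_empty_cut(adj):
    """Return a cut (A, B) with no crossing edges, or None if G is connected."""
    vertices = set(adj)
    u = next(iter(vertices))
    A, stack = {u}, [u]
    while stack:  # DFS: A ends up being u's connected component
        x = stack.pop()
        for y, _ in adj[x]:
            if y not in A:
                A.add(y)
                stack.append(y)
    B = vertices - A
    # If B is empty, u reaches everything, so every cut has a crossing edge.
    return (A, B) if B else None
```

Any edge leaving A would have pulled its other endpoint into A during the search, which is the same contradiction used in the proof.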
 
Double-Crossing Lemma - Suppose a cycle C in G has an edge crossing the cut (A,B). Then, so does some other edge of C. 
 - Fairly obvious. Say edge (u,v) of C crosses with u in A, v in B. A cycle starts and ends at the same point, so after crossing from A to B it must cross back to get home. Thus a cycle crosses any cut an even number of times. 
 - Lonely Cut Corollary: if e is the only edge crossing some cut (A,B), then it is not in any cycle. Remember, spanning tree can't have cycle. 
     - If it were in a cycle, some other edge would also have to cross the cut. 
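
The even-crossing claim is easy to check numerically. The sketch below counts how many edges of a cycle cross a cut; `cycle_crossings` is a hypothetical helper, and the cycle is assumed to be given as a list of vertices with the last one joined back to the first.

```python
def cycle_crossings(cycle, A):
    """Count edges of the cycle that cross the cut (A, V - A)."""
    count = 0
    for i, u in enumerate(cycle):
        v = cycle[(i + 1) % len(cycle)]  # wraps around to close the cycle
        if (u in A) != (v in A):         # exactly one endpoint in A
            count += 1
    return count
```

For the cycle 1-2-3-4-1, the cut with A = {1, 2} is crossed by edges (2,3) and (4,1), so the count is 2; it is never 1.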
     
Proof of Original Claim:
 - Algorithm maintains invariant that T spans X (straightforward induction)
 - Cannot get stuck with X != V, so the loop ends with X = V (and T spans all of V). 
     - The while loop does not end as long as X != V. Only thing that can go wrong is if no edge on the frontier of X connects X to V-X.
     - If that were the case, the cut (X, V-X) must be empty i.e. no crossing edges. Then, via Empty-Cut Lemma, input graph G is disconnected, contradicting the assumption that G is connected.
 - No cycles ever get created in T. Why? Consider any iteration with current sets X and T. Suppose e gets added:
     - e is the first edge crossing (X, V-X) that gets added to T. No edges of T thus far cross (X, V-X), bc T only touches vertices in X. 
     - e must be an edge that crosses the cut (by def in algorithm). So, when e is added to T, it is the only edge of T crossing the cut, i.e. lonely. By Lonely Cut Corollary (applied to the graph (V,T)), cannot possibly be in a cycle. 
 - Thus, a spanning tree is outputted.

Claim: Prim's algorithm always outputs a minimum-cost spanning tree.

Key Question: How can we be sure that, when an edge is added, it will be a minimum? That is, when is it "safe" to include an edge in T?

The Cut Property - Consider an edge e of G. Suppose there is a cut (A,B) such that e is the cheapest edge of G that crosses it.
 - Then, e belongs to the MST of G. Take this as given for now; it is proved later. 
     - Note can have multiple MSTs if edge costs are not distinct. In that case the statement must be slightly modified. 
 - So to certify an edge, just need to find any one cut for which it is the cheapest crossing edge. 
 
Proof:
 - By previous part, we know that Prim's algorithm outputs a spanning tree T*
 - Each iteration of Prim's looks at the cut (X, V-X) and adds the cheapest edge crossing it
     - Via Cut Property, each such edge is contained in the MST. 
     - So, every edge picked by Prim's must belong to the MST.
 - Thus, everything in T* is an edge that belongs to the MST.
 - Since T* is already a spanning tree, it has n-1 edges, same as the MST. So T* must be the MST. 

### Proof of Cut Property

Will argue by contradiction using an exchange argument.

Suppose there is an edge e that is the cheapest one crossing a cut (A,B), yet e is not in the MST T*. Idea is to exchange e with another edge in T* to get an even cheaper spanning tree (contradiction). 
 - Note, since T* is connected, it must contain an edge f that crosses (A,B). Since e is the cheapest crossing edge and costs are distinct, f is more expensive than e. 
 - Is T* U {e} - {f} (i.e. swapping out f for e) a spanning tree of G? Not necessarily; depends on the choice of f.
     - The swap may leave a cycle or disconnect a vertex. 
     - Instead consider e', a crossing edge of T* chosen more carefully. If we swap this one out, the result stays a spanning tree
     - Hope: can always find a suitable e' that yields a bona fide spanning tree of G
 - Let C = cycle created by adding e to T*. e is on C and crosses (A,B), so via Double-Crossing Lemma some other edge e' of C also crosses (A,B). 
     - T = T* U {e} - {e'} is also a spanning tree. So, execute the swap with e'.
     - Bc e and e' are on the same cycle, this swap does not destroy connectivity between any pair of vertices: any path that used e' can go around the rest of C instead. T also has n-1 edges, so it is a tree. 
     - Thus, final step. e' crosses (A,B), so e' is more expensive than e. T differs from T* only in the swap of these two edges, so T is cheaper than T*, contradicting that T* is the MST. That completes the proof of the cut property.
 - Honestly, idrgi.
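
If it helps, the whole exchange argument comes down to one line of arithmetic. Writing c_e for the cost of edge e (Ce in the problem statement) and e' for the crossing tree edge on the cycle that gets swapped out, the new tree T satisfies:

```latex
\mathrm{cost}(T) \;=\; \mathrm{cost}(T^*) + c_e - c_{e'} \;<\; \mathrm{cost}(T^*)
\qquad \text{since } c_e < c_{e'}
```

The strict inequality uses the distinct-costs assumption: e is the cheapest edge crossing (A,B) and e' is a different edge crossing the same cut. So T would be a spanning tree cheaper than the MST, which is the contradiction.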

## Fast Implementation and Running Time

**Running Time**

PrimMST:
 - Initialize X = [s], s is starting vertex chosen arbitrarily
 - T = Null [empty tree; invariant: X = vertices spanned by tree-so-far T]
 - While X != V:
     - Let e = (u,v) be the cheapest edge of G with u in X, v not in X. 
     - Add e to T
     - Add v to X. These 3 steps are increasing # of spanned vertices
     
If implementing as-is.
 - number of loop iterations is exactly n-1, so O(n) iterations.
 - Each iteration work:
     - Brute force search through edges with 1 endpoint in X, 1 not in X. O(m) time. 
     - Total O(mn) time. Honestly not horrible bc a graph can have exponentially many spanning trees (e.g. n^(n-2) for a complete graph, by Cayley's formula). 
     
Speed-Up Via Heaps:
 - Heaps speed up repeated minimum computations. Supports Extract-Min in O(logn). 
 - Using Heaps to store edges, keys = edge costs. This will run O(mlogn) but not as good constants. Can show as exercise
     - Would need extra checks bc may pull edges that are not crossing edges. 
 - Using Heaps to store vertices, this is more practical
     - Invariant 1: Elements in heap = vertices of V-X
     - Invariant 2: For v in V-X, Key[v] = cost of cheapest edge (u,v) with u in X. 
         - If there are no incident edges for some w in V-X, can just define key to be +inf.
 - Check: can initialize heap with O(m + nlogn) = O(mlogn) preprocessing
     - At start, X contains only s
     - Key of every other vertex v is the cost of edge (s,v), or +inf if there is no such edge.
     - Costs O(m) time to compute keys, O(nlogn) for n-1 insertions. Can even do the inserts in O(n) total with heapify. 
     - Remember: m >= n - 1 bc graph is connected (otherwise not interesting), so can replace n with m to get a valid upper bound
 - Note: Given invariants, Extract-Min yields next vertex and edge to add to X and T respectively. 
     - Runs like a 2-round knockout tournament. Round 1: each vertex v in V-X locally remembers its best candidate i.e. cheapest edge from X incident on v (basically the key of each vertex). Round 2: Extract-Min picks the winner among these local winners. 
 - Maintaining Invariant #2 biggest problem:
     - May need to recompute some keys to maintain after Extract-Min
     - Pseudocode: When v added to X:
         - for each edge (v,w) in Edges:
             - if w in V-X:
                 - Delete w from heap
                 - Recompute key[w] = min(key[w], c_vw), where c_vw is the cost of edge (v,w); i.e. either the previous key or the new candidate from edge (v,w). 
                 - Insert into heap
     - Need additional book-keeping to track each vertex's position in the heap, bc deletion occurs at an index
 - Running Time with Heaps
     - Dominated by time required for heap operations
     - n-1 inserts during preprocessing
     - n-1 Extract-Mins (one per iteration of while loop)
     - Each edge (v,w) triggers at most one Delete/Insert combo (when its first endpoint is added to X)
         - So, at most 2m heap operations from these
     - So, O(m) heap operations (m >= n-1).
         - Each heap operations runs O(logn)
         - So, total O(mlogn) time. 
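
Putting the vertex-heap version together, here is a minimal sketch under some assumptions: `IndexedMinHeap` and `prim_mst` are hypothetical names, the heap is hand-rolled with a position map (the book-keeping mentioned above) because Python's `heapq` has no delete-by-item, and `adj` maps every vertex of a connected graph to a list of `(neighbor, cost)` pairs. One small deviation from the pseudocode: the Delete/Insert combo is skipped when the key would not change.

```python
import math

class IndexedMinHeap:
    """Binary min-heap of (key, item) pairs with a position map for O(log n) delete."""

    def __init__(self):
        self.heap = []  # list of (key, item) tuples
        self.pos = {}   # item -> its current index in self.heap

    def __len__(self):
        return len(self.heap)

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self.heap[i][0] < self.heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child][0] < self.heap[smallest][0]:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def insert(self, key, item):
        self.heap.append((key, item))
        self.pos[item] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def delete(self, item):
        i, last = self.pos[item], len(self.heap) - 1
        self._swap(i, last)   # move the doomed item to the end
        self.heap.pop()
        del self.pos[item]
        if i < last:          # restore heap order at the hole
            self._sift_up(i)
            self._sift_down(i)

    def extract_min(self):
        key, item = self.heap[0]
        self.delete(item)
        return key, item

def prim_mst(adj, s):
    """Heap-based Prim's; returns tree edges as (u, v, cost)."""
    key = {v: math.inf for v in adj if v != s}  # Invariant 2 (before seeing s's edges)
    parent = {}                                 # parent[v] = X-endpoint of v's best edge
    for w, cost in adj[s]:
        if w != s and cost < key[w]:
            key[w], parent[w] = cost, s
    heap = IndexedMinHeap()
    for v, k in key.items():                    # Invariant 1: heap holds V - X
        heap.insert(k, v)
    T = []
    while len(heap) > 0:
        cost, v = heap.extract_min()            # v moves from V - X into X
        if cost == math.inf:
            raise ValueError("graph is not connected")
        T.append((parent[v], v, cost))
        for w, c_vw in adj[v]:
            if w in heap.pos and c_vw < key[w]:  # w in V - X and (v,w) is cheaper
                heap.delete(w)
                key[w], parent[w] = c_vw, v
                heap.insert(c_vw, w)
    return T
```

On the graph with edges `(1,2,1), (2,4,2), (1,4,3), (1,3,4), (3,4,5)` and `s = 1`, the key of vertex 4 starts at 3 and drops to 2 once vertex 2 joins X, which is exactly the key recomputation step from the pseudocode.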