# Week 01 Assignment: minimum spanning trees

A minimum spanning tree (MST) of a connected, undirected, weighted graph $G=(V,E)$ is a connected, acyclic graph (a tree) $T=(V',E')$ such that
\begin{align*}
V' &= V\\
E' &\subseteq E
\end{align*}
with $\sum_{e\in E'} w_e$ being the smallest possible among all choices of $E'$. Here $w_e$ is the weight of edge $e$.

From an implementation perspective, an adjacency matrix $G[i][j]$ can represent the edge weights as

$$
G[i][j] = \begin{cases}
  \infty,\ \text{if}\ (i,j)\not\in E \\
  w_{ij},\ \text{if}\ (i,j)\in E
\end{cases}
$$

Python offers `float('inf')` to represent positive infinity, a value larger than any finite number. Java has `Double.POSITIVE_INFINITY`. Early versions of C supported the following notation:
```C
double positiveInfinity = 1.0 / 0.0;
```
Since C99, the standard math header (`math.h`) defines a macro `INFINITY`, so you don't need embarrassing code that divides by 0.

No matter how the absence of an edge is represented, we can always find its value by looking at $G[0][0]$: every diagonal element of the adjacency matrix must show the absence of an edge, since we do not allow an edge from a vertex to itself.
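To make this concrete, here is a small sketch. The 4-vertex graph and its weights are made up for illustration, and the names `INF`, `no_edge`, and `edges` are not part of the assignment. It stores a graph as an adjacency matrix, reads the no-edge value from the diagonal, and lists each edge once.

```python
# Hypothetical 4-vertex graph; the weights are made up for illustration.
INF = float("inf")

G = [
    [INF, 4.0, 1.0, INF],
    [4.0, INF, 2.0, 5.0],
    [1.0, 2.0, INF, 8.0],
    [INF, 5.0, 8.0, INF],
]

# The diagonal never holds an edge, so G[0][0] tells us how
# "no edge" is represented in this particular matrix.
no_edge = G[0][0]

# The graph is undirected, so the matrix is symmetric and it is
# enough to look above the diagonal (i < j) to see each edge once.
edges = [
    (i, j, G[i][j])
    for i in range(len(G))
    for j in range(i + 1, len(G))
    if G[i][j] != no_edge
]
print(edges)
```

The same code works if a graph uses a different marker for missing edges (say `0` or `None`), because it never mentions `float('inf')` after the matrix is built.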

## Borůvka's algorithm

The basis of all MST algorithms was developed by [Otakar Borůvka](https://en.wikipedia.org/wiki/Otakar_Bor%C5%AFvka) in 1926. Here's the pseudocode.

\begin{align*}
& \textbf{find minimum spanning tree of } G: \\
& \qquad \textbf{initialize}\ T\ \text{as an edgeless copy of}\ G \\
& \qquad \textbf{while}\ T\ \text{has more than 1 component}: \\
& \qquad\qquad \text{connect two components with a safe edge} \\
& \qquad \textbf{return}\ T
\end{align*}

This innocent-looking, 99-year-old algorithm requires quite a bit of work to come to life:

* How to create an edgeless copy of $G$?
* How to count the components of a graph?
* How to reference the components of a graph? (hint: label them)
* What is a safe edge?
* How to find the safe edge between two components?

### Your assignment

For this assignment, we want to find the MST of a given graph. For that we'll need to write a method

```python
def min_span_tree(G):
   ...
   return T
```
that produces the adjacency matrix $T$ of *a* minimum spanning tree of $G$. A graph may have more than one MST. It doesn't matter which one we obtain.

There is significant work to be done.

#### Create an edgeless copy of $G$

This is more of an implementation issue than an algorithmic one, but it's good practice. A graph is represented by an adjacency matrix, which is coded as a two-dimensional array (ok, ok, list).
The edgeless copy of the input graph is the graph that will become the minimum spanning tree.

This candidate MST has as many vertices as the input graph. Therefore the adjacency matrix of the candidate MST should have the same size as that of the input graph. The only difference between the two adjacency matrices is that the candidate MST shows no edges. And because we can't presume what the no-edge value is in $G$, it's best to just use whatever is stored in $G[0][0]$ as the no-edge value.
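One possible way to build such a copy is sketched below. The function name `edgeless_copy` is an assumption made for illustration, not a required part of the assignment.

```python
def edgeless_copy(G: list[list[float]]) -> list[list[float]]:
    """Return a matrix the same size as G whose entries all say "no edge"."""
    no_edge = G[0][0]  # a diagonal entry is never an edge
    n = len(G)
    # Build every row separately so that the rows are independent lists.
    T = [[no_edge for _ in range(n)] for _ in range(n)]
    return T
```

Writing `[[no_edge] * n] * n` instead would make every row the same list object, so setting one entry would silently change a whole column. The nested comprehension avoids that trap.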

#### Count the components of a graph

This involves examining every vertex in the graph and determining all the vertices reachable from it. To consider every vertex in a graph we need a simple loop:
```python
for vertex in range(len(G)):
  # what vertices are reachable from this vertex?
```
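The loop above can be fleshed out in more than one way. The following is only a sketch under a few assumptions: the helper names `reachable_from` and `count_components` are made up, the search uses a plain list as a stack, and the no-edge value is read from `G[0][0]`.

```python
def reachable_from(G: list[list[float]], start: int) -> list[int]:
    """Return the vertices reachable from start, including start itself."""
    no_edge = G[0][0]
    visited = [False] * len(G)
    to_visit = [start]  # a list used as a stack
    while len(to_visit) > 0:
        u = to_visit.pop()
        if not visited[u]:
            visited[u] = True
            # Schedule every unvisited neighbor of u.
            for v in range(len(G)):
                if G[u][v] != no_edge and not visited[v]:
                    to_visit.append(v)
    reachable = [v for v in range(len(G)) if visited[v]]
    return reachable


def count_components(G: list[list[float]]) -> int:
    """Count components: every vertex not seen so far starts a new one."""
    seen = [False] * len(G)
    count = 0
    for vertex in range(len(G)):
        if not seen[vertex]:
            count += 1
            for v in reachable_from(G, vertex):
                seen[v] = True
    return count
```

On an edgeless graph with $n$ vertices this returns $n$, which is exactly where the candidate MST starts.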

#### Reference (label) the components 

For the Borůvka algorithm we want to associate every vertex in the candidate MST with the component it belongs to. How do we find all vertices reachable from a given vertex? How do we label components? [See section 5.5](https://jeffe.cs.illinois.edu/teaching/algorithms/book/05-graphs.pdf) (page 203) from Jeff Erickson's book.
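As a rough illustration of labeling, the sketch below returns a list in which `labels[v]` is the component number of vertex `v`. The name `label_components`, the use of `-1` for "not labeled yet", and the stack-based search are all assumptions of this sketch. Treat it as one possible approach rather than the book's exact procedure.

```python
def label_components(T: list[list[float]]) -> list[int]:
    """Return labels such that labels[v] is the component number of vertex v."""
    no_edge = T[0][0]
    n = len(T)
    labels = [-1] * n  # -1 means "not labeled yet"
    next_label = 0
    for start in range(n):
        if labels[start] == -1:
            # start belongs to a component we have not met before:
            # give next_label to everything reachable from it.
            to_visit = [start]
            while len(to_visit) > 0:
                u = to_visit.pop()
                if labels[u] == -1:
                    labels[u] = next_label
                    for v in range(n):
                        if T[u][v] != no_edge and labels[v] == -1:
                            to_visit.append(v)
            next_label += 1
    return labels
```

With labels numbered from 0, the number of components is `max(labels) + 1`, so labeling also answers the counting question.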

#### What is a safe edge?

A graph component may have one or more vertices. When we are looking at two components, there may be several edges between their constituent vertices. The safe edge is the edge with the least weight. Here lies a problem. The candidate MST has no edges. How can we look for edges between components in an edgeless graph?

Consider, for example, all the cities and towns of Illinois: these are vertices in one component. Consider also all the cities and towns in Wisconsin and assume they're vertices in a different component. There are many edges between Illinois and Wisconsin vertices. For example, the edge between Winthrop Harbor, Ill. and Kenosha, Wisc. is 8 miles long. The edge between Beloit, Wisc. and Rockford, Ill. is 18 miles. Between these two, (Winthrop Harbor, Kenosha) is the safest edge (the shortest edge).

#### How to find the safe edge between two components?

This is where component labeling comes in handy. We can look at every edge between vertices in two different components, essentially performing a linear search for the smallest one.
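A minimal sketch of that linear search follows. It assumes the edge weights are read from the original graph `G`, while `labels` describes the components of the candidate MST (for example `[0, 0, 1, 1]`). The name `safe_edge` and the `(-1, -1)` result for "no edge found" are choices made for this illustration only.

```python
def safe_edge(
    G: list[list[float]], labels: list[int], a: int, b: int
) -> tuple[int, int]:
    """Return (u, v), the lightest edge of G with u in component a and
    v in component b, or (-1, -1) if no edge joins the two components."""
    no_edge = G[0][0]
    best_u, best_v = -1, -1
    for u in range(len(G)):
        for v in range(len(G)):
            crosses = labels[u] == a and labels[v] == b
            if crosses and G[u][v] != no_edge:
                # Keep this edge if it is the first or the lightest so far.
                if best_u == -1 or G[u][v] < G[best_u][best_v]:
                    best_u, best_v = u, v
    return (best_u, best_v)
```

For the Illinois and Wisconsin example, this search would compare the 8-mile and 18-mile edges (among others) and keep the 8-mile one.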


### Reading

* [Minimum Spanning Trees](https://jeffe.cs.illinois.edu/teaching/algorithms/book/07-mst.pdf) from Jeff Erickson's book. You may want to brush up on graphs first ([chapters 5.1 through 5.4](https://jeffe.cs.illinois.edu/teaching/algorithms/book/05-graphs.pdf) from the same book).






# Coding requirements

* You may *not* import modules in your code without explicit permission from Leo. Basically this means no `import` or `include` or similar statements in your programs.

* You may *not* use statements like `break` to end loops or `continue` and `pass` to move through branching.

* When possible, methods that return values should have only one return statement. This is no longer a strict requirement (if you took COMP 271/272 with me, you know what I am talking about). In general, there is no good reason for a method with 20-25 lines of code at most to have multiple return statements.

* Your code should be neat and well documented. If you are coding with Visual Studio Code, there are extensions that can do a great job formatting your program. For Python, consider installing the **Black Formatter** by Microsoft.

* If you code in Python, learn to use type hints. They are annoying but useful.

* Use a standard style guide for your code. I like Google's style guides for [Java](https://google.github.io/styleguide/javaguide.html) and [Python](https://google.github.io/styleguide/pyguide.html).

* If you are using Jupyter notebooks, spend some time exploring Markdown syntax for documentation and LaTeX for mathematical typesetting. Good skills to have.

# Finals week policy

There is no final exam for the course. There will be a final assignment that will be published the week before finals and will be due the week of finals. Additionally, 8 students in the course will be invited randomly to a brief meeting with the instructor during the course's final exam slot. If you are selected for a brief meeting, we'll spend about 15 minutes during the final exam slot to review your work. This interview will cover coding practices based on your past assignments. It is meant as a checkpoint to ensure that you have internalized the work you submitted.