# Quick Union

## Improving the `connect` Operation

Let's summarize what we have so far.

#### Approach Zero
Represent everything as boxes and lines. Overly complicated.

![](images/zero.png)

#### `ListOfSets`
Represent everything as connected components
* Represent connected components as a list of sets of integers

![](images/int.png)

#### `QuickFind`
Represent everything as connected components
* Represent connected components as a list of integers, where the value at each index is the id of the set that item belongs to
* Bad feature: connecting 2 sets is slow

![](images/ds.png)
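To make the cost of `connect` in `QuickFind` concrete, here is a minimal sketch. It assumes that the value stored at each index is the id of that item's set, and that `connect(p, q)` relabels p's set with q's id (one possible choice). The class name `QuickFindSketch` and its `ids` helper are hypothetical and only used for illustration. `connect` has to scan the whole array to relabel one set, so it is $\Theta(N)$, while `isConnected` is a single comparison.

```java
import java.util.Arrays;

/** Minimal QuickFind sketch: id[i] is the id of the set containing item i. */
class QuickFindSketch {
    private final int[] id;

    QuickFindSketch(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i; // every item starts in its own set
        }
    }

    /** Theta(1): a single comparison. */
    boolean isConnected(int p, int q) {
        return id[p] == id[q];
    }

    /** Theta(N): relabels every member of p's set with q's id by scanning the whole array. */
    void connect(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) {
                id[i] = qid;
            }
        }
    }

    /** Returns a copy of the id array for inspection. */
    int[] ids() {
        return Arrays.copyOf(id, id.length);
    }

    public static void main(String[] args) {
        QuickFindSketch ds = new QuickFindSketch(5);
        ds.connect(0, 1);
        ds.connect(1, 2);
        System.out.println(Arrays.toString(ds.ids())); // [2, 2, 2, 3, 4]
    }
}
```

After the two calls, items 0, 1 and 2 share id 2. Even though only two sets were merged each time, every call touched all `N` entries.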

#### Next approach - `QuickUnion`
* Still represent everything as connected components
* Still represent connected components as a list of integers
* However, values will be chosen so that `connect` is fast

## Hard Question

How could we change our set representation so that combining 2 sets into their union requires changing **one** value?

![](images/hard.png)

## Assign Parent

Idea: Assign each item a parent (instead of an id). 

![](images/parent.png)

This results in a tree-like shape. How to read this:
* In the tree, `2` is below `1`
    * We say "2 belongs to item 1"
    * In the array, the entry at index `2` is 1, which means 2 belongs to item 1
* In the tree, `1` is below `0`
    * We say "1 belongs to item 0"
    * In the array, the entry at index `1` is 0
* `0` is the root of a set
    * A root is marked by a negative value in the array
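The reading rules above can be sketched in Java. This assumes a hypothetical three-item array `{-1, 0, 1}` that encodes only the chain 2 -> 1 -> 0 described in the bullets. The figure may contain more items. The helper `root` is an illustrative name that follows parent links until it reaches a negative entry.

```java
/** Reads a parent array: parent[i] is i's parent, and a negative value marks a root. */
class ParentArrayDemo {
    /** Follows parent links from x until reaching a root. */
    static int root(int[] parent, int x) {
        while (parent[x] >= 0) {
            x = parent[x];
        }
        return x;
    }

    public static void main(String[] args) {
        // Hypothetical array for the chain 2 -> 1 -> 0, where 0 is a root.
        int[] parent = {-1, 0, 1};
        for (int i = 0; i < parent.length; i++) {
            if (parent[i] < 0) {
                System.out.println(i + " is a root");
            } else {
                System.out.println(i + " belongs to item " + parent[i]);
            }
        }
        System.out.println("root(2) = " + root(parent, 2));
    }
}
```

This prints `0 is a root`, `1 belongs to item 0`, `2 belongs to item 1`, and then `root(2) = 0`.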
    
This approach might seem innocuous, but it is the foundation of state-of-the-art disjoint set algorithms. It opens up a large area of CS and math theory that we won't discuss here.

## Quiz: `connect(5, 2)`

![](images/quiz.png)

If we call `connect(5, 2)`, how should we change the parent list to handle this `connect` operation?

#### Possible Answer: connect 3 to 2

![](images/32.png)

A possible answer is to connect 3 to 2. However, the tree will become unnecessarily tall!

#### Preferred Answer: connect 3 to 0

* Find `root(5)` -> 3
* Find `root(2)` -> 0
* Set the parent of `root(5)` to `root(2)`, so the entry at index 3 becomes 0

![](images/30.png)
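The two answers can be compared with a small sketch. The array below is hypothetical. It is only chosen so that `root(5) = 3` and `root(2) = 0` as in the quiz, with 2 sitting two levels below 0, and it is not necessarily the exact array in the figure. `depth` (an illustrative helper) counts the parent links from an item up to its root.

```java
import java.util.Arrays;

class QuizConnectDemo {
    static int root(int[] parent, int x) {
        while (parent[x] >= 0) {
            x = parent[x];
        }
        return x;
    }

    /** Number of parent links from x up to its root. */
    static int depth(int[] parent, int x) {
        int d = 0;
        while (parent[x] >= 0) {
            x = parent[x];
            d++;
        }
        return d;
    }

    /** Preferred answer: link root(p) directly below root(q). */
    static void connectRoots(int[] parent, int p, int q) {
        int i = root(parent, p);
        int j = root(parent, q);
        if (i != j) {
            parent[i] = j; // exactly one array entry changes
        }
    }

    public static void main(String[] args) {
        // Hypothetical array with root(5) = 3 and root(2) = 0.
        int[] base = {-1, 0, 1, -1, 0, 3, -1};

        int[] toTwo = Arrays.copyOf(base, base.length);
        toTwo[3] = 2; // "possible answer": hang 3 below 2

        int[] toRoot = Arrays.copyOf(base, base.length);
        connectRoots(toRoot, 5, 2); // "preferred answer": hang 3 below 0

        System.out.println(Arrays.toString(toRoot)); // [-1, 0, 1, 0, 0, 3, -1]
        System.out.println("depth(5), 3 under 2: " + depth(toTwo, 5)); // 4
        System.out.println("depth(5), 3 under 0: " + depth(toRoot, 5)); // 2
    }
}
```

Both answers change one entry, but linking the roots leaves 5 two links from its root instead of four.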

However, even this approach has potential performance issues. 
* The tree can get too tall
    * The operation `root(x)` becomes expensive

## The Worst Case

If we always connect the first item's tree below the second item's tree, we can end up with a tree of height `M` after `M` operations.
* `connect(4, 3)`
* `connect(3, 2)`
* `connect(2, 1)`
* `connect(1, 0)`

![](images/tall.png)
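The worst case can be reproduced with a sketch that always hangs the first item's root below the second item's root. The helpers `connect`, `buildChain` and `height` are hypothetical names. They work on a plain `int[]` so the example is self-contained.

```java
import java.util.Arrays;

class TallTreeDemo {
    static int root(int[] parent, int x) {
        while (parent[x] >= 0) {
            x = parent[x];
        }
        return x;
    }

    /** Always hangs p's root below q's root, as described above. */
    static void connect(int[] parent, int p, int q) {
        int i = root(parent, p);
        int j = root(parent, q);
        if (i != j) {
            parent[i] = j;
        }
    }

    /** Builds the worst case for n items: connect(n-1, n-2), ..., connect(1, 0). */
    static int[] buildChain(int n) {
        int[] parent = new int[n];
        Arrays.fill(parent, -1);
        for (int k = n - 1; k >= 1; k--) {
            connect(parent, k, k - 1);
        }
        return parent;
    }

    /** Height = largest number of parent links from any item to its root. */
    static int height(int[] parent) {
        int h = 0;
        for (int x = 0; x < parent.length; x++) {
            int d = 0;
            for (int r = x; parent[r] >= 0; r = parent[r]) {
                d++;
            }
            h = Math.max(h, d);
        }
        return h;
    }

    public static void main(String[] args) {
        int[] parent = buildChain(5);
        System.out.println(Arrays.toString(parent)); // [-1, 0, 1, 2, 3]
        System.out.println("height = " + height(parent)); // 4 after 4 connects
    }
}
```

With 5 items, the four calls listed above produce a single chain of height 4. In general, `buildChain(n)` gives height `n - 1`.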

For `N` items, what is the worst-case runtime of:
* `connect(p, q)`? $\Theta(N)$
* `isConnected(p, q)`? $\Theta(N)$

Suppose we want to find out whether `4` and `3` are connected. We have to:
* Find the root of `4`, which requires visiting all `N` items in the worst case
* Find the root of `3`, which requires visiting `N-1` items
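The sketch below counts how many items `find` visits on the tall tree produced by the four `connect` calls above. For `N = 5`, that tree is the array `{-1, 0, 1, 2, 3}`. The static `visited` counter is a hypothetical instrumentation hook and is not part of the real data structure.

```java
/** Counts how many items find visits while climbing to a root. */
class FindCostDemo {
    static int visited; // hypothetical instrumentation counter

    static int find(int[] parent, int p) {
        int r = p;
        visited++; // visit p itself
        while (parent[r] >= 0) {
            r = parent[r];
            visited++; // visit the next item up the tree
        }
        return r;
    }

    public static void main(String[] args) {
        // Tall tree 4 -> 3 -> 2 -> 1 -> 0 with N = 5 items.
        int[] parent = {-1, 0, 1, 2, 3};

        visited = 0;
        int root4 = find(parent, 4);
        int cost4 = visited; // visits 4, 3, 2, 1, 0

        visited = 0;
        int root3 = find(parent, 3);
        int cost3 = visited; // visits 3, 2, 1, 0

        System.out.println("connected: " + (root4 == root3) + ", visits: " + cost4 + " + " + cost3);
    }
}
```

Running `main` prints `connected: true, visits: 5 + 4`. This matches `N` and `N-1` for `N = 5`.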

## `QuickUnionDS`

Here is what the implementation looks like:

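```java
public class QuickUnionDS implements DisjointSets {
    private int[] parent;

    public QuickUnionDS(int N) {
        parent = new int[N];
        for (int i = 0; i < N; i++) {
            // Initially no elements are connected, so each element is its own root.
            parent[i] = -1;
        }
    }

    private int find(int p) {
        int r = p;
        while (parent[r] >= 0) {
            r = parent[r];
        }
        return r;
    }

    @Override
    public void connect(int p, int q) {
        int i = find(p);
        int j = find(q);
        if (i == j) {
            return; // Already connected; setting parent[i] = i would create a cycle.
        }
        parent[i] = j; // Hang p's tree below q's root.
    }

    @Override
    public boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }
}
```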


The `find` method is the `root(x)` method. We name it `find` so that it matches the textbook version.

For `N` items, the worst-case runtime of the `find` method is $\Theta(N)$. Both `isConnected` and `connect` rely on `find`, so both also have a worst-case runtime of $\Theta(N)$.

## Performance Summary

| Implementation | Constructor | `connect` | `isConnected` |
| --- | --- | --- | --- |
| `ListOfSetsDS` | $\Theta(N)$ | $O(N)$ | $O(N)$ |
| `QuickFindDS` | $\Theta(N)$ | $\Theta(N)$ | $\Theta(1)$ |
| `QuickUnionDS` | $\Theta(N)$ | $O(N)$ | $O(N)$ |

The drawback of `QuickUnion` is that the trees can get very tall. If a tree is imbalanced, performance can be even worse than `QuickFind`: `isConnected` takes $\Theta(N)$ time in the worst case instead of $\Theta(1)$.

Observation: things would be fine if we just kept our trees balanced.