# Quick Find

The next big challenge is to figure out what underlying abstraction we can use to track set membership.

## Challenge: Pick Data Structures to Support Tracking of Sets

Suppose we have the following situation:

![](images/challenge.png)

Assume elements are numbered from `0` to `N-1`.

What instance variables will our data structure need to keep track of which set each element is in?

#### Idea 1: List of sets of integers

```
[{0, 1, 2, 4}, {3, 5}, {6}]
```

* In Java: `List<Set<Integer>>`
    * `List` and `Set` refer to the Java interfaces we saw in previous lectures.
    
This is an intuitive idea. However, once we start thinking about what the `isConnected` and `connect` operations look like, we'll see that they can be very slow and complicated.

Imagine we have a situation where nothing is connected yet.

![](images/no.png)

We have the following list of sets of integers:

```
[{0}, {1}, {2}, {3}, {4}, {5}, {6}]
```

If we want to do `isConnected(5, 6)`, Java has to check every set to see if `5` is in there.
* For example, Java will check if `5` is in `{0}`. If not, check if it's in `{1}`, and so on...

This implementation requires iterating through all the sets to find anything. It's complicated and slow!
* Worst case: if nothing is connected, then `isConnected(5, 6)` requires iterating through:
    * `N-1` sets to find `5`
    * then `N` sets to find `6`
* The overall runtime would be $\Theta(N)$
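
To make the cost concrete, here is a minimal Java sketch of `ListOfSetsDS`, assuming `ArrayList` and `HashSet` as the concrete `List` and `Set` types. The helper `indexOfSetContaining` is hypothetical; it shows the set-by-set scan that makes every lookup linear in the number of sets.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of ListOfSetsDS: each disjoint set is a Set<Integer> stored in a List. */
class ListOfSetsDS {
    private final List<Set<Integer>> sets = new ArrayList<>();

    /** Creates the singleton sets {0}, {1}, ..., {n-1}: Theta(N) work. */
    ListOfSetsDS(int n) {
        for (int i = 0; i < n; i++) {
            Set<Integer> singleton = new HashSet<>();
            singleton.add(i);
            sets.add(singleton);
        }
    }

    /** Hypothetical helper: checks the sets one by one until one contains x. */
    private int indexOfSetContaining(int x) {
        for (int i = 0; i < sets.size(); i++) {
            if (sets.get(i).contains(x)) {
                return i;
            }
        }
        throw new IllegalArgumentException("no set contains " + x);
    }

    boolean isConnected(int p, int q) {
        // Two linear scans, one per item.
        return indexOfSetContaining(p) == indexOfSetContaining(q);
    }

    /** Merges q's set into p's set, then removes q's old set from the list. */
    void connect(int p, int q) {
        int i = indexOfSetContaining(p);
        int j = indexOfSetContaining(q);
        if (i == j) {
            return; // already in the same set
        }
        sets.get(i).addAll(sets.get(j));
        sets.remove(j); // int argument, so this calls List.remove(int index)
    }
}
```

With `new ListOfSetsDS(7)` and no connections, `isConnected(5, 6)` scans six sets to find `5` and all seven sets to find `6`, matching the worst case described above.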

## Performance Summary

This first approach, which we'll call `ListOfSetsDS` (where DS stands for Disjoint Sets), is complicated and slow.

| Implementation | Constructor | `connect` | `isConnected`|
| --- | --- | --- | --- |
|`ListOfSetsDS` | $\Theta(N)$ | $O(N)$ | $O(N)$|

* The constructor's runtime has order of growth `N` no matter what
    * Creating the list and adding `N` singleton sets to it takes time proportional to `N`
* Previously, we analyzed the worst case for `isConnected`.
    * The analysis is the same for `connect`! Its worst case is also $\Theta(N)$
    * But other cases may be better.
    * Because other cases may be better, we'll write $O(N)$, since $O$ only gives an upper bound ("less than or equal")
    
Operations are linear when the number of connections is small
* With few connections there are close to `N` sets, and we may have to iterate over all of them

**Important Point**: By deciding to use a List of Sets in the first place, we have doomed ourselves to complexity and bad performance.

When we want to implement a high-level data structure using basic building blocks, the choice of building blocks to use (the instance variables) will deeply affect the complexity of the code and its performance.

## Instructor's Approach: Just Use a List of Integers

Idea #2: Use a list of integers where the `i`th entry stores the set number (or `id`) of item `i`.

![](images/list_int.png)

In the example above, before the `connect(2, 3)` operation,
* 0, 1, 2, and 4 belong to set `4`

Once we do the `connect(2, 3)` operation,
* Change every entry equal to `id[2]` to the value of `id[3]`
    * `id[2]` is 4 and `id[3]` is 5
    * Thus entries 0, 1, 2, and 4 change from 4 to 5

In general, `connect(x, y)` changes every entry that equals `id[x]` to `id[y]`.
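
The walkthrough above can be replayed on a plain `int[]`. This sketch assumes the ids of the remaining items from the figure (`{3, 5}` has id 5 and `{6}` has id 6); the class name `QuickFindConnectDemo` is hypothetical.

```java
import java.util.Arrays;

class QuickFindConnectDemo {
    /** connect(p, q): every entry equal to id[p] is overwritten with id[q]. */
    static void connect(int[] id, int p, int q) {
        int pid = id[p]; // saved first, because the loop overwrites id[p]
        int qid = id[q];
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) {
                id[i] = qid;
            }
        }
    }

    public static void main(String[] args) {
        // Assumed encoding: {0, 1, 2, 4} -> 4, {3, 5} -> 5, {6} -> 6
        int[] id = {4, 4, 4, 5, 4, 5, 6};
        System.out.println(Arrays.toString(id)); // [4, 4, 4, 5, 4, 5, 6]
        connect(id, 2, 3);
        System.out.println(Arrays.toString(id)); // [5, 5, 5, 5, 5, 5, 6]
    }
}
```

Saving `id[p]` in `pid` before the loop matters: if the loop compared against `id[p]` directly, it would see the new value once it had overwritten `id[p]`, and for `connect(2, 3)` it would miss entry 4.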

## `QuickFindDS`

This is what the code looks like in Java:

```java
// isConnected method
public boolean isConnected(int p, int q) {
    return id[p] == id[q];
}
```

The `isConnected` method is very fast!
* Involves only 2 array accesses
* Order of growth $\Theta(1)$

```java
// connect method
public void connect(int p, int q) {
    int pid = id[p]; // p's set id, saved before the loop changes id[p]
    int qid = id[q];
    for (int i = 0; i < id.length; i++) {
        if (id[i] == pid) {
            id[i] = qid; // move item i into q's set
        }
    }
}
```

The `connect` method is relatively slow. Its `for` loop goes through the entire array, so a call makes roughly `N+2` to `2N+2` array accesses (2 reads for `pid` and `qid`, `N` reads in the loop, and up to `N` writes).
* Order of growth is $\Theta(N)$
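
One way to check that estimate is to count accesses directly. The class `CountingQuickFind` and its `accesses` counter below are hypothetical instrumentation added for illustration; they are not part of `QuickFindDS`.

```java
/** Hypothetical instrumented version of connect that counts array reads and writes. */
class CountingQuickFind {
    private final int[] id;
    private long accesses = 0; // illustration only

    CountingQuickFind(int[] initialIds) {
        id = initialIds.clone();
    }

    void connect(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        accesses += 2;              // the two reads above
        for (int i = 0; i < id.length; i++) {
            accesses++;             // read of id[i]
            if (id[i] == pid) {
                id[i] = qid;
                accesses++;         // write of id[i]
            }
        }
    }

    long accesses() {
        return accesses;
    }
}
```

On the example array `{4, 4, 4, 5, 4, 5, 6}`, `connect(2, 3)` makes 2 reads for `pid` and `qid`, 7 reads in the loop, and 4 writes: 13 accesses, which falls between 9 and 16 for `N = 7`.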

The `QuickFindDS` class looks like the following:

```java
public class QuickFindDS implements DisjointSets {
    // id[i] is the set id of item i
    private int[] id;

    // Theta(1): two array accesses
    public boolean isConnected(int p, int q) {
        return id[p] == id[q];
    }

    // Theta(N): move every item in p's set into q's set
    public void connect(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) {
                id[i] = qid;
            }
        }
    }

    // Theta(N): every item starts in its own set
    public QuickFindDS(int N) {
        id = new int[N];
        for (int i = 0; i < N; i++) {
            id[i] = i;
        }
    }
}
```

## Performance Summary

| Implementation | Constructor | `connect` | `isConnected`|
| --- | --- | --- | --- |
|`ListOfSetsDS` | $\Theta(N)$ | $O(N)$ | $O(N)$|
| `QuickFindDS` | $\Theta(N)$ | $\Theta(N)$ | $\Theta(1)$|

`QuickFindDS` is an improvement over `ListOfSetsDS`, but it's still too slow for practical use.
* Connecting 2 items takes $\Theta(N)$ time, so joining all `N` items into one set with `N-1` calls to `connect` takes $\Theta(N^2)$ time
* Instead, let's try something more radical.
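
A rough way to see why $\Theta(N)$ per `connect` adds up: this hypothetical `QuickFindScaling` sketch joins all `N` items with `N-1` calls of the form `connect(k - 1, k)` and counts how many times the inner loop runs. Since each call scans the full array, the count is exactly $N(N-1)$.

```java
/** Hypothetical demo: counts inner-loop iterations when Quick Find joins all n items. */
class QuickFindScaling {
    static long iterationsToConnectAll(int n) {
        int[] id = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;                      // every item starts in its own set
        }
        long iterations = 0;
        for (int k = 1; k < n; k++) {       // connect(k - 1, k) for k = 1 .. n-1
            int pid = id[k - 1];
            int qid = id[k];
            for (int i = 0; i < n; i++) {   // each connect scans the whole array
                iterations++;
                if (id[i] == pid) {
                    id[i] = qid;
                }
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " items: " + iterationsToConnectAll(n) + " loop iterations");
        }
    }
}
```

Running `main` prints 90, 9900, and 999000 iterations for 10, 100, and 1000 items: multiplying `N` by 10 multiplies the work by roughly 100.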