A *disjoint-set*, also called *union-find* or *merge-find*, is a <Definition term="data structure"/> that maintains a set partitioned into several disjoint subsets. It typically supports two operations:

- Find: Given an element of the set, it identifies the subset that contains it.
- Unite: Joins two subsets into a single subset.

The Find operation usually returns one element of the subset, called the *representative*. This definition is a bit dry, so let's focus on a practical problem for more insight.

#### Practical problem

Suppose we have a group of $N$ friends that play a game. Initially, each one of them plays against all the others. As the game progresses, alliances are formed between them. The alliance relationship is transitive, meaning that if $A$ and $B$ are allies and $B$ and $C$ are allies, then $A$ and $C$ are also allies. You know when the alliances are formed. At certain moments in time you want to know if two particular friends are in the same team or not.

#### Graph modelling

Before trying to solve the problem, it's useful to model it using <Definition term="graph" value="graphs"/>. We can build a graph where each node is associated with one of the people playing the game. Initially there are no edges in this graph. We are dealing with two types of operations:

- Update: Two friends $A$ and $B$ become allies. This means we need to add an edge between their associated nodes. 
- Query: We want to know if $A$ and $B$ are allies. In order to answer this, we should check if their nodes are in the same <Definition term="connected component "/>or not.

#### Brute force solutions

An obvious solution is to just build the graph. At each update we add an edge. For each query we can perform a graph traversal, either <Link href="/lesson/breadth_first_search/" value="BFS"/> or <Link href="/lesson/depth_first_search/" value="DFS"/>, starting at node $A$, and check whether we visit node $B$. Adding an edge takes constant time, but each query can take time proportional to the size of the graph.

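A minimal sketch of this brute-force approach, assuming the friends are numbered from $0$ to $N - 1$. The names `AllianceGraph`, `addAlliance` and `areAllies` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Brute-force sketch: store the alliance graph and answer each query with a BFS.
struct AllianceGraph {
    std::vector<std::vector<int>> adj;  // adjacency lists, one per friend

    explicit AllianceGraph(int n) : adj(n) {}

    // Update: constant time, just record the undirected edge.
    void addAlliance(int a, int b) {
        assert(a >= 0 && a < static_cast<int>(adj.size()));
        assert(b >= 0 && b < static_cast<int>(adj.size()));
        adj[a].push_back(b);
        adj[b].push_back(a);
    }

    // Query: BFS from a; in the worst case it visits every node and edge.
    bool areAllies(int a, int b) const {
        std::vector<bool> visited(adj.size(), false);
        std::queue<int> q;
        visited[a] = true;
        q.push(a);
        while (!q.empty()) {
            int node = q.front();
            q.pop();
            if (node == b) {
                return true;
            }
            for (int next : adj[node]) {
                if (!visited[next]) {
                    visited[next] = true;
                    q.push(next);
                }
            }
        }
        return false;
    }
};
```

The update is cheap, but every query repeats the whole traversal, which is what the following approaches try to avoid.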
Another solution is to maintain an array $V$ of size $N$. We want $V_A$ to represent a node in the connected component of node $A$. Here is where the idea of a representative comes in handy, as we want all the nodes in the same component to have the same value in $V$. This way we can answer queries in constant time, by checking if $V_A = V_B$. But what about the updates? When two people $A$ and $B$ become allies, we should replace all the occurrences of $V_A$ with $V_B$ (or the other way around, it doesn't really matter). The update is slow, as we need to go through all $N$ elements of $V$.
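The array-based idea can be sketched as follows. This is only an illustration: the struct name `RepresentativeArray` and its method names are assumptions, and friends are assumed to be numbered from $0$ to $N - 1$.

```cpp
#include <cassert>
#include <vector>

// Sketch of the array solution: V[i] is the representative of i's component.
struct RepresentativeArray {
    std::vector<int> V;

    explicit RepresentativeArray(int n) : V(n) {
        for (int i = 0; i < n; ++i) {
            V[i] = i;  // initially everyone is their own representative
        }
    }

    // Query: constant time.
    bool areAllies(int a, int b) const {
        return V[a] == V[b];
    }

    // Update: linear time, relabel every member of a's component.
    void addAlliance(int a, int b) {
        assert(a >= 0 && a < static_cast<int>(V.size()));
        assert(b >= 0 && b < static_cast<int>(V.size()));
        int oldRep = V[a];  // saved first, because V[a] itself gets overwritten
        int newRep = V[b];
        if (oldRep == newRep) {
            return;
        }
        for (int& rep : V) {
            if (rep == oldRep) {
                rep = newRep;
            }
        }
    }
};
```

Note that `V[a]` has to be copied before the loop; comparing against `V[a]` directly would stop matching as soon as that entry is relabeled.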

### Disjoint-set
The disjoint-set works by representing each connected component as a <Definition term="rooted tree"/>. The root of each tree is that component's representative. Each node's father is another node in the same component. Let's take an example and see how this works.
Suppose we have $6$ friends playing the game. Initially, there are no alliances, so each node is the root of a tree (the friends are numbered from $0$ to $5$):

![](figures/disjoint.png)

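Here's the code that initializes the structure and implements the Find and Unite operations:

```cpp
const int MAX_N = 100000;  // assumed upper bound on the number of friends
int N;                     // actual number of friends
int father[MAX_N];         // father[i] == i means that i is a root

// Initially every node is the root of its own tree.
void init() {
    for (int i = 0; i < N; ++i) {
        father[i] = i;
    }
}

// Walks up the tree and returns the root, i.e. the representative.
int find(int node) {
    if (father[node] == node) {
        return node;
    }
    return find(father[node]);
}

// Merges the two components by making one root the son of the other.
void unite(int A, int B) {
    int rootA = find(A);
    int rootB = find(B);
    father[rootA] = rootB;
}
```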

The complexity of an operation depends on the height of the trees. Notice that whenever we have an update we are faced with two options: we either make the first root the son of the second root, or the other way around. So far we haven't chosen any criterion for making this choice. Because of this, an operation can be really slow, as the trees can reach $O(N)$ height.
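To make the worst case concrete, here is a small sketch that builds a degenerate tree with the linking rule "first root becomes the son of the second root". The names `NaiveDisjointSet` and `chainDepth` are hypothetical, and the step counter is added only for this demonstration.

```cpp
#include <cassert>
#include <vector>

// Disjoint-set without any heuristic, instrumented to count steps.
struct NaiveDisjointSet {
    std::vector<int> father;

    explicit NaiveDisjointSet(int n) : father(n) {
        for (int i = 0; i < n; ++i) {
            father[i] = i;
        }
    }

    // Iterative find; optionally reports how many edges were followed.
    int find(int node, int* steps = nullptr) const {
        int count = 0;
        while (father[node] != node) {
            node = father[node];
            ++count;
        }
        if (steps != nullptr) {
            *steps = count;
        }
        return node;
    }

    // The root of a always becomes the son of the root of b.
    void unite(int a, int b) {
        int rootA = find(a);
        int rootB = find(b);
        father[rootA] = rootB;
    }
};

// Unites (0,1), (1,2), ..., (n-2,n-1) and returns the depth of node 0.
int chainDepth(int n) {
    assert(n > 0);
    NaiveDisjointSet ds(n);
    for (int i = 0; i + 1 < n; ++i) {
        ds.unite(i, i + 1);
    }
    int steps = 0;
    ds.find(0, &steps);
    return steps;
}
```

With this sequence of updates the tree degenerates into a single chain, so `chainDepth(n)` returns `n - 1` and a Find on node $0$ has to walk through every other node.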
#### Union by rank
We can use a simple heuristic to improve the disjoint-set. Whenever we are faced with a new update, we make the root of the smaller tree the son of the other root. In our case, *smaller* refers to the height of the tree, which we store as extra information for each root. A tree's height grows only when two trees of equal height are united, so a tree of height $h$ has at least $2^h$ nodes. This way, the height of a tree is at most $O(\log N)$.
#### Path compression
The second heuristic is used to flatten the structure of the tree whenever the Find method is called. The idea is that each node in a tree may as well be attached to the root, as that is the representative of the component. Whenever we need to call Find, we can take all the nodes on the path to the root and change their parent to be the root itself. We get a tree that's much flatter and we speed up future operations.
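As an illustration of the idea, here is a two-pass iterative sketch: the first pass locates the root, the second one reattaches every node on the path directly to it. The function name `findCompress` and the use of a `std::vector` passed by reference are assumptions made for this example.

```cpp
#include <cassert>
#include <vector>

// Returns the root of node's tree and flattens the path leading to it.
int findCompress(std::vector<int>& father, int node) {
    assert(node >= 0 && node < static_cast<int>(father.size()));

    // First pass: walk up until we reach the root.
    int root = node;
    while (father[root] != root) {
        root = father[root];
    }

    // Second pass: make every node on the path a direct son of the root.
    while (father[node] != root) {
        int next = father[node];
        father[node] = root;
        node = next;
    }
    return root;
}
```

For example, on the chain $0 \to 1 \to 2 \to 3$ a single call for node $0$ returns $3$ and leaves nodes $0$, $1$ and $2$ attached directly to $3$. An iterative version like this one also avoids deep recursion on very tall trees.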
#### Code
The new implementation looks like this:

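```cpp
#include <algorithm>  // std::max

const int MAX_N = 100000;  // assumed upper bound on the number of friends
int N;                     // actual number of friends
int father[MAX_N];
int height[MAX_N];  // upper bound on the height of the tree rooted at i

void init() {
    for (int i = 0; i < N; ++i) {
        father[i] = i;
        height[i] = 0;
    }
}

// Path compression: every node on the path ends up attached to the root.
int find(int node) {
    if (father[node] != node) {
        father[node] = find(father[node]);
    }
    return father[node];
}

// Union by rank: the root of the shorter tree becomes the son of the other.
void unite(int A, int B) {
    int rootA = find(A);
    int rootB = find(B);
    if (rootA == rootB) {
        return;  // already united; skipping avoids inflating the height
    }
    if (height[rootA] > height[rootB]) {
        father[rootB] = rootA;  // rootA is strictly taller, its height is unchanged
    } else {
        father[rootA] = rootB;
        height[rootB] = std::max(height[rootB], height[rootA] + 1);
    }
}
```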
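To tie this back to the practical problem, here is a self-contained sketch that wraps both heuristics in a struct and answers alliance queries. The struct `DisjointSet` and the method `sameTeam` are hypothetical names, and `std::vector` is used instead of global arrays purely for convenience.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct DisjointSet {
    std::vector<int> father;
    std::vector<int> height;  // upper bound on the height of each root's tree

    explicit DisjointSet(int n) : father(n), height(n, 0) {
        for (int i = 0; i < n; ++i) {
            father[i] = i;
        }
    }

    // Find with path compression.
    int find(int node) {
        assert(node >= 0 && node < static_cast<int>(father.size()));
        if (father[node] != node) {
            father[node] = find(father[node]);
        }
        return father[node];
    }

    // Update: A and B become allies (union by rank).
    void unite(int a, int b) {
        int rootA = find(a);
        int rootB = find(b);
        if (rootA == rootB) {
            return;
        }
        if (height[rootA] < height[rootB]) {
            std::swap(rootA, rootB);  // make rootA the taller tree
        }
        father[rootB] = rootA;
        if (height[rootA] == height[rootB]) {
            ++height[rootA];
        }
    }

    // Query: are A and B in the same team?
    bool sameTeam(int a, int b) {
        return find(a) == find(b);
    }
};
```

With $6$ friends, after `unite(0, 1)`, `unite(2, 3)` and `unite(1, 3)`, the call `sameTeam(0, 2)` returns true, while `sameTeam(0, 5)` returns false because friend $5$ never joined an alliance.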

#### Randomized linking
One last trick that makes the implementation cleaner is randomized linking. Instead of storing the extra information about the height of the trees, we can choose at random which root becomes the son of the other. You can read more about this technique <Link href="http://www.cis.upenn.edu/~sanjeev/papers/soda14_disjoint_set_union.pdf" value="here"/>. The new Unite implementation is:

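```cpp
#include <cstdlib>  // rand

// Reuses father[] and find() (with path compression) from the previous implementation.
void unite(int A, int B) {
    int rootA = find(A);
    int rootB = find(B);
    // Flip a coin to decide which root becomes the father of the other.
    if (rand() % 2) {
        father[rootB] = rootA;
    } else {
        father[rootA] = rootB;
    }
}
```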

#### Complexity analysis
Robert Tarjan was the first to prove that, with both heuristics, the amortized complexity of an operation can be expressed in terms of the inverse <Link href="https://en.wikipedia.org/wiki/Ackermann_function" value="Ackermann function"/>. This function grows so slowly that in practice we can consider each disjoint-set operation to take constant time.
