#### Union & Find Algorithm:

##### What is a Disjoint set data structure?

Two sets are called disjoint if they have no element in common, i.e. their intersection is the empty set.

A data structure that stores non-overlapping (disjoint) subsets of elements is called a disjoint set data structure. It supports the following operations:

- Adding new sets to the disjoint set.

- Merging disjoint sets to a single disjoint set using Union operation.

- Finding representative of a disjoint set using Find operation.

- Checking whether two elements belong to the same set or to different (disjoint) sets.


Consider a situation with a number of persons and the following tasks to be performed on them:

- Add a new friendship relation, i.e. person x becomes a friend of person y, which adds a new element to a set.

- Find whether individual x is a friend of individual y (direct or indirect friend)

In [1]:
"""
We are given 10 individuals say, a, b, c, d, e, f, g, h, i, j

Following are relationships to be added:
a <-> b  
b <-> d
c <-> f
c <-> i
j <-> e
g <-> j

Given queries like whether a is a friend of d or not. We basically need to create following 4 groups and maintain a quickly accessible connection among group items:
G1 = {a, b, d}
G2 = {c, f, i}
G3 = {e, g, j}
G4 = {h}
"""


The query then reduces to finding whether x and y belong to the same group, i.e. whether x and y are direct or indirect friends.

We partition the individuals into different sets according to the groups they fall into. This method is known as Disjoint Set Union; it maintains a collection of disjoint sets, each represented by one of its members.

To answer the above question, two key points need to be considered:

##### How to Resolve sets? 

Initially, all elements belong to different sets. After processing the given relations, we select one member of each set as its representative. There are many ways to choose a representative; a simple one is to pick the member with the largest index.

##### Check if 2 persons are in the same group? 

If the representatives of two individuals are the same, then they are in the same group, i.e. they are friends (directly or indirectly).
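
As a rough end-to-end sketch of the friendship example, the snippet below keeps each person's parent in a dictionary, merges the six relations listed earlier, and then compares representatives. The dictionary layout and the helper names `find`, `union` and `are_friends` are illustrative assumptions, not part of the original notebook, and no optimizations are applied yet.

```python
people = "abcdefghij"
# Assumption: a dict maps each person to their parent; everyone starts alone.
parent = {p: p for p in people}

def find(x):
    # Walk up until we reach a person who is their own parent.
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    # Put the representative of x's group under the representative of y's group.
    parent[find(x)] = find(y)

def are_friends(x, y):
    # Direct or indirect friends share the same representative.
    return find(x) == find(y)

relations = [("a", "b"), ("b", "d"), ("c", "f"), ("c", "i"), ("j", "e"), ("g", "j")]
for x, y in relations:
    union(x, y)

# Collect the members of every group under its representative.
groups = {}
for p in people:
    groups.setdefault(find(p), []).append(p)

print(sorted(groups.values()))
print(are_friends("a", "d"), are_friends("a", "c"))
```

The printed groups should match G1 to G4 above, with `a` and `d` reported as friends and `a` and `c` as not.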

Data Structures used are: 

Array: 

We use an array of integers called Parent[]. If we are dealing with N items, the i’th element of the array represents the i’th item. More precisely, the i’th element of the Parent[] array is the parent of the i’th item. These relationships create one or more virtual trees.

Tree: 

Each tree is a disjoint set. If two elements are in the same tree, then they are in the same disjoint set. The root node (or the topmost node) of each tree is called the representative of the set. There is always a single unique representative of each set. A simple rule to identify a representative is: if ‘i’ is the representative of a set, then Parent[i] = i. If i is not the representative of its set, then the representative can be found by traveling up the tree until we reach it.
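
To make the array-as-forest idea concrete, here is a small sketch with a hand-built parent array. The specific array and the helper name `representative` are made up for illustration: item 0 is the root of the tree {0, 1, 2} and item 3 is the root of the tree {3, 4}.

```python
# Hypothetical forest: 1 -> 0, 2 -> 1, 4 -> 3; items 0 and 3 are roots.
parent = [0, 0, 1, 3, 3]

def representative(i):
    # Travel up the tree until we reach a node that is its own parent.
    while parent[i] != i:
        i = parent[i]
    return i

# A representative is exactly an index i with parent[i] == i.
roots = [i for i in range(len(parent)) if parent[i] == i]

print(roots)
print(representative(2))
print(representative(2) == representative(4))
```

Items 2 and 4 end up with different representatives, so they are in different sets.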

Operations on Disjoint Set Data Structures:

- Find

- Union

1. Find:

Can be implemented by recursively traversing the parent array until we hit a node that is the parent of itself.

In [2]:
"""
def find(i): 
	if (parent[i] == i): 
		return i 
	else: 
		return find(parent[i]) 
"""


##### Time complexity: 

This naive approach can take O(n) time in the worst case, when the tree degenerates into a chain.

2. Union: 

It takes two elements as input and finds the representatives of their sets using the Find operation, and finally puts either one of the trees (representing the set) under the root node of the other tree.

In [None]:
"""
def union(i, j): 
	# Find the representatives (roots) of both sets.
	irep = find(i) 
	jrep = find(j) 
	# Attach the tree containing i under the root of the tree containing j.
	parent[irep] = jrep 
"""

##### Time complexity: 

This naive approach can produce a tree of height O(n) in the worst case, so each Find (and therefore each Union) can take O(n) time.

##### Optimizations (Union by Rank/Size and Path Compression):

The efficiency depends heavily on which tree gets attached to the other. There are two ways to decide: Union by Rank, which uses the height of the tree as the criterion, and Union by Size, which uses the number of elements in the tree. Either method, combined with Path Compression, gives nearly constant amortized time per operation.

##### Path Compression (Modifications to Find()):

It speeds up the data structure by compressing the height of the trees. It can be achieved by inserting a small caching mechanism into the Find operation. Take a look at the code for more details:

In [None]:
"""
def find(i): 
	if Parent[i] == i: 
		return i 
	else: 
		result = find(Parent[i]) 
		Parent[i] = result 
		return result 
"""

##### Time Complexity:
 
O(log n) amortized per call when path compression is used on its own.
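
One practical caveat: the recursive version can run into Python's recursion limit (1000 frames by default in CPython) when the tree is a very long chain. The sketch below is one possible iterative, two-pass variant. The name `find_iterative` and the choice to pass the array as an argument are assumptions made for this example.

```python
def find_iterative(parent, i):
    # First pass: walk up to the root without modifying anything.
    root = i
    while parent[root] != root:
        root = parent[root]
    # Second pass: point every node on the path directly at the root.
    while parent[i] != root:
        next_node = parent[i]
        parent[i] = root
        i = next_node
    return root

# Worst-case chain: 0 is the root, and each i > 0 has parent i - 1.
n = 5000
parent = [0] + list(range(n - 1))

root = find_iterative(parent, n - 1)
print(root)
print(all(p == 0 for p in parent))
```

After a single call from the deepest node, every node on the chain points straight at the root, so later calls on any of them take one step.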



##### Union by Rank:

First of all, we need a new array of integers called rank[]. The size of this array is the same as the parent array Parent[]. If i is a representative of a set, rank[i] is the height of the tree representing the set. 

Recall that in the basic Union operation it doesn’t matter which of the two trees is moved under the other. What we want now is to minimize the height of the resulting tree. If we are uniting two trees (or sets), let’s call them left and right, then it all depends on the rank of left and the rank of right. 

- If the rank of left is less than the rank of right, then it’s best to move left under right, because that won’t change the rank of right (while moving right under left would increase the height). In the same way, if the rank of right is less than the rank of left, then we should move right under left.

- If the ranks are equal, it doesn’t matter which tree goes under the other, but the rank of the result will always be one greater than the rank of the trees.


In [3]:
class DisjointSet: 
	def __init__(self, size): 
		self.parent = [i for i in range(size)] 
		self.rank = [0] * size 

	def find(self, i): 
		if self.parent[i] != i: 
			self.parent[i] = self.find(self.parent[i])
		return self.parent[i] 

	def union_by_rank(self, i, j): 
		irep = self.find(i) 
		jrep = self.find(j) 
		if irep == jrep: 
			return

		irank = self.rank[irep] 
		jrank = self.rank[jrep] 
		if irank < jrank: 
			self.parent[irep] = jrep 
		elif jrank < irank: 
			self.parent[jrep] = irep 
		else: 
			self.parent[irep] = jrep 
			self.rank[jrep] += 1

def main(): 
	size = 5
	ds = DisjointSet(size) 
	ds.union_by_rank(0, 1) 
	ds.union_by_rank(2, 3) 
	ds.union_by_rank(1, 3) 
	for i in range(size): 
		print(f"Element {i} belongs to the set with representative {ds.find(i)}") 

main() 


Element 0 belongs to the set with representative 3
Element 1 belongs to the set with representative 3
Element 2 belongs to the set with representative 3
Element 3 belongs to the set with representative 3
Element 4 belongs to the set with representative 4


##### Union by Size:

Again, we need a new array of integers called size[], with the same length as the parent array Parent[]. If i is a representative of a set, size[i] is the number of elements in the tree representing the set. 

Suppose we are uniting two trees (or sets); let’s call them left and right. In this case the decision depends on the size of the left tree and the size of the right tree.

- If the size of left is less than the size of right, then it’s best to move left under right and increase the size of right by the size of left. In the same way, if the size of right is less than the size of left, then we should move right under left and increase the size of left by the size of right.

- If the sizes are equal, it doesn’t matter which tree goes under the other.

In [4]:
class UnionFind: 
	def __init__(self, n): 
		self.Parent = list(range(n)) 
		self.Size = [1] * n 

	def find(self, i): 
		if self.Parent[i] != i: 
			self.Parent[i] = self.find(self.Parent[i]) 
		return self.Parent[i] 

	def unionBySize(self, i, j): 
		irep = self.find(i) 
		jrep = self.find(j) 
		if irep == jrep: 
			return

		isize = self.Size[irep] 
		jsize = self.Size[jrep] 
		if isize < jsize: 
			self.Parent[irep] = jrep 
			self.Size[jrep] += self.Size[irep] 
		else: 
			self.Parent[jrep] = irep 
			self.Size[irep] += self.Size[jrep] 

n = 5
unionFind = UnionFind(n) 
unionFind.unionBySize(0, 1) 
unionFind.unionBySize(2, 3) 
unionFind.unionBySize(0, 4) 
for i in range(n): 
	print("Element {}: Representative = {}".format(i, unionFind.find(i))) 

Element 0: Representative = 0
Element 1: Representative = 0
Element 2: Representative = 2
Element 3: Representative = 2
Element 4: Representative = 0


##### Time complexity: 

O(log n) per operation without Path Compression.
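
The logarithmic bound follows from a doubling argument: a node only gets one level deeper when its tree is attached under a tree that is at least as large, so the set containing it at least doubles in size each time. The sketch below checks this on a small case by repeatedly merging equal-sized trees, a pattern that reaches the bound. All names here (`find_no_compress`, `union_by_size`, `depth`) are illustrative, and path compression is deliberately left out.

```python
import math

def find_no_compress(parent, i):
    # Plain find: walk to the root, no path compression.
    while parent[i] != i:
        i = parent[i]
    return i

def union_by_size(parent, size, i, j):
    irep = find_no_compress(parent, i)
    jrep = find_no_compress(parent, j)
    if irep == jrep:
        return
    # Make irep the root of the larger (or equal) tree.
    if size[irep] < size[jrep]:
        irep, jrep = jrep, irep
    parent[jrep] = irep
    size[irep] += size[jrep]

def depth(parent, i):
    # Number of parent links between i and its root.
    hops = 0
    while parent[i] != i:
        i = parent[i]
        hops += 1
    return hops

n = 16
parent = list(range(n))
size = [1] * n

# Merge trees of size 1, then 2, then 4, then 8.
step = 1
while step < n:
    for start in range(0, n, 2 * step):
        union_by_size(parent, size, start, start + step)
    step *= 2

max_depth = max(depth(parent, i) for i in range(n))
print(max_depth, math.log2(n))
```

For n = 16 the deepest node sits 4 links below the root, which equals log2(16).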



##### Below is the complete implementation of disjoint set with path compression and union by rank.


In [5]:
class DisjSet: 
	def __init__(self, n): 
		self.rank = [1] * n 
		self.parent = [i for i in range(n)] 

	def find(self, x): 
		if (self.parent[x] != x): 
			self.parent[x] = self.find(self.parent[x]) 

		return self.parent[x] 

	def Union(self, x, y): 
		xset = self.find(x) 
		yset = self.find(y) 
		if xset == yset: 
			return
		if self.rank[xset] < self.rank[yset]: 
			self.parent[xset] = yset 

		elif self.rank[xset] > self.rank[yset]: 
			self.parent[yset] = xset 
		else: 
			self.parent[yset] = xset 
			self.rank[xset] = self.rank[xset] + 1

obj = DisjSet(5) 
obj.Union(0, 2) 
obj.Union(4, 2) 
obj.Union(3, 1) 
if obj.find(4) == obj.find(0): 
	print('Yes') 
else: 
	print('No') 
if obj.find(1) == obj.find(0): 
	print('Yes') 
else: 
	print('No') 


Yes
No


##### Time complexity: 

O(n) for creating n single-item sets. When the two techniques, path compression and union by rank/size, are combined, the time per operation becomes nearly constant. It turns out that the final amortized time complexity is O(α(n)), where α(n) is the inverse Ackermann function, which grows very slowly (it does not even exceed 4 for n < 10^600, approximately).

##### Space complexity: 

O(n) because we need to store n elements in the Disjoint Set Data Structure.

#### Union By Rank and Path Compression in Union-Find Algorithm


We introduced the union-find algorithm and used it to detect cycles in a graph. We used the following union() and find() operations for subsets.

In [6]:
def find(parent, i):
	if (parent[i] == -1):
		return i
	
	return find(parent, parent[i])

def Union(parent, x, y):
	xset = find(parent, x)
	yset = find(parent, y)
	parent[xset] = yset

The above union() and find() are naive and the worst case time complexity is linear. The trees created to represent subsets can be skewed and can become like a linked list. Following is an example worst case scenario. 

    Let there be 4 elements 0, 1, 2, 3

    Initially, all elements are single element subsets.
    0 1 2 3

    Do Union(0, 1)
       1   2   3
      /
     0

    Do Union(1, 2)
         2   3
        /
       1
      /
     0

    Do Union(2, 3)
             3
            /
           2
          /
         1
        /
       0
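
The same sequence can be replayed in code. This is a minimal sketch that follows the -1 root convention of the functions above, with iterative helpers named `naive_find` and `naive_union` (names chosen here to avoid clashing with the notebook's own definitions) and a same-set guard added for safety.

```python
def naive_find(parent, i):
    # A root is marked by parent[i] == -1.
    while parent[i] != -1:
        i = parent[i]
    return i

def naive_union(parent, x, y):
    xset = naive_find(parent, x)
    yset = naive_find(parent, y)
    if xset != yset:
        # Always hang x's tree under y's root, regardless of tree shape.
        parent[xset] = yset

parent = [-1] * 4
for x, y in [(0, 1), (1, 2), (2, 3)]:
    naive_union(parent, x, y)

# Count the links from element 0 up to the root.
hops, node = 0, 0
while parent[node] != -1:
    node = parent[node]
    hops += 1

print(parent)
print(hops)
```

The parent array ends up as a single chain 0 -> 1 -> 2 -> 3, so finding the root from element 0 takes n - 1 = 3 steps.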

The above operations can be optimized to O(log n) in the worst case. The idea is to always attach the shallower tree under the root of the deeper tree. This technique is called union by rank. The term rank is preferred over height because, if the path compression technique (discussed below) is used, the rank is not always equal to the height. The size of the trees (in place of height) can also be used as rank. Using size as rank also yields a worst-case time complexity of O(log n).

    Let us see the above example with union by rank
    Initially, all elements are single element subsets.
    0 1 2 3

    Do Union(0, 1)
       1   2   3
      /
     0

    Do Union(1, 2)
        1    3
      /   \
     0     2

    Do Union(2, 3)
         1
      /  |  \
     0   2   3
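
Replaying the same three unions with a rank array gives the flat tree drawn above. This sketch assumes that ties are broken by attaching the first argument's root under the second's (so that 0 goes under 1, as in the diagram). The function names are illustrative and path compression is left out so that only the effect of rank is visible.

```python
def find(parent, i):
    # Plain find without path compression.
    while parent[i] != i:
        i = parent[i]
    return i

def union_by_rank(parent, rank, x, y):
    xroot = find(parent, x)
    yroot = find(parent, y)
    if xroot == yroot:
        return
    # Ensure yroot is the root with the larger (or equal) rank.
    if rank[xroot] > rank[yroot]:
        xroot, yroot = yroot, xroot
    parent[xroot] = yroot
    # Only a merge of equal-rank trees makes the result taller.
    if rank[xroot] == rank[yroot]:
        rank[yroot] += 1

parent = list(range(4))
rank = [0] * 4
for x, y in [(0, 1), (1, 2), (2, 3)]:
    union_by_rank(parent, rank, x, y)

print(parent)
print(rank)
```

Every element now points directly at root 1, instead of forming the chain produced by the naive union.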

The second optimization to the naive method is Path Compression. The idea is to flatten the tree when find() is called. When find() is called for an element x, the root of the tree is returned. The find() operation traverses up from x to find the root. The idea of path compression is to make the found root the parent of x so that we don’t have to traverse all intermediate nodes again. If x is the root of a subtree, then the path (to the root) from all nodes under x also compresses.

    Let the subset {0, 1, .. 9} be represented as below and find() is called
    for element 3.
                 9
             /   |   \
            4    5    6
           /         /  \
          0         7    8
         /
        3
       / \
      1   2
    When find() is called for 3, we traverse up and find 9 as representative
    of this subset. With path compression, we also make 3 and 0 as the child of 9 so
    that when find() is called next time for 0, 1, 2 or 3, the path to root is reduced.

            --------9-------
          /   /    /  \      \
         0   4    5    6       3
                      /  \    /  \
                     7    8   1   2
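
The example tree can be encoded as a parent array and checked directly. In this sketch the array is written out by hand from the first diagram (an assumption about how the tree would be stored), and `find` is the recursive path-compressing version.

```python
# parent[i] for i = 0..9, read off the first diagram:
# 1 and 2 hang under 3, 3 under 0, 0 under 4, 7 and 8 under 6,
# and 4, 5, 6 hang under the root 9.
parent = [4, 3, 3, 0, 9, 9, 9, 6, 6, 9]

def find(i):
    if parent[i] != i:
        # Re-point i at the root found for its parent.
        parent[i] = find(parent[i])
    return parent[i]

root = find(3)
print(root)
print(parent)
```

After the call, `parent[3]` and `parent[0]` are both 9, while 1 and 2 still point at 3, which matches the second diagram.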

When the two techniques, path compression and union by rank/size, are combined, the time per operation becomes nearly constant. It turns out that the final amortized time complexity is O(α(n)), where α(n) is the inverse Ackermann function, which grows very slowly (it does not even exceed 4 for n < 10^600, approximately).

Following is union by rank and path compression-based implementation to find a cycle in a graph. 

In [7]:
from collections import defaultdict

class Graph:
	def __init__(self, num_of_v):
		self.num_of_v = num_of_v
		self.edges = defaultdict(list)

	def add_edge(self, u, v):
		self.edges[u].append(v)

class Subset:
	def __init__(self, parent, rank):
		self.parent = parent
		self.rank = rank
  
def find(subsets, node):
	if subsets[node].parent != node:
		subsets[node].parent = find(subsets, subsets[node].parent)
	return subsets[node].parent

def union(subsets, u, v):
	if subsets[u].rank > subsets[v].rank:
		subsets[v].parent = u
	elif subsets[v].rank > subsets[u].rank:
		subsets[u].parent = v
	else:
		subsets[v].parent = u
		subsets[u].rank += 1

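def isCycle(graph):
	subsets = []

	for u in range(graph.num_of_v):
		subsets.append(Subset(u, 0))

	for u in graph.edges:
		for v in graph.edges[u]:
			# Recompute both representatives for every edge, because an
			# earlier union may have changed the root of u's tree.
			u_rep = find(subsets, u)
			v_rep = find(subsets, v)

			if u_rep == v_rep:
				return True
			union(subsets, u_rep, v_rep)

	# Every edge joined two different sets, so there is no cycle.
	return False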

g = Graph(3)
g.add_edge(0, 1)
g.add_edge(1, 2)
g.add_edge(0, 2)

if isCycle(g):
	print('Graph contains cycle')
else:
	print('Graph does not contain cycle')



Graph contains cycle


#### Time complexity: 

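O(E log V), where E is the number of edges in the graph and V is the number of vertices. This is a safe upper bound; with both union by rank and path compression, the amortized cost is O(E α(V)).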

#### Space complexity: 

O(V), where V is the number of vertices. This is because we are using an array of subsets to store the representative elements of each vertex, and the size of this array is proportional to the number of vertices.