
# KOSARAJU'S STRONGLY CONNECTED COMPONENTS ALGORITHM



Kosaraju's algorithm finds the strongly connected components (SCCs) of a directed graph with two DFS passes. The implementation below runs the first pass on the reversed graph:

1.   Initialize all vertices as not visited.
2.   Do a DFS over the reversed graph (all arcs reversed, also called the transpose), starting a new traversal from every vertex that is still unvisited, and record each vertex at the moment its traversal finishes.
3.   Mark all vertices as not visited again.
4.   Take the vertices in decreasing order of finishing time and do a DFS on the original graph from each one that is still unvisited.
5.   All vertices reached by a single traversal in step 4 form one SCC; the vertex that traversal started from is the SCC's leader.
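
As a compact, self-contained sketch of the two-pass idea that the `Graph` class further down implements (DFS on the reversed graph to get a finishing order, then DFS on the original graph in decreasing finishing order), something like the following should work. The function name `kosaraju_scc` and its `(n, edges)` signature are illustrative assumptions, not part of the notebook's class.

```python
from collections import defaultdict

def kosaraju_scc(n, edges):
    """Return {leader: [vertices]} for a directed graph on vertices 0..n-1."""
    graph = defaultdict(list)   # original arcs u -> v
    rgraph = defaultdict(list)  # reversed arcs v -> u
    for u, v in edges:
        graph[u].append(v)
        rgraph[v].append(u)

    # Pass 1: iterative DFS on the reversed graph, recording finishing order.
    visited = [False] * n
    order = []
    for start in range(n):
        if visited[start]:
            continue
        stack = [start]
        while stack:
            v = stack[-1]
            visited[v] = True
            for w in rgraph[v]:
                if not visited[w]:
                    stack.append(w)  # descend into one unvisited neighbour
                    break
            else:
                # No unvisited neighbour left: v is finished.
                stack.pop()
                order.append(v)

    # Pass 2: DFS on the original graph in decreasing finishing order.
    visited = [False] * n
    scc = {}
    for leader in reversed(order):
        if visited[leader]:
            continue
        visited[leader] = True
        component = []
        stack = [leader]
        while stack:
            v = stack.pop()
            component.append(v)
            for w in graph[v]:
                if not visited[w]:
                    visited[w] = True
                    stack.append(w)
        scc[leader] = component
    return scc

components = kosaraju_scc(5, [(0, 1), (1, 2), (2, 3), (3, 0), (2, 4), (4, 2)])
print(len(components), "component(s)")
```

Which vertex ends up as the leader of a component depends on the traversal order, so comparisons between implementations are safer on the vertex sets than on the leaders.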


In [3]:
import numpy as np
import random
import pandas as pd
import math
from collections import defaultdict
from collections import deque

## Directed Graph


In [4]:
# This class represents a directed graph using adjacency list representation
class Graph():
  def __init__(self, n):
    self.V = n
    self.graph = defaultdict(list)  # default dictionary to store graph
    self.rgraph = defaultdict(list)

  # function to add an edge to graph
  def addEdge(self, u, v):
    self.graph[u].append(v)

  # Function to add reverse edge
  def addReverseEdge(self, u, v):
    self.rgraph[v].append(u)

  # Iterative DFS on the reversed graph (first pass).
  # The 'sink' flag and the break are what make this a true DFS: descend into one
  # unvisited neighbour at a time, and pop a vertex only once it has none left.
  # This thread: https://www.coursera.org/learn/algorithms-graphs-data-structures/discussions/forums/wtt6E3b0EeamjgocByS1BQ/threads/1D9ElKnLEeayfRIMfs2y9A
  # and Paul Fenton's response there helped.
  def dfs_Grev_stack(self, visited, nodestack, stack):
    while stack:
      v = stack[-1]
      visited[v] = True
      sink=True
      for i in self.rgraph[v]:
        if not visited[i]:
          sink=False
          stack.append(i)
          break
      if sink:
        stack.pop()
        nodestack.append(v)

  # Iterative DFS on the original graph (second pass).
  # Be careful not to overcount: a vertex can sit on the stack more than once,
  # so append it to the SCC only the first time it is popped (when not yet visited).
  def dfs_Pass2_stack(self, visited, stack, scc, leader):
    while stack:
      v = stack.pop()
      if not visited[v]:
        visited[v] = True
        scc[leader].append(v)
      for i in self.graph[v]:
        if not visited[i]:
          stack.append(i)




  # The main Kosaraju function for Strongly Connected Components =========>
  def Kosaraju(self):

    ## First Pass: Do DFS on reverse graph
    visited =[False]*(self.V)
    nodestack = deque()
    stack = []
    for node in reversed(range(self.V)):
      if not visited[node]:
        stack = [node]
        self.dfs_Grev_stack(visited, nodestack, stack)


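    # Debug output: the finishing order from the first pass.
    # Note: this prints every vertex, which floods the notebook output on large graphs.
    print(nodestack)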


    ## Second Pass: DFS on graph
    visited =[False]*(self.V)
    scc = defaultdict(list)
    stack = []
    while nodestack:
      v = nodestack.pop()
      if not visited[v]:
        stack = [v]
        leader = v
        self.dfs_Pass2_stack(visited, stack, scc, leader)

    # print("In Kosaraju function")
    # for i in range(len(scc)):
    #   print(scc[i], "\n")
    return scc
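
One detail of `dfs_Grev_stack` may matter on large inputs: every time a vertex comes back to the top of the stack, its adjacency list is scanned again from the beginning, which can cost time quadratic in that vertex's degree. A possible variant, sketched below under the assumption that the graph is a plain dict of adjacency lists, keeps one iterator per stack entry so each list is walked only once. The helper name `dfs_finish_order` is hypothetical.

```python
def dfs_finish_order(adj, n):
    """Return vertices 0..n-1 in order of DFS finishing time (iterative)."""
    visited = [False] * n
    order = []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        # Each stack entry pairs a vertex with an iterator over its neighbours,
        # so a vertex resumes where it left off instead of rescanning its list.
        stack = [(start, iter(adj.get(start, ())))]
        while stack:
            v, neighbours = stack[-1]
            for w in neighbours:
                if not visited[w]:
                    visited[w] = True
                    stack.append((w, iter(adj.get(w, ()))))
                    break
            else:
                # Iterator exhausted: v has no unvisited neighbour left.
                stack.pop()
                order.append(v)
    return order

print(dfs_finish_order({0: [1, 2], 1: [3], 2: [3], 3: []}, 4))  # [3, 1, 2, 0]
```

Passing the reversed adjacency lists to this helper would give the finishing order needed for the second pass.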



## TEST

In [5]:
# Small test graph: the cycle 0 -> 1 -> 2 -> 3 -> 0 plus the 2-cycle 2 <-> 4,
# so all five nodes form a single SCC
g1 = Graph(5)
g1.addEdge(0, 1)
g1.addEdge(1, 2)
g1.addEdge(2, 3)
g1.addEdge(3, 0)
g1.addEdge(2, 4)
g1.addEdge(4, 2)
g1.addReverseEdge(0, 1)
g1.addReverseEdge(1, 2)
g1.addReverseEdge(2, 3)
g1.addReverseEdge(3, 0)
g1.addReverseEdge(2, 4)
g1.addReverseEdge(4, 2)
scc = g1.Kosaraju()
print ("Number of components", len(scc))
for i in scc:
  print("leader {} has {} nodes in SCC".format(i, len(scc[i])))

deque([3, 0, 1, 2, 4])
Number of components 1
leader 4 has 5 nodes in SCC


In [6]:
g2 = Graph(4)
g2.addEdge(0, 1)
g2.addEdge(1, 2)
g2.addEdge(2, 3)
g2.addReverseEdge(0, 1)
g2.addReverseEdge(1, 2)
g2.addReverseEdge(2, 3)
scc = g2.Kosaraju()
print ("Number of components", len(scc))
for i in scc:
  print("leader {} has {} nodes in SCC".format(i, scc[i]))

deque([0, 1, 2, 3])
Number of components 4
leader 3 has [3] nodes in SCC
leader 2 has [2] nodes in SCC
leader 1 has [1] nodes in SCC
leader 0 has [0] nodes in SCC


In [7]:
edges = [(0, 3), (6, 0), (3, 6), (8, 6), (5, 8), (8, 2), (2, 5), (7, 5), (7, 4), (1, 7), (4, 1)]
g3 = Graph(9)
for u, v in edges:
  g3.addEdge(u, v)
  g3.addReverseEdge(u, v)
scc = g3.Kosaraju()
print ("Number of components", len(scc))
for i in scc:
  print("leader {} has {} nodes in SCC".format(i, scc[i]))
lenArray = []
for i in scc:
  lenArray.append(len(scc[i]))
lenArray.sort(reverse=True)
print(lenArray[:5])

deque([2, 4, 1, 7, 5, 8, 0, 3, 6])
Number of components 3
leader 6 has [6, 0, 3] nodes in SCC
leader 8 has [8, 2, 5] nodes in SCC
leader 7 has [7, 4, 1] nodes in SCC
[3, 3, 3]


In [8]:
edges = [(1, 2), (1, 4), (4, 2), (3, 2), (3, 4)]
g4 = Graph(4)
for u, v in edges:
  g4.addEdge(u-1, v-1)
  g4.addReverseEdge(u-1, v-1)
scc = g4.Kosaraju()
print ("Number of components", len(scc))
lenArray = []
for i in scc:
  lenArray.append(len(scc[i]))
lenArray.sort(reverse=True)
print(lenArray[:5])

deque([0, 2, 3, 1])
Number of components 4
[1, 1, 1, 1]


In [9]:
edges = [(1, 4), (2, 8), (3, 6), (4, 7), (5, 2), (6, 9), (7, 1), (8, 5), (8, 6), (9, 7), (9, 3)]
g4 = Graph(9)
for u, v in edges:
  g4.addEdge(u-1, v-1)
  g4.addReverseEdge(u-1, v-1)
scc = g4.Kosaraju()
print ("Number of components", len(scc))
lenArray = []
for i in scc:
  lenArray.append(len(scc[i]))
lenArray.sort(reverse=True)
print(lenArray[:5])

deque([2, 4, 1, 7, 5, 8, 0, 3, 6])
Number of components 3
[3, 3, 3]
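
For small graphs like the ones above, the results can be cross-checked against a brute-force definition of an SCC: two vertices share a component exactly when each can reach the other. The sketch below is only meant for tests of this size, since it runs one traversal per vertex. The function names `reachable` and `brute_force_sccs` are illustrative.

```python
def reachable(adj, start):
    """Return the set of vertices reachable from start (including start)."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def brute_force_sccs(n, edges):
    """Return the SCCs of a graph on vertices 0..n-1 as a set of frozensets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    reach = [reachable(adj, v) for v in range(n)]
    components = set()
    for v in range(n):
        # w is in v's component when v reaches w and w reaches v.
        components.add(frozenset(w for w in reach[v] if v in reach[w]))
    return components

edges = [(0, 3), (6, 0), (3, 6), (8, 6), (5, 8), (8, 2), (2, 5), (7, 5), (7, 4), (1, 7), (4, 1)]
expected = brute_force_sccs(9, edges)
print(sorted(sorted(c) for c in expected))
```

With the notebook's class, the comparison would be `{frozenset(c) for c in scc.values()} == expected`, which ignores both the leader choice and the order of vertices inside each list.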


## Assignment Problem

In [11]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [16]:
#df = pd.read_csv('https://d3c33hcgiwev3.cloudfront.net/_410e934e6553ac56409b2cb7096a44aa_SCC.txt?Expires=1711929600&Signature=ZueOXJEYWFsOezDHcs-q5p0hT4qcLKGozWOOQEuGt8h10sd-3ZJSsdkbUwPiOdTkK3bbUghPtXBsD5o-6vVDIVcwzztXQhdpfvR7dQoBJbKDkMFPx-h0I9f0HEGnU39NfRSQsu40NLNNORmNElkEd6uJJALRaTydygP16RsDqA0_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A', header=None)
filepath = "/content/drive/MyDrive/SCC.txt"
df = pd.read_csv(filepath, header=None)
EdgesArray = df[0].tolist()
print(len(EdgesArray))


5105043


In [17]:
## The assignment states that the graph has 875714 nodes
G4 = Graph(875714)
for line in EdgesArray:
  # each row is "tail head", separated by whitespace
  edge = line.split()
  # convert strings to integers and subtract 1 so nodes range from 0-875713 instead of 1-875714
  res = [int(x) - 1 for x in edge]
  # skip self-loops; they do not change the SCCs
  if res[0] == res[1]:
    continue
  G4.addEdge(res[0], res[1])
  G4.addReverseEdge(res[0], res[1])
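
The parsing step can also be written as a small generator that reads lines directly, without going through pandas. This is a sketch that assumes each line holds two whitespace-separated 1-based vertex numbers, as the loop above suggests. Silently skipping blank or malformed lines is a choice made here for illustration, and `parse_edges` is a hypothetical name.

```python
import io

def parse_edges(lines):
    """Yield 0-based (tail, head) pairs from lines like '1 2', skipping self-loops."""
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # assumption: ignore blank or malformed lines
        u, v = int(fields[0]) - 1, int(fields[1]) - 1
        if u != v:
            yield u, v

# An in-memory stand-in for the edge file.
sample = io.StringIO("1 2 \n2 3\n3 3\n\n3 1\n")
print(list(parse_edges(sample)))  # [(0, 1), (1, 2), (2, 0)]
```

An open file handle can be passed in place of `sample`, so the edge list never has to be held as a list of strings.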

In [None]:
## Not used in the end: even after increasing the recursion limit, the recursive DFS still hit a recursion error, so the DFS above is iterative
# import sys
# print(sys.getrecursionlimit())
# sys.setrecursionlimit(100000)
# print(sys.getrecursionlimit())
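
To illustrate why recursion depth is a problem here, the sketch below builds a single long path, which forces a recursive DFS to nest one call per vertex, and compares it with an explicit-stack DFS. The path length is chosen relative to the interpreter's current limit, and the function names are illustrative. It demonstrates the general effect rather than reproducing the exact failure seen on the assignment data.

```python
import sys

def recursive_dfs(adj, v, visited):
    visited.add(v)
    for w in adj.get(v, ()):
        if w not in visited:
            recursive_dfs(adj, w, visited)

def iterative_dfs(adj, start):
    visited = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj.get(v, ()):
            if w not in visited:
                visited.add(w)
                stack.append(w)
    return visited

# A single path 0 -> 1 -> 2 -> ... that is longer than the current recursion limit.
n = sys.getrecursionlimit() + 1000
path = {i: [i + 1] for i in range(n - 1)}

try:
    recursive_dfs(path, 0, set())
    recursion_failed = False
except RecursionError:
    recursion_failed = True

print("recursive DFS failed:", recursion_failed)
print("iterative DFS visited:", len(iterative_dfs(path, 0)), "of", n)
```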

In [18]:
SCC = G4.Kosaraju()
print ("Number of components", len(SCC))
lenArray = []
for i in SCC:
  lenArray.append(len(SCC[i]))
lenArray.sort(reverse=True)
print(lenArray[:5])

Number of components 371762
[434821, 968, 459, 313, 211]
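
Since only the five largest sizes are needed, a full sort of all component sizes is not strictly required. A small sketch using the standard library's `heapq.nlargest` is shown below. The helper name `largest_component_sizes` is hypothetical, and it assumes the `{leader: [vertices]}` mapping returned by `Kosaraju()`.

```python
import heapq

def largest_component_sizes(scc, k=5):
    """Return the k largest SCC sizes in decreasing order."""
    return heapq.nlargest(k, (len(members) for members in scc.values()))

example = {6: [6, 0, 3], 8: [8, 2, 5], 7: [7, 4, 1], 9: [9]}
print(largest_component_sizes(example))  # [3, 3, 3, 1]
```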


My God, I got the correct answer after so many days! Success is sweet!

Correct Answer was: 434821,968,459,313,211
