# Graph Search, Shortest Paths, and Data Structures

```python
import math
import random
import collections
```

## Graph search

Goals
- find everything findable from a given start vertex
- don't explore anything twice
- $O(n+m)$ time, where $n$ = # of vertices and $m$ = # of edges

Generic algorithm
- given graph G with vertex s
- initially s explored, all other vertices unexplored
- while possible
    - choose an edge (u, v) with u explored and v unexplored (if none, halt)
    - mark v explored
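The generic rule can be transcribed almost literally into Python. This is a minimal sketch, assuming the graph is given as a list of directed `(u, v)` pairs (an undirected edge is listed in both directions); the name `generic_search` is hypothetical.

```python
def generic_search(edges, s):
    """Return every vertex findable from s, following the generic search rule.

    edges: iterable of directed (u, v) pairs (assumed input format); list each
    undirected edge in both directions.
    """
    edges = list(edges)
    explored = {s}                      # initially only s is explored
    while True:
        # choose any edge (u, v) with u explored and v unexplored
        crossing = next(((u, v) for u, v in edges
                         if u in explored and v not in explored), None)
        if crossing is None:            # no such edge: halt
            return explored
        explored.add(crossing[1])       # mark v explored
```

This naive version rescans every edge in each round, so it runs in $O(nm)$ time. BFS and DFS are particular rules for choosing the next edge that bring this down to $O(n+m)$.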

BFS
- explore nodes in layers
- can compute shortest paths (fewest edges)
- can compute connected components of an undirected graph
- $O(n+m)$ using a queue

BFS(Graph G, start vertex s)
- [all nodes initially unexplored]
- mark s as explored
- let Q = queue initialized with s
- while Q is not empty:
    - remove first node of Q, call it v
    - for each edge (v, w)
        - if w unexplored
            - mark w as explored
            - add w to Q (at the end)
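The BFS pseudocode maps directly onto `collections.deque`, whose `popleft` and `append` are both $O(1)$. A sketch, assuming the graph is a dict mapping each vertex to a list of its neighbours (a hypothetical input format); it returns the vertices in the order they are explored.

```python
from collections import deque


def bfs(graph, s):
    """Return the vertices reachable from s, in the order BFS explores them.

    graph: dict mapping each vertex to a list of its neighbours (assumed format).
    """
    explored = {s}                      # mark s as explored
    order = [s]
    queue = deque([s])                  # Q initialized with s
    while queue:
        v = queue.popleft()             # remove first node of Q
        for w in graph.get(v, []):      # for each edge (v, w)
            if w not in explored:
                explored.add(w)         # mark w as explored
                order.append(w)
                queue.append(w)         # add w to the end of Q
    return order
```

Each vertex enters the queue at most once and each adjacency list is scanned once, which gives the $O(n+m)$ bound.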
            
Shortest path
- goal: compute dist(v), the fewest # of edges on a path from s to v
- assumption: every edge has length 1
- extra code added to BFS:
    - initialize dist(v): 0 if v = s, +infinity if v != s
    - when considering edge (v,w)
        - if w unexplored, then set dist(w) = dist(v) + 1 
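The extra bookkeeping fits into the BFS loop in two lines. A sketch under the same assumed dict-of-lists format; here a finite `dist[w]` doubles as the "explored" mark, and `bfs_distances` is a hypothetical name.

```python
from collections import deque
import math


def bfs_distances(graph, s):
    """Return dist(v) = fewest edges from s to v (math.inf if v is unreachable)."""
    # dist(v) = 0 if v = s, +infinity otherwise; infinity means "unexplored"
    dist = {v: math.inf for v in graph}
    dist[s] = 0
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, []):
            if dist.get(w, math.inf) == math.inf:   # w unexplored
                dist[w] = dist[v] + 1               # one layer further than v
                queue.append(w)
    return dist
```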
        
Undirected connectivity
- let G = (V, E) be an undirected graph
- connected components = the "pieces" of G (maximal sets of nodes in which every pair is joined by a path)
- goal: compute all connected components
- initialize: all nodes unexplored
- assume nodes labelled 1 to n
- for i = 1 to n
    - if i not explored # in some previous BFS
        - BFS(G, i) # discovers precisely i's connected component
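The outer loop plus an inlined BFS can be sketched as follows, assuming nodes `1..n` and an edge list of undirected `(u, v)` pairs; `connected_components` is a hypothetical helper that returns a component id for every node.

```python
from collections import deque


def connected_components(n, edges):
    """Label nodes 1..n of an undirected graph with component ids 1, 2, ..."""
    adj = {i: [] for i in range(1, n + 1)}
    for u, v in edges:                  # undirected: store both directions
        adj[u].append(v)
        adj[v].append(u)
    component = {}                      # node -> id; doubles as "explored"
    count = 0
    for i in range(1, n + 1):
        if i not in component:          # not explored in a previous BFS
            count += 1
            component[i] = count
            queue = deque([i])
            while queue:                # BFS(G, i) finds i's component
                v = queue.popleft()
                for w in adj[v]:
                    if w not in component:
                        component[w] = count
                        queue.append(w)
    return component
```

Every node and edge is handled once across all the BFS calls, so the whole loop is still $O(n+m)$.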
     
DFS
- explore aggressively, backtrack only when necessary
- can compute a topological ordering of a directed acyclic graph
- can compute strongly connected components of a directed graph
- $O(n+m)$ using a stack (or recursion)

DFS(Graph G, start vertex s)
- mark s as explored
- for every edge (s, v)
    - if v is unexplored
        - DFS(G, v)
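A recursive transcription of this pseudocode, assuming the same dict-of-neighbour-lists format (hypothetical). Python's default recursion limit (about 1000 frames) makes this form suitable only for small graphs; large inputs need an explicit stack.

```python
def dfs(graph, s):
    """Return the vertices reachable from s, in the order DFS explores them."""
    explored = set()
    order = []

    def visit(u):
        explored.add(u)                 # mark u as explored
        order.append(u)
        for v in graph.get(u, []):      # for every edge (u, v)
            if v not in explored:
                visit(v)                # go deeper before trying u's other edges

    visit(s)
    return order
```

On the same example graph used for BFS, DFS follows `s -> a -> c -> e` to the end before backtracking to `b`.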
        
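Topological ordering (straightforward)
- goal: label the vertices with f(v) in {1, ..., n} so that f(u) < f(v) for every edge (u, v)
- let v be a sink vertex of G (every directed acyclic graph has a sink vertex)
- set f(v) = n
- recurse on G - {v}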
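The sink-removal idea can be sketched with a loop in place of the recursion. Assumptions: vertices and directed `(u, v)` edges are passed as plain iterables, and `topological_order_by_sinks` is a hypothetical name. Since every directed acyclic graph has a sink, failing to find one means the graph has a cycle.

```python
def topological_order_by_sinks(vertices, edges):
    """Return f with f(u) < f(v) for every edge (u, v); ValueError on a cycle."""
    remaining = set(vertices)
    remaining_edges = set(edges)
    f = {}
    while remaining:
        tails = {u for u, _ in remaining_edges}   # vertices with an outgoing edge
        sink = next((v for v in remaining if v not in tails), None)
        if sink is None:
            raise ValueError("directed cycle found: no sink vertex")
        f[sink] = len(remaining)                  # f(v) = n of the current graph
        remaining.remove(sink)
        # continue on G - {v}: a sink only has incoming edges, so drop those
        remaining_edges = {(u, w) for u, w in remaining_edges if w != sink}
    return f
```

Rescanning for a sink each round makes this roughly $O(n \cdot m)$; the DFS version reaches $O(n+m)$.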

Topological ordering (DFS)

    DFS-loop(Graph G)
    - mark all nodes unexplored
    - current_label = n # global: keeps track of the ordering
    - for each vertex v in G
        - if v not explored
            - DFS(G, v)
    DFS(Graph G, start vertex s)
    - mark s as explored
    - for every edge (s, v)
        - if v is unexplored
            - DFS(G, v)
    - set f(s) = current_label # s finishes after everything reachable from it
    - current_label--
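A Python sketch of DFS-loop, assuming a dict that maps every vertex (including sinks) to its list of out-neighbours; `topological_sort_dfs` is a hypothetical name and `nonlocal` plays the role of the global `current_label`.

```python
def topological_sort_dfs(graph):
    """Return f: vertex -> position 1..n in a topological ordering of a DAG."""
    explored = set()
    f = {}
    current_label = len(graph)          # n

    def dfs(s):
        nonlocal current_label
        explored.add(s)
        for v in graph[s]:              # for every edge (s, v)
            if v not in explored:
                dfs(v)
        f[s] = current_label            # all of s's successors are already labelled
        current_label -= 1

    for v in graph:                     # DFS-loop over every vertex
        if v not in explored:
            dfs(v)
    return f
```

A vertex only receives its label after every vertex reachable from it has been labelled with a larger number, which is exactly the property f(u) < f(v) for each edge (u, v).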
    
Strongly connected components
- u and v are in the same strongly connected component (SCC) of a directed graph G if there exist paths u->v and v->u in G
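The definition itself gives a slow but obviously correct way to find SCCs, which can serve as a check for a faster algorithm. A sketch, assuming every vertex appears as a key of the dict-of-lists graph; `reachable` and `sccs_by_definition` are hypothetical helpers, and the cost is $O(n(n+m))$.

```python
def reachable(graph, s):
    """All vertices reachable from s (iterative DFS)."""
    seen = {s}
    stack = [s]
    while stack:
        u = stack.pop()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def sccs_by_definition(graph):
    """Group u and v together iff u reaches v and v reaches u."""
    reach = {u: reachable(graph, u) for u in graph}
    components, assigned = [], set()
    for u in graph:
        if u not in assigned:
            comp = {v for v in reach[u] if u in reach[v]}   # paths both ways
            components.append(comp)
            assigned |= comp
    return components
```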

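Kosaraju's two-pass algorithm
- $O(m+n)$
- let G' = G with all arcs reversed
- run DFS_loop on G' (computes the "magical ordering" f(v) of the nodes)
- run DFS_loop on G, processing nodes in decreasing order of f(v) (discovers the strongly connected components one by one; nodes with the same leader form one SCC)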

DFS_loop(graph G)
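- global variable t = 0 # finishing time: number of nodes fully processed so far (1st pass)
- global variable s = null # current source vertex, i.e. the leader (2nd pass)
- assumes nodes labelled 1 to n (in the 2nd pass, each node is labelled by its f-value from the 1st pass)
- for i = n down to 1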
    - if i not explored 
        - s = i
        - DFS(G, i)
        
DFS(graph G, node i)
- mark i as explored
- set leader(i) = node s
- for each arc (i,j) in G
    - if j not explored
        - DFS(G, j)
- t++
- set f(i) = t

```python
import collections


def open_graph(file_path):
    """Read a directed graph stored as one 'tail head' arc per line.

    Returns a list of (tail, head) integer pairs. Blank lines (e.g. the
    trailing newline at the end of the file) are skipped.
    """
    graph = []
    with open(file_path, 'r') as f:
        for line in f:
            fields = line.split()       # split() also tolerates trailing spaces
            if len(fields) >= 2:
                graph.append((int(fields[0]), int(fields[1])))
    return graph


def compute_max(graph):
    """Largest node label in the graph (nodes are assumed labelled 1..n)."""
    return max(max(tail, head) for tail, head in graph)


def build_adjacency(graph, reverse=False):
    """Map each node to the heads of its outgoing arcs (tails of incoming arcs if reverse).

    A dict lookup replaces the old get_next, which scanned every arc (O(m)) per call.
    """
    adjacency = collections.defaultdict(list)
    for tail, head in graph:
        if reverse:
            adjacency[head].append(tail)
        else:
            adjacency[tail].append(head)
    return adjacency


def DFS_loop_ordering(adjacency, max_integer):
    """First pass: DFS from i = n down to 1; return nodes by increasing finishing time.

    The DFS is iterative, so large graphs do not hit Python's recursion limit.
    """
    explored = set()                    # set membership is O(1), unlike a list
    ordering = []                       # ordering[k] is the node with f = k + 1
    for i in range(max_integer, 0, -1):
        if i in explored:
            continue
        explored.add(i)
        stack = [(i, iter(adjacency[i]))]
        while stack:
            node, neighbours = stack[-1]
            for vertex in neighbours:
                if vertex not in explored:
                    explored.add(vertex)
                    stack.append((vertex, iter(adjacency[vertex])))
                    break
            else:
                # every arc out of `node` has been followed: node finishes
                stack.pop()
                ordering.append(node)
    return ordering


def DFS_loop_computing(adjacency, ordering):
    """Second pass: DFS in decreasing finishing time; return {node: leader}.

    Iterating over reversed(ordering) replaces relabelling every node by its
    finishing time with ordering.index(), which cost O(n) per arc.
    """
    leader = {}
    for s in reversed(ordering):        # s = current source vertex
        if s in leader:
            continue
        leader[s] = s
        stack = [s]
        while stack:
            node = stack.pop()
            for vertex in adjacency[node]:
                if vertex not in leader:
                    leader[vertex] = s
                    stack.append(vertex)
    return leader


def kosaraju(graph):
    """Return a Counter mapping each SCC leader to the size of its SCC."""
    # First pass on G, second on G reversed. The notes use the opposite order
    # (G' first, then G); both work because G and G' have the same SCCs.
    ordering = DFS_loop_ordering(build_adjacency(graph), compute_max(graph))
    leader = DFS_loop_computing(build_adjacency(graph, reverse=True), ordering)
    return collections.Counter(leader.values())


# Uncomment one data file before running
# graph = open_graph("data/strongly-connected-component-test1.txt")
# graph = open_graph("data/strongly-connected-component-test2.txt")
# graph = open_graph("data/strongly-connected-component-test3.txt")
# graph = open_graph("data/strongly-connected-component-test4.txt")
# graph = open_graph("data/strongly-connected-component-test5.txt")
# graph = open_graph("large/strongly-connected-component.txt")


# Show the result: all SCC sizes (largest first) and the five largest (leader, size) pairs
counter = kosaraju(graph)
print(sorted(counter.values(), reverse=True))
print(counter.most_common(5))
```

## Shortest path (Dijkstra's Algorithm)

- assumption: every edge length $l_{vw}$ is non-negative
- initialize: $X = \{s\}$ # vertices processed so far
- $A[s] = 0$ # computed shortest-path distances
- $B[s]$ = empty path # computed shortest paths (the actual path, like a->b->c)
- while $X \neq V$ # the vertices are split into the two sets $X$ and $V - X$
    - among all edges $(v, w)$ with $v \in X$, $w \notin X$, pick the one that minimizes $A[v] + l_{vw}$ (if none, halt) # call it $(v^{*}, w^{*})$
    - add $w^{*}$ to $X$
    - set $A[w^{*}] = A[v^{*}] + l_{v^{*}w^{*}}$
    - set $B[w^{*}] = B[v^{*}] + (v^{*}, w^{*})$ # append the edge $(v^{*}, w^{*})$
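A direct transcription of this straightforward version, which rescans the crossing edges each round and so runs in $O(nm)$. Assumptions: the graph is a dict mapping each vertex to a list of `(w, length)` pairs with non-negative lengths, and `B` stores each path as a list of vertices rather than edges; `dijkstra_naive` is a hypothetical name.

```python
def dijkstra_naive(graph, s):
    """Return (A, B): shortest distances from s and the paths achieving them.

    graph: dict v -> list of (w, length) pairs, lengths >= 0 (assumed format).
    Vertices unreachable from s do not appear in A or B.
    """
    X = {s}                             # vertices processed so far
    A = {s: 0}                          # shortest-path distances
    B = {s: [s]}                        # shortest paths, as vertex lists
    while True:
        best = None                     # (A[v] + l_vw, v, w) of the best crossing edge
        for v in X:
            for w, length in graph.get(v, []):
                if w not in X and (best is None or A[v] + length < best[0]):
                    best = (A[v] + length, v, w)
        if best is None:                # no edge from X to V - X: halt
            break
        dist, v_star, w_star = best
        X.add(w_star)
        A[w_star] = dist
        B[w_star] = B[v_star] + [w_star]
    return A, B
```

Keeping the candidate vertices in a heap keyed by their current best $A[v] + l_{vw}$ is the usual way to cut the running time to $O(m \log n)$.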