# MSDM5051 Tutorial 6 - Graph Traversal

## Content

1. Graph structure representation
2. Graph traversal algorithm - DFS & BFS
3. Practices

--- 

# 1. Graph Structure Representation

A graph consists of two kinds of elements: vertices ($V$) and edges ($E$). Some common ways to store a graph structure include:

- **Adjacency list/dictionary** - A linked list/dictionary of all the nodes, where the neighbours of each node are stored as a separate linked list/dictionary under that node.

- **Adjacency matrix** - A $|V|\times |V|$ matrix whose elements are 

$$
a_{ij} = 
\begin{cases}
1 \qquad \text{if the edge }ij \text{ exists} \\
0 \qquad \text{else}
\end{cases}
$$


- **Incidence matrix** - A $|V|\times |E|$ matrix whose elements are 

$$
a_{ij} = 
\begin{cases}
1 \qquad \text{if node }i \text{ is an endpoint of edge } j\\
0 \qquad \text{else}
\end{cases}
$$
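As a small illustration of the three definitions above, the sketch below builds all three structures for a hypothetical undirected graph given as an edge list. The example graph and the helper name `build_representations` are assumptions made up for this example, and plain Python lists and dicts are used instead of a matrix library.

```python
def build_representations(num_nodes, edges):
    """Build the adjacency list, adjacency matrix and incidence matrix
    of an undirected graph with nodes 0..num_nodes-1 (hypothetical helper)."""
    # Adjacency list/dictionary: node -> list of neighbours
    adj_list = {v: [] for v in range(num_nodes)}
    for i, j in edges:
        adj_list[i].append(j)
        adj_list[j].append(i)          # undirected: store the edge in both directions

    # Adjacency matrix: |V| x |V|, a[i][j] = 1 if edge ij exists
    adj_matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        adj_matrix[i][j] = 1
        adj_matrix[j][i] = 1           # symmetric for an undirected graph

    # Incidence matrix: |V| x |E|, a[i][k] = 1 if node i is an endpoint of edge k
    inc_matrix = [[0] * len(edges) for _ in range(num_nodes)]
    for k, (i, j) in enumerate(edges):
        inc_matrix[i][k] = 1
        inc_matrix[j][k] = 1

    return adj_list, adj_matrix, inc_matrix


# Example: 4 nodes with edges 0-1, 0-2, 1-2, 2-3
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj_list, adj_matrix, inc_matrix = build_representations(4, edges)
print(adj_list)     # {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
```

For an undirected graph without self-loops, every column of the incidence matrix contains exactly two 1s, one for each endpoint of that edge.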


<figure style="text-align: center">
    <br>
    <figcaption> <b>Fig. 6</b> Different structures for storing a graph (A): adjacency list (B), adjacency matrix (C) and incidence matrix (D).</figcaption>
</figure>

**Note:** 

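- Adjacency matrices and incidence matrices are simple to store as plain arrays and fast to access if we are only interested in the neighbour relations between nodes/edges; for example, checking whether nodes $i$ and $j$ are adjacent is a single lookup of $a_{ij}$ in the adjacency matrix. Note that an adjacency matrix always has $|V|^2$ entries, so it is most space-efficient for dense graphs.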

- The list/dictionary approach is preferred if we want to attach additional properties to nodes/edges, e.g. a data value on each node, a weight on each edge, etc. For example, the widely used graph analysis library [NetworkX](https://networkx.org/documentation/stable/index.html) uses a ["dictionary of dictionaries of dictionaries"](https://networkx.org/documentation/stable/reference/introduction.html#data-structure) as its basic data structure.
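To make the last point concrete, here is a plain-Python sketch of a dictionary-of-dictionaries layout in the spirit of the NetworkX structure linked above: the outer dict maps each node to its neighbours, and the innermost dict holds edge attributes such as a weight. It does not use NetworkX itself; the node names, the `weight` key and the helper `add_edge` are assumptions for illustration only.

```python
def add_edge(adj, u, v, **attrs):
    """Add an undirected edge u-v to a dict-of-dicts-of-dicts graph (hypothetical helper)."""
    edge_data = dict(attrs)                 # one attribute dict for the edge
    adj.setdefault(u, {})[v] = edge_data
    adj.setdefault(v, {})[u] = edge_data    # same object, so updating one side updates both

adj = {}
add_edge(adj, "A", "B", weight=2.5)
add_edge(adj, "B", "C", weight=1.0)

print(adj["A"]["B"]["weight"])   # 2.5
print(sorted(adj["B"]))          # ['A', 'C']
```

Adding a new property later (say a `colour` on an edge) only means adding a key to the innermost dict, which a 0/1 matrix cannot express directly.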

---
# 2. Graph Traversal

Traversing a graph means visiting each node (vertex) once and only once. In the lecture, we introduced two graph traversal algorithms:

- **Depth-First Search** (DFS)
- **Breadth-First Search** (BFS)

In the demo below, I use a "list of nodes, each with a list of neighbours" structure to store the graph so that the code stays short (though it is not efficient). The definitions are as follows:

```python
class Node:
    def __init__(self, adj=None):
        # a list storing its adjacent nodes in object form; None is used as the default
        # so that different nodes do not share the same mutable default list
        self.adj = adj if adj is not None else []

class Graph:
    def __init__(self, list_of_nodes=None):    # create a graph by inputting a list of node objects
        self.graph = list_of_nodes if list_of_nodes is not None else []   # the graph is stored as the list of node objects
```

## 2.1. Depth-First Search

DFS as recursion: 

- **Stopping condition**: Reaching a vertex that has no unvisited adjacent vertex.
- **Inductive step**: Mark the current vertex as "visited", then propagate to each of the neighbouring unvisited vertices (i.e. in a `for` loop) and repeat.

In terms of tree graphs, it is like exploring a branch all the way down to the leaf level, then backtracking to the leaf's parent to see if it has any unvisited branches. 


```python
# Implemented as methods of the class Graph

# For initializing the DFS
def depth_first_search(self, start_node):

    visited = {node: False for node in self.graph}      # create a dictionary recording whether each node has been visited
                                                        # keys: the node objects themselves
                                                        # values: False = not visited, True = visited
                                                        # Note that this is equivalent to using a set

    DFS_order = []        # a list just for recording the order of DFS

    self.DFS(start_node, visited, DFS_order)    # start DFS; visited and DFS_order are updated in place

    return DFS_order      # return the order of visited nodes


# A recursive function for the real DFS
def DFS(self, node, visited, DFS_order):

    if not visited[node]:        # continue searching only if the node has not been visited before

        visited[node] = True        # mark it as visited in the visited dict

        DFS_order.append(node)      # add it to the final output

        for n in node.adj:                              # then recursively visit its adjacent nodes
            visited = self.DFS(n, visited, DFS_order)   # the visited dict is updated every time a recursion returns

    return visited        # return the updated visited dict


# attach the two functions as methods of the Graph class defined above
Graph.depth_first_search = depth_first_search
Graph.DFS = DFS
```
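A minimal usage sketch of the recursive DFS above, written as a self-contained cell with the two methods placed directly inside the class. The `name` attribute on `Node` and the example tree are assumptions added only so that the output is readable.

```python
class Node:
    def __init__(self, name, adj=None):
        self.name = name                              # hypothetical label, only for printing
        self.adj = adj if adj is not None else []     # adjacent Node objects

class Graph:
    def __init__(self, list_of_nodes=None):
        self.graph = list_of_nodes if list_of_nodes is not None else []

    def depth_first_search(self, start_node):
        visited = {node: False for node in self.graph}
        DFS_order = []
        self.DFS(start_node, visited, DFS_order)
        return DFS_order

    def DFS(self, node, visited, DFS_order):
        if not visited[node]:
            visited[node] = True
            DFS_order.append(node)
            for n in node.adj:                # go as deep as possible before trying the next neighbour
                self.DFS(n, visited, DFS_order)
        return visited

# Build the tree  A -> (B, C),  B -> (D, E)  as an undirected graph
A, B, C, D, E = (Node(x) for x in "ABCDE")
A.adj = [B, C]
B.adj = [A, D, E]
C.adj = [A]
D.adj = [B]
E.adj = [B]

g = Graph([A, B, C, D, E])
print([n.name for n in g.depth_first_search(A)])   # ['A', 'B', 'D', 'E', 'C']
```

The whole branch under B (D and E) is finished before the search backtracks to A and moves on to C.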

## 2.2. Breadth-First Search

BFS as a `for` loop: 

1. Choose one of the vertices as the start and put it in a queue.
2. Starting from the first vertex in the queue, append all of its adjacent vertices that have not been queued before to the end of the queue.
3. Repeat step 2 for every vertex in the queue, in order, until the end of the queue is reached.


In terms of tree graphs, it is like exploring all nodes at the same level before moving on to the next level.

```python
# Implemented as a method of the class Graph

def breadth_first_search(self, start_node):

    in_queue = {node: False for node in self.graph}     # create a dictionary recording whether each node has been put in the queue
                                                        # keys: the node objects themselves
                                                        # values: False = not in queue, True = in queue
                                                        # Note that this is equivalent to using a set

    queue = [start_node]              # initialize the queue with the start_node
    in_queue[start_node] = True

    BFS_order = []        # a list just for recording the order of BFS


    for node in queue:      # visit the nodes in queue order; nodes appended during the loop are also reached

        BFS_order.append(node)    # add it to the final output

        for adj_node in node.adj:    # then check all its adjacent nodes

            if not in_queue[adj_node]:    # an adjacent node is added to the queue only if it has never been queued
                queue.append(adj_node)
                in_queue[adj_node] = True

    return BFS_order


# attach the function as a method of the Graph class defined above
Graph.breadth_first_search = breadth_first_search
```
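To see the level-by-level behaviour described above, the sketch below runs the same queue idea on a small tree stored as a plain adjacency dictionary, and additionally records each node's level (its number of edges from the start). The dictionary representation and the function name `bfs_levels` are assumptions for this example, not part of the tutorial's `Graph` class.

```python
def bfs_levels(adj, start):
    """BFS over an adjacency dict; returns the visiting order and each node's level."""
    level = {start: 0}          # doubles as the "already in queue" record
    queue = [start]
    for node in queue:          # same trick as above: the list grows while we iterate
        for nb in adj[node]:
            if nb not in level:
                level[nb] = level[node] + 1   # one level deeper than the node that found it
                queue.append(nb)
    return queue, level

tree = {"A": ["B", "C"], "B": ["A", "D", "E"], "C": ["A", "F"],
        "D": ["B"], "E": ["B"], "F": ["C"]}
order, level = bfs_levels(tree, "A")
print(order)   # ['A', 'B', 'C', 'D', 'E', 'F']
print(level)   # {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 2, 'F': 2}
```

All level-1 nodes (B, C) come out before any level-2 node. On the same tree, the recursive DFS order would be A, B, D, E, C, F.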

---
# 3. Practices

## 3.1. DFS using stack

In the last tutorial, I demonstrated how to write DFS using recursion. In fact, just as BFS is paired with a queue, DFS can be implemented using a stack. Try to rewrite DFS using a stack instead of recursion. You may begin with the code of BFS. Assume all nodes in the graph are connected.

**For your revision**: 

The recursive definition of DFS is:

- **Stopping condition**: Reaching a vertex that has no unvisited adjacent vertex.
- **Inductive step**: Mark the current vertex as "visited", then propagate to each of the unvisited adjacent vertices (i.e. in a `for` loop) and repeat.

```python
# The definitions of Node and Graph are given to you. They are the same as in the last tutorial.
class Node:
    def __init__(self, adj=None):
        self.adj = adj if adj is not None else []    # a list storing its adjacent nodes in object form

class Graph:
    def __init__(self, list_of_nodes=None):    # create a graph by inputting a list of node objects
        self.graph = list_of_nodes if list_of_nodes is not None else []   # the graph is stored as the list of node objects


    ###########################################################################
    # The definition of BFS is already copied for you
    def breadth_first_search(self, start_node):

        in_queue = set()        # changed to using an empty set
                                # a node is in the set if it has already been put in the queue

        queue = [start_node]              # initialize the queue with the start_node
        in_queue.add(start_node)

        BFS_order = []        # a list just for recording the order of BFS


        for node in queue:      # visit the nodes following the order in queue. The queue is updated dynamically

            BFS_order.append(node)    # add it to the final output

            for adj_node in node.adj:    # then check all its adjacent nodes

                if adj_node not in in_queue:    # the adjacent node is added to the queue only if it has not been queued before
                    queue.append(adj_node)
                    in_queue.add(adj_node)

        return BFS_order


    ###########################################################################
    # try it yourself
    def depth_first_search(self, start_node):
        pass    # replace with your stack-based DFS
```

**Solution:**

The outlines of BFS using a queue and DFS using a stack are very similar:

- **BFS**: 
    - Always check the first element in the queue. 
    - Add the checked node to the storage array (or do whatever you like with it). 
    - After the check, "pop" it out of the queue. (In the code above, I use a `for` loop instead.)
    - After the "pop", its adjacent nodes that have not been queued before are appended to the queue.


- **DFS**: 
    - Always check the last element in the stack. 
    - Add the checked node to the storage array (or do whatever you like with it).
    - After the check, "pop" it out of the stack.
    - After the "pop", its adjacent nodes that have not been pushed before are appended to the stack.

Here I rewrite the BFS so that it pairs up with the DFS.

```python
# The definitions of Node and Graph are given to you. They are the same as in the last tutorial.
class Node:
    def __init__(self, adj=None):
        self.adj = adj if adj is not None else []    # a list storing its adjacent nodes in object form

class Graph:
    def __init__(self, list_of_nodes=None):    # create a graph by inputting a list of node objects
        self.graph = list_of_nodes if list_of_nodes is not None else []   # the graph is stored as the list of node objects


    ###########################################################################
    # Re-written to make it look the same as DFS
    def breadth_first_search(self, start_node):

        in_queue = set()        # changed to using an empty set
                                # a node is in the set if it has already been put in the queue

        queue = [start_node]              # initialize the queue with the start_node
        in_queue.add(start_node)

        BFS_order = []        # a list just for recording the order of BFS


        while len(queue) > 0:     # use a while loop instead of a for loop

            node = queue[0]           # always operate on the first element of the queue
            BFS_order.append(node)    # add it to the final output
            queue.pop(0)              # then remove it from the queue

            for adj_node in node.adj:    # then check all its adjacent nodes

                if adj_node not in in_queue:    # the adjacent node is added to the queue only if it has not been queued before
                    queue.append(adj_node)
                    in_queue.add(adj_node)

        return BFS_order


    ###########################################################################
    # The two methods differ in only two lines (marked with ***). Everything else is the same except for the variable names.
    def depth_first_search(self, start_node):

        in_stack = set()        # changed to using an empty set
                                # a node is in the set if it has already been pushed onto the stack

        stack = [start_node]              # initialize the stack with the start_node
        in_stack.add(start_node)

        DFS_order = []        # a list just for recording the order of DFS


        while len(stack) > 0:

            node = stack[-1]          # ***always operate on the last element of the stack
            DFS_order.append(node)    # add it to the final output
            stack.pop()               # ***then remove it from the stack. pop() removes the last element by default, so no argument is needed

            for adj_node in node.adj:    # then check all its adjacent nodes

                if adj_node not in in_stack:    # the adjacent node is pushed only if it has not been pushed before
                    stack.append(adj_node)
                    in_stack.add(adj_node)

        return DFS_order
```
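One caveat worth knowing: because the stack version above marks a node as soon as it is *pushed*, a node that was pushed early (for example by the start node) cannot be pushed again from deeper in the search, so the visiting order is not always a strict depth-first order. A common alternative, sketched below as an assumption rather than as the tutorial's reference solution, marks a node only when it is *popped* and allows the same node to sit on the stack more than once. The example graph and both function names are hypothetical, and the graph is stored as a plain adjacency dictionary for brevity.

```python
def dfs_mark_on_push(adj, start):
    """Stack traversal as in the solution above: mark nodes when pushed."""
    in_stack = {start}
    stack = [start]
    order = []
    while stack:
        node = stack.pop()
        order.append(node)
        for nb in adj[node]:
            if nb not in in_stack:
                stack.append(nb)
                in_stack.add(nb)
    return order

def dfs_mark_on_pop(adj, start):
    """Stricter stack-based DFS: mark nodes when popped, skip stale entries."""
    visited = set()
    stack = [start]
    order = []
    while stack:
        node = stack.pop()
        if node in visited:        # a node may have been pushed several times
            continue
        visited.add(node)
        order.append(node)
        for nb in adj[node]:
            if nb not in visited:
                stack.append(nb)
    return order

# Edges: 1-2, 1-3, 3-4, 3-5, 5-2
adj = {1: [2, 3], 2: [1, 5], 3: [1, 4, 5], 4: [3], 5: [3, 2]}
print(dfs_mark_on_push(adj, 1))   # [1, 3, 5, 4, 2]
print(dfs_mark_on_pop(adj, 1))    # [1, 3, 5, 2, 4]
```

After visiting 5, node 2 is still unvisited and adjacent to 5, so a depth-first search must go to 2 next; the push-marking version instead jumps back to 4 because 2 was already claimed when node 1 pushed it. Both versions still visit every reachable node exactly once, so the push-marking version is a correct traversal that only differs in order. The pop-marking version may hold duplicate stack entries, so its stack can grow to $O(|E|)$ rather than $O(|V|)$.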

## 3.2. Bipartite graph

An undirected graph is called *k-colourable* if all its vertices can be coloured using $k$ different colours such that no two adjacent vertices have the same colour. In particular, when $k=2$, such graphs are called bipartite graphs. Try to write an algorithm to identify whether a graph is bipartite.


```python
# The definitions of Node and Graph are given to you. They are the same as in the last tutorial.
class Node:
    def __init__(self, adj=None):
        self.adj = adj if adj is not None else []    # a list storing its adjacent nodes in object form

class Graph:
    def __init__(self, list_of_nodes=None):    # create a graph by inputting a list of node objects
        self.graph = list_of_nodes if list_of_nodes is not None else []   # the graph is stored as the list of node objects

    ###########################################################################
    # try it yourself
    def is_bipartite(self):
        pass    # replace with your code; return True or False
```

**Solution:**

We can use DFS to decide the order to colour the nodes.

- **Stopping conditions:** 
    - Finding a vertex that is already in the opposite colour list - a contradiction is found and we can quit the DFS immediately.
    - Reaching a vertex that has no unvisited adjacent vertex - the bipartite rule is not violated, so the DFS backtracks and continues.
    
- **Inductive step:**
    - Mark the current vertex as "visited" and put it in the current colour list. Then propagate to each neighbouring vertex (visited or not), assigning it the opposite colour, and check whether it is already in the current vertex's colour list.

```python
# The definitions of Node and Graph are given to you. They are the same as in the last tutorial.
class Node:
    def __init__(self, adj=None):
        self.adj = adj if adj is not None else []    # a list storing its adjacent nodes in object form

class Graph:
    def __init__(self, list_of_nodes=None):    # create a graph by inputting a list of node objects
        self.graph = list_of_nodes if list_of_nodes is not None else []   # the graph is stored as the list of node objects

    ###########################################################################
    # Modifying the recursive version from the last tutorial
    def is_bipartite(self):

        visited = set()
        colour_list = [set(), set()]        # two sets for storing the nodes of each colour

        bipartite = True     # assume the graph is bipartite initially
        for start_node in self.graph:           # start a DFS from every node not yet visited,
            if start_node not in visited:       # so that graphs with several components are also handled
                bipartite = self.bipartite_DFS(start_node, bipartite, 0, visited, colour_list)   # begin with the first colour
                if not bipartite:
                    break

        return bipartite    # return True or False


    # A recursive function for the real DFS
    def bipartite_DFS(self, node, bipartite, colour_index, visited, colour_list):

        if node in colour_list[(colour_index+1)%2]:       # check if the current node is on the opposite colour list
            return False                                  # if yes, a contradiction is found and we don't have to continue

        if node not in visited:        # continue DFS only if the node has not been visited before

            visited.add(node)                        # mark the node as visited
            colour_list[colour_index].add(node)      # add the node to the current colour list

            for n in node.adj:               # then recursively visit its adjacent nodes with the opposite colour
                bipartite = self.bipartite_DFS(n, bipartite, (colour_index+1)%2, visited, colour_list)
                if not bipartite:            # stop checking the other neighbours of the current node
                    break                    # once a contradiction is found

        return bipartite        # return whether the graph is still bipartite
```
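The recursive solution can hit Python's recursion limit (1000 by default in CPython) on large graphs, since long paths make the recursion deep. As a swapped-in alternative, not the tutorial's method, the same 2-colouring idea can be run with a BFS queue: give the start node colour 0, give each newly reached neighbour the opposite colour, and report a contradiction when an edge joins two nodes of the same colour. The adjacency-dictionary input and the name `is_bipartite_bfs` are assumptions for this sketch.

```python
from collections import deque

def is_bipartite_bfs(adj):
    """Return True if the undirected graph `adj` (node -> list of neighbours) is 2-colourable."""
    colour = {}                              # node -> 0 or 1; also serves as the visited record
    for start in adj:                        # loop so that every connected component is checked
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()           # O(1) removal from the front, unlike list.pop(0)
            for nb in adj[node]:
                if nb not in colour:
                    colour[nb] = 1 - colour[node]    # neighbour gets the opposite colour
                    queue.append(nb)
                elif colour[nb] == colour[node]:
                    return False             # an edge joins two nodes of the same colour
    return True

square   = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
triangle = {"x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"]}
print(is_bipartite_bfs(square))     # True
print(is_bipartite_bfs(triangle))   # False
```

A graph is bipartite exactly when it has no odd-length cycle, which is why the 4-cycle passes and the triangle fails.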