## Introduction

This tutorial covers some fundamental graph operations in Python. Graphs are a key concept in programming and show up in many areas of everyday life. For example, consider a social media platform with millions of users. How can we efficiently determine whether person A is friends with person B? With a graph representation, we can answer queries like this efficiently while still scaling to an even larger user base.

## Table of Contents

1. Background
2. Graph Class
3. Loading the Data
4. BFS
5. DFS
6. Bellman-Ford Algorithm
7. Topological Sort

## Background

Graphs can be either weighted or unweighted. In a weighted graph, each edge between two nodes carries a cost. For example, consider the case of encoding distances between cities in the form of a graph. We could represent each city as a node, and assign a numerical weight to each edge representing the distance from one city to another.

Furthermore, graphs can be directed (the direction of each edge matters) or undirected (the direction doesn't matter). Visually, a directed edge is drawn as an arrow to indicate its direction. Note that in a directed graph, a two-way connection requires two edges, one in each direction. Most graph algorithms can be used in both cases with minor modifications.
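To make the directed/undirected distinction concrete, here is a minimal sketch using plain dictionaries. The helper names `add_directed_edge` and `add_undirected_edge`, the node names, and the weights are all hypothetical and chosen only for illustration; they are not part of the Graph class defined later.

```python
def add_directed_edge(graph, source, dest, weight=1.0):
    """Store a single one-way edge source -> dest."""
    graph.setdefault(source, {})[dest] = weight
    graph.setdefault(dest, {})  # make sure dest also exists as a node


def add_undirected_edge(graph, u, v, weight=1.0):
    """Model an undirected edge as two directed edges, one per direction."""
    add_directed_edge(graph, u, v, weight)
    add_directed_edge(graph, v, u, weight)


cities = {}
add_undirected_edge(cities, "X", "Y", 5.0)  # two-way connection
add_directed_edge(cities, "X", "Z", 7.0)    # one-way connection only

print(cities["Y"])  # {'X': 5.0}  (the reverse edge exists)
print(cities["Z"])  # {}          (no edge leads back to X)
```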
 

## Graph Class

The first step is to create a Graph class. There are many representations one can use to store information about a graph. Some of the more common ones include the adjacency list, the adjacency matrix, and the adjacency dictionary. An adjacency list is feasible, but it is less efficient for lookups, since checking whether an edge exists requires scanning a node's list of neighbors. For this tutorial, we will focus on the adjacency dictionary implementation, as it gives us (on average) constant-time edge detection and direct access to a node's outgoing edges.
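As a rough illustration of the trade-off, the sketch below builds the same tiny graph in all three representations and checks for the edge A -> D in each. The variable names and the three-edge graph are assumptions made up for this comparison.

```python
edges = [("A", "B"), ("A", "D"), ("B", "C")]
nodes = ["A", "B", "C", "D"]

# Adjacency list: each node maps to a list of its neighbors.
adj_list = {n: [] for n in nodes}
for u, v in edges:
    adj_list[u].append(v)

# Adjacency matrix: a |V| x |V| grid where a 1 marks an edge.
index = {n: i for i, n in enumerate(nodes)}
adj_matrix = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    adj_matrix[index[u]][index[v]] = 1

# Adjacency dictionary: each node maps to a dict of neighbor -> weight.
adj_dict = {n: {} for n in nodes}
for u, v in edges:
    adj_dict[u][v] = 1.0

print("D" in adj_list["A"])                     # True, but scans A's neighbor list
print(adj_matrix[index["A"]][index["D"]] == 1)  # True, constant time but O(|V|^2) memory
print("D" in adj_dict["A"])                     # True, average constant-time hash lookup
```

The adjacency dictionary keeps the fast lookup of the matrix while only storing the edges that actually exist.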

In [1]:
class Graph:
    
    def __init__(self):
        self.edges = dict()
        
    
    #Assumes an input list of edges of the form (node, node, weight)
    #If a weight is missing, assigns a default weight of 1.0 to that edge
    #Updates self.edges in place: a dictionary of dictionaries (adjacency dictionary)
    #with one entry per node in the graph
    def add_edges(self, edges_list):
        unique_nodes = set()
        
        for edge in edges_list:
            if (len(edge) == 2):
                source,dest = edge
                weight = 1.0
            else:
                source,dest,weight = edge
            if (source not in self.edges):
                self.edges[source] = dict()
            
            self.edges[source][dest] = weight
            unique_nodes.add(source)
            unique_nodes.add(dest)
            
        
        for node in unique_nodes:
            if (node not in self.edges):
                self.edges[node] = dict()
                
    
    def bfs(self, source):
        
        if (source not in self.edges):
            return []
        
        visited = set()
        queue = []
        result = []
        
        #Add source vertex to queue and mark as visited
        #The FIFO queue means we visit every immediate neighbor before moving to the next level of neighbors
        queue.append(source)
        visited.add(source)
        
        while (queue):
            #list.pop(0) is O(n); collections.deque.popleft() would make this O(1) on large graphs
            curr = queue.pop(0)
            neighbors = self.edges[curr]
            result.append(curr)
            
            for n in neighbors:
                if n not in visited:
                    queue.append(n)
                    visited.add(n)
        
        return result
    
    def dfs(self, source):
        
        if (source not in self.edges):
            return []
        
        
        visited = set()
        stack = []
        result = []
        
        #Add the source vertex to the stack and mark as visited
        stack.append(source)
        visited.add(source)
        
        while (stack):
            #Get the most recently added node off the stack and inspect its neighbors
            curr = stack.pop()
            neighbors = self.edges[curr]
            result.append(curr)
            
            for n in neighbors:
                if (n not in visited):
                    stack.append(n)
                    visited.add(n)
        
        return result
    
    def bellmanford(self, source):
        
        if (source not in self.edges):
            #Return an empty dictionary so the return type matches the other return paths
            return dict()
        
        num_nodes = len(self.edges)
        dists = dict()
        
        for node in self.edges:
            dists[node] = float("inf")
            
        dists[source] = 0
    
        #A shortest path visits each node at most once, so it uses at most n-1 edges
        #Relaxing every edge n-1 times is therefore enough to find all shortest paths
        for i in range(num_nodes-1):
            #Iterate over every node in the graph, while checking its neighbors
            for u in self.edges:
                neighbors = self.edges[u]
                for v in neighbors:
                    weight = neighbors[v]
                    
                    #Relax edge (u, v): if reaching v through u is cheaper than the best distance
                    #recorded for v so far, record the new distance
                    if ((dists[u] != float("inf")) and (dists[u] + weight < dists[v])):
                        dists[v] = dists[u] + weight
        
        #This part checks if there are any negative weight cycles
        #None of the graphs in this tutorial contain a negative weight cycle, although one has a negative edge
        #However, in the real world, such cycles are very possible, so it's always good to have a check
        for u in self.edges:
            for v in self.edges[u]:
                weight = self.edges[u][v]
                
                #Since we have already iterated V-1 times over the graph and reached an optimal solution,
                #Any discovery of a "better edge to take" indicates that there is a negative weight cycle, since
                #The optimal solution would've been found already in the V-1 relaxation iterations
                if ((dists[u] != float("inf")) and (dists[u] + weight < dists[v])):
                    print("Found negative weight cycle in graph, returning early!")
                    return dict()
        
        return dists
    
    def ts_helper(self, curr_node, visited, stack):
        #Mark the current node as visited
        visited[curr_node] = True
        
        #We can only add the current node to the stack after all of its neighbors are in the stack,
        #so recursively process every unvisited neighbor first
        for neighbor in self.edges[curr_node]:
            if (not(visited[neighbor])):
                visited,stack = self.ts_helper(neighbor, visited, stack)
        
        #After all neighbors have been added to the stack, add this node to the beginning of the stack
        stack = [curr_node] + stack
        
        return visited, stack

    def topological_sort(self):
        
        #Create a dictionary of visited nodes
        visited = dict()
        for node in self.edges:
            visited[node] = False
            
        #Return a stack of the ordering of nodes
        stack = []
        
        #For every node in the graph, try to insert it into the stack (recursive)
        for node in self.edges:
            if (not(visited[node])):
                visited, stack = self.ts_helper(node, visited, stack)
        
        return stack
        
    

Great! Now that we have our Graph class, and have a method of adding edges to our graph, we can start to apply some commonly used graph algorithms to solve specific problems.


## Loading the Data

For this tutorial, we care more about the algorithms and operations performed on a graph than about the data itself, so we will work with two graphs. 1) A small "test" graph, where checking the correctness of our algorithms by eye is easy. This graph has just 5 nodes, A-E, and a few edges. 2) A large "dataset" graph, a directed graph of page links from Wikipedia. This dataset was also used in the 15-688 HW2 graph problem.

First, let's create a small test graph with 5 nodes. For the sake of this tutorial, every edge in this graph has the same weight. For our BFS and DFS implementations, the weights don't matter. However, for Bellman-Ford, since we're trying to find optimal paths from a source node to all other nodes, minimization of the weights will be important.

![alt text](small_graph.png "Title")

In [2]:
#Create a small test graph to make it easy to check the algorithms for correctness
G_test = Graph()
G_test.add_edges([("A", "B", 1.0), ("B","C", 1.0), ("A", "D", 1.0), ("D", "E", 1.0), ("E", "B", 1.0)])

print(G_test.edges)

{'A': {'B': 1.0, 'D': 1.0}, 'B': {'C': 1.0}, 'D': {'E': 1.0}, 'E': {'B': 1.0}, 'C': {}}


Next, load the data from wikipedia_small.graph. NOTE: This file is from 15-688 HW2 problem 3. It is too large to include here, so download it from http://www.datasciencecourse.org/assignments/ (specifically assignment 2).

In [3]:
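#Create a large dataset graph from the Wikipedia links
G = Graph()
with open('wikipedia_small.graph') as f:
    for line in f:
        split_line = line.split()
        #Skip blank or malformed lines that lack a source and a destination
        if (len(split_line) < 2):
            continue
        #The first two fields on each line are the source and destination nodes
        k,v = split_line[0], split_line[1]
        G.add_edges([(k,v)])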

Great! Now we have both of our datasets loaded into our two graphs. We can now turn our attention towards the algorithms and operations on such a graph.

## BFS

Breadth first search (BFS) is a common algorithm for traversing or searching a graph data structure. In essence, BFS starts at some source vertex in the graph, and explores every immediate neighbor first, before moving to the next "level" of neighbors. For this tutorial, we will consider BFS in the context of traversing a graph. This can be useful if we want to figure out if one node is "connected" to another node, in the sense of reachability. 

The implementation of BFS is shown in the Graph class above.
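The reachability use case mentioned above can be sketched as a small standalone function. This is only an illustration: `is_reachable` is a hypothetical helper, not a method of the Graph class, and it works directly on an adjacency dictionary like the one stored in `edges`. It stops as soon as the target is found instead of building the full visit order.

```python
from collections import deque


def is_reachable(adj, source, target):
    """Return True if target can be reached from source by following edges."""
    if source not in adj or target not in adj:
        return False
    visited = {source}
    queue = deque([source])
    while queue:
        curr = queue.popleft()  # FIFO: oldest discovered node first
        if curr == target:
            return True
        for n in adj[curr]:
            if n not in visited:
                visited.add(n)
                queue.append(n)
    return False


# Same structure as the small test graph used in this tutorial.
small = {"A": {"B": 1.0, "D": 1.0}, "B": {"C": 1.0}, "D": {"E": 1.0},
         "E": {"B": 1.0}, "C": {}}

print(is_reachable(small, "A", "C"))  # True
print(is_reachable(small, "C", "A"))  # False (C has no outgoing edges)
```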

Let's see the result of running BFS on the small graph with the source node "A". 

In [4]:
visit_order = G_test.bfs("A")
print(visit_order)

['A', 'B', 'D', 'C', 'E']


Notice how, after visiting B, we don't immediately continue on to B's neighbor C. Instead we visit D first, because a "breadth" traversal visits all of A's immediate neighbors before moving on to the next level.
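The idea of "levels" can be made explicit by recording how many hops each node is from the source. The sketch below is a hypothetical variation on BFS (the name `bfs_levels` is made up here) that runs on a plain adjacency dictionary with the same structure as the small test graph.

```python
from collections import deque


def bfs_levels(adj, source):
    """Map every node reachable from source to its hop count from source."""
    levels = {source: 0}
    queue = deque([source])
    while queue:
        curr = queue.popleft()
        for n in adj[curr]:
            if n not in levels:  # first discovery is always the shortest hop count
                levels[n] = levels[curr] + 1
                queue.append(n)
    return levels


small = {"A": {"B": 1.0, "D": 1.0}, "B": {"C": 1.0}, "D": {"E": 1.0},
         "E": {"B": 1.0}, "C": {}}

print(bfs_levels(small, "A"))
# {'A': 0, 'B': 1, 'D': 1, 'C': 2, 'E': 2}
```

B and D sit at level 1 and are visited before C and E at level 2, which matches the visit order printed above.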

Now let's see the result of running BFS on the larger dataset. Unfortunately, this one is harder to visualize, so we simply print the visit order. Keep in mind that BFS orders nodes by number of hops and ignores edge weights, so for shortest-path queries on weighted graphs an algorithm like Dijkstra's is the appropriate tool.

In [5]:
large_visit_order = G.bfs("0")
print(large_visit_order)

['0', '814', '825', '837', '853', '868', '903', '971', '1352', '1744', '1756', '3982', '4051', '5450', '5468', '5918', '5947', '6193', '6263', '6275', '6503', '6531', '8138', '8146', '8405', '9598', '10284', '10320', '10690', '10693', '10725', '10983', '10989', '10998', '11009', '11016', '13402', '13835', '14113', '14366', '14373', '14582', '14685', '14688', '16423', '17740', '17771', '20070', '20074', '20469', '20472', '20474', '20475', '20560', '21038', '22792', '23525', '23666', '23742', '23744', '23745', '23747', '23749', '23750', '23751', '23876', '23882', '315', '352', '359', '362', '364', '366', '369', '373', '374', '377', '379', '382', '385', '388', '391', '395', '398', '401', '404', '409', '412', '415', '418', '422', '424', '427', '431', '436', '439', '444', '446', '450', '453', '455', '463', '469', '472', '476', '480', '483', '487', '489', '492', '495', '499', '504', '508', '515', '521', '525', '529', '532', '537', '544', '547', '552', '556', '560', '571', '576', '579', '584'

We'll see next that DFS, or depth first search, uses a stack data structure instead, giving a "deepest neighbor first" traversal.

## DFS

Depth first search (DFS) is also a common algorithm for traversing or searching a graph. DFS starts at some source vertex in the graph, and explores as deep as possible with respect to one neighbor. After exploring as far as possible along each branch, it backtracks and explores similarly for the next neighbor. 

This is done (iteratively) using the stack data structure. Every time we visit a node, we add all of its unvisited immediate neighbors to a stack. Because of the last-in-first-out policy of a stack, the most recently discovered node is always explored next, so we are in essence traveling as "deep" as possible into the neighbors.

The implementation of DFS is shown in the Graph class above.

Let's see the result of running DFS on the small graph with source node "A".

In [6]:
visit_order_dfs = G_test.dfs("A")
print(visit_order_dfs)

['A', 'D', 'E', 'B', 'C']


Notice how after starting from A and proceeding to immediate neighbor D, DFS continues down the branch to E, then B, then C. In this case the whole graph is covered by the single chain A->D->E->B->C, so the visit order never has to jump back to an earlier branch.

Now if we run it on the larger dataset, we reach the same set of nodes but in a different order.

In [7]:
large_visit_order_dfs = G.dfs("0")
print(large_visit_order_dfs)

['0', '23882', '23896', '23910', '23845', '23965', '24024', '23927', '23925', '23826', '18316', '23924', '23956', '23964', '22613', '23807', '24069', '22428', '23705', '23689', '23752', '23755', '23462', '23950', '23954', '23567', '23569', '23813', '23817', '23563', '23565', '23653', '23652', '20969', '20707', '23934', '23935', '24162', '21752', '23669', '23957', '23746', '23743', '23874', '23748', '22754', '23878', '20073', '23962', '23590', '23591', '23531', '23822', '23412', '23901', '23905', '23906', '19843', '22506', '23854', '24037', '23711', '22681', '23078', '17831', '23414', '23774', '23892', '20734', '23848', '23829', '2170', '23799', '23486', '23587', '23483', '23675', '23800', '20096', '23731', '23571', '23980', '23159', '22380', '21951', '21953', '21952', '3425', '23818', '14911', '23426', '22892', '24027', '24041', '24058', '17513', '23861', '23860', '23859', '21633', '23559', '22837', '21825', '22774', '22791', '23847', '21542', '22239', '23708', '23825', '16227', '16406

You'll see that the code for BFS and DFS is very similar except for the underlying data structure. This is very common in many graph algorithms, as the important step is choosing which node to explore next. In this case, that occurs in the stack.pop()/queue.pop(0) line, which determines how the next node is chosen. Similarly, from the class lecture notes we know that Dijkstra's algorithm uses a priority queue, with the priority given by the smallest tentative distance from the source. This is another policy for picking the next best node to visit.
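One way to see how little separates the two traversals is to write them as a single function in which only the "pick the next node" policy changes. This is a sketch, not the tutorial's implementation: `traverse` and its `use_stack` flag are hypothetical names, and a `collections.deque` stands in for both the queue and the stack.

```python
from collections import deque


def traverse(adj, source, use_stack):
    """BFS when use_stack is False, DFS-style traversal when it is True."""
    if source not in adj:
        return []
    visited = {source}
    frontier = deque([source])
    order = []
    while frontier:
        # The only difference between the two traversals is this line:
        # take the newest node (stack, LIFO) or the oldest node (queue, FIFO).
        curr = frontier.pop() if use_stack else frontier.popleft()
        order.append(curr)
        for n in adj[curr]:
            if n not in visited:
                visited.add(n)
                frontier.append(n)
    return order


small = {"A": {"B": 1.0, "D": 1.0}, "B": {"C": 1.0}, "D": {"E": 1.0},
         "E": {"B": 1.0}, "C": {}}

print(traverse(small, "A", use_stack=False))  # ['A', 'B', 'D', 'C', 'E']
print(traverse(small, "A", use_stack=True))   # ['A', 'D', 'E', 'B', 'C']
```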

## Bellman-Ford Algorithm


The Bellman-Ford algorithm is used to calculate shortest paths from a source vertex to all other nodes in the graph. It serves the same purpose as Dijkstra's shortest path algorithm, but it is slower, with a running time of O(|V||E|). However, the power of this algorithm comes from its ability to handle negatively weighted edges. The one case it cannot solve is a negative weight cycle reachable from the source: going around such a cycle lowers the path cost every time, so a "shortest" path would include the cycle infinitely many times and no finite answer exists. Bellman-Ford actively checks for this condition, and will report any such occurrence.

The core of the algorithm relies on the idea of relaxation, in which every shortest path estimate is continuously updated with better approximations until reaching the optimal solution. Unlike Dijkstra's algorithm, which greedily selects the unvisited vertex with the smallest tentative distance, Bellman-Ford relaxes every edge and repeats this process |V| - 1 times, where |V| is the number of vertices in the graph.

The implementation of the algorithm is shown above in the Graph class. Note that we first set the distance of every node besides the source to infinity. This allows us to repeatedly iterate over the edges to find the lowest total weight for each path. We then make |V| - 1 passes over all the edges of the graph, since a shortest path can contain at most |V| - 1 edges. Updating the distances is pretty intuitive: we check if the current entry is worse (greater) than the potential new path that we have found.
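The relaxation passes and the final negative cycle check can also be sketched on a flat list of edges. This is a minimal standalone sketch, not the Graph class method: the function name `bellman_ford`, its `(dists, has_negative_cycle)` return value, the early exit when a pass changes nothing, and the three-node example graphs are all assumptions made for illustration.

```python
def bellman_ford(nodes, edges, source):
    """Return (dists, has_negative_cycle) for a list of (u, v, weight) edges."""
    dists = {n: float("inf") for n in nodes}
    dists[source] = 0

    # At most |V| - 1 passes, each relaxing every edge once.
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dists[u] + w < dists[v]:
                dists[v] = dists[u] + w
                changed = True
        if not changed:  # nothing improved, so the distances are final
            break

    # If any edge can still be relaxed, a negative weight cycle is reachable.
    has_negative_cycle = any(dists[u] + w < dists[v] for u, v, w in edges)
    return dists, has_negative_cycle


nodes = ["X", "Y", "Z"]

# A negative edge on its own is fine.
print(bellman_ford(nodes, [("X", "Y", 1), ("Y", "Z", -3)], "X"))
# ({'X': 0, 'Y': 1, 'Z': -2}, False)

# Y -> Z -> Y has total weight -3 + 1 = -2, a negative weight cycle.
_, cycle = bellman_ford(nodes, [("X", "Y", 1), ("Y", "Z", -3), ("Z", "Y", 1)], "X")
print(cycle)  # True
```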


In [8]:
lengths = G_test.bellmanford("A")
print(lengths)

{'A': 0, 'B': 1.0, 'D': 1.0, 'E': 2.0, 'C': 2.0}


As you can see from above, running Bellman-Ford on the small graph results in a pretty intuitive answer. B and D are immediate neighbors of A with weights of 1.0, so it's clear that the shortest path to each of these two nodes has weight 1.0. Similarly, C is connected to B by a weight of 1.0 and E is connected to D by a weight of 1.0, so the shortest path from A to each of these nodes has weight 2.0. Lastly, note that while B can be reached from E, that route (A->D->E->B, weight 3.0) is worse than the direct connection from A to B. This algorithm also includes a check at the end to detect any negative weight cycles. In the case of the small graph, all weights are non-negative, so this issue will not arise. The algorithm also works on graphs with negative weights, which gives it the advantage over Dijkstra's algorithm in those cases, at the cost of its slower O(|V||E|) running time on large graphs.

Now, let's define a slightly more complicated graph (not the big dataset) in which the idea of weight prioritization is more clear.

![alt text](med_graph.png "foo")

https://www.programiz.com/dsa/bellman-ford-algorithm

In [9]:
G_med = Graph()
G_med.add_edges([("A", "B", 4),("A", "C", 2),("B", "C", 3),("C", "B", 1),("B", "E", 3),("C", "D", 4),("B", "D", 2), ("C", "E", 5), ("E", "D", -5)])
print(G_med.edges)

{'A': {'B': 4, 'C': 2}, 'B': {'C': 3, 'E': 3, 'D': 2}, 'C': {'B': 1, 'D': 4, 'E': 5}, 'E': {'D': -5}, 'D': {}}


Next, let's run the Bellman-Ford algorithm with source vertex A.

In [10]:
med_lengths = G_med.bellmanford("A")
print(med_lengths)

{'A': 0, 'B': 3, 'C': 2, 'E': 6, 'D': 1}


You'll see that node D originally had a tentative shortest path weight of 4 + 2 = 6, since it can be reached via A->B->D. However, after several iterations of the Bellman-Ford algorithm, a better path of weight 2 + 1 + 3 + (-5) = 1 is found using the path A->C->B->E->D.
The power of this algorithm comes from its ability to process negative weights, something Dijkstra's algorithm cannot do correctly.
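The Graph class only returns distances, but the path itself can be recovered by remembering which node last improved each distance. The sketch below is a hypothetical extension (the name `shortest_path` and the `prev` dictionary are not part of the class) and assumes the graph has no negative weight cycle and that the target is reachable from the source.

```python
def shortest_path(adj, source, target):
    """Bellman-Ford with predecessor tracking; returns (distance, path)."""
    dists = {n: float("inf") for n in adj}
    prev = {n: None for n in adj}  # prev[v] = node that last improved dists[v]
    dists[source] = 0

    for _ in range(len(adj) - 1):
        for u in adj:
            for v, w in adj[u].items():
                if dists[u] + w < dists[v]:
                    dists[v] = dists[u] + w
                    prev[v] = u

    # Walk backwards from the target to the source, then reverse.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = prev[node]
    path.reverse()
    return dists[target], path


# Same structure as the medium graph above.
med = {"A": {"B": 4, "C": 2}, "B": {"C": 3, "E": 3, "D": 2},
       "C": {"B": 1, "D": 4, "E": 5}, "E": {"D": -5}, "D": {}}

print(shortest_path(med, "A", "D"))
# (1, ['A', 'C', 'B', 'E', 'D'])
```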

## Topological Sort

Topological sorting of the nodes in a graph is a common operation useful in many areas of computer science. Formally, a topological sort is a linear ordering of a graph's vertices such that for every edge (u,v), u comes before v in the ordering. One major constraint is that the graph must be a directed acyclic graph (DAG), which guarantees that at least one topological ordering exists. However, the same DAG can have multiple (correct) topological orderings.
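The definition translates almost directly into a checker. The sketch below is for illustration only: `is_topological_order` is a hypothetical helper, and the three-node DAG is made up to show that more than one ordering can be valid.

```python
def is_topological_order(adj, order):
    """Return True if order lists every node once and respects every edge."""
    if len(order) != len(adj) or set(order) != set(adj):
        return False
    position = {node: i for i, node in enumerate(order)}
    # For every edge (u, v), u must appear before v.
    return all(position[u] < position[v] for u in adj for v in adj[u])


# Both a and b must come before c; a and b are independent of each other.
dag = {"a": {"c": 1.0}, "b": {"c": 1.0}, "c": {}}

print(is_topological_order(dag, ["a", "b", "c"]))  # True
print(is_topological_order(dag, ["b", "a", "c"]))  # True
print(is_topological_order(dag, ["c", "a", "b"]))  # False
```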

The most common use case for topological sorting is in the graph representation of a set of tasks. Since some tasks are dependent on others, it's useful to represent a set of tasks as a DAG. A topological sort of these vertices and edges returns one valid order in which the tasks can be carried out.

The implementation is very similar to DFS, except that instead of recording each node as soon as it is visited, we only add a node to the stack once all of its adjacent vertices are already on the stack. In doing so, we ensure that the descendants have been ordered, and then we place the parent before them to ensure the sorting.

The implementation is shown in the class definition.

![alt text](topological_graph.png "topo_graph")

In the graph above, each edge represents a dependency between tasks. 4 and 5 are parent tasks with no dependencies.

In [11]:
G_sort = Graph()
G_sort.add_edges([("5","2"), ("5","0"), ("4","0"), ("4","1"), ("2","3"), ("3","1")])
print(G_sort.edges)

{'5': {'2': 1.0, '0': 1.0}, '4': {'0': 1.0, '1': 1.0}, '2': {'3': 1.0}, '3': {'1': 1.0}, '0': {}, '1': {}}


In [12]:
ordering = G_sort.topological_sort()
print(ordering)

['4', '5', '0', '2', '3', '1']


Notice how nodes 4 and 5 come first in the ordering. This is because both of these nodes have no dependencies and therefore can be "executed" first (if we're sticking with the pool of tasks analogy). Then comes 0, because both of its prerequisites (5 and 4) have been satisfied. Then 2, for the same reason. We couldn't have placed 3 any earlier because it depends on 2. Lastly comes 1, because both of its parents (4 and 3) now precede it. Notice that another (valid) sorting would be ['5','4','2', '3','1','0'].
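To see yet another valid ordering of the same tasks, here is a sketch of a different technique, Kahn's algorithm, which is not the DFS-based method used in the Graph class. It repeatedly takes a task whose prerequisites are all done (in-degree zero). The function name `kahn_topological_sort` is hypothetical, and raising an error on a cycle is a design choice made for this sketch.

```python
from collections import deque


def kahn_topological_sort(adj):
    """Topological ordering via Kahn's algorithm; raises ValueError on a cycle."""
    # Count incoming edges (unsatisfied prerequisites) for every node.
    indegree = {node: 0 for node in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] += 1

    ready = deque(node for node in adj if indegree[node] == 0)
    order = []
    while ready:
        curr = ready.popleft()
        order.append(curr)
        for v in adj[curr]:
            indegree[v] -= 1       # one prerequisite of v is now done
            if indegree[v] == 0:
                ready.append(v)

    if len(order) != len(adj):
        raise ValueError("graph contains a cycle, so no topological ordering exists")
    return order


# Same structure as the task graph above.
tasks = {"5": {"2": 1.0, "0": 1.0}, "4": {"0": 1.0, "1": 1.0},
         "2": {"3": 1.0}, "3": {"1": 1.0}, "0": {}, "1": {}}

print(kahn_topological_sort(tasks))
# ['5', '4', '2', '0', '3', '1']
```

This ordering differs from the one printed above, yet every edge still points from an earlier task to a later one.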