# [CptS 215 Data Analytics Systems and Algorithms](https://github.com/gsprint23/cpts215)
[Washington State University](https://wsu.edu)

[Gina Sprint](http://eecs.wsu.edu/~gsprint/)
# Graph Implementation
Learner objectives for this lesson:
* Implement a graph in two different ways
    * Adjacency matrices
    * Adjacency lists
* Compare the different implementation approaches
* Implement common graph traversal algorithms


## Acknowledgments
Content used in this lesson is based upon information in the following sources:
* [Miller and Ranum](http://interactivepython.org/runestone/static/pythonds/index.html)

## Implementation Overview
As we have seen with the previous abstract data types, there are several ways to implement a graph ADT. In this lesson, we will cover, implement, and compare two different implementations:
1. Adjacency matrices
1. Adjacency lists

## Adjacency Matrices
An adjacency matrix is a two-dimensional matrix where each vertex $v_{i}$ is assigned row $i$ and column $i$. If two vertices $v_{i}$ and $v_{j}$ are adjacent, then there is a 1 in the $i$th row and $j$th column of the matrix. If two vertices are not adjacent, then a 0 is placed in the respective row and column.

Consider the example graph: 
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/6n-graf.svg/640px-6n-graf.svg.png" width="300">
(image from [https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/6n-graf.svg/640px-6n-graf.svg.png](https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/6n-graf.svg/640px-6n-graf.svg.png)) 

An adjacency matrix for the above graph, shown pictorially:

||1|2|3|4|5|6|
|-|-|-|-|-|-|-|
|1|0|1|0|0|1|0|
|2|1|0|1|0|1|0|
|3|0|1|0|1|0|0|
|4|0|0|1|0|1|1|
|5|1|1|0|1|0|0|
|6|0|0|0|1|0|0|

And using lists:

```
amatrix = [[0,1,0,0,1,0],
           [1,0,1,0,1,0],
           [0,1,0,1,0,0],
           [0,0,1,0,1,1],
           [1,1,0,1,0,0],
           [0,0,0,1,0,0]]
```

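To find out if two vertices $v_{i}$ and $v_{j}$ are adjacent, we simply check whether there is a 1 in `amatrix[i][j]`, which is constant time, $\mathcal{O}(1)$. Because the vertices above are numbered from 1 and Python lists are indexed from 0, the entry for $v_{i}$ and $v_{j}$ in this particular list is `amatrix[i - 1][j - 1]`.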
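
The lookup can be sketched as a small helper. The function name `are_adjacent` is an assumption made for illustration and is not part of the lesson's code; it takes vertex numbers starting at 1 and shifts them to list indices.

```python
# Adjacency matrix for the example graph (vertices are numbered 1..6).
amatrix = [[0, 1, 0, 0, 1, 0],
           [1, 0, 1, 0, 1, 0],
           [0, 1, 0, 1, 0, 0],
           [0, 0, 1, 0, 1, 1],
           [1, 1, 0, 1, 0, 0],
           [0, 0, 0, 1, 0, 0]]

def are_adjacent(matrix, i, j):
    """Return True if vertices i and j (numbered from 1) share an edge."""
    # Shift by one because Python lists are indexed from 0.
    return matrix[i - 1][j - 1] == 1

print(are_adjacent(amatrix, 1, 5))  # True
print(are_adjacent(amatrix, 1, 3))  # False
```

The lookup is a pair of index operations, so its cost does not depend on the number of vertices.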

The size of this matrix is the number of vertices squared, which can be quite large. An adjacency matrix in which the majority of the entries are 0 is said to be *sparse* and is not an effective use of memory. This shortcoming can be overcome with a [*sparse matrix* representation](https://en.wikipedia.org/wiki/Sparse_matrix#Storing_a_sparse_matrix), such as adjacency lists. As we will see, we trade off space for lookup time complexity.
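
As a rough illustration of sparsity, the following sketch counts how many entries of the example matrix are actually used. The variable names are assumptions chosen for this example.

```python
# Adjacency matrix for the example graph (vertices are numbered 1..6).
amatrix = [[0, 1, 0, 0, 1, 0],
           [1, 0, 1, 0, 1, 0],
           [0, 1, 0, 1, 0, 0],
           [0, 0, 1, 0, 1, 1],
           [1, 1, 0, 1, 0, 0],
           [0, 0, 0, 1, 0, 0]]

# The matrix always stores |V|^2 entries, no matter how many edges exist.
total_entries = len(amatrix) ** 2
# Each 1 marks one direction of an edge, so the count is twice the number of edges.
nonzero_entries = sum(sum(row) for row in amatrix)

print(total_entries)                    # 36
print(nonzero_entries)                  # 14
print(total_entries - nonzero_entries)  # 22
```

Even in this small graph, most of the stored entries are 0. The gap grows quickly as vertices are added without a matching growth in edges.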

## Adjacency Lists
An adjacency list is a list of vertices where each vertex has a list of adjacent vertices. Each vertex in the list of adjacent vertices represents an edge. 

Consider the example graph: 
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/6n-graf.svg/640px-6n-graf.svg.png" width="300">
(image from [https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/6n-graf.svg/640px-6n-graf.svg.png](https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/6n-graf.svg/640px-6n-graf.svg.png)) 

An adjacency list for the above graph implemented using Python dictionaries:

```
alist = {1: [2, 5],
         2: [1, 3, 5],
         3: [2, 4],
         4: [3, 5, 6],
         5: [1, 2, 4],
         6: [4]}
```

The size of this dictionary is the number of vertices (keys) plus two times the number of edges (each edge appears once in the graph but is stored in two neighbor lists, one for each endpoint). To find out if two vertices $v_{i}$ and $v_{j}$ are adjacent, we look up $v_{i}$ and walk through each item in $v_{i}$'s edge list, looking for $v_{j}$. In the worst case, $v_{i}$ is adjacent to every other vertex in the graph and this list traversal is $\mathcal{O}(|V|)$.
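
A minimal sketch of the lookup and of the size calculation for the example graph follows. The helper name `are_adjacent` is an assumption made for illustration.

```python
# Adjacency list for the example graph (vertices are numbered 1..6).
alist = {1: [2, 5],
         2: [1, 3, 5],
         3: [2, 4],
         4: [3, 5, 6],
         5: [1, 2, 4],
         6: [4]}

def are_adjacent(alist, i, j):
    """Return True if j appears in i's neighbor list (a linear scan of that list)."""
    return j in alist[i]

num_vertices = len(alist)
# Every undirected edge is stored twice, once in each endpoint's list.
num_edges = sum(len(neighbors) for neighbors in alist.values()) // 2

print(are_adjacent(alist, 4, 6))     # True
print(are_adjacent(alist, 1, 3))     # False
print(num_vertices + 2 * num_edges)  # 20
```

The 20 stored items (6 keys plus 14 neighbor entries) compare with 36 entries in the adjacency matrix for the same graph.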

## Weighted Graphs
In the case of a weighted graph, we need to store additional information for each edge: the weight. With the adjacency matrix, we either need to store an object (such as the weight itself or a tuple) instead of 1, or we need to maintain a parallel matrix with this information. With the adjacency list, we can modify our dictionary values to be instances of a custom `Vertex` class.
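
For the matrix option, one possible sketch stores the weight directly in each entry and uses `None` where there is no edge. The weights here are made-up values (the product of the two vertex numbers), and the names `wmatrix` and `get_weight` are assumptions made for illustration.

```python
# Undirected edges of the example graph (vertices are numbered 1..6).
edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
n = 6

# None means "no edge", which keeps a genuine weight of 0 distinguishable.
wmatrix = [[None] * n for _ in range(n)]
for i, j in edges:
    weight = i * j  # dummy weight, for illustration only
    wmatrix[i - 1][j - 1] = weight
    wmatrix[j - 1][i - 1] = weight  # the graph is undirected, so mirror the entry

def get_weight(matrix, i, j):
    """Return the weight of the edge between vertices i and j, or None if absent."""
    return matrix[i - 1][j - 1]

print(get_weight(wmatrix, 4, 6))  # 24
print(get_weight(wmatrix, 1, 3))  # None
```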

The `Vertex` class will store the name of the vertex and a dictionary of adjacent vertex names (keys) and edge weights (values). Then, the adjacency list is a dictionary of vertex names (keys) and `Vertex` objects (values). We will define a `Graph` class to wrap this adjacency list dictionary with our graph ADT methods:
1. `add_vertex(vert)` creates a `Vertex` for `vert` and adds it to the graph.
1. `add_edge(from_vert, to_vert)` adds a new, directed edge to the graph that connects two vertices.
1. `add_edge(from_vert, to_vert, weight)` adds a new, weighted, directed edge to the graph that connects two vertices.
1. `get_vertex(vert_key)` finds the vertex in the graph named `vert_key`.
1. `get_vertices()` returns the names of all vertices in the graph.
1. `in` returns `True` for a statement of the form `vertex in graph` if the given vertex is in the graph, `False` otherwise.
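
Before wrapping things in classes, the underlying data layout can be sketched with nested dictionaries alone: vertex names map to dictionaries of neighbor names and weights. This is a simplified stand-in rather than the class-based design described above, and the names `wlist` and `add_edge` are assumptions made for illustration.

```python
# Hypothetical nested-dictionary layout: vertex name -> {neighbor name: weight}
wlist = {}

def add_edge(graph, from_vert, to_vert, weight=0):
    """Add a directed, weighted edge, creating either vertex if it is missing."""
    graph.setdefault(from_vert, {})
    graph.setdefault(to_vert, {})
    graph[from_vert][to_vert] = weight

add_edge(wlist, 1, 2, 2)
add_edge(wlist, 2, 1, 2)
add_edge(wlist, 4, 6, 24)  # one direction only, so 6 has no outgoing edges

print(wlist[1][2])    # 2
print(6 in wlist)     # True
print(wlist[6])       # {}
print(sorted(wlist))  # [1, 2, 4, 6]
```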

Let's implement a digraph ADT using this adjacency list approach.

### Adjacency List Implementation


```
class Vertex:
    '''
    Keeps track of the vertices to which it is connected, and the weight of each edge.
    '''
    def __init__(self, key):
        '''
        Create a vertex named key with no neighbors.
        '''
        self.ID = key
        # maps neighboring Vertex objects (keys) to edge weights (values)
        self.connected_to = {}

    def add_neighbor(self, neighbor, weight=0):
        '''
        Add a connection from this vertex to another.
        '''
        self.connected_to[neighbor] = weight

    def __str__(self):
        '''
        Returns the IDs of all of the vertices in the adjacency list, as represented by the connected_to instance variable.
        '''
        return str(self.ID) + ' connected to: ' + str([x.ID for x in self.connected_to])

    def get_connections(self):
        '''
        Returns the Vertex objects adjacent to this vertex.
        '''
        return self.connected_to.keys()

    def get_ID(self):
        '''
        Returns the name of this vertex.
        '''
        return self.ID

    def get_weight(self, neighbor):
        '''
        Returns the weight of the edge from this vertex to the vertex passed as a parameter.
        '''
        return self.connected_to[neighbor]
    
class Graph:
    '''
    Contains a dictionary that maps vertex names to vertex objects. 
    '''
    def __init__(self):
        '''
        Create an empty graph.
        '''
        self.vert_list = {}
        self.num_vertices = 0
        
    def __str__(self):
        '''
        Returns one "(from, to)" line per directed edge.
        '''
        edges = ""
        for vert in self.vert_list.values():
            for vert2 in vert.get_connections():
                edges += "(%s, %s)\n" %(vert.get_ID(), vert2.get_ID())
        return edges

    def add_vertex(self, key):
        '''
        Adds a new vertex named key to the graph and returns it.
        '''
        self.num_vertices = self.num_vertices + 1
        new_vertex = Vertex(key)
        self.vert_list[key] = new_vertex
        return new_vertex

    def get_vertex(self, n):
        '''
        Returns the vertex named n, or None if there is no such vertex.
        '''
        if n in self.vert_list:
            return self.vert_list[n]
        else:
            return None

    def __contains__(self, n):
        '''
        Supports the in operator.
        '''
        return n in self.vert_list

    def add_edge(self, f, t, cost=0):
        '''
        Connects vertex f to vertex t with a directed edge, adding either vertex if it is missing.
        '''
        if f not in self.vert_list:
            self.add_vertex(f)
        if t not in self.vert_list:
            self.add_vertex(t)
        self.vert_list[f].add_neighbor(self.vert_list[t], cost)

    def get_vertices(self):
        '''
        Returns the names of all of the vertices in the graph.
        '''
        return self.vert_list.keys()

    def __iter__(self):
        '''
        Supports for loops over the Vertex objects in the graph.
        '''
        return iter(self.vert_list.values())
    
# test out the Graph class by making the example graph we have been using (vertices 1,...,6)
# weights are set to dummy values
g = Graph()
for i in range(1, 7):
    g.add_vertex(i)
    
# each edge is two-way
g.add_edge(1, 2, 1 * 2)
g.add_edge(2, 1, 1 * 2)

g.add_edge(1, 5, 1 * 5)
g.add_edge(5, 1, 1 * 5)

g.add_edge(2, 5, 2 * 5)
g.add_edge(5, 2, 2 * 5)

g.add_edge(2, 3, 2 * 3)
g.add_edge(3, 2, 2 * 3)

g.add_edge(3, 4, 3 * 4)
g.add_edge(4, 3, 3 * 4)

g.add_edge(4, 5, 4 * 5)
g.add_edge(5, 4, 4 * 5)

g.add_edge(4, 6, 4 * 6)
g.add_edge(6, 4, 4 * 6)
print(g)
```

```
(1, 5)
(1, 2)
(2, 5)
(2, 3)
(2, 1)
(3, 2)
(3, 4)
(4, 3)
(4, 5)
(4, 6)
(5, 1)
(5, 2)
(5, 4)
(6, 4)
```



## Graph Traversal Algorithms
Similar to trees, we need traversal algorithms that start at a vertex and visit every other vertex in a graph. The order in which we visit the vertices depends on the algorithm. The most common graph traversal algorithms include:
* Breadth first search (BFS)
* Depth first search (DFS)

### Breadth First Search
Big picture idea: explore the closest neighborhoods first

BFS starts at a vertex and visits each vertex that is distance 1 away, then each vertex that is distance 2 away, until all connected vertices have been visited. No vertex will be visited twice.

Some vertices have multiple adjacent vertices. For these vertices a decision will need to be made about the order in which to visit the neighbor vertices. Consequently, a BFS traversal is not unique.

The BFS algorithm makes use of a queue to keep track of the vertices to visit, which we will call the `frontier_queue` (these are vertices that have been discovered but not processed). The algorithm also tracks the discovered (possibly not yet processed) vertices in a set (a collection without duplicates). We will call the set `discovered_set`.
1. Enqueue the starting vertex to `frontier_queue`
1. Add the starting vertex to `discovered_set`
1. While `frontier_queue` is not empty
    1. Dequeue `frontier_queue`. Call this vertex `V`
    1. Process `V`
    1. For each adjacent vertex `AV` of `V`
        1. If `AV` not in `discovered_set`:
            1. Enqueue `AV` to `frontier_queue`
            1. Add `AV` to `discovered_set`
            
BFS's time complexity is $\mathcal{O}(|V| + |E|)$, where $|V|$ is the number of vertices and $|E|$ is the number of edges, because the outer `while` loop processes each vertex in the graph that is connected to the starting vertex and the inner `for` loop processes the edges. 

The worst-case scenario is that every vertex is reachable from the starting vertex (for example, the graph is a single linear chain), in which case we walk through each vertex in the graph.

```
from collections import deque
# a double ended queue with support for
# append
# append left
# pop
# pop left

def bfs(g, start):
    '''
    enqueue: append left
    dequeue: pop right
    '''
    frontier_queue = deque()
    frontier_queue.appendleft(start)
    discovered_set = set([start])
    
    while len(frontier_queue) > 0:
        # dequeue the oldest discovered vertex and process it
        curr_v = frontier_queue.pop()
        print(curr_v)
        for adj_v in curr_v.get_connections():
            # only enqueue vertices that have not been discovered yet
            if adj_v not in discovered_set:
                frontier_queue.appendleft(adj_v)
                discovered_set.add(adj_v)
bfs(g, g.get_vertex(1))
```

```
1 connected to: [5, 2]
5 connected to: [1, 2, 4]
2 connected to: [5, 3, 1]
4 connected to: [3, 5, 6]
3 connected to: [2, 4]
6 connected to: [4]
```
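
The "distance 1, then distance 2" behavior can be made explicit by recording how many edges separate each vertex from the start. The following is a minimal sketch on a plain dictionary adjacency list rather than the `Graph` class; the function name `bfs_distances` is an assumption made for illustration. It enqueues on the right and dequeues on the left, which is the mirror image of the `deque` usage above and gives the same first-in, first-out order.

```python
from collections import deque

# Adjacency list for the example graph (vertices are numbered 1..6).
alist = {1: [2, 5],
         2: [1, 3, 5],
         3: [2, 4],
         4: [3, 5, 6],
         5: [1, 2, 4],
         6: [4]}

def bfs_distances(alist, start):
    """Return a dict mapping each reachable vertex to its edge distance from start."""
    distances = {start: 0}  # doubles as the discovered set
    frontier_queue = deque([start])
    while frontier_queue:
        curr_v = frontier_queue.popleft()  # dequeue
        for adj_v in alist[curr_v]:
            if adj_v not in distances:
                # adj_v is one edge farther from start than curr_v
                distances[adj_v] = distances[curr_v] + 1
                frontier_queue.append(adj_v)  # enqueue
    return distances

print(bfs_distances(alist, 1))
# {1: 0, 2: 1, 5: 1, 3: 2, 4: 2, 6: 3}
```

The vertices are discovered in non-decreasing order of distance, which is the defining property of a breadth first traversal.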


### Depth First Search 
Big picture idea: explore paths as deeply as they go

DFS starts at a vertex and visits each vertex along each path before backtracking on the path. No vertex will be visited twice.

Some vertices have multiple adjacent vertices. For these vertices a decision will need to be made about the order in which to visit the neighbor vertices. Consequently, a DFS traversal is not unique.

The DFS algorithm makes use of a stack to keep track of the vertices to visit, which we will call the `frontier_stack` (these are vertices that have been discovered but not processed). The algorithm also tracks the discovered (possibly not yet processed) vertices in a set (a collection without duplicates). We will call the set `discovered_set`.
1. Push the starting vertex to `frontier_stack`
1. While `frontier_stack` is not empty
    1. Pop `frontier_stack`. Call this vertex `V`
    1. If `V` not in `discovered_set`:
        1. Process `V`
        1. Add `V` to `discovered_set`
        1. For each adjacent vertex `AV` of `V`
            1. Push `AV` to `frontier_stack`
            
DFS's time complexity is $\mathcal{O}(|V| + |E|)$, where $|V|$ is the number of vertices and $|E|$ is the number of edges, because the outer `while` loop processes each vertex in the graph that is connected to the starting vertex and the inner `for` loop processes the edges. 

The worst-case scenario is that every vertex is reachable from the starting vertex (for example, the graph is a single linear chain), in which case we walk through each vertex in the graph.

```
from collections import deque
# a double ended queue with support for
# append
# append left
# pop
# pop left

def dfs(g, start):
    '''
    push: append right
    pop: pop right
    '''
    frontier_stack = deque()
    frontier_stack.append(start)
    discovered_set = set()
    
    while len(frontier_stack) > 0:
        # pop the most recently pushed vertex
        curr_v = frontier_stack.pop()
        # a vertex may be pushed more than once, so only process it the first time
        if curr_v not in discovered_set:
            print(curr_v)
            discovered_set.add(curr_v)
            for adj_v in curr_v.get_connections():
                frontier_stack.append(adj_v)
                
dfs(g, g.get_vertex(1))
```

```
1 connected to: [5, 2]
2 connected to: [5, 3, 1]
3 connected to: [2, 4]
4 connected to: [3, 5, 6]
6 connected to: [4]
5 connected to: [1, 2, 4]
```
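
To see why a DFS traversal is not unique, the following sketch runs the same stack-based steps on a plain dictionary adjacency list, once with the neighbor lists as given and once with each list reversed. The function name `dfs_order` and the variable `alist_reversed` are assumptions made for illustration.

```python
# Adjacency list for the example graph (vertices are numbered 1..6).
alist = {1: [2, 5],
         2: [1, 3, 5],
         3: [2, 4],
         4: [3, 5, 6],
         5: [1, 2, 4],
         6: [4]}

def dfs_order(alist, start):
    """Return the vertices reachable from start in the order DFS processes them."""
    frontier_stack = [start]  # a Python list works as a stack
    discovered_set = set()
    order = []
    while frontier_stack:
        curr_v = frontier_stack.pop()
        if curr_v not in discovered_set:
            order.append(curr_v)
            discovered_set.add(curr_v)
            for adj_v in alist[curr_v]:
                frontier_stack.append(adj_v)
    return order

# Same graph, but every neighbor list is visited in the opposite order.
alist_reversed = {v: list(reversed(neighbors)) for v, neighbors in alist.items()}

print(dfs_order(alist, 1))           # [1, 5, 4, 6, 3, 2]
print(dfs_order(alist_reversed, 1))  # [1, 2, 3, 4, 5, 6]
```

Both results are valid depth first traversals of the same graph. They differ only because the last neighbor pushed is the first one popped.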


## Practice Problems

### 1
<img src="https://upload.wikimedia.org/wikipedia/commons/5/5f/CPT-Graphs-undirected-weighted.svg" width="300">
(image from [https://upload.wikimedia.org/wikipedia/commons/5/5f/CPT-Graphs-undirected-weighted.svg](https://upload.wikimedia.org/wikipedia/commons/5/5f/CPT-Graphs-undirected-weighted.svg)) 

Represent the above graph using an adjacency list.

### 2
Represent the above graph using an adjacency matrix.

### 3
For the above graph:
1. What is the size of the set $V$ (that is, what is $|V|$)?
1. What is the size of the set $E$ (that is, what is $|E|$)?
1. Show the breadth first search.
1. Show the depth first search.