In [1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.sparse.csgraph import *
from scipy.sparse import csr_array

# Introduction: Graphs

A [graph](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)) $G = (V, E)$ is a mathematical construct consisting of both
- A set $V$ of vertices (read: nodes), and 
- A set $E$ of edges (read: connections) consisting of pairs of vertices from $V$.

In fact, we have already seen a type of graph in the form of [trees](https://en.wikipedia.org/wiki/Tree_(graph_theory)).  
Trees happen to be undirected acyclic graphs (terms defined below).  
A representative problem in graph theory is the [seven bridges of Konigsberg](https://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg).  
> **Given:** A map of Konigsberg from Wikipedia.  
**Problem:** Is there a path that crosses all bridges exactly once?  
<img src="img/09-10_kberg.png" style="width: 30em" />  
> **Answer:** No (from Euler).   
**Reason:** Ignore all the objects (the buildings, the water, the land, etc.) and details (the shapes and distances) in the image.  
The only relevant information is the fact that there are 4 land masses (two river banks and two islands) with a number of connections between them.  
This reduces the map to the drawing on the right, where land masses are abstracted as vertices and bridges are edges between them.  
<img src="img/09-10_7bk.png" style="width: 50em" />  
Suppose there *were* a hypothetical path to cross each bridge exactly once.  
Then aside from the start and end, every vertex has to have an even number of edges (each visit uses one edge for entry and one for exit).  
In other words, at most 2 vertices can have an odd number of edges.  
Since all 4 vertices have an odd number of edges, such a path is not possible on this configuration.  
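
The parity argument can be checked numerically. Below is a minimal sketch, assuming the land masses are labelled A (north bank), B (south bank), C (central island) and D (eastern island); the labels and their ordering are my own choice and do not come from the figure.

```python
import numpy as np

# Number of bridges between each pair of land masses (a multigraph).
# Assumed order: A (north bank), B (south bank), C (central island), D (eastern island).
bridges = np.array([[0, 0, 2, 1],
                    [0, 0, 2, 1],
                    [2, 2, 0, 1],
                    [1, 1, 1, 0]])

degrees = bridges.sum(axis=1)            # bridge ends touching each land mass
n_odd = int(np.sum(degrees % 2 == 1))    # how many vertices have odd degree

# A connected graph admits such a path only if 0 or 2 vertices have odd degree.
has_euler_path = n_odd in (0, 2)
print(degrees, n_odd, has_euler_path)    # [3 3 5 3] 4 False
```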

Fundamentally, graphs are reductive/invariant representations of *networks,* and the study of graphs is integral to network science.   
Here are some networks you may be familiar with, and where graph theory has been applied: 
- Social networks
- Telecom
- Supply chains
- Natural language/linguistics

### Flavors of Graphs

Given graph $G = (V, E)$, some possible descriptors of it are: 
- **Directed/Undirected:**  
In *undirected* graphs, if edge $(x, y) \in E$, then the edge in the opposite direction $(y, x) \in E$ as well.  
*Directed* graphs have no such restriction.  
- **Weighted/Unweighted:**  
In *weighted* graphs, a number (read: weight) can be assigned to edges between vertices.  
Such a number can refer to any number of properties, including distance and association strength.  
We will explore algorithms on weighted graphs in the next session.  
For now, we will consider *unweighted* graphs, where we only know the connectivity of the vertices.  
- **Simple/Non-simple:**  
*Self-edges* are edges which start and end at the same vertex.  
*Multi-edges* are edges which occur more than once in a graph.  
*Simple* graphs contain neither self-edges nor multi-edges.  
- **Dense/Sparse:**  
A graph is either *dense* or *sparse* depending on the number of edges relative to the number of vertices.  
There is no formal definition that delineates the two, but roughly, one expects a sparse graph of $N$ vertices to have $\propto N$ edges.  
Dense graphs would have on the order of $N^2$ edges.  
- **Cyclic/Acyclic:**  
A graph is *cyclic* if it contains a cycle, and *acyclic* otherwise.  
A *cycle* is a path (read: directed sequence of edges) that starts and ends at the same vertex.  
An important class of graphs for scheduling applications is the *directed acyclic graph (DAG)*.
- **Embedded/Topological:**  
*Embedded* graphs are those whose vertices and edges have geometric positions assigned to them.  
Topological graphs are independent of geometry (see: Konigsberg graph above).  


# Representations of Graphs

Euler's problem with his Konigsberg solution was that he had no formal theory behind it (to be fair, he was inventing it).  
It's one thing to draw pictures, but it's mostly useless if we can't express it with math and numbers.  
From our perspective, we would at least have trouble getting it into the computer for our algorithms.  
Fortunately, the mathematics and computer science have been worked out, and much of it should be familiar to you from linear algebra and hashing.  
There are 2 general representations of graphs: **adjacency lists** and **adjacency matrices**.  

Before proceeding, I will make one clarification.  
The book claims that between lists and matrices, the former are more useful for most graph problems.  
This is true in general because most graph problems you would try to handle are sparse, and lists are space efficient.  
This *would* be true for us if we could directly manage memory and make generic linked lists easily.  
As it stands, your coursework and Python itself are better suited to using arrays and matrices, so that will be our focus.  

## Adjacency Lists
Suppose a graph $G = (V, E)$ contains $N$ vertices and $M$ edges.  
We can represent it with an $N$-length array `L` of pointers to linked lists, where 
1. Each index $i$ corresponds to a vertex. 
1. Element $L_i$ points to/contains a linked list of neighbors of vertex $i$. 

You may recognize the similarity to chaining from hashing and hash-collisions.  
It carries many of the advantages (low space requirements, conceptually simple, fast linked list operations).  
One disadvantage for graph analysis is that we may occasionally want to do linear algebra on the graphs, and this structure does not easily support this. 
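
In Python, the closest everyday equivalent is a list of lists. Here is a minimal sketch, assuming vertices are labelled `0..N-1` and edges arrive as `(u, v)` pairs; the helper name `adjacency_list` is hypothetical.

```python
def adjacency_list(n_vertices, edges, directed=False):
    """Build a list-of-lists adjacency structure from (u, v) pairs."""
    adj = [[] for _ in range(n_vertices)]
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)   # undirected: store the edge in both directions
    return adj

edges = [(0, 1), (0, 2)]
adj = adjacency_list(3, edges)
print(adj)   # [[1, 2], [0], [0]]
```

Python lists are dynamic arrays rather than linked lists, but the idea is the same: `adj[i]` holds the neighbors of vertex `i`, and the total storage is proportional to $N + M$.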

## Adjacency Matrices 
Suppose a graph $G = (V, E)$ contains $N$ vertices and $M$ edges.  
We can represent it with an $N\times N$ adjacency matrix `M` , where 
1. Each row/column i corresponds to a vertex
1. Element $M_{ij}$ is either $1$ if $(i, j)$ is an edge of $G$, else $0$. 

Mathematically, this representation is far more convenient/amenable to theory-work via [graph polynomials](https://en.wikipedia.org/wiki/Graph_polynomial).  
Among other things, $M^k$ tells you the number of $k$-length walks (paths that are allowed to repeat vertices and edges) between any two vertices.  
Many of the properties of the graph also translate to the matrix, e.g. 
- Undirected $\rightarrow$ symmetric. 
- Simple $\rightarrow$ boolean and $\forall i \quad M_{ii} = 0$ 
- Cyclic (directed graphs) $\rightarrow \exists k>1$ such that $Tr(M^k) \neq 0$ 
- Sparse $\rightarrow$ sparse.  
- DAG $\rightarrow $ nilpotent. 
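
A few of these correspondences can be checked directly with `numpy`. This is a minimal sketch on a small hypothetical DAG `A` of my own choosing (it does not appear elsewhere in these notes).

```python
import numpy as np

# Hypothetical directed graph: 0 -> 1, 0 -> 2, 1 -> 2 (a small DAG).
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])
N = A.shape[0]

is_undirected = np.array_equal(A, A.T)                       # symmetric?
is_simple = bool(np.all((A == 0) | (A == 1)) and np.all(np.diag(A) == 0))

A2 = np.linalg.matrix_power(A, 2)    # entry (i, j) counts 2-step walks i -> j
AN = np.linalg.matrix_power(A, N)    # an N x N nilpotent matrix satisfies A^N = 0
is_nilpotent = not AN.any()

# Traces of A^k for k = 1..N; all zero means no closed walks, hence no cycles.
traces = [int(np.trace(np.linalg.matrix_power(A, k))) for k in range(1, N + 1)]
print(is_undirected, is_simple, is_nilpotent, traces)
```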

Unfortunately a direct implementation of the matrix is also incredibly space-inefficient for sparse graphs since the matrix size grows as $N^2$.  

## COO & CSR Format
If we know the graph/matrix is sparse, then we can reduce the size it takes up in memory.  
Given an $M \times N$ matrix, a *coordinate list (COO)* format stores all $X$ non-zero elements as 3 arrays:
- `V`: Non-zero values (array length $X$) 
- `C`: Column index of values (array length $X$, values in $[0, N-1]$). 
- `R`: Row index of values (array length $X$, values in $[0, M-1]$). 

Since $X \approx N$ in sparse graphs, this representation grows as $N$.  
However, this representation is also slow at arithmetic/linear algebra (think about why).  
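
As an illustration, the three COO arrays can be extracted from a dense matrix with plain `numpy`. This is only a sketch of the format; the variable names `R`, `C`, `V` follow the list above, and the matrix is a small example of my choosing.

```python
import numpy as np

G_dense = np.array([[0, 2, 1],
                    [2, 0, 0],
                    [1, 0, 0]])

R, C = np.nonzero(G_dense)   # row and column index of every non-zero entry
V = G_dense[R, C]            # the non-zero values themselves

print(V)   # [2 1 2 1]
print(C)   # [1 2 0 0]
print(R)   # [0 0 1 2]

# Round trip: scatter the triplets back into a dense matrix.
rebuilt = np.zeros_like(G_dense)
rebuilt[R, C] = V
```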

An alternative is the *compressed sparse row (CSR)* format. 
Given an $M \times N$ matrix, CSR stores all $X$ non-zero elements as 3 arrays:
- `V`: Non-zero values (array length $X$) 
- `C`: Column index of values (array length $X$, values in $[0, N-1]$). 
- `R`: Starting and ending indices of V and C per row (array length $M+1$, values in $[0, X]$)

This still grows as $N$ for sparse matrices, but now it supports fast arithmetic/linear algebra (again, think about why, especially compared to COO). 
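
One way to see the difference is to build the row pointer `R` by hand from row-sorted COO triplets. This is a minimal sketch with hypothetical names (`coo_rows`, `indptr`, `row_entries`), not how scipy does it internally.

```python
import numpy as np

# COO triplets of a 3 x 3 example matrix, sorted by row.
coo_rows = np.array([0, 0, 1, 2])
C = np.array([1, 2, 0, 0])
V = np.array([2, 1, 2, 1])
n_rows = 3

# indptr[i]:indptr[i+1] is the slice of V and C that belongs to row i.
counts = np.bincount(coo_rows, minlength=n_rows)      # non-zeros per row
indptr = np.concatenate(([0], np.cumsum(counts)))

def row_entries(i):
    """Return (column indices, values) of the non-zeros in row i."""
    lo, hi = indptr[i], indptr[i + 1]
    return C[lo:hi], V[lo:hi]

print(indptr)           # [0 2 3 4]
print(row_entries(0))   # columns [1 2], values [2 1]
```

With CSR, the entries of a row are one contiguous slice, whereas with unsorted COO triplets you would have to scan the whole row array to find them.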

**Note**: There is also a column-equivalent version *(CSC)* which is almost entirely analogous to CSR.  

## Python Graph Representations

Graph problems are as old as the hills.  
So even if Python is not great at making linked structures out of the box, many have already made libraries specifically for graphs.  
Some notable libraries and their implementations:
- **Scipy:** sparse matrices
- **NetworkX:** nested dictionaries (a dict-of-dicts adjacency structure)
- **PyTorch Geometric:** (effectively) dictionaries with tensors

In your own time, go through these libraries and make sure you understand why they are implemented the way they are.  
For this course I will go with [scipy](https://docs.scipy.org/doc/scipy/reference/sparse.csgraph.html) because: 
1. The documentation and source code are very approachable (and I will insist, **READ THE DOCUMENTATION**). 
1. The implementation already supports a lot of the algorithms we want to do. 
1. I want to make sure you have it installed.

In the demo code below, I show the CSR representation of a matrix in scipy using both `csr_array` and `csgraph_from_dense`.  
Either would work for representing a graph, though one distinction is in how they interpret multiplication.  
For both `numpy` arrays and `csr_array`, `*` performs element-wise multiplication while `@` performs matrix multiplication (dot products of rows and columns).  
`csgraph_from_dense` instead returns a `csr_matrix` (the older sparse *matrix* interface), for which both `*` and `@` are interpreted as matrix multiplication.  
**If you want to avoid issues related to this**, be specific and use `@` wherever you intend matrix multiplication.  

In [2]:
# Demo code

G_dense = np.array([[0, 2, 1],
                    [2, 0, 0],
                    [1, 0, 0]])

print('Matrix (dense):')
print(G_dense, '\n')

G_sparse = csr_array(G_dense)
print('csr_array: ')
print(G_sparse.data)
print(G_sparse.indices)
print(G_sparse.indptr, '\n')

G2_sparse = csgraph_from_dense(G_dense, null_value=0)
print('csgraph:')
print(G2_sparse.data)
print(G2_sparse.indices)
print(G2_sparse.indptr)

Matrix (dense):
[[0 2 1]
 [2 0 0]
 [1 0 0]] 

csr_array: 
[2 1 2 1]
[1 2 0 0]
[0 2 3 4] 

csgraph:
[2. 1. 2. 1.]
[1 2 0 0]
[0 2 3 4]


In [3]:
# Demo of sparse products under "*"
# The first two will be element-wise products, while the last is matrix multiplication
print('"*" products \n')
print('Matrix (dense):')
print(G_dense*G_dense)
print('csr_array: ')
print(G_sparse*G_sparse)
print('csgraph:')
print((G2_sparse*G2_sparse))

"*" products 

Matrix (dense):
[[0 4 1]
 [4 0 0]
 [1 0 0]]
csr_array: 
  (0, 1)	4
  (0, 2)	1
  (1, 0)	4
  (2, 0)	1
csgraph:
  (0, 0)	5.0
  (1, 2)	2.0
  (1, 1)	4.0
  (2, 2)	1.0
  (2, 1)	2.0


In [4]:
# Demo of sparse products under "@"
# All of them will be matrix multiplication
print('"@" products \n')
print('Matrix (dense): ')
print(G_dense@G_dense)
print('csr_array: ')
print(G_sparse@G_sparse)
print('csgraph: ')
print(G2_sparse@G2_sparse)

"@" products 

Matrix (dense): 
[[5 0 0]
 [0 4 2]
 [0 2 1]]
csr_array: 
  (0, 0)	5
  (1, 2)	2
  (1, 1)	4
  (2, 2)	1
  (2, 1)	2
csgraph: 
  (0, 0)	5.0
  (1, 2)	2.0
  (1, 1)	4.0
  (2, 2)	1.0
  (2, 1)	2.0


# Graph Traversal 

One of the most fundamental graph problems is to traverse every edge and vertex.  
This can be interpreted most naturally as a maze-solving problem, where:
1. Each vertex denotes a junction or fork in the path.  
1. Each edge denotes a hallway of the maze.
   
<img src="img/09-10_maze.png" style="width: 30em" />   
For efficiency, we ideally visit each edge at most twice (i.e. backtrack at most once).  

For correctness, the traversal must be systematic; we should *know* all the paths to explore.

The key idea behind graph traversal is to mark each vertex based on its exploration state.  
This can be done most naturally with either Boolean flags or a number to denote the state.  
Each vertex will exist in one of three states:
1. undiscovered – initial state. 
1. discovered – the vertex has been found, but not all incident edges visited.  
1. processed – the vertex has been found, and all incident edges have been visited.  

Traversal will be correct if all vertices are in the final (processed) state.  

There are 2 general approaches: breadth-first and depth-first search.
1. *Breadth-first search (BFS)* - explore in order of distance to the root node.  
1. *Depth-first search (DFS)* - explore all of an undiscovered subgraph before moving on.  

# Breadth-First Search (BFS)

In BFS, vertices are discovered in order of increasing distance from a source vertex.  
This is typically implemented with a processing queue (i.e. FIFO order).  
Below I write the abbreviated pseudocode for a BFS search on a graph `G`.  

## BFS Pseudocode

>**Input:** graph `G`, source vertex `s`.
>1. Set all vertices other than `s` to "undiscovered."
>1. Set state of `s` to "discovered," no parent.
>1. Put `s` into processing queue `Q`.
>1. Until `Q` is empty, explore frontier:
>    1. Dequeue node `u`.
>    1. Process edges of `u`.
>    1. Discover new child nodes `v`, parent `u`.
>    1. Add any new nodes to `Q`.
>    1. After processing `u`, set it to "processed."

BFS starts out exploring the source vertex (root).  
From the source vertex, new adjacent vertices (children) are discovered and added to a processing queue.  
Once the source is fully explored, the source vertex is marked "processed."  
We are left with vertices in the processing queue that are "discovered" but not "processed."  
I call these queued vertices the "frontier," since they have (roughly) equal distance to the root and the search grows incrementally from it.  
We explore the frontier vertices in the same way as the source, turning them into "processed" vertices and adding their children to the queue as well.  
This processing continues until the queue is emptied (no more undiscovered children).  

In this way, every vertex is discovered by at most one other vertex.  
This will define a tree (actually a DAG) on vertices of the graph.  
One can show that this will generate the shortest path from the root to every other node in the tree.  
This is its main benefit over something like DFS.  
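
The pseudocode translates almost line for line into Python. Below is a minimal sketch, assuming the graph is given as a list-of-lists adjacency structure `adj`; the function name `bfs` and the example graph are my own.

```python
from collections import deque

def bfs(adj, s):
    """BFS over adjacency list `adj` from source `s`.

    Returns (dist, parent); unreachable vertices keep dist = None.
    """
    n = len(adj)
    dist = [None] * n        # None doubles as the "undiscovered" state
    parent = [None] * n
    dist[s] = 0
    Q = deque([s])
    while Q:
        u = Q.popleft()              # dequeue in FIFO order
        for v in adj[u]:             # process the edges of u
            if dist[v] is None:      # v is undiscovered
                dist[v] = dist[u] + 1
                parent[v] = u
                Q.append(v)
        # every edge of u has now been visited: u is "processed"
    return dist, parent

# Hypothetical graph: edges 0-1, 0-2, 1-2, 2-3, plus an isolated vertex 4.
adj = [[1, 2], [0, 2], [0, 1, 3], [2], []]
dist, parent = bfs(adj, 0)
print(dist)     # [0, 1, 1, 2, None]
print(parent)   # [None, 0, 0, 2, None]
```

Following `parent` back from any reached vertex to the source traces out one shortest path.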

## Correctness
How do we know that BFS searches the entire graph?  
For that matter, how do we know it gives the shortest paths from the source?  

To establish this, turn to theorem 20.5 in Cormen, which I abbreviate below after defining some terms.  
Note that for a formal proof, you would have to go prove the lemmas the book does (I will just take them for granted here).  
> **Terms:** 
>- $a.d$: The distance (read: number of edges) actually traversed from the source vertex $s$ to desired vertex $a$.  
>Note that in BFS, the discoverer of a vertex has some distance $d_o$ while its newly discovered nodes have distance $d_o+1$.  
>Due to the FIFO order of the queue, this also means the $d$ value of the currently processing vertex increases monotonically. 
>- $a.\pi$: The predecessor (read: discoverer) of vertex $a$ in BFS. 
>- $\delta(s, a)$: The smallest possible distance from source $s$ to desired vertex $a$.  
>For a vertex to be *reachable*, this has to be a finite non-negative integer.   

> **Theorem:**  
Let $G = (V, E)$ be a graph, and suppose that BFS is run on $G$ from a given source vertex $s$.  
Then, during its execution, BFS discovers every vertex $v \in V$ that is reachable from the source $s$.  
Moreover, the distance of $v$ to the source, $v.d$ is the shortest possible distance denoted $\delta(s, v)$.  
Furthermore, for any vertex $v \neq s$ reachable from $s$, one of the shortest paths from $s$ to $v$ is a shortest path from $s$ to $v.\pi$ followed by the edge $(v.\pi, v)$.  
> **Proof by Contradiction:**  
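Suppose there were an offending vertex $v$ s.t. the distance $v.d$ to the root were not minimal, and among all such vertices pick one with the smallest $\delta(s, v)$.  
$$\exists v|v.d > \delta(s, v)$$
Denoting its parent on the *actual* shortest path as $u$, it follows that 
$$\delta(s, v) = \delta(s,u) + 1 \quad \text{AND} \quad u.d = \delta(s, u)$$
The second equality holds because $u$ is closer to $s$ than $v$, so by our choice of $v$ it cannot itself be an offending vertex.  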
Putting them together
$$v.d > \delta(s, v) = \delta(s,u) + 1 = u.d+1$$
Then at the time $u$ is dequeued, the offending vertex $v$ is either undiscovered, discovered, or processed.  
Each case leads to a contradiction with $v.d > u.d + 1$.  
**Case 1: $v$ undiscovered**  
If $v$ is undiscovered, BFS discovers it while processing the edges of $u$ and sets $v.d = u.d + 1$.  
But we declared $u$ is on the actual shortest path, in which case $u.d = \delta(s,u) = \delta(s, v) - 1 \rightarrow v.d = \delta(s, v)$.  
This contradicts our initial assumption.  
**Case 2: $v$ processed**  
If $v$ is processed, it must have already gone through the queue before $u$.  
But BFS dequeues vertices in order of non-decreasing distance to the root, so this would mean $v.d \leq u.d$.  
This contradicts $v.d > u.d + 1$.  
**Case 3: $v$ discovered**  
If $v$ is discovered, but not processed, it must have been discovered by some other parent $w$ that was dequeued earlier than $u$.  
This would require $w.d \leq u.d$ and $v.d = w.d + 1$.  
But then $v.d = w.d + 1 \leq u.d + 1$.  
This again contradicts $v.d > u.d + 1$.  

> Thus we conclude that for BFS, $v.d = \delta(s, v) \forall v \in V$.  
All $v$ reachable from $s$ must (at some point) be discovered because the distance is upper-bounded by a finite number.   
Finally, if $v.\pi = u$, then $v.d = u.d+ 1$ and a shortest path from $s \rightarrow v$ is the path $s \rightarrow v.\pi$, followed by a single edge $(v.\pi, v)$.  
**Q.E.D**
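
The theorem can also be sanity-checked numerically with scipy. This is a minimal sketch on a small hypothetical graph, assuming `breadth_first_order` and `shortest_path` from `scipy.sparse.csgraph` behave as documented; it is a spot check, not a proof.

```python
import numpy as np
from scipy.sparse import csr_array
from scipy.sparse.csgraph import breadth_first_order, shortest_path

# Hypothetical undirected graph: edges 0-1, 0-2, 1-2, 2-3.
A = csr_array(np.array([[0, 1, 1, 0],
                        [1, 0, 1, 0],
                        [1, 1, 0, 1],
                        [0, 0, 1, 0]]))

# `order` lists vertices in discovery order; `pred[v]` is the discoverer v.pi.
order, pred = breadth_first_order(A, i_start=0, directed=False)

# Rebuild v.d from the BFS tree using v.d = (v.pi).d + 1.
d = np.zeros(A.shape[0], dtype=int)
for v in order[1:]:
    d[v] = d[pred[v]] + 1

# delta(s, v) computed independently, treating every edge as length 1.
delta = shortest_path(A, method='D', directed=False, unweighted=True)[0]
print(d, delta)
```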

## Applications of BFS

Skiena goes over 2 applications of BFS, which I will briefly summarize and possibly make assessments of later.  
1. Connectedness
1. Vertex Coloring

> **Checking Connectedness:** Given a graph, are all vertices reachable from any source?

We say that a graph is *connected* if there is a path between any two vertices.   

It turns out a lot of problems reduce to finding or counting connected components.  
Skiena gives the example of legal Rubik’s cube configurations (solved and scrambled).  
Actually a lot of [mechanical puzzles](https://en.wikipedia.org/wiki/God%27s_algorithm#Examples) can be thought of as graphs, and solving them is a form of search.  
Really, anything endowed with a space of configurations could be considered a graph, with configuration/state being the vertex and legal moves being edges.  
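
For connectedness specifically, scipy already provides a routine. Here is a minimal sketch, assuming `connected_components` from `scipy.sparse.csgraph` and a hypothetical graph made of two separate pieces.

```python
import numpy as np
from scipy.sparse import csr_array
from scipy.sparse.csgraph import connected_components

# Hypothetical undirected graph with two pieces: {0, 1, 2} and {3, 4}.
A = csr_array(np.array([[0, 1, 0, 0, 0],
                        [1, 0, 1, 0, 0],
                        [0, 1, 0, 0, 0],
                        [0, 0, 0, 0, 1],
                        [0, 0, 0, 1, 0]]))

# `labels[i]` is the index of the component that vertex i belongs to.
n_components, labels = connected_components(A, directed=False)
is_connected = (n_components == 1)
print(n_components, labels, is_connected)
```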

As another example, chess is solved with 7 or fewer pieces through what are known as [tablebases](https://en.wikipedia.org/wiki/Endgame_tablebase).  
Such tablebases are made through retrograde analysis: work backwards from the final position (checkmate or draw) and explore all legal positions.  
With best play, every legal position is known to be either drawn or won within a minimum number of moves by connection in the configuration space.  
In effect, this is BFS in reverse.  

**Note:** While it is true you could do a BFS for things like Rubik's cube solving, I will clarify this is largely impractical beyond some solution depth.  
[Korf (2008)](https://dl.acm.org/doi/10.1145/1455248.1455250) for instance taps out at about depth 10 solutions due to memory constraints.  
In case you were curious, [Rokicki et al. (2013)](https://tomas.rokicki.com/rubik20.pdf) showed that Rubik's cubes are always solvable within at most 20 moves (the so-called [God's number](https://www.cube20.org/)).  
They solved it with a lot of mathematical reductions with groups and symmetries, then throwing 35 years of CPU processing at it.   

> **Vertex Coloring Problem:** Given a graph, assign colors to each vertex s.t. no two adjacent vertices have the same color. 

A coloring scheme which satisfies this property is called a *proper coloring* of the graph.  
The smallest possible number of colors in a proper coloring, $\chi(G)$, is the *chromatic number*.  
A graph is *bipartite* if it can be properly colored using only two colors.  

Such colorings show up in a lot of problems involving scheduling or network coverage.  
[This publication](https://nyaspubs.onlinelibrary.wiley.com/doi/abs/10.1111/j.1749-6632.1979.tb32824.x) was one of the earliest ones I could find that explicitly treats them as such (should also be available in the course page).  
More easily digestible explanations of the telecom application are shown [here](https://www.cs.kent.edu/~dragan/ST-Spring2016/Allocating%20radio%20frequencies%20using%20graph%20coloring.pdf) and [here](https://www.cs.kent.edu/~dragan/ST-Spring2016/Frequency%20Assignment%20in%20cellular%20networks%20(1).pdf).

BFS (if you tweak it a bit) also happens to be quite good at checking if a graph is bipartite because of the frontier.  
Simply alternate the color of frontier vertices with their child vertices, and check for any conflicts in non-discovering edges.  
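
The tweak amounts to storing a color instead of a distance. Below is a minimal sketch, assuming an undirected graph given as an adjacency list; the function name `two_color` and the two test graphs are my own.

```python
from collections import deque

def two_color(adj):
    """Return a list of 0/1 colors if the graph is bipartite, else None."""
    color = [None] * len(adj)
    for s in range(len(adj)):          # restart BFS in every component
        if color[s] is not None:
            continue
        color[s] = 0
        Q = deque([s])
        while Q:
            u = Q.popleft()
            for v in adj[u]:
                if color[v] is None:           # newly discovered child
                    color[v] = 1 - color[u]    # give it the opposite color
                    Q.append(v)
                elif color[v] == color[u]:     # conflict on a non-discovering edge
                    return None
    return color

square = [[1, 3], [0, 2], [1, 3], [0, 2]]      # 4-cycle: bipartite
triangle = [[1, 2], [0, 2], [0, 1]]            # 3-cycle: not bipartite
print(two_color(square))     # [0, 1, 0, 1]
print(two_color(triangle))   # None
```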

# Depth-First Search (DFS)

DFS can be identified with backtracking in a maze. 
1. Pick an unexplored direction/node. 
1. Advance as far as possible in the same fashion. 
1. If stuck, back up to the last unexplored direction.  

Both DFS and backtracking are most easily understood/implemented as recursive algorithms, i.e. with a stack.  
Below I write the abbreviated pseudocode for a DFS search on a graph `G`.

## DFS Pseudocode

>**Input:** graph `G`, vertex `u`. 
>1. Set state of `u` to "discovered." 
>1. Process edges of `u`. 
>1. For all neighbors `v` of `u`:
>    1. If `v` is undiscovered, set parent of `v` to `u` and make a recursive call on `v`. 
>1. Set state of `u` to "processed."  

Unlike BFS, where we had an explicit queue object `Q`, here the stack management is done with whatever call stack the system you're using has.  
Skiena also records the entry and exit times of each call as a way to measure the number of descendants.  
I omitted this because you can do that yourself, but the times are somewhat important to proving some theorems, so bear them in mind.  
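
For reference, here is a minimal recursive sketch that does record entry and exit times, assuming an adjacency-list graph; the names `dfs_times`, `entry` and `exit_` are my own, not Skiena's.

```python
def dfs_times(adj, root):
    """Recursive DFS from `root`; returns (parent, entry, exit_) lists."""
    n = len(adj)
    parent = [None] * n
    entry = [None] * n     # None means "undiscovered"
    exit_ = [None] * n     # None means "not yet processed"
    clock = 0

    def visit(u):
        nonlocal clock
        entry[u] = clock               # u becomes "discovered"
        clock += 1
        for v in adj[u]:
            if entry[v] is None:       # only recurse into undiscovered vertices
                parent[v] = u
                visit(v)
        exit_[u] = clock               # u becomes "processed"
        clock += 1

    visit(root)
    return parent, entry, exit_

# Hypothetical graph: edges 0-1, 0-2, 1-2, 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
parent, entry, exit_ = dfs_times(adj, 0)
print(parent)   # [None, 0, 1, 2]
print(entry)    # [0, 1, 2, 3]
print(exit_)    # [7, 6, 5, 4]

# Half the time spent "inside" a vertex counts its descendants.
n_desc_root = (exit_[0] - entry[0] - 1) // 2
```

Keep in mind that Python limits recursion depth (roughly 1000 frames by default), so very deep graphs need an explicit stack instead.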

## Edge Classification
Skiena also mentions (oddly late into the chapter) that while DFS is simple, the actual details of implementation are subtle.  
This is mostly to do with the timings of processing edges and vertices.  
Consider an edge from the current vertex `x` to another vertex `y`.  
It is straightforward if `y` is an undiscovered vertex; this must be the first time we're seeing it.  
But what happens if `y` is already discovered: have we already run through it?  
Worse yet, is this an ancestor of the current vertex, and are we making more redundant work for ourselves? 

Apparently one can work out the different cases by "careful reflection."  
Perhaps it is more cautionary than instructional.  
So, yes, think very carefully about how to actually implement a DFS-based algorithm in code.  

But from the theory perspective, you can at least know that DFS is correct with theorems 20.9 and 20.10 in Cormen.  
These theorems establish that in a DFS search, it creates a search tree and there is enough information to classify the edges into 4 categories.  
1. **Tree edges:** $(u, v)$ is a tree edge if $v$ was first discovered by exploring this edge. 
2. **Back edges:** $(u, v)$ is a back edge if it connects descendant vertex $u$ to ancestor $v$.
3. **Forward edges:** nontree edges connecting a vertex $u$ to a proper descendant $v$ in a tree.
4. **Cross edges:** all other edges.  
   They can go between vertices in the same tree, as long as one vertex is not an ancestor of the other, or they can go between vertices in different trees.

The end result is that properly implemented DFS on an *undirected* graph should only ever generate the first two types of edge (forward and cross edges can only occur in directed graphs), in which case, the resolutions should be manageable.  
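
On a directed graph all four categories can occur, and entry/exit times are enough to tell them apart. This is a minimal sketch, assuming an adjacency-list digraph; `classify_edges` and the example graph are hypothetical.

```python
def classify_edges(adj):
    """Run DFS over a directed graph and label every edge (u, v)."""
    n = len(adj)
    entry = [None] * n
    exit_ = [None] * n
    labels = {}
    clock = 0

    def visit(u):
        nonlocal clock
        entry[u] = clock
        clock += 1
        for v in adj[u]:
            if entry[v] is None:
                labels[(u, v)] = "tree"      # v first discovered via this edge
                visit(v)
            elif exit_[v] is None:
                labels[(u, v)] = "back"      # v is still on the call stack
            elif entry[v] > entry[u]:
                labels[(u, v)] = "forward"   # v is a finished descendant of u
            else:
                labels[(u, v)] = "cross"     # v finished before u was discovered
        exit_[u] = clock
        clock += 1

    for s in range(n):                       # cover every tree of the DFS forest
        if entry[s] is None:
            visit(s)
    return labels

# Hypothetical digraph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2.
adj = [[1, 2], [2], [0], [2]]
labels = classify_edges(adj)
print(labels)
```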

Now, I am *not* going to go over the proofs as I did for BFS.  
This time I will ask you to go over Cormen and do the same type of reading/understanding of the proof.  
Write up a similar summary for exercise.  

## Applications of DFS

Skiena goes over 2 applications of DFS, which I will briefly summarize and possibly make assessments of later.  
1. Cycle Detection
1. Articulation Vertices

> **Cycle detection:** Given a graph, does it contain any cycles?  

Cycles are somewhat important to detect, usually for warning systems.  
In most graph applications specifically, you want to avoid cycles because they are associated with non-terminating states.  
For something like version control (e.g. Git), which stores the commit history as a DAG, a cycle would be pretty bad for tracking history.  

The benefit of DFS is that it organizes the edges of the graph in a precise way.  
Vertices are preferentially discovered by depth, and in an undirected graph the edges are strictly tree or back edges.  
If there is any back edge at all, there is clearly a cycle.  
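
A minimal sketch of this test, assuming a *simple undirected* graph stored as an adjacency list (with multi-edges, the parent check below would need to be more careful); the name `has_cycle` is my own.

```python
def has_cycle(adj):
    """Detect a cycle in a simple undirected graph via DFS."""
    n = len(adj)
    discovered = [False] * n

    def visit(u, parent):
        discovered[u] = True
        for v in adj[u]:
            if not discovered[v]:
                if visit(v, u):          # tree edge: keep searching below v
                    return True
            elif v != parent:
                return True              # back edge to an ancestor: cycle found
        return False

    # Start a DFS in every component that has not been reached yet.
    return any(not discovered[s] and visit(s, None) for s in range(n))

path = [[1], [0, 2], [1]]              # 0-1-2: a tree, no cycle
triangle = [[1, 2], [0, 2], [0, 1]]    # 0-1-2-0: one cycle
print(has_cycle(path), has_cycle(triangle))   # False True
```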

> **Articulation Vertices:** Given a graph, does it contain articulation vertices? 

Suppose you had a network (generators, computers, supply chains, etc.).  
Suppose a catastrophe knocks out one of the nodes. How affected is the service?  
Highly connected graphs will be resilient.  
Linear graphs may be completely bisected.  

An *articulation vertex* or *cut node* is a vertex whose removal will disconnect the graph.  
Identifying the existence of such vertices is clearly important for people who care about the stability of their network.  
We can find them by brute force: for all nodes, delete one and BFS/DFS the rest.  
But we can do better (linear time) with a single DFS. 

$v$ is an articulation vertex/cut node if and only if either
- $v$ is a root with more than one child, OR
- $v$ is not a root, and $v$ has a child whose subtree has no back edge to a proper ancestor of $v$. 

Graphs without articulation vertices are *biconnected.* 
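
The single-DFS approach can be sketched by tracking, for each vertex, the earliest entry time reachable from its subtree. This is a minimal sketch assuming a simple undirected adjacency-list graph; the names `articulation_vertices` and `low` are my own, and this is one common formulation rather than Skiena's exact code.

```python
def articulation_vertices(adj):
    """Return the set of cut nodes of a simple undirected graph."""
    n = len(adj)
    entry = [None] * n
    low = [None] * n      # earliest entry time reachable from the subtree
    cut = set()
    clock = 0

    def visit(u, parent):
        nonlocal clock
        entry[u] = low[u] = clock
        clock += 1
        children = 0
        for v in adj[u]:
            if entry[v] is None:                  # tree edge
                children += 1
                visit(v, u)
                low[u] = min(low[u], low[v])
                if parent is not None and low[v] >= entry[u]:
                    cut.add(u)                    # subtree of v cannot bypass u
            elif v != parent:                     # back edge
                low[u] = min(low[u], entry[v])
        if parent is None and children > 1:
            cut.add(u)                            # root with several children

    for s in range(n):
        if entry[s] is None:
            visit(s, None)
    return cut

# Hypothetical graph: triangle 0-1-2 with a tail 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
print(articulation_vertices(adj))   # {2}
```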

## Applications to Directed Graphs

Skiena goes over 2 applications of DFS to directed graphs, which I will very briefly summarize, and mostly leave you to read because they seem straightforward.  
1. Topological Sorting on DAGs
1. Strongly Connected Components

For the first, a directed acyclic graph (DAG) has no directed cycles, and a topological sorting is an ordering of the vertices s.t. all edges go left-to-right.  
DFS never produces back edges on DAGs, so by listing the vertices in the *reverse* of the order in which they are marked "processed," you naturally produce a topological sorting.  
Note though that a topological sorting is generally not unique: the result depends on the order in which DFS visits roots and neighbors, so there may be many equally valid sorts.  
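
A minimal sketch of DFS-based topological sorting, which lists vertices in reverse order of when they finish processing. It assumes the input really is a DAG (no cycle check is performed), and the function name and example graph are my own.

```python
def topological_sort(adj):
    """Return the vertices of a DAG in a topological order (DFS-based)."""
    n = len(adj)
    discovered = [False] * n
    finished = []                 # vertices in the order they are "processed"

    def visit(u):
        discovered[u] = True
        for v in adj[u]:
            if not discovered[v]:
                visit(v)
        finished.append(u)        # all descendants of u are already listed

    for s in range(n):
        if not discovered[s]:
            visit(s)
    return finished[::-1]         # reverse finishing order

# Hypothetical DAG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
adj = [[1, 2], [3], [3], []]
order = topological_sort(adj)
print(order)   # [0, 2, 1, 3]
```

Here `[0, 1, 2, 3]` would be an equally valid answer; which one you get depends on the order neighbors are visited.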

For the second, strongly connected components are sections of the graph for which there is a path going to and from any pair of vertices.  
This is applicable to things like road planning: one hopes that a road network allows you to get around a city without having to go off-road/violate traffic laws.  
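To test whether a graph is strongly connected, simply run DFS from one vertex $v$ on the directed graph and again on the graph with edges reversed.  
These give the vertices reachable from $v$ and the vertices that can reach $v$ respectively; the graph is strongly connected if both searches reach every vertex.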
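
A minimal sketch of the two-search test, assuming a non-empty adjacency-list digraph and using an explicit stack instead of recursion; the helper names `reachable` and `is_strongly_connected` are hypothetical.

```python
def reachable(adj, s):
    """Set of vertices reachable from s (iterative DFS with an explicit stack)."""
    seen = {s}
    stack = [s]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_strongly_connected(adj):
    """True if every vertex can reach, and be reached from, vertex 0."""
    n = len(adj)                      # assumes n >= 1
    reverse = [[] for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            reverse[v].append(u)      # flip every edge
    return len(reachable(adj, 0)) == n and len(reachable(reverse, 0)) == n

cycle = [[1], [2], [0]]      # 0 -> 1 -> 2 -> 0
chain = [[1], [2], []]       # 0 -> 1 -> 2, no way back
print(is_strongly_connected(cycle), is_strongly_connected(chain))   # True False
```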