# Spectral Graph Theory


## Motivation


Basics: Check out the GraphsAndNetworkX notebook.


The cynosure of spectral graph theory is linear algebra: building out 
an adjacency matrix for a graph and correlating the graph's properties with the spectrum of 
eigenvalues and eigenvectors associated with that matrix.




## Spectral graph theory

A slider-based experimentation space would be nice. For now I'll continue with static content.

A graph $G$ is a pair of related sets $V$ and $E$ where $|V|=n$. The elements of $E$ are 
unordered pairs of elements of $V$ with no repeats. A particular vertex is often called
$u$ or $v$, and a vertex $u$ has degree $d_u$. We disallow loops and multiple edges at the
start, noting that multiple edges can be interpreted as edge weights in future developments. 
A graph may have multiple connected components. We'll generally take $n \ge 2$ as assumed.


There is a lot of additional preliminary terminology to set up.

* $1$: the all-ones vector
* $\mathrm{vol}\;G\;=\;\sum_{v}{d_v}$



- A graph can be represented as either a Laplacian or a normalized Laplacian $n\;\times\;n$ matrix
- For an undirected graph this matrix is real symmetric and has real non-negative eigenvalues
- These $n$ eigenvalues are sorted in increasing order $\{\lambda_0,\;\lambda_1,\;\lambda_2\;\dots\;\lambda_{n-1}\}$
    - ...using Chung's subscript notation beginning at 0
- The number of 0 eigenvalues equals the number of connected components.
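
A minimal sketch of the bullets above, assuming NumPy and NetworkX are available. It builds the combinatorial Laplacian $L = D - A$ for a small graph with two components, computes $\mathrm{vol}\;G$, and counts the eigenvalues that are numerically zero. The example graph and the tolerance `1e-9` are illustrative choices, not something taken from the talk.

```python
import networkx as nx
import numpy as np

# Illustrative graph: a triangle plus a disjoint 3-node path (two components).
G = nx.disjoint_union(nx.cycle_graph(3), nx.path_graph(3))

A = nx.to_numpy_array(G)              # adjacency matrix, ordered as G.nodes()
D = np.diag(A.sum(axis=1))            # diagonal matrix of vertex degrees
L = D - A                             # combinatorial Laplacian

vol = int(A.sum())                    # vol G = sum of degrees = 2 * |E|

eigenvalues = np.linalg.eigvalsh(L)   # real and ascending because L is symmetric
num_zero = int(np.sum(np.abs(eigenvalues) < 1e-9))   # tolerance is an assumption

print(vol, np.round(eigenvalues, 3))
print(num_zero, nx.number_connected_components(G))
```

`eigvalsh` is used rather than `eig` because it exploits symmetry and returns the eigenvalues already sorted.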


Define a cut as a (minimal) collection of edges whose removal breaks the graph into disconnected subgraphs.
A small cut indicates relatively low connectivity.
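
As a small illustration, assuming NetworkX: `nx.minimum_edge_cut` returns a smallest set of edges whose removal disconnects the graph, and `nx.cut_size` counts the edges crossing between a chosen vertex set and the rest. The barbell graph and the set `S` are made-up inputs for the sketch.

```python
import networkx as nx

# Illustrative graph: two 4-cliques (nodes 0-3 and 4-7) joined by one bridge edge.
G = nx.barbell_graph(4, 0)

# A smallest set of edges whose removal disconnects G.
cut_edges = nx.minimum_edge_cut(G)

# Number of edges crossing between a chosen vertex set S and its complement.
S = {0, 1, 2, 3}
crossing = nx.cut_size(G, S)

print(cut_edges, crossing)
```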

I refer to the isoperimetric ratio $\theta(S)$ of a vertex subset $S$ as IPR. 

Important in getting to Cheeger's inequality: Rayleigh quotients of vectors orthogonal to
the $1$ vector, which is the eigenvector corresponding to $\lambda_0=0$. The RQ is used in an upper bound on IPR. The 
second eigenvalue $\lambda_1$ is used in like fashion to give an upper bound on the IPR of $G$. 
That is, we want to find vertex subsets $S$ with low isoperimetry.
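
For reference, here is the standard variational (Courant-Fischer) statement behind this paragraph, written for the combinatorial Laplacian $L$ with the 0-based indexing used above and with $\mathbf{1}$ denoting the all-ones vector. Using $L$ rather than the normalized Laplacian is an assumption about what the talk had in mind.

$$
R_L(x) \;=\; \frac{x^T L x}{x^T x} \;=\; \frac{\sum_{\{u,v\}\in E}(x_u - x_v)^2}{\sum_{v\in V} x_v^2},
\qquad
\lambda_1 \;=\; \min_{x \ne 0,\; x \perp \mathbf{1}} R_L(x).
$$

Any nonzero test vector $x \perp \mathbf{1}$ therefore gives the bound $\lambda_1 \le R_L(x)$.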

Something like $\theta(S_t)\le \sqrt{2d\lambda_1}$.
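
A hedged sketch of a sweep cut, assuming NumPy and NetworkX. The notes do not define $\theta(S)$ or $d$, so this assumes $\theta(S) = |\partial S| / \min(|S|, n - |S|)$ and takes $d$ to be the maximum degree. The function name `sweep_cut` and the barbell test graph are hypothetical. It sorts vertices by their entry in the eigenvector for $\lambda_1$ and tries every prefix $S_t$ of that ordering.

```python
import networkx as nx
import numpy as np

def sweep_cut(G):
    """Return (best_set, best_ratio, lambda_1) from a sweep over the lambda_1 eigenvector.

    Assumes theta(S) = |boundary(S)| / min(|S|, n - |S|); the notes above
    do not spell the definition out.
    """
    nodes = list(G.nodes())
    n = len(nodes)
    A = nx.to_numpy_array(G, nodelist=nodes)
    L = np.diag(A.sum(axis=1)) - A                   # combinatorial Laplacian
    eigenvalues, eigenvectors = np.linalg.eigh(L)    # ascending eigenvalues
    fiedler = eigenvectors[:, 1]                     # eigenvector for lambda_1
    order = [nodes[i] for i in np.argsort(fiedler)]  # vertices sorted by entry

    best_set, best_ratio = None, float("inf")
    for t in range(1, n):                            # S_t = first t vertices
        S = set(order[:t])
        ratio = nx.cut_size(G, S) / min(t, n - t)
        if ratio < best_ratio:
            best_set, best_ratio = S, ratio
    return best_set, best_ratio, eigenvalues[1]

G = nx.barbell_graph(5, 0)                # illustrative: two 5-cliques, one bridge
S, theta, lam1 = sweep_cut(G)
d_max = max(deg for _, deg in G.degree()) # assumption: d means maximum degree
print(sorted(S), theta, lam1, np.sqrt(2 * d_max * lam1))
```

On this graph the sweep picks out one of the two cliques, since the bridge is the only edge leaving it.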

It would be helpful to work some examples. 

Cheeger's inequality can be generalized for k-clustering. 

### Spectral graph theory talk part 2

Max flow problem: Push as much flow as possible from $s$ to $t$ without exceeding the flow capacity of the edges. 

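That is: we are given a directed graph in which edge directions indicate the direction of flow and positive numbers give the edge capacities. 
There are two key vertices, the source $s$ and the sink $t$. The Ford-Fulkerson (FF) algorithm was cited: $O(mF)$ time, where $m$ is the number of edges 
and $F$ is the value of the maximum flow. Blocking Flow speeds things up to $O(m^{3/2})$ time. Each is a sequence of augmentations. 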

Exploiting the relation between distance and cut size accelerates Blocking Flow as an algorithm in comparison 
with FF. 
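
A minimal max-flow sketch, assuming NetworkX. The graph and its capacities are made-up numbers. `nx.maximum_flow` is called once with the library's default flow function and once with `dinitz`, which is NetworkX's implementation of a blocking-flow algorithm; it is not necessarily the exact variant discussed in the talk.

```python
import networkx as nx
from networkx.algorithms.flow import dinitz

# Illustrative directed graph; the capacities are made-up numbers.
G = nx.DiGraph()
G.add_edge("s", "a", capacity=3)
G.add_edge("s", "b", capacity=2)
G.add_edge("a", "b", capacity=1)
G.add_edge("a", "t", capacity=2)
G.add_edge("b", "t", capacity=3)

# Solve with NetworkX's default flow function.
flow_value, flow_dict = nx.maximum_flow(G, "s", "t")

# Solve the same problem with a blocking-flow method (Dinitz).
blocking_value, _ = nx.maximum_flow(G, "s", "t", flow_func=dinitz)

print(flow_value, blocking_value)
print(flow_dict)
```

Both calls should agree on the flow value, which by max-flow/min-cut equals the capacity of the smallest cut separating `s` from `t`.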

#### Graph partitioning and region growing

For any value $R > 0$ one can partition a graph into $n$ clusters with some properties... 

M is a 'net flow' matrix. 

Use electrical flows; Laplacian linear system; fast to solve. 
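
One reading of "electrical flows; Laplacian linear system", sketched with NumPy and NetworkX: treat every edge as a unit resistor, inject one unit of current at $s$ and extract it at $t$, solve $L\phi = b$ for the vertex potentials $\phi$, and read off edge currents as potential differences. The function name `electrical_flow` is hypothetical, and the dense pseudoinverse is only a stand-in suitable for small graphs; it is not one of the fast solvers the talk refers to.

```python
import networkx as nx
import numpy as np

def electrical_flow(G, s, t):
    """Unit electrical flow from s to t, treating every edge as a 1-ohm resistor.

    Returns (flows, resistance): flows maps each edge (u, v) to the current
    from u to v, and resistance is the effective resistance between s and t.
    """
    nodes = list(G.nodes())
    index = {v: i for i, v in enumerate(nodes)}
    A = nx.to_numpy_array(G, nodelist=nodes, weight=None)   # unweighted adjacency
    L = np.diag(A.sum(axis=1)) - A

    b = np.zeros(len(nodes))            # inject one unit at s, extract it at t
    b[index[s]], b[index[t]] = 1.0, -1.0

    # L is singular (constant vectors lie in its null space), so use the
    # pseudoinverse; this dense approach only suits small graphs.
    potentials = np.linalg.pinv(L) @ b

    flows = {(u, v): potentials[index[u]] - potentials[index[v]] for u, v in G.edges()}
    return flows, potentials[index[s]] - potentials[index[t]]

G = nx.cycle_graph(4)                   # illustrative: the 4-cycle 0-1-2-3-0
flows, resistance = electrical_flow(G, 0, 2)
print(flows, resistance)
```

On the 4-cycle the current splits evenly between the two 2-edge routes from node 0 to node 2.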

Refers to the norms $\ell_1, \; \ell_2, \; \dots \;, \ell_\infty$.


### Why we want to win the karate trophy


Aric Hagberg (speaker) with nods en passant to Dan Schult, Jarrod Millman


* Open data at Los Alamos National Lab? [you bet!](https://csr.lanl.gov/data)
  * includes ML on graphs and synthetic bad guys
  * Unified Host and Network Data Set: no doi.
  * Comprehensive Multi-Source Cyber-Security Events doi:10.17021/1179829
  * User-Computer authentication associations in time doi:10.11578/1160076

### More talks

- Human brain longitudinal progression studies: Implies a kind of graph superposition.
- Graph clustering
- Second eigenvalue to select out
- LocalGraphClustering search on GitHub... 100e6-edge graphs on a laptop


#### Sequence Assembly Graphs and their construction

* PhD student in the Titus Brown lab at UCD
* assembly graphs
* de Bruijn graphs have implicitly defined edges: 
  - I can find its neighbors by querying the set of vertices... etcetera
  - the sequence length parameter $k$ is the critical optimization
  - these graphs are easy to implement but tend to be memory intensive
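
A minimal sketch of the "implicitly defined edges" idea in plain Python, under the assumption that vertices are $k$-mers over the alphabet `ACGT` and that two vertices are joined when they overlap in $k-1$ characters. No edge list is stored: neighbors are found by querying the vertex set with each possible one-base extension. The function names and the input sequence are made up for illustration.

```python
BASES = "ACGT"

def kmers(sequence, k):
    """Set of all length-k substrings of sequence (the graph's vertices)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def successors(kmer, vertices):
    """Vertices reachable from kmer: drop its first base, append each possible base."""
    return [kmer[1:] + base for base in BASES if kmer[1:] + base in vertices]

def predecessors(kmer, vertices):
    """Vertices leading into kmer: drop its last base, prepend each possible base."""
    return [base + kmer[:-1] for base in BASES if base + kmer[:-1] in vertices]

vertices = kmers("ACGTACGA", 3)        # made-up sequence with k = 3
print(sorted(vertices))
print(successors("ACG", vertices), predecessors("ACG", vertices))
```

The memory cost sits entirely in the vertex set, which is why the choice of $k$ and the set representation dominate in practice.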

## References

- [Grant Sanderson's YouTube series on linear algebra](https://youtu.be/fNk_zzaMoSs?si=AaLOP0mbjBw4NsX5)
- [Spectral Graph Theory (Fan Chung, 1997)](https://books.google.com/books?hl=en&lr=&id=4IK8DgAAQBAJ&oi=fnd&pg=PP1&dq=spectral+graph+theory&ots=Es2N2jsQui&sig=Z3SPsvf1yXFpUb4DQsrGRw_7zFI#v=onepage&q=spectral%20graph%20theory&f=false)
- [NetworkX](http://networkx.org) ([wikipedia](https://en.wikipedia.org/wiki/NetworkX))
- [Spielman course notes](http://www.cs.yale.edu/homes/spielman/561/)
- [Spielman lecture deck (88 slides)](http://www.cs.yale.edu/homes/spielman/TALKS/haifa1.pdf)
- [Unreasonable effectiveness of SGT (toroidal heat lecture)](https://youtu.be/8XJes6XFjxM)
- Ross Honsberger's graph theory topics in the ***Mathematical Gems*** books
- Aigner and Ziegler's ***Proofs from the Book***
- Selected remarks by one Martin Gardner
- ***The Man Who Loved Only Numbers***, a biography of Hungarian mathematician Paul Erdős

In [21]:
import sys, time                              # can be used to halt programs that run long
import numpy as np                            # linear algebra
import matplotlib
import matplotlib.pyplot as plt
import networkx as nx
import random as r                            # method random(): uniform distribution on [0, 1)
from numpy import zeros
import warnings; warnings.simplefilter('ignore')

In [22]:
def eigensketch(G, threshold = 9):
    """Print the first `threshold` normalized-Laplacian eigenvalues of G, then the last two if more remain."""
    n = len(G.nodes())
    L = nx.normalized_laplacian_matrix(G)
    # L is real symmetric for an undirected graph, so eigvalsh applies:
    # it returns real eigenvalues already sorted in ascending order
    e = np.linalg.eigvalsh(L.toarray())

    e = [x if x > 1e-14 else 0 for x in e]      # clamp round-off at or below zero to exactly 0
    msg = "Eigenvalues: "
    nTerms = min(n, threshold)
    msg += '%.2f' % e[0]
    for i in range(1, nTerms): msg += ', ' + '%.2f' % e[i]
    if   n == threshold + 1: msg += ', ' + '%.2f' % e[-1]
    elif n == threshold + 2: msg += ', ' + '%.2f' % e[-2] + ', ' + '%.2f' % e[-1]
    elif n  > threshold + 2: msg += ', ..., ' + '%.2f' % e[-2] + ', ' + '%.2f' % e[-1]

    print(msg)
    
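def hamiltonian(G):
    """Brute-force depth-first search for a Hamiltonian path from G's first node; returns a node list or None."""
    F = [(G,[list(G.nodes())[0]])]   # A one-element list -> a 2-tuple [(G, [label of the first node of G])]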
    n = G.number_of_nodes()
    while F:
        graph,path = F.pop()
        confs = []
        neighbors = (node for node in graph.neighbors(path[-1]) if node != path[-1]) #exclude self loops
        for neighbor in neighbors:
            conf_p = path[:]
            conf_p.append(neighbor)
            conf_g = nx.Graph(graph)
            conf_g.remove_node(path[-1])
            confs.append((conf_g,conf_p))
        for g,p in confs:
            if len(p) == n: return p
            else:           F.append((g,p))
    return None

In [23]:
eigensketch(G, 18)

Eigenvalues: 0.00, 0.70, 0.79, 0.95, 0.99, 1.04, 1.07, 1.08, 1.10, 1.15, 1.17, 1.20, 1.22, 1.25, 1.29


In [24]:
nx.laplacian_spectrum(G)/10

array([4.44089210e-17, 5.33644381e-01, 6.96300240e-01, 7.98538730e-01,
       8.76271722e-01, 9.89921104e-01, 1.03139612e+00, 1.10283598e+00,
       1.14847455e+00, 1.26044796e+00, 1.30195918e+00, 1.34333413e+00,
       1.40000000e+00, 1.41687591e+00, 1.50000000e+00])

In [25]:
G = nx.random_geometric_graph(15, 0.60)
F = hamiltonian(G)
print(F)

[0, 14, 5, 13, 12, 11, 10, 9, 8, 7, 6, 4, 3, 2, 1]


```
F = hamiltonian(G)    # run with 20 vertices and p = 0.4; single component graph
F                     # will give an H path (not cycle)

[0, 3, 4, 16, 19, 17, 14, 13, 11, 12, 18, 9, 8, 5, 7, 1, 10, 6, 15, 2]
```

Let's decipher the brute force code above. $n$ is the number of nodes in $G$. Here is a paraphrase of a REPL session: 

```
F = [(G,[list(G.nodes())[0]])]
print(type(F[0]), len(F[0]), type(F[0][0]), type(F[0][1]), len(F[0][1]), type(F[0][1][0]), F[0][1][0])

'tuple', 2, graph G, 'list', 1, 'int', 0
```

From this we see that `F` is a list of one element, which is a 2-tuple: first the passed graph `G` and second a list of length 1, with `list[0] = 0`. 
This zero is the label of the first node in G's node list, i.e. the start of a path. The second element of the tuple (i.e. the list) 
will prove to be a path of vertices. 


The `while` loop runs until `F` is empty.


`graph, path = F.pop()` pops the most recently added tuple off `F` and unpacks it into `graph` and `path`; on the first pass this leaves `F` an empty list. Remember `path` will grow.


`confs` starts each pass of the loop as an empty list. It accumulates (Graph, path-list) tuples, one for each neighbor of the last node on the path.


`neighbors` is a generator over the nodes that are neighbors to the last node on the working `path`... excluding that node itself. This implies that if 
G has self-loops then `.neighbors(node)` would include 'myself'.  

Incidentally `G.neighbors(node)` returns an iterator; I found it expedient to use it like `range()`: 

```
for i in G.neighbors(4): print(i)

9 18 11 13 6 8 17 19 1 3 12 5 14 7 16
```

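As the `while` loop progresses, brute-force extensions generate new paths, each paired with a copy of the graph from which the vertex 
just extended from has been removed so that it cannot be revisited. Because `F.pop()` takes the most recent entry, the search is depth-first with backtracking. 
I'll leave it at that for now but it is an interesting bit of code. 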
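
A small checker, assuming NetworkX, that can be used to confirm a result such as the one returned by `hamiltonian(G)`. The helper name `is_hamiltonian_path` is hypothetical and not part of the original notebook. It verifies that a candidate path visits every node exactly once and only moves along existing edges.

```python
import networkx as nx

def is_hamiltonian_path(G, path):
    """True if path visits every node of G exactly once along existing edges."""
    if path is None or len(path) != G.number_of_nodes():
        return False
    if set(path) != set(G.nodes()):      # same length + same node set => no repeats
        return False
    return all(G.has_edge(u, v) for u, v in zip(path, path[1:]))

G = nx.cycle_graph(5)                    # illustrative graph: the 5-cycle
ok = is_hamiltonian_path(G, [0, 1, 2, 3, 4])     # consecutive nodes are adjacent
bad = is_hamiltonian_path(G, [0, 2, 4, 1, 3])    # 0-2 is not an edge of the 5-cycle
print(ok, bad)
```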

In [26]:
print(e, v)

[0.         1.11111111 1.11111111 1.11111111 1.11111111 1.11111111
 1.11111111 1.11111111 1.11111111 1.11111111] [[ 0.9486833  -0.31622777 -0.07797612 -0.07024899 -0.09584185  0.08585468
   0.00972217  0.08892981 -0.06893175  0.00846137]
 [-0.10540926 -0.31622777 -0.15636482  0.27542065 -0.71957452  0.31752603
   0.05447504  0.17833015  0.40155478  0.0169675 ]
 [-0.10540926 -0.31622777 -0.64810613 -0.16710675  0.21100941 -0.73958869
   0.03997213 -0.03870317 -0.1480744   0.23456501]
 [-0.10540926 -0.31622777 -0.2868268   0.67882996  0.50249939 -0.0197753
  -0.11855243  0.00453957  0.03913832 -0.10827628]
 [-0.10540926 -0.31622777 -0.10618713 -0.57073397  0.21460346  0.34413277
  -0.17032002  0.42221684  0.05876282 -0.53484569]
 [-0.10540926 -0.31622777  0.07445253 -0.18875819 -0.02953795  0.07779289
   0.78348366 -0.36317535  0.18115369  0.05893732]
 [-0.10540926 -0.31622777  0.07445253 -0.18875819 -0.02953795  0.07779289
  -0.57288765 -0.59717752  0.22549071  0.34214692]
 [-0.10540926

In [27]:
import inspect
print(inspect.getdoc(np.linalg.eig))

Compute the eigenvalues and right eigenvectors of a square array.

Parameters
----------
a : (..., M, M) array
    Matrices for which the eigenvalues and right eigenvectors will
    be computed

Returns
-------
w : (..., M) array
    The eigenvalues, each repeated according to its multiplicity.
    The eigenvalues are not necessarily ordered. The resulting
    array will be of complex type, unless the imaginary part is
    zero in which case it will be cast to a real type. When `a`
    is real the resulting eigenvalues will be real (0 imaginary
    part) or occur in conjugate pairs

v : (..., M, M) array
    The normalized (unit "length") eigenvectors, such that the
    column ``v[:,i]`` is the eigenvector corresponding to the
    eigenvalue ``w[i]``.

Raises
------
LinAlgError
    If the eigenvalue computation does not converge.

See Also
--------
eigvals : eigenvalues of a non-symmetric array.
eigh : eigenvalues and eigenvectors of a real symmetric or complex
       Hermitian (conjug

In [29]:
import inspect
print(inspect.getdoc(nx.random_graphs.erdos_renyi_graph))

Returns a $G_{n,p}$ random graph, also known as an Erdős-Rényi graph
or a binomial graph.

The $G_{n,p}$ model chooses each of the possible edges with probability $p$.

Parameters
----------
n : int
    The number of nodes.
p : float
    Probability for edge creation.
seed : integer, random_state, or None (default)
    Indicator of random number generation state.
    See :ref:`Randomness<randomness>`.
directed : bool, optional (default=False)
    If True, this function returns a directed graph.

See Also
--------
fast_gnp_random_graph

Notes
-----
This algorithm [2]_ runs in $O(n^2)$ time.  For sparse graphs (that is, for
small values of $p$), :func:`fast_gnp_random_graph` is a faster algorithm.

:func:`binomial_graph` and :func:`erdos_renyi_graph` are
aliases for :func:`gnp_random_graph`.

>>> nx.binomial_graph is nx.gnp_random_graph
True
>>> nx.erdos_renyi_graph is nx.gnp_random_graph
True

References
----------
.. [1] P. Erdős and A. Rényi, On Random Graphs, Publ. Math. 6, 290 (1959

In [30]:
print(inspect.getdoc(nx.draw))

Draw the graph G with Matplotlib.

Draw the graph as a simple representation with no node
labels or edge labels and using the full Matplotlib figure area
and no axis labels by default.  See draw_networkx() for more
full-featured drawing that allows title, axis labels etc.

Parameters
----------
G : graph
    A networkx graph

pos : dictionary, optional
    A dictionary with nodes as keys and positions as values.
    If not specified a spring layout positioning will be computed.
    See :py:mod:`networkx.drawing.layout` for functions that
    compute node positions.

ax : Matplotlib Axes object, optional
    Draw the graph in specified Matplotlib axes.

kwds : optional keywords
    See networkx.draw_networkx() for a description of optional keywords.

Examples
--------
>>> G = nx.dodecahedral_graph()
>>> nx.draw(G)
>>> nx.draw(G, pos=nx.spring_layout(G))  # use spring layout

See Also
--------
draw_networkx
draw_networkx_nodes
draw_networkx_edges
draw_networkx_labels
draw_networkx_edge_l