# Spectral Embedding Methods

One of the primary embedding tools we'll use in this book is called *adjacency spectral embedding* {cite:t}`spectraltutorial`. You'll see spectral embedding and variations on it repeatedly, both throughout this section and when we get into applications, so it's worth taking the time to understand spectral embedding deeply. If you're familiar with Principal Component Analysis (PCA), this method has a lot of spiritual similarities. We'll need to get into a bit of linear algebra to understand how it works.

The basic idea behind spectral embedding is to take a network, optionally take its Laplacian, and then find the eigenvectors corresponding to the $d$ largest eigenvalues, where $d$ is the number of dimensions you'd like to embed your network down to. You then scale those eigenvectors by their eigenvalues (or sometimes the square root of their eigenvalues). You'll then have a rectangular matrix, where the columns are the scaled eigenvectors. The *rows* of that matrix will be the embedding for each node.
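The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not graspologic's implementation: the function name `spectral_embed_sketch`, its `scale` argument, and the toy matrix `A_toy` are hypothetical names made up for this example, and the sketch assumes an undirected network (a symmetric matrix) and uses the SVD, so "largest" means largest singular value.

```python
import numpy as np

def spectral_embed_sketch(X, d, scale="sqrt"):
    """Embed the nodes of a symmetric matrix X into d dimensions (illustrative sketch)."""
    # np.linalg.svd returns singular values sorted from largest to smallest
    U, s, _ = np.linalg.svd(X)
    U_d = U[:, :d]          # keep the top d singular vectors (columns)
    s_d = s[:d]             # keep the top d singular values
    if scale == "sqrt":
        s_d = np.sqrt(s_d)  # optional: scale by square roots instead
    # Broadcasting multiplies column j of U_d by s_d[j]
    return U_d * s_d

# Toy adjacency matrix: two triangles (nodes 0-2 and nodes 3-5)
A_toy = np.zeros((6, 6))
A_toy[:3, :3] = 1
A_toy[3:, 3:] = 1
np.fill_diagonal(A_toy, 0)

embedding = spectral_embed_sketch(A_toy, d=2)
print(embedding.shape)  # one row per node, one column per dimension
```

Each row of `embedding` is the location of one node in $d$-dimensional space.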

You can see visually how spectral embedding works below. We started with a 20-node Stochastic Block Model with two communities, and then found its eigenvalues and eigenvectors. It turns out that because there are only two communities, only the first two eigenvectors contain information -- the rest are just noise! (You can see this if you look carefully at the first two columns of the eigenvector matrix.) So, we took those two columns and scaled them by the first two eigenvalues of the eigenvalue matrix $D$. The final embedding is that scaled matrix.

In [None]:
from graspologic.simulations import sbm
from graphbook_code import heatmap, cmaps, plot_latents
from graspologic.utils import to_laplacian
from scipy.linalg import svd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

def rm_ticks(ax, x=False, y=False, **kwargs):
    if x is not None:
        ax.axes.xaxis.set_visible(x)
    if y is not None:
        ax.axes.yaxis.set_visible(y)
    sns.despine(ax=ax, **kwargs)

# Make network
B = np.array([[0.8, 0.1], 
              [0.1, 0.8]])
n = [10, 10]
A, labels = sbm(n=n, p=B, return_labels=True)
L = to_laplacian(A)
U, E, Ut = svd(L)
n_components = 2
Uc = U[:, :n_components]
Ec = E[:n_components]
latents = Uc @ np.diag(Ec)
    
fig = plt.figure();

ax = fig.add_axes([.06, -.06, .8, .8]) 
ax = heatmap(L, ax=ax, cbar=False)
ax.set_title("Network Representation", loc="left", fontsize=16)


# add arrow
arrow_ax = fig.add_axes([.8, .3, .3, .1])
rm_ticks(arrow_ax, left=True, bottom=True)
plt.arrow(x=0, y=0, dx=1, dy=0, width=.1, color="black") 

# add joint matrix
ax = fig.add_axes([1, -.02*3, .8, .8])
ax = heatmap(U, ax=ax, cbar=False)
ax.set_title("Eigenvector matrix $U$", loc="left")

ax = fig.add_axes([1.55, -.06, .8, .8])
ax = heatmap(np.diag(E), ax=ax, cbar=False)
ax.set_title("Eigenvalue matrix $D$", loc="left")

ax = fig.add_axes([2.1, -.06, .8, .8])
ax = heatmap(Ut, ax=ax, cbar=False)
ax.set_title("$U^T$", loc="left")
    
# add second arrow
arrow_ax = fig.add_axes([2.8, .3, .3, .1])
rm_ticks(arrow_ax, left=True, bottom=True)
plt.arrow(x=0, y=0, dx=1, dy=0, width=.1, color="black") 

# Embedding
ax = fig.add_axes([3.15, -.06, .4, .8])
cmap = cmaps["sequential"]
ax = sns.heatmap(latents, cmap=cmap, 
            ax=ax, cbar=False, xticklabels=False, yticklabels=False)
ax.set_title("Embedding \n(matrix representation)", loc="left")
ax.set_xlabel("First two scaled columns of $U$")

ax = fig.add_axes([3.6, -.06, .8, .8])
plot_latents(latents, ax=ax, labels=labels)
ax.set_title("Embedding (Euclidean representation)", loc="left")
ax.set_xlabel("Plotting the rows of U as points in space")

fig.suptitle("The Spectral Embedding Algorithm", fontsize=24, x=2.25)

This new matrix will be organized according to what you usually see in machine learning data. Rows (observations) correspond to the nodes of your network, and columns (features) correspond to the top eigenvectors of the network. 

We need to dive into a few specifics to understand this better. We need to figure out how to find our network's eigenvectors, for instance, and we also need to understand why those eigenvectors can be used to form a representation of our network.

Let's scale down and make a simple network, with only six nodes. We'll take its Laplacian just to show what that optional step looks like, and then we'll find its eigenvectors with a technique we'll explore called Singular Value Decomposition. Then, we'll explore why we can use the first $k$ eigenvector/eigenvalue pairs to find an embedding. Let's start with creating a simple network.

## A Simple Network

Say we have the simple network below. There are six nodes total, numbered 0 through 5, and there are two distinct connected groups (called "connected components" in network theory land). Nodes 0 through 2 are all connected to each other, and nodes 3 through 5 are also all connected to each other. 

In [None]:
from itertools import combinations
import numpy as np

def add_edge(A, edge: tuple):
    """
    Add an edge to an undirected graph.
    """
    i, j = edge
    A[i, j] = 1
    A[j, i] = 1
    return A

A = np.zeros((6, 6))

for edge in combinations([0, 1, 2], 2):
    add_edge(A, edge)
    
for edge in combinations([3, 4, 5], 2):
    add_edge(A, edge)

You can see the adjacency matrix and network below. Notice that there are two distinct blocks in the adjacency matrix: in its upper-left, you can see the edges between the first three nodes, and in the bottom right, you can see the edges between the second three nodes.

In [None]:
from graphbook_code import draw_multiplot
import networkx as nx

draw_multiplot(A, pos=nx.kamada_kawai_layout, title="Our Simple Network")

## The Laplacian Matrix

With spectral embedding, we'll either find the eigenvectors of the Laplacian or the eigenvectors of the Adjacency Matrix itself. Since we already have the adjacency matrix, let's take the Laplacian just to see what that looks like.

Remember from chapter four that there are a few different types of Laplacian matrices. By default, graspologic uses the normalized Laplacian $L = D^{-1/2} A D^{-1/2}$, and that version (or a variation on it) is generally what we use in practice. For simplicity and illustration, we'll just use the basic, cookie-cutter version of the Laplacian $L = D - A$.
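To make the two formulas concrete, here is a small NumPy sketch that builds both versions by hand for a toy network of two triangles. The variable names (`A_toy`, `L_basic_toy`, `L_normalized_toy`) are made up for this illustration, and the sketch assumes every node has at least one edge, since otherwise $D^{-1/2}$ would involve a division by zero.

```python
import numpy as np

# Toy adjacency matrix: two triangles (nodes 0-2 and nodes 3-5)
A_toy = np.zeros((6, 6))
A_toy[:3, :3] = 1
A_toy[3:, 3:] = 1
np.fill_diagonal(A_toy, 0)

# Degree of each node = number of edges touching it
degrees_toy = A_toy.sum(axis=0)
D_toy = np.diag(degrees_toy)

# Basic Laplacian: L = D - A
L_basic_toy = D_toy - A_toy

# Normalized form: L = D^{-1/2} A D^{-1/2} (assumes no zero-degree nodes)
D_inv_sqrt_toy = np.diag(1.0 / np.sqrt(degrees_toy))
L_normalized_toy = D_inv_sqrt_toy @ A_toy @ D_inv_sqrt_toy

print(L_basic_toy)
print(L_normalized_toy)
```

Because every node in this toy network has degree 2, the normalized version is just the adjacency matrix divided by 2.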

Here's the degree matrix.

In [None]:
# Build the degree matrix D
degrees = np.count_nonzero(A, axis=0)
D = np.diag(degrees)
D

And here's the Laplacian matrix, written out in full.

In [None]:
# Build the Laplacian matrix L
L = D-A
L

Below, you can see these matrices visually.

In [None]:
from graphbook_code import heatmap
import seaborn as sns
from matplotlib.colors import Normalize
from graphbook_code import GraphColormap
import matplotlib.cm as cm
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 5, figsize=(25, 5))

# First axis (Degree)
heatmap(D, ax=axs[0], cbar=False, title="Degree Matrix $D$")

# Second axis (-)
axs[1].text(x=.5, y=.5, s="-", fontsize=200, 
            va='center', ha='center')
axs[1].get_xaxis().set_visible(False)
axs[1].get_yaxis().set_visible(False)
sns.despine(ax=axs[1], left=True, bottom=True)

# Third axis (Adjacency matrix)
heatmap(A, ax=axs[2], cbar=False, title="Adjacency Matrix $A$")

# Fourth axis (=)
axs[3].text(x=.5, y=.5, s="=", fontsize=200,
            va='center', ha='center')
axs[3].get_xaxis().set_visible(False)
axs[3].get_yaxis().set_visible(False)
sns.despine(ax=axs[3], left=True, bottom=True)

# Fifth axis (Laplacian)
heatmap(L, ax=axs[4], cbar=False, title="Laplacian Matrix $L$")

# Colorbar
vmin, vmax = np.array(L).min(), np.array(L).max()
norm = Normalize(vmin=vmin, vmax=vmax)
im = cm.ScalarMappable(cmap=GraphColormap("sequential").color, norm=norm)
fig.colorbar(im, ax=axs, shrink=0.8, aspect=10);

fig.suptitle("The Laplacian is just a function of the adjacency matrix", fontsize=24)

## Finding Eigenvectors: Singular Value Decomposition

Now that we have a Laplacian matrix, we'll want to find its eigenvectors - or, more generally, its "singular vectors". To do this, we'll need to use a technique called *Singular Value Decomposition*, or SVD. 

SVD is a way to break a single matrix apart into three submatrices. In our case, the matrix will be the Laplacian we just built. Generally, these three submatrices hold something called the singular vectors and singular values. These are general versions of eigenvectors and eigenvalues: all matrices have a full set of singular vectors/values, but not all matrices have a full set of eigenvectors/values. In the case of the Laplacian $L = D - A$ (as with all symmetric matrices whose eigenvalues are non-negative), it turns out that the singular vectors/values and the eigenvectors/values are the same thing. For more technical details on how SVD works, or for explicit proofs, we would recommend a Linear Algebra textbook [Trefethen, LADR]. Let's look at the SVD with a bit more detail. This description of the SVD only applies to square, symmetric matrices, but you can use the SVD on any matrix with a bit more generality.

**Singular Value Decomposition** Suppose you have a square, symmetric matrix $X$. In our case, $X$ corresponds to the Laplacian $L$ (or the adjacency matrix $A$).

\begin{align*}
\begin{bmatrix}
    x_{11} & x_{12} & \cdots & x_{1n} \\
    x_{12} & x_{22} & \cdots & x_{2n} \\
    \vdots & \vdots & \ddots & \vdots \\
    x_{1n} & x_{2n} & \cdots & x_{nn}
    \end{bmatrix}
\end{align*}

Then, you can find three matrices - one which rotates vectors in space, one which scales them along each coordinate axis, and another which rotates them back - which, when you multiply them all together, create $X$. This is the essence of Singular Value Decomposition: you can break down any linear transformation into a rotation, a scaling, and another rotation. Let's call the matrix which rotates $U$ (this type of matrix is called "orthogonal"), and the matrix that scales $D$.

\begin{align*}
    X &= U D U^T
\end{align*}

Since $U$ is a matrix that just rotates any vector, all of its column-vectors are at ninety-degree angles from each other and they all have unit length. These columns are more generally called the singular vectors of $X$, but since our matrix is symmetric and its eigenvalues are non-negative, they are also its **eigenvectors**. Since $D$ just scales, it's a diagonal matrix: there are values on the diagonal, but nothing (0) on the off-diagonals. The amounts by which each coordinate axis is scaled are the diagonal entries of $D$, $\sigma_{i}$. These also have a more general name, the singular values of the matrix $X$, but again, since $X$ is symmetric and none of its eigenvalues are negative, they are also $X$'s **eigenvalues**. This is always the case for the Laplacian $L = D - A$ of an undirected network. The adjacency matrix of an undirected network is symmetric too, but it can have negative eigenvalues, and then its singular values are the absolute values of its eigenvalues.

\begin{align*}
    X &= \begin{bmatrix}
    \uparrow & \uparrow &  & \uparrow \\
    \vec u_1 & \vec u_2 & \cdots & \vec u_n \\
    \downarrow & \downarrow &  & \downarrow
    \end{bmatrix}\begin{bmatrix}
    \sigma_1 & &  & \\
    & \sigma_2 &  & \\
    & & \ddots & \\
    & & & \sigma_n
    \end{bmatrix}\begin{bmatrix}
    \leftarrow & \vec u_1^T & \rightarrow \\
    \leftarrow & \vec u_2^T & \rightarrow \\
    & \vdots & \\
    \leftarrow & \vec u_n^T & \rightarrow \\
    \end{bmatrix}
\end{align*}
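The claims in this decomposition can be checked numerically. The sketch below is an illustration under stated assumptions: it rebuilds the Laplacian $L = D - A$ of a toy network of two triangles under the made-up names `A_demo` and `L_demo`, then uses NumPy's `np.linalg.svd` and `np.linalg.eigvalsh` to compare singular values with eigenvalues and to confirm that $U$ is orthogonal and that $U D U^T$ gives back the matrix.

```python
import numpy as np

# Toy network: two disconnected triangles, built as a block-diagonal matrix
triangle = np.ones((3, 3)) - np.eye(3)
A_demo = np.kron(np.eye(2), triangle)
L_demo = np.diag(A_demo.sum(axis=0)) - A_demo

# SVD: singular values come back sorted from largest to smallest
U_demo, sing_vals, Vt_demo = np.linalg.svd(L_demo)

# eigvalsh is for symmetric matrices and returns eigenvalues in ascending order
eig_vals = np.linalg.eigvalsh(L_demo)[::-1]

same_values = np.allclose(sing_vals, eig_vals)
is_orthogonal = np.allclose(U_demo.T @ U_demo, np.eye(6))
reconstructs = np.allclose(U_demo @ np.diag(sing_vals) @ U_demo.T, L_demo)

print("singular values equal eigenvalues:", same_values)
print("columns of U are orthonormal:", is_orthogonal)
print("U D U^T recreates L:", reconstructs)
```

The same comparison would not hold for a symmetric matrix with negative eigenvalues, since singular values are never negative.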

## Finding the Eigenvectors of our network's Laplacian matrix

Let's break down our Laplacian matrix into $U$, $D$, and $U^T$ to find its singular values (the diagonal entries of $D$) and singular vectors (the columns of $U$). Remember, since our Laplacian is symmetric and has real, non-negative eigenvalues, the singular values and singular vectors correspond to the eigenvalues and eigenvectors. This isn't always the case.

Here, the leftmost column of $U$ is the eigenvector with the largest eigenvalue, and that eigenvalue is the top-left entry of $D$. The remaining eigenvector/eigenvalue pairs are organized in descending order of eigenvalue.

In [None]:
from scipy.linalg import svd
U, E, Ut = svd(L)

In [None]:
fig, axs = plt.subplots(1, 5, figsize=(25, 5))

# First axis (Laplacian)
heatmap(L, ax=axs[0], cbar=False, title="$L$")

# Second axis (=)
axs[1].text(x=.5, y=.5, s="=", fontsize=200, 
            va='center', ha='center')
axs[1].get_xaxis().set_visible(False)
axs[1].get_yaxis().set_visible(False)
sns.despine(ax=axs[1], left=True, bottom=True)

# Third axis (U)
U_ax = heatmap(U, ax=axs[2], cbar=False, title="$U$")
U_ax.set_xlabel("Columns of eigenvectors")

# Fourth axis (D, the diagonal matrix of eigenvalues)
E_ax = heatmap(np.diag(E), ax=axs[3], cbar=False, title="$D$")
E_ax.set_xlabel("Eigenvalues on diagonal")

# Fifth axis (U^T)
Ut_ax = heatmap(Ut, ax=axs[4], cbar=False, title="$U^T$")
Ut_ax.set_xlabel("Rows of eigenvectors")

# Colorbar
vmin, vmax = np.array(L).min(), np.array(L).max()
norm = Normalize(vmin=vmin, vmax=vmax)
im = cm.ScalarMappable(cmap=GraphColormap("sequential").color, norm=norm)
fig.colorbar(im, ax=axs, shrink=0.8, aspect=10);

fig.suptitle("You can decompose the Laplacian into eigenvectors/eigenvalues", fontsize=24)

## Why We Care About Taking Eigenvectors: Matrix Rank

When we embed anything to create a new representation, we're essentially trying to find a simpler version of that thing which preserves as much information as possible. This leads us to the concept of **matrix rank**.

**Matrix Rank**: The rank of a matrix $X$, written $rank(X)$, is the number of linearly independent rows of $X$, which is always equal to the number of linearly independent columns.

At a very high level, we can think of the matrix rank as telling us just how "simple" $X$ is. A matrix which is rank $1$ is very simple: all of its rows or columns can be expressed as multiples of just a single vector. On the other hand, a matrix which has "full rank", or a rank equal to the number of rows (or columns, whichever is smaller), is a bit more complex: for a square full-rank matrix, no row or column can be expressed as a weighted sum of the other rows or columns.
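As a quick illustration of these definitions, the sketch below uses NumPy's `np.linalg.matrix_rank` on three small matrices. The variable names are made up for this example, and the Laplacian is rebuilt from scratch (two triangles, $L = D - A$) so the block runs on its own.

```python
import numpy as np

# A rank one matrix: the outer product of a single vector with itself
u = np.array([[1.0], [2.0], [3.0]])
rank_one_matrix = u @ u.T
rank_of_outer_product = np.linalg.matrix_rank(rank_one_matrix)

# A full-rank matrix: the 3 x 3 identity
rank_of_identity = np.linalg.matrix_rank(np.eye(3))

# The Laplacian L = D - A of two disconnected triangles
triangle = np.ones((3, 3)) - np.eye(3)
A_demo = np.kron(np.eye(2), triangle)
L_demo = np.diag(A_demo.sum(axis=0)) - A_demo
rank_of_laplacian = np.linalg.matrix_rank(L_demo)

print(rank_of_outer_product, rank_of_identity, rank_of_laplacian)
```

The Laplacian here is a $6 \times 6$ matrix but is not full rank: the rank of $D - A$ is the number of nodes minus the number of connected components, which is $6 - 2 = 4$ for this network.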

There are a couple ways that the rank of a matrix and the singular value decomposition interact which are critical to understand. The facts below explain why we can represent our matrix using only the first few scaled eigenvectors in the Spectral Embedding algorithm.

### Summing Rank 1 Matrices Recreates The Original Matrix

You can actually create an $n \times n$ matrix using any one of the matrix's eigenvectors by taking its outer product with itself, $u_i u_i^T$. This creates a rank one matrix which only contains the information stored in that one eigenvector. Scale it by its eigenvalue $\sigma_i$ and you have something that feels suspiciously similar to how we take the first few eigenvectors of $U$ and scale them in spectral embedding.

It turns out that we can express our symmetric matrix $X$ as the sum of all of these rank one matrices.

Take the $i^{th}$ column of $U$. We'll call it $u_i$: the $i^{th}$ eigenvector of our Laplacian. Its corresponding eigenvalue is the $i^{th}$ element of the diagonal eigenvalue matrix $D$. You can make a rank $1$ matrix from this eigenvalue/eigenvector pair by taking the outer product and scaling the result by the eigenvalue: $\sigma_i u_i u_i^T$.

It turns out that when we take the sum of all of these rank $1$ matrices, we'll recreate the original matrix:

\begin{align*}
    X &= \sum_{i = 1}^n \sigma_i u_i u_i^T = \sigma_1 \begin{bmatrix}\uparrow \\ \vec u_1 \\ \downarrow\end{bmatrix}\begin{bmatrix}\leftarrow & \vec u_1^T & \rightarrow \end{bmatrix} + 
    \sigma_2 \begin{bmatrix}\uparrow \\ \vec u_2 \\ \downarrow\end{bmatrix}\begin{bmatrix}\leftarrow & \vec u_2^T & \rightarrow \end{bmatrix} + 
    ... + 
    \sigma_n \begin{bmatrix}\uparrow \\ \vec u_n \\ \downarrow\end{bmatrix}\begin{bmatrix}\leftarrow & \vec u_n^T & \rightarrow \end{bmatrix}
\end{align*}

Here are all of the $\sigma_i u_i u_i^T$ for our Laplacian L. Since there were six nodes in the original network, there are six eigenvalue/vector pairs, and six rank 1 matrices.

In [None]:
n_nodes = U.shape[0]
low_rank_matrices = []
for i in range(n_nodes):
    ui = np.atleast_2d(U[:, i]).T
    low_rank_matrix = E[i] * ui @ ui.T
    low_rank_matrices.append(low_rank_matrix)
    
laplacian_sum = np.array(low_rank_matrices).sum(axis=0)

In [None]:
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(10, 6))
gs = GridSpec(3, 5)
ax_laplacian = fig.add_subplot(gs[:, 2:])

# Plot low-rank matrices
i = 0
for row in range(3):
    for col in range(2):
        ax = fig.add_subplot(gs[row, col])
        title = rf"$\sigma_{i+1} u_{i+1} u_{i+1}^T$"  # raw f-string: keeps LaTeX backslashes literal
        heatmap(low_rank_matrices[i], ax=ax, cbar=False, title=title)
        i += 1
        
# Plot Laplacian
heatmap(laplacian_sum, ax=ax_laplacian, cbar=False, title=r"$L = \sum_{i = 1}^n \sigma_i u_i u_i^T$")

fig.suptitle("You can recreate the Laplacian by summing low-rank matrices", fontsize=24)
plt.tight_layout()

### We can approximate L by only summing a few of our low-rank matrices

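Secondly, it turns out that if you only sum a few of these low-rank $\sigma_i u_i u_i^T$, you can *approximate* your original matrix. In fact, and this is important, if you sum the $k$ terms with the largest $\sigma_i$, you get the *best possible* rank-$k$ approximation to your original matrix (measured by the sum of squared differences between entries).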

This tells us something interesting about SVD: it lets us find the best possible low-rank approximations to high-rank matrices. We take a matrix that's extremely complicated and can project it down to the most similar matrix which is less complicated.
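One way to see how good each truncated sum is: measure the Frobenius norm of the difference between the original matrix and the sum of the first $k$ rank one matrices. The sketch below is an illustration with made-up variable names on the Laplacian $L = D - A$ of two triangles. It also compares each error against the square root of the sum of the squared singular values that were left out, which is what the error of the best rank-$k$ approximation works out to.

```python
import numpy as np

# Laplacian L = D - A of two disconnected triangles
triangle = np.ones((3, 3)) - np.eye(3)
A_demo = np.kron(np.eye(2), triangle)
L_demo = np.diag(A_demo.sum(axis=0)) - A_demo

U_demo, s_demo, _ = np.linalg.svd(L_demo)

errors = []
expected = []
for k in range(1, 7):
    # Sum of the first k rank one matrices, written as U_k D_k U_k^T
    approx = (U_demo[:, :k] * s_demo[:k]) @ U_demo[:, :k].T
    errors.append(np.linalg.norm(L_demo - approx, ord="fro"))
    # Error predicted from the singular values that were dropped
    expected.append(np.sqrt(np.sum(s_demo[k:] ** 2)))

for k, (err, exp) in enumerate(zip(errors, expected), start=1):
    print(f"k={k}: error={err:.3f}, predicted={exp:.3f}")
```

The error shrinks as $k$ grows and reaches zero (up to floating-point rounding) as soon as every nonzero singular value has been included.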

Look below. In each plot, we're summing more and more of these low-rank matrices. By the time we're summing four matrices, we've totally recreated the original Laplacian, because its last two eigenvalues are zero and so the last two rank one matrices add nothing.

In [None]:
fig, axs = plt.subplots(2, 3, figsize=(9,6))

current = np.zeros(L.shape)
for i, ax in enumerate(axs.flat):
    new = low_rank_matrices[i]
    current += new
    heatmap(current, ax=ax, cbar=False, 
            title=rf"$\sum_{{i = 1}}^{i+1} \sigma_i u_i u_i^T$")
    
fig.suptitle("Each of these is the sum of an \nincreasing number of low-rank matrices", fontsize=16)
plt.tight_layout()

This becomes even more useful when we have huge networks with thousands of nodes, but only a few communities. It turns out that we can usually sum a very small number of low-rank matrices, and get to an approximation that's pretty close.

Take the network below, a Stochastic Block Model with 50 nodes total (25 in one community, 25 in another). We took its Laplacian (the normalized version this time), decomposed it, and summed two of the low-rank matrices we generated from the columns.

The result is not exact, but it looks pretty close, especially since we only used two rank-one matrices (each generated from a single scaled eigenvector) instead of a full-rank $n \times n$ matrix!

In [None]:
from graspologic.simulations import sbm
from graspologic.utils import to_laplacian

# Make network
B = np.array([[0.8, 0.1], 
              [0.1, 0.8]])
n = [25, 25]
A2, labels = sbm(n=n, p=B, return_labels=True)

# Form new laplacian
L2 = to_laplacian(A2)

# decompose
k = 2
U2, E2, Ut2 = svd(L2)

# Keep only the top k singular vectors and singular values
low_rank_approximation = U2[:, :k] @ np.diag(E2[:k]) @ Ut2[:k, :]


# Plotting
fig, axs = plt.subplots(1, 2, figsize=(12, 6))
l2_hm = heatmap(L2, ax=axs[0], cbar=False, title="$L$")
l2approx_hm = heatmap(low_rank_approximation, ax=axs[1], cbar=False, title=r"$\sum_{i = 1}^{2} \sigma_i u_i u_i^T$")

l2_hm.set_xlabel("Full-rank Laplacian for a 50-node network", fontdict={'size': 15})
l2approx_hm.set_xlabel("Sum of only two low-rank matrices!", fontdict={'size': 15});

fig.suptitle("Summing a few low-rank matrices looks pretty similar to the Laplacian!", fontsize=24)

plt.tight_layout()

This is where a lot of the power of an SVD comes from: you can estimate extremely complicated (high-rank) matrices with extremely simple (low-rank) matrices.

### How This Matrix Rank Stuff Helps Us Understand Spectral Embedding

Remember the actual spectral embedding algorithm: we take a network, decompose its Laplacian with SVD into its eigenvectors and eigenvalues, and then cut out everything but the top $k$ eigenvalue/eigenvector pairs. Once we scale the columns of eigenvectors by their corresponding eigenvalues, we have our embedding. The rows of this embedding are the locations in Euclidean space for each node, and the columns are the top $k$ scaled eigenvectors. How does what we just talked about help us understand this stuff?

Well, each column of the embedding is the $i^{th}$ eigenvector scaled by the $i^{th}$ eigenvalue: $\sigma_i u_i$. If we right-multiplied one of those columns by its unscaled transpose $u_i^\top$, we'd have one of our rank one matrices. This means that you can think of our rank-one matrices as essentially just fancy versions of the columns of an embedding matrix. They contain all the same information - they're just matrices instead of vectors!

In fact, you can express the sum we did earlier - our lower-rank estimation of L - with just our embedding matrix! Here, $U_k$ is the first $k$ eigenvectors of our Laplacian, and $D_k$ is the diagonal matrix with the first $k$ eigenvalues (remember that we named them $\sigma_1$ through $\sigma_k$).

$\sum_{i = 1}^k \sigma_i u_i u_i^T = U_k D_k U_k^T$
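This identity is easy to check numerically. The sketch below is an illustration with made-up variable names on the Laplacian $L = D - A$ of two triangles, with $k = 3$ chosen arbitrarily. It builds the left-hand side as an explicit sum of rank one matrices, builds the right-hand side as a matrix product, and also shows that multiplying the embedding $U_k D_k$ by $U_k^T$ gives the same matrix.

```python
import numpy as np

# Laplacian L = D - A of two disconnected triangles
triangle = np.ones((3, 3)) - np.eye(3)
A_demo = np.kron(np.eye(2), triangle)
L_demo = np.diag(A_demo.sum(axis=0)) - A_demo

U_demo, s_demo, _ = np.linalg.svd(L_demo)

k = 3  # arbitrary choice for illustration
U_k = U_demo[:, :k]
D_k = np.diag(s_demo[:k])

# Left-hand side: explicit sum of k scaled outer products
sum_of_rank_ones = sum(
    s_demo[i] * np.outer(U_demo[:, i], U_demo[:, i]) for i in range(k)
)

# Right-hand side: a single matrix product
matrix_product = U_k @ D_k @ U_k.T

# The embedding is U_k D_k; multiplying by U_k^T gives the same matrix
embedding_k = U_k @ D_k
from_embedding = embedding_k @ U_k.T

print(np.allclose(sum_of_rank_ones, matrix_product))
print(np.allclose(from_embedding, matrix_product))
```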

Let's take a closer look at our embedding matrix. 

## The Embedding Matrix

In [None]:
n_components = 3
Uc = U[:, :n_components]
Ec = E[:n_components]

latents = Uc @ np.diag(Ec)

In [None]:
fig, ax = plt.subplots(figsize=(4, 8))
cmap = cmaps["sequential"]
ax = sns.heatmap(latents, cmap=cmap, ax=ax, cbar=False, 
                 xticklabels=1, yticklabels=1)
ax.set_xlabel("Eigenvector")
ax.set_ylabel("Node")

And remember that our embedding matrix is just $U_k D_k$.

This property, and the reason why two vectors captured two communities, will be key to the concept of spectral embedding, which we will learn about in the next section.