# Vertex Clustering Using Adjacency Spectral Embedding and K-Means

When you are given a graph, it may be interesting to see which nodes form clusters. Clusters, or blocks, can be difficult to see directly from an adjacency matrix. In this tutorial, we examine how to use adjacency spectral embedding (ASE) combined with k-means to cluster the vertices of a graph. We also explore why embedding the graph with ASE before clustering can improve results compared with clustering the adjacency matrix directly.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score as ARI

import graspy
from graspy.embed import AdjacencySpectralEmbed
from graspy.simulations import binary_sbm

%matplotlib inline
```

## Generate synthetic data
In this example, we work with undirected graphs with no self-loops. We define three binary stochastic block models (SBMs), each with two blocks: 60 vertices in the first block and 40 in the second. Their block probability matrices are:

\begin{align*}
P_1 = 
\begin{bmatrix}0.42 & 0.42\\
0.42 & 0.5
\end{bmatrix},~
P_2 = \begin{bmatrix}0.15 & 0.1\\
0.1 & 0.15
\end{bmatrix}, ~
P_3 = \begin{bmatrix}0.4 & 0.2\\
0.2 & 0.4
\end{bmatrix}
\end{align*}

Entry $(k, l)$ of each matrix is the probability that a vertex in block $k$ and a vertex in block $l$ are connected by an edge. The examples below sample graphs from $P_1$.

We also generate a label for each vertex. The first 60 vertices belong to the first block, which we denote as 0, and the remaining 40 belong to the second block, which we denote as 1. We know the true labels because we are using synthetic data, but we emphasize that the true block labeling is usually not known in real datasets.
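To make the sampling model concrete, here is a minimal NumPy sketch of drawing an undirected SBM without self-loops. `sample_sbm` is a hypothetical helper written for illustration; it is not part of graspy, and it only mirrors what `binary_sbm` does conceptually, not its exact implementation. The usage line assumes the 60/40 split and $P_1$ from the code cells.

```python
import numpy as np

def sample_sbm(block_sizes, P, rng=None):
    """Hypothetical helper: sample an undirected, loopless binary SBM.

    block_sizes: number of vertices in each block, e.g. [60, 40]
    P: (K, K) symmetric matrix of block connection probabilities
    Returns the adjacency matrix A and the true block label of each vertex.
    """
    rng = np.random.default_rng(rng)
    P = np.asarray(P)
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    # Edge probability for every vertex pair, looked up from the pair's blocks
    probs = P[labels][:, labels]
    n = labels.size
    # Draw each pair once (strict upper triangle, so no self-loops),
    # then mirror it to make A symmetric (undirected)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(float)
    return A, labels

A_demo, labels_demo = sample_sbm([60, 40], [[0.42, 0.42], [0.42, 0.5]], rng=0)
```

Drawing only the upper triangle and mirroring it is what makes the graph undirected; excluding the diagonal (`k=1`) removes self-loops.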

```python
# Block sizes: 60 vertices in block 0 and 40 in block 1
n = [60, 40]

# Block probability matrices
P1 = np.array([[0.42, 0.42],
               [0.42, 0.5]])
P2 = np.array([[0.15, 0.1],
               [0.1, 0.15]])
P3 = np.array([[0.4, 0.2],
               [0.2, 0.4]])

# True block labels: 0 for the first 60 vertices, 1 for the last 40
truth = np.repeat([0, 1], n)
```

Next, we sample a graph from the SBM with $P_1$:

```python
A = binary_sbm(n, P1)
```

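As a baseline, we run k-means with two clusters directly on the rows of the adjacency matrix and score the result against the true labels with the adjusted Rand index (ARI). An ARI of 1 means the clustering matches the true blocks exactly (up to relabeling the clusters), and values near 0 mean chance-level agreement.

```python
km = KMeans(2)
ARI(truth, km.fit_predict(A))
```

```
0.01369257950530042
```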
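Because k-means assigns arbitrary cluster ids, plain accuracy against `truth` would depend on whether cluster 0 happens to line up with block 0. ARI compares partitions rather than label names. A small scikit-learn demonstration with toy labels (the label lists are illustrative values, not data from this tutorial):

```python
from sklearn.metrics import adjusted_rand_score

toy_truth = [0, 0, 0, 1, 1, 1]

# Same partition with the cluster ids swapped: ARI ignores label names
swapped = [1, 1, 1, 0, 0, 0]
print(adjusted_rand_score(toy_truth, swapped))  # 1.0

# Every vertex in one cluster carries no information about the blocks
one_cluster = [0, 0, 0, 0, 0, 0]
print(adjusted_rand_score(toy_truth, one_cluster))  # 0.0

# A split that cuts across the true blocks can score below zero
print(adjusted_rand_score([0, 0, 1, 1], [0, 1, 0, 1]))  # negative, about -0.5
```

This is why the scores in this tutorial can be slightly negative: ARI is adjusted for chance, so a clustering no better than random hovers around 0 rather than around 0.5.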

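The cluster label that k-means assigns to each vertex can also be inspected directly. Note that calling `fit_predict` again refits k-means from a new random initialization, so these labels need not match the clustering scored above.

```python
km.fit_predict(A)
```

```
array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1,
       0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
       1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1,
       1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1], dtype=int32)
```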

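Next, we embed the graph with adjacency spectral embedding, which maps each vertex to a point in a low-dimensional space (here, one dimension), and run k-means on the embedded points instead of on the raw adjacency rows.

```python
ase = AdjacencySpectralEmbed(k=1)
Xhat = ase.fit_transform(A)
km = KMeans(2)
ARI(truth, km.fit_predict(Xhat))
```

```
0.0056061667834618745
```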
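For intuition about what `AdjacencySpectralEmbed` computes, here is a minimal NumPy sketch of the textbook ASE definition: take the top $d$ singular values of $A$ and scale the matching singular vectors by their square roots, $\hat{X} = U_d S_d^{1/2}$. `ase_sketch` is a hypothetical helper, and graspy's own implementation may differ in details such as the SVD solver. The toy two-block graph and its probabilities are illustrative assumptions, chosen so the blocks are easy to separate.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def ase_sketch(A, d):
    """Minimal adjacency spectral embedding sketch (not graspy's code).

    Keeps the d largest singular values of A and scales the matching
    left singular vectors by their square roots: Xhat = U_d S_d^{1/2}.
    """
    U, S, _ = np.linalg.svd(A)  # singular values are sorted in decreasing order
    return U[:, :d] * np.sqrt(S[:d])

# Toy two-block graph (hypothetical values): dense within blocks, sparse between
rng = np.random.default_rng(0)
labels_toy = np.repeat([0, 1], [30, 20])
P_toy = np.array([[0.8, 0.1],
                  [0.1, 0.8]])
probs = P_toy[labels_toy][:, labels_toy]
upper = np.triu(rng.random(probs.shape) < probs, k=1)
A_toy = (upper | upper.T).astype(float)

# Embed each vertex as a 2-D point, then cluster the points with k-means
Xhat_toy = ase_sketch(A_toy, d=2)
pred_toy = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xhat_toy)
score = adjusted_rand_score(labels_toy, pred_toy)
```

The embedding replaces each 50-dimensional adjacency row with a 2-dimensional point, so k-means works on a compact representation in which vertices from the same block sit close together.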

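A single sample is noisy, so we repeat the comparison on 10 random graphs for each of two graph sizes, 500 and 1000 vertices, with 60% of the vertices in block 0 and 40% in block 1.

```python
# Simulation settings
P = np.array([[0.42, 0.42],
              [0.42, 0.5]])
rho = np.array([0.6, 0.4])  # fraction of vertices in each block
n = [500, 1000]             # graph sizes to test
```

```python
# Both lists store ARI scores (higher is better); the first 10 entries
# are for n = 500 and the last 10 for n = 1000
kmeans_error = []
embed_error = []
for i in n:
    # Block sizes and true labels for a graph with i vertices
    block_sizes = [int(round(r * i)) for r in rho]
    truth = np.repeat([0, 1], block_sizes)
    for z in range(10):
        A = binary_sbm(block_sizes, P)

        # k-means directly on the rows of the adjacency matrix
        km = KMeans(2)
        kmeans_error.append(ARI(truth, km.fit_predict(A)))

        # k-means on the 2-dimensional adjacency spectral embedding
        ase = AdjacencySpectralEmbed(k=2)
        Xhat = ase.fit_transform(A)
        km = KMeans(2)
        embed_error.append(ARI(truth, km.fit_predict(Xhat)))
```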
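The simulation loop embeds into `k=2` dimensions. One way to motivate that choice: an SBM whose probability matrix $P$ is positive semidefinite can be viewed as a random dot product graph, where each block has a latent position and $P = XX^\top$. The number of latent dimensions is the rank of $P$, and ASE estimates these latent positions (up to an orthogonal rotation). A short NumPy check for the $P$ used in the loop; the factorization via `numpy.linalg.eigh` is our illustrative choice:

```python
import numpy as np

# Block probability matrix from the simulation
P_sim = np.array([[0.42, 0.42],
                  [0.42, 0.5]])

# Both eigenvalues are positive, so P_sim is positive definite and factors as X X^T
evals, evecs = np.linalg.eigh(P_sim)
print(np.linalg.matrix_rank(P_sim))  # 2 -> two latent dimensions
X_latent = evecs * np.sqrt(evals)    # row k is the latent position of block k
print(np.allclose(X_latent @ X_latent.T, P_sim))  # True
```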

ARI of k-means on the ASE embedding, one entry per run (the first 10 entries are for 500 vertices, the last 10 for 1000):

```python
embed_error
```

```
[1.1565343176952035e-05,
 -0.0013895651662113127,
 -0.0019373412240537731,
 -0.0017101945095662728,
 -0.0019515007406964103,
 -0.0018854482257356195,
 -0.0024900311957896993,
 0.0002246596335863257,
 0.0005267066124463585,
 -0.002108104460489895,
 -0.0006430095503303557,
 -0.0007666517735804298,
 -0.0007813504419500414,
 -0.0010787087992766454,
 0.00019507036835290727,
 8.529246222457228e-05,
 -0.0010244718919542968,
 -0.00029883261773893896,
 -0.0007281635580399341,
 0.2145235251657943]
```

ARI of k-means on the raw adjacency matrix, in the same order:

```python
kmeans_error
```

```
[0.015481353330745311,
 0.03341318181366926,
 0.0012047342780248595,
 0.01997021071895575,
 0.026714809872701785,
 -0.0024141123225673577,
 0.06099672954397681,
 0.00013676136491442827,
 0.01998935378968367,
 0.04414391058877757,
 0.0419388028471526,
 0.34442566955184395,
 0.37113027154774053,
 0.2973722273577886,
 0.07976651834472266,
 0.03362373841992601,
 0.33094877139864987,
 0.24322970357841717,
 0.3566360328290683,
 0.0726387078855726]
```
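Raw per-run lists are hard to compare by eye. A small sketch that groups a flat list of scores by graph size and reports the mean and standard deviation; `summarize_by_size` is a hypothetical helper, and it assumes the layout produced by the simulation loop (all runs for the first size, then all runs for the second).

```python
import numpy as np

def summarize_by_size(scores, sizes, runs_per_size):
    """Hypothetical helper: group per-run ARI scores by graph size.

    Assumes scores lists all runs for sizes[0] first, then all runs
    for sizes[1], and so on. Returns {size: (mean ARI, std of ARI)}.
    """
    grid = np.asarray(scores, dtype=float).reshape(len(sizes), runs_per_size)
    return {size: (row.mean(), row.std()) for size, row in zip(sizes, grid)}
```

In the notebook, `summarize_by_size(kmeans_error, [500, 1000], 10)` and `summarize_by_size(embed_error, [500, 1000], 10)` give one summary per graph size for each method.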