# Multigraph Representation Learning

Say you're a brain researcher, and you have a bunch of scans of people's brains - some are scans of normal people, and some are scans of people with Abnormal Disease. You have an algorithm that lets you create networks from these brains, so you turn all your brain scans into networks. You want to know if these two different types of networks share a common community structure. It seems like you should be able to find communities in both networks -- they're both brain networks, after all -- but what do you do? How do you even deal with situations in which you have a lot of networks whose nodes all represent the same objects, but which might come from quite different distributions?

Well, if your goal is to find the shared community structure between the normal and abnormal networks, you could try embedding your networks and then seeing what those embeddings look like. This would serve the dual purpose of having less stuff to deal with and having some way to directly compare your networks in the same space. For example, say you have nine networks of normal brains and nine networks of abnormal brains.

In [None]:
import numpy as np
from graspologic.simulations import sbm

# Generate networks from an SBM, given some parameters
def make_network(*probs, n=100, return_labels=False):
    p1, p2, p3, p4 = probs
    P = np.array([[p1, p2], 
                  [p3, p4]])
    
    return sbm([n, n], P, return_labels=return_labels)

# make nine normal networks
# and nine abnormal networks
p1, p2, p3 = .12, .06, .03
n = 100
labels = [0]*n + [1]*n
normals = [make_network(p1, p3, p3, p1, n=n) for i in range(9)]
abnormals = [make_network(p3, p1, p1, p3, n=n) for i in range(9)]

The normal brains are all drawn from the same distribution, and the abnormal brains are also all drawn from the same distribution. We're using Stochastic Block Models to model both types of brain networks. However, if you look at the code, you'll see that the normal brains were set up to have strong connections within communities, whereas the abnormal brains were set up to have strong connections between communities. Below is a plot of the adjacency matrices for every normal and every abnormal brain we've created.

In [None]:
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
from graspologic.plot import binary_heatmap, adjplot
import warnings
warnings.filterwarnings("ignore")

fig = plt.figure(figsize=(14,7))

grid1 = ImageGrid(fig, 121, (3, 3), axes_pad=.1, share_all=True)
grid2 = ImageGrid(fig, 122, (3, 3), axes_pad=.1, share_all=True)

for i, (axi, axj) in enumerate(zip(grid1, grid2)):
    hmn = binary_heatmap(normals[i], ax=axi, cbar=False)
    hma = binary_heatmap(abnormals[i], ax=axj, cbar=False)
    for ax in [hmn, hma]:
        for spine in ax.spines.values():
            spine.set_visible(True)    
        ax.vlines(n, 0, n*2, colors="black", lw=.9, linestyle="dashed", alpha=.8)
        ax.hlines(n, 0, n*2, colors="black", lw=.9, linestyle="dashed", alpha=.8)

grid1.axes_all[1].set_title("Normal Brains", fontsize=24, y=1.05)
grid2.axes_all[1].set_title("Abnormal Brains", fontsize=24, y=1.05)

plt.tight_layout(w_pad=3)

Remember, our goal is to find the community structure in these brains. We'd like to embed our brain networks into some lower-dimensional space to see if we can see this more clearly. Then, we'll see whether we can find some clustering in that space. Try to think about how you might find a lower-dimensional embedding where each node's latent position uses information from all of the networks.

The first thing you might come up with is to average your networks together, and then embed the result of that averaging. It turns out that this is actually the right thing to do in the special case where every network you average comes from the same distribution. In our case, averaging only the normal networks together, and averaging only the abnormal networks together, will produce two separate embeddings with nicely clustered nodes.
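
To get a feel for why averaging helps when every network comes from the same distribution, here is a small sketch that uses only NumPy. The `sample_network` helper is our own stand-in for graspologic's `sbm` (an assumption, made so the example is self-contained), and the block probabilities match the normal networks above. It compares how far a single network is from the true edge-probability matrix with how far the average of nine networks is.

```python
import numpy as np

rng = np.random.default_rng(0)

# Block probabilities of the "normal" networks: strong within-community connections
n = 100
B = np.array([[.12, .03],
              [.03, .12]])
labels = np.repeat([0, 1], n)
P = B[labels][:, labels]  # (2n x 2n) matrix of edge probabilities

def sample_network(P, rng):
    """Hypothetical helper: sample a symmetric, loopless adjacency matrix from P."""
    upper = np.triu(rng.random(P.shape) < P, k=1)
    return (upper + upper.T).astype(float)

networks = [sample_network(P, rng) for _ in range(9)]

# Distance from the true probabilities, ignoring the diagonal (no self-loops)
off_diag = ~np.eye(2 * n, dtype=bool)
err_single = np.linalg.norm((networks[0] - P)[off_diag])
err_mean = np.linalg.norm((np.mean(networks, axis=0) - P)[off_diag])
print(f"one network: {err_single:.1f}, mean of nine: {err_mean:.1f}")
```

The averaged matrix should sit much closer to the true probabilities than any single network does, which is why embedding the average gives cleaner clusters.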

In [None]:
from graspologic.embed import AdjacencySpectralEmbed as ASE

# Compute the average adjacency matrix for 
# normal brains and abnormal brains
normal_mean = np.array(normals).mean(axis=0)
abnormal_mean = np.array(abnormals).mean(axis=0)

# Embed both matrices
ase = ASE(n_components=2)
latents_normal = ase.fit_transform(normal_mean)
latents_abnormal = ase.fit_transform(abnormal_mean)

The plot below shows what happens when we embed the averaged normal and abnormal networks separately. Like all of our embedding plots, each dot represents the latent position of a particular node.

In [None]:
import seaborn as sns

def plot_latents(latent_positions, *, title=None, labels=None, ax=None, legend=False,
                 fontdict=None, **kwargs):
    if ax is None:
        ax = plt.gca()
    # pass x and y by keyword: recent seaborn versions reject them as positional arguments
    plot = sns.scatterplot(x=latent_positions[:, 0], y=latent_positions[:, 1], hue=labels, 
                           s=10, ax=ax, palette="Set1", color='k', **kwargs)
    if title is not None:
        plot.set_title(title, wrap=True, fontdict=fontdict);
    ax.axes.xaxis.set_visible(False)
    ax.axes.yaxis.set_visible(False)
    if legend:
        ax.legend(loc="upper right", title="Community")
    elif labels is not None and ax.get_legend() is not None:
        # drop the legend that seaborn adds automatically when hue is set
        ax.get_legend().remove()
    
    return plot

# plot
fig, axs = plt.subplots(ncols=2, figsize=(10, 5))
plot_latents(latents_normal, title="Embedding when we average the \nnormal networks", ax=axs[0]);
plot_latents(latents_abnormal, title="Embedding when we average the \nabnormal networks", ax=axs[1]);

Both of these embeddings have clear clustering: there are two communities of nodes in both the normal and the abnormal graphs. We can recover the labels for these communities fairly easily using our pick of unsupervised clustering algorithm. When a network comes from a Stochastic Block Model, the estimated latent positions in its Adjacency Spectral Embedding are approximately normally distributed around a center for each community, so the above embeddings look like draws from a Gaussian Mixture. "Gaussian" just means "normal", and a Gaussian mixture model just means that we have groups of normally distributed data clusters. As a result, it makes sense to cluster these data using scikit-learn's GaussianMixture implementation.

In [None]:
from sklearn.mixture import GaussianMixture as GMM
from graspologic.utils import remap_labels

# Predict labels for the normal and abnormal brains
labels_normal = GMM(n_components=2).fit_predict(latents_normal)
labels_abnormal = GMM(n_components=2).fit_predict(latents_abnormal)

# Make corresponding communities have the same values
labels_abnormal = remap_labels(labels_normal, labels_abnormal)

Below is a plot showing which community we predict each node belongs to, according to our Gaussian Mixture Model.

In [None]:
fig, axs = plt.subplots(ncols=2, figsize=(10, 5))
normal_plot_gmm = plot_latents(latents_normal, title="Clustering our normal network \nembedding with a GMM", 
             labels=labels_normal, ax=axs[0])
abnormal_plot_gmm = plot_latents(latents_abnormal, title="Clustering our abnormal network \nembedding with a GMM", 
                   labels=labels_abnormal, ax=axs[1])

plt.legend(loc=(1.15, .4), fontsize="x-large", title="Community",
           title_fontsize=16);

But what if we wanted to embed *all* of the networks into the same space, both the abnormal and normal networks, so that there's only one plot? Let's try it. We'll take all of the networks, both normal and abnormal ones, average them together, and then do an Adjacency Spectral Embedding. This will result in a single plot, with each point representing a single node. Do you think we'll still find this nice community separation?

In [None]:
total_mean = np.array(normals + abnormals).mean(axis=0)
all_latents = ase.fit_transform(total_mean)

In [None]:
plot_latents(all_latents, title="Embedding when we average everything together");

Nope, bummer. Our community separation is gone. As far as anybody can tell, our latent positions have just become meaningless noise, so we can't cluster and find communities like we did before.

What we've discovered is that, even though it's a great idea to simply average a bunch of networks if they were drawn from the same distribution, it's a horrible idea to average your networks if they might come from different distributions. This is a case of averaging networks which are "heterogeneous": not only are your networks slightly different, but they're *expected* to be different because they're distributed differently. Sampling a lot of heterogeneous networks and then averaging them, as you can see above, can result in losing the community signal you might have had.
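
A quick bit of arithmetic shows why the signal disappears in this particular example. The sketch below averages the two block probability matrices used to generate the normal and abnormal networks (the variable names are ours). Because there are equally many networks of each type, this is what the averaged adjacency matrix looks like in expectation.

```python
import numpy as np

# Block probability matrices used for the normal and abnormal networks
B_normal = np.array([[.12, .03],
                     [.03, .12]])
B_abnormal = np.array([[.03, .12],
                       [.12, .03]])

# Expected block structure after averaging equal numbers of each type
B_mean = (B_normal + B_abnormal) / 2
print(B_mean)
print("rank:", np.linalg.matrix_rank(B_mean))
```

Every entry of `B_mean` is 0.075, so in the averaged network any two nodes are equally likely to be connected, whichever communities they belong to. The within-community and between-community probabilities cancel each other out, and there is nothing left for the embedding to separate.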

We'd like to find a way to compare these heterogeneous networks directly, so that we can embed all of our networks into the same space and still keep that community structure. Figuring out the best way to do this is a topic under active research, and the set of techniques and tools that have developed as a result are together called multigraph representation learning (here, "graph" just means "network").

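There are a few ways of going about multigraph representation learning. The first is called the Embedding Product. The idea is to embed each network separately and then combine those separate embeddings into a single joint embedding.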

In [None]:
from graspologic.embed import MultipleASE as MASE
from graspologic.embed import OmnibusEmbed as OMNI
from graspologic.embed.omni import _get_omni_matrix
from graspologic.plot import heatmap

fig = plt.figure()

def rm_ticks(ax, **kwargs):
    ax.axes.xaxis.set_visible(False)
    ax.axes.yaxis.set_visible(False)
    sns.despine(ax=ax, **kwargs)

# add stack of heatmaps
for i in range(5):
    ax = fig.add_axes([.02*i, -.02*i, .8, .8]) 
    ax = binary_heatmap(normals[i], ax=ax, cbar=False)
    if i == 0:
        ax.set_title("Adjacency Matrices", loc="right", fontsize=20)
    rm_ticks(ax, top=False, right=False)
    ax.vlines(n, 0, n*2, colors="black", lw=.9, linestyle="dashed", alpha=.8)
    ax.hlines(n, 0, n*2, colors="black", lw=.9, linestyle="dashed", alpha=.8)


# add arrow
arrow_ax = fig.add_axes([.8, .3, .3, .1])
rm_ticks(arrow_ax, left=True, bottom=True)
plt.arrow(x=0, y=0, dx=1, dy=0, width=.1, color="black") 

# add stack of heatmaps
for i in range(5):
    left, bottom, width, height = [.02*i + 1.15, -.02*i, .55, .8]
    ax = fig.add_axes([left, bottom, width, height]) 
    right = left + width
    top = bottom + height
    rm_ticks(ax, top=False, right=False)

ax.text(.5, .5, 'Classifiers', transform=ax.transAxes, horizontalalignment='center', 
        verticalalignment='center', fontdict={'fontsize': 26})

# add second arrow
arrow_ax = fig.add_axes([1.85, .3, .3, .1])
rm_ticks(arrow_ax, left=True, bottom=True)
plt.arrow(x=0, y=0, dx=1, dy=0, width=.1, color="black") 

# classify
latents = MASE(n_components=2).fit_transform(normals+abnormals)
# use a new name so the true community labels stored in `labels` aren't overwritten
labels_mase = GMM(n_components=2).fit_predict(latents)
ax = fig.add_axes([2.2, -.02*3, .55, .8])
plot_latents(latents, ax=ax, title="Classification", 
             fontdict={'fontsize': 20}, labels=labels_mase)
rm_ticks(ax, top=False, right=False)

plt.suptitle("Classifier Ensemble", x=1.5, y=1.1, fontsize=26)

In [None]:
fig = plt.figure()

# add stack of heatmaps
for i in range(5):
    ax = fig.add_axes([.02*i, -.02*i, .8, .8]) 
    ax = binary_heatmap(normals[i], ax=ax, cbar=False)
    if i == 0:
        ax.set_title("Adjacency Matrices", loc="right", fontsize=16)
    rm_ticks(ax, top=False, right=False)
    ax.vlines(n, 0, n*2, colors="black", lw=.9, linestyle="dashed", alpha=.8)
    ax.hlines(n, 0, n*2, colors="black", lw=.9, linestyle="dashed", alpha=.8)


# add arrow
arrow_ax = fig.add_axes([.8, .3, .3, .1])
rm_ticks(arrow_ax, left=True, bottom=True)
plt.arrow(x=0, y=0, dx=1, dy=0, width=.1, color="black") 

# add joint matrix
omni_ax = fig.add_axes([1, -.02*3, .8, .8])
A = _get_omni_matrix(normals+abnormals)
a_hm = heatmap(A, ax=omni_ax, title="Joint Matrix", cbar=False)
for _, spine in a_hm.spines.items():
    spine.set_visible(True)
    
# add second arrow
arrow_ax = fig.add_axes([1.75, .3, .3, .1])
rm_ticks(arrow_ax, left=True, bottom=True)
plt.arrow(x=0, y=0, dx=1, dy=0, width=.1, color="black") 

# add omni embedding
latents_omni = OMNI(n_components=2).fit_transform(normals+abnormals).mean(axis=0)
omni_embed_ax = fig.add_axes([2.1, -.02*3, .55, .8])
plot_latents(latents_omni, ax=omni_embed_ax, title="Joint Embedding", 
             fontdict={'fontsize': 16})
rm_ticks(omni_embed_ax, top=False, right=False)

# add third arrow
arrow_ax = fig.add_axes([2.7, .3, .3, .1])
rm_ticks(arrow_ax, left=True, bottom=True)
plt.arrow(x=0, y=0, dx=1, dy=0, width=.1, color="black") 

# classify
# cluster the joint (omnibus) embedding itself, rather than reusing another embedding's labels
labels_omni = GMM(n_components=2).fit_predict(latents_omni)
classify_ax = fig.add_axes([3.05, -.02*3, .55, .8])
plot_latents(latents_omni, ax=classify_ax, title="Classification", 
             fontdict={'fontsize': 16}, labels=labels_omni)

plt.suptitle("Dissimilarity Combination", x=2, y=1.1, fontsize=26)

In [None]:
fig = plt.figure()

# add stack of heatmaps
for i in range(5):
    ax = fig.add_axes([.02*i, -.02*i, .5, .5]) 
    ax = binary_heatmap(normals[i], ax=ax, cbar=False)
    if i == 0:
        ax.set_title("Adjacency Matrices", loc="right")
    rm_ticks(ax, top=False, right=False)
    ax.vlines(n, 0, n*2, colors="black", lw=.9, linestyle="dashed", alpha=.8)
    ax.hlines(n, 0, n*2, colors="black", lw=.9, linestyle="dashed", alpha=.8)

# add arrow
arrow_ax = fig.add_axes([.5, .2, .3, .1])
rm_ticks(arrow_ax, left=True, bottom=True)
plt.arrow(x=0, y=0, dx=1, dy=0, width=.1, color="black") 

# add stack of latent plots
for i in range(5):
    ax = fig.add_axes([.8+.02*i, -.02*i, .35, .5])
    if i == 0:
        ax.set_title("Separate Embeddings")
    latents = ase.fit_transform(normals[i])
    plot = sns.scatterplot(x=latents[:, 0], y=latents[:, 1], 
                       s=10, ax=ax, color="black")
    rm_ticks(ax, top=False, right=False)
    
# add second arrow
arrow_ax = fig.add_axes([1.25, .2, .3, .1])
rm_ticks(arrow_ax, left=True, bottom=True)
plt.arrow(x=0, y=0, dx=1, dy=0, width=.1, color="black") 

# add group embeddings
mase = MASE(n_components=2)
latents_mase = mase.fit_transform(normals + abnormals)
mase_ax = fig.add_axes([1.57, -.03, .35, .5])
plot_latents(latents_mase, ax=mase_ax, title="Joint Embedding")
rm_ticks(mase_ax, top=False, right=False)

# add third arrow
arrow_ax = fig.add_axes([1.95, .2, .3, .1])
rm_ticks(arrow_ax, left=True, bottom=True)
plt.arrow(x=0, y=0, dx=1, dy=0, width=.1, color="black") 

# classify
# cluster the joint (MASE) embedding itself, rather than reusing another embedding's labels
labels_mase = GMM(n_components=2).fit_predict(latents_mase)
classify_ax = fig.add_axes([2.27, -.03, .35, .5])
plot_latents(latents_mase, ax=classify_ax, title="Classification", 
             labels=labels_mase)

plt.suptitle("Embedding Product", x=1.4, y=.7, fontsize=20)

In this section, we'll explore the strengths and weaknesses of different types of multigraph representation learning. Multiple Adjacency Spectral Embedding (MASE) is a technique which embeds a bunch of networks separately (with normal Adjacency Spectral Embedding), and then re-embeds all the separate results into a single space. MASE is nice because you don't actually need each network to be generated from the same distribution - you only need the nodes of the different networks to be aligned and for them to belong to the same communities. Omnibus embedding combines the adjacency matrices of all of the networks into a single super-matrix, and then embeds that super-matrix.
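
To make the "super-matrix" idea concrete, here is a minimal NumPy sketch of the omnibus construction as it is usually described: the diagonal blocks are the individual adjacency matrices, and the off-diagonal block in position (i, j) is the average of networks i and j. The `omnibus_matrix` helper is our own illustration, not graspologic's internal code, and it assumes the networks have aligned nodes.

```python
import numpy as np

def omnibus_matrix(networks):
    """Hypothetical helper: build the (m*n x m*n) omnibus matrix
    from m aligned (n x n) adjacency matrices."""
    networks = np.asarray(networks, dtype=float)
    m, n, _ = networks.shape
    # blocks[i, j] = (A_i + A_j) / 2, so each diagonal block is just A_i
    blocks = (networks[:, None] + networks[None, :]) / 2
    # rearrange (m, m, n, n) -> (m*n, m*n), with block (i, j) in block-row i, block-column j
    return blocks.transpose(0, 2, 1, 3).reshape(m * n, m * n)

# Two tiny networks on the same three nodes
A1 = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]])
A2 = np.array([[0, 0, 1],
               [0, 0, 1],
               [1, 1, 0]])

omni = omnibus_matrix([A1, A2])
print(omni.shape)
print(omni)
```

Embedding this matrix gives one latent position per node per network, which is why an omnibus embedding of m networks with n nodes has m times n rows.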

## Multiple Adjacency Spectral Embedding

The first algorithm we'll explore is called MASE, and is probably the easiest to understand if you know how Adjacency Spectral Embeddings work. Say you have some number of networks, and (like we said above) their nodes are aligned. The goal of MASE is to embed the networks into a single space, with each point in that space representing a single node - but, unlike simply averaging, MASE lets you use networks which aren't necessarily drawn from the same distribution. MASE is based on the common subspace independent-edge (COSIE) model from the multi-network models section of chapter 5, so we're operating under the assumption that there *is* some low-dimensional space common to all of our networks that we can embed into in the first place.

Let's try MASE on our group of normal and abnormal brains -- then, we'll dive deeper into what's going on under the hood.

In [None]:
from graspologic.embed import MultipleASE as MASE

# Use MASE to embed everything
mase = MASE(n_components=2)
latents_mase = mase.fit_transform(normals + abnormals)

In [None]:
plot_latents(latents_mase, title="Embedding when we use MASE on everything together", labels=labels);

Unlike simply averaging all of our networks together, MASE manages to keep the community structure that we found when we averaged the normal and abnormal networks separately. 

### What's Going On Under The Hood

The figure below shows how MASE works. First, we embed the adjacency matrices of a bunch of networks separately, using our standard Adjacency Spectral Embedding algorithm. Then, we take all of those embeddings, concatenate them into a single matrix, and embed the entire concatenated matrix.

```{figure} ../../Images/mase1.jpeg
---
height: 400px
name: mase-fig
---
The MASE algorithm
```

### A Collection of Networks

Suppose we have a set of networks generated from Stochastic Block Models with two communities in each network. The networks have aligned nodes -- meaning that the $i^{th}$ row of every adjacency matrix represents the same node. The nodes also all belong to the same communities. However, edge probabilities might change depending on the network - in the first network, you might have nodes in the same community having a high chance of connecting to each other, whereas in the second network, nodes are much more likely to be connected to other nodes in different communities. Your goal is to find a common space in which to represent all of these networks, one that captures the community structure they share.
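
Node alignment is easy to take for granted, so here is a small NumPy sketch of what goes wrong without it. The toy network and variable names are ours. Shuffling the node order leaves the network itself unchanged, but row $i$ of the shuffled matrix no longer describes the same node as row $i$ of the original, so comparing or combining the two matrices entry by entry would be meaningless until the ordering is restored.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small undirected network: two triangles, {0, 1, 2} and {3, 4, 5}, joined by one edge
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

# Relabel the nodes with a random permutation
perm = rng.permutation(6)
A_shuffled = A[np.ix_(perm, perm)]

# Same network: the edge count and the sorted degree sequence are unchanged
print(A.sum() // 2, A_shuffled.sum() // 2)
print(np.sort(A.sum(axis=0)), np.sort(A_shuffled.sum(axis=0)))

# Undoing the permutation realigns the nodes with the original ordering
inverse = np.argsort(perm)
A_realigned = A_shuffled[np.ix_(inverse, inverse)]
print(np.array_equal(A_realigned, A))
```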

Below is Python code which generates four networks with Stochastic Block Models. The labels are the same across the networks (which means that nodes have a consistent community no matter which network you're looking at), but the block probability matrices are quite different.

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
import numpy as np
from graspologic.simulations import sbm

n = 100
p1, p2, p3 = .12, .06, .03
A1, labels = make_network(p1, p3, p3, p1, 
                      return_labels=True)
A2 = make_network(p1, p3, p3, p2)
A3 = make_network(p3, p2, p2, p3)
A4 = make_network(p1, p3, p3, p3)

In [None]:
from graspologic.plot import binary_heatmap
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap


fig, axs = plt.subplots(2, 2, figsize=(7,7))
for ax, graph in zip(axs.flat, [A1, A2, A3, A4]):
    hmap = binary_heatmap(graph, ax=ax, cbar=False)
    for spine in ax.spines.values():
        spine.set_visible(True)
    hmap.vlines(n, 0, n*2, colors="black", lw=.9, linestyle="dashed", alpha=.8)
    hmap.hlines(n, 0, n*2, colors="black", lw=.9, linestyle="dashed", alpha=.8)
    
plt.suptitle("Four different networks", fontsize=26, y=.95)
fig.subplots_adjust(hspace=.05, wspace=.05)

cmap = ListedColormap(["white", "black"])
cbar_ax = fig.add_axes([.95, 0.13, 0.07, 0.75])
cbar = fig.colorbar(hmap.imshow(A1, cmap=cmap), cax=cbar_ax)
cbar.set_ticks([0.25, 0.75])
cbar.set_ticklabels(["No Edge", "Edge"])

Now, we want to embed each of the four networks separately. Doing this lets us see what the individual embeddings of the networks each look like. We'll also need these embeddings to unify the networks all into a single space.

In [None]:
from graspologic.embed import AdjacencySpectralEmbed as ASE

networks = [A1, A2, A3, A4]
latents = []
for network in networks:
    ase = ASE(n_components=2)
    latent = ase.fit_transform(network)
    latents.append(latent)

In [None]:
fig, axs = plt.subplots(2, 2, figsize=(7,7), sharex=True, sharey=True)
for i, ax in enumerate(axs.flat):
    plot_latents(latents[i], title=None, labels=labels, ax=ax)
    ax.yaxis.set_major_locator(plt.MaxNLocator(3))
plt.suptitle("Embeddings of our four networks", fontsize=20);

# TODO: add legend

Now, we want to take these individual embeddings, concatenate them into a single matrix, and then embed the result.

In [None]:
from graspologic.embed import selectSVD

concatenated = np.hstack(latents)
joint_embedding, *_ = selectSVD(concatenated, n_components=2)

In [None]:
plot_latents(joint_embedding, title="Joint embedding of our four networks",
             labels=labels)

## Using Graspologic

In [None]:
from graspologic.embed import MultipleASE as MASE

mase = MASE(n_components=2)
latents = mase.fit_transform(networks)

plot_latents(latents, title="MASE embedding", labels=labels)