# Hierarchical stochastic block model (HSBM)

In [None]:
import numpy as np
from graspologic.plot import heatmap

A hierarchical stochastic block model (HSBM) consists of a hierarchy of stochastic block models (SBMs): at each level, a graph can be viewed as an SBM that is nested in a larger SBM (for details on SBMs, see the SBM tutorial). The model is useful for studying the community structure of data that has a natural hierarchy. Similar to an SBM, an HSBM is defined by $K$, the number of blocks; $B$, a $K\times K$ block-block connectivity matrix; $n$, the number of nodes in the graph; and, additionally, $l$, the number of levels.
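To make the roles of $K$, $B$, and $n$ concrete, here is a minimal NumPy-only sketch of sampling a single undirected, loopless SBM. The helper `sample_sbm` is hypothetical and written only for illustration; it is not part of graspologic, whose own `sbm` function is used later in this tutorial.

```python
import numpy as np


def sample_sbm(block_sizes, B, rng):
    """Sample an undirected, loopless SBM adjacency matrix (illustrative helper)."""
    B = np.asarray(B)
    # community label of every node, e.g. [0, 0, ..., 1, 1, ...]
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    # P[i, j] = B[labels[i], labels[j]] is the edge probability for the pair (i, j)
    P = B[np.ix_(labels, labels)]
    n = len(labels)
    # draw each pair once in the strict upper triangle, then mirror it
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(int)
    return A, labels


rng = np.random.default_rng(0)
A, labels = sample_sbm([50, 50], [[0.5, 0.1], [0.1, 0.6]], rng)
```

Here $K = 2$, $n = 100$, and the diagonal entries of $B$ are larger than the off-diagonal ones, so edges are denser within a block than between blocks.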

Let us consider a 2-level HSBM (undirected, no self-loops) made up of 2 SBMs, each of which is composed of 2 smaller SBMs. Define the block-block connectivity matrices corresponding to the 2 SBMs at the coarser level as B1 and B2, respectively.

In [None]:
B1 = np.array([[0.5, 0.1], [0.1, 0.6]])
B2 = np.array([[0.8, 0.1], [0.1, 0.8]])

In [None]:
# synthetic data

from graspologic.simulations import er_np, sbm
from graspologic.models.base import _n_to_labels
np.random.seed(9)

# first generate an Erdos-Renyi (ER) graph of 200 nodes
# where the probability of an edge between any pair of nodes is 0.01
# (see the tutorial on ER model for more details)
n = np.array([100, 100]).astype(int)
block_labels = _n_to_labels(n)
n_verts = np.sum(n)
global_p = 0.01
graph = er_np(n_verts, global_p)

# then overwrite the 2 equally sized diagonal blocks of the ER graph (100 nodes each),
# each with an SBM whose block connectivity matrix (B1 or B2) was defined before
prop = np.array([[0.5, 0.5], [0.5, 0.5]])
B_list = [B1, B2]
for i, n_sub_verts in enumerate(n):
    p = prop[i, :]
    n_vec = (n_sub_verts * p).astype(int)
    B = B_list[i]
    subgraph = sbm(n_vec, B)
    inds = block_labels == i
    graph[np.ix_(inds, inds)] = subgraph

heatmap(graph, title="Synthetic HSBM")
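The edge-probability matrix that the synthetic graph is drawn from can also be written out explicitly, which gives a reference to compare fitted models against. The sketch below uses only NumPy and assumes that `er_np` and `sbm` are called with their defaults (undirected, no self-loops), as in the cell above. The names `expand` and `P_true` are introduced here for illustration and do not appear elsewhere in this tutorial.

```python
import numpy as np

B1 = np.array([[0.5, 0.1], [0.1, 0.6]])
B2 = np.array([[0.8, 0.1], [0.1, 0.8]])
global_p = 0.01
sub_sizes = [50, 50]  # each 100-node SBM is split evenly into 2 blocks


def expand(B, sizes):
    """Repeat each entry of B so that it covers a whole block of nodes."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    return B[np.ix_(labels, labels)]


n_total = 2 * sum(sub_sizes)
P_true = np.full((n_total, n_total), global_p)  # background ER probability
P_true[:100, :100] = expand(B1, sub_sizes)      # first SBM on nodes 0..99
P_true[100:, 100:] = expand(B2, sub_sizes)      # second SBM on nodes 100..199
np.fill_diagonal(P_true, 0)                     # no self-loops
```

Passing `P_true` to `heatmap` with `vmin=0, vmax=1` should show two dense 100-node squares on the diagonal, each split into four 50-node blocks, over a near-empty background.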

In [None]:
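# fit an HSBM model to the synthetic graph
# NOTE: HSBMEstimator is not imported anywhere in this notebook; import it from
# wherever your graspologic installation provides it before running this cell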

np.random.seed(1)
hsbm = HSBMEstimator(max_level=2)
hsbm.fit(graph)

In [None]:
# compute the parameters for the first level
hsbm.compute_model_params(graph, 1)
# plot the probability matrix
heatmap(
    hsbm.p_mat_, vmin=0, vmax=1, title="HSBM probability matrix at level 1", font_scale=1.5
)

In [None]:
# similarly for the second level
hsbm.compute_model_params(graph, 2)
heatmap(
    hsbm.p_mat_, vmin=0, vmax=1, title="HSBM probability matrix at level 2", font_scale=1.5
)
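A probability matrix of this kind assigns one connection probability to every pair of communities. As a rough illustration of the idea, the sketch below estimates such a matrix from an adjacency matrix and a vector of community labels by taking the edge density of each block. The function `block_p_mat` is hypothetical, and it is not claimed to be what `compute_model_params` does internally.

```python
import numpy as np


def block_p_mat(adj, labels):
    """Estimate a node-by-node probability matrix from block edge densities."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    p_mat = np.zeros_like(adj)
    for a in np.unique(labels):
        for b in np.unique(labels):
            rows, cols = labels == a, labels == b
            block = adj[np.ix_(rows, cols)]
            if a == b:
                # exclude the diagonal: a loopless graph has no self-edges
                m = block.shape[0]
                density = block.sum() / (m * (m - 1)) if m > 1 else 0.0
            else:
                density = block.mean()
            p_mat[np.ix_(rows, cols)] = density
    np.fill_diagonal(p_mat, 0)
    return p_mat


# toy graph: two triangles joined by a single edge
adj = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1
p_hat = block_p_mat(adj, [0, 0, 0, 1, 1, 1])
```

In the toy graph each triangle is fully connected, so the within-community estimate is 1, while the single bridging edge out of 9 possible pairs gives a between-community estimate of 1/9.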

In [None]:
# generate a sample graph that has the structure of the second level of the model
heatmap(
    hsbm.sample()[0], title="HSBM sample at level 2", font_scale=1.5
)