# Initializing the notebook

This notebook provides an introduction to using sparsebm on GPU with _Google Colab_.

**⚠️ Do not skip this step ⚠️**

You must enable GPUs for the notebook:
 - Navigate to Edit→Notebook Settings;
 - Select GPU from the Hardware Accelerator drop-down list.
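
To confirm that the runtime actually exposes a GPU, a quick check along these lines can help. This is a minimal sketch, not part of `sparsebm`: the helper name `gpu_available` is hypothetical, and it only assumes that CuPy's `cupy.cuda.runtime.getDeviceCount()` is usable when a CUDA device is present.

```python
def gpu_available() -> bool:
    """Return True if CuPy can be imported and sees at least one CUDA device.

    Hypothetical helper for illustration; it is not provided by sparsebm.
    """
    try:
        import cupy

        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        # CuPy is not installed, or no CUDA driver/device is visible.
        return False


print("GPU available:", gpu_available())
```

If this prints `False`, the hardware accelerator setting above is the first thing to check.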


# Installing `sparsebm` and importing the module

The SparseBM module is distributed through the [PyPI repository](https://pypi.org/project/sparsebm/) and the documentation is available [here](https://jbleger.gitlab.io/sparsebm).

On _Google Colab_, the `cupy` module required for GPU computation is already installed, so only `sparsebm` needs to be installed:

In [None]:
# estimated time in colab: <10s
!pip install --upgrade sparsebm

Now, we just have to import the module.

In [None]:
# estimated time in colab: <1s
import sparsebm

# Example with the Stochastic Block Model with a known number of classes

First, we generate a synthetic graph. To illustrate the behavior at scale, we generate a large graph ($10^4$ nodes) with 4 clusters and an average degree of 40.

Note that you should consider a smaller size if the GPU you've been allocated doesn't have enough memory to handle graphs of this size.

In [None]:
# estimated time in colab: <1m
import numpy as np

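N = 10**4
# Random 4x4 connection probabilities, scaled to be of order 1/N (sparse graph)
U = np.random.uniform(size=(4, 4))
connection_matrix = 40 / N * 2 * U / U.mean()
# Copy the upper triangle onto the lower one so that the matrix is symmetric
connection_matrix[np.tril_indices(4)] = connection_matrix.T[np.tril_indices(4)]

dataset = sparsebm.generate_SBM_dataset(
    symmetric=True,
    number_of_nodes=N,
    number_of_clusters=4,
    connection_probabilities=connection_matrix,
)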
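
The `np.tril_indices` line in the cell above may look cryptic. As a small standalone sketch (using only NumPy, with a seeded generator and the variable name `M` chosen here for illustration), it copies the upper triangle of a matrix onto its lower triangle, which makes the matrix symmetric while leaving the upper triangle untouched:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.uniform(size=(4, 4))

# Remember the upper triangle (diagonal included) before modifying M.
upper_before = M[np.triu_indices(4)].copy()

# For each (i, j) with i >= j, M.T[i, j] is M[j, i], i.e. an upper-triangle entry.
# The right-hand side is evaluated into a copy before the assignment happens.
lower = np.tril_indices(4)
M[lower] = M.T[lower]

print("symmetric:", np.allclose(M, M.T))
print("upper triangle unchanged:", np.allclose(M[np.triu_indices(4)], upper_before))
```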

We can now access the generated dataset through the `dataset` object. The most useful attributes are `dataset.data` (the sparse adjacency matrix) and `dataset.labels` (the simulated labels); other attributes and properties are also available.

In [None]:
dataset.data

## Inference with a known number of groups

In this part, we assume the number of clusters is known (4 in this example).

We can now perform inference by declaring the `model` object and fitting it. The module follows the scikit-learn syntax.

In [None]:
# estimated time in colab: <2m
model = sparsebm.SBM(4)
model.fit(dataset.data, symmetric=True)

We can now compare the inferred labels in `model.labels` to the simulated labels in `dataset.labels` using the adjusted Rand index (ARI):


In [None]:
# estimated time in colab: <1s
sparsebm.utils.ARI(dataset.labels, model.labels)

Since the ARI is close to 1, we can conclude that the inference is able to retrieve the graph structure. Other quantities can also be extracted from the fitted model.
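
To make the comparison above more concrete, here is a minimal NumPy sketch of the adjusted Rand index computed from a contingency table. It is an independent illustration, not the implementation behind `sparsebm.utils.ARI`, and the function name `adjusted_rand_index` is an assumption made for this example. The key property it shows is that the ARI ignores how clusters are numbered, which matters because inferred cluster labels are only defined up to a permutation.

```python
import numpy as np


def adjusted_rand_index(labels_a, labels_b) -> float:
    """Adjusted Rand index between two labelings of the same items (sketch)."""
    labels_a = np.asarray(labels_a).ravel()
    labels_b = np.asarray(labels_b).ravel()

    # Map arbitrary label values to 0..K-1 and build the contingency table.
    _, a_idx = np.unique(labels_a, return_inverse=True)
    _, b_idx = np.unique(labels_b, return_inverse=True)
    contingency = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=np.int64)
    np.add.at(contingency, (a_idx, b_idx), 1)

    def pairs(x):
        # Number of unordered pairs that can be drawn from x items.
        return x * (x - 1) / 2.0

    sum_cells = pairs(contingency).sum()
    sum_rows = pairs(contingency.sum(axis=1)).sum()
    sum_cols = pairs(contingency.sum(axis=0)).sum()
    total = pairs(labels_a.size)

    expected = sum_rows * sum_cols / total
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:
        # Degenerate case (e.g. a single cluster on both sides).
        return 1.0
    return float((sum_cells - expected) / (maximum - expected))


truth = np.array([0, 0, 1, 1, 2, 2])
relabelled = np.array([2, 2, 0, 0, 1, 1])  # same partition, different cluster ids
print(adjusted_rand_index(truth, relabelled))
```

Because the two toy labelings describe the same partition, the index is maximal even though no label value matches position by position.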

## Inference with an unknown number of groups

In this part, we assume that the number of clusters is unknown.

In [None]:
# estimated time in colab: <5m
model_selection = sparsebm.ModelSelection(model_type="SBM", plot=False)
models = model_selection.fit(dataset.data, symmetric=True)

We can show the best fit:

In [None]:
models.best

In [None]:
# estimated time in colab: <1s
sparsebm.utils.ARI(dataset.labels, models.best.labels)  # best fit according to the ICL

We can also examine a specific model for an arbitrary number of groups:

In [None]:
models[5]

In [None]:
# estimated time in colab: <1s
sparsebm.utils.ARI(dataset.labels, models[5].labels)  # result for the model with 5 groups

We can show the ICL as a function of the number of groups:

In [None]:
import matplotlib.pyplot as plt

plt.plot(models.keys(), [m.get_ICL() for m in models.values()])
plt.xlabel('number of groups')
plt.ylabel('ICL')
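
Reading the selected number of groups off the curve amounts to taking the key with the highest ICL. The sketch below shows this on a plain dictionary; the helper name `best_number_of_groups` is hypothetical and the ICL values are made up purely for illustration, they are not results from this notebook.

```python
def best_number_of_groups(icl_by_k):
    """Return the number of groups whose model has the highest ICL (sketch)."""
    return max(icl_by_k, key=icl_by_k.get)


# Made-up ICL values, for illustration only.
# With the objects from this notebook, a comparable mapping could presumably be
# built as: dict(zip(models.keys(), (m.get_ICL() for m in models.values())))
example_icl = {2: -1250.0, 3: -1100.0, 4: -1020.0, 5: -1045.0}

print(best_number_of_groups(example_icl))
```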