# Align all and Compute for Graphs

$\textbf{Lead Author: Anna Calissano}$

Dear learner,

The aim of this notebook is to introduce align all and compute (AAC) as a learning method for graphs. AAC can be used to estimate the Frechet Mean, the Generalized Geodesic Principal Components (GGPCA), and a regression model. In this notebook you will learn how to use each of these learning methods.

In [None]:
import os
import sys
import warnings

sys.path.append(os.path.dirname(os.getcwd()))
warnings.filterwarnings("ignore")

import geomstats.backend as gs

gs.random.seed(2020)

from geomstats.geometry.euclidean import EuclideanMetric
from geomstats.geometry.symmetric_matrices import SymmetricMatrices, MatricesMetric
from geomstats.geometry.stratified.graph_space import (
    GraphPoint,
    GraphSpace,
    GraphSpaceMetric,
)
from geomstats.learning.aac import AAC
import networkx as nx
import matplotlib.pyplot as plt
import random

Step 1. Importing or simulating a set of graphs. Here we use the networkx package to simulate the data.

In [None]:
# nx.to_numpy_matrix was removed in networkx 3.0; nx.to_numpy_array is its replacement.
graphset_1 = gs.array([nx.to_numpy_array(nx.erdos_renyi_graph(n=5, p=0.6, directed=True)) for i in range(10)])
graphset_2 = gs.array([nx.to_numpy_array(nx.erdos_renyi_graph(n=5, p=0.6, directed=True)) for i in range(100)])
graphset_3 = gs.array([nx.to_numpy_array(nx.erdos_renyi_graph(n=3, p=0.6, directed=True)) for i in range(1000)])


Step 2. Initializing the embedding space and the metric. In this example we select the Euclidean metric, which is compatible with both the GGPCA and the regression model.

In [None]:
graph_space = GraphSpace(n_nodes=5)
graph_space.n_nodes

In [None]:
gs_m = GraphSpaceMetric(space=graph_space)
gs_m.total_space_metric

### Understanding the metric

In [None]:
gs_m.set_aligner('FAQ')

Given the FAQ alignment and the default Frobenius norm, we align a single graph, and then a set of graphs, to a base graph:

In [None]:
perm = gs_m.align_point_to_point(base_graph=graphset_1[0], graph_to_permute=graphset_1[1])
gs_m.align_point_to_point(base_graph=graphset_1[0], graph_to_permute=graphset_1[1:3])
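To make the idea of alignment concrete, here is a minimal NumPy sketch of what an aligner computes: the node relabelling of one graph that brings it closest to a base graph in Frobenius norm. It is an exhaustive search that is only feasible for very small graphs, and it is not how the FAQ aligner works internally. The helper names `permute` and `brute_force_align` are hypothetical and are not part of geomstats.

```python
import itertools

import numpy as np


def permute(adj, perm):
    """Relabel the nodes of an adjacency matrix according to perm."""
    perm = np.asarray(perm)
    return adj[np.ix_(perm, perm)]


def brute_force_align(base, graph):
    """Hypothetical helper: try every relabelling of `graph` and keep the
    one closest to `base` in Frobenius norm."""
    best_perm, best_dist = None, np.inf
    for perm in itertools.permutations(range(base.shape[0])):
        dist = np.linalg.norm(base - permute(graph, perm))
        if dist < best_dist:
            best_perm, best_dist = perm, dist
    return np.array(best_perm), best_dist


# Illustrative data: a random directed graph and a relabelled copy of it.
rng = np.random.default_rng(0)
base = rng.integers(0, 2, size=(4, 4)).astype(float)
relabelled = permute(base, rng.permutation(4))

perm, dist = brute_force_align(base, relabelled)
```

Because `relabelled` is the same graph with shuffled node labels, the distance after alignment vanishes, even though the two adjacency matrices generally differ entry by entry.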

To compute the distance between two graphs, we can call the distance function directly:

In [None]:
gs_m.dist(graphset_1[0], graphset_1[1])

Alternatively, if the matching has already been run, we can permute the graph and use the identity aligner in the distance, to avoid computing the matching twice:

In [None]:
gs_m.set_aligner('ID')
graph_permuted = graph_space.permute(graphset_1[1], perm)
gs_m.dist(graphset_1[0], graph_permuted)

In [None]:
gs_m.align_point_to_point(graphset_1[0], graphset_1[1])

The total space metric can also be changed, for example to the matrices metric:

In [None]:
gs_m.total_space_metric = MatricesMetric(n=5, m=5)

Step 3. Initializing the align all and compute (AAC) algorithm so that it is compatible with the given space and metric.

### Frechet Mean
Reference: Calissano, A., Feragen, A., & Vantini, S. (2020). Populations of unlabeled networks: Graph space geometry and geodesic principal components. MOX Report.

Given $\{[X_1], \dots, [X_k]\}, [X_i] \in X/T$, we estimate the Frechet Mean using AAC, which alternates between two steps:
1. Compute $\hat{X}$ as the arithmetic mean of $\{X_1, \dots, X_k\}, X_i \in X$
2. Use graph-to-graph alignment to find $\{X_1, \dots, X_k\}, X_i \in X$ optimally aligned with $\hat{X}$
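The two steps above can be sketched in plain NumPy. This is an illustrative sketch under stated assumptions, not the geomstats implementation: the alignment is an exhaustive search over permutations (small graphs only), the mean is initialised at the first graph, and the names `align` and `aac_frechet_mean` are hypothetical.

```python
import itertools

import numpy as np


def align(base, graph):
    """Hypothetical helper: relabelling of `graph` closest to `base`
    in Frobenius norm, found by exhaustive search."""
    best, best_dist = graph, np.inf
    for perm in itertools.permutations(range(base.shape[0])):
        idx = np.array(perm)
        candidate = graph[np.ix_(idx, idx)]
        dist = np.linalg.norm(base - candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


def aac_frechet_mean(graphs, max_iter=20, tol=1e-9):
    """Alternate between aligning all graphs to the current mean and
    recomputing the arithmetic mean of the aligned graphs."""
    aligned = np.array(graphs, dtype=float)
    mean = aligned[0]  # assumption: initialise at the first graph
    for _ in range(max_iter):
        aligned = np.array([align(mean, g) for g in aligned])
        new_mean = aligned.mean(axis=0)
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return mean, aligned


# Illustrative data: six relabelled copies of one template graph.
rng = np.random.default_rng(0)
template = rng.integers(0, 2, size=(4, 4)).astype(float)
graphs = []
for _ in range(6):
    idx = rng.permutation(4)
    graphs.append(template[np.ix_(idx, idx)])
graphs = np.array(graphs)

mean, aligned = aac_frechet_mean(graphs)
```

Since every sample is the same graph up to node labels, all aligned representatives coincide and the estimated mean is a representative of that graph. On real data the result depends on the initialisation and is in general only a local minimum.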

In [None]:
aac_fm = AAC(estimate='frechet_mean', metric=gs_m)

In [None]:
fm = aac_fm.fit(graphset_2)

In [None]:
fm.estimate_

### Principal Components
Reference: Calissano, A., Feragen, A., & Vantini, S. (2020). Populations of unlabeled networks: Graph space geometry and geodesic principal components. MOX Report.

We estimate the Generalized Geodesic Principal Components using AAC. Given $\{[X_1], \dots, [X_k]\}, [X_i] \in X/T$, we are searching for
$$\gamma: \mathbb{R}\rightarrow X/T,$$
the generalized geodesic principal component capturing the majority of the variability of the dataset. The AAC for GGPCA works in two steps:

1. finding the principal component $\delta: \mathbb{R}\rightarrow X$ in the set of adjacency matrices $\{X_1, \dots, X_k\}, X_i \in X$
2. finding $\{X_1, \dots, X_k\}, X_i \in X$ optimally aligned with respect to $\gamma$. This estimation requires a point-to-geodesic alignment, which is defined in the metric.
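As a rough illustration of these two steps, the sketch below estimates only the first component with plain NumPy. It rests on several assumptions that are not taken from geomstats: the geodesic is the straight line through the mean along the first principal direction, it is sampled on a fixed grid of $s$ values, and the point-to-geodesic alignment is an exhaustive search over permutations. All function names are hypothetical.

```python
import itertools

import numpy as np


def relabellings(graph):
    """Stack every node relabelling of a (small) adjacency matrix."""
    perms = [np.array(p) for p in itertools.permutations(range(graph.shape[0]))]
    return np.array([graph[np.ix_(p, p)] for p in perms])


def first_component(aligned):
    """Mean and first principal direction of the flattened matrices."""
    n = aligned.shape[1]
    flat = aligned.reshape(len(aligned), -1)
    mean = flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean.reshape(n, n), vt[0].reshape(n, n)


def align_to_geodesic(graph, mean, direction, s_grid):
    """Relabelling of `graph` closest to any sampled point of the line
    mean + s * direction (Frobenius norm)."""
    candidates = relabellings(graph)
    line = mean + s_grid[:, None, None] * direction
    dists = np.linalg.norm(candidates[:, None] - line[None], axis=(2, 3))
    return candidates[np.argmin(dists.min(axis=1))]


def aac_ggpca_sketch(graphs, s_grid, max_iter=10):
    """Alternate between PCA on the aligned matrices and re-alignment."""
    aligned = np.array(graphs, dtype=float)
    for _ in range(max_iter):
        mean, direction = first_component(aligned)
        new_aligned = np.array(
            [align_to_geodesic(g, mean, direction, s_grid) for g in aligned]
        )
        if np.allclose(new_aligned, aligned):
            break
        aligned = new_aligned
    # Refit so that the returned component matches the returned alignment.
    mean, direction = first_component(aligned)
    return mean, direction, aligned


# Illustrative data: eight random directed graphs on three nodes.
rng = np.random.default_rng(1)
graphs = rng.integers(0, 2, size=(8, 3, 3)).astype(float)
mean, direction, aligned = aac_ggpca_sketch(graphs, s_grid=np.linspace(-2.0, 2.0, 21))
```

The grid bounds play a role similar to the `s_min` and `s_max` arguments used in this notebook: they restrict the portion of the geodesic that a point may be aligned to.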

In [None]:
gs_m = GraphSpaceMetric(space=graph_space)

In [None]:
gs_m.set_point_to_geodesic_aligner('default', s_min=0, s_max=2)

In [None]:
aac_ggpca = AAC(estimate='ggpca', metric=gs_m, n_components=9)

In [None]:
aac_ggpca.fit(graphset_3)

In [None]:
graphset_3.shape

In [None]:
import geomstats.datasets.utils as data_utils
mammals = data_utils.load_mammals()

In [None]:
gs_m = GraphSpaceMetric(space=GraphSpace(n_nodes=20))

In [None]:
aac_ggpca.fit(mammals)

### Regression
Reference: Calissano, A., Feragen, A., & Vantini, S. (2022). Graph-valued regression: Prediction of unlabelled networks in a non-Euclidean graph space. Journal of Multivariate Analysis, 190, 104950.

We estimate a graph-valued regression model to predict graphs from scalars or vectors. Given $\{(s_1,[X_1]), \dots, (s_k, [X_k])\}, (s_i,[X_i]) \in \mathbb{R}^p\times X/T$, we are searching for
$$f: \mathbb{R}^p\rightarrow X/T$$
where $f\in \mathcal{F}(X/T)$ is a generalized geodesic regression model, i.e., the canonical projection onto Graph Space of a regression line $h_\beta : \mathbb{R}^p\rightarrow X$ of the form
$$h_\beta(s) = \sum_{j=1}^{p} \beta_j s_j,$$
where $s_j$ denotes the $j$-th component of $s$.
The AAC algorithm for regression alternates between estimating $h_\beta$ given $\{X_1, \dots, X_k\}, X_i \in X$ by minimizing
$$\sum_{i=1}^{k} d_X(h_\beta(s_i), X_i)$$
and searching for $\{X_1, \dots, X_k\}, X_i \in X$ optimally aligned with respect to the prediction along the current regression model:
$$\min_{t\in T}d_X(h_\beta(s_i),t^TX_it)$$
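The alternation between fitting and aligning can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the geomstats implementation: the fit is an ordinary least-squares regression on the flattened adjacency matrices, an intercept term is added (the formula above has none), and the alignment is an exhaustive search over permutations. The names `align`, `fit_line` and `aac_regression` are hypothetical.

```python
import itertools

import numpy as np


def align(base, graph):
    """Hypothetical helper: relabelling of `graph` closest to `base`."""
    best, best_dist = graph, np.inf
    for perm in itertools.permutations(range(base.shape[0])):
        idx = np.array(perm)
        candidate = graph[np.ix_(idx, idx)]
        dist = np.linalg.norm(base - candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


def fit_line(design, aligned):
    """Least-squares coefficients and fitted adjacency matrices."""
    n = aligned.shape[1]
    flat = aligned.reshape(len(aligned), -1)
    coef, *_ = np.linalg.lstsq(design, flat, rcond=None)
    return coef, (design @ coef).reshape(-1, n, n)


def aac_regression(s, graphs, max_iter=20):
    """Alternate between fitting the regression and aligning each graph
    to its own prediction."""
    s = np.asarray(s, dtype=float).reshape(len(graphs), -1)
    design = np.hstack([np.ones((len(s), 1)), s])  # assumption: intercept
    aligned = np.array(graphs, dtype=float)
    for _ in range(max_iter):
        _, fitted = fit_line(design, aligned)
        new_aligned = np.array([align(f, g) for f, g in zip(fitted, aligned)])
        if np.allclose(new_aligned, aligned):
            break
        aligned = new_aligned
    # Refit so that the coefficients match the returned alignment.
    coef, _ = fit_line(design, aligned)
    return coef, aligned


# Illustrative data: graphs on a line in X, each with shuffled node labels.
rng = np.random.default_rng(2)
s = rng.uniform(0, 10, size=12)
intercept, slope = rng.random((3, 3)), rng.random((3, 3))
graphs = []
for value in s:
    idx = rng.permutation(3)
    graphs.append((intercept + value * slope)[np.ix_(idx, idx)])
graphs = np.array(graphs)

coef, aligned = aac_regression(s, graphs)
```

Each alignment step cannot increase the squared residuals with respect to the current fit, and each refit cannot increase them either, so the residual sum of squares is non-increasing across iterations. As with the other AAC estimators, the result is a local solution that depends on the starting alignment.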

In [None]:
s = gs.array([random.randint(0, 10) for i in range(10)])

In [None]:
aac_reg = AAC(estimate='regression', metric=gs_m)

In [None]:
aac_reg.fit(s, graphset_1)

In [None]:
aac_reg.regressor.coef_

In [None]:
aac_reg.regressor.predict(s)