# Align all and Compute for Graphs

$\textbf{Lead Author: Anna Calissano}$

Dear learner,

The aim of this notebook is to introduce Align All and Compute (AAC) as a learning method for graphs. AAC can be used to estimate the Fréchet mean, the generalized geodesic principal components (GGPCA), and a regression model. In this notebook you will learn how to use all three methods.

In [1]:
import os
import sys
import warnings

sys.path.append(os.path.dirname(os.getcwd()))
warnings.filterwarnings("ignore")

import geomstats.backend as gs

gs.random.seed(2020)

from geomstats.geometry.euclidean import EuclideanMetric
from geomstats.geometry.stratified.graph_space import (
    Graph,
    GraphSpace,
    GraphSpaceMetric,
)
from geomstats.learning.aac import AAC
import networkx as nx
import matplotlib.pyplot as plt
import random

INFO: Using numpy backend


Step 1. Import or simulate a set of graphs. Here we use networkx to simulate the data.

In [2]:
# nx.to_numpy_matrix was removed in networkx 3.0; nx.to_numpy_array is its replacement.
graphset_1 = gs.array([nx.to_numpy_array(nx.erdos_renyi_graph(n=5, p=0.6, directed=True)) for i in range(10)])
graphset_2 = gs.array([nx.to_numpy_array(nx.erdos_renyi_graph(n=5, p=0.6, directed=True)) for i in range(100)])
graphset_3 = gs.array([nx.to_numpy_array(nx.erdos_renyi_graph(n=3, p=0.6, directed=True)) for i in range(1000)])


Step 2. Initialize the embedding space and the metric. In this example we use the Euclidean metric on the total space, since it is compatible with both the GGPCA and the regression model.

In [3]:
graph_space = GraphSpace(n_nodes=5)
graph_space.n_nodes

5

In [4]:
gs_m = GraphSpaceMetric(space=graph_space)
gs_m.total_space_metric

<geomstats.geometry.matrices.MatricesMetric at 0x1f560e62730>
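To make the relation between the total-space metric and graph space concrete, here is a minimal NumPy sketch of the quotient distance, assuming the Frobenius distance on adjacency matrices and a brute-force search over all node relabelings. It is an illustration only, not the geomstats implementation, and `graph_space_distance` is a hypothetical name.

```python
import itertools

import numpy as np


def graph_space_distance(x, y):
    """Smallest Frobenius distance between x and any node relabeling of y."""
    n = y.shape[0]
    return min(
        np.linalg.norm(x - y[np.ix_(p, p)])
        for p in itertools.permutations(range(n))
    )


x = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [3.0, 0.0, 0.0]])
perm = [2, 0, 1]
y = x[np.ix_(perm, perm)]  # same graph, nodes relabeled

print(np.linalg.norm(x - y))       # positive: the matrices differ entrywise
print(graph_space_distance(x, y))  # 0.0: both represent the same unlabeled graph
```

The search visits all n! permutations, so it is only practical for very small graphs.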

Step 3. Initialize the align all and compute algorithm for the given space and metric.

### Frechet Mean
Reference: Calissano, A., Feragen, A., & Vantini, S. (2020). Populations of unlabeled networks: Graph space geometry and geodesic principal components. MOX Report.

Given $\{[X_1], \dots, [X_k]\}, [X_i] \in X/T$, we estimate the Fréchet mean using AAC, which alternates between two steps:
1. Compute $\hat{X}$ as the arithmetic mean of $\{X_1, \dots, X_k\}, X_i \in X$.
2. Use graph-to-graph alignment to find representatives $\{X_1, \dots, X_k\}, X_i \in X$ that are optimally aligned with $\hat{X}$.
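The two steps can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the geomstats implementation: alignment is done by brute force over all node permutations, the mean is initialized at the first graph, and the names `permute`, `align_to` and `aac_frechet_mean` are hypothetical.

```python
import itertools

import numpy as np


def permute(x, perm):
    """Relabel the nodes of adjacency matrix x according to perm."""
    return x[np.ix_(perm, perm)]


def align_to(target, x):
    """Return the node relabeling of x closest to target (Frobenius norm).

    Brute force over all permutations: only viable for a handful of nodes.
    """
    n = x.shape[0]
    candidates = (permute(x, list(p)) for p in itertools.permutations(range(n)))
    return min(candidates, key=lambda y: np.linalg.norm(target - y))


def aac_frechet_mean(graphs, max_iter=20, tol=1e-9):
    """Alternate between aligning and averaging until the mean stabilizes."""
    aligned = np.array(graphs, dtype=float)
    mean = aligned[0].copy()  # assumed initialization: the first graph
    for _ in range(max_iter):
        # Align step: pick the representative of each graph closest to the mean.
        aligned = np.array([align_to(mean, x) for x in aligned])
        # Compute step: arithmetic mean of the aligned representatives.
        new_mean = aligned.mean(axis=0)
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return mean, aligned
```

For example, if every input is a relabeled copy of one graph, the sketch returns that graph (in the labeling of the first input) as the mean. In general the procedure only reaches a local minimum, so the result can depend on the initialization.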

In [5]:
aac_fm = AAC(estimate='frechet', metric=gs_m)

In [7]:
fm = aac_fm.fit(graphset_2)

In [8]:
fm.mean_estimator.estimate_

array([[0.  , 0.63, 0.6 , 0.62, 0.62],
       [0.55, 0.  , 0.58, 0.7 , 0.58],
       [0.64, 0.61, 0.  , 0.56, 0.65],
       [0.62, 0.55, 0.6 , 0.  , 0.61],
       [0.65, 0.55, 0.61, 0.67, 0.  ]])

In [9]:
fm.estimate_

array([[0.  , 0.63, 0.6 , 0.62, 0.62],
       [0.55, 0.  , 0.58, 0.7 , 0.58],
       [0.64, 0.61, 0.  , 0.56, 0.65],
       [0.62, 0.55, 0.6 , 0.  , 0.61],
       [0.65, 0.55, 0.61, 0.67, 0.  ]])

### Principal Components
Reference: Calissano, A., Feragen, A., & Vantini, S. (2020). Populations of unlabeled networks: Graph space geometry and geodesic principal components. MOX Report.

We estimate the generalized geodesic principal components using AAC. Given $\{[X_1], \dots, [X_k]\}, [X_i] \in X/T$, we are searching for
$$\gamma: \mathbb{R}\rightarrow X/T,$$
the generalized geodesic principal component capturing the majority of the variability of the dataset. The AAC for GGPCA works in two steps:

1. Find $\delta: \mathbb{R}\rightarrow X$, the principal component of the set of adjacency matrices $\{X_1, \dots, X_k\}, X_i \in X$.
2. Find representatives $\{X_1, \dots, X_k\}, X_i \in X$ that are optimally aligned with respect to $\gamma$. This step requires a point-to-geodesic alignment, which is defined in the metric.
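As a rough illustration of these two steps, the sketch below estimates only the first component with plain NumPy. It rests on several assumptions that are not taken from geomstats: the principal direction comes from an SVD of the flattened, centered matrices, the line $\delta$ is sampled on a grid `s_grid` (loosely playing the role of `s_min` and `s_max`), and alignment is a brute-force search over permutations. All function names are hypothetical.

```python
import itertools

import numpy as np


def permute(x, perm):
    """Relabel the nodes of adjacency matrix x according to perm."""
    return x[np.ix_(perm, perm)]


def first_component(graphs):
    """Mean and leading principal direction of the flattened matrices."""
    flat = graphs.reshape(len(graphs), -1)
    center = flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat - center, full_matrices=False)
    return center, vt[0]


def align_to_line(center, direction, x, s_grid):
    """Relabeling of x closest to the sampled line center + s * direction."""
    line = center + s_grid[:, None] * direction  # sampled points delta(s)
    best, best_dist = x, np.inf
    for p in itertools.permutations(range(x.shape[0])):
        y = permute(x, list(p))
        dist = np.linalg.norm(line - y.ravel(), axis=1).min()
        if dist < best_dist:
            best, best_dist = y, dist
    return best


def aac_first_ggpc(graphs, s_grid, n_iter=10):
    """Alternate between fitting the line and re-aligning every graph."""
    aligned = np.array(graphs, dtype=float)
    for _ in range(n_iter):
        center, direction = first_component(aligned)
        aligned = np.array(
            [align_to_line(center, direction, x, s_grid) for x in aligned]
        )
    center, direction = first_component(aligned)
    return center, direction, aligned
```

A call such as `aac_first_ggpc(graphs, np.linspace(-2.0, 2.0, 21))` returns the flattened center, a unit direction vector, and the aligned representatives. A finer grid gives a better approximation of the point-to-line distance at a higher cost.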

In [10]:
gs_m = GraphSpaceMetric(space=graph_space)

In [11]:
gs_m.set_p2g_aligner('default', s_min=0, s_max=2)

<geomstats.geometry.stratified.graph_space.PointToGeodesicAligner at 0x1f560e62df0>

In [12]:
aac_ggpca = AAC(estimate='ggpca', metric=gs_m, n_components=9)

In [13]:
aac_ggpca.fit(graphset_3)

_AACGGPCA(metric=<geomstats.geometry.stratified.graph_space.GraphSpaceMetric object at 0x000001F534569FA0>,
          n_components=9)

In [14]:
graphset_3.shape

(1000, 3, 3)

In [15]:
aac_ggpca.metric.p2g_aligner.perm_

array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       ...,
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])

### Regression
Reference: Calissano, A., Feragen, A., & Vantini, S. (2022). Graph-valued regression: Prediction of unlabelled networks in a non-Euclidean graph space. Journal of Multivariate Analysis, 190, 104950.

We estimate a graph-valued regression model to predict graphs from scalars or vectors. Given $\{(s_1,[X_1]), \dots, (s_k, [X_k])\}, (s_i,[X_i]) \in \mathbb{R}^p\times X/T$, we are searching for
$$f: \mathbb{R}^p\rightarrow X/T$$
where $f\in \mathcal{F}(X/T)$ is a generalized geodesic regression model, i.e., the canonical projection onto graph space of a regression line $h_\beta : \mathbb{R}^p\rightarrow X$ of the form $$h_\beta(s) = \sum_{j=1}^{p} \beta_j s_j$$
The AAC algorithm for regression alternates between estimating $h_\beta$ given $\{X_1, \dots, X_k\}, X_i \in X$ by minimizing
$$\sum_{i=1}^{k} d_X(h_\beta(s_i), X_i)$$
and searching for representatives $\{X_1, \dots, X_k\}, X_i \in X$ that are optimally aligned with respect to the prediction of the current regression model:
$$\min_{t\in T}d_X(h_\beta(s_i),t^TX_it)$$
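The alternation between fitting and aligning can be sketched in NumPy as follows. This is an illustration under assumptions, not the geomstats implementation: the fit is an ordinary least-squares regression of each matrix entry on the covariates (so it minimizes squared Frobenius distances and includes an intercept, which the formula for $h_\beta$ does not show), and alignment is a brute-force search over permutations. The function names are hypothetical.

```python
import itertools

import numpy as np


def permute(x, perm):
    """Relabel the nodes of adjacency matrix x according to perm."""
    return x[np.ix_(perm, perm)]


def align_to(target, x):
    """Return the node relabeling of x closest to target (Frobenius norm)."""
    n = x.shape[0]
    candidates = (permute(x, list(p)) for p in itertools.permutations(range(n)))
    return min(candidates, key=lambda y: np.linalg.norm(target - y))


def fit_linear(s, graphs):
    """Least-squares fit of every matrix entry on the covariates."""
    design = np.column_stack([np.ones(len(s)), s])  # assumed intercept column
    flat = graphs.reshape(len(graphs), -1)
    coef, *_ = np.linalg.lstsq(design, flat, rcond=None)
    return design, coef


def aac_regression(s, graphs, n_iter=10):
    """Alternate between fitting h_beta and aligning graphs to its predictions."""
    aligned = np.array(graphs, dtype=float)
    shape = aligned.shape[1:]
    for _ in range(n_iter):
        design, coef = fit_linear(s, aligned)
        predictions = (design @ coef).reshape(-1, *shape)
        aligned = np.array(
            [align_to(pred, x) for pred, x in zip(predictions, aligned)]
        )
    _, coef = fit_linear(s, aligned)
    return coef, aligned
```

Because the identity relabeling is always among the candidates, neither step can increase the sum of squared residuals, so the fit on the aligned graphs is at least as good as the fit on the original labeling.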

In [16]:
s = gs.array([random.randint(0,10) for i in range(10)])

In [17]:
aac_reg = AAC(estimate='regression', metric=gs_m)

In [18]:
aac_reg.fit(s, graphset_1)

_AACRegressor(metric=<geomstats.geometry.stratified.graph_space.GraphSpaceMetric object at 0x000001F534569FA0>,
              regressor_kwargs={})

In [19]:
aac_reg.regressor.coef_

array([[-0.00000000e+00],
       [-0.00000000e+00],
       [-1.34920635e-01],
       [ 2.11640212e-02],
       [ 8.46560847e-02],
       [ 1.00529101e-01],
       [-0.00000000e+00],
       [ 7.93650794e-03],
       [-1.27687779e-17],
       [ 2.64550265e-03],
       [-6.87830688e-02],
       [ 1.32275132e-02],
       [-0.00000000e+00],
       [-6.61375661e-02],
       [-3.17460317e-02],
       [-2.11640212e-02],
       [-3.70370370e-02],
       [ 1.00529101e-01],
       [-0.00000000e+00],
       [-5.82010582e-02],
       [-5.55555556e-02],
       [ 4.49735450e-02],
       [ 1.85185185e-02],
       [-2.64550265e-03],
       [-0.00000000e+00]])

In [20]:
aac_reg.regressor.predict(s)

array([[[ 0.        ,  0.5       ,  0.22222222,  0.75925926,
          1.03703704],
        [ 0.98148148,  0.        ,  0.72222222,  0.5       ,
          0.40740741],
        [ 0.40740741,  0.53703704,  0.        ,  0.31481481,
          0.61111111],
        [ 0.74074074,  0.7962963 ,  0.98148148,  0.        ,
          0.53703704],
        [ 0.44444444,  0.92592593,  0.35185185,  0.59259259,
          0.        ]],

       [[ 0.        ,  0.5       ,  0.8968254 ,  0.65343915,
          0.61375661],
        [ 0.47883598,  0.        ,  0.68253968,  0.5       ,
          0.39417989],
        [ 0.75132275,  0.47089947,  0.        ,  0.64550265,
          0.76984127],
        [ 0.84656085,  0.98148148,  0.47883598,  0.        ,
          0.82804233],
        [ 0.72222222,  0.7010582 ,  0.25925926,  0.60582011,
          0.        ]],

       [[ 0.        ,  0.5       ,  0.76190476,  0.67460317,
          0.6984127 ],
        [ 0.57936508,  0.        ,  0.69047619,  0.5       ,
          0