# Simplicial Paths as Higher-Order Motifs

A strong inductive bias for deep learning models is processing signals while respecting the local structure of their underlying space. Many real-world systems operate on asymmetric relational structures, leading to directed graph representations. However, most graph and topological models forcibly symmetrize these relationships, thereby losing critical information. While some graph neural networks have recently started incorporating **asymmetric** pairwise interactions, extending the topological deep learning (TDL) framework to account for asymmetric higher-order relationships remains unexplored.

Recent studies have examined cascading dynamics on networks at the simplicial level [2]. In Topological Data Analysis (TDA), the use of topological tools to address questions in neuroscience has generated interest in constructing topological spaces from digraphs to better understand the phenomena they support [3].

For this reason, we propose using **maximal simplicial paths**, derived from **directed graphs**, as cells of a **combinatorial complex**. That is, we propose a lifting **from directed graphs to combinatorial complexes**.

Next, we introduce the fundamental concepts underlying our approach. For a more comprehensive treatment of these basics, we refer the reader to [1]. To the best of our knowledge, this is the first lifting that takes into account a **higher-order notion of directionality** when defining cells, in contrast to, e.g., directly taking the simplices of a directed flag complex as cells (see below).

## Complexes

**Directed Graphs**

A *directed graph* (digraph) is a pair $G = (V,E)$, where $V$ is a finite set of vertices and $E \subseteq (V \times V) \setminus \Delta_V$ is a relation, with $\Delta_V = \{(v,v) \mid v \in V\}$. Note that the relation is not necessarily symmetric. Removing $\Delta_V$ excludes loops, i.e., there are no edges $(v,v)$.
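As a minimal sketch, a digraph in this sense can be stored as a set of ordered pairs. The helper name `make_digraph` is hypothetical and is not part of the notebook's modules; it only illustrates the no-loop condition and the fact that $E$ need not be symmetric.

```python
def make_digraph(vertices, edges):
    """Return (V, E) with E a set of ordered pairs and no loops (v, v)."""
    V = set(vertices)
    E = set()
    for u, v in edges:
        if u not in V or v not in V:
            raise ValueError(f"edge {(u, v)} uses an unknown vertex")
        if u == v:
            raise ValueError(f"loop {(u, v)} is not allowed")
        E.add((u, v))
    return V, E


V, E = make_digraph([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
# The relation need not be symmetric: (0, 1) is an edge, (1, 0) is not.
is_symmetric = all((v, u) in E for (u, v) in E)
```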

**Abstract Simplicial Complexes**

An *abstract simplicial complex* is a pair $\mathcal{K} = (V, \Sigma)$, where $V$ is a finite set of vertices and $\Sigma$ is a collection of subsets of $V$ such that for every $\sigma \in \Sigma$, $\tau \subseteq \sigma$ implies $\tau \in \Sigma$. An element $\sigma$ of $\Sigma$ is an *abstract simplex* of $\mathcal{K}$. It is a *$k$-simplex*, of dimension $\dim(\sigma) = k$, if $|\sigma| = k+1$. If $\tau \subseteq \sigma \in \Sigma$, then $\tau$ is a *face* of $\sigma$. If, in addition, $\dim(\tau) = \dim(\sigma) - 1$, then $\tau$ is a *facet* of $\sigma$. The *dimension* $\dim(\mathcal{K})$ of $\mathcal{K}$ is the maximal dimension of a simplex in $\mathcal{K}$.
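The closure condition can be checked directly on small examples. The sketch below uses hypothetical helper names and, as a simplifying convention, only tests non-empty subsets.

```python
from itertools import combinations


def is_simplicial_complex(simplices):
    """Check that every non-empty proper subset of a simplex is also listed."""
    sigma_set = {frozenset(s) for s in simplices}
    for sigma in sigma_set:
        for size in range(1, len(sigma)):
            for tau in combinations(sigma, size):
                if frozenset(tau) not in sigma_set:
                    return False
    return True


def dimension(simplices):
    """Maximal dimension of a simplex, where a k-simplex has k + 1 vertices."""
    return max(len(s) for s in simplices) - 1


triangle = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
# Same collection without the edge {1, 2}: no longer closed under subsets.
missing_edge = [(0,), (1,), (2,), (0, 1), (0, 2), (0, 1, 2)]
```

Here `triangle` passes the check and has dimension 2, while `missing_edge` fails because the face $\{1, 2\}$ of $\{0, 1, 2\}$ is absent.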

There is a standard way of building an abstract simplicial complex from a graph.

**Flag Complex**

Given a graph $G$, its associated flag complex is the abstract simplicial complex whose $k$-simplices are formed by the $(k+1)$-cliques of the graph.
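A brute-force sketch of this construction for small undirected graphs is given below. The function name `flag_complex` and the `max_dim` cutoff are assumptions made for illustration; the enumeration is exponential and not meant for large networks.

```python
from itertools import combinations


def flag_complex(vertices, edges, max_dim=2):
    """Return {k: list of k-simplices}, where k-simplices are (k + 1)-cliques."""
    adjacent = {frozenset(e) for e in edges}
    simplices = {}
    for k in range(max_dim + 1):
        simplices[k] = [
            c
            for c in combinations(sorted(vertices), k + 1)
            if all(frozenset(pair) in adjacent for pair in combinations(c, 2))
        ]
    return simplices


# A 4-cycle 0-1-2-3 with the chord 0-2: two triangles sharing an edge.
K = flag_complex(range(4), [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
```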

The following is the natural generalization of flag complexes to digraphs.

**Directed Flag Complex**

An ordered $k$-clique of a directed graph $G$ is a totally ordered $k$-tuple $(v_1, \dots, v_k)$ of vertices of $G$ such that $(v_i, v_j) \in E$ for all $i < j$. Given a digraph $G$, its directed flag complex is the abstract simplicial complex whose $k$-simplices are the ordered $(k+1)$-cliques of $G$.
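The ordering condition is what separates a transitive triangle from a cyclic one. The following brute-force sketch (hypothetical helper `ordered_cliques`, suitable only for tiny digraphs) enumerates ordered cliques of a given size.

```python
from itertools import permutations


def ordered_cliques(vertices, edges, size):
    """Ordered tuples (v_1, ..., v_size) with (v_i, v_j) in E whenever i < j."""
    E = set(edges)
    return [
        c
        for c in permutations(sorted(vertices), size)
        if all((c[a], c[b]) in E for a in range(size) for b in range(a + 1, size))
    ]


# Transitive triangle 0 -> 1, 1 -> 2, 0 -> 2: one directed 2-simplex.
transitive = ordered_cliques(range(3), [(0, 1), (1, 2), (0, 2)], 3)
# Cyclic triangle 0 -> 1, 1 -> 2, 2 -> 0: no directed 2-simplex.
cyclic = ordered_cliques(range(3), [(0, 1), (1, 2), (2, 0)], 3)
```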


## Simplicial Paths

**Edge paths on digraphs**

A path on a digraph is a sequence $(v_0, v_1, \dots, v_n)$ of vertices such that every consecutive pair $(v_i, v_{i+1})$ belongs to $E$, moving from a source vertex to a sink vertex. Directed graphs support various directed edge paths.
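For illustration, the consecutive-pair condition can be tested as follows; `is_directed_path` is a hypothetical helper, not a function from the notebook's modules.

```python
def is_directed_path(sequence, edges):
    """True if every consecutive pair (v_i, v_{i+1}) is an edge of the digraph."""
    E = set(edges)
    return all((u, v) in E for u, v in zip(sequence, sequence[1:]))


edges = [(0, 1), (1, 2), (0, 2)]
forward_ok = is_directed_path((0, 1, 2), edges)
# Traversing the same vertices against the edge directions is not a path.
backward_ok = is_directed_path((2, 1, 0), edges)
```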

Directed cliques have an inherent directionality, which we exploit to extend the notion to higher-dimensional simplicial paths formed by sequences of simplices in the directed flag complex.

We will impose the notion of direction via face maps.

**Face maps**

Face maps uniquely identify the faces of a simplex by omitting its $i$th vertex. Let $\sigma = (v_0, \ldots, v_n)$ be an $n$-simplex. We denote by $\hat{d}_i$ the face map

$$
\hat{d}_i(\sigma) =
\begin{cases}
(v_0, \ldots, \hat{v}_i, \ldots, v_n) & \text{if } i < n, \\
(v_0, \ldots, v_{n-1}, \hat{v}_n) & \text{if } i \geq n.
\end{cases}
$$
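A direct transcription of this definition for simplices stored as Python tuples might look as follows (the name `face_map` is an assumption made for this sketch).

```python
def face_map(sigma, i):
    """Apply d_i to an ordered simplex sigma = (v_0, ..., v_n).

    Removes v_i when i < n and the last vertex v_n when i >= n.
    """
    n = len(sigma) - 1
    index = i if i < n else n
    return sigma[:index] + sigma[index + 1:]


sigma = (0, 1, 2)
faces = [face_map(sigma, i) for i in range(4)]
```

For the 2-simplex `(0, 1, 2)`, indices 2 and 3 both remove the last vertex, which matches the second case of the definition.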

**Directed Q-Connectivity**

For an ordered simplicial complex $K$, let $(\sigma, \tau)$ be an ordered pair of simplices $\sigma \in K_s$ and $\tau \in K_t$, where $s, t \geq q$. Let $(\hat{d}_i, \hat{d}_j)$ be an ordered pair of the $i$th and $j$th face maps. Then $(\sigma, \tau)$ is $q$-*near along* $(\hat{d}_i, \hat{d}_j)$ if either of the following conditions is true:

1. $\sigma \leftrightarrow \tau$,
2. $\hat{d}_i(\sigma) \leftrightarrow \alpha \leftrightarrow \hat{d}_j(\tau)$, for some $q$-simplex $\alpha \in K$.

By closing the directed q-nearness transitively, the ordered pair $(\sigma, \tau)$ of simplices of $K$ is $q$-*connected along* $(\hat{d}_i, \hat{d}_j)$ if there is a sequence of simplices in $K$,

$$\sigma = \alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_n, \alpha_{n+1} = \tau,$$

such that any two consecutive ones are $q$-*near along* $(\hat{d}_i, \hat{d}_j)$. The sequence of simplices is called a $q$-*connection* along $(\hat{d}_i, \hat{d}_j)$ between $\sigma$ and $\tau$, or a $(q, \hat{d}_i, \hat{d}_j)$-*connection*, once the choices of $q$ and of the directions $\hat{d}_i$ and $\hat{d}_j$ are made. From now on we refer to $(q, \hat{d}_i, \hat{d}_j)$ as $(q, i, j)$.

*Theorem* The relation of being $(q,i,j)$-connected is a preorder on $\Sigma_{\geq q}$.
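The definitions above can be sketched in a few lines. This is an illustration under stated assumptions, not the lifting's implementation: the arrows in the two conditions are read as face inclusions following [1] (condition 1: $\sigma$ is a face of $\tau$; condition 2: some $q$-simplex $\alpha$ is a face of both $\hat{d}_i(\sigma)$ and $\hat{d}_j(\tau)$), simplices are tuples, a face is an order-preserving sub-tuple, and the search is restricted to the simplices passed in. All function names are hypothetical.

```python
from itertools import combinations


def face_map(sigma, i):
    """d_i: drop v_i if i < n, otherwise drop the last vertex."""
    n = len(sigma) - 1
    index = i if i < n else n
    return sigma[:index] + sigma[index + 1:]


def is_face(tau, sigma):
    """True if the ordered tuple tau is a subsequence of sigma."""
    it = iter(sigma)
    return all(v in it for v in tau)


def q_near(sigma, tau, q, i, j):
    """Assumed reading: sigma is a face of tau, or the faces d_i(sigma)
    and d_j(tau) share a q-dimensional face alpha."""
    if is_face(sigma, tau):
        return True
    d_i, d_j = face_map(sigma, i), face_map(tau, j)
    return any(is_face(alpha, d_j) for alpha in combinations(d_i, q + 1))


def q_connected(simplices, source, target, q, i, j):
    """Transitive closure of q-nearness, computed by a graph search."""
    seen, frontier = {source}, [source]
    while frontier:
        current = frontier.pop()
        if current == target:
            return True
        for nxt in simplices:
            if nxt not in seen and q_near(current, nxt, q, i, j):
                seen.add(nxt)
                frontier.append(nxt)
    return False


# A strip of three directed 2-simplices glued along shared edges.
strip = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
forward = q_connected(strip, (0, 1, 2), (2, 3, 4), q=1, i=0, j=2)
backward = q_connected(strip, (2, 3, 4), (0, 1, 2), q=1, i=0, j=2)
```

On this strip, `forward` holds and `backward` does not, which illustrates why the relation is a preorder rather than an equivalence relation: it is reflexive and transitive, but not symmetric.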

**Directions and Simplicial Paths as Topological Information**

Instead of focusing on the path structure of the digraph, we look at the path structure of the high-dimensional simplices by exploring the $q$-connectivity preorder.

Different choices of $q,i,j$ make it possible to emphasize different features of directionality. For instance, $(1,0,2)$-connected paths of 2-simplices exhibit directed flows aligned with the directionality of the total order of the adjacent simplices. On the other hand, the $(1,1,2)$ preorder reveals circular flows around a source vertex.

<p align="center">
    <img src="./figures/sp.jpeg" alt="Alt Text" style="max-width: 50%; max-height: 50%;">
</p>
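The contrast between the two choices can be reproduced on a toy fan of triangles around a source vertex. The sketch below is illustrative only and assumes the face-inclusion reading of $q$-nearness from [1]: $(\sigma, \tau)$ is $q$-near along $(\hat{d}_i, \hat{d}_j)$ if $\sigma$ is a face of $\tau$, or if $\hat{d}_i(\sigma)$ and $\hat{d}_j(\tau)$ share a $q$-dimensional face. Helper names are hypothetical.

```python
from itertools import combinations, permutations


def face_map(sigma, i):
    """d_i: drop v_i if i < n, otherwise drop the last vertex."""
    n = len(sigma) - 1
    index = i if i < n else n
    return sigma[:index] + sigma[index + 1:]


def is_face(tau, sigma):
    """True if the ordered tuple tau is a subsequence of sigma."""
    it = iter(sigma)
    return all(v in it for v in tau)


def q_near(sigma, tau, q, i, j):
    """Assumed face-inclusion reading of q-nearness along (d_i, d_j)."""
    if is_face(sigma, tau):
        return True
    d_i, d_j = face_map(sigma, i), face_map(tau, j)
    return any(is_face(alpha, d_j) for alpha in combinations(d_i, q + 1))


def near_pairs(simplices, q, i, j):
    """All ordered pairs of distinct simplices that are q-near along (d_i, d_j)."""
    return [(s, t) for s, t in permutations(simplices, 2) if q_near(s, t, q, i, j)]


# Three directed 2-simplices sharing the source vertex 0; the digraph has
# edges 0->1, 0->2, 0->3, 1->2, 2->3, 3->1.
fan = [(0, 1, 2), (0, 2, 3), (0, 3, 1)]
circular = near_pairs(fan, q=1, i=1, j=2)
aligned = near_pairs(fan, q=1, i=0, j=2)
```

With $(1,1,2)$ the three triangles are chained in a cycle around vertex 0, whereas with $(1,0,2)$ no two distinct triangles of the fan are near.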

The $(q, i, j)$-connections exhibit different homotopical information compared to the original complex arising from the structure of the digraph. The following two digraphs span a $2$-dimensional directed flag complex homotopic to the $2$-sphere, making them indistinguishable by homology. However, by examining the $(1,0,2)$ and $(1,1,2)$ preorders, we can homotopically distinguish these complexes. The $(1,1,2)$ preorder, in particular, allows us to identify circular flows in both the upper and lower hemispheres. Specifically, the first complex has a circular flow only in the upper hemisphere, whereas the second complex exhibits circular flows in both the upper and lower hemispheres.

<p align="center">
    <img src="./figures/sph.jpeg" alt="Alt Text" style="max-width: 50%; max-height: 50%;">
</p>

## References

[1] Henri Riihimäki. [Simplicial q-Connectivity of Directed Graphs with Applications to Network Analysis](https://arxiv.org/pdf/2202.07307).

[2] Bengier Ulgen, Dane Taylor. [Simplicial cascades are orchestrated by the multidimensional geometry of neuronal complexes](https://arxiv.org/pdf/2201.02071).

[3] Dane Taylor, Florian Klimm. [Topological data analysis of contagion maps for examining spreading processes on networks](https://arxiv.org/pdf/1408.1168).

[4] D. Lütgehetmann, D. Govc, J.P. Smith, and R. Levi. [Computing persistent homology of directed flag complexes](https://arxiv.org/pdf/arXiv:1906.10458).


### Imports and utilities

In [6]:
# With this cell any imported module is reloaded before each cell execution
%load_ext autoreload
%autoreload 2
from modules.data.load.loaders import GraphLoader
from modules.data.preprocess.preprocessor import PreProcessor
from modules.utils.utils import (
    describe_data,
    load_dataset_config,
    load_model_config,
    load_transform_config,
)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Loading the Dataset

In this section, we load the dataset. We use the Cora dataset, which is a citation network. Because our interest lies in directed networks, we do not use the datasets proposed for the challenge. Instead, we load the data with the `CitationFull` class from `torch_geometric.datasets.citation_full`, redefined below so that `to_undirected` defaults to `False` and edge directions are preserved. The dataset is then described using the `describe_data` function.

In [7]:
import os.path as osp
from typing import Callable, Optional

from torch_geometric.data import InMemoryDataset, download_url
from torch_geometric.io import read_npz


class CitationFull(InMemoryDataset):
    r"""The full citation network datasets from the
    `"Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via
    Ranking" <https://arxiv.org/abs/1707.03815>`_ paper.
    Nodes represent documents and edges represent citation links.
    Datasets include :obj:`"Cora"`, :obj:`"Cora_ML"`, :obj:`"CiteSeer"`,
    :obj:`"DBLP"`, :obj:`"PubMed"`.

    Args:
        root (str): Root directory where the dataset should be saved.
        name (str): The name of the dataset (:obj:`"Cora"`, :obj:`"Cora_ML"`,
            :obj:`"CiteSeer"`, :obj:`"DBLP"`, :obj:`"PubMed"`).
        transform (callable, optional): A function/transform that takes in an
            :obj:`torch_geometric.data.Data` object and returns a transformed
            version. The data object will be transformed before every access.
            (default: :obj:`None`)
        pre_transform (callable, optional): A function/transform that takes in
            an :obj:`torch_geometric.data.Data` object and returns a
            transformed version. The data object will be transformed before
            being saved to disk. (default: :obj:`None`)
        to_undirected (bool, optional): Whether the original graph is
            converted to an undirected one. (default: :obj:`False`)
        force_reload (bool, optional): Whether to re-process the dataset.
            (default: :obj:`False`)

    **STATS:**

    .. list-table::
        :widths: 10 10 10 10 10
        :header-rows: 1

        * - Name
          - #nodes
          - #edges
          - #features
          - #classes
        * - Cora
          - 19,793
          - 126,842
          - 8,710
          - 70
        * - Cora_ML
          - 2,995
          - 16,316
          - 2,879
          - 7
        * - CiteSeer
          - 4,230
          - 10,674
          - 602
          - 6
        * - DBLP
          - 17,716
          - 105,734
          - 1,639
          - 4
        * - PubMed
          - 19,717
          - 88,648
          - 500
          - 3
    """

    url = 'https://github.com/abojchevski/graph2gauss/raw/master/data/{}.npz'

    def __init__(
        self,
        root: str,
        name: str,
        transform: Optional[Callable] = None,
        pre_transform: Optional[Callable] = None,
        to_undirected: bool = False,
        force_reload: bool = False,
    ) -> None:
        self.name = name.lower()
        self.to_undirected = to_undirected
        assert self.name in ['cora', 'cora_ml', 'citeseer', 'dblp', 'pubmed']
        super().__init__(root, transform, pre_transform,
                         force_reload=force_reload)
        self.load(self.processed_paths[0])

    @property
    def raw_dir(self) -> str:
        return osp.join(self.root, self.name, 'raw')

    @property
    def processed_dir(self) -> str:
        return osp.join(self.root, self.name, 'processed')

    @property
    def raw_file_names(self) -> str:
        return f'{self.name}.npz'

    @property
    def processed_file_names(self) -> str:
        suffix = 'undirected' if self.to_undirected else 'directed'
        return f'data_{suffix}.pt'

    def download(self) -> None:
        download_url(self.url.format(self.name), self.raw_dir)

    def process(self) -> None:
        data = read_npz(self.raw_paths[0], to_undirected=self.to_undirected)
        data = data if self.pre_transform is None else self.pre_transform(data)
        self.save([data], self.processed_paths[0])

    def __repr__(self) -> str:
        return f'{self.name.capitalize()}Full()'


In [8]:
dataset = CitationFull(root="data/cora", name="cora")
describe_data(dataset)

Downloading https://github.com/abojchevski/graph2gauss/raw/master/data/cora.npz
Processing...
Done!



Dataset only contains 1 sample:
[]
 - Graph with 19793 vertices and 65311 edges.
 - Features dimensions: [8710, 0]
 - There are 0 isolated nodes.



## Loading and Applying the Lifting

In this section, we will instantiate the lifting we want to apply to the data. We generate a combinatorial complex from the directed graph satisfying the following conditions:

- Rank 0 cells are the vertices of the graph.
- Rank 1 cells are the directed edges of the graph.
- Rank 2 cells are the maximal simplicial paths of length greater than 1 obtained from the (1,1,2)-connectivity preorder arising from the directed flag complex associated with our directed network dataset. (circular flows around a source vertex)

The threshold length and the face maps imposing the directionality are defined in the transform_config dictionary and can be modified according to the user’s needs.


In [9]:
transform_type = "liftings"
transform_id = "graph2combinatorial/sp_lifting"
transform_config = {"lifting": load_transform_config(transform_type, transform_id)}


Transform configuration for graph2combinatorial/sp_lifting:

{'transform_type': 'lifting',
 'transform_name': 'SimplicialPathsLifting',
 'd1': 2,
 'd2': 2,
 'q': 1,
 'i': 1,
 'j': 2,
 'complex_dim': 2,
 'chunk_size': 1024,
 'threshold': 1}
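As a sketch of how the configuration could be adapted, the dictionary below copies the values printed above and overrides two of them. The variable names `lifting_config` and `custom_transform_config` are hypothetical, and the reading of `threshold` as the minimal path length that must be exceeded is an assumption based on the rank 2 description above.

```python
# Values copied from the configuration printed in this notebook.
lifting_config = {
    "transform_type": "lifting",
    "transform_name": "SimplicialPathsLifting",
    "d1": 2,
    "d2": 2,
    "q": 1,
    "i": 1,
    "j": 2,
    "complex_dim": 2,
    "chunk_size": 1024,
    "threshold": 1,
}

# Switch from the (1, 1, 2) preorder to the (1, 0, 2) preorder and
# (assumed semantics) keep only paths longer than 2.
custom_config = {**lifting_config, "i": 0, "threshold": 2}
custom_transform_config = {"lifting": custom_config}
```

The original dictionary is left untouched, so both configurations can be compared side by side.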


We apply the transform via the preprocessor and describe the resulting dataset.

In [10]:
lifted_dataset = PreProcessor(dataset, transform_config, "data/cora")
describe_data(lifted_dataset)

Processing...


[[3829, 3831], [3829, 3832], [3837, 3831], [3837, 3832], [4903, 4896], [4903, 4897], [5712, 5715], [5712, 5716], [5712, 5717], [8772, 8779], [8777, 8779], [9660, 9662], [9661, 9662], [9990, 9988], [9990, 9989], [10012, 10006], [10012, 10007], [10012, 10008], [11623, 11619], [11623, 11620], [11623, 11621], [11629, 11619], [11629, 11620], [11629, 11621], [11958, 11955], [11958, 11956], [15921, 15918], [15921, 15919], [15925, 15918], [15925, 15919], [15972, 15975], [15972, 15976], [15972, 15977], [15972, 15978], [15972, 15979], [15984, 15975], [15984, 15976], [15984, 15977], [15984, 15978], [15984, 15979], [19923, 19925], [19923, 19926], [19923, 19927], [20788, 20786], [22427, 22428], [22427, 22429], [22427, 22430], [23172, 23167], [23172, 23168], [24356, 24360, 24357], [24358, 24360, 24356, 24361], [24358, 24360, 24357], [24358, 24361], [24363, 24360, 24356, 24361], [24363, 24360, 24357], [24363, 24361], [24614, 24617], [24985, 24986], [25347, 25362], [25347, 25363], [25347, 25364], [253

Done!



Dataset only contains 1 sample:
[19793, 127]
 - Graph with 19793 vertices and 127 edges.
 - Features dimensions: [8710, 8710]
 - There are 0 isolated nodes.



## Create and Run a NN Model

In this section, we create a simple model to check that the lifting works as intended.

In [11]:
from modules.models.combinatorial.spcc import SPCCNN
model_type = "combinatorial"
model_id = "spccnn"
n_nodes = dataset.x.shape[0]
model = SPCCNN(
    channels_per_layer=[[[lifted_dataset.x.shape[1], lifted_dataset.x.shape[1]], [1, 1]]],
    out_channels_0=1,
)

In [12]:
y_hat = model(lifted_dataset.get(0))