# Simplicial Paths Lifting (Graph to Combinatorial)

Many real-world systems have an asymmetric relational structure leading to directed graph representations, but most graph and topological models forcibly symmetrize these relationships. While some graph neural networks have recently begun integrating asymmetric pairwise interactions, extending the topological deep learning (TDL) machinery to account for asymmetric higher-order relationships remains unexplored.

Digraphs naturally support directed edge paths, and directed cliques carry an inherent direction imposed by their total order. Motivated by this, [44] extended the notion of $q$-connectivity to account for directionality, generalizing directed edge paths to simplicial paths formed by sequences of directed simplices.

Directed flag complexes, which generalize flag complexes to digraphs, have become popular tools in applied topology for capturing finer topological information about digraphs. Their simplices arise from directed cliques and therefore inherit a direction from the clique's total order. We introduce preorders, defined via user-chosen face maps, that encode the directions of higher-order simplices and thereby induce simplicial paths.

## Background

**Directed graphs** A *directed graph* (digraph) is a pair $G = (V,E)$ consisting of a finite set $V$ of vertices and a relation $E \subseteq (V \times V) \setminus \Delta_V$, where $\Delta_V = \{(v,v) \mid v \in V\}$. Note that the relation is not necessarily symmetric. Removing $\Delta_V$ excludes loops from the graph, i.e., there are no edges $(v,v)$.

**Paths on digraphs** A path on a digraph is a sequence of vertices $(v_0, v_1, \dots, v_n)$ such that every consecutive pair $(v_i, v_{i+1})$ belongs to $E$. Such a path goes from its *source* vertex $v_0$ to its *sink* vertex $v_n$.
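The two definitions above can be checked directly on a small edge set. This is a minimal plain-Python sketch; `is_digraph` and `is_directed_path` are hypothetical helpers for illustration, not part of the lifting module.

```python
# Hypothetical helpers (not part of the lifting module): a digraph is a vertex
# set V plus a loop-free edge relation E; a path needs every consecutive pair in E.
def is_digraph(V, E):
    """True if E is a relation on V without loops (v, v)."""
    return all(u in V and v in V and u != v for (u, v) in E)


def is_directed_path(E, seq):
    """True if each consecutive pair (v_i, v_{i+1}) of seq is an edge of E."""
    return all((a, b) in E for a, b in zip(seq, seq[1:]))


V = {0, 1, 2, 3}
E = {(0, 1), (1, 2), (2, 3), (0, 2)}  # not symmetric: (1, 0) is not an edge

print(is_digraph(V, E))                   # True
print(is_directed_path(E, [0, 1, 2, 3]))  # True: from source 0 to sink 3
print(is_directed_path(E, [3, 2, 1]))     # False: the edges point the other way
```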

**Abstract simplicial complexes** An *abstract simplicial complex* is a pair $K = (V, \Sigma)$, where $V$ is a finite set of vertices and $\Sigma$ is a collection of subsets of $V$ such that, for every $\sigma \in \Sigma$, $\tau \subseteq \sigma$ implies $\tau \in \Sigma$. An element $\sigma$ of $\Sigma$ is an *abstract simplex* of $K$. It is a *$k$-simplex*, of dimension $\dim(\sigma) = k$, if $|\sigma| = k+1$. If $\tau \subseteq \sigma \in \Sigma$, then $\tau$ is a *face* of $\sigma$; if moreover $\dim(\tau) = \dim(\sigma) - 1$, then $\tau$ is a *facet* of $\sigma$. The *dimension* $\dim(K)$ of $K$ is the maximal dimension of a simplex in $K$.
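As a sketch of the closure condition, the snippet below represents simplices as vertex sets and checks that every nonempty face of a simplex is also present (the empty face is ignored, a common convention). The helper names are hypothetical.

```python
from itertools import combinations


def is_simplicial_complex(V, simplices):
    """Check that each simplex uses vertices of V and that every nonempty face
    of a simplex is itself a simplex (the empty face is ignored here)."""
    S = {frozenset(s) for s in simplices}
    for s in S:
        if not s <= V:
            return False
        for k in range(1, len(s)):
            if any(frozenset(t) not in S for t in combinations(s, k)):
                return False
    return True


def dim(sigma):
    """A k-simplex has k + 1 vertices."""
    return len(sigma) - 1


def facets(sigma):
    """Faces tau of sigma with dim(tau) = dim(sigma) - 1 (for dim(sigma) >= 1)."""
    return [frozenset(t) for t in combinations(sigma, len(sigma) - 1)]


V = {0, 1, 2}
K = [{0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]
print(is_simplicial_complex(V, K))              # True
print(max(dim(s) for s in K))                   # 2, the dimension of K
print(is_simplicial_complex(V, K[:3] + K[4:]))  # False: the face {0, 1} is missing
```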

**Flag complex** There is a standard way of building an abstract simplicial complex from a graph. Given an undirected graph $G$, its associated flag complex is the abstract simplicial complex whose $k$-simplices are the $(k+1)$-cliques of $G$. The construction is functorial: a graph homomorphism maps cliques to cliques and therefore induces a simplicial map between the flag complexes.
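For an undirected graph, the flag complex can be read off from all cliques, not only the maximal ones. This sketch uses `networkx.enumerate_all_cliques`, which yields cliques in order of nondecreasing size; the function name `flag_complex` is our own.

```python
import networkx as nx


def flag_complex(G, max_dim=2):
    """Simplices of the flag complex of an undirected graph, grouped by dimension.

    A k-simplex is a (k+1)-clique of G (every clique, not only maximal ones).
    """
    complex_ = {d: [] for d in range(max_dim + 1)}
    for clique in nx.enumerate_all_cliques(G):
        d = len(clique) - 1
        if d > max_dim:
            break  # cliques are yielded in order of nondecreasing size
        complex_[d].append(tuple(sorted(clique)))
    return complex_


G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])
K = flag_complex(G)
print({d: len(s) for d, s in K.items()})  # {0: 4, 1: 4, 2: 1}
```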

**Directed Flag Complex** We consider directed flag complexes, the natural generalization of flag complexes to digraphs. An ordered $k$-clique of a digraph $G$ is a totally ordered $k$-tuple $(v_1, \dots, v_k)$ of vertices of $G$ such that $(v_i, v_{j}) \in E$ for all $i < j$. Given a digraph $G$, its directed flag complex $dFl(G)$ is the ordered simplicial complex whose $k$-simplices are the ordered $(k+1)$-cliques of $G$. This construction is also functorial. If $\phi: G_1 \rightarrow G_2$ is a morphism of digraphs that sends ordered cliques of $G_1$ to ordered cliques of $G_2$, it induces a simplicial morphism $F_\phi: dFl(G_1) \rightarrow dFl(G_2)$ that sends each simplex $\sigma \in dFl(G_1)$, i.e., an ordered clique $(v_0,\dots,v_k)$ of $G_1$, to the simplex $F_\phi(\sigma) = (\phi(v_0),\dots,\phi(v_k))$.
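A naive enumeration makes the definition concrete: every ordered clique is obtained from a shorter one by appending a vertex that all of its vertices point to, which becomes the new last element of the total order. This is an illustrative sketch only; for real datasets the notebook imports `pyflagsercount` and the module's `DirectedFlagComplex`, whose internals are not reproduced here.

```python
def directed_flag_complex(edges, max_dim=2):
    """Ordered cliques (v_0, ..., v_k), k <= max_dim, with (v_a, v_b) in E for a < b.

    Vertices are taken from the edge list; loops (v, v) are ignored.
    """
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set())
        succ.setdefault(v, set())
        if u != v:
            succ[u].add(v)
    layer = [(v,) for v in sorted(succ)]
    complex_ = {0: layer}
    for k in range(1, max_dim + 1):
        # Extend each (k-1)-simplex by a vertex w that every vertex points to;
        # w then becomes the new last (maximal) element of the total order.
        layer = [
            s + (w,)
            for s in layer
            for w in sorted(set.intersection(*(succ[v] for v in s)))
        ]
        complex_[k] = layer
    return complex_


edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
dfl = directed_flag_complex(edges)
print(dfl[1])  # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
print(dfl[2])  # [(0, 1, 2), (1, 2, 3)]
```

Note that a pair of opposite edges $1 \rightarrow 2$ and $2 \rightarrow 1$ yields two distinct 1-simplices, $(1,2)$ and $(2,1)$.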

## Directions

**Face maps** Face maps single out the facets of a simplex: the $i$-th face map deletes the $i$-th vertex.

Let $\sigma = (v_0, \ldots, v_n)$ be an $n$-simplex. We denote by $\hat{d}_i$ the $i$-th face map

$$
\hat{d}_i(\sigma) =
\begin{cases}
(v_0, \ldots, \hat{v}_i, \ldots, v_n) & \text{if } i < n, \\
(v_0, \ldots, v_{n-1}, \hat{v}_n) & \text{if } i \geq n.
\end{cases}
$$
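The face map translates directly into tuple slicing on ordered simplices. A minimal sketch (the name `face` is our own); the clamp implements the case $i \geq n$, so for instance $\hat{d}_2$ is also defined on 1-simplices.

```python
def face(sigma, i):
    """Face map d_i: delete the i-th vertex; for i >= n delete the last vertex v_n."""
    n = len(sigma) - 1  # sigma is an n-simplex (v_0, ..., v_n)
    i = min(i, n)       # the case i >= n collapses to deleting v_n
    return sigma[:i] + sigma[i + 1:]


sigma = (0, 1, 2)
print([face(sigma, i) for i in range(4)])  # [(1, 2), (0, 2), (0, 1), (0, 1)]
print(face((1, 2), 2))                     # (1,): d_2 on a 1-simplex deletes v_1
```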


We impose directions via pairs of face maps, as illustrated below:

<img src="./figures/sphs.png" alt="Alt Text" width="50%" height="50%">



For an ordered simplicial complex $K$, let $(\sigma, \tau)$ be an ordered pair of simplices with $\sigma \in K_s$ and $\tau \in K_t$, where $K_s$ denotes the set of $s$-simplices of $K$ and $s, t \geq q$. Let $(\hat{d}_i, \hat{d}_j)$ be an ordered pair of the $i$-th and $j$-th face maps. Then $(\sigma, \tau)$ is $q$-*near along* $(\hat{d}_i, \hat{d}_j)$ if at least one of the following conditions holds:

1. $\sigma \leftrightarrow \tau$,
2. $\hat{d}_i(\sigma) \leftrightarrow \alpha \leftrightarrow \hat{d}_j(\tau)$, for some $q$-simplex $\alpha \in K$.
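The two conditions can be tested mechanically. The text does not spell out $\leftrightarrow$; this sketch *assumes* that $\alpha \leftrightarrow \beta$ means one simplex is a face of the other, where a face of an ordered simplex is an ordered subsequence of its vertices. The helper names (`is_face`, `linked`, `q_near`) are hypothetical.

```python
def face(sigma, i):
    """Face map d_i; indices i >= n delete the last vertex v_n."""
    i = min(i, len(sigma) - 1)
    return sigma[:i] + sigma[i + 1:]


def is_face(a, b):
    """True if a is an ordered subsequence of b, i.e. a face of the ordered simplex b."""
    it = iter(b)
    return all(v in it for v in a)  # `in` consumes the iterator, enforcing order


def linked(a, b):
    """ASSUMPTION: a <-> b means that one simplex is a face of the other."""
    return is_face(a, b) or is_face(b, a)


def q_near(sigma, tau, q, i, j, q_simplices):
    """Is (sigma, tau) q-near along (d_i, d_j)? q_simplices lists all q-simplices of K."""
    if len(sigma) - 1 < q or len(tau) - 1 < q:
        return False  # the definition only applies to simplices of dimension >= q
    if linked(sigma, tau):  # condition 1
        return True
    ds, dt = face(sigma, i), face(tau, j)  # condition 2
    return any(linked(ds, a) and linked(a, dt) for a in q_simplices)


# 1-simplices of the directed flag complex of 0->1, 0->2, 1->2, 1->3, 2->3
K1 = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
print(q_near((0, 1, 2), (1, 2, 3), 1, 0, 2, K1))  # True: d_0 and d_2 both give (1, 2)
print(q_near((1, 2, 3), (0, 1, 2), 1, 0, 2, K1))  # False: the relation is not symmetric
```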


<img src="./figures/simplicial_path.png" alt="Alt Text" width="50%" height="50%">

The ordered pair $(\sigma, \tau)$ of simplices of $K$ is $q$-*connected along* $(\hat{d}_i, \hat{d}_j)$ if there is a sequence of simplices in $K$,

$$\sigma = \alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_n, \alpha_{n+1} = \tau,$$

such that any two consecutive simplices are $q$-*near along* $(\hat{d}_i, \hat{d}_j)$. Such a sequence is called a $q$-*connection* along $(\hat{d}_i, \hat{d}_j)$ between $\sigma$ and $\tau$, and we write it simply as $(\sigma\alpha_1\alpha_2 \ldots \alpha_n\tau)$. Once $q$ and the directions $\hat{d}_i$ and $\hat{d}_j$ are fixed, we call it a $(q, \hat{d}_i, \hat{d}_j)$-*connection*, and similarly we say that consecutive simplices are $(q, \hat{d}_i, \hat{d}_j)$-*near*. From now on we refer to $(q, \hat{d}_i, \hat{d}_j)$ simply as $(q, i, j)$.
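Since a $q$-connection is a chain of $q$-near steps, $(q,i,j)$-connectivity is reachability in the digraph whose arrows are the $q$-near pairs. The sketch below builds that digraph with `networkx` and tests reachability with `nx.has_path`. Two assumptions: $\leftrightarrow$ means "is a face of, or has as a face", and, for simplicity, the simplices compared all share one dimension (here 2-simplices), whereas the definition ranges over all simplices of dimension at least $q$. The helper names are hypothetical.

```python
import networkx as nx


def face(sigma, i):
    i = min(i, len(sigma) - 1)  # d_i with i >= n deletes the last vertex
    return sigma[:i] + sigma[i + 1:]


def linked(a, b):
    # ASSUMPTION: a <-> b iff one is an ordered subsequence (face) of the other
    def sub(x, y):
        it = iter(y)
        return all(v in it for v in x)
    return sub(a, b) or sub(b, a)


def q_digraph(simplices, q_simplices, q, i, j):
    """Arrow s -> t whenever (s, t) is (q, i, j)-near; simplices share one dimension >= q."""
    D = nx.DiGraph()
    D.add_nodes_from(simplices)
    for s in simplices:
        for t in simplices:
            if s == t:
                continue
            ds, dt = face(s, i), face(t, j)
            if linked(s, t) or any(linked(ds, a) and linked(a, dt) for a in q_simplices):
                D.add_edge(s, t)
    return D


def q_connected(D, s, t):
    """(s, t) is (q, i, j)-connected iff t is reachable from s (s reaches itself)."""
    return nx.has_path(D, s, t)


# Directed flag complex of 0->1, 0->2, 1->2, 1->3, 2->3, 2->4, 3->4
K1 = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
K2 = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
D = q_digraph(K2, K1, q=1, i=0, j=2)
print(sorted(D.edges()))  # [((0, 1, 2), (1, 2, 3)), ((1, 2, 3), (2, 3, 4))]
print(q_connected(D, (0, 1, 2), (2, 3, 4)))  # True, via (1, 2, 3)
print(q_connected(D, (2, 3, 4), (0, 1, 2)))  # False
```

Reachability is reflexive and transitive, which is exactly why the relation in the following theorem is a preorder; it need not be symmetric, as the last line shows.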

**Theorem** The relation of being $(q,i,j)$-connected is a preorder on $\Sigma_{\geq q}$, the set of simplices of dimension at least $q$.



Different choices of $(q,i,j)$ emphasize different features of directionality.

For example, $(1,0,2)$-connected paths of 2-simplices point from source vertices to target vertices.

In contrast, $(1,1,2)$-connections reveal circular flows around a source vertex.
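A small fan of four 2-simplices around a source vertex $0$ illustrates the contrast. The sketch reuses the hypothetical nearness construction (with the same assumption on $\leftrightarrow$, and $q = 1$ implied by passing the 1-simplices) and counts directed cycles with `nx.simple_cycles`.

```python
import networkx as nx


def face(sigma, i):
    i = min(i, len(sigma) - 1)
    return sigma[:i] + sigma[i + 1:]


def linked(a, b):
    # ASSUMPTION: a <-> b iff one is an ordered subsequence (face) of the other
    def sub(x, y):
        it = iter(y)
        return all(v in it for v in x)
    return sub(a, b) or sub(b, a)


def q_digraph(simplices, q_simplices, i, j):
    """Arrow s -> t whenever (s, t) is q-near along (d_i, d_j)."""
    D = nx.DiGraph()
    D.add_nodes_from(simplices)
    for s in simplices:
        for t in simplices:
            ds, dt = face(s, i), face(t, j)
            near = linked(s, t) or any(linked(ds, a) and linked(a, dt) for a in q_simplices)
            if s != t and near:
                D.add_edge(s, t)
    return D


# A fan of four 2-simplices around the source vertex 0:
# edges 0->1, 0->2, 0->3, 0->4 and the rim 1->2, 2->3, 3->4, 4->1.
K1 = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4), (4, 1)]
K2 = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 1)]

flow_102 = q_digraph(K2, K1, i=0, j=2)
flow_112 = q_digraph(K2, K1, i=1, j=2)
print(flow_102.number_of_edges())              # 0: no source-to-target flow here
print(len(list(nx.simple_cycles(flow_112))))   # 1: a single circular flow around 0
```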

The $(q,i,j)$-connections, which arise from the structure of the digraph, exhibit homotopical information different from that of the original complex.

The geometric realizations of the following digraphs are homotopy equivalent, so homology cannot distinguish them. However, choosing $(1,0,2)$ we are able to depict second-order simplicial flows from $[0,1,2] \rightarrow [1,2,3]$ and $[0,2,1] \rightarrow [2,1,3]$. Also, $(1,1,2)$ depicts circular flows around vertices. In particular, we find circular flows on the upper and lower hemispheres: the first digraph has a circular flow only on the upper hemisphere, while the second has circular flows on both the upper and the lower hemispheres.

Unlike classical $q$-analysis, directed $q$-analysis does not extract topological information from the complex alone, but rather from a collection of topological spaces arising from the $(q,i,j)$-preorders. Each preorder can in turn be seen as a digraph; quotienting by elements connected by arrows in opposite directions yields a partial order, and finite partial orders are in bijection with finite $T_0$ topological spaces.
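The quotient described above is the condensation of the preorder's digraph: simplices connected in both directions collapse to one point, and the result is acyclic, i.e. a partial order. A sketch with `nx.condensation` on a hypothetical $q$-near digraph whose nodes `a` to `e` stand for simplices:

```python
import networkx as nx

# Hypothetical (q, i, j)-near digraph on five simplices labelled a..e:
# a -> b -> c -> a is a directed cycle, followed by c -> d -> e.
D = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")])

# Identify simplices connected in both directions: each strongly connected
# component becomes one point. The quotient is acyclic, i.e. a partial order.
C = nx.condensation(D)
classes = [sorted(C.nodes[n]["members"]) for n in nx.topological_sort(C)]
print(classes)                          # [['a', 'b', 'c'], ['d'], ['e']]
print(nx.is_directed_acyclic_graph(C))  # True
```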

This canonically associates to a simplicial complex its generalized path components, namely the connected components of the $q$-graph, which capture the simplicial connections under the imposed directions.
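A sketch of the generalized path components, reading "connected components" as weakly connected components of the $q$-near digraph (an assumption; strongly connected components would be the other natural reading). The digraph is hypothetical, with letters standing for simplices.

```python
import networkx as nx

# Hypothetical q-near digraph with two separate groups of simplices.
D = nx.DiGraph([("a", "b"), ("b", "c"), ("d", "e")])
D.add_node("f")  # a simplex with no q-connections forms its own component

# Generalized path components: connected components ignoring arrow direction.
components = sorted(sorted(c) for c in nx.weakly_connected_components(D))
print(components)  # [['a', 'b', 'c'], ['d', 'e'], ['f']]
```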

Preorders on a set are in bijection with Alexandroff topologies on that set. Applying the directed $q$-connectivity preorders therefore assigns new homotopy types to the directed flag complex.
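Under this bijection, the open sets of the Alexandroff topology are the up-sets of the preorder, and each point $x$ has a minimal open neighbourhood $U_x$. The sketch below uses the convention "$x \leq y$ iff $y$ is reachable from $x$" (the opposite convention, with down-sets, is equally common); the helper names are hypothetical.

```python
import networkx as nx


def minimal_open_sets(D):
    """Minimal open neighbourhood U_x of every point for the Alexandroff topology of
    the preorder 'x <= y iff y is reachable from x' (up-set convention)."""
    return {x: frozenset(nx.descendants(D, x) | {x}) for x in D}


def is_open(D, U):
    """Open sets are exactly the up-sets: closed under following arrows."""
    U = set(U)
    return all(y in U for x in U for y in D.successors(x))


def is_t0(D):
    """T0 iff distinct points have distinct minimal open sets (antisymmetry)."""
    U = minimal_open_sets(D)
    return len(set(U.values())) == len(U)


D = nx.DiGraph([("a", "b"), ("b", "c"), ("d", "c")])
U = minimal_open_sets(D)
print(sorted(U["a"]), sorted(U["d"]))             # ['a', 'b', 'c'] ['c', 'd']
print(is_open(D, {"b", "c"}), is_open(D, {"a"}))  # True False
print(is_t0(D), is_t0(nx.DiGraph([("a", "b"), ("b", "a")])))  # True False
```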

Instead of looking at the path structure of the digraph, we identify higher-order motifs by exploring the path structure of high-dimensional simplices in the $q$-digraph. This is the key idea behind the simplicial paths lifting.


A simplicial complex and its face poset, viewed as a finite $T_0$ space, have the same weak homotopy type (McCord's theorem). Building on this intuition, the relational structure of the $q$-connections retains the homotopical information of the complex and is thus more discriminative than simplicial homology.
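A quick sanity check of the face-poset statement: the order complex of the face poset (its chains under strict inclusion) is the barycentric subdivision of the complex, so the two must have the same Euler characteristic. This is a necessary consequence, not a proof; the helper names are hypothetical.

```python
from itertools import combinations


def closure(maximal):
    """All nonempty faces of the given simplices, as frozensets of vertices."""
    return {frozenset(c) for s in maximal for k in range(1, len(s) + 1)
            for c in combinations(s, k)}


def chains(poset):
    """Simplices of the order complex: chains x_0 < ... < x_m under strict inclusion."""
    elems = list(poset)
    out, frontier = [], [(x,) for x in elems]
    while frontier:
        out.extend(frontier)
        frontier = [c + (y,) for c in frontier for y in elems if c[-1] < y]
    return out


def euler(simplices):
    """Alternating count of simplices by dimension (m items give dimension m - 1)."""
    return sum((-1) ** (len(s) - 1) for s in simplices)


disk = closure([(0, 1, 2)])                 # a filled triangle
circle = closure([(0, 1), (1, 2), (0, 2)])  # its boundary
for K in (disk, circle):
    print(euler(K), euler(chains(K)))       # 1 1, then 0 0
```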


- Viewing the structure maps as a collection of digraphs allows us to perform path searches.
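Once the $(q,i,j)$-connections are stored as a digraph, standard graph algorithms apply. This sketch runs `nx.all_simple_paths` and `nx.dag_longest_path` on a hypothetical $(1,0,2)$-near digraph between 2-simplices; it illustrates path search only and is not the lifting's actual procedure.

```python
import networkx as nx

# Hypothetical (1, 0, 2)-near digraph between 2-simplices (ordered cliques):
# each arrow s -> t satisfies d_0(s) == d_2(t).
D = nx.DiGraph([
    ((0, 1, 2), (1, 2, 3)),
    ((1, 2, 3), (2, 3, 4)),
    ((1, 2, 3), (2, 3, 5)),
    ((2, 3, 4), (3, 4, 5)),
])

# All simplicial paths from one simplex to another.
paths = list(nx.all_simple_paths(D, (0, 1, 2), (3, 4, 5)))
print(paths)  # [[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]]

# Longest simplicial path; dag_longest_path requires D to have no directed cycles.
if nx.is_directed_acyclic_graph(D):
    print(nx.dag_longest_path(D))  # [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
```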


*Example:*


# TO DO

1. Dataset Loading

   Implements the pipeline to load a dataset from the src domain. Since the challenge repository doesn't allow storing large files, loaders must download datasets from external sources into the `datasets/` folder.

   This pipeline is provided for several graph-based datasets. For any other src domain, participants may transform graph datasets into the corresponding domain through the provided lifting mappings, or simply drop their connectivity to obtain point clouds.

   (Bonus) Designing a loader for a new dataset (one that is not already provided in the tutorials) will be positively taken into consideration in the final evaluation.

2. Pre-processing the Dataset

   Applies the lifting transform to the dataset. This must be done through the `PreProcessor`, which we provide in `modules/data/preprocess/preprocessor.py`.

3. Running a Model over the Lifted Dataset

   Creates a neural network model that operates over the dst domain, leveraging TopoModelX for higher-order topologies or torch_geometric for graphs, and runs the model on the lifted dataset.

```python
# csv, time, numpy, scipy.sparse and pyflagsercount are used by the
# (commented-out) helper functions further below
import csv
import time
import torch
import numpy as np
import networkx as nx
import scipy.sparse as sp
import pyflagsercount as pfc
import sys

sys.path.append("../../")
from modules.transforms.liftings.graph2combinatorial.sp_lifting import DirectedFlagComplex as dfc

# from datasets.data_loading import get_dataset, get_dataset_split
```

```python
# With this cell any imported module is reloaded before each cell execution
%load_ext autoreload
%autoreload 2
from modules.data.load.loaders import GraphLoader
from modules.data.preprocess.preprocessor import PreProcessor
from modules.utils.utils import (
    describe_data,
    load_dataset_config,
    load_model_config,
    load_transform_config,
)
```

```python
CHAMELEON = "chameleon"
CORNELL = "Cornell"
WISCONSIN = "Wisconsin"
TEXAS = "Texas"
ROMAN_EMPIRE = "directed-roman-empire"
SQUIRREL = "squirrel"
OGBN_ARXIV = "ogbn-arxiv"
SNAP_PATENTS = "snap-patents"
CORA_ML = "cora_ml"
CITESEER_FULL = "citeseer_full"
ARXIV_YEAR = "arxiv-year"
SYN_DIR = "syn-dir"
```

```python
dataset_name = "cocitation_cora"
dataset_config = load_dataset_config(dataset_name)
loader = GraphLoader(dataset_config)
```

```
Dataset configuration for cocitation_cora:

{'data_domain': 'graph',
 'data_type': 'cocitation',
 'data_name': 'Cora',
 'data_dir': 'datasets/graph/cocitation',
 'num_features': 1433,
 'num_classes': 7,
 'task': 'classification',
 'loss_type': 'cross_entropy',
 'monitor_metric': 'accuracy',
 'task_level': 'node'}
```


```python
dataset = loader.load()
describe_data(dataset)
```

```
Dataset only contains 1 sample:
 - Graph with 2708 vertices and 10556 edges.
 - Features dimensions: [1433, 0]
 - There are 0 isolated nodes.
```



```python
dataset.edge_index.shape
```

```
torch.Size([2, 10556])
```

```python
# Define transformation type and id
transform_type = "liftings"
# If the transform is a topological lifting, it should include both the type of the lifting and the identifier
transform_id = "graph2combinatorial/sp_lifting"

# Read yaml file
transform_config = {
    "lifting": load_transform_config(transform_type, transform_id),
    # other transforms (e.g. data manipulations, feature liftings) can be added here
}
```

```
Transform configuration for graph2combinatorial/sp_lifting:

{'transform_type': 'lifting',
 'transform_name': 'Graph2CombinatorialLifting',
 'd1': 2,
 'd2': 2,
 'q': 1,
 'i': 0,
 'j': 2,
 'complex_dim': 2,
 'offset': 'torch.tensor([[0], [0]])',
 'chunk_size': 1024,
 'save_path': 'None',
 'threshold': 1}
```


```python
lifted_dataset = PreProcessor(dataset, transform_config, loader.data_dir)
describe_data(lifted_dataset)
```

```python
# def create_csv_datasets(dataset_name, dataset_dir="../dataset/"):
#     dataset, evaluator = get_dataset(dataset_name, dataset_dir)
#     source = dataset.edge_index[0].tolist()  # source
#     target = dataset.edge_index[1].tolist()  # target

#     csv_file_name = "./dataset/vis/original/" + dataset_name + ".csv"

#     with open(csv_file_name, "w", newline="") as file:
#         writer = csv.writer(file)

#         # Write the list content as rows
#         for a, b in zip(source, target):
#             writer.writerow([a, b])

#     print(f'CSV file "{csv_file_name}" created successfully.')


# def create_csv_condensations(dataset_name):

#     dataset_digraph = create_digraph_from_dataset(dataset_name)
#     condensation_digraph = nx.condensation(dataset_digraph)

#     condensation_digraph_edges = list(condensation_digraph.edges)

#     if dataset_name == "cora_ml":
#         dataset_name = "cora-ml"

#     if dataset_name == "citeseer_full":
#         dataset_name = "citeseer-full"

#     csv_file_name = "./dataset/vis/condensations/" + dataset_name + "-condensation.csv"

#     with open(csv_file_name, "w", newline="") as file:
#         writer = csv.writer(file)

#         # Write the list content as rows
#         for e in condensation_digraph_edges:
#             writer.writerow([e[0], e[1]])

#     print(f'CSV file "{csv_file_name}" created successfully.')


# def create_csv_condensations_from_dataset():

#     dataset_list = [
#         CHAMELEON,
#         ROMAN_EMPIRE,
#         SQUIRREL,
#         OGBN_ARXIV,
#         CORA_ML,
#         CITESEER_FULL,
#         ARXIV_YEAR,
#     ]

#     for dataset in dataset_list:
#         create_csv_datasets(dataset)
#         create_csv_condensations(dataset)


# def flagser_count(dataset_name, complex_dim=2, num_threads=4):
#     dataset_digraph = create_digraph_from_dataset(dataset_name)

#     sparse_adjacency_matrix = nx.to_scipy_sparse_array(dataset_digraph, format="csr")

#     start_time = time.time()

#     X = pfc.flagser_count(
#         sparse_adjacency_matrix,
#         threads=num_threads,
#         return_simplices=True,
#         max_dim=complex_dim,
#     )
#     end_time = time.time()
#     print("Time elapsed: ", end_time - start_time)
#     return X


# def create_digraph_from_dataset(dataset_name, dataset_dir="../dataset/"):
#     dataset, evaluator = get_dataset(dataset_name, dataset_dir)
#     dataset_digraph = nx.DiGraph()
#     dataset_digraph.add_edges_from(
#         list(zip(dataset.edge_index[0].tolist(), dataset.edge_index[1].tolist()))
#     )
#     print("Number of nodes: ", dataset_digraph.number_of_nodes(), " Number of edges: ", dataset_digraph.number_of_edges())
#     return dataset_digraph


# def create_flag_complex_from_dataset(dataset_name, dataset_dir, complex_dim=2):
#     dataset_digraph = create_digraph_from_dataset(dataset_name, dataset_dir)
#     # dfc is the DirectedFlagComplex class imported at the top of the notebook
#     flag_complex = dfc(dataset_digraph, complex_dim)
#     return flag_complex


# def create_condensed_digraph_from_dataset(dataset_name):
#     dataset_digraph = create_digraph_from_dataset(dataset_name)
#     condensation_digraph = nx.condensation(dataset_digraph)
#     return condensation_digraph


# def dataset_stats(dataset_name, complex_dim=2):
#     G = create_digraph_from_dataset(dataset_name)
#     FlG = dfc(G, complex_dim)
#     for d in range(complex_dim + 1):
#         print(dataset_name + " number of " + str(d) + "-simplices", len(FlG.complex[d]))
#     return FlG


# def qij(
#     dataset_name, dataset_dir, q, i, j, complex_dim=2, chunk_size=1024, save_path=None
# ):
#     FlG = create_flag_complex_from_dataset(dataset_name, dataset_dir, complex_dim)
#     return FlG.qij(q, i, j, chunk_size, save_path)


# if __name__ == "__main__":
#     DATASET_DIR = "../../dataset/"
#     qij(
#         WISCONSIN,
#         DATASET_DIR,
#         1,
#         0,
#         2,
#         complex_dim=2,
#         chunk_size=100,
#         # save_path="../../dataset/cornell/102.pt",
#         save_path=None
#     )
#     pass
```