# Working with TopoTune

In this tutorial, we go over the basic workings of [TopoTune](https://arxiv.org/pdf/2410.06530), a framework for easily defining and training new, general topological deep learning (TDL) models on any domain. These models, called Generalized Combinatorial Complex Neural Networks (GCCNs), are built using any (graph) neural network, which we will denote ω.

In a GCCN (pictured below), the input complex (whether it be a hypergraph, cell complex, simplicial complex, or combinatorial complex) is represented as an ensemble of graphs (specifically, strictly augmented Hasse graphs), one per neighborhood of the complex. Each of these Hasse graphs is processed by a sub-model GNN (ω), and the outputs are aggregated rank-wise between layers.

![gccn](https://github.com/user-attachments/assets/97747900-8e5e-401c-9ad9-764e16e1698e)
**Generalized Combinatorial Complex Network (GCCN).** In this example, the input complex $\mathcal{C}$ has neighborhoods $\mathcal{N_C}$ = { $\mathcal{N_1}$, $\mathcal{N_2}$, $\mathcal{N_3}$ }. **A.** The complex is expanded into three augmented Hasse graphs $\mathcal{G_\mathcal{N_i}}$, $i \in \{1,2,3\}$, each with features $H_\mathcal{N_i}$ represented as a colored disc. **B.** A GCCN layer dedicates one base architecture $\omega_\mathcal{N_i}$ to each neighborhood. **C.** The outputs of all the architectures $\omega_\mathcal{N_i}$ are aggregated rank-wise, then updated. In this example, only the complex's edge features (originally pink) are aggregated across multiple neighborhoods ($\mathcal{N_2}$ and $\mathcal{N_3}$).
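
To make the rank-wise aggregation of panel C concrete, here is a minimal pure-Python sketch. It is not TopoBench code: the function `aggregate_by_rank`, the toy feature values, and the choice of summation as the aggregator are all assumptions made for illustration. Each sub-model output is tagged with the rank of the cells it updates; outputs that land on the same rank are combined, while a rank reached by a single neighborhood passes through unchanged.

```python
from collections import defaultdict


def aggregate_by_rank(outputs):
    """Sum per-neighborhood outputs that target the same rank.

    `outputs` is a list of (target_rank, features) pairs, where `features`
    is a list of per-cell feature vectors (lists of floats). All outputs
    for a given rank are assumed to have the same shape.
    """
    grouped = defaultdict(list)
    for rank, feats in outputs:
        grouped[rank].append(feats)

    aggregated = {}
    for rank, feats_list in grouped.items():
        # Element-wise sum across neighborhoods, cell by cell.
        aggregated[rank] = [
            [sum(vals) for vals in zip(*cell_vectors)]
            for cell_vectors in zip(*feats_list)
        ]
    return aggregated


# Hypothetical outputs of three sub-models: one updates nodes (rank 0),
# two update edges (rank 1), as in the figure.
outputs = [
    (0, [[1.0, 2.0], [3.0, 4.0]]),  # N_1 -> features of 2 nodes
    (1, [[1.0, 1.0]]),              # N_2 -> features of 1 edge
    (1, [[0.5, 2.0]]),              # N_3 -> features of the same edge
]
result = aggregate_by_rank(outputs)
print(result[0])  # [[1.0, 2.0], [3.0, 4.0]]
print(result[1])  # [[1.5, 3.0]]
```

Only rank 1 receives contributions from more than one neighborhood, so only the edge features change under aggregation.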

## Table of contents<a class='anchor' id='top'></a>
We will go over **three example cases** of using TopoTune for training and defining GCCNs.

&emsp;[- Imports](#sec1)

&emsp;[- Use Case A:](#sec2) A GCCN using a GNN available by import (GAT imported from PyG) and a dataset either available in TopoBench or available in PyG-like format.

&emsp;[- Use Case B:](#sec3) A GCCN using a custom neural network.

&emsp;[- Use Case C:](#sec4) Running large-scale experiments over many variants of Use Case A, as done in the [TopoTune](https://arxiv.org/pdf/2410.06530) paper.

In all of these cases, you are encouraged to try different options and exploit the flexibility of TopoTune. This could mean trying different combinations of neighborhoods, different sub-models, different architecture choices, different training schemes, or different datasets. The purpose of this notebook is to allow for such exploration without requiring deep knowledge of TopoBench.


## Imports <a class="anchor" id="sec1"></a>

In [19]:
import torch
import lightning as pl
# Hydra related imports
from omegaconf import OmegaConf
# Data related imports
from topobench.data.loaders.graph import TUDatasetLoader
from topobench.dataloader.dataloader import TBDataloader
from topobench.data.preprocessor import PreProcessor
# Model related imports
from topobench.model.model import TBModel
from topomodelx.nn.simplicial.scn2 import SCN2
from topobench.nn.wrappers.simplicial import SCNWrapper
from topobench.nn.encoders import AllCellFeatureEncoder
from topobench.nn.readouts import PropagateSignalDown
from topobench.nn.backbones.combinatorial.gccn import TopoTune
from topobench.nn.wrappers.combinatorial import TuneWrapper
from torch_geometric.nn import GAT
# Optimization related imports
from topobench.loss.loss import TBLoss
from topobench.optimizer import TBOptimizer
from topobench.evaluator.evaluator import TBEvaluator

## **Use Case A:** GCCN with imported GNN and dataset available in TopoBench <a class="anchor" id="sec2"></a>

In this example, we will define and train a GCCN using a GNN that is readily available in an imported package like PyTorch Geometric or Deep Graph Library. We will train and test the model with one of the many datasets available in TopoBench.

*Step 1 :* Define the choice of neighborhoods to be considered.

To specify a set of neighborhoods on the complex, use a list of strings, each of the form
`r-{neighborhood}-k`, where $k$ is the source cell rank and $r$ is the number of ranks up or down that the selected `{neighborhood}` spans. Currently, the following options for `{neighborhood}` are supported:
- `up_laplacian`, between cells of rank $k$ through $k+r$ cells.
- `down_laplacian`, between cells of rank $k$ through $k-r$ cells.
- `hodge_laplacian`, between cells of rank $k$ through both $k-r$ and $k+r$ cells.
- `up_adjacency`, between cells of rank $k$ through $k+r$ cells.
- `down_adjacency`, between cells of rank $k$ through $k-r$ cells.
- `up_incidence`, from rank $k$ to $k+r$.
- `down_incidence`, from rank $k$ to $k-r$.

The number $r$ can be omitted, in which case $r=1$ by default (e.g. `up_incidence-k` represents the incidence from rank $k$ to $k+1$).
Here are some examples of neighborhoods in this string notation:

- node to node (up-Laplacian), through edges : `up_laplacian-0`
- node to node, through faces (up-Laplacian): `2-up_laplacian-0`
- edge to node (boundary, also called incidence): `down_incidence-1`
- face to edge (boundary): `down_incidence-2`
- face to node (boundary): `2-down_incidence-2`
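
The string notation above can be unpacked mechanically. The following is a small illustrative parser, not the one TopoBench uses internally: the names `parse_neighborhood` and `target_rank` are hypothetical. It applies the rules stated above, namely that $r$ defaults to 1 when omitted, that incidences map rank $k$ to $k \pm r$, and that Laplacians and adjacencies relate cells of the same rank $k$.

```python
import re

NEIGHBORHOOD_TYPES = {
    "up_laplacian", "down_laplacian", "hodge_laplacian",
    "up_adjacency", "down_adjacency", "up_incidence", "down_incidence",
}


def parse_neighborhood(spec):
    """Split 'r-{neighborhood}-k' into (r, neighborhood, k); r defaults to 1."""
    match = re.fullmatch(r"(?:(\d+)-)?([a-z_]+)-(\d+)", spec)
    if match is None or match.group(2) not in NEIGHBORHOOD_TYPES:
        raise ValueError(f"Unrecognized neighborhood string: {spec!r}")
    r = int(match.group(1)) if match.group(1) else 1
    return r, match.group(2), int(match.group(3))


def target_rank(spec):
    """Rank of the cells that receive messages under this neighborhood."""
    r, name, k = parse_neighborhood(spec)
    if name == "up_incidence":
        return k + r
    if name == "down_incidence":
        return k - r
    return k  # Laplacians and adjacencies relate cells of the same rank k


for spec in ["up_laplacian-0", "2-up_laplacian-0", "down_incidence-1", "2-down_incidence-2"]:
    print(spec, parse_neighborhood(spec), "-> rank", target_rank(spec))
```

For instance, `2-down_incidence-2` parses to `(2, "down_incidence", 2)` with target rank 0, which is the "face to node" case listed above.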

In [68]:
neighborhoods = ["1-up_laplacian-0", "1-down_incidence-1", "1-down_incidence-2"]


Now we define the model channels, choice of dataset, choice of lifting (i.e. the choice of topological domain), dataset split, and training scheme (readout, loss, evaluator, optimizer). Remark: when we run TopoBench from the command line, we rely on the YAML files stored in `configs` to specify these choices (see Use Case C for command line examples).

In [None]:

in_channels = 7
out_channels = 2
dim_hidden = 16

In [20]:
loader_config = {
    "data_domain": "graph",
    "data_type": "TUDataset",
    "data_name": "MUTAG",
    "data_dir": "./data/MUTAG/",
    }

transform_config = { "cycle_lifting":
    {"transform_type": "lifting",
    "transform_name": "CellCycleLifting",
    "neighborhoods": neighborhoods,

    }
}

split_config = {
    "learning_setting": "inductive",
    "split_type": "random",
    "data_seed": 0,
    "data_split_dir": "./data/MUTAG/splits/",
    "train_prop": 0.5,
}

readout_config = {
    "readout_name": "PropagateSignalDown",
    "num_cell_dimensions": 3,
    "hidden_dim": dim_hidden,
    "out_channels": out_channels,
    "task_level": "graph",
    "pooling_type": "sum",
}

loss_config = {
    "dataset_loss": 
        {
            "task": "classification", 
            "loss_type": "cross_entropy"
        }
}

evaluator_config = {"task": "classification",
                    "num_classes": out_channels,
                    "metrics": ["accuracy", "precision", "recall"]}

optimizer_config = {"optimizer_id": "Adam",
                    "parameters":
                        {"lr": 0.001,"weight_decay": 0.0005}
                    }

loader_config = OmegaConf.create(loader_config)
transform_config = OmegaConf.create(transform_config)
split_config = OmegaConf.create(split_config)
readout_config = OmegaConf.create(readout_config)
loss_config = OmegaConf.create(loss_config)
evaluator_config = OmegaConf.create(evaluator_config)
optimizer_config = OmegaConf.create(optimizer_config)

*Step 2 :* Load the data. In this example we use the MUTAG dataset on the cell domain. In order to transform the dataset from the graph domain to the cell domain, we use the cycle lifting. The README of the [repository](https://github.com/geometric-intelligence/TopoBench?tab=readme-ov-file#rocket-liftings--transforms) has more information on the various liftings offered.

Remark: if a user wanted to run a custom graph dataset not offered in TopoBench, it would be sufficient to check that it is formatted like a PyTorch Geometric graph dataset. It could then be passed to the `PreProcessor` class for lifting.

Remark: the dataset needs to be re-loaded whenever the `neighborhoods` list is modified.


In [60]:
graph_loader = TUDatasetLoader(loader_config)

dataset, dataset_dir = graph_loader.load()

preprocessor = PreProcessor(dataset, dataset_dir, transform_config)
dataset_train, dataset_val, dataset_test = preprocessor.load_dataset_splits(split_config)
datamodule = TBDataloader(dataset_train, dataset_val, dataset_test, batch_size=32)

Transform parameters are the same, using existing data_dir: data/MUTAG/MUTAG/cycle_lifting/1611498484


*Step 4 :* Define the model. This is where we select our model to be a GCCN, and specify which GNN is used to build the GCCN. As with the choice of dataset, since the GNN is readily available (in this case, from PyTorch Geometric), all we need is to specify the config.

In [22]:
sub_gccn_model = GAT(in_channels=dim_hidden, hidden_channels=dim_hidden, num_layers=1, out_channels=dim_hidden, heads=2, v2=False)

backbone_config = {
    "GNN": sub_gccn_model,
    "neighborhoods": neighborhoods,
    "layers": 2,
    "use_edge_attr": False,
    "activation": "relu"
}

backbone = TopoTune(**backbone_config)

Now that the model is defined we can create the TBModel, which takes care of implementing everything else that is needed to train the model. We will define a feature encoder and readout to book-end the GCCN (the `backbone` of the model) as well as instantiate a `loss`, `evaluator`, and `optimizer`.

In [61]:

feature_encoder = AllCellFeatureEncoder(in_channels=[in_channels, in_channels, in_channels], out_channels=dim_hidden)
readout = PropagateSignalDown(**readout_config)

loss = TBLoss(**loss_config)
evaluator = TBEvaluator(**evaluator_config)
optimizer = TBOptimizer(**optimizer_config)

Now we can instantiate the TBModel.

In [62]:
wrapper = TuneWrapper(backbone=backbone, out_channels=out_channels, num_cell_dimensions=3, residual_connections=False) # task_level="graph", pooling_type="sum")
model = TBModel(backbone=wrapper,
                 backbone_wrapper=None,
                 readout=readout,
                 loss=loss,
                 feature_encoder=feature_encoder,
                 evaluator=evaluator,
                 optimizer=optimizer,
                 compile=False)

*Step 5 :* Define the training scheme. In this case, we use the default training scheme and train the model with the `lightning` trainer.

In [63]:
trainer = pl.Trainer(max_epochs=50, accelerator="cpu", enable_progress_bar=False, log_every_n_steps=1)
trainer.fit(model, datamodule)
train_metrics = trainer.callback_metrics

GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores


HPU available: False, using: 0 HPUs
/home/papillon/anaconda3/envs/tb/lib/python3.11/site-packages/lightning/pytorch/trainer/setup.py:177: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.

  | Name            | Type                  | Params | Mode 
------------------------------------------------------------------
0 | feature_encoder | AllCellFeatureEncoder | 1.3 K  | train
1 | backbone        | TuneWrapper           | 3.6 K  | train
2 | readout         | PropagateSignalDown   | 1.7 K  | train
3 | val_acc_best    | MeanMetric            | 0      | train
------------------------------------------------------------------
6.6 K     Trainable params
0         Non-trainable params
6.6 K     Total params
0.026     Total estimated model params size (MB)
32        Modules in train mode
64        Modules in eval mode
/home/papillon/anaconda3/envs/tb/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does

PARAMS
DomainDataBatch(x_2=[83, 16], x_0=[552, 16], 1-down_incidence-1=[552, 603], x_1=[603, 16], incidence_2=[603, 83], incidence_0=[0, 552], val_mask=[32], edge_index=[2, 1206], x=[552, 7], incidence_1=[552, 603], 1-down_incidence-2=[603, 83], train_mask=[32], test_mask=[32], edge_attr=[1206, 4], y=[32], 1-up_laplacian-0=[552, 552], ptr=[33], batch_2=[83], batch_0=[552], batch_1=[603], cell_statistics=[32, 3], model_state='Validation')
PARAMS
DomainDataBatch(x_2=[40, 16], x_0=[266, 16], 1-down_incidence-1=[266, 291], x_1=[291, 16], incidence_2=[291, 40], incidence_0=[0, 266], val_mask=[15], edge_index=[2, 582], x=[266, 7], incidence_1=[266, 291], 1-down_incidence-2=[291, 40], train_mask=[15], test_mask=[15], edge_attr=[582, 4], y=[15], 1-up_laplacian-0=[266, 266], ptr=[16], batch_2=[40], batch_0=[266], batch_1=[291], cell_statistics=[15, 3], model_state='Validation')
PARAMS
DomainDataBatch(x_2=[94, 16], x_0=[581, 16], 1-down_incidence-1=[581, 643], x_1=[643, 16], incidence_2=[643, 94

`Trainer.fit` stopped: `max_epochs=50` reached.




In [64]:
print('      Training metrics\n', '-'*26)
for key in train_metrics:
    print('{:<21s} {:>5.4f}'.format(key+':', train_metrics[key].item()))

      Training metrics
 --------------------------
train/accuracy:       0.8723
train/precision:      0.8518
train/recall:         0.8720
val/loss:             0.4800
val/accuracy:         0.7021
val/precision:        0.6647
val/recall:           0.6750
train/loss:           0.3171


In [65]:
trainer.test(model, datamodule)
test_metrics = trainer.callback_metrics

PARAMS
DomainDataBatch(x_2=[88, 16], x_0=[556, 16], 1-down_incidence-1=[556, 612], x_1=[612, 16], incidence_2=[612, 88], incidence_0=[0, 556], val_mask=[32], edge_index=[2, 1224], x=[556, 7], incidence_1=[556, 612], 1-down_incidence-2=[612, 88], train_mask=[32], test_mask=[32], edge_attr=[1224, 4], y=[32], 1-up_laplacian-0=[556, 556], ptr=[33], batch_2=[88], batch_0=[556], batch_1=[612], cell_statistics=[32, 3], model_state='Test')
PARAMS
DomainDataBatch(x_2=[51, 16], x_0=[297, 16], 1-down_incidence-1=[297, 333], x_1=[333, 16], incidence_2=[333, 51], incidence_0=[0, 297], val_mask=[15], edge_index=[2, 666], x=[297, 7], incidence_1=[297, 333], 1-down_incidence-2=[333, 51], train_mask=[15], test_mask=[15], edge_attr=[666, 4], y=[15], 1-up_laplacian-0=[297, 297], ptr=[16], batch_2=[51], batch_0=[297], batch_1=[333], cell_statistics=[15, 3], model_state='Test')



/home/papillon/anaconda3/envs/tb/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=127` in the `DataLoader` to improve performance.


In [66]:
print('      Testing metrics\n', '-'*25)
for key in test_metrics:
    print('{:<20s} {:>5.4f}'.format(key+':', test_metrics[key].item()))

      Testing metrics
 -------------------------
test/loss:           0.3787
test/accuracy:       0.7660
test/precision:      0.7471
test/recall:         0.7529


## **Use Case B:** GCCN with custom GNN and dataset available in TopoBench <a class="anchor" id="sec3"></a>

In this use case, we repeat the same process as in [Use Case A](#sec2), except that the sub-model we use to build the GCCN is a custom neural network, which may or may not be a GNN. For our purposes, we will define a toy model below as an example.

In [47]:
class myModel(pl.LightningModule):
    def __init__(self, hidden_channels, out_channels):
        super().__init__()
        self.hidden_channels = hidden_channels
        self.out_channels = out_channels
        self.linear_0 = torch.nn.Linear(hidden_channels, out_channels)
        self.linear_1 = torch.nn.Linear(hidden_channels, out_channels)
        self.linear_2 = torch.nn.Linear(hidden_channels, out_channels)

    def forward(self, batch):
        x_0 = batch.x_0
        x_1 = batch.x_1
        x_2 = batch.x_2
        x_0 = self.linear_0(x_0)
        x_0 = torch.relu(x_0)
        x_1 = self.linear_1(x_1)
        x_1 = torch.relu(x_1)
        x_2 = self.linear_2(x_2)
        x_2 = torch.relu(x_2)
        
        model_out = {"labels": batch.y, "batch_0": batch.batch_0}
        model_out["x_0"] = x_0
        model_out["x_1"] = x_1
        model_out["x_2"] = x_2
        return model_out

Now, we can build a GCCN with this custom model. Note that we increase the number of GCCN layers here (from 2 to 4), and with it the number of sub-models.

In [69]:
custom_sub_gccn_model = myModel(dim_hidden, out_channels)

backbone_config = {
    "GNN": sub_gccn_model,
    "neighborhoods": neighborhoods,
    "layers": 4,
    "use_edge_attr": False,
    "activation": "relu"
}

backbone = TopoTune(**backbone_config)
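
Since a GCCN layer dedicates one sub-model to each neighborhood, the number of sub-model copies grows as layers × neighborhoods. The following back-of-the-envelope sketch compares the two configurations used in this notebook; the helper `num_sub_models` is hypothetical and not part of TopoBench.

```python
def num_sub_models(layers, neighborhoods):
    """One sub-model per neighborhood in every GCCN layer."""
    return layers * len(neighborhoods)


neighborhoods = ["1-up_laplacian-0", "1-down_incidence-1", "1-down_incidence-2"]

use_case_a = num_sub_models(layers=2, neighborhoods=neighborhoods)
use_case_b = num_sub_models(layers=4, neighborhoods=neighborhoods)
print(use_case_a, use_case_b)  # 6 12
```

Doubling the number of layers doubles the number of sub-models, which is consistent with the backbone parameter counts reported in the trainer's model summaries for the two use cases (3.6 K with 2 layers, 7.1 K with 4 layers).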

Now we can train this custom model as before.

In [70]:
readout = PropagateSignalDown(**readout_config)
loss = TBLoss(**loss_config)
feature_encoder = AllCellFeatureEncoder(in_channels=[in_channels, in_channels, in_channels], out_channels=dim_hidden)

evaluator = TBEvaluator(**evaluator_config)
optimizer = TBOptimizer(**optimizer_config)

wrapper = TuneWrapper(backbone=backbone, out_channels=out_channels, num_cell_dimensions=3, residual_connections=False)
model = TBModel(backbone=wrapper,
                 backbone_wrapper=None,
                 readout=readout,
                 loss=loss,
                 feature_encoder=feature_encoder,
                 evaluator=evaluator,
                 optimizer=optimizer,
                 compile=False)

In [71]:
# Increase the number of epochs to get better results
trainer = pl.Trainer(max_epochs=50, accelerator="cpu", enable_progress_bar=False, log_every_n_steps=1)

trainer.fit(model, datamodule)
train_metrics = trainer.callback_metrics

GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


/home/papillon/anaconda3/envs/tb/lib/python3.11/site-packages/lightning/pytorch/trainer/setup.py:177: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.

  | Name            | Type                  | Params | Mode 
------------------------------------------------------------------
0 | feature_encoder | AllCellFeatureEncoder | 1.3 K  | train
1 | backbone        | TuneWrapper           | 7.1 K  | train
2 | readout         | PropagateSignalDown   | 1.7 K  | train
3 | val_acc_best    | MeanMetric            | 0      | train
------------------------------------------------------------------
10.1 K    Trainable params
0         Non-trainable params
10.1 K    Total params
0.041     Total estimated model params size (MB)
158       Modules in train mode
0         Modules in eval mode
/home/papillon/anaconda3/envs/tb/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be 

PARAMS
DomainDataBatch(x_2=[83, 16], x_0=[552, 16], 1-down_incidence-1=[552, 603], x_1=[603, 16], incidence_2=[603, 83], incidence_0=[0, 552], val_mask=[32], edge_index=[2, 1206], x=[552, 7], incidence_1=[552, 603], 1-down_incidence-2=[603, 83], train_mask=[32], test_mask=[32], edge_attr=[1206, 4], y=[32], 1-up_laplacian-0=[552, 552], ptr=[33], batch_2=[83], batch_0=[552], batch_1=[603], cell_statistics=[32, 3], model_state='Validation')
PARAMS
DomainDataBatch(x_2=[40, 16], x_0=[266, 16], 1-down_incidence-1=[266, 291], x_1=[291, 16], incidence_2=[291, 40], incidence_0=[0, 266], val_mask=[15], edge_index=[2, 582], x=[266, 7], incidence_1=[266, 291], 1-down_incidence-2=[291, 40], train_mask=[15], test_mask=[15], edge_attr=[582, 4], y=[15], 1-up_laplacian-0=[266, 266], ptr=[16], batch_2=[40], batch_0=[266], batch_1=[291], cell_statistics=[15, 3], model_state='Validation')
PARAMS
DomainDataBatch(x_2=[98, 16], x_0=[602, 16], 1-down_incidence-1=[602, 668], x_1=[668, 16], incidence_2=[668, 98

`Trainer.fit` stopped: `max_epochs=50` reached.


PARAMS
DomainDataBatch(x_2=[96, 16], x_0=[584, 16], 1-down_incidence-1=[584, 648], x_1=[648, 16], incidence_2=[648, 96], incidence_0=[0, 584], val_mask=[32], edge_index=[2, 1296], x=[584, 7], incidence_1=[584, 648], 1-down_incidence-2=[648, 96], train_mask=[32], test_mask=[32], edge_attr=[1296, 4], y=[32], 1-up_laplacian-0=[584, 584], ptr=[33], batch_2=[96], batch_0=[584], batch_1=[648], cell_statistics=[32, 3], model_state='Training')
PARAMS
DomainDataBatch(x_2=[97, 16], x_0=[570, 16], 1-down_incidence-1=[570, 635], x_1=[635, 16], incidence_2=[635, 97], incidence_0=[0, 570], val_mask=[32], edge_index=[2, 1270], x=[570, 7], incidence_1=[570, 635], 1-down_incidence-2=[635, 97], train_mask=[32], test_mask=[32], edge_attr=[1270, 4], y=[32], 1-up_laplacian-0=[570, 570], ptr=[33], batch_2=[97], batch_0=[570], batch_1=[635], cell_statistics=[32, 3], model_state='Training')
PARAMS
DomainDataBatch(x_2=[83, 16], x_0=[546, 16], 1-down_incidence-1=[546, 599], x_1=[599, 16], incidence_2=[599, 83],

In [72]:
print('      Training metrics\n', '-'*26)
for key in train_metrics:
    print('{:<21s} {:>5.4f}'.format(key+':', train_metrics[key].item()))

      Training metrics
 --------------------------
train/accuracy:       0.8617
train/precision:      0.8407
train/recall:         0.8559
val/loss:             0.4541
val/accuracy:         0.7447
val/precision:        0.7180
val/recall:           0.7417
train/loss:           0.2961


In [73]:
trainer.test(model, datamodule)
test_metrics = trainer.callback_metrics

PARAMS
DomainDataBatch(x_2=[88, 16], x_0=[556, 16], 1-down_incidence-1=[556, 612], x_1=[612, 16], incidence_2=[612, 88], incidence_0=[0, 556], val_mask=[32], edge_index=[2, 1224], x=[556, 7], incidence_1=[556, 612], 1-down_incidence-2=[612, 88], train_mask=[32], test_mask=[32], edge_attr=[1224, 4], y=[32], 1-up_laplacian-0=[556, 556], ptr=[33], batch_2=[88], batch_0=[556], batch_1=[612], cell_statistics=[32, 3], model_state='Test')
PARAMS
DomainDataBatch(x_2=[51, 16], x_0=[297, 16], 1-down_incidence-1=[297, 333], x_1=[333, 16], incidence_2=[333, 51], incidence_0=[0, 297], val_mask=[15], edge_index=[2, 666], x=[297, 7], incidence_1=[297, 333], 1-down_incidence-2=[333, 51], train_mask=[15], test_mask=[15], edge_attr=[666, 4], y=[15], 1-up_laplacian-0=[297, 297], ptr=[16], batch_2=[51], batch_0=[297], batch_1=[333], cell_statistics=[15, 3], model_state='Test')



/home/papillon/anaconda3/envs/tb/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=127` in the `DataLoader` to improve performance.


In [74]:
print('      Testing metrics\n', '-'*25)
for key in test_metrics:
    print('{:<20s} {:>5.4f}'.format(key+':', test_metrics[key].item()))

      Testing metrics
 -------------------------
test/loss:           0.3804
test/accuracy:       0.7660
test/precision:      0.7471
test/recall:         0.7529


## **Use Case C:** Running large scale GCCN sweeps with TopoBench <a class="anchor" id="sec4"></a>

In this case, we consider command-line operations that allow for rapidly defining and testing many possible GCCN architectures. To implement and train a GCCN from the command line, run the following with the desired choice of dataset, lifting domain (e.g. `cell`, `simplicial`), PyTorch Geometric backbone model (e.g. `GCN`, `GIN`, `GAT`, `GraphSAGE`) and its parameters (e.g. `model.backbone.GNN.num_layers=2`), neighborhood structure (routes), and other hyperparameters. To use a single augmented Hasse graph expansion, use `model={domain}/topotune_onehasse` instead of `model={domain}/topotune`.

In [76]:
! python -m topobench \
    dataset=graph/PROTEINS \
    dataset.split_params.data_seed=1 \
    model=cell/topotune \
    model.tune_gnn=GCN \
    model.backbone.GNN.num_layers=2 \
    model.backbone.neighborhoods=\[1-up_laplacian-0,1-down_incidence-2\] \
    model.backbone.layers=4 \
    model.feature_encoder.out_channels=32 \
    model.feature_encoder.proj_dropout=0.3 \
    model.readout.readout_name=PropagateSignalDown \
    logger.wandb.project=TopoTune_cell \
    trainer.max_epochs=1000 \
    callbacks.early_stopping.patience=50

[2025-05-07 16:29:21,984][topobench.utils.utils][INFO] - [rank: 0] Enforcing tags! <cfg.extras.enforce_tags=True>
[2025-05-07 16:29:21,987][topobench.utils.utils][INFO] - [rank: 0] Printing config tree with Rich! <cfg.extras.print_config=True>
CONFIG
├── model
│   └── _target_: topobench.model.TBModel
│       model_name: topotune
│       model_domain: cell
│       tune_gnn: GCN
│       feature_encoder

To extend this process to many GCCNs, it is sufficient to pass a list of options as an argument, as well as the `--multirun` flag. This is a shortcut for running every possible combination of the specified parameters in a single command.

In [77]:
! python -m topobench \
    dataset=graph/cocitation_cora \
    model=cell/topotune,cell/topotune_onehasse \
    model.feature_encoder.out_channels=32 \
    model.tune_gnn=GCN,GIN,GAT,GraphSAGE \
    model.backbone.GNN.num_layers=1,2 \
    model.backbone.neighborhoods=\[1-up_laplacian-0,1-down_laplacian-1,1-up_laplacian-1\],\[1-up_laplacian-0,1-down_incidence-1,1-up_laplacian-1,1-down_incidence-2\] \
    model.backbone.layers=2,4 \
    model.feature_encoder.proj_dropout=0.3 \
    dataset.split_params.data_seed=1,3,5,7,9 \
    model.readout.readout_name=PropagateSignalDown \
    logger.wandb.project=TopoTune_cell \
    trainer.max_epochs=1000 \
    trainer.min_epochs=50 \
    trainer.devices=\[1\] \
    trainer.check_val_every_n_epoch=1 \
    callbacks.early_stopping.patience=50 \
    tags="[FirstExperiments]" \
    --multirun

[2025-05-07 16:43:29,436][HYDRA] Launching 320 jobs locally
[2025-05-07 16:43:29,436][HYDRA] 	#0 : dataset=graph/cocitation_cora model=cell/topotune model.feature_encoder.out_channels=32 model.tune_gnn=GCN model.backbone.GNN.num_layers=1 model.backbone.neighborhoods=[1-up_laplacian-0,1-down_laplacian-1,1-up_laplacian-1] model.backbone.layers=2 model.feature_encoder.proj_dropout=0.3 dataset.split_params.data_seed=1 model.readout.readout_name=PropagateSignalDown logger.wandb.project=TopoTune_cell trainer.max_epochs=1000 trainer.min_epochs=50 trainer.devices=[1] trainer.check_val_every_n_epoch=1 callbacks.early_stopping.patience=50 tags=[FirstExperiments]
[2025-05-07 16:43:29,600][topobench.utils.utils][INFO] - [rank: 0] Enforcing tags! <cfg.extras.enforce_tags=True>
[2025-05-07 16:43:29,603][topobench.utils.utils][INFO] - [rank: 0] Printing config tree with Rich! <cfg.extras.print_config
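
The number of launched jobs follows directly from the comma-separated options in the command: a multirun sweep takes the Cartesian product of every list-valued argument. The sketch below reproduces that count with `itertools.product`. It mirrors the product semantics only and is not Hydra's sweeper code; the dictionary `sweep` is a hand-copied subset of the command above, restricted to the arguments that take more than one value.

```python
from itertools import product

# Arguments of the multirun command that take more than one value.
sweep = {
    "model": ["cell/topotune", "cell/topotune_onehasse"],
    "model.tune_gnn": ["GCN", "GIN", "GAT", "GraphSAGE"],
    "model.backbone.GNN.num_layers": [1, 2],
    "model.backbone.neighborhoods": [
        "[1-up_laplacian-0,1-down_laplacian-1,1-up_laplacian-1]",
        "[1-up_laplacian-0,1-down_incidence-1,1-up_laplacian-1,1-down_incidence-2]",
    ],
    "model.backbone.layers": [2, 4],
    "dataset.split_params.data_seed": [1, 3, 5, 7, 9],
}

# One job per combination of values.
jobs = [dict(zip(sweep, values)) for values in product(*sweep.values())]
print(len(jobs))  # 320 = 2 * 4 * 2 * 2 * 2 * 5
print(jobs[0])
```

The total of 320 matches the `Launching 320 jobs locally` line in the log, and the first combination matches job `#0`. Adding one more option to any list multiplies the total, so it is worth computing the count before launching a sweep.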

### Using backbone models from any package
By default, backbone models are imported from `torch_geometric.nn.models`. To import and specify a backbone model from any other package, such as `torch.nn.Transformer` or `dgl.nn.GATConv`, it is sufficient to 1) make sure the package is installed and 2) specify in the command line:

```
model.tune_gnn={backbone_model}
model.backbone.GNN._target_={package}.{backbone_model}
```
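
For intuition, a `_target_` value is a dotted import path that gets resolved to a class, which is then instantiated with the remaining config entries as keyword arguments. The sketch below illustrates only that idea using `importlib`; the function `resolve_target` is hypothetical, and Hydra's own resolution logic is more elaborate. A standard-library class stands in for the backbone so that the example runs without extra packages.

```python
import importlib


def resolve_target(target):
    """Resolve a dotted path such as 'package.module.ClassName' to the object."""
    module_path, _, attr = target.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr)


# Standard-library stand-in; in the TopoTune setting the path would be
# something like "torch.nn.Transformer".
cls = resolve_target("collections.OrderedDict")
instance = cls(a=1)  # remaining config entries act as keyword arguments
print(cls.__name__, dict(instance))
```

This is why the package must be installed in the active environment: the import in `resolve_target` would otherwise fail before any model is built.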

### Reproducing experiments

We provide scripts to reproduce experiments on a broad class of GCCNs in [`scripts/topotune`](scripts/topotune) and reproduce iterations of existing neural networks in [`scripts/topotune/existing_models`](scripts/topotune/existing_models), as previously reported in the [TopoTune paper](https://arxiv.org/pdf/2410.06530).