Copyright (c) 2023 Graphcore Ltd. All rights reserved.

# Heterogeneous graph learning on IPUs

Many real-world graphs are heterogeneous, meaning they contain more than one type of node and more than one type of edge.

In this tutorial you will learn how to:

- Use the approaches PyTorch Geometric provides for heterogeneous graph learning and run them on the IPU.
- Understand how to sample heterogeneous graphs with a fixed size suitable for the IPU.

This tutorial covers enough of the basics of GNNs, PyTorch Geometric and PopTorch
for you to start developing and porting your GNN applications to the IPU.
The following resources can be used to complement your understanding of:

- PopTorch: [Introduction to PopTorch - running a simple model](https://github.com/graphcore/tutorials/tree/master/tutorials/pytorch/basics)
- GNNs: [A Gentle Introduction to Graph Neural Networks](https://distill.pub/2021/gnn-intro/)
- PyTorch Geometric (PyG): [Heterogeneous Graph Documentation](https://pytorch-geometric.readthedocs.io/en/latest/tutorial/heterogeneous.html)

## Running on Paperspace

The Paperspace environment lets you run this notebook with no setup. To improve your experience, we preload datasets and pre-install packages, which can take a few minutes. If you experience errors immediately after starting a session, please try restarting the kernel before contacting support. If a problem persists or you want to give us feedback on the content of this notebook, please reach out through our community of developers using our [Slack channel](https://www.graphcore.ai/join-community) or raise a [GitHub issue](https://github.com/graphcore/examples).

Requirements:

* Python packages installed with `pip install -r ./requirements.txt`


In order to improve usability and support for future users, Graphcore would like to collect information about the
applications and code being run in this notebook. The following information will be anonymised before being sent to Graphcore:

- User progression through the notebook
- Notebook details: number of cells, code being run and the output of the cells
- Environment details

You can disable logging at any time by running `%unload_ext gc_logger` from any cell.

In [66]:
%pip install -q -r ./requirements.txt
from examples_utils import notebook_logging
%load_ext gc_logger

For compatibility with the Paperspace environment, we read the executable cache and dataset directories from environment variables:

In [3]:
import os

executable_cache_dir = (
    os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "/tmp/exe_cache/") + "/pyg-packing"
)
dataset_directory = os.getenv("DATASETS_DIR", "data")

Now we are ready to start!

## Introduction to heterogeneous graphs

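Heterogeneous graphs are graphs with different types of nodes and edges. For example, a movie database can be modelled with movie, director and actor nodes, connected by edges that record who directed or acted in each movie.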

### Loading a heterogeneous graph dataset

In this tutorial we will use the [IMDB](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.datasets.IMDB.html) dataset from PyTorch Geometric:

In [93]:
from torch_geometric.datasets import IMDB

dataset = IMDB(root=f"{dataset_directory}/IMDB")

This dataset is a single large heterogeneous graph. Let's take a look at it:

In [94]:
data = dataset[0]
data

HeteroData(
  movie={
    x=[4278, 3066],
    y=[4278],
    train_mask=[4278],
    val_mask=[4278],
    test_mask=[4278]
  },
  director={ x=[2081, 3066] },
  actor={ x=[5257, 3066] },
  (movie, to, director)={ edge_index=[2, 4278] },
  (movie, to, actor)={ edge_index=[2, 12828] },
  (director, to, movie)={ edge_index=[2, 4278] },
  (actor, to, movie)={ edge_index=[2, 12828] }
)

Here you can see the heterogeneous graph is made up of three node types: **movie**, **director** and **actor**, each with their own set of features (`x`). These nodes are connected by two edge types, **movie to director** and **movie to actor**, with the reverse of those edges also present.
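To make "the reverse of those edges" concrete, here is a minimal sketch that uses plain Python lists in place of tensors. The toy edge lists and the helper `add_reverse_edges` are hypothetical and only for illustration; the IMDB dataset already stores both directions, so nothing like this is needed in the tutorial itself.

```python
# Toy heterogeneous graph: each key is (source type, relation, destination type)
# and each value holds a row of source ids and a row of destination ids,
# mirroring the layout of an `edge_index`.
edge_index_dict = {
    ("movie", "to", "director"): [[0, 1, 2], [0, 0, 1]],
    ("movie", "to", "actor"): [[0, 0, 1, 2], [0, 1, 1, 2]],
}


def add_reverse_edges(edge_index_dict):
    """Return a copy with a flipped edge type added for every edge type."""
    result = dict(edge_index_dict)
    for (src_type, rel, dst_type), (src, dst) in edge_index_dict.items():
        # Reversing an edge type swaps the node types and the two index rows.
        result[(dst_type, rel, src_type)] = [list(dst), list(src)]
    return result


full_edge_index_dict = add_reverse_edges(edge_index_dict)
for edge_type, (src, dst) in full_edge_index_dict.items():
    print(edge_type, "num_edges =", len(src))
```

Each reversed edge type has the same number of edges as its forward counterpart, which matches the equal edge counts shown for the IMDB graph above.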

The **movie** node type is the target for any training we will do. Let's take a look at the labels for this node type:

In [95]:
import torch

classes = torch.unique(data["movie"].y)
num_classes = len(classes)
classes, num_classes

(tensor([0, 1, 2]), 3)

You can see that each label is one of three classes. These correspond to the genre of the movie: action, comedy or drama.

Now that we have some understanding of what a heterogeneous graph looks like in PyTorch Geometric, let's look at how to construct a model that can learn from a heterogeneous graph.

## Creating Heterogeneous GNNs

PyTorch Geometric provides three ways to create a model for heterogeneous graph data. We will take a look at each in turn and see what modifications are needed to run these models on the IPU.

### Converting a GNN model

The first approach we will look at is converting a PyTorch Geometric GNN model to a model for heterogeneous graphs using the `torch_geometric.nn.to_hetero()` transformation.

We will only cover the basics here to enable running on the IPU, but if you are interested in learning more about this approach see the [PyTorch Geometric Documentation](https://pytorch-geometric.readthedocs.io/en/latest/tutorial/heterogeneous.html#automatically-converting-gnn-models).

To begin with, let's create a PyTorch Geometric GNN model comprising a couple of convolution layers:

In [97]:
import torch
from torch_geometric.nn import SAGEConv


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = SAGEConv((-1, -1), 64)
        self.conv2 = SAGEConv((-1, -1), 64)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x

Now we can use the `to_hetero()` transformation to transform this GNN model into a heterogeneous model:

In [98]:
from torch_geometric.nn import to_hetero

# Initialize the model
model = Model()
# Convert the model to a heterogeneous model
model = to_hetero(model, data.metadata(), aggr='sum')
model

GraphModule(
  (conv1): ModuleDict(
    (movie__to__director): SAGEConv((-1, -1), 64, aggr=mean)
    (movie__to__actor): SAGEConv((-1, -1), 64, aggr=mean)
    (director__to__movie): SAGEConv((-1, -1), 64, aggr=mean)
    (actor__to__movie): SAGEConv((-1, -1), 64, aggr=mean)
  )
  (conv2): ModuleDict(
    (movie__to__director): SAGEConv((-1, -1), 64, aggr=mean)
    (movie__to__actor): SAGEConv((-1, -1), 64, aggr=mean)
    (director__to__movie): SAGEConv((-1, -1), 64, aggr=mean)
    (actor__to__movie): SAGEConv((-1, -1), 64, aggr=mean)
  )
)

You can now see that we have a convolution layer for each edge type, which enables this model to do message passing on a heterogeneous graph. The model now expects two dictionaries as inputs: node features keyed by node type and edge indices keyed by edge type.
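The way outputs from several edge types are combined for one destination node type can be sketched in plain Python. This is a simplified illustration, not the code `to_hetero()` generates: each "convolution" is just a mean over neighbour features, with no learnt weights and no root-node term, and the names `mean_neighbours` and `hetero_layer` are hypothetical. It only shows what `aggr='sum'` does with the per-edge-type results.

```python
from collections import defaultdict


def mean_neighbours(x_src, edge_index, num_dst):
    """Mean of source features arriving at each destination node (0.0 if none)."""
    totals = [0.0] * num_dst
    counts = [0] * num_dst
    for s, d in zip(*edge_index):
        totals[d] += x_src[s]
        counts[d] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def hetero_layer(x_dict, edge_index_dict):
    """Run one stand-in convolution per edge type, then sum per destination type."""
    per_dst_type = defaultdict(list)
    for (src_type, _, dst_type), edge_index in edge_index_dict.items():
        out = mean_neighbours(x_dict[src_type], edge_index, len(x_dict[dst_type]))
        per_dst_type[dst_type].append(out)
    # aggr='sum': element-wise sum of the outputs that target the same node type.
    return {
        dst_type: [sum(vals) for vals in zip(*outs)]
        for dst_type, outs in per_dst_type.items()
    }


# One scalar feature per node, to keep the arithmetic easy to follow.
x_dict = {"movie": [1.0, 2.0], "director": [10.0], "actor": [100.0, 200.0]}
edge_index_dict = {
    ("director", "to", "movie"): [[0, 0], [0, 1]],
    ("actor", "to", "movie"): [[0, 1, 1], [0, 0, 1]],
    ("movie", "to", "director"): [[0, 1], [0, 0]],
}
out_dict = hetero_layer(x_dict, edge_index_dict)
print(out_dict)
```

In this toy graph the **movie** nodes receive messages from two edge types, so their outputs are the sum of two per-edge-type results, while **director** nodes receive messages from only one. No edge type targets **actor** here, so it has no entry in the output.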

Notice how we set the convolution layer `in_channels` to `-1`. This allows PyTorch Geometric to use lazy initialization based on the input dimensions, which means we don't need to manually specify the dimensions for each node type. We can then perform this lazy initialization on the CPU as follows:

In [99]:
# Initialize lazy modules.
with torch.no_grad():
    out = model(data.x_dict, data.edge_index_dict)

To run your model using PyTorch Geometric on the IPU, the model will need to target PopTorch and will require a number of changes.

The first change is to move the loss function inside the `forward` method of the model. We can do this by creating a simple module that wraps the transformed heterogeneous model and includes the loss calculation:

In [100]:
# Create model wrapper to include the loss in the model

import torch.nn.functional as F


class ModelWithLoss(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x_dict, edge_index_dict, target=None, train_mask=None):
        out = self.model(x_dict, edge_index_dict)
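        # NOTE: the wrapped model outputs 64 channels per node and is not
        # projected down to num_classes, so the loss treats them as 64 logits.
        if self.training:
            # -100 is the default ignore_index of F.cross_entropy, so nodes
            # outside the training mask do not contribute to the loss.
            target = torch.where(train_mask, target, -100)
            loss = F.cross_entropy(out['movie'], target)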
            return out, loss
        return out

# Include loss in model
model = ModelWithLoss(model)
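The effect of replacing masked-out targets with `-100` can be sketched without PyTorch. The function below is a hypothetical plain-Python stand-in for `F.cross_entropy` with its default `ignore_index=-100` and mean reduction. It keeps the targets at their original length and skips the ignored entries when averaging, which is presumably why masking is used here rather than indexing with the mask: the tensor shapes stay the same whatever the mask contains.

```python
import math

IGNORE_INDEX = -100  # default `ignore_index` of `torch.nn.functional.cross_entropy`


def masked_cross_entropy(logits, targets, mask):
    """Mean cross-entropy over the nodes where `mask` is True."""
    # Equivalent of torch.where(mask, target, -100): the length never changes.
    targets = [t if m else IGNORE_INDEX for t, m in zip(targets, mask)]
    losses = []
    for row, target in zip(logits, targets):
        if target == IGNORE_INDEX:
            continue  # ignored nodes contribute nothing to the loss
        log_sum_exp = math.log(sum(math.exp(v) for v in row))
        losses.append(log_sum_exp - row[target])
    # Assumes at least one node is unmasked.
    return sum(losses) / len(losses)


logits = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 0.0]]
targets = [0, 1, 2]
train_mask = [True, False, True]
loss = masked_cross_entropy(logits, targets, train_mask)
print(f"loss over {sum(train_mask)} training nodes: {loss:.4f}")
```

Changing the logits of the second node, which is outside the training mask, leaves the loss unchanged.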

Now our model is ready for training with PopTorch on IPUs.

As usual, we can wrap our model in `poptorch.trainingModel`:

In [75]:
# Set up training
import poptorch

model.train()

# Initialise model and convert the model to a poptorch model
opts = poptorch.Options().enableExecutableCaching(executable_cache_dir)
optim = poptorch.optim.Adam(model.parameters(), lr=0.01)
poptorch_model = poptorch.trainingModel(model, options=opts, optimizer=optim)

And run the training loop. Note that the backward pass and optimizer step are handled by PopTorch automatically, so they do not need to be included.

In [None]:
# Train
for _ in range(3):
    out, loss = poptorch_model(data.x_dict,
                               data.edge_index_dict,
                               target=data['movie'].y,
                               train_mask=data['movie'].train_mask)

Here we have seen how to create a heterogeneous GNN using the `to_hetero()` transformation and start training on the IPU. An alternative approach is to use the `HeteroConv` layer, which we will see next.

### Using the Heterogeneous Convolution Wrapper

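The second approach is to use the `torch_geometric.nn.HeteroConv` wrapper, which lets you specify a different convolution layer for each edge type. As with the previous approach, the loss must be calculated inside the `forward` method for the model to run on the IPU. Here we include it directly in the model rather than in a separate wrapper module.

For more information on this approach, see the [PyTorch Geometric Documentation](https://pytorch-geometric.readthedocs.io/en/latest/tutorial/heterogeneous.html#using-the-heterogeneous-convolution-wrapper).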

In [86]:
from torch_geometric.nn import HeteroConv, SAGEConv, GATConv, Linear


class HeteroGNN(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels, num_layers):
        super().__init__()

        self.convs = torch.nn.ModuleList()
        for _ in range(num_layers):
            conv = HeteroConv({
                ('movie', 'to', 'director'): SAGEConv((-1, -1), hidden_channels),
                ('director', 'to', 'movie'): SAGEConv((-1, -1), hidden_channels),
                ('movie', 'to', 'actor'): GATConv((-1, -1), hidden_channels, add_self_loops=False),
                ('actor', 'to', 'movie'): GATConv((-1, -1), hidden_channels, add_self_loops=False),
            }, aggr='sum')
            self.convs.append(conv)

        self.lin = Linear(hidden_channels, out_channels)

    def forward(self,
                x_dict,
                edge_index_dict,
                target=None,
                train_mask=None):
        for conv in self.convs:
            x_dict = conv(x_dict, edge_index_dict)
            x_dict = {key: x.relu() for key, x in x_dict.items()}
        out = self.lin(x_dict['movie'])

        if self.training:
            target = torch.where(train_mask, target, -100)
            loss = F.cross_entropy(out, target)
            return out, loss
        return out

model = HeteroGNN(hidden_channels=64,
                  out_channels=num_classes,
                  num_layers=2)

In the same way as before we set the convolution layer `in_channels` to `-1`. We can then perform the lazy initialization on the CPU again as follows:

In [88]:
# Initialize lazy modules.
with torch.no_grad():
    out = model(data.x_dict,
                data.edge_index_dict,
                target=data['movie'].y,
                train_mask=data['movie'].train_mask)

We wrap the model in `poptorch.trainingModel`:

In [89]:
# Set up training
model.train()

# Initialise model and convert the model to a poptorch model
opts = poptorch.Options().enableExecutableCaching(executable_cache_dir)
optim = poptorch.optim.Adam(model.parameters(), lr=0.01)
poptorch_model = poptorch.trainingModel(model, options=opts, optimizer=optim)

And perform the training loop:

In [None]:
# Train
for _ in range(3):
    out, loss = poptorch_model(data.x_dict,
                               data.edge_index_dict,
                               target=data['movie'].y,
                               train_mask=data['movie'].train_mask)

We have now seen two approaches to creating heterogeneous GNNs ready for the IPU using PyTorch Geometric. We will next look at the final approach, using heterogeneous operators.

### Using Heterogeneous operators

The final approach PyTorch Geometric provides to create a heterogeneous GNN model is to use operators specifically designed for heterogeneous graphs. These can be used as normal, taking care to follow the steps mentioned previously to run on IPUs: moving the loss inside the model, wrapping the model in `poptorch.trainingModel` and removing the call to the backward pass and optimizer step.

See the [PyTorch Geometric Documentation](https://pytorch-geometric.readthedocs.io/en/latest/tutorial/heterogeneous.html#deploy-existing-heterogeneous-operators) for more information.

## Fixed size heterogeneous data loading

As real-world heterogeneous graphs can be quite large, it may often be appropriate to move from full-batch training to mini-batch training using some form of sampling. PyTorch Geometric provides a range of samplers suitable for heterogeneous graphs, for example the `NeighborLoader` which we will look at below.

When moving from full-batch to mini-batch training on the IPU, one must consider the sizes of the mini-batches. The IPU uses ahead-of-time compilation, which means all mini-batches must be the same size, as outlined in previous tutorials. In the homogeneous graph case, making our mini-batches a fixed size is relatively simple: we add padding to bring the nodes and edges up to a fixed size. This becomes more complex with heterogeneous graphs, where there are different node and edge types.
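To illustrate why padding is more involved in the heterogeneous case, the sketch below pads a toy mini-batch using plain Python lists. It follows one possible convention, which is not necessarily what PopTorch Geometric does internally: the last node slot of each node type is reserved as a padding node, and padding edges connect the padding node of the source type to the padding node of the destination type. The function `pad_hetero_batch` and all the sizes are hypothetical.

```python
def pad_hetero_batch(num_nodes, edge_index_dict, max_nodes, max_edges):
    """Pad every node and edge type of a toy mini-batch to a fixed size.

    Padding edges connect the last (padding) node of the source type to the
    last (padding) node of the destination type, so they never touch real nodes.
    """
    for node_type, n in num_nodes.items():
        # One slot must stay free for the padding node.
        if n >= max_nodes[node_type]:
            raise ValueError(f"not enough space to pad {node_type!r} nodes")
    padded_edges = {}
    for edge_type, (src, dst) in edge_index_dict.items():
        src_type, _, dst_type = edge_type
        missing = max_edges[edge_type] - len(src)
        if missing < 0:
            raise ValueError(f"too many edges of type {edge_type}")
        pad_src = max_nodes[src_type] - 1
        pad_dst = max_nodes[dst_type] - 1
        padded_edges[edge_type] = [
            src + [pad_src] * missing,
            dst + [pad_dst] * missing,
        ]
    # After padding, every node type has exactly max_nodes[node_type] nodes.
    return dict(max_nodes), padded_edges


num_nodes = {"movie": 3, "director": 1}
edge_index_dict = {
    ("movie", "to", "director"): [[0, 1, 2], [0, 0, 0]],
    ("director", "to", "movie"): [[0], [1]],
}
max_nodes = {"movie": 5, "director": 3}
max_edges = {("movie", "to", "director"): 4, ("director", "to", "movie"): 4}

padded_num_nodes, padded_edges = pad_hetero_batch(
    num_nodes, edge_index_dict, max_nodes, max_edges
)
for edge_type, edge_index in padded_edges.items():
    print(edge_type, edge_index)
```

Note that a separate size is needed for every node type and every edge type, and the padding node index depends on the source and destination types of each edge type.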

Let's create an instance of the PyTorch Geometric `NeighborLoader` with our dataset, and see what the first mini-batch looks like:

In [102]:
# Create the data loader as you normally would in PyTorch Geometric

from torch_geometric.loader import NeighborLoader


train_loader = NeighborLoader(
    data,
    num_neighbors=[5] * 2,
    batch_size=128,
    input_nodes=('movie', data['movie'].train_mask),
)

next(iter(train_loader))

HeteroData(
  movie={
    x=[1172, 3066],
    y=[1172],
    train_mask=[1172],
    val_mask=[1172],
    test_mask=[1172],
    n_id=[1172],
    input_id=[128],
    batch_size=128
  },
  director={
    x=[112, 3066],
    n_id=[112]
  },
  actor={
    x=[329, 3066],
    n_id=[329]
  },
  (movie, to, director)={
    edge_index=[2, 413],
    e_id=[413]
  },
  (movie, to, actor)={
    edge_index=[2, 1196],
    e_id=[1196]
  },
  (director, to, movie)={
    edge_index=[2, 128],
    e_id=[128]
  },
  (actor, to, movie)={
    edge_index=[2, 384],
    e_id=[384]
  }
)

To bring this mini-batch up to a fixed size, we could simply pad the nodes and edges of each node and edge type to a particular value. We could manually create `poptorch_geometric.FixedSizeOptions` to do this, or we could sample from the above data loader and estimate the required sizes from those samples:

In [None]:
from poptorch_geometric import FixedSizeOptions

# Estimate the fixed sizes by sampling from the data loader
fixed_size_options = FixedSizeOptions.from_loader(train_loader)
fixed_size_options

Here you can see the fixed sizes that will be appropriate for the neighbor loading. Now we can use these sizes to create a fixed size version of the `NeighborLoader`, the `poptorch_geometric.FixedSizeNeighborLoader`, which will do the same sampling but produce fixed size mini-batches.

More detail on this loader can be found in the sampling tutorial.

In [None]:
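# Import locations are assumed from the `poptorch_geometric.` names used in
# this tutorial; adjust them if your installed version differs.
from poptorch_geometric import FixedSizeNeighborLoader, OverSizeBehaviour

fixed_size_train_loader = FixedSizeNeighborLoader(
    data,
    # Same sampling settings as `train_loader`, which the sizes were estimated from
    num_neighbors=[5] * 2,
    batch_size=128,
    input_nodes=('movie', data['movie'].train_mask),
    fixed_size_options=fixed_size_options,
    over_size_behaviour=OverSizeBehaviour.TRIM_NODES_AND_EDGES
)

# Inspect the first fixed size mini-batch
next(iter(fixed_size_train_loader))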

Note how we have set `over_size_behaviour=OverSizeBehaviour.TRIM_NODES_AND_EDGES`. Unfortunately, we don't know ahead of time whether we have allocated enough space for the padding. We therefore enable trimming of any excess nodes and edges from our samples in case a mini-batch is larger than our specified sizes.
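The idea behind trimming can be sketched with plain Python lists. This is a hypothetical illustration of one way to trim, not the PopTorch Geometric implementation: it keeps the first nodes of each type, drops any edge that refers to a removed node, and then cuts each edge type down to its limit. The function `trim_hetero_batch` and the sizes are made up for the example.

```python
def trim_hetero_batch(num_nodes, edge_index_dict, max_nodes, max_edges):
    """Trim a toy mini-batch so no node or edge type exceeds its fixed size."""
    # Keep only the first `max_nodes` nodes of each type.
    kept_nodes = {t: min(n, max_nodes[t]) for t, n in num_nodes.items()}
    trimmed = {}
    for edge_type, (src, dst) in edge_index_dict.items():
        src_type, _, dst_type = edge_type
        # Drop edges that touch a removed node, then cut to the edge limit.
        kept = [
            (s, d)
            for s, d in zip(src, dst)
            if s < kept_nodes[src_type] and d < kept_nodes[dst_type]
        ][: max_edges[edge_type]]
        trimmed[edge_type] = [[s for s, _ in kept], [d for _, d in kept]]
    return kept_nodes, trimmed


num_nodes = {"movie": 4, "actor": 3}
edge_index_dict = {
    ("actor", "to", "movie"): [[0, 1, 1, 2, 0], [0, 1, 3, 2, 2]],
}
max_nodes = {"movie": 3, "actor": 2}
max_edges = {("actor", "to", "movie"): 2}

kept_nodes, trimmed_edges = trim_hetero_batch(
    num_nodes, edge_index_dict, max_nodes, max_edges
)
print(kept_nodes)
print(trimmed_edges)
```

Trimming discards part of the sampled neighbourhood, so it is best treated as a safety net for occasional oversized mini-batches rather than a substitute for choosing sizes that fit most samples.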

There may be cases when you want to specify a different fixed size for each node and edge type. This can be done as follows:

In [103]:
# TODO: Different fixed size for each node and edge type

The same approach to achieving fixed size mini-batches for heterogeneous graphs can be used with the other data loaders in PopTorch Geometric.

## Conclusion

In this tutorial we have learnt how to train heterogeneous GNNs on the IPU using PyTorch Geometric.

You should now have a good understanding of:
 - the approaches PyTorch Geometric provides to create heterogeneous GNN models
 - how to run the model produced by each approach on the IPU
 - how to achieve fixed size mini-batches of heterogeneous graphs suitable for the IPU.

Additional resources which may help you understand heterogeneous graph learning can be found in the [PyTorch Geometric documentation](https://pytorch-geometric.readthedocs.io/en/latest/tutorial/heterogeneous.html).