Copyright (c) 2023 Graphcore Ltd. All rights reserved.

# Heterogeneous graph learning on IPUs

TODO: Do intro

## Running on Paperspace

The Paperspace environment lets you run this notebook with no setup. To improve your experience, we preload datasets and pre-install packages. This can take a few minutes. If you experience errors immediately after starting a session, please try restarting the kernel before contacting support. If a problem persists or you want to give us feedback on the content of this notebook, please reach out through our community of developers using our [Slack channel](https://www.graphcore.ai/join-community) or raise a [GitHub issue](https://github.com/graphcore/examples).

Requirements:

* Python packages installed with `pip install -r ./requirements.txt`


In order to improve usability and support for future users, Graphcore would like to collect information about the
applications and code being run in this notebook. The following information will be anonymised before being sent to Graphcore:

- User progression through the notebook
- Notebook details: number of cells, code being run and the output of the cells
- Environment details

You can disable logging at any time by running `%unload_ext gc_logger` from any cell.

In [66]:
%pip install -q -r ./requirements.txt
from examples_utils import notebook_logging
%load_ext gc_logger


[notice] A new release of pip is available: 23.0.1 -> 23.1.2
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.


ImportError: cannot import name 'notebook_logging' from 'examples_utils' (/nethome/adams/venvs/3.2.0+1277/3.2.0+1277_poptorch/lib/python3.8/site-packages/examples_utils/__init__.py)

For compatibility with the Paperspace environment, we read the executable cache and dataset locations from environment variables, with local fallbacks:

In [3]:
import os

# Join the paths so the default does not end up with a double slash.
executable_cache_dir = os.path.join(
    os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "/tmp/exe_cache/"), "pyg-packing"
)
dataset_directory = os.getenv("DATASETS_DIR", "data")

Now we are ready to start!

## Introduction to heterogeneous graphs

TODO

### Loading the IMDB dataset

TODO

In [6]:
from torch_geometric.datasets import IMDB

dataset = IMDB(root=f"{dataset_directory}/IMDB")

Downloading https://www.dropbox.com/s/g0btk9ctr1es39x/IMDB_processed.zip?dl=1
Extracting data/IMDB/raw/IMDB_processed.zip
Processing...
Done!


In [68]:
data = dataset[0]
data

HeteroData(
  movie={
    x=[4278, 3066],
    y=[4278],
    train_mask=[4278],
    val_mask=[4278],
    test_mask=[4278]
  },
  director={ x=[2081, 3066] },
  actor={ x=[5257, 3066] },
  (movie, to, director)={ edge_index=[2, 4278] },
  (movie, to, actor)={ edge_index=[2, 12828] },
  (director, to, movie)={ edge_index=[2, 4278] },
  (actor, to, movie)={ edge_index=[2, 12828] }
)

In [84]:
import torch

# Movie has three categories: (action, comedy, drama)
classes = torch.unique(data["movie"].y)
num_classes = len(classes)
classes, num_classes

(tensor([0, 1, 2]), 3)

## Creating heterogeneous GNNs

TODO

### Converting a homogeneous model

TODO

In [70]:
import torch
from torch_geometric.nn import SAGEConv


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = SAGEConv((-1, -1), 64)
        self.conv2 = SAGEConv((-1, -1), 64)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x

In [71]:
from torch_geometric.nn import to_hetero

# Initialize the model
model = Model()
# Convert the model to a heterogeneous model
model = to_hetero(model, data.metadata(), aggr='sum')
model

GraphModule(
  (conv1): ModuleDict(
    (movie__to__director): SAGEConv((-1, -1), 64, aggr=mean)
    (movie__to__actor): SAGEConv((-1, -1), 64, aggr=mean)
    (director__to__movie): SAGEConv((-1, -1), 64, aggr=mean)
    (actor__to__movie): SAGEConv((-1, -1), 64, aggr=mean)
  )
  (conv2): ModuleDict(
    (movie__to__director): SAGEConv((-1, -1), 64, aggr=mean)
    (movie__to__actor): SAGEConv((-1, -1), 64, aggr=mean)
    (director__to__movie): SAGEConv((-1, -1), 64, aggr=mean)
    (actor__to__movie): SAGEConv((-1, -1), 64, aggr=mean)
  )
)
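
To make the printed structure concrete, the sketch below mimics in plain Python what the converted model appears to do in each layer: one mean aggregation per edge type, followed by a sum over the edge types that share a destination node type (`aggr='sum'`). The toy graph, the scalar features, and the helper names `mean_aggregate` and `hetero_layer` are illustrative assumptions, not PyG code, and the learned weights and root-node term of `SAGEConv` are left out.

```python
# Toy scalar features per node type (hypothetical values).
x_dict = {
    "movie": [1.0, 2.0],
    "director": [10.0],
    "actor": [100.0, 200.0],
}

# Edges as (source index, destination index) pairs per edge type.
edge_dict = {
    ("director", "to", "movie"): [(0, 0), (0, 1)],
    ("actor", "to", "movie"): [(0, 0), (1, 0), (1, 1)],
    ("movie", "to", "director"): [(0, 0), (1, 0)],
}


def mean_aggregate(src_x, edges, num_dst):
    """Mean of the source features arriving at each destination node."""
    totals = [0.0] * num_dst
    counts = [0] * num_dst
    for src, dst in edges:
        totals[dst] += src_x[src]
        counts[dst] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def hetero_layer(x_dict, edge_dict):
    """Aggregate once per edge type, then sum the results per destination type."""
    out = {}
    for (src_type, _, dst_type), edges in edge_dict.items():
        msg = mean_aggregate(x_dict[src_type], edges, len(x_dict[dst_type]))
        if dst_type in out:
            out[dst_type] = [a + b for a, b in zip(out[dst_type], msg)]
        else:
            out[dst_type] = msg
    return out


out = hetero_layer(x_dict, edge_dict)
print(out)
```

In this toy graph `movie` receives the sum of two per-edge-type means, while `actor` has no entry because no edge type ends at it.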

In [72]:
# Initialize lazy modules.
with torch.no_grad():
    out = model(data.x_dict, data.edge_index_dict)

In [73]:
# Create model wrapper to include the loss in the model

import torch.nn.functional as F


class ModelWithLoss(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x_dict, edge_index_dict, target=None, train_mask=None):
        out = self.model(x_dict, edge_index_dict)
        # TODO: Should I project down to num_classes
        if self.training:
            target = torch.where(train_mask, target, -100)
            # TODO: Is this loss function right to use for this case?
            loss = F.cross_entropy(out['movie'], target)
            return out, loss
        return out
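
The `torch.where(train_mask, target, -100)` line relies on `-100` being the default `ignore_index` of `F.cross_entropy`: masked-out nodes keep their slot in the tensor, so the shape stays fixed, but they contribute nothing to the loss. The plain-Python sketch below reproduces that arithmetic for a mean-reduced loss. The helper `masked_cross_entropy` and the toy numbers are assumptions for illustration only.

```python
import math

IGNORE_INDEX = -100  # default ignore_index of torch.nn.functional.cross_entropy


def masked_cross_entropy(logits, targets, mask):
    """Mean cross entropy over the rows where mask is True."""
    # Same effect as torch.where(mask, target, -100): the length is unchanged.
    targets = [t if m else IGNORE_INDEX for t, m in zip(targets, mask)]
    losses = []
    for row, target in zip(logits, targets):
        if target == IGNORE_INDEX:
            continue  # ignored rows do not contribute to the loss
        log_sum_exp = math.log(sum(math.exp(v) for v in row))
        losses.append(log_sum_exp - row[target])
    return sum(losses) / len(losses)


logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
targets = [0, 2, 1]
train_mask = [True, True, False]

loss = masked_cross_entropy(logits, targets, train_mask)
print(round(loss, 4))
```

Changing the logits of the third (masked) row leaves `loss` unchanged, which is the property the wrapper depends on.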

In [74]:
# Include loss in model
model = ModelWithLoss(model)

In [75]:
# Set up training
import poptorch

model.train()

# Set up the options and optimizer, then convert the model to a PopTorch model
opts = poptorch.Options().enableExecutableCaching(executable_cache_dir)
optim = poptorch.optim.Adam(model.parameters(), lr=0.01)
poptorch_model = poptorch.trainingModel(model, options=opts, optimizer=optim)

In [76]:
# Train
for _ in range(3):
    out, loss = poptorch_model(data.x_dict,
                               data.edge_index_dict,
                               target=data['movie'].y,
                               train_mask=data['movie'].train_mask)

Graph compilation:   0%|          | 0/100 [00:00<?]
2023-05-16T11:22:27.656842Z popart:popart 73541.73541 E: Failure in ReshapeOp::setup() for Op(model/director__to__movie/aggr_module/Reshape (ai.onnx.Reshape:5), inputs=[model/director__to__movie/aggr_module/Max:0], outputs=[model/director__to__movie/aggr_module/Reshape:0/1]). Trying to reshape from [4278] to [2081 1]. The number of elements of the input is 4278, while the number of elements of the output is 2081. The number of elements cannot change for a ReshapeOp

[0] popart::Graph::constructFromOnnxGraph(onnx::GraphProto const&)
[1] popart::Ir::constructFromOnnxGraph(onnx::GraphProto const&, popart::Scope const&)
[2] popart::Ir::constructForwards()
[3] popart::Ir::prepareImpl(popart::IrBundle const&, std::map<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, st

Error: In unknown:0: 'popart_exception': Failure in ReshapeOp::setup() for Op(model/director__to__movie/aggr_module/Reshape (ai.onnx.Reshape:5), inputs=[model/director__to__movie/aggr_module/Max:0], outputs=[model/director__to__movie/aggr_module/Reshape:0/1]). Trying to reshape from [4278] to [2081 1]. The number of elements of the input is 4278, while the number of elements of the output is 2081. The number of elements cannot change for a ReshapeOp
Error raised in:
  [0] popart::TrainingSession::createFromOnnxModel
  [1] Compiler::initSession
  [2] LowerToPopart::compile
  [3] compileWithManualTracing
  [4] popart::Graph::constructFromOnnxGraph(onnx::GraphProto const&)
  [5] popart::Ir::constructFromOnnxGraph(onnx::GraphProto const&, popart::Scope const&)
  [6] popart::Ir::constructForwards()
  [7] popart::Ir::prepareImpl(popart::IrBundle const&, std::map<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, unsigned long)
  [8] popart::Ir::prepare(popart::IrBundle const&, std::map<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, unsigned long)
  [9] popart::Session::configureFromOnnx(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, popart::DataFlow const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, popart::Optimizer const*, popart::InputShapeInfo const&, std::shared_ptr<popart::DeviceInfo>, popart::SessionOptions const&, popart::Patterns const&)
  [10] popart::TrainingSession::createFromOnnxModel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, popart::DataFlow const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, popart::Optimizer const&, std::shared_ptr<popart::DeviceInfo>, popart::InputShapeInfo const&, popart::SessionOptions const&, popart::Patterns const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
  [11] poptorch::popart_compiler::Compiler::initSession(std::vector<poptorch::popart_compiler::Optimizer, std::allocator<poptorch::popart_compiler::Optimizer> > const&, char const*)
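
The failure above is an element-count check. A reshape must preserve the total number of elements, and 4278 values (the number of `movie` nodes, and also the number of `director`-to-`movie` edges) cannot fill a `[2081, 1]` tensor (2081 is the number of `director` nodes). The sketch below restates that rule in plain Python. `can_reshape` is a hypothetical helper, not part of PopART.

```python
import math


def can_reshape(src_shape, dst_shape):
    """A reshape is valid only if both shapes hold the same number of elements."""
    return math.prod(src_shape) == math.prod(dst_shape)


num_movies = 4278     # also the number of (director, to, movie) edges
num_directors = 2081

print(can_reshape([num_movies], [num_directors, 1]))  # False
print(can_reshape([num_movies], [num_movies, 1]))     # True
```

The two sizes involved suggest a source/destination size mix-up somewhere in the aggregation for the `director__to__movie` edge type, although the trace alone does not show where.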


### Using the heterogeneous convolution wrapper

TODO: Do same as above wrapping your model in a module with the loss function

In [86]:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv, HeteroConv, Linear, SAGEConv


class HeteroGNN(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels, num_layers):
        super().__init__()

        self.convs = torch.nn.ModuleList()
        for _ in range(num_layers):
            conv = HeteroConv({
                ('movie', 'to', 'director'): SAGEConv((-1, -1), hidden_channels),
                ('director', 'to', 'movie'): SAGEConv((-1, -1), hidden_channels),
                ('movie', 'to', 'actor'): GATConv((-1, -1), hidden_channels, add_self_loops=False),
                ('actor', 'to', 'movie'): GATConv((-1, -1), hidden_channels, add_self_loops=False),
            }, aggr='sum')
            self.convs.append(conv)

        self.lin = Linear(hidden_channels, out_channels)

    def forward(self,
                x_dict,
                edge_index_dict,
                target=None,
                train_mask=None):
        for conv in self.convs:
            x_dict = conv(x_dict, edge_index_dict)
            x_dict = {key: x.relu() for key, x in x_dict.items()}
        out = self.lin(x_dict['movie'])

        if self.training:
            target = torch.where(train_mask, target, -100)
            loss = F.cross_entropy(out, target)
            return out, loss
        return out

model = HeteroGNN(hidden_channels=64,
                  out_channels=num_classes,
                  num_layers=2)

In [88]:
# Initialize lazy modules.
with torch.no_grad():
    out = model(data.x_dict,
                data.edge_index_dict,
                target=data['movie'].y,
                train_mask=data['movie'].train_mask)

In [89]:
# Set up training
import poptorch

model.train()

# Set up the options and optimizer, then convert the model to a PopTorch model
opts = poptorch.Options().enableExecutableCaching(executable_cache_dir)
optim = poptorch.optim.Adam(model.parameters(), lr=0.01)
poptorch_model = poptorch.trainingModel(model, options=opts, optimizer=optim)

In [91]:
# Train
for _ in range(3):
    out, loss = poptorch_model(data.x_dict,
                               data.edge_index_dict,
                               target=data['movie'].y,
                               train_mask=data['movie'].train_mask)

Graph compilation:   0%|          | 0/100 [00:00<?]
2023-05-16T11:29:39.830384Z popart:popart 73541.73541 E: Failure in ReshapeOp::setup() for Op(1/director__to__movie/aggr_module/Reshape (ai.onnx.Reshape:5), inputs=[1/director__to__movie/aggr_module/Max:0], outputs=[1/director__to__movie/aggr_module/Reshape:0]). Trying to reshape from [4278] to [2081 1]. The number of elements of the input is 4278, while the number of elements of the output is 2081. The number of elements cannot change for a ReshapeOp

[0] popart::Graph::constructFromOnnxGraph(onnx::GraphProto const&)
[1] popart::Ir::constructFromOnnxGraph(onnx::GraphProto const&, popart::Scope const&)
[2] popart::Ir::constructForwards()
[3] popart::Ir::prepareImpl(popart::IrBundle const&, std::map<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<c

Error: In unknown:0: 'popart_exception': Failure in ReshapeOp::setup() for Op(1/director__to__movie/aggr_module/Reshape (ai.onnx.Reshape:5), inputs=[1/director__to__movie/aggr_module/Max:0], outputs=[1/director__to__movie/aggr_module/Reshape:0]). Trying to reshape from [4278] to [2081 1]. The number of elements of the input is 4278, while the number of elements of the output is 2081. The number of elements cannot change for a ReshapeOp
Error raised in:
  [0] popart::TrainingSession::createFromOnnxModel
  [1] Compiler::initSession
  [2] LowerToPopart::compile
  [3] compileWithManualTracing
  [4] popart::Graph::constructFromOnnxGraph(onnx::GraphProto const&)
  [5] popart::Ir::constructFromOnnxGraph(onnx::GraphProto const&, popart::Scope const&)
  [6] popart::Ir::constructForwards()
  [7] popart::Ir::prepareImpl(popart::IrBundle const&, std::map<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, unsigned long)
  [8] popart::Ir::prepare(popart::IrBundle const&, std::map<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, unsigned long)
  [9] popart::Session::configureFromOnnx(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, popart::DataFlow const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, popart::Optimizer const*, popart::InputShapeInfo const&, std::shared_ptr<popart::DeviceInfo>, popart::SessionOptions const&, popart::Patterns const&)
  [10] popart::TrainingSession::createFromOnnxModel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, popart::DataFlow const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, popart::Optimizer const&, std::shared_ptr<popart::DeviceInfo>, popart::InputShapeInfo const&, popart::SessionOptions const&, popart::Patterns const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
  [11] poptorch::popart_compiler::Compiler::initSession(std::vector<poptorch::popart_compiler::Optimizer, std::allocator<poptorch::popart_compiler::Optimizer> > const&, char const*)


### Using heterogeneous operators

TODO: These should just work as normal, ensure to include your loss in the model description

## Fixed size heterogeneous data loading

TODO: Supports the same stuff as the fixed size homogeneous data loaders

TODO: Demonstrate fixed size neighbour loader with heterogeneous graphs

In [None]:
# Create a standard PyG neighbour loader; a heterogeneous graph needs no special handling here

from torch_geometric.loader import NeighborLoader


train_loader = NeighborLoader(
    data,
    num_neighbors=[15] * 2,
    batch_size=128,
    input_nodes=('movie', data['movie'].train_mask),
)

next(iter(train_loader))

In [None]:
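# Assumption: FixedSizeOptions is importable from the top-level poptorch_geometric package.
from poptorch_geometric import FixedSizeOptions

fixed_size_options = FixedSizeOptions.from_loader(train_loader)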
fixed_size_options

TODO: Mention TRIM_NODES_AND_EDGES

In [None]:
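# Assumption: these import paths may differ between poptorch_geometric versions; adjust as needed.
from poptorch_geometric import OverSizeBehaviour
from poptorch_geometric.neighbor_loader import FixedSizeNeighborLoader

fixed_size_train_loader = FixedSizeNeighborLoader(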
    data,
    # Sample 15 neighbors for each node and each edge type for 2 iterations:
    num_neighbors=[15] * 2,
    # Use a batch size of 128 for sampling training nodes of type "movie":
    batch_size=128,
    input_nodes=('movie', data['movie'].train_mask),
    fixed_size_options=fixed_size_options,
    over_size_behaviour=OverSizeBehaviour.TRIM_NODES_AND_EDGES
)
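
Fixed-size loading pads every mini-batch up to the same number of nodes and edges so that one compiled graph can be reused. As its name suggests, `TRIM_NODES_AND_EDGES` deals with samples that exceed those limits by trimming them to fit. The sketch below shows the idea on plain Python lists. `to_fixed_size`, the padding node in the last slot, and the drop-the-tail trimming rule are illustrative assumptions and not how `poptorch_geometric` is implemented.

```python
def to_fixed_size(nodes, edges, max_nodes, max_edges):
    """Trim or pad a sampled subgraph to exactly max_nodes and max_edges."""
    pad_node = max_nodes - 1  # the last slot is reserved for a padding node

    # Trim: keep the first max_nodes - 1 real nodes and the edges between them.
    kept = nodes[:pad_node]
    kept_ids = range(len(kept))
    kept_edges = [(s, d) for s, d in edges if s in kept_ids and d in kept_ids]
    kept_edges = kept_edges[:max_edges]

    # Pad: fill the remaining slots with padding nodes and padding self-loops.
    padded_nodes = kept + [None] * (max_nodes - len(kept))
    padded_edges = kept_edges + [(pad_node, pad_node)] * (max_edges - len(kept_edges))
    return padded_nodes, padded_edges


# An over-sized sample (5 nodes) is trimmed, then padded back to 4 nodes and 3 edges.
big_nodes, big_edges = to_fixed_size(
    ["m0", "m1", "d0", "a0", "a1"], [(2, 0), (3, 0), (4, 1), (3, 1)], 4, 3
)
# An under-sized sample is only padded.
small_nodes, small_edges = to_fixed_size(["m0", "m1"], [(1, 0)], 4, 3)

print(big_nodes, big_edges)
print(small_nodes, small_edges)
```

Both calls return outputs of identical size, which is the property a statically compiled model needs.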

In [None]:
next(iter(fixed_size_train_loader))

TODO: Can be useful to set the number of neighbours for each edge type to balance your samples and waste less space
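
PyG's `NeighborLoader` accepts `num_neighbors` either as a single list or as a dictionary keyed by edge type. A minimal sketch of building such a dictionary follows. The edge types are copied from the `HeteroData` printout earlier, while the fan-out numbers are made-up placeholders rather than tuned values. They are loosely motivated by the edge counts: 4278 `director`-to-`movie` edges for 4278 movies is one director per movie on average, and 12828 `actor`-to-`movie` edges is about three actors per movie.

```python
# Edge types as printed for the IMDB HeteroData object.
edge_types = [
    ("movie", "to", "director"),
    ("movie", "to", "actor"),
    ("director", "to", "movie"),
    ("actor", "to", "movie"),
]

# Hypothetical fan-outs for two hops; every list must have the same length.
default_fanout = [15, 15]
overrides = {
    ("director", "to", "movie"): [1, 1],
    ("actor", "to", "movie"): [3, 3],
}
num_neighbors = {et: overrides.get(et, default_fanout) for et in edge_types}

for edge_type, fanout in num_neighbors.items():
    print(edge_type, fanout)

# The dictionary can then replace the list, for example:
# NeighborLoader(data, num_neighbors=num_neighbors, batch_size=128,
#                input_nodes=("movie", data["movie"].train_mask))
```

Smaller fan-outs on sparse edge types should reduce how much of each fixed-size batch is padding, though the right values depend on the dataset and are best checked empirically.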

## Conclusion

TODO