

How Does DGL Represent A Graph?
===============================


DGL Graph Construction
----------------------

DGL represents a directed graph as a ``DGLGraph`` object. You can
construct a graph by specifying the number of nodes in the graph as well
as the list of source and destination nodes.  Nodes in the graph have
consecutive IDs starting from 0.

For instance, the following code constructs a directed star graph with 5
leaves. The center node's ID is 0. The edges go from the
center node to the leaves.




In [None]:
import dgl
import numpy as np
import torch

g = dgl.graph(([0, 0, 0, 0, 0], [1, 2, 3, 4, 5]), num_nodes=6)
# Equivalently, PyTorch LongTensors also work.
g = dgl.graph((torch.LongTensor([0, 0, 0, 0, 0]), torch.LongTensor([1, 2, 3, 4, 5])), num_nodes=6)

# You can omit the number of nodes argument if you can tell the number of nodes from the edge list alone.
g = dgl.graph(([0, 0, 0, 0, 0], [1, 2, 3, 4, 5]))

Edges in the graph have consecutive IDs starting from 0, and are
in the same order as the list of source and destination nodes during
creation.




In [None]:
# Print the source and destination nodes of every edge.
print(g.edges())

<div class="alert alert-info">
    
**Note**: ``DGLGraph`` objects are always directed to best fit the computation
   pattern of graph neural networks, where the message sent from one
   node to another often differs between the two directions. To handle
   an undirected graph, consider treating it as a bidirectional graph.
   See [Graph Transformations](#Graph-Transformations) for an example of
   making a bidirectional graph.
</div>




Assigning Node and Edge Features to Graph
-----------------------------------------

Many graph datasets contain attributes on nodes and edges.
Although node and edge attributes can have arbitrary types in the real
world, ``DGLGraph`` only accepts attributes stored in tensors (with
numerical contents). Consequently, a given attribute must have the same
shape for all nodes or for all edges. In the context of deep learning,
those attributes are often called *features*.

You can assign and retrieve node and edge features via the ``ndata`` and
``edata`` interfaces.




In [None]:
# Assign a 3-dimensional node feature vector for each node.
g.ndata['x'] = torch.randn(6, 3)
# Assign a 4-dimensional edge feature vector for each edge.
g.edata['a'] = torch.randn(5, 4)
# Assign a 5x4 node feature matrix for each node.  Node and edge features in DGL can be multi-dimensional.
g.ndata['y'] = torch.randn(6, 5, 4)

print(g.edata['a'])

<div class="alert alert-info">
    
**Note**: Deep learning offers many ways to encode various types of
   attributes into numerical features.
   Here are some general suggestions:

   -  For categorical attributes (e.g. gender, occupation), consider
      converting them to integers or one-hot encoding.
   -  For variable length string contents (e.g. news article, quote),
      consider applying a language model.
   -  For images, consider applying a vision model such as CNNs.

You can find plenty of material on how to encode such attributes
   into a tensor in the [PyTorch Deep Learning
   Tutorials](https://pytorch.org/tutorials/).

</div>

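As a minimal sketch of the first suggestion in the note, a categorical attribute can be mapped to integer codes and then to one-hot rows. The `occupations` list is a made-up example, and NumPy is used here for brevity; `torch.from_numpy` would turn the result into a tensor that can be stored in ``ndata``.

```python
import numpy as np

# Hypothetical categorical attribute, one value per node.
occupations = ["engineer", "teacher", "engineer", "artist"]

# Map each distinct category to an integer code (sorted for determinism).
categories = sorted(set(occupations))
code_of = {name: i for i, name in enumerate(categories)}
codes = np.array([code_of[name] for name in occupations])

# One-hot encode: row i has a single 1 in column codes[i].
one_hot = np.eye(len(categories), dtype=np.float32)[codes]
print(codes)
print(one_hot.shape)
```

Integer codes are compact and suit embedding layers, while one-hot rows can be used directly as input features.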



Querying Graph Structures
-------------------------

A ``DGLGraph`` object provides various methods to query a graph structure.




In [None]:
print(g.num_nodes())
print(g.num_edges())
# Out degrees of the center node
print(g.out_degrees(0))
# In degrees of the center node - note that the graph is directed so the in degree should be 0.
print(g.in_degrees(0))

Graph Transformations
---------------------




DGL provides many APIs to transform a graph into another one, such as
extracting a subgraph:




In [None]:
# Induce a subgraph from node 0, node 1 and node 3 from the original graph.
sg1 = g.subgraph([0, 1, 3])
# Induce a subgraph from edge 0, edge 1 and edge 3 from the original graph.
sg2 = g.edge_subgraph([0, 1, 3])

You can obtain the node/edge mapping from the subgraph to the original
graph by looking into the node feature ``dgl.NID`` or edge feature
``dgl.EID`` in the new graph.




In [None]:
# The original IDs of each node in sg1
print(sg1.ndata[dgl.NID])
# The original IDs of each edge in sg1
print(sg1.edata[dgl.EID])
# The original IDs of each node in sg2
print(sg2.ndata[dgl.NID])
# The original IDs of each edge in sg2
print(sg2.edata[dgl.EID])

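To make the mapping concrete, here is a rough NumPy sketch of what inducing a subgraph from a node set involves: keep the edges whose endpoints are both selected, relabel the nodes, and remember the original IDs. The helper `induced_subgraph` is hypothetical and is not how DGL implements ``subgraph``; the returned ID arrays only play the role of ``dgl.NID`` and ``dgl.EID``.

```python
import numpy as np

# Edge list of the star graph used above: edge i goes from src[i] to dst[i].
src = np.array([0, 0, 0, 0, 0])
dst = np.array([1, 2, 3, 4, 5])

def induced_subgraph(src, dst, nodes):
    """Return relabeled edges plus the original node and edge IDs."""
    nodes = np.asarray(nodes)
    # Keep an edge only if both of its endpoints are selected.
    keep = np.isin(src, nodes) & np.isin(dst, nodes)
    orig_eids = np.nonzero(keep)[0]
    # Relabel so that the selected nodes get consecutive IDs 0, 1, 2, ...
    new_id = {int(n): i for i, n in enumerate(nodes)}
    new_src = np.array([new_id[int(s)] for s in src[keep]])
    new_dst = np.array([new_id[int(d)] for d in dst[keep]])
    return new_src, new_dst, nodes, orig_eids

sub_src, sub_dst, orig_nids, orig_eids = induced_subgraph(src, dst, [0, 1, 3])
print(sub_src, sub_dst)      # edges in the subgraph's own numbering
print(orig_nids, orig_eids)  # original node and edge IDs
```
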
``subgraph`` and ``edge_subgraph`` also copy the original features
to the subgraph:




In [None]:
# The original node feature of each node in sg1
print(sg1.ndata['x'])
# The original edge feature of each edge in sg1
print(sg1.edata['a'])
# The original node feature of each node in sg2
print(sg2.ndata['x'])
# The original edge feature of each edge in sg2
print(sg2.edata['a'])

Another common transformation is to add a reverse edge for each edge in
the original graph with ``dgl.add_reverse_edges``.

<div class="alert alert-info">
    
**Note**: If you have an undirected graph, it is better to convert it
   into a bidirectional graph first by adding reverse edges.

</div>




In [None]:
newg = dgl.add_reverse_edges(g)
newg.edges()

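On a plain edge list, adding reverse edges amounts to concatenating the swapped source and destination arrays. The NumPy sketch below illustrates this for the star graph; appending the reverse edges after the original ones is an assumption made for illustration, so check ``newg.edges()`` for the order DGL actually uses.

```python
import numpy as np

# Edge list of the directed star graph: center 0 points to leaves 1..5.
src = np.array([0, 0, 0, 0, 0])
dst = np.array([1, 2, 3, 4, 5])

# Every edge u -> v gets a companion edge v -> u.
bi_src = np.concatenate([src, dst])
bi_dst = np.concatenate([dst, src])

# The center node now also receives one edge from each leaf.
in_degree_of_center = int(np.sum(bi_dst == 0))
print(bi_src, bi_dst, in_degree_of_center)
```
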
Loading and Saving Graphs
-------------------------

You can save a graph or a list of graphs via ``dgl.save_graphs`` and
load them back with ``dgl.load_graphs``.




In [None]:
# Save graphs
dgl.save_graphs('graph.dgl', g)
dgl.save_graphs('graphs.dgl', [g, sg1, sg2])

# Load graphs
(g,), _ = dgl.load_graphs('graph.dgl')
print(g)
(g, sg1, sg2), _ = dgl.load_graphs('graphs.dgl')
print(g)
print(sg1)
print(sg2)


Node Classification with DGL
============================



In [None]:
import dgl
import torch
import torch.nn as nn
import torch.nn.functional as F

Overview of Node Classification with GNN
----------------------------------------

One of the most popular and widely adopted tasks on graph data is node
classification, where a model needs to predict the ground truth category
of each node. Before graph neural networks, many proposed methods used
either connectivity alone (such as DeepWalk or node2vec), or simple
combinations of connectivity and the node's own features.  GNNs, by
contrast, offer an opportunity to obtain node representations by
combining the connectivity and features of a *local neighborhood*.

[Kipf et al.](https://arxiv.org/abs/1609.02907) is an example that
formulates the node classification problem as a semi-supervised node
classification task. With the help of only a small portion of labeled
nodes, a graph neural network (GNN) can accurately predict the node
category of the others.

This tutorial will show how to build such a GNN for semi-supervised node
classification with only a small number of labels on the **Cora
dataset**,
a citation network with papers as nodes and citations as edges. The task
is to predict the category of a given paper. Each paper node contains a
word count vector as its features, normalized so that they sum up to one,
as described in Section 5.2 of [the paper](https://arxiv.org/abs/1609.02907).

Loading Cora Dataset
--------------------




In [None]:
import dgl.data

dataset = dgl.data.CoraGraphDataset()
print('Number of categories:', dataset.num_classes)

A DGL Dataset object may contain one or multiple graphs. The Cora
dataset used in this tutorial consists of a single graph.




In [None]:
g = dataset[0]

Here, `g` is a `DGLGraph` object. A `DGLGraph` represents a graph.

In [None]:
# Get the number of nodes
print('Number of nodes:', g.num_nodes())
# Get the number of edges
print('Number of edges:', g.num_edges())

A `DGLGraph` stores node features and edge features in two
dictionary-like attributes called ``ndata`` and ``edata``.
In the DGL Cora dataset, the graph contains the following node features:

- ``train_mask``: A boolean tensor indicating whether the node is in the
  training set.

- ``val_mask``: A boolean tensor indicating whether the node is in the
  validation set.

- ``test_mask``: A boolean tensor indicating whether the node is in the
  test set.

- ``label``: The ground truth node category.

-  ``feat``: The node features.




In [None]:
print('Node feature names:', g.ndata.keys())
print('Edge feature names:', g.edata.keys())

In [None]:
g.ndata['train_mask']

In [None]:
print('Number of training nodes:', int(g.ndata['train_mask'].int().sum()))

In [None]:
g.ndata['label']

In [None]:
print('Node feature tensor shape:', g.ndata['feat'].shape)

Defining a Graph Convolutional Network (GCN)
--------------------------------------------

This tutorial will build a two-layer [Graph Convolutional Network (GCN)](http://tkipf.github.io/graph-convolutional-networks/). Each
layer computes new node representations by aggregating neighbor
information.

![img](https://tkipf.github.io/graph-convolutional-networks/images/gcn_web.png)

To build a multi-layer GCN you can simply stack ``dgl.nn.GraphConv``
modules, which inherit from ``torch.nn.Module``.




In [None]:
from dgl.nn import GraphConv

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

# Create the model with given dimensions
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)

DGL provides implementations of many popular neighbor aggregation
modules. You can easily invoke them with one line of code.

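For intuition about what one graph convolution computes, the sketch below writes the propagation rule from the GCN paper, $H' = \hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2} H W$ with $\hat{A} = A + I$, in plain NumPy on a dense adjacency matrix. The function `gcn_layer` and the toy graph are illustrative assumptions; ``GraphConv`` works on sparse graphs and its defaults (for example, whether self-loops are added) may differ.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One propagation step: normalize, aggregate neighbors, project."""
    # Add self-loops so that each node keeps its own features.
    adj_hat = adj + np.eye(adj.shape[0])
    # Symmetric normalization D^{-1/2} A_hat D^{-1/2}.
    deg = adj_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = adj_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return norm_adj @ feats @ weight

# Toy undirected path graph 0 - 1 - 2 with 2-dimensional features.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
feats = np.array([[1., 0.], [0., 1.], [1., 1.]])
weight = np.eye(2)  # identity projection keeps the example easy to follow
out = gcn_layer(adj, feats, weight)
print(out)
```

A two-layer GCN applies this step twice with a nonlinearity in between, which is what the ``GCN`` class above does with ``GraphConv``.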



Training the GCN
----------------

Training this GCN is similar to training other PyTorch neural networks.




In [None]:
def train(g, model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    best_val_acc = 0
    best_test_acc = 0

    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']

    for e in range(100):
        # Forward
        logits = model(g, features)

        # Compute prediction
        pred = logits.argmax(1)

        # Compute loss
        # Note that you should only compute the losses of the nodes in the training set.
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])

        # Compute accuracy on training/validation/test
        train_acc = (pred[train_mask] == labels[train_mask]).float().mean()
        val_acc = (pred[val_mask] == labels[val_mask]).float().mean()
        test_acc = (pred[test_mask] == labels[test_mask]).float().mean()

        # Save the best validation accuracy and the corresponding test accuracy.
        if best_val_acc < val_acc:
            best_val_acc = val_acc
            best_test_acc = test_acc

        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if e % 5 == 0:
            print('In epoch {}, loss: {:.3f}, val acc: {:.3f} (best {:.3f}), test acc: {:.3f} (best {:.3f})'.format(
                e, loss, val_acc, best_val_acc, test_acc, best_test_acc))

model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)
train(g, model)

Training on GPU
---------------

Training on GPU requires putting both the model and the graph onto the GPU
with the ``to`` method, similar to what you would do in PyTorch.

In [None]:
print('Before:', g.device)
g = g.to('cuda')
print('After:', g.device)

This also copies all the `ndata` and `edata` to the GPU.

In [None]:
print(g.ndata['train_mask'].device)

Train it on GPU.

In [None]:
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes).to('cuda')
train(g, model)


Write your own GNN module
=========================

In [None]:
import dgl
import torch
import torch.nn as nn
import torch.nn.functional as F

Message passing and GNNs
------------------------

DGL follows the *message passing paradigm* inspired by the Message
Passing Neural Network proposed by [Gilmer et
al.](https://arxiv.org/abs/1704.01212). Essentially, they found that many
GNN models can fit into the following framework:

$$
m_{u\to v}^{(l)} = M^{(l)}\left(h_v^{(l-1)}, h_u^{(l-1)}, e_{u\to v}^{(l-1)}\right);\\
m_{v}^{(l)} = \sum_{u\in\mathcal{N}(v)}m_{u\to v}^{(l)};\\
h_v^{(l)} = U^{(l)}\left(h_v^{(l-1)}, m_v^{(l)}\right)
$$

where DGL calls $M^{(l)}$ the *message function*, $\sum$ the
*reduce function* and $U^{(l)}$ the *update function*. Note that
$\sum$ here can represent any function and is not necessarily a
summation.

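The three equations can be sketched on a plain edge list as follows. This is only an illustration of the framework in NumPy, not DGL code: `message_passing_step` is a hypothetical helper, the reduce step is fixed to a sum, and edge features are left out for brevity.

```python
import numpy as np

def message_passing_step(src, dst, h, message_fn, update_fn):
    """Apply message, sum-reduce and update functions once over all edges."""
    # Message function M: one message per edge u -> v.
    messages = message_fn(h[dst], h[src])
    # Reduce function: sum the messages arriving at each destination node.
    aggregated = np.zeros((h.shape[0], messages.shape[1]))
    np.add.at(aggregated, dst, messages)
    # Update function U: combine the old state with the aggregated message.
    return update_fn(h, aggregated)

# Toy directed graph with edges 0 -> 2 and 1 -> 2.
src = np.array([0, 1])
dst = np.array([2, 2])
h = np.array([[1.0], [2.0], [4.0]])

new_h = message_passing_step(
    src, dst, h,
    message_fn=lambda h_v, h_u: h_u,       # send the source state unchanged
    update_fn=lambda h_v, m_v: h_v + m_v,  # add the aggregate to the old state
)
print(new_h)
```

Node 2 receives the states of nodes 0 and 1, while nodes 0 and 1 have no incoming edges and keep their old state.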



For example, the [GraphSAGE convolution (Hamilton et al.,
2017)](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf)
takes the following mathematical form:

$$
h_{\mathcal{N}(v)}^k\leftarrow \text{Average}\{h_u^{k-1},\forall u\in\mathcal{N}(v)\}\\
h_v^k\leftarrow \text{ReLU}\left(W^k\cdot \text{CONCAT}(h_v^{k-1}, h_{\mathcal{N}(v)}^k) \right)
$$

You can see that message passing is directional: the message sent from
one node $u$ to another node $v$ is not necessarily the same
as the message sent from node $v$ to node $u$ in the
opposite direction.

Although DGL has builtin support for GraphSAGE via `dgl.nn.pytorch.SAGEConv`,
here is how you can implement the GraphSAGE convolution in DGL on your own.

In [None]:
import dgl.function as fn

class SAGEConv(nn.Module):
    """Graph convolution module used by the GraphSAGE model.

    Parameters
    ----------
    in_feat : int
        Input feature size.
    out_feat : int
        Output feature size.
    """
    def __init__(self, in_feat, out_feat):
        super(SAGEConv, self).__init__()
        # A linear submodule for projecting the input and neighbor feature to the output.
        self.linear = nn.Linear(in_feat * 2, out_feat)

    def forward(self, g, h):
        """Forward computation

        Parameters
        ----------
        g : Graph
            The input graph.
        h : Tensor
            The input node feature.
        """
        with g.local_scope():
            g.ndata['h'] = h
            # update_all is a message passing API.
            g.update_all(message_func=fn.copy_u('h', 'm'), reduce_func=fn.mean('m', 'h_N'))
            h_N = g.ndata['h_N']
            h_total = torch.cat([h, h_N], dim=1)
            return self.linear(h_total)

The central piece in this code is the `g.update_all`
function, which gathers and averages the neighbor features. There are
three concepts here:

* Message function ``fn.copy_u('h', 'm')`` that
  copies the node feature under name ``'h'`` as *messages* sent to
  neighbors.

* Reduce function ``fn.mean('m', 'h_N')`` that averages
  all the received messages under name ``'m'`` and saves the result as a
  new node feature ``'h_N'``.

* ``update_all`` tells DGL to trigger the
  message and reduce functions for all the nodes and edges.

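Outside DGL, the same copy-then-average computation can be sketched with NumPy on an edge list. The helper `mean_neighbor_features` is hypothetical, and giving nodes without incoming edges a zero vector is an assumed convention for this sketch.

```python
import numpy as np

def mean_neighbor_features(src, dst, h):
    """Average the features of the in-neighbors of every node."""
    num_nodes = h.shape[0]
    # copy_u: the message on edge u -> v is the source feature h[u].
    messages = h[src]
    # mean: sum the messages per destination, then divide by the in-degree.
    summed = np.zeros_like(h)
    np.add.at(summed, dst, messages)
    in_deg = np.bincount(dst, minlength=num_nodes)
    # Nodes without incoming edges keep a zero vector (assumed convention).
    return summed / np.maximum(in_deg, 1)[:, None]

# Toy graph with edges 0 -> 2, 1 -> 2 and 2 -> 0.
src = np.array([0, 1, 2])
dst = np.array([2, 2, 0])
h = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]])

h_N = mean_neighbor_features(src, dst, h)
h_total = np.concatenate([h, h_N], axis=1)  # CONCAT(h_v, h_N(v))
print(h_N)
```

The concatenated `h_total` is what the linear layer in the module above projects to the output size.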



Afterwards, you can stack your own GraphSAGE convolution layers to form
a multi-layer GraphSAGE network.




In [None]:
class Model(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(Model, self).__init__()
        self.conv1 = SAGEConv(in_feats, h_feats)
        self.conv2 = SAGEConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

### Training loop

The following code for data loading and training loop is directly copied
from the introduction tutorial.




In [None]:
import dgl.data

dataset = dgl.data.CoraGraphDataset()
g = dataset[0]

def train(g, model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    all_logits = []
    best_val_acc = 0
    best_test_acc = 0

    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']
    for e in range(200):
        # Forward
        logits = model(g, features)

        # Compute prediction
        pred = logits.argmax(1)

        # Compute loss
        # Note that we should only compute the losses of the nodes in the training set,
        # i.e. with train_mask 1.
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])

        # Compute accuracy on training/validation/test
        train_acc = (pred[train_mask] == labels[train_mask]).float().mean()
        val_acc = (pred[val_mask] == labels[val_mask]).float().mean()
        test_acc = (pred[test_mask] == labels[test_mask]).float().mean()

        # Save the best validation accuracy and the corresponding test accuracy.
        if best_val_acc < val_acc:
            best_val_acc = val_acc
            best_test_acc = test_acc

        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        all_logits.append(logits.detach())

        if e % 5 == 0:
            print('In epoch {}, loss: {:.3f}, val acc: {:.3f} (best {:.3f}), test acc: {:.3f} (best {:.3f})'.format(
                e, loss, val_acc, best_val_acc, test_acc, best_test_acc))

model = Model(g.ndata['feat'].shape[1], 16, dataset.num_classes)
train(g, model)

More customization
------------------

In DGL, we provide many built-in message and reduce functions under the
``dgl.function`` package. You can find more details in [the API
documentation](https://docs.dgl.ai/api/python/dgl.function.html).

These APIs allow one to quickly implement new graph convolution modules.
For example, the following implements a new ``SAGEConv`` that aggregates
neighbor representations using an edge-weighted average: each neighbor
representation is multiplied by its edge weight before averaging. Note
that the ``edata`` member can hold edge features, which can also take
part in message passing.




In [None]:
class WeightedSAGEConv(nn.Module):
    """Graph convolution module used by the GraphSAGE model with edge weights.

    Parameters
    ----------
    in_feat : int
        Input feature size.
    out_feat : int
        Output feature size.
    """
    def __init__(self, in_feat, out_feat):
        super(WeightedSAGEConv, self).__init__()
        # A linear submodule for projecting the input and neighbor feature to the output.
        self.linear = nn.Linear(in_feat * 2, out_feat)

    def forward(self, g, h, w):
        """Forward computation

        Parameters
        ----------
        g : Graph
            The input graph.
        h : Tensor
            The input node feature.
        w : Tensor
            The edge weight.
        """
        with g.local_scope():
            g.ndata['h'] = h
            g.edata['w'] = w
            g.update_all(message_func=fn.u_mul_e('h', 'w', 'm'), reduce_func=fn.mean('m', 'h_N'))
            h_N = g.ndata['h_N']
            h_total = torch.cat([h, h_N], dim=1)
            return self.linear(h_total)

Because the graph in this dataset does not have edge weights, we
manually assign all edge weights to one in the ``forward()`` function of
the model. You can replace it with your own edge weights.




In [None]:
class Model(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(Model, self).__init__()
        self.conv1 = WeightedSAGEConv(in_feats, h_feats)
        self.conv2 = WeightedSAGEConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat, torch.ones(g.num_edges()).to(g.device))
        h = F.relu(h)
        h = self.conv2(g, h, torch.ones(g.num_edges()).to(g.device))
        return h

model = Model(g.ndata['feat'].shape[1], 16, dataset.num_classes)
train(g, model)

Even more customization by user-defined function
------------------------------------------------

DGL allows user-defined message and reduce functions for maximal
expressiveness. Here is a user-defined message function that is
equivalent to ``fn.u_mul_e('h', 'w', 'm')``.




In [None]:
def u_mul_e_udf(edges):
    return {'m' : edges.src['h'] * edges.data['w']}

``edges`` has three members: ``src``, ``data`` and ``dst``, representing
the source node feature, edge feature, and destination node feature for
all edges.




You can also write your own reduce function. For example, the following
is equivalent to the builtin ``fn.sum('m', 'h')`` function that sums up
the incoming messages:




In [None]:
def sum_udf(nodes):
    return {'h': nodes.mailbox['m'].sum(1)}

In short, DGL will group the nodes by their in-degrees, and for each
group DGL stacks the incoming messages along the second dimension. You
can then perform a reduction along the second dimension to aggregate
messages.

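The grouping idea can be sketched in NumPy as below. This is an illustration of degree bucketing under simplified assumptions, not DGL's implementation; `sum_by_degree_bucket` is a hypothetical helper.

```python
import numpy as np
from collections import defaultdict

def sum_by_degree_bucket(src, dst, h):
    """Sum incoming messages, processing nodes grouped by in-degree."""
    out = np.zeros_like(h)
    incoming = defaultdict(list)
    for u, v in zip(src.tolist(), dst.tolist()):
        incoming[v].append(u)
    # Group destination nodes that share the same in-degree.
    buckets = defaultdict(list)
    for v, sources in incoming.items():
        buckets[len(sources)].append(v)
    for degree, nodes in buckets.items():
        # mailbox has shape (nodes in bucket, degree, feature size).
        mailbox = np.stack([h[incoming[v]] for v in nodes])
        # Reduce along the second dimension, like nodes.mailbox['m'].sum(1).
        out[nodes] = mailbox.sum(axis=1)
    return out

# Toy graph with edges 0 -> 2, 1 -> 2, 2 -> 0 and 0 -> 1.
src = np.array([0, 1, 2, 0])
dst = np.array([2, 2, 0, 1])
h = np.array([[1.0], [2.0], [4.0]])
reduced = sum_by_degree_bucket(src, dst, h)
print(reduced)
```
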
For more details on customizing message and reduce functions with
user-defined functions, please refer to the [API
reference](https://docs.dgl.ai/api/python/udf.html).




Best practice of writing custom GNN modules
-------------------------------------------

DGL recommends the following practices, ranked by preference:

-  Use ``dgl.nn`` modules.
-  Use ``dgl.nn.functional`` functions which contain lower-level complex
   operations such as computing a softmax for each node over incoming
   edges.
-  Use ``update_all`` with builtin message and reduce functions.
-  Use user-defined message or reduce functions.





Link Prediction using Graph Neural Networks
===========================================

In [None]:
import dgl
import torch
import torch.nn as nn
import torch.nn.functional as F
import itertools
import numpy as np
import scipy.sparse as sp

Overview of Link Prediction with GNN
------------------------------------

Many applications such as social recommendation, item recommendation,
knowledge graph completion, etc., can be formulated as link prediction,
which predicts whether an edge exists between two particular nodes. This
tutorial shows an example of predicting whether a citation relationship,
either citing or being cited, between two papers exists in a citation
network.

This tutorial formulates the link prediction problem as a binary classification
problem as follows:

-  Treat the edges in the graph as *positive examples*.
-  Sample a number of non-existent edges (i.e. node pairs with no edges
   between them) as *negative* examples.
-  Divide the positive examples and negative examples into a training
   set and a test set.
-  Evaluate the model with any binary classification metric such as Area
   Under Curve (AUC).

<div class="alert alert-info">
    
**Note**: The practice comes from
   [SEAL](https://papers.nips.cc/paper/2018/file/53f0d7c537d99b3824f0f99d62ea2428-Paper.pdf),
   although the model here does not use their idea of node labeling.

</div>

In some domains such as large-scale recommender systems or information
retrieval, you may favor metrics that emphasize good performance of
top-K predictions. In these cases you may want to consider other metrics
such as mean average precision, and use other negative sampling methods,
which are beyond the scope of this tutorial.

Loading graph and features
--------------------------

Following the [introduction](1_introduction.ipynb), this tutorial
first loads the Cora dataset.




In [None]:
import dgl.data

dataset = dgl.data.CoraGraphDataset()
g = dataset[0]

Prepare training and testing sets
---------------------------------

This tutorial randomly picks 10% of the edges as positive examples in
the test set, and leaves the rest for the training set. It then samples
the same number of non-existent edges as negative examples in both sets.




In [None]:
# Split edge set for training and testing
u, v = g.edges()

eids = np.arange(g.number_of_edges())
eids = np.random.permutation(eids)
test_size = int(len(eids) * 0.1)
train_size = g.number_of_edges() - test_size
test_pos_u, test_pos_v = u[eids[:test_size]], v[eids[:test_size]]
train_pos_u, train_pos_v = u[eids[test_size:]], v[eids[test_size:]]

# Find all negative edges and split them for training and testing.
# Passing the shape explicitly keeps the matrix N x N even if the
# highest-numbered nodes have no edges.
num_nodes = g.number_of_nodes()
adj = sp.coo_matrix((np.ones(len(u)), (u.numpy(), v.numpy())), shape=(num_nodes, num_nodes))
adj_neg = 1 - adj.todense() - np.eye(num_nodes)
neg_u, neg_v = np.where(adj_neg != 0)

# Sample as many negative examples as there are positive examples.
neg_eids = np.random.choice(len(neg_u), g.number_of_edges())
test_neg_u, test_neg_v = neg_u[neg_eids[:test_size]], neg_v[neg_eids[:test_size]]
train_neg_u, train_neg_v = neg_u[neg_eids[test_size:]], neg_v[neg_eids[test_size:]]

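Enumerating every non-edge needs a dense N x N matrix, which is fine for Cora but grows quadratically with the number of nodes. As an alternative that is not part of this tutorial, negatives can be drawn by rejection sampling. The sketch below uses a hypothetical helper `sample_negative_edges` and a toy edge list.

```python
import numpy as np

def sample_negative_edges(src, dst, num_nodes, num_samples, seed=0):
    """Draw distinct node pairs that are neither existing edges nor self-loops."""
    rng = np.random.default_rng(seed)
    existing = set(zip(src.tolist(), dst.tolist()))
    negatives = set()
    # Assumes num_samples does not exceed the number of available non-edges;
    # otherwise this loop would never terminate.
    while len(negatives) < num_samples:
        u, v = (int(x) for x in rng.integers(0, num_nodes, size=2))
        if u != v and (u, v) not in existing:
            negatives.add((u, v))
    neg_u, neg_v = zip(*sorted(negatives))
    return np.array(neg_u), np.array(neg_v)

# Toy directed cycle 0 -> 1 -> 2 -> 0 plus an isolated node 3.
src = np.array([0, 1, 2])
dst = np.array([1, 2, 0])
neg_u, neg_v = sample_negative_edges(src, dst, num_nodes=4, num_samples=5)
print(neg_u, neg_v)
```
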
When training, you will need to remove the edges in the test set from
the original graph. You can do this via ``dgl.remove_edges``.

<div class="alert alert-info">
    
**Note**: ``dgl.remove_edges`` works by creating a subgraph from the
   original graph. This results in a copy and can therefore be slow for
   large graphs. If so, you could save the training and test graphs to
   disk, as you would do for preprocessing.

</div>




In [None]:
train_g = dgl.remove_edges(g, eids[:test_size])

Define a GraphSAGE model
------------------------

This tutorial builds a model consisting of two
[GraphSAGE](https://arxiv.org/abs/1706.02216) layers, each of which
computes new node representations by averaging neighbor information. DGL
provides ``dgl.nn.SAGEConv``, which conveniently creates a GraphSAGE layer.




In [None]:
from dgl.nn import SAGEConv

# ----------- 2. create model -------------- #
# build a two-layer GraphSAGE model
class GraphSAGE(nn.Module):
    def __init__(self, in_feats, h_feats):
        super(GraphSAGE, self).__init__()
        self.conv1 = SAGEConv(in_feats, h_feats, 'mean')
        self.conv2 = SAGEConv(h_feats, h_feats, 'mean')

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

The model then predicts the probability of existence of an edge by
computing a score between the representations of both incident nodes
with a function (e.g. an MLP or a dot product), which you will see in
the next section.

\begin{align}\hat{y}_{u\sim v} = f(h_u, h_v)\end{align}




Positive graph, negative graph, and ``apply_edges``
---------------------------------------------------

In previous tutorials you have learned how to compute node
representations with a GNN. However, link prediction requires you to
compute representation of *pairs of nodes*.

DGL recommends treating the pairs of nodes as another graph, since
you can describe a pair of nodes with an edge. In link prediction, you
will have a *positive graph* consisting of all the positive examples as
edges, and a *negative graph* consisting of all the negative examples.
The *positive graph* and the *negative graph* will contain the same set
of nodes as the original graph.  This makes it easier to pass node
features among multiple graphs for computation.  As you will see later,
you can directly feed the node representations computed on the entire
graph to the positive and the negative graphs for computing pair-wise
scores.

The following code constructs the positive graph and the negative graph
for the training set and the test set respectively.




In [None]:
train_pos_g = dgl.graph((train_pos_u, train_pos_v), num_nodes=g.number_of_nodes())
train_neg_g = dgl.graph((train_neg_u, train_neg_v), num_nodes=g.number_of_nodes())

test_pos_g = dgl.graph((test_pos_u, test_pos_v), num_nodes=g.number_of_nodes())
test_neg_g = dgl.graph((test_neg_u, test_neg_v), num_nodes=g.number_of_nodes())

The benefit of treating the pairs of nodes as a graph is that you can
use the ``DGLGraph.apply_edges`` method, which conveniently computes new
edge features based on the incident nodes’ features and the original
edge features (if applicable).

DGL provides a set of optimized builtin functions to compute new
edge features based on the original node/edge features. For example,
``dgl.function.u_dot_v`` computes a dot product of the incident nodes’
representations for each edge.




In [None]:
import dgl.function as fn

class DotPredictor(nn.Module):
    def forward(self, g, h):
        with g.local_scope():
            g.ndata['h'] = h
            # Compute a new edge feature named 'score' by a dot-product between the
            # source node feature 'h' and destination node feature 'h'.
            g.apply_edges(fn.u_dot_v('h', 'h', 'score'))
            # u_dot_v returns a 1-element vector for each edge so you need to squeeze it.
            return g.edata['score'][:, 0]

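In array terms, the dot-product score of an edge is the sum of the element-wise product of its two endpoint representations. The NumPy sketch below shows this on made-up values; it illustrates the arithmetic only and does not reproduce the extra trailing dimension that ``u_dot_v`` returns.

```python
import numpy as np

# Hypothetical 2-dimensional representations for three nodes.
h = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [0.0, 1.0]])

# Candidate pairs stored as edges 0 -> 1 and 1 -> 2.
src = np.array([0, 1])
dst = np.array([1, 2])

# One score per edge: the dot product of h[src] and h[dst], row by row.
scores = (h[src] * h[dst]).sum(axis=1)
print(scores)
```
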
You can also write your own function if the computation is more complex.
For instance, the following module produces a scalar score on each edge
by concatenating the incident nodes’ features and passing it to an MLP.




In [None]:
class MLPPredictor(nn.Module):
    def __init__(self, h_feats):
        super().__init__()
        self.W1 = nn.Linear(h_feats * 2, h_feats)
        self.W2 = nn.Linear(h_feats, 1)

    def apply_edges(self, edges):
        """
        Computes a scalar score for each edge of the given graph.

        Parameters
        ----------
        edges :
            Has three members ``src``, ``dst`` and ``data``, each of
            which is a dictionary representing the features of the
            source nodes, the destination nodes, and the edges
            themselves.

        Returns
        -------
        dict
            A dictionary of new edge features.
        """
        h = torch.cat([edges.src['h'], edges.dst['h']], 1)
        return {'score': self.W2(F.relu(self.W1(h))).squeeze(1)}

    def forward(self, g, h):
        with g.local_scope():
            g.ndata['h'] = h
            g.apply_edges(self.apply_edges)
            return g.edata['score']

<div class="alert alert-info">
    
**Note**: The builtin functions are optimized for both speed and memory.
   We recommend using builtin functions whenever possible.

</div>

<div class="alert alert-info">
    
**Note**: If you have read the [message passing
   tutorial](3_message_passing.ipynb), you will notice that the
   function passed to ``apply_edges`` has exactly the same form as a
   message function in ``update_all``.

</div>




Training loop
-------------

After you have defined the node representation computation and the edge
score computation, you can go ahead and define the overall model, loss
function, and evaluation metric.

The loss function is simply binary cross entropy loss.

\begin{align}\mathcal{L} = -\sum_{u\sim v\in \mathcal{D}}\left( y_{u\sim v}\log(\hat{y}_{u\sim v}) + (1-y_{u\sim v})\log(1-\hat{y}_{u\sim v}) \right)\end{align}

The evaluation metric in this tutorial is AUC.

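Both quantities can be written out in a few lines of NumPy. The sketch below is illustrative only: `bce_with_logits` applies the sigmoid implicitly and averages over examples, as `F.binary_cross_entropy_with_logits` does by default (the formula above sums instead), and `pairwise_auc` computes AUC as the fraction of correctly ordered positive/negative pairs. Both function names and the scores are made up for this example.

```python
import numpy as np

def bce_with_logits(scores, labels):
    """Mean binary cross-entropy computed directly from raw scores."""
    # max(x, 0) - x * y + log(1 + exp(-|x|)) is a numerically stable form.
    return np.mean(np.maximum(scores, 0) - scores * labels
                   + np.log1p(np.exp(-np.abs(scores))))

def pairwise_auc(pos_score, neg_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    diff = pos_score[:, None] - neg_score[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Made-up raw scores for three positive and three negative examples.
pos_score = np.array([2.0, 0.5, -1.0])
neg_score = np.array([-2.0, 0.0, 1.0])
scores = np.concatenate([pos_score, neg_score])
labels = np.concatenate([np.ones(3), np.zeros(3)])

loss = bce_with_logits(scores, labels)
auc = pairwise_auc(pos_score, neg_score)
print(loss, auc)
```

The pairwise form is quadratic in the number of examples, so it is meant for understanding the metric rather than for large test sets.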



In [None]:
# roc_auc_score is used by compute_auc below.
from sklearn.metrics import roc_auc_score

model = GraphSAGE(train_g.ndata['feat'].shape[1], 16)
# You can replace DotPredictor with MLPPredictor.
#pred = MLPPredictor(16)
pred = DotPredictor()

def compute_loss(pos_score, neg_score):
    scores = torch.cat([pos_score, neg_score])
    labels = torch.cat([torch.ones(pos_score.shape[0]), torch.zeros(neg_score.shape[0])])
    return F.binary_cross_entropy_with_logits(scores, labels)

def compute_auc(pos_score, neg_score):
    scores = torch.cat([pos_score, neg_score]).numpy()
    labels = torch.cat(
        [torch.ones(pos_score.shape[0]), torch.zeros(neg_score.shape[0])]).numpy()
    return roc_auc_score(labels, scores)

The training loop goes as follows:

<div class="alert alert-info">
    
**Note**: This tutorial does not include evaluation on a validation
   set. In practice you should save and evaluate the best model based on
   performance on the validation set.

</div>




In [None]:
# ----------- 3. set up loss and optimizer -------------- #
# In this case, the loss is computed inside the training loop.
optimizer = torch.optim.Adam(itertools.chain(model.parameters(), pred.parameters()), lr=0.01)

# ----------- 4. training -------------------------------- #
all_logits = []
for e in range(100):
    # forward
    h = model(train_g, train_g.ndata['feat'])
    pos_score = pred(train_pos_g, h)
    neg_score = pred(train_neg_g, h)
    loss = compute_loss(pos_score, neg_score)

    # backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if e % 5 == 0:
        print('In epoch {}, loss: {}'.format(e, loss))

# ----------- 5. check results ------------------------ #
from sklearn.metrics import roc_auc_score
with torch.no_grad():
    pos_score = pred(test_pos_g, h)
    neg_score = pred(test_neg_g, h)
    print('AUC', compute_auc(pos_score, neg_score))


Training a GNN for Graph Classification
=======================================


In [None]:
import dgl
import torch
import torch.nn as nn
import torch.nn.functional as F

Overview of Graph Classification with GNN
-----------------------------------------

Graph classification or regression requires a model to predict certain
graph-level properties of a single graph given its node and edge
features.  Molecular property prediction is one particular application.

This tutorial shows how to train a graph classification model for a
small dataset from the paper [How Powerful Are Graph Neural
Networks](https://arxiv.org/abs/1810.00826).

Loading Data
------------




In [None]:
import dgl.data

# Load the PROTEINS dataset used in the GIN paper, adding a self-loop to every node.
dataset = dgl.data.GINDataset('PROTEINS', self_loop=True)

The dataset is a set of graphs, each with node features and a single
label. One can see the node feature dimensionality and the number of
possible graph categories of ``GINDataset`` objects in ``dim_nfeats``
and ``gclasses`` attributes.




In [None]:
print('Node feature dimensionality:', dataset.dim_nfeats)
print('Number of graph categories:', dataset.gclasses)

Defining Data Loader
--------------------

A graph classification dataset usually contains two types of elements: a
set of graphs, and their graph-level labels. Similar to an image
classification task, when the dataset is large enough, we need to train
with mini-batches. When you train a model for image classification or
language modeling, you will use a ``DataLoader`` to iterate over the
dataset. In DGL, you can use the ``GraphDataLoader``.

You can also use various dataset samplers provided in
[`torch.utils.data.sampler`](https://pytorch.org/docs/stable/data.html#data-loading-order-and-sampler).
For example, this tutorial creates a training ``GraphDataLoader`` and
test ``GraphDataLoader``, using ``SubsetRandomSampler`` to tell PyTorch
to sample from only a subset of the dataset.
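
To make the sampling idea concrete, here is a small plain-Python sketch of the behaviour described above: visit a fixed subset of indices in random order and cut that order into mini-batches. The helper names and the dataset size are hypothetical, and this is not how PyTorch implements ``SubsetRandomSampler`` internally.

```python
import random

def subset_random_order(indices, seed=None):
    # Return the given indices in random order, without replacement.
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return shuffled

def batches(order, batch_size):
    # Cut the order into consecutive chunks; the last chunk may be
    # smaller, which corresponds to drop_last=False.
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

num_examples = 23                      # hypothetical dataset size
num_train = int(num_examples * 0.8)    # first 80% for training
train_indices = range(num_train)
test_indices = range(num_train, num_examples)

train_batches = list(batches(subset_random_order(train_indices, seed=0), batch_size=5))
print(train_batches)
```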




In [None]:
from dgl.dataloading import GraphDataLoader
from torch.utils.data.sampler import SubsetRandomSampler

num_examples = len(dataset)
num_train = int(num_examples * 0.8)

train_sampler = SubsetRandomSampler(torch.arange(num_train))
test_sampler = SubsetRandomSampler(torch.arange(num_train, num_examples))

train_dataloader = GraphDataLoader(
    dataset, sampler=train_sampler, batch_size=5, drop_last=False)
test_dataloader = GraphDataLoader(
    dataset, sampler=test_sampler, batch_size=5, drop_last=False)

You can try to iterate over the created ``GraphDataLoader`` and see what it
gives:




In [None]:
it = iter(train_dataloader)
batch = next(it)
print(batch)

As each element in ``dataset`` has a graph and a label, the
``GraphDataLoader`` will return two objects for each iteration. The
first element is the batched graph, and the second element is simply a
label vector representing the category of each graph in the mini-batch.
Next, we’ll talk about the batched graph.

A Batched Graph in DGL
----------------------

In each mini-batch, the sampled graphs are combined into a single bigger
batched graph via ``dgl.batch``. The batched graph contains all the
original graphs as separate, mutually disconnected components, with
their node and edge features concatenated. The batched graph is itself a
``DGLGraph`` instance, so you can still treat it as a normal ``DGLGraph``
object as in [here](2_dglgraph.ipynb). It also keeps the information
necessary for recovering the original graphs, such as the number of
nodes and edges of each graph element.
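
The idea behind batching can be sketched without DGL: shift the node IDs of each graph by the number of nodes that came before it, then concatenate the edge lists. The function below is a conceptual illustration on plain lists, not DGL's implementation, and its name and input format are assumptions made for this example.

```python
def batch_edge_lists(graphs):
    # graphs: list of (num_nodes, src_list, dst_list) triples.
    batched_src, batched_dst = [], []
    batch_num_nodes, batch_num_edges = [], []
    offset = 0
    for num_nodes, src, dst in graphs:
        # Relabel nodes so that IDs of different graphs never collide.
        batched_src.extend(u + offset for u in src)
        batched_dst.extend(v + offset for v in dst)
        batch_num_nodes.append(num_nodes)
        batch_num_edges.append(len(src))
        offset += num_nodes
    return offset, batched_src, batched_dst, batch_num_nodes, batch_num_edges

g1 = (3, [0, 1], [1, 2])   # path 0 -> 1 -> 2
g2 = (2, [0], [1])         # single edge 0 -> 1
total_nodes, src, dst, nodes_per_graph, edges_per_graph = batch_edge_lists([g1, g2])
print(total_nodes, src, dst, nodes_per_graph, edges_per_graph)
```

Because no edge crosses between the shifted ID ranges, message passing on the batched graph never mixes information from different graphs, and the per-graph node counts are enough to split the result apart again.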




In [None]:
batched_graph, labels = batch
print('Number of nodes for each graph element in the batch:', batched_graph.batch_num_nodes())
print('Number of edges for each graph element in the batch:', batched_graph.batch_num_edges())

# Recover the original graph elements from the minibatch
graphs = dgl.unbatch(batched_graph)
print('The original graphs in the minibatch:')
print(graphs)

Define Model
------------

This tutorial will build a two-layer [Graph Convolutional Network
(GCN)](http://tkipf.github.io/graph-convolutional-networks/). Each of
its layers computes new node representations by aggregating neighbor
information. If you have gone through the
[introduction](1_introduction.ipynb), you will notice two
differences:

-  Since the task is to predict a single category for the *entire graph*
   instead of for every node, you will need to aggregate the
   representations of all the nodes and potentially the edges to form a
   graph-level representation. Such a process is commonly referred to as
   a *readout*. A simple choice is to average the node features of a
   graph with ``dgl.mean_nodes()``.

-  The input graph to the model will be a batched graph yielded by the
   ``GraphDataLoader``. The readout functions provided by DGL can handle
   batched graphs so that they will return one representation for each
   minibatch element.
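
As a rough illustration of a mean readout over a batched graph, the sketch below averages node features segment by segment, using the per-graph node counts to find the segment boundaries. It works on plain Python lists with made-up features and assumes every graph has at least one node. The function name `mean_readout` is illustrative; in the model itself this role is played by ``dgl.mean_nodes()``.

```python
def mean_readout(node_feats, batch_num_nodes):
    # node_feats: one feature vector per node of the batched graph,
    # ordered graph by graph. Returns one mean vector per graph.
    readouts = []
    start = 0
    for n in batch_num_nodes:
        segment = node_feats[start:start + n]
        dim = len(segment[0])
        readouts.append([sum(row[d] for row in segment) / n for d in range(dim)])
        start += n
    return readouts

# Hypothetical features: 3 nodes in the first graph, 2 in the second.
node_feats = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
graph_reprs = mean_readout(node_feats, [3, 2])
print(graph_reprs)
```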




In [None]:
from dgl.nn import GraphConv

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        g.ndata['h'] = h
        return dgl.mean_nodes(g, 'h')

Training Loop
-------------

The training loop iterates over the training set with the
``GraphDataLoader`` object and computes the gradients, just like
image classification or language modeling.




In [None]:
# Create the model with given dimensions
model = GCN(dataset.dim_nfeats, 16, dataset.gclasses)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    for batched_graph, labels in train_dataloader:
        pred = model(batched_graph, batched_graph.ndata['attr'].float())
        loss = F.cross_entropy(pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Evaluate on the test set. Gradients are not needed here, so disable autograd.
num_correct = 0
num_tests = 0
with torch.no_grad():
    for batched_graph, labels in test_dataloader:
        pred = model(batched_graph, batched_graph.ndata['attr'].float())
        num_correct += (pred.argmax(1) == labels).sum().item()
        num_tests += len(labels)

print('Test accuracy:', num_correct / num_tests)

``DGLDataset`` Object Overview
------------------------------

Your custom graph dataset should inherit the ``dgl.data.DGLDataset``
class and implement the following methods:

-  ``__getitem__(self, i)``: retrieve the ``i``-th example of the
   dataset. An example often contains a single DGL graph, and
   occasionally its label.
-  ``__len__(self)``: the number of examples in the dataset.
-  ``process(self)``: load and process raw data from disk.




Creating a Dataset for Node Classification or Link Prediction from CSV
----------------------------------------------------------------------

A node classification dataset often consists of a single graph, as well
as its node and edge features.

This tutorial takes a small dataset based on [Zachary’s Karate Club
network](https://en.wikipedia.org/wiki/Zachary%27s_karate_club). It
contains

* A ``members.csv`` file containing the attributes of all
  members.

* An ``interactions.csv`` file
  containing the pair-wise interactions between two club members.




In [None]:
import urllib.request
import pandas as pd
urllib.request.urlretrieve(
    'https://data.dgl.ai/tutorial/dataset/members.csv', './members.csv')
urllib.request.urlretrieve(
    'https://data.dgl.ai/tutorial/dataset/interactions.csv', './interactions.csv')

members = pd.read_csv('./members.csv')
members.head()

interactions = pd.read_csv('./interactions.csv')
interactions.head()

This tutorial treats the members as nodes and interactions as edges. It
takes age as a numeric feature of the nodes, affiliated club as the label
of the nodes, and edge weight as a numeric feature of the edges.

<div class="alert alert-info">
    
**Note**: The original Zachary’s Karate Club network does not have
   member ages. The ages in this tutorial are generated synthetically
   for demonstrating how to add node features into the graph for dataset
   creation.

</div>

<div class="alert alert-info">
    
**Note**: In practice, taking age directly as a numeric feature may
   not work well in machine learning; strategies like binning or
   normalizing the feature would work better. This tutorial directly
   takes the values as-is for simplicity.

</div>
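
For reference, the two strategies mentioned in the note might look like the following sketch. The ages and bin edges are made up for illustration, and the helper names are hypothetical; the tutorial's dataset code uses the raw values instead.

```python
import statistics

def standardize(values):
    # Shift to zero mean and scale to unit (population) standard deviation.
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def bin_index(value, edges):
    # Index of the first bin edge the value falls below;
    # len(edges) if it is at or above every edge.
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)

ages = [18, 25, 32, 47, 60]             # made-up ages
z_scores = standardize(ages)
age_bins = [bin_index(a, edges=[20, 30, 40, 50]) for a in ages]
print(z_scores, age_bins)
```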




## Understanding DGL Datasets

In [None]:
import dgl
from dgl.data import DGLDataset
import torch
import os

class KarateClubDataset(DGLDataset):
    def __init__(self):
        super().__init__(name='karate_club')

    def process(self):
        nodes_data = pd.read_csv('./members.csv')
        edges_data = pd.read_csv('./interactions.csv')
        node_features = torch.from_numpy(nodes_data['Age'].to_numpy())
        node_labels = torch.from_numpy(nodes_data['Club'].astype('category').cat.codes.to_numpy())
        edge_features = torch.from_numpy(edges_data['Weight'].to_numpy())
        edges_src = torch.from_numpy(edges_data['Src'].to_numpy())
        edges_dst = torch.from_numpy(edges_data['Dst'].to_numpy())

        self.graph = dgl.graph((edges_src, edges_dst), num_nodes=nodes_data.shape[0])
        self.graph.ndata['feat'] = node_features
        self.graph.ndata['label'] = node_labels
        self.graph.edata['weight'] = edge_features

        # If your dataset is a node classification dataset, you will need to assign
        # masks indicating whether a node belongs to training, validation, and test set.
        n_nodes = nodes_data.shape[0]
        n_train = int(n_nodes * 0.6)
        n_val = int(n_nodes * 0.2)
        train_mask = torch.zeros(n_nodes, dtype=torch.bool)
        val_mask = torch.zeros(n_nodes, dtype=torch.bool)
        test_mask = torch.zeros(n_nodes, dtype=torch.bool)
        train_mask[:n_train] = True
        val_mask[n_train:n_train + n_val] = True
        test_mask[n_train + n_val:] = True
        self.graph.ndata['train_mask'] = train_mask
        self.graph.ndata['val_mask'] = val_mask
        self.graph.ndata['test_mask'] = test_mask

    def __getitem__(self, i):
        return self.graph

    def __len__(self):
        return 1

dataset = KarateClubDataset()
graph = dataset[0]

print(graph)

Since a link prediction dataset also involves only a single graph,
preparing one follows the same procedure as preparing a node
classification dataset.




Creating a Dataset for Graph Classification from CSV
----------------------------------------------------

Creating a graph classification dataset involves implementing
``__getitem__`` to return both the graph and its graph-level label.

This tutorial demonstrates how to create a graph classification dataset
with the following synthetic CSV data:

-  ``graph_edges.csv``: containing three columns:

   -  ``graph_id``: the ID of the graph.
   -  ``src``: the source node of an edge of the given graph.
   -  ``dst``: the destination node of an edge of the given graph.

-  ``graph_properties.csv``: containing three columns:

   -  ``graph_id``: the ID of the graph.
   -  ``label``: the label of the graph.
   -  ``num_nodes``: the number of nodes in the graph.
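
To illustrate the two-table layout, the sketch below parses a tiny inline sample with the standard library and groups the edges by graph ID. The rows are invented for this example and are not taken from the real files, which the tutorial downloads and reads with pandas.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in the same column layout as the two CSV files.
edges_csv = """graph_id,src,dst
0,0,1
0,1,2
1,0,1
"""
properties_csv = """graph_id,label,num_nodes
0,1,3
1,0,2
"""

# Group the edge rows by graph ID into (src_list, dst_list) pairs.
edges_by_graph = defaultdict(lambda: ([], []))
for row in csv.DictReader(io.StringIO(edges_csv)):
    src_list, dst_list = edges_by_graph[int(row['graph_id'])]
    src_list.append(int(row['src']))
    dst_list.append(int(row['dst']))

# Map each graph ID to its (label, num_nodes) pair.
properties = {
    int(row['graph_id']): (int(row['label']), int(row['num_nodes']))
    for row in csv.DictReader(io.StringIO(properties_csv))
}

for graph_id, (src, dst) in edges_by_graph.items():
    label, num_nodes = properties[graph_id]
    print(graph_id, src, dst, label, num_nodes)
```

Storing ``num_nodes`` separately matters because a node without any edges never appears in the edge table, so the node count cannot always be inferred from the edges alone.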




In [None]:
urllib.request.urlretrieve(
    'https://data.dgl.ai/tutorial/dataset/graph_edges.csv', './graph_edges.csv')
urllib.request.urlretrieve(
    'https://data.dgl.ai/tutorial/dataset/graph_properties.csv', './graph_properties.csv')
edges = pd.read_csv('./graph_edges.csv')
properties = pd.read_csv('./graph_properties.csv')

edges.head()

properties.head()

class SyntheticDataset(DGLDataset):
    def __init__(self):
        super().__init__(name='synthetic')

    def process(self):
        edges = pd.read_csv('./graph_edges.csv')
        properties = pd.read_csv('./graph_properties.csv')
        self.graphs = []
        self.labels = []

        # Create a graph for each graph ID from the edges table.
        # First process the properties table into two dictionaries with graph IDs as keys.
        # The label and number of nodes are values.
        label_dict = {}
        num_nodes_dict = {}
        for _, row in properties.iterrows():
            label_dict[row['graph_id']] = row['label']
            num_nodes_dict[row['graph_id']] = row['num_nodes']

        # For the edges, first group the table by graph IDs.
        edges_group = edges.groupby('graph_id')

        # For each graph ID...
        for graph_id in edges_group.groups:
            # Find the edges as well as the number of nodes and its label.
            edges_of_id = edges_group.get_group(graph_id)
            src = edges_of_id['src'].to_numpy()
            dst = edges_of_id['dst'].to_numpy()
            num_nodes = num_nodes_dict[graph_id]
            label = label_dict[graph_id]

            # Create a graph and add it to the list of graphs and labels.
            g = dgl.graph((src, dst), num_nodes=num_nodes)
            self.graphs.append(g)
            self.labels.append(label)

        # Convert the label list to tensor for saving.
        self.labels = torch.LongTensor(self.labels)

    def __getitem__(self, i):
        return self.graphs[i], self.labels[i]

    def __len__(self):
        return len(self.graphs)

dataset = SyntheticDataset()
graph, label = dataset[0]
print(graph, label)

CREDITS: Vaibhav Arora, Department of CSE, IIT Bhilai