![MLU Logo](../images/MLU_Logo.png)



DGL at a Glance
=========================

The goals of this tutorial are to:

- Understand, at a high level, how DGL enables computation on graphs.
- Train a simple graph neural network in DGL to classify nodes in a graph.

By the end of this tutorial, you should have a basic feel for how DGL works.

*This tutorial assumes basic familiarity with PyTorch.*


In [1]:
!pip install dgl
!pip install torch
!pip install matplotlib



Tutorial problem description
----------------------------

The tutorial is based on the "Zachary's karate club" problem. The karate club
is a social network that includes 34 members and documents pairwise links
between members who interact outside the club.  The club later divides into
two communities led by the instructor (node 0) and the club president (node
33). The network is visualized as follows with the color indicating the
community:

![](https://data.dgl.ai/tutorial/img/karate-club.png)


The task is to predict which side (0 or 33) each member tends to join given
the social network itself.



In [6]:
import boto3
import os

course_ID = "MLA-GML"
bucketname = "mlu-courses-datalake"

gcn_filepath = course_ID + "/data/gcn.png"


pathname = './data'
s3 = boto3.resource('s3')
# Create the local data directory if it does not already exist.
os.makedirs(pathname, exist_ok=True)
s3.Bucket(bucketname).download_file(gcn_filepath, os.path.join(pathname, "gcn.png"))
print("Successfully created the directory %s and downloaded the files" % pathname)

Successfully created the directory ./data and downloaded the files


Step 1: Creating a graph in DGL
-------------------------------
Create the graph for Zachary's karate club as follows:



In [2]:
%matplotlib inline
import dgl
import numpy as np

def build_karate_club_graph():
    # All 78 edges are stored in two numpy arrays: one for the source
    # endpoints and one for the destination endpoints.
    src = np.array([1, 2, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 10, 10,
        10, 11, 12, 12, 13, 13, 13, 13, 16, 16, 17, 17, 19, 19, 21, 21,
        25, 25, 27, 27, 27, 28, 29, 29, 30, 30, 31, 31, 31, 31, 32, 32,
        32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33,
        33, 33, 33, 33, 33, 33, 33, 33, 33, 33])
    dst = np.array([0, 0, 1, 0, 1, 2, 0, 0, 0, 4, 5, 0, 1, 2, 3, 0, 2, 2, 0, 4,
        5, 0, 0, 3, 0, 1, 2, 3, 5, 6, 0, 1, 0, 1, 0, 1, 23, 24, 2, 23,
        24, 2, 23, 26, 1, 8, 0, 24, 25, 28, 2, 8, 14, 15, 18, 20, 22, 23,
        29, 30, 31, 8, 9, 13, 14, 15, 18, 19, 20, 22, 23, 26, 27, 28, 29, 30,
        31, 32])
    
    # Edges are directed in DGL, so add each edge in both directions.
    u = np.concatenate([src, dst])
    v = np.concatenate([dst, src])
    
    # Construct a DGLGraph
    return dgl.DGLGraph((u, v))

Using backend: pytorch


Print out the number of nodes and edges in our newly constructed graph:



In [3]:
%%time

G = build_karate_club_graph()
print('We have %d nodes.' % G.number_of_nodes())
print('We have %d edges.' % G.number_of_edges())

We have 34 nodes.
We have 156 edges.
CPU times: user 4.77 ms, sys: 64 µs, total: 4.84 ms
Wall time: 3.48 ms
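The count of 156 edges follows from storing each of the 78 undirected edges once per direction. The sketch below reproduces that doubling on a hypothetical three-edge toy graph (not the karate club data) using plain numpy. It also assumes that the number of nodes is inferred from the largest node ID, which is how the 34 nodes above appear to arise.

```python
import numpy as np

# Hypothetical toy graph with 3 undirected edges: 1-0, 2-0, 2-1.
src = np.array([1, 2, 2])
dst = np.array([0, 0, 1])

# Store every edge in both directions, as build_karate_club_graph does.
u = np.concatenate([src, dst])
v = np.concatenate([dst, src])

# Collect the directed edges as (source, destination) pairs.
directed_edges = set(zip(u.tolist(), v.tolist()))

# Assumption: node count is taken as the largest node ID plus one.
num_nodes = int(max(u.max(), v.max())) + 1

print("directed edges:", len(u))
print("nodes:", num_nodes)

# Every directed edge has its reverse in the set.
has_all_reverses = all((b, a) in directed_edges for (a, b) in directed_edges)
print("symmetric:", has_all_reverses)
```

The same concatenation applied to the 78 karate club edges yields the 156 directed edges reported above.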




The graph can be visualized by converting it to a
[networkx](https://networkx.github.io/documentation/stable/) graph.
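A minimal sketch of such a visualization is given here, assuming networkx and matplotlib are installed. To stay self-contained it uses networkx's built-in `nx.karate_club_graph()` as a stand-in for the graph built above; in the notebook itself, `G.to_networkx().to_undirected()` would supply the graph instead. The layout function, seed, and colors are illustrative choices, not part of the original tutorial.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Stand-in graph: networkx ships its own copy of Zachary's karate club.
# In the notebook, use nx_G = G.to_networkx().to_undirected() instead.
nx_G = nx.karate_club_graph()

# Compute 2D node positions; the seed only makes the layout reproducible.
pos = nx.spring_layout(nx_G, seed=0)

# Draw the graph with node IDs as labels.
fig, ax = plt.subplots(figsize=(6, 5))
nx.draw(nx_G, pos, with_labels=True, node_color="lightgray", ax=ax)
```

With `%matplotlib inline` active, the figure is rendered when the cell finishes; outside a notebook, `fig.savefig(...)` or `plt.show()` would be needed.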

So far, we have only defined the dataset, just as in the week 1 notebook walkthrough.

Let's move to the next steps.

Step 2: Assign features to nodes or edges
--------------------------------------------
Graph neural networks associate features with nodes and edges for training.
Our classification example has no input features, so we assign each node
a learnable embedding vector.



In [4]:
# In DGL, you can add features for all nodes at once, using a feature tensor that
# batches node features along the first dimension. The code below adds the learnable
# embeddings for all nodes:

import torch
import torch.nn as nn
import torch.nn.functional as F

# X_a, X_b
embed = nn.Embedding(34, 5)  # 34 nodes with embedding dim equal to 5
G.ndata['feat'] = embed.weight

Print out the node features to verify:



In [5]:
# print out node 2's input feature
print(G.ndata['feat'][2])

# print out node 10 and 11's input features
print(G.ndata['feat'][[10, 11]])

tensor([ 0.6903, -0.8352, -1.9306,  1.1125,  0.5706], grad_fn=<SelectBackward>)
tensor([[ 0.6195,  1.1955, -0.9682, -1.1588,  1.0805],
        [ 1.4310, -0.9853, -0.1343,  1.3694, -1.1998]],
       grad_fn=<IndexBackward>)
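The `grad_fn` entries in the output indicate that these features are part of the autograd graph. The sketch below illustrates what "learnable" means here, independently of DGL: the embedding weight is an `nn.Parameter`, and an optimizer step changes the rows that receive a gradient. The loss used is a hypothetical stand-in chosen only to produce a gradient for node 0; it is not the tutorial's loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embed = nn.Embedding(34, 5)  # same sizes as in the tutorial
before = embed.weight.detach().clone()

optimizer = torch.optim.Adam(embed.parameters(), lr=0.01)

# Hypothetical stand-in loss: shrink node 0's embedding toward zero.
loss = embed.weight[0].pow(2).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# The weight is a trainable parameter with one row per node.
print(isinstance(embed.weight, nn.Parameter), embed.weight.requires_grad)
print(tuple(embed.weight.shape))

# Rows whose values changed after one optimizer step.
changed_rows = (embed.weight.detach() != before).any(dim=1).nonzero().flatten().tolist()
print(changed_rows)
```

In the tutorial, the loss depends on the embeddings through the GCN, so the rows that get updated are determined by which nodes' features reach the labeled nodes through the graph.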


Step 3: Define a Graph Convolutional Network (GCN)
--------------------------------------------------
To perform node classification, use the Graph Convolutional Network (GCN) developed by [Kipf and Welling](https://arxiv.org/abs/1609.02907). We recommend that you
read the original paper for more details.

- At layer $l$, each node $v_i^l$ carries a feature vector $h_i^l$.
- Each layer of the GCN aggregates the features of the nodes $u_j^{l}$, where
  the $u_j$ are the neighbors of $v_i$, into the next-layer representation at
  $v_i^{l+1}$. This is followed by an affine transformation with some
  non-linearity.

The above definition of GCN fits into a **message-passing** paradigm: Each
node will update its own feature with information sent from neighboring
nodes. 
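To make the message-passing step concrete, here is a minimal numpy sketch of one graph-convolution layer on a hypothetical three-node path graph. It assumes symmetric normalization by the square roots of the node degrees, followed by an affine transformation and ReLU. The function name `gcn_layer` and all values are illustrative. Unlike the formulation in the original paper, this sketch does not add self-loops, so it should be read as an approximation of the idea rather than a reference implementation.

```python
import numpy as np

def gcn_layer(adj, h, weight, bias):
    """One graph-convolution step: normalize, aggregate neighbors, transform, ReLU."""
    deg = adj.sum(axis=1)  # node degrees
    # Guard against division by zero for isolated nodes.
    inv_sqrt_deg = 1.0 / np.sqrt(np.maximum(deg, 1.0))
    # Symmetric normalization: D^{-1/2} A D^{-1/2}
    norm_adj = inv_sqrt_deg[:, None] * adj * inv_sqrt_deg[None, :]
    # Each node sums the normalized features sent by its neighbors.
    aggregated = norm_adj @ h
    # Affine transformation followed by the ReLU non-linearity.
    return np.maximum(aggregated @ weight + bias, 0.0)

# Hypothetical path graph 0 - 1 - 2 with one-hot input features.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
h = np.eye(3)
weight = np.ones((3, 2))
bias = np.zeros(2)

out = gcn_layer(adj, h, weight, bias)
print(out.shape)
print(out)
```

Node 1 has two neighbors and therefore receives two messages, while the end nodes receive one each; the normalization keeps the aggregated magnitudes comparable across nodes of different degree.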

As we saw in class, the following diagram can be used as a reference for understanding GCNs:


![GCN](./data/gcn.png)

DGL provides ready-made implementations of popular graph neural network layers under the `dgl.nn.<backend>` subpackage. The `dgl.nn.pytorch.GraphConv` module implements one graph convolutional layer.



In [7]:
from dgl.nn.pytorch import GraphConv

Define a deeper GCN model that contains two GCN layers:



In [8]:
class GCN(nn.Module):
    def __init__(self, in_feats, hidden_size, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, hidden_size)
        self.conv2 = GraphConv(hidden_size, num_classes)

    def forward(self, g, inputs):
        # Call the modules directly rather than via .forward() so that
        # PyTorch's module hooks are applied.
        h = self.conv1(g, inputs)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

# The first layer transforms input features of size 5 to a hidden size of 5.
# The second layer transforms the hidden layer and produces output features of
# size 2, corresponding to the two groups of the karate club.
net = GCN(5, 5, 2)

Step 4: Data preparation and initialization
-------------------------------------------

We use learnable embeddings to initialize the node features. Since this is a
semi-supervised setting, only the instructor (node 0) and the club president
(node 33) are assigned labels. The implementation is as follows.



In [9]:
inputs = embed.weight
labeled_nodes = torch.tensor([0, 33])  # only the instructor and the president nodes are labeled
labels = torch.tensor([0, 1])  # their labels are different
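With only two labeled nodes, the loss in the training step is evaluated on just those rows of the model output. The following numpy sketch spells out that arithmetic on hypothetical logits for four nodes, two of them labeled: a log-softmax over the classes, followed by the mean negative log-likelihood of the labeled rows. All values here are made up for illustration.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class dimension."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# Hypothetical logits for 4 nodes and 2 classes.
logits = np.array([[2.0, 0.0],
                   [0.5, 0.5],
                   [0.0, 2.0],
                   [1.0, -1.0]])

labeled_nodes = np.array([0, 2])  # only these nodes have labels
labels = np.array([0, 1])         # their class labels

logp = log_softmax(logits)

# Mean negative log-likelihood over the labeled nodes only;
# the unlabeled rows (1 and 3) do not enter the loss.
loss = -logp[labeled_nodes, labels].mean()
print(loss)
```

The unlabeled nodes still influence training indirectly, because their features are aggregated into the labeled nodes' outputs through the graph.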

Step 5: Train then visualize
----------------------------

The training loop is exactly the same as for other PyTorch models. We (1) create an optimizer, (2) feed the inputs to the model, (3) calculate the loss, and (4) use autograd to optimize the model.

**NOTE**: The node features here are randomly initialized (via `nn.Embedding`) because we do not have access to any real node features. As a result, the features themselves carry no inductive bias (random inputs cannot meaningfully guide updates to randomly initialized network weights); the only source of inductive bias is the structure of the graph. We also update the embeddings during training (see the `itertools.chain` call below). This does not mean that the embeddings can later be used as representations of the nodes; in fact, it promotes overfitting of the GCN parameters to this particular problem.

In [10]:
%%time

import itertools

optimizer = torch.optim.Adam(itertools.chain(net.parameters(), embed.parameters()), lr=0.01)

all_logits = []
for epoch in range(50):
    logits = net(G, inputs)
    # we save the logits for visualization later
    all_logits.append(logits.detach())
    logp = F.log_softmax(logits, dim=1)
    # we only compute loss for labeled nodes
    loss = F.nll_loss(logp[labeled_nodes], labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    print('Epoch %d | Loss: %.4f' % (epoch, loss.item()))

Epoch 0 | Loss: 0.7043
Epoch 1 | Loss: 0.6932
Epoch 2 | Loss: 0.6821
Epoch 3 | Loss: 0.6702
Epoch 4 | Loss: 0.6574
Epoch 5 | Loss: 0.6437
Epoch 6 | Loss: 0.6287
Epoch 7 | Loss: 0.6126
Epoch 8 | Loss: 0.5954
Epoch 9 | Loss: 0.5770
Epoch 10 | Loss: 0.5579
Epoch 11 | Loss: 0.5375
Epoch 12 | Loss: 0.5163
Epoch 13 | Loss: 0.4940
Epoch 14 | Loss: 0.4696
Epoch 15 | Loss: 0.4437
Epoch 16 | Loss: 0.4169
Epoch 17 | Loss: 0.3896
Epoch 18 | Loss: 0.3620
Epoch 19 | Loss: 0.3343
Epoch 20 | Loss: 0.3070
Epoch 21 | Loss: 0.2801
Epoch 22 | Loss: 0.2540
Epoch 23 | Loss: 0.2290
Epoch 24 | Loss: 0.2049
Epoch 25 | Loss: 0.1822
Epoch 26 | Loss: 0.1611
Epoch 27 | Loss: 0.1416
Epoch 28 | Loss: 0.1239
Epoch 29 | Loss: 0.1079
Epoch 30 | Loss: 0.0935
Epoch 31 | Loss: 0.0807
Epoch 32 | Loss: 0.0695
Epoch 33 | Loss: 0.0598
Epoch 34 | Loss: 0.0513
Epoch 35 | Loss: 0.0441
Epoch 36 | Loss: 0.0379
Epoch 37 | Loss: 0.0325
Epoch 38 | Loss: 0.0280
Epoch 39 | Loss: 0.0242
Epoch 40 | Loss: 0.0210
Epoch 41 | Loss: 0.0182
Ep

This is a toy example, so it does not have a validation or test
set. Instead, since the model produces an output feature of size 2 for each node, we can
visualize the result by plotting the output features in a 2D space.
The following code animates the training process from the initial guess
(where the nodes are not classified correctly at all) to the end
(where the nodes are linearly separable).



In [11]:
import matplotlib.animation as animation
import matplotlib.pyplot as plt
import networkx as nx

def draw(i):
    cls1color = '#00FFFF'
    cls2color = '#FF00FF'
    pos = {}
    colors = []
    for v in range(34):
        pos[v] = all_logits[i][v].numpy()
        cls = pos[v].argmax()
        colors.append(cls1color if cls else cls2color)
    ax.cla()
    ax.axis('off')
    ax.set_title('Epoch: %d' % i)
    nx.draw_networkx(nx_G.to_undirected(), pos, node_color=colors,
            with_labels=True, node_size=300, ax=ax)

fig = plt.figure(dpi=150)
fig.clf()
ax = fig.subplots()
nx_G = G.to_networkx()
draw(0)  # draw the prediction of the first epoch
plt.close()

![](https://data.dgl.ai/tutorial/1_first/karate0.png)

The following animation shows how the model correctly predicts the community
after a series of training epochs.



![](https://data.dgl.ai/tutorial/1_first/karate.gif)


In [None]:
# To draw your animation, uncomment and use the following code
# from IPython.display import HTML
# ani = animation.FuncAnimation(fig, draw, frames=len(all_logits), interval=200)
# HTML(ani.to_jshtml())

Optional Exercise
-------------------------------------------

Define a GCN with 3 GraphConv layers (as opposed to 2 defined above). The size of the input (`in_feats`) and output (`num_classes`) remains the same. You can use any choice of hidden size for the intermediate layers. Use the ReLU non-linearity for each intermediate layer.

Now try using this new GCN in the training loop above and see if there are any changes in the loss values.

