Deep Graph Library (DGL)
=====================

DGL is designed to bring machine learning closer to graph-structured data. Specifically, DGL makes it easy to implement models from the graph neural network (GNN) family. Unlike PyTorch or TensorFlow, DGL provides friendly APIs for the fundamental operations in GNNs, such as message passing and reduction. Through DGL, we hope to benefit both researchers trying out new ideas and engineers working in production.

In this tutorial, we demonstrate the basics of DGL, including:
- How to create a graph
- How to manipulate node/edge features on a graph
- How to convert a graph to/from other formats

This tutorial uses [PyTorch](https://pytorch.org) as the backend for tensor computations, so some familiarity with PyTorch is helpful. DGL itself is designed to be platform-agnostic so that it can be integrated into other frameworks such as [MXNet](https://mxnet.apache.org/) and [TensorFlow](https://www.tensorflow.org/); we are actively working on this.

In [1]:
# A bit of setup, just ignore this cell
import matplotlib.pyplot as plt

# for auto-reloading external modules
%load_ext autoreload
%autoreload 2

%matplotlib inline
plt.rcParams['figure.figsize'] = (8.0, 6.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
plt.rcParams['animation.html'] = 'html5'

We start by creating the well-known *"Zachary's karate club"* social network. The network captures 34 members of a karate club and documents pairwise links between members who interacted outside the club. The club later split into two communities, one led by the instructor (node 0) and the other by the club president (node 33). You can read more about the story on the [wiki page](https://en.wikipedia.org/wiki/Zachary%27s_karate_club). A visualization of the network and its communities is shown below:

![karate](https://www.dropbox.com/s/uqzor4lqsmbnz8k/karate1.jpg?dl=1)

Creating a graph
-----------------------------------

Let's see how we can create such a graph in DGL. We start by importing `dgl`.

In [2]:
import dgl

We first create an empty `DGLGraph`. In DGL, nodes are identified by consecutive integers starting from 0. The following code adds all the club members to the graph (34 nodes).

In [3]:
G = dgl.DGLGraph()
G.add_nodes(34)
print('Number of nodes:', G.number_of_nodes())

Number of nodes: 34
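
Real datasets often label nodes with names rather than consecutive integers. As a minimal sketch of one way to build such IDs before calling `add_nodes`, the snippet below maps labels to integers starting from 0. The member names are hypothetical and are not part of the karate club data, which already uses integer IDs.

```python
# Hypothetical raw labels, used only for illustration.
members = ["carol", "alice", "bob", "alice"]

# Assign each distinct label a consecutive integer ID starting from 0.
name_to_id = {name: idx for idx, name in enumerate(sorted(set(members)))}
# Keep the reverse mapping so results can be reported by name later.
id_to_name = {idx: name for name, idx in name_to_id.items()}

print(name_to_id)
print("Nodes to add:", len(name_to_id))
```

The number of distinct labels is then what we would pass to `G.add_nodes`.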


The Karate Club network contains 78 edges:
```
[1 0]
[2 0] [2 1]
[3 0] [3 1] [3 2]
[4 0]
[5 0]
[6 0] [6 4] [6 5]
[7 0] [7 1] [7 2] [7 3]
[8 0] [8 2]
[9 2]
[10 0] [10 4] [10 5]
[11 0]
[12 0] [12 3]
[13 0] [13 1] [13 2] [13 3]
[16 5] [16 6]
[17 0] [17 1]
[19 0] [19 1]
[21 0] [21 1]
[25 23] [25 24]
[27 2] [27 23] [27 24]
[28 2]
[29 23] [29 26]
[30 1] [30 8]
[31 0] [31 24] [31 25] [31 28]
[32 2] [32 8] [32 14] [32 15] [32 18] [32 20] [32 22] [32 23] [32 29] [32 30] [32 31]
[33 8] [33 9] [33 13] [33 14] [33 15] [33 18] [33 19] [33 20] [33 22] [33 23] [33 26] [33 27] [33 28] [33 29] [33 30] [33 31] [33 32]
```

In DGL, edges can be added by specifying the two endpoints.

In [4]:
G.add_edge(1, 0)
print('Now we have %d edges!' % G.number_of_edges())

Now we have 1 edges!


To add multiple edges at once, use a list/tensor of nodes to specify the endpoints.

In [5]:
import torch

########
# NOTE: in DGL, edges are added by specifying a list of source nodes and a list of destination nodes,
# rather than a list of source-destination node pairs. This differs from other popular graph
# packages such as networkx and python-igraph.

########
# NOTE: all edges in a DGLGraph are directed.

# add two edges 2->0 and 2->1 using list
G.add_edges([2, 2], [0, 1])

# add three edges 3->0, 3->1 and 3->2 using torch tensor
src = torch.tensor([3, 3, 3])
dst = torch.tensor([0, 1, 2])
G.add_edges(src, dst)

print('Now we have %d edges!' % G.number_of_edges())

Now we have 6 edges!
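
The two notes in the cell above can be illustrated without DGL. The sketch below uses a hypothetical helper, `pairs_to_endpoints` (not a DGL function), to split a pair list into the parallel source and destination lists that `add_edges` expects. Optionally it also emits the reverse of every edge, which is one common way to emulate undirected edges in a directed graph.

```python
def pairs_to_endpoints(pairs, both_directions=False):
    """Split (src, dst) pairs into two parallel lists.

    If both_directions is True, every pair also contributes its reverse
    edge, emulating an undirected edge with two directed ones.
    """
    src = [u for u, _ in pairs]
    dst = [v for _, v in pairs]
    if both_directions:
        # Append the reversed edges after the original ones.
        src, dst = src + dst, dst + src
    return src, dst


pairs = [(2, 0), (2, 1), (3, 0)]
src, dst = pairs_to_endpoints(pairs)
print(src, dst)  # [2, 2, 3] [0, 1, 0]

sym_src, sym_dst = pairs_to_endpoints(pairs, both_directions=True)
print(sym_src, sym_dst)  # [2, 2, 3, 0, 1, 0] [0, 1, 0, 2, 2, 3]
```

The resulting lists can be passed directly as `G.add_edges(src, dst)`.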


In [6]:
# add two edges 4->0, 5->0 using list
G.add_edges([4, 5], 0)

# add three edges 6->0 6->4 6->5 using torch tensor
G.add_edges(6, torch.tensor([0, 4, 5]))

print('Now we have %d edges!' % G.number_of_edges())

Now we have 11 edges!


As the previous cell shows, if all the edges share the same source node or the same destination node, the list/tensor for that endpoint can be replaced with a single integer.
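
To make the rule concrete, here is a rough sketch of the expansion it implies. `broadcast_endpoints` is a hypothetical helper written for illustration. It mirrors the behavior seen in the cells above and is not DGL's actual implementation.

```python
def broadcast_endpoints(u, v):
    """Expand an integer endpoint so both endpoints become equal-length lists."""
    if isinstance(u, int) and isinstance(v, int):
        return [u], [v]
    if isinstance(u, int):
        # One shared source, many destinations.
        return [u] * len(v), list(v)
    if isinstance(v, int):
        # Many sources, one shared destination.
        return list(u), [v] * len(u)
    if len(u) != len(v):
        raise ValueError("source and destination lists must have equal length")
    return list(u), list(v)


print(broadcast_endpoints([4, 5], 0))     # ([4, 5], [0, 0])
print(broadcast_endpoints(6, [0, 4, 5]))  # ([6, 6, 6], [0, 4, 5])
```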

In [8]:
# Exercise: please finish the karate club graph by adding the remaining edges. We have provided you all the
# remaining edge tuples in a list.

edge_list = [(7, 0), (7, 1), (7, 2), (7, 3), (8, 0), (8, 2), (9, 2), (10, 0), (10, 4), (10, 5),
             (11, 0), (12, 0), (12, 3), (13, 0), (13, 1), (13, 2), (13, 3), (16, 5), (16, 6),
             (17, 0), (17, 1), (19, 0), (19, 1), (21, 0), (21, 1), (25, 23), (25, 24), (27, 2),
             (27, 23), (27, 24), (28, 2), (29, 23), (29, 26), (30, 1), (30, 8), (31, 0), (31, 24),
             (31, 25), (31, 28), (32, 2), (32, 8), (32, 14), (32, 15), (32, 18), (32, 20), (32, 22),
             (32, 23), (32, 29), (32, 30), (32, 31), (33, 8), (33, 9), (33, 13), (33, 14), (33, 15),
             (33, 18), (33, 19), (33, 20), (33, 22), (33, 23), (33, 26), (33, 27), (33, 28),
             (33, 29), (33, 30), (33, 31), (33, 32)]

# >>> YOUR CODE STARTS
src, dst = [], []
for u, v in edge_list:
    src.append(u)
    dst.append(v)
G.add_edges(src, dst)
# <<< YOUR CODE ENDS

# We should have 78 edges now!
print('Now we have %d edges!' % G.number_of_edges())

Now we have 78 edges!


Manipulating node/edge features
---------------------------------------------------------

Nodes and edges in a `DGLGraph` can have **feature** tensors. Features of multiple nodes/edges are batched along the first dimension. Let's start by assigning a random feature vector of length 5 to every node.

In [9]:
G.ndata['feat'] = torch.randn((34, 5))

Now each node has a feature vector `'feat'` with 5 elements. Since there are 34 nodes in this graph, the first dimension must be of size 34, so that row `i` holds the feature vector of node `i`. An error is raised if the first dimension does not match the number of nodes:

In [10]:
# This will raise an error!
G.ndata['wrong_feat'] = torch.randn((35, 5))

DGLError: Expect number of features to match number of nodes (len(u)). Got 35 and 34 instead.
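
The shape rule can be sketched in plain Python, with nested lists standing in for tensors. `check_node_features` is a hypothetical helper for illustration only. DGL performs its own check and raises `DGLError` as shown above.

```python
def check_node_features(num_nodes, rows):
    """Return the (rows, columns) shape if `rows` is a valid node feature table."""
    if len(rows) != num_nodes:
        raise ValueError(f"expected {num_nodes} feature rows, got {len(rows)}")
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError("all feature rows must have the same length")
    return len(rows), (widths.pop() if widths else 0)


shape = check_node_features(34, [[0.0] * 5 for _ in range(34)])
print(shape)  # (34, 5)

try:
    check_node_features(34, [[0.0] * 5 for _ in range(35)])
except ValueError as err:
    print("rejected:", err)
```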

`G.ndata` is a dictionary-like structure, so it supports the usual dictionary operations.

In [11]:
# Use `dict.update` to add new features (vector of length 3)
G.ndata.update({'another_feat' : torch.randn((34, 3))})

# Print the feature dictionary
print(G.ndata)

# Delete the new feature using `dict.pop`
G.ndata.pop('another_feat')

{'feat': tensor([[ 2.3332e-01, -1.0035e-01, -4.1504e-01,  1.4558e+00, -2.6827e+00],
        [ 1.8231e-01, -1.3371e-01,  3.9542e-01, -5.7070e-01, -3.1217e-01],
        [ 8.2983e-01, -8.1458e-03,  5.8997e-01,  1.3136e+00, -3.0299e-01],
        [-5.3499e-01,  4.7559e-02, -1.2680e+00,  1.2792e-01,  1.6119e-01],
        [-7.8719e-02, -1.2351e+00, -5.0030e-02, -1.2439e+00,  4.4286e-01],
        [ 1.5499e+00,  2.0018e-01,  6.4173e-01,  5.8446e-01,  5.2903e-01],
        [-7.6309e-01,  9.9267e-01, -1.1328e+00, -2.4366e-01,  1.7406e-01],
        [ 5.9102e-01, -2.5117e-02,  1.2031e+00,  1.2212e+00, -1.2203e+00],
        [ 1.2748e+00,  2.4481e-01,  1.5200e+00, -5.2756e-01,  7.5824e-01],
        [ 1.7327e+00,  7.8290e-02, -1.4734e+00,  8.6586e-01,  2.5237e-01],
        [-9.6980e-02,  1.2405e+00,  5.3351e-01,  5.3782e-02, -1.0017e+00],
        [ 5.1013e-01,  6.5147e-01, -2.8352e-01,  1.2659e-01, -1.9465e-01],
        [-6.5036e-02,  3.5325e-01,  2.3813e+00,  3.7117e-01, -5.4881e-01],
        [-6.2291

tensor([[-1.2521,  1.3724,  0.2794],
        [ 1.1787,  0.9644,  0.2220],
        [ 0.3907,  0.3037,  0.9076],
        [ 1.4880, -0.2950,  0.6710],
        [ 0.4060,  0.6824, -0.9988],
        [ 0.7120, -0.5023, -0.2040],
        [-0.4162, -0.4957,  0.9620],
        [ 1.6955, -0.0422,  0.4221],
        [-0.7999, -0.6350,  1.0791],
        [-1.1050,  0.4889, -0.7737],
        [ 2.5395,  0.3825,  0.8944],
        [-1.2996, -1.1596,  0.2047],
        [ 1.8444, -1.4946,  0.4931],
        [-0.9379, -0.3652,  1.0037],
        [-1.4550, -0.1688,  0.4134],
        [-0.6114, -0.5547,  1.2629],
        [-0.5437,  1.0634, -0.0189],
        [-1.6850, -0.0144, -0.7817],
        [ 0.1690,  0.8591,  1.4010],
        [ 0.7272, -0.6738, -0.1946],
        [-1.0178,  0.4127,  0.9394],
        [-1.1290, -1.2488,  0.6854],
        [ 0.8623,  0.0468,  1.6126],
        [-2.7204,  1.5763,  2.4068],
        [ 0.0824, -0.4213, -0.8123],
        [-0.8718,  0.4020,  0.1914],
        [-0.0325, -0.2603, -0.8176],
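
For readers less familiar with the dictionary methods used here, the sketch below repeats the same steps on a plain `dict` that stands in for `G.ndata`, with nested lists in place of tensors. Because `dict.pop` returns the value it removes, a notebook cell ending in `pop` displays that value, which would explain the second, three-column tensor in the output above.

```python
# A plain dict standing in for G.ndata; nested lists stand in for tensors.
ndata = {"feat": [[0.1, 0.2], [0.3, 0.4]]}

# update() adds the new key alongside the existing one.
ndata.update({"another_feat": [[1.0], [2.0]]})
print(sorted(ndata))  # ['another_feat', 'feat']

# pop() removes the key and hands back the stored value.
removed = ndata.pop("another_feat")
print(removed)                  # [[1.0], [2.0]]
print("another_feat" in ndata)  # False
```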
 

Sometimes, you might want to update features of some but not all of the nodes. This can be done using the following syntax:

In [12]:
# Set node 0's feat to an all-zeros vector. Note the extra leading dimension of size 1.
G.nodes[0].data['feat'] = torch.zeros((1, 5))

# Set node 2, 3's feat to be all-ones vector at once using list type.
G.nodes[[2, 3]].data['feat'] = torch.ones((2, 5))

# Set node 10, 11, 12's feat to be all-twos vector at once using tensor type.
to_change = torch.tensor([10, 11, 12])
G.nodes[to_change].data['feat'] = torch.ones((3, 5)) * 2
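
The reason for the extra size-1 dimension is that the assigned value always holds one row per selected node, even when only one node is selected. A minimal plain-Python sketch of this row-wise assignment follows. `set_rows` is a hypothetical helper, and nested lists stand in for tensors.

```python
def set_rows(table, node_ids, rows):
    """Overwrite table[node_id] with the matching row for each selected node."""
    if len(node_ids) != len(rows):
        raise ValueError(f"{len(node_ids)} nodes selected but {len(rows)} rows given")
    for node_id, row in zip(node_ids, rows):
        table[node_id] = list(row)


feat = [[-1.0] * 5 for _ in range(34)]

set_rows(feat, [0], [[0.0] * 5])               # one node still needs a 1 x 5 value
set_rows(feat, [2, 3], [[1.0] * 5] * 2)        # two nodes, 2 x 5 value
set_rows(feat, [10, 11, 12], [[2.0] * 5] * 3)  # three nodes, 3 x 5 value

print(feat[0], feat[2], feat[10])
print(feat[1])  # untouched rows keep their old values
```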

In [13]:
# Print the feature dictionary
print(G.ndata)

{'feat': tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 1.8231e-01, -1.3371e-01,  3.9542e-01, -5.7070e-01, -3.1217e-01],
        [ 1.0000e+00,  1.0000e+00,  1.0000e+00,  1.0000e+00,  1.0000e+00],
        [ 1.0000e+00,  1.0000e+00,  1.0000e+00,  1.0000e+00,  1.0000e+00],
        [-7.8719e-02, -1.2351e+00, -5.0030e-02, -1.2439e+00,  4.4286e-01],
        [ 1.5499e+00,  2.0018e-01,  6.4173e-01,  5.8446e-01,  5.2903e-01],
        [-7.6309e-01,  9.9267e-01, -1.1328e+00, -2.4366e-01,  1.7406e-01],
        [ 5.9102e-01, -2.5117e-02,  1.2031e+00,  1.2212e+00, -1.2203e+00],
        [ 1.2748e+00,  2.4481e-01,  1.5200e+00, -5.2756e-01,  7.5824e-01],
        [ 1.7327e+00,  7.8290e-02, -1.4734e+00,  8.6586e-01,  2.5237e-01],
        [ 2.0000e+00,  2.0000e+00,  2.0000e+00,  2.0000e+00,  2.0000e+00],
        [ 2.0000e+00,  2.0000e+00,  2.0000e+00,  2.0000e+00,  2.0000e+00],
        [ 2.0000e+00,  2.0000e+00,  2.0000e+00,  2.0000e+00,  2.0000e+00],
        [-6.2291

Similar to `G.ndata` and `G.nodes`, we have `G.edata` and `G.edges` to access and modify edge features:

In [14]:
# The broness edge feature is just a scalar.
G.edata['broness'] = torch.ones((G.number_of_edges(),))

# The instructor (node 0) is a tough guy, so his friends are a little bit scared of him.
G.edges[G.predecessors(0), 0].data['broness'] *= 0.5

print(G.edata)

{'broness': tensor([0.5000, 0.5000, 1.0000, 0.5000, 1.0000, 1.0000, 0.5000, 0.5000, 0.5000,
        1.0000, 1.0000, 0.5000, 1.0000, 1.0000, 1.0000, 0.5000, 1.0000, 1.0000,
        0.5000, 1.0000, 1.0000, 0.5000, 0.5000, 1.0000, 0.5000, 1.0000, 1.0000,
        1.0000, 1.0000, 1.0000, 0.5000, 1.0000, 0.5000, 1.0000, 0.5000, 1.0000,
        1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,
        1.0000, 0.5000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,
        1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,
        1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,
        1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000])}
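
To see what `G.predecessors(0)` and the edge selection are doing, here is a plain-Python sketch over the first six edges added earlier (1->0, 2->0, 2->1, 3->0, 3->1, 3->2). It assumes that an edge's ID is its insertion order, which is consistent with the printed output above.

```python
# Edges as parallel source/destination lists; the edge ID is the list index.
src = [1, 2, 2, 3, 3, 3]
dst = [0, 0, 1, 0, 1, 2]
broness = [1.0] * len(src)

target = 0
# Incoming edges of `target` are those whose destination is `target`.
incoming = [eid for eid, v in enumerate(dst) if v == target]
# Predecessors are the source nodes of those incoming edges.
predecessors = [src[eid] for eid in incoming]

# Halve the feature on every edge pointing into the target node.
for eid in incoming:
    broness[eid] *= 0.5

print(predecessors)  # [1, 2, 3]
print(broness)       # [0.5, 0.5, 1.0, 0.5, 1.0, 1.0]
```

The final list matches the first six entries of the `'broness'` tensor printed above.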


In [15]:
# Exercise: We know that measuring bro-ness cannot be accurate. Could you add some small random noise to it?
# Hint: Use `torch.randn` to add a small perturbation to it.
#
# >>> YOUR CODE STARTS
G.edata.update({'broness': G.edata['broness']+torch.randn((G.number_of_edges(), ))})
# <<< YOUR CODE ENDS

# You should see some randomness here
print(G.edata['broness'])

tensor([ 0.5700,  0.2474,  1.0279,  1.3383,  0.7024,  0.9783, -0.0583,  0.1227,
         1.4124,  1.6931,  0.4506,  0.7817,  1.7034,  1.4562,  2.0995,  2.9771,
         0.7628,  1.2248,  0.9483,  0.3909,  0.5625,  0.5149, -0.3310,  2.3653,
         1.0948,  0.4690,  2.0007,  2.4956,  1.3753,  0.4548,  3.8960,  1.0648,
         1.1822,  1.4494,  0.9059,  2.1531,  1.0841,  0.7119,  0.9003, -0.0289,
         0.6917,  1.0599,  2.6733, -0.6791,  0.8997,  1.2639,  0.9668,  1.6487,
         1.1514,  2.0106, -0.5015, -0.3952,  1.9192, -0.4011,  1.2004,  0.9634,
         4.8334,  0.6972,  2.4843,  0.7833,  0.1764,  3.3373,  1.3703, -0.4906,
         0.8432,  2.0474,  2.4506,  2.0565,  1.5535,  1.9139, -1.9474, -0.7935,
         1.1582,  0.3328,  0.9263,  0.3746, -0.2375,  2.0368])
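
Note that `torch.randn` samples from a standard normal distribution (unit variance), so the solution above perturbs each value by an amount comparable to the value itself. To keep the noise small, the samples can be scaled down first. The sketch below shows the idea with Python's `random` module. The scale of 0.05 is an assumed value chosen for illustration, and with tensors the equivalent would be `torch.randn(n) * noise_scale`.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

broness = [0.5, 0.5, 1.0, 0.5, 1.0, 1.0]
noise_scale = 0.05  # assumed standard deviation of the noise

# Add zero-mean Gaussian noise with a small standard deviation to each value.
noisy = [b + random.gauss(0.0, noise_scale) for b in broness]

print([round(x, 3) for x in noisy])
```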
