# Official Tutorials

Starting from the official tutorials always yields a precise description of the core ideas.

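- Generalizing well-established neural models like RNNs or CNNs to work on arbitrarily structured graphs is a challenging problem.
- On the Cora dataset, what accuracy can a simple classifier reach without a graph network?
- Here the data is a single graph: each sample is a node, and the features are ordinary vectors, not graphs.
- The Cora example shows that graph networks mark a move from IID to non-IID data: they exploit the correlations between samples.
- Does graph pooling change the graph structure?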

![](http://tkipf.github.io/graph-convolutional-networks/images/gcn_web.png)

## Research Ideas

- T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. Can each sample of a time series be treated as a node?

# Quick Start

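A quick look at which official tutorials and related documents exist and what each one covers. Understanding the official tutorial requires the following:

- Find the corresponding documentation:
    - What is an edge convolutional layer?
    - What is `MessagePassing`?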

- https://github.com/rusty1s/pytorch_geometric

PyTorch Geometric (PyG) is a geometric deep learning extension library for PyTorch.

It consists of various methods for deep learning on graphs and other irregular structures, also known as geometric deep learning, from a variety of published papers. In addition, it consists of an easy-to-use mini-batch loader for many small and single giant graphs, multi-GPU support, a large number of common benchmark datasets (based on simple interfaces to create your own), and helpful transforms, both for learning on arbitrary graphs as well as on 3D meshes or point clouds.

- https://pytorch-geometric.readthedocs.io/en/latest/index.html

The PyTorch Geometric user documentation.



# Zero To All

A detailed look at what the official tutorials focus on.

## PyTorch Geometric Docs

site: https://pytorch-geometric.readthedocs.io/en/latest/index.html



### Introduction by Example

We shortly introduce the fundamental concepts of PyTorch Geometric through self-contained examples. At its core, PyTorch Geometric provides the following main features:

- Data Handling of Graphs
- Common Benchmark Datasets
- Mini-batches
- Data Transforms: what are they? Transforms are a common way in torchvision to transform images and perform augmentation, i.e. they are one form of data augmentation. PyG's own transforms play the same role for graph `Data` objects.
- Learning Methods on Graphs: which ones are there? For a high-level explanation on GCN, have a look at its [blog post](http://tkipf.github.io/graph-convolutional-networks/).
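To make the transform idea concrete, here is a minimal sketch in plain PyTorch. It assumes a graph is held in a plain dict with an `"x"` feature matrix (a stand-in for PyG's `Data`); `RowNormalize` and `Compose` are hypothetical classes written for illustration. They only mimic the idea behind PyG's `NormalizeFeatures` and `Compose` transforms: a callable that takes a graph and returns a transformed graph.

```python
import torch


class RowNormalize:
    """Hypothetical transform: scale each node's feature row so it sums to 1."""

    def __call__(self, data: dict) -> dict:
        x = data["x"]
        # Clamp avoids division by zero for all-zero rows (e.g. a paper with no keywords).
        row_sum = x.sum(dim=1, keepdim=True).clamp(min=1e-12)
        out = dict(data)  # shallow copy: the original sample is left untouched
        out["x"] = x / row_sum
        return out


class Compose:
    """Hypothetical helper: apply several transforms one after another."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, data: dict) -> dict:
        for transform in self.transforms:
            data = transform(data)
        return data


# Toy graph with 3 nodes and 2 bag-of-words features per node.
sample = {"x": torch.tensor([[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]])}
transform = Compose([RowNormalize()])
normalized = transform(sample)
print(normalized["x"])
```

In PyG the analogous object is passed as `transform=` when constructing a dataset, so it is applied each time a graph is accessed.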

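The first thing introduced is the data type, `Data`, which comes with some handy methods:

- `data.contains_isolated_nodes()`: does it compute the degree of every node? It checks whether some node has no edge to any other node, i.e. has degree 0.
- `data.contains_self_loops()`: checks whether the graph contains self-loops, i.e. edges $(i, i)$ from a node to itself.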
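Both checks can be reproduced directly from `edge_index`. This is a sketch, not PyG's implementation: the helper names are hypothetical, and it assumes the convention that a node whose only edge is a self-loop still counts as isolated.

```python
import torch


def has_isolated_nodes(edge_index: torch.Tensor, num_nodes: int) -> bool:
    """True if some node has no edge to any *other* node (self-loops ignored)."""
    not_loop = edge_index[0] != edge_index[1]
    touched = torch.zeros(num_nodes, dtype=torch.bool)
    touched[edge_index[:, not_loop].flatten()] = True
    return bool((~touched).any())


def has_self_loops(edge_index: torch.Tensor) -> bool:
    """True if some column (i, i) connects a node to itself."""
    return bool((edge_index[0] == edge_index[1]).any())


# Toy graph 0 - 1 - 2 plus a node 3 that only has a self-loop.
edge_index = torch.tensor([[0, 1, 1, 2, 3],
                           [1, 0, 2, 1, 3]])
print(has_isolated_nodes(edge_index, num_nodes=4))  # True: node 3
print(has_self_loops(edge_index))                   # True: edge (3, 3)
```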

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

# Load the dataset
from torch_geometric.datasets import Planetoid
dataset = Planetoid(root='./tmp/Cora', name='Cora')
device = torch.device('cpu')
data = dataset[0].to(device)
```


```python
data.x.shape  # 2708 papers, 1433 keywords (vocabulary words) == [num_nodes, num_node_features]
```

```
torch.Size([2708, 1433])
```

```python
data.edge_index.shape  # [2, num_edges]
```

```
torch.Size([2, 10556])
```

```python
data.edge_attr, data.y.shape, data.pos, data.face
```

```
(None, torch.Size([2708]), None, None)
```

We show a simple example of an unweighted and undirected graph with three nodes and two undirected edges (stored as four directed edges). Each node contains exactly one feature:

![](https://pytorch-geometric.readthedocs.io/en/latest/_images/graph.svg)

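This example differs slightly from a typical probabilistic graphical model example. Points to note:

- $x_1=-1$ is the (only) feature of the first node.
- The two undirected edges are represented by four index tuples. Although the graph has only two edges, we need to define four index tuples to account for both directions of an edge.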

```python
import torch
from torch_geometric.data import Data

edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)

data = Data(x=x, edge_index=edge_index)
data
```

```
Data(edge_index=[2, 4], x=[3, 1])
```
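Since `edge_index` is just a list of (source, target) columns, the toy graph can be turned into a dense adjacency matrix to check that both directions of each edge are present. A minimal sketch in plain PyTorch (no PyG calls):

```python
import torch

# Same toy graph: 3 nodes, 2 undirected edges stored as 4 directed ones.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
num_nodes = 3

# Write a 1 at A[source, target] for every column of edge_index.
A = torch.zeros(num_nodes, num_nodes)
A[edge_index[0], edge_index[1]] = 1.0

print(A)
# tensor([[0., 1., 0.],
#         [1., 0., 1.],
#         [0., 1., 0.]])

# Undirected graph: both directions are stored, so A is symmetric.
print(torch.equal(A, A.T))  # True
# Node degrees are the row sums: the middle node has two neighbors.
print(A.sum(dim=1))         # tensor([1., 2., 1.])
```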

PyTorch Geometric contains a large number of common benchmark datasets, e.g. all Planetoid datasets (Cora, Citeseer, Pubmed), all graph classification datasets from http://graphkernels.cs.tu-dortmund.de/ and their cleaned versions, the QM7 and QM9 datasets, and a handful of 3D mesh/point cloud datasets like FAUST, ModelNet10/40 and ShapeNet.

Initializing a dataset is straightforward. An initialization of a dataset will automatically download its raw files and process them to the previously described Data format. E.g., to load the ENZYMES dataset (consisting of 600 graphs within 6 classes), type:

```python
from torch_geometric.datasets import TUDataset
dataset = TUDataset(root='./tmp/ENZYMES', name='ENZYMES')
```

```
Downloading https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/ENZYMES.zip
Extracting tmp/ENZYMES/ENZYMES.zip
Processing...
Done!
```


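```python
dataset.num_classes, dataset.num_node_features, len(dataset)
```

```
(7, 1433, 1)
```

This recorded output belongs to Cora, not ENZYMES: the cell was re-run after the Cora dataset had been reloaded into `dataset`. For ENZYMES, the official tutorial reports 6 classes, 3 node features and 600 graphs.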

```python
data = dataset[0]
data, data.is_undirected()
```

```
(Data(edge_index=[2, 168], x=[37, 3], y=[1]), True)
```
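PyG's mini-batching (one of the main features listed earlier) handles many small graphs, such as the ENZYMES graphs, by stacking them into one large, disconnected graph: node features are concatenated, each graph's `edge_index` is shifted by the number of nodes that come before it, and a `batch` vector records which graph every node belongs to. Below is a minimal sketch of that idea in plain PyTorch; `collate_graphs` is a hypothetical helper that ignores edge attributes and labels, so it is not a substitute for PyG's own `DataLoader`.

```python
import torch


def collate_graphs(graphs):
    """Merge (x, edge_index) pairs into one disconnected graph plus a batch vector."""
    xs, edge_indices, batch = [], [], []
    offset = 0
    for graph_id, (x, edge_index) in enumerate(graphs):
        xs.append(x)
        # Shift node ids so they point into the concatenated feature matrix.
        edge_indices.append(edge_index + offset)
        batch.append(torch.full((x.size(0),), graph_id, dtype=torch.long))
        offset += x.size(0)
    return torch.cat(xs, dim=0), torch.cat(edge_indices, dim=1), torch.cat(batch)


g1 = (torch.randn(3, 2), torch.tensor([[0, 1], [1, 0]]))  # 3 nodes, 1 undirected edge
g2 = (torch.randn(2, 2), torch.tensor([[0, 1], [1, 0]]))  # 2 nodes, 1 undirected edge
x, edge_index, batch = collate_graphs([g1, g2])
print(x.shape)     # torch.Size([5, 2])
print(edge_index)  # the second graph's edge (0, 1) becomes (3, 4)
print(batch)       # tensor([0, 0, 0, 1, 1])
```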

### Creating Message Passing Networks

- Why is this a generalization of graph convolution?

- The official explanation: [Creating Message Passing Networks](https://github.com/rusty1s/pytorch_geometric/blob/master/docs/source/notes/create_gnn.rst)


Generalizing the convolution operator to irregular domains is typically expressed as a *neighborhood aggregation* or *message passing* scheme. With $\mathbf{x}^{(k-1)}_i \in \mathbb{R}^F$ denoting node features of node $i$ in layer $(k-1)$ and $\mathbf{e}_{i,j} \in \mathbb{R}^D$ denoting (optional) edge features from node $i$ to node $j$, message passing graph neural networks can be described as

$$
\mathbf{x}_i^{(k)} = \gamma^{(k)} \left( \mathbf{x}_i^{(k-1)}, \square_{j \in \mathcal{N}(i)} \, \phi^{(k)}\left(\mathbf{x}_i^{(k-1)}, \mathbf{x}_j^{(k-1)},\mathbf{e}_{i,j}\right) \right),
$$

where $\square$ denotes a differentiable, permutation invariant function, e.g., sum, mean or max, and $\gamma$ and $\phi$ denote differentiable functions such as MLPs (Multi Layer Perceptrons).
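The scheme above can be sketched in a few lines of PyTorch. Assumptions: `message_passing_layer` is a hypothetical helper, the optional edge features $\mathbf{e}_{i,j}$ are omitted, messages flow from `edge_index[0]` (neighbor $j$) to `edge_index[1]` (center node $i$), and only sum and mean are implemented as the permutation-invariant $\square$.

```python
import torch


def message_passing_layer(x, edge_index, phi, gamma, aggr="sum"):
    """x_i' = gamma(x_i, AGG_{j in N(i)} phi(x_i, x_j)), edge features omitted.

    Each column (j, i) of edge_index sends a message from node j to node i.
    """
    src, dst = edge_index
    messages = phi(x[dst], x[src])  # one message per edge: phi(x_i, x_j)
    out = torch.zeros(x.size(0), messages.size(1), dtype=messages.dtype)
    out.index_add_(0, dst, messages)  # sum the messages arriving at each node i
    if aggr == "mean":
        count = torch.zeros(x.size(0), dtype=messages.dtype)
        count.index_add_(0, dst, torch.ones(dst.numel(), dtype=messages.dtype))
        out = out / count.clamp(min=1).unsqueeze(1)  # nodes without neighbors keep 0
    return gamma(x, out)


def copy_neighbor(x_i, x_j):
    return x_j  # phi: the message is simply the neighbor's features


def add_self(x, aggr_out):
    return x + aggr_out  # gamma: add the node's own features to the aggregate


# Toy path graph 0 - 1 - 2 with one feature per node.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
x = torch.tensor([[1.0], [2.0], [3.0]])

out_sum = message_passing_layer(x, edge_index, copy_neighbor, add_self, aggr="sum")
out_mean = message_passing_layer(x, edge_index, copy_neighbor, add_self, aggr="mean")
print(out_sum.flatten())   # tensor([3., 6., 5.])
print(out_mean.flatten())  # tensor([3., 4., 5.])
```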

Relation to graph convolution: the GCN layer is one instance of this scheme and is defined per node as


$$
\mathbf{x}_i^{(k)} = \sum_{j \in \mathcal{N}(i) \cup \{ i \}} \frac{1}{\sqrt{\deg(i)} \cdot \sqrt{\deg(j)}} \cdot \left( \mathbf{\Theta} \cdot \mathbf{x}_j^{(k-1)} \right),
$$

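where neighboring node features are first transformed by a weight matrix $\Theta$, normalized by their degree, and finally summed up. Here $\deg(\cdot)$ is counted after self-loops are inserted, and the coefficients $\frac{1}{\sqrt{\deg(i)} \cdot \sqrt{\deg(j)}}$ for $j \in \mathcal{N}(i) \cup \{ i \}$ are exactly the nonzero entries of row $i$ of the normalized adjacency matrix $\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}$.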
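The per-node GCN formula can be computed edge by edge. A sketch under the assumption (stated in the comments) that self-loops are added first and $\deg(\cdot)$ is counted afterwards; `gcn_layer_edgewise` is a hypothetical name, and the weight matrix `theta` is passed in rather than learned. The toy input is the path graph 0 - 1 - 2 with node features 1, 2, 3.

```python
import torch


def gcn_layer_edgewise(x, edge_index, theta):
    """x_i' = sum_{j in N(i) ∪ {i}} (Theta x_j) / (sqrt(deg(i)) * sqrt(deg(j)))."""
    num_nodes = x.size(0)
    # 1. Add a self-loop (i, i) for every node, so j ranges over N(i) ∪ {i}.
    loops = torch.arange(num_nodes)
    src = torch.cat([edge_index[0], loops])  # j
    dst = torch.cat([edge_index[1], loops])  # i
    # 2. Linearly transform all node features at once (row convention: x @ theta).
    h = x @ theta
    # 3. Degrees counted on the graph *with* self-loops.
    deg = torch.zeros(num_nodes).index_add_(0, dst, torch.ones(dst.numel()))
    norm = deg[src].rsqrt() * deg[dst].rsqrt()
    # 4. Scale every message by its coefficient and sum it into the target node.
    out = torch.zeros(num_nodes, h.size(1))
    out.index_add_(0, dst, norm.unsqueeze(1) * h[src])
    return out


edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
x = torch.tensor([[1.0], [2.0], [3.0]])
theta = torch.tensor([[1.0]])  # identity weight, so only the normalization is visible
out = gcn_layer_edgewise(x, edge_index, theta)
print(out.flatten())
```

With self-loops the degrees are 2, 3, 2, so node 0 receives $2/\sqrt{6} + 1/2$, node 1 receives $(1 + 3)/\sqrt{6} + 2/3$ and node 2 receives $2/\sqrt{6} + 3/2$.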

In the code (the `GCNConv` docstring), the same layer is written in matrix form: The graph convolutional operator from the “Semi-supervised Classification with Graph Convolutional Networks” paper

$$
\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
\mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},
$$

where $\hat{A}=A+I$ denotes the adjacency matrix with inserted self-loops and $\hat{D}_{ii}=\sum_{j=0}\hat{A}_{ij}$ its diagonal degree matrix.
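The matrix form can be evaluated directly with dense tensors. A minimal sketch (`gcn_layer_dense` is a hypothetical helper; `theta` is a fixed 1x1 weight so only the normalization shows) on the path graph 0 - 1 - 2 with node features 1, 2, 3; it yields the same values as the per-node sum, since both compute $\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2} X \Theta$.

```python
import torch


def gcn_layer_dense(x, edge_index, theta):
    """X' = D_hat^{-1/2} A_hat D_hat^{-1/2} X Theta, built with dense matrices."""
    n = x.size(0)
    A = torch.zeros(n, n)
    A[edge_index[1], edge_index[0]] = 1.0  # row i (target) collects column j (source)
    A_hat = A + torch.eye(n)               # insert self-loops: A_hat = A + I
    deg = A_hat.sum(dim=1)                 # D_hat_ii = sum_j A_hat_ij
    D_inv_sqrt = torch.diag(deg.pow(-0.5))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ x @ theta


edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
x = torch.tensor([[1.0], [2.0], [3.0]])
theta = torch.tensor([[1.0]])
out = gcn_layer_dense(x, edge_index, theta)
print(out.flatten())
```

Entry $(0, 1)$ of the normalized matrix is $1/\sqrt{2 \cdot 3}$, because node 0 has degree 2 and node 1 has degree 3 once self-loops are counted. The dense form needs an $N \times N$ matrix, which is why edge-based (sparse) implementations are preferable for large graphs.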

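- And why is it also a generalization of GAT (with $K$ attention heads)?
    - Edge information (attention logits): $e_{ij}^k = a^k(W^k h_i, W^k h_j)$, normalized over the neighbors to $\alpha_{ij}^k = \mathrm{softmax}_j(e_{ij}^k)$
    - Node update: $h'_i = \sigma(\frac{1}{K}\sum_{k=1}^K\sum_{j\in N_i} \alpha_{ij}^kW^k h_j)$
    - In message-passing terms: $\phi$ computes $\alpha_{ij}^k W^k h_j$, $\square$ is a sum over $N_i$, and $\gamma$ averages the $K$ heads and applies $\sigma$.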

### Graph Attention Networks


site: https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html


The graph attentional operator from the “Graph Attention Networks” paper

$$
\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} +
\sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},
$$

where the attention coefficients $\alpha_{i,j}$ are computed as

$$
\alpha_{i,j} =
\frac{
\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}
[\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j]
\right)\right)}
{\sum_{k \in \mathcal{N}(i) \cup \{ i \}}
\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}
[\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k]
\right)\right)}.
$$
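The formulas above can be checked with a small single-head sketch in plain PyTorch. It follows the PyG formula (the node itself is included in the softmax; LeakyReLU with negative slope 0.2, as in the GAT paper) but is not `GATConv` itself: `gat_layer` is a hypothetical helper, and multiple heads, dropout and bias are left out.

```python
import torch
import torch.nn.functional as F


def gat_layer(x, edge_index, theta, a, negative_slope=0.2):
    """Single-head GAT: x_i' = sum_{j in N(i) ∪ {i}} alpha_ij * Theta x_j."""
    n = x.size(0)
    loops = torch.arange(n)
    src = torch.cat([edge_index[0], loops])  # j, self-loops included
    dst = torch.cat([edge_index[1], loops])  # i
    h = x @ theta                            # Theta x for every node, shape [n, F']
    # Unnormalized score LeakyReLU(a^T [Theta x_i || Theta x_j]) for each edge (j, i).
    e = F.leaky_relu(torch.cat([h[dst], h[src]], dim=1) @ a, negative_slope)
    # Softmax over the incoming edges of each node i. Subtracting one global
    # constant does not change the ratios but keeps exp() from overflowing.
    w = (e - e.max()).exp()
    denom = torch.zeros(n).index_add_(0, dst, w)
    alpha = w / denom[dst]
    out = torch.zeros(n, h.size(1)).index_add_(0, dst, alpha.unsqueeze(1) * h[src])
    return out, alpha


torch.manual_seed(0)
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
x = torch.randn(3, 4)
theta = torch.randn(4, 2)  # Theta: 4 input features -> 2 output features
a = torch.randn(4)         # attention vector over [Theta x_i || Theta x_j]
out, alpha = gat_layer(x, edge_index, theta, a)
print(out.shape)  # torch.Size([3, 2])
```

A quick sanity check: with `a` set to zeros every score is equal, so each node simply averages $\mathbf{\Theta}\mathbf{x}_j$ over itself and its neighbors.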

`GATConv` inherits from a well-designed base class, `MessagePassing`; **the inheritance trick used there is worth studying**. MessagePassing source: https://pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/nn/conv/message_passing.html#MessagePassing
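The pattern behind `MessagePassing` is essentially the template-method pattern: the base class owns `propagate()` and calls overridable hooks. The sketch below illustrates that idea only; `MiniMessagePassing` and `NeighborSumWithSelf` are hypothetical classes, far simpler than PyG's actual base class.

```python
import torch


class MiniMessagePassing(torch.nn.Module):
    """Hypothetical, heavily simplified base class in the spirit of MessagePassing.

    propagate() fixes the skeleton (gather -> message -> aggregate -> update);
    subclasses only override the hooks they need.
    """

    def propagate(self, edge_index, x):
        src, dst = edge_index                  # messages flow from src (j) to dst (i)
        msg = self.message(x[dst], x[src])     # hook 1: per-edge message
        aggr_out = torch.zeros(x.size(0), msg.size(1), dtype=msg.dtype)
        aggr_out.index_add_(0, dst, msg)       # fixed here: sum aggregation
        return self.update(aggr_out, x)        # hook 2: per-node update

    def message(self, x_i, x_j):
        return x_j                             # default: copy the neighbor's features

    def update(self, aggr_out, x):
        return aggr_out                        # default: return the aggregate as-is


class NeighborSumWithSelf(MiniMessagePassing):
    """Overrides only update(): add each node's own features to its neighbor sum."""

    def forward(self, x, edge_index):
        return self.propagate(edge_index, x)

    def update(self, aggr_out, x):
        return aggr_out + x


edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
x = torch.tensor([[1.0], [2.0], [3.0]])
layer = NeighborSumWithSelf()
out = layer(x, edge_index)
print(out.flatten())  # tensor([3., 6., 5.])
```

The real `MessagePassing` does more: for example, it inspects the argument names of `message()` (such as `x_i` and `x_j`) to decide which node features to gather, and the aggregation is chosen through its `aggr` argument (e.g. `"add"`, `"mean"` or `"max"`) instead of being hard-coded.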



### Pooling 

There are several kinds of global pooling:

- global_add_pool(x, batch, size=None)
- global_mean_pool(x, batch, size=None)
- global_max_pool(x, batch, size=None)

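So does pooling change the graph structure? These global pooling functions do not: they are readout operations that collapse the node features of each graph (selected via the `batch` vector) into a single graph-level vector, so the edges are simply not used. Hierarchical pooling layers such as `TopKPooling`, in contrast, return a coarsened graph with fewer nodes and edges.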
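Global pooling can be sketched with the `batch` vector that assigns every node to its graph. Assumptions: `global_pool` is a hypothetical helper built on `Tensor.scatter_reduce_` (available in recent PyTorch versions, 1.12 and later), not PyG's implementation. The output has one row per graph and no `edge_index`, which is the sense in which these readouts discard the structure.

```python
import torch


def global_pool(x, batch, reduce):
    """One output row per graph: reduce the rows of x that share a batch id."""
    num_graphs = int(batch.max()) + 1
    index = batch.unsqueeze(1).expand_as(x)  # same graph id for every feature column
    out = torch.zeros(num_graphs, x.size(1), dtype=x.dtype)
    # include_self=False: the zeros used for initialisation do not enter the reduction.
    return out.scatter_reduce_(0, index, x, reduce=reduce, include_self=False)


# 5 nodes from two graphs: nodes 0-2 belong to graph 0, nodes 3-4 to graph 1.
x = torch.tensor([[1.0, 4.0], [3.0, 0.0], [5.0, 2.0], [2.0, 2.0], [4.0, 6.0]])
batch = torch.tensor([0, 0, 0, 1, 1])

print(global_pool(x, batch, "sum"))   # like global_add_pool:  [[9, 6], [6, 8]]
print(global_pool(x, batch, "mean"))  # like global_mean_pool: [[3, 2], [3, 4]]
print(global_pool(x, batch, "amax"))  # like global_max_pool:  [[5, 4], [4, 6]]
```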

## Officially Recommended Blog

site: http://tkipf.github.io/graph-convolutional-networks/

The goal is to understand every implementation detail of GCN from this blog.

![](http://tkipf.github.io/graph-convolutional-networks/images/gcn_web.png)

### Two-Layer GCN

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

# Load the dataset (Cora: 1 graph, 7 classes, 1433 node features)
from torch_geometric.datasets import Planetoid
dataset = Planetoid(root='./tmp/Cora', name='Cora')

# Define the network
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)

        return F.log_softmax(x, dim=1)

# device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device = torch.device('cpu')

model = Net().to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

# Train: full-batch, with the loss computed only on the labelled training nodes
model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

# Test
model.eval()
with torch.no_grad():
    _, pred = model(data).max(dim=1)
correct = float(pred[data.test_mask].eq(data.y[data.test_mask]).sum().item())
acc = correct / data.test_mask.sum().item()
print('Accuracy: {:.4f}'.format(acc))
```

```
Accuracy: 0.7940
```


```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

# Load the dataset
from torch_geometric.datasets import Planetoid
dataset = Planetoid(root='./tmp/Cora', name='Cora')

# Define the network: two GCN layers plus an extra fully connected layer,
# printing the shape of the node representations after each stage
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)
        self.fc = nn.Linear(dataset.num_classes, dataset.num_classes)  # extra fully connected layer (7 -> 7)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        print(x.shape)
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        print(x.shape)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        x = self.fc(x)
        print(x.shape)
        return F.log_softmax(x, dim=1)

# device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device = torch.device('cpu')
model = Net().to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
out = model(data)  # a single forward pass, just to trace the shapes
```

```
torch.Size([2708, 1433])
torch.Size([2708, 16])
torch.Size([2708, 7])
```


```python
data.x.shape, data.edge_index.shape
```

```
(torch.Size([2708, 1433]), torch.Size([2, 10556]))
```

```python
data.edge_index
```

```
tensor([[   0,    0,    0,  ..., 2707, 2707, 2707],
        [ 633, 1862, 2582,  ...,  598, 1473, 2706]])
```

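```python
[i.shape for i in model.parameters()]
```

```
[torch.Size([1433, 16]),
 torch.Size([16]),
 torch.Size([16, 7]),
 torch.Size([7])]
```

These shapes come from the two-layer model of the training cell (this cell was executed before the `fc` variant, so no `[7, 7]` weight appears). The `[in_channels, out_channels]` weight layout is that of the PyG version used in these notes; newer versions may store and order these tensors differently.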
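The parameter shapes can be summed by hand as a sanity check. This is a worked calculation assuming the layer sizes used in the two models above: 1433 input features, 16 hidden units, 7 classes, and the optional `nn.Linear(7, 7)`. With a model object at hand, `sum(p.numel() for p in model.parameters())` computes the same total.

```python
# Parameter count of the two-layer GCN on Cora (1433 input features, 16 hidden units, 7 classes).
in_features, hidden, num_classes = 1433, 16, 7

conv1 = in_features * hidden + hidden          # weight 1433 x 16 + bias 16
conv2 = hidden * num_classes + num_classes     # weight 16 x 7 + bias 7
fc = num_classes * num_classes + num_classes   # optional nn.Linear(7, 7): 7 x 7 + 7

print(conv1 + conv2)       # 23063
print(conv1 + conv2 + fc)  # 23119
```

Almost all parameters sit in the first layer's weight, because the 1433-dimensional bag-of-words input is much wider than the hidden layer.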

`data.edge_index` is a sparse, coordinate-format encoding of the adjacency matrix: a dense adjacency matrix would have shape [N, N], i.e. [2708, 2708], whereas `edge_index` only stores its 10556 nonzero entries as a [2, num_edges] tensor.

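Only nodes joined by an edge can influence each other's values, where $A$ is the adjacency matrix. Inside the network this is computed automatically: `GCNConv` builds the normalized adjacency from `edge_index` itself (adding self-loops and computing the degree normalization), so no dense matrix has to be passed in.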

$$
\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}\mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},  
$$

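Why does this formula ensure that only adjacent nodes affect a node's update? Start from the simplest case: with no edges, $A=0$ and hence $\hat{A}=\hat{D}=I$, so the layer reduces to $X'=X\Theta$, a plain per-node linear map, and samples do not influence each other (the IID setting). With edges, row $i$ of $\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}X$ is a weighted sum over only those $j$ with $\hat{A}_{ij}\neq 0$, i.e. node $i$ itself and its neighbors. Adding $A$ therefore models non-IID data in which connected samples influence each other.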
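The locality argument can be checked numerically on the three-node path graph from the toy example: perturb one node and see which outputs change after one and after two propagation steps. This is a dense-matrix sketch for illustration; `gcn_propagation_matrix` is a hypothetical helper, and `theta` is fixed to 1 instead of learned.

```python
import torch


def gcn_propagation_matrix(edge_index, num_nodes):
    """S = D_hat^{-1/2} (A + I) D_hat^{-1/2} as a dense matrix."""
    A = torch.zeros(num_nodes, num_nodes)
    A[edge_index[1], edge_index[0]] = 1.0
    A_hat = A + torch.eye(num_nodes)
    d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * A_hat * d_inv_sqrt.unsqueeze(0)


# Path graph 0 - 1 - 2: node 2 is a neighbor of node 1 but two hops from node 0.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
S = gcn_propagation_matrix(edge_index, num_nodes=3)
x = torch.tensor([[-1.0], [0.0], [1.0]])
theta = torch.tensor([[1.0]])

x_perturbed = x.clone()
x_perturbed[2] += 10.0  # change only node 2

one_layer, one_layer_perturbed = S @ x @ theta, S @ x_perturbed @ theta
two_layer, two_layer_perturbed = S @ S @ x @ theta, S @ S @ x_perturbed @ theta

print(torch.allclose(one_layer[0], one_layer_perturbed[0]))  # True: S[0, 2] == 0
print(torch.allclose(one_layer[1], one_layer_perturbed[1]))  # False: node 1 is a neighbor
print(torch.allclose(two_layer[0], two_layer_perturbed[0]))  # False: two layers reach 2 hops

# No edges: A = 0, A_hat = D_hat = I, and the layer is just X @ Theta.
S_no_edges = gcn_propagation_matrix(torch.empty(2, 0, dtype=torch.long), num_nodes=3)
print(torch.equal(S_no_edges, torch.eye(3)))  # True
```

Each extra layer widens the receptive field by one hop, which is why a two-layer GCN lets every node see its 2-hop neighborhood.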


Inspecting the layer with `?GCNConv` in IPython shows its signature and docstring:

```python
GCNConv(in_channels, out_channels, improved=False, cached=False, bias=True, **kwargs)
```

```
The graph convolutional operator from the `"Semi-supervised
Classification with Graph Convolutional Networks"
<https://arxiv.org/abs/1609.02907>`_ paper

.. math::
    \mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
    \mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},

where :math:`\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}` denotes the
adjacency matrix with inserted self-loops and
:math:`\hat{D}_{ii} = \sum_{j
```

### Conclusion




Research on this topic is just getting started. The past several months have seen exciting developments, but we have probably only scratched the surface of these types of models so far. It remains to be seen how neural networks on graphs can be further tailored to specific types of problems, like, e.g., learning on directed or relational graphs, and how one can use learned graph embeddings for further tasks down the line, etc. This list is by no means exhaustive and I expect further interesting applications and extensions to pop up in the near future. Let me know in the comments below if you have some exciting ideas or questions to share!

