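## Introduction to PyG (PyG 1.3.2)
1. PyG provides a general-purpose MessagePassing interface so that new research ideas can be prototyped quickly and cleanly. Almost all recently proposed neighborhood aggregation functions fit this interface, including the methods already integrated into PyG.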
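
To make "neighborhood aggregation" concrete, the following is a minimal pure-Python sketch of the idea. It is not PyG's MessagePassing class: the function name aggregate_neighbors and the choice to use the raw source features as messages are assumptions made for illustration only.

```python
# Illustrative sketch of neighborhood aggregation (not PyG's implementation).

def aggregate_neighbors(x, edge_index, reduce="sum"):
    """For every node, combine the features sent along its incoming edges.

    x          : per-node feature lists, shape [num_nodes][num_features]
    edge_index : two lists [sources, targets] describing directed edges
    reduce     : "sum" or "mean"
    """
    num_nodes, num_features = len(x), len(x[0])
    out = [[0.0] * num_features for _ in range(num_nodes)]
    counts = [0] * num_nodes
    sources, targets = edge_index
    for src, dst in zip(sources, targets):
        # The "message" from src to dst is simply the source feature vector.
        for f in range(num_features):
            out[dst][f] += x[src][f]
        counts[dst] += 1
    if reduce == "mean":
        for i in range(num_nodes):
            if counts[i] > 0:
                out[i] = [v / counts[i] for v in out[i]]
    return out


# Path graph 0 - 1 - 2, stored as four directed edges.
x = [[1.0], [2.0], [4.0]]
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
summed = aggregate_neighbors(x, edge_index, reduce="sum")
averaged = aggregate_neighbors(x, edge_index, reduce="mean")
print(summed)    # [[2.0], [5.0], [2.0]]
print(averaged)  # [[2.0], [2.5], [2.0]]
```

Node 1 has two neighbors (0 and 2), so its sum is 1 + 4 = 5 and its mean is 2.5. Real layers typically transform the messages with learnable weights before aggregating.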

## 1.torch_geometric.data
### a.Graph data conversion
<img src="pic/PYG1.png" width="350"/> 

In [18]:
import torch
from torch_geometric.data import Data
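# For an undirected graph, each edge between two nodes is stored as two directed tuples

# Edge format 1:
# each inner list is one directed edge: the first element is the source node, the second is the target node
edges1 = [[0,1],[1,0],[1,2],[2,1]]
edge_index1 = torch.tensor(edges1, dtype = torch.long)  # convert to a tensor first, shape [4, 2]

# Edge format 2:
# two lists: the first stores the source nodes, the second stores the target nodes
edges2 = [[0, 1, 1, 2], [1, 0, 2, 1]]
edge_index2 = torch.tensor(edges2, dtype = torch.long)  # already has shape [2, 4]

node_features = [[-1], [0], [1]]
x = torch.tensor(node_features, dtype = torch.float)

data1 = Data(x=x, edge_index = edge_index1.t().contiguous())  # transpose to [2, 4], then wrap in a PyG Data object
data2 = Data(x=x, edge_index = edge_index2)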

print(data1)
print(data2)

Data(edge_index=[2, 4], x=[3, 1])
Data(edge_index=[2, 4], x=[3, 1])
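
The two edge formats above differ only by a transpose. The sketch below shows the relationship in plain Python, without torch; the helper names undirected_pairs and pairs_to_coo are hypothetical and exist only for this illustration.

```python
# Illustrative helpers (hypothetical names, plain Python lists instead of tensors).

def undirected_pairs(unique_edges):
    """Expand each undirected edge (u, v) into the two directed edges u->v and v->u."""
    pairs = []
    for u, v in unique_edges:
        pairs.append([u, v])
        pairs.append([v, u])
    return pairs


def pairs_to_coo(pairs):
    """Transpose [[src, dst], ...] into [[src, ...], [dst, ...]] (shape [2, num_edges])."""
    return [[s for s, _ in pairs], [d for _, d in pairs]]


pairs = undirected_pairs([(0, 1), (1, 2)])
coo = pairs_to_coo(pairs)
print(pairs)  # [[0, 1], [1, 0], [1, 2], [2, 1]]  (same as edges1)
print(coo)    # [[0, 1, 1, 2], [1, 0, 2, 1]]      (same as edges2)
```

This is why edge_index1 needs .t() before it is passed to Data, while edge_index2 can be used directly.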


### b.Inspecting graph data attributes

In [52]:
print("Data keys:",data1.keys)

print("Node features:",data1['x'])

for key, item in data1:
    print("{} found in data".format(key))

'edge_attr' in data1  # False here; not displayed because it is not the last expression in the cell

print("Number of nodes:",data1.num_nodes)

print("Number of edges:",data1.num_edges)

print("Number of node features:", data1.num_node_features)

print("Contains isolated nodes:", data1.contains_isolated_nodes())

print("Contains self-loops:", data1.contains_self_loops())

print("Is directed:", data1.is_directed())

# All methods of Data are listed at:
# https://pytorch-geometric.readthedocs.io/en/1.3.2/modules/data.html#torch_geometric.data.Data

Data keys: ['x', 'edge_index']
Node features: tensor([[-1.],
        [ 0.],
        [ 1.]])
edge_index found in data
x found in data
Number of nodes: 3
Number of edges: 4
Number of node features: 1
Contains isolated nodes: False
Contains self-loops: False
Is directed: False
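
The three structural checks can be reproduced on a plain edge list. The sketch below is a pure-Python approximation, not PyG's code: the free functions only borrow the method names, and treating a node that has nothing but a self-loop as isolated is an assumption of this sketch.

```python
# Illustrative re-implementation of the structural checks on [sources, targets] lists.

def directed_edges(edge_index):
    """Return the set of (source, target) pairs."""
    return set(zip(edge_index[0], edge_index[1]))


def contains_self_loops(edge_index):
    return any(s == d for s, d in directed_edges(edge_index))


def contains_isolated_nodes(edge_index, num_nodes):
    connected = set()
    for s, d in directed_edges(edge_index):
        if s != d:  # assumption: a self-loop alone does not connect a node
            connected.update((s, d))
    return len(connected) < num_nodes


def is_directed(edge_index):
    """A graph is directed if some edge lacks its reverse edge."""
    edges = directed_edges(edge_index)
    return any((d, s) not in edges for s, d in edges)


edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
print(contains_self_loops(edge_index))         # False
print(contains_isolated_nodes(edge_index, 3))  # False
print(is_directed(edge_index))                 # False
```

With num_nodes=4 the same edge list would report an isolated node, because node 3 never appears in any edge.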

## 2.torch_geometric.datasets (public benchmark datasets)
<img src="pic/PYG2.png" width="450"/> 

### a.TUDataset

In [53]:
from torch_geometric.datasets import TUDataset

dataset = TUDataset(root = "/home/jerry/local_git/notebook/Pytorch_Tutorail/PyG_Benchmark/ENZYMES", 
                    name = 'ENZYMES')

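print("Number of graphs:",len(dataset))
print("Number of classes:",dataset.num_classes)
print("Number of node features:",dataset.num_node_features)

data = dataset[0]
print("First graph:",data,"37 nodes, 3 features per node, 168/2 = 84 undirected edges, one graph label")
# Split the dataset into training and test sets
train_dataset = dataset[:540]
test_dataset = dataset[540:]
print("Training set:",train_dataset)
print("Test set:",test_dataset)
# shuffle() returns a shuffled copy; for a random split, shuffle before slicing
print("Shuffled dataset, dataset.shuffle():",dataset.shuffle())

Number of graphs: 600
Number of classes: 6
Number of node features: 3
First graph: Data(edge_index=[2, 168], x=[37, 3], y=[1]) 37 nodes, 3 features per node, 168/2 = 84 undirected edges, one graph label
Training set: ENZYMES(540)
Test set: ENZYMES(60)
Shuffled dataset, dataset.shuffle(): ENZYMES(600)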


### b.Citeseer

In [55]:
from torch_geometric.datasets import Planetoid

dataset = Planetoid(root="/home/jerry/local_git/notebook/Pytorch_Tutorail/PyG_Benchmark/Citeseer", 
                    name = "Citeseer")

print("Number of graphs:", len(dataset))
print("Number of classes:",dataset.num_classes)
print("Node feature dimension:",dataset.num_node_features)
data = dataset[0]
print("Graph attributes:",data)
print("data.train_mask:",data.train_mask)
print("Number of training nodes:",data.train_mask.sum().item())

Number of graphs: 1
Number of classes: 6
Node feature dimension: 3703
Graph attributes: Data(edge_index=[2, 9104], test_mask=[3327], train_mask=[3327], val_mask=[3327], x=[3327, 3703], y=[3327])
data.train_mask: tensor([ True,  True,  True,  ..., False, False, False])
Number of training nodes: 120
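
A boolean mask such as train_mask marks which nodes belong to a split; summing it counts the selected nodes (120 here, which is consistent with 20 labeled nodes for each of the 6 classes). The toy sketch below mimics mask indexing with plain lists; the helper name select and the small label list are made up for illustration.

```python
# Toy illustration of boolean-mask selection (plain lists instead of tensors).

def select(values, mask):
    """Keep the entries of values whose mask entry is True."""
    return [v for v, keep in zip(values, mask) if keep]


labels = [0, 2, 1, 1, 0, 2]                               # one class label per node
train_mask = [True, True, False, False, True, False]      # which nodes are training nodes

train_labels = select(labels, train_mask)
num_train = sum(train_mask)  # True counts as 1, like train_mask.sum()
print(train_labels)  # [0, 2, 0]
print(num_train)     # 3
```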


## 3.Mini-batches
### a.torch_geometric.data.DataLoader
<img src="pic/PYG3.png" width="450"/> 

In [69]:
from torch_geometric.datasets import TUDataset
from torch_geometric.data import DataLoader

dataset = TUDataset(root = "/home/jerry/local_git/notebook/Pytorch_Tutorail/PyG_Benchmark/ENZYMES", 
                    name = 'ENZYMES',
                    use_node_attr=True)
loader = DataLoader(dataset, batch_size = 32, shuffle=True)
#loader = DataLoader(dataset, batch_size = 32)  # same loader without shuffling

for batch in loader:
    print("Batch attributes:",batch)
    print("Number of graphs in the batch:",batch.num_graphs)

# batch vector of the last batch: maps every node to the index of the graph it belongs to
print(batch.batch)

Batch attributes: Batch(batch=[1042], edge_index=[2, 4048], x=[1042, 21], y=[32])
Number of graphs in the batch: 32
Batch attributes: Batch(batch=[988], edge_index=[2, 3914], x=[988, 21], y=[32])
Number of graphs in the batch: 32
Batch attributes: Batch(batch=[893], edge_index=[2, 3536], x=[893, 21], y=[32])
Number of graphs in the batch: 32
Batch attributes: Batch(batch=[907], edge_index=[2, 3550], x=[907, 21], y=[32])
Number of graphs in the batch: 32
Batch attributes: Batch(batch=[1079], edge_index=[2, 4142], x=[1079, 21], y=[32])
Number of graphs in the batch: 32
Batch attributes: Batch(batch=[928], edge_index=[2, 3668], x=[928, 21], y=[32])
Number of graphs in the batch: 32
Batch attributes: Batch(batch=[1036], edge_index=[2, 3948], x=[1036, 21], y=[32])
Number of graphs in the batch: 32
Batch attributes: Batch(batch=[1230], edge_index=[2, 4568], x=[1230, 21], y=[32])
Number of graphs in the batch: 32
Batch attributes: Batch(batch=[1125], edge_index=[2, 4306], x=[1125, 21], y=[32])
Number of graphs in the batch: 32
Batch attributes: Batch(batch=[1070], edge_index=[2, 4130], x=[1070, 21], y=[32])
Number of graphs in the batch: 32
... (output truncated)

### b.Meaning of batch:
<img src="pic/PYG4.png" width="800"/> 
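
The batching idea can be sketched in plain Python: several small graphs are merged into one large disconnected graph by shifting node indices, and a batch vector records which graph each node came from. The function name collate and the dict layout are assumptions of this sketch, not PyG's internal code.

```python
# Illustrative mini-batching of graphs (hypothetical helper, plain Python lists).

def collate(graphs):
    """Merge graphs into one disconnected graph.

    Each graph is a dict with 'num_nodes' and 'edge_index' ([sources, targets]).
    Returns the merged edge_index and the batch vector (node -> graph id).
    """
    sources, targets, batch = [], [], []
    offset = 0
    for graph_id, g in enumerate(graphs):
        src, dst = g["edge_index"]
        # Shift node indices so they do not collide with earlier graphs.
        sources.extend(s + offset for s in src)
        targets.extend(d + offset for d in dst)
        batch.extend([graph_id] * g["num_nodes"])
        offset += g["num_nodes"]
    return [sources, targets], batch


g0 = {"num_nodes": 3, "edge_index": [[0, 1, 1, 2], [1, 0, 2, 1]]}
g1 = {"num_nodes": 2, "edge_index": [[0, 1], [1, 0]]}
edge_index, batch = collate([g0, g1])
print(edge_index)  # [[0, 1, 1, 2, 3, 4], [1, 0, 2, 1, 4, 3]]
print(batch)       # [0, 0, 0, 1, 1]
```

Because no edge connects nodes from different graphs, message passing on the merged graph never mixes information between the original graphs.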

### c.torch_scatter.scatter_mean
<img src="pic/PYG5.png" width="800"/> 
For PyTorch Scatter usage, see:
https://pytorch-scatter.readthedocs.io/en/latest/

In [76]:
from torch_scatter import scatter_mean
from torch_geometric.datasets import TUDataset
from torch_geometric.data import DataLoader

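dataset = TUDataset(root = "/home/jerry/local_git/notebook/Pytorch_Tutorail/PyG_Benchmark/ENZYMES", 
                    name = 'ENZYMES',
                    use_node_attr=True)
loader = DataLoader(dataset, batch_size = 32, shuffle=True)

# Iterate through the loader; after the loop, data holds the last batch
# (600 graphs = 18 full batches of 32 + one final batch of 24)
for data in loader:
    pass
# Average the node features of each graph in the batch, giving one row per graph
x = scatter_mean(data.x, data.batch, dim=0)
print(x.size())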

torch.Size([24, 21])
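
What scatter_mean computes along dim=0 can be written out by hand: rows that share the same index are averaged together. The following pure-Python sketch illustrates this under that reading; scatter_mean_rows is a hypothetical name, and the real torch_scatter function operates on tensors.

```python
# Illustrative row-wise scatter mean (plain Python, not torch_scatter).

def scatter_mean_rows(x, index):
    """Average the rows of x that share the same entry in index."""
    num_groups = max(index) + 1
    num_features = len(x[0])
    sums = [[0.0] * num_features for _ in range(num_groups)]
    counts = [0] * num_groups
    for row, group in zip(x, index):
        for f, value in enumerate(row):
            sums[group][f] += value
        counts[group] += 1
    # Groups without any rows stay at zero.
    return [[v / c if c else 0.0 for v in s] for s, c in zip(sums, counts)]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [10.0, 20.0], [30.0, 40.0]]
batch = [0, 0, 0, 1, 1]  # first three nodes belong to graph 0, last two to graph 1
pooled = scatter_mean_rows(x, batch)
print(pooled)  # [[3.0, 4.0], [20.0, 30.0]]
```

In the cell above, the last batch contains 24 graphs with 21 features per node, which is why the result has size [24, 21].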


## 4.Data Transforms (building your own dataset)
<img src="pic/PYG6.png" width="800"/> 
Convert the 17,000 3D point clouds into graphs

In [100]:
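import urllib.request
import ssl
# Workaround for SSL certificate errors during the dataset download:
# disable HTTPS certificate verification (insecure; use only on a trusted network)
ssl._create_default_https_context = ssl._create_unverified_context
response = urllib.request.urlopen('https://www.python.org')  # quick check that HTTPS requests succeed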

from torch_geometric.datasets import ShapeNet

dataset = ShapeNet(root='/home/jerry/local_git/notebook/Pytorch_Tutorail/PyG_Benchmark/ShapeNet',
                   categories=['Airplane'])

print("3D point data:", dataset[0])

3D point data: Data(category=[1], edge_index=[2, 15108], pos=[2518, 3], y=[2518])


<img src="pic/PYG7.png" width="800"/> 
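Use the KNN algorithm to convert the 3D point clouds into graphs.

Note: the conversion must be requested when the dataset is first downloaded and processed, because pre_transform runs only once and its result is cached on disk. If the dataset has already been processed, passing a new pre_transform has no effect.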

In [99]:
import urllib.request
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
response = urllib.request.urlopen('https://www.python.org')

# Use the KNN algorithm to convert the 3D point clouds into graphs
import torch_geometric.transforms as T
from torch_geometric.datasets import ShapeNet

dataset = ShapeNet(root='/home/jerry/local_git/notebook/Pytorch_Tutorail/PyG_Benchmark/ShapeNet',
                   categories=['Airplane'],
                   pre_transform = T.KNNGraph(k=6))

print("3D point data converted to a graph:", dataset[0])

3D point data converted to a graph: Data(category=[1], edge_index=[2, 15108], pos=[2518, 3], y=[2518])
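
The edge count in the output is consistent with a k-nearest-neighbor graph: 2518 points with k=6 give 2518 × 6 = 15108 directed edges. Below is a brute-force pure-Python sketch of the idea. It is not the T.KNNGraph implementation; the function name knn_graph and the edge direction (neighbor to center point) are assumptions of this sketch.

```python
# Illustrative brute-force k-nearest-neighbor graph, O(n^2) in the number of points.

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_graph(pos, k):
    """Connect every point to its k nearest neighbors."""
    sources, targets = [], []
    for i, p in enumerate(pos):
        # Sort all other points by distance to point i and keep the k closest.
        others = sorted((squared_distance(p, q), j) for j, q in enumerate(pos) if j != i)
        for _, j in others[:k]:
            sources.append(j)  # neighbor
            targets.append(i)  # center point
    return [sources, targets]


pos = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (5, 5, 5)]
edge_index = knn_graph(pos, k=1)
print(edge_index)          # [[1, 0, 0, 2], [0, 1, 2, 3]]
print(len(edge_index[0]))  # 4 edges = 4 points * k
```

The node positions themselves are unchanged by this step; only the edge list is added.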


<img src="pic/PYG8.png" width="800"/> 
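Use the KNN algorithm to convert the 3D point clouds into graphs, and add a random perturbation to the node positions.

Note: the conversion must be requested when the dataset is first downloaded and processed, because pre_transform runs only once and its result is cached on disk. If the dataset has already been processed, passing a new pre_transform has no effect.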

In [98]:
import torch_geometric.transforms as T
from torch_geometric.datasets import ShapeNet

dataset = ShapeNet(root='/home/jerry/local_git/notebook/Pytorch_Tutorail/PyG_Benchmark/ShapeNet',
                   categories=['Airplane'],
                   pre_transform = T.KNNGraph(k=6),
                   transform=T.RandomTranslate(0.01))

print("3D point data converted to a graph, with randomly translated nodes:", dataset[0])

3D point data converted to a graph, with randomly translated nodes: Data(category=[1], edge_index=[2, 15108], pos=[2518, 3], y=[2518])
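
The shapes in the output do not change because a random translation only perturbs the coordinates in pos. A rough pure-Python sketch of such a perturbation follows; it assumes each coordinate is shifted independently by a value drawn uniformly from [-max_shift, max_shift], and random_translate is a hypothetical name rather than the T.RandomTranslate code.

```python
# Illustrative random jitter of point positions (assumed uniform, per coordinate).
import random


def random_translate(pos, max_shift, rng):
    """Return a jittered copy of pos; the input list is left untouched."""
    return [[c + rng.uniform(-max_shift, max_shift) for c in point] for point in pos]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
pos = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
jittered = random_translate(pos, 0.01, rng)
print(jittered)
```

Unlike pre_transform, which is applied once before the processed data is saved, transform is applied every time a sample is accessed, so repeated reads of dataset[0] would see different perturbations.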


## 5.Building a GCN

In [101]:
from torch_geometric.datasets import Planetoid
dataset = Planetoid(root="/home/jerry/local_git/notebook/Pytorch_Tutorail/PyG_Benchmark/Citeseer", 
                    name = "Citeseer")
print(dataset)

Citeseer()


In [104]:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)
        
        
    def forward(self, data):
        x, edge_index = data.x , data.edge_index
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        
        return F.log_softmax(x, dim=1)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(),
                             lr=0.01, 
                             weight_decay=5e-4)
# Switch to training mode (enables dropout)
model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data)
    # compute the loss on the training nodes only
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

# Switch to evaluation mode (disables dropout)
model.eval()
_, pred = model(data).max(dim=1)  # predicted class = index of the largest log-probability
correct = float(pred[data.test_mask].eq(data.y[data.test_mask]).sum().item())
acc = correct / data.test_mask.sum().item()
print('Accuracy: {:.4f}'.format(acc))

Accuracy: 0.6790
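
To show what a single graph convolution does to the node features, the sketch below applies the symmetric normalization D^{-1/2} (A + I) D^{-1/2} X from the GCN propagation rule to the three-node graph from section 1. It is a simplified pure-Python illustration: the learnable weight matrix and bias of GCNConv are omitted, and gcn_propagate is a hypothetical name.

```python
# Illustrative GCN propagation step without learnable weights.
import math


def gcn_propagate(x, edge_index, num_nodes):
    """Compute D^{-1/2} (A + I) D^{-1/2} x for per-node feature lists x."""
    sources, targets = list(edge_index[0]), list(edge_index[1])
    # Add a self-loop to every node (the "+ I" term).
    sources += list(range(num_nodes))
    targets += list(range(num_nodes))
    # Node degree, counted after adding the self-loops.
    deg = [0] * num_nodes
    for dst in targets:
        deg[dst] += 1
    out = [[0.0] * len(x[0]) for _ in range(num_nodes)]
    for src, dst in zip(sources, targets):
        norm = 1.0 / math.sqrt(deg[src] * deg[dst])
        for f, value in enumerate(x[src]):
            out[dst][f] += norm * value
    return out


x = [[-1.0], [0.0], [1.0]]
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
h = gcn_propagate(x, edge_index, num_nodes=3)
print(h)  # approximately [[-0.5], [0.0], [0.5]]
```

Each node ends up with a degree-weighted mix of its own feature and its neighbors' features; in the model above, conv1 and conv2 additionally multiply the features by learned weights.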
