data.x: Node feature matrix with shape [num_nodes, num_node_features]

data.edge_index: Graph connectivity in COO format with shape [2, num_edges] and type torch.long

data.edge_attr: Edge feature matrix with shape [num_edges, num_edge_features]

data.y: Target to train against (may have arbitrary shape), e.g., node-level targets of shape [num_nodes, *] or graph-level targets of shape [1, *]

data.pos: Node position matrix with shape [num_nodes, num_dimensions]

![](https://pytorch-geometric.readthedocs.io/en/latest/_images/graph.svg)

In [23]:
import torch
from torch_geometric.data import Data
# Notes:
#   - edge_index holds one row of source nodes and one row of target nodes,
#     not a list of (source, target) tuples
#   - an undirected edge must be listed in both directions
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)
data = Data(x=x,edge_index=edge_index)
data

# edge_index can also be written as a list of (source, target) pairs;
# it must then be transposed and made contiguous before being passed to Data
edge_index = torch.tensor([[0,1],
                           [1,0],
                           [1,2],
                           [2,1]])
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)
data = Data(x=x,edge_index=edge_index.t().contiguous())
data # prints shape



Data(x=[3, 1], edge_index=[2, 4])
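
To make the COO layout concrete, here is a minimal sketch in plain PyTorch (no PyG needed) that derives the same edge_index from a dense adjacency matrix. The variable name adj is only illustrative and is not part of the cells above.

```python
import torch

# Dense adjacency matrix of the 3-node path graph 0 - 1 - 2.
# It is symmetric, so every edge is present in both directions.
adj = torch.tensor([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]])

# nonzero() returns one (row, col) pair per nonzero entry, shape [num_edges, 2].
# Transposing gives the [2, num_edges] COO layout that edge_index uses.
edge_index = adj.nonzero().t().contiguous()

print(edge_index)
print(edge_index.shape, edge_index.dtype)
```

The first row lists source nodes and the second row lists target nodes, which matches the tensor written by hand in the cell above.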

In [24]:
# DATA UTILS
print(data.keys)  # a property in older PyG releases; newer releases expose it as a method, data.keys()
print(data['x'])
data.num_nodes
data.num_edges
data.num_node_features
data.is_undirected()

['edge_index', 'x']
tensor([[-1.],
        [ 0.],
        [ 1.]])


True
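
As a rough illustration of what the undirectedness check means, the sketch below tests in plain PyTorch whether every edge (i, j) also appears as (j, i). The helper name is_undirected_sketch is hypothetical; it ignores edge attributes and is not PyG's implementation.

```python
import torch

def is_undirected_sketch(edge_index: torch.Tensor) -> bool:
    """Return True if every edge (i, j) also appears as (j, i)."""
    forward = set(map(tuple, edge_index.t().tolist()))
    backward = {(j, i) for (i, j) in forward}
    return forward == backward

# The graph from the cells above: both directions of each edge are listed.
undirected = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
# Same graph with the reverse edges dropped.
directed = torch.tensor([[0, 1],
                         [1, 2]])

print(is_undirected_sketch(undirected))
print(is_undirected_sketch(directed))
```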

## Benchmark Datasets ##

PyG contains a large number of common benchmark datasets, e.g., all Planetoid datasets (Cora, Citeseer, Pubmed), all graph classification datasets from http://graphkernels.cs.tu-dortmund.de and their cleaned versions, the QM7 and QM9 datasets, and a handful of 3D mesh/point cloud datasets like FAUST, ModelNet10/40 and ShapeNet.

In [18]:
from torch_geometric.datasets import TUDataset
dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES')
dataset

ENZYMES(600)

In [25]:
print(len(dataset),dataset.num_classes,dataset.num_node_features)

# split dataset by grabbing samples:
train = dataset[:540]
test = dataset[540:]

600 6 3
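
The slice-based split above takes the last 60 graphs in storage order as the test set. If the graphs happen to be stored grouped by class, such a split can be unbalanced, so shuffling first is a common precaution. A minimal sketch in plain PyTorch, assuming the 600 graphs and the 540/60 split from the cell above (the seed and variable names are illustrative):

```python
import torch

num_graphs = 600   # len(dataset) for ENZYMES, as printed above
num_train = 540    # same 90/10 split as above

# Fixed seed so the split is reproducible across runs.
generator = torch.Generator().manual_seed(0)
perm = torch.randperm(num_graphs, generator=generator)

train_idx = perm[:num_train]
test_idx = perm[num_train:]

# A PyG dataset can be indexed with such index tensors,
# e.g. dataset[train_idx] and dataset[test_idx].
print(len(train_idx), len(test_idx))
```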


In [22]:
data = dataset[0]
print(data,data.is_undirected())

Data(edge_index=[2, 168], x=[37, 3], y=[1]) True


In [27]:
from torch_geometric.datasets import Planetoid
dataset = Planetoid(root='/tmp/Cora', name='Cora')
print(len(dataset),dataset.num_classes,dataset.num_node_features)
data = dataset[0]
print(data.train_mask.sum().item(),data.val_mask.sum().item(),data.test_mask.sum().item())

1 7 1433
140 500 1000
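
The masks printed above are boolean vectors with one entry per node, so summing them counts the nodes in each split and indexing with them selects those nodes. A small sketch with made-up labels and masks (toy values, not Cora data):

```python
import torch

# Toy labels for 5 nodes and two disjoint boolean masks.
y = torch.tensor([2, 0, 1, 2, 1])
train_mask = torch.tensor([True, True, False, False, False])
test_mask = torch.tensor([False, False, False, True, True])

print(train_mask.sum().item())  # number of training nodes
print(y[train_mask])            # labels of the training nodes only
print(y[test_mask])             # labels of the test nodes only
```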


## Mini Batching ##
Neural networks are usually trained in a batch-wise fashion. PyG achieves parallelization over a mini-batch by creating sparse block diagonal adjacency matrices (defined by edge_index) and concatenating feature and target matrices in the node dimension. This composition allows a differing number of nodes and edges across the examples in one batch:
https://pytorch-geometric.readthedocs.io/en/latest/notes/introduction.html#mini-batches

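batch is a column vector that maps each node to its respective graph in the batch. For example, a batch holding one graph with 2 nodes and one graph with 3 nodes has batch = [0, 0, 1, 1, 1].
You can use it to, e.g., average node features in the node dimension for each graph individually.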
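
A minimal sketch of that per-graph averaging in plain PyTorch, using toy features and a hand-written batch vector. PyG ships its own pooling and scatter utilities for this; the version below only illustrates the idea.

```python
import torch

# Node features of two graphs stacked in the node dimension:
# graph 0 has 2 nodes, graph 1 has 3 nodes.
x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [10.0, 20.0],
                  [20.0, 30.0],
                  [30.0, 40.0]])
batch = torch.tensor([0, 0, 1, 1, 1])
num_graphs = int(batch.max()) + 1

# Sum the rows of x into one row per graph, then divide by the node counts.
sums = torch.zeros(num_graphs, x.size(1)).index_add_(0, batch, x)
counts = torch.bincount(batch, minlength=num_graphs).unsqueeze(1)
mean = sums / counts

print(mean)
```

The result has one row per graph, i.e. shape [num_graphs, num_node_features].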

In [31]:
from torch_geometric.loader import DataLoader
from torch_geometric.datasets import TUDataset
dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES')
loader = DataLoader(dataset, batch_size=32, shuffle=True)
for batch in loader:
    print(batch)
    print(batch.num_graphs)
    break

DataBatch(edge_index=[2, 4406], x=[1115, 3], y=[32], batch=[1115], ptr=[33])
32
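
The batch and ptr entries in the output above can be understood from a hand-rolled version of the batching step. The sketch below is plain PyTorch and only approximates what the loader does: it concatenates node features, shifts each graph's node indices so the graphs stay disconnected (the block diagonal structure), and records batch and ptr. The helper name collate_sketch is hypothetical.

```python
import torch

def collate_sketch(graphs):
    """Merge (x, edge_index) pairs into one large disconnected graph."""
    xs, edge_indices, batch, ptr = [], [], [], [0]
    for graph_id, (x, edge_index) in enumerate(graphs):
        offset = ptr[-1]
        xs.append(x)
        # Shift node ids past the nodes of all earlier graphs.
        edge_indices.append(edge_index + offset)
        # Every node of this graph gets the same graph id.
        batch.append(torch.full((x.size(0),), graph_id, dtype=torch.long))
        # ptr stores the cumulative node count, one entry per graph boundary.
        ptr.append(offset + x.size(0))
    return (torch.cat(xs, dim=0), torch.cat(edge_indices, dim=1),
            torch.cat(batch), torch.tensor(ptr))

g1 = (torch.zeros(2, 3), torch.tensor([[0, 1], [1, 0]]))
g2 = (torch.ones(3, 3), torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]]))
x, edge_index, batch, ptr = collate_sketch([g1, g2])

print(x.shape, edge_index, batch, ptr)
```

With 32 graphs, ptr has 33 entries, which is consistent with ptr=[33] in the output above.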


## Data Transforms ##

Transforms are a common way in torchvision to transform images and perform augmentation. PyG comes with its own transforms, which expect a Data object as input and return a new transformed Data object. Transforms can be chained together using torch_geometric.transforms.Compose and are applied either before a processed dataset is saved to disk (pre_transform) or each time a graph in a dataset is accessed (transform).

In [32]:
from torch_geometric.datasets import ShapeNet

dataset = ShapeNet(root='/tmp/ShapeNet', categories=['Airplane'])
# dataset = ShapeNet(root='/tmp/ShapeNet', categories=['Airplane'],
#                     pre_transform=T.KNNGraph(k=6))

dataset[0]


Downloading https://shapenet.cs.stanford.edu/media/shapenetcore_partanno_segmentation_benchmark_v0_normal.zip
Extracting /tmp/ShapeNet/shapenetcore_partanno_segmentation_benchmark_v0_normal.zip
Processing...
Done!


Data(x=[2518, 3], y=[2518], pos=[2518, 3], category=[1])

With a pre_transform (the commented-out variant above), the data is converted before it is saved to disk, which leads to faster loading times. The next time the dataset is initialized it will already contain graph edges, even if you do not pass any transform. If the pre_transform does not match the one used for the already processed dataset, you will be given a warning; the existing processed files are then reused, so the processed directory has to be deleted before a different pre_transform takes effect.
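
To give a feel for what a k-nearest-neighbor pre_transform computes, here is a dense sketch in plain PyTorch that builds an edge_index from node positions. It is not PyG's implementation: the helper name knn_edges_sketch is hypothetical, the O(N^2) distance matrix only suits small point sets, and the edge direction (neighbor to center node) is an assumption of this sketch.

```python
import torch

def knn_edges_sketch(pos: torch.Tensor, k: int) -> torch.Tensor:
    """Connect every node to its k nearest neighbors (dense version)."""
    dist = torch.cdist(pos, pos)          # [N, N] pairwise Euclidean distances
    dist.fill_diagonal_(float('inf'))     # a node is not its own neighbor
    neighbors = dist.topk(k, dim=1, largest=False).indices   # [N, k]
    centers = torch.arange(pos.size(0)).repeat_interleave(k)
    # Row 0: neighbor (source), row 1: center node (target).
    return torch.stack([neighbors.reshape(-1), centers], dim=0)

# Four points on a line; with k=1 each point links to its closest point.
pos = torch.tensor([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [7.0, 0.0]])
edge_index = knn_edges_sketch(pos, k=1)

print(edge_index)
```

The result has shape [2, num_nodes * k] and is generally not symmetric, since being a nearest neighbor is not a mutual relation.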

In addition, we can use the transform argument to randomly augment a Data object, e.g., translating each node position by a small number:

In [33]:
import torch_geometric.transforms as T
from torch_geometric.datasets import ShapeNet

dataset = ShapeNet(root='/tmp/ShapeNet', categories=['Airplane'],
                    pre_transform=T.KNNGraph(k=6),
                    transform=T.RandomJitter(0.01))

dataset[0]



Data(x=[2518, 3], y=[2518], pos=[2518, 3], category=[1])
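
The jitter augmentation can be pictured as adding a small random offset to every coordinate each time a sample is accessed. A minimal sketch in plain PyTorch, assuming offsets drawn uniformly from [-translate, translate]; the helper name random_jitter_sketch is hypothetical and this is not the library's code.

```python
import torch

def random_jitter_sketch(pos: torch.Tensor, translate: float) -> torch.Tensor:
    """Return a copy of pos with each coordinate shifted by a small random amount."""
    noise = torch.empty_like(pos).uniform_(-translate, translate)
    return pos + noise

torch.manual_seed(0)
pos = torch.rand(5, 3)                       # toy point cloud with 5 points
jittered = random_jitter_sketch(pos, 0.01)

# Every coordinate moved by at most 0.01; the shape is unchanged.
print((jittered - pos).abs().max())
```

Because the noise is resampled on every call, accessing the same sample twice yields slightly different positions, which is what makes this an augmentation rather than a preprocessing step.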

## Learning Methods on Graphs ##

In [34]:

from torch_geometric.datasets import Planetoid
dataset = Planetoid(root='/tmp/Cora', name='Cora')

In [37]:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

# define the network
class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index

        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)

        return F.log_softmax(x, dim=1)
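
For intuition about what a single graph convolution layer computes, the sketch below implements the propagation rule from the GCN paper (Kipf and Welling), X' = D^{-1/2} (A + I) D^{-1/2} X W, with dense matrices in plain PyTorch. It assumes an undirected graph, omits the bias term, and uses a hand-picked weight matrix; it illustrates the formula and is not how GCNConv is implemented internally.

```python
import torch

def gcn_layer_sketch(x, edge_index, weight):
    """Dense GCN propagation: normalize (A + I) symmetrically, then mix features."""
    num_nodes = x.size(0)
    adj = torch.zeros(num_nodes, num_nodes)
    adj[edge_index[0], edge_index[1]] = 1.0
    adj = adj + torch.eye(num_nodes)          # add self-loops
    deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)   # D^{-1/2} as a vector
    norm_adj = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)
    return norm_adj @ x @ weight

# The 3-node path graph and features from the first cells of this notebook.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
x = torch.tensor([[-1.0], [0.0], [1.0]])
weight = torch.ones(1, 2)   # hypothetical weights; a real layer learns these

out = gcn_layer_sketch(x, edge_index, weight)
print(out)
```

Each output row is a degree-normalized average over the node itself and its neighbors, so the middle node, whose neighbors carry -1 and 1, ends up at 0.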

In [39]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = GCN().to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)


model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

model.eval()
pred = model(data).argmax(dim=1)
correct = (pred[data.test_mask] == data.y[data.test_mask]).sum()
acc = int(correct) / int(data.test_mask.sum())
print(f'Accuracy: {acc:.4f}')

Accuracy: 0.8100
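
The model returns log-probabilities (log_softmax) and the training loop pairs them with nll_loss. That combination is equivalent to applying cross_entropy to the raw scores, which the sketch below checks on random toy values (not Cora data). It also shows that taking argmax on log-probabilities gives the same predictions as on raw scores.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 7)                 # toy scores: 5 nodes, 7 classes
target = torch.tensor([0, 3, 6, 1, 2])     # toy labels

log_probs = F.log_softmax(logits, dim=1)
loss_a = F.nll_loss(log_probs, target)       # what the training loop above uses
loss_b = F.cross_entropy(logits, target)     # same value from raw scores

print(torch.allclose(loss_a, loss_b))
print(torch.equal(log_probs.argmax(dim=1), logits.argmax(dim=1)))
```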
