## **Common Benchmark Datasets**
PyTorch Geometric contains a large number of common benchmark datasets, e.g. all Planetoid datasets (Cora, Citeseer, Pubmed), all graph classification datasets from http://graphkernels.cs.tu-dortmund.de/ and their cleaned versions, the QM7 and QM9 datasets, and a handful of 3D mesh/point cloud datasets such as FAUST, ModelNet10/40 and ShapeNet.

In [23]:
```python
import torch
from torch_geometric.data import Data
from torch_geometric.datasets import Planetoid, TUDataset
```

### **ENZYMES dataset**
- 600 graphs in total
- Each graph belongs to one of 6 classes

In [39]:
```python
# Load the ENZYMES graph dataset,
# which consists of 600 graphs in 6 classes
dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES')
```

In [40]:
```python
def summary_torch_geometric_data(data):
    # Print the data keys: `keys` is a property in older PyG releases
    # and a method in newer ones, so handle both
    keys = data.keys() if callable(data.keys) else data.keys
    print("- Keys of the data : {}".format(keys))
    # Print the node feature matrix, the labels and the edge indices
    print("- Feature of nodes : {}".format(data['x']))
    print("- Labels (y) : {}".format(data['y']))
    print("- Edge indices : {}".format(data['edge_index']))
    # Iterating over a Data object yields (key, value) pairs, like dict.items()
    for key, _ in data:
        print("- {} found in data".format(key))
    # Print the number of nodes and edges in the graph
    print("- Number of nodes : {}".format(data.num_nodes))
    print("- Number of edges : {}".format(data.num_edges))
    # Print the number of node features
    print("- Number of node features : {}".format(data.num_node_features))
    # Check for isolated nodes (contains_isolated_nodes() in older PyG releases)
    print("- Does the graph contains isolated nodes? : {}".format(data.has_isolated_nodes()))
    # Check for self-loops (contains_self_loops() in older PyG releases)
    print("- Does the graph contains self-loop nodes? : {}".format(data.has_self_loops()))
    # Check whether the graph is directed
    print("- Is the graph directed ? : {}".format(data.is_directed()))
```
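The last three checks in `summary_torch_geometric_data` are all computed from `edge_index`. The plain-Python sketch below approximates that logic on an edge list given as two Python lists (sources and targets); the helper names are hypothetical stand-ins, not the PyG implementations. It assumes, as PyG's `contains_isolated_nodes` utility appears to do, that self-loops are ignored when looking for isolated nodes.

```python
# Hypothetical plain-Python versions of the checks reported by
# has_isolated_nodes(), has_self_loops() and is_directed().
# edge_index is a pair of equal-length lists: [sources, targets].

def has_self_loops(edge_index):
    """True if some edge starts and ends at the same node."""
    sources, targets = edge_index
    return any(s == t for s, t in zip(sources, targets))


def has_isolated_nodes(edge_index, num_nodes):
    """True if some node in range(num_nodes) has no edge other than a self-loop."""
    sources, targets = edge_index
    connected = set()
    for s, t in zip(sources, targets):
        if s != t:  # assumption: self-loops do not count as connections
            connected.update((s, t))
    return len(connected) < num_nodes


def is_directed(edge_index):
    """True if some edge (s, t) has no matching reverse edge (t, s)."""
    edges = set(zip(*edge_index))
    return any((t, s) not in edges for s, t in edges)


# A triangle 0-1-2 stored in both directions, plus node 3 with no edges.
edge_index = [[0, 1, 1, 2, 2, 0],
              [1, 0, 2, 1, 0, 2]]
print(has_self_loops(edge_index))         # False
print(has_isolated_nodes(edge_index, 4))  # True
print(is_directed(edge_index))            # False
```

Storing every edge in both directions is what makes the toy graph undirected; dropping any single entry would make `is_directed` return `True`.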

In [41]:
```python
print("- number of dataset graphs : {}".format(len(dataset)))
```

```
- number of dataset graphs : 600
```


In [42]:
```python
print(dataset[0])
```

```
Data(edge_index=[2, 168], x=[37, 3], y=[1])
```
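The printed `Data(edge_index=[2, 168], x=[37, 3], y=[1])` lists tensor shapes rather than values: `edge_index` is a `[2, num_edges]` edge list in COO format (168 entries), `x` is a `[num_nodes, num_node_features]` matrix (37 nodes with 3 features each), and `y` holds a single graph-level label. As a rough sketch of how counts such as `num_nodes`, `num_edges` and `num_node_features` follow from those shapes, here is a hypothetical plain-Python helper (`graph_counts` is not part of PyG) that also checks that every edge endpoint has a feature row.

```python
def graph_counts(edge_index, x):
    """Hypothetical helper: derive node, edge and feature counts from a
    COO edge list (two equal-length rows) and a node-feature matrix
    (one row per node), rejecting edges that point at missing nodes."""
    if len(edge_index) != 2 or len(edge_index[0]) != len(edge_index[1]):
        raise ValueError("edge_index must have shape [2, num_edges]")
    num_nodes = len(x)
    if any(not 0 <= i < num_nodes for row in edge_index for i in row):
        raise ValueError("edge_index refers to a node without a feature row")
    return {
        "num_nodes": num_nodes,
        "num_edges": len(edge_index[0]),
        "num_node_features": len(x[0]) if num_nodes else 0,
    }


x = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # 3 nodes, 3 features each
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]                 # 4 directed entries
print(graph_counts(edge_index, x))
# {'num_nodes': 3, 'num_edges': 4, 'num_node_features': 3}
```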


In [22]:
```python
summary_torch_geometric_data(dataset[0])
```

```
- Keys of the data : ['x', 'edge_index', 'y']
- Feature of nodes : tensor([[1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [
```
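Every row of `x` above contains a single 1. With `TUDataset`'s default `use_node_attr=False`, the ENZYMES node features are one-hot encoded node labels (three node types), so each row can be mapped back to an integer node type. A minimal plain-Python sketch, using a hypothetical helper name:

```python
from collections import Counter


def one_hot_to_index(rows):
    """Return, for each one-hot row, the position of its largest entry."""
    return [row.index(max(row)) for row in rows]


# A few rows in the same format as the ENZYMES node features above.
x = [[1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
node_types = one_hot_to_index(x)
print(node_types)           # [0, 0, 1, 2]
print(Counter(node_types))  # Counter({0: 2, 1: 1, 2: 1})
```

On the actual tensor, `data.x.argmax(dim=1)` performs the same lookup in one call.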

In [19]:
```python
# Example: shuffle the graphs, then split them 90% / 10% into train and test sets
dataset = dataset.shuffle()
train_dataset = dataset[:540]
test_dataset  = dataset[540:]
print("- train dataset : {}".format(train_dataset))
print("- test dataset  : {}".format(test_dataset))
```

```
- train dataset : ENZYMES(540)
- test dataset  : ENZYMES(60)
```
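The split above is a shuffled 90/10 partition into 540 training and 60 test graphs. PyG's `Dataset.shuffle()` draws its permutation from PyTorch's random number generator, so calling `torch.manual_seed(...)` beforehand should make the split reproducible. The same idea over plain indices with a seeded generator looks like this (`shuffled_split` is a hypothetical helper):

```python
import random


def shuffled_split(num_items, train_fraction=0.9, seed=0):
    """Shuffle the indices 0..num_items-1 and cut them into train/test lists."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    cut = round(num_items * train_fraction)
    return indices[:cut], indices[cut:]


train_idx, test_idx = shuffled_split(600)
print(len(train_idx), len(test_idx))  # 540 60
```

PyG datasets should also accept a list of indices, so a subset like `dataset[train_idx]` can be built from such a split.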


### **Cora dataset**
- A single large graph
- Each node belongs to one of 7 classes
- Each node has 1433 features
- Undirected citation graph
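PyG stores an undirected graph such as Cora by listing every edge in both directions in `edge_index`. This is why the Cora summary reports `is_directed()` as `False` and `num_edges` as 10556: each of the 5278 undirected links appears twice. A plain-Python sketch of that conversion, similar in spirit to `torch_geometric.utils.to_undirected` but not its implementation (the helper name is hypothetical):

```python
def to_undirected_edge_index(edges):
    """Turn undirected (u, v) pairs into a 2 x E edge list that stores every
    edge in both directions, without duplicates, sorted by (source, target)."""
    directed = set()
    for u, v in edges:
        directed.add((u, v))
        directed.add((v, u))
    pairs = sorted(directed)
    return [[u for u, _ in pairs], [v for _, v in pairs]]


edge_index = to_undirected_edge_index([(0, 1), (1, 2)])
print(edge_index)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```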

In [24]:
```python
CORA = Planetoid(root='/tmp/Cora', name='Cora')
```

```
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!
```


In [26]:
```python
CORA_data = CORA[0]
print(CORA_data)
summary_torch_geometric_data(CORA_data)
```

```
Data(edge_index=[2, 10556], test_mask=[2708], train_mask=[2708], val_mask=[2708], x=[2708, 1433], y=[2708])
- Keys of the data : ['x', 'edge_index', 'y', 'train_mask', 'val_mask', 'test_mask']
- Feature of nodes : tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])
- Labels (y) : tensor([3, 4, 4,  ..., 3, 3, 3])
- Edge indices : tensor([[   0,    0,    0,  ..., 2707, 2707, 2707],
        [ 633, 1862, 2582,  ...,  598, 1473, 2706]])
- edge_index found in data
- test_mask found in data
- train_mask found in data
- val_mask found in data
- x found in data
- y found in data
- Number of nodes : 2708
- Number of edges : 10556
- Number of node features : 1433
- Does the graph contains isolated nodes? : False
- Does the graph contains self-loop nodes? : False
- Is the graph directed ? : False
```


In [34]:
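```python
print("- {}".format(CORA_data['train_mask']))
# Summing a boolean mask counts its True entries, i.e. the training nodes
print("- number of training nodes : {}".format(int(CORA_data['train_mask'].sum())))
```

```
- tensor([ True,  True,  True,  ..., False, False, False])
- number of training nodes : 140
```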


In [36]:
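```python
print("- {}".format(CORA_data['val_mask']))
# Summing a boolean mask counts its True entries, i.e. the validation nodes
print("- number of validation nodes : {}".format(int(CORA_data['val_mask'].sum())))
```

```
- tensor([False, False, False,  ..., False, False, False])
- number of validation nodes : 500
```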


In [37]:
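```python
print("- {}".format(CORA_data['test_mask']))
# Summing a boolean mask counts its True entries, i.e. the test nodes
print("- number of test nodes : {}".format(int(CORA_data['test_mask'].sum())))
```

```
- tensor([False, False, False,  ...,  True,  True,  True])
- number of test nodes : 1000
```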
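Each mask is a boolean vector with one entry per node (2708 for Cora), and `True` marks the nodes in that split. The three splits hold 140 + 500 + 1000 = 1640 nodes, so 1068 nodes belong to none of them. The plain-Python sketch below (hypothetical helper names) shows how a mask selects node indices and how to check that the splits do not overlap. On the tensors themselves, `CORA_data.y[CORA_data.train_mask]` selects the training labels and `(CORA_data.train_mask & CORA_data.test_mask).any()` tests for overlap.

```python
def mask_indices(mask):
    """Return the positions where a boolean mask is True."""
    return [i for i, selected in enumerate(mask) if selected]


def masks_are_disjoint(*masks):
    """True if no position is True in more than one mask."""
    seen = set()
    for mask in masks:
        chosen = set(mask_indices(mask))
        if seen & chosen:
            return False
        seen |= chosen
    return True


# Toy masks over 6 nodes, laid out like the Cora splits above.
train_mask = [True, True, False, False, False, False]
val_mask   = [False, False, True, False, False, False]
test_mask  = [False, False, False, True, True, False]

print(mask_indices(train_mask))                             # [0, 1]
print(sum(train_mask), sum(val_mask), sum(test_mask))       # 2 1 2
print(masks_are_disjoint(train_mask, val_mask, test_mask))  # True
```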
