# Access node labels for Dynamic Node Property Prediction

This tutorial will show you how to access node labels and edge data for the node property prediction datasets in `tgb`.

The source code is in `dataset_pyg.py` in the `tgb/nodeproppred` folder.

This tutorial requires `PyTorch` and `PyG`. Refer to `README.md` for installation instructions.

This tutorial uses the `PyG` `TemporalData` object, but it is also possible to use `numpy` arrays.

See the examples in the `examples/nodeproppred` folder for more details.


In [1]:
from tgb.nodeproppred.dataset_pyg import PyGNodePropPredDataset
from torch_geometric.loader import TemporalDataLoader

Specify the name of the dataset.

In [2]:
name = "tgbn-genre"

### Process and load the dataset

If the dataset has already been processed, it is loaded from disk for fast access.

If the dataset has not been downloaded yet, it is downloaded first and then processed automatically.

In [3]:
dataset = PyGNodePropPredDataset(name=name, root="datasets")
type(dataset)

file found, skipping download
Dataset directory is  /mnt/f/code/TGB/tgb/datasets/tgbn_genre
loading processed file


tgb.nodeproppred.dataset_pyg.PyGNodePropPredDataset

### Train, Validation and Test splits with dataloaders

Split the edges into train, validation and test sets, and construct a dataloader for each.

In [4]:
train_mask = dataset.train_mask
val_mask = dataset.val_mask
test_mask = dataset.test_mask


data = dataset.get_TemporalData()

train_data = data[train_mask]
val_data = data[val_mask]
test_data = data[test_mask]

batch_size = 200
train_loader = TemporalDataLoader(train_data, batch_size=batch_size)
val_loader = TemporalDataLoader(val_data, batch_size=batch_size)
test_loader = TemporalDataLoader(test_data, batch_size=batch_size)
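
To make the role of the masks concrete, here is a minimal `numpy` sketch of a chronological split using boolean masks. The toy timestamps and the split ratios are assumptions chosen for illustration; this is not how `tgb` computes its own masks, only the general idea of selecting edges by time.

```python
import numpy as np

# Toy edge stream (hypothetical values): 10 edges with sorted timestamps.
timestamps = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
sources = np.arange(10)

# Assumed split ratios, for illustration only.
val_ratio, test_ratio = 0.15, 0.15
val_time, test_time = np.quantile(
    timestamps, [1.0 - val_ratio - test_ratio, 1.0 - test_ratio]
)

# Boolean masks over the edge stream, one per split.
train_mask = timestamps <= val_time
val_mask = (timestamps > val_time) & (timestamps <= test_time)
test_mask = timestamps > test_time

# Indexing with a mask keeps only the edges of that split, in time order.
train_src, train_t = sources[train_mask], timestamps[train_mask]

# Iterate over the training edges in fixed-size, chronologically ordered batches.
batch_size = 4
train_batches = [
    train_t[start:start + batch_size]
    for start in range(0, len(train_t), batch_size)
]
```

Because the masks are defined by time thresholds, every training edge precedes every validation edge, which in turn precedes every test edge.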


### Access node label data 

In `tgb`, node labels are queried with the timestamp of the most recent edge observed so far, and the query returns the node labels for the corresponding day.

This is necessary because node labels often have timestamps that differ from those of the edges, so they must be processed at the correct point in the edge stream.

The example below shows how to iterate through the edges and retrieve the node labels for the corresponding time.

In [5]:
# query the timestamp of the first node labels
label_t = dataset.get_label_time()

for batch in train_loader:
    #access the edges in this batch
    src, dst, t, msg = batch.src, batch.dst, batch.t, batch.msg
    query_t = batch.t[-1]
    # check if this batch moves to the next day
    if query_t > label_t:
        # find the node labels from the past day
        label_tuple = dataset.get_node_label(query_t)
        # node labels are structured as a tuple with (timestamps, source node, label) format, label is a vector
        label_ts, label_srcs, labels = (
            label_tuple[0],
            label_tuple[1],
            label_tuple[2],
        )
        label_t = dataset.get_label_time()

        # insert your code for backpropagating with node labels here
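
The interleaving of labels and edges can also be illustrated without any dataset. The sketch below uses a hypothetical `ToyLabelStore` class that mimics the `get_label_time` / `get_node_label` pattern from the loop above. It is not the `tgb` implementation, and its return format and end-of-stream behaviour (returning `None`) are assumptions made for this example.

```python
class ToyLabelStore:
    """Hypothetical stand-in for the label-query pattern; not tgb's code."""

    def __init__(self, labels_by_time):
        # labels_by_time: {label_timestamp: [(node_id, label_vector), ...]}
        self.labels_by_time = labels_by_time
        self.times = sorted(labels_by_time)
        self.cursor = 0  # index of the next label timestamp to release

    def get_label_time(self):
        # Timestamp of the next labels to release, or None when exhausted.
        if self.cursor < len(self.times):
            return self.times[self.cursor]
        return None

    def get_node_label(self, cur_t):
        # Release the next label group only once the edge stream has reached it.
        label_t = self.get_label_time()
        if label_t is None or cur_t < label_t:
            return None
        self.cursor += 1
        return label_t, self.labels_by_time[label_t]


# Toy edge timestamps, grouped into batches of two edges each.
edge_times = [1, 2, 3, 11, 12, 25, 26]
batches = [edge_times[i:i + 2] for i in range(0, len(edge_times), 2)]

# Labels carry their own timestamps (10 and 20), distinct from the edge times.
store = ToyLabelStore({10: [(0, [0.5, 0.5])], 20: [(1, [1.0, 0.0])]})

released = []  # (query time, label time) pairs, in release order
label_t = store.get_label_time()
for batch in batches:
    query_t = batch[-1]  # latest edge time seen in this batch
    if label_t is not None and query_t > label_t:
        label_time, node_labels = store.get_node_label(query_t)
        released.append((query_t, label_time))
        label_t = store.get_label_time()
```

Each label group is released only after the edge stream has moved past its timestamp, so a model never trains on labels from a time it has not yet reached.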
            