# Latency Dataset - GNN Model

Because the dataset is encoded as graphs, this example uses a GNN to predict model latency on the bench dataset.

In prior work, [BRP-NAS](https://arxiv.org/abs/2007.08668v2) proposed an end-to-end latency predictor built on a GCN. Their GCN predictor shows a significant improvement over the layer-wise predictor on [NAS-Bench-201](https://arxiv.org/abs/2001.00326). On our bench dataset, however, the performance of BRP-NAS is consistently poor. As discussed in our paper, the reason is that the model graphs differ between the training and testing sets. A GNN learns a representation of model graphs, and although the models in our bench dataset share largely overlapping operator types, their operator configurations, edges, and latency ranges differ.

To address these problems, we give a GNN example with an improved graph representation. We first build the GNN model, which is based on GraphSAGE and uses max pooling as its pooling method. We then load the data and start training. `GNNDataset` and `GNNDataloader` in `nn_meter/dataset/gnn_dataloader.py` convert the model structures stored in `.jsonl` format into the Dataset and Dataloader that we need.

Let's start our journey!

## Step 1: Build our GraphSAGE Model

We build our model with the help of the DGL library.

In [1]:
import torch
import torch.nn as nn
from torch.nn.modules.module import Module

from dgl.nn.pytorch.glob import MaxPooling
import dgl.nn as dglnn
from torch.optim.lr_scheduler import CosineAnnealingLR


class GNN(Module):
    def __init__(self, 
                num_features=0, 
                num_layers=2,
                num_hidden=32,
                dropout_ratio=0):

        super(GNN, self).__init__()
        self.nfeat = num_features
        self.nlayer = num_layers
        self.nhid = num_hidden
        self.dropout_ratio = dropout_ratio
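        # One GraphSAGE layer per level, each using the 'pool' (max) aggregator
        self.gc = nn.ModuleList([dglnn.SAGEConv(self.nfeat if i==0 else self.nhid, self.nhid, 'pool') for i in range(self.nlayer)])
        # Note: despite the name `bn`, these are LayerNorm layers
        self.bn = nn.ModuleList([nn.LayerNorm(self.nhid) for i in range(self.nlayer)])
        self.relu = nn.ModuleList([nn.ReLU() for i in range(self.nlayer)])
        # Graph-level readout: elementwise max over all node embeddings
        self.pooling = MaxPooling()
        # Regression head: fc1 followed by fc maps the graph vector to one latency value
        self.fc = nn.Linear(self.nhid, 1)
        self.fc1 = nn.Linear(self.nhid, self.nhid)
        self.dropout = nn.ModuleList([nn.Dropout(self.dropout_ratio) for i in range(self.nlayer)])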

    def forward_single_model(self, g, features):
        x = self.relu[0](self.bn[0](self.gc[0](g, features)))
        x = self.dropout[0](x)
        for i in range(1,self.nlayer):
            x = self.relu[i](self.bn[i](self.gc[i](g, x)))
            x = self.dropout[i](x)
        return x

    def forward(self, g, features):
        x = self.forward_single_model(g, features)
        with g.local_scope():
            g.ndata['h'] = x
            x = self.pooling(g, x)
            x = self.fc1(x)
            return self.fc(x)

Using backend: pytorch
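
The model above applies a maximum in two places: the `'pool'` aggregator inside each `SAGEConv` layer, and the `MaxPooling` readout that turns node embeddings into one vector per graph. The following is a minimal pure-Python sketch of those two steps, not DGL's implementation. It leaves out the learned linear layers, and the helper names `neighbor_max` and `max_pool_readout`, as well as the zero vector used for nodes without in-neighbors, are assumptions made for illustration.

```python
def neighbor_max(features, edges):
    """For each node, take the elementwise max over its in-neighbors' features.

    features: list of equal-length float lists, one per node.
    edges: list of (src, dst) pairs; messages flow from src to dst.
    Nodes without in-neighbors get a zero vector (a choice made for this sketch).
    """
    dim = len(features[0])
    incoming = [[] for _ in features]
    for src, dst in edges:
        incoming[dst].append(features[src])
    return [
        [max(col) for col in zip(*msgs)] if msgs else [0.0] * dim
        for msgs in incoming
    ]


def max_pool_readout(node_embeddings):
    """Collapse all node embeddings of one graph into a single vector."""
    return [max(col) for col in zip(*node_embeddings)]


# Toy graph with three nodes; nodes 0 and 1 both feed node 2.
feats = [[1.0, 5.0], [4.0, 2.0], [3.0, 3.0]]
edges = [(0, 2), (1, 2)]

aggregated = neighbor_max(feats, edges)   # per-node neighborhood summary
graph_vector = max_pool_readout(feats)    # one vector for the whole graph
print(aggregated)
print(graph_vector)
```

Because the readout is a maximum rather than a sum, the size of the graph-level vector does not depend on the number of nodes, so graphs of different sizes can share the same regression head.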


## Step 2: Load the Data

Next, we load the data and check the sizes of the training and testing datasets.

In [2]:
import os
from nn_meter.dataset import gnn_dataloader

target_device = "cortexA76cpu_tflite21"

print("Processing Training Set.")
train_set = gnn_dataloader.GNNDataset(train=True, device=target_device) 
print("Processing Testing Set.")
test_set = gnn_dataloader.GNNDataset(train=False, device=target_device)

train_loader = gnn_dataloader.GNNDataloader(train_set, batchsize=1 , shuffle=True)
test_loader = gnn_dataloader.GNNDataloader(test_set, batchsize=1, shuffle=False)
print('Train Dataset Size:', len(train_set))
print('Testing Dataset Size:', len(test_set))
# Number of attributes per graph node; used as the input width of the GNN
ATTR_COUNT = next(train_loader)[1].ndata['h'].size(1)
print('Attribute tensor shape:', ATTR_COUNT)

Processing Training Set.
Processing Testing Set.
Train Dataset Size: 20732
Testing Dataset Size: 5173
Attribute tensor shape: 26
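
The dataset files are stored in `.jsonl` format, which holds one JSON object per line. As a rough illustration of how such a file can be read with the standard library alone, here is a minimal sketch. It is not the code used by `GNNDataset`; the helper `read_jsonl` and the field names `id` and `latency` are hypothetical and chosen only for this example.

```python
import io
import json


def read_jsonl(stream):
    """Yield one parsed record per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical two-record file; the field names are made up for illustration.
sample = io.StringIO(
    '{"id": "model_a", "latency": 12.5}\n'
    '\n'
    '{"id": "model_b", "latency": 48.0}\n'
)

records = list(read_jsonl(sample))
latencies = [record["latency"] for record in records]
print(len(records), latencies)
```

Reading line by line keeps memory use flat, which matters when a file holds tens of thousands of model records.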


## Step 3: Run and Test

We can run the model and evaluate it now!
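
The cell for this step reports "accuracy within 10%": the percentage of models whose predicted latency falls between 0.9 and 1.1 times the measured latency. A minimal sketch of that metric in plain Python is given here for reference. The function name `accuracy_within` and the sample numbers are assumptions for illustration and do not come from the dataset.

```python
def accuracy_within(predictions, targets, tolerance=0.1):
    """Percentage of predictions within +/- tolerance (relative) of the target."""
    hits = sum(
        1
        for pred, true in zip(predictions, targets)
        if (1 - tolerance) * true <= pred <= (1 + tolerance) * true
    )
    return hits / len(targets) * 100


# Made-up latencies in arbitrary units.
measured = [10.0, 20.0, 40.0, 80.0]
predicted = [10.5, 23.0, 37.0, 100.0]

score = accuracy_within(predicted, measured)
print(score)
```

The bound is relative to the measured value, so a fixed absolute error counts against fast models more than slow ones.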

In [3]:
if torch.cuda.is_available():
    print("Using CUDA.")
# device = "cpu"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Start Training
load_model = False
if load_model:
    model = GNN(ATTR_COUNT, 3, 400, 0.1).to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=4e-4)
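    # map_location lets a checkpoint saved on GPU be restored on a CPU-only machine
    checkpoint = torch.load('LatencyGNN.pt', map_location=device)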
    model.load_state_dict(checkpoint['model_state_dict'])
    opt.load_state_dict(checkpoint['optimizer_state_dict'])
    # EPOCHS = checkpoint['epoch']
    EPOCHS = 0
    loss_func = checkpoint['loss']
else:
    model = GNN(ATTR_COUNT, 3, 400, 0.1).to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=4e-4)
    EPOCHS=20
    loss_func = nn.L1Loss()

lr_scheduler = CosineAnnealingLR(opt, T_max=EPOCHS)
loss_sum = 0
for epoch in range(EPOCHS):
    train_length = len(train_set)
    tran_acc_ten = 0
    loss_sum = 0 
    # each batch yields (latency, graph)
    for batched_l, batched_g in train_loader:
        opt.zero_grad()
        batched_l = batched_l.to(device).float()
        batched_g = batched_g.to(device)
        batched_f = batched_g.ndata['h'].float()
        logits = model(batched_g, batched_f)
        for i in range(len(batched_l)):
            pred_latency = logits[i].item()
            prec_latency = batched_l[i].item()
            if (pred_latency >= 0.9 * prec_latency) and (pred_latency <= 1.1 * prec_latency):
                tran_acc_ten += 1
        # print("true latency: ", batched_l)
        # print("Predict latency: ", logits)
        batched_l = torch.reshape(batched_l, (-1 ,1))
        loss = loss_func(logits, batched_l)
        loss_sum += loss.item()  # accumulate a Python float so the autograd graph is not retained
        loss.backward()
        opt.step()
    lr_scheduler.step()
    print("[Epoch ", epoch, "]: ", "Training accuracy within 10%: ", tran_acc_ten / train_length * 100, " %.")
    # print('Learning Rate:', lr_scheduler.get_last_lr())
    # print('Loss:', loss_sum / train_length)

# Save the final model
torch.save({
    'epoch': EPOCHS,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': opt.state_dict(),
    'loss': loss_func,
}, 'LatencyGNN.pt')

# Start Testing
model.eval()  # switch off dropout for evaluation
count = 0
with torch.no_grad():
    test_length = len(test_set)
    test_acc_ten = 0
    for batched_l, batched_g in test_loader:
        batched_l = batched_l.to(device).float()
        batched_g = batched_g.to(device)
        batched_f = batched_g.ndata['h'].float()
        result = model(batched_g, batched_f)
        if (result.item() >= 0.9 * batched_l.item()) and (result.item() <= 1.1 * batched_l.item()):
            test_acc_ten += 1
        acc = (abs(result.item() - batched_l.item()) / batched_l.item()) * 100
        count += 1
    print("Testing accuracy within 10%: ", test_acc_ten / test_length * 100, " %.")

[Epoch  0 ]:  Training accuracy within 10%:  21.999807061547365  %.
[Epoch  1 ]:  Training accuracy within 10%:  27.725255643449742  %.
[Epoch  2 ]:  Training accuracy within 10%:  30.228632066370825  %.
[Epoch  3 ]:  Training accuracy within 10%:  31.357322014277443  %.
[Epoch  4 ]:  Training accuracy within 10%:  33.06000385876906  %.
[Epoch  5 ]:  Training accuracy within 10%:  34.917036465367545  %.
[Epoch  6 ]:  Training accuracy within 10%:  36.48466139301563  %.
[Epoch  7 ]:  Training accuracy within 10%:  39.070036658306  %.
[Epoch  8 ]:  Training accuracy within 10%:  40.10708084121165  %.
[Epoch  9 ]:  Training accuracy within 10%:  41.530001929384525  %.
[Epoch  10 ]:  Training accuracy within 10%:  43.26162454177118  %.
[Epoch  11 ]:  Training accuracy within 10%:  45.34053636889832  %.
[Epoch  12 ]:  Training accuracy within 10%:  48.45166891761528  %.
[Epoch  13 ]:  Training accuracy within 10%:  50.945398417904684  %.
[Epoch  14 ]:  Training accuracy within 10%:  54.5774