# Graph Attention Networks Benchmark - Cora

This notebook checks the quality of two implementations of the [GAT - Graph Attention Networks](https://arxiv.org/abs/1710.10903) paper on the Cora dataset.

## Objectives
- Test the PyTorch implementation of [GAT](https://github.com/Diego999/pyGAT).
- Test the [original implementation of GAT](https://github.com/PetarV-/GAT), written by one of the paper's authors, Petar Veličković.
- Improve the usability of both implementations by making the code modular and simple, so that it is easier to adopt and to replicate in other experiments.

_Note: Both repositories have been forked to my personal GitHub profile, to keep the changes made to each codebase under version control._


## [PyTorch Implementation](https://github.com/Diego999/pyGAT)

In [None]:
!git clone https://github.com/joaopedromattos/pyGAT
!pip install --quiet spektral

Cloning into 'pyGAT'...
remote: Enumerating objects: 163, done.
remote: Total 163 (delta 0), reused 0 (delta 0), pack-reused 163
Receiving objects: 100% (163/163), 216.21 KiB | 3.54 MiB/s, done.
Resolving deltas: 100% (89/89), done.
     |████████████████████████████████| 102kB 3.7MB/s

Running the model with its default dataset (Cora):

In [None]:
!python3 pyGAT/train.py

2020-10-28 20:29:14.427138: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
[LOAD DATA]: cora
tensor([0, 6, 3,  ..., 2, 5, 0])
Epoch: 0001 loss_train: 2.3115 acc_train: 0.1700 loss_val: 1.8774 acc_val: 0.3192 time: 18.9700s
Epoch: 0002 loss_train: 2.0392 acc_train: 0.2420 loss_val: 1.7392 acc_val: 0.4041 time: 15.4444s
Epoch: 0003 loss_train: 1.8679 acc_train: 0.3282 loss_val: 1.6107 acc_val: 0.4613 time: 15.3702s
Epoch: 0004 loss_train: 1.7721 acc_train: 0.3688 loss_val: 1.4859 acc_val: 0.5240 time: 15.3919s
Traceback (most recent call last):
  File "pyGAT/train.py", line 202, in <module>
    gat.train_pipeline()
  File "pyGAT/train.py", line 134, in train_pipeline
    loss_values.append(train(epoch))
  File "pyGAT/train.py", line 104, in train
    output = model(features, adj)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwar

### Testing field

This subsection was used to understand how the function ```original_load_data```, from the file ```pyGAT/utils.py```, works. Unless you're also interested in the details of that function, you can skip this section.

Here we have the function we want to understand. It is the original data loader from GAT, so we'll try to reuse it to run other benchmarks.

Note: ```normalize_features```, ```normalize_adj``` and ```encode_onehot``` are just auxiliary functions, and their names should be enough to understand what each one does :)
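
To make the two normalizations concrete, here is a minimal sketch on toy matrices. The helper names `row_normalize` and `sym_normalize` and the toy data are assumptions made for illustration, not code from pyGAT. For a symmetric adjacency matrix they compute the same quantities as `normalize_features` and `normalize_adj`, but they invert only the non-zero row sums, which sidesteps the divide-by-zero warning that `np.power(rowsum, -1)` emits when a row is empty.

```python
import numpy as np
import scipy.sparse as sp


def row_normalize(mx):
    """Scale every row to sum to 1; all-zero rows are left as zeros."""
    rowsum = np.asarray(mx.sum(axis=1), dtype=np.float64).flatten()
    r_inv = np.zeros_like(rowsum)
    nonzero = rowsum != 0
    r_inv[nonzero] = 1.0 / rowsum[nonzero]
    return sp.diags(r_inv).dot(mx)


def sym_normalize(mx):
    """Compute D^-1/2 * A * D^-1/2, where D holds the row sums of A."""
    rowsum = np.asarray(mx.sum(axis=1), dtype=np.float64).flatten()
    d_inv_sqrt = np.zeros_like(rowsum)
    nonzero = rowsum != 0
    d_inv_sqrt[nonzero] = rowsum[nonzero] ** -0.5
    d_mat = sp.diags(d_inv_sqrt)
    return d_mat.dot(mx).dot(d_mat)


# Toy data (hypothetical): 3 nodes, 3 features, node 1 has no features.
toy_features = sp.csr_matrix(np.array([[1., 1., 0.],
                                       [0., 0., 0.],
                                       [2., 0., 2.]]))
toy_adj = sp.csr_matrix(np.array([[0., 1., 1.],
                                  [1., 0., 0.],
                                  [1., 0., 0.]]))

feat_norm = row_normalize(toy_features).toarray()
# Self-loops are added before normalizing, as the loader does with sp.eye.
adj_norm = sym_normalize(toy_adj + sp.eye(3)).toarray()

print(feat_norm.sum(axis=1))               # row sums after normalization
print(np.allclose(adj_norm, adj_norm.T))   # symmetric input stays symmetric
```

Row normalization turns each feature row into proportions, while the symmetric variant scales entry (i, j) by the degrees of both endpoints.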



In [None]:
import numpy as np
import scipy.sparse as sp
import torch
import spektral as spk
import pandas as pd
import networkx as nx
import os

def normalize_adj(mx):
    """Symmetrically normalize sparse matrix: D^-1/2 * A^T * D^-1/2"""
    rowsum = np.array(mx.sum(1))
    r_inv_sqrt = np.power(rowsum, -0.5).flatten()
    r_inv_sqrt[np.isinf(r_inv_sqrt)] = 0.
    r_mat_inv_sqrt = sp.diags(r_inv_sqrt)
    return mx.dot(r_mat_inv_sqrt).transpose().dot(r_mat_inv_sqrt)

def encode_onehot(labels):
    classes = set(labels)
    classes_dict = {c: np.identity(len(classes))[
        i, :] for i, c in enumerate(classes)}
    labels_onehot = np.array(
        list(map(classes_dict.get, labels)), dtype=np.int32)
    return labels_onehot

def cora_networkx(path=None):
    if path is None:
        raise ValueError("Dataset path shouldn't be of type 'None'.")
    else:
        # Reading our graph, according to documentation
        edgelist = pd.read_csv(os.path.join(
            path, "cora.cites"), sep='\t', header=None, names=["target", "source"])
        edgelist["label"] = "cites"

        # Transforming it into a NetworkX graph
        Gnx = nx.from_pandas_edgelist(edgelist, edge_attr="label")

        # Note: rows/columns follow the node order of Gnx (order of first
        # appearance in cora.cites), not the row order of cora.content
        adj = nx.to_scipy_sparse_matrix(Gnx)

        # Sparse feature matrix
        feature_names = ["w_{}".format(ii) for ii in range(1433)]
        column_names = feature_names + ["subject"]
        node_data = pd.read_csv(os.path.join(
            path, "cora.content"), sep='\t', header=None, names=column_names)
        features = sp.csr_matrix(node_data.to_numpy()[
                                 :, :-1], dtype=np.float32)

        # Train / val / test splitting...
        num_nodes = features.shape[0]
        idxs = np.arange(0, num_nodes)
        idx_train, idx_val, idx_test = np.split(
            idxs, [int(.6*num_nodes), int(.8*num_nodes)])

        labels = encode_onehot(node_data.to_numpy()[:, -1])

        return adj, features, labels, idx_train, idx_val, idx_test

adj, features, labels, idx_train, idx_val, idx_test = cora_networkx("./pyGAT/data/cora/")
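
The split inside `cora_networkx` cuts the node indices at 60% and 80%, in file order. A minimal sketch of the same `np.split` call follows. The function name `split_indices` and its optional `seed` argument (which shuffles the indices before cutting) are additions for illustration, not part of pyGAT.

```python
import numpy as np


def split_indices(num_nodes, train_frac=0.6, val_frac=0.2, seed=None):
    """Return (train, val, test) index arrays cut at the given fractions."""
    idxs = np.arange(num_nodes)
    if seed is not None:
        # Optional shuffle so the split does not depend on file order.
        idxs = np.random.default_rng(seed).permutation(idxs)
    train_end = int(train_frac * num_nodes)
    val_end = int((train_frac + val_frac) * num_nodes)
    train, val, test = np.split(idxs, [train_end, val_end])
    return train, val, test


# Cora has 2708 nodes (see the feature matrix shape later in the notebook).
idx_train, idx_val, idx_test = split_indices(2708)
print(len(idx_train), len(idx_val), len(idx_test))  # 1624 542 542
```

Note that this is a much larger training set than the fixed 140 / 300 / 1000 split used by the original loader, so accuracies obtained with the two splits are not directly comparable.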

In [None]:
print(adj)

  (0, 1)	1
  (0, 99)	1
  (0, 324)	1
  (0, 330)	1
  (0, 1736)	1
  (1, 0)	1
  (1, 2)	1
  (1, 3)	1
  (1, 4)	1
  (1, 5)	1
  (1, 6)	1
  (1, 7)	1
  (1, 8)	1
  (1, 9)	1
  (1, 10)	1
  (1, 11)	1
  (1, 12)	1
  (1, 13)	1
  (1, 14)	1
  (1, 15)	1
  (1, 16)	1
  (1, 17)	1
  (1, 18)	1
  (1, 19)	1
  (1, 20)	1
  :	:
  (2693, 1101)	1
  (2694, 2695)	1
  (2695, 2694)	1
  (2696, 2697)	1
  (2697, 2696)	1
  (2698, 2365)	1
  (2699, 2700)	1
  (2700, 2699)	1
  (2701, 2702)	1
  (2702, 2701)	1
  (2703, 2396)	1
  (2704, 1493)	1
  (2704, 1502)	1
  (2704, 2705)	1
  (2704, 2706)	1
  (2705, 1502)	1
  (2705, 2704)	1
  (2705, 2706)	1
  (2705, 2707)	1
  (2706, 1493)	1
  (2706, 1502)	1
  (2706, 2704)	1
  (2706, 2705)	1
  (2707, 729)	1
  (2707, 2705)	1
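
One detail worth checking in the printed matrix: when no `nodelist` is given, NetworkX orders rows and columns by the order in which nodes were added to the graph. Here that is the order of first appearance in `cora.cites`, while the feature rows follow `cora.content`. A hedged sketch with a three-node toy graph (the edges and the "content order" below are made up for illustration) shows how passing `nodelist` aligns the two. It uses `nx.to_numpy_array` for brevity; the sparse converters accept the same `nodelist` argument.

```python
import networkx as nx
import numpy as np

# Toy citation graph; nodes are inserted in the order they appear in the edges.
G = nx.Graph()
G.add_edges_from([(1033, 35), (103482, 35)])

# Hypothetical row order of the feature file (differs from insertion order).
content_order = [35, 103482, 1033]

adj_default = nx.to_numpy_array(G)                          # insertion order
adj_aligned = nx.to_numpy_array(G, nodelist=content_order)  # feature-row order

print(list(G.nodes))   # [1033, 35, 103482]
print(adj_aligned)
```

With `nodelist` set to the IDs in feature-row order, row `i` of the adjacency matrix and row `i` of the feature matrix describe the same paper.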


In [None]:
import scipy.sparse as sp
import numpy as np
import torch

def normalize_features(mx):
    """Row-normalize sparse matrix"""
    rowsum = np.array(mx.sum(1))
    r_inv = np.power(rowsum, -1).flatten()
    r_inv[np.isinf(r_inv)] = 0.
    r_mat_inv = sp.diags(r_inv)
    mx = r_mat_inv.dot(mx)
    return mx

def normalize_adj(mx):
    """Symmetrically normalize sparse matrix: D^-1/2 * A^T * D^-1/2"""
    rowsum = np.array(mx.sum(1))
    r_inv_sqrt = np.power(rowsum, -0.5).flatten()
    r_inv_sqrt[np.isinf(r_inv_sqrt)] = 0.
    r_mat_inv_sqrt = sp.diags(r_inv_sqrt)
    return mx.dot(r_mat_inv_sqrt).transpose().dot(r_mat_inv_sqrt)


def encode_onehot(labels):
    classes = set(labels)
    classes_dict = {c: np.identity(len(classes))[i, :] for i, c in enumerate(classes)}
    labels_onehot = np.array(list(map(classes_dict.get, labels)), dtype=np.int32)
    return labels_onehot


path="./pyGAT/data/cora/"
dataset="cora"
# Load citation network dataset (cora only for now)
print('Loading {} dataset...'.format(dataset))

# Reading the .content text file as strings: <paper_id> <features...> <label>
idx_features_labels = np.genfromtxt("{}{}.content".format(path, dataset), dtype=np.dtype(str))

# Columns 1..-2 hold the features; the last column holds the class label
features = sp.csr_matrix(idx_features_labels[:, 1:-1], dtype=np.float32)
labels = encode_onehot(idx_features_labels[:, -1])

# build graph
idx = np.array(idx_features_labels[:, 0], dtype=np.int32)
idx_map = {j: i for i, j in enumerate(idx)}
edges_unordered = np.genfromtxt("{}{}.cites".format(path, dataset), dtype=np.int32)
edges = np.array(list(map(idx_map.get, edges_unordered.flatten())), dtype=np.int32).reshape(edges_unordered.shape)
adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])), shape=(labels.shape[0], labels.shape[0]), dtype=np.float32)

# build symmetric adjacency matrix
adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)
print(adj)
features = normalize_features(features)


adj = normalize_adj(adj + sp.eye(adj.shape[0]))

idx_train = range(140)
idx_val = range(200, 500)
idx_test = range(500, 1500)

adj = torch.FloatTensor(np.array(adj.todense()))
features = torch.FloatTensor(np.array(features.todense()))
labels = torch.LongTensor(np.where(labels)[1])

idx_train = torch.LongTensor(idx_train)
idx_val = torch.LongTensor(idx_val)
idx_test = torch.LongTensor(idx_test)

Loading cora dataset...
  (0, 8)	1.0
  (0, 14)	1.0
  (0, 258)	1.0
  (0, 435)	1.0
  (0, 544)	1.0
  (1, 344)	1.0
  (2, 410)	1.0
  (2, 471)	1.0
  (2, 552)	1.0
  (2, 565)	1.0
  (3, 197)	1.0
  (3, 463)	1.0
  (3, 601)	1.0
  (4, 170)	1.0
  (5, 490)	1.0
  (5, 2164)	1.0
  (6, 251)	1.0
  (6, 490)	1.0
  (7, 258)	1.0
  (8, 0)	1.0
  (8, 14)	1.0
  (8, 258)	1.0
  (8, 435)	1.0
  (8, 751)	1.0
  (9, 308)	1.0
  :	:
  (2698, 2697)	1.0
  (2698, 2700)	1.0
  (2699, 2153)	1.0
  (2700, 2697)	1.0
  (2700, 2698)	1.0
  (2701, 2247)	1.0
  (2701, 2263)	1.0
  (2702, 881)	1.0
  (2702, 2624)	1.0
  (2703, 1221)	1.0
  (2703, 1409)	1.0
  (2703, 2200)	1.0
  (2704, 209)	1.0
  (2704, 2407)	1.0
  (2705, 1784)	1.0
  (2705, 1839)	1.0
  (2705, 1840)	1.0
  (2705, 2216)	1.0
  (2706, 1046)	1.0
  (2706, 1138)	1.0
  (2706, 1640)	1.0
  (2706, 1752)	1.0
  (2707, 774)	1.0
  (2707, 1389)	1.0
  (2707, 2344)	1.0


Displaying each value returned by the function ```load_data()```:

In [None]:
features

<2708x1433 sparse matrix of type '<class 'numpy.float32'>'
	with 49216 stored elements in Compressed Sparse Row format>

In [None]:
edges_unordered.flatten()

array([     35,    1033,      35, ...,  853118,  954315, 1155073],
      dtype=int32)
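
The raw paper IDs printed above are not contiguous, which is why the loader builds `idx_map` before creating the adjacency matrix. A minimal sketch of that remapping follows. The four IDs appear in the output above, but the order of `paper_ids` and the toy edge list are made up for illustration.

```python
import numpy as np

# Paper IDs in the (hypothetical) order they appear in the .content file.
paper_ids = np.array([35, 1033, 853118, 1155073], dtype=np.int32)

# Toy edge list expressed with raw paper IDs, as in the .cites file.
edges_unordered = np.array([[35, 1033],
                            [35, 1155073],
                            [853118, 1033]], dtype=np.int32)

# Map each raw ID to its row index in the feature matrix.
idx_map = {int(paper_id): row for row, paper_id in enumerate(paper_ids)}

# Apply the map to every endpoint, then restore the (num_edges, 2) shape.
edges = np.array([idx_map[int(p)] for p in edges_unordered.flatten()],
                 dtype=np.int32).reshape(edges_unordered.shape)
print(edges.tolist())  # [[0, 1], [0, 3], [2, 1]]
```

After the remapping, every edge endpoint is a valid row/column index for an N x N adjacency matrix.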

In [None]:
adj

tensor([[0.1667, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.5000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.2000,  ..., 0.0000, 0.0000, 0.0000],
        ...,
        [0.0000, 0.0000, 0.0000,  ..., 0.2000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.2000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.2500]])

In [None]:
labels

tensor([3, 5, 6,  ..., 1, 0, 3])
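
The integer labels shown above come from `np.where(labels)[1]`, which converts the one-hot matrix into class indices. A small sketch on a made-up one-hot matrix shows the conversion, and that it agrees with `argmax` as long as every row has exactly one non-zero entry.

```python
import numpy as np

# Toy one-hot labels: 4 nodes, 3 classes (values are illustrative).
one_hot = np.array([[0, 0, 1],
                    [1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1]], dtype=np.int32)

# np.where returns (row_indices, column_indices) of the non-zero entries,
# in row-major order; the column indices are the class ids.
class_idx = np.where(one_hot)[1]
print(class_idx)                     # [2 0 1 2]
print(one_hot.argmax(axis=1))        # same result for valid one-hot rows
```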

In [None]:
idx_train.shape

torch.Size([140])

In [None]:
idx_val.shape

torch.Size([300])

In [None]:
idx_test.shape

torch.Size([1000])
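
The "build symmetric adjacency matrix" line in the loader is the least obvious step, so here is a minimal check on a toy directed graph (the matrix is made up for illustration). The expression copies an edge to the opposite direction only where it is missing, so edges that already exist in both directions are not counted twice. The result equals the element-wise maximum of the matrix and its transpose.

```python
import numpy as np
import scipy.sparse as sp

# Toy directed graph: 0 <-> 1 exists in both directions, 1 -> 2 in only one.
dense = np.array([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 0., 0.]], dtype=np.float32)
directed = sp.csr_matrix(dense)

# True where the transposed entry is larger, i.e. where the reverse edge is missing.
mask = directed.T > directed

# Same expression as in the loader.
sym = directed + directed.T.multiply(mask) - directed.multiply(mask)
sym_dense = sym.toarray()
print(sym_dense)
```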

## [Original implementation of GAT](https://github.com/PetarV-/GAT)

This section aims to verify whether the PyTorch implementation of GAT matches the original implementation, written in TensorFlow, in accuracy and performance. Judging by the results, the PyTorch implementation is equivalent to the TensorFlow version.
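
One way to make such a comparison concrete is to parse the per-epoch log lines that both scripts print. The sketch below is an assumption about how that could be done, not part of either repository. The two regular expressions are written against the log formats visible in this notebook's outputs, and the two sample lines are copied from those outputs. The outputs shown here are cut off after a few epochs, so a real comparison would need the complete logs of both runs.

```python
import re

# Pattern for pyGAT's "Epoch: ... acc_val: ..." lines.
PYGAT_RE = re.compile(
    r"Epoch: (?P<epoch>\d+) loss_train: (?P<loss_train>[\d.]+) "
    r"acc_train: (?P<acc_train>[\d.]+) loss_val: (?P<loss_val>[\d.]+) "
    r"acc_val: (?P<acc_val>[\d.]+)"
)
# Pattern for the TensorFlow script's "Training: ... | Val: ..." lines.
TF_RE = re.compile(
    r"Training: loss = (?P<loss_train>[\d.]+), acc = (?P<acc_train>[\d.]+) "
    r"\| Val: loss = (?P<loss_val>[\d.]+), acc = (?P<acc_val>[\d.]+)"
)


def parse_val_acc(lines, pattern):
    """Return the validation accuracy of every line that matches the pattern."""
    accs = []
    for line in lines:
        match = pattern.search(line)
        if match:
            accs.append(float(match.group("acc_val")))
    return accs


pygat_log = [
    "Epoch: 0004 loss_train: 1.7721 acc_train: 0.3688 "
    "loss_val: 1.4859 acc_val: 0.5240 time: 15.3919s",
]
tf_log = [
    "Training: loss = 1.93936, acc = 0.12857 | Val: loss = 1.93453, acc = 0.34800",
]

print(parse_val_acc(pygat_log, PYGAT_RE))  # [0.524]
print(parse_val_acc(tf_log, TF_RE))        # [0.348]
```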


In [None]:
!git clone https://github.com/joaopedromattos/GAT

fatal: destination path 'GAT' already exists and is not an empty directory.


This code was originally written for TensorFlow 1.6, so we'll downgrade.

In [None]:
!pip install tensorflow-gpu==1.6.0
!pip install tensorflow==1.6.0
import tensorflow as tf
print(tf.__version__)

Collecting tensorflow-gpu==1.6.0
  Downloading https://files.pythonhosted.org/packages/b1/8c/35ba6f94dd9729517b899c3ba764e604ffe22daeba04f7c771dd452ba55b/tensorflow_gpu-1.6.0-cp36-cp36m-manylinux1_x86_64.whl (209.2MB)
     |████████████████████████████████| 209.2MB 72kB/s
Collecting tensorboard<1.7.0,>=1.6.0
  Downloading https://files.pythonhosted.org/packages/b0/67/a8c91665987d359211dcdca5c8b2a7c1e0876eb0702a4383c1e4ff76228d/tensorboard-1.6.0-py3-none-any.whl (3.0MB)
     |████████████████████████████████| 3.1MB 36.6MB/s
Collecting bleach==1.5.0
  Downloading https://files.pythonhosted.org/packages/33/70/86c5fec937ea4964184d4d6c4f0b9551564f821e1c3575907639036d9b90/bleach-1.5.0-py2.py3-none-any.whl
Collecting html5lib==0.9999999
  Downloading https://files.pythonhosted.org/packages/ae/ae/bcb60402c60932b32dfaf19bb53870b29eda2cd17551ba5639219fb5ebf9/html5lib-0.9999999.tar.gz (889kB)
     |████████████████████████████████| 890kB 41.5MB/s
Building wheel

  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])


1.6.0


In [None]:
%run GAT/execute_cora.py

Dataset: cora
----- Opt. hyperparams -----
lr: 0.005
l2_coef: 0.0005
----- Archi. hyperparams -----
nb. layers: 1
nb. units per layer: [8]
nb. attention heads: [8, 1]
residual: False
nonlinearity: <function elu at 0x7fbb2d4cf840>
model: <class 'models.gat.GAT'>
(2708, 2708)
(2708, 1433)
Instructions for updating:
`NHWC` for data_format is deprecated, use `NWC` instead
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

Training: loss = 1.94072, acc = 0.16429 | Val: loss = 1.94814, acc = 0.15200
Training: loss = 1.95123, acc = 0.17857 | Val: loss = 1.94673, acc = 0.19000
Training: loss = 1.94346, acc = 0.18571 | Val: loss = 1.94322, acc = 0.23600
Training: loss = 1.93882, acc = 0.21429 | Val: loss = 1.93923, acc = 0.31800
Training: loss = 1.93936, acc = 0.12857 | Val: loss = 1.93453, acc = 0.34800
Training: loss = 1.93183, acc = 0.22857 | Val: loss = 1

## Test driving: [Spektral](https://graphneural.network/) - A very cool GNN library recognized by names such as François Chollet.

We'll probably use this library to run our experiments in the future :)



In [None]:
!pip install --quiet spektral



In [None]:
import spektral as spk



adj, features, labels, train, val, test = spk.datasets.citation.load_data(dataset_name='cora', normalize_features=True, random_split=True)

# Converting the train / val / test masks into arrays
# with the node indexes of each dataset partition
idx_train, idx_val, idx_test = np.where(train)[0], np.where(val)[0], np.where(test)[0]


adj = torch.FloatTensor(adj.todense())
features = torch.FloatTensor(features.todense())
labels_a = torch.LongTensor(np.where(labels)[1])
idx_train = torch.LongTensor(idx_train)
idx_val = torch.LongTensor(idx_val)
idx_test = torch.LongTensor(idx_test)




Loading citeseer dataset
Pre-processing node features


  r_inv = np.power(rowsum, -1).flatten()
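
The mask-to-index conversion used in the Spektral cell can be checked on a toy example. The sketch assumes, as the `np.where(...)[0]` calls suggest, that `train`, `val` and `test` are boolean masks with one entry per node; the five-node masks below are made up for illustration.

```python
import numpy as np

# Toy boolean masks over 5 nodes; each node belongs to exactly one partition.
train_mask = np.array([True, False, False, True, False])
val_mask = np.array([False, True, False, False, False])
test_mask = np.array([False, False, True, False, True])

# np.where on a 1-D mask returns a 1-tuple holding the indices of the True entries.
idx_train = np.where(train_mask)[0]
idx_val = np.where(val_mask)[0]
idx_test = np.where(test_mask)[0]

print(idx_train, idx_val, idx_test)  # [0 3] [1] [2 4]
```

The resulting index arrays can be passed to `torch.LongTensor` exactly as in the cell above.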
