Cora and Facebook Page-Page Dataset

The Cora dataset consists of 2708 scientific publications, each classified into one of seven classes, and the citation network contains 5429 links. Each node (a publication) is represented by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary.
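
To make the 0/1 word vector concrete, here is a minimal sketch with a hypothetical five-word dictionary. The `dictionary` list and the `binary_word_vector` helper are illustrative assumptions, not part of the dataset tooling; the real Cora dictionary has 1433 entries.

```python
# Hypothetical five-word dictionary; the real Cora dictionary has 1433 entries.
dictionary = ["graph", "neural", "network", "protein", "kernel"]

def binary_word_vector(text, vocabulary):
    """Return a 0/1 list marking which vocabulary words occur in text."""
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocabulary]

vector = binary_word_vector("A neural network for graph data", dictionary)
print(vector)  # [1, 1, 1, 0, 0]
```

Each publication in Cora is described by one such vector, which becomes one row of the feature matrix `x`.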

Facebook Page-Page is a page-page graph of verified Facebook sites. Nodes represent official Facebook pages, while the links are mutual likes between sites. Node features are extracted from the site descriptions that the page owners created to summarize the purpose of the site.

In order to load and explore the networks, we use the PyTorch Geometric (torch_geometric) library.

In [2]:
from torch_geometric.datasets import Planetoid

In [39]:
dataset = Planetoid(root=".", name="Cora")

Let us explore the dataset!

In [40]:
dataset[0]

Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])

We can see that there are 2708 nodes, each with 1433 features, and that edge_index holds 10556 entries.

In [5]:
dataset.num_classes

7

There are seven classes for the nodes. Now let us look into the Facebook Page-Page dataset.

In [6]:
from torch_geometric.datasets import FacebookPagePage
facebook_dataset = FacebookPagePage('./')

In [7]:
facebook_dataset[0]

Data(x=[22470, 128], edge_index=[2, 342004], y=[22470])

One thing to note is that the Facebook Page-Page dataset, unlike Cora, does not come with train_mask, val_mask or test_mask, so we have to create the splits ourselves. Here we simply use contiguous index ranges rather than a random split.

In [8]:
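# In current PyG versions dataset[0] returns a copy, so keep a reference and attach the masks to it.
# The ranges are contiguous index blocks; indices 18000 and 20000 fall in no split.
fb_data = facebook_dataset[0]
fb_data.train_mask = range(18000)
fb_data.val_mask = range(18001, 20000)
fb_data.test_mask = range(20001, 22470)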
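
The cell above assigns contiguous index ranges. If a random split is preferred, it could be sketched as follows. The helper name `random_split_masks`, the 80/10/10 fractions, and the seed are assumptions for illustration, not what the notebook uses.

```python
import torch

def random_split_masks(num_nodes, train_frac=0.8, val_frac=0.1, seed=0):
    """Return boolean train/val/test masks that partition the nodes at random."""
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_nodes, generator=generator)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)

    train = torch.zeros(num_nodes, dtype=torch.bool)
    val = torch.zeros(num_nodes, dtype=torch.bool)
    test = torch.zeros(num_nodes, dtype=torch.bool)
    train[perm[:n_train]] = True
    val[perm[n_train:n_train + n_val]] = True
    test[perm[n_train + n_val:]] = True  # everything left over
    return train, val, test

# 22470 is the number of nodes in the Facebook Page-Page graph.
rand_train, rand_val, rand_test = random_split_masks(22470)
print(rand_train.sum().item(), rand_val.sum().item(), rand_test.sum().item())
```

Boolean masks like these can be used for indexing in the same way as Cora's built-in `train_mask`, and every node ends up in exactly one split.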

In [9]:
print(facebook_dataset[0].is_directed())
print(facebook_dataset[0].has_self_loops())
print(facebook_dataset[0].has_isolated_nodes())
print(facebook_dataset[0].is_coalesced())

False
True
False
False


In [10]:
print(dataset[0].is_directed())
print(dataset[0].has_self_loops())
print(dataset[0].has_isolated_nodes())
print(dataset[0].is_coalesced())

False
False
False
True


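So we can see that the Facebook Page-Page graph contains self-loops and is not coalesced, while the Cora graph has no self-loops and its edge indices are sorted and contain no duplicate entries. Both graphs are undirected and have no isolated nodes.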
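
To see what these checks mean, here is a minimal sketch that reproduces two of them by hand on a toy graph with plain PyTorch. The toy `edge_index` is made up for illustration; it only mimics the PyG convention of storing each undirected edge in both directions.

```python
import torch

# Toy undirected graph with edges 0-1 and 1-2, plus a self-loop on node 2.
# Each undirected edge is stored in both directions, as in PyG's edge_index.
edge_index = torch.tensor([[0, 1, 1, 2, 2],
                           [1, 0, 2, 1, 2]])
src, dst = edge_index

# Self-loops are edges whose source and target coincide.
has_self_loops = bool((src == dst).any())

# The graph is undirected if every (src, dst) pair also appears as (dst, src).
pairs = set(zip(src.tolist(), dst.tolist()))
is_undirected = pairs == {(d, s) for s, d in pairs}

# Counting undirected edges: drop self-loops, then each edge appears twice.
num_undirected = int((src != dst).sum()) // 2

print(has_self_loops, is_undirected, num_undirected)  # True True 2
```

The same counting explains Cora's numbers: with no self-loops and no duplicates, the 10556 entries in its edge_index correspond to 10556 / 2 = 5278 undirected edges.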

One way to do node classification is to use the node features directly, without taking the topology of the network into consideration. The other is to use a Graph Neural Network, which also uses the edges. We start with the first approach.

In [15]:
X = dataset[0].x.numpy()
y = dataset[0].y.numpy()

In [16]:
X

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)

In [17]:
y

array([3, 4, 4, ..., 3, 3, 3])

In [18]:
x_train = X[dataset[0].train_mask]
y_train = y[dataset[0].train_mask]
x_val = X[dataset[0].val_mask]
y_val = y[dataset[0].val_mask]
x_test = X[dataset[0].test_mask]
y_test = y[dataset[0].test_mask]

In [11]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [26]:
class MLP(nn.Module):
    def __init__(self,inp,hidden,out):
        super().__init__()
        self.linear1 = nn.Linear(inp,hidden)
        self.linear2 = nn.Linear(hidden,out)
    def forward(self,inp):
        x = self.linear1(inp)
        x = self.linear2(x)
        return F.log_softmax(x,dim=1)
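
Note that the MLP above stacks two linear layers with no activation in between, so it is a linear model overall. A variant with a ReLU could be sketched as below. `MLPWithReLU` is a hypothetical name, and this variant is not what produced the results reported in this notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPWithReLU(nn.Module):
    """Hypothetical variant of the MLP above with a ReLU between the layers."""
    def __init__(self, inp, hidden, out):
        super().__init__()
        self.linear1 = nn.Linear(inp, hidden)
        self.linear2 = nn.Linear(hidden, out)

    def forward(self, x):
        x = F.relu(self.linear1(x))  # nonlinearity: the only change from MLP
        x = self.linear2(x)
        return F.log_softmax(x, dim=1)

# Shape check on random inputs with Cora-like sizes (1433 features, 7 classes).
model_relu = MLPWithReLU(1433, 16, 7)
log_probs = model_relu(torch.rand(4, 1433))
print(log_probs.shape)  # torch.Size([4, 7])
```

Since `nn.CrossEntropyLoss` already applies log-softmax internally, returning the raw output of `linear2` would give the same loss; applying `log_softmax` first is redundant but harmless.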

In [12]:
def accuracy(y_pred,y_true):
    return torch.sum(y_pred == y_true) / len(y_true)

In [27]:
model = MLP(dataset.num_features, 16, dataset.num_classes)

In [37]:
n_epochs=30
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
for i in range(n_epochs):
    optimizer.zero_grad()
    out = model(dataset[0].x)
    loss = criterion(out[dataset[0].train_mask],dataset[0].y[dataset[0].train_mask])
    acc = accuracy(out[dataset[0].train_mask].argmax(dim=1),dataset[0].y[dataset[0].train_mask])
    loss.backward()
    optimizer.step()
    if i % 10 == 0:
        val_loss = criterion(out[dataset[0].val_mask], dataset[0].y[dataset[0].val_mask])
        val_acc = accuracy(out[dataset[0].val_mask].argmax(dim=1), dataset[0].y[dataset[0].val_mask])
        print(f'Epoch {i:>3} | Train Loss: {loss:.3f} | Train Acc: {acc*100:>5.2f}% | Val Loss: {val_loss:.2f} | Val Acc: {val_acc*100:.2f}%')

Epoch   0 | Train Loss: 0.216 | Train Acc: 100.00% | Val Loss: 1.47 | Val Acc: 52.20%
Epoch  10 | Train Loss: 0.025 | Train Acc: 100.00% | Val Loss: 1.47 | Val Acc: 50.80%
Epoch  20 | Train Loss: 0.009 | Train Acc: 100.00% | Val Loss: 1.50 | Val Acc: 50.60%


In [38]:
out = model(dataset[0].x)
test_acc = accuracy(out[dataset[0].test_mask].argmax(dim=1), dataset[0].y[dataset[0].test_mask])

In [41]:
print("Test accuracy: ",test_acc)

Test accuracy:  tensor(0.5080)
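
The value is printed as a tensor because `accuracy` returns one. A small evaluation helper that disables gradient tracking and returns a plain float could look like this. The `evaluate` function and the toy data are assumptions for illustration and are not part of the original notebook.

```python
import torch
import torch.nn as nn

@torch.no_grad()  # no gradients are needed for evaluation
def evaluate(model, x, y, mask):
    """Return accuracy on the nodes selected by mask as a Python float."""
    model.eval()
    pred = model(x)[mask].argmax(dim=1)
    return (pred == y[mask]).float().mean().item()

# Toy check: an identity "model", so argmax picks the largest input column.
toy_model = nn.Identity()
toy_x = torch.tensor([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]])
toy_y = torch.tensor([0, 1, 1, 1])
toy_mask = torch.tensor([True, True, True, False])
toy_acc = evaluate(toy_model, toy_x, toy_y, toy_mask)
print(f"{toy_acc:.4f}")  # 0.6667 (two of the three masked nodes are correct)
```

With the notebook's objects, the call would be `evaluate(model, dataset[0].x, dataset[0].y, dataset[0].test_mask)`.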


The model fits the training nodes perfectly but only reaches about 50% accuracy on the validation and test nodes, which is not very good. Now let us try the same on the Facebook Page-Page dataset.

In [30]:
new_data = facebook_dataset
# Build a fresh MLP sized for the Facebook features and classes.
model = MLP(new_data.num_features, 16, new_data.num_classes)

In [31]:
train_mask = range(18000)
val_mask = range(18001, 20000)
test_mask = range(20001, 22470)

In [50]:
n_epochs=30
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
for i in range(n_epochs):
    optimizer.zero_grad()
    out = model(new_data[0].x)
    loss = criterion(out[train_mask],new_data[0].y[train_mask])
    acc = accuracy(out[train_mask].argmax(dim=1),new_data[0].y[train_mask])
    loss.backward()
    optimizer.step()
    if i % 10 == 0:
        val_loss = criterion(out[val_mask], new_data[0].y[val_mask])
        val_acc = accuracy(out[val_mask].argmax(dim=1), new_data[0].y[val_mask])
        print(f'Epoch {i:>3} | Train Loss: {loss:.3f} | Train Acc: {acc*100:>5.2f}% | Val Loss: {val_loss:.2f} | Val Acc: {val_acc*100:.2f}%')

Epoch   0 | Train Loss: 1.486 | Train Acc: 25.49% | Val Loss: 1.50 | Val Acc: 24.71%
Epoch  10 | Train Loss: 0.733 | Train Acc: 71.53% | Val Loss: 0.74 | Val Acc: 70.44%
Epoch  20 | Train Loss: 0.639 | Train Acc: 74.89% | Val Loss: 0.66 | Val Acc: 73.69%


This looks much better. Now let us see how including the topology of the graph improves the accuracy of the model.

Now let us build a custom GNN layer. It computes $H=\tilde{A}^T X W^T$, where $\tilde{A} = A + I$. Multiplying by $\tilde{A}$ sums the features over each node's neighbourhood when computing its representation, and adding the identity $I$ ensures that the central node itself is included in that sum.
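
A tiny worked example of this formula, as a sketch on a made-up three-node path graph. The weight matrix is set to the identity here (an assumption for illustration) so that only the aggregation step is visible.

```python
import torch

# Path graph 0 - 1 - 2 as a dense adjacency matrix A.
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])
A_tilde = A + torch.eye(3)  # add self-loops so each node keeps its own features

# One scalar feature per node, and an identity weight so only aggregation shows.
X = torch.tensor([[1.], [10.], [100.]])
W = torch.eye(1)

H = A_tilde.T @ X @ W.T
print(H.squeeze().tolist())  # [11.0, 111.0, 110.0]
```

Node 0 receives its own feature plus that of node 1 (1 + 10), node 1 receives all three, and node 2 receives 10 + 100. Without the added identity, each node's own feature would be missing from its sum.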

In [13]:
class GNNlayer(nn.Module):
    def __init__(self,inp,out):
        super().__init__()
        self.linear1 = nn.Linear(inp,out)
    def forward(self,x,adj):
        x = self.linear1(x)
        return torch.sparse.mm(adj,x)

In [14]:
from torch_geometric.utils import to_dense_adj

In [24]:
adj = to_dense_adj(dataset[0].edge_index)[0]
adj += torch.eye(len(adj))

In [25]:
adj

tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 1.,  ..., 0., 0., 0.],
        [0., 1., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 1.],
        [0., 0., 0.,  ..., 0., 1., 1.]])

In [26]:
class mymodel(nn.Module):
    def __init__(self,inp,hidden,out):
        super().__init__()
        self.layer1 = GNNlayer(inp,hidden)
        self.layer2 = GNNlayer(hidden,out)
    def forward(self,x,adj):
        x = self.layer1(x,adj)
        x = self.layer2(x,adj)
        return F.log_softmax(x,dim=1)

In [32]:
model = mymodel(new_data.num_features, 16, new_data.num_classes)

In [34]:
adj = to_dense_adj(new_data[0].edge_index)[0]
adj += torch.eye(len(adj))
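
For the Facebook graph the dense matrix has 22470 × 22470 float32 entries, which is roughly 2 GB. A sparse alternative could be sketched as follows. The helper `sparse_adj_with_self_loops` is a hypothetical name and is not used elsewhere in this notebook; the toy `edge_index` is made up for the check.

```python
import torch

def sparse_adj_with_self_loops(edge_index, num_nodes):
    """Build A + I as a sparse COO tensor from a [2, num_edges] edge_index."""
    loops = torch.arange(num_nodes).repeat(2, 1)
    indices = torch.cat([edge_index, loops], dim=1)
    values = torch.ones(indices.size(1))
    # coalesce() sums duplicate entries, so an existing self-loop ends up
    # with value 2, just as `adj += torch.eye(...)` does on the dense matrix.
    return torch.sparse_coo_tensor(indices, values, (num_nodes, num_nodes)).coalesce()

# Toy check on the path graph 0 - 1 - 2.
toy_edge_index = torch.tensor([[0, 1, 1, 2],
                               [1, 0, 2, 1]])
adj_sparse = sparse_adj_with_self_loops(toy_edge_index, 3)
toy_x = torch.tensor([[1.], [10.], [100.]])
aggregated = torch.sparse.mm(adj_sparse, toy_x)
print(aggregated.squeeze().tolist())  # [11.0, 111.0, 110.0]
```

Because `GNNlayer.forward` already calls `torch.sparse.mm(adj, x)`, a sparse tensor built this way could be passed in place of the dense `adj`.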

In [35]:
n_epochs=30
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
for i in range(n_epochs):
    optimizer.zero_grad()
    out = model(new_data[0].x,adj)
    loss = criterion(out[train_mask],new_data[0].y[train_mask])
    acc = accuracy(out[train_mask].argmax(dim=1),new_data[0].y[train_mask])
    loss.backward()
    optimizer.step()
    if i % 10 == 0:
        val_loss = criterion(out[val_mask], new_data[0].y[val_mask])
        val_acc = accuracy(out[val_mask].argmax(dim=1), new_data[0].y[val_mask])
        print(f'Epoch {i:>3} | Train Loss: {loss:.3f} | Train Acc: {acc*100:>5.2f}% | Val Loss: {val_loss:.2f} | Val Acc: {val_acc*100:.2f}%')

Epoch   0 | Train Loss: 45.475 | Train Acc: 40.23% | Val Loss: 41.26 | Val Acc: 40.27%
Epoch  10 | Train Loss: 16.360 | Train Acc: 74.03% | Val Loss: 11.23 | Val Acc: 74.84%
Epoch  20 | Train Loss: 5.211 | Train Acc: 81.28% | Val Loss: 3.66 | Val Acc: 82.94%


In [37]:
out = model(new_data[0].x,adj)
test_acc = accuracy(out[test_mask].argmax(dim=1), new_data[0].y[test_mask])
print("Test accuracy: ",test_acc)

Test accuracy:  tensor(0.8360)


😮. Validation accuracy at epoch 20 has increased from about 74% with the MLP to about 83% with neighbourhood aggregation through the adjacency matrix, and the test accuracy is 83.6%. This is very interesting. Now let us try it with the Cora dataset.
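
The loss values in the Facebook run are large (45.475 at epoch 0). One plausible contributor is that the layer sums over neighbours without any normalization, so high-degree nodes get large activations. A common remedy, not used in this notebook and not evaluated here, is to divide each row of the adjacency matrix by the node degree. The `row_normalize` helper below is a sketch under that assumption.

```python
import torch

def row_normalize(adj):
    """Divide each row of a dense adjacency matrix by its degree (row sum)."""
    deg = adj.sum(dim=1, keepdim=True)
    return adj / deg.clamp(min=1)  # clamp avoids division by zero for isolated nodes

# A + I for the path graph 0 - 1 - 2, with one scalar feature per node.
A_tilde = torch.tensor([[1., 1., 0.],
                        [1., 1., 1.],
                        [0., 1., 1.]])
X = torch.tensor([[1.], [10.], [100.]])

summed = A_tilde @ X                   # neighbourhood sums
averaged = row_normalize(A_tilde) @ X  # neighbourhood means
print(summed.squeeze().tolist())    # [11.0, 111.0, 110.0]
print(averaged.squeeze().tolist())  # roughly [5.5, 37.0, 55.0]
```

With normalization the aggregated values stay on the scale of the input features regardless of how many neighbours a node has.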

In [41]:
model = mymodel(dataset.num_features, 16, dataset.num_classes)

In [42]:
adj = to_dense_adj(dataset[0].edge_index)[0]
adj += torch.eye(len(adj))

In [45]:
n_epochs=30
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
for i in range(n_epochs):
    optimizer.zero_grad()
    out = model(dataset[0].x,adj)
    loss = criterion(out[dataset[0].train_mask],dataset[0].y[dataset[0].train_mask])
    acc = accuracy(out[dataset[0].train_mask].argmax(dim=1),dataset[0].y[dataset[0].train_mask])
    loss.backward()
    optimizer.step()
    if i % 10 == 0:
        val_loss = criterion(out[dataset[0].val_mask], dataset[0].y[dataset[0].val_mask])
        val_acc = accuracy(out[dataset[0].val_mask].argmax(dim=1), dataset[0].y[dataset[0].val_mask])
        print(f'Epoch {i:>3} | Train Loss: {loss:.3f} | Train Acc: {acc*100:>5.2f}% | Val Loss: {val_loss:.2f} | Val Acc: {val_acc*100:.2f}%')

Epoch   0 | Train Loss: 2.874 | Train Acc: 47.86% | Val Loss: 2.59 | Val Acc: 41.20%
Epoch  10 | Train Loss: 0.892 | Train Acc: 87.14% | Val Loss: 2.62 | Val Acc: 65.60%
Epoch  20 | Train Loss: 0.092 | Train Acc: 97.86% | Val Loss: 1.91 | Val Acc: 74.00%


There is an improvement on the Cora dataset too: validation accuracy at epoch 20 rises from about 51% with the MLP to 74% with the GNN layers. This is very interesting. Thus using the node features together with the graph topology helps improve the accuracy of the model. That's all for today. Bye!