# 1. Auto-Encoders
- A technique used to compress input data into a low-dimensional representation
- Some examples are:
  - Map an image to a vector
  - Map a text paragraph to an embedding vector
  - Map a graph to an embedding vector
- This is done by reconstructing the input data with an Encoder-Decoder architecture

# 2. Graph Auto-Encoders
- Graph Auto-Encoders are used to compress graph data into a low dimensional vector
- This is done by reconstructing the input graph using Encoder - Decoder architecture
- The Encoder is a GCN; the default Decoder is a dot product of the node embeddings (it has no learned weights)

### Encoder
            X_ = GCN(X, A) = ReLU(A~.X.Wo)
            Where A~ = D^-1/2.A.D^-1/2 (Normalized Adjacency Matrix, with self-loops added to A),
                  D  is the degree matrix of A and Wo is a learnable weight matrix
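
The encoder formula can be traced by hand on a tiny graph. The following is a minimal, dependency-free sketch rather than the GCNConv implementation: the helper names (`normalize_adjacency`, `matmul`, `gcn_layer`), the 3-node path graph, and the weight values are all made up for illustration, and the adjacency matrix is assumed to already contain self-loops.

```python
import math

def normalize_adjacency(adj):
    """Return D^-1/2 . A . D^-1/2 for a square adjacency matrix (list of lists)."""
    degrees = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(x, adj, weight):
    """One GCN layer: ReLU(A~ . X . W)."""
    h = matmul(matmul(normalize_adjacency(adj), x), weight)
    return [[max(0.0, v) for v in row] for row in h]

# Hypothetical path graph 0 - 1 - 2, with self-loops already on the diagonal.
adj = [[1, 1, 0],
       [1, 1, 1],
       [0, 1, 1]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 nodes, 2 features each
weight = [[1.0, -1.0], [0.5, 0.5]]         # 2 input -> 2 output features

a_norm = normalize_adjacency(adj)
z = gcn_layer(x, adj, weight)              # one embedding row per node
```

Each row of `z` mixes a node's own features with those of its neighbours, weighted by the normalized adjacency entries.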

### Embedding (Latent Space)
            Z = X_

### Decoder
            Here we Reconstruct the Adjacency Matrix
            
            1. Default Decoder is Dot Product
                    Adj(Na, Nb) = Sigmoid(Za.Zb.T)
                    Where Na, Nb are nodes in graph &
                          Za, Zb are embedding vectors of Na, Nb

            2. Custom Decoder
                    Adj(Na, Nb) = Sigmoid(Za.Wd.Zb.T)
                    Where Wd is a weight matrix
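
The two decoders above differ only in whether a weight matrix sits between the two embeddings. A minimal sketch in plain Python, assuming 2-dimensional embeddings; the function names and the example vectors are illustrative and not part of any library:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def inner_product_decoder(z_a, z_b):
    """Edge probability from the default decoder: Sigmoid(Za . Zb^T)."""
    return sigmoid(sum(a * b for a, b in zip(z_a, z_b)))

def bilinear_decoder(z_a, z_b, w_d):
    """Edge probability from the custom decoder: Sigmoid(Za . Wd . Zb^T)."""
    za_wd = [sum(z_a[i] * w_d[i][j] for i in range(len(z_a)))
             for j in range(len(w_d[0]))]
    return sigmoid(sum(a * b for a, b in zip(za_wd, z_b)))

z_a = [1.0, 2.0]
z_b = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]

p_dot = inner_product_decoder(z_a, z_b)             # dot product is -1.5
p_bilinear = bilinear_decoder(z_a, z_b, identity)   # Wd = I gives the same value
```

With `Wd` set to the identity matrix the custom decoder reduces to the dot-product decoder, so the learned `Wd` can be read as a reweighting of the latent dimensions.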

In [1]:
import os
import torch
import torch_geometric.transforms as T
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv,  GAE, VGAE
from torch_geometric.utils import train_test_split_edges

os.environ['TORCH'] = torch.__version__
print(torch.__version__)

1.10.2


In [2]:
root = 'data/bench-mark/'
dataset = 'CiteSeer'

In [3]:
data = Planetoid(
                root, 
                dataset, 
                transform=T.NormalizeFeatures()
                )

##### Analyze the CiteSeer Dataset

In [4]:
print("Number of Graphs: ", len(data))
print("Number of Features: ", data.num_features)
print("Number of Classes: ", data.num_classes)
print("Number of Nodes: ", data.data.num_nodes)
print("Number of Edges: ", data.data.num_edges)
print("Average Degree: {:.2f}".format(data.data.num_edges / data.data.num_nodes))

Number of Graphs:  1
Number of Features:  3703
Number of Classes:  6
Number of Nodes:  3327
Number of Edges:  9104
Average Degree: 2.74


In [5]:
# Since dataset contains only one graph, we can access it directly
data = data[0]
data

Data(x=[3327, 3703], edge_index=[2, 9104], y=[3327], train_mask=[3327], val_mask=[3327], test_mask=[3327])

In [6]:
# Reset the node masks: link prediction does not use them, and
# train_test_split_edges splits the edges into train/val/test sets directly.

data.train_mask = data.val_mask = data.test_mask = None
data

Data(x=[3327, 3703], edge_index=[2, 9104], y=[3327])

In [7]:
data = train_test_split_edges(data)
data



Data(x=[3327, 3703], y=[3327], val_pos_edge_index=[2, 227], test_pos_edge_index=[2, 455], train_pos_edge_index=[2, 7740], train_neg_adj_mask=[3327, 3327], val_neg_edge_index=[2, 227], test_neg_edge_index=[2, 455])
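
The edge counts in this output can be checked with a little arithmetic. The sketch below is not the library code. It assumes the split uses ratios of 0.05 (validation) and 0.1 (test), that every undirected edge is stored twice in `edge_index`, and that only the training edges are kept in both directions. `split_sizes` is a hypothetical helper.

```python
import math

def split_sizes(num_directed_edges, val_ratio=0.05, test_ratio=0.1):
    """Expected positive-edge counts after a train/val/test edge split.

    Assumes each undirected edge appears twice (once per direction).
    """
    num_undirected = num_directed_edges // 2
    n_val = math.floor(val_ratio * num_undirected)
    n_test = math.floor(test_ratio * num_undirected)
    n_train = num_undirected - n_val - n_test
    # Training edges are counted in both directions, val/test edges once.
    return {"val_pos": n_val, "test_pos": n_test, "train_pos": 2 * n_train}

sizes = split_sizes(9104)
print(sizes)
```

For the 9104 directed edges of CiteSeer this gives 227 validation, 455 test, and 7740 training edges, matching the shapes printed above.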

A] Encode the graph using VGAE (the variational variant of GAE)

In [8]:
'''
The key difference between induction and transduction is that induction refers to learning a function that can be applied to any 
novel inputs, while transduction is only concerned with transferring some property onto a specific set of test inputs

graph classification - inductive
link prediction - transductive
'''

'''
PARAMETERS of GCNConv
    in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

    out_channels (int) – Size of each output sample.

    cached (bool, optional) – If set to True, the layer will cache the computation of the normalized adjacency matrix on first 
                              execution, and will use the cached version for further executions. This parameter should only be 
                              set to True in transductive learning scenarios. (default: False)

    add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

    normalize (bool, optional) – Whether to add self-loops and compute symmetric normalization coefficients on the fly. (default: True)

'''
class VariationalGCNEncoder(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, 2 * out_channels, cached=True) # cached only for transductive learning
        self.conv_mu = GCNConv(2 * out_channels, out_channels, cached=True) # cached only for transductive learning
        self.conv_logstd = GCNConv(2 * out_channels, out_channels, cached=True) # cached only for transductive learning

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv_mu(x, edge_index), self.conv_logstd(x, edge_index)
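
The encoder returns two outputs per node: a mean (from `conv_mu`) and a log standard deviation (from `conv_logstd`). A minimal sketch of how such a pair is typically used in a variational auto-encoder follows, assuming the standard reparameterization trick and a standard normal prior. The function names are hypothetical and the numbers are made up; this is not the library's code.

```python
import math
import random

def reparameterize(mu, logstd, rng):
    """Sample z = mu + eps * exp(logstd), with eps ~ N(0, 1) per dimension."""
    return [m + rng.gauss(0.0, 1.0) * math.exp(s) for m, s in zip(mu, logstd)]

def kl_to_standard_normal(mu, logstd):
    """KL(N(mu, exp(logstd)^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1 + 2 * s - m ** 2 - math.exp(s) ** 2
                      for m, s in zip(mu, logstd))

rng = random.Random(0)                      # fixed seed for repeatability
mu, logstd = [0.3, -0.2], [-1.0, 0.0]       # one node, 2 latent dimensions

z = reparameterize(mu, logstd, rng)         # noisy embedding used in training
kl = kl_to_standard_normal(mu, logstd)      # penalty for leaving the prior
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])  # equals the prior
```

The KL term is zero only when the predicted distribution equals the prior, which is why it acts as a regularizer on the latent space.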

B] Define Auto-Encoder

In [9]:
# parameters
epochs = 1000
out_channels = 2
num_features = data.num_features

# model
model = VGAE(
            VariationalGCNEncoder(
                                num_features, # Size of the node features
                                out_channels
                                )
            )

# move to GPU (if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

# initialize the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

In [10]:
x = data.x.to(device)
train_pos_edge_index = data.train_pos_edge_index.to(device)

test_pos_edge_index = data.test_pos_edge_index.to(device)
test_neg_edge_index = data.test_neg_edge_index.to(device)

In [11]:
def train():
    model.train()
    optimizer.zero_grad()
    z = model.encode(x, train_pos_edge_index)
    loss = model.recon_loss(z, train_pos_edge_index)
    loss = loss + (1 / data.num_nodes) * model.kl_loss()  # KL regularization term (needed for VGAE, not for plain GAE)
    loss.backward()
    optimizer.step()
    return float(loss)


def test():
    model.eval()
    with torch.no_grad():
        z = model.encode(x, train_pos_edge_index)
    return model.test(z, test_pos_edge_index, test_neg_edge_index)


for epoch in range(1, epochs + 1):
    loss = train()
    auc, ap = test()
    print('Epoch: {:03d}, AUC: {:.4f}, AP: {:.4f}'.format(epoch, auc, ap))

Epoch: 001, AUC: 0.6272, AP: 0.6676
Epoch: 002, AUC: 0.6341, AP: 0.6756
Epoch: 003, AUC: 0.6332, AP: 0.6751
Epoch: 004, AUC: 0.6317, AP: 0.6738
Epoch: 005, AUC: 0.6296, AP: 0.6715
Epoch: 006, AUC: 0.6272, AP: 0.6694
Epoch: 007, AUC: 0.6233, AP: 0.6657
Epoch: 008, AUC: 0.6169, AP: 0.6607
Epoch: 009, AUC: 0.6107, AP: 0.6556
Epoch: 010, AUC: 0.6039, AP: 0.6502
Epoch: 011, AUC: 0.5995, AP: 0.6470
Epoch: 012, AUC: 0.5969, AP: 0.6452
Epoch: 013, AUC: 0.5942, AP: 0.6433
Epoch: 014, AUC: 0.5969, AP: 0.6451
Epoch: 015, AUC: 0.6014, AP: 0.6479
Epoch: 016, AUC: 0.6068, AP: 0.6519
Epoch: 017, AUC: 0.6114, AP: 0.6558
Epoch: 018, AUC: 0.6129, AP: 0.6568
Epoch: 019, AUC: 0.6145, AP: 0.6581
Epoch: 020, AUC: 0.6145, AP: 0.6579
Epoch: 021, AUC: 0.6146, AP: 0.6580
Epoch: 022, AUC: 0.6158, AP: 0.6587
Epoch: 023, AUC: 0.6173, AP: 0.6597
Epoch: 024, AUC: 0.6200, AP: 0.6619
Epoch: 025, AUC: 0.6221, AP: 0.6635
Epoch: 026, AUC: 0.6233, AP: 0.6648
Epoch: 027, AUC: 0.6242, AP: 0.6656
Epoch: 028, AUC: 0.6252, AP: