<a href="https://colab.research.google.com/github/iamksseo/colab/blob/master/ai504_13_gnn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Week 14: GNN**

Today's class consists of three parts. <br>

## **1.  We will build the Graph Convolution Equation.**
1-1. We will build a graph using the networkx library. <br>
1-2. Using the adjacency matrix, node indices, and node embedding vectors from the graph, we will follow the aggregation and combination steps of the Graph Convolution Equation. <br>
1-3. Finally, we will build a GCN layer. <br>

## **2.  We will perform node classification on the Cora dataset.**
2-1. Cora dataset Information <br>
2-2. Implement GCN model with Cora dataset <br>
2-3. Visualize node embedding

## **3.  (DIY) Run graph classification on the COLLAB dataset**
3-1. I will briefly introduce the code and PyTorch Geometric.  <br><br>

If you have any questions, feel free to ask.

* E-Mail Address : pacesun@kaist.ac.kr / seongjunyang@kaist.ac.kr
* Code made by Seongjun Yang at KAIST GSAI Edlab

## **Prelims** 

In [None]:
!pip install ipdb

In [None]:
!pip install networkx

In [None]:
import ipdb
import torch
import networkx as nx
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
from scipy.linalg import fractional_matrix_power

import warnings
warnings.filterwarnings("ignore", category=UserWarning)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

### **1. Make Graph Convolution Equation**

![picture](https://drive.google.com/uc?id=1BqgZ_3xUQ7ScvoPVHUZbx5o4xBGSlLVS)

#### **1) Initialize the Graph G**
The networkx library makes it easy to build and study graphs and networks. <br>
So I'll use networkx to work through the Graph Convolution Equation.

In [None]:
#1. Initialize the graph
G = nx.Graph(name='G')

In [None]:
#2. Create nodes
#In this class, we will build a graph that consists of 6 nodes.
#Each node is assigned a node feature that equals its node name.
for i in range(1,7):
    G.add_node(i, name=i)

In [None]:
#Define the edges and add them to the graph
edges = [(1,2), (1,3), (2,4), (2,5), (3,4), (3,6) ]
G.add_edges_from(edges)

In [None]:
#See graph info
#nx.info() was removed in networkx 3.0; printing the graph gives a similar summary.
print('Graph Info:\n', G)

In [None]:
#Inspect the node features
print('\nGraph Nodes: ', G.nodes.data())

In [None]:
#Plot the graph
nx.draw(G, with_labels=True, font_weight='bold')
plt.show()

#### Inserting Adjacency Matrix to forward pass 

In [None]:
nx.attr_matrix(G, node_attr='name')

In [None]:
#Get the Adjacency Matrix (A) and Node Features Matrix (X) as numpy array
A = np.array(nx.attr_matrix(G, node_attr='name')[0]) # Convert to a numpy Adjacency Matrix (A)
X = np.array(nx.attr_matrix(G, node_attr='name')[1]) # Convert to a numpy Node Features Matrix (X)
X = np.expand_dims(X,axis=1) # Make [6, 1] numpy Node Features Matrix (X)

In [None]:
print('Shape of A: ', A.shape) # [6, 6] matrix

In [None]:
print('\nShape of X: ', X.shape) # [6, 1] matrix

In [None]:
print('\nAdjacency Matrix (A):\n', A)

In [None]:
print('\nNode Features Matrix (X):\n', X)

In [None]:
#Dot product Adjacency Matrix (A) and Node Features (X)
AX = np.dot(A,X) # AX
print("Dot product of A and X (AX):\n", AX)

##### **Question : Is this the node representations H?**
##### No, A*X is just neighbor aggregation.
##### **We need the combination step!**
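To see why A*X is only aggregation, here is a minimal numpy sketch that rebuilds the same 6-node graph by hand (without networkx) and checks that each row of AX is the sum of the neighbors' features. The name `neighbor_sums` is introduced here only for illustration.

```python
import numpy as np

# Rebuild the 6-node example graph as a dense adjacency matrix (nodes 1..6 -> rows 0..5).
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 4), (3, 6)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u - 1, v - 1] = 1.0
    A[v - 1, u - 1] = 1.0  # undirected graph, so the matrix is symmetric

# Node features are the node names 1..6, stored as a [6, 1] column.
X = np.arange(1, 7, dtype=float).reshape(6, 1)

AX = A @ X

# Row i of AX equals the sum of the features of node i's neighbors.
neighbor_sums = np.array([[X[A[i] > 0].sum()] for i in range(6)])
print(AX.flatten())             # node 1 gets 2 + 3 = 5, and its own feature (1) is not included
print(neighbor_sums.flatten())
```

Because the diagonal of A is zero, a node's own feature never enters its aggregated value. That is the gap the self-loops fill.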

#### **Add Self-Loops and Normalize Adjacency Matrix (A)**
**A' = A + I**

In [None]:
#Add Self Loops
G_self_loops = G.copy() # A'

self_loops = []
for i in range(1, 1+ G.number_of_nodes()):
    self_loops.append((i,i))

G_self_loops.add_edges_from(self_loops) # A' = A + I


In [None]:
#Check the edges of G_self_loops after adding the self loops
print('Edges of G with self-loops:\n', G_self_loops.edges)

In [None]:
#Get the Adjacency Matrix (A') of the graph with added self-loops
A_hat = np.array(nx.attr_matrix(G_self_loops, node_attr='name')[0]) # A' numpy Matrix
print('Adjacency Matrix of added self-loops G (A_hat):\n', A_hat)

##### **A' * X Matrix**

In [None]:
#Calculate the dot product of A_hat and X (A_hatX)
A_hatX = np.dot(A_hat, X)
print('A_hatX:\n', A_hatX)

##### But there is another problem.
##### The scale of the aggregated node features depends on the number of neighbors.
##### Solution : Normalization by the inverse degree matrix.
##### **D_inverse * A'**
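As a quick sanity check on what this normalization does, the sketch below (plain numpy, same 6-node graph rebuilt by hand) compares D_inverse * A' * X with the plain average of each node's own feature and its neighbors' features. The name `means` is introduced here only for illustration.

```python
import numpy as np

# Same 6-node example graph, built directly as a dense matrix.
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 4), (3, 6)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u - 1, v - 1] = 1.0
    A[v - 1, u - 1] = 1.0
X = np.arange(1, 7, dtype=float).reshape(6, 1)

A_hat = A + np.eye(6)             # A' = A + I (self-loops)
D = np.diag(A_hat.sum(axis=1))    # degree matrix of A'
D_inv = np.linalg.inv(D)

H = D_inv @ A_hat @ X

# Each row of H is the mean feature over the node itself and its neighbors.
means = np.array([[X[A_hat[i] > 0].mean()] for i in range(6)])
print(H.flatten())
print(means.flatten())
```

So multiplying by the inverse degree matrix turns the neighbor sum into a neighbor mean, which keeps the feature scale comparable across nodes with different degrees.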

In [None]:
#Get the Degree Matrix of the added self-loops graph
edge_List = G_self_loops.edges() 
Deg_Mat = [[i, 0] for i in G_self_loops.nodes()]

for element in edge_List:
  if element[0] != element[1]:
    Deg_Mat[element[0] - 1][1] += 1
    Deg_Mat[element[1] - 1][1] += 1
  else :
    Deg_Mat[element[0] - 1][1] += 1

print(Deg_Mat)

In [None]:
#Convert the Degree Matrix to a N x N matrix where N is the number of nodes
D = np.diag([deg for [n,deg] in Deg_Mat]) # Get degree matrix

print('Degree Matrix of added self-loops G as numpy array (D):\n', D)

##### **D_inverse**

In [None]:
#Get the inverse of Degree Matrix (D)
D_inv = np.linalg.inv(D)
print('Inverse of D:\n', D_inv)

##### **D_inverse*A'**

In [None]:
D_invA = np.dot(D_inv, A_hat)
print(D_invA)

##### **D_invA'X**

In [None]:
#Dot product of D_inv*A' and X for normalization
DAX = np.dot(D_invA,X)

print('DAX:\n', DAX)

#### **Add Weights and Activation Function**

In [None]:
#Initialize the weights
np.random.seed(12345)
n_h = 4 #number of neurons in the hidden layer
n_y = 2 #number of neurons in the output layer
W0 = np.random.randn(X.shape[1],n_h) * 0.01
W1 = np.random.randn(n_h,n_y) * 0.01

print('W0 weight:\n', W0)
print('W1 weight:\n', W1)

#### **GCN Layer**
##### **TODO : fill ????? with proper code and run**

In [None]:
#Implement ReLU as the activation function.
#A GCN layer needs a non-linear activation; in the materials I looked at, ReLU is the one generally used.
def relu(x):
    return np.maximum(0,x)

#Build GCN layer
#We implement this function in numpy to keep it simple
def gcn(A,H,W):
    #ipdb.set_trace()
    # Make a GCN Layer using the Graph Convolution Equation process so far.
    # You can use np.diag, np.sum, np.linalg.inv, np.dot
    I =????       # create Identity Matrix of A
    A_hat = ???   # add self-loop to A
    D = np.diag(np.sum(A_hat, axis=0))  # create Degree Matrix of A_hat
    D_inv = np.linalg.inv(D)
    D_invA = ???
    DAXW = ???
    return relu(DAXW)

In [None]:
#Do forward propagation
H1 = gcn(A,X,W0) 
H2 = gcn(A,H1,W1) 
print('Node Embedding from GCN output:\n', H2)

#### **Plotting Node Embedding**

In [None]:
def visualize(h, color):
    plt.figure(figsize=(8, 8))
    plt.xlim([np.min(h[:,0])*0.9, np.max(h[:,0])*1.1])
    plt.xlabel('Dimension 0')
    plt.ylabel('Dimension 1')

    plt.scatter(h[:, 0], h[:, 1], s=140, c=color, cmap="Set2")
    plt.show()

visualize(H2, color=range(6)) # Node 3 and node 5 have the same embedding, so the two nodes overlap in the plot.

## **2. Node classification on Cora Dataset**

### **Prelims**

In [None]:
import math
import numpy as np
import scipy.sparse as sp
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.parameter import Parameter
from torch.nn.modules.module import Module
import torch.optim as optim
import time

### **Cora Dataset**
Dataset link : https://relational.fit.cvut.cz/dataset/CORA <br>
The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words. <br>

In [None]:
!wget https://www.dropbox.com/s/fl9mvrio3hah4on/cora.content
!wget https://www.dropbox.com/s/l829sldp7xqrt0h/cora.cites

In [None]:
import pandas as pd
import os

edgelist = pd.read_csv(os.path.join("/content/", "cora.cites"), sep='\t', header=None, names=["target", "source"]) # edge list of the citation graph
edgelist["label"] = "cites"
edgelist.sample(frac=1).head(5) # each row is <ID of cited paper node> <ID of citing paper node>; sampling shows a few random edges

In [None]:
Gnx = nx.from_pandas_edgelist(edgelist, edge_attr="label")
nx.set_node_attributes(Gnx, "paper", "label")

print(Gnx.nodes) # from_pandas_edgelist() builds the graph from the edge list, so we can read the node list from it
Gnx.nodes[12210] # look up a node by its ID to see its attributes

In [None]:
feature_names = ["word_{}".format(ii) for ii in range(1433)]
column_names =  feature_names + ["subject"]
node_data = pd.read_csv(os.path.join("/content/", "cora.content"), sep='\t', header=None, names=column_names)
node_data.head(5) # <paper node id> <word_attributes>+ <node label>

In [None]:
set(node_data["subject"]) # node class type

In this class, we will predict the subject of a paper (node) based on the data of the surrounding nodes and the structure of the graph.

### **Hyperparameter**

In [None]:
EPOCH = 200
SEED = 42
NUM_HIDDEN = 16
dropout_rate = 0.5
learning_rate = 0.01
weight_decay = 5e-4

### **Preprocess and Make Dataset**

In [None]:
def encode_onehot(labels): # Convert every class (subject) to a one-hot vector for training.
    # Sorted so that the class-to-index mapping is the same on every run:
    # ['Case_Based', 'Genetic_Algorithms', 'Neural_Networks', 'Probabilistic_Methods', 'Reinforcement_Learning', 'Rule_Learning', 'Theory']
    classes = sorted(set(labels))
    classes_dict = {c: np.identity(len(classes))[i, :] for i, c in enumerate(classes)}
    labels_onehot = np.array(list(map(classes_dict.get, labels)), dtype=np.int32)
    return labels_onehot

In [None]:
def normalize(mx): # This part is similar to the normalization process implemented earlier.
    #ipdb.set_trace()
    rowsum = np.array(mx.sum(1))
    r_inv = np.power(rowsum, -1).flatten()
    r_inv[np.isinf(r_inv)] = 0.
    r_mat_inv = sp.diags(r_inv)
    mx = r_mat_inv.dot(mx)
    return mx

In [None]:
def sparse_mx_to_torch_sparse_tensor(sparse_mx): # Convert a scipy sparse matrix to a torch sparse tensor.
    sparse_mx = sparse_mx.tocoo().astype(np.float32)
    indices = torch.from_numpy(np.vstack((sparse_mx.row, sparse_mx.col)).astype(np.int64))
    values = torch.from_numpy(sparse_mx.data)
    shape = torch.Size(sparse_mx.shape)
    return torch.sparse_coo_tensor(indices, values, shape) # replaces the deprecated torch.sparse.FloatTensor constructor

In [None]:
def load_data(path="/content/", dataset="cora"):
    # This function uses the three helper functions above to build the adjacency matrix, features and labels.
    print('Loading {} dataset...'.format(dataset))

    #ipdb.set_trace()
    idx_features_labels = np.genfromtxt("{}{}.content".format(path, dataset), dtype=np.dtype(str)) # load all tables
    features = sp.csr_matrix(idx_features_labels[:, 1:-1], dtype=np.float32)  # Compress sparse matrix
    labels = encode_onehot(idx_features_labels[:, -1]) # Label onehot encoding

    # build graph
    idx = np.array(idx_features_labels[:, 0], dtype=np.int32) # node list
    idx_map = {j: i for i, j in enumerate(idx)}
    edges_unordered = np.genfromtxt("{}{}.cites".format(path, dataset),dtype=np.int32)
    edges = np.array(list(map(idx_map.get, edges_unordered.flatten())), dtype=np.int32).reshape(edges_unordered.shape)
    adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])), shape=(labels.shape[0], labels.shape[0]), dtype=np.float32)

    # build adjacency matrix
    adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)

    features = normalize(features)
    adj = normalize(adj + sp.eye(adj.shape[0]))

    # split all nodes to train/valid/test for node classification
    idx_train = range(140)
    idx_val = range(200, 500)
    idx_test = range(500, 1500)

    features = torch.FloatTensor(np.array(features.todense()))
    labels = torch.LongTensor(np.where(labels)[1])
    adj = sparse_mx_to_torch_sparse_tensor(adj)

    idx_train = torch.LongTensor(idx_train)
    idx_val = torch.LongTensor(idx_val)
    idx_test = torch.LongTensor(idx_test)

    return adj, features, labels, idx_train, idx_val, idx_test
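
The line under `# build adjacency matrix` in `load_data` is the least obvious step, so here is a small sketch of what it does. It uses a hypothetical 3-node directed graph (not Cora data) and dense numpy arrays, where elementwise `*` plays the role of scipy's `.multiply`.

```python
import numpy as np

# Hypothetical directed citations: 0 -> 1 and 2 -> 1 (stored in one direction only).
adj = np.array([[0., 1., 0.],
                [0., 0., 0.],
                [0., 1., 0.]])

# True where the transposed matrix has an edge that adj itself lacks.
mask = adj.T > adj

# Copy every missing reverse edge into adj; entries that already exist are left unchanged.
sym = adj + adj.T * mask - adj * mask
print(sym)
```

The result is symmetric, so each citation is treated as an undirected edge before self-loops are added and the matrix is normalized.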

In [None]:
def accuracy(output, labels):
    preds = output.max(1)[1].type_as(labels)
    correct = preds.eq(labels).double()
    correct = correct.sum()
    return correct / len(labels)

#### **Model Architecture**
##### **TODO : Fill ????? with proper code and Run**

In [None]:
class GraphConvolution(Module):
    
    #Simple GCN layer, similar to https://arxiv.org/abs/1609.02907
    
    def __init__(self, in_features, out_features):
        super(GraphConvolution, self).__init__()
        # initialize weight by using reset_parameters() function
        self.in_features = in_features 
        self.out_features = out_features
        self.weight = Parameter(torch.FloatTensor(in_features, out_features))
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)

    def forward(self, input, adj):
        # You can use torch.mm
        support = ???? # Make XW  weight = W
        output = ???? # Make AXW  adj = A
        return output

In [None]:
class GCN(nn.Module):
    def __init__(self, nfeat, nhid, nclass, dropout):
        super(GCN, self).__init__()

        self.gc1 = GraphConvolution(nfeat, nhid)
        self.gc2 = GraphConvolution(nhid, nclass)
        self.dropout = dropout

    def forward(self, x, adj):
      # Obtain Node embedding
      #ipdb.set_trace()
      # Make forward propagation by referencing Section 1 (Graph Convolution Equation's forward propagation).
      x = self.gc1(x, adj) # First GraphConvolution Layer
      ???? # relu
      ???? # dropout
      ???? # Second Graph Convolution Layer
      ???? # log(softmax(x))
      return x


#### **Setting for training model**

In [None]:
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)

In [None]:
%%time

# Load data
adj, features, labels, idx_train, idx_val, idx_test = load_data() # adj -> adjacency matrix, same as A,   features -> node feature matrix, same as X

In [None]:
# Model and optimizer
model = GCN(nfeat=features.shape[1], # [2708, 1433] -> [1433] for matrix multiplication of X and W
            nhid=NUM_HIDDEN,
            nclass=labels.max().item() + 1,
            dropout=dropout_rate)

optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)

In [None]:
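# Use the device selected in the first Prelims cell, so the notebook also runs without a GPU.
model = model.to(device)
features = features.to(device)
adj = adj.to(device)
labels = labels.to(device)
idx_train = idx_train.to(device)
idx_val = idx_val.to(device)
idx_test = idx_test.to(device)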

### **Train code**
In the train() function, we train the GCN with the nll_loss objective function and the Adam optimizer. <br>
The model runs on all nodes, and we use the train and validation indices to pick out the outputs that the loss and accuracy are computed on.
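
To make the index selection concrete, the sketch below computes the same quantity by hand in numpy: a log-softmax over all nodes, then the mean negative log-probability of the true class over the training nodes only (what `F.nll_loss` returns with its default mean reduction). The scores, labels and indices are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 3))      # hypothetical raw scores: 5 nodes, 3 classes
labels = np.array([0, 2, 1, 1, 0])    # hypothetical class label of each node
idx_train = np.array([0, 1])          # only these nodes contribute to the training loss

# log(softmax(x)), computed stably by subtracting the row-wise maximum first.
shifted = logits - logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# Negative log-likelihood, averaged over the labelled training nodes.
loss_train = -log_probs[idx_train, labels[idx_train]].mean()
print(loss_train)
```

Every node still takes part in the forward pass through the graph convolution; the indices only decide which rows of the output are scored.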

In [None]:
def train(epoch):
    t = time.time()
    model.train()
    optimizer.zero_grad()

    output = model(features, adj)
    loss_train = F.nll_loss(output[idx_train], labels[idx_train])
    acc_train = accuracy(output[idx_train], labels[idx_train])
    loss_train.backward()
    optimizer.step()

    # Evaluate validation set performance separately,
    # deactivates dropout during validation run.
    model.eval()
    output = model(features, adj)

    loss_val = F.nll_loss(output[idx_val], labels[idx_val])
    acc_val = accuracy(output[idx_val], labels[idx_val])

    print('Epoch: {:04d}'.format(epoch+1),
          'loss_train: {:.4f}'.format(loss_train.item()),
          'acc_train: {:.4f}'.format(acc_train.item()),
          'loss_val: {:.4f}'.format(loss_val.item()),
          'acc_val: {:.4f}'.format(acc_val.item()))

### **Test code**
In the test() function, we evaluate the trained model and visualize the node embeddings with t-SNE.

In [None]:
# Visualize
def visualize(h, label, idx):
    plt.figure(figsize=(8, 8))
    plt.xticks([])
    plt.yticks([])
    plt.xlabel('Dimension 0')
    plt.ylabel('Dimension 1')

    h_ = h[idx]
    color = label[idx.cpu()].numpy() # labels of the selected nodes, used as scatter colors
    print(f'Embedding shape: {list(h_.shape)}')
    z = TSNE(n_components=2).fit_transform(h_.detach().cpu().numpy())  
    plt.scatter(z[:, 0], z[:, 1], s=70, c=color, cmap="Set2")
    plt.show()

def test(): # get loss and accuracy with node embedding visualization
    model.eval()
    output = model(features, adj)
    visualize(output, labels.detach().cpu(), idx_test)
    loss_test = F.nll_loss(output[idx_test], labels[idx_test])
    acc_test = accuracy(output[idx_test], labels[idx_test])
    print("Test set results:",
          "loss= {:.4f}".format(loss_test.item()),
          "accuracy= {:.4f}".format(acc_test.item()))

### **Train**
Training took about 1.35 s when I measured it.

In [None]:
%%time

# Train model
t_total = time.time()
for epoch in range(EPOCH):
    train(epoch)
print("Optimization Finished!")
print("Total time elapsed: {:.4f}s".format(time.time() - t_total))

### **Test**
Testing took about 6.79 s when I measured it.

In [None]:
%%time

# Testing
test()

## **3. (HOMEWORK) Graph Classification on Collab Dataset**

**COLLAB Dataset** <br>
COLLAB is a large dataset containing many graphs, each with a graph label. <br>
It is mainly used for graph classification.<br> 
COLLAB is a scientific collaboration dataset. Each graph corresponds to a researcher's ego network,  <br>
i.e., the researcher and their collaborators are nodes, and an edge indicates a collaboration between two researchers. <br> 
The code is built on the pytorch_geometric library. <br> <br>

**Why do I use pytorch_geometric?** <br>
pytorch_geometric is very fast even though it works on sparse data. <br>
According to the reference below, pytorch_geometric trains models up to 15 times faster than the Deep Graph Library (DGL) 0.1.3. <br>

![picture](https://drive.google.com/uc?id=13jMho-M4Em8B32HzNWkXpnUcA4QGyXYU)

So, I recommend running the code and studying the library. <br><br>

Reference : https://medium.com/syncedreview/pytorch-geometric-a-fast-pytorch-library-for-dl-a833dff466e5



### **Prelims**

In [None]:
!pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.9.0+cu111.html

In [None]:
from torch_geometric.datasets import TUDataset
from torch_geometric.utils import to_networkx
import torch_geometric.transforms as T
from torch_geometric.utils import degree

In [None]:
%%time

def create_one_hot_transform(dataset): # The COLLAB dataset has no node features, so I build them by one-hot encoding each node's degree (up to max_degree).
    max_degree = 0                     # I took this idea from https://paperswithcode.com/sota/graph-classification-on-collab.
    degs = []
    for data in dataset:
        degs += [degree(data.edge_index[0], dtype=torch.long)]
        max_degree = max(max_degree, degs[-1].max().item())

    return T.OneHotDegree(max_degree)

def load_dataset():
    dataset = TUDataset(root='/tmp/COLLAB', name="COLLAB")
    dataset.transform = create_one_hot_transform(dataset)
    return dataset

dataset = load_dataset()
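
For intuition about the features that `T.OneHotDegree(max_degree)` produces, here is a plain numpy sketch on a hypothetical 4-node graph. It is not the library's implementation; it only mirrors the idea of using a one-hot vector of length `max_degree + 1` per node.

```python
import numpy as np

# Hypothetical graph in edge_index form: each undirected edge is stored in both directions.
# Undirected edges: 0-1, 0-2, 1-2, 2-3
edge_index = np.array([[0, 1, 0, 2, 1, 2, 2, 3],
                       [1, 0, 2, 0, 2, 1, 3, 2]])
num_nodes = 4

# Degree of each node = number of times it appears as a source node.
deg = np.bincount(edge_index[0], minlength=num_nodes)
max_degree = int(deg.max())

# One row per node, with a single 1 in the column given by the node's degree.
x = np.eye(max_degree + 1)[deg]
print(deg)
print(x)
```

With these features, a node's input vector encodes only its position in the graph structure (how many collaborators it has), since the dataset carries no other node attributes.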

In [None]:
print()
print(f'Dataset: {dataset}:')
print('====================')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of classes: {dataset.num_classes}')
print(f'Number of features: {dataset.num_features}')



###### One graph #####
data = dataset[0]  # Get the first graph object.

print()
print(data)
print('=============================================================')

# Gather some statistics about the first graph.
print(f'Number of nodes: {data.num_nodes}')
print(f'Number of edges: {data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
# PyG >= 2.0 method names; older releases used contains_isolated_nodes() / contains_self_loops()
print(f'Contains isolated nodes: {data.has_isolated_nodes()}')
print(f'Contains self-loops: {data.has_self_loops()}')
print(f'Is undirected: {data.is_undirected()}')

In [None]:
# One graph edges
print(data.edge_index)

In [None]:
from torch_geometric.utils import to_networkx

G = to_networkx(data, to_undirected=True)

#It shows one graph of Collab dataset.
plt.figure(figsize=(10, 10))
nx.draw_networkx(G, pos=nx.spring_layout(G, seed=42), with_labels=True, cmap="Set2", width=0.5, node_size=500, node_color='yellow')
plt.axis('off')
plt.tight_layout()
plt.show()

In [None]:
torch.manual_seed(12345)
dataset = dataset.shuffle() # The graphs are stored in label order (0, 1, 2), so shuffle before splitting.

# train / valid
train_dataset = dataset[:4000]
valid_dataset = dataset[4000:]

print(f'Number of training graphs: {len(train_dataset)}')
print(f'Number of validation graphs: {len(valid_dataset)}')

In [None]:
from torch_geometric.loader import DataLoader # PyG >= 2.0; older releases expose it as torch_geometric.data.DataLoader

# Unlike in CV and NLP, the graph DataLoader merges node_feature, weight and edge_index from different samples/graphs into one batch.
# So the GNN model needs this "batch" information to know which nodes belong to the same graph within a batch.
# Reference : https://towardsdatascience.com/hands-on-graph-neural-networks-with-pytorch-pytorch-geometric-359487e221a8
# Reference : https://pytorch-geometric.readthedocs.io/en/latest/notes/batching.html
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
valid_loader = DataLoader(valid_dataset, batch_size=64, shuffle=False)
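
The batching idea can be sketched in plain numpy: graphs are stacked into one big graph with a block-diagonal adjacency matrix, and a `batch` vector records which graph each node came from. The two small graphs and all values below are hypothetical, and the per-graph average only imitates what a mean readout such as `global_mean_pool` computes.

```python
import numpy as np

# Two hypothetical graphs: a 2-node graph and a 3-node path graph, each with 1-d node features.
A1 = np.array([[0., 1.], [1., 0.]])
A2 = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
X1 = np.array([[1.], [3.]])
X2 = np.array([[2.], [4.], [6.]])

# Stack them into one graph: block-diagonal adjacency, concatenated features.
n1, n2 = len(A1), len(A2)
A = np.zeros((n1 + n2, n1 + n2))
A[:n1, :n1] = A1
A[n1:, n1:] = A2
X = np.vstack([X1, X2])

# batch[i] is the index of the graph that node i belongs to.
batch = np.array([0, 0, 1, 1, 1])

# Mean readout: average the node features of each graph separately.
num_graphs = int(batch.max()) + 1
pooled = np.vstack([X[batch == g].mean(axis=0) for g in range(num_graphs)])
print(pooled)  # one row per graph
```

Since no edges connect the two blocks, message passing never mixes nodes from different graphs, and the `batch` vector is all the readout needs to produce one embedding per graph.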

In [None]:
from torch.nn import Linear
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.nn import global_mean_pool



class GCN(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(GCN, self).__init__()
        torch.manual_seed(12345)
        self.conv1 = GCNConv(dataset.num_node_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, hidden_channels)
        self.conv3 = GCNConv(hidden_channels, hidden_channels) # Adding one more GCNConv layer gave better performance when I tried it.
        self.lin = Linear(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index, batch):
        # 1. Obtain node embeddings 
        #ipdb.set_trace()
        
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = self.conv2(x, edge_index)
        x = x.relu()
        x = self.conv3(x, edge_index)

        # 2. Readout layer
        x = global_mean_pool(x, batch)  # [batch_size, hidden_channels] , for graph classification
        h = x.clone().detach() # for making graph embedding        

        # 3. Apply a final classifier
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.lin(x)
        
        return x , h

In [None]:
model = GCN(hidden_channels=64)
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

def train(epoch=None):
    model.train()
    for data in train_loader:  # Iterate in batches over the training dataset.
         data = data.to(device)
         out, _ = model(data.x, data.edge_index, data.batch)  # Perform a single forward pass.
         loss = criterion(out, data.y)  # Compute the loss.
         loss.backward()  # Derive gradients.
         optimizer.step()  # Update parameters based on gradients.
         optimizer.zero_grad()  # Clear gradients.
    
    print(f'Epoch: {epoch:03d}, Train loss: {loss:.4f}')
         

def test(loader, visual=False):
  model.eval()
  
  correct = 0
  for data in loader:  # Iterate in batches over the training/test dataset.
    data = data.to(device)
    out, h = model(data.x, data.edge_index, data.batch)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    correct += int((pred == data.y).sum())  # Check against ground-truth labels.
    
    if visual:
      colors = ['#3A3120', '#535D8E', '#BD3430']
      color = [ colors[i] for i in data.y.detach().cpu()]
      z = TSNE(n_components=2).fit_transform(h.detach().cpu().numpy())
      
      plt.figure(figsize=(10,10))
      plt.xticks([])
      plt.yticks([])
      print(f'Embedding shape: {list(h.shape)}')
      plt.scatter(z[:, 0], z[:, 1], s=70, c=color, cmap="Set2")
      plt.show()

  return correct / len(loader.dataset)  # Derive ratio of correct predictions.

In [None]:
%%time
################################
for epoch in range(1, 31):
    train(epoch)
    test_acc = test(valid_loader)
    if epoch % 5 == 0:
      print(f'Epoch: {epoch:03d}, Valid Acc: {test_acc:.4f}')

In [None]:
### Uncomment the command below to run the t-SNE visualization.
#test(valid_loader, visual=True) # t-SNE

## **Reference**
Thomas N. Kipf, Max Welling, Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017) <br>
http://tkipf.github.io/graph-convolutional-networks/ <br>
https://relational.fit.cvut.cz/dataset/CORA <br>
https://paperswithcode.com/sota/graph-classification-on-collab <br>
https://pytorch-geometric.readthedocs.io/en/latest/notes/batching.html <br>
https://medium.com/syncedreview/pytorch-geometric-a-fast-pytorch-library-for-dl-a833dff466e5 <br>
https://towardsdatascience.com/hands-on-graph-neural-networks-with-pytorch-pytorch-geometric-359487e221a8 <br>
https://paperswithcode.com/sota/node-classification-on-cora <br>
https://graphsandnetworks.com/the-cora-dataset/ <br>
https://github.com/tkipf/pygcn <br>
https://pytorch-geometric.readthedocs.io/en/latest/ <br>
https://colab.research.google.com/drive/1I8a0DfQ3fI7Njc62__mVXUlcAleUclnb?usp=sharing <br>
http://networkrepository.com/COLLAB.php <br>