## A model that does learn on the Cora dataset

[Cora dataset](https://github.com/tkipf/keras-gcn/tree/master/kegra/data/cora):
- nodes are papers
- node features: bag-of-words vectors over a vocabulary of 1433 unique words
- adjacency matrix: $a_{ij} = 1$ if paper $i$ cites paper $j$
- label: one of the following seven classes:
    - Case_Based, Genetic_Algorithms, Neural_Networks, Probabilistic_Methods, Reinforcement_Learning, Rule_Learning, Theory.
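
To make the data layout concrete, here is a minimal sketch that builds a bag-of-words feature matrix and a citation adjacency matrix for a toy corpus. The vocabulary, papers, and citations are hypothetical, and the symmetrization step at the end is an assumption about how an undirected graph model would consume the directed citations, not something stated in the dataset description.

```python
import numpy as np

# Hypothetical toy data: 4 papers, a 5-word vocabulary, 3 citations.
vocab = ["graph", "neural", "genetic", "rule", "bayes"]
paper_words = [
    {"graph", "neural"},
    {"neural"},
    {"genetic", "rule"},
    {"bayes", "graph"},
]
citations = [(0, 1), (2, 1), (3, 0)]  # (citing paper i, cited paper j)

# Bag-of-words features: X_toy[i, k] = 1 if paper i contains word k.
word_index = {word: k for k, word in enumerate(vocab)}
X_toy = np.zeros((len(paper_words), len(vocab)), dtype=np.float32)
for i, words in enumerate(paper_words):
    for word in words:
        X_toy[i, word_index[word]] = 1.0

# Directed adjacency: A_toy[i, j] = 1 if paper i cites paper j.
A_toy = np.zeros((len(paper_words), len(paper_words)), dtype=np.float32)
for i, j in citations:
    A_toy[i, j] = 1.0

# Assumption: treat citations as undirected edges by symmetrizing.
A_sym = np.maximum(A_toy, A_toy.T)

print(X_toy.shape, A_toy.sum(), A_sym.sum())
```

For Cora itself the same layout gives a 2708 x 1433 feature matrix and a 2708 x 2708 adjacency matrix, which is why sparse storage is the usual choice.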


The cells below follow the kegra training script: https://github.com/tkipf/keras-gcn/blob/master/kegra/train.py

In [7]:
from keras.layers import Input, Dropout
from keras.models import Model
from keras.optimizers import Adam
from keras.regularizers import l2

from kegra.layers.graph import GraphConvolution
from kegra.utils import *

import time

# Define parameters
DATASET = 'cora'
FILTER = 'localpool'  # 'chebyshev'
MAX_DEGREE = 2  # maximum polynomial degree
SYM_NORM = True  # symmetric (True) vs. left-only (False) normalization
NB_EPOCH = 200
PATIENCE = 10  # early stopping patience

# Get data
X, A, y = load_data(path="../data/cora/",dataset=DATASET)

Loading cora dataset...
Dataset has 2708 nodes, 5429 edges, 1433 features.


In [8]:
y_train, y_val, y_test, idx_train, idx_val, idx_test, train_mask = get_splits(y)
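
The split returns index lists for each subset plus a `train_mask`, which is later passed as `sample_weight` so that nodes outside the training set contribute nothing to the loss. The following is a minimal sketch of that idea on toy data. The helper names `sample_mask` and `masked_accuracy`, the split sizes, and the predictions are all hypothetical and are not taken from the kegra implementation.

```python
import numpy as np

def sample_mask(idx, n):
    """Boolean mask of length n that is True at the given indices."""
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask

def masked_accuracy(preds, labels, idx):
    """Accuracy restricted to the rows listed in idx."""
    predicted = np.argmax(preds[idx], axis=1)
    expected = np.argmax(labels[idx], axis=1)
    return float(np.mean(predicted == expected))

# Hypothetical toy problem: 6 nodes, 3 classes.
labels = np.eye(3)[[0, 1, 2, 1, 0, 2]]
idx_train_toy, idx_val_toy = [0, 1], [2, 3]

train_mask_toy = sample_mask(idx_train_toy, labels.shape[0])
# Labels outside the training set are zeroed out, as in a masked target.
y_train_toy = labels * train_mask_toy[:, None]

# Hypothetical model outputs for all 6 nodes.
preds_toy = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
    [0.4, 0.4, 0.2],
    [0.2, 0.2, 0.6],
])

train_acc_toy = masked_accuracy(preds_toy, labels, idx_train_toy)
val_acc_toy = masked_accuracy(preds_toy, labels, idx_val_toy)
print(train_acc_toy, val_acc_toy)
```

Because the whole graph is fed to the model at every step, the mask is what separates training nodes from validation and test nodes, rather than separate batches.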

In [9]:

# Normalize X
X /= X.sum(1).reshape(-1, 1)

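if FILTER == 'localpool':
    # Local pooling filters (see 'renormalization trick' in Kipf & Welling, arXiv 2016)
    print('Using local pooling filters...')
    A_ = preprocess_adj(A, SYM_NORM)
    support = 1
    graph = [X, A_]
    G = [Input(shape=(None, None), batch_shape=(None, None), sparse=True)]
else:
    # Only the local pooling branch is reproduced here; fail early otherwise,
    # since `support`, `graph` and `G` would be undefined below.
    raise ValueError('Invalid filter type: {}'.format(FILTER))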

X_in = Input(shape=(X.shape[1],))

# Define model architecture
# NOTE: We pass arguments for graph convolutional layers as a list of tensors.
# This is somewhat hacky, more elegant options would require rewriting the Layer base class.
H = Dropout(rate=0.5)(X_in)
H = GraphConvolution(16, support, activation='relu', kernel_regularizer=l2(5e-4))([H]+G)
H = Dropout(rate=0.5)(H)
Y = GraphConvolution(y.shape[1], support, activation='softmax')([H]+G)

# Compile model
model = Model(inputs=[X_in]+G, outputs=Y)
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.01))



Using local pooling filters...
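
The preprocessing in this cell has two parts: row-normalizing the features and normalizing the adjacency matrix. Below is a dense NumPy sketch of both, assuming that `preprocess_adj` implements the renormalization trick cited in the code comment, i.e. $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$ for symmetric normalization and $\tilde{D}^{-1}(A + I)$ for left-only normalization. The function names `renormalize_adj` and `row_normalize` are hypothetical, and the real code works on sparse matrices.

```python
import numpy as np

def renormalize_adj(A, symmetric=True):
    """Add self-loops, then normalize by node degree (dense sketch)."""
    A_tilde = A + np.eye(A.shape[0])
    deg = A_tilde.sum(axis=1)
    if symmetric:
        d_inv_sqrt = 1.0 / np.sqrt(deg)
        # D^{-1/2} (A + I) D^{-1/2}
        return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Left-only normalization: D^{-1} (A + I), so every row sums to 1.
    return A_tilde / deg[:, None]

def row_normalize(X):
    """Scale each feature row to sum to 1; all-zero rows are left unchanged."""
    row_sums = X.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    return X / row_sums

# Toy path graph 0 - 1 - 2 (hypothetical example).
A_path = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])
A_hat_sym = renormalize_adj(A_path, symmetric=True)
A_hat_left = renormalize_adj(A_path, symmetric=False)

X_small = np.array([[1.0, 1.0, 0.0],
                    [0.0, 0.0, 0.0],
                    [2.0, 1.0, 1.0]])
X_norm = row_normalize(X_small)
print(A_hat_sym)
print(X_norm)
```

The self-loops keep each node's own features in the aggregation. The zero-row guard in `row_normalize` is an addition for the sketch; the in-place division in the cell above assumes every paper has at least one word.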


In [10]:
X.shape

(2708, 1433)

In [11]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_4 (InputLayer)            (None, 1433)         0                                            
__________________________________________________________________________________________________
dropout_3 (Dropout)             (None, 1433)         0           input_4[0][0]                    
__________________________________________________________________________________________________
input_3 (InputLayer)            (None, None)         0                                            
__________________________________________________________________________________________________
graph_convolution_3 (GraphConvo (None, 16)           22944       dropout_3[0][0]                  
                                                                 input_3[0][0]                    
__________
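
The parameter count of the first graph convolution can be checked by hand. The sketch below assumes the layer computes $\mathrm{activation}(\hat{A} H W + b)$ with one weight matrix (since `support = 1`) and a bias vector. It is a plain NumPy illustration with hypothetical names and random weights, not the kegra `GraphConvolution` layer, and it omits dropout and regularization.

```python
import numpy as np

def gcn_layer(H, A_hat, W, b, activation):
    """One graph convolution: aggregate neighbors, then apply a dense map."""
    return activation(A_hat @ (H @ W) + b)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_nodes, n_features, n_hidden, n_classes = 5, 1433, 16, 7

H0 = rng.random((n_nodes, n_features))   # toy node features
A_hat = np.eye(n_nodes)                  # placeholder for a normalized adjacency
W1 = rng.normal(scale=0.01, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

H1 = gcn_layer(H0, A_hat, W1, b1, relu)        # shape (5, 16)
Y_toy = gcn_layer(H1, A_hat, W2, b2, softmax)  # shape (5, 7)

# 1433 * 16 weights + 16 biases = 22944, the figure shown for graph_convolution_3.
n_params_layer1 = W1.size + b1.size
print(H1.shape, Y_toy.shape, n_params_layer1)
```

The node dimension does not appear in the parameter count: the same weights are shared by every node, and the graph only enters through `A_hat`.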

In [12]:

# Helper variables for main training loop
wait = 0
preds = None
best_val_loss = 99999

# Fit
for epoch in range(1, NB_EPOCH+1):

    # Log wall-clock time
    t = time.time()

    # Single training iteration (we mask nodes without labels for loss calculation)
    model.fit(graph, y_train, sample_weight=train_mask,
              batch_size=A.shape[0], epochs=1, shuffle=False, verbose=0)

    # Predict on full dataset
    preds = model.predict(graph, batch_size=A.shape[0])

    # Train / validation scores
    train_val_loss, train_val_acc = evaluate_preds(preds, [y_train, y_val],
                                                   [idx_train, idx_val])
    print("Epoch: {:04d}".format(epoch),
          "train_loss= {:.4f}".format(train_val_loss[0]),
          "train_acc= {:.4f}".format(train_val_acc[0]),
          "val_loss= {:.4f}".format(train_val_loss[1]),
          "val_acc= {:.4f}".format(train_val_acc[1]),
          "time= {:.4f}".format(time.time() - t))

    # Early stopping
    if train_val_loss[1] < best_val_loss:
        best_val_loss = train_val_loss[1]
        wait = 0
    else:
        if wait >= PATIENCE:
            print('Epoch {}: early stopping'.format(epoch))
            break
        wait += 1

# Testing
test_loss, test_acc = evaluate_preds(preds, [y_test], [idx_test])
print("Test set results:",
      "loss= {:.4f}".format(test_loss[0]),
      "accuracy= {:.4f}".format(test_acc[0]))
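
# The early-stopping rule used in the loop above can be isolated as a small
# function. This is a sketch with a hypothetical name (`early_stopping_epoch`)
# and made-up loss values; it mirrors the loop's logic, in which `wait` is
# compared with the patience before being incremented, so training stops on
# the (patience + 1)-th consecutive epoch without improvement.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best_val_loss = float("inf")
    wait = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best_val_loss:
            # New best validation loss: reset the counter.
            best_val_loss = val_loss
            wait = 0
        else:
            if wait >= patience:
                return epoch
            wait += 1
    return None

# Hypothetical validation losses: improvement stops after epoch 2.
toy_losses = [1.00, 0.90, 0.95, 0.96, 0.97, 0.98]
stop_epoch = early_stopping_epoch(toy_losses, patience=2)
never_stops = early_stopping_epoch([1.0, 0.9, 0.8, 0.7], patience=2)
print(stop_epoch, never_stops)
```

# Note that the loop above evaluates the test set with the predictions from
# the last epoch run, not from the epoch with the best validation loss.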


Epoch: 0001 train_loss= 1.9359 train_acc= 0.2929 val_loss= 1.9368 val_acc= 0.3467 time= 0.9949
Epoch: 0002 train_loss= 1.9240 train_acc= 0.3000 val_loss= 1.9261 val_acc= 0.3533 time= 0.0632
Epoch: 0003 train_loss= 1.9110 train_acc= 0.3143 val_loss= 1.9147 val_acc= 0.3533 time= 0.0746
Epoch: 0004 train_loss= 1.8971 train_acc= 0.3000 val_loss= 1.9024 val_acc= 0.3533 time= 0.0677
Epoch: 0005 train_loss= 1.8828 train_acc= 0.3000 val_loss= 1.8898 val_acc= 0.3500 time= 0.0722
Epoch: 0006 train_loss= 1.8677 train_acc= 0.3000 val_loss= 1.8768 val_acc= 0.3500 time= 0.0688
Epoch: 0007 train_loss= 1.8526 train_acc= 0.3071 val_loss= 1.8640 val_acc= 0.3533 time= 0.0805
Epoch: 0008 train_loss= 1.8377 train_acc= 0.3071 val_loss= 1.8514 val_acc= 0.3567 time= 0.0879
Epoch: 0009 train_loss= 1.8228 train_acc= 0.3143 val_loss= 1.8389 val_acc= 0.3567 time= 0.0776
Epoch: 0010 train_loss= 1.8082 train_acc= 0.3429 val_loss= 1.8268 val_acc= 0.3567 time= 0.0843
Epoch: 0011 train_loss= 1.7940 train_acc= 0.3500 v

Epoch: 0090 train_loss= 0.9812 train_acc= 0.8143 val_loss= 1.2114 val_acc= 0.6867 time= 0.0813
Epoch: 0091 train_loss= 0.9739 train_acc= 0.8143 val_loss= 1.2067 val_acc= 0.6900 time= 0.0885
Epoch: 0092 train_loss= 0.9668 train_acc= 0.8143 val_loss= 1.2024 val_acc= 0.7000 time= 0.0842
Epoch: 0093 train_loss= 0.9602 train_acc= 0.8214 val_loss= 1.1986 val_acc= 0.7000 time= 0.0888
Epoch: 0094 train_loss= 0.9540 train_acc= 0.8214 val_loss= 1.1949 val_acc= 0.6967 time= 0.0806
Epoch: 0095 train_loss= 0.9474 train_acc= 0.8214 val_loss= 1.1900 val_acc= 0.7033 time= 0.0700
Epoch: 0096 train_loss= 0.9408 train_acc= 0.8214 val_loss= 1.1847 val_acc= 0.7033 time= 0.0716
Epoch: 0097 train_loss= 0.9340 train_acc= 0.8143 val_loss= 1.1786 val_acc= 0.7033 time= 0.0664
Epoch: 0098 train_loss= 0.9272 train_acc= 0.8143 val_loss= 1.1719 val_acc= 0.7000 time= 0.0607
Epoch: 0099 train_loss= 0.9209 train_acc= 0.8143 val_loss= 1.1652 val_acc= 0.6967 time= 0.0625
Epoch: 0100 train_loss= 0.9149 train_acc= 0.8143 v

Epoch: 0178 train_loss= 0.5819 train_acc= 0.9357 val_loss= 0.9073 val_acc= 0.7867 time= 0.0708
Epoch: 0179 train_loss= 0.5792 train_acc= 0.9429 val_loss= 0.9054 val_acc= 0.7900 time= 0.0636
Epoch: 0180 train_loss= 0.5766 train_acc= 0.9429 val_loss= 0.9039 val_acc= 0.7900 time= 0.0633
Epoch: 0181 train_loss= 0.5745 train_acc= 0.9357 val_loss= 0.9034 val_acc= 0.7933 time= 0.0620
Epoch: 0182 train_loss= 0.5727 train_acc= 0.9286 val_loss= 0.9031 val_acc= 0.7967 time= 0.0685
Epoch: 0183 train_loss= 0.5712 train_acc= 0.9286 val_loss= 0.9025 val_acc= 0.8000 time= 0.0650
Epoch: 0184 train_loss= 0.5684 train_acc= 0.9357 val_loss= 0.9007 val_acc= 0.7967 time= 0.0624
Epoch: 0185 train_loss= 0.5654 train_acc= 0.9429 val_loss= 0.8979 val_acc= 0.8000 time= 0.0667
Epoch: 0186 train_loss= 0.5624 train_acc= 0.9429 val_loss= 0.8956 val_acc= 0.7900 time= 0.0753
Epoch: 0187 train_loss= 0.5590 train_acc= 0.9429 val_loss= 0.8920 val_acc= 0.7900 time= 0.0639
Epoch: 0188 train_loss= 0.5557 train_acc= 0.9429 v