# Knowledge Enhanced Neural Networks - Tutorial Notebook
With this notebook we present a simple application of KENN on the Citeseer Dataset, where relational logical knowledge is employed to improve the predictions of a baseline NN.

In [1]:
import tensorflow as tf
import numpy as np 
import pandas as pd

from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras import layers
from KENN2.parsers import relational_parser
from tensorflow.keras.activations import softmax

In [2]:
dataset_folder = 'dataset/CiteSeer/'

In [3]:
# Set Random Seed for tensorflow and numpy
random_seed = 0
tf.random.set_seed(random_seed)
np.random.seed(random_seed)

## The Citeseer Dataset
The Citeseer Dataset is essentially a directed graph: the nodes are documents, while the edges are directed in such a way that the tail node is the paper that makes the citation and the head node is the cited paper. More specifically, it consists of:
- **3312 scientific publications** classified into one of six classes (namely: Agents, AI, DB, IR, ML, HCI). 
- The citation network consists of **4732 links**. 
- Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of **3703 unique words**, which was obtained after stemming and removing stopwords. 

The task is to **correctly classify each scientific publication**, given in input the features for each sample and the relational information provided by the citation network.

### Data Representation
![title](imgs/data_repr.png)
We represent all the features as a matrix with $n$ rows and $k$ columns, where $n$ is the number of nodes and $k$ is the number of features.

The relational data (i.e. the relations between the nodes of the network) are summarized in a table of index couples: for each edge in the network, we store the index of the first node and the index of the second node in a single row of the **indexes** matrix. The **relations** table contains the truth value of each ordered couple in the indexes matrix; note that we consider only couples of connected nodes. Furthermore, note that with "truth value" we mean a number in the range $[-\infty, \infty]$, since, inside the architecture of KENN, those are considered as the preactivations of the final predictions.

### About the Training, Validation and Test splits
We split the whole dataset into Training, Validation and Test set. Specifically, we just take $10\%$ of the complete dataset as Training Set, and use $20\%$ of it for validation. We use the remaining $90\%$ as a Test set. We chose to pick this precise split ratio since it's the one where KENN can be more useful. In fact, we observed that when few data is present, KENN gives much better results compared to a standard NN, thanks to the addition of logical knowledge.

 When making splits, we used the **Inductive** paradigm: we considered only the edges $(x,y)$ such that both $x$ and $y$ are in the same split.
For each the following files will be used throughout this notebook:
- **features**: matrix containing the feature vectors for each node in the selected split;
- **labels**: matrix of one-hot encoded labels for each node in the selected split;
- **indexes**: matrix containing all the couples of nodes that are connected by an edge in the selected split;
- **relations**: vector, of length equal to the number of rows of the indexes matrix. It contains the truth value of the connection between the corresponding couple of nodes. Since the edges of the graph are not weighted, the possible values for a connection are just "connected" and "not connected". As stated above, the truth values can be seen as preactivations $z$ of a sigmoid function $\sigma(z)$. For this reason we just set the truth value of all connected pairs to 500, since $\sigma(500) \approx 1$.

### Import data

In [4]:
# IMPORT FEATURES
training_features = np.genfromtxt(dataset_folder + 'training_features.csv', delimiter=',')
validation_features = np.genfromtxt(dataset_folder + 'validation_features.csv', delimiter=',')
test_features = np.genfromtxt(dataset_folder + 'test_features.csv', delimiter=',')

# IMPORT LABELS
training_labels = np.genfromtxt(dataset_folder + 'training_labels.csv', delimiter=',')
validation_labels = np.genfromtxt(dataset_folder + 'validation_labels.csv', delimiter=',')
test_labels = np.genfromtxt(dataset_folder + 'test_labels.csv', delimiter=',')

# IMPORT EDGES INDEXES
indexes_training = np.genfromtxt(dataset_folder + 'indexes_training.csv', delimiter=',', dtype=np.int32)
indexes_validation = np.genfromtxt(dataset_folder + 'indexes_validation.csv', delimiter=',', dtype=np.int32)
indexes_test = np.genfromtxt(dataset_folder + 'indexes_test.csv', delimiter=',', dtype=np.int32)

# IMPORT RELATIONS
relations_training = np.genfromtxt(dataset_folder + 'relations_training.csv', delimiter=',')
relations_validations = np.genfromtxt(dataset_folder + 'relations_validation.csv', delimiter=',')
relations_test = np.genfromtxt(dataset_folder + 'relations_test.csv', delimiter=',')

# Reshape relations arrays to be column vectors
relations_training = np.expand_dims(relations_training, axis=1)
relations_validations = np.expand_dims(relations_validations, axis=1)
relations_test = np.expand_dims(relations_test, axis=1)

n_features = training_features.shape[1]

#### Features: 
Each row is a node in the graph, while each column takes values in $\{0,1\}$, the first meaning the absence and the second meaning the presence of the corresponding word in the dictionary.

In [5]:
pd.DataFrame(training_features)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,3693,3694,3695,3696,3697,3698,3699,3700,3701,3702
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
259,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
260,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
261,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
262,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


For example: those are the indexes of the words contained in the first document.

In [6]:
print(np.where(training_features[0]==1)[0])

[  32  211  249  383  407  493  507  609  619  731  744 1087 1118 1239
 1245 1548 1611 1619 1641 1841 2216 2395 2407 2448 2492 2539 2553 2563
 2568 2615 2741 2875 2902 2906 3122 3184 3463 3586 3594]


#### Labels: 
As we can see, each label is one-hot encoded into a vector of length 6. 

In [7]:
pd.DataFrame(training_labels)

Unnamed: 0,0,1,2,3,4,5
0,0.0,0.0,1.0,0.0,0.0,0.0
1,1.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,1.0,0.0
4,0.0,1.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...
259,0.0,0.0,1.0,0.0,0.0,0.0
260,0.0,0.0,0.0,0.0,0.0,1.0
261,0.0,0.0,1.0,0.0,0.0,0.0
262,0.0,0.0,0.0,0.0,0.0,1.0


In [8]:
# Example: this is the class to which the first 5 samples belong to:
topics = ["Agents", "AI", "DB", "IR", "ML", "HCI"]
for i in range(5):
    print("Document {}'s class: ".format(i) + topics[np.where(training_labels[i]==1)[0][0]])

Document 0's class: DB
Document 1's class: Agents
Document 2's class: Agents
Document 3's class: ML
Document 4's class: AI


#### Indexes: 
Each row represents a connection between the first and the second 

In [9]:
print("Shape of the training_indexes matrix: " + str(indexes_training.shape))
pd.DataFrame(indexes_training).head(10)

Shape of the training_indexes matrix: (34, 2)


Unnamed: 0,0,1
0,140,140
1,94,139
2,127,203
3,127,228
4,237,237
5,193,226
6,53,53
7,83,83
8,262,262
9,69,69


We can note that there is a good amount of papers that seemingly cite themselves. This strange behaviour is probably due to the fact that Citeseer is an automatic citation indexing system, which seems not capable to disambiguate between papers having the same author names.

Check https://clgiles.ist.psu.edu/papers/DL-1998-citeseer.pdf for more information.

#### Relations: 
As we explained above, this vector contains the truth values of the connection between two nodes.
Note that we just consider the couples of connected nodes, and exclude all the other non connected couples of nodes in the graph. For this reason the unique value in the relations vector is 500.

In [10]:
print("Shape of the training_relations matrix: " + str(relations_training.shape))
pd.DataFrame(relations_training).head(10)

Shape of the training_relations matrix: (34, 1)


Unnamed: 0,0
0,500.0
1,500.0
2,500.0
3,500.0
4,500.0
5,500.0
6,500.0
7,500.0
8,500.0
9,500.0


## Experiment Setup
![title](imgs/experiment_setup.png)

In this example, the relational data is injected directly from the citation network. KENN uses this data to increase the truth value of each clause that is given as input in the prior knowledge.
Specifically, in this case, the knowledge used by KENN codifies the idea that papers cite works that are related to them (i.e. the topic of a paper is often the same of the paper it cites). 

For this reason we instantiate the clause:
$$\forall x \forall y \quad T(x) \land Cite(x,y) \rightarrow T(y)$$
multiple times, for all the topics $T$.

## Define the models
Here we define a Standard Sequential Model, and a Relational KENN model with Tensorflow Subclassing.

### Base NN Model

In [11]:
class Standard(Model):
    def __init__(self):
        super(Standard, self).__init__()

    def build(self, input_shape):
        self.h1 = layers.Dense(50, input_shape=input_shape, activation='relu')
        self.d1 = layers.Dropout(0.5)
        self.h2 = layers.Dense(50, input_shape=(50,), activation='relu')
        self.d2 = layers.Dropout(0.5)
        self.h3 = layers.Dense(50, input_shape=(50,), activation='relu')
        self.d3 = layers.Dropout(0.5)

        self.last_layer = layers.Dense(
            6, input_shape=(50,), activation='linear')

    def preactivations(self, inputs):
        x = self.h1(inputs)
        x = self.d1(x)
        x = self.h2(x)
        x = self.d2(x)
        x = self.h3(x)
        x = self.d3(x)

        return self.last_layer(x)
        
    def call(self, inputs, **kwargs):
        z = self.preactivations(inputs)

        return z, softmax(z)

### KENN Model:
Here we define our KENN model, which extends the Standard NN by adding 3 KENN layers. 

In fact, combining more layers when using data in the form of a graph, can allow us to propagate the logical knowledge not only to adjacent nodes, but also to neighbors of neighbors, and so on. Furthermore, since KENN can learn clause weights, it can also learn the importance to give to not directly adjacent nodes by simply penalizing the clause weights of successive layers.

In [12]:
class Kenn(Standard):
    def __init__(self, knowledge_file, *args, **kwargs):
        super(Kenn, self).__init__(*args, **kwargs)
        self.knowledge = knowledge_file

    def build(self, input_shape):
        super(Kenn, self).build(input_shape)
        self.kenn_layer_1 = relational_parser(self.knowledge)
        self.kenn_layer_2 = relational_parser(self.knowledge)
        self.kenn_layer_3 = relational_parser(self.knowledge)

    def call(self, inputs, **kwargs):
        features = inputs[0]
        relations = inputs[1]
        sx = inputs[2]
        sy = inputs[3]
        
        z = self.preactivations(features)
        z, _ = self.kenn_layer_1(z, relations, sx, sy)
        z, _ = self.kenn_layer_2(z, relations, sx, sy)
        z, _ = self.kenn_layer_3(z, relations, sx, sy)

        return softmax(z)

## Training setup

In [13]:
# Training parameters
n_epochs = 300

# Early Stopping parameters
min_delta = 0.001
es_patience = 10


optimizer = keras.optimizers.Adam()
loss = keras.losses.CategoricalCrossentropy(from_logits=False)

### Early Stopping:
Our Early Stopping function takes as argument the list with all the validation accuracies. 
If patience=$k$, checks if the mean of the last $k$ accuracies is higher than the mean of the 
previous $k$ accuracies (i.e. we check that we are not overfitting). If not, stops learning.


In [14]:
def accuracy(predictions, labels):
    correctly_classified = tf.equal(
        tf.argmax(predictions, 1), tf.argmax(labels, 1))
    return tf.reduce_mean(tf.cast(correctly_classified, tf.float32))

def callback_early_stopping(AccList, min_delta=min_delta, patience=es_patience):
    if len(AccList)//patience < 2:
        return False
    
    mean_previous = np.mean(AccList[::-1][patience:2*patience])
    mean_recent = np.mean(AccList[::-1][:patience])
    delta = mean_recent - mean_previous

    if delta <= min_delta:
        print(
            "*CB_ES* Validation Accuracy didn't increase in the last %d epochs" % (patience))
        print("*CB_ES* delta:", delta)
    
    return delta <= min_delta

# Training the base NN

In [15]:
# Define and build model
standard_model = Standard()
standard_model.build((n_features,))


# Used for early stopping
valid_accuracies = []

for epoch in range(n_epochs):
    with tf.GradientTape() as tape:
        _, predictions = standard_model(training_features)
        training_loss = loss(predictions, training_labels)

        gradient = tape.gradient(training_loss, standard_model.variables)
        optimizer.apply_gradients(zip(gradient, standard_model.variables))

    
    _, v_predictions = standard_model(validation_features)
    v_accuracy = accuracy(v_predictions, validation_labels)
    valid_accuracies.append(v_accuracy.numpy())
    
    if epoch % 10 == 0:
        _, t_predictions = standard_model(training_features)
        t_loss = loss(t_predictions,training_labels)
        t_accuracy = accuracy(t_predictions, training_labels)
        
        v_loss = loss(v_predictions, validation_labels)

        print(
            "Epoch {}: Training Loss: {:5.4f} Validation Loss: {:5.4f} | Train Accuracy: {:5.4f} Validation Accuracy: {:5.4f};".format(
                epoch, t_loss, v_loss, t_accuracy, v_accuracy))


    # Early Stopping
    stopEarly = callback_early_stopping(valid_accuracies)
    if stopEarly:
        print("callback_early_stopping signal received at epoch= %d/%d" %
                (epoch, n_epochs))
        print("Terminating training ")
        break

Epoch 0: Training Loss: 13.3542 Validation Loss: 13.4059 | Train Accuracy: 0.3598 Validation Accuracy: 0.2687;
Epoch 10: Training Loss: 12.2838 Validation Loss: 13.1026 | Train Accuracy: 0.8674 Validation Accuracy: 0.3881;
Epoch 20: Training Loss: 8.8543 Validation Loss: 11.9123 | Train Accuracy: 0.9432 Validation Accuracy: 0.4478;
Epoch 30: Training Loss: 3.1537 Validation Loss: 10.3593 | Train Accuracy: 0.9924 Validation Accuracy: 0.5672;
Epoch 40: Training Loss: 0.3242 Validation Loss: 9.5506 | Train Accuracy: 1.0000 Validation Accuracy: 0.5075;
*CB_ES* Validation Accuracy didn't increase in the last 10 epochs
*CB_ES* delta: -0.01791042
callback_early_stopping signal received at epoch= 44/300
Terminating training 


# Training KENN

In [16]:
kenn_model = Kenn('knowledge_base')
kenn_model.build((n_features,))

valid_accuracies = []

for epoch in range(n_epochs):
    with tf.GradientTape() as tape:
        predictions_KENN = kenn_model(
            [training_features, relations_training, np.expand_dims(indexes_training[:,0], axis=1), np.expand_dims(indexes_training[:,1], axis=1)])

        l = loss(predictions_KENN, training_labels)

        gradient = tape.gradient(l, kenn_model.variables)
        optimizer.apply_gradients(zip(gradient, kenn_model.variables))
    
    v_predictions = kenn_model([validation_features, relations_validations, np.expand_dims(indexes_validation[:,0], axis=1), np.expand_dims(indexes_validation[:,1], axis=1)])
    v_accuracy = accuracy(v_predictions, validation_labels)
    valid_accuracies.append(v_accuracy)


    if epoch % 10 == 0:
        t_predictions = kenn_model(
                [training_features, relations_training, np.expand_dims(indexes_training[:,0], axis=1), np.expand_dims(indexes_training[:,1], axis=1)])
        t_loss = loss(t_predictions, training_labels)
        t_accuracy = accuracy(t_predictions, training_labels)


        v_loss = loss(v_predictions, validation_labels)

        print(
            "Epoch {}: Training Loss: {:5.4f} Validation Loss: {:5.4f} | Train Accuracy: {:5.4f} Validation Accuracy: {:5.4f};".format(
                epoch, t_loss, v_loss, t_accuracy, v_accuracy))

    # Early Stopping
    stopEarly = callback_early_stopping(valid_accuracies)
    if stopEarly:
        print("callback_early_stopping signal received at epoch= %d/%d" %
                (epoch, n_epochs))
        print("Terminating training ")
        break

Epoch 0: Training Loss: 13.3580 Validation Loss: 13.4560 | Train Accuracy: 0.2879 Validation Accuracy: 0.1940;
Epoch 10: Training Loss: 11.5904 Validation Loss: 13.1699 | Train Accuracy: 0.8674 Validation Accuracy: 0.2687;
Epoch 20: Training Loss: 5.2270 Validation Loss: 11.7052 | Train Accuracy: 0.9886 Validation Accuracy: 0.3284;
Epoch 30: Training Loss: 0.3318 Validation Loss: 9.1088 | Train Accuracy: 1.0000 Validation Accuracy: 0.5522;
Epoch 40: Training Loss: 0.0201 Validation Loss: 7.6865 | Train Accuracy: 1.0000 Validation Accuracy: 0.6418;
Epoch 50: Training Loss: 0.0043 Validation Loss: 7.2665 | Train Accuracy: 1.0000 Validation Accuracy: 0.6567;
*CB_ES* Validation Accuracy didn't increase in the last 10 epochs
*CB_ES* delta: -5.9604645e-08
callback_early_stopping signal received at epoch= 56/300
Terminating training 


## Evaluation on Test Set

In [17]:
_, predictions_test = standard_model(test_features)
test_accuracy = accuracy(predictions_test, test_labels)
print("Standard model Test Accuracy: {:.5f}%".format(test_accuracy.numpy() * 100))

Standard model Test Accuracy: 52.53271%


In [18]:
ind_x = np.expand_dims(indexes_test[:,0], axis=1)
ind_y = np.expand_dims(indexes_test[:,1], axis=1)

predictions_test_kenn = kenn_model(
    [test_features, relations_test, ind_x, ind_y])

test_accuracy_kenn = accuracy(predictions_test_kenn, test_labels)
print("KENN model Test Accuracy: {:.5f}%".format(test_accuracy_kenn.numpy() * 100))

KENN model Test Accuracy: 61.15398%
