# Knowledge Enhanced Neural Networks - Tutorial Notebook

## Introduction

This notebook is intended as a tutorial for the KENN2 (Knowledge Enhanced Neural Networks) framework. 
KENN2 (Knowledge Enhanced Neural Networks v2.1) is a library for Python 3 built on top of TensorFlow 2 that allows you to modify neural network models by providing logical knowledge in the form of a set of universally quantified FOL clauses. It does so by allowing the addition of a new final layer (KENN in the picture below) to the existing neural network. Such layer changes the original predictions of the standard neural network, enforcing the satisfaction of the provided knowledge.

![title](imgs/KENN_overview.png)

In the picture above, we used the symbol `z` to represent the output of the neural network to indicate that the input of KENN must be the preactivations of the predicted truth values. Indeed, KENN layer could be seen as a special type of activation function which enforces the logical knowledge.

Notice also that the neural network is represented in gray. This is bacause, in general, KENN is agnostic to the architecture used and the input `z` could be only partially calculated by the neural network. For instance, in this tutorial, the binary predicates are given as inputs since the task is to predict only the unary predicates.

To create the KENN layer, the library provides parsers functions, each of which represent a parser for a specific type of knowledge base file. The parser compiles the knowledge and returns the KENN layer, which can be used as a standard TensorFlow operator and can be inserted as a final layer of the model.

The main parsers available in KENN are two: `unary_parser` and `relational_parser` (though it is possible to extend the library with new parsers). In the first case, the knowledge contains only unary predicates, and the `x` and `y` of the picture above corresponds to matrices (where rows represent different objects and columns different predicates). In the second case, the `relational_parser` takes as input a knowledge file with both unary and binary predicates and `x` and `y` are (usually) graphs.

In this notebook, we focus on the relational case. In particular, here we focus on a simple application of KENN on the Citeseer Dataset, a citation network where the only binary predicate (which correponds the the edges of the graph) represent the citation relation between papers and the goal is to predict the topic of the papers given both the papers features and the citations among them.

For a simple explanation of KENN2 usage with the unary parser, please check the README page of the [github page of KENN2](https://github.com/DanieleAlessandro/KENN2).

## The Citeseer Dataset
The Citeseer Dataset is essentially a directed graph: the nodes are papers, while an edge between a paper `x` and another paper `y` means that `x` cites `y`. Each paper can be classified into one of six classes, namely: Agents, AI, DB, IR, ML and HCI. Moreover, for each paper a bag of words vector is provided as features.

The task is to **correctly classify each scientific publication**, considering both the features of the paper and the relational information provided by the citation network.

More specifically, the dataset consists of:
- **3312 scientific publications**; 
- **4732 links**;
- Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of **3703 unique words**, which was obtained after stemming and removing stopwords. 

### Training, Validation and Test splits

For this tutorial, we follow the **Inductive Learning** paradigm when splitting the whole dataset into Training, Validation and Test Set (i.e., only edges between nodes in the same set are used, while edges across the three sets are removed).

Specifically, we use $8\%$ of the nodes (papers) as Training Set, $2\%$ for validation and the remaining $90\%$ as a Test Set. This split is particularly relevant since there are not a lot of samples available at training time. In this type of scenario, the additional information coming from the knowledge become particularly relevant since tha data is not enough to learn everything from scratch.

More in depth experiments on CiteSeer are available on [GitHub](https://github.com/rmazzier/KENN-Citeseer-Experiments).


### The Prior Knowledge

The Prior Knowledge we want to inject is pretty simple. Indeed, it is composed of only 6 clauses, one for each topic. More in details, given a topic `T`, we inject the clause

$$\forall x \forall y \quad \lnot T(x) \lor \lnot Cite(x,y) \lor T(y)$$

Such rule codifies the idea that papers cite works that are related to them (i.e. the topic of a paper is often the same of the paper it cites). 

### Data Representation

Before looking in details the model architecture, we need to specify how to represent a graph in terms of matrices. Picture below shows an example of a graph (left) and the corresponding representation used by KENN2 (right).

In the example there are three unary predicates (labels of the nodes) and two binary predicates (red and green arrows). For both unary and binary predicates the numbers represent the preactivations of the truth values. To simplify the picture, many edges are set to false and not showed in the graph. Note that, since KENN layer works with preactivations, the false truth values' preactivations are set to a very small number (-500) in the matrices on the right.

![title](imgs/KENN_graph.png)


In KENN2, the graph is represented using two matrices and two indexes vectors:
- The `Unary` matrix contains the truth values of unary predicates. In particular, each column corresponds to a specific predicate, while the rows represent the different objects of the domain (the nodes of the graph). To simplify the presentation, an additional column with the objects' indexes has been added to the picture;
- The `Binary` matrix has a similar structure, but it contains the truth values of binary predicates. In this case, each row corresponds to a pair of nodes;
- Finally, the two vectors `sx` and `sy` contain the indexes of the objects referred by the `Binary` matrix. As an example, in the picture above, the first row of `Binary` contains the binary predicates referring the pairs of nodes 0 and 1. Hence, the values in the first row of `sx` and `sy` contains 0 and 1 respectively.

## KENN architecture on CiteSeer

<a id='architecture'></a>

![title](imgs/KENN_architecture.png)

The picture above depicts the architecture used in this tutorial. There are four steps, the first two can be seen as a pre-elaboration and they are applied only once:
- **Step 1: pre-elaboration of the relational data.** Here the graph is stored using the previously defined representation. Notice that in this case there is only one column in the `Binary` Matrix (`relations` in the picture above). Such column represents the `Cite` predicate and it is setted to 500 for all the pairs. This is because the citations are given as input and we consider only the pairs `(x,y)` for which `Cite(x,y)` is true;
- **Step 2: parsing the knowledge file.** Here the knowledge file is provided to the parser, which in turns generates the KENN layer.

After the first two steps the model is defined and it can be trained like any neural network in TensorFlow. The model produces its predictions in the last two steps:
- **Step 3: initial predictions provided by the neural network.** The NN uses the nodes' features to classify each paper. It returns the preactivations for each paper and each topic;
- **Step 4: knowledge injection.** The KENN layer updates the predictions of the NN and returns the final results.

# How to use KENN2

## Imports and initial settings

Lets now dive into the code, starting from the necessary imports. The first rows are standard python libraries. The only additional import required by KENN2 is the parser function, which allows to read a knowldge file for generating the KENN layer.

In [4]:
import tensorflow as tf
import numpy as np 
import pandas as pd

from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras import layers
from tensorflow.keras.activations import softmax

from KENN2.parsers import relational_parser

In order to fix the code behaviour, we also set the random seed for TensorFlow and numpy. Feel free to experiment with different values of the seed or directly remove the entire cell.

In [5]:
random_seed = 0
tf.random.set_seed(random_seed)
np.random.seed(random_seed)

Finally, lets define the path of the dataset.

In [6]:
dataset_folder = 'dataset/CiteSeer/'

## Import data
Here we import the relational data. In particular, the three matrices defined in the [KENN architecture on CiteSeer](#architecture) section: `features`, `indexes` and `relations`.

#### Features: 
The features matrix contains a row for each node in the graph, while the columns take values in $\{0,1\}$, 0 meaning the absence of a word of the dictionary, 1 meaning its presence.

In [30]:
training_features = np.genfromtxt(dataset_folder + 'training_features.csv', delimiter=',')
validation_features = np.genfromtxt(dataset_folder + 'validation_features.csv', delimiter=',')
test_features = np.genfromtxt(dataset_folder + 'test_features.csv', delimiter=',')

In [8]:
pd.DataFrame(training_features)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,3693,3694,3695,3696,3697,3698,3699,3700,3701,3702
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
259,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
260,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
261,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
262,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Notice that the matrix above seems to contains only zeros just because it is very sparse. As an example, here the indexes of the words contained in the first document:

In [9]:
print(np.where(training_features[0]==1)[0])

[  32  211  249  383  407  493  507  609  619  731  744 1087 1118 1239
 1245 1548 1611 1619 1641 1841 2216 2395 2407 2448 2492 2539 2553 2563
 2568 2615 2741 2875 2902 2906 3122 3184 3463 3586 3594]


#### Indexes: 

In [13]:
indexes_training = np.genfromtxt(dataset_folder + 'indexes_training.csv', delimiter=',', dtype=np.int32)
indexes_validation = np.genfromtxt(dataset_folder + 'indexes_validation.csv', delimiter=',', dtype=np.int32)
indexes_test = np.genfromtxt(dataset_folder + 'indexes_test.csv', delimiter=',', dtype=np.int32)

In [14]:
print("Shape of the training_indexes matrix: " + str(indexes_training.shape))
pd.DataFrame(indexes_training).head(10)

Shape of the training_indexes matrix: (34, 2)


Unnamed: 0,0,1
0,140,140
1,94,139
2,127,203
3,127,228
4,237,237
5,193,226
6,53,53
7,83,83
8,262,262
9,69,69


We can note that there is a good amount of papers that seemingly cite themselves. This strange behaviour is probably due to the fact that Citeseer was generated with an automatic citation indexing system, which seems not capable to disambiguate between papers having the same author names.

Check https://clgiles.ist.psu.edu/papers/DL-1998-citeseer.pdf for more information.

#### Relations: 
As we explained above, this vector contains the truth values of the connection between two nodes.
Note that we just consider the couples of connected nodes, and exclude all the other non connected couples of nodes in the graph. For this reason the only value in the relations matrix is 500.

In [15]:
# IMPORT RELATIONS
relations_training = np.genfromtxt(dataset_folder + 'relations_training.csv', delimiter=',')
relations_validations = np.genfromtxt(dataset_folder + 'relations_validation.csv', delimiter=',')
relations_test = np.genfromtxt(dataset_folder + 'relations_test.csv', delimiter=',')

# Reshape relations arrays to be column vectors
relations_training = np.expand_dims(relations_training, axis=1)
relations_validations = np.expand_dims(relations_validations, axis=1)
relations_test = np.expand_dims(relations_test, axis=1)

n_features = training_features.shape[1]

In [16]:
print("Shape of the training_relations matrix: " + str(relations_training.shape))
pd.DataFrame(relations_training).head(10)

Shape of the training_relations matrix: (34, 1)


Unnamed: 0,0
0,500.0
1,500.0
2,500.0
3,500.0
4,500.0
5,500.0
6,500.0
7,500.0
8,500.0
9,500.0


#### Labels: 
Finally, in order to perform training and evaluations, we need also to import the labels. Labels are encoded as a one-hot vector vector of length 6. 

In [10]:
training_labels = np.genfromtxt(dataset_folder + 'training_labels.csv', delimiter=',')
validation_labels = np.genfromtxt(dataset_folder + 'validation_labels.csv', delimiter=',')
test_labels = np.genfromtxt(dataset_folder + 'test_labels.csv', delimiter=',')

In [11]:
pd.DataFrame(training_labels)

Unnamed: 0,0,1,2,3,4,5
0,0.0,0.0,1.0,0.0,0.0,0.0
1,1.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,1.0,0.0
4,0.0,1.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...
259,0.0,0.0,1.0,0.0,0.0,0.0
260,0.0,0.0,0.0,0.0,0.0,1.0
261,0.0,0.0,1.0,0.0,0.0,0.0
262,0.0,0.0,0.0,0.0,0.0,1.0


## Define the models

Here we define the standard model (a feedforward neural network with three hidden layers) and KENN model (the same architecture as the standard model with the addition of KENN layers).

### Base NN Model

In [17]:
class Standard(Model):
    def __init__(self):
        super(Standard, self).__init__()

    def build(self, input_shape):
        self.h1 = layers.Dense(50, input_shape=input_shape, activation='relu')
        self.d1 = layers.Dropout(0.5)
        self.h2 = layers.Dense(50, input_shape=(50,), activation='relu')
        self.d2 = layers.Dropout(0.5)
        self.h3 = layers.Dense(50, input_shape=(50,), activation='relu')
        self.d3 = layers.Dropout(0.5)

        self.last_layer = layers.Dense(
            6, input_shape=(50,), activation='linear')

    def preactivations(self, inputs):
        x = self.h1(inputs)
        x = self.d1(x)
        x = self.h2(x)
        x = self.d2(x)
        x = self.h3(x)
        x = self.d3(x)

        return self.last_layer(x)
        
    def call(self, inputs, **kwargs):
        z = self.preactivations(inputs)

        return z, softmax(z)

### KENN Model

In [19]:
class Kenn(Standard):
    def __init__(self, knowledge_file, *args, **kwargs):
        super(Kenn, self).__init__(*args, **kwargs)
        self.knowledge = knowledge_file

    def build(self, input_shape):
        super(Kenn, self).build(input_shape)
        self.kenn_layer_1 = relational_parser(self.knowledge)
        self.kenn_layer_2 = relational_parser(self.knowledge)
        self.kenn_layer_3 = relational_parser(self.knowledge, activation=softmax)

    def call(self, inputs, **kwargs):
        features = inputs[0]
        relations = inputs[1]
        sx = inputs[2]
        sy = inputs[3]
        
        z = self.preactivations(features)
        z, _ = self.kenn_layer_1(z, relations, sx, sy)
        z, _ = self.kenn_layer_2(z, relations, sx, sy)
        z, _ = self.kenn_layer_3(z, relations, sx, sy)

        return z

In the above cell we defined the KENN model by extending the Standard NN with 3 additional KENN layers. We add three layers instead of one since a single layer considers only the neighbours of each node. By adding three layers we allow for a propagation of the changes to farther nodes.

As already mentioned previously, to generate a KENN layer is sufficient to call the `relational_parser` function, providing as input the knowledge file. More in details, the `relational_parser` function parses the knowledge file and returns a ```RelationalKenn``` object which is constructed by looking at the provided knowledge file. Such object can then be used with standard TensorFlow code, as we do below.

Notice also that the third layer is created by passing an activation function (the `softmax`) to the parser. This is bacause, internally, the KENN layer just modifies the preactivations provided as input (based on the knowledge) and then applies an activation function (the default one is a linear activation).

Lets now take a look at the knowldge base file:

In [32]:
!cat knowledge_base

_AI,_Agents,_DB,_HCI,_IR,_ML
Cite

>
_:n_AI(x),nCite(x.y),_AI(y)
_:n_Agents(x),nCite(x.y),_Agents(y)
_:n_DB(x),nCite(x.y),_DB(y)
_:n_HCI(x),nCite(x.y),_HCI(y)
_:n_IR(x),nCite(x.y),_IR(y)
_:n_ML(x),nCite(x.y),_ML(y)


The first row contains the list of unary predicates separated by a comma with no spaces, while the second row contains the binary predicates (in our case, just one). Each predicate should start with a capital letter or another symbol, to ensure that we have no conflicts with other rules we'll define below.

The other rows contain the clauses, which are also split in two groups: the first group contains only unary predicates, the second both unary and binary predicates. The two groups are separated by a row containing the > symbol.
Each clause is in a separate row and must be written respecting this properties:

- Logical disjunctions are represented with commas;
- If a literal is negated, it must be preceded by the lowercase 'n';
- They must contain only predicates specified in the first row;
- There shouldn't be spaces.

Additionally, each clause must be preceded by a positive weight that represents the strength of the clause. More precisely, the weight could be a numeric value or an underscore: in the first case, the weight is fixed and determined by the specified value, in the second case the weight is learned during training. 

Unary predicates are defined on a single variable (e.g. AI(x)) while binary predicates on two variables separated by a dot (e.g. Cite(x.y)).




## Training setup

Lets now set the training hyper parameters:

In [20]:
# Training parameters
n_epochs = 300

# Early Stopping parameters
min_delta = 0.001
es_patience = 10


optimizer = keras.optimizers.Adam()
loss = keras.losses.CategoricalCrossentropy(from_logits=False)

### Early Stopping:

In this tutorial we use Early Stopping which proved to be beneficial when applied to the KENN model.

The `callback_early_stopping` function takes as argument the list with all the validation accuracies. 
If `patience`=$k$, it checks if the mean of the last $k$ accuracies is higher than the mean of the 
previous $k$ accuracies (i.e. we check that we are not overfitting). If not, it stops learning.


In [21]:
def accuracy(predictions, labels):
    correctly_classified = tf.equal(
        tf.argmax(predictions, 1), tf.argmax(labels, 1))
    return tf.reduce_mean(tf.cast(correctly_classified, tf.float32))

def callback_early_stopping(AccList, min_delta=min_delta, patience=es_patience):
    if len(AccList)//patience < 2:
        return False
    
    mean_previous = np.mean(AccList[::-1][patience:2*patience])
    mean_recent = np.mean(AccList[::-1][:patience])
    delta = mean_recent - mean_previous

    if delta <= min_delta:
        print(
            "*CB_ES* Validation Accuracy didn't increase in the last %d epochs" % (patience))
        print("*CB_ES* delta:", delta)
    
    return delta <= min_delta

# Training the base NN

Here we train the standard NN on its own. Later, we will compare its Test Accuracy with the one of KENN.

In [22]:
# Define and build model
standard_model = Standard()
standard_model.build((n_features,))


# Used for early stopping
valid_accuracies = []

for epoch in range(n_epochs):
    with tf.GradientTape() as tape:
        _, predictions = standard_model(training_features)
        training_loss = loss(predictions, training_labels)

        gradient = tape.gradient(training_loss, standard_model.variables)
        optimizer.apply_gradients(zip(gradient, standard_model.variables))

    
    _, v_predictions = standard_model(validation_features)
    v_accuracy = accuracy(v_predictions, validation_labels)
    valid_accuracies.append(v_accuracy.numpy())
    
    if epoch % 10 == 0:
        _, t_predictions = standard_model(training_features)
        t_loss = loss(t_predictions,training_labels)
        t_accuracy = accuracy(t_predictions, training_labels)
        
        v_loss = loss(v_predictions, validation_labels)

        print(
            "Epoch {}: Training Loss: {:5.4f} Validation Loss: {:5.4f} | Train Accuracy: {:5.4f} Validation Accuracy: {:5.4f};".format(
                epoch, t_loss, v_loss, t_accuracy, v_accuracy))


    # Early Stopping
    stopEarly = callback_early_stopping(valid_accuracies)
    if stopEarly:
        print("callback_early_stopping signal received at epoch= %d/%d" %
                (epoch, n_epochs))
        print("Terminating training ")
        break

Epoch 0: Training Loss: 13.3542 Validation Loss: 13.4059 | Train Accuracy: 0.3598 Validation Accuracy: 0.2687;
Epoch 10: Training Loss: 12.2838 Validation Loss: 13.1026 | Train Accuracy: 0.8674 Validation Accuracy: 0.3881;
Epoch 20: Training Loss: 8.8543 Validation Loss: 11.9123 | Train Accuracy: 0.9432 Validation Accuracy: 0.4478;
Epoch 30: Training Loss: 3.1537 Validation Loss: 10.3593 | Train Accuracy: 0.9924 Validation Accuracy: 0.5672;
Epoch 40: Training Loss: 0.3242 Validation Loss: 9.5506 | Train Accuracy: 1.0000 Validation Accuracy: 0.5075;
*CB_ES* Validation Accuracy didn't increase in the last 10 epochs
*CB_ES* delta: -0.01791042
callback_early_stopping signal received at epoch= 44/300
Terminating training 


# Training KENN

The code for training KENN is not particularly different from the standard NN model. Indeed, the only difference is in the inputs of the model that were composed of just the nodes' features in the case of the NN and now contain also the relational data.

In [23]:
kenn_model = Kenn('knowledge_base')
kenn_model.build((n_features,))

valid_accuracies = []

for epoch in range(n_epochs):
    with tf.GradientTape() as tape:
        predictions_KENN = kenn_model(
            [training_features, relations_training, np.expand_dims(indexes_training[:,0], axis=1), np.expand_dims(indexes_training[:,1], axis=1)])

        l = loss(predictions_KENN, training_labels)

        gradient = tape.gradient(l, kenn_model.variables)
        optimizer.apply_gradients(zip(gradient, kenn_model.variables))
    
    v_predictions = kenn_model([validation_features, relations_validations, np.expand_dims(indexes_validation[:,0], axis=1), np.expand_dims(indexes_validation[:,1], axis=1)])
    v_accuracy = accuracy(v_predictions, validation_labels)
    valid_accuracies.append(v_accuracy)


    if epoch % 10 == 0:
        t_predictions = kenn_model(
                [training_features, relations_training, np.expand_dims(indexes_training[:,0], axis=1), np.expand_dims(indexes_training[:,1], axis=1)])
        t_loss = loss(t_predictions, training_labels)
        t_accuracy = accuracy(t_predictions, training_labels)


        v_loss = loss(v_predictions, validation_labels)

        print(
            "Epoch {}: Training Loss: {:5.4f} Validation Loss: {:5.4f} | Train Accuracy: {:5.4f} Validation Accuracy: {:5.4f};".format(
                epoch, t_loss, v_loss, t_accuracy, v_accuracy))

    # Early Stopping
    stopEarly = callback_early_stopping(valid_accuracies)
    if stopEarly:
        print("callback_early_stopping signal received at epoch= %d/%d" %
                (epoch, n_epochs))
        print("Terminating training ")
        break

Epoch 0: Training Loss: 13.3580 Validation Loss: 13.4560 | Train Accuracy: 0.2879 Validation Accuracy: 0.1940;
Epoch 10: Training Loss: 11.5904 Validation Loss: 13.1699 | Train Accuracy: 0.8674 Validation Accuracy: 0.2687;
Epoch 20: Training Loss: 5.2270 Validation Loss: 11.7052 | Train Accuracy: 0.9886 Validation Accuracy: 0.3284;
Epoch 30: Training Loss: 0.3318 Validation Loss: 9.1088 | Train Accuracy: 1.0000 Validation Accuracy: 0.5522;
Epoch 40: Training Loss: 0.0201 Validation Loss: 7.6865 | Train Accuracy: 1.0000 Validation Accuracy: 0.6418;
Epoch 50: Training Loss: 0.0043 Validation Loss: 7.2665 | Train Accuracy: 1.0000 Validation Accuracy: 0.6567;
*CB_ES* Validation Accuracy didn't increase in the last 10 epochs
*CB_ES* delta: -5.9604645e-08
callback_early_stopping signal received at epoch= 56/300
Terminating training 


## Evaluation on Test Set
<a id='evaluations'></a>

Lets evaluate the NN model with and without the KENN layers.

In [24]:
_, predictions_test = standard_model(test_features)
test_accuracy = accuracy(predictions_test, test_labels)
print("Standard model Test Accuracy: {:.5f}%".format(test_accuracy.numpy() * 100))

Standard model Test Accuracy: 52.53271%


In [25]:
ind_x = np.expand_dims(indexes_test[:,0], axis=1)
ind_y = np.expand_dims(indexes_test[:,1], axis=1)

predictions_test_kenn = kenn_model(
    [test_features, relations_test, ind_x, ind_y])

test_accuracy_kenn = accuracy(predictions_test_kenn, test_labels)
print("KENN model Test Accuracy: {:.5f}%".format(test_accuracy_kenn.numpy() * 100))

KENN model Test Accuracy: 61.15398%


As you can see, the results are quite larger in the second case. This improvement on the test accuracy can be explained as the results of adding the knowledge, since the two models differences are only in the final KENN layers.

Notice that it could happen that KENN does not improve the base neural networks accuracy due to random fluctuations. In the following image, some histograms that show the distributions of KENN results on 500 runs (with different splits of the dataset).

Each row corresponds to a different amount of training data (the first, like in this tutorial, is with 10% of the data for training and validation).

The left column contains the distributions of results of the NN and KENN models, while the second column the distributions of the improvements provided by KENN.


![CiteSeer Experiments](https://raw.githubusercontent.com/rmazzier/KENN-Citeseer-Experiments/main/plots/history_inductive.png)

For more information on the CiteSeer experiments, please visit the [GitHub](https://github.com/rmazzier/KENN-Citeseer-Experiments) page.

# KENN in summary

To summarize, there are few changes to standard TensorFlow code needed to incorporate the knowledge with KENN2:
- importing of a parser;
- calling the parser with the knowledge file path;
- using the resulting layer as a function on top of the NN model.