## 1 Overview
An environment for training and evaluating neural networks on learning logical consequence.

In [None]:
# For Google Colab: clone the repository and change into it.
!git clone https://github.com/stereifberger/master-s-thesis
%cd master-s-thesis/

In [None]:
# For Google Colab: use this if the cell above did not move to the right directory.
%cd /content/master-s-thesis/

In [None]:
# For VS Code, after starting the Jupyter server: go to the right directory.
%cd master-s-thesis/

In [None]:
# Install required dependencies (not necessary on Google Colab)
!pip install -r requirements.txt

In [None]:
# Import required libraries
from imports import *

In [None]:
# For reloading libraries.
importlib.reload(architectures)

## 2 Create dataset
First, the training dataset is generated. The function "create_dataset" from "generation.py" calls "gen_outp_PA" to generate a set of random starting formulas and then iteratively checks which rules are applicable to them. All applicable rules are then used to generate new derivations. Each iteration of "gen_outp_PA", as set by the "iterations" variable, generates new, longer examples.
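The generation loop described above can be illustrated with a minimal sketch. It is not the code in "generation.py": the formula representation (nested tuples), the two toy rules, and the names `applicable_steps` and `extend_derivations` are assumptions made only for this example.

```python
from itertools import permutations


def applicable_steps(known):
    """Return the formulas derivable in one step from the formulas in `known`."""
    new = set()
    for a, b in permutations(known, 2):
        # Modus ponens: from A and (A THEN B) derive B.
        if isinstance(b, tuple) and b[0] == "THEN" and b[1] == a:
            new.add(b[2])
        # Conjunction introduction: from A and B derive (A AND B).
        new.add(("AND", a, b))
    # Only keep formulas that are not already part of the derivation.
    return new - set(known)


def extend_derivations(derivations):
    """Extend every derivation by every applicable step (one iteration)."""
    longer = []
    for derivation in derivations:
        for formula in sorted(applicable_steps(derivation), key=repr):
            longer.append(derivation + [formula])
    return longer


premises = ["A", ("THEN", "A", "B")]
derivations = [premises]
for _ in range(2):  # plays the role of the "iterations" setting
    derivations = extend_derivations(derivations)

for derivation in derivations[:3]:
    print(derivation)
```

Each pass makes every derivation one formula longer, which is why more iterations yield longer examples and a rapidly growing dataset.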

**Rules.** The rules are defined in calculi.py. Two sets are available: intuitionistic propositional logic (set below via "calculus = ipl") and classical propositional logic (set below via "calculus = cpl").

**Dataset entries.**
- **x_train.** Training input: [INDEX, PREMISES, DERIVATION SYMBOL, CONCLUSION]
- **y_train_ordered.** Dataset of correct derivations where each sublist i corresponds to INDEX: [DERIVATION_0...DERIVATION_N]

**Encoding.** Propositional variables and logical constants are encoded as integers. The integers are then one-hot encoded into unique sequences containing only 0s and 1s, whose length is the maximum integer value (the feature length). Each individual entry is therefore 2D: [SEQUENCE LENGTH, FEATURE LENGTH].
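As a rough illustration of this encoding, the sketch below one-hot encodes a short symbol sequence in plain Python and decodes it again by taking the position of the 1 in each row. The symbol table `SYMBOLS` and its integer codes are hypothetical; the repository's actual codes may differ. Codes start at 0 here, so the row length equals the number of symbols.

```python
# Hypothetical symbol table; the real integer codes may differ.
SYMBOLS = {"PAD": 0, "A": 1, "B": 2, "C": 3, "THEN": 4, "OR": 5, "DERIVES": 6}
FEATURE_LENGTH = len(SYMBOLS)


def one_hot(sequence):
    """Encode a list of symbols as a [SEQUENCE LENGTH, FEATURE LENGTH] list of rows."""
    rows = []
    for symbol in sequence:
        row = [0] * FEATURE_LENGTH
        row[SYMBOLS[symbol]] = 1
        rows.append(row)
    return rows


def decode(rows):
    """Reverse the encoding by taking the index of the largest value in each row."""
    names = {code: name for name, code in SYMBOLS.items()}
    return [names[row.index(max(row))] for row in rows]


entry = ["A", "A", "THEN", "B", "DERIVES", "B", "OR", "C"]
encoded = one_hot(entry)
print(len(encoded), len(encoded[0]))
print(decode(encoded) == entry)
```

The decoding step is the plain-Python counterpart of applying an argmax over the feature dimension.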

**Example entries without numerical representation and one-hot encoding.**
- **x_train.** [2345, A, A THEN B, DERIVES, B OR C]
- **y_train_ordered.** Sublist 2345 is the entry: [[A, A THEN B, B, B OR C], [A, A THEN B, B, A AND B, B OR C]]
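The relation between the two example entries can be sketched as follows. The helper names `split_entry` and `is_consistent` are made up for this illustration, and a dictionary stands in for the indexed dataset so that only sublist 2345 has to be written out.

```python
# The example entries from above, kept as readable strings.
x_entry = [2345, "A", "A THEN B", "DERIVES", "B OR C"]
y_train_ordered = {
    2345: [
        ["A", "A THEN B", "B", "B OR C"],
        ["A", "A THEN B", "B", "A AND B", "B OR C"],
    ]
}


def split_entry(entry):
    """Split an input entry into its index, premises and conclusion."""
    index, *rest = entry
    marker = rest.index("DERIVES")
    return index, rest[:marker], rest[marker + 1]


def is_consistent(entry, targets):
    """Check that every stored derivation starts with the premises and ends with the conclusion."""
    index, premises, conclusion = split_entry(entry)
    return all(
        derivation[: len(premises)] == premises and derivation[-1] == conclusion
        for derivation in targets[index]
    )


print(split_entry(x_entry))
print(is_consistent(x_entry, y_train_ordered))
```

The index in the first position of the input is what links it to its list of acceptable derivations, which can contain more than one correct answer.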


In [None]:
# Create dataset
x_train_2d, x_train_3d, y_train_ordered, max_y_train_len = generation.create_dataset(iterations = [1,2], calculus = calculi.ipl)

In [None]:
import json
torch.save(x_train_2d, 'x_train_2d.pt')
torch.save(x_train_3d, 'x_train_3d.pt')
torch.save(y_train_ordered, 'y_train_ordered.pt')
with open('Medium_max_y_train_len.json', 'w') as file:
    json.dump(max_y_train_len, file)

## 3 Prepare dataset and define models for training
Next, PyTorch's DataLoader assigns the individual training entries in x_train to batches of size "batch_size" in shuffled order. Then the different models are defined using definitions from "architectures.py". These models are:

- Feedforward network (net)
- Recurrent neural network (RNNNet)
- Long short-term memory (LSTMNet)
- Transformers (TransformerModel)
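What shuffled batching does to the training entries can be sketched in plain Python. This is only a stand-in for the DataLoader used below, and the function name `shuffled_batches` and the `seed` parameter are assumptions for this example.

```python
import random


def shuffled_batches(entries, batch_size, seed=None):
    """Yield the entries in random order, grouped into batches of `batch_size`."""
    order = list(range(len(entries)))
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        # The last batch is smaller when the sizes do not divide evenly.
        yield [entries[i] for i in order[start:start + batch_size]]


entries = list(range(120))  # stand-in for 120 training entries
batches = list(shuffled_batches(entries, batch_size=50, seed=0))
print([len(batch) for batch in batches])
```

Every entry appears in exactly one batch per pass, so one loop over all batches corresponds to one epoch.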

In [None]:
# When a GPU is present, empty its cache and define it as "device" for later reference
torch.cuda.empty_cache()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
# Get the datasets' shapes for the model definitions later
two_d_shape = x_train_2d.shape
three_d_shape = x_train_3d.shape
max_y_length = int(max_y_train_len/14)

In [None]:
# Reverse one-hot encoding for encoder-decoder models
x = torch.argmax(x_train_3d, dim=2) 
x[:, 0] = x_train_2d[:, 0]
x_train_nu = x

In [None]:
# Set train-test split to 80-20 [^1]
train_size = int(0.8 * len(x_train_2d)) 
test_size = len(x_train_2d) - train_size 
x_train_2d, x_test_2d = random_split(x_train_2d, [train_size, test_size])
x_train_3d, x_test_3d = random_split(x_train_3d, [train_size, test_size])
x_train_nu, x_test_nu = random_split(x_train_nu, [train_size, test_size])

In [None]:
# Collect and shuffle the data into batches [^2]
train_dataloader_2d = DataLoader(dataset = x_train_2d, shuffle = True, batch_size = 50)
test_dataloader_2d = DataLoader(dataset = x_test_2d, shuffle = True, batch_size = 50)
train_dataloader_3d = DataLoader(dataset = x_train_3d, shuffle = True, batch_size = 50)
test_dataloader_3d = DataLoader(dataset = x_test_3d, shuffle = True, batch_size = 50)
train_dataloader_nu = DataLoader(dataset = x_train_nu, shuffle = True, batch_size = 50)
test_dataloader_nu = DataLoader(dataset = x_test_nu, shuffle = True, batch_size = 50)

In [None]:
# Load ground truth data to GPU
y_train = y_train_ordered.to(device)
y_train_3d = y_train.view(int(len(y_train)), int(len(y_train[0])), int(len(y_train[0][0])/14), 14)

In [None]:
# Define the simple one-hot to one-hot networks [^3]
## FFN (onehot to onehot)
#ffn_oh_model = architectures.ffn(input_size = two_d_shape[1]-1, 
#                              hidden_size = 10,
#                              output_size = max_y_train_len,
#                              dropout_rate = 0.1,
#                              input_size_in = 756)
## RNN (onehot to onehot)
#rnn_oh_model = architectures.SimpleRNN(input_size = three_d_shape[2],
#                              hidden_size = 150,
#                              output_size = three_d_shape[2])
## LSTM (onehot to onehot)
#lst_oh_model = architectures.lst(input_size = three_d_shape[2],
#                              hidden_size = 150,
#                             output_size = max_y_train_len)

In [None]:
# Define the Encoder-Decoder networks
## FFN | Inputs: input_dim, hidden dim
encoder_ffn = architectures.Encoder_FFN(three_d_shape[1], 150)
decoder_ffn = architectures.Decoder_FFN((max_y_length*14), 150)
ffn_ed_model = architectures.Seq2Seq(encoder_ffn, decoder_ffn, device)
## RNN | Inputs: input_dim, embedding dim, hidden dim, nr layers
encoder_rnn = architectures.Encoder_RNN(three_d_shape[1], 150, 150, 1)
decoder_rnn = architectures.Decoder_RNN(14, 150, 150, 3)
rnn_ed_model = architectures.Seq2Seq(encoder_rnn, decoder_rnn, device)
## LSTM | Inputs: input_dim, embedding dim, hidden dim, nr layers, dropout
encoder_lstm = architectures.Encoder_LSTM(three_d_shape[1], 150, 150, 2, 0.5)
decoder_lstm = architectures.Decoder_LSTM(14, 150, 150, 3, 0.5)
lst_ed_model = architectures.Seq2Seq(encoder_lstm, decoder_lstm, device)
## Transformer | Inputs: input_dim, embedding dim, hidden dim, nr layers, dropout
encoder_tra = architectures.TransformerEncoder(three_d_shape[1], 150, 5, 150, 1)
decoder_tra = architectures.TransformerDecoder(14, 150, 1, 150, 3)
tra_ed_model = architectures.Seq2SeqTransformer(encoder_tra, decoder_tra, device)

In [None]:
# Define optimizers for models
lr = 0.001
# The one-hot optimizers stay disabled while the one-hot models are commented out
#ffn_oh_optimizer = torch.optim.Adam(ffn_oh_model.parameters(),lr=lr)
#rnn_oh_optimizer = torch.optim.Adam(rnn_oh_model.parameters(),lr=lr)
#lst_oh_optimizer = torch.optim.Adam(lst_oh_model.parameters(),lr=lr)
ffn_ed_optimizer = torch.optim.Adam(ffn_ed_model.parameters(),lr=lr)
rnn_ed_optimizer = torch.optim.Adam(rnn_ed_model.parameters(),lr=lr)
lst_ed_optimizer = torch.optim.Adam(lst_ed_model.parameters(),lr=lr)
tra_ed_optimizer = torch.optim.Adam(tra_ed_model.parameters(),lr=lr)

## 4 Training

In [None]:
criterion = nn.CrossEntropyLoss()

## 4.1 Encoder-Decoder Networks

### 4.1.1 FFN Encoder-Decoder

In [None]:
# Load model to GPU
ffn_ed_model.to(device)

In [None]:
# Train model and save results
FFN_CELtrain, FFN_CELtest, FFN_ACCtrain, FFN_ACCtest = schedule.train_model(ffn_ed_model, train_dataloader_nu, test_dataloader_nu, ffn_ed_optimizer, criterion, 100, device, max_y_length, y_train)
torch.save(ffn_ed_model.state_dict(), 'ffn_ed_model.pth') # Separate file per model so later saves do not overwrite it

In [None]:
# A sanity test for whether the outputs look appropriate
schedule.sanity(ffn_ed_model, test_dataloader_nu, device, max_y_length)

In [None]:
# Delete model from GPU to make space for new models
del ffn_ed_model
torch.cuda.empty_cache()

### 4.1.2 RNN Encoder-Decoder

In [None]:
# Load model to GPU
rnn_ed_model.to(device)

In [None]:
# Training Loop
RNN_CELtrain, RNN_CELtest, RNN_ACCtrain, RNN_ACCtest = schedule.train_model(rnn_ed_model, train_dataloader_nu, test_dataloader_nu, rnn_ed_optimizer, criterion, 100, device, max_y_length, y_train_3d)
torch.save(rnn_ed_model.state_dict(), 'rnn_ed_model.pth') # Separate file per model so later saves do not overwrite it

In [None]:
# A sanity test for whether the outputs look appropriate
schedule.sanity(rnn_ed_model, test_dataloader_nu, device, max_y_length)

In [None]:
# Delete model from GPU to make space for new models
del rnn_ed_model
torch.cuda.empty_cache()

### 4.1.3 LSTM Encoder-Decoder

In [None]:
# Load model to GPU
lst_ed_model.to(device)

In [None]:
# Training Loop
LSTM_CELtrain, LSTM_CELtest, LSTM_ACCtrain, LSTM_ACCtest = schedule.train_model(lst_ed_model, train_dataloader_nu, test_dataloader_nu, lst_ed_optimizer, criterion, 100, device, max_y_length, y_train_3d)
torch.save(lst_ed_model.state_dict(), 'lst_ed_model.pth') # Separate file per model so later saves do not overwrite it

In [None]:
# A sanity test for whether the outputs look appropriate
schedule.sanity(lst_ed_model, test_dataloader_nu, device, max_y_length)

In [None]:
# Delete model from GPU to make space for new models
del lst_ed_model
torch.cuda.empty_cache()

### 4.1.4 Transformer

In [None]:
importlib.reload(schedule)

In [None]:
# Load model to GPU
tra_ed_model.to(device)

In [None]:
TRA_CELtrain, TRA_CELtest, TRA_ACCtrain, TRA_ACCtest = schedule.train_model(tra_ed_model, train_dataloader_nu, test_dataloader_nu, tra_ed_optimizer, criterion, 100, device, max_y_length, y_train_3d)
torch.save(tra_ed_model.state_dict(), 'tra_ed_model.pth') # Separate file per model so later saves do not overwrite it

In [None]:
# A sanity test for whether the outputs look appropriate
schedule.sanity(tra_ed_model, test_dataloader_nu, device, max_y_length)

In [None]:
# Delete model from GPU to make space for new models
del tra_ed_model
torch.cuda.empty_cache()

## 4.2 Onehot-to-Onehot | NOT WORKING
NOT WORKING RIGHT NOW. These are simple networks without an encoder-decoder structure that are fed the one-hot encoded data directly.

### 4.2.1 FFN

In [None]:
# Load the feedforward model to the GPU
ffn_oh_model.to(device)

In [None]:
schedule.train_model(ffn_oh_model, train_dataloader_2d, test_dataloader_2d, ffn_oh_optimizer, criterion, 200, device, max_y_length, y_train)
torch.save(ffn_oh_model.state_dict(), 'ffn_oh_model.pth') # Separate file per model so later saves do not overwrite it

In [None]:
# A sanity test for whether the outputs look appropriate.
schedule.sanity_r(ffn_oh_model, test_dataloader_2d, device, max_y_length)

In [None]:
# Delete model from GPU to make space for new models
del ffn_oh_model
torch.cuda.empty_cache()

### 4.2.2 Recurrent Neural Network

In [None]:
# Load the recurrent model to the GPU
rnn_oh_model.to(device)

In [None]:
importlib.reload(schedule)

In [None]:
# Use the RNN's own optimizer so that the RNN's parameters are the ones being updated
schedule.train_model(rnn_oh_model, train_dataloader_3d, test_dataloader_3d, rnn_oh_optimizer, criterion, 200, device, max_y_length, y_train_3d)
torch.save(rnn_oh_model.state_dict(), 'rnn_oh_model.pth')

In [None]:
### FFN ### [^5]
ffn_costval_train = [] # Define the lists for the loss values
ffn_costval_test = []
for j in tqdm(range(50), desc = "Epoch"): # Loop over all epochs
    ffn_oh_model.train() # Set to training mode (weights are adjusted)
    train_loss = 0
    for i, x_train in enumerate(train_dataloader_2d):   # Loop over all batches
        x_train = x_train.to(device)
        y_pred = ffn_oh_model(x_train[:,1:], max_y_length)               # Get the model's output for batch
        cost, y_train_collected = losses.nffn_mse_min_dist(y_pred, x_train, y_train, max_y_length, device) # Calculate loss
        # Backpropagation
        ffn_oh_optimizer.zero_grad()
        cost.backward()
        ffn_oh_optimizer.step()
        train_loss += cost.item() # Accumulate the batch loss for the average loss calculation
    avg_train_loss = train_loss / len(train_dataloader_2d) # Calculate average loss
    ffn_costval_train.append(avg_train_loss)

    ffn_oh_model.eval() # Set evaluation mode (weights are not adjusted)
    test_loss = 0
    # Analog to above but without training a loop over all batches
    with torch.no_grad():
        for i, x_test in enumerate(test_dataloader_2d):
            x_test = x_test.to(device)
            y_pred = ffn_oh_model(x_test[:, 1:], max_y_length)
            cost, y_train_collected = losses.nffn_mse_min_dist(y_pred, x_test, y_train, max_y_length, device)
            test_loss += cost.item()
    avg_test_loss = test_loss / len(test_dataloader_2d)
    ffn_costval_test.append(avg_test_loss)

    if j % 10 == 0: # Get the loss every 10 epochs
        print(f"Epoch {j}: Train Loss - {avg_train_loss}, Test Loss - {avg_test_loss}")

In [None]:
# A sanity test for whether the outputs look appropriate
schedule.sanity(tra_ed_model, test_dataloader_3d, device, max_y_length)

## 5 Plot results
Here all results from above are plotted.

In [None]:
from matplotlib import font_manager as fm

In [None]:
plt.rcParams['font.family'] = 'serif'

In [None]:
#Feedforward Network, Non-Deep, Medium Dataset, Cross Entropy Loss
plt.figure(figsize=(8, 8))
x_data = list(range(100))
prop = fm.FontProperties(fname='/usr/share/fonts/opentype/freefont/FreeSerif.otf')
plt.plot(x_data, FFN_CELtrain, label='Training cross entropy loss')
plt.plot(x_data, FFN_CELtest, label='Test cross entropy loss')
plt.xlabel('Epochs')
plt.ylabel('Cross entropy loss')
plt.legend()
plt.title("Feedforward Network, Non-Deep, Medium Dataset, Cross Entropy Loss")
plt.show()

In [None]:
# Feedforward Network, Non-Deep, Medium Dataset, Accuracy
plt.figure(figsize=(8, 8))
x_data = list(range(100))
plt.plot(x_data, FFN_ACCtrain, label='Training accuracy')
plt.plot(x_data, FFN_ACCtest, label='Test accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title("Feedforward Network, Non-Deep, Medium Dataset, Accuracy")
plt.legend()
plt.show()

In [None]:
# Recurrent Neural Network, Medium, Medium Dataset, Accuracy, Cross Entropy Loss
plt.figure(figsize=(8, 8))
x_data = list(range(100))
plt.plot(x_data, RNN_CELtrain, label='Training cross entropy loss')
plt.plot(x_data, RNN_CELtest, label='Test cross entropy loss')
plt.plot(x_data, RNN_ACCtrain, label='Training accuracy')
plt.plot(x_data, RNN_ACCtest, label='Test accuracy')
plt.xlabel('Epochs')
plt.ylabel('Cross entropy loss / accuracy')
plt.legend()
plt.title("Recurrent Neural Network, Medium, Medium Dataset, Accuracy, Cross Entropy Loss")
plt.show()

In [None]:
# Long Short-Term Memory, Medium, Medium Dataset, Accuracy, Cross Entropy Loss
plt.figure(figsize=(8, 8))
x_data = list(range(100))
plt.plot(x_data, LSTM_CELtrain, label='Training cross entropy loss')
plt.plot(x_data, LSTM_CELtest, label='Test cross entropy loss')
plt.plot(x_data, LSTM_ACCtrain, label='Training accuracy')
plt.plot(x_data, LSTM_ACCtest, label='Test accuracy')
plt.xlabel('Epochs')
plt.ylabel('Cross entropy loss / accuracy')
plt.legend()
plt.title("Long Short-Term Memory, Medium, Medium Dataset, Accuracy, Cross Entropy Loss")
plt.show()

In [None]:
# Transformer, Medium, Medium Dataset, Accuracy, Cross Entropy Loss
plt.figure(figsize=(8, 8))
x_data = list(range(100))
plt.plot(x_data, TRA_CELtrain, label='Training cross entropy loss')
plt.plot(x_data, TRA_CELtest, label='Test cross entropy loss')
plt.plot(x_data, TRA_ACCtrain, label='Training accuracy')
plt.plot(x_data, TRA_ACCtest, label='Test accuracy')
plt.xlabel('Epochs')
plt.ylabel('Cross entropy loss / accuracy')
plt.legend()
plt.title("Transformer, Medium, Medium Dataset, Accuracy, Cross Entropy Loss")
plt.show()

In [None]:
# Test-Accuracy all Networks, Medium Dataset, Accuracy
plt.figure(figsize=(8, 8))
x_data = list(range(100))
plt.plot(x_data, FFN_ACCtest, label='FFN Test accuracy')
plt.plot(x_data, RNN_ACCtest, label='RNN Test accuracy')
plt.plot(x_data, LSTM_ACCtest, label='LSTM Test accuracy')
plt.plot(x_data, TRA_ACCtest, label='Transformer Test accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.title("Test-Accuracy all Networks, Medium Dataset, Accuracy")
plt.show()