## 1 Overview
An environment for training and evaluating neural networks on learning logical consequence.

In [None]:
# For Google Colab: clone the repository and change into its directory.
!git clone https://github.com/stereifberger/master-s-thesis
%cd master-s-thesis/

In [None]:
# For VS Code (after starting the Jupyter server): change into the repository directory.
%cd master-s-thesis/

In [None]:
# Install the required dependencies (not necessary on Google Colab)
!pip install -r requirements.txt

In [None]:
# Import required libraries
from imports import *

In [None]:
# Reload the architectures module after editing architectures.py
importlib.reload(architectures)

## 2 Create dataset
First, the training dataset is generated. The function "create_dataset" from "generation.py" uses "gen_outp_PA" to generate a set of random starting formulas and then iteratively checks which rules are applicable to them. Every applicable rule is used to generate new derivations. Each iteration of "gen_outp_PA", with the number of iterations set by the "iterations" variable, produces new, longer examples.
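The iterative scheme described above can be sketched as follows. This is a minimal illustration, not the code in generation.py: it assumes formulas are nested tuples such as `("→", "A", "B")`, and the rule functions `modus_ponens` and `and_intro` as well as `generate_derivations` are hypothetical names. Each iteration extends every derivation by one formula obtained from an applicable rule.

```python
# Minimal sketch of iterative forward derivation (hypothetical, not generation.py).
# Formulas are atoms (strings) or nested tuples such as ("→", "A", "B").
from itertools import product


def modus_ponens(a, b):
    """From A and A → B, derive B; return None if the rule does not apply."""
    if isinstance(b, tuple) and b[0] == "→" and b[1] == a:
        return b[2]
    return None


def and_intro(a, b):
    """From A and B, derive A ∧ B (always applicable)."""
    return ("∧", a, b)


RULES = [modus_ponens, and_intro]


def generate_derivations(premises, iterations):
    """Return derivations that grow by exactly one formula per iteration."""
    derivations = [list(premises)]
    for _ in range(iterations):
        longer = []
        for derivation in derivations:
            # Try every rule on every ordered pair of formulas in the derivation.
            for rule, (a, b) in product(RULES, product(derivation, repeat=2)):
                new = rule(a, b)
                if new is not None and new not in derivation:
                    longer.append(derivation + [new])
        derivations = longer
    return derivations
```

For example, `generate_derivations(["A", ("→", "A", "B")], 1)` yields five one-step derivations, one of which ends in `"B"` via modus ponens. Because ∧-introduction always applies, the number of derivations grows quickly with the iteration count.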

**Rules.** The rules are defined in calculi.py. Two sets are available: intuitionistic propositional logic (set below via "calculus = ipl") and classical propositional logic (set below via "calculus = cpl").
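One standard way to obtain classical from intuitionistic propositional logic is to add double negation elimination (from ¬¬A infer A) to the intuitionistic rules. The sketch below shows that relationship with rules as plain functions; whether calculi.py structures `ipl` and `cpl` this way is an assumption, and `and_elim_left`, `double_negation_elim`, and `applicable` are hypothetical names.

```python
# Hypothetical sketch: a rule maps a premise to a conclusion, or None if it does not apply.
# calculi.py may represent rules differently.


def and_elim_left(f):
    """From A ∧ B, derive A."""
    if isinstance(f, tuple) and f[0] == "∧":
        return f[1]
    return None


def double_negation_elim(f):
    """From ¬¬A, derive A (classically valid, not intuitionistically)."""
    if isinstance(f, tuple) and f[0] == "¬" and isinstance(f[1], tuple) and f[1][0] == "¬":
        return f[1][1]
    return None


ipl = [and_elim_left]               # stand-in for the full intuitionistic rule set
cpl = ipl + [double_negation_elim]  # classical = intuitionistic + double negation elimination


def applicable(rules, formula):
    """Return the conclusions of all rules in `rules` that apply to `formula`."""
    return [c for c in (rule(formula) for rule in rules) if c is not None]
```

With this setup, `("¬", ("¬", "A"))` yields no conclusions under `ipl` but yields `"A"` under `cpl`.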

**Dataset entries.**
- **x_train.** Training input: [INDEX, PREMISES, DERIVATION SYMBOL, CONCLUSION]
- **y_train_ordered.** Dataset of correct derivations, where sublist i corresponds to the entry with INDEX i: [DERIVATION_0, ..., DERIVATION_N]

**Encoding.** Propositional variables and logical constants are encoded as integers. Each integer is then one-hot encoded into a unique sequence of 0s and 1s whose length, the feature length, is set by the maximum integer value. Each individual entry is therefore 2D: [SEQUENCE LENGTH, FEATURE LENGTH].
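The encoding can be sketched with `torch.nn.functional.one_hot`. The vocabulary below is hypothetical: its symbols are those that appear in the model outputs further down plus a padding token, giving 14 entries, but the actual symbol-to-integer mapping and padding scheme in the repository may differ. `encode` is an illustrative helper, not a function from the repository.

```python
import torch
import torch.nn.functional as F

# Hypothetical vocabulary; index 0 is used for padding.
VOCAB = ["PAD", "p", "q", "r", "s", "t", "¬", "∧", "∨", "→", "⊥", "(", ")", "⊢"]
SYMBOL_TO_INT = {s: i for i, s in enumerate(VOCAB)}


def encode(symbols, seq_len):
    """Map symbols to integers, right-pad with PAD, and one-hot encode.

    Returns an integer tensor of shape [SEQUENCE LENGTH, FEATURE LENGTH].
    """
    ids = [SYMBOL_TO_INT[s] for s in symbols]
    ids += [SYMBOL_TO_INT["PAD"]] * (seq_len - len(ids))
    return F.one_hot(torch.tensor(ids), num_classes=len(VOCAB))


x = encode(["p", "→", "q"], seq_len=5)  # shape [5, 14], exactly one 1 per row
```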

**Example entries (without numerical representation and one-hot encoding).**
- **x_train.** [2345, A, A THEN B, DERIVES, B OR C]
- **y_train_ordered.** Sublist 2345 contains: [[A, A THEN B, B, B OR C], [A, A THEN B, B, A AND B, B OR C]]
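The link between the two structures can be sketched in plain Python: the INDEX in the first position of an `x_train` entry selects the sublist of `y_train_ordered` holding all correct derivations for that problem. The toy data mirrors the example above but uses index 0 for brevity; the helper names and the "matches any stored derivation" criterion are illustrative assumptions.

```python
# Toy data mirroring the example entries (index 0 instead of 2345).
x_train = [[0, "A", "A THEN B", "DERIVES", "B OR C"]]
y_train_ordered = [
    [
        ["A", "A THEN B", "B", "B OR C"],
        ["A", "A THEN B", "B", "A AND B", "B OR C"],
    ],
]


def correct_derivations(entry):
    """Use the INDEX stored in the first position of an x_train entry."""
    return y_train_ordered[entry[0]]


def is_correct(entry, derivation):
    """Hypothetical criterion: a derivation is correct if it matches any stored one."""
    return derivation in correct_derivations(entry)
```

Here `is_correct(x_train[0], ["A", "A THEN B", "B", "B OR C"])` is `True`, while a derivation that skips the intermediate steps is not found.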


In [None]:
# Create dataset
x_train_2d, x_train_3d, y_train_ordered, max_y_train_len = generation.create_dataset(iterations = [1,2], calculus = calculi.ipl)

## 3 Prepare dataset and define models for training
Next, the data is split 80-20 into training and test sets, and PyTorch's DataLoader assigns the individual entries to shuffled batches of size "batch_size" (50 below). Then the models are defined using the definitions from "architectures.py". These models are:

- Feedforward network (ffn, Encoder_FFN/Decoder_FFN)
- Recurrent neural network (SimpleRNN, Encoder_RNN/Decoder_RNN)
- Long short-term memory (lst, Encoder_LSTM/Decoder_LSTM)
- Transformer (TransformerEncoder/TransformerDecoder)
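The batching step can be illustrated on a toy tensor. With `shuffle=True` the DataLoader draws a new random order every epoch, and the last batch is smaller when the dataset size is not a multiple of `batch_size` (unless `drop_last=True` is passed). The sizes here are arbitrary.

```python
import torch
from torch.utils.data import DataLoader

# Toy dataset: 120 entries with 5 features each.
data = torch.arange(120 * 5, dtype=torch.float32).reshape(120, 5)
loader = DataLoader(dataset=data, shuffle=True, batch_size=50)

# Indexing a tensor returns one row; the default collate function stacks rows into batches.
batch_sizes = [batch.shape[0] for batch in loader]  # [50, 50, 20]
```

Every entry appears exactly once per epoch, only the order changes.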

In [None]:
# Free cached GPU memory (if a GPU is present) and define the device used below
torch.cuda.empty_cache()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
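# Get the datasets' shapes for the model definitions below
two_d_shape = x_train_2d.shape
three_d_shape = x_train_3d.shape
# Number of target positions: the flattened target length divided by 14,
# the one-hot feature length that is also used to reshape y_train below
max_y_length = int(max_y_train_len / 14)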

In [None]:
# Reverse the one-hot encoding (argmax over the feature axis) for the encoder-decoder models
x = torch.argmax(x_train_3d, dim=2)
# Restore column 0 (the INDEX) from the 2D data
x[:, 0] = x_train_2d[:, 0]
x_train_nu = x
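Reversing a one-hot encoding amounts to taking the argmax over the feature axis, as the cell above does with `dim=2`. The toy example shows the round trip and one caveat: `torch.argmax` returns the first maximal index on ties, so an all-zero row decodes to 0 and becomes indistinguishable from symbol 0. Whether the dataset contains all-zero rows is not shown here; the sizes below are arbitrary.

```python
import torch
import torch.nn.functional as F

# Toy batch: 2 sequences of length 4 over 6 feature classes.
ids = torch.tensor([[3, 1, 4, 1], [5, 0, 2, 2]])
one_hot = F.one_hot(ids, num_classes=6).float()  # [BATCH, SEQ_LEN, FEATURE_LEN]

# argmax over the feature axis recovers the integer sequences.
recovered = torch.argmax(one_hot, dim=2)  # [BATCH, SEQ_LEN]

# Caveat: an all-zero row has ties everywhere, so argmax picks index 0.
zero_row_decoded = torch.argmax(torch.zeros(1, 1, 6), dim=2)
```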

In [None]:
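# Set train-test split to 80-20 [^1]
# A shared seed gives the 2D, 3D, and integer datasets the same split
train_size = int(0.8 * len(x_train_2d))
test_size = len(x_train_2d) - train_size
split_seed = 42
x_train_2d, x_test_2d = random_split(x_train_2d, [train_size, test_size], generator=torch.Generator().manual_seed(split_seed))
x_train_3d, x_test_3d = random_split(x_train_3d, [train_size, test_size], generator=torch.Generator().manual_seed(split_seed))
x_train_nu, x_test_nu = random_split(x_train_nu, [train_size, test_size], generator=torch.Generator().manual_seed(split_seed))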
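When the same examples exist in several representations, each `random_split` call without an explicit generator draws its own permutation, so row i of one training set need not be row i of another. Passing a freshly seeded `torch.Generator` to each call makes equal-length datasets receive identical indices. The toy tensors and the `split` helper are illustrative.

```python
import torch
from torch.utils.data import random_split

a = torch.arange(10)        # one representation of 10 examples
b = torch.arange(10) * 100  # another representation of the same 10 examples


def split(dataset, seed=0):
    """80-20 split; a fixed seed yields the same indices for equal-length datasets."""
    train_size = int(0.8 * len(dataset))
    return random_split(
        dataset,
        [train_size, len(dataset) - train_size],
        generator=torch.Generator().manual_seed(seed),
    )


a_train, a_test = split(a)
b_train, b_test = split(b)
```

The resulting `Subset` objects expose their positions via `.indices`, which match between `a_train` and `b_train`.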

In [None]:
# Batch and shuffle the data [^2]
train_dataloader_2d = DataLoader(dataset = x_train_2d, shuffle = True, batch_size = 50)
test_dataloader_2d = DataLoader(dataset = x_test_2d, shuffle = True, batch_size = 50)
train_dataloader_3d = DataLoader(dataset = x_train_3d, shuffle = True, batch_size = 50)
test_dataloader_3d = DataLoader(dataset = x_test_3d, shuffle = True, batch_size = 50)
train_dataloader_nu = DataLoader(dataset = x_train_nu, shuffle = True, batch_size = 50)
test_dataloader_nu = DataLoader(dataset = x_test_nu, shuffle = True, batch_size = 50)

In [None]:
# Move the ground-truth derivations to the device (GPU if available)
y_train = y_train_ordered.to(device)
# Reshape to [EXAMPLES, DERIVATIONS, POSITIONS, 14]
y_train_3d = y_train.view(len(y_train), len(y_train[0]), len(y_train[0][0]) // 14, 14)
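The reshape above assumes each stored derivation is a flattened sequence of one-hot vectors of length 14. The toy example shows how `view` splits the last axis into [POSITIONS, FEATURE LENGTH] and that flattening restores the original layout; the sizes are arbitrary, and `-1` lets PyTorch infer the number of positions. `view` requires a compatible memory layout, so `reshape` is the safer choice for non-contiguous tensors.

```python
import torch

FEATURE_LEN = 14

# Toy targets: 3 examples, 2 derivations each, 4 positions x 14 features flattened.
y = torch.arange(3 * 2 * 4 * FEATURE_LEN).reshape(3, 2, 4 * FEATURE_LEN)
y_3d = y.view(y.shape[0], y.shape[1], -1, FEATURE_LEN)  # [3, 2, 4, 14]

# Flattening the last two axes recovers the original tensor.
y_flat = y_3d.flatten(start_dim=2)
```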

In [None]:
# Define the simple one-hot to one-hot networks [^3]
## FFN (onehot to onehot)
ffn_oh_model = architectures.ffn(
    input_size = two_d_shape[1]-1,
    hidden_size = 10,
    output_size = max_y_train_len,
    dropout_rate = 0.1,
    input_size_in = 756)
## RNN (onehot to onehot)
rnn_oh_model = architectures.SimpleRNN(
    input_size = three_d_shape[2],
    hidden_size = 150,
    output_size = three_d_shape[2])
## LSTM (onehot to onehot)
lst_oh_model = architectures.lst(
    input_size = three_d_shape[2],
    hidden_size = 150,
    output_size = max_y_train_len)
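A one-hot-to-one-hot feedforward network maps a flattened one-hot input to a flattened vector of scores that can be reshaped to [POSITIONS, FEATURE LENGTH] and decoded with argmax. The class below is a hypothetical stand-in for `architectures.ffn`: its constructor arguments mirror some of those passed above, but the layer layout (one hidden layer, ReLU, dropout) and all sizes are assumptions.

```python
import torch
from torch import nn


class TinyFFN(nn.Module):
    """Hypothetical one-hot-to-one-hot FFN: flattened one-hot in, flattened scores out."""

    def __init__(self, input_size, hidden_size, output_size, dropout_rate=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(hidden_size, output_size),
        )

    def forward(self, x):
        # x: [BATCH, input_size] -> [BATCH, output_size]
        return self.net(x)


FEATURE_LEN = 14
model = TinyFFN(input_size=20 * FEATURE_LEN, hidden_size=10, output_size=6 * FEATURE_LEN)
out = model(torch.zeros(50, 20 * FEATURE_LEN))
# Decode: reshape to [BATCH, POSITIONS, FEATURE_LEN] and take the most likely symbol.
predicted_symbols = out.view(50, 6, FEATURE_LEN).argmax(dim=2)
```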

In [None]:
# Define the Encoder-Decoder networks
## FFN
encoder_ffn = architectures.Encoder_FFN(55, 100)
decoder_ffn = architectures.Decoder_FFN((max_y_length*14), 100)
ffn_ed_model = architectures.Seq2Seq(encoder_ffn, decoder_ffn, device)
## RNN
encoder_rnn = architectures.Encoder_RNN(55, 150, 150, 1)
decoder_rnn = architectures.Decoder_RNN(14, 150, 150, 6)
rnn_ed_model = architectures.Seq2Seq(encoder_rnn, decoder_rnn, device)
## LSTM
encoder_lstm = architectures.Encoder_LSTM(55, 150, 150, 1)
decoder_lstm = architectures.Decoder_LSTM(14, 150, 150, 6)
lst_ed_model = architectures.Seq2Seq(encoder_lstm, decoder_lstm, device)
## Transformer
encoder_tra = architectures.TransformerEncoder(52, 150, 5, 150, 1)
decoder_tra = architectures.TransformerDecoder(14, 150, 1, 150, 3)
tra_ed_model = architectures.Seq2SeqTransformer(encoder_tra, decoder_tra, device)

In [None]:
# Define optimizers for models
lr = 0.001
ffn_oh_optimizer = torch.optim.Adam(ffn_oh_model.parameters(),lr=lr)
rnn_oh_optimizer = torch.optim.Adam(rnn_oh_model.parameters(),lr=lr)
lst_oh_optimizer = torch.optim.Adam(lst_oh_model.parameters(),lr=lr)
ffn_ed_optimizer = torch.optim.Adam(ffn_ed_model.parameters(),lr=lr)
rnn_ed_optimizer = torch.optim.Adam(rnn_ed_model.parameters(),lr=lr)
lst_ed_optimizer = torch.optim.Adam(lst_ed_model.parameters(),lr=lr)
tra_ed_optimizer = torch.optim.Adam(tra_ed_model.parameters(),lr=lr)

## 4 Training

In [None]:
# Loss criterion passed to schedule.train_model for all models below
criterion = nn.CrossEntropyLoss()

## 4.2 Encoder-Decoder Networks

### 4.2.1 FFN Encoder-Decoder

In [None]:
# Load model to GPU
ffn_ed_model.to(device)

In [None]:
# Train the model and save its weights
schedule.train_model(ffn_ed_model, train_dataloader_nu, test_dataloader_nu, ffn_ed_optimizer, criterion, 50, device, max_y_length, y_train)
torch.save(ffn_ed_model.state_dict(), 'ffn_ed_model.pth')

In [None]:
# A sanity test for whether the outputs look appropriate
schedule.sanity(ffn_ed_model, test_dataloader_nu, device, max_y_length)

In [None]:
# Delete the model and everything that still references its parameters to free GPU memory
del ffn_ed_model, ffn_ed_optimizer, encoder_ffn, decoder_ffn
torch.cuda.empty_cache()

### 4.2.2 RNN Encoder-Decoder

In [None]:
# Load model to GPU
rnn_ed_model.to(device)

In [None]:
# Train the model and save its weights
schedule.train_model(rnn_ed_model, train_dataloader_nu, test_dataloader_nu, rnn_ed_optimizer, criterion, 200, device, max_y_length, y_train_3d)
torch.save(rnn_ed_model.state_dict(), 'rnn_ed_model.pth')

In [None]:
# A sanity test for whether the outputs look appropriate
schedule.sanity(rnn_ed_model, test_dataloader_nu, device, max_y_length)

In [None]:
# Delete the model and everything that still references its parameters to free GPU memory
del rnn_ed_model, rnn_ed_optimizer, encoder_rnn, decoder_rnn
torch.cuda.empty_cache()

### 4.2.3 LSTM Encoder-Decoder

In [None]:
# Load model to GPU
lst_ed_model.to(device)

In [None]:
# Train the model and save its weights
schedule.train_model(lst_ed_model, train_dataloader_nu, test_dataloader_nu, lst_ed_optimizer, criterion, 200, device, max_y_length, y_train_3d)
torch.save(lst_ed_model.state_dict(), 'lst_ed_model.pth')

In [None]:
# A sanity test for whether the outputs look appropriate
schedule.sanity(lst_ed_model, test_dataloader_nu, device, max_y_length)

In [None]:
# Delete the model and everything that still references its parameters to free GPU memory
del lst_ed_model, lst_ed_optimizer, encoder_lstm, decoder_lstm
torch.cuda.empty_cache()

### 4.2.4 Transformer

In [None]:
# Load model to GPU
tra_ed_model.to(device)

In [None]:
# Train the model and save its weights
schedule.train_model(tra_ed_model, train_dataloader_nu, test_dataloader_nu, tra_ed_optimizer, criterion, 200, device, max_y_length, y_train_3d)
torch.save(tra_ed_model.state_dict(), 'tra_ed_model.pth')

In [None]:
# A sanity test for whether the outputs look appropriate
schedule.sanity(tra_ed_model, test_dataloader_nu, device, max_y_length)

In [None]:
# Delete the model and everything that still references its parameters to free GPU memory
del tra_ed_model, tra_ed_optimizer, encoder_tra, decoder_tra
torch.cuda.empty_cache()

## 4.1 Onehot-to-Onehot
Each of the following cells trains one of the three one-hot models (FFN, RNN, LSTM) and computes the mean squared error between the derivation produced by the model and the nearest correct derivation in the dataset. This logic is implemented in the custom loss function "mse_min_dist" in losses.py.
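The idea of a nearest-derivation loss can be sketched as: compute the MSE between the prediction and every correct derivation for the same example, keep the smallest value, and average over the batch. The function below is a hypothetical stand-in for "mse_min_dist"; in particular, the assumption that candidates are zero-padded to a fixed count and masked with a boolean `valid` tensor is not taken from losses.py.

```python
import torch


def min_distance_mse(pred, targets, valid):
    """MSE to the nearest correct derivation (hypothetical sketch).

    pred:    [BATCH, L]     flattened model output
    targets: [BATCH, K, L]  up to K correct derivations per example, zero-padded
    valid:   [BATCH, K]     True where targets[b, k] is a real derivation
    """
    # MSE against every candidate derivation.
    per_candidate = ((pred.unsqueeze(1) - targets) ** 2).mean(dim=2)  # [BATCH, K]
    # Padding candidates must never be chosen as the nearest one.
    per_candidate = per_candidate.masked_fill(~valid, float("inf"))
    nearest, _ = per_candidate.min(dim=1)  # [BATCH]
    return nearest.mean()


pred = torch.tensor([[1.0, 0.0]])
targets = torch.tensor([[[1.0, 0.0], [0.0, 1.0]]])
loss = min_distance_mse(pred, targets, torch.tensor([[True, True]]))  # tensor(0.)
```

Gradients flow only through the selected nearest candidate, so the model is not pushed toward an average of several valid derivations.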

### 4.1.1 FFN

In [None]:
# Load the feedforward model to the gpu 
ffn_oh_model.to(device)

In [35]:
schedule.train_model(ffn_oh_model, train_dataloader_2d, test_dataloader_2d, ffn_oh_optimizer, criterion, 200, device, max_y_length, y_train)
torch.save(ffn_oh_model.state_dict(), 'ffn_oh_model.pth')


In [34]:
# A sanity test for whether the outputs look appropriate
schedule.sanity_r(ffn_oh_model, test_dataloader_2d, device, max_y_length)

INPUT: pppppppppppppppppppppppppppppppppppppppppppppppppppppp
OUTPUT: p⊥)∨∧qrq⊢→rspr¬)tp(¬)∧∧∧srt∨∧t⊥∨⊢⊢⊥q∧rq)p)r∨r∧⊥ttp∧r→q(∨)→¬tq)t∨rss⊥t¬
INPUT: pppppppppppppppppppppppppppppppppppppppppppppppppppppp
OUTPUT: ¬⊥)∨∧qrq⊢→rspr¬)tp(¬)∧∧p∧t∨∧t⊥∨⊢p⊥s∧rq⊥p)r∨(r∧⊥ttp∧⊢→⊥(∨)→¬t¬⊢t∨rpq⊥t¬
INPUT: pppppppppppppppppppppppppppppppppppppppppppppppppppppp
OUTPUT: ¬⊥(∨∧qrq⊢→rspr¬)tp(¬)∧∧p∧rt∨∧t⊥∨⊢p∨sprq⊥p)r∨(r∧rttp∧⊢→⊥(∨)→¬)q⊢t∨∨pq⊥t¬
INPUT: pppppppppppppppppppppppppppppppppppppppppppppppppppppp
OUTPUT: p⊥)∨∧qrq⊢→rspr¬)t∨(¬)∧t)st∨∧t∧∨⊢⊢⊥qprq⊥p)r∨p∧⊥ttp∧⊥→q(∨)→¬tq⊥tqr∧q⊥tq
INPUT: pppppppppppppppppppppppppppppppppppppppppppppppppppppp
OUTPUT: p⊥(¬∧∨rq⊢(rspr¬→t∨¬)∧t)(t∨∧t∧∨⊢pqsprq⊥r)r∨(p∧⊥ttp∧⊥→⊥(∨)→¬)q⊥qr∧q⊥tq
INPUT: pppppppppppppppppppppppppppppppppppppppppppppppppppppp
OUTPUT: ¬⊥(∨∧trq⊢→rs¬∨¬)tp(¬)∧∧p∧tt∧t⊥∨⊢p→sprq⊥p∧r((⊢∧rtqp∧⊢→⊥(∨p→∧)q⊢t∨∨pq⊥t¬
INPUT: pppppppppppppppppppppppppppppppppppppppppppppppppppppp
OUTPUT: p⊥)∧qrq⊢→rspr¬)t∨(¬)∧t)srt∨∧t∧→t⊢⊥qprq⊥p∨r∨p⊢⊥ttp∧r→q(∨)→¬t¬⊥qrsq⊥tq
INPUT: pppppppppp

In [None]:
# Delete model from GPU to make space for new models
del ffn_oh_model
torch.cuda.empty_cache()

### 4.1.2 Recurrent Neural Network

In [None]:
# Load the RNN model to the GPU
rnn_oh_model.to(device)

In [None]:
# Reload schedule.py after editing it
importlib.reload(schedule)

In [None]:
schedule.train_model(rnn_oh_model, train_dataloader_3d, test_dataloader_3d, rnn_oh_optimizer, criterion, 200, device, max_y_length, y_train_3d)
torch.save(rnn_oh_model.state_dict(), 'rnn_oh_model.pth')

In [None]:
# A sanity test for whether the outputs look appropriate
schedule.sanity_r(rnn_oh_model, test_dataloader_3d, device, max_y_length)

## 5 Plot results
Here the training and test losses of the models trained above are plotted per epoch.

In [None]:
# Plot the per-epoch training losses. The *_costval_train lists must hold one
# loss value per epoch; they are not created elsewhere in this notebook.
plt.figure(figsize=(8, 8))
train_curves = {'FFN': ffn_costval_train, 'RNN': rnn_costval_train,
                'LSTM': lst_costval_train, 'Transformer': tra_costval_train}
for label, losses in train_curves.items():
    # Use each curve's own length, since the models train for different numbers of epochs
    plt.plot(range(1, len(losses) + 1), losses, label=label)
plt.xlabel('Epochs')
plt.ylabel('Training MSE')
plt.legend()
plt.show()

In [None]:
# Plot the per-epoch test losses. The *_costval_test lists must hold one
# loss value per epoch; they are not created elsewhere in this notebook.
plt.figure(figsize=(8, 8))
test_curves = {'FFN': ffn_costval_test, 'RNN': rnn_costval_test,
               'LSTM': lst_costval_test, 'Transformer': tra_costval_test}
for label, losses in test_curves.items():
    # Use each curve's own length, since the models train for different numbers of epochs
    plt.plot(range(1, len(losses) + 1), losses, label=label)
plt.xlabel('Epochs')
plt.ylabel('Test MSE')
plt.legend()
plt.show()