## 1 Overview
An enviroment to train and evaluate neural networks on learning logical consequence. 

In [None]:
# For Google Collab: Get repository and go to it in collab.
!git clone https://github.com/stereifberger/master-s-thesis
%cd master-s-thesis/

In [1]:
# For VsCode after starting Jupyter server: go to right directory.
%cd master-s-thesis/

/home/str/master-s-thesis


  self.shell.db['dhist'] = compress_dhist(dhist)[-100:]


In [None]:
# Install required dependencies - not necessary on google colab
!pip install -r requirements.txt

In [2]:
# Import required libraries
from imports import *

In [141]:
# For reloading libraries.
importlib.reload(architectures)

<module 'architectures' from '/home/str/master-s-thesis/architectures.py'>

## 2 Create dataset
First the dataset for training is generated. For this the function "create_dataset" from "generation.py" utilizes the functions "gen_outp_PA" to generate a set of random starting formulas, for which iterativly the applicability of rules is checked. All applicable rules are then used to generate new derivations. In each iteration of gen_oupt_PA, set by the iterations variable, new, longer examples are generated.

**Rules.** The rules are defined in calculi.py. Two sets are avaiable: Intuitionistic propositional logic (set below via "calculus = ipl") and classical propositional logic (set below via "calculus = cpl").

**Dataset entries.**
- **x_train.** Training input: [INDEX, PREMISES, DERIVATION SYMBOL, CONCLUSION]
- **y_train_ordered.** Dataset of correct derivations where each sublist i correspnds to INDEX: [DERIVATIONS_0...DERIVATION_N]

**Encoding.** Propositional variables and logical constants are encoded as integers. The integers are then one-hot-encoded into unique sequences containing only 0s and ones with the length of the maximum integer value, the feature length. The shape of the individual entries is 2D: [SEQUENCE LENGTH, FEATURE LENGTH].

**Example entries withouth numerical representation and one-hot-encoding.**
- **x_train.** [2345, A, A THEN B, DERIVES, B OR C]
- **y_train_ordered.** Sublist 2345 is entry entry: [[A, A THEN B, B, B OR C], [A, A THEN B, B, A AND B, B OR C]]


In [3]:
# Create dataset
x_train_2d, x_train_3d, y_train_ordered, max_y_train_len = generation.create_dataset(iterations = [1,2], calculus = calculi.ipl)

Processed at iteration 1:   0%|          | 0/200 [00:00<?, ?it/s]

Processed at iteration 2:   0%|          | 0/1119 [00:00<?, ?it/s]

Processed premises for sample conclusions at iteration 2:   0%|          | 0/1341 [00:00<?, ?it/s]

Checked derivations for sample conclusions:   0%|          | 0/15899 [00:00<?, ?it/s]

Padded x_train entries:   0%|          | 0/8898 [00:00<?, ?it/s]

  0%|          | 0/8898 [00:00<?, ?it/s]

Processed entries for x_train and y_tdict:   0%|          | 0/8898 [00:00<?, ?it/s]

Padded y_train_ordered:   0%|          | 0/3968 [00:00<?, ?it/s]

Padded x_train entries:   0%|          | 0/3968 [00:00<?, ?it/s]

LENINPT: 3968
LENy_t: 3968
Number x_train examples: 3968
Average number ground truth examples/x_train example: 2.2424395161290325


## 3 Prepare dataset and define models for training
Next with pytorch's dataloader the single training entries in x_train are assigned to batches of size "batch size" in mixed order. Then the different models are defined using definitions from "architectures.py". These models are:

- Feedforward network (net)
- Recurrent neural network (RNNNet)
- Long-short-term memory (LSTMNet)
- Transformers (TransformerModel)

In [4]:
# Use when gpu is present to empty its catch and define it as "device" for referencing it
torch.cuda.empty_cache()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [5]:
# Get the datasets' shapes for the model definitions later
two_d_shape = x_train_2d.shape
three_d_shape = x_train_3d.shape
max_y_length = int(max_y_train_len/14)

In [6]:
# Reverse one-hot encoding for encoder-decoder models
x = torch.argmax(x_train_3d, dim=2) 
x[:, 0] = x_train_2d[:, 0]
x_train_nu = x

In [7]:
# Set train-test split to 80-20 [^1]
train_size = int(0.8 * len(x_train_2d)) 
test_size = len(x_train_2d) - train_size 
x_train_2d, x_test_2d = random_split(x_train_2d, [train_size, test_size])
x_train_3d, x_test_3d = random_split(x_train_3d, [train_size, test_size])
x_train_nu, x_test_nu = random_split(x_train_nu, [train_size, test_size])

In [8]:
# Collect and mix the data in [^2]
train_dataloader_2d = DataLoader(dataset = x_train_2d, shuffle = True, batch_size = 50)
test_dataloader_2d = DataLoader(dataset = x_test_2d, shuffle = True, batch_size = 50)
train_dataloader_3d = DataLoader(dataset = x_train_3d, shuffle = True, batch_size = 50)
test_dataloader_3d = DataLoader(dataset = x_test_3d, shuffle = True, batch_size = 50)
train_dataloader_nu = DataLoader(dataset = x_test_nu, shuffle = True, batch_size = 50)
test_dataloader_nu = DataLoader(dataset = x_test_nu, shuffle = True, batch_size = 50)

In [9]:
y_train_ordered.shape

torch.Size([3968, 10, 1008])

In [10]:
# Load ground truth data to GPU
y_train = y_train_ordered.to(device)
y_train_3d = y_train.view(int(len(y_train)), int(len(y_train[0])), int(len(y_train[0][0])/14), 14)

In [45]:
# Define the simple one-hot to one-hot networks [^3]
## FFN (onehot to onehot)
ffn_oh_model = architectures.ffn(input_size = two_d_shape[1]-1, 
                              hidden_size = 10,
                              output_size = max_y_train_len,
                              dropout_rate = 0.1,
                              input_size_in = 756)
## RNN (onehot to onehot)
rnn_oh_model = architectures.SimpleRNN(input_size = three_d_shape[2],
                              hidden_size = 150,
                              output_size = three_d_shape[2])
## LSTM (onehot to onehot)
lst_oh_model = architectures.lst(input_size = three_d_shape[2],
                              hidden_size = 150,
                              output_size = max_y_train_len)

In [46]:
# Define the Encoder-Decoder networks
## FFN
encoder_ffn = architectures.Encoder_FFN(55, 100)
decoder_ffn = architectures.Decoder_FFN((max_y_length*14), 100)
ffn_ed_model = architectures.Seq2Seq(encoder_ffn, decoder_ffn, device)
## RNN
encoder_rnn = architectures.Encoder_RNN(55, 150, 150, 1)
decoder_rnn = architectures.Decoder_RNN(14, 150, 150, 6)
rnn_ed_model = architectures.Seq2Seq(encoder_rnn, decoder_rnn, device)
## LSTM
encoder_lstm = architectures.Encoder_LSTM(55, 150, 150, 1)
decoder_lstm = architectures.Decoder_LSTM(14, 150, 150, 6)
lst_ed_model = architectures.Seq2Seq(encoder_lstm, decoder_lstm, device)
## Transformer
encoder_tra = architectures.TransformerEncoder(52, 150, 5, 150, 1)
decoder_tra = architectures.TransformerDecoder(14, 150, 1, 150, 3)
tra_ed_model = architectures.Seq2SeqTransformer(encoder_tra, decoder_tra, device)



In [13]:
# Define optimizers for models
lr = 0.001
ffn_oh_optimizer = torch.optim.Adam(ffn_oh_model.parameters(),lr=lr)
rnn_oh_optimizer = torch.optim.Adam(rnn_oh_model.parameters(),lr=lr)
lst_oh_optimizer = torch.optim.Adam(lst_oh_model.parameters(),lr=lr)
ffn_ed_optimizer = torch.optim.Adam(ffn_ed_model.parameters(),lr=lr)
rnn_ed_optimizer = torch.optim.Adam(rnn_ed_model.parameters(),lr=lr)
lst_ed_optimizer = torch.optim.Adam(lst_ed_model.parameters(),lr=lr)
tra_ed_optimizer = torch.optim.Adam(tra_ed_model.parameters(),lr=lr)

# 4 Training

In [14]:
criterion = nn.CrossEntropyLoss()

## 4.2 Encoder-Decoder Networks

### 4.2.1 FFN Encoder-Decoder

In [47]:
# Load model to GPU
ffn_ed_model.to(device)

Seq2Seq(
  (encoder): Encoder_FFN(
    (embedding): Embedding(55, 100)
    (fc_hidden): Linear(in_features=100, out_features=100, bias=True)
  )
  (decoder): Decoder_FFN(
    (fc_hidden): Linear(in_features=100, out_features=100, bias=True)
    (fc_out): Linear(in_features=100, out_features=1008, bias=True)
  )
)

In [52]:
# Train model and save results
schedule.train_model(ffn_ed_model, train_dataloader_nu, test_dataloader_nu, ffn_ed_optimizer, criterion, 50, device, max_y_length, y_train)
torch.save(ffn_ed_model.state_dict(), 'addition_model.pth')

torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([44, 1008])
Ep. 01, CRL-Train: 499.3252 | CRL-Test: 499.3212 | ACC-Train: 0.3944 | ACC-Tesh:  0.3872
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size

KeyboardInterrupt: 

In [53]:
# A sanity test for wheter the outputs look appropriate
schedule.sanity(ffn_ed_model, test_dataloader_nu, device, max_y_length)

INPUT: ((s∨p)(q∨(s∧t)))⊢((s∨p)∧(q∨(s∧t)))
OUTPUT: )r∨⊥¬t⊥q(r¬p⊢∧pq¬∧rrsq∧⊥→psrr)∧¬∨r)t)s)tqs⊢rt∨¬∧⊥(⊢⊢)tt¬p)rt→⊢→r¬
INPUT: ((p∨(q∨s))q)⊢(q∨q)
OUTPUT: )∧r∨¬¬t⊥qt⊥sp)⊢∧pp∧∧rp⊢tt⊥→prr)t¬p∨∧r)tt)∧tqs⊢)t⊥¬∧⊥(⊢)rt¬→)rt→⊢→r¬
INPUT: (rr)⊢(r∧r)
OUTPUT: t∧r∨⊥¬t¬st⊥sp)⊢∧ppt∧rp⊢ts∨⊥→prr)t⊥p∨∧r)(t)∧(qs⊢)t⊥¬∧⊥(⊢)rt¬→)rt→⊢→r¬
INPUT: (((s→q)∧r)t)⊢(((s→q)∧r)→r)
OUTPUT: )∧r∨⊥¬t⊥qt⊥s¬)⊢∧ppt∧rrstt⊥→prr(∧¬p∨∧r)t∨))(qs(tt⊥¬⊢⊥(⊢⊢)rt¬p)rt→⊢→r¬
INPUT: ((r∧q)(q→p))⊢(q∨(r∧q))
OUTPUT: )∧r∨⊥¬t⊥qt⊥sp)⊢∧ppt∧rp⊢tt⊥→prr)t¬p∨∧r)t∨))(qs⊢)t⊥¬∧⊥t⊢)rt¬→)rt→⊢→r¬
INPUT: (((t→p)∨s)(p∧t))⊢t
OUTPUT: )∧r∨⊥¬t¬qt⊥s¬)⊢∧ppp∧rr⊢ts∨⊥→prr)t¬p∨∧r)(∨)∧(qr()t⊥¬∧⊥(⊢⊢)rt¬→)rt→⊢→r¬
INPUT: (q(t→q))⊢(s∨q)
OUTPUT: q∧r∨¬¬t¬qt⊥sp)⊢∧pp∧∧rp⊢tst⊥→prr)t¬p∨∧r)(t)∧(qs⊢)t⊥¬∧⊥(⊢)rt¬→)rt→⊢→r¬
INPUT: (((q→t)∧t)((p∧p)∨p))⊢t
OUTPUT: )∧r∨⊥¬t⊥qt⊥s¬)⊢∧ppp∧rrstt⊥→prr)t¬p∨∧r)(∨))tqs()t⊥¬∧⊥(⊢⊢)rt¬p)rt→⊢→r¬
INPUT: ((¬r)t)⊢(t∧((¬r)→t))
OUTPUT: )∧r∨⊥¬tsqt⊥s¬)⊢∧ppt∧rr⊢t∨⊥prr)t¬p∨∧r)(∨)∧(qr(tt⊥¬∧⊥(⊢⊢)rt¬→)rt→⊢→r¬
INPUT: ((t→s)q)⊢(r∨q)
OUTPUT: q∧r∨⊥¬t¬qt⊥sp)⊢∧ppt∧rp⊢tst

In [18]:
# Delete model from GPU to make space for new models
del ffn_ed_model
torch.cuda.empty_cache()

### 4.2.2 RNN Encoder-Decoder

In [19]:
# Load model to GPU
rnn_ed_model.to(device)

Seq2Seq(
  (encoder): Encoder_RNN(
    (embedding): Embedding(55, 150)
    (rnn): RNN(150, 150, batch_first=True)
  )
  (decoder): Decoder_RNN(
    (rnn): RNN(150, 150, num_layers=6, batch_first=True)
    (fc_out): Linear(in_features=150, out_features=14, bias=True)
  )
)

In [20]:
# Training Loop
schedule.train_model(rnn_ed_model, train_dataloader_nu, test_dataloader_nu, rnn_ed_optimizer, criterion, 200, device, max_y_length, y_train_3d)
torch.save(rnn_ed_model.state_dict(), 'addition_model.pth')

torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([44, 72, 14])
Ep. 01, CRL-Train: 1.4301 | CRL-Test: 1.1783 | ACC-Train: 1.6755 | ACC-Tesh:  2.3734
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 

KeyboardInterrupt: 

In [51]:
# A sanity test for wheter the outputs look appropriate
schedule.sanity(rnn_ed_model, test_dataloader_nu, device, max_y_length)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)

In [22]:
# Delete model from GPU to make space for new models
del rnn_ed_model
torch.cuda.empty_cache()

### 4.2.3 LSTM Encoder-Decoder

In [23]:
# Load model to GPU
lst_ed_model.to(device)

Seq2Seq(
  (encoder): Encoder_LSTM(
    (embedding): Embedding(55, 150)
    (rnn): LSTM(150, 150, batch_first=True)
  )
  (decoder): Decoder_LSTM(
    (rnn): LSTM(150, 150, num_layers=6, batch_first=True)
    (fc_out): Linear(in_features=150, out_features=14, bias=True)
  )
)

In [24]:
# Training Loop
schedule.train_model(lst_ed_model, train_dataloader_nu, test_dataloader_nu, lst_ed_optimizer, criterion, 200, device, max_y_length, y_train_3d)
torch.save(lst_ed_model.state_dict(), 'addition_model.pth')

torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([44, 72, 14])
Ep. 01, CRL-Train: 1.7287 | CRL-Test: 1.1111 | ACC-Train: 1.6008 | ACC-Tesh:  3.9120
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])


KeyboardInterrupt: 

In [26]:
# A sanity test for wheter the outputs look appropriate
schedule.sanity(lst_ed_model, test_dataloader_nu, device, max_y_length)

INPUT: (((r∨q)∨q)q)⊢(q∨q)
OUTPUT: 
INPUT: (s(q→(p∨s)))⊢((q∨s)∨p)
OUTPUT: 
INPUT: ((s→(s→p))(¬p))⊢((¬p)→((¬p)∨t))
OUTPUT: 
INPUT: (r(r→p))⊢((r→p)∨r)
OUTPUT: 
INPUT: ((s→q)((t∨t)∨s))⊢((s→q)∨t)
OUTPUT: 
INPUT: (p(s∧t))⊢(p∧t)
OUTPUT: 
INPUT: ((s∨(r∨s))p)⊢((p∨p)∨q)
OUTPUT: 
INPUT: ((r→q)q)⊢(q∨s)
OUTPUT: 
INPUT: (sq)⊢(q→(r∨q))
OUTPUT: 
INPUT: ((q∧t)p)⊢((t∨(q∧t))∨q)
OUTPUT: 
INPUT: (rq)⊢(t∨q)
OUTPUT: 
INPUT: (q(p→(¬s)))⊢(q∧(r∨q))
OUTPUT: 
INPUT: ((s∨p)t)⊢((s∨p)∨r)
OUTPUT: 
INPUT: (st)⊢(s∨(s∨r))
OUTPUT: 
INPUT: ((r∨(s∧q))p)⊢((r∨(s∧q))∧p)
OUTPUT: 
INPUT: (r(r→p))⊢(r∧(r→(r→p)))
OUTPUT: 
INPUT: (p(t∧s))⊢((t∧s)∨q)
OUTPUT: 
INPUT: (q((¬t)∨q))⊢(q∧((¬t)∨q))
OUTPUT: 
INPUT: (p(s∧t))⊢(p∨r)
OUTPUT: 
INPUT: (s(r→t))⊢((r→t)∨r)
OUTPUT: 
INPUT: (s(s∧(s→p)))⊢(s∨q)
OUTPUT: 
INPUT: (tr)⊢(t∨p)
OUTPUT: 
INPUT: ((q∨s)(p∧p))⊢(s∨(q∨s))
OUTPUT: 
INPUT: ((r∧s)(s∧q))⊢((s∧q)→s)
OUTPUT: 
INPUT: (q(¬q))⊢((¬q)∨r)
OUTPUT: 
INPUT: ((s→(s→p))(¬p))⊢(q∨(¬p))
OUTPUT: 
INPUT: (r((p→r)→p))⊢(r∨p)
OUTPUT: 
INPUT: (pq)⊢(q∨q)
OUTPUT:

In [27]:
# Delete model from GPU to make space for new models
del lst_ed_model
torch.cuda.empty_cache()

### 4.2.4 Transformer

In [44]:
# Load model to GPU
tra_ed_model.to(device)

NameError: name 'tra_ed_model' is not defined

In [29]:
schedule.train_model(tra_ed_model, train_dataloader_nu, test_dataloader_nu, tra_ed_optimizer, criterion, 200, device, max_y_length, y_train_3d)
torch.save(tra_ed_model.state_dict(), 'addition_model.pth')

torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([44, 72, 14])
Ep. 01, CRL-Train: 1.3075 | CRL-Test: 1.1475 | ACC-Train: 0.3569 | ACC-Tesh:  0.1034
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 14])
torch.Size([72, 14])
torch.Size([50, 72, 

KeyboardInterrupt: 

In [43]:
# A sanity test for wheter the outputs look appropriate
schedule.sanity(tra_ed_model, test_dataloader_nu, device, max_y_length)

NameError: name 'tra_ed_model' is not defined

In [31]:
# Delete model from GPU to make space for new models
del tra_ed_model
torch.cuda.empty_cache()

## 4.1 Onehot-to-Onehot
Each subsequent cell trains one of the four models and calculates their mean squared error loss for the nearest correct derivation from the dataset to the derivation provided by the model. The logic for this is impolemented in the custom loss function "mse_min_dist" in losses.py.

### 4.1.1 FFN

In [54]:
# Load the feedforward model to the gpu 
ffn_oh_model.to(device)

ffn(
  (l1): Linear(in_features=756, out_features=10, bias=True)
  (l2): Linear(in_features=756, out_features=10, bias=True)
  (l3): Linear(in_features=756, out_features=10, bias=True)
  (l4): Linear(in_features=756, out_features=10, bias=True)
  (l5): Linear(in_features=756, out_features=10, bias=True)
  (l6): Linear(in_features=756, out_features=10, bias=True)
  (l7): Linear(in_features=756, out_features=10, bias=True)
  (l8): Linear(in_features=756, out_features=10, bias=True)
  (l9): Linear(in_features=756, out_features=10, bias=True)
  (relu): ReLU()
  (l10): Linear(in_features=10, out_features=1008, bias=True)
  (dropout1): Dropout(p=0.1, inplace=False)
  (dropout2): Dropout(p=0.1, inplace=False)
)

In [55]:
schedule.train_model(ffn_oh_model, train_dataloader_2d, test_dataloader_2d, ffn_oh_optimizer, criterion, 200, device, max_y_length, y_train)
torch.save(ffn_oh_model.state_dict(), 'addition_model.pth')

torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50, 1008])
torch.Size([1008])
torch.Size([50,

KeyboardInterrupt: 

In [56]:
# A sanity test for wheter the outputs look appropriate
schedule.sanity(ffn_oh_model, test_dataloader_nu, device, max_y_length)

RuntimeError: mat1 and mat2 must have the same dtype, but got Long and Float

In [None]:
# Delete model from GPU to make space for new models
del ffh_oh_model
torch.cuda.empty_cache()

### 4.1.2 Recurrent Neural Network

In [162]:
# Load the feedforward model to the gpu 
rnn_oh_model.to(device)

SimpleRNN(
  (rnn): RNN(14, 150, batch_first=True)
  (fc): Linear(in_features=150, out_features=14, bias=True)
)

In [50]:
importlib.reload(schedule)

<module 'schedule' from '/home/str/master-s-thesis/schedule.py'>

In [181]:
schedule.train_model(rnn_oh_model, train_dataloader_3d, test_dataloader_3d, ffn_oh_optimizer, criterion, 200, device, max_y_length, y_train_3d)
torch.save(rnn_oh_model.state_dict(), 'addition_model.pth')

cuda:0
torch.Size([54, 14])
torch.Size([50, 72, 14])


RuntimeError: The size of tensor a (54) must match the size of tensor b (72) at non-singleton dimension 1

In [None]:
# A sanity test for wheter the outputs look appropriate
schedule.sanity(tra_ed_model, test_dataloader_nu, device, max_y_length)

## 5 Plot results
Here all results from above are plotted.

In [None]:
plt.figure(figsize=(8, 8))
x_data = list(range(100))
y_data_ffn = ffn_costval_train
y_data_rnn = rnn_costval_train
y_data_lst = lst_costval_train
y_data_tra = tra_costval_train
plt.plot(x_data, y_data_ffn, label='FFN')
plt.plot(x_data, y_data_rnn, label='RNN')
plt.plot(x_data, y_data_lst, label='LSTM')
plt.plot(x_data, y_data_tra, label='Transformers')
plt.xlabel('Epochs')
plt.ylabel('Training MSE')
plt.legend()
plt.show()

In [None]:
plt.figure(figsize=(8, 8))
x_data = list(range(100))
y_data_ffn = ffn_costval_test
y_data_rnn = rnn_costval_test
y_data_lst = lst_costval_test
y_data_tra = tra_costval_test
plt.plot(x_data, y_data_ffn, label='FFN')
plt.plot(x_data, y_data_rnn, label='RNN')
plt.plot(x_data, y_data_lst, label='LSTM')
plt.plot(x_data, y_data_tra, label='Transformers')
plt.xlabel('Epochs')
plt.ylabel('Test MSE')
plt.legend()
plt.show()