## 1 Overview
An environment for training and evaluating neural networks on learning logical consequence.

In [1]:
# For Google Colab: clone the repository and change into it.
!git clone -b new-cleaned-branch https://github.com/stereifberger/master-s-thesis
%cd master-s-thesis/

Cloning into 'master-s-thesis'...
remote: Enumerating objects: 402, done.
remote: Total 402 (delta 0), reused 0 (delta 0), pack-reused 402
Receiving objects: 100% (402/402), 17.10 MiB | 18.60 MiB/s, done.
Resolving deltas: 100% (246/246), done.
/content/master-s-thesis


In [46]:
# prompt: Push the content of master-s-thesis to its original github repository under the new-cleaned-branch

!git add -A
!git commit -m "Pushing changes to new-cleaned-branch"
!git push origin new-cleaned-branch


On branch new-cleaned-branch
Your branch is ahead of 'origin/new-cleaned-branch' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean
fatal: could not read Username for 'https://github.com': No such device or address


In [None]:
# Google Colab: run this if the cell above did not change to the right directory.
%cd /content/master-s-thesis/

/content/master-s-thesis


In [None]:
# For VS Code after starting the Jupyter server: go to the right directory.
%cd master-s-thesis/

/home/str/master-s-thesis




In [None]:
# Install required dependencies (not necessary on Google Colab)
!pip install -r requirements.txt

In [2]:
# Import required libraries
from imports import *

In [14]:
# For reloading libraries.
importlib.reload(schedule)

<module 'schedule' from '/content/master-s-thesis/schedule.py'>

## 2 Create dataset
First, the training dataset is generated. The function "create_dataset" from "generation.py" uses the function "gen_outp_PA" to generate a set of random starting formulas and then iteratively checks which rules apply to them. Every applicable rule is used to generate new derivations. Each iteration of gen_outp_PA, as set by the iterations variable, produces new, longer examples.

**Rules.** The rules are defined in calculi.py. Two sets are available: intuitionistic propositional logic (set below via "calculus = ipl") and classical propositional logic (set below via "calculus = cpl").
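
The generation loop described above can be pictured with a small, self-contained sketch. It is not the code from "generation.py": the tuple representation of formulas, the helper names `var`, `apply_rules` and `derive`, and the two-rule calculus (conjunction introduction and modus ponens) are assumptions made only for illustration.

```python
from itertools import product


def var(name):
    """Build a propositional variable as a tagged tuple (illustrative format)."""
    return ("var", name)


def apply_rules(known):
    """Return all formulas derivable in one step from the set `known`.

    Hypothetical two-rule calculus, not the rule sets in calculi.py.
    """
    new = set()
    for f, g in product(known, repeat=2):
        if f != g:
            new.add(("and", f, g))  # conjunction introduction
        if g[0] == "then" and g[1] == f:
            new.add(g[2])  # modus ponens: from f and f THEN h, derive h
    return new - known


def derive(premises, iterations):
    """Extend every derivation by one applicable rule per iteration."""
    frontier = [list(premises)]
    found = []
    for _ in range(iterations):
        next_frontier = []
        for steps in frontier:
            for conclusion in sorted(apply_rules(set(steps))):
                next_frontier.append(steps + [conclusion])
        found.extend(next_frontier)
        frontier = next_frontier
    return found


A, B = var("A"), var("B")
premises = [A, ("then", A, B)]
derivations = derive(premises, iterations=1)
```

With the premises A and A THEN B, one iteration yields, among others, the derivation [A, A THEN B, B]. A second iteration extends each of these by one more step, which is why later iterations produce longer examples.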

**Dataset entries.**
- **x_train.** Training input: [INDEX, PREMISES, DERIVATION SYMBOL, CONCLUSION]
- **y_train_ordered.** Dataset of correct derivations where each sublist i corresponds to INDEX: [DERIVATION_0...DERIVATION_N]

**Encoding.** Propositional variables and logical constants are encoded as integers. Each integer is then one-hot encoded as a vector of 0s and 1s whose length, the feature length, equals the maximum integer value. The shape of each individual entry is therefore 2D: [SEQUENCE LENGTH, FEATURE LENGTH].

**Example entries without numerical representation and one-hot encoding.**
- **x_train.** [2345, A, A THEN B, DERIVES, B OR C]
- **y_train_ordered.** Sublist 2345 contains the entries: [[A, A THEN B, B, B OR C], [A, A THEN B, B, A AND B, B OR C]]
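
The encoding step can be illustrated with a minimal plain-Python sketch. The vocabulary `VOCAB`, the padding symbol and the helper names are hypothetical; the actual symbol-to-integer mapping is defined in the repository. The sketch also assumes that index 0 is reserved for padding, so its feature length is the maximum integer value plus one.

```python
# Hypothetical vocabulary for illustration only.
VOCAB = {"PAD": 0, "A": 1, "B": 2, "C": 3, "THEN": 4, "OR": 5, "DERIVES": 6}


def one_hot(index, feature_length):
    """Return a list of 0s with a single 1 at position `index`."""
    return [1 if i == index else 0 for i in range(feature_length)]


def encode(tokens, seq_length):
    """Map tokens to integers, pad to `seq_length`, then one-hot encode."""
    ids = [VOCAB[t] for t in tokens]
    ids += [VOCAB["PAD"]] * (seq_length - len(ids))
    feature_length = max(VOCAB.values()) + 1
    return [one_hot(i, feature_length) for i in ids]


# Premises A and A THEN B, derivation symbol, conclusion B OR C
tokens = ["A", "A", "THEN", "B", "DERIVES", "B", "OR", "C"]
encoded = encode(tokens, seq_length=10)
```

Here `encoded` is a 2D structure with 10 rows (the sequence length) and 7 columns (the feature length), mirroring the [SEQUENCE LENGTH, FEATURE LENGTH] shape described above.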


In [3]:
import contextlib
import io

In [None]:
# Create dataset
x_train_2d, x_train_3d, y_train_ordered, max_y_train_len = generation.create_dataset(iterations = [1,2], calculus = calculi.cl)

Processed at iteration 1:   0%|          | 0/2000 [00:00<?, ?it/s]

Processed at iteration 2:   0%|          | 0/11500 [00:00<?, ?it/s]

Processed premises for sample conclusions at iteration 2:   0%|          | 0/13663 [00:00<?, ?it/s]

Checked derivations for sample conclusions:   0%|          | 0/165044 [00:00<?, ?it/s]

Padded x_train entries:   0%|          | 0/107204 [00:00<?, ?it/s]

  0%|          | 0/107204 [00:00<?, ?it/s]

Processed entries for x_train and y_tdict:   0%|          | 0/107204 [00:00<?, ?it/s]

Padded y_train_ordered:   0%|          | 0/52260 [00:00<?, ?it/s]

Padded x_train entries:   0%|          | 0/52260 [00:00<?, ?it/s]

LENINPT: 52260
LENy_t: 52260
Number x_train examples: 52260
Average number ground truth examples/x_train example: 2.0513585916570993


In [None]:
import json
torch.save(x_train_2d, 'x_train_2d.pt')
torch.save(x_train_3d, 'x_train_3d.pt')
torch.save(y_train_ordered, 'y_train_ordered.pt')
with open('Medium_max_y_train_len.json', 'w') as file:
    json.dump(max_y_train_len, file)

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
# prompt: Import files from drive folder
!cp /content/drive/MyDrive/university/masters_thesis/datasets/2000-1-2/x_train_2d.pt /content
!cp /content/drive/MyDrive/university/masters_thesis/datasets/2000-1-2/x_train_3d.pt /content
!cp /content/drive/MyDrive/university/masters_thesis/datasets/2000-1-2/y_train_ordered.pt /content
!cp /content/drive/MyDrive/university/masters_thesis/datasets/2000-1-2/Medium_max_y_train_len.json /content


In [5]:
# prompt: Define variables x_train_2d, x_train_3d and y_train_ordered on the respective files.

with open('/content/x_train_2d.pt', 'rb') as f:
    x_train_2d = torch.load(f).dataset

with open('/content/x_train_3d.pt', 'rb') as f:
    x_train_3d = torch.load(f).dataset


with open('/content/y_train_ordered.pt', 'rb') as f:
    y_train_ordered = torch.load(f)



In [6]:
# prompt: Define variable max_y_train_len on Medium_max_y_train_len.json

import json

with open('/content/Medium_max_y_train_len.json', 'r') as file:
    max_y_train_len = json.load(file)


In [12]:
max_y_train_len

1008

## 3 Prepare dataset and define models for training
Next, PyTorch's DataLoader assigns the individual training entries in x_train to shuffled batches of size batch_size. The models are then defined using the definitions in "architectures.py". These models are:

- Feedforward network (net)
- Recurrent neural network (RNNNet)
- Long short-term memory (LSTMNet)
- Transformer (TransformerModel)
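
As a rough picture of what shuffled batching does, the following plain-Python sketch mimics the effect of `shuffle=True` and `batch_size` on a list of entries. It is only a conceptual illustration, not PyTorch's implementation; the function name `shuffled_batches` and the `seed` parameter are made up for this example.

```python
import random


def shuffled_batches(entries, batch_size, seed=None):
    """Yield the entries in random order, grouped into lists of `batch_size`.

    The last batch is smaller when the number of entries is not a
    multiple of the batch size.
    """
    order = list(range(len(entries)))
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [entries[i] for i in order[start:start + batch_size]]


entries = list(range(10))
batches = list(shuffled_batches(entries, batch_size=4, seed=0))
```

Every entry appears in exactly one batch per pass over the data, and only the final batch may be shorter than the requested size.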

In [7]:
# Use when a GPU is present: empty its cache and define it as "device" for later reference
torch.cuda.empty_cache()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [8]:
# Get the datasets' shapes for the model definitions later
two_d_shape = x_train_2d.shape
three_d_shape = x_train_3d.shape
max_y_length = int(max_y_train_len/14)

In [9]:
# Reverse one-hot encoding for encoder-decoder models
x = torch.argmax(x_train_3d, dim=2)
x[:, 0] = x_train_2d[:, 0]
x_train_nu = x
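
The argmax step above turns each one-hot row back into its integer code. A minimal plain-Python sketch of the same idea, with hypothetical helper names and toy data, could look like this:

```python
def argmax(row):
    """Return the position of the largest value in `row`."""
    return max(range(len(row)), key=row.__getitem__)


def decode(one_hot_sequence):
    """Map a [SEQUENCE LENGTH, FEATURE LENGTH] one-hot list to integer codes."""
    return [argmax(row) for row in one_hot_sequence]


sequence = [
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
ids = decode(sequence)
```

In the notebook cell the first position of each decoded sequence is afterwards overwritten with the entry's INDEX taken from x_train_2d, so that every input can still be matched to its sublist of ground-truth derivations.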

In [10]:
# Set train-test split to 80-20
train_size = int(0.8 * len(x_train_2d))
test_size = len(x_train_2d) - train_size
x_train_2d, x_test_2d = random_split(x_train_2d, [train_size, test_size])
x_train_3d, x_test_3d = random_split(x_train_3d, [train_size, test_size])
x_train_nu, x_test_nu = random_split(x_train_nu, [train_size, test_size])

In [11]:
# Batch and shuffle the data with DataLoader
train_dataloader_2d = DataLoader(dataset = x_train_2d, shuffle = True, batch_size = 16)
test_dataloader_2d = DataLoader(dataset = x_test_2d, shuffle = True, batch_size = 16)
train_dataloader_3d = DataLoader(dataset = x_train_3d, shuffle = True, batch_size = 16)
test_dataloader_3d = DataLoader(dataset = x_test_3d, shuffle = True, batch_size = 16)
train_dataloader_nu = DataLoader(dataset = x_train_nu, shuffle = True, batch_size = 64)
test_dataloader_nu = DataLoader(dataset = x_test_nu, shuffle = True, batch_size = 64)

In [12]:
# Load ground truth data to GPU
y_train = y_train_ordered.to(device)
y_train_3d = y_train.view(int(len(y_train)), int(len(y_train[0])), int(len(y_train[0][0])/14), 14)
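
The reshaping above splits each flat derivation vector into rows of 14 values. The sketch below shows the same regrouping on a plain list; it assumes that 14 is the per-symbol feature length of the output (it matches the decoder output size used later) and that the flat length is 1008, the value of max_y_train_len shown earlier. The helper name `split_steps` is hypothetical.

```python
def split_steps(flat, feature_length):
    """Regroup a flat vector into consecutive rows of `feature_length` values."""
    if len(flat) % feature_length != 0:
        raise ValueError("flat length must be a multiple of feature_length")
    return [
        flat[i:i + feature_length]
        for i in range(0, len(flat), feature_length)
    ]


flat = list(range(1008))       # stand-in for one flattened derivation
steps = split_steps(flat, 14)  # 1008 / 14 = 72 rows
```

The number of rows, 72, is the same value the notebook computes as max_y_length.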

In [15]:
# Define the Encoder-Decoder networks
## FFN | Inputs: input_dim, hidden dim
encoder_ffn = architectures.Encoder_FFN(three_d_shape[1], 20)
decoder_ffn = architectures.Decoder_FFN((max_y_length*14), 20)
ffn_ed_model = architectures.Seq2Seq(encoder_ffn, decoder_ffn, device)
## RNN | Inputs: input_dim, embedding dim, hidden dim, nr layers
#encoder_rnn = architectures.Encoder_RNN(three_d_shape[1], 128, 150, 1)
#decoder_rnn = architectures.Decoder_RNN(14, 128, 150, 3)
#rnn_ed_model = architectures.Seq2Seq(encoder_rnn, decoder_rnn, device)
## LSTM | Inputs: input_dim, embedding dim, hidden dim, nr layers, dropout
encoder_lstm = architectures.Encoder_LSTM(three_d_shape[1], 128, 128, 1, 0)
decoder_lstm = architectures.Decoder_LSTM(14, 128, 128, 3, 0)
lst_ed_model = architectures.Seq2Seq(encoder_lstm, decoder_lstm, device)
## Transformer-Encoder | Inputs:  input_dim, emb_dim, num_heads, hidden_dim, num_layers, dropout
encoder_tra = architectures.TransformerEncoder(three_d_shape[1], 128, 4, 128, 1, dropout=0)
# Transformer-Decoder | Inputs: output_dim, emb_dim, num_heads, hidden_dim, num_layers
decoder_tra = architectures.TransformerDecoder(14, 128, 4, 128, 3)
tra_ed_model = architectures.Seq2SeqTransformer(encoder_tra, decoder_tra, device)

# Define optimizers for models
lr = 0.001
ffn_ed_optimizer = torch.optim.AdamW(ffn_ed_model.parameters(),lr=lr)
#rnn_ed_optimizer = torch.optim.AdamW(rnn_ed_model.parameters(),lr=lr)
lst_ed_optimizer = torch.optim.AdamW(lst_ed_model.parameters(),lr=lr)
tra_ed_optimizer = torch.optim.AdamW(tra_ed_model.parameters(),lr=lr)



# 4 Training

In [18]:
import csv

In [19]:
import contextlib
import io

In [20]:
criterion = nn.CrossEntropyLoss()

## 4.1 FFN Encoder-Decoder

In [None]:
# Load model to GPU
ffn_ed_model.to(device)

Seq2Seq(
  (encoder): Encoder_FFN(
    (embedding): Embedding(55, 20)
    (fc_hidden): Linear(in_features=20, out_features=20, bias=True)
  )
  (decoder): Decoder_FFN(
    (fc_hidden): Linear(in_features=20, out_features=20, bias=True)
    (fc_out): Linear(in_features=20, out_features=1008, bias=True)
  )
)

In [None]:
x_train_nu_2 = [x_train_nu[i][1:].tolist() for i in range(len(x_train_nu))]

In [None]:
# Train model and save results
FFN_CELtrain, FFN_CELtest, FFN_ACCtrain, FFN_ACCtest = schedule.train_model(ffn_ed_model, train_dataloader_nu, test_dataloader_nu, ffn_ed_optimizer, criterion, 20, device, max_y_length, y_train)
torch.save(ffn_ed_model.state_dict(), 'ffn_2l.pth')

Ep. 01, CEL-Train: 377.1100| CEL-Test: 367.4432 | ACC-Train: 0.0000 | ACC-Test:  0.0000
Ep. 02, CEL-Train: 364.9277| CEL-Test: 364.2587 | ACC-Train: 0.0000 | ACC-Test:  0.0500
Ep. 03, CEL-Train: 363.9621| CEL-Test: 364.0389 | ACC-Train: 0.0000 | ACC-Test:  0.0000
Ep. 04, CEL-Train: 363.8176| CEL-Test: 363.8662 | ACC-Train: 0.0000 | ACC-Test:  0.0000
Ep. 05, CEL-Train: 363.5521| CEL-Test: 363.4199 | ACC-Train: 0.0625 | ACC-Test:  0.0000
Ep. 06, CEL-Train: 362.7582| CEL-Test: 362.4800 | ACC-Train: 0.0000 | ACC-Test:  0.0000
Ep. 07, CEL-Train: 361.8980| CEL-Test: 361.7068 | ACC-Train: 0.0625 | ACC-Test:  0.0000
Ep. 08, CEL-Train: 361.3448| CEL-Test: 361.3458 | ACC-Train: 0.0000 | ACC-Test:  0.0000
Ep. 09, CEL-Train: 360.8822| CEL-Test: 360.8793 | ACC-Train: 0.0625 | ACC-Test:  0.1500
Ep. 10, CEL-Train: 360.5527| CEL-Test: 360.5707 | ACC-Train: 0.0000 | ACC-Test:  0.1000
Ep. 11, CEL-Train: 360.2291| CEL-Test: 360.1036 | ACC-Train: 0.0000 | ACC-Test:  0.0500
Ep. 12, CEL-Train: 359.7205| CEL

In [None]:
# prompt: convert FFN_CELtrain to a list

FFN_CELtrain_list = list(FFN_CELtrain)


[377.1099521275325,
 364.9277030641515,
 363.9621202661357,
 363.81762331341383,
 363.552097845515,
 362.75820409558963,
 361.8980101605803,
 361.3447757452635,
 360.8821592185111,
 360.5526848655957,
 360.22905574880247,
 359.72046948354176,
 359.3023719904255,
 358.83163209501026,
 358.4711538425644,
 358.2835426447224,
 358.2102163238992,
 358.1684285201793,
 358.10023162561816,
 357.98913961521345]

In [25]:
def csv_saver(list_a, list_b, list_a_name, list_b_name, name):
    """Write two equally long lists to the CSV file `name`, one row per iteration."""
    with open(name, mode='w', newline='') as file:
        writer = csv.writer(file)

        # Header row
        writer.writerow(["Iterations", list_a_name, list_b_name])

        # One data row per entry, with iterations numbered from 1
        for i in range(len(list_a)):
            writer.writerow([i + 1, list_a[i], list_b[i]])

    print(f'Data successfully written to {name}')

In [None]:
input, output = schedule.sanity(ffn_ed_model, test_dataloader_nu, device, max_y_length)

In [None]:
csv_saver(FFN_CELtrain, FFN_CELtest, "CELtrain", "CELtest", "FFN_small_CEL.csv")

Data successfully written to FFN_small_CEL.csv


In [None]:
csv_saver(FFN_ACCtrain, FFN_ACCtest, "ACCtrain", "ACCtest", "FFN_small_ACC.csv")

Data successfully written to FFN_small_ACC.csv


In [None]:
csv_saver(input, output, "input", "output", "FFN_small_sanity.csv")

Data successfully written to FFN_small_sanity.csv


In [None]:
# Delete model from GPU to make space for new models
del ffn_ed_model
torch.cuda.empty_cache()

## 4.2 RNN Encoder-Decoder

In [None]:
# Load model to GPU
rnn_ed_model.to(device)

Seq2Seq(
  (encoder): Encoder_RNN(
    (embedding): Embedding(55, 150)
    (rnn): RNN(150, 150, batch_first=True)
  )
  (decoder): Decoder_RNN(
    (rnn): RNN(150, 150, num_layers=3, batch_first=True)
    (fc_out): Linear(in_features=150, out_features=14, bias=True)
  )
)

In [None]:
# Training Loop
RNN_CELtrain, RNN_CELtest, RNN_ACCtrain, RNN_ACCtest = schedule.train_model(rnn_ed_model, train_dataloader_nu, test_dataloader_nu, rnn_ed_optimizer, criterion, 50, device, max_y_length, y_train_3d)
torch.save(rnn_ed_model.state_dict(), 'addition_model.pth')

Ep. 01, CEL-Train: 1.2988| CEL-Test: 1.2971 | ACC-Train: 0.6876 | ACC-Test:  0.7120
Ep. 02, CEL-Train: 1.2661| CEL-Test: 1.2422 | ACC-Train: 0.7020 | ACC-Test:  0.7069
Ep. 03, CEL-Train: 1.2508| CEL-Test: 1.2511 | ACC-Train: 0.6745 | ACC-Test:  0.7014
Ep. 04, CEL-Train: 1.2465| CEL-Test: 1.2575 | ACC-Train: 0.7232 | ACC-Test:  0.6824
Ep. 05, CEL-Train: 1.2570| CEL-Test: 1.2459 | ACC-Train: 0.7048 | ACC-Test:  0.6991
Ep. 06, CEL-Train: 1.2499| CEL-Test: 1.2334 | ACC-Train: 0.6972 | ACC-Test:  0.7000
Ep. 07, CEL-Train: 1.2199| CEL-Test: 1.2507 | ACC-Train: 0.7215 | ACC-Test:  0.6824
Ep. 08, CEL-Train: 1.2461| CEL-Test: 1.2509 | ACC-Train: 0.6793 | ACC-Test:  0.6843
Ep. 09, CEL-Train: 1.2308| CEL-Test: 1.2429 | ACC-Train: 0.6828 | ACC-Test:  0.6866
Ep. 10, CEL-Train: 1.2283| CEL-Test: 1.2242 | ACC-Train: 0.6848 | ACC-Test:  0.7046
Ep. 11, CEL-Train: 1.0848| CEL-Test: 0.8920 | ACC-Train: 0.7114 | ACC-Test:  0.7014
Ep. 12, CEL-Train: 0.8823| CEL-Test: 0.8821 | ACC-Train: 0.7159 | ACC-Test: 

In [None]:
# A sanity test for whether the outputs look appropriate
schedule.sanity(rnn_ed_model, test_dataloader_nu, device, max_y_length)

INPUT: ((q∧s)r)⊢(r∨q)
OUTPUT: (
INPUT: (p(p∨r))⊢(q∨p)
OUTPUT: (
INPUT: ((q∧q)(¬s))⊢(q∨(q∧q))
OUTPUT: (
INPUT: ((s→(r∧q))(r∧q))⊢(r∨q)
OUTPUT: (
INPUT: (t((r∨s)∨r))⊢t
OUTPUT: (
INPUT: (r(s→(q∧p)))⊢(r→(r∨r))
OUTPUT: (
INPUT: (((q→t)∨q)q)⊢(((q→t)∨q)∧q)
OUTPUT: (
INPUT: ((r∨p)((p∨q)→t))⊢((r∨p)∨r)
OUTPUT: (
INPUT: (((t∧t)→q)p)⊢(p∨p)
OUTPUT: (
INPUT: (r(¬s))⊢(t∨((¬s)∨q))
OUTPUT: (
INPUT: (r(t→s))⊢(r∨p)
OUTPUT: (
INPUT: (r((q→q)∧r))⊢(r→(r∨r))
OUTPUT: (
INPUT: ((p∧q)(t∧p))⊢((p∧q)∨q)
OUTPUT: (
INPUT: ((q∧(¬r))(¬r))⊢((q∧(¬r))∨q)
OUTPUT: (
INPUT: ((t∧r)(¬s))⊢((¬s)∨r)
OUTPUT: (
INPUT: (p(p∨p))⊢(p∧(p∨s))
OUTPUT: (
INPUT: (s(t∧(¬p)))⊢(s∧(s∨q))
OUTPUT: (
INPUT: (((q∧q)→r)t)⊢(p∨t)
OUTPUT: (
INPUT: ((q→r)(¬s))⊢((q∨(¬s))∨t)
OUTPUT: (
INPUT: ((s∧p)(s∧(r→t)))⊢((s∧p)∧((s∧p)∨r))
OUTPUT: (
INPUT: (((t→s)∨q)(r∧p))⊢(((t→s)∨q)∨q)
OUTPUT: (
INPUT: ((t∧s)s)⊢(s→(s∨s))
OUTPUT: (
INPUT: (r(t→s))⊢((t→s)∨p)
OUTPUT: (
INPUT: (s(r∧t))⊢t
OUTPUT: (
INPUT: (t(p∧r))⊢(r∨r)
OUTPUT: (
INPUT: (r((q∨t)→p))⊢(t∨(t∨r))
OUTPUT: (
INP

In [None]:
# Delete model from GPU to make space for new models
del rnn_ed_model
torch.cuda.empty_cache()

## 4.3 LSTM Encoder-Decoder

In [21]:
# Load model to GPU
lst_ed_model.to(device)

Seq2Seq(
  (encoder): Encoder_LSTM(
    (embedding): Embedding(55, 128)
    (rnn): LSTM(128, 128, batch_first=True)
  )
  (decoder): Decoder_LSTM(
    (rnn): LSTM(128, 128, num_layers=3, batch_first=True)
    (fc_out): Linear(in_features=128, out_features=14, bias=True)
  )
)

In [22]:
# Training Loop
LSTM_CELtrain, LSTM_CELtest, LSTM_ACCtrain, LSTM_ACCtest = schedule.train_model(lst_ed_model, train_dataloader_nu, test_dataloader_nu, lst_ed_optimizer, criterion, 50, device, max_y_length, y_train_3d)
torch.save(lst_ed_model.state_dict(), 'lstm_3.pth')

Ep. 01, CEL-Train: 0.9392| CEL-Test: 0.8120 | ACC-Train: 0.7583 | ACC-Test:  0.7294
Ep. 02, CEL-Train: 0.7728| CEL-Test: 0.7404 | ACC-Train: 0.7669 | ACC-Test:  0.7783
Ep. 03, CEL-Train: 0.6885| CEL-Test: 0.6365 | ACC-Train: 0.7860 | ACC-Test:  0.8210
Ep. 04, CEL-Train: 0.5952| CEL-Test: 0.5551 | ACC-Train: 0.8284 | ACC-Test:  0.8148
Ep. 05, CEL-Train: 0.5363| CEL-Test: 0.5187 | ACC-Train: 0.7887 | ACC-Test:  0.8261
Ep. 06, CEL-Train: 0.5080| CEL-Test: 0.5023 | ACC-Train: 0.8264 | ACC-Test:  0.8112
Ep. 07, CEL-Train: 0.4835| CEL-Test: 0.5001 | ACC-Train: 0.8082 | ACC-Test:  0.8122
Ep. 08, CEL-Train: 0.4669| CEL-Test: 0.4960 | ACC-Train: 0.8409 | ACC-Test:  0.8230
Ep. 09, CEL-Train: 0.4612| CEL-Test: 0.4582 | ACC-Train: 0.8247 | ACC-Test:  0.8302
Ep. 10, CEL-Train: 0.4355| CEL-Test: 0.4357 | ACC-Train: 0.8185 | ACC-Test:  0.8369
Ep. 11, CEL-Train: 0.4425| CEL-Test: 0.4221 | ACC-Train: 0.8390 | ACC-Test:  0.8277
Ep. 12, CEL-Train: 0.4240| CEL-Test: 0.4157 | ACC-Train: 0.8307 | ACC-Test: 

In [26]:
input, output = schedule.sanity(lst_ed_model, test_dataloader_nu, device, max_y_length)

In [27]:
csv_saver(LSTM_CELtrain, LSTM_CELtest, "CELtrain", "CELtest", "LSTM_3_CEL.csv")

Data successfully written to LSTM_3_CEL.csv


In [28]:
csv_saver(LSTM_ACCtrain, LSTM_ACCtest, "ACCtrain", "ACCtest", "LSTM_3_ACC.csv")

Data successfully written to LSTM_3_ACC.csv


In [29]:
csv_saver(input, output, "input", "output", "LSTM_3_sanity.csv")

Data successfully written to LSTM_3_sanity.csv


In [30]:
# Delete model from GPU to make space for new models
del lst_ed_model
torch.cuda.empty_cache()

## 4.4 Transformer

In [31]:
importlib.reload(schedule)

<module 'schedule' from '/content/master-s-thesis/schedule.py'>

In [32]:
# Load model to GPU
tra_ed_model.to(device)

Seq2SeqTransformer(
  (encoder): TransformerEncoder(
    (embedding): Embedding(55, 128)
    (transformer_encoder): TransformerEncoder(
      (layers): ModuleList(
        (0): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)
          )
          (linear1): Linear(in_features=128, out_features=128, bias=True)
          (dropout): Dropout(p=0, inplace=False)
          (linear2): Linear(in_features=128, out_features=128, bias=True)
          (norm1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0, inplace=False)
          (dropout2): Dropout(p=0, inplace=False)
        )
      )
    )
  )
  (decoder): TransformerDecoder(
    (embedding): Embedding(14, 128)
    (transformer_decoder): TransformerDecoder(
      (layers): ModuleList(
        (0-2): 3 x Transfo

In [33]:
TRA_CELtrain, TRA_CELtest, TRA_ACCtrain, TRA_ACCtest = schedule.train_model(tra_ed_model, train_dataloader_nu, test_dataloader_nu, tra_ed_optimizer, criterion, 50, device, max_y_length, y_train_3d)
torch.save(tra_ed_model.state_dict(), 'tra_3.pth')

Ep. 01, CEL-Train: 0.7354| CEL-Test: 0.4413 | ACC-Train: 0.8317 | ACC-Test:  0.8390
Ep. 02, CEL-Train: 0.3675| CEL-Test: 0.2956 | ACC-Train: 0.8991 | ACC-Test:  0.8951
Ep. 03, CEL-Train: 0.2753| CEL-Test: 0.2308 | ACC-Train: 0.9253 | ACC-Test:  0.9172
Ep. 04, CEL-Train: 0.2375| CEL-Test: 0.2089 | ACC-Train: 0.9028 | ACC-Test:  0.9496
Ep. 05, CEL-Train: 0.2149| CEL-Test: 0.1873 | ACC-Train: 0.9315 | ACC-Test:  0.9408
Ep. 06, CEL-Train: 0.1987| CEL-Test: 0.1779 | ACC-Train: 0.9206 | ACC-Test:  0.9295
Ep. 07, CEL-Train: 0.1884| CEL-Test: 0.1682 | ACC-Train: 0.9190 | ACC-Test:  0.9511
Ep. 08, CEL-Train: 0.1809| CEL-Test: 0.1792 | ACC-Train: 0.9276 | ACC-Test:  0.9465
Ep. 09, CEL-Train: 0.1759| CEL-Test: 0.1604 | ACC-Train: 0.9249 | ACC-Test:  0.9465
Ep. 10, CEL-Train: 0.1707| CEL-Test: 0.1536 | ACC-Train: 0.9292 | ACC-Test:  0.9480
Ep. 11, CEL-Train: 0.1670| CEL-Test: 0.1518 | ACC-Train: 0.9534 | ACC-Test:  0.8791
Ep. 12, CEL-Train: 0.1633| CEL-Test: 0.1501 | ACC-Train: 0.9616 | ACC-Test: 

In [34]:
input, output = schedule.sanity(tra_ed_model, test_dataloader_nu, device, max_y_length)

In [35]:
csv_saver(TRA_CELtrain, TRA_CELtest, "CELtrain", "CELtest", "TRA_3_CEL.csv")

Data successfully written to TRA_3_CEL.csv


In [36]:
csv_saver(TRA_ACCtrain, TRA_ACCtest, "ACCtrain", "ACCtest", "TRA_3_ACC.csv")

Data successfully written to TRA_3_ACC.csv


In [37]:
csv_saver(input, output, "input", "output", "TRA_3_sanity.csv")

Data successfully written to TRA_3_sanity.csv


In [38]:
# Delete model from GPU to make space for new models
del tra_ed_model
torch.cuda.empty_cache()

# LSTM Large

In [39]:
# Define the Encoder-Decoder networks
## FFN | Inputs: input_dim, hidden dim
encoder_ffn = architectures.Encoder_FFN(three_d_shape[1], 20)
decoder_ffn = architectures.Decoder_FFN((max_y_length*14), 20)
ffn_ed_model = architectures.Seq2Seq(encoder_ffn, decoder_ffn, device)
## RNN | Inputs: input_dim, embedding dim, hidden dim, nr layers
#encoder_rnn = architectures.Encoder_RNN(three_d_shape[1], 128, 150, 1)
#decoder_rnn = architectures.Decoder_RNN(14, 128, 150, 3)
#rnn_ed_model = architectures.Seq2Seq(encoder_rnn, decoder_rnn, device)
## LSTM | Inputs: input_dim, embedding dim, hidden dim, nr layers, dropout
encoder_lstm = architectures.Encoder_LSTM(three_d_shape[1], 128, 128, 1, 0)
decoder_lstm = architectures.Decoder_LSTM(14, 128, 128, 4, 0)
lst_ed_model = architectures.Seq2Seq(encoder_lstm, decoder_lstm, device)
## Transformer-Encoder | Inputs:  input_dim, emb_dim, num_heads, hidden_dim, num_layers, dropout
encoder_tra = architectures.TransformerEncoder(three_d_shape[1], 128, 4, 128, 1, dropout=0)
# Transformer-Decoder | Inputs: output_dim, emb_dim, num_heads, hidden_dim, num_layers
decoder_tra = architectures.TransformerDecoder(14, 128, 4, 128, 4)
tra_ed_model = architectures.Seq2SeqTransformer(encoder_tra, decoder_tra, device)

# Define optimizers for models
lr = 0.001
ffn_ed_optimizer = torch.optim.AdamW(ffn_ed_model.parameters(),lr=lr)
#rnn_ed_optimizer = torch.optim.AdamW(rnn_ed_model.parameters(),lr=lr)
lst_ed_optimizer = torch.optim.AdamW(lst_ed_model.parameters(),lr=lr)
tra_ed_optimizer = torch.optim.AdamW(tra_ed_model.parameters(),lr=lr)

In [40]:
# Load model to GPU
lst_ed_model.to(device)

Seq2Seq(
  (encoder): Encoder_LSTM(
    (embedding): Embedding(55, 128)
    (rnn): LSTM(128, 128, batch_first=True)
  )
  (decoder): Decoder_LSTM(
    (rnn): LSTM(128, 128, num_layers=4, batch_first=True)
    (fc_out): Linear(in_features=128, out_features=14, bias=True)
  )
)

In [41]:
# Training Loop
LSTM_CELtrain, LSTM_CELtest, LSTM_ACCtrain, LSTM_ACCtest = schedule.train_model(lst_ed_model, train_dataloader_nu, test_dataloader_nu, lst_ed_optimizer, criterion, 50, device, max_y_length, y_train_3d)
torch.save(lst_ed_model.state_dict(), 'lstm_4.pth')

Ep. 01, CEL-Train: 0.9632| CEL-Test: 0.8994 | ACC-Train: 0.7083 | ACC-Test:  0.7248
Ep. 02, CEL-Train: 0.8452| CEL-Test: 0.7815 | ACC-Train: 0.7649 | ACC-Test:  0.7623
Ep. 03, CEL-Train: 0.7578| CEL-Test: 0.7387 | ACC-Train: 0.7520 | ACC-Test:  0.7701
Ep. 04, CEL-Train: 0.7075| CEL-Test: 0.6975 | ACC-Train: 0.7738 | ACC-Test:  0.7438
Ep. 05, CEL-Train: 0.6639| CEL-Test: 0.6821 | ACC-Train: 0.7771 | ACC-Test:  0.7623
Ep. 06, CEL-Train: 0.6274| CEL-Test: 0.6085 | ACC-Train: 0.7986 | ACC-Test:  0.7845
Ep. 07, CEL-Train: 0.6141| CEL-Test: 0.6201 | ACC-Train: 0.7784 | ACC-Test:  0.8102
Ep. 08, CEL-Train: 0.5788| CEL-Test: 0.5522 | ACC-Train: 0.8075 | ACC-Test:  0.8272
Ep. 09, CEL-Train: 0.5660| CEL-Test: 0.5762 | ACC-Train: 0.8102 | ACC-Test:  0.7855
Ep. 10, CEL-Train: 0.5374| CEL-Test: 0.5182 | ACC-Train: 0.8108 | ACC-Test:  0.7978
Ep. 11, CEL-Train: 0.5290| CEL-Test: 0.5043 | ACC-Train: 0.8198 | ACC-Test:  0.8436
Ep. 12, CEL-Train: 0.5006| CEL-Test: 0.4924 | ACC-Train: 0.8208 | ACC-Test: 

In [42]:
input, output = schedule.sanity(lst_ed_model, test_dataloader_nu, device, max_y_length)

In [43]:
csv_saver(LSTM_CELtrain, LSTM_CELtest, "CELtrain", "CELtest", "LSTM_4_CEL.csv")

Data successfully written to LSTM_4_CEL.csv


In [44]:
csv_saver(LSTM_ACCtrain, LSTM_ACCtest, "ACCtrain", "ACCtest", "LSTM_4_ACC.csv")

Data successfully written to LSTM_4_ACC.csv


In [45]:
csv_saver(input, output, "input", "output", "LSTM_4_sanity.csv")

Data successfully written to LSTM_4_sanity.csv


In [46]:
# Delete model from GPU to make space for new models
del lst_ed_model
torch.cuda.empty_cache()

# Transformer Large

In [47]:
# Load model to GPU
tra_ed_model.to(device)

Seq2SeqTransformer(
  (encoder): TransformerEncoder(
    (embedding): Embedding(55, 128)
    (transformer_encoder): TransformerEncoder(
      (layers): ModuleList(
        (0): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)
          )
          (linear1): Linear(in_features=128, out_features=128, bias=True)
          (dropout): Dropout(p=0, inplace=False)
          (linear2): Linear(in_features=128, out_features=128, bias=True)
          (norm1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0, inplace=False)
          (dropout2): Dropout(p=0, inplace=False)
        )
      )
    )
  )
  (decoder): TransformerDecoder(
    (embedding): Embedding(14, 128)
    (transformer_decoder): TransformerDecoder(
      (layers): ModuleList(
        (0-3): 4 x Transfo

In [48]:
TRA_CELtrain, TRA_CELtest, TRA_ACCtrain, TRA_ACCtest = schedule.train_model(tra_ed_model, train_dataloader_nu, test_dataloader_nu, tra_ed_optimizer, criterion, 50, device, max_y_length, y_train_3d)
torch.save(tra_ed_model.state_dict(), 'tra_4.pth')

Ep. 01, CEL-Train: 0.7407| CEL-Test: 0.4290 | ACC-Train: 0.8380 | ACC-Test:  0.8447
Ep. 02, CEL-Train: 0.3494| CEL-Test: 0.2654 | ACC-Train: 0.8981 | ACC-Test:  0.9105
Ep. 03, CEL-Train: 0.2586| CEL-Test: 0.2140 | ACC-Train: 0.9196 | ACC-Test:  0.9352
Ep. 04, CEL-Train: 0.2182| CEL-Test: 0.1870 | ACC-Train: 0.9289 | ACC-Test:  0.9414
Ep. 05, CEL-Train: 0.1984| CEL-Test: 0.1751 | ACC-Train: 0.9263 | ACC-Test:  0.9249
Ep. 06, CEL-Train: 0.1856| CEL-Test: 0.1690 | ACC-Train: 0.9091 | ACC-Test:  0.9285
Ep. 07, CEL-Train: 0.1791| CEL-Test: 0.1616 | ACC-Train: 0.9435 | ACC-Test:  0.9702
Ep. 08, CEL-Train: 0.1730| CEL-Test: 0.1604 | ACC-Train: 0.9127 | ACC-Test:  0.9475
Ep. 09, CEL-Train: 0.1684| CEL-Test: 0.1550 | ACC-Train: 0.9279 | ACC-Test:  0.9285
Ep. 10, CEL-Train: 0.1650| CEL-Test: 0.1511 | ACC-Train: 0.9401 | ACC-Test:  0.9686
Ep. 11, CEL-Train: 0.1610| CEL-Test: 0.1503 | ACC-Train: 0.9325 | ACC-Test:  0.9475
Ep. 12, CEL-Train: 0.1582| CEL-Test: 0.1457 | ACC-Train: 0.9491 | ACC-Test: 

In [49]:
input, output = schedule.sanity(tra_ed_model, test_dataloader_nu, device, max_y_length)

In [50]:
csv_saver(TRA_CELtrain, TRA_CELtest, "CELtrain", "CELtest", "TRA_4_CEL.csv")

Data successfully written to TRA_4_CEL.csv


In [51]:
csv_saver(TRA_ACCtrain, TRA_ACCtest, "ACCtrain", "ACCtest", "TRA_4_ACC.csv")

Data successfully written to TRA_4_ACC.csv


In [52]:
csv_saver(input, output, "input", "output", "TRA_4_sanity.csv")

Data successfully written to TRA_4_sanity.csv


In [53]:
# Delete model from GPU to make space for new models
del tra_ed_model
torch.cuda.empty_cache()

# LSTM XS

In [54]:
# Define the Encoder-Decoder networks
## FFN | Inputs: input_dim, hidden dim
encoder_ffn = architectures.Encoder_FFN(three_d_shape[1], 20)
decoder_ffn = architectures.Decoder_FFN((max_y_length*14), 20)
ffn_ed_model = architectures.Seq2Seq(encoder_ffn, decoder_ffn, device)
## RNN | Inputs: input_dim, embedding dim, hidden dim, nr layers
#encoder_rnn = architectures.Encoder_RNN(three_d_shape[1], 128, 150, 1)
#decoder_rnn = architectures.Decoder_RNN(14, 128, 150, 3)
#rnn_ed_model = architectures.Seq2Seq(encoder_rnn, decoder_rnn, device)
## LSTM | Inputs: input_dim, embedding dim, hidden dim, nr layers, dropout
encoder_lstm = architectures.Encoder_LSTM(three_d_shape[1], 128, 128, 1, 0)
decoder_lstm = architectures.Decoder_LSTM(14, 128, 128, 1, 0)
lst_ed_model = architectures.Seq2Seq(encoder_lstm, decoder_lstm, device)
## Transformer-Encoder | Inputs:  input_dim, emb_dim, num_heads, hidden_dim, num_layers, dropout
encoder_tra = architectures.TransformerEncoder(three_d_shape[1], 128, 4, 128, 1, dropout=0)
# Transformer-Decoder | Inputs: output_dim, emb_dim, num_heads, hidden_dim, num_layers
decoder_tra = architectures.TransformerDecoder(14, 128, 4, 128, 1)
tra_ed_model = architectures.Seq2SeqTransformer(encoder_tra, decoder_tra, device)

# Define optimizers for models
lr = 0.001
ffn_ed_optimizer = torch.optim.AdamW(ffn_ed_model.parameters(),lr=lr)
#rnn_ed_optimizer = torch.optim.AdamW(rnn_ed_model.parameters(),lr=lr)
lst_ed_optimizer = torch.optim.AdamW(lst_ed_model.parameters(),lr=lr)
tra_ed_optimizer = torch.optim.AdamW(tra_ed_model.parameters(),lr=lr)

In [55]:
# Load model to GPU
lst_ed_model.to(device)

Seq2Seq(
  (encoder): Encoder_LSTM(
    (embedding): Embedding(55, 128)
    (rnn): LSTM(128, 128, batch_first=True)
  )
  (decoder): Decoder_LSTM(
    (rnn): LSTM(128, 128, batch_first=True)
    (fc_out): Linear(in_features=128, out_features=14, bias=True)
  )
)

In [56]:
# Training Loop
LSTM_CELtrain, LSTM_CELtest, LSTM_ACCtrain, LSTM_ACCtest = schedule.train_model(lst_ed_model, train_dataloader_nu, test_dataloader_nu, lst_ed_optimizer, criterion, 50, device, max_y_length, y_train_3d)
torch.save(lst_ed_model.state_dict(), 'lstm_1.pth')

Ep. 01, CEL-Train: 0.9430| CEL-Test: 0.8288 | ACC-Train: 0.7384 | ACC-Test:  0.7310
Ep. 02, CEL-Train: 0.7667| CEL-Test: 0.7335 | ACC-Train: 0.7672 | ACC-Test:  0.7654
Ep. 03, CEL-Train: 0.6897| CEL-Test: 0.6599 | ACC-Train: 0.7963 | ACC-Test:  0.8014
Ep. 04, CEL-Train: 0.6100| CEL-Test: 0.5685 | ACC-Train: 0.7857 | ACC-Test:  0.8251
Ep. 05, CEL-Train: 0.5341| CEL-Test: 0.5129 | ACC-Train: 0.8228 | ACC-Test:  0.7927
Ep. 06, CEL-Train: 0.4871| CEL-Test: 0.5060 | ACC-Train: 0.7867 | ACC-Test:  0.8081
Ep. 07, CEL-Train: 0.4656| CEL-Test: 0.4663 | ACC-Train: 0.8075 | ACC-Test:  0.8020
Ep. 08, CEL-Train: 0.4456| CEL-Test: 0.4360 | ACC-Train: 0.8462 | ACC-Test:  0.8374
Ep. 09, CEL-Train: 0.4363| CEL-Test: 0.4690 | ACC-Train: 0.8433 | ACC-Test:  0.8369
Ep. 10, CEL-Train: 0.4218| CEL-Test: 0.4127 | ACC-Train: 0.8393 | ACC-Test:  0.8374
Ep. 11, CEL-Train: 0.4132| CEL-Test: 0.4196 | ACC-Train: 0.8476 | ACC-Test:  0.8477
Ep. 12, CEL-Train: 0.4030| CEL-Test: 0.4015 | ACC-Train: 0.8419 | ACC-Test: 

In [57]:
input, output = schedule.sanity(lst_ed_model, test_dataloader_nu, device, max_y_length)

In [58]:
csv_saver(LSTM_CELtrain, LSTM_CELtest, "CELtrain", "CELtest", "LSTM_1_CEL.csv")

Data successfully written to LSTM_1_CEL.csv


In [59]:
csv_saver(LSTM_ACCtrain, LSTM_ACCtest, "ACCtrain", "ACCtest", "LSTM_1_ACC.csv")

Data successfully written to LSTM_1_ACC.csv


In [60]:
csv_saver(input, output, "input", "output", "LSTM_1_sanity.csv")

Data successfully written to LSTM_1_sanity.csv


In [61]:
# Delete model from GPU to make space for new models
del lst_ed_model
torch.cuda.empty_cache()

# Transformer XS

In [62]:
# Load model to GPU
tra_ed_model.to(device)

Seq2SeqTransformer(
  (encoder): TransformerEncoder(
    (embedding): Embedding(55, 128)
    (transformer_encoder): TransformerEncoder(
      (layers): ModuleList(
        (0): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)
          )
          (linear1): Linear(in_features=128, out_features=128, bias=True)
          (dropout): Dropout(p=0, inplace=False)
          (linear2): Linear(in_features=128, out_features=128, bias=True)
          (norm1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0, inplace=False)
          (dropout2): Dropout(p=0, inplace=False)
        )
      )
    )
  )
  (decoder): TransformerDecoder(
    (embedding): Embedding(14, 128)
    (transformer_decoder): TransformerDecoder(
      (layers): ModuleList(
        (0): TransformerDe

In [63]:
# Training Loop
TRA_CELtrain, TRA_CELtest, TRA_ACCtrain, TRA_ACCtest = schedule.train_model(tra_ed_model, train_dataloader_nu, test_dataloader_nu, tra_ed_optimizer, criterion, 50, device, max_y_length, y_train_3d)
torch.save(tra_ed_model.state_dict(), 'tra_1.pth')

Ep. 01, CEL-Train: 0.7917| CEL-Test: 0.6345 | ACC-Train: 0.7847 | ACC-Test:  0.7773
Ep. 02, CEL-Train: 0.4699| CEL-Test: 0.3669 | ACC-Train: 0.8614 | ACC-Test:  0.8817
Ep. 03, CEL-Train: 0.3571| CEL-Test: 0.3039 | ACC-Train: 0.8962 | ACC-Test:  0.9053
Ep. 04, CEL-Train: 0.3091| CEL-Test: 0.2698 | ACC-Train: 0.8965 | ACC-Test:  0.9249
Ep. 05, CEL-Train: 0.2765| CEL-Test: 0.2455 | ACC-Train: 0.9024 | ACC-Test:  0.9208
Ep. 06, CEL-Train: 0.2527| CEL-Test: 0.2302 | ACC-Train: 0.9256 | ACC-Test:  0.9244
Ep. 07, CEL-Train: 0.2344| CEL-Test: 0.2106 | ACC-Train: 0.9163 | ACC-Test:  0.9131
Ep. 08, CEL-Train: 0.2216| CEL-Test: 0.2029 | ACC-Train: 0.9177 | ACC-Test:  0.9336
Ep. 09, CEL-Train: 0.2119| CEL-Test: 0.1922 | ACC-Train: 0.9203 | ACC-Test:  0.9306
Ep. 10, CEL-Train: 0.2050| CEL-Test: 0.1887 | ACC-Train: 0.9464 | ACC-Test:  0.8904
Ep. 11, CEL-Train: 0.1996| CEL-Test: 0.1890 | ACC-Train: 0.9249 | ACC-Test:  0.9563
Ep. 12, CEL-Train: 0.1947| CEL-Test: 0.1788 | ACC-Train: 0.9345 | ACC-Test: 

KeyboardInterrupt: 
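
Because the run was interrupted inside `schedule.train_model`, the call never returned: this cell did not assign `TRA_CELtrain`, `TRA_CELtest`, `TRA_ACCtrain`, or `TRA_ACCtest`, and `torch.save` was not reached. The following is a minimal sketch of one way to keep partial results, assuming the per-epoch work can be wrapped in a callable. `run_epochs` and `fake_epoch` are hypothetical and do not reflect how `schedule.train_model` is actually written.

```python
def run_epochs(train_one_epoch, n_epochs):
    """Run up to n_epochs and return the metrics collected so far.

    A KeyboardInterrupt stops the loop early but keeps the finished epochs.
    """
    history = []
    try:
        for epoch in range(1, n_epochs + 1):
            history.append(train_one_epoch(epoch))
    except KeyboardInterrupt:
        print(f"Interrupted after {len(history)} finished epochs")
    return history

def fake_epoch(epoch):
    """Hypothetical stand-in for one epoch; simulates an interrupt in epoch 3."""
    if epoch == 3:
        raise KeyboardInterrupt
    return {"epoch": epoch}

partial = run_epochs(fake_epoch, 50)
print(partial)
```

The returned list holds only the epochs that finished before the interrupt, so it can still be written to disk afterwards.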

In [None]:
input, output = schedule.sanity(tra_ed_model, test_dataloader_nu, device, max_y_length)

In [None]:
csv_saver(TRA_CELtrain, TRA_CELtest, "CELtrain", "CELtest", "TRA_4_CEL.csv")

In [None]:
csv_saver(TRA_ACCtrain, TRA_ACCtest, "ACCtrain", "ACCtest", "TRA_4_ACC.csv")

In [None]:
csv_saver(input, output, "input", "output", "TRA_4_sanity.csv")

In [None]:
# Delete model from GPU to make space for new models
del tra_ed_model
torch.cuda.empty_cache()

# LSTM 2E

In [None]:
# Define the Encoder-Decoder networks
## FFN | Inputs: input_dim, hidden dim
encoder_ffn = architectures.Encoder_FFN(three_d_shape[1], 20)
decoder_ffn = architectures.Decoder_FFN((max_y_length*14), 20)
ffn_ed_model = architectures.Seq2Seq(encoder_ffn, decoder_ffn, device)
## RNN | Inputs: input_dim, embedding dim, hidden dim, nr layers
#encoder_rnn = architectures.Encoder_RNN(three_d_shape[1], 128, 150, 1)
#decoder_rnn = architectures.Decoder_RNN(14, 128, 150, 3)
#rnn_ed_model = architectures.Seq2Seq(encoder_rnn, decoder_rnn, device)
## LSTM | Inputs: input_dim, embedding dim, hidden dim, nr layers, dropout
encoder_lstm = architectures.Encoder_LSTM(three_d_shape[1], 128, 128, 2, 0)
decoder_lstm = architectures.Decoder_LSTM(14, 128, 128, 3, 0)
lst_ed_model = architectures.Seq2Seq(encoder_lstm, decoder_lstm, device)
## Transformer-Encoder | Inputs:  input_dim, emb_dim, num_heads, hidden_dim, num_layers, dropout
encoder_tra = architectures.TransformerEncoder(three_d_shape[1], 128, 4, 128, 2, dropout=0)
## Transformer-Decoder | Inputs: output_dim, emb_dim, num_heads, hidden_dim, num_layers
decoder_tra = architectures.TransformerDecoder(14, 128, 4, 128, 3)
tra_ed_model = architectures.Seq2SeqTransformer(encoder_tra, decoder_tra, device)

# Define optimizers for models
lr = 0.001
ffn_ed_optimizer = torch.optim.AdamW(ffn_ed_model.parameters(),lr=lr)
#rnn_ed_optimizer = torch.optim.AdamW(rnn_ed_model.parameters(),lr=lr)
lst_ed_optimizer = torch.optim.AdamW(lst_ed_model.parameters(),lr=lr)
tra_ed_optimizer = torch.optim.AdamW(tra_ed_model.parameters(),lr=lr)

In [None]:
# Load model to GPU
lst_ed_model.to(device)

In [None]:
# Training Loop
LSTM_CELtrain, LSTM_CELtest, LSTM_ACCtrain, LSTM_ACCtest = schedule.train_model(lst_ed_model, train_dataloader_nu, test_dataloader_nu, lst_ed_optimizer, criterion, 50, device, max_y_length, y_train_3d)
torch.save(lst_ed_model.state_dict(), 'lstm_2e.pth')

In [None]:
input, output = schedule.sanity(lst_ed_model, test_dataloader_nu, device, max_y_length)

In [None]:
csv_saver(LSTM_CELtrain, LSTM_CELtest, "CELtrain", "CELtest", "LSTM_2e_CEL.csv")

In [None]:
csv_saver(LSTM_ACCtrain, LSTM_ACCtest, "ACCtrain", "ACCtest", "LSTM_2e_ACC.csv")

In [None]:
csv_saver(input, output, "input", "output", "LSTM_2e_sanity.csv")
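
Each run writes four files whose names follow one pattern (`lstm_2e.pth`, `LSTM_2e_CEL.csv`, `LSTM_2e_ACC.csv`, `LSTM_2e_sanity.csv`). Deriving them from a single tag is one way to keep the names consistent between cells. This is only a sketch: `artifact_names` is a hypothetical helper that does not exist in the repository.

```python
def artifact_names(model, variant):
    """Build the four output file names for one training run."""
    tag = f"{model.upper()}_{variant}"
    return {
        "weights": f"{model.lower()}_{variant}.pth",
        "cel": f"{tag}_CEL.csv",
        "acc": f"{tag}_ACC.csv",
        "sanity": f"{tag}_sanity.csv",
    }

names = artifact_names("LSTM", "2e")
print(names)
```

The values could then be passed to `torch.save` and `csv_saver` in place of the string literals.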

In [None]:
# Delete model from GPU to make space for new models
del lst_ed_model
torch.cuda.empty_cache()

# Transformer 2E

In [None]:
# Load model to GPU
tra_ed_model.to(device)

In [None]:
# Training Loop
TRA_CELtrain, TRA_CELtest, TRA_ACCtrain, TRA_ACCtest = schedule.train_model(tra_ed_model, train_dataloader_nu, test_dataloader_nu, tra_ed_optimizer, criterion, 50, device, max_y_length, y_train_3d)
torch.save(tra_ed_model.state_dict(), 'tra_2e.pth')

In [None]:
input, output = schedule.sanity(tra_ed_model, test_dataloader_nu, device, max_y_length)

In [None]:
csv_saver(TRA_CELtrain, TRA_CELtest, "CELtrain", "CELtest", "TRA_2e_CEL.csv")

In [None]:
csv_saver(TRA_ACCtrain, TRA_ACCtest, "ACCtrain", "ACCtest", "TRA_2e_ACC.csv")

In [None]:
csv_saver(input, output, "input", "output", "TRA_2e_sanity.csv")

In [None]:
# Delete model from GPU to make space for new models
del tra_ed_model
torch.cuda.empty_cache()

# LSTM 3E

In [None]:
# Define the Encoder-Decoder networks
## FFN | Inputs: input_dim, hidden dim
encoder_ffn = architectures.Encoder_FFN(three_d_shape[1], 20)
decoder_ffn = architectures.Decoder_FFN((max_y_length*14), 20)
ffn_ed_model = architectures.Seq2Seq(encoder_ffn, decoder_ffn, device)
## RNN | Inputs: input_dim, embedding dim, hidden dim, nr layers
#encoder_rnn = architectures.Encoder_RNN(three_d_shape[1], 128, 150, 1)
#decoder_rnn = architectures.Decoder_RNN(14, 128, 150, 3)
#rnn_ed_model = architectures.Seq2Seq(encoder_rnn, decoder_rnn, device)
## LSTM | Inputs: input_dim, embedding dim, hidden dim, nr layers, dropout
encoder_lstm = architectures.Encoder_LSTM(three_d_shape[1], 128, 128, 3, 0)
decoder_lstm = architectures.Decoder_LSTM(14, 128, 128, 3, 0)
lst_ed_model = architectures.Seq2Seq(encoder_lstm, decoder_lstm, device)
## Transformer-Encoder | Inputs:  input_dim, emb_dim, num_heads, hidden_dim, num_layers, dropout
encoder_tra = architectures.TransformerEncoder(three_d_shape[1], 128, 4, 128, 3, dropout=0)
## Transformer-Decoder | Inputs: output_dim, emb_dim, num_heads, hidden_dim, num_layers
decoder_tra = architectures.TransformerDecoder(14, 128, 4, 128, 3)
tra_ed_model = architectures.Seq2SeqTransformer(encoder_tra, decoder_tra, device)

# Define optimizers for models
lr = 0.001
ffn_ed_optimizer = torch.optim.AdamW(ffn_ed_model.parameters(),lr=lr)
#rnn_ed_optimizer = torch.optim.AdamW(rnn_ed_model.parameters(),lr=lr)
lst_ed_optimizer = torch.optim.AdamW(lst_ed_model.parameters(),lr=lr)
tra_ed_optimizer = torch.optim.AdamW(tra_ed_model.parameters(),lr=lr)

In [None]:
# Load model to GPU
lst_ed_model.to(device)

In [None]:
# Training Loop
LSTM_CELtrain, LSTM_CELtest, LSTM_ACCtrain, LSTM_ACCtest = schedule.train_model(lst_ed_model, train_dataloader_nu, test_dataloader_nu, lst_ed_optimizer, criterion, 50, device, max_y_length, y_train_3d)
torch.save(lst_ed_model.state_dict(), 'lstm_3e.pth')

In [None]:
input, output = schedule.sanity(lst_ed_model, test_dataloader_nu, device, max_y_length)

In [None]:
csv_saver(LSTM_CELtrain, LSTM_CELtest, "CELtrain", "CELtest", "LSTM_3e_CEL.csv")

In [None]:
csv_saver(LSTM_ACCtrain, LSTM_ACCtest, "ACCtrain", "ACCtest", "LSTM_3e_ACC.csv")

In [None]:
csv_saver(input, output, "input", "output", "LSTM_3e_sanity.csv")

In [None]:
# Delete model from GPU to make space for new models
del lst_ed_model
torch.cuda.empty_cache()

# Transformer 3E

In [None]:
# Load model to GPU
tra_ed_model.to(device)

In [None]:
# Training Loop
TRA_CELtrain, TRA_CELtest, TRA_ACCtrain, TRA_ACCtest = schedule.train_model(tra_ed_model, train_dataloader_nu, test_dataloader_nu, tra_ed_optimizer, criterion, 50, device, max_y_length, y_train_3d)
torch.save(tra_ed_model.state_dict(), 'tra_3e.pth')

In [None]:
input, output = schedule.sanity(tra_ed_model, test_dataloader_nu, device, max_y_length)

In [None]:
csv_saver(TRA_CELtrain, TRA_CELtest, "CELtrain", "CELtest", "TRA_3e_CEL.csv")

In [None]:
csv_saver(TRA_ACCtrain, TRA_ACCtest, "ACCtrain", "ACCtest", "TRA_3e_ACC.csv")

In [None]:
csv_saver(input, output, "input", "output", "TRA_3e_sanity.csv")

In [None]:
# Delete model from GPU to make space for new models
del tra_ed_model
torch.cuda.empty_cache()