## 1 Overview
An environment for training and evaluating neural networks on learning logical consequence.

In [1]:
from imports import *

In [137]:
importlib.reload(generation)

<module 'generation' from '/home/str/university/masters thesis/master-s-thesis/generation.py'>

## 2 Create dataset
First, the training dataset is generated. The function "create_dataset" from "generation.py" calls "gen_outp_PA" to generate a set of random starting formulas and then iteratively checks which rules are applicable to them. Every applicable rule is used to generate new derivations. Each iteration of gen_outp_PA, as set by the iterations variable, produces new, longer examples.
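
The iterative step can be illustrated with a small sketch. It is not the code in "generation.py": the tuple representation of formulas, the single modus ponens rule, and the names `modus_ponens`, `RULES` and `extend_derivations` are all assumptions made for this example.

```python
from itertools import permutations

# Assumed representation: atoms are strings, compound formulas are tuples
# such as ("THEN", "A", "B") for "A THEN B".

def modus_ponens(p, q):
    """From p and (p THEN c) derive c; return None if the rule does not apply."""
    if isinstance(q, tuple) and q[0] == "THEN" and q[1] == p:
        return q[2]
    return None

# Hypothetical stand-in for a rule set such as the ones in calculi.py.
RULES = [modus_ponens]

def extend_derivations(derivations, rules):
    """Apply every applicable rule once to every derivation (one iteration)."""
    extended = []
    for steps in derivations:
        for p, q in permutations(steps, 2):
            for rule in rules:
                new = rule(p, q)
                # Only keep steps that add a formula not yet derived.
                if new is not None and new not in steps:
                    extended.append(steps + [new])
    return extended

start = [["A", ("THEN", "A", "B"), ("THEN", "B", "C")]]
level_1 = extend_derivations(start, RULES)    # derives B
level_2 = extend_derivations(level_1, RULES)  # derives C from B
print(level_2[0])
```

Each call lengthens the surviving derivations by one step, which mirrors how later iterations yield longer examples.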

**Rules.** The rules are defined in calculi.py. Two sets are available: intuitionistic propositional logic (set below via "calculus = ipl") and classical propositional logic (set below via "calculus = cpl").

**Dataset entries.**
- **x_train.** Training input: [INDEX, PREMISES, DERIVATION SYMBOL, CONCLUSION]
- **y_tdict.** Dictionary of correct derivations for input INDEX: {INDEX: [DERIVATIONS_0...DERIVATION_N]}

**Encoding.** Propositional variables and logical constants are encoded as integers. The integers are then one-hot encoded into unique sequences of 0s and 1s whose length, the feature length, equals the maximum integer value. Each entry is 2D with shape [SEQUENCE LENGTH, FEATURE LENGTH].
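
A minimal sketch of this encoding in plain Python follows. The helper names, the example IDs and the feature length of 13 are hypothetical, and the sketch assumes that IDs start at 1 so that ID k sets position k - 1. The actual mapping in "generation.py" may differ.

```python
def one_hot(symbol_id, feature_length):
    """Return a 0/1 vector with a single 1 at position symbol_id - 1 (assumed mapping)."""
    vec = [0] * feature_length
    vec[symbol_id - 1] = 1
    return vec

def encode_sequence(symbol_ids, feature_length):
    """Encode a sequence of integer IDs as a [SEQUENCE LENGTH, FEATURE LENGTH] nested list."""
    return [one_hot(s, feature_length) for s in symbol_ids]

sequence = [1, 11, 4]   # hypothetical IDs for a short formula
feature_length = 13     # hypothetical maximum integer value
encoded = encode_sequence(sequence, feature_length)

print(len(encoded), len(encoded[0]))  # sequence length, feature length
```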

**Example entries without numerical representation and one-hot encoding.**
- **x_train.** [2345, A, A THEN B, DERIVES, B OR C]
- **y_tdict.** {2345: [[A, A THEN B, B, B OR C], [A, A THEN B, B, A AND B, B OR C]]}
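
The INDEX links each input to its set of correct derivations. The sketch below uses the symbolic example above, with formulas written as plain strings for readability. Slicing off the index mirrors the `x_train[:, 1:]` call in the training cells; the variable names are illustrative only.

```python
# Symbolic versions of the example entries (strings instead of encoded integers).
x_entry = [2345, "A", "A THEN B", "DERIVES", "B OR C"]
y_tdict = {
    2345: [
        ["A", "A THEN B", "B", "B OR C"],
        ["A", "A THEN B", "B", "A AND B", "B OR C"],
    ]
}

# The first element is the lookup key; the rest is what the model sees.
index, model_input = x_entry[0], x_entry[1:]
targets = y_tdict[index]

print(model_input)
print(len(targets), "correct derivations for index", index)
```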


In [63]:
# Create dataset
x_train_2d, x_train_3d, y_tdict, max_y_train_len = generation.create_dataset(iterations = [1,2], calculus = calculi.ipl)

Processed at iteration 1:   0%|          | 0/100 [00:00<?, ?it/s]

Processed at iteration 2:   0%|          | 0/548 [00:00<?, ?it/s]

Number ground truth examples in y_tdict: 7697
DRVAS:[[5, 1, [1, 11, 4]], [5, 1, [5, 11, 1]], [5, 1, [5, 11, 2]], [5, 1, [3, 11, 5]], [5, 1, [5, 12, 1]], [5, 1, [5, 10, 1]], [3, [3, 11, 2], [[3, 11, 2], 11, 1]], [3, [3, 11, 2], [1, 11, [3, 11, 2]]], [3, [3, 11, 2], [3, 11, 1]], [3, [3, 11, 2], [3, 11, 3]], [3, [3, 11, 2], [3, 12, [3, 11, 2]]], [3, [3, 11, 2], [3, 10, [3, 11, 2]]], [[9, 2], [5, 12, 2], 5], [[9, 2], [5, 12, 2], 2], [[9, 2], [5, 12, 2], [[5, 12, 2], 11, 4]], [[9, 2], [5, 12, 2], [3, 11, [5, 12, 2]]], [[9, 2], [5, 12, 2], [[9, 2], 11, 4]], [[9, 2], [5, 12, 2], [3, 11, [9, 2]]], [[9, 2], [5, 12, 2], [[9, 2], 12, [5, 12, 2]]], [[9, 2], [5, 12, 2], [[9, 2], 10, [5, 12, 2]]], [1, [9, 1], [[9, 1], 11, 2]], [1, [9, 1], [1, 11, [9, 1]]], [1, [9, 1], [1, 11, 5]], [1, [9, 1], [3, 11, 1]], [1, [9, 1], 13], [1, [9, 1], [1, 12, [9, 1]]], [1, [9, 1], [1, 10, [9, 1]]], [5, 2, [2, 11, 3]], [5, 2, [4, 11, 2]], [5, 2, [5, 11, 3]], [5, 2, [4, 11, 5]], [5, 2, [5, 12, 2]], [5, 2, [5, 10, 2]], 

Processed premises for sample conclusions at iteration 2:   0%|          | 0/661 [00:00<?, ?it/s]

Checked derivations for sample conclusions:   0%|          | 0/7697 [00:00<?, ?it/s]

Processed entries for x_train and y_tdict:   0%|          | 0/4029 [00:00<?, ?it/s]

Padded x_train entries:   0%|          | 0/1743 [00:00<?, ?it/s]

Number x_train examples: 1743
Average number ground truth examples/x_train example: 2.3115318416523234


## 3 Prepare dataset and define model for training
Next, the dataset is split 80/20 into training and test sets, and PyTorch's DataLoader assigns the individual entries to shuffled batches of size "batch_size". The models are then defined using the definitions in "architectures.py". These models are:

- Feedforward network (net)
- Recurrent neural network (RNNNet)
- Long short-term memory (LSTMNet)
- Transformer (TransformerModel)
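
To make the split and batch sizes concrete, here is a plain-Python sketch of an 80/20 split followed by batching with a batch size of 50. It only imitates what `random_split` and `DataLoader` do and is not their implementation; in particular, `DataLoader` with `shuffle = True` reshuffles every epoch, while this sketch shuffles once. The function name `split_and_batch` is hypothetical.

```python
import random

def split_and_batch(n_entries, train_fraction=0.8, batch_size=50, seed=0):
    """Shuffle indices once, split them into train/test, and batch the train part."""
    rng = random.Random(seed)
    indices = list(range(n_entries))
    rng.shuffle(indices)
    train_size = int(train_fraction * n_entries)
    train_idx, test_idx = indices[:train_size], indices[train_size:]
    batches = [train_idx[i:i + batch_size]
               for i in range(0, len(train_idx), batch_size)]
    return train_idx, test_idx, batches

# 1743 is the number of x_train examples reported above.
train_idx, test_idx, batches = split_and_batch(1743)
print(len(train_idx), len(test_idx), len(batches), len(batches[-1]))
# → 1394 349 28 44
```

The last batch is smaller than the others because 1394 is not a multiple of 50.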

In [50]:
from torch.utils.data import DataLoader, random_split

In [64]:
train_size = int(0.8 * len(x_train_2d))  # for 80-20 train-test split
test_size = len(x_train_2d) - train_size
two_d_shape = x_train_2d.shape
three_d_shape = x_train_3d.shape
x_train_2d, x_test_2d = random_split(x_train_2d, [train_size, test_size])

In [68]:
two_d_shape[0]

1743

In [65]:
train_size = int(0.8 * len(x_train_3d))  # for 80-20 train-test split
test_size = len(x_train_3d) - train_size
x_train_3d, x_test_3d = random_split(x_train_3d, [train_size, test_size])

In [66]:
# Dataloader
train_dataloader_2d = DataLoader(dataset = x_train_2d, shuffle = True, batch_size = 50)
test_dataloader_2d = DataLoader(dataset = x_test_2d, shuffle = True, batch_size = 50)
train_dataloader_3d = DataLoader(dataset = x_train_3d, shuffle = True, batch_size = 50)
test_dataloader_3d = DataLoader(dataset = x_test_3d, shuffle = True, batch_size = 50)

In [69]:
ffn_model = architectures.ffn(two_d_shape[1]-1, max_y_train_len)
rnn_model = architectures.rnn(three_d_shape[2], 20, max_y_train_len)
lst_model = architectures.lst(three_d_shape[2], 20, max_y_train_len)
tra_model = architectures.tra(three_d_shape[2], 20, max_y_train_len, 2, 4)



In [70]:
ffn_optimizer = torch.optim.SGD(ffn_model.parameters(),lr=0.001)
rnn_optimizer = torch.optim.SGD(rnn_model.parameters(),lr=0.001)
lst_optimizer = torch.optim.SGD(lst_model.parameters(),lr=0.001)
tra_optimizer = torch.optim.SGD(tra_model.parameters(),lr=0.001)

epochs = 1000

## 4 Training
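
The training cells below use `losses.mse_min_dist`, whose implementation is not shown in this notebook. Judging by its name and the plot label "MSE for nearest correct derivation", it presumably scores a prediction against the closest of the correct derivations stored in y_tdict. The following plain-Python sketch illustrates that idea only. The function names and the flat, equal-length vectors are assumptions and do not reproduce the real function's signature or its scaling argument.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def mse_to_nearest(prediction, candidates):
    """Return the smallest MSE between the prediction and any candidate target."""
    return min(mse(prediction, c) for c in candidates)

# Hypothetical flattened prediction and two (padded) correct derivations.
prediction = [0.9, 0.1, 0.8]
candidates = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

loss = mse_to_nearest(prediction, candidates)
print(round(loss, 4))
```

Taking the minimum means the model is not penalised for choosing one valid derivation over another.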

In [None]:
import torch
from torch.utils.data import DataLoader, random_split

# 'dataset_2d' is a placeholder for the full, unsplit 2D dataset
train_size = int(0.8 * len(dataset_2d))  # for 80-20 train-test split
test_size = len(dataset_2d) - train_size
train_dataset, test_dataset = random_split(dataset_2d, [train_size, test_size])

train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=64, shuffle=False)

ffn_costval_train = []
ffn_costval_test = []

epochs = 100
for j in range(epochs):
    ffn_model.train()  # Set model to training mode
    total_train_loss = 0
    for x_train in train_dataloader:
        y_pred = ffn_model(x_train[:, 1:])
        cost = losses.mse_min_dist(y_pred, x_train, y_tdict, (max_y_train_len/14), "ffn")
        ffn_optimizer.zero_grad()
        cost.backward()
        ffn_optimizer.step()
        total_train_loss += cost.item()
    avg_train_loss = total_train_loss / len(train_dataloader)
    ffn_costval_train.append(avg_train_loss)

    ffn_model.eval()  # Set model to evaluation mode
    total_test_loss = 0
    with torch.no_grad():  # Disable gradient calculation
        for x_test in test_dataloader:
            y_pred = ffn_model(x_test[:, 1:])
            cost = losses.mse_min_dist(y_pred, x_test, y_tdict, (max_y_train_len/14), "ffn")
            total_test_loss += cost.item()
    avg_test_loss = total_test_loss / len(test_dataloader)
    ffn_costval_test.append(avg_test_loss)

    if j % 50 == 0:
        print(f"Epoch {j}: Train Loss - {avg_train_loss}, Test Loss - {avg_test_loss}")


In [None]:
# FFN
ffn_costval_train = []
ffn_costval_test = []
for j in range(epochs):
    ffn_model.train()
    train_loss = 0
    for i, x_train in enumerate(train_dataloader_2d):
        #prediction
        y_pred = ffn_model(x_train[:,1:])
        cost = losses.mse_min_dist(y_pred, x_train, y_tdict, (max_y_train_len/14), "ffn")
        # Backpropagation
        ffn_optimizer.zero_grad()
        cost.backward()
        ffn_optimizer.step()
        train_loss += cost.item()
    avg_train_loss = train_loss / len(train_dataloader_2d)
    ffn_costval_train.append(avg_train_loss)

    ffn_model.eval()
    test_loss = 0
    with torch.no_grad():  
        for i, x_test in enumerate(test_dataloader_2d):
            y_pred = ffn_model(x_test[:, 1:])
            cost = losses.mse_min_dist(y_pred, x_test, y_tdict, (max_y_train_len/14), "ffn")
            test_loss += cost.item()
    avg_test_loss = test_loss / len(test_dataloader_2d)
    ffn_costval_test.append(avg_test_loss)  # record test loss per epoch

    if j % 10 == 0:
        print(f"Epoch {j}: Train Loss - {avg_train_loss}, Test Loss - {avg_test_loss}")

In [71]:
# RNN
rnn_costval = []
for j in range(epochs):
    for i, x_train in enumerate(train_dataloader_3d):
        #prediction
        y_pred = rnn_model(x_train[:,1:])
        cost = losses.mse_min_dist(y_pred, x_train, y_tdict, (max_y_train_len/14), "rnn")
        # Backpropagation
        rnn_optimizer.zero_grad()
        cost.backward()
        rnn_optimizer.step()

    if j % 50 == 0:
        print(cost)
        rnn_costval.append(cost.item()) 

tensor(0.1074, grad_fn=<DivBackward0>)


KeyboardInterrupt: 

In [None]:
# LSTM
lst_costval = []
for j in range(epochs):
    for i, x_train in enumerate(train_dataloader_3d):
        #prediction
        y_pred = lst_model(x_train[:,1:])
        cost = losses.mse_min_dist(y_pred, x_train, y_tdict, (max_y_train_len/14), "lstm")
        # Backpropagation
        lst_optimizer.zero_grad()
        cost.backward()
        lst_optimizer.step()

    if j % 50 == 0:
        print(cost)
        lst_costval.append(cost.item())  # Use .item() to store the loss value as a number

In [None]:
# Transformers
tra_costval = []
for j in range(epochs):
    for i, x_train in enumerate(train_dataloader_3d):
        #prediction
        y_pred = tra_model(x_train[:,1:])
        cost = losses.mse_min_dist(y_pred, x_train, y_tdict, (max_y_train_len/14), "transformers")
        # Backpropagation
        tra_optimizer.zero_grad()
        cost.backward()
        tra_optimizer.step()

    if j % 50 == 0:
        print(cost)
        tra_costval.append(cost.item())  # Use .item() to store the loss value as a number

## 5 Plot results

In [None]:
# Plot the FFN training and test loss per epoch
plt.figure(figsize=(8, 4))
plt.plot(ffn_costval_train, label='Train MSE for nearest correct derivation')
plt.plot(ffn_costval_test, label='Test MSE for nearest correct derivation')
plt.xlabel('Epochs')
plt.ylabel('MSE')
plt.legend()
plt.show()

In [None]:
src_vocab_size = 5000
tgt_vocab_size = 5000
d_model = 512
num_heads = 8
num_layers = 6
d_ff = 2048
max_seq_length = 100
dropout = 0.1

transformer = arch.Transformer(src_vocab_size, tgt_vocab_size, d_model, num_heads, num_layers, d_ff, max_seq_length, dropout)

# Generate random sample data
src_data = torch.randint(1, src_vocab_size, (64, max_seq_length))  # (batch_size, seq_length)
tgt_data = torch.randint(1, tgt_vocab_size, (64, max_seq_length))  # (batch_size, seq_length)

In [None]:
src_data

In [None]:
tgt_data.shape

In [None]:
criterion = nn.CrossEntropyLoss(ignore_index=0)
optimizer = optim.Adam(transformer.parameters(), lr=0.0001, betas=(0.9, 0.98), eps=1e-9)

transformer.train()

for epoch in range(100):
    optimizer.zero_grad()
    output = transformer(src_data, tgt_data[:, :-1])
    # Compare predictions for positions 1..N with the target shifted by one token
    loss = criterion(output.contiguous().view(-1, tgt_vocab_size), tgt_data[:, 1:].contiguous().view(-1))
    loss.backward()
    optimizer.step()
    print(f"Epoch: {epoch+1}, Loss: {loss.item()}")