This is a short notebook to walk through some of the applications of Transformer neural nets for tokenized analytic data, and demonstrate the functionality of the repo through examples. First let's load in the necessary libraries and modules.

In [None]:
import encoder_decoder, encoder_only, decoder_only

Each architecture does something a little different. Roughly, the inputs and outputs look like the following:

1. Encoder-Only: [3,4,2,5,...,5,6,3,4,3,3] --> 8
2. Decoder-Only: [5,6,3,4,3] --> [5,6,3,4,3,3]
3. Encoder-Decoder: [5,6,3,4,3,3] --> [3,4,2,5,1,3,2]

In words, this looks like:

1. Encoders take a sequence and map it to a new vector in the embedding space, which in turn gets mapped to a single category.
2. Decoders take a sequence and predict the next token, either unconditionally, or...
3. Encoder-Decoders condition the next-token prediction of the decoder layers on an encoder output vector.
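
To make the third case concrete, here is a minimal sketch built on PyTorch's stock `nn.Transformer`, not on this repo's `encoder_decoder` module. The vocabulary size, model width, and the names `embed`, `to_vocab`, `src`, and `tgt` are assumptions chosen for illustration, and positional encodings are left out to keep it short.

```python
import torch
from torch import nn

torch.manual_seed(0)
vocab_size, d_model = 12, 32  # hypothetical sizes, not taken from the repo

embed = nn.Embedding(vocab_size, d_model)
model = nn.Transformer(d_model=d_model, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, dim_feedforward=64,
                       batch_first=True)
to_vocab = nn.Linear(d_model, vocab_size)

src = torch.tensor([[5, 6, 3, 4, 3, 3]])  # conditioning sequence, shape (1, 6)
tgt = torch.tensor([[3, 4, 2, 5]])        # tokens produced so far, shape (1, 4)

# Causal mask: position i of the target may only attend to positions <= i.
T = tgt.size(1)
causal_mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)

# The decoder attends to its own past tokens and to the encoder output for src.
hidden = model(embed(src), embed(tgt), tgt_mask=causal_mask)  # (1, 4, d_model)
next_token_logits = to_vocab(hidden[:, -1])                   # (1, vocab_size)
print(next_token_logits.shape)
```

The logits at the last target position are what a sampling loop would use to pick the next token, so the prediction depends on both `tgt` and `src`.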

Obviously, these are all overlapping, and in many ways you can reproduce the behavior of encoders with decoders and vice versa (just have the output of the encoder map to the next token in the sequence, as opposed to some completely different semantic category).

But for historical reasons, we'll keep all three of these architectures distinct as they have been used for different types of token prediction tasks.
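
The overlap mentioned above can be sketched directly: an encoder stack with a causal attention mask and a projection onto the vocabulary behaves like a next-token predictor. This is an illustration using PyTorch's built-in layers under assumed sizes, not the repo's `decoder_only` implementation; positional encodings are omitted, and dropout is set to 0 so the comparison at the end is deterministic.

```python
import torch
from torch import nn

torch.manual_seed(0)
vocab_size, d_model = 12, 32  # hypothetical sizes

embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, dim_feedforward=64,
                                   dropout=0.0, batch_first=True)
stack = nn.TransformerEncoder(layer, num_layers=2)
to_vocab = nn.Linear(d_model, vocab_size)

tokens = torch.tensor([[5, 6, 3, 4, 3]])  # shape (batch=1, seq_len=5)
T = tokens.size(1)
# -inf above the diagonal blocks attention to future positions.
causal_mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)

# One row of next-token logits per input position: (1, T, vocab_size)
logits = to_vocab(stack(embed(tokens), mask=causal_mask))

# Sanity check: changing the last token must not affect earlier positions.
altered = tokens.clone()
altered[0, -1] = 9
logits_altered = to_vocab(stack(embed(altered), mask=causal_mask))
print(torch.allclose(logits[:, :-1], logits_altered[:, :-1], atol=1e-5))
```

Without the mask, every position would see the whole sequence and the same stack would act as an ordinary encoder.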

Let's start with the encoder-only architecture and "train" a neural network to identify the largest token in a sequence -- i.e. effectively implement a MAX function acting on a list using a neural network.
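
As a rough picture of what such a model could look like, here is a hedged sketch of an encoder-only classifier for the MAX task. The class name `MaxTokenClassifier`, the layer sizes, and the mean pooling are assumptions for illustration and are not taken from the repo's `encoder_only` module. No positional encoding is used, since the maximum of a sequence does not depend on token order.

```python
import torch
from torch import nn

class MaxTokenClassifier(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, vocab_size=11, d_model=32, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           dim_feedforward=64, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        hidden = self.encoder(self.embed(tokens))  # (batch, seq_len, d_model)
        pooled = hidden.mean(dim=1)                # one vector per sequence
        return self.head(pooled)                   # (batch, vocab_size) class logits

torch.manual_seed(0)
model = MaxTokenClassifier()
tokens = torch.randint(1, 11, (5, 7))     # 5 sequences of 7 tokens in 1..10
targets = tokens.max(dim=1).values        # label = largest token in each sequence
logits = model(tokens)
loss = nn.functional.cross_entropy(logits, targets)
loss.backward()                           # gradients for one training step
print(logits.shape, loss.item())
```

Each token value doubles as a class index, so the output layer has one logit per possible token and cross-entropy is a natural loss.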

In [29]:
encoder_decoder.importLibs()
import torch
from torch.utils.data import DataLoader

Finished import


In [13]:
# Define Dataset Class
class SequenceDataset(torch.utils.data.Dataset):
    def __init__(self, sequences, targets):
        self.sequences = sequences
        self.targets = targets
    
    def __len__(self):
        return len(self.sequences)
    
    def __getitem__(self, idx):
        return torch.tensor(self.sequences[idx], dtype=torch.long), torch.tensor(self.targets[idx], dtype=torch.long)

In [None]:
import random

seq = []
tgt = []

for i in range(20):
    # Build a sequence of 7 random tokens in the range 1..10 (inclusive)
    start = [random.randint(1, 10) for _ in range(7)]
    seq.append(start)
    # The label is the largest token in the sequence (the MAX task described above)
    tgt.append(max(start))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
data = SequenceDataset(seq, tgt)



In [95]:
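from torch import nn
# Note: this rebinds the name `encoder_only`, shadowing the module imported earlier.
encoder_layer = nn.TransformerEncoderLayer(d_model=96, nhead=8)
encoder_only = nn.TransformerEncoder(encoder_layer, num_layers=4)
# batch_first defaults to False, so src has shape (seq_len=4, batch=4, d_model=96)
src = torch.rand(4, 4, 96)
out = encoder_only(src)
out[0, 1]  # 96-dimensional output vector for position 0 of batch element 1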

tensor([-9.7131e-01,  6.4653e-01,  1.1058e+00, -1.5218e+00, -2.9165e-01,
         1.7518e-01, -6.2922e-02, -1.3742e+00,  5.7713e-01,  9.7474e-02,
        -5.5063e-01, -4.2551e-01, -2.4405e+00,  3.3425e-01, -9.2217e-01,
         2.9956e-02, -1.7427e+00, -2.9713e-04,  2.0554e+00, -3.3534e-01,
         4.4978e-01, -1.4612e+00, -9.6993e-01, -1.7550e+00, -4.9043e-01,
         2.1186e-01, -2.5202e+00, -3.3610e-01, -6.8310e-01, -3.2452e-02,
         2.7511e+00, -9.1493e-01,  1.3785e-01,  1.0419e+00, -1.4911e+00,
         3.2904e-01,  4.1584e-02,  1.4884e-01,  2.4676e-01,  1.1247e+00,
         1.3813e+00, -1.2214e-01,  1.9641e-01,  4.9167e-01,  2.5802e-01,
         3.2950e-01,  1.3350e+00,  1.0029e+00,  9.8470e-01, -8.4471e-01,
         1.3989e+00,  7.9151e-01,  2.5862e-01, -1.2320e+00,  8.8090e-02,
        -4.4815e-01,  7.1815e-02,  5.7713e-01,  1.3305e+00,  1.1457e+00,
        -2.8976e-01, -1.2213e-01, -1.1351e+00,  9.9267e-01, -1.4445e+00,
         1.2487e+00, -1.0512e+00,  9.6878e-01, -1.2

In [74]:
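loader = DataLoader(data, batch_size=5, shuffle=True)

for epoch in range(5):
    print(epoch)
    for seq, tgt in loader:
        seq, tgt = seq.to(device), tgt.to(device)
        # seq is (batch, seq_len); the transpose is (seq_len, batch), the sequence-first layout nn.TransformerEncoder uses by default
        print(seq, seq.transpose(1, 0), seq.transpose(1, 0))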

0
tensor([[ 1, 10,  6,  7,  3,  6,  7],
        [ 8,  6,  4,  2, 10,  9,  6],
        [ 1,  8,  4,  7,  8,  8,  7],
        [ 8,  6,  1,  9,  3,  6,  9],
        [ 5,  8,  1,  3,  7,  1,  8]]) tensor([[ 1,  8,  1,  8,  5],
        [10,  6,  8,  6,  8],
        [ 6,  4,  4,  1,  1],
        [ 7,  2,  7,  9,  3],
        [ 3, 10,  8,  3,  7],
        [ 6,  9,  8,  6,  1],
        [ 7,  6,  7,  9,  8]]) tensor([[ 1,  8,  1,  8,  5],
        [10,  6,  8,  6,  8],
        [ 6,  4,  4,  1,  1],
        [ 7,  2,  7,  9,  3],
        [ 3, 10,  8,  3,  7],
        [ 6,  9,  8,  6,  1],
        [ 7,  6,  7,  9,  8]])
tensor([[ 2,  3,  3, 10,  7,  9,  3],
        [ 7,  3,  6,  5,  1,  3,  2],
        [ 9,  2,  9,  6, 10,  5,  8],
        [ 7, 10, 10,  5,  3,  4,  1],
        [ 3,  2,  8, 10,  3, 10,  9]]) tensor([[ 2,  7,  9,  7,  3],
        [ 3,  3,  2, 10,  2],
        [ 3,  6,  9, 10,  8],
        [10,  5,  6,  5, 10],
        [ 7,  1, 10,  3,  3],
        [ 9,  3,  5,  4, 10],
        [ 3,  