In this notebook, we train a custom Transformer language model using PyTorch.

## Imports

We install and import all the required libraries.

In [None]:
!pip install transformers datasets tokenizers -q


In [None]:
import os
import re
import json
import torch
import transformers
import pandas as pd
import numpy as np
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from datasets import load_dataset
from tokenizers import Tokenizer
from tokenizers.trainers import BpeTrainer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace

In [None]:
# Check the availability of the cuda device
if torch.cuda.is_available():
    device = torch.device("cuda")  # Use CUDA device
else:
    device = torch.device("cpu")  # Use CPU device

In [None]:
print(device)

cuda


In [None]:
""+"Sjar"+" "

'Sjar '

## Data Processing
In this section, we first read the text files and then divide the songs into train, validation and test sets. Note that we deliberately make the test conditions harder by splitting at the song level, rather than joining all the text together and splitting the combined text.

We will perform heuristic analysis on the training text in terms of:
1. Average number of verses in a song
2. Average number of STOPWORDs in a song



In [None]:
"""
We use the special tokens [EOS] and [SEP] to denote the end of a song
and the end of a verse respectively. The [SEP] substitution below is
commented out, so each song is stored as raw text.
"""

album_path = "/content/drive/MyDrive/Colab_Notebooks/Outsystems/data/Albums"
text_data = []
for root, dirs, files in os.walk(album_path):
    for name in files:
        # the with-block closes the file, so no explicit f.close() is needed
        with open(os.path.join(root, name), mode="r", encoding="utf-8") as f:
            # drop the first line of the file and keep the rest as one string
            lines = "".join(f.readlines()[1:])
            ## adding [SEP] in between verses
            # lines = re.sub(r"\n \n", " [SEP]", lines)
            # lines = re.sub(r"\n", "", lines)
            # text_data += lines + " "
            text_data.append(lines)
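The commented-out substitutions hint at how the special tokens could be inserted. The following is only a minimal sketch of that idea, assuming that verses are separated by blank lines; `mark_song` is a hypothetical helper and not part of the notebook.

```python
import re

def mark_song(raw: str) -> str:
    """Join verses with [SEP] and close the song with [EOS].

    Assumption: a verse boundary is one or more blank (or whitespace-only) lines.
    """
    # split on blank lines, then drop empty chunks
    verses = [v.strip() for v in re.split(r"\n\s*\n", raw) if v.strip()]
    # flatten the line breaks inside each verse into single spaces
    verses = [re.sub(r"\s*\n\s*", " ", v) for v in verses]
    return " [SEP] ".join(verses) + " [EOS]"

song = "line one\nline two\n\nline three\n"
print(mark_song(song))  # line one line two [SEP] line three [EOS]
```

With markers like these in place, counting verses by splitting on `[SEP]` becomes meaningful.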

In [None]:
!pip install revtok

Collecting revtok
  Downloading revtok-0.0.3-py3-none-any.whl (4.3 kB)
Installing collected packages: revtok
Successfully installed revtok-0.0.3


In [None]:
print(text_data[:10][0])

It was so nice throwing big parties
Jump into the pool from the balcony
Everyone swimming in a champagne sea
And there are no rules when you show up here
Bass beat rattling the chandelier
Feeling so Gatsby for that whole year

[Pre-Chorus]
So, why'd you have to rain on my parade?
I'm shaking my head and locking the gates
[Chorus]
This is why we can't have nice things, darling
Because you break them, I had to take them away
This is why we can't have nice things, honey (Oh)
Did you think I wouldn't hear all the things you said about me?
This is why we can't have nice things

[Verse 2]
It was so nice being friends again
There I was, giving you a second chance
But you stabbed me in the back while shaking my hand
And therein lies the issue, friends don't try to trick you
Get you on the phone and mind-twist you
And so I took an axe to a mended fence

[Pre-Chorus]
But I'm not the only friend you've lost lately (Mm-mm)
If only you weren't so shady
[Chorus]
This is why we can't have nice things

In [None]:
# Two-stage split by song: hold out 10% for validation, then 10% of the
# remainder for test (roughly 81% train, 9% test, 10% validation)
train, val = train_test_split(text_data, test_size=0.1, random_state=99)
train, test = train_test_split(train, test_size=0.1, random_state=99)
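Because the second split is taken from what remains after the first, the resulting proportions are not exactly 80-10-10. The sketch below works through the arithmetic for a hypothetical collection of 100 songs; `two_stage_split` is an illustrative helper, and the rounding rule (held-out part rounded up) is an assumption about how `train_test_split` handles a float `test_size`.

```python
import math

def two_stage_split(n_songs: int, frac: float = 0.1):
    """Return (train, test, val) sizes for two successive hold-outs.

    Assumption: the held-out part is rounded up at each stage.
    """
    # round() guards against floating-point noise before taking the ceiling
    n_val = math.ceil(round(n_songs * frac, 9))
    remaining = n_songs - n_val
    n_test = math.ceil(round(remaining * frac, 9))
    return remaining - n_test, n_test, n_val

print(two_stage_split(100))  # (81, 9, 10)
```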

In [None]:
train_text = "".join(train)
val_text = "".join(val)
test_text = "".join(test)

In [None]:
import io
from torchtext.vocab import build_vocab_from_iterator
from torchtext.data.utils import get_tokenizer

tokenizer = get_tokenizer('subword')

def yield_tokens(file_path):
  for root, dirs, files in os.walk(file_path):
    for name in files:
      with io.open(os.path.join(root, name), encoding = 'utf-8') as f:
        for line in f:
          yield tokenizer(line.strip())

vocab = build_vocab_from_iterator(yield_tokens(album_path), specials=["<unk>"])
vocab.set_default_index(vocab["<unk>"])

In [None]:
def data_process(raw_text_iter):
    """Converts raw text into a flat Tensor."""
    data = [torch.tensor(vocab(tokenizer(item.strip())), dtype=torch.long) for item in raw_text_iter]
    return torch.cat(tuple(filter(lambda t: t.numel() > 0, data)))

In [None]:
from torchtext.vocab import build_vocab_from_iterator
from torchtext.data.utils import get_tokenizer

tokenizer = get_tokenizer('subword')
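# Build from the `train` list (train_iter is only defined further down) and
# register "[UNK]" as a special token so that it can be the default index
vocab = build_vocab_from_iterator(map(tokenizer, map(lambda x: x.strip(), train)), specials=["[UNK]"], max_tokens=15346)
vocab.set_default_index(vocab["[UNK]"])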

def data_process(raw_text_iter):
    """Converts raw text into a flat Tensor."""
    data = [torch.tensor(vocab(tokenizer(item.strip())), dtype=torch.long) for item in raw_text_iter]
    return torch.cat(tuple(filter(lambda t: t.numel() > 0, data)))

### Text analysis

In [None]:
count_verses = lambda song: len(song.split("[SEP]"))
stopwords = set([
        "a", "an", "and", "are", "as", "at", "be", "by", "for",
        "from", "has", "he", "in", "is", "it", "its", "of", "on",
        "that", "the", "to", "was", "were", "will", "with"
    ])

def count_stopwords(text, stopwords=stopwords):
    word_list = text.lower().split()
    stopwords_count = 0

    for word in word_list:
        if word in stopwords:
            stopwords_count += 1

    return stopwords_count

avg_n_verses = sum([count_verses(t) for t in train]) / len(train)
avg_n_words = sum([len(t.split()) for t in train]) / len(train)
avg_stop_verses = sum([count_stopwords(t) for t in train]) / len(train)
len_vocab = len(set([w for t in train for w in t.split()]))

print(f"Average number of verses in a song {avg_n_verses:.4f} \n")
print(f"Average number of words in a song {avg_n_words:.4f} \n")
print(f"Average number of stopwords in a song {avg_stop_verses:.4f} \n")
print(f"Average number of words in a verse {avg_n_words/avg_n_verses:.4f} \n")
print(f"Vocab size {len_vocab}")

Average number of verses in a song 1.0000 

Average number of words in a song 474.4198 

Average number of stopwords in a song 97.3309 

Average number of words in a verse 474.4198 

Vocab size 15344


The task is to generate a song from its first one or two verses. Due to memory constraints, we use a single **verse of length 64** (the power of 2 closest to 70.37) as the prompt and generate sequences up to a maximum **song length of 512** (the power of 2 closest to 481.25). We do not remove stopwords, so that the model can learn to produce grammatical sentences; this should in turn lower the perplexity of the generated sequences, which is the evaluation metric for the task.
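As a quick check of the rounding used for these two lengths, here is a small sketch; `nearest_power_of_two` is a hypothetical helper introduced only for illustration.

```python
import math

def nearest_power_of_two(x: float) -> int:
    """Return the power of 2 closest to x (ties go to the lower power)."""
    lower = 2 ** math.floor(math.log2(x))
    upper = lower * 2
    return lower if (x - lower) <= (upper - x) else upper

print(nearest_power_of_two(70.37))   # 64
print(nearest_power_of_two(481.25))  # 512
```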

In [None]:
import torch
from torch.utils.data import IterableDataset

class TaylorLyricsDataset(IterableDataset):
    """
    A custom IterableDataset implementation.

    This class allows iterating over the provided data by implementing the __iter__ method.
    It inherits from the IterableDataset class.

    Args:
        data (Iterable): The data to be used for iteration.

    Yields:
        Any: Each item from the provided data.
    """
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        for item in self.data:
            # Yield or return each item from the data
            yield item

train_iter = TaylorLyricsDataset(train)
test_iter = TaylorLyricsDataset(test)
val_iter = TaylorLyricsDataset(val)

### Tokenization

For this task, we tokenize with torchtext's `subword` tokenizer, which is backed by the revtok package and here splits the lyrics into word-level and punctuation tokens, and we cap the vocabulary at a small size (15346). We avoid learned subword tokenizers such as BPE because they need a larger vocabulary to learn which characters frequently appear together.
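To illustrate what capping the vocabulary means in practice, here is a minimal pure-Python sketch. It is not the torchtext implementation: `build_vocab` and `vocab_sketch` are hypothetical names, and the assumption is that the cap counts the special tokens as well.

```python
from collections import Counter

def build_vocab(token_lists, specials=("[UNK]", "[SEP]", "[EOS]"), max_tokens=15346):
    """Map tokens to ids: specials first, then the most frequent tokens.

    Assumption: max_tokens includes the special tokens.
    """
    counts = Counter(tok for toks in token_lists for tok in toks if tok not in specials)
    kept = [tok for tok, _ in counts.most_common(max_tokens - len(specials))]
    return {tok: i for i, tok in enumerate(list(specials) + kept)}

# Tiny example: with a cap of 5, only the two most frequent words survive
vocab_sketch = build_vocab([["you", "found", "me"], ["you", "found", "you"]], max_tokens=5)
unk = vocab_sketch["[UNK]"]
ids = [vocab_sketch.get(tok, unk) for tok in ["you", "me", "trouble"]]
print(ids)  # [3, 0, 0]
```

Tokens that fall outside the cap, or that never appeared in training, all map to the `[UNK]` id.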

In [None]:
!pip install revtok -q

In [None]:
from torchtext.vocab import build_vocab_from_iterator
from torchtext.data.utils import get_tokenizer

In [None]:
tokenizer = get_tokenizer('subword')
# vocab = build_vocab_from_iterator([tokenizer(item) for item in train_iter])

for idx, item in enumerate(list(train_iter)):
  try:
    tokenizer(item.strip())
  except IndexError:
    print(f"Error at {idx}")

In [None]:
list(train_iter)[64].strip()

"[Verse 1] Once upon a time, a few mistakes ago I was in your sights, you got me alone You found me, you found me You found me-e-e-e-e I guess you didn't care, and I guess I liked that And when I fell hard, you took a step back Without me, without me Without me-e-e-e-e [Pre-Chorus] And he's long gone when he's next to me And I realize the blame is on me [SEP] [Chorus] 'Cause I knew you were trouble when you walked in So shame on me now Flew me to places I'd never been 'Til you put me down, oh I knew you were trouble when you walked in So shame on me now Flew me to places I'd never been Now, I'm lying on the cold, hard ground [SEP] [Post-Chorus] Oh, oh-oh Trouble, trouble, trouble Oh, oh-oh Trouble, trouble, trouble [SEP] [Verse 2] No apologies, he'll never see you cry Pretends he doesn't know that he's the reason why You're drowning, you're drowning You're drowning-ing-ing-ing-ing And I heard you moved on from whispers on the street A new notch in your belt is all I'll ever be And now,

In [None]:
tokenizer(list(train_iter)[64].strip())

[' [',
 '\ue302 verse ',
 ' 1 ',
 '] ',
 '\ue302 once ',
 ' upon ',
 ' a ',
 ' time ',
 ', ',
 ' a ',
 ' few ',
 ' mistakes ',
 ' ago ',
 ' I ',
 ' was ',
 ' in ',
 ' your ',
 ' sights ',
 ', ',
 ' you ',
 ' got ',
 ' me ',
 ' alone ',
 '\ue302 you ',
 ' found ',
 ' me ',
 ', ',
 ' you ',
 ' found ',
 ' me ',
 '\ue302 you ',
 ' found ',
 ' me ',
 '-',
 ' e ',
 '-',
 ' e ',
 '-',
 ' e ',
 '-',
 ' e ',
 ' I ',
 ' guess ',
 ' you ',
 ' didn ',
 "'",
 ' t ',
 ' care ',
 ', ',
 ' and ',
 ' I ',
 ' guess ',
 ' I ',
 ' liked ',
 ' that ',
 '\ue302 and ',
 ' when ',
 ' I ',
 ' fell ',
 ' hard ',
 ', ',
 ' you ',
 ' took ',
 ' a ',
 ' step ',
 ' back ',
 '\ue302 without ',
 ' me ',
 ', ',
 ' without ',
 ' me ',
 '\ue302 without ',
 ' me ',
 '-',
 ' e ',
 '-',
 ' e ',
 '-',
 ' e ',
 '-',
 ' e ',
 ' [',
 '\ue302 pre ',
 '-',
 '\ue302 chorus ',
 '] ',
 '\ue302 and ',
 ' he ',
 "'",
 ' s ',
 ' long ',
 ' gone ',
 ' when ',
 ' he ',
 "'",
 ' s ',
 ' next ',
 ' to ',
 ' me ',
 '\ue302 and ',
 ' I ',


In [None]:
from torchtext.vocab import build_vocab_from_iterator
from torchtext.data.utils import get_tokenizer


tokenizer = get_tokenizer('subword')
vocab = build_vocab_from_iterator(map(tokenizer, map(lambda x:x.strip(),train_iter)), specials=["[UNK]", "[SEP]", "[EOS]"], max_tokens=15346)
vocab.set_default_index(vocab["[UNK]"])

def data_process(raw_text_iter):
    """Converts raw text into a flat Tensor."""
    data = [torch.tensor(vocab(tokenizer(item.strip())), dtype=torch.long) for item in raw_text_iter]
    return torch.cat(tuple(filter(lambda t: t.numel() > 0, data)))

train_data = data_process(train_iter)
test_data = data_process(test_iter)
val_data = data_process(val_iter)

In [None]:
def batchify(data, seq_len):
    """Divides the data into ``bsz`` contiguous sequences of length ``seq_len``,
    removing extra elements that wouldn't cleanly fit.

    Arguments:
        data: Tensor, shape ``[N]``
        seq_len: int, sequence length

    Returns:
        Tensor of shape ``[seq_len, N // seq_len]``
    """
    bsz = data.size(0) // seq_len
    data = data[:seq_len * bsz]
    data = data.view(bsz, seq_len).t().contiguous()
    return data.to(device)

# Length of each contiguous token sequence (one column of the batchified tensor)
seq_len = 256

train_data = batchify(train_data, seq_len)
val_data = batchify(val_data, seq_len)
test_data = batchify(test_data, seq_len)

In [None]:
train_data.shape

torch.Size([256, 996])
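The shape above means 996 columns of 256 consecutive tokens each. To make the layout concrete, the sketch below applies the same reshaping to a toy tensor of 10 token ids; `batchify_cpu` is a hypothetical copy of `batchify` that skips the move to `device`.

```python
import torch

def batchify_cpu(data: torch.Tensor, seq_len: int) -> torch.Tensor:
    """Same reshaping as batchify, without the move to `device`."""
    n_seq = data.size(0) // seq_len
    data = data[: seq_len * n_seq]  # drop the tokens that do not fill a column
    return data.view(n_seq, seq_len).t().contiguous()

toy = torch.arange(10)
cols = batchify_cpu(toy, seq_len=4)
print(cols.shape)            # torch.Size([4, 2])
print(cols[:, 0].tolist())   # [0, 1, 2, 3]
print(cols[:, 1].tolist())   # [4, 5, 6, 7]
```

Each column is an independent stretch of text; the last two ids (8 and 9) are dropped because they do not fill a full column.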

Here we generate the source and target tensors. The target is the source shifted by one position, so each position is trained to predict the token that follows it. We generate text autoregressively: every token is predicted from the tokens to its left, which is known as causal language modelling.
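The one-position shift between source and target can be illustrated on a plain list of tokens. This is only a sketch with a hypothetical helper, `shift_pair`, that mirrors the slicing logic on Python lists instead of tensors.

```python
def shift_pair(tokens, i, window):
    """Return (data, target) where target is data moved one step ahead."""
    length = min(window, len(tokens) - 1 - i)
    data = tokens[i : i + length]
    target = tokens[i + 1 : i + 1 + length]
    return data, target

lyric = ["I", "knew", "you", "were", "trouble"]
data, target = shift_pair(lyric, i=0, window=3)
print(data)    # ['I', 'knew', 'you']
print(target)  # ['knew', 'you', 'were']
```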


In [None]:
bptt = 35
def get_batch(source, i):
    """
    Args:
        source: Tensor, shape ``[full_seq_len, batch_size]``
        i: int

    Returns:
        tuple (data, target), where data has shape ``[seq_len, batch_size]`` and
        target has shape ``[seq_len * batch_size]``
    """
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i+seq_len]
    target = source[i+1:i+1+seq_len].reshape(-1)
    return data, target

## Modelling

We are going to use a custom Transformer language model to generate lyrics.

In [None]:
import math
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import TransformerEncoder, TransformerEncoderLayer

class PositionalEncoding(nn.Module):

    def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        # Positional encoding calculation
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)
        pe[:, 0, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe)

    def forward(self, x):
        """
        Arguments:
            x: Tensor, shape ``[seq_len, batch_size, embedding_dim]``
        """
        x = x + self.pe[:x.size(0)]
        return self.dropout(x)

class TransformerModel(nn.Module):

    def __init__(self, ntoken: int, d_model: int, nhead: int, d_hid: int,
                 nlayers: int, dropout: float = 0.5):
        super().__init__()
        self.model_type = 'Transformer'

        # Positional encoding layer
        self.pos_encoder = PositionalEncoding(d_model, dropout)

        # Transformer encoder layers
        encoder_layers = TransformerEncoderLayer(d_model, nhead, d_hid, dropout)
        self.transformer_encoder = TransformerEncoder(encoder_layers, nlayers)

        # Word embedding layer
        self.encoder = nn.Embedding(ntoken, d_model)
        self.d_model = d_model

        # Linear decoder layer
        self.decoder = nn.Linear(d_model, ntoken)

        self.init_weights()

    def init_weights(self):
        initrange = 0.1
        self.encoder.weight.data.uniform_(-initrange, initrange)
        self.decoder.bias.data.zero_()
        self.decoder.weight.data.uniform_(-initrange, initrange)

    def forward(self, src, src_mask=None):
        """
        Arguments:
            src: Tensor, shape ``[seq_len, batch_size]``
            src_mask: Tensor, shape ``[seq_len, seq_len]``; defaults to a causal mask

        Returns:
            output Tensor of shape ``[seq_len, batch_size, ntoken]``
        """
        # Without a mask every position could attend to the tokens it is asked
        # to predict, so fall back to a causal (upper-triangular) mask
        if src_mask is None:
            src_mask = generate_square_subsequent_mask(src.size(0)).to(src.device)
        # Embed the tokens and scale the embeddings by sqrt(d_model)
        src = self.encoder(src) * math.sqrt(self.d_model)
        # Add the positional encoding to the token embeddings
        src = self.pos_encoder(src)
        # Transformer encoder
        output = self.transformer_encoder(src, src_mask)
        # Linear decoder
        output = self.decoder(output)
        return output

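    def generate(self, idx, max_new_tokens):
      """Samples ``max_new_tokens`` tokens; ``idx`` has shape ``[seq_len, batch_size]``."""
      for _ in range(max_new_tokens):
        # Logits of the last position, shape [batch_size, ntoken]
        logits = self(idx)[-1]
        probs = torch.softmax(logits, dim=-1)
        # Sample one token per sequence and append it along the sequence dimension
        idx_next = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, idx_next.t()), dim=0)
      return idx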

def generate_square_subsequent_mask(sz: int):
    """Generates an upper-triangular matrix of ``-inf``, with zeros on ``diag``."""
    return torch.triu(torch.ones(sz, sz) * float('-inf'), diagonal=1)
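To see what the mask does, the sketch below builds it for a sequence of length 3 and adds it to a set of uniform attention scores. `causal_mask` is a hypothetical standalone copy of the function above, and the zero scores are made up for illustration.

```python
import torch

def causal_mask(sz: int) -> torch.Tensor:
    """-inf above the diagonal, 0 on and below it (same construction as above)."""
    return torch.triu(torch.ones(sz, sz) * float("-inf"), diagonal=1)

mask = causal_mask(3)
print(mask)

# Uniform scores plus the mask: row t spreads its attention over positions 0..t
weights = torch.softmax(torch.zeros(3, 3) + mask, dim=-1)
print(weights)
```

Positions above the diagonal receive zero attention weight, so a token cannot look at the tokens that come after it.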

Config is a dictionary that contains hyperparameters needed for training

In [None]:
config = {
    # Size of the tokenizer vocabulary, so that every token id has an embedding
    "ntoken": len(vocab),
    "d_model": 500,
    "d_hid": 500,
    "nlayers": 2,
    "nhead": 2,
    "dropout": 0.2,
    "lr": 5.0
}


model = TransformerModel(
    ntoken = config["ntoken"],
    d_model = config["d_model"],
    nhead = config["nhead"],
    d_hid = config["d_hid"],
    nlayers = config["nlayers"],
    dropout = config["dropout"]
    ).to(device)

## Training

In [None]:
import time

criterion = nn.CrossEntropyLoss()
lr = config["lr"]  # learning rate
# Define the optimizer using stochastic gradient descent (SGD)
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
# Define a learning rate scheduler that decreases the learning rate over time
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1.0, gamma=0.95)

def train(model):
    model.train()  # turn on train mode
    total_loss = 0.
    log_interval = 200
    start_time = time.time()

    num_batches = len(train_data) // bptt
    for batch, i in enumerate(range(0, train_data.size(0) - 1, bptt)):
        data, targets = get_batch(train_data, i)
        output = model(data)
        output_flat = output.view(-1, config["ntoken"])
        loss = criterion(output_flat, targets)

        optimizer.zero_grad()
        loss.backward()

        # Clip the gradients to prevent exploding gradients problem
        torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
        optimizer.step()

        total_loss += loss.item()
        if batch % log_interval == 0 and batch > 0:
            lr = scheduler.get_last_lr()[0]
            ms_per_batch = (time.time() - start_time) * 1000 / log_interval
            cur_loss = total_loss / log_interval
            # Calculate perplexity
            ppl = math.exp(cur_loss)

            # Print the training progress and metrics
            print(f'| epoch {epoch:3d} | {batch:5d}/{num_batches:5d} batches | '
                  f'lr {lr:02.2f} | ms/batch {ms_per_batch:5.2f} | '
                  f'loss {cur_loss:5.2f} | ppl {ppl:8.2f}')
            total_loss = 0
            start_time = time.time()

def evaluate(model, eval_data):
    model.eval()  # Turn on evaluation mode
    total_loss = 0.
    with torch.no_grad():
        for i in range(0, eval_data.size(0) - 1, bptt):
            data, targets = get_batch(eval_data, i)
            seq_len = data.size(0)

            output = model(data)
            output_flat = output.view(-1, config["ntoken"])

            # Compute the loss and accumulate it
            total_loss += seq_len * criterion(output_flat, targets).item()
    # Return the average loss over the evaluation data
    return total_loss / (len(eval_data) - 1)
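With `StepLR(optimizer, 1.0, gamma=0.95)` and one `scheduler.step()` per epoch, the learning rate is multiplied by 0.95 after every epoch. The sketch below tabulates the resulting values from the configured starting rate of 5.0; `lr_at_epoch` is a hypothetical helper, not a PyTorch function.

```python
def lr_at_epoch(epoch: int, base_lr: float = 5.0, gamma: float = 0.95) -> float:
    """Learning rate used during `epoch` (1-indexed) with one decay step per epoch."""
    return base_lr * gamma ** (epoch - 1)

for epoch in (1, 2, 10, 20):
    print(f"epoch {epoch:2d}: lr = {lr_at_epoch(epoch):.3f}")
```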

In [None]:
best_val_loss = float('inf')
epochs = 20

best_model_params_path = os.path.join("/content/drive/MyDrive/Colab_Notebooks/Outsystems/RNN_for_lang_modelling/saved_models/model.pth")

for epoch in range(1, epochs + 1):
  epoch_start_time = time.time()
  train(model)
  val_loss = evaluate(model, val_data)
  val_ppl = math.exp(val_loss)
  elapsed = time.time() - epoch_start_time
  print('-' * 89)
  print(f'| end of epoch {epoch:3d} | time: {elapsed:5.2f}s | '
        f'valid loss {val_loss:5.2f} | valid ppl {val_ppl:8.2f}')
  print('-' * 89)

  if val_loss < best_val_loss:
    best_val_loss = val_loss
    torch.save(model.state_dict(), best_model_params_path)

  scheduler.step()

# Reload the best checkpoint once training has finished
model.load_state_dict(torch.load(best_model_params_path))

-----------------------------------------------------------------------------------------
| end of epoch   1 | time:  6.74s | valid loss 22.67 | valid ppl 6993987391.52
-----------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
| end of epoch   2 | time:  6.12s | valid loss  7.54 | valid ppl  1882.11
-----------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
| end of epoch   3 | time:  6.14s | valid loss  7.00 | valid ppl  1098.20
-----------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
| end of epoch   4 | time:  6.24s | valid loss  6.73 | valid ppl   840.09
---------------------------------------------------------------------
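The perplexity in the log is the exponential of the mean cross-entropy loss (in nats), so the two columns carry the same information. The sketch below shows the relationship and a uniform-guess baseline, assuming the whitespace vocabulary size of 15344 reported earlier; `perplexity` is a hypothetical helper.

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(mean_cross_entropy)

# A model that guesses uniformly over V tokens has loss ln(V) and perplexity V
vocab_size = 15344
uniform_loss = math.log(vocab_size)
print(f"uniform baseline: loss {uniform_loss:.2f}, ppl {perplexity(uniform_loss):.0f}")

# The logged losses are rounded to two decimals, so exp() of them only
# approximately reproduces the logged perplexities
print(f"loss 6.73 -> ppl {perplexity(6.73):.1f}")
```

Any validation perplexity well below the vocabulary size therefore indicates that the model has learned more than uniform guessing.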

## Inference

In [None]:
save_path = "/content/drive/MyDrive/Colab_Notebooks/Outsystems/RNN_for_lang_modelling/saved_models/model.pth"
model = TransformerModel(
    ntoken = config["ntoken"],
    d_model = config["d_model"],
    nhead = config["nhead"],
    d_hid = config["d_hid"],
    nlayers = config["nlayers"],
    dropout = config["dropout"]
    ).to(device)

with torch.no_grad():
  model.load_state_dict(torch.load(save_path))

In [None]:
# prompt = tokenizer(test[1][:70].strip())
# context = torch.tensor([vocab[item] for item in prompt], dtype=torch.long).unsqueeze(0).cuda()
# itos = vocab.get_itos()
# decode = lambda l: ''.join([itos[i] for i in l])
# print(decode(model.generate(context, max_new_tokens=100)[0].tolist()))

In [None]:
def generate_text(prompt, max_seq_len, temperature, model, tokenizer, vocab, device=device, beam_width=4, seed=0):
    # Set the model to evaluation mode
    model.eval()
    # Set the random seed if provided
    if seed is not None:
        torch.manual_seed(seed)

    # Tokenize the prompt and convert to indices using the vocabulary
    tokens = tokenizer(prompt.strip())
    prompt_indices = [vocab[t] for t in tokens]

    with torch.no_grad():
        beam = [(prompt_indices, 0.0)] # Initialize the beam with the prompt
        completed_sequences = [] # Store completed sequences

        for _ in range(max_seq_len):
            candidates = [] # Store candidate sequences for the next step

            # Expand the beam by generating new candidates
            for seq_indices, seq_score in beam:
                input_tensor = torch.LongTensor(seq_indices).unsqueeze(1).to(device)
                output = model(input_tensor)
                logits = output[-1, -1, :]

                probs = torch.softmax(logits / temperature, dim=-1)
                topk_probs, topk_indices = torch.topk(probs, beam_width)

                # Generate new candidates based on top-k probabilities
                for prob, index in zip(topk_probs.squeeze(), topk_indices.squeeze()):
                    new_seq_indices = seq_indices + [index.item()]
                    new_seq_score = seq_score - torch.log(prob).item()
                    candidates.append((new_seq_indices, new_seq_score))

            # Select top-k candidates for the next iteration
            candidates = sorted(candidates, key=lambda x: x[1])[:beam_width]
            beam = []

            # Check if any candidates have completed sequences
            for candidate_indices, candidate_score in candidates:
                if candidate_indices[-1] == vocab["[EOS]"]:
                    completed_sequences.append((candidate_indices, candidate_score))
                else:
                    beam.append((candidate_indices, candidate_score))

            # Break the loop if enough completed sequences have been found
            if len(completed_sequences) >= beam_width:
                break
    try:
      # Select the best completed sequence with the lowest score
      best_sequence_indices, _ = min(completed_sequences, key=lambda x: x[1])
    except ValueError:
      # If no completed sequences are found, select the best candidate from the beam
      best_sequence_indices, _ = min(beam, key=lambda x: x[1])

    # Convert the sequence indices back to tokens
    itos = vocab.get_itos()
    generated_text = [itos[i] for i in best_sequence_indices]
    return generated_text

In [None]:
prompt = test[1][:70]
max_seq_len = 100

temperatures = [0.5, 0.7, 0.75, 0.8, 1.0]
for temperature in temperatures:
  generation = generate_text(prompt, max_seq_len, temperature, model, tokenizer, vocab)
  print(str(temperature)+'\n'+' '.join(generation)+'\n')

0.5
 how  '  s   one   to   know  ?   I  '  d   meet   you   where   the   spirit   meets   the   bones   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a 

0.7
 how  '  s   one   to   know  ?   I  '  d   meet   you   where   the   spirit   meets   the   bones   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   a   in   
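Temperature rescales the logits before the softmax: values below 1 sharpen the distribution, values above 1 flatten it. Dividing by a positive constant never changes the ordering of the tokens, so the top-k candidates picked at each beam step are the same for every temperature and only their scores differ, which may be one reason the outputs above look alike. A minimal sketch with made-up logits follows; `softmax_with_temperature` is a hypothetical helper.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature, computed stably in plain Python."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three tokens
for t in (0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    ranking = sorted(range(len(probs)), key=lambda i: -probs[i])
    print(t, [round(p, 3) for p in probs], ranking)
```

Sampling from the distribution, rather than always taking the top-k, is what makes temperature affect which tokens are chosen.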

In [None]:
prompt

"How's one to know? I'd meet you where the spirit meets the bones In a "

Because the training vocabulary is limited and the dataset is small, the generated text is not cohesive. This is corroborated by the high perplexity (57.50). We could tackle this problem with data augmentation or by using a task-agnostic pre-trained language model. In the second notebook, "GPT2_for_LM", we fine-tune a GPT2 model on the text.txt file to generate song lyrics.