# LSTM Language Models

Experimentation of LSTM Language models with a sample text corpus, and try to generate the prompt.

Assignment A2 : st125214

# torch and torchtext compatiblilty

Torch has compatibility requirements with torchtext.

I prepare my python environment first for the compatibality.

Torch version       = 2.2.0+cu118

Torchtext version   = 0.16.2+cpu

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
print("Torch version:", torch.__version__)
torch.backends.cudnn.deterministic = False # Choose this because speed is a priority and exact reproducibility isn't critical.

Torch version: 2.2.0+cu118


In [2]:
import torchtext
print("TorchText version:", torchtext.__version__)

TorchText version: 0.16.2+cpu


In [3]:
import datasets, math
from tqdm import tqdm

In [4]:
# assign the device as cuda if available, else cpu
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

cuda


In [5]:
SEED = 69
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

## 1. Load data - James Bond Books

We will be using 13 James Books books written by ian fleming, which contains a large corpus of text, perfect for language modeling task.  This time, we will use the `datasets` library from HuggingFace to load.

Total books - 13 novels

Size of finished text file ~ 5 Mb

Number of rows (vectors) - 36,213

In [6]:
import os

# Marked out the file combination code segment here
'''
# Combine all text files in the 'james bond' folder into a single text file
folder_path = 'james bond'
combined_text = ''

for filename in os.listdir(folder_path):
    if filename.endswith('.txt'):
        with open(os.path.join(folder_path, filename), 'r', encoding='utf-8') as file:
            combined_text += file.read() + '\n'

# Save the combined text to a single file
with open('combined_james_bond.txt', 'w', encoding='utf-8') as file:
    file.write(combined_text)
'''

# load the saved combined text file
dataset = datasets.load_dataset('text', data_files={'train': 'combined_james_bond.txt'})
print(dataset)

Generating train split: 0 examples [00:00, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 35890
    })
})


In [7]:
# determine how many text vectors (rows) are in the dataset
print(dataset['train'].shape)

(35890, 1)


## 2. Preprocessing

### Split the data set into Train, Validate and Test

Since the original text file is imported only as 'train', we will split it into train, validate and test.

We will first split main set into train and test, test set occupying 15% of main set.

Later, train set is split again into train and validate set, validate set occupying 15% of train set.

### Tokenizing

Simply tokenize the given text to tokens.

Uses torchtext to get a simple English tokenizer, which tokenizes text based on basic rules (e.g., splitting by spaces and handling punctuation).

It returns a dictionary containing the tokenized output.

In [8]:

# Step 1: First split (85% train, 15% test)
split1 = dataset['train'].train_test_split(test_size=0.15, seed=SEED)
dataset = datasets.DatasetDict({
    'train': split1['train'],
    'test': split1['test']  # 15% test set stored
})

# Step 2: Further split train into train-validation (85% train, 15% validation)
split2 = dataset['train'].train_test_split(test_size=0.15, seed=SEED)
dataset = datasets.DatasetDict({
    'train': split2['train'],  
    'validation': split2['test'],
    'test': dataset['test']  
})

print(dataset)

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 25930
    })
    validation: Dataset({
        features: ['text'],
        num_rows: 4576
    })
    test: Dataset({
        features: ['text'],
        num_rows: 5384
    })
})


In [9]:
# Tokenize the text data using the basic_english tokenizer

tokenizer = torchtext.data.utils.get_tokenizer('basic_english')

tokenize_data = lambda example, tokenizer: {'tokens': tokenizer(example['text'])}

tokenized_dataset = dataset.map(tokenize_data, remove_columns=['text'], fn_kwargs={'tokenizer': tokenizer})

Map:   0%|          | 0/25930 [00:00<?, ? examples/s]

Map:   0%|          | 0/4576 [00:00<?, ? examples/s]

Map:   0%|          | 0/5384 [00:00<?, ? examples/s]

In [10]:
print(tokenized_dataset['train'][22:25]['tokens'])

[[], [], ['bond', 'took', 'the', 'sharp', 'corner', 'and', 'accelerated', 'up', 'to', 'fifty', '.', 'the', 'viaduct', 'carrying', 'the', 'paris', 'autoroute', 'loomed', 'up', 'ahead', '.', 'the', 'dark', 'mouth', 'of', 'the', 'tunnel', 'beneath', 'it', 'opened', 'and', 'swallowed', 'him', '.', 'the', 'noise', 'of', 'his', 'exhaust', 'was', 'gigantic', ',', 'and', 'for', 'an', 'instant', 'there', 'was', 'a', 'tunnel', 'smell', 'of', 'cold', 'and', 'damp', '.', 'then', 'he', 'was', 'out', 'in', 'the', 'sunshine', 'again', 'and', 'immediately', 'across', 'the', 'carrefour', 'royal', '.', 'ahead', 'the', 'oily', 'tarmac', 'glittered', 'dead', 'straight', 'for', 'two', 'miles', 'through', 'the', 'enchanted', 'forest', 'and', 'there', 'was', 'a', 'sweet', 'smell', 'of', 'leaves', 'and', 'dew', '.', 'bond', 'cut', 'his', 'speed', 'to', 'forty', '.', 'the', 'driving-mirror', 'by', 'his', 'left', 'hand', 'shivered', 'slightly', 'with', 'his', 'speed', '.', 'it', 'showed', 'nothing', 'but', 'an'

### Numericalizing

We will tell torchtext to add any word that has occurred at least three times in the dataset to the vocabulary because otherwise it would be too big.  Also we shall make sure to add `unk` and `eos`.

In [11]:
vocab = torchtext.vocab.build_vocab_from_iterator(tokenized_dataset['train']['tokens'], min_freq=3)
vocab.insert_token('<unk>', 0)
vocab.insert_token('<eos>', 1)
vocab.set_default_index(vocab['<unk>'])

In [12]:
print(len(vocab))

11932


In [13]:
print(vocab.get_itos()[:10])

['<unk>', '<eos>', '.', 'the', ',', 'and', 'of', "'", 'a', 'to']


## 3. Prepare the batch loader

### Prepare data

Data was prepared using batch size of 128, and tokenized sets for train, validate and test sets were created.  

In [14]:
# Function to convert text data to numerical data from the vocabulary

def get_data(dataset, vocab, batch_size):
    data = []
    for example in dataset:
        if example['tokens']:
            tokens = example['tokens'].append('<eos>')
            tokens = [vocab[token] for token in example['tokens']]
            data.extend(tokens)
    data = torch.LongTensor(data)
    num_batches = data.shape[0] // batch_size
    data = data[:num_batches * batch_size]
    data = data.view(batch_size, num_batches) #view vs. reshape (whether data is contiguous)
    return data #[batch size, seq len]

In [15]:
print(tokenized_dataset.keys())

dict_keys(['train', 'validation', 'test'])


In [16]:
batch_size = 128
train_data = get_data(tokenized_dataset['train'], vocab, batch_size)
valid_data = get_data(tokenized_dataset['validation'], vocab, batch_size)
test_data  = get_data(tokenized_dataset['test'],  vocab, batch_size)

In [17]:
train_data.shape

torch.Size([128, 6117])

## 4. Modeling 

**Model Architecture**

The model is an **LSTM-based Language Model** implemented in PyTorch. The model has following key components:

1. **Embedding Layer**:
   - Converts token indices into dense vectors of size `emb_dim`.
   - Input: `[batch_size, seq_len]`
   - Output: `[batch_size, seq_len, emb_dim]`

2. **LSTM Layer**:
   - Processes the sequence of embeddings to capture contextual information.
   - Input: `[batch_size, seq_len, emb_dim]`
   - Output:
     - `output`: `[batch_size, seq_len, hid_dim]` (hidden states for each time step)
     - `hidden`: `[num_layers, batch_size, hid_dim]` (final hidden state and cell state)

3. **Dropout Layer**:
   - Applied after the embedding and LSTM layers to prevent overfitting.

4. **Fully Connected Layer**:
   - Maps the LSTM output to the vocabulary size (`vocab_size`).
   - Input: `[batch_size, seq_len, hid_dim]`
   - Output: `[batch_size, seq_len, vocab_size]`

5. **Initialization**:
   - Weights are initialized uniformly to small values for stable training.


**Training Process**

### Data Preparation
1. **Tokenization and Numericalization**:
   - Text data is tokenized and converted into integer indices using a vocabulary.
   - Special tokens like `<eos>` (end of sequence) are added.

2. **Batching**:
   - Numericalized data is split into batches of size `batch_size`.
   - Each batch is divided into sequences of length `seq_len`.

### Training Loop
1. **Initialization**:
   - The LSTM hidden state is initialized to zeros at the start of each epoch.

2. **Forward Pass**:
   - The model predicts the next token for each position in the sequence.
   - The output is reshaped for the loss function.

3. **Loss Calculation**:
   - **CrossEntropyLoss** computes the difference between predicted and actual tokens.
   - Loss is averaged over the sequence length.

4. **Backpropagation**:
   - Gradients are computed and clipped to prevent exploding gradients.
   - The optimizer updates the model parameters.

5. **Evaluation**:
   - The model is evaluated on the validation set after each epoch.
   - The learning rate is adjusted using a scheduler if the validation loss plateaus.

6. **Checkpointing**:
   - The model with the best validation loss is saved.


## Hyperparameters
- **`vocab_size`**: Size of the vocabulary.(12260)
- **`emb_dim`**: Dimension of the embedding layer (1024).
- **`hid_dim`**: Dimension of the LSTM hidden state (1024).
- **`num_layers`**: Number of LSTM layers (2).
- **`dropout_rate`**: Dropout probability (0.65).
- **`batch_size`**: Number of sequences per batch (128).
- **`seq_len`**: Length of each sequence (50).
- **`clip`**: Gradient clipping threshold (0.25).
- **`lr`**: Learning rate (1e-3).
- **`n_epochs`**: Number of training epochs (25).

---

## Metrics
- **Perplexity**:
  - Used to evaluate the model's performance.
  - Defined as `exp(loss)`.
  - Lower perplexity indicates better performance.

---

## Training Output
During training, the following metrics are printed for each epoch:
- **Train Perplexity**: Perplexity on the training set.
- **Valid Perplexity**: Perplexity on the validation set.

Total training time is calculated and later displayed (for the sake of experimentation).


In [18]:
class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim, num_layers, dropout_rate):
        super().__init__()
        self.num_layers = num_layers
        self.hid_dim    = hid_dim
        self.emb_dim    = emb_dim
        
        self.embedding  = nn.Embedding(vocab_size, emb_dim)
        self.lstm       = nn.LSTM(emb_dim, hid_dim, num_layers=num_layers, dropout=dropout_rate, batch_first=True)
        self.dropout    = nn.Dropout(dropout_rate)
        self.fc         = nn.Linear(hid_dim, vocab_size)
        
        self.init_weights()
    
    def init_weights(self):
        init_range_emb = 0.1
        init_range_other = 1/math.sqrt(self.hid_dim)
        self.embedding.weight.data.uniform_(-init_range_emb, init_range_other)
        self.fc.weight.data.uniform_(-init_range_other, init_range_other)
        self.fc.bias.data.zero_()
        for i in range(self.num_layers):
            self.lstm.all_weights[i][0] = torch.FloatTensor(self.emb_dim,
                self.hid_dim).uniform_(-init_range_other, init_range_other) #We
            self.lstm.all_weights[i][1] = torch.FloatTensor(self.hid_dim,   
                self.hid_dim).uniform_(-init_range_other, init_range_other) #Wh
    
    def init_hidden(self, batch_size, device):
        hidden = torch.zeros(self.num_layers, batch_size, self.hid_dim).to(device)
        cell   = torch.zeros(self.num_layers, batch_size, self.hid_dim).to(device)
        return hidden, cell
        
    def detach_hidden(self, hidden):
        hidden, cell = hidden
        hidden = hidden.detach() #not to be used for gradient computation
        cell   = cell.detach()
        return hidden, cell
        
    def forward(self, src, hidden):
        #src: [batch_size, seq len]
        embedding = self.dropout(self.embedding(src)) #harry potter is
        #embedding: [batch-size, seq len, emb dim]
        output, hidden = self.lstm(embedding, hidden)
        #ouput: [batch size, seq len, hid dim]
        #hidden: [num_layers * direction, seq len, hid_dim]
        output = self.dropout(output)
        prediction =self.fc(output)
        #prediction: [batch_size, seq_len, vocab_size]
        return prediction, hidden

## 5. Training 

Follows very basic procedure.  One note is that some of the sequences that will be fed to the model may involve parts from different sequences in the original dataset or be a subset of one (depending on the decoding length). For this reason we will reset the hidden state every epoch, this is like assuming that the next batch of sequences is probably always a follow up on the previous in the original dataset.

In [19]:
vocab_size = len(vocab)
emb_dim = 1024                # 400 in the paper
hid_dim = 1024                # 1150 in the paper
num_layers = 2                # 3 in the paper
dropout_rate = 0.65              
lr = 1e-3                     

In [20]:
# Save the model hyperparameters for later use
import pickle

Data = {
    'vocab_size': vocab_size,
    'emb_dim': emb_dim,
    'hid_dim': hid_dim,
    'num_layers': num_layers,
    'dropout_rate': dropout_rate,
    'tokenizer': tokenizer,
    'vocab': vocab
}

pickle.dump(Data,open('Data.pkl', 'wb'))

In [21]:
# Load the model hyperparameters and initialize the model

model      = LSTMLanguageModel(vocab_size, emb_dim, hid_dim, num_layers, dropout_rate).to(device)
optimizer  = optim.Adam(model.parameters(), lr=lr)
criterion  = nn.CrossEntropyLoss()
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'The model has {num_params:,} trainable parameters')

The model has 41,242,268 trainable parameters


In [22]:
def get_batch(data, seq_len, idx):
    #data #[batch size, bunch of tokens]
    src    = data[:, idx:idx+seq_len]                   
    target = data[:, idx+1:idx+seq_len+1]  #target simply is ahead of src by 1            
    return src, target

In [23]:
def train(model, data, optimizer, criterion, batch_size, seq_len, clip, device):
    
    epoch_loss = 0
    model.train()
    # drop all batches that are not a multiple of seq_len
    # data #[batch size, seq len]
    num_batches = data.shape[-1]
    data = data[:, :num_batches - (num_batches -1) % seq_len]  #we need to -1 because we start at 0
    num_batches = data.shape[-1]
    
    #reset the hidden every epoch
    hidden = model.init_hidden(batch_size, device)
    
    for idx in tqdm(range(0, num_batches - 1, seq_len), desc='Training: ',leave=False):
        optimizer.zero_grad()
        
        #hidden does not need to be in the computational graph for efficiency
        hidden = model.detach_hidden(hidden)

        src, target = get_batch(data, seq_len, idx) #src, target: [batch size, seq len]
        src, target = src.to(device), target.to(device)
        batch_size = src.shape[0]
        prediction, hidden = model(src, hidden)               

        #need to reshape because criterion expects pred to be 2d and target to be 1d
        prediction = prediction.reshape(batch_size * seq_len, -1)  #prediction: [batch size * seq len, vocab size]  
        target = target.reshape(-1)
        loss = criterion(prediction, target)
        
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()
        epoch_loss += loss.item() * seq_len
    return epoch_loss / num_batches

In [24]:
def evaluate(model, data, criterion, batch_size, seq_len, device):

    epoch_loss = 0
    model.eval()
    num_batches = data.shape[-1]
    data = data[:, :num_batches - (num_batches -1) % seq_len]
    num_batches = data.shape[-1]

    hidden = model.init_hidden(batch_size, device)

    with torch.no_grad():
        for idx in range(0, num_batches - 1, seq_len):
            hidden = model.detach_hidden(hidden)
            src, target = get_batch(data, seq_len, idx)
            src, target = src.to(device), target.to(device)
            batch_size= src.shape[0]

            prediction, hidden = model(src, hidden)
            prediction = prediction.reshape(batch_size * seq_len, -1)
            target = target.reshape(-1)

            loss = criterion(prediction, target)
            epoch_loss += loss.item() * seq_len
    return epoch_loss / num_batches

In [25]:
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs


Here we will be using a `ReduceLROnPlateau` learning scheduler which decreases the learning rate by a factor, if the loss don't improve by a certain epoch.

In [27]:
import time

n_epochs = 25
seq_len  = 50 #<----decoding length
clip    = 0.25

lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=0)

best_valid_loss = float('inf')

start = time.time()

for epoch in range(n_epochs):
    train_loss = train(model, train_data, optimizer, criterion, 
                batch_size, seq_len, clip, device)
    valid_loss = evaluate(model, valid_data, criterion, batch_size, 
                seq_len, device)

    lr_scheduler.step(valid_loss)

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'best-val-lstm_lm_ian_fleming.pt')

    print(f'Epoch: {epoch+1:02}')
    print(f'\tTrain Perplexity: {math.exp(train_loss):.3f}')
    print(f'\tValid Perplexity: {math.exp(valid_loss):.3f}')
    print(f'\tTrain loss : {train_loss:.3f}\tValid loss: {valid_loss:.3f}')
    
# Calculate the elapsed time
end = time.time()
epoch_mins, epoch_secs = epoch_time(start, end)
print(f"Training of {n_epochs} epochs were completed in {epoch_mins}m {epoch_secs}s.")

                                                           

Epoch: 01
	Train Perplexity: 296.300
	Valid Perplexity: 199.111
	Train loss : 5.691	Valid loss: 5.294


                                                           

Epoch: 02
	Train Perplexity: 197.982
	Valid Perplexity: 146.724
	Train loss : 5.288	Valid loss: 4.989


                                                           

Epoch: 03
	Train Perplexity: 157.263
	Valid Perplexity: 125.858
	Train loss : 5.058	Valid loss: 4.835


                                                           

Epoch: 04
	Train Perplexity: 135.572
	Valid Perplexity: 113.747
	Train loss : 4.910	Valid loss: 4.734


                                                           

Epoch: 05
	Train Perplexity: 121.389
	Valid Perplexity: 106.186
	Train loss : 4.799	Valid loss: 4.665


                                                           

Epoch: 06
	Train Perplexity: 111.017
	Valid Perplexity: 101.458
	Train loss : 4.710	Valid loss: 4.620


                                                           

Epoch: 07
	Train Perplexity: 103.012
	Valid Perplexity: 97.178
	Train loss : 4.635	Valid loss: 4.577


                                                           

Epoch: 08
	Train Perplexity: 96.304
	Valid Perplexity: 93.496
	Train loss : 4.568	Valid loss: 4.538


                                                           

Epoch: 09
	Train Perplexity: 90.529
	Valid Perplexity: 91.320
	Train loss : 4.506	Valid loss: 4.514


                                                           

Epoch: 10
	Train Perplexity: 85.590
	Valid Perplexity: 89.120
	Train loss : 4.450	Valid loss: 4.490


                                                           

Epoch: 11
	Train Perplexity: 81.092
	Valid Perplexity: 87.706
	Train loss : 4.396	Valid loss: 4.474


                                                           

Epoch: 12
	Train Perplexity: 77.302
	Valid Perplexity: 86.565
	Train loss : 4.348	Valid loss: 4.461


                                                           

Epoch: 13
	Train Perplexity: 73.619
	Valid Perplexity: 85.491
	Train loss : 4.299	Valid loss: 4.448


                                                           

Epoch: 14
	Train Perplexity: 70.490
	Valid Perplexity: 85.034
	Train loss : 4.255	Valid loss: 4.443


                                                           

Epoch: 15
	Train Perplexity: 67.627
	Valid Perplexity: 84.737
	Train loss : 4.214	Valid loss: 4.440


                                                           

Epoch: 16
	Train Perplexity: 64.959
	Valid Perplexity: 84.536
	Train loss : 4.174	Valid loss: 4.437


                                                           

Epoch: 17
	Train Perplexity: 62.442
	Valid Perplexity: 84.308
	Train loss : 4.134	Valid loss: 4.434


                                                           

Epoch: 18
	Train Perplexity: 60.160
	Valid Perplexity: 83.792
	Train loss : 4.097	Valid loss: 4.428


                                                           

Epoch: 19
	Train Perplexity: 58.042
	Valid Perplexity: 83.443
	Train loss : 4.061	Valid loss: 4.424


                                                           

Epoch: 20
	Train Perplexity: 56.063
	Valid Perplexity: 83.105
	Train loss : 4.026	Valid loss: 4.420


                                                           

Epoch: 21
	Train Perplexity: 54.306
	Valid Perplexity: 83.693
	Train loss : 3.995	Valid loss: 4.427


                                                           

Epoch: 22
	Train Perplexity: 51.466
	Valid Perplexity: 82.603
	Train loss : 3.941	Valid loss: 4.414


                                                           

Epoch: 23
	Train Perplexity: 50.138
	Valid Perplexity: 82.755
	Train loss : 3.915	Valid loss: 4.416


                                                           

Epoch: 24
	Train Perplexity: 48.621
	Valid Perplexity: 82.723
	Train loss : 3.884	Valid loss: 4.416


                                                           

Epoch: 25
	Train Perplexity: 47.915
	Valid Perplexity: 82.437
	Train loss : 3.869	Valid loss: 4.412
Training of 25 epochs were completed in 8m 35s.


## 6. Testing

In [28]:
model.load_state_dict(torch.load('best-val-lstm_lm_ian_fleming.pt',  map_location=device))
test_loss = evaluate(model, test_data, criterion, batch_size, seq_len, device)
print(f'Test Perplexity: {math.exp(test_loss):.3f}')

Test Perplexity: 82.327


## 7. Real-world inference

Here we take the prompt, tokenize, encode and feed it into the model to get the predictions.  We then apply softmax while specifying that we want the output due to the last word in the sequence which represents the prediction for the next word.  We divide the logits by a temperature value to alter the model’s confidence by adjusting the softmax probability distribution.

Once we have the Softmax distribution, we randomly sample it to make our prediction on the next word. If we get <unk> then we give that another try.  Once we get <eos> we stop predicting.
    
We decode the prediction back to strings last lines.

In [29]:
def generate(prompt, max_seq_len, temperature, model, tokenizer, vocab, device, seed=None):
    if seed is not None:
        torch.manual_seed(seed)
    model.eval()
    tokens = tokenizer(prompt)
    indices = [vocab[t] for t in tokens]
    batch_size = 1
    hidden = model.init_hidden(batch_size, device)
    with torch.no_grad():
        for i in range(max_seq_len):
            src = torch.LongTensor([indices]).to(device)
            prediction, hidden = model(src, hidden)
            
            #prediction: [batch size, seq len, vocab size]
            #prediction[:, -1]: [batch size, vocab size] #probability of last vocab
            
            probs = torch.softmax(prediction[:, -1] / temperature, dim=-1)  
            prediction = torch.multinomial(probs, num_samples=1).item()    
            
            while prediction == vocab['<unk>']: #if it is unk, we sample again
                prediction = torch.multinomial(probs, num_samples=1).item()

            if prediction == vocab['<eos>']:    #if it is eos, we stop
                break

            indices.append(prediction) #autoregressive, thus output becomes input

    itos = vocab.get_itos()
    tokens = [itos[i] for i in indices]
    return tokens

### Experimentation

Since james bond books contain mainly of action and fightings, our model will prompt best if we use words related to action scenes...


In [30]:
# experimentation with prompting
prompt = 'Their hands touched lightly and then'
max_seq_len = 30
seed = 69

#smaller the temperature, more diverse tokens but comes 
#with a tradeoff of less-make-sense sentence
temperatures = [1.0, 0.85, 0.7, 0.6, 0.5 ]
for temperature in temperatures:
    generation = generate(prompt, max_seq_len, temperature, model, tokenizer, 
                          vocab, device, seed)
    print(str(temperature)+'\n'+' '.join(generation)+'\n')

1.0
their hands touched lightly and then he brushed it back to hers . he got out , and slit it into the sunshine and began filling her minutely in the air .

0.85
their hands touched lightly and then he brushed it back to hers .

0.7
their hands touched lightly and then he brushed the glass lever out of his pocket and watched it and sat down .

0.6
their hands touched lightly and then he brushed the glass lever out of his pocket and watched the bright powder of the engine . he threw up the safety-catch and cleaned the other back to his

0.5
their hands touched lightly and then he was wearing his hand , but he got to his feet and sat down .



## Experimentation

It was found out that, after experimentation with different prompts, temperature 1.0 returns overfitted text.

Temperature returns between .5 and .7 seems to return more natural responses.