# PyTorch Text - Better Language Modeling
Notebook for following along with the [PyTorch](https://pytorch.org/tutorials/beginner/bettertransformer_tutorial.html) website tutorial on text inference with the Better Transformer (BT) fastpath.

### Choices for data

<br>

### Libraries and Modules
Importing the necessary libraries and modules for the notebook.

In [1]:
```python
# Import cell
import captum
import copy
import json
import matplotlib as mpl
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import math
import numpy as np
import os
import sys
import pandas as pd
import pickle as pk
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchtext
import torchvision
import torchvision.models as models
import torchvision.transforms as transforms

from typing import Tuple
from torch import Tensor
from torch.nn import TransformerEncoder, TransformerEncoderLayer
from torch.utils.data import dataset
from torchtext.models import RobertaClassificationHead
from torchtext.functional import to_tensor

# CUDA is having issues on this PC, so the device is set to CPU manually.
# Automatic selection (note the call parentheses on is_available):
# device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device = 'cpu'
print(f"Device: {device}")

print("Imports complete")
```

```
Device: cpu
Imports complete
```


<br>

### Importing and preparing data sets
Importing and preparing the data for the models.

In [2]:
```python
# Gather datasets and prepare them for consumption
```


In [3]:
```python
# Define small and large example input batches
small_input_batch = ["Hello world", "How are you!"]
big_input_batch = ["Hello world", "How are you!",
                   """`Well, Prince, so Genoa and Lucca are now just family estates of the
Buonapartes. But I warn you, if you don't tell me that this means war,
if you still try to defend the infamies and horrors perpetrated by
that Antichrist- I really believe he is Antichrist- I will have
nothing more to do with you and you are no longer my friend, no longer
my 'faithful slave,' as you call yourself! But how do you do? I see
I have frightened you- sit down and tell me all the news.`

It was in July, 1805, and the speaker was the well-known Anna
Pavlovna Scherer, maid of honor and favorite of the Empress Marya
Fedorovna. With these words she greeted Prince Vasili Kuragin, a man
of high rank and importance, who was the first to arrive at her
reception. Anna Pavlovna had had a cough for some days. She was, as
she said, suffering from la grippe; grippe being then a new word in
St. Petersburg, used only by the elite."""]

print("Data sets successfully imported.")
```

```
Data sets successfully imported.
```


In [4]:
```python
# Loader definitions

print(f"Loaders defined, running on device: {device}")
```

```
Loaders defined, running on device: cpu
```


In [5]:
```python
# Set the seed value for reproducibility
torch.manual_seed(1247)
```

```
<torch._C.Generator at 0x1bf93fc0030>
```

<br>

### Class Definitions
<b>Classes:</b><br>
<ul>
    <li>TransformerModel - Language interpreting model.</li>
    <li>PositionalEncoding - Injects information about the relative or absolute position of tokens in the sequence.</li>
</ul>
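The class definition cell is still empty. As a reference point, the standard sinusoidal positional encoding, as written in the PyTorch "Language Modeling with nn.Transformer" tutorial, can be sketched as below. It adds $\sin(pos / 10000^{2i/d})$ to even embedding dimensions and $\cos(pos / 10000^{2i/d})$ to odd ones. The `(seq_len, batch, d_model)` input layout, the even `d_model`, and the `max_len` default are assumptions carried over from that tutorial, not choices made in this notebook.

```python
import math

import torch
from torch import nn, Tensor


class PositionalEncoding(nn.Module):
    """Adds fixed sinusoidal position information to token embeddings.

    Sketch assuming inputs shaped (seq_len, batch, d_model) and an even d_model.
    """

    def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        # 1 / 10000^(2i / d_model) for each even dimension index 2i
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 0, 1::2] = torch.cos(position * div_term)  # odd dimensions
        # Buffer: saved with the module and moved by .to(), but not trained
        self.register_buffer("pe", pe)

    def forward(self, x: Tensor) -> Tensor:
        # Broadcast-add the encodings for the first seq_len positions
        x = x + self.pe[: x.size(0)]
        return self.dropout(x)


pos_enc = PositionalEncoding(d_model=8, dropout=0.0, max_len=50)
x = torch.zeros(10, 2, 8)
out = pos_enc(x)
print(out.shape)  # torch.Size([10, 2, 8])
```

Registering the table as a buffer rather than a parameter keeps it fixed during training while still letting `model.to(device)` move it along with the weights.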

In [6]:
```python
# Class definition cell

print("Classes defined.")
```

```
Classes defined.
```


<br>

### Calculation functions
<b>Functions:</b><br>
<ul>
    <li></li>
</ul>

In [7]:
```python
# Calculation functions cell

print("Calculation functions defined.")
```

```
Calculation functions defined.
```


<br>

### Plotting functions
<b>Functions:</b>
<ul>
    <li></li>
</ul>

In [8]:
```python
# Plotting functions cell

print("Plotting functions defined.")
```

```
Plotting functions defined.
```


<br>

### Main code
#### Instantiating the model

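In [9]:
```python
# Pretrained XLM-R large encoder bundle
xlmr_large = torchtext.models.XLMR_LARGE_ENCODER
# Binary classification head on top of the 1024-dimensional encoder output
classifier_head = torchtext.models.RobertaClassificationHead(
    num_classes=2, input_dim=1024)
model = xlmr_large.get_model(head=classifier_head)
# Text transform (tokenization and vocabulary lookup) matching the model
transform = xlmr_large.transform()
```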

#### Dataset Setup

In [10]:
```python
input_batch = big_input_batch  # Switch between big_input_batch and small_input_batch
ITERATIONS = 10

# Tokenize and pad the batch; 1 is XLM-R's padding index
model_input = to_tensor(transform(input_batch), padding_value=1)
output = model(model_input)
output.shape
```

```
torch.Size([3, 2])
```
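`to_tensor` pads every tokenized sequence to the length of the longest one in the batch, and `padding_value=1` matches XLM-R's padding index (the model printout later shows `Embedding(250002, 1024, padding_idx=1)`). The sketch below reproduces that padding with plain PyTorch and derives a boolean padding mask from it. `pad_sequence` stands in for `to_tensor`, and the token ids are made up for illustration, not real XLM-R ids. The `aten::eq` call at the top of both profiles further down is consistent with such a mask being computed from the padding index, though that is an inference from the operator name only.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_IDX = 1  # XLM-R padding index, matching padding_value=1 passed to to_tensor

# Hypothetical token-id sequences of different lengths (not real XLM-R ids)
sequences = [
    torch.tensor([0, 35378, 8999, 2]),
    torch.tensor([0, 11249, 621, 2]),
    torch.tensor([0, 4, 5, 6, 7, 8, 9, 2]),
]

# Pad to the longest sequence: shape (batch, max_len)
batch = pad_sequence(sequences, batch_first=True, padding_value=PAD_IDX)

# Boolean mask, True where a position is padding and should be ignored
padding_mask = batch.eq(PAD_IDX)

print(batch.shape)              # torch.Size([3, 8])
print(padding_mask.sum(dim=1))  # tensor([4, 4, 0])
```

With the big input batch, the long War and Peace passage sets the padded length, so the two short greetings are almost entirely padding.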

#### Running Iterations
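The first profiled run below uses the model as loaded, with autograd enabled, so it takes the standard ("slow") path. The BT fastpath is then taken by calling `model.eval()` and disabling gradient tracking with `torch.no_grad()`.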
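Two of the conditions the fastpath depends on can be checked directly: whether the module is in training mode (`module.training`) and whether autograd is recording (`torch.is_grad_enabled()`). The sketch below uses a small stand-alone `nn.TransformerEncoderLayer` instead of XLM-R, with arbitrary sizes. The fastpath has further requirements (for example on `batch_first` and the activation function) that are listed in the PyTorch documentation and not covered here.

```python
import torch
from torch import nn

# Small stand-in encoder layer; sizes are arbitrary
layer = nn.TransformerEncoderLayer(d_model=16, nhead=2, batch_first=True)
x = torch.rand(2, 5, 16)

# Slow path conditions: modules start in training mode and autograd is on
print(layer.training, torch.is_grad_enabled())  # True True

layer.eval()  # leave training mode (this also disables dropout)
with torch.no_grad():  # stop recording operations for autograd
    fast_conditions = (layer.training, torch.is_grad_enabled())
    out = layer(x)

print(fast_conditions)    # (False, False)
print(out.requires_grad)  # False
```

`torch.no_grad()` only affects code inside the `with` block, so gradient tracking is back on afterwards, while `eval()` persists until `train()` is called.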

In [11]:
```python
print("slow path:")
print("==========")
# Profile ITERATIONS forward passes with the model as loaded (autograd on)
with torch.autograd.profiler.profile(use_cuda=False) as prof:
    for i in range(ITERATIONS):
        output = model(model_input)
print(prof)
```

```
slow path:
---------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                       Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  
---------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                   aten::eq         0.00%      28.000us         0.00%      28.000us      28.000us             1  
            aten::embedding         0.00%      25.000us         0.00%     604.000us     604.000us             1  
              aten::reshape         0.00%       3.000us         0.00%       7.000us       7.000us             1  
       aten::_reshape_alias         0.00%       4.000us         0.00%       4.000us       4.000us             1  
         aten::index_select         0.00%     559.000us         0.00%     569.000us     569.000us             1  
                aten::empty         0.00%       2.000us         0.00%       2
```

In [12]:
```python
# Switch to evaluation mode, one of the conditions for the BT fastpath
model.eval()
```

```
RobertaModel(
  (encoder): RobertaEncoder(
    (transformer): TransformerEncoder(
      (token_embedding): Embedding(250002, 1024, padding_idx=1)
      (layers): TransformerEncoder(
        (layers): ModuleList(
          (0): TransformerEncoderLayer(
            (self_attn): MultiheadAttention(
              (out_proj): NonDynamicallyQuantizableLinear(in_features=1024, out_features=1024, bias=True)
            )
            (linear1): Linear(in_features=1024, out_features=4096, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
            (linear2): Linear(in_features=4096, out_features=1024, bias=True)
            (norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (dropout1): Dropout(p=0.1, inplace=False)
            (dropout2): Dropout(p=0.1, inplace=False)
          )
          (1): TransformerEncoderLayer(
            (self_attn): MultiheadAttention(
              (ou
```

In [13]:
```python
print("fast path:")
print("==========")
# Same profiling loop, now in eval mode with autograd disabled
with torch.autograd.profiler.profile(use_cuda=False) as prof:
    with torch.no_grad():
        for i in range(ITERATIONS):
            output = model(model_input)
print(prof)
```

```
fast path:
----------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                    Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  
----------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                aten::eq         0.00%      21.000us         0.00%      21.000us      21.000us             1  
                         aten::embedding         0.00%       6.000us         0.00%     480.000us     480.000us             1  
                           aten::reshape         0.00%       3.000us         0.00%       5.000us       5.000us             1  
                    aten::_reshape_alias         0.00%       2.000us         0.00%       2.000us       2.000us             1  
                      aten::index_select         0.00%     460.000us         0.00%     467.000us    
```
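Printing the profiler object lists every recorded event on its own row (note the `# of Calls` value of 1 on each row in both outputs), which makes the two paths hard to compare. Aggregating events by operator name and keeping only the most expensive rows gives a compact summary. The sketch below does this with the newer `torch.profiler` API on a small stand-in encoder; the model sizes, the sort key and `row_limit=10` are arbitrary choices, not settings from the tutorial.

```python
import torch
from torch import nn
from torch.profiler import ProfilerActivity, profile

# Small stand-in model; the notebook profiles XLM-R instead
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
    num_layers=2,
).eval()
x = torch.rand(4, 10, 32)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    with torch.no_grad():
        for _ in range(5):
            encoder(x)

# Aggregate events by operator name, sort by self CPU time, keep the top 10
table = prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10)
print(table)
```

The same `key_averages().table(...)` call should also work on the `prof` objects produced by `torch.autograd.profiler.profile` in the cells above.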

#### Run and benchmark
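Run and benchmark inference on `device` with and without the BT fastpath. The fastpath can additionally exploit the sparsity of padded batches by converting them to nested tensors, which the underlying `torch.nn.TransformerEncoder` controls through its `enable_nested_tensor` attribute; cell `In [14]` inspects this setting and then sets it to `False`.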
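The profiler adds its own overhead, so for a rough wall-clock comparison of the two paths it can be simpler to time the forward passes directly. A minimal sketch, using a small stand-in encoder and a hypothetical `time_forward` helper (neither is part of the tutorial); the measured times depend on the machine, so no particular speed-up should be expected from these toy sizes.

```python
import time

import torch
from torch import nn


def time_forward(model: nn.Module, x: torch.Tensor, iterations: int = 10) -> float:
    """Hypothetical helper: average seconds per forward pass."""
    model(x)  # warm-up run, excluded from the timing
    start = time.perf_counter()
    for _ in range(iterations):
        model(x)
    return (time.perf_counter() - start) / iterations


encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
x = torch.rand(8, 32, 64)

# Slow path: training mode with autograd recording
encoder.train()
slow = time_forward(encoder, x)

# Fastpath conditions: eval mode with autograd disabled
encoder.eval()
with torch.no_grad():
    fast = time_forward(encoder, x)

print(f"slow path: {slow * 1e3:.2f} ms/iter, fast path: {fast * 1e3:.2f} ms/iter")
```

The warm-up call keeps one-off costs such as memory allocation out of the average.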

In [14]:
```python
print(f"BT sparsity setting: "
      f"{model.encoder.transformer.layers.enable_nested_tensor}")

# Turn off nested-tensor (sparsity) support on the underlying nn.TransformerEncoder
model.encoder.transformer.layers.enable_nested_tensor = False

print(f"BT sparsity setting: "
      f"{model.encoder.transformer.layers.enable_nested_tensor}")
```

```
BT sparsity setting: False
BT sparsity setting: False
```
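With `enable_nested_tensor` on, `nn.TransformerEncoder` can turn a padded batch plus a `src_key_padding_mask` into a nested tensor on the fastpath, so padding positions are skipped rather than computed. The sketch below builds two small stand-alone encoders with identical weights, one with the option on and one with it off, and compares them on the non-padded positions; the sizes and the mask are arbitrary. It passes the option to the constructor instead of setting the attribute afterwards as cell `In [14]` does, because recent PyTorch releases appear to decide at construction time whether nested tensors are used. Treat that as an assumption to check against the installed version.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.TransformerEncoderLayer(d_model=16, nhead=2, batch_first=True)

# Same layer weights; only the nested-tensor option differs
enc_nested = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True).eval()
enc_padded = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=False).eval()
enc_padded.load_state_dict(enc_nested.state_dict())

x = torch.rand(2, 6, 16)
# True marks padding: the second sequence has only 3 real tokens
padding_mask = torch.tensor([
    [False] * 6,
    [False] * 3 + [True] * 3,
])

with torch.no_grad():
    out_nested = enc_nested(x, src_key_padding_mask=padding_mask)
    out_padded = enc_padded(x, src_key_padding_mask=padding_mask)

valid = ~padding_mask  # compare only real (non-padding) positions
print(out_nested.shape)  # torch.Size([2, 6, 16])
print(torch.allclose(out_nested[valid], out_padded[valid], atol=1e-5))
```

Only the real tokens are compared because the values at padded positions are not meaningful; with nested tensors enabled they are typically filled with zeros when the output is converted back to a padded tensor.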


The model can also be sped up by running it on a GPU: when `torch.cuda.is_available()` returns `True`, set `device` to `'cuda'` and move both the model and its input with `model.to(device)` and `model_input = model_input.to(device)`.
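A minimal sketch of the device handling, using a small stand-in layer rather than XLM-R. Note that `torch.cuda.is_available` must be called: the bare function object is always truthy, so omitting the parentheses would always select `'cuda'`. The sketch falls back to the CPU when no GPU is present.

```python
import torch
from torch import nn

# Call is_available(); the uncalled function object would always be truthy
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.TransformerEncoderLayer(d_model=16, nhead=2, batch_first=True)
model_input = torch.rand(2, 5, 16)

# Parameters and inputs must live on the same device
model = model.to(device)
model_input = model_input.to(device)

model.eval()
with torch.no_grad():
    output = model(model_input)

print(output.device)
```

Forgetting to move either the model or the input raises a device-mismatch error on the first forward pass, so both moves belong together.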

<br>