# How to trace/convert a Transformer model into a Triton-compatible model

Load necessary libraries

In [None]:
import torch
from transformers import AutoTokenizer, AutoModel
from torch.nn import functional as F

# Load and Convert Hugging Face Model
tokenizer = AutoTokenizer.from_pretrained('deepset/sentence_bert')
model = AutoModel.from_pretrained('deepset/sentence_bert')

In [None]:
# dummy inputs for tracing
sentence = 'Who are you voting for in 2020?'
labels = ['business', 'art & culture', 'politics']

# tokenize the sentence and the labels together, padding or truncating
# every sequence to a fixed length of 256 tokens
inputs = tokenizer.batch_encode_plus([sentence] + labels,
                                     return_tensors='pt', max_length=256,
                                     truncation=True, padding='max_length')
input_ids = inputs['input_ids']
attention_mask = inputs['attention_mask']

In [None]:
# input shapes
input_ids.shape, attention_mask.shape

# Tracing PyTorch Model

Conversion is done using the JIT-traced version of the model. According to PyTorch's documentation, "TorchScript is a way to create serializable and optimizable models from PyTorch code." It lets developers export a model for reuse in other programs, such as efficiency-oriented C++ programs.

Exporting a model requires dummy inputs of a standard length with which to execute the model's forward pass. During that forward pass, PyTorch keeps track of the operations applied to each tensor and records them to create the "trace" of the model.

Because the trace is tied to the dimensions of the dummy inputs, future model inputs are constrained to those dimensions: the traced model will not work for other sequence lengths or batch sizes. It is therefore recommended to trace the model with the largest input dimensions you expect to feed it in the future. Inputs of other lengths can then be padded or truncated to fit.
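To make the fixed-length constraint concrete, here is a minimal pure-Python sketch of padding or truncating a list of token IDs to the traced length. The helper name `pad_or_truncate`, the default `pad_id=0`, and the sample IDs are illustrative assumptions; in this notebook the tokenizer already does this work through `padding='max_length'` and `truncation=True`, and it also keeps special tokens intact, which this naive version does not.

```python
from typing import List, Tuple


def pad_or_truncate(token_ids: List[int], max_length: int,
                    pad_id: int = 0) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), each exactly max_length long.

    Hypothetical helper for illustration only; pad_id=0 is an assumption.
    """
    kept = token_ids[:max_length]            # truncate sequences that are too long
    n_pad = max_length - len(kept)           # number of padding positions needed
    input_ids = kept + [pad_id] * n_pad      # pad on the right
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


# Made-up token IDs, shorter than the target length of 6
ids, mask = pad_or_truncate([7, 8, 9, 10], max_length=6)
print(ids)   # [7, 8, 9, 10, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```

The attention mask tells the model which positions hold real tokens, so padding to the traced length should not change what the model attends to.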

In [None]:
class PyTorch_to_TorchScript(torch.nn.Module):
    """Thin wrapper so the traced model returns a single tensor."""

    def __init__(self):
        super(PyTorch_to_TorchScript, self).__init__()
        self.model = AutoModel.from_pretrained('deepset/sentence_bert')

    def forward(self, data, attention_mask=None):
        # index 0 of the model output is the last hidden state
        return self.model(data, attention_mask)[0]

In [None]:
# wrap the model and switch to eval mode before tracing
pt_model = PyTorch_to_TorchScript().eval()

# collect attributes that are set to None and remove them before tracing
remove_attributes = []
for key, value in vars(pt_model).items():
    if value is None:
        remove_attributes.append(key)

for key in remove_attributes:
    delattr(pt_model, key)

# trace with the dummy inputs and save the result as model.pt in the cwd
traced_script_module = torch.jit.trace(pt_model, (input_ids, attention_mask), strict=False)
traced_script_module.save("./model.pt")
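Before copying a traced model anywhere, it can be worth checking that the saved file reloads and reproduces the original outputs. The sketch below shows one way to do that on a small stand-in module, so it runs without downloading any weights; `TinyEncoder` and `trace_roundtrip` are hypothetical names introduced for this illustration and are not part of the notebook. The same pattern should carry over to the wrapper traced in this notebook, using its dummy `input_ids` and `attention_mask`.

```python
import os
import tempfile

import torch


class TinyEncoder(torch.nn.Module):
    """Hypothetical stand-in for the wrapped transformer."""

    def __init__(self, vocab_size: int = 100, hidden: int = 8):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, hidden)

    def forward(self, input_ids, attention_mask):
        # Zero out the embeddings at padded positions.
        return self.embed(input_ids) * attention_mask.unsqueeze(-1).to(torch.float32)


def trace_roundtrip(batch: int = 4, seq_len: int = 16) -> bool:
    """Trace, save, reload, and compare outputs with the eager model."""
    torch.manual_seed(0)
    model = TinyEncoder().eval()
    input_ids = torch.randint(0, 100, (batch, seq_len))
    attention_mask = torch.ones(batch, seq_len, dtype=torch.long)

    traced = torch.jit.trace(model, (input_ids, attention_mask))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pt")
        traced.save(path)
        reloaded = torch.jit.load(path)

    with torch.no_grad():
        expected = model(input_ids, attention_mask)
        actual = reloaded(input_ids, attention_mask)
    return actual.shape == expected.shape and torch.allclose(actual, expected)


print(trace_roundtrip())
```

A mismatch here would suggest that the trace captured input-dependent control flow or that the module was not in eval mode when traced.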


# Next, save the model in the model repository folder with the following directory structure:

model_repository_path/
|- <pytorch_model_name>/
|  |- config.pbtxt
|  |- 1/
|     |- model.pt
|

In [None]:
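import os
import shutil

# create <repository>/deepset/1, including any missing parent directories;
# exist_ok=True lets this cell be re-run without raising an error
os.makedirs('../../model_repository/deepset/1', exist_ok=True)
shutil.copy('model.pt', '../../model_repository/deepset/1')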

# Writing the Model Configuration File

The server does not know the model's permissible input/output types and shapes, preferred batch sizes, versioning, or platform, so we write these details into a separate configuration file, config.pbtxt.

Configuration file for the Hugging Face deepset/sentence_bert model:

```
name: "deepset"
platform: "pytorch_libtorch"
input [
  {
    name: "input__0"
    data_type: TYPE_INT32
    dims: [4, 256]
  },
  {
    name: "input__1"
    data_type: TYPE_INT32
    dims: [4, 256]
  }
]
output {
  name: "output__0"
  data_type: TYPE_FP32
  dims: [4, 256, 768]
}
```
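Since the `dims` entries have to agree with the shapes used for tracing, it may help to generate the configuration text from those shapes rather than typing it by hand. The following is a rough sketch using only the standard library; `render_config` and `_tensor_block` are hypothetical helpers written for this illustration, not part of Triton or any client library, and they cover only the fields shown in the configuration file here.

```python
from typing import Sequence, Tuple

# (tensor name, Triton data type, dims); a hypothetical alias for this sketch
TensorSpec = Tuple[str, str, Sequence[int]]


def _tensor_block(name: str, data_type: str, dims: Sequence[int]) -> str:
    """Render one input/output entry in protobuf text format."""
    dims_text = ", ".join(str(d) for d in dims)
    return (
        "  {\n"
        f'    name: "{name}"\n'
        f"    data_type: {data_type}\n"
        f"    dims: [{dims_text}]\n"
        "  }"
    )


def render_config(model_name: str,
                  inputs: Sequence[TensorSpec],
                  outputs: Sequence[TensorSpec],
                  platform: str = "pytorch_libtorch") -> str:
    """Build the text of a config.pbtxt with name, platform, inputs, outputs."""
    lines = [f'name: "{model_name}"', f'platform: "{platform}"']
    for field, tensors in (("input", inputs), ("output", outputs)):
        blocks = ",\n".join(_tensor_block(*spec) for spec in tensors)
        lines.append(f"{field} [\n{blocks}\n]")
    return "\n".join(lines) + "\n"


config_text = render_config(
    "deepset",
    inputs=[("input__0", "TYPE_INT32", [4, 256]),
            ("input__1", "TYPE_INT32", [4, 256])],
    outputs=[("output__0", "TYPE_FP32", [4, 256, 768])],
)
print(config_text)
```

The resulting string can be written to `config.pbtxt` inside the model's folder in the repository, next to the `1/` version directory.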