# [Exporting 🤗 Transformers Models](https://huggingface.co/docs/transformers/serialization)

## Using TorchScript in Python

The example below shows how to save and load a traced model, and how to use the trace for inference.

### Saving a model
This snippet shows how to use TorchScript to export a `BertModel`. The `BertModel` is instantiated from a `BertConfig` and then saved to disk under the filename `traced_bert.pt`.

In [1]:
from transformers import BertModel, BertTokenizer, BertConfig
import torch

In [2]:
enc = BertTokenizer.from_pretrained("bert-base-uncased")



In [3]:
# Tokenizing input text
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = enc.tokenize(text)

In [4]:
# Masking one of the input tokens
masked_index = 8
tokenized_text[masked_index] = "[MASK]"
indexed_tokens = enc.convert_tokens_to_ids(tokenized_text)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
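
The hand-written `segments_ids` list can also be derived from the tokens themselves. The following is a minimal sketch and not part of the original notebook: `segment_ids_from_tokens` is a hypothetical helper, and the token list is an assumption about what `enc.tokenize` returns for the sentence above once index 8 has been masked.

```python
def segment_ids_from_tokens(tokens, sep_token="[SEP]"):
    """Assign segment 0 up to and including the first [SEP], and segment 1 afterwards."""
    segment_ids = []
    current_segment = 0
    for token in tokens:
        segment_ids.append(current_segment)
        if token == sep_token:
            # Everything after the first separator belongs to the second sentence.
            current_segment = 1
    return segment_ids


# Assumed tokenizer output for the example sentence (not produced by running the tokenizer here).
tokens = ["[CLS]", "who", "was", "jim", "henson", "?", "[SEP]",
          "jim", "[MASK]", "was", "a", "puppet", "##eer", "[SEP]"]

segment_ids = segment_ids_from_tokens(tokens)
print(segment_ids)
```

Deriving the list this way keeps its length tied to the number of tokens, which matters because the two tensors passed to the model must have the same shape.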

In [5]:
# Creating a dummy input
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
dummy_input = [tokens_tensor, segments_tensors]

In [6]:
# Initializing the model with the torchscript flag
# Flag set to True even though it is not necessary as this model does not have an LM Head.
config = BertConfig(
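    # `vocab_size` replaces the legacy `vocab_size_or_config_json_file` keyword;
    # 30522 matches the bert-base-uncased tokenizer and the embedding size printed below.
    vocab_size=30522,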
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    torchscript=True,
)

In [7]:
# Instantiating the model
model = BertModel(config)

In [8]:
# The model needs to be in evaluation mode
model.eval()

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
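
The model printed above contains several `Dropout` layers, which is one reason it is put in evaluation mode first: tracing records the operations that actually run on the example inputs, so the mode the model is in at trace time matters. The sketch below is a stand-alone illustration using a single `nn.Dropout` layer rather than BERT; the layer, its `p=0.5` and the input tensor are chosen only for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)  # make the training-mode randomness repeatable

dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

dropout.train()
train_out = dropout(x)  # each element is either zeroed or scaled by 1 / (1 - p)

dropout.eval()
eval_out = dropout(x)   # dropout is a no-op in evaluation mode

print(train_out)
print(eval_out)
```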
          

In [9]:
# If you are instantiating the model with *from_pretrained* you can also easily set the TorchScript flag
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [10]:
# Creating the trace
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
torch.jit.save(traced_model, "traced_bert.pt")
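
A quick way to gain confidence in a trace is to compare its outputs with those of the original module on the same inputs. The sketch below does this for a small stand-in module so that it runs without downloading any weights: `TinyEncoder` is hypothetical and only mimics the two-input, two-output shape of `BertModel`; it is not the real architecture.

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for BertModel: takes token ids and segment ids."""

    def __init__(self, vocab_size=100, hidden_size=16):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)
        self.token_type_embeddings = nn.Embedding(2, hidden_size)
        self.dense = nn.Linear(hidden_size, hidden_size)

    def forward(self, input_ids, token_type_ids):
        hidden = self.word_embeddings(input_ids) + self.token_type_embeddings(token_type_ids)
        sequence_output = self.dense(hidden)
        pooled_output = torch.tanh(sequence_output[:, 0])  # pool the first token
        return sequence_output, pooled_output


torch.manual_seed(0)
tiny_model = TinyEncoder().eval()

example_ids = torch.tensor([[1, 5, 7, 2, 9, 2]])
example_segments = torch.tensor([[0, 0, 0, 0, 1, 1]])

# Trace with example inputs, exactly as done for BERT above.
tiny_traced = torch.jit.trace(tiny_model, (example_ids, example_segments))

with torch.no_grad():
    eager_seq, eager_pooled = tiny_model(example_ids, example_segments)
    traced_seq, traced_pooled = tiny_traced(example_ids, example_segments)

outputs_match = (
    torch.allclose(eager_seq, traced_seq)
    and torch.allclose(eager_pooled, traced_pooled)
)
print(outputs_match)
```

The same comparison could be applied to `model` and `traced_model` from the cells above, using `tokens_tensor` and `segments_tensors` as inputs.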

### Loading a model
This snippet shows how to load the `BertModel` that was previously saved to disk as `traced_bert.pt`. It reuses the previously initialised `dummy_input`.

In [11]:
loaded_model = torch.jit.load("traced_bert.pt")
loaded_model.eval()

RecursiveScriptModule(
  original_name=BertModel
  (embeddings): RecursiveScriptModule(
    original_name=BertEmbeddings
    (word_embeddings): RecursiveScriptModule(original_name=Embedding)
    (position_embeddings): RecursiveScriptModule(original_name=Embedding)
    (token_type_embeddings): RecursiveScriptModule(original_name=Embedding)
    (LayerNorm): RecursiveScriptModule(original_name=LayerNorm)
    (dropout): RecursiveScriptModule(original_name=Dropout)
  )
  (encoder): RecursiveScriptModule(
    original_name=BertEncoder
    (layer): RecursiveScriptModule(
      original_name=ModuleList
      (0): RecursiveScriptModule(
        original_name=BertLayer
        (attention): RecursiveScriptModule(
          original_name=BertAttention
          (self): RecursiveScriptModule(
            original_name=BertSelfAttention
            (query): RecursiveScriptModule(original_name=Linear)
            (key): RecursiveScriptModule(original_name=Linear)
            (value): RecursiveScriptM

In [12]:
# The loaded model returns a tuple: the last-layer hidden states and the pooled output
sequence_output, pooled_output = loaded_model(*dummy_input)
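
To check that a save and load round trip preserves behaviour, the reloaded module can be compared with the trace it came from. The sketch below is a minimal illustration under stated assumptions: it uses a single `nn.Linear` layer instead of BERT and an in-memory `io.BytesIO` buffer instead of the `traced_bert.pt` file, so that it runs without touching the disk.

```python
import io

import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(4, 2).eval()
example = torch.randn(1, 4)

traced_layer = torch.jit.trace(layer, example)

# torch.jit.save accepts a path or a file-like object.
buffer = io.BytesIO()
torch.jit.save(traced_layer, buffer)
buffer.seek(0)  # rewind before reading the serialized module back

reloaded_layer = torch.jit.load(buffer)

with torch.no_grad():
    same_output = torch.allclose(traced_layer(example), reloaded_layer(example))
print(same_output)
```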

### Using a traced model for inference
Using the traced model for inference is as simple as calling it like a function, which invokes its `__call__` method:

In [13]:
traced_model(tokens_tensor, segments_tensors),

((tensor([[[-2.5689e-01, -7.3598e-03, -8.9146e-02,  ..., -1.3546e-01,
             2.3597e-01,  2.4208e-01],
           [-5.8262e-01,  3.1923e-01, -2.8020e-01,  ...,  1.0413e-01,
             1.7953e-01, -4.7086e-01],
           [-3.0671e-01, -2.3213e-01, -1.5938e-01,  ...,  7.0993e-02,
             1.4761e-01,  2.7529e-01],
           ...,
           [ 2.0549e-01, -1.6316e-02, -7.0978e-05,  ..., -1.3032e-01,
             6.1008e-01,  4.2999e-01],
           [-4.9530e-01, -4.6195e-01, -2.9027e-01,  ...,  6.3559e-01,
             6.2100e-01,  1.0318e-01],
           [ 8.2051e-01,  1.8250e-01, -1.1302e-01,  ...,  1.5103e-01,
            -7.6513e-01, -1.9481e-02]]], grad_fn=<NativeLayerNormBackward0>),
  tensor([[-4.9859e-01, -1.6913e-01,  8.3044e-01,  7.2490e-02, -4.8807e-01,
           -9.1258e-02,  5.1964e-01,  1.2615e-01,  7.3988e-01, -9.9609e-01,
            3.7945e-01, -5.8106e-01,  9.5275e-01, -6.8154e-01,  7.0220e-01,
           -2.4374e-01,  9.2702e-02, -3.1204e-01,  2.3801e-01, 
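
The `grad_fn` in the output above indicates that autograd history was recorded during the call. For pure inference this bookkeeping is usually unnecessary, and the call can be wrapped in `torch.no_grad()`. The sketch below shows the difference on a small stand-in layer; the `nn.Linear` module and its input are assumptions made for the example, not part of the original notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(4, 2).eval()
example = torch.randn(1, 4)
traced_layer = torch.jit.trace(layer, example)

tracked = traced_layer(example)        # autograd history is recorded
with torch.no_grad():
    untracked = traced_layer(example)  # no autograd history is recorded

print(tracked.requires_grad, untracked.requires_grad)
```

The values are the same in both cases; only the gradient tracking differs.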