## Converting a fine-tuned T5 model to ONNX format with the fastT5 library

*   The library is available on [PyPI](https://pypi.org/project/fastt5/), and its source code is on [GitHub](https://github.com/Ki6an/fastT5).
*   It converts the model to ONNX and quantizes it, which reduces the model size and speeds up inference. Quantization slightly reduces accuracy, however.
*   Works with torch `1.13.1`.


Note: the library cannot be used if the saved model weights contain `nan` values.
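One way to check for this before exporting is to scan the model's state dict. The sketch below is not part of fastT5; the helper name `find_nan_parameters` and the synthetic `demo_state_dict` are made up for illustration.

```python
import torch


def find_nan_parameters(state_dict):
    """Return the names of floating-point tensors that contain at least one NaN."""
    bad_names = []
    for name, tensor in state_dict.items():
        # Skip non-tensor entries and integer buffers, which cannot hold NaN.
        if not torch.is_tensor(tensor) or not tensor.is_floating_point():
            continue
        if torch.isnan(tensor).any().item():
            bad_names.append(name)
    return bad_names


# Synthetic stand-in for a real state dict, used only to demonstrate the check.
demo_state_dict = {
    "ok.weight": torch.ones(2, 2),
    "bad.weight": torch.tensor([1.0, float("nan")]),
}
print(find_nan_parameters(demo_state_dict))  # ['bad.weight']
```

For a real checkpoint, and assuming it was saved with `save_pretrained`, the state dict could come from `T5ForConditionalGeneration.from_pretrained(model_path).state_dict()`. An empty result suggests the weights are safe to export.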

In [1]:
import torch

print(torch.__version__)

1.13.1+cu116


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
 !pip install fastt5

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting fastt5
  Downloading fastt5-0.1.4.tar.gz (18 kB)
  Preparing metadata (setup.py) ... done
Collecting onnx
  Downloading onnx-1.13.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
Collecting onnxruntime==1.10.0
  Downloading onnxruntime-1.10.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB)
Collecting transformers>4.6.1
  Downloading transformers-4.26.1-py3-none-any.whl (6.3 MB)
Collecting progress>=1.5
  Downloading progress-1.6.tar.gz (7.8 kB)
  Preparing me

### Exporting to ONNX

In [4]:
from fastT5 import export_and_get_onnx_model

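# Path to the fine-tuned T5 checkpoint directory on the mounted Drive.
# model_name = 't5-small'
model_path = 'drive/MyDrive/t5-model/t5'
# Export the model to ONNX, quantize it, and load the result for inference.
model = export_and_get_onnx_model(model_path)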


Exporting to onnx... |################################| 3/3
Quantizing... |################################| 3/3

Setting up onnx model...
Done!


In [5]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path)
t_input = "generate question: The universe is a dark forest."
token = tokenizer(t_input, return_tensors='pt')

tokens = model.generate(input_ids=token['input_ids'],
                        attention_mask=token['attention_mask'],
                        num_beams=2)

output = tokenizer.decode(tokens.squeeze(), skip_special_tokens=True)
print(output)

Don't you know that?


In [6]:
from transformers import T5Config

config = T5Config.from_pretrained(model_path)

In [7]:
# Save the tokenizer and config alongside the exported ONNX files in models/.
tokenizer.save_pretrained('models/')
config.save_pretrained('models/')

In [8]:
!du -sh models

2.0G	models


In [12]:
!cp -r models '/content/drive/My Drive/t5-model/onnx-model'

### ONNX inference

In [13]:
from fastT5 import get_onnx_runtime_sessions, OnnxT5
from transformers import AutoTokenizer
from pathlib import Path
import os

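# Directory holding the quantized ONNX files, tokenizer and config.
# onnx_model_path = 'models'
onnx_model_path = 'drive/MyDrive/t5-model/onnx-model/models'
# File-name prefix of the exported ONNX files.
onnx_model_name = Path('t5').stem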

encoder_path = os.path.join(onnx_model_path, f"{onnx_model_name}-encoder-quantized.onnx")
decoder_path = os.path.join(onnx_model_path, f"{onnx_model_name}-decoder-quantized.onnx")
init_decoder_path = os.path.join(onnx_model_path, f"{onnx_model_name}-init-decoder-quantized.onnx")

model_paths = encoder_path, decoder_path, init_decoder_path
model_sessions = get_onnx_runtime_sessions(model_paths)
model = OnnxT5(onnx_model_path, model_sessions)

tokenizer = AutoTokenizer.from_pretrained(onnx_model_path)
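
Because the cell above depends on three files following a fixed naming pattern, it can help to build the paths in one place and confirm they exist before creating the runtime sessions. This is a minimal sketch using only the standard library; the helper names `quantized_model_paths` and `missing_files` are made up for illustration and are not part of fastT5.

```python
from pathlib import Path


def quantized_model_paths(model_dir, model_name):
    """Build the encoder, decoder and init-decoder paths, in that order."""
    model_dir = Path(model_dir)
    suffixes = ("encoder", "decoder", "init-decoder")
    return tuple(model_dir / f"{model_name}-{suffix}-quantized.onnx" for suffix in suffixes)


def missing_files(paths):
    """Return the paths that do not exist on disk."""
    return [path for path in paths if not Path(path).is_file()]


paths = quantized_model_paths("drive/MyDrive/t5-model/onnx-model/models", "t5")
for path in paths:
    print(path)

missing = missing_files(paths)
if missing:
    print("Missing files:", [str(path) for path in missing])
```

If nothing is missing, the paths can be handed to `get_onnx_runtime_sessions`. Converting them with `str(...)` first is the cautious choice, since whether `Path` objects are accepted there is not confirmed here.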

In [14]:
%%time
# text = "I need to leave now. What time do you need to leave? at 2 o'clock"
text = "It's a lovely day"
inputs = tokenizer("generate question: "+text, return_tensors="pt").input_ids
outputs = model.generate(
    inputs, 
    num_beams=3, 
    max_length=100, 
    early_stopping=True, 
    num_return_sequences=1)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

I hope it stays nice like this.
CPU times: user 279 ms, sys: 0 ns, total: 279 ms
Wall time: 286 ms
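
`%%time` reports a single run, so the figure above can include one-off start-up cost. For a steadier number, a small helper can time several runs after a warm-up call. This is a sketch under assumptions: `benchmark` is a hypothetical helper, and the lambda is a placeholder workload standing in for the `model.generate(...)` call.

```python
import statistics
import time


def benchmark(fn, runs=5, warmup=1):
    """Call fn repeatedly and return the per-run wall-clock times in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


# Placeholder workload; replace it with a call to model.generate(...).
timings = benchmark(lambda: sum(range(100_000)), runs=5, warmup=1)
print(f"mean {statistics.mean(timings) * 1000:.2f} ms, "
      f"min {min(timings) * 1000:.2f} ms over {len(timings)} runs")
```

In the notebook, `fn` could be something like `lambda: model.generate(inputs, num_beams=3, max_length=100)`, so the quantized model and the original PyTorch model can be timed on the same input.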


In [None]:
# ls drive/MyDrive/t5-finetuned/onnx-model/models

In [9]:
rm -f -r models/*decoder.onnx

In [10]:
rm -f -r models/*encoder.onnx
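
The two `rm` cells above delete the ONNX files whose names end in `decoder.onnx` or `encoder.onnx`, which leaves the `-quantized` files used for inference. Judging by their execution counts, they ran before the copy to Drive. A Python equivalent using the same glob patterns might look like the sketch below. The helper name `remove_unquantized_onnx` and the file names in the demo are assumptions for illustration, and the demo runs in a temporary directory so nothing real is deleted.

```python
import tempfile
from pathlib import Path


def remove_unquantized_onnx(model_dir):
    """Delete files matching *decoder.onnx and *encoder.onnx; return their names."""
    removed = []
    for pattern in ("*decoder.onnx", "*encoder.onnx"):
        for path in sorted(Path(model_dir).glob(pattern)):
            path.unlink()
            removed.append(path.name)
    return removed


# Demo on empty placeholder files (file names are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("t5-encoder.onnx", "t5-decoder.onnx",
                 "t5-init-decoder.onnx", "t5-encoder-quantized.onnx"):
        (Path(tmp) / name).touch()
    print("removed:", remove_unquantized_onnx(tmp))
    print("kept:", sorted(p.name for p in Path(tmp).iterdir()))
```

Files ending in `-quantized.onnx` match neither pattern, so they are left in place.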