## Benchmark optimized models
* Compare the runtime of Sentence Transformers (ST), ST exported to ONNX, and ST with ONNX graph optimizations

In [1]:
import torch
from sentence_transformers import SentenceTransformer, util
import os
import onnx 
from transformers import AutoTokenizer
from pathlib import Path

# Hide all GPUs so that every benchmark below runs on the CPU
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

print(torch.__version__)

%load_ext autoreload
%autoreload 2

1.12.1+cpu


## Save ST Model

In [2]:
model = SentenceTransformer('sentence-transformers/all-distilroberta-v1')
# train if required

# save
model.save('trained_model')

## Latency

In [3]:
document = "The Beatles were a legendary British rock band that rose to international fame during the 1960s and became one of the most influential and successful musical acts in history. Their story is a tale of talent, innovation, cultural revolution, and enduring impact. Formation and Early Years (1957-1962): The Beatles were formed in Liverpool, England, in 1957. The original members included John Lennon, Paul McCartney, George Harrison, and drummer Pete Best (later replaced by Ringo Starr). The band started as a skiffle group, playing a mix of folk, blues, and rock 'n' roll covers. They honed their skills playing in local clubs and gradually gained a following."

# warmup 
output = model.encode([document] * 5, show_progress_bar=False)

def benchmark(model, document):
    """Encode 200 copies of `document` in batches of 8; run under %%time to measure latency."""
    output = model.encode([document] * 200, batch_size=8, show_progress_bar=True)


### Latency using the original PyTorch model

In [4]:
%%time
benchmark(model, document)

Batches:   0%|          | 0/25 [00:00<?, ?it/s]

CPU times: total: 2min 44s
Wall time: 27.6 s


## Convert to ONNX 

In [5]:
from optim_sentence_transformers import SentenceTransformerOptim, optimize_model

optimize_model(
    model_name_or_path='trained_model',
    pooling_model=None,
    save_dir='onnx',
    optimize_mode='onnx',
)

The argument `from_transformers` is deprecated, and will be removed in optimum 2.0.  Use `export` instead
Framework not specified. Using pt to export to ONNX.
Using framework PyTorch: 1.12.1+cpu
Overriding 1 configuration item(s)
	- use_cache -> False


Found Pooling config. If normalized embeddings required, use normalize_embeddings in model.encode
Converting model to onnx..
Optimized model using onnx saved at onnx


In [6]:
optim_model = SentenceTransformerOptim('onnx')

# warmup (run the ONNX model, not the original PyTorch model)
output = optim_model.encode([document] * 5, show_progress_bar=False)

No pooling model found. Creating a new one with MEAN pooling.
If normalized embeddings are required, set normalize_embeddings=True in model.encode
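
Before comparing speed, it can be worth checking that the exported model still produces the same embeddings. The sketch below is one way to do that, not part of `optim_sentence_transformers`: `max_cosine_gap` is a hypothetical helper, and the synthetic arrays stand in for the outputs of the two `encode` calls. Cosine similarity is used because it ignores vector length, so the check stays meaningful whether or not `normalize_embeddings` is set.

```python
import numpy as np


def max_cosine_gap(reference, candidate):
    """Return 1 - (smallest row-wise cosine similarity) for two (n, d) arrays."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    cand = cand / np.linalg.norm(cand, axis=1, keepdims=True)
    cosine = np.sum(ref * cand, axis=1)
    return float(1.0 - cosine.min())


# Synthetic embeddings stand in for the PyTorch and ONNX encode outputs.
rng = np.random.default_rng(0)
reference = rng.normal(size=(5, 8))
candidate = reference + 1e-6 * rng.normal(size=(5, 8))
gap = max_cosine_gap(reference, candidate)
print(f"largest cosine gap: {gap:.2e}")
```

In the notebook this would be called as `max_cosine_gap(model.encode([document]), optim_model.encode([document]))`, assuming both calls return array-like embeddings of the same shape.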


### Latency using ONNX

In [7]:
%%time
# reuses the `document` defined in the Latency section
benchmark(optim_model, document)


Batches:   0%|          | 0/25 [00:00<?, ?it/s]

CPU times: total: 1min 48s
Wall time: 15.2 s


## Graph Optimization

In [8]:
optimize_model(
    model_name_or_path='trained_model',
    pooling_model=None,
    save_dir='onnx_graph',
    optimize_mode='graph_optim',
)


The argument `from_transformers` is deprecated, and will be removed in optimum 2.0.  Use `export` instead
Framework not specified. Using pt to export to ONNX.
Using framework PyTorch: 1.12.1+cpu
Overriding 1 configuration item(s)
	- use_cache -> False


Found Pooling config. If normalized embeddings required, use normalize_embeddings in model.encode
Optimizing onnx model using graph_optim..


Optimizing model...
Configuration saved in onnx_graph\ort_config.json
Optimized model saved at: onnx_graph (external data format: False; saved all tensor to one file: True)


Optimized model using graph_optim saved at onnx_graph


No pooling model found. Creating a new one with MEAN pooling.
If normalized embeddings are required, set normalize_embeddings=True in model.encode


In [9]:
optim_model2 = SentenceTransformerOptim('onnx_graph')

# warmup (run the graph-optimized model, not the original PyTorch model)
output = optim_model2.encode([document] * 5, show_progress_bar=False)

No pooling model found. Creating a new one with MEAN pooling.
If normalized embeddings are required, set normalize_embeddings=True in model.encode


### Latency using ONNX with graph optimization

In [10]:
%%time
# reuses the `document` defined in the Latency section
benchmark(optim_model2, document)

Batches:   0%|          | 0/25 [00:00<?, ?it/s]

CPU times: total: 1min 38s
Wall time: 13 s
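
The three wall times can be turned into speedups and throughput with a few lines of arithmetic. The sketch below copies the wall times reported above and takes the batch of 200 documents from `benchmark`; the variable names are illustrative only.

```python
# Wall times (seconds) copied from the %%time outputs in this notebook.
wall_times_s = {
    "PyTorch": 27.6,
    "ONNX": 15.2,
    "ONNX + graph optimization": 13.0,
}
N_DOCS = 200  # documents encoded per benchmark run

baseline = wall_times_s["PyTorch"]
# Map each variant to (speedup over PyTorch, documents per second).
summary = {
    name: (baseline / seconds, N_DOCS / seconds)
    for name, seconds in wall_times_s.items()
}
for name, (speedup, docs_per_s) in summary.items():
    print(f"{name:<28} {speedup:.2f}x  {docs_per_s:.1f} docs/s")
```

On these single-run numbers, ONNX is roughly 1.8x faster than the PyTorch model and graph optimization brings that to roughly 2.1x. Both figures come from one CPU-only run each, so they are indicative rather than precise.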
