onnx speed is even slower #414
Comments
Hi, I am going to forward this issue. Also, you can use triple quotes ``` to better display code. Cheers!
Tagging @mfuntowicz @michaelbenayoun for visibility. Sorry if the transfer wasn't correct.
from transformers import AutoTokenizer, pipeline from transformers import ( model = MarianMTModel.from_pretrained(modchoice) t1 = time.time() tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
I removed the comments, and it can be run directly to reproduce.
@CatchDr this code is not usable, you NEED to use ``` to make it a code block and show the proper indents.
''' from transformers import ( model = MarianMTModel.from_pretrained(modchoice) t1 = time.time() tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
|
|
There are no functions, no indentation.
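For reference, a fenced block uses three backticks (not apostrophes) and preserves indentation, e.g.:

```python
# a trivial illustration, not code from this issue
def translate(texts):
    return [t.upper() for t in texts]
```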
onnxruntime-gpu also cannot speed it up. Does the pipeline require some setting to use the GPU?
|
It seems that batching with the ONNX pipeline does not speed things up, while the Marian model's speed improves a lot with batching.
Hi @CatchDr, for speeding up inference on CPU, you can optimize the vanilla exported ONNX model with Optimum's `ORTOptimizer`.
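A minimal sketch of what that could look like, assuming the `ORTOptimizer` API from optimum 1.4 ("onnx_optimized" is a hypothetical output directory, and seq2seq optimization support may depend on your exact optimum version):

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

model_checkpoint = "Helsinki-NLP/opus-mt-en-zh"
model = ORTModelForSeq2SeqLM.from_pretrained(model_checkpoint, from_transformers=True)

# Apply ONNX Runtime graph optimizations to the exported model
optimizer = ORTOptimizer.from_pretrained(model)
# optimization_level=1 enables basic graph fusions; higher levels are more aggressive
optimization_config = OptimizationConfig(optimization_level=1)
optimizer.optimize(save_dir="onnx_optimized", optimization_config=optimization_config)

# Reload the optimized model and use it in the pipeline as before
optimized_model = ORTModelForSeq2SeqLM.from_pretrained("onnx_optimized")
```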
System Info
win10
python 3.8.4
pytorch 1.12.1 (CPU)
transformers 4.22.2
optimum 1.4.0
onnxruntime 1.12.1
Who can help?
@Narsil
@patil-suraj
Reproduction
```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM
import warnings
import time

warnings.filterwarnings("ignore")

text = "Vehicle detection technology is of great significance for realizing automatic monitoring and AI-assisted driving systems. The state-of-the-art object detection method, namely, a class of YOLOv5, has often been used to detect vehicles."
textlists = [text, text, text, text, text]

model_checkpoint = "Helsinki-NLP/opus-mt-en-zh"
model = ORTModelForSeq2SeqLM.from_pretrained(model_checkpoint, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model.save_pretrained("onnx")
tokenizer.save_pretrained("onnx")

onnx_translation = pipeline("translation_en_to_zh", model=model, tokenizer=tokenizer)

t1 = time.time()
result = onnx_translation(textlists)
print(result, time.time() - t1)
```
```python
import time

import torch
from transformers import (
    MarianTokenizer,
    MarianMTModel,
)

modchoice = "Helsinki-NLP/opus-mt-en-zh"
tokenizer = MarianTokenizer.from_pretrained(modchoice)
model = MarianMTModel.from_pretrained(modchoice)

# `device` was not defined in the original snippet; default to CPU here
device = torch.device("cpu")
model.to(device)

t1 = time.time()
encoded = tokenizer.prepare_seq2seq_batch(
    textlists,
    truncation=True,
    padding="longest",
    return_tensors="pt",
).to(device)
translated = model.generate(**encoded)
tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
print(tgt_text, time.time() - t1)
```
With ONNX, batch processing is much slower than with PyTorch, and single-sentence processing is only a little faster.
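As an aside on methodology: the first call to a freshly built pipeline includes session warm-up, so timing a single call can be misleading. A small sketch of a fairer comparison, assuming the `onnx_translation` pipeline and `textlists` from the snippet above (the `benchmark` helper is hypothetical):

```python
import time

def benchmark(fn, *args, warmup=2, runs=5):
    # hypothetical helper: discard warm-up calls, then average the timed runs
    for _ in range(warmup):
        fn(*args)
    t0 = time.time()
    for _ in range(runs):
        fn(*args)
    return (time.time() - t0) / runs

print("onnx, batch of 5:", benchmark(onnx_translation, textlists))
print("onnx, single text:", benchmark(onnx_translation, textlists[0]))
```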
Expected behavior
Faster batch processing