v1.2.0: pipeline and AutoModelForXxx classes to run ONNX Runtime inference

@echarlaix released this 10 May 15:04

ORTModel

The ORTModelForXXX classes, such as ORTModelForSequenceClassification, are now integrated with the Hugging Face Hub. They make it easy to export models to the ONNX format, to load ONNX models, and to save the resulting model or push it to the 🤗 Hub with the save_pretrained and push_to_hub methods respectively. An already optimized and/or quantized ONNX model can also be loaded with the from_pretrained method of the ORTModelForXXX classes.

Below is an example that downloads a DistilBERT model from the Hub, exports it to the ONNX format and saves the resulting model:

from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the model from the Hub and export it to the ONNX format
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", 
    from_transformers=True
)

# Save the exported model
model.save_pretrained("a_local_path_for_convert_onnx_model")
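
The exported model saved above can then be loaded back from the local directory with the from_pretrained method. Below is a minimal sketch of this, reusing the directory from the previous example; the input sentence is illustrative only:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the already exported ONNX model from the local directory saved above
model = ORTModelForSequenceClassification.from_pretrained("a_local_path_for_convert_onnx_model")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

# Run a quick check on the loaded model
inputs = tokenizer("This release makes ONNX Runtime inference easy!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)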

Pipelines

Built-in support for transformers pipelines was added. This allows you to use the same API as Transformers pipelines while benefiting from accelerated runtimes such as ONNX Runtime.

The currently supported tasks, with the default model for each, are the following:

  • Text Classification (DistilBERT model fine-tuned on SST-2)
  • Question Answering (DistilBERT model fine-tuned on SQuAD v1.1)
  • Token Classification (BERT large fine-tuned on CoNLL-2003)
  • Feature Extraction (DistilBERT)
  • Zero Shot Classification (BART model fine-tuned on MNLI)
  • Text Generation (DistilGPT2)
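
As an example of the first task in the list above, here is a minimal text-classification sketch following the same pattern as the question-answering example below; the input sentence is illustrative only:

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

# Export the default text-classification model to the ONNX format
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = ORTModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Wrap the exported model in a transformers pipeline
onnx_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(onnx_classifier("I love using ONNX Runtime for inference!"))
# e.g. [{'label': 'POSITIVE', 'score': ...}]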

Below is an example that downloads a RoBERTa model from the Hub, exports it to the ONNX format and loads it with the Transformers pipeline for question answering.

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

# Load the vanilla Transformers model and export it to the ONNX format
model = ORTModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2", from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

# Test the model with the transformers pipeline, enabling handle_impossible_answer for SQuAD v2
optimum_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, handle_impossible_answer=True)
prediction = optimum_qa(
  question="What's my name?", context="My name is Philipp and I live in Nuremberg."
)

print(prediction)
# {'score': 0.9041663408279419, 'start': 11, 'end': 18, 'answer': 'Philipp'}

Improvements

  • Add the loss when performing the evaluation step using an instance of ORTTrainer, which was previously not available when inference was performed with ONNX Runtime, in #152 (see the sketch below).
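
For context, below is a minimal sketch of what this change means in practice. It relies on the transformers.Trainer constructor arguments that ORTTrainer inherits; the inference_with_ort flag on evaluate() is an assumption about how ONNX Runtime inference is enabled in this version, and the model and dataset are illustrative only:

from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments
from optimum.onnxruntime import ORTTrainer

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Small illustrative dataset
dataset = load_dataset("glue", "sst2")
encoded = dataset.map(
    lambda e: tokenizer(e["sentence"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)

args = TrainingArguments(output_dir="ort_trainer_output", per_device_eval_batch_size=8)

trainer = ORTTrainer(
    model=model,
    args=args,
    train_dataset=encoded["train"].select(range(16)),
    eval_dataset=encoded["validation"].select(range(16)),
    tokenizer=tokenizer,
)

# The inference_with_ort flag is an assumption about this version's API;
# the evaluation loss (eval_loss) is now part of the returned metrics.
metrics = trainer.evaluate(inference_with_ort=True)
print(metrics["eval_loss"])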