v1.2.0: pipeline and AutoModelForXxx classes to run ONNX Runtime inference
ORTModel

`ORTModelForXXX` classes, such as `ORTModelForSequenceClassification`, were integrated with the Hugging Face Hub in order to easily export models through the ONNX format, load ONNX models, and easily save the resulting model or push it to the 🤗 Hub using the `save_pretrained` and `push_to_hub` methods respectively. An already optimized and/or quantized ONNX model can also be loaded with the `ORTModelForXXX` classes using the `from_pretrained` method.
Below is an example that downloads a DistilBERT model from the Hub, exports it through the ONNX format, and saves it:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load model from hub and export it through the ONNX format
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    from_transformers=True,
)

# Save the exported model
model.save_pretrained("a_local_path_for_convert_onnx_model")
```
Pipelines
Built-in support for transformers pipelines was added. This makes it possible to use the same API as in Transformers, with the power of accelerated runtimes such as ONNX Runtime.
The currently supported tasks, with the default model for each, are the following:
- Text Classification (DistilBERT model fine-tuned on SST-2)
- Question Answering (DistilBERT model fine-tuned on SQuAD v1.1)
- Token Classification (BERT large fine-tuned on CoNLL-2003)
- Feature Extraction (DistilBERT)
- Zero Shot Classification (BART model fine-tuned on MNLI)
- Text Generation (DistilGPT2)
Below is an example that downloads a RoBERTa model from the Hub, exports it through the ONNX format, and loads it with a transformers `pipeline` for question-answering.
```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

# Load the vanilla transformers model and convert it to ONNX
model = ORTModelForQuestionAnswering.from_pretrained(
    "deepset/roberta-base-squad2", from_transformers=True
)
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

# Test the model using the transformers pipeline, with handle_impossible_answer for SQuAD v2
optimum_qa = pipeline(
    "question-answering", model=model, tokenizer=tokenizer, handle_impossible_answer=True
)
prediction = optimum_qa(
    question="What's my name?", context="My name is Philipp and I live in Nuremberg."
)

print(prediction)
# {'score': 0.9041663408279419, 'start': 11, 'end': 18, 'answer': 'Philipp'}
```
Improvements
- Add the loss to the outputs of the evaluation step when using an instance of `ORTTrainer`, previously not enabled when inference was performed with ONNX Runtime, in #152