# Exporting Hugging Face Models Using Optimum and Running Them in DeepSparse

This guide shows how to run Hugging Face models with Neural Magic's DeepSparse Inference Runtime. The models are first exported from PyTorch to the ONNX format with the `Optimum` library and then loaded into DeepSparse, a runtime built for efficient inference on CPUs. This keeps the flexibility of the widely adopted ONNX format while letting DeepSparse handle performance and resource utilization at inference time.

This notebook uses several popular models from the Hugging Face Hub for text classification, named entity recognition (NER), question answering, zero-shot text classification, image classification, and image segmentation.

The flow for this guide includes:

1. Exporting models to ONNX using `optimum-cli`.
2. Running inference on the exported ONNX models with DeepSparse.

## Install DeepSparse and Optimum

In [23]:
!pip install deepsparse-nightly[transformers] optimum[exporters]



## Text Classification | Sentiment Analysis

Let's export the DistilBERT SST-2 model for sentiment analysis to an output folder called `tc_model`:

In [22]:
!optimum-cli export onnx --model distilbert-base-uncased-finetuned-sst-2-english tc_model --sequence_length 128

Framework not specified. Using pt to export to ONNX.
Automatic task detection to text-classification (possible synonyms are: sequence-classification, zero-shot-classification).
Traceback (most recent call last):
  File "/home/zeroshot/nm/examples/env/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/home/zeroshot/nm/examples/env/lib/python3.10/site-packages/optimum/commands/optimum_cli.py", line 163, in main
    service.run()
  File "/home/zeroshot/nm/examples/env/lib/python3.10/site-packages/optimum/commands/export/onnx.py", line 219, in run
    main_export(
  File "/home/zeroshot/nm/examples/env/lib/python3.10/site-packages/optimum/exporters/onnx/__main__.py", line 446, in main_export
    _, onnx_outputs = export_models(
  File "/home/zeroshot/nm/examples/env/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 760, in export_models
    export(
  File "/home/zeroshot/nm/examples/env/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", li

Load model and run inference with DeepSparse:

In [None]:
from deepsparse import Pipeline

text_input = "Snorlax loves my Tesla!"

pipe = Pipeline.create(task="sentiment-analysis", model_path="./tc_model/")
inference = pipe(text_input)
print(inference)
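Because the pipeline depends on the files written by the export step, it can help to confirm that the export folder contains an ONNX file before creating the pipeline. The following is a minimal sketch, not part of DeepSparse or Optimum: `find_onnx_model` is a hypothetical helper, and it assumes the exported file is named `model.onnx`, as in the model paths used later in this guide.

```python
from pathlib import Path


def find_onnx_model(export_dir):
    """Return the path to model.onnx inside export_dir.

    Hypothetical helper: assumes the export step writes a file named
    model.onnx into the output folder.
    """
    model_file = Path(export_dir) / "model.onnx"
    if not model_file.is_file():
        raise FileNotFoundError(
            f"No model.onnx found in {export_dir!r}; check that the export step succeeded."
        )
    return model_file


# Report a missing export instead of failing later inside the pipeline.
try:
    print(find_onnx_model("./tc_model"))
except FileNotFoundError as err:
    print(err)
```

If the check passes, the returned path can be converted with `str(...)` and passed as `model_path`.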

In [None]:
!pip install deepsparse -U

## NER

Let's export the BERT Base NER model to an output folder called `ner_model`:

In [None]:
!optimum-cli export onnx --model dslim/bert-base-NER ner_model --sequence_length 128
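The export commands in this guide share one shape: a model ID, an output folder, and an optional `--sequence_length`. As a rough sketch, the same command can be assembled and launched from Python with `subprocess`. The helper names `build_export_command` and `export_model` are illustrative and not part of Optimum; only the command-line arguments shown above are used.

```python
import shlex
import subprocess


def build_export_command(model_id, output_dir, sequence_length=None):
    """Build the argument list for `optimum-cli export onnx` as used in this guide."""
    cmd = ["optimum-cli", "export", "onnx", "--model", model_id, output_dir]
    if sequence_length is not None:
        cmd += ["--sequence_length", str(sequence_length)]
    return cmd


def export_model(model_id, output_dir, sequence_length=None):
    """Run the export; raises CalledProcessError if optimum-cli exits with an error."""
    cmd = build_export_command(model_id, output_dir, sequence_length)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


# Print the command without executing it.
print(shlex.join(build_export_command("dslim/bert-base-NER", "ner_model", sequence_length=128)))
```

Calling `export_model("dslim/bert-base-NER", "ner_model", sequence_length=128)` would run the export and stop with an exception if it fails, which makes a broken export visible before the inference step.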

Load model and run inference with DeepSparse:

In [None]:
from time import perf_counter
from deepsparse import Pipeline

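def run_deepsparse(model_path, text_input, task):
    # Build the pipeline before starting the timer so that only inference is measured.
    pipeline = Pipeline.create(task=task, model_path=model_path)
    start_time = perf_counter()
    inference = pipeline(text_input)
    end_time = perf_counter()
    execution_time_deepsparse = end_time - start_time
    # Return the pipeline output together with the elapsed wall-clock time in seconds.
    return inference, execution_time_deepsparse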

model_path = "./ner_model/model.onnx"
task = "token-classification"
text_input = "Snorlax loves my Tesla!"

inference_deepsparse, execution_time_deepsparse = run_deepsparse(model_path, text_input, task)
print(f"DeepSparse code snippet execution time: {execution_time_deepsparse:.4f} seconds")
print(inference_deepsparse)
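The timing above covers a single call, which can be noisy and may include one-time costs on the first invocation. A more stable estimate could come from a few warm-up calls followed by several timed iterations. The sketch below uses only the standard library; `time_calls` is a hypothetical helper, and a plain Python function stands in for the pipeline so that the snippet runs on its own.

```python
from statistics import mean, median
from time import perf_counter


def time_calls(fn, *args, warmup=1, iterations=10, **kwargs):
    """Call fn repeatedly and return latency statistics in seconds."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn(*args, **kwargs)
    timings = []
    for _ in range(iterations):
        start = perf_counter()
        fn(*args, **kwargs)
        timings.append(perf_counter() - start)
    return {
        "mean": mean(timings),
        "median": median(timings),
        "min": min(timings),
        "max": max(timings),
    }


# Stand-in callable; in the notebook this would be the DeepSparse pipeline object.
stats = time_calls(str.upper, "Snorlax loves my Tesla!", warmup=1, iterations=5)
print({name: f"{value:.6f}" for name, value in stats.items()})
```

With a pipeline created as above, the call would look like `time_calls(pipeline, text_input, warmup=3, iterations=20)`.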

## Question Answering

Let's export the RoBERTa Base model for Question Answering to an output folder called `qa_model`:

In [None]:
!optimum-cli export onnx --model deepset/roberta-base-squad2 qa_model --sequence_length 128

Load model and run inference with DeepSparse:

In [None]:
from time import perf_counter
from deepsparse import Pipeline

def run_deepsparse(model_path, question, context, task):
    pipeline = Pipeline.create(task=task, model_path=model_path)
    start_time = perf_counter()
    inference = pipeline(question=question, context=context)
    end_time = perf_counter()
    execution_time_deepsparse = end_time - start_time
    return inference, execution_time_deepsparse


model_path = "./qa_model/model.onnx"
task = "question-answering"
question = "Who loves Tesla?"
context = "Snorlax loves my Tesla!"

inference_deepsparse, execution_time_deepsparse = run_deepsparse(model_path, question, context, task)
print(f"DeepSparse code snippet execution time: {execution_time_deepsparse:.4f} seconds")
print(inference_deepsparse)

## Zero-Shot Text Classification

Let's export the DistilBERT MNLI Base model to an output folder called `zs_model`:

In [None]:
!optimum-cli export onnx --model typeform/distilbert-base-uncased-mnli zs_model --sequence_length 128

Load model and run inference with DeepSparse:

In [None]:
from time import perf_counter
from deepsparse import Pipeline

def run_deepsparse(model_path, text_input, task, labels):
    pipeline = Pipeline.create(
        task=task,
        model_scheme="mnli",
        model_config={"hypothesis_template": "This text is related to {}"},
        model_path=model_path,
    )
    start_time = perf_counter()
    inference = pipeline(text_input, labels)
    end_time = perf_counter()
    execution_time_deepsparse = end_time - start_time
    return inference, execution_time_deepsparse

model_path = "./zs_model/model.onnx"
task = "zero_shot_text_classification"
text_input = "I like pepperoni pizza."
labels = ["food", "movies", "sports"]

inference_deepsparse, execution_time_deepsparse = run_deepsparse(model_path, text_input, task, labels)
print(f"DeepSparse code snippet execution time: {execution_time_deepsparse:.4f} seconds")
print(inference_deepsparse)
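To make the role of `hypothesis_template` more concrete: in NLI-based zero-shot classification, each candidate label is typically inserted into the template to form a hypothesis, and the model scores how well the input text (the premise) entails each hypothesis. The sketch below only illustrates that expansion step in plain Python. The helper names are hypothetical, and DeepSparse's internal handling may differ.

```python
def build_hypotheses(labels, hypothesis_template="This text is related to {}"):
    """Fill the template once per candidate label."""
    return [hypothesis_template.format(label) for label in labels]


def build_nli_pairs(text, labels, hypothesis_template="This text is related to {}"):
    """Pair the input text (premise) with one hypothesis per label."""
    return [(text, hypothesis) for hypothesis in build_hypotheses(labels, hypothesis_template)]


for premise, hypothesis in build_nli_pairs(
    "I like pepperoni pizza.", ["food", "movies", "sports"]
):
    print(f"premise={premise!r}  hypothesis={hypothesis!r}")
```

Under this scheme, the label whose hypothesis receives the strongest entailment score is the predicted class, so the wording of the template can influence the results.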

## Image Classification

Let's export the ResNet-50 model to an output folder called `ic_model`:

In [None]:
!optimum-cli export onnx --model microsoft/resnet-50 ic_model

Load model and run inference with DeepSparse:

In [None]:
from time import perf_counter
from deepsparse import Pipeline

def run_deepsparse(model_path, image, task):
    pipeline = Pipeline.create(task=task, model_path=model_path, input_shapes=[1,3,224,224])
    start_time = perf_counter()
    inference = pipeline(images=image)
    end_time = perf_counter()
    execution_time_deepsparse = end_time - start_time
    return inference, execution_time_deepsparse

image = "./notebooks/optimum-export/cat.jpg"
model_path = "ic_model/model.onnx"
task = "image_classification"

inference_deepsparse, execution_time_deepsparse = run_deepsparse(model_path, image, task)
print(f"DeepSparse code snippet execution time: {execution_time_deepsparse:.4f} seconds")
print(inference_deepsparse)

## Image Segmentation

Let's export the DEtection TRansformer (DETR) model to an output folder called `is_model`:

In [None]:
!optimum-cli export onnx --model facebook/detr-resnet-50-panoptic is_model

Load model and run inference with DeepSparse:

In [None]:
from time import perf_counter
from deepsparse import Pipeline

def run_deepsparse(model_path, image, task):
    pipeline = Pipeline.create(task=task, model_path=model_path, input_shapes=[1,3,224,224], image_size=(224,224))
    start_time = perf_counter()
    inference = pipeline(images=image)
    end_time = perf_counter()
    execution_time_deepsparse = end_time - start_time
    return inference, execution_time_deepsparse

image = "./notebooks/optimum-export/thailand.jpeg"
model_path = "is_model/model.onnx"
task = "yolov8"

inference_deepsparse, execution_time_deepsparse = run_deepsparse(model_path, image, task)
print(f"DeepSparse code snippet execution time: {execution_time_deepsparse:.4f} seconds")
print(inference_deepsparse)