# 🤗 Hugging Face Model Hub with OpenVINO™

The Hugging Face (HF) [Model Hub](https://huggingface.co/models) is a central repository for pre-trained deep learning models. It allows exploration and provides access to thousands of models for a wide range of tasks, including text classification, question answering, and image classification.
Hugging Face provides Python packages, namely [transformers](https://github.com/huggingface/transformers) and [diffusers](https://github.com/huggingface/diffusers), that serve as APIs and tools to easily download and fine-tune state-of-the-art pretrained models.

![](https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/logo/hf_intel_logo.png)

In this notebook, we will learn:
1. How to load a HF pipeline using the `transformers` package and then convert it to OpenVINO.
2. How to load the same pipeline using the Optimum Intel package.


#### Table of contents:

- [Converting a Model from the HF Transformers Package](#Converting-a-Model-from-the-HF-Transformers-Package)
    - [Installing Requirements](#Installing-Requirements)
    - [Imports](#Imports)
    - [Initializing a Model Using the HF Transformers Package](#Initializing-a-Model-Using-the-HF-Transformers-Package)
    - [Original Model inference](#Original-Model-inference)
    - [Converting the Model to OpenVINO IR format](#Converting-the-Model-to-OpenVINO-IR-format)
    - [Converted Model Inference](#Converted-Model-Inference)
- [Converting a Model Using the Optimum Intel Package](#Converting-a-Model-Using-the-Optimum-Intel-Package)
    - [Install Requirements for Optimum](#Install-Requirements-for-Optimum)
    - [Import Optimum](#Import-Optimum)
    - [Initialize and Convert the Model Automatically using OVModel class](#Initialize-and-Convert-the-Model-Automatically-using-OVModel-class)
    - [Convert model using Optimum CLI interface](#Convert-model-using-Optimum-CLI-interface)
    - [The Optimum Model Inference](#The-Optimum-Model-Inference)


### Installation Instructions

This is a self-contained example that relies solely on its own code.

We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the [Installation Guide](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/README.md#-installation-guide).



## Converting a Model from the HF Transformers Package
[back to top ⬆️](#Table-of-contents:)

The Hugging Face `transformers` package provides an API for initializing a model and loading a set of pre-trained weights using the model's text handle.
Finding a model name is straightforward on the [HF website's Models page](https://huggingface.co/models): you can choose a model that solves a particular machine learning problem and sort the models by popularity or novelty.

### Installing Requirements
[back to top ⬆️](#Table-of-contents:)


In [None]:
import platform

%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu "transformers>=4.33.0" "accelerate" "torch>=2.1.0"
%pip install -q ipywidgets
%pip install -q "openvino>=2023.1.0" "Pillow"

if platform.system() == "Darwin":
    %pip install "numpy<2.0.0"

### Imports
[back to top ⬆️](#Table-of-contents:)


In [2]:
from pathlib import Path

import numpy as np
import torch

from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer

import requests

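if not Path("notebook_utils.py").exists():
    r = requests.get(
        url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py",
    )
    r.raise_for_status()  # fail early instead of saving an HTTP error page
    # write_text opens, writes, and closes the file in one call
    Path("notebook_utils.py").write_text(r.text)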


# Read more about telemetry collection at https://github.com/openvinotoolkit/openvino_notebooks?tab=readme-ov-file#-telemetry
from notebook_utils import collect_telemetry

collect_telemetry("hugging-face-hub.ipynb")

### Initializing a Model Using the HF Transformers Package
[back to top ⬆️](#Table-of-contents:)

We will use the [DistilBERT text sentiment classification](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english) model in our example. It is a transformer-based encoder model distilled from BERT and fine-tuned on SST-2; please refer to the model card to learn more.

Following the instructions on the model page, we use `AutoModelForSequenceClassification` to initialize the model and perform inference with it.
To find more information on HF pipelines and model initialization please refer to [HF tutorials](https://huggingface.co/learn/nlp-course/chapter2/2?fw=pt#behind-the-pipeline).

In [3]:
MODEL = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(MODEL)

# Load the pretrained weights. Tuple outputs (instead of ModelOutput, which causes
# JIT errors) are enabled later, right before conversion, via `model.config.torchscript = True`.
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

### Original Model inference
[back to top ⬆️](#Table-of-contents:)

Let's classify the simple prompt below.

In [4]:
text = "HF models run perfectly with OpenVINO!"

encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)
scores = output[0][0]
scores = torch.softmax(scores, dim=0).numpy(force=True)


def print_prediction(scores):
    for i, descending_index in enumerate(scores.argsort()[::-1]):
        label = model.config.id2label[descending_index]
        score = np.round(float(scores[descending_index]), 4)
        print(f"{i+1}) {label} {score}")


print_prediction(scores)

1) POSITIVE 0.9993
2) NEGATIVE 0.0007
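
The post-processing in the cell above (softmax over the logits, then sorting labels by descending score) can be illustrated without loading any model. The following is a minimal standard-library sketch; the logits and the `rank_labels` helper are hypothetical and are not taken from the notebook run.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], round(probs[i], 4)) for i in order]


# Hypothetical logits for a two-class sentiment model (illustrative values only).
example_logits = [-3.0, 4.0]
example_id2label = {0: "NEGATIVE", 1: "POSITIVE"}

ranking = rank_labels(example_logits, example_id2label)
for position, (label, prob) in enumerate(ranking, start=1):
    print(f"{position}) {label} {prob}")
```

In the notebook itself, `torch.softmax` and `scores.argsort()[::-1]` play the roles of `softmax` and the `sorted` call.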


### Converting the Model to OpenVINO IR format
[back to top ⬆️](#Table-of-contents:)

We use the OpenVINO [Model conversion API](https://docs.openvino.ai/2024/openvino-workflow/model-preparation.html#convert-a-model-with-python-convert-model) to convert the model (this one is implemented in PyTorch) to OpenVINO Intermediate Representation (IR).

Note how we reuse our real `encoded_input`, passing it to the `ov.convert_model` function. It will be used for model tracing.

In [5]:
import openvino as ov

save_model_path = Path("./models/model.xml")

if not save_model_path.exists():
    model.config.torchscript = True
    ov_model = ov.convert_model(model, example_input=dict(encoded_input))
    ov.save_model(ov_model, save_model_path)
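
An IR model is stored as a pair of files: an `.xml` file describing the topology and a `.bin` file holding the weights. The cell above checks only for the `.xml` file; a slightly more defensive variant could check for both. The sketch below assumes this is desirable; `ir_file_pair` and `convert_once` are hypothetical helper names, while `ov.convert_model` and `ov.save_model` are the same calls used above.

```python
from pathlib import Path


def ir_file_pair(xml_path):
    """Return the .xml (topology) and .bin (weights) paths that make up one IR model."""
    xml_path = Path(xml_path)
    return xml_path, xml_path.with_suffix(".bin")


def convert_once(model, example_input, xml_path):
    """Convert a PyTorch model to IR unless both IR files already exist.

    Hypothetical helper: it wraps the same conversion calls as the cell above.
    """
    # Imported lazily so the path helper can be used without OpenVINO installed.
    import openvino as ov

    xml_file, bin_file = ir_file_pair(xml_path)
    if xml_file.exists() and bin_file.exists():
        return xml_file  # reuse the previously converted model

    xml_file.parent.mkdir(parents=True, exist_ok=True)
    # example_input is used to trace the PyTorch model.
    ov_model = ov.convert_model(model, example_input=example_input)
    ov.save_model(ov_model, xml_file)
    return xml_file
```

With the objects from this notebook, the call would look like `convert_once(model, dict(encoded_input), "./models/model.xml")`, after setting `model.config.torchscript = True`.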


### Converted Model Inference
[back to top ⬆️](#Table-of-contents:)


First, we pick a device for model inference.

In [6]:
from notebook_utils import device_widget

device = device_widget()

device

Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')
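
The dropdown lists the devices found on the machine plus the `AUTO` virtual device. The sketch below shows how such a list could be assembled by hand. It is an assumption about the general idea, not the actual implementation of `device_widget`; `device_options` and `query_devices` are hypothetical names, while `ov.Core().available_devices` is the OpenVINO property that reports detected devices.

```python
def device_options(available_devices, include_auto=True):
    """Build the list of selectable devices and choose a default entry."""
    options = list(available_devices)
    if include_auto and "AUTO" not in options:
        options.append("AUTO")
    if not options:
        raise ValueError("no inference devices available")
    # Prefer AUTO so that OpenVINO picks a device itself; otherwise take the first one.
    default = "AUTO" if "AUTO" in options else options[0]
    return options, default


def query_devices():
    """Ask OpenVINO Runtime which devices it can see (requires openvino)."""
    import openvino as ov  # imported lazily; only needed for the real query

    return ov.Core().available_devices


# Illustrative input: a machine where only the CPU plugin is available.
options, default = device_options(["CPU"])
print(options, default)
```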

The OpenVINO IR model must be compiled for a specific device before inference.

In [7]:
import openvino as ov

core = ov.Core()

compiled_model = core.compile_model(save_model_path, device.value)

# Compiled model call is performed using the same parameters as for the original model
scores_ov = compiled_model(encoded_input.data)[0]

scores_ov = torch.softmax(torch.tensor(scores_ov[0]), dim=0).detach().numpy()

print_prediction(scores_ov)

1) POSITIVE 0.9993
2) NEGATIVE 0.0007


Note that the predictions of the converted model match those of the original model to the four decimal places shown.
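
Printed scores are rounded, so a numeric comparison is a more reliable check than reading the output. The following is a minimal sketch of such a check; the helper names, the tolerance, and the sample values are assumptions chosen for illustration.

```python
def max_abs_difference(reference, candidate):
    """Largest element-wise absolute difference between two score vectors."""
    if len(reference) != len(candidate):
        raise ValueError("score vectors must have the same length")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def scores_match(reference, candidate, atol=1e-4):
    """True when every score differs by at most `atol` (assumed tolerance)."""
    return max_abs_difference(reference, candidate) <= atol


# Illustrative values only, not measured outputs of the two models.
pytorch_scores = [0.9993, 0.0007]
openvino_scores = [0.99931, 0.00069]
print(scores_match(pytorch_scores, openvino_scores))
```

In the notebook, the same check could be applied to `scores` and `scores_ov` after converting them to lists with `.tolist()`.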

This is a rather simple example, as the pipeline includes just one encoder model. Contemporary state-of-the-art pipelines often consist of several models. Feel free to explore other OpenVINO tutorials:
1. [Stable Diffusion v2](../stable-diffusion-v2)
2. [Zero-shot Image Classification with OpenAI CLIP](../clip-zero-shot-image-classification)
3. [Controllable Music Generation with MusicGen](../music-generation)

The workflow for the `diffusers` package is exactly the same. The first example in the list above relies on `diffusers`.

## Converting a Model Using the Optimum Intel Package
[back to top ⬆️](#Table-of-contents:)

🤗 Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures.

Among other use cases, Optimum Intel provides a simple interface to optimize your Transformers and Diffusers models, convert them to the OpenVINO Intermediate Representation (IR) format and run inference using OpenVINO Runtime.

### Install Requirements for Optimum
[back to top ⬆️](#Table-of-contents:)


In [None]:
%pip install -q "git+https://github.com/huggingface/optimum-intel.git"

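if not Path("cmd_helper.py").exists():
    r = requests.get(
        url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/cmd_helper.py",
    )
    r.raise_for_status()  # fail early instead of saving an HTTP error page
    # write_text opens, writes, and closes the file in one call
    Path("cmd_helper.py").write_text(r.text)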

### Import Optimum
[back to top ⬆️](#Table-of-contents:)

Documentation for Optimum Intel states:
>You can now easily perform inference with OpenVINO Runtime on a variety of Intel processors (see the full list of supported devices). For that, just replace the `AutoModelForXxx` class with the corresponding `OVModelForXxx` class.

You can find more information in [Optimum Intel documentation](https://huggingface.co/docs/optimum/intel/inference).

In [9]:
from optimum.intel.openvino import OVModelForSequenceClassification

### Initialize and Convert the Model Automatically using OVModel class
[back to top ⬆️](#Table-of-contents:)

To load a Transformers model and convert it to the OpenVINO format on the fly, set `export=True` when loading the model. The model can then be saved in OpenVINO format with the `save_pretrained` method, which takes the target directory as an argument. On subsequent runs, you can skip the conversion step and load the previously saved model from disk with `from_pretrained`, without specifying `export`.

We also pass the `device` parameter to compile the model for a specific device; if it is omitted, the default device is used. The device can be changed later at runtime with `model.to(device)`. Note that compiling the model for a newly selected device may take some time.

In some cases it is useful to separate model initialization from compilation, for example when you want to change the input shapes with the `reshape` method. To postpone compilation, pass `compile=False` to `from_pretrained`. Compilation can then be triggered manually with the `compile` method; otherwise it happens automatically on the first inference run.

In [10]:
model = OVModelForSequenceClassification.from_pretrained(MODEL, export=True, device=device.value)

# The save_pretrained() method saves the model weights to avoid conversion on the next load.
model.save_pretrained("./models/optimum_model")
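
The reload and deferred-compilation options described in this section can be combined as in the sketch below. It assumes that Optimum Intel stores the exported IR as `openvino_model.xml` inside the save directory; `has_exported_model` and `load_for_static_shapes` are hypothetical helper names, while `from_pretrained`, `save_pretrained`, `reshape`, `to`, and `compile` are the methods mentioned above.

```python
from pathlib import Path

# Assumption: file name used by Optimum Intel for the exported IR topology.
OV_XML_NAME = "openvino_model.xml"


def has_exported_model(model_dir):
    """Check whether a directory already contains an exported OpenVINO model."""
    return (Path(model_dir) / OV_XML_NAME).is_file()


def load_for_static_shapes(model_id, model_dir, device, batch_size, sequence_length):
    """Load (or export once) a model, fix its input shapes, then compile it."""
    # Imported lazily so the directory check works without optimum-intel installed.
    from optimum.intel.openvino import OVModelForSequenceClassification

    if has_exported_model(model_dir):
        # Reuse the saved IR; no export flag is needed.
        model = OVModelForSequenceClassification.from_pretrained(model_dir, compile=False)
    else:
        # First run: convert on the fly and keep the result for next time.
        model = OVModelForSequenceClassification.from_pretrained(model_id, export=True, compile=False)
        model.save_pretrained(model_dir)

    # Reshape before compiling, so the model is compiled only once.
    model.reshape(batch_size, sequence_length)
    model.to(device)
    model.compile()
    return model
```

A model reshaped to static dimensions expects inputs of exactly that size, so the tokenizer call would need padding, for example `tokenizer(text, padding="max_length", max_length=sequence_length, truncation=True, return_tensors="pt")`.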


### Convert model using Optimum CLI interface
[back to top ⬆️](#Table-of-contents:)

Alternatively, you can use the Optimum CLI to convert models (supported starting with optimum-intel 1.12).
The general command format is:

```bash
optimum-cli export openvino --model <model_id_or_path> --task <task> <output_dir>
```

where `task` is the task to export the model for. If it is not specified, the task is inferred automatically from the model. Available tasks depend on the model, but are among: ['default', 'fill-mask', 'text-generation', 'text2text-generation', 'text-classification', 'token-classification', 'multiple-choice', 'object-detection', 'question-answering', 'image-classification', 'image-segmentation', 'masked-im', 'semantic-segmentation', 'automatic-speech-recognition', 'audio-classification', 'audio-frame-classification', 'audio-xvector', 'image-to-text', 'stable-diffusion', 'zero-shot-object-detection']. For decoder models, use `xxx-with-past` to export the model using past key values in the decoder.

You can find a mapping between tasks and model classes in Optimum TaskManager [documentation](https://huggingface.co/docs/optimum/exporters/task_manager).

Additionally, you can specify weight compression using the `--weight-format` argument with one of the following options: `fp32`, `fp16`, `int8`, and `int4`. For `int8` and `int4`, NNCF is used for weight compression.
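
The command format above can also be assembled programmatically, which helps when exporting several models or formats in a loop. The sketch below is one possible way to do it and uses only the arguments named in this section (`--model`, `--task`, `--weight-format`); `build_export_command` is a hypothetical helper, not part of Optimum.

```python
import shlex

# Weight formats listed in this section.
WEIGHT_FORMATS = ("fp32", "fp16", "int8", "int4")


def build_export_command(model_id, output_dir, task=None, weight_format=None, with_past=False):
    """Return the optimum-cli export command as a list of arguments."""
    command = ["optimum-cli", "export", "openvino", "--model", model_id]
    if task is not None:
        # Decoder models can be exported with past key values via the -with-past suffix.
        command += ["--task", f"{task}-with-past" if with_past else task]
    if weight_format is not None:
        if weight_format not in WEIGHT_FORMATS:
            raise ValueError(f"unsupported weight format: {weight_format}")
        command += ["--weight-format", weight_format]
    command.append(output_dir)
    return command


command = build_export_command(
    "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "models/optimum_model/fp16",
    task="text-classification",
    weight_format="fp16",
)
# shlex.join quotes arguments where needed, producing a copy-pasteable shell line.
print(shlex.join(command))
```

The resulting list can be passed directly to `subprocess.run(command, check=True)`, which avoids shell quoting issues.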

The full list of supported arguments is available via `--help`:

In [11]:
!optimum-cli export openvino --help

The command-line export of the model from the example above, with FP16 weight compression:

In [12]:
# !optimum-cli export openvino --model $MODEL --task text-classification --weight-format fp16 models/optimum_model/fp16

from cmd_helper import optimum_cli

optimum_cli(MODEL, "models/optimum_model/fp16", additional_args={"task": "text-classification", "weight-format": "fp16"})

**Export command:**

`optimum-cli export openvino --model distilbert/distilbert-base-uncased-finetuned-sst-2-english models/optimum_model/fp16 --task text-classification --weight-format fp16`

After export, the model is available in the specified directory and can be loaded using the same `OVModelForXXX` class.

In [13]:
model = OVModelForSequenceClassification.from_pretrained("models/optimum_model/fp16", device=device.value)

Some models on the Hugging Face Model Hub are already converted and ready to run! You can find them by filtering by library name (just type OpenVINO) or by following [this link](https://huggingface.co/models?library=openvino&sort=trending).

### The Optimum Model Inference
[back to top ⬆️](#Table-of-contents:)

Model inference is exactly the same as for the original model!

In [14]:
output = model(**encoded_input)
scores = output[0][0]
scores = torch.softmax(scores, dim=0).numpy(force=True)

print_prediction(scores)

1) POSITIVE 0.9993
2) NEGATIVE 0.0007


You can find more examples of using Optimum Intel here:
1. [Accelerate Inference of Sparse Transformer Models](../sparsity-optimization/sparsity-optimization.ipynb)
2. [Grammatical Error Correction with OpenVINO](../grammar-correction/grammar-correction.ipynb)
3. [Stable Diffusion v2.1 using Optimum-Intel OpenVINO](../stable-diffusion-v2/stable-diffusion-v2-optimum-demo.ipynb)
4. [Image generation with Stable Diffusion XL](../stable-diffusion-xl)
5. [Create LLM-powered Chatbot using OpenVINO](../llm-chatbot)
6. [Document Visual Question Answering Using Pix2Struct and OpenVINO](../pix2struct-docvqa)
7. [Automatic speech recognition using Distil-Whisper and OpenVINO](../distil-whisper-asr)