# 🤗 Hugging Face Model Hub with OpenVINO™

The Hugging Face (HF) [Model Hub](https://huggingface.co/models) is a central repository for pre-trained deep learning models. It allows exploration and provides access to thousands of models for a wide range of tasks, including text classification, question answering, and image classification.
Hugging Face provides Python packages that serve as APIs and tools to easily download and fine-tune state-of-the-art pretrained models, namely the [transformers](https://github.com/huggingface/transformers) and [diffusers](https://github.com/huggingface/diffusers) packages.

![](https://github.com/huggingface/optimum-intel/raw/main/readme_logo.png)

Throughout this notebook we will learn:
1. How to load a HF pipeline using the `transformers` package and then convert it to OpenVINO.
2. How to load the same pipeline using the Optimum Intel package.

Contents:
- [Converting a Model from the HF Transformers Package](#Converting-a-Model-from-the-HF-Transformers-Package)
    - [Installing Requirements](#Installing-Requirements)
    - [Imports](#Imports)
    - [Initializing a Model Using the HF Transformers Package](#Initializing-a-Model-Using-the-HF-Transformers-Package)
    - [Original Model inference](#Original-Model-inference)
    - [Converting the Model to OpenVINO IR format](#Converting-the-Model-to-OpenVINO-IR-format)
    - [Converted Model Inference](#Converted-Model-Inference)
- [Converting a Model Using the Optimum Intel Package](#Converting-a-Model-Using-the-Optimum-Intel-Package)
    - [Installing Requirements](#Install-Requirements-for-Optimum)
    - [Import Optimum](#Import-Optimum)
    - [Initialize and Convert the Model Automatically](#Initialize-and-Convert-the-Model-Automatically)
    - [The Optimum Model Inference](#The-Optimum-Model-Inference)

## Converting a Model from the HF Transformers Package

The Hugging Face transformers package provides an API for initializing a model and loading a set of pre-trained weights using the model's text handle.
Finding a suitable model name is straightforward on the [Models page of the HF website](https://huggingface.co/models): you can choose a model that solves a particular machine learning problem and sort the models by popularity or novelty.

### Installing Requirements

In [1]:
%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu transformers[torch] 
%pip install -q ipywidgets
%pip install -q "openvino>=2023.1.0"

Note: you may need to restart the kernel to use updated packages.


### Imports

In [2]:
from pathlib import Path

import numpy as np
import torch

from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer

### Initializing a Model Using the HF Transformers Package

In our example we will use the [RoBERTa text sentiment classification](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) model. It is a transformer-based encoder model pretrained on Twitter data and fine-tuned for sentiment analysis; please refer to the model card to learn more.

Following the instructions on the model page, we use `AutoModelForSequenceClassification` to initialize the model and perform inference with it.
To find more information on HF pipelines and model initialization please refer to [HF tutorials](https://huggingface.co/learn/nlp-course/chapter2/2?fw=pt#behind-the-pipeline).

In [3]:
MODEL = "cardiffnlp/twitter-roberta-base-sentiment-latest"

tokenizer = AutoTokenizer.from_pretrained(MODEL)

# The torchscript=True flag is used to ensure the model outputs are tuples
# instead of ModelOutput (which causes JIT errors).
model = AutoModelForSequenceClassification.from_pretrained(MODEL, torchscript=True)

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


### Original Model inference

Let's classify the simple prompt below.

In [4]:
text = "HF models run perfectly with OpenVINO!"

encoded_input = tokenizer(text, return_tensors='pt')
output = model.forward(**encoded_input)
scores = output[0][0]
scores = torch.softmax(scores, dim=0).detach().numpy()

def print_prediction(scores):
    for i, descending_index in enumerate(scores.argsort()[::-1]):
        label = model.config.id2label[descending_index]
        score = np.round(float(scores[descending_index]), 4)
        print(f"{i+1}) {label} {score}")

print_prediction(scores)

1) positive 0.9485
2) neutral 0.0484
3) negative 0.0031
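
The post-processing in the cell above (softmax over the logits, then a descending sort) can be sketched without loading the model. The logits below are hypothetical values chosen for illustration, and the label order is an assumption; in the notebook both come from the model (`output[0][0]` and `model.config.id2label`).

```python
import math

# Hypothetical raw logits for one input; real values come from output[0][0].
logits = [-2.0, 1.0, 4.0]
# Assumed label order for illustration; the notebook reads it from model.config.id2label.
id2label = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(values):
    """Turn logits into probabilities that sum to 1."""
    shift = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(probs, labels):
    """Return (label, probability) pairs sorted from most to least likely."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in order]


probs = softmax(logits)
ranking = rank_labels(probs, id2label)
for position, (label, prob) in enumerate(ranking, start=1):
    print(f"{position}) {label} {prob:.4f}")
```

The largest logit always yields the largest probability, so the ranking can be read off the logits directly; softmax only rescales them into values that sum to 1.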


### Converting the Model to OpenVINO IR format

We use the OpenVINO [Model conversion API](https://docs.openvino.ai/2023.1/openvino_docs_model_processing_introduction.html#convert-a-model-in-python-convert-model) to convert the model (this one is implemented in PyTorch) to OpenVINO Intermediate Representation (IR).

Note how we reuse our real `encoded_input`, passing it to the `ov.convert_model` function. It will be used for model tracing.

In [5]:
import openvino as ov

save_model_path = Path('./models/model.xml')

if not save_model_path.exists():
    ov_model = ov.convert_model(model, example_input=dict(encoded_input))
    ov.save_model(ov_model, save_model_path)
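
An IR saved with `ov.save_model` consists of an `.xml` file describing the topology and a `.bin` file with the same base name holding the weights, while the cell above only checks for the `.xml` file. A slightly stricter check might look like the sketch below; the helper name `ir_is_saved` is hypothetical, and the demonstration uses empty stand-in files rather than a real converted model.

```python
from pathlib import Path
import tempfile


def ir_is_saved(xml_path: Path) -> bool:
    """Return True when both the .xml file and its sibling .bin file exist."""
    return xml_path.exists() and xml_path.with_suffix(".bin").exists()


# Demonstration in a temporary directory with empty stand-in files
# (no real model is converted here).
with tempfile.TemporaryDirectory() as tmp:
    xml_path = Path(tmp) / "model.xml"
    before = ir_is_saved(xml_path)        # nothing written yet
    xml_path.touch()
    only_xml = ir_is_saved(xml_path)      # weights file still missing
    xml_path.with_suffix(".bin").touch()
    both = ir_is_saved(xml_path)          # complete pair

print(before, only_xml, both)
```

In the notebook, such a helper could replace `save_model_path.exists()` so that a partially written model triggers a fresh conversion.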

### Converted Model Inference

First, we pick a device for model inference.

In [6]:
import ipywidgets as widgets

core = ov.Core()

device = widgets.Dropdown(
    options=core.available_devices + ["AUTO"],
    value='AUTO',
    description='Device:',
    disabled=False,
)

device

Dropdown(description='Device:', index=2, options=('CPU', 'GPU', 'AUTO'), value='AUTO')
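
Outside a notebook, where no widget is available, the same choice can be made in plain code. The sketch below is one possible approach: `choose_device` is a hypothetical helper, and the device list is a stand-in copied from the dropdown output above (in real use it would come from `core.available_devices`).

```python
def choose_device(available, preferred="AUTO"):
    """Return `preferred` if it is a valid choice, otherwise fall back to "AUTO"."""
    options = list(available) + ["AUTO"]
    return preferred if preferred in options else "AUTO"


# Stand-in for core.available_devices, copied from the dropdown output above.
available_devices = ["CPU", "GPU"]

print(choose_device(available_devices, "GPU"))
print(choose_device(available_devices, "NPU"))
```

Falling back to `"AUTO"` mirrors the default value of the dropdown and leaves device selection to OpenVINO.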

The OpenVINO IR model must be compiled for a specific device before inference.

In [7]:
compiled_model = core.compile_model(save_model_path, device.value)

# The compiled model is called with the same inputs as the original model
scores_ov = compiled_model(encoded_input.data)[0]

scores_ov = torch.softmax(torch.tensor(scores_ov[0]), dim=0).detach().numpy()

print_prediction(scores_ov)

1) positive 0.9483
2) neutral 0.0485
3) negative 0.0031


Note that the predictions of the converted model closely match those of the original model: the scores differ only in the fourth decimal place.
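
The two printed score lists can also be compared numerically. This sketch copies the values from the cell outputs above; the tolerance is an assumption chosen for illustration, not an official threshold.

```python
import math

# Scores copied from the two cell outputs above (PyTorch model, then OpenVINO model).
scores_torch = [0.9485, 0.0484, 0.0031]
scores_ov = [0.9483, 0.0485, 0.0031]

# Assumed tolerance for illustration only.
ABS_TOL = 1e-3

max_diff = max(abs(a - b) for a, b in zip(scores_torch, scores_ov))
all_close = all(
    math.isclose(a, b, abs_tol=ABS_TOL) for a, b in zip(scores_torch, scores_ov)
)

print(f"largest difference: {max_diff:.4f}")
print(f"within tolerance: {all_close}")
```

Small deviations like these are typical when the same network runs on a different runtime, so a tolerance-based comparison is usually more appropriate than checking for equality.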

This is a rather simple example, as the pipeline includes just one encoder model. Contemporary state-of-the-art pipelines often consist of several models. Feel free to explore other OpenVINO tutorials:
1. [Stable Diffusion v2](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/236-stable-diffusion-v2)
2. [Zero-shot Image Classification with OpenAI CLIP](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/228-clip-zero-shot-image-classification)
3. [Controllable Music Generation with MusicGen](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/250-music-generation)

The workflow for the `diffusers` package is exactly the same. The first example in the list above relies on `diffusers`.

## Converting a Model Using the Optimum Intel Package

🤗 Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures.

Among other use cases, Optimum Intel provides a simple interface to optimize your Transformers and Diffusers models, convert them to the OpenVINO Intermediate Representation (IR) format and run inference using OpenVINO Runtime.

### Install Requirements for Optimum

In [8]:
%pip install -q "optimum==1.13.0"
%pip install -q "optimum-intel"@git+https://github.com/huggingface/optimum-intel.git
%pip install -q onnx

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Note: you may need to restart the kernel to use updated packages.

### Import Optimum

Documentation for Optimum Intel states:
>You can now easily perform inference with OpenVINO Runtime on a variety of Intel processors (see the full list of supported devices). For that, just replace the `AutoModelForXxx` class with the corresponding `OVModelForXxx` class.

You can find [Optimum Intel documentation](https://huggingface.co/docs/optimum/intel/inference) on the Hugging Face website.

In [9]:
from optimum.intel.openvino import OVModelForSequenceClassification

### Initialize and Convert the Model Automatically

To load a Transformers model and convert it to the OpenVINO format on the fly, set `export=True` when loading your model.

In [10]:
model = OVModelForSequenceClassification.from_pretrained(MODEL, export=True, device=device.value)

# The save_pretrained() method saves the model weights to avoid conversion on the next load.
model.save_pretrained('./models')

Framework not specified. Using pt to export to ONNX.
Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 2.0.1+cu117
Overriding 1 configuration item(s)
	- use_cache -> False
Compiling the model...
Set CACHE_DIR to /tmp/tmp5a6y8rn2/model_cache


Moreover, some models on the Hugging Face Model Hub are already converted and ready to run! You can find them by filtering on the library name (just type OpenVINO) or by following [this link](https://huggingface.co/models?library=openvino&sort=trending).

### The Optimum Model Inference

Model inference is exactly the same as for the original model!

In [11]:
output = model.forward(**encoded_input)
scores = output[0][0]
scores = torch.softmax(scores, dim=0).detach().numpy()

print_prediction(scores)

1) positive 0.9485
2) neutral 0.0484
3) negative 0.0031


You can find more examples of using Optimum Intel here:
1. [Accelerate Inference of Sparse Transformer Models](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/116-sparsity-optimization)
2. [Grammatical Error Correction with OpenVINO](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/214-grammar-correction)
3. [Stable Diffusion v2.1 using Optimum-Intel OpenVINO](https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/236-stable-diffusion-v2/236-stable-diffusion-v2-optimum-demo.ipynb)
4. [Image generation with Stable Diffusion XL](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/248-stable-diffusion-xl)
5. [Instruction following using Databricks Dolly 2.0](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/240-dolly-2-instruction-following)
6. [Create LLM-powered Chatbot using OpenVINO](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/254-llm-chatbot)