Example of using the Hugging Face Model Hub with OpenVINO

https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/hugging-face-hub/hugging-face-hub.ipynb




In [None]:
%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu "transformers>=4.33.0" "torch>=2.1.0"
%pip install -q ipywidgets
%pip install -q "openvino>=2023.1.0"

Initializing a Model Using the HF Transformers Package

Twitter-roBERTa-base for Sentiment Analysis is used. This is a RoBERTa-base model trained on ~124M tweets from January 2018 to December 2021 and fine-tuned for sentiment analysis with the TweetEval benchmark.
Use AutoModelForSequenceClassification to initialize the model and run inference with it.

https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest


In [2]:
from pathlib import Path

import numpy as np
import torch

from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer

MODEL = "cardiffnlp/twitter-roberta-base-sentiment-latest"

tokenizer = AutoTokenizer.from_pretrained(MODEL, return_dict=True)

# The torchscript=True flag is used to ensure the model outputs are tuples
# instead of ModelOutput (which causes JIT errors).
model = AutoModelForSequenceClassification.from_pretrained(MODEL, torchscript=True)

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Classify a simple prompt

In [13]:
text = """
    Only the brave men and women can bring peace to the world, not by practicing war but by practicing nonviolence. 
    Our heart is wide enough to embrace the world and hands are long enough to encompass the world.
    """

encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)

scores = output[0][0]
scores = torch.softmax(scores, dim=0).numpy(force=True)

def print_prediction(scores):
    for i, descending_index in enumerate(scores.argsort()[::-1]):
        label = model.config.id2label[descending_index]
        score = np.round(float(scores[descending_index]), 4)
        print(f"{i+1}) {label} {score}")

print_prediction(scores)

1) positive 0.8178
2) neutral 0.1685
3) negative 0.0137
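
The ranking above comes from two steps: a softmax that turns the raw logits into probabilities, and a descending sort over those probabilities. The sketch below reproduces both steps with the standard library only, so it runs without the model. The logits and the label mapping are hypothetical values chosen for illustration, not output from the model.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    # Subtract the largest logit first so exp() cannot overflow.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(logits, id2label):
    """Return (label, probability) pairs, most likely first."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in order]


# Hypothetical logits and label mapping, for illustration only.
example_logits = [-2.0, 0.5, 2.0]
example_labels = {0: "negative", 1: "neutral", 2: "positive"}

for rank, (label, prob) in enumerate(rank_labels(example_logits, example_labels), start=1):
    print(f"{rank}) {label} {prob:.4f}")
```

In the notebook, torch.softmax and scores.argsort()[::-1] play the roles of softmax and the sort here.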


Converting the Model to OpenVINO IR format

Use the OpenVINO model conversion API to convert the model (this one is implemented in PyTorch) to OpenVINO Intermediate Representation (IR).


In [10]:
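import openvino as ov

save_model_path = Path('./models/model.xml')

# Convert only once; later runs reuse the IR files already saved on disk.
if not save_model_path.exists():
    # example_input lets the converter trace the PyTorch model with real tensors.
    ov_model = ov.convert_model(model, example_input=dict(encoded_input))
    # save_model writes model.xml plus model.bin and compresses weights to FP16 by default.
    ov.save_model(ov_model, save_model_path)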
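
An IR model is stored as two files: an .xml file describing the topology and a .bin file with the weights next to it. The guard above only checks for the .xml file, so a missing or deleted .bin would go unnoticed until the model is loaded. A minimal sketch of a stricter check follows; ir_files_present is a hypothetical helper name, not part of OpenVINO, and the demo uses empty placeholder files in a temporary directory.

```python
import tempfile
from pathlib import Path


def ir_files_present(xml_path):
    """Return True only if the .xml file and its sibling .bin file both exist."""
    xml_path = Path(xml_path)
    bin_path = xml_path.with_suffix(".bin")
    return xml_path.is_file() and bin_path.is_file()


# Demo with empty placeholder files (not real IR content).
with tempfile.TemporaryDirectory() as tmp:
    xml_file = Path(tmp) / "model.xml"
    xml_file.touch()
    print(ir_files_present(xml_file))  # only the .xml file exists here
    xml_file.with_suffix(".bin").touch()
    print(ir_files_present(xml_file))  # both files exist here
```

With this helper, the guard could read `if not ir_files_present(save_model_path):` so that an incomplete export triggers a fresh conversion.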

Run model inference on GPU

In [18]:
core = ov.Core()
devices = core.available_devices
print(devices)

device = 'GPU'
compiled_model = core.compile_model(save_model_path, device)

# Compiled model call is performed using the same parameters as for the original model
scores_ov = compiled_model(encoded_input.data)[0]

scores_ov = torch.softmax(torch.tensor(scores_ov[0]), dim=0).detach().numpy()

print_prediction(scores_ov)

['CPU', 'GPU', 'NPU']
1) positive 0.8165
2) neutral 0.1698
3) negative 0.0136
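
The cell above hard-codes 'GPU', which fails on a machine where core.available_devices does not list one. The sketch below shows one way to pick a device from that list with a CPU fallback. pick_device is a hypothetical helper, and the prefix match assumes that OpenVINO may report indexed names such as "GPU.0" when several GPUs are present.

```python
def pick_device(available, preferred="GPU", fallback="CPU"):
    """Return the first available device matching `preferred`, else `fallback`."""
    for name in available:
        # Accept the plain name ("GPU") as well as indexed names ("GPU.0").
        if name == preferred or name.startswith(preferred + "."):
            return name
    return fallback


print(pick_device(["CPU", "GPU", "NPU"]))
print(pick_device(["CPU"]))
```

In the notebook this would be used as `device = pick_device(core.available_devices)` before calling compile_model.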


In [20]:
%pip install -q "git+https://github.com/huggingface/optimum-intel.git" onnx


Note: you may need to restart the kernel to use updated packages.


In [None]:
from optimum.intel.openvino import OVModelForSequenceClassification

model = OVModelForSequenceClassification.from_pretrained(MODEL, export=True, device='GPU')

# The save_pretrained() method saves the model weights to avoid conversion on the next load.
model.save_pretrained('./models/optimum_model')

In [None]:
!optimum-cli export openvino --help


In [None]:
!optimum-cli export openvino --model $MODEL --task text-classification --fp16 models/optimum_model/fp16


In [24]:
model = OVModelForSequenceClassification.from_pretrained("models/optimum_model/fp16", device='GPU')


Compiling the model to GPU ...
Setting OpenVINO CACHE_DIR to models\optimum_model\fp16\model_cache


In [25]:
output = model(**encoded_input)
scores = output[0][0]
scores = torch.softmax(scores, dim=0).numpy(force=True)

print_prediction(scores)

1) positive 0.8165
2) neutral 0.1698
3) negative 0.0136
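
These scores match the earlier OpenVINO GPU run and differ slightly from the original PyTorch scores (0.8165 versus 0.8178 for the positive class). Reduced precision from the FP16 export is a plausible cause, though nothing in the output confirms it. The sketch below measures the gap using the printed values; the tolerance is an illustrative choice, not a value from the notebook.

```python
def max_abs_diff(a, b):
    """Largest absolute per-class difference between two score lists."""
    if len(a) != len(b):
        raise ValueError("score lists must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


# Scores as printed in this notebook, in positive / neutral / negative order.
pytorch_scores = [0.8178, 0.1685, 0.0137]
openvino_scores = [0.8165, 0.1698, 0.0136]

# Illustrative tolerance; pick one that suits the application.
TOLERANCE = 0.01

diff = max_abs_diff(pytorch_scores, openvino_scores)
print(f"max abs diff: {diff:.4f}, within tolerance: {diff <= TOLERANCE}")
```

The predicted label and the ranking are unchanged in both runs, which is usually what matters for classification.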
