# Enhancing Inference Performance with Optimum Intel and OpenVINO
<img src="https://www.intel.com/content/dam/www/central-libraries/us/en/images/2022-07/openvino-logo.png" alt="OpenVINO logo" style="width: 400px;"/>



Welcome to this developer-centric workshop, where we explore how Optimum Intel integrates Intel's OpenVINO toolkit with Hugging Face Transformers. This notebook demonstrates how to use OpenVINO to optimize model inference on Intel hardware, with a focus on edge computing.

## Why Optimum Intel and OpenVINO?

Deploying models efficiently on edge devices, where compute and memory are limited, is a key challenge in AI and machine learning. Optimum Intel addresses this by letting Hugging Face models run on the OpenVINO toolkit, which is designed for high-performance inference on Intel hardware.

### Learning Objectives

- **Understanding OpenVINO**: We will delve into how OpenVINO optimizes model performance, particularly for edge workloads.
- **Model Compression and Deployment**: Explore OpenVINO's capabilities in model compression techniques and seamless deployment on Intel hardware.
- **Practical Application**: Learn how to integrate these optimizations into Hugging Face models, enhancing inference performance with minimal code changes.

By the end of this notebook, you will have a practical understanding of applying Intel's OpenVINO optimizations to Hugging Face models, preparing you to deploy efficient AI solutions in edge computing environments.

Let's embark on this journey of optimized model performance!


In [1]:
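# Comment out the next line if not running on Intel Developer Cloud Jupyter.
# Note: each `!` command runs in its own subshell, so variables exported by setvars.sh do not persist beyond that command.
!source /opt/intel/oneapi/setvars.sh
!pip install transformers==4.37.2
!pip install optimum==1.16.2
!pip install --upgrade-strategy eager "optimum[openvino,nncf]==1.16.2"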

 
Defaulting to user installation because normal site-packages is not writeable

# Let's test Optimum Intel for OpenVINO

#### Importing Libraries for Model Optimization

This cell is the foundation of our journey into model optimization. Here, we import:
- `OVModelForCausalLM` from `optimum.intel`, a model class that keeps the familiar Hugging Face `from_pretrained`/`generate` interface while running inference with the OpenVINO Runtime.
- `AutoTokenizer` and `pipeline` from Hugging Face's `transformers` library, essential for tokenizing input data and creating a pipeline for text generation.

In [2]:
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino


Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
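Because the install cell pins specific versions (and `--upgrade-strategy eager` may upgrade dependencies), it can help to confirm which versions the kernel actually sees. A minimal sketch using the standard library's `importlib.metadata`; the package list is an assumption based on the install cell, with `openvino` and `nncf` assumed to be pulled in by the `optimum[openvino,nncf]` extra:

```python
from importlib.metadata import PackageNotFoundError, version

# Packages pinned or pulled in by the install cell (assumed list; adjust as needed).
PACKAGES = ["transformers", "optimum", "optimum-intel", "openvino", "nncf"]


def installed_versions(packages):
    """Return a mapping of package name -> installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:15s} {ver or 'not installed'}")
```

If a version differs from the pins, restarting the kernel after installation is usually needed before the new packages are picked up.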


#### Model and Tokenizer Loading

In this cell, we load the model and tokenizer:
- `model_id` is set to "helenai/gpt2-ov", a GPT-2 model on the Hugging Face Hub that has already been converted to the OpenVINO format.
- `OVModelForCausalLM.from_pretrained` loads the converted model and compiles it for the target device (CPU by default, as the log output below shows), so it is ready for inference.
- `AutoTokenizer.from_pretrained` loads the tokenizer that matches the model, which converts input text into token IDs.

In [3]:
model_id = "helenai/gpt2-ov"
model = OVModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

Provided model does not contain state. It may lead to sub-optimal performance.Please reexport model with updated OpenVINO version >= 2023.3.0 calling the `from_pretrained` method with original model and `export=True` parameter
Compiling the model to CPU ...

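The warning above recommends re-exporting the model from the original checkpoint with `export=True`. A hedged sketch of that step, assuming the original checkpoint is the Hub model `gpt2` (assumed to be what `helenai/gpt2-ov` was converted from; this notebook does not confirm it) and using `gpt2-ov-local` as an arbitrary, hypothetical output directory. The export needs network access and the packages installed above, so the imports sit inside the function and nothing runs until it is called:

```python
def export_and_save(source_model_id="gpt2", save_dir="gpt2-ov-local"):
    """Convert a PyTorch checkpoint to OpenVINO format and save it locally.

    Both defaults are illustrative assumptions, not values from this notebook.
    """
    # Imported lazily so defining this function does not trigger the export.
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    # export=True converts the original PyTorch weights to OpenVINO format
    # using the OpenVINO version installed in this environment.
    ov_model = OVModelForCausalLM.from_pretrained(source_model_id, export=True)
    tok = AutoTokenizer.from_pretrained(source_model_id)

    # Save model and tokenizer together so the directory can be reloaded later.
    ov_model.save_pretrained(save_dir)
    tok.save_pretrained(save_dir)
    return ov_model, tok


# model, tokenizer = export_and_save()  # uncomment to run the export
```

Once saved, `OVModelForCausalLM.from_pretrained("gpt2-ov-local")` should load the re-exported model without repeating the conversion.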

#### Setting Up the Inference Pipeline

We set up a pipeline for text generation:
- `pipeline("text-generation")` is initialized with our OpenVINO-optimized model and tokenizer, creating an efficient text generation pipeline.


In [4]:
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

device must be of type <class 'str'> but got <class 'torch.device'> instead
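The pipeline is a convenience wrapper around three steps: tokenizing the prompt, calling `generate`, and decoding the result. A minimal sketch of those steps done by hand; `generate_text` is a hypothetical helper, it assumes `model` and `tokenizer` are the objects loaded above, and `max_new_tokens=40` is an illustrative value. Passing `pad_token_id=tokenizer.eos_token_id` sets explicitly what `generate` otherwise falls back to for GPT-2 (which has no dedicated padding token), along with a logged notice:

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=40):
    """Tokenize, generate, and decode: roughly the steps the pipeline performs."""
    # Convert the prompt into PyTorch tensors of token IDs (plus attention mask).
    inputs = tokenizer(prompt, return_tensors="pt")
    # Run autoregressive generation with the OpenVINO-backed model.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        # GPT-2 has no pad token, so reuse the end-of-sequence token explicitly.
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode the first (and only) sequence in the batch back into text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `generate_text(model, tokenizer, "In the spring, beautiful flowers bloom...")` should return the prompt followed by the model's continuation, since decoder-only models return the input tokens together with the new ones.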


#### Generating Text with the Optimized Model

In this final cell, we use our pipeline to generate text:
- The pipeline is fed with a prompt ("In the spring, beautiful flowers bloom..."), and we inspect the continuation generated by our OpenVINO-optimized model. A single call confirms that the model runs end to end; measuring its speed requires timing repeated calls.

In [5]:
pipe("In the spring, beautiful flowers bloom...")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In the spring, beautiful flowers bloom...\n\nDingdong, Zhejiang, China, USA\n\nMay 7th - 18th (in a little over a week) - A mysterious bird or two roosted over a field'}]
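A single generation does not tell us how fast the model is. A minimal timing sketch, assuming you want to compare callables such as `pipe` above (or a PyTorch baseline you build yourself): it runs a few warm-up calls to absorb one-time costs, then reports the median wall-clock latency. `benchmark` is a hypothetical helper, and the `warmup`/`runs` defaults are arbitrary choices:

```python
import statistics
import time


def benchmark(fn, *args, warmup=2, runs=5, **kwargs):
    """Return the median wall-clock latency (seconds) of fn(*args, **kwargs)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Warm-up calls are excluded from the measurement.
    for _ in range(warmup):
        fn(*args, **kwargs)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to occasional slow runs.
    return statistics.median(timings)
```

For example, `benchmark(pipe, "In the spring, beautiful flowers bloom...", max_new_tokens=40)` times the OpenVINO pipeline. For a fair comparison against another model, keep the prompt and generation settings identical across runs.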

# Conclusion and Discussion

### Conclusion

Throughout this workshop, we've explored the potent combination of Hugging Face's Transformers with Intel's OpenVINO toolkit through Optimum Intel. This integration not only enhances the efficiency of model inference on edge devices but also simplifies the deployment process on Intel hardware.

### Discussion

The skills acquired in this session are invaluable for developers looking to deploy AI models in edge computing environments. Understanding how to apply OpenVINO optimizations to Hugging Face models empowers developers to create more efficient, faster-performing applications, crucial in the fast-evolving landscape of AI and ML.

As we continue to innovate in AI deployment, the knowledge of optimizing models for edge devices will remain a key asset in the toolkit of AI practitioners and developers.