# OpenVINO
OpenVINO is an open-source toolkit for optimizing and deploying deep learning models from cloud to edge. It accelerates deep learning inference across use cases such as generative AI, video, audio, and language, with models from popular frameworks like PyTorch, TensorFlow, and ONNX. You can convert and optimize models, then deploy them across a mix of Intel® hardware and environments: on-premises, on-device, in the browser, or in the cloud.

https://github.com/openvinotoolkit/openvino_notebooks/wiki

# Neural Network Compression Framework (NNCF)
Neural Network Compression Framework (NNCF) provides a suite of post-training and training-time algorithms for optimizing inference of neural networks in OpenVINO™ with a minimal accuracy drop.
NNCF is designed to work with models from PyTorch, TensorFlow, ONNX and OpenVINO™.

https://github.com/openvinotoolkit/nncf

# Llama 3.1 8B Instruct

https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct

The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The instruction-tuned, text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.

# Optimum

Optimum is an extension of Transformers that provides a set of performance optimization tools to train and run models on targeted hardware with maximum efficiency.

https://huggingface.co/docs/optimum/index

![title](Llama3.png)

In [None]:
# Install dependencies (run in a shell, or uncomment and prefix with "!" in a notebook cell):
# python -m pip install --upgrade pip
# pip install -U --pre openvino-genai openvino "openvino-tokenizers[transformers]" --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
# pip install --extra-index-url https://download.pytorch.org/whl/cpu "git+https://github.com/huggingface/optimum-intel.git" "git+https://github.com/openvinotoolkit/nncf.git" "onnx<=1.16.1"
#
# Related package versions: accelerate-0.34.2 safetensors-0.4.5 transformers-4.43.1

# Transform the model to OpenVINO IR format
```
optimum-cli export openvino --model meta-llama/Meta-Llama-3.1-8B-Instruct --task text-generation-with-past --weight-format int4 --group-size 128 --ratio 1.0 --sym llama-3.1-8b-instruct/INT4_compressed_weights
```
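
The export flags above request 4-bit weights (`--weight-format int4`) quantized symmetrically (`--sym`) in groups of 128 values that share one scale (`--group-size 128`), with `--ratio 1.0` asking for all eligible layers to use 4 bits. The following is only a rough sketch of the idea behind symmetric group-wise quantization, written in plain Python. It is not NNCF's implementation: the function names, the `[-8, 7]` integer range, and the rounding rule are assumptions made for illustration.

```python
# Illustrative sketch only: NOT the NNCF implementation.
# Assumption: signed 4-bit values in [-8, 7], largest magnitude in a group maps to 7.

def quantize_group_sym_int4(weights):
    """Quantize one group of floats to 4-bit integers that share a single scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_group(quantized, scale):
    """Recover approximate float weights from the integers and the group scale."""
    return [q * scale for q in quantized]


def quantize_row(row, group_size=128):
    """Split a weight row into consecutive groups and quantize each one separately."""
    return [
        quantize_group_sym_int4(row[start:start + group_size])
        for start in range(0, len(row), group_size)
    ]


# Synthetic weights in [-1, 1], used only to exercise the functions.
row = [((i * 37) % 101 - 50) / 50.0 for i in range(256)]
groups = quantize_row(row, group_size=128)
restored = [x for q, scale in groups for x in dequantize_group(q, scale)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), "groups, max abs reconstruction error:", max_err)
```

A smaller group size means more scales to store but a tighter fit per group, which is the trade-off that `--group-size` controls.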

# Replace the chat_template value in tokenizer_config.json

"chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}",

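To see what prompt string this template produces, the same logic can be sketched in plain Python. This is a hand-written equivalent for inspection only, not how the tokenizer evaluates the Jinja template; the function name `render_llama3_chat` is hypothetical, and the default `bos_token` value of `<|begin_of_text|>` is an assumption based on the Llama 3.1 tokenizer.

```python
# Hand-written mirror of the chat template above, for inspection only.
# Assumption: bos_token is "<|begin_of_text|>".

def render_llama3_chat(messages, bos_token="<|begin_of_text|>"):
    """Render a list of {"role", "content"} dicts into a single prompt string."""
    parts = []
    for index, message in enumerate(messages):
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()  # same effect as the template's `| trim`
            + "<|eot_id|>"
        )
        if index == 0:
            # Only the first message is prefixed with the beginning-of-sequence token.
            content = bos_token + content
        parts.append(content)
    # The template always ends by opening an assistant turn for the model to complete.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_llama3_chat([{"role": "user", "content": "  What is an LLM? "}])
print(repr(prompt))
```
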
In [1]:
import openvino_genai


def streamer(subword):
    """Print each generated piece of text as soon as it arrives."""
    print(subword, end='', flush=True)
    # The return value tells the pipeline whether to stop generating.
    # False means continue generation.
    return False
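
Because the return value of the callback decides whether generation stops, a streamer can also end a reply early. The sketch below is one possible variant, assuming the same callback contract described in the comments above (return `True` to stop, `False` to continue). The helper `make_collecting_streamer` and its `max_chars` parameter are hypothetical names, and the callback is driven by hand here instead of by the pipeline.

```python
# Hypothetical helper: builds a streamer that prints and records each piece of text,
# and asks for generation to stop once `max_chars` characters have been received.

def make_collecting_streamer(max_chars):
    collected = []

    def streamer_fn(subword):
        print(subword, end="", flush=True)
        collected.append(subword)
        # True requests a stop; False lets generation continue.
        return sum(len(piece) for piece in collected) >= max_chars

    return streamer_fn, collected


# Drive the callback manually to show the flags it would return.
streamer_fn, collected = make_collecting_streamer(max_chars=10)
flags = [streamer_fn(piece) for piece in ["Hello", ", ", "world", "!"]]
print()
print(flags)
```

The `collected` list keeps the streamed pieces, so the full reply can be joined and reused after generation ends.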

In [2]:
model_dir = "./llama-3.1-8b-instruct/INT4_compressed_weights/"

In [3]:
device = 'CPU'  # GPU can be used as well 
pipe = openvino_genai.LLMPipeline(model_dir, device) 

In [None]:
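config = openvino_genai.GenerationConfig()
config.max_new_tokens = 100  # cap each reply at 100 newly generated tokens

pipe.start_chat()  # begin a chat session
while True:
    prompt = input('question:\n')
    if prompt == 'Stop!':  # type "Stop!" to end the session
        break
    pipe.generate(prompt, config, streamer)  # the streamer prints the reply as it is generated
    print('\n----------')
pipe.finish_chat()  # end the chat session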

question:
 what it is a llm?


LLM stands for Large Language Model. It's a type of artificial intelligence (AI) model that is trained on a massive corpus of text data to generate human-like language responses. LLMs are designed to understand and generate human language, including nuances, idioms, and context-dependent expressions.

LLMs are typically trained on a large dataset of text, which allows them to learn patterns, relationships, and structures of language. This training enables them to generate text that is coherent, contextually relevant,
----------
