# Lab 4. Retrieval-augmented generation (RAG)

## 1. Basic Completion and Chat
### Download Qwen2

In [1]:
from pathlib import Path
from modelscope import snapshot_download
llm_model_id = "snake7gun/Qwen2-7B-Instruct-int4-ov"
llm_local_path = "./model/snake7gun/Qwen2-7B-Instruct-int4-ov"

if not Path(llm_local_path).exists():
    model_dir = snapshot_download(llm_model_id, cache_dir="./model/")

### Initialize LLM

In [2]:
from llama_index.llms.openvino import OpenVINOLLM

ov_config = {
    "PERFORMANCE_HINT": "LATENCY",
    "NUM_STREAMS": "1",
    "CACHE_DIR": "",
}

def completion_to_prompt(completion):
    return f"<|im_start|>system\n<|im_end|>\n<|im_start|>user\n{completion}<|im_end|>\n<|im_start|>assistant\n"

def messages_to_prompt(messages):
    prompt = ""
    for message in messages:
        if message.role == "system":
            prompt += f"<|im_start|>system\n{message.content}<|im_end|>\n"
        elif message.role == "user":
            prompt += f"<|im_start|>user\n{message.content}<|im_end|>\n"
        elif message.role == "assistant":
            prompt += f"<|im_start|>assistant\n{message.content}<|im_end|>\n"

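    # No system message supplied: prepend an empty system turn and close it,
    # matching the layout used in completion_to_prompt.
    if not prompt.startswith("<|im_start|>system"):
        prompt = "<|im_start|>system\n<|im_end|>\n" + prompt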

    prompt = prompt + "<|im_start|>assistant\n"

    return prompt

ov_llm = OpenVINOLLM(
    model_id_or_path=llm_local_path,
    context_window=3900,
    max_new_tokens=1024,
    model_kwargs={"ov_config": ov_config},
    # Qwen2 pads with <|endoftext|> (id 151643); 32000 comes from Llama-style vocabularies.
    generate_kwargs={"pad_token_id": 151643, "do_sample": False, "temperature": None, "top_p": None},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    device_map="cpu",
)

Compiling the model to CPU ...
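
To make the ChatML layout built by the prompt helpers easier to inspect, here is a minimal, dependency-free sketch. `Message` and `to_chatml` are hypothetical stand-ins introduced only for illustration (they are not part of LlamaIndex); the sketch assumes each message exposes `role` and `content`, and that a missing system turn is filled with an empty, closed one as `completion_to_prompt` does.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a chat message with `role` and `content`."""
    role: str
    content: str


def to_chatml(messages):
    """Render messages in a ChatML layout (illustrative helper, not a library API)."""
    prompt = ""
    for m in messages:
        if m.role in ("system", "user", "assistant"):
            prompt += f"<|im_start|>{m.role}\n{m.content}<|im_end|>\n"
    # Add an empty, closed system turn if no system message was supplied.
    if not prompt.startswith("<|im_start|>system"):
        prompt = "<|im_start|>system\n<|im_end|>\n" + prompt
    # Leave an open assistant turn for the model to complete.
    return prompt + "<|im_start|>assistant\n"


chatml_prompt = to_chatml([
    Message("system", "You are a helpful assistant."),
    Message("user", "What is OpenVINO?"),
])
print(chatml_prompt)
```

Printing the prompt before sending it to the model is a cheap way to catch an unclosed turn or a missing trailing assistant header.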


### Call complete with a prompt

In [3]:
response = ov_llm.stream_complete("What is OpenVINO ?")

for r in response:
    print(r.delta, end="")

 OpenVINO is an open-source toolkit developed by Intel that provides a set of tools and libraries for building high-performance computer vision applications. It includes pre-trained models, inference engines, and development tools to help developers quickly deploy computer vision models on a variety of platforms, including CPUs, GPUs, and FPGAs.
OpenVINO supports a wide range of computer vision tasks, including object detection, object recognition, image classification, and more. It also includes support for popular deep learning frameworks such as TensorFlow, Caffe, and ONNX, making it easy to integrate with existing machine learning workflows.
One of the key features of OpenVINO is its ability to optimize models for specific hardware platforms, allowing developers to achieve high performance and low latency on a variety of devices. This makes it well-suited for use in a range of applications, from edge devices like smartphones and IoT sensors to data centers and cloud environments.
O
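
The streaming loop above prints only each response's `delta`. As a rough sketch of why that works, the snippet below assumes that every streamed item carries both the text generated so far (`text`) and the newly generated piece (`delta`). `Chunk` and `fake_stream` are hypothetical stand-ins so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Chunk:
    """Hypothetical stand-in for one streamed completion response."""
    text: str   # everything generated so far
    delta: str  # only the newly generated piece


def fake_stream(pieces: List[str]) -> Iterator[Chunk]:
    """Yield chunks the way a streaming LLM call might (illustrative only)."""
    text = ""
    for piece in pieces:
        text += piece
        yield Chunk(text=text, delta=piece)


collected = []
for chunk in fake_stream(["Open", "VINO ", "is ", "a toolkit."]):
    # Printing the delta avoids re-printing the accumulated text each step.
    print(chunk.delta, end="", flush=True)
    collected.append(chunk.delta)
print()

# Joining the deltas reconstructs the full response for later use.
full_text = "".join(collected)
```

Collecting the deltas while printing them keeps the full answer available after the stream is exhausted, since a generator can only be consumed once.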

## 2. Basic RAG (Vector Search, Summarization)
### Export Embedding model

In [11]:
embedding_model_id = "BAAI/bge-small-zh-v1.5"
embedding_model_path = "./model/bge-small-zh-v1.5-ov"

if not Path(embedding_model_path).exists():
    !optimum-cli export openvino --model {embedding_model_id} --task feature-extraction {embedding_model_path}

### Initialize Embedding model

In [12]:
from llama_index.embeddings.huggingface_openvino import OpenVINOEmbedding

ov_embedding = OpenVINOEmbedding(model_id_or_path=embedding_model_path, device="CPU")

Compiling the model to CPU ...


### Basic RAG (Vector Search)

In [13]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex, Settings

Settings.embed_model = ov_embedding
Settings.llm = ov_llm

reader = SimpleDirectoryReader(
    input_files=["./examples/text_example_cn.pdf"]
)
documents = reader.load_data()
index = VectorStoreIndex.from_documents(
    documents,
)
query_engine = index.as_query_engine(streaming=True, similarity_top_k=2)
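
To give some intuition for what `similarity_top_k=2` asks of the index, here is a minimal sketch of top-k retrieval over toy vectors. It assumes cosine similarity as the scoring function; `cosine`, `top_k`, and the three-dimensional vectors are hypothetical and only illustrate the idea, not the index's actual implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in chunk_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy embeddings standing in for embedded document chunks.
toy_chunks = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.7, 0.7, 0.0],
    "chunk_c": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.2, 0.0]

best = top_k(toy_query, toy_chunks, k=2)
print(best)
```

A larger `k` gives the LLM more context to draw on, at the cost of a longer prompt that has to fit inside the configured context window.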

In [9]:
streaming_response = query_engine.query("英特尔博锐® Enterprise系统提供哪些功能?")  # "What features does the Intel vPro® Enterprise system provide?"
streaming_response.print_response_stream()

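The Intel vPro Enterprise system provides the following powerful and versatile combination of security and technology features:

1. **Dynamic root of trust**: Ensures that the system's initial boot process is trusted, preventing malware from attacking during startup.

2. **System Management Mode (SMM)**: Protects system-management-level operations against unauthorized access and modification.

3. **Multi-key encryption**: Uses multiple keys to strengthen data encryption, keeping sensitive information secure in transit and in storage.

4. **Operating system kernel protection**: Protects the core of the operating system from attack, ensuring the stability and security of the underlying system.

5. **Out-of-band management with KVM control**: Allows remote access to and control of a computer, so management tasks can be performed even without direct physical access.

6. **Unique identifier**: Gives each device a unique ID for easier management and tracking.

7. **Device history**: Records a device's usage history and state changes, which helps with fault diagnosis and maintenance.

8. **Manageability plug-ins**: Provide additional functions and tools that extend device management software and support AIOps (All-In-One Operations).

Together, these features combine strong security and manageability technologies to form the foundation of the Intel vPro platform, providing differentiated capabilities for organizations of all sizes.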

In [10]:
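streaming_response = query_engine.query("相比英特尔之前的移动处理器产品，英特尔®酷睿™ Ultra处理器的AI推理性能提升了多少？")  # "Compared with Intel's previous mobile processors, how much has the AI inference performance of the Intel® Core™ Ultra processor improved?"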
streaming_response.print_response_stream()

Compared with Intel's mobile processor products, the AI inference performance of the Intel Core Ultra processor is improved by up to 2.5x.
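
Answers like the ones above depend on the retrieved chunks being placed in front of the question before the LLM is called. The sketch below shows one way that assembly step could look. `build_qa_prompt` and its template wording are hypothetical and are not the prompt the query engine actually uses.

```python
def build_qa_prompt(question, chunks):
    """Join retrieved chunks into one context block ahead of the question.

    Illustrative template only; a real query engine may word this differently.
    """
    context = "\n\n".join(chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Answer the question using only the context above.\n"
        f"Question: {question}\n"
        "Answer: "
    )


# Placeholder chunks standing in for the top-k retrieval results.
demo_prompt = build_qa_prompt(
    "What is the capital of France?",
    ["Paris is the capital of France.", "France is in Western Europe."],
)
print(demo_prompt)
```

Restricting the model to the supplied context is what ties the answer to the indexed document instead of to whatever the model memorized during training.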