<a href="https://colab.research.google.com/github/nicklamb97/instructlab-poc/blob/main/openroad_rag_with_milvus.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Build RAG with Hugging Face and Milvus

_Authored by: [Chen Zhang](https://github.com/zc277584121)_


[Milvus](https://milvus.io/) is a popular open-source vector database that powers AI applications with highly performant and scalable vector similarity search. In this tutorial, we will show you how to build a RAG (Retrieval-Augmented Generation) pipeline with Hugging Face and Milvus.

A RAG system combines a retrieval system with an LLM. It first retrieves relevant documents from a corpus using the Milvus vector database, then uses an LLM hosted on Hugging Face to generate an answer based on the retrieved documents.

## Preparation
### Dependencies and Environment

In [None]:
! pip install --upgrade pymilvus grpcio==1.64.1 protobuf==3.20.3 sentence-transformers huggingface-hub langchain_community langchain-text-splitters pypdf tqdm


In addition, we recommend that you configure your [Hugging Face User Access Token](https://huggingface.co/docs/hub/security-tokens) and set it as an environment variable, because we will use an LLM from the Hugging Face Hub. Without the token, your requests may be subject to a lower rate limit.

In [None]:
import os

os.environ["HF_TOKEN"] = "hf_..."  # Replace with your own token; never commit a real token
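
To keep the token out of the notebook itself, one option is to read it from the environment and prompt for it only when it is missing. The following is a minimal sketch; the helper name `ensure_hf_token` and its `env` and `ask` parameters are assumptions made for illustration and are not part of `huggingface_hub`.

```python
import os
from getpass import getpass


def ensure_hf_token(env=os.environ, ask=getpass):
    """Return the HF token from `env`, prompting once if it is missing."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # `ask` defaults to getpass, which hides the typed value.
        token = ask("Hugging Face token: ").strip()
        if not token:
            raise ValueError("No Hugging Face token provided")
        env["HF_TOKEN"] = token
    return token
```

Calling `ensure_hf_token()` in a cell would then set `HF_TOKEN` for the rest of the session without the value ever appearing in the saved notebook.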

### Prepare the data

We use the [AI Act PDF](https://artificialintelligenceact.eu/wp-content/uploads/2021/08/The-AI-Act.pdf), a regulatory framework for AI with different risk levels corresponding to more or less regulation, as the private knowledge in our RAG.

In [None]:
%%bash

if [ ! -f "The-AI-Act.pdf" ]; then
    wget -q https://artificialintelligenceact.eu/wp-content/uploads/2021/08/The-AI-Act.pdf
fi

We use the [`PyPDFLoader`](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/pdf/) from LangChain to extract the text from the PDF, and then split the text into smaller chunks. We set the chunk size to 1000 and the overlap to 200, which means each chunk contains up to about 1000 characters and consecutive chunks share up to 200 characters.

In [None]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("./The-AI-Act.pdf")
docs = loader.load()
print(len(docs))

108


In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(docs)

In [None]:
text_lines = [chunk.page_content for chunk in chunks]
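
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch that cuts a string into fixed-size windows. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break at separators such as paragraph and line boundaries, so its chunks are usually shorter than the limit. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split `text` into fixed-size windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
pieces = sliding_chunks(sample)
print([len(p) for p in pieces])              # [1000, 1000, 900]
print(pieces[0][-200:] == pieces[1][:200])   # True: neighbours share 200 characters
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.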

### Prepare the Embedding Model
Define a function to generate text embeddings. We use the [BGE embedding model](https://huggingface.co/BAAI/bge-small-en-v1.5) as an example, but you can use any embedding model, such as those found on the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard).

In [None]:
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer("BAAI/bge-small-en-v1.5")

def emb_text(text):
    return embedding_model.encode([text], normalize_embeddings=True).tolist()[0]


Generate a test embedding and print its dimension and first few elements.

In [None]:
test_embedding = emb_text("What is a procedure?")
embedding_dim = len(test_embedding)
print(embedding_dim)
print(test_embedding[:10])

384
[0.08076542615890503, -0.0036146771162748337, 0.04314689338207245, -0.058242496103048325, -0.0476657971739769, -0.010763049125671387, 0.08234911412000656, 0.028488924726843834, 0.05461260676383972, 0.019323859363794327]


## Load data into Milvus

### Create the Collection

In [None]:
!pip install --upgrade pymilvus

Collecting pymilvus
  Using cached pymilvus-2.4.4-py3-none-any.whl.metadata (5.4 kB)
Collecting grpcio<=1.63.0,>=1.49.1 (from pymilvus)
  Using cached grpcio-1.63.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.2 kB)
Using cached pymilvus-2.4.4-py3-none-any.whl (196 kB)
Using cached grpcio-1.63.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.6 MB)
Installing collected packages: grpcio, pymilvus
  Attempting uninstall: grpcio
    Found existing installation: grpcio 1.64.1
    Uninstalling grpcio-1.64.1:
      Successfully uninstalled grpcio-1.64.1
  Attempting uninstall: pymilvus
    Found existing installation: pymilvus 2.1.1
    Uninstalling pymilvus-2.1.1:
      Successfully uninstalled pymilvus-2.1.1
Successfully installed grpcio-1.63.0 pymilvus-2.4.4


In [None]:
from pymilvus import MilvusClient

milvus_client = MilvusClient(uri="./hf_milvus_demo.db")

collection_name = "rag_collection"

DEBUG:pymilvus.milvus_client.milvus_client:Created new connection using: 985587b1b8b94560a2079475084449ba


> As for the arguments of `MilvusClient`:
> - Setting the `uri` to a local file, e.g. `./hf_milvus_demo.db`, is the most convenient method, as it automatically uses [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to store all data in this file.
> - If you have a large amount of data, say more than a million vectors, you can set up a more performant Milvus server on [Docker or Kubernetes](https://milvus.io/docs/quickstart.md). In this setup, use the server URI, e.g. `http://localhost:19530`, as your `uri`.
> - If you want to use [Zilliz Cloud](https://zilliz.com/cloud), the fully managed cloud service for Milvus, adjust the `uri` and `token`, which correspond to the [Public Endpoint and API key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#cluster-details) in Zilliz Cloud.
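
The three options above differ only in the keyword arguments passed to `MilvusClient`. As a rough sketch, they could be collected in a small helper; the function `milvus_client_kwargs` and its `mode` names are assumptions for illustration, while `uri` and `token` are the arguments described in the note.

```python
def milvus_client_kwargs(mode, uri=None, token=None):
    """Return keyword arguments for `MilvusClient` for one of three deployment modes."""
    if mode == "lite":
        # A local file path makes pymilvus use Milvus Lite.
        return {"uri": uri or "./hf_milvus_demo.db"}
    if mode == "server":
        # A Milvus server running on Docker or Kubernetes.
        return {"uri": uri or "http://localhost:19530"}
    if mode == "zilliz":
        # Zilliz Cloud needs both the public endpoint and an API key.
        if not uri or not token:
            raise ValueError("Zilliz Cloud requires both uri and token")
        return {"uri": uri, "token": token}
    raise ValueError(f"unknown mode: {mode!r}")


print(milvus_client_kwargs("lite"))
# In the notebook: milvus_client = MilvusClient(**milvus_client_kwargs("lite"))
```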


Check if the collection already exists and drop it if it does.

In [None]:
if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)

Create a new collection with specified parameters.

If we don't specify any field information, Milvus automatically creates a default `id` field for the primary key and a `vector` field to store the vector data. A reserved JSON field stores any fields that are not defined in the schema, along with their values.

In [None]:
milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="IP",  # Inner product distance
    consistency_level="Strong",  # Strong consistency level
)

DEBUG:pymilvus.milvus_client.milvus_client:Successfully created collection: rag_collection
DEBUG:pymilvus.milvus_client.milvus_client:Successfully created an index on collection: rag_collection
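
The inner product metric pairs well with the embedding function defined earlier, because `normalize_embeddings=True` scales every vector to unit length, and for unit vectors the inner product equals the cosine similarity. The pure-Python sketch below checks this on two toy vectors; the helper names are illustrative only.

```python
import math


def normalize(vec):
    """Scale `vec` to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


a, b = [3.0, 4.0], [1.0, 2.0]
ip_of_normalized = inner_product(normalize(a), normalize(b))
print(math.isclose(ip_of_normalized, cosine_similarity(a, b)))  # True
```

If the embeddings were not normalized, the inner product would also reward longer vectors, so the ranking could differ from a cosine-based one.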


### Insert data
Iterate through the text lines, create embeddings, and then insert the data into Milvus.

Each row also carries a `text` field, which is not defined in the collection schema. Milvus automatically stores it in the reserved JSON dynamic field, and at a high level it can be treated like a normal field.

In [None]:
from tqdm import tqdm

data = []

for i, line in enumerate(tqdm(text_lines, desc="Creating embeddings")):
    data.append({"id": i, "vector": emb_text(line), "text": line})

insert_res = milvus_client.insert(collection_name=collection_name, data=data)
insert_res["insert_count"]

Creating embeddings: 100%|██████████| 429/429 [01:37<00:00,  4.40it/s]


429
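
With 429 chunks, a single `insert` call is enough. For a much larger corpus, one common approach is to send the rows in batches. The sketch below shows only the batching logic on stand-in rows; the helper `batched` and the batch size of 100 are assumptions, and the actual `insert` call is left as a comment.

```python
def batched(rows, batch_size):
    """Yield consecutive slices of `rows` with at most `batch_size` items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# Stand-in rows with the same keys as `data` in the notebook (dummy 2-d vectors).
rows = [{"id": i, "vector": [0.0, 1.0], "text": f"chunk {i}"} for i in range(429)]

for batch in batched(rows, batch_size=100):
    # In the notebook this would be:
    # milvus_client.insert(collection_name=collection_name, data=batch)
    print(len(batch))
```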

## Build RAG

### Retrieve data for a query

Let's specify a question to ask about the corpus.

In [None]:
question = "How would you write a SELECT statement to retrieve all columns from a table called employees where the salary is greater than 50,000, order the results by department in ascending order, and eliminate duplicate rows from the result set?"

Search for the question in the collection and retrieve the top 3 semantic matches.

In [None]:
search_res = milvus_client.search(
    collection_name=collection_name,
    data=[
        emb_text(question)
    ],  # Use the `emb_text` function to convert the question to an embedding vector
    limit=3,  # Return top 3 results
    search_params={"metric_type": "IP", "params": {}},  # Inner product distance
    output_fields=["text"],  # Return the text field
)
print(emb_text(question))


Let's take a look at the search results for the query.


In [None]:
import json

retrieved_lines_with_distances = [
    (res["entity"]["text"], res["distance"]) for res in search_res[0]
]
print(json.dumps(retrieved_lines_with_distances, indent=4))

[
    [
        "EN 101  EN under HEADINGS 1 to 7  \nof the multiannual financial framework   Payments   2.370  1.870  1.870  1.870  1.870  9.850",
        0.5671846270561218
    ],
    [
        "operational headings)  (6)         \nTOTAL appropriations  \nunder HEADINGS 1 to 6  \nof the multiannual financial framework  \n(Reference amount)  Commitments  =4+ 6          \nPayments  =5+ 6",
        0.5626274347305298
    ],
    [
        "- Output                    \nSubtotal for specific objective No 2                  \nTOTALS    13 0.240  13 0.240  13 0.240  13 0.240  13 0.240  13 0.100  65 2.200  \n                                                 \n73 All figures in this column are indicative and subject to the continuation of the programmes and availability of appropriation s \n74 As described in point 1.4.2. \u2018Specific objective(s)\u2026\u2019",
        0.5533467531204224
    ]
]
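
Conceptually, the search scores the query vector against the stored vectors and keeps the highest-scoring rows. The brute-force sketch below illustrates that idea on toy data. It is a simplification: Milvus searches through an index rather than scanning every row like this, and the function `top_k_inner_product` is hypothetical. The result dictionaries mimic only the `distance` and `entity` keys that the notebook reads.

```python
def top_k_inner_product(query, rows, k=3):
    """Return the k rows with the largest inner product against `query`."""
    scored = [
        {
            "distance": sum(q * v for q, v in zip(query, row["vector"])),
            "entity": {"text": row["text"]},
        }
        for row in rows
    ]
    # With the IP metric, a larger value means a closer match.
    return sorted(scored, key=lambda hit: hit["distance"], reverse=True)[:k]


rows = [
    {"vector": [1.0, 0.0], "text": "a"},
    {"vector": [0.6, 0.8], "text": "b"},
    {"vector": [0.0, 1.0], "text": "c"},
    {"vector": [-1.0, 0.0], "text": "d"},
]
hits = top_k_inner_product([0.8, 0.6], rows, k=3)
print([(hit["entity"]["text"], round(hit["distance"], 2)) for hit in hits])
```

Because results are sorted by descending score, the first hit is always the closest match under the inner product metric.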


### Use LLM to get a RAG response

Before composing the prompt for the LLM, let's first flatten the retrieved document list into a plain string.

In [None]:
context = "\n".join(
    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]
)
print(context)

EN 101  EN under HEADINGS 1 to 7  
of the multiannual financial framework   Payments   2.370  1.870  1.870  1.870  1.870  9.850
operational headings)  (6)         
TOTAL appropriations  
under HEADINGS 1 to 6  
of the multiannual financial framework  
(Reference amount)  Commitments  =4+ 6          
Payments  =5+ 6
- Output                    
Subtotal for specific objective No 2                  
TOTALS    13 0.240  13 0.240  13 0.240  13 0.240  13 0.240  13 0.100  65 2.200  
                                                 
73 All figures in this column are indicative and subject to the continuation of the programmes and availability of appropriation s 
74 As described in point 1.4.2. ‘Specific objective(s)…’


Define the prompt template for the language model. The `{context}` placeholder is filled with the documents retrieved from Milvus, and `{question}` with the user's question.

In [None]:
PROMPT = """
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""
print(PROMPT)


Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>



We use [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1), hosted on the Hugging Face inference server, to generate a response based on the prompt.

In [None]:
from huggingface_hub import InferenceClient

repo_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

llm_client = InferenceClient(model=repo_id, timeout=120)

Finally, we can format the prompt and generate the answer.

In [None]:
prompt = PROMPT.format(context=context, question=question)
print(prompt)


Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
EN 101  EN under HEADINGS 1 to 7  
of the multiannual financial framework   Payments   2.370  1.870  1.870  1.870  1.870  9.850
operational headings)  (6)         
TOTAL appropriations  
under HEADINGS 1 to 6  
of the multiannual financial framework  
(Reference amount)  Commitments  =4+ 6          
Payments  =5+ 6
- Output                    
Subtotal for specific objective No 2                  
TOTALS    13 0.240  13 0.240  13 0.240  13 0.240  13 0.240  13 0.100  65 2.200  
                                                 
73 All figures in this column are indicative and subject to the continuation of the programmes and availability of appropriation s 
74 As described in point 1.4.2. ‘Specific objective(s)…’
</context>
<question>
How would you write a SELECT statement to retrieve all columns from a table called employees where the salary is 

In [None]:
answer = llm_client.text_generation(
    prompt,
    max_new_tokens=1000,
).strip()
print(answer)

To retrieve all columns from a table called employees where the salary is greater than 50,000, order the results by department in ascending order, and eliminate duplicate rows from the result set, you can use the following SELECT statement:
```
SELECT DISTINCT *
FROM employees
WHERE salary > 50000
ORDER BY department ASC;
```
The `DISTINCT` keyword is used to eliminate duplicate rows from the result set. The `WHERE` clause is used to filter the rows where the salary is greater than 50,000. The `ORDER BY` clause is used to sort the results by department in ascending order.
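
The individual steps (embed the question, search, flatten the context, fill the prompt, generate) can be wrapped into one function. The sketch below is one possible arrangement, not the notebook's own code: `answer_question` and the stub callables are hypothetical, and the embedding, search, and generation steps are passed in as parameters so the example runs without Milvus or a hosted model.

```python
PROMPT = """
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""


def answer_question(question, embed, search, generate, top_k=3):
    """Run retrieve-then-generate with injected embed/search/generate callables."""
    hits = search(embed(question), top_k)
    context = "\n".join(hit["entity"]["text"] for hit in hits)
    prompt = PROMPT.format(context=context, question=question)
    return generate(prompt).strip()


# Stub callables (assumptions for this sketch) standing in for the real services.
def fake_embed(text):
    return [float(len(text))]


def fake_search(vector, k):
    return [{"entity": {"text": "Milvus is a vector database."}, "distance": 1.0}][:k]


def fake_generate(prompt):
    return " stub answer "


print(answer_question("What is Milvus?", fake_embed, fake_search, fake_generate))
```

In the notebook, `embed` would be `emb_text`, `search` would wrap `milvus_client.search(...)[0]` with `limit=k`, and `generate` would wrap `llm_client.text_generation(prompt, max_new_tokens=1000)`.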


Congratulations! You have built a RAG pipeline with Hugging Face and Milvus.