# Build RAG with HuggingFace and Milvus

[**Milvus**](https://milvus.io/) is a popular open-source vector database that powers AI applications with highly performant and scalable vector similarity search. In this example, we will build a RAG pipeline with HuggingFace and Milvus.

The RAG system combines a retrieval system with an LLM. It first retrieves relevant documents from a corpus using the Milvus vector database, then uses an LLM hosted on HuggingFace to generate answers based on the retrieved documents.

## Setup

In [None]:
!pip install -qU pymilvus sentence-transformers huggingface-hub langchain_community langchain-text-splitters pypdf tqdm

In [None]:
from huggingface_hub import notebook_login
notebook_login()

## Prepare data

We will use the [AI Act PDF](https://artificialintelligenceact.eu/wp-content/uploads/2021/08/The-AI-Act.pdf), a regulatory framework for AI with different risk levels corresponding to more or less regulation, as the private knowledge in our RAG.

In [3]:
!wget -q https://artificialintelligenceact.eu/wp-content/uploads/2021/08/The-AI-Act.pdf

We use the `PyPDFLoader` from LangChain to extract the text from the PDF, and then split the text into smaller chunks. We set the chunk size to 1000 characters and the overlap to 200 characters.

In [5]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = PyPDFLoader("./The-AI-Act.pdf")
docs = loader.load()
print(f"{len(docs)} documents loaded")

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
chunks = text_splitter.split_documents(docs)
text_lines = [chunk.page_content for chunk in chunks]
print(f"{len(chunks)} chunks created")

108 documents loaded
424 chunks created
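
To see how the chunk size and the overlap interact, here is a minimal sketch of a fixed-width splitter. It is a simplified stand-in, not how `RecursiveCharacterTextSplitter` works internally: the LangChain splitter prefers to cut at separators such as paragraph and line breaks, whereas the hypothetical `split_with_overlap` helper below cuts at exact character offsets.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Cut `text` into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Made-up sample text of 2500 characters, only for illustration.
sample = "".join(str(i % 10) for i in range(2500))
pieces = split_with_overlap(sample)
print([len(p) for p in pieces])  # [1000, 1000, 900]
# Neighbouring chunks share their boundary characters.
print(pieces[0][-200:] == pieces[1][:200])  # True
```

With an overlap of 200, each window advances by 800 characters, so a sentence that straddles a chunk boundary is likely to appear whole in at least one chunk.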


## Prepare model

We define a function to generate text embeddings. We use [`BAAI/bge-small-en-v1.5`](https://huggingface.co/BAAI/bge-small-en-v1.5) embedding model as an example.

In [6]:
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer("BAAI/bge-small-en-v1.5")

def emb_text(text):
    return embedding_model.encode([text], normalize_embeddings=True).tolist()[0]

Then we can generate a test embedding and print its dimension and first few elements.

In [7]:
test_embedding = emb_text("This is a test text")
embedding_dim = len(test_embedding)
print(f"Embedding dimension: {embedding_dim}")
print(f"Embedding example: {test_embedding[:10]}")

Embedding dimension: 384
Embedding example: [-0.0744439885020256, 0.042849473655223846, 0.011254151351749897, 0.0016748392954468727, 0.017243895679712296, 0.011638454161584377, 0.029011037200689316, 0.05034235492348671, -0.001127515803091228, -0.031293533742427826]


## Load data into Milvus

### Create the collection

In [8]:
from pymilvus import MilvusClient

milvus_client = MilvusClient(uri='./hf_milvus_demo.db')

collection_name = 'rag_collection'

Note:
* Setting the `uri` as a local file, e.g., `./hf_milvus_demo.db`, is the most convenient method, as it automatically uses Milvus Lite to store all data in this file.
* If we have a large amount of data, we can set up a more performant Milvus server on Docker or Kubernetes, and then use the server URI as the `uri`.
* We can also use Zilliz Cloud, the fully managed cloud service for Milvus. In that case, we set the `uri` and `token` to the Public Endpoint and API key from Zilliz Cloud.
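
One way to switch between these three options without editing the notebook is to read the connection settings from the environment. The sketch below is only an illustration: the helper name and the `MILVUS_URI` and `MILVUS_TOKEN` variable names are assumptions, not a pymilvus convention.

```python
import os


def milvus_connection_kwargs(env=None):
    """Build keyword arguments for MilvusClient from environment-style settings."""
    env = os.environ if env is None else env
    # Fall back to the local Milvus Lite file used in this notebook.
    kwargs = {"uri": env.get("MILVUS_URI", "./hf_milvus_demo.db")}
    token = env.get("MILVUS_TOKEN")  # only needed for a secured or managed deployment
    if token:
        kwargs["token"] = token
    return kwargs


print(milvus_connection_kwargs({}))
# Example value for a self-hosted server; replace with your own endpoint.
print(milvus_connection_kwargs({"MILVUS_URI": "http://localhost:19530"}))
```

The client could then be created with `MilvusClient(**milvus_connection_kwargs())`.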

We need to check if the collection already exists and drop it if it does:

In [9]:
if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)

Now we can create a new collection with the specified parameters. If we do not specify any field information, Milvus will automatically create a default `id` field for the primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values.

In [10]:
milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type='IP', # Inner Product distance
    consistency_level='Strong', # Strong consistency level
)
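
The inner product is a reasonable metric here because `emb_text` passes `normalize_embeddings=True`, so every stored vector has unit length. For unit-length vectors the inner product equals the cosine similarity. The following sketch checks this on made-up toy vectors in plain Python; the helper names are illustrative only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


a, b = [3.0, 4.0], [1.0, 2.0]  # made-up toy vectors
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# A normalized vector has inner product 1 with itself (up to rounding).
print(math.isclose(inner_product(a_unit, a_unit), 1.0))
# Inner product of normalized vectors matches cosine similarity of the originals.
print(math.isclose(inner_product(a_unit, b_unit), cosine_similarity(a, b)))
```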

### Insert data

After creating a collection, we will iterate through the text lines, create embeddings, and then insert the data into Milvus.

We will create a new field `text`, which is not defined in the collection schema. It will be automatically added to the reserved JSON dynamic field, which can be treated as a normal field at a high level.

In [11]:
from tqdm import tqdm

data = []

for i, line in enumerate(tqdm(text_lines, desc='Creating embeddings')):
    data.append({
        'id': i,
        'vector': emb_text(line),
        'text': line
    })

insert_res = milvus_client.insert(
    collection_name=collection_name,
    data=data
)

Creating embeddings: 100%|██████████| 424/424 [00:59<00:00,  7.08it/s]


In [12]:
insert_res['insert_count']

424
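
The loop above embeds one chunk per call. Embedding several chunks per call may be faster, since `SentenceTransformer.encode` also accepts a list of strings. The sketch below shows the batching logic only; `build_rows` and `fake_embed_batch` are hypothetical names, and the fake embedder stands in for the real model so the example runs on its own.

```python
def build_rows(lines, embed_batch, batch_size=32):
    """Embed `lines` in batches and return rows shaped like the insert payload above."""
    rows = []
    for start in range(0, len(lines), batch_size):
        batch = lines[start:start + batch_size]
        vectors = embed_batch(batch)  # one vector per line in the batch
        for offset, (line, vector) in enumerate(zip(batch, vectors)):
            rows.append({"id": start + offset, "vector": vector, "text": line})
    return rows


def fake_embed_batch(batch):
    """Stand-in embedder for the demo: one 2-d vector per input string."""
    return [[float(len(text)), 1.0] for text in batch]


rows = build_rows(["a", "bb", "ccc"], fake_embed_batch, batch_size=2)
print(rows)
```

In the notebook, `embed_batch` could be `lambda batch: embedding_model.encode(batch, normalize_embeddings=True).tolist()`, and the result could be passed as `data` to `milvus_client.insert`.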

## Build RAG

### Retrieve data for a query

We can specify a question to ask about the corpus.

In [13]:
question = "What is the legal basis for the proposal?"

We will search for the question in the collection and retrieve the top 3 semantic matches:

In [14]:
search_res = milvus_client.search(
    collection_name=collection_name,
    data=[emb_text(question)], # use `emb_text` to convert the question to an embedding vector
    limit=3, # return top-3 results
    search_params={
        'metric_type': 'IP',
        'params': {}
    }, # use inner product distance
    output_fields=['text'] # return the text field only
)
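
Conceptually, a search with `limit=3` and the `IP` metric scores every stored vector by its inner product with the query vector and keeps the highest-scoring entries. Milvus does this through its own index rather than a Python loop, so the brute-force sketch below only illustrates the ranking rule. The rows and the helper name are made up for the example.

```python
import heapq


def top_k_by_inner_product(query, rows, k=3):
    """Return the k (score, text) pairs with the largest inner product to `query`."""
    scored = (
        (sum(q * v for q, v in zip(query, row["vector"])), row["text"])
        for row in rows
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Made-up 2-d unit vectors standing in for real 384-d embeddings.
toy_rows = [
    {"text": "legal basis", "vector": [1.0, 0.0]},
    {"text": "risk levels", "vector": [0.6, 0.8]},
    {"text": "penalties", "vector": [0.0, 1.0]},
]
hits = top_k_by_inner_product([1.0, 0.0], toy_rows, k=2)
print(hits)  # [(1.0, 'legal basis'), (0.6, 'risk levels')]
```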

Check the search results of the query:

In [15]:
import json

retrieved_lines_with_distances = [
    (res['entity']['text'], res['distance'])
    for res in search_res[0]
]
print(json.dumps(retrieved_lines_with_distances, indent=4))

[
    [
        "EN 6  EN \n2. LEGAL BASIS, SUBSIDIARITY AND PROPORTIONALITY \n2.1. Legal basis \nThe legal basis for the proposal is in the first place Article 114 of the Treaty on the \nFunctioning of the European Union (TFEU), which provides for the adoption of measures to \nensure the establishment and functioning of the internal market.  \nThis proposal constitutes a core part of the EU digital single market strategy. The primary \nobjective of this proposal is to ensure the proper functioning of the internal market by setting \nharmonised rules in particular on the development, placing on the Union market and the use \nof products and services making use of AI technologies or provided as stand -alone AI \nsystems. Some Member States are already considering national rules to ensure that AI is safe \nand is developed and used in compliance with fundamental rights obligations. This will likely \nlead to two main problems: i) a fragmentation of the internal market on essential elemen

### Use an LLM to get a RAG response

Before composing the prompt for the LLM, we first need to flatten the retrieved document list into a plain string:

In [17]:
context = "\n".join([
    line[0] for line in retrieved_lines_with_distances
])
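
With three chunks of about 1000 characters each, the joined context stays short. If more results were retrieved, the context could outgrow what the model accepts. One possible safeguard, shown here as a sketch, is to keep passages in ranked order until a character budget is reached. The helper name and the 4000-character default are assumptions, not values from this notebook.

```python
def join_within_budget(passages, max_chars=4000, separator="\n"):
    """Join passages in order, stopping before the result would exceed `max_chars`."""
    kept, used = [], 0
    for passage in passages:
        # The separator is only paid for from the second passage onwards.
        cost = len(passage) + (len(separator) if kept else 0)
        if used + cost > max_chars:
            break
        kept.append(passage)
        used += cost
    return separator.join(kept)


demo = join_within_budget(["aaaa", "bbbb", "cccc"], max_chars=9)
print(repr(demo))  # 'aaaa\nbbbb'
```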

Define the prompt for the language model. It is assembled from the documents retrieved from Milvus.

In [18]:
PROMPT = """
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""

We will use the [`Mixtral-8x7B-Instruct-v0.1`](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) model hosted on the HuggingFace inference server to generate a response based on the prompt:

In [19]:
from huggingface_hub import InferenceClient

repo_id = 'mistralai/Mixtral-8x7B-Instruct-v0.1'
llm_client = InferenceClient(model=repo_id, timeout=120)

Finally, we can format the prompt and generate the answer:

In [20]:
prompt = PROMPT.format(context=context, question=question)

answer = llm_client.text_generation(
    prompt,
    max_new_tokens=1000
).strip()
answer

'The legal basis for the proposal is Article 114 of the Treaty on the Functioning of the European Union (TFEU), which provides for the adoption of measures to ensure the establishment and functioning of the internal market.'
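
The retrieval and generation steps can be wrapped into one function so that new questions are easy to ask. The sketch below is one possible arrangement, not part of the original notebook: `answer_question` takes the retriever and the generator as plain callables, and the fake versions used here exist only so the example runs without Milvus or a network connection.

```python
PROMPT_TEMPLATE = (
    "Use the following pieces of information enclosed in <context> tags "
    "to provide an answer to the question enclosed in <question> tags.\n"
    "<context>\n{context}\n</context>\n"
    "<question>\n{question}\n</question>\n"
)


def answer_question(question, retrieve, generate, top_k=3):
    """Retrieve passages, assemble the prompt, and return the generated answer."""
    passages = retrieve(question, top_k)  # list of strings, best match first
    context = "\n".join(passages)
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    return generate(prompt).strip()


def fake_retrieve(question, top_k):
    """Stand-in retriever returning made-up passages."""
    return ["Passage one.", "Passage two."][:top_k]


def fake_generate(prompt):
    """Stand-in generator that reports how many passages it saw."""
    return f"  saw {prompt.count('Passage')} passages  "


print(answer_question("What is the legal basis?", fake_retrieve, fake_generate))
# saw 2 passages
```

In the notebook, `retrieve` would wrap `milvus_client.search` together with `emb_text`, and `generate` would wrap `llm_client.text_generation(prompt, max_new_tokens=1000)`.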