# Build RAG with Milvus

## Preparation

---

### Models and data

```bash
$ just prepare-for-build-rag
```

### Import required packages

In [1]:
from glob import glob
import json

from pymilvus import connections, db, MilvusClient
import ollama
from tqdm import tqdm
from ollama import chat, ChatResponse

In [2]:
text_lines = []

for file_path in glob("../../milvus_docs/en/faq/*.md", recursive=True):
    with open(file_path, "r") as file:
        file_text = file.read()

    text_lines += file_text.split("# ")

print(text_lines)

['---\nid: operational_faq.md\nsummary: Find answers to commonly asked questions about operations in Milvus.\ntitle: Operational FAQ\n---\n\n', 'Operational FAQ\n\n<!-- TOC -->\n\n\n<!-- /TOC -->\n\n###', 'What if I failed to pull the Milvus Docker image from Docker Hub?\n\nIf you failed to pull the Milvus Docker image from Docker Hub, try adding other registry mirrors. \n\nUsers from Mainland China can add the URL "https://registry.docker-cn.com" to the registry-mirrors array in **/etc.docker/daemon.json**.\n\n```\n{\n  "registry-mirrors": ["https://registry.docker-cn.com"]\n}\n```\n\n###', 'Is Docker the only way to install and run Milvus?\n\nDocker is an efficient way to deploy Milvus, but not the only way. You can also deploy Milvus from source code. This requires Ubuntu (18.04 or higher) or CentOS (7 or higher). See [Building Milvus from Source Code](https://github.com/milvus-io/milvus#build-milvus-from-source-code) for more information.\n\n###', 'What are the main factors affecti
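
Splitting on the literal string `"# "` is simple, but as the output above shows, it leaves stray `###` markers at the end of chunks. The following is a minimal alternative sketch, assuming each FAQ entry starts with a Markdown heading at the beginning of a line. The helper name `split_markdown_sections` and the sample text are hypothetical and not part of the original notebook. It does not remove YAML front matter, and it would also split on `#` comment lines inside fenced code blocks.

```python
import re


def split_markdown_sections(markdown_text: str) -> list[str]:
    """Split Markdown into chunks, one per heading, dropping empty pieces."""
    # Split at the start of any line that begins with 1-6 '#' characters
    # followed by whitespace, so the heading markers themselves are consumed.
    pieces = re.split(r"^#{1,6}\s+", markdown_text, flags=re.MULTILINE)
    return [piece.strip() for piece in pieces if piece.strip()]


# Hypothetical sample that mimics the shape of the FAQ files.
sample = (
    "# FAQ\n\n"
    "### First question?\n\nFirst answer.\n\n"
    "### Second question?\n\nSecond answer.\n"
)
sections = split_markdown_sections(sample)
print(sections)
```

Each resulting chunk starts with the question text and no longer carries a trailing `###`.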

## Prepare the Embedding Model

In [3]:
def emb_text(text):
    response = ollama.embed(model="mxbai-embed-large", input=text)
    embeddings = response["embeddings"]
    return embeddings

In [4]:
test_embedding = emb_text("This is a test")
embedding_dim = len(test_embedding[0])
print(embedding_dim)
print(test_embedding[0][:10])

1024
[0.014159441, 0.025895605, 0.011998307, 0.028055206, -0.02800642, -0.0085993465, -0.011109261, -0.0046244604, 0.02432652, 0.05071807]

## Load data into Milvus

---

### Create the Collection

In [5]:
uri = "http://localhost:19530"
db_name = "milvus_demo"
collection_name = "my_rag_collection"

# Create the database if it does not already exist
connections.connect(uri=uri)
if db_name not in db.list_database():
    db.create_database(db_name)

In [6]:
# Connect to the database created above rather than the default one.
milvus_client = MilvusClient(uri=uri, db_name=db_name)

# Check if the collection already exists and drop it if it does.
if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)

In [7]:
milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="IP",  # Inner product distance
    consistency_level="Strong",
    # Supported values are (`"Strong"`, `"Session"`, `"Bounded"`, `"Eventually"`). See https://milvus.io/docs/consistency.md#Consistency-Level for more details.
)
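
With `metric_type="IP"`, a larger score means a closer match. Inner product equals cosine similarity only when both vectors have unit length. This notebook does not state whether the embeddings returned by Ollama are already normalized, so treat that as an assumption worth checking, for example by computing the norm of `test_embedding[0]`. The sketch below uses small made-up vectors and hypothetical helper names to illustrate the relationship.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(inner_product(vec, vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Inner product divided by the product of the two vector norms."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


# Made-up example vectors, not real embeddings.
a = [3.0, 4.0]
b = [1.0, 2.0]

raw_ip = inner_product(a, b)
normalized_ip = inner_product(l2_normalize(a), l2_normalize(b))
cosine = cosine_similarity(a, b)

print(raw_ip, normalized_ip, cosine)
```

The raw inner product depends on vector length, while the normalized inner product matches the cosine similarity.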

### Insert data

In [8]:
data = []

for i, line in enumerate(tqdm(text_lines, desc="Creating embeddings")):
    data.append({"id": i, "vector": emb_text(line)[0], "text": line})

milvus_client.insert(collection_name=collection_name, data=data)

Creating embeddings: 100%|██████████| 72/72 [00:03<00:00, 23.73it/s]


{'insert_count': 72, 'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71], 'cost': 0}
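
All 72 rows fit comfortably in a single `insert` call here. For a larger corpus, one common pattern is to send rows in fixed-size batches. The sketch below shows only the batching logic; the helper name `iter_batches`, the batch size, and the placeholder rows are illustrative assumptions, and the Milvus call appears only as a comment.

```python
from typing import Iterator


def iter_batches(rows: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of `rows`, each with at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# Placeholder rows with the same keys the notebook inserts.
rows = [{"id": i, "vector": [0.0, 0.0], "text": f"chunk {i}"} for i in range(7)]

batch_sizes = []
for batch in iter_batches(rows, batch_size=3):
    # In the notebook, each batch would be passed to:
    # milvus_client.insert(collection_name=collection_name, data=batch)
    batch_sizes.append(len(batch))

print(batch_sizes)
```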

## Build RAG

---

### Retrieve data for a query

In [9]:
question = "How is data stored in milvus?"

Search the collection for the question and retrieve the top 3 semantic matches.

In [10]:
search_res = milvus_client.search(
    collection_name=collection_name,
    data=[
        emb_text(question)[0]
    ],  # Use the `emb_text` function to convert the question to an embedding vector
    limit=3,  # Return top 3 results
    search_params={"metric_type": "IP", "params": {}},  # Inner product distance
    output_fields=["text"],  # Return the text field
)
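
Conceptually, the search above scores stored vectors against the query vector by inner product and returns the `limit` highest-scoring rows. The brute-force sketch below illustrates that ranking on made-up two-dimensional vectors. It is only a conceptual illustration with a hypothetical helper name, not a description of how Milvus executes the search.

```python
def top_k_by_inner_product(query, rows, k):
    """Return the k (text, score) pairs with the highest inner product."""
    scored = [
        (row["text"], sum(q * v for q, v in zip(query, row["vector"])))
        for row in rows
    ]
    # Higher inner product means a closer match, so sort in descending order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up rows standing in for the stored collection.
rows = [
    {"text": "a", "vector": [1.0, 0.0]},
    {"text": "b", "vector": [0.0, 1.0]},
    {"text": "c", "vector": [0.6, 0.8]},
    {"text": "d", "vector": [-1.0, 0.0]},
]

hits = top_k_by_inner_product([0.8, 0.6], rows, k=3)
for text, score in hits:
    print(text, round(score, 2))
```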

Let’s take a look at the search results for the query.

In [11]:
retrieved_lines_with_distances = [
    (res["entity"]["text"], res["distance"]) for res in search_res[0]
]
print(json.dumps(retrieved_lines_with_distances, indent=4))

[
    [
        " Where does Milvus store data?\n\nMilvus deals with two types of data, inserted data and metadata. \n\nInserted data, including vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental log. Milvus supports multiple object storage backends, including [MinIO](https://min.io/), [AWS S3](https://aws.amazon.com/s3/?nc1=h_ls), [Google Cloud Storage](https://cloud.google.com/storage?hl=en#object-storage-for-companies-of-all-sizes) (GCS), [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs), [Alibaba Cloud OSS](https://www.alibabacloud.com/product/object-storage-service), and [Tencent Cloud Object Storage](https://www.tencentcloud.com/products/cos) (COS).\n\nMetadata are generated within Milvus. Each Milvus module has its own metadata that are stored in etcd.\n\n###",
        0.8473492860794067
    ],
    [
        "How does Milvus flush data?\n\nMilvus returns success when inserted data are loaded to t

### Use an LLM to get a RAG response

Join the retrieved documents into a single string.

In [12]:
context = "\n".join(
    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]
)
print(context)

 Where does Milvus store data?

Milvus deals with two types of data, inserted data and metadata. 

Inserted data, including vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental log. Milvus supports multiple object storage backends, including [MinIO](https://min.io/), [AWS S3](https://aws.amazon.com/s3/?nc1=h_ls), [Google Cloud Storage](https://cloud.google.com/storage?hl=en#object-storage-for-companies-of-all-sizes) (GCS), [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs), [Alibaba Cloud OSS](https://www.alibabacloud.com/product/object-storage-service), and [Tencent Cloud Object Storage](https://www.tencentcloud.com/products/cos) (COS).

Metadata are generated within Milvus. Each Milvus module has its own metadata that are stored in etcd.

###
How does Milvus flush data?

Milvus returns success when inserted data are loaded to the message queue. However, the data are not yet flushed to the disk. Then Milv

Define the system and user prompts for the language model. The user prompt is assembled from the documents retrieved from Milvus.

In [13]:
SYSTEM_PROMPT = """
Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""
USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""

print("SYSTEM PROMPT")
print("======================")
print(SYSTEM_PROMPT)
print("\nUSER PROMPT")
print("======================")
print(USER_PROMPT)

SYSTEM PROMPT

Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.


USER PROMPT

Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
 Where does Milvus store data?

Milvus deals with two types of data, inserted data and metadata. 

Inserted data, including vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental log. Milvus supports multiple object storage backends, including [MinIO](https://min.io/), [AWS S3](https://aws.amazon.com/s3/?nc1=h_ls), [Google Cloud Storage](https://cloud.google.com/storage?hl=en#object-storage-for-companies-of-all-sizes) (GCS), [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs), [Alibaba Cloud OSS](https://www.alibabacloud.com/product/object-storage-service), and [Tencent Cloud Object Storage](https://www.tencentcloud.
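
Because `USER_PROMPT` is an f-string, `context` and `question` are substituted once, when the cell runs. Changing `question` later does not update the prompt. A minimal sketch of one way to rebuild the prompt per question is shown below. The names `USER_PROMPT_TEMPLATE` and `build_user_prompt` and the sample inputs are hypothetical additions, not part of the original notebook.

```python
USER_PROMPT_TEMPLATE = """
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""


def build_user_prompt(context: str, question: str) -> str:
    """Fill the template with a context string and a question."""
    # str.format only interprets braces in the template itself, so braces
    # inside the retrieved context (e.g. JSON snippets) are inserted as-is.
    return USER_PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_user_prompt(
    context='Metadata are stored in etcd. Example config: {"key": "value"}',
    question="Where is metadata stored?",
)
print(prompt)
```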

In [14]:
response: ChatResponse = chat(
    model='gemma3',
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
print(f"Question: {question}")
print(response.message.content)
# Dictionary-style access also works: response['message']['content']

Question: How is data stored in milvus?
Milvus stores data in two ways:

*   **Inserted data** (including vector data, scalar data, and collection-specific schema) are stored in persistent storage as incremental logs, supporting backends like MinIO, AWS S3, Google Cloud Storage (GCS), Azure Blob Storage, Alibaba Cloud OSS, and Tencent Cloud Object Storage (COS).
*   **Metadata** is generated within Milvus and stored in etcd.

Additionally, Milvus searches both incremental data and historical data by loading them into memory when a query request comes. Incremental data are in growing segments buffered in memory, while historical data are from sealed segments stored in object storage.

In [15]:
response: ChatResponse = chat(
    model='qwen2.5vl:3b',
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
print(f"Question: {question}")
print(response.message.content)
# Dictionary-style access also works: response['message']['content']

Question: How is data stored in milvus?
Data in Milvus is stored in two types of ways: inserted data and metadata. Inserted data, which includes vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental log. Milvus supports multiple object storage backends, such as MinIO, AWS S3, Google Cloud Storage, Azure Blob Storage, Alibaba Cloud OSS, and Tencent Cloud Object Storage. Metadata are generated within Milvus and are stored in etcd.
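
The cells above run the retrieve-then-generate steps one at a time. As a rough sketch of how they could be tied together, the function below takes the three steps as callables so the flow can be exercised without a running Milvus or Ollama instance. The function name `answer_question`, its parameters, and the stub functions are all hypothetical.

```python
from typing import Callable, Sequence


def answer_question(
    question: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float], int], list[str]],
    generate: Callable[[str, str], str],
    top_k: int = 3,
) -> str:
    """Embed the question, retrieve passages, and generate an answer."""
    query_vector = embed(question)
    passages = search(query_vector, top_k)
    context = "\n".join(passages)
    return generate(context, question)


# In the notebook, the three callables could wrap the existing cells, e.g.
#   embed    -> lambda q: emb_text(q)[0]
#   search   -> a function calling milvus_client.search(...) and returning the "text" fields
#   generate -> a function calling chat(...) with SYSTEM_PROMPT and a freshly built user prompt

# Stubs so the sketch runs on its own.
def fake_embed(text: str) -> list[float]:
    return [float(len(text))]


def fake_search(vector: Sequence[float], limit: int) -> list[str]:
    passages = ["passage one", "passage two", "passage three", "passage four"]
    return passages[:limit]


def fake_generate(context: str, question: str) -> str:
    return f"Q: {question} | context lines: {len(context.splitlines())}"


result = answer_question("How is data stored?", fake_embed, fake_search, fake_generate)
print(result)
```

Passing the steps in as arguments also makes it straightforward to compare models, as done above with `gemma3` and `qwen2.5vl:3b`, by swapping only the `generate` callable.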
