# Advanced RAG with LlamaIndex and Milvus

This notebook walks you through an advanced Retrieval-Augmented Generation (RAG) pipeline using LlamaIndex and Milvus.


## Prerequisites

### Installation
First, install the required packages for LlamaIndex and Milvus. This sample uses Python 3.11.

In [None]:
# !pip3 install 'llama_index'
# !pip3 install 'milvus>=2.4.0' 'pymilvus>=2.4.0' openai
# !pip3 install 'llama-index-vector-stores-milvus'
# !pip3 install 'llama-index-embeddings-openai'
# !pip3 install 'llama-index-llms-openai'
# !pip3 install python-dotenv torch sentence-transformers

import llama_index
from importlib.metadata import version

print(f"LlamaIndex version: {version('llama_index')}")

LlamaIndex version: 0.10.36


### Start Milvus Service

There are two options to start a Milvus service:
- [Zilliz Cloud](https://zilliz.com/cloud): Zilliz provides cloud-native service for Milvus. It simplifies the process of deploying and scaling vector search applications by eliminating the need to create and maintain complex data infrastructure. [Get Started Free!](https://cloud.zilliz.com/signup)
- [Open Source Milvus](https://milvus.io): You can install the open source Milvus using either Docker Compose or on Kubernetes.

Here, we use [Milvus Lite](https://milvus.io/docs/milvus_lite.md), a lightweight version of Milvus that works seamlessly with Google Colab and Jupyter Notebook.


In [None]:
# from milvus import default_server

# default_server.cleanup()  # Optional: run this line to clean up previous data
# default_server.start()

### Set your OpenAI API key

This tutorial uses an embedding model and an LLM from OpenAI, so you need an API key set as an environment variable.

In [None]:
# import os
# os.environ["OPENAI_API_KEY"] = "sk-..."
# print(os.environ["OPENAI_API_KEY"])
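Printing the full key, as in the commented cell above, leaves it in the notebook output. A minimal sketch of a safer check follows. The helpers `require_env` and `mask_secret` are hypothetical names introduced here for illustration and are not part of LlamaIndex or OpenAI.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def mask_secret(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Calling `print(mask_secret(require_env("OPENAI_API_KEY")))` would then confirm that the key is set without revealing it.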

## Step 1: Define Embedding Model and LLM

First, you can define an embedding model and LLM in a global settings object.


In [None]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.settings import Settings

Settings.llm = OpenAI(model="gpt-4-turbo", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")



## Step 2: Load data


In [None]:
!mkdir -p 'data'
!wget 'https://publicdataset.zillizcloud.com/milvus_doc.md' -O 'data/milvus_doc.md'

--2024-05-14 20:20:45--  https://publicdataset.zillizcloud.com/milvus_doc.md
Connecting to 127.0.0.1:1087... connected.
Proxy request sent, awaiting response... 200 OK
Length: 5153 (5.0K) [binary/octet-stream]
Saving to: ‘data/milvus_doc.md’


2024-05-14 20:20:46 (328 MB/s) - ‘data/milvus_doc.md’ saved [5153/5153]



In [None]:
from llama_index.core import SimpleDirectoryReader

# Load data
documents = SimpleDirectoryReader(
    input_files=["./data/milvus_doc.md"]
).load_data()

print("Document ID:", documents[0].doc_id)

Document ID: ebe47875-ace0-4a96-b853-0b74e4c6d163


## Step 3: Chunk documents into Nodes

As the whole document is too large to fit into the context window of the LLM, you will need to partition it into smaller text chunks, which are called `Nodes` in LlamaIndex.

With the `SentenceWindowNodeParser`, each sentence is stored as its own chunk, and a larger window of text surrounding that sentence is stored alongside it as metadata.

In [None]:
from llama_index.core.node_parser import SentenceWindowNodeParser

# Create the sentence window node parser
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# Extract nodes from documents
nodes = node_parser.get_nodes_from_documents(documents)

print(len(nodes))

53
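To make the sentence-window idea concrete, here is a plain-Python sketch of what the parser does conceptually. It is not the LlamaIndex implementation: the regex sentence splitter and the dictionary layout are simplifying assumptions, and `sentence_windows` is a hypothetical helper name.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_windows(text: str, window_size: int = 3) -> list[dict]:
    """Build one node per sentence, with up to `window_size` neighbours per side."""
    sentences = split_sentences(text)
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append(
            {
                "text": sentence,
                "metadata": {
                    "original_text": sentence,
                    "window": " ".join(sentences[start:end]),
                },
            }
        )
    return nodes
```

The single sentence is what gets embedded and searched, which keeps the match precise. The wider window is carried along so that it can be handed to the LLM later.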


## Step 4: Build the index

You will build the index that stores all the external knowledge in a Milvus vector database.


In [None]:
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore

# dim=1536 matches the output size of text-embedding-3-small
vector_store = MilvusVectorStore(
    dim=1536,
    uri="http://localhost:19530",
    collection_name="advance_rag",
    overwrite=True,  # drop and recreate the collection if it already exists
    enable_sparse=True,  # store sparse vectors alongside dense ones for hybrid search
    hybrid_ranker="RRFRanker",
    hybrid_ranker_params={"k": 60},
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex(
    nodes,
    storage_context=storage_context
)



Sparse embedding function is not provided, using default.
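The vector store above is configured with `hybrid_ranker="RRFRanker"` and `k=60`. Reciprocal rank fusion (RRF) merges the dense and sparse result lists by summing `1 / (k + rank)` for each document across the lists. The sketch below illustrates that formula in plain Python. It assumes ranks start at 1 and is not Milvus's internal code. `reciprocal_rank_fusion` is a hypothetical helper name.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids into one scored list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Assumption: the best hit in each list has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["a", "b", "c"]   # made-up ids from a dense search
sparse_hits = ["b", "d", "a"]  # made-up ids from a sparse search
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

A larger `k` flattens the differences between ranks, so documents that appear in both lists are favoured over documents that rank first in only one.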




## Step 5: Setup the Query Engine

### Build the Metadata Replacement Post Processor
In advanced RAG, you can use the `MetadataReplacementPostProcessor` to replace the sentence in each node with its surrounding context, as part of the sentence-window retrieval method.

In [None]:
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# The target key defaults to `window` to match the node_parser's default
postproc = MetadataReplacementPostProcessor(
    target_metadata_key="window"
)
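Conceptually, this replacement step swaps each retrieved sentence for the wider window stored in its metadata before the text reaches the LLM. The following plain-Python sketch shows the idea on dictionaries. It is an illustration under that assumption rather than the LlamaIndex implementation, and `replace_with_window` is a hypothetical name.

```python
def replace_with_window(
    nodes: list[dict], target_metadata_key: str = "window"
) -> list[dict]:
    """Return copies of the nodes whose text is replaced by the stored window."""
    replaced = []
    for node in nodes:
        metadata = node.get("metadata", {})
        # Fall back to the original text if no window was stored.
        new_text = metadata.get(target_metadata_key, node["text"])
        replaced.append({**node, "text": new_text})
    return replaced


retrieved = [
    {
        "text": "Milvus supports deleting entities.",
        "metadata": {
            "window": "Delete Entities. Milvus supports deleting entities. It is fast."
        },
    },
    {"text": "No window here.", "metadata": {}},
]
expanded = replace_with_window(retrieved)
```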

### Add a re-ranker
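For advanced RAG, you can also add a re-ranker, which re-orders the retrieved context by its relevance to the query. Note that you would normally retrieve a larger number of candidates with `similarity_top_k` and let the re-ranker reduce them to `top_n`. In this notebook both are set to 3, so the re-ranker only reorders the retrieved nodes.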

In [None]:
from llama_index.core.postprocessor import SentenceTransformerRerank

# BAAI/bge-reranker-base is a cross-encoder model
# link: https://huggingface.co/BAAI/bge-reranker-base
rerank = SentenceTransformerRerank(
    top_n=3,
    model="BAAI/bge-reranker-base",
)
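The retrieve-then-rerank pattern can be sketched without any model: score every candidate against the query, sort, and keep the best `top_n`. The word-overlap scorer below is a toy stand-in chosen so the example runs anywhere. It is not how a cross-encoder such as `BAAI/bge-reranker-base` scores text, and `rerank_candidates` is a hypothetical name.

```python
def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words that occur in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


def rerank_candidates(
    query: str, candidates: list[str], top_n: int = 3, score_fn=overlap_score
) -> list[str]:
    """Score all candidates against the query and keep the `top_n` best."""
    scored = [(score_fn(query, c), c) for c in candidates]
    # sort() is stable, so equally scored candidates keep their retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]


candidates = [
    "Milvus supports hybrid search",
    "Delete entities by primary key is faster",
    "Entities can be deleted by key",
    "Install with Docker Compose",
]
best = rerank_candidates("delete entities by primary key", candidates, top_n=2)
```

Four candidates go in and two come out, which mirrors the relationship between `similarity_top_k` and `top_n`.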

Finally, you can put all components together in the query engine!

In [None]:
# The QueryEngine class is equipped with the generator and facilitates the retrieval and generation steps
query_engine = index.as_query_engine(
    similarity_top_k=3,
    vector_store_query_mode="hybrid",  # Hybrid search requires Milvus 2.4+; use "default" on earlier versions
    node_postprocessors=[postproc, rerank],
)

## Step 6: Run an Advanced RAG Query on Your Data
Now, you can run advanced RAG queries on your data.

In [None]:
response = query_engine.query(
    "Can user delete milvus entities through non-primary key filtering?"
)
print(str(response))

Yes, users can delete Milvus entities through non-primary key filtering by using complex boolean expressions.


In [None]:
window = response.source_nodes[0].node.metadata["window"]
sentence = response.source_nodes[0].node.metadata["original_text"]

print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {sentence}")

Window: 

Delete Entities
This topic describes how to delete entities in Milvus.

 Milvus supports deleting entities by primary key or complex boolean expressions.  Deleting entities by primary key is much faster and lighter than deleting them by complex boolean expressions.  This is because Milvus executes queries first when deleting data by complex boolean expressions.

 Deleted entities can still be retrieved immediately after the deletion if the consistency level is set lower than Strong.

------------------
Original Sentence: Milvus supports deleting entities by primary key or complex boolean expressions. 
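The cell above inspects only the first source node. A small helper can list every source node with its score. This sketch assumes each item in `response.source_nodes` exposes `.score` and `.node.metadata`, as the cell above already relies on for `.node.metadata`. `summarize_sources` is a hypothetical name, and the `SimpleNamespace` object is a stand-in so the example runs without a live query engine.

```python
from types import SimpleNamespace


def summarize_sources(response, max_chars: int = 80) -> list[str]:
    """Return one summary line per source node: rank, score, original sentence."""
    lines = []
    for rank, item in enumerate(response.source_nodes, start=1):
        sentence = item.node.metadata.get("original_text", "").strip()
        score_text = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"{rank}. score={score_text} | {sentence[:max_chars]}")
    return lines


# Stand-in response object with made-up values, used only for this demo.
fake_response = SimpleNamespace(
    source_nodes=[
        SimpleNamespace(
            score=0.9123,
            node=SimpleNamespace(
                metadata={"original_text": "Milvus supports deleting entities. "}
            ),
        ),
        SimpleNamespace(score=None, node=SimpleNamespace(metadata={})),
    ]
)
for line in summarize_sources(fake_response):
    print(line)
```

With a real response, `summarize_sources(response)` would be called in place of the stand-in.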


## References
* [LlamaIndex docs: Milvus Vector Store](https://docs.llamaindex.ai/en/stable/examples/vector_stores/MilvusIndexDemo.html)
* [LlamaIndex docs: Metadata Replacement + Node Sentence Window](https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/MetadataReplacementDemo.html)
* [Advanced Retrieval-Augmented Generation: From Theory to LlamaIndex Implementation](https://towardsdatascience.com/advanced-retrieval-augmented-generation-from-theory-to-llamaindex-implementation-4de1464a9930)
* [Milvus Vector Store With Hybrid Retrieval](https://docs.llamaindex.ai/en/stable/examples/vector_stores/MilvusHybridIndexDemo/)
