# RAG with LlamaIndex + Milvus + Llama

References
- https://docs.llamaindex.ai/en/stable/examples/vector_stores/MilvusIndexDemo/
- https://docs.llamaindex.ai/en/stable/api_reference/storage/vector_store/milvus/?h=milvusvectorstore#llama_index.vector_stores.milvus.MilvusVectorStore

## Step-1: Configuration

In [1]:
from my_config import MY_CONFIG

MY_CONFIG.DB_URI = './rag_2_llamaindex.db'
MY_CONFIG.COLLECTION_NAME = 'llamaindex_papers'
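
The `my_config` module is not shown in this notebook. A minimal sketch of what it might contain follows, based only on the attributes the later cells read (`DB_URI`, `COLLECTION_NAME`, `EMBEDDING_MODEL`, `EMBEDDING_LENGTH`, `LLM_MODEL`). The class name, the default URI and collection name, and the embedding model with its 384 dimensions are assumptions for illustration; only the LLM name is taken from the output printed in Step-5.

```python
from dataclasses import dataclass


@dataclass
class MyConfig:
    """Hypothetical stand-in for the object exported by my_config.py."""

    DB_URI: str = "./default.db"             # placeholder, overridden in the notebook
    COLLECTION_NAME: str = "documents"       # placeholder, overridden in the notebook
    EMBEDDING_MODEL: str = "BAAI/bge-small-en-v1.5"  # assumed model
    EMBEDDING_LENGTH: int = 384              # must equal the model's vector size
    LLM_MODEL: str = "nebius/Qwen/Qwen3-30B-A3B-Instruct-2507"


MY_CONFIG = MyConfig()

# Same overrides as the configuration cell above
MY_CONFIG.DB_URI = "./rag_2_llamaindex.db"
MY_CONFIG.COLLECTION_NAME = "llamaindex_papers"
print(MY_CONFIG)
```

Keeping the embedding model name and its vector length side by side makes it harder for the two to drift apart.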

## Step-2: Setup Embeddings

In [2]:
# Use a mirror when https://huggingface.co/ is unreachable.
# Comment out the assignment below if you can reach Hugging Face directly.
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
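
The cell above sets `HF_ENDPOINT` every time it runs. One way to make the choice explicit is a small toggle, sketched below. The helper name `use_hf_mirror` is hypothetical. The variable should be set before any Hugging Face library is imported, since the endpoint is typically read at import time.

```python
import os


def use_hf_mirror(enabled: bool, mirror_url: str = "https://hf-mirror.com") -> None:
    """Point Hugging Face downloads at a mirror only when asked to.

    When `enabled` is False the environment is left untouched.
    """
    if enabled:
        os.environ["HF_ENDPOINT"] = mirror_url


# Flip to True only if huggingface.co cannot be reached
use_hf_mirror(enabled=False)
```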

In [3]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

Settings.embed_model = HuggingFaceEmbedding(
    model_name = MY_CONFIG.EMBEDDING_MODEL
)

## Step-3: Connect to Milvus

In [4]:
# Connect to the vector database
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.milvus import MilvusVectorStore

vector_store = MilvusVectorStore(
    uri=MY_CONFIG.DB_URI,
    dim=MY_CONFIG.EMBEDDING_LENGTH,
    collection_name=MY_CONFIG.COLLECTION_NAME,
    overwrite=False,  # keep the existing collection so the index can be loaded from it
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

print("✅ Connected Llama-index to Milvus instance: ", MY_CONFIG.DB_URI)

✅ Connected Llama-index to Milvus instance:  ./rag_2_llamaindex.db
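
The `dim` passed to `MilvusVectorStore` has to agree with the length of the vectors the embedding model produces. A quick sanity check can catch a mismatch early. The sketch below uses a hypothetical helper, `check_embedding_dim`, and a stand-in embedder so that it runs without downloading a model.

```python
from typing import Callable, Sequence


def check_embedding_dim(
    embed: Callable[[str], Sequence[float]], expected_dim: int
) -> int:
    """Embed a probe string and verify the vector length.

    Raises ValueError when the length differs from `expected_dim`.
    """
    vector = embed("dimension probe")
    if len(vector) != expected_dim:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, expected {expected_dim}"
        )
    return len(vector)


# Stand-in embedder for illustration only; it returns a fixed-size zero vector
def fake_embed(text: str) -> list:
    return [0.0] * 384


print(check_embedding_dim(fake_embed, 384))
```

In this notebook the call would presumably look like `check_embedding_dim(Settings.embed_model.get_text_embedding, MY_CONFIG.EMBEDDING_LENGTH)`.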


## Step-4: Load Document Index from DB

In [5]:
%%time

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    storage_context=storage_context,
)

print("✅ Loaded index from vector db:", MY_CONFIG.DB_URI)

✅ Loaded index from vector db: ./rag_2_llamaindex.db
CPU times: user 108 ms, sys: 23.2 ms, total: 131 ms
Wall time: 128 ms


## Step-5: Setup LLM

In [6]:
from llama_index.llms.litellm import LiteLLM

# Set up the LLM
print(f"✅ Using LLM model : {MY_CONFIG.LLM_MODEL}")
Settings.llm = LiteLLM(
    model=MY_CONFIG.LLM_MODEL,
)

✅ Using LLM model : nebius/Qwen/Qwen3-30B-A3B-Instruct-2507
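
A hosted model such as the one above needs provider credentials, which this notebook does not show. A small guard, sketched below, fails early with a readable message instead of at the first query. The helper `require_env` is hypothetical, and the variable name `NEBIUS_API_KEY` is an assumption about how the Nebius provider is configured; check the LiteLLM documentation for your provider.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Assumed variable name; adjust to match your provider
try:
    require_env("NEBIUS_API_KEY")
    print("API key found")
except RuntimeError as err:
    print(err)
```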


## Step-6: Query

In [7]:
query_engine = index.as_query_engine()
res = query_engine.query("What was the training data used to train Granite models?")
print(res)

The training data used to train the Granite models consists of 3.5T to 4.5T tokens of code data and natural language datasets related to code. This includes high-quality data from various domains such as technical, mathematical, and web documents. The data is tokenized using byte pair encoding (BPE) with the same tokenizer as StarCoder. The training process involves two phases: Phase 1 focuses on code-only training using 116 languages for the 3B and 8B models (4 trillion tokens), 3 trillion tokens for the 20B model, and 1.4 trillion tokens after depth upscaling for the 34B model. Phase 2 incorporates additional high-quality publicly available data, with 80% code and 20% language data, trained for 500 billion tokens to enhance reasoning and problem-solving skills.


In [8]:
query_engine = index.as_query_engine()
res = query_engine.query("What is attention mechanism?")
print(res)

The attention mechanism is a process that allows a model to focus on specific parts of input data when making predictions, particularly useful for capturing long-distance dependencies in sequences. In the context of the provided material, it is illustrated through visualizations in the encoder self-attention layer, where different attention heads attend to relevant words across the input sequence—such as linking the verb "making" to its distant object—to complete phrases or understand sentence structure. This enables the model to effectively process and relate words that are far apart in the text.


In [9]:
query_engine = index.as_query_engine()
res = query_engine.query("When was the moon landing?")
print(res)

The provided context does not contain information about the moon landing.
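
The last answer shows the model declining when the retrieved chunks do not cover the question. To see which chunks were retrieved for any query, the response's `source_nodes` can be inspected. The sketch below is a hypothetical helper, `summarize_sources`, that only assumes each item exposes `.score` and `.text`; it is exercised here with stand-in objects rather than a live response.

```python
from types import SimpleNamespace


def summarize_sources(source_nodes, max_chars: int = 80) -> list:
    """Build one line per retrieved chunk: rank, similarity score, short snippet."""
    lines = []
    for rank, item in enumerate(source_nodes, start=1):
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        snippet = " ".join(item.text.split())[:max_chars]  # collapse whitespace
        lines.append(f"{rank}. score={score} | {snippet}")
    return lines


# Stand-in objects for illustration; a real run would pass res.source_nodes
fake_nodes = [
    SimpleNamespace(score=0.8123, text="Attention heads attend to\nrelevant words."),
    SimpleNamespace(score=None, text="A chunk without a score."),
]
for line in summarize_sources(fake_nodes):
    print(line)
```

With a real response this would be `summarize_sources(res.source_nodes)`. The number of retrieved chunks can be changed with `index.as_query_engine(similarity_top_k=...)`.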
