# RAG with llama-index + Milvus + Llama

References
- https://docs.llamaindex.ai/en/stable/examples/vector_stores/MilvusIndexDemo/
- https://docs.llamaindex.ai/en/stable/api_reference/storage/vector_store/milvus/?h=milvusvectorstore#llama_index.vector_stores.milvus.MilvusVectorStore

## Step-1: Configuration

In [1]:
from my_config import MY_CONFIG
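
The `my_config.py` module imported above is not shown in this notebook. As a rough sketch of what it might contain: `DB_URI` and `COLLECTION_NAME` below are taken from the cell outputs further down, while the embedding model, its vector length, and the LLM model id are assumptions chosen only for illustration.

```python
# my_config.py (hypothetical sketch; the real file is not shown in this notebook)
class MyConfig:
    """Plain attribute container for notebook settings."""
    pass


MY_CONFIG = MyConfig()

# Values that appear in this notebook's outputs
MY_CONFIG.DB_URI = "./rag_1.db"          # local database file
MY_CONFIG.COLLECTION_NAME = "walmart"

# Assumed values for illustration; replace with your own
MY_CONFIG.EMBEDDING_MODEL = "BAAI/bge-small-en-v1.5"    # assumption
MY_CONFIG.EMBEDDING_LENGTH = 384                         # must equal the model's vector length
MY_CONFIG.LLM_MODEL = "meta/meta-llama-3-8b-instruct"    # assumption: a Replicate model id
```

Whatever values are used, `EMBEDDING_LENGTH` has to agree with the chosen embedding model, because it is passed to Milvus as the vector dimension.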

In [2]:
import os
# Load settings from the .env file
from dotenv import find_dotenv, load_dotenv

_ = load_dotenv(find_dotenv()) # read local .env file


REPLICATE_API_TOKEN = os.environ.get("REPLICATE_API_TOKEN")

if REPLICATE_API_TOKEN:
    print("✅ config REPLICATE_API_TOKEN found")
else:
    raise Exception("❌ 'REPLICATE_API_TOKEN' is not set. Please set it in your .env file to continue...")


✅ config REPLICATE_API_TOKEN found


## Step-2: Setup Embeddings

In [3]:
# Optional: route Hugging Face downloads through a mirror.
# Comment out the lines below if https://huggingface.co/ is reachable from your network.
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

In [4]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

Settings.embed_model = HuggingFaceEmbedding(
    model_name = MY_CONFIG.EMBEDDING_MODEL
)
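
The `dim` value passed to `MilvusVectorStore` in the next step has to match the length of the vectors the embedding model returns. A minimal sketch of a sanity check, assuming the model exposes a text-to-vector callable such as `Settings.embed_model.get_text_embedding`; the helper name `check_embedding_dim` is hypothetical and not part of llama-index.

```python
from typing import Callable, Sequence


def check_embedding_dim(embed: Callable[[str], Sequence[float]], expected_dim: int) -> int:
    """Embed a probe string and verify that the vector length equals expected_dim."""
    vector = embed("dimension probe")
    actual_dim = len(vector)
    if actual_dim != expected_dim:
        raise ValueError(
            f"Embedding dimension mismatch: model returns {actual_dim}, "
            f"config expects {expected_dim}"
        )
    return actual_dim


# In this notebook the call would look like this (not run here):
# check_embedding_dim(Settings.embed_model.get_text_embedding, MY_CONFIG.EMBEDDING_LENGTH)
```

Running the check before connecting makes a misconfigured `EMBEDDING_LENGTH` fail early with a readable message.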



## Step-3: Connect to Milvus

In [5]:
# connect to vector db
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.milvus import MilvusVectorStore

vector_store = MilvusVectorStore(
    uri=MY_CONFIG.DB_URI,
    dim=MY_CONFIG.EMBEDDING_LENGTH,
    collection_name=MY_CONFIG.COLLECTION_NAME,
    overwrite=False,  # keep the existing collection instead of recreating it
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

print ("✅ Connected Llama-index to Milvus instance: ", MY_CONFIG.DB_URI )

✅ Connected Llama-index to Milvus instance:  ./rag_1.db


## Step-4: Load Document Index from DB

In [6]:
%%time

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, storage_context=storage_context)

print ("✅ Loaded index from vector db:", MY_CONFIG.DB_URI , ",  collection: ", MY_CONFIG.COLLECTION_NAME)

✅ Loaded index from vector db: ./rag_1.db ,  collection:  walmart
CPU times: user 102 ms, sys: 16.8 ms, total: 119 ms
Wall time: 117 ms


## Step-5: Setup LLM

In [7]:
from llama_index.llms.replicate import Replicate
from llama_index.core import Settings

llm = Replicate(
    model= MY_CONFIG.LLM_MODEL,
    temperature=0.1
)

Settings.llm = llm

## Step-6: Query

Here are some sample queries, grouped by dataset. (You can choose the dataset in [my_config.py](my_config.py).)

**LLM Papers**

- What training data was used to train Granite models?
- What is the attention mechanism?

**Walmart**

- What was Walmart's revenue for 2023?
- How many distribution centers does Walmart have?

**FOMC**

- What is the target inflation rate?
- Which members voted?

**And a trick question**

- When was the moon landing?

In [8]:
query_engine = index.as_query_engine()
# res = query_engine.query("What training data was used to train Granite models?")
res = query_engine.query("What was Walmart's revenue for 2023?")
print(res)



According to the provided context information, Walmart's revenue for 2023 was $605,881 million.
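
To see which retrieved chunks an answer like this is based on, the response object can be inspected. The sketch below assumes the response exposes a `source_nodes` list whose items carry a `score` and a `node` with a `get_content()` method; `format_sources` is a hypothetical helper name.

```python
from typing import List


def format_sources(response, max_chars: int = 80) -> List[str]:
    """Return one line per retrieved chunk: similarity score plus a short text preview."""
    lines = []
    for item in getattr(response, "source_nodes", []):
        # Flatten newlines so each chunk fits on one line
        text = item.node.get_content().replace("\n", " ")
        score = item.score
        score_str = f"{score:.3f}" if score is not None else "n/a"
        lines.append(f"[{score_str}] {text[:max_chars]}")
    return lines


# In this notebook (not run here):
# for line in format_sources(res):
#     print(line)
```

Printing the sources next to the answer makes it easier to check a figure against the underlying document.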


In [9]:
query_engine = index.as_query_engine()
# res = query_engine.query("What is attention mechanism")
res = query_engine.query("How many distribution centers does Walmart have?")
print(res)



Based on the provided context information, Walmart has a total of 163 distribution facilities.
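
Each query cell here calls `index.as_query_engine()` again, but one engine can serve several questions. A small sketch of a loop that reuses a single engine; `run_queries` is a hypothetical helper, and it only assumes the engine has a `query()` method whose result converts to text with `str()`.

```python
from typing import Dict, Iterable


def run_queries(query_engine, questions: Iterable[str]) -> Dict[str, str]:
    """Send each question to the same engine and collect the answers as plain strings."""
    answers = {}
    for question in questions:
        answers[question] = str(query_engine.query(question)).strip()
    return answers


# In this notebook (not run here); similarity_top_k sets how many chunks are retrieved:
# query_engine = index.as_query_engine(similarity_top_k=3)
# questions = [
#     "What was Walmart's revenue for 2023?",
#     "How many distribution centers does Walmart have?",
# ]
# for question, answer in run_queries(query_engine, questions).items():
#     print(question, "->", answer)
```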


In [10]:
query_engine = index.as_query_engine()
res = query_engine.query("When was the moon landing?")
print(res)



I'm happy to help! However, I don't see any information about the moon landing in the provided context. The context appears to be a 10-K report filed by Walmart Inc. with the Securities and Exchange Commission. There is no mention of the moon landing in this report. If you're looking for information about the moon landing, I'd be happy to help you find it elsewhere!
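
In this case the LLM itself declines to answer because the retrieved context is unrelated. A complementary check is to look at the retrieval scores before trusting an answer. This is only a sketch: `has_relevant_context` is a hypothetical helper, the `0.5` threshold is an arbitrary assumption, and it assumes higher scores mean closer matches, which depends on the similarity metric configured for the collection.

```python
def has_relevant_context(response, min_score: float = 0.5) -> bool:
    """Return True if at least one retrieved chunk scores at or above min_score."""
    scores = [
        item.score
        for item in getattr(response, "source_nodes", [])
        if item.score is not None
    ]
    # No scored chunks at all counts as "no relevant context"
    return bool(scores) and max(scores) >= min_score


# In this notebook (not run here):
# if not has_relevant_context(res):
#     print("Warning: no retrieved chunk looks closely related to the question")
```

A suitable threshold would need to be tuned on known in-scope and out-of-scope questions for the dataset in use.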
