# RAG with LlamaIndex + Milvus + Llama @ Replicate

References
- https://docs.llamaindex.ai/en/stable/examples/vector_stores/MilvusIndexDemo/
- https://docs.llamaindex.ai/en/stable/api_reference/storage/vector_store/milvus/?h=milvusvectorstore#llama_index.vector_stores.milvus.MilvusVectorStore

## Configuration

In [1]:
class MyConfig:
    pass

MY_CONFIG = MyConfig()

MY_CONFIG.EMBEDDING_MODEL = "BAAI/bge-small-en-v1.5"
MY_CONFIG.EMBEDDING_LENGTH = 384

MY_CONFIG.INPUT_DATA_DIR = 'data/granite-docs/input'

MY_CONFIG.DB_URI = './rag_demo.db'
MY_CONFIG.COLLECTION_NAME = 'llamaindex_granite_docs'

MY_CONFIG.LLM_MODEL = "meta/meta-llama-3-8b-instruct"
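`MyConfig` above is a plain attribute bag, so a misspelled attribute name silently creates a new setting instead of failing. One alternative, sketched here under the assumption that you prefer a frozen dataclass (the class name `RagConfig` is hypothetical), rejects reassignment and unknown field names:

```python
from dataclasses import dataclass

# Hypothetical replacement for MyConfig: frozen, so values cannot be reassigned,
# and passing an unknown field name to the constructor raises TypeError.
@dataclass(frozen=True)
class RagConfig:
    EMBEDDING_MODEL: str = "BAAI/bge-small-en-v1.5"
    EMBEDDING_LENGTH: int = 384  # must match the embedding model's output size
    INPUT_DATA_DIR: str = "data/granite-docs/input"
    DB_URI: str = "./rag_demo.db"
    COLLECTION_NAME: str = "llamaindex_granite_docs"
    LLM_MODEL: str = "meta/meta-llama-3-8b-instruct"

MY_CONFIG = RagConfig()
print(MY_CONFIG.DB_URI)
```

The rest of the notebook reads `MY_CONFIG.DB_URI` and friends unchanged; to override a value, construct a new instance, e.g. `RagConfig(DB_URI='./other.db')`.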

In [2]:
from llama_index.core import SimpleDirectoryReader
import pprint

# load documents
documents = SimpleDirectoryReader(
    # input_dir = './data/10k/input/'
    input_dir = MY_CONFIG.INPUT_DATA_DIR
).load_data()

print (f"Loaded {len(documents)} chunks")

print("Document [0].doc_id:", documents[0].doc_id)
# pprint.pprint (documents[0], indent=4)

Loaded 20 chunks
Document [0].doc_id: 5fa23387-db9e-48f1-87b7-128299170f14
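Note that `SimpleDirectoryReader` returns `Document` objects (for PDF input, usually one per page), so the 20 items counted above are documents, not index chunks. The index splits them into smaller nodes later, using LlamaIndex's sentence-aware splitter by default. Purely to illustrate the idea of overlapping chunks, here is a naive word-based chunker; `chunk_words` is a hypothetical helper and not how LlamaIndex splits text.

```python
def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of chunk_size words; consecutive windows share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last window reached the end of the text
            break
    return chunks

sample = " ".join(f"w{i}" for i in range(1, 13))  # "w1 w2 ... w12"
for c in chunk_words(sample, chunk_size=5, overlap=2):
    print(c)
```

The real splitter counts tokens rather than words and tries to keep sentences intact; its size can be tuned through `Settings.chunk_size` and `Settings.chunk_overlap`.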


In [3]:
# Optional: if connecting to https://huggingface.co/ fails, download models through a mirror instead.
# Delete this line if you can reach huggingface.co directly.
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
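The cell above always routes model downloads through the mirror. If you only want the mirror as a fallback, a sketch like the following probes `huggingface.co` first; `huggingface_reachable` and `use_mirror_if_needed` are hypothetical helpers. Run it before importing `llama_index.embeddings.huggingface`, since the Hugging Face client appears to read `HF_ENDPOINT` when it is imported.

```python
import os
import socket

def huggingface_reachable(host: str = "huggingface.co", port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the Hugging Face hub can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refused connection, or timeout
        return False

def use_mirror_if_needed(reachable: bool, mirror: str = "https://hf-mirror.com") -> None:
    """Point HF_ENDPOINT at the mirror only when the main hub is unreachable."""
    if not reachable:
        os.environ["HF_ENDPOINT"] = mirror

use_mirror_if_needed(huggingface_reachable())
print("HF_ENDPOINT =", os.environ.get("HF_ENDPOINT", "<default hub>"))
```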

In [4]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

Settings.embed_model = HuggingFaceEmbedding(
    model_name = MY_CONFIG.EMBEDDING_MODEL
)

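The Milvus collection below is created with `dim = MY_CONFIG.EMBEDDING_LENGTH`, so that number has to equal the embedding model's output size, otherwise inserts will most likely be rejected. A small sanity check, sketched with a hypothetical `check_embedding_dim` helper that accepts any text-to-vector function:

```python
from typing import Callable, Sequence

def check_embedding_dim(embed: Callable[[str], Sequence[float]], expected_dim: int) -> int:
    """Embed a probe string and verify the vector length matches expected_dim."""
    dim = len(embed("dimension probe"))
    if dim != expected_dim:
        raise ValueError(f"embedding model returns {dim}-d vectors, config expects {expected_dim}")
    return dim

# Stand-in embedder for illustration only; it always returns a 384-d zero vector.
def fake_embed(text: str) -> list[float]:
    return [0.0] * 384

print(check_embedding_dim(fake_embed, 384))
```

In the notebook you would pass the real model instead: `check_embedding_dim(Settings.embed_model.get_text_embedding, MY_CONFIG.EMBEDDING_LENGTH)`.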


In [5]:
from pymilvus import MilvusClient

client = MilvusClient(MY_CONFIG.DB_URI)

# if we already have a collection, clear it first
if client.has_collection(collection_name = MY_CONFIG.COLLECTION_NAME):
    client.drop_collection(collection_name = MY_CONFIG.COLLECTION_NAME)

In [6]:
# connect to vector db
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.milvus import MilvusVectorStore


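vector_store = MilvusVectorStore(
    uri = MY_CONFIG.DB_URI,
    dim = MY_CONFIG.EMBEDDING_LENGTH,  # must match the embedding model's output size
    collection_name = MY_CONFIG.COLLECTION_NAME,
    overwrite = True  # drop and recreate the collection if it already exists
)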
storage_context = StorageContext.from_defaults(vector_store=vector_store)

print ("✅ Connected to Milvus instance: ", MY_CONFIG.DB_URI )

✅ Connected to Milvus instance:  ./rag_demo.db


In [7]:
%%time

# create an index

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

CPU times: user 40.9 s, sys: 2.16 s, total: 43.1 s
Wall time: 4.61 s
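Conceptually, `VectorStoreIndex.from_documents` splits each document into nodes, embeds every node with `Settings.embed_model`, and writes the vectors to Milvus; at query time the question is embedded the same way and the nearest vectors are returned. The toy in-memory version below illustrates only that idea: `ToyVectorIndex` and the bag-of-words `toy_embed` are hypothetical stand-ins for Milvus and the real embedding model, and similarity is plain cosine.

```python
import math
from collections import Counter

VOCAB = ["granite", "model", "training", "data", "milvus", "vector", "moon"]

def toy_embed(text: str) -> list[float]:
    """Bag-of-words counts over a tiny fixed vocabulary (stand-in for a real embedding model)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ToyVectorIndex:
    def __init__(self, embed):
        self.embed = embed
        self.rows: list[tuple[str, list[float]]] = []  # (text, vector) pairs

    def add(self, texts: list[str]) -> None:
        for t in texts:
            self.rows.append((t, self.embed(t)))

    def query(self, question: str, top_k: int = 2) -> list[tuple[str, float]]:
        q = self.embed(question)
        scored = [(t, cosine(q, v)) for t, v in self.rows]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

index_demo = ToyVectorIndex(toy_embed)
index_demo.add([
    "granite model training data",
    "milvus stores vector embeddings",
    "the moon is far away",
])
for text, score in index_demo.query("what training data did the granite model use", top_k=1):
    print(f"{score:.2f}  {text}")
```

The variable is named `index_demo` so it does not overwrite the notebook's real `index`.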


In [9]:
# See data in vector db

from pymilvus import MilvusClient
import pprint 

client = MilvusClient(MY_CONFIG.DB_URI)
res = client.list_collections()

print(res)
print ('---------')

res = client.describe_collection(
    collection_name = MY_CONFIG.COLLECTION_NAME
)

pprint.pprint(res)


['docs', 'llamaindex_granite_docs']
---------
{'aliases': [],
 'auto_id': False,
 'collection_id': 0,
 'collection_name': 'llamaindex_granite_docs',
 'consistency_level': 0,
 'description': '',
 'enable_dynamic_field': True,
 'fields': [{'description': '',
             'field_id': 100,
             'is_primary': True,
             'name': 'id',
             'params': {'max_length': 65535},
             'type': <DataType.VARCHAR: 21>},
            {'description': '',
             'field_id': 101,
             'name': 'embedding',
             'params': {'dim': 384},
             'type': <DataType.FLOAT_VECTOR: 101>}],
 'num_partitions': 0,
 'num_shards': 0,
 'properties': {}}
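The schema lists only `id` and `embedding`; since `enable_dynamic_field` is `True`, the node text and metadata are presumably stored as dynamic fields that `describe_collection` does not list. To confirm the vector dimension programmatically, a hypothetical `vector_dims` helper can scan the returned dict for fields whose `params` carry a `dim` (the dict below is copied from the output above, with the `type` entries omitted):

```python
def vector_dims(description: dict) -> dict[str, int]:
    """Map each vector field name to its dimension in a describe_collection() result."""
    return {
        field["name"]: field["params"]["dim"]
        for field in description.get("fields", [])
        if "dim" in field.get("params", {})
    }

desc = {
    "collection_name": "llamaindex_granite_docs",
    "fields": [
        {"name": "id", "is_primary": True, "params": {"max_length": 65535}},
        {"name": "embedding", "params": {"dim": 384}},
    ],
}
print(vector_dims(desc))
```

Against the live collection, `vector_dims(res) == {'embedding': MY_CONFIG.EMBEDDING_LENGTH}` should hold. To see how many rows were stored, `client.get_collection_stats(collection_name=MY_CONFIG.COLLECTION_NAME)` reports a `row_count`.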


In [10]:
import os
from dotenv import find_dotenv, dotenv_values

# Load settings from the local .env file
config = dotenv_values(find_dotenv())

replicate_token = config.get('REPLICATE_API_TOKEN')
if not replicate_token:
    raise ValueError("REPLICATE_API_TOKEN is missing from the .env file")
os.environ["REPLICATE_API_TOKEN"] = replicate_token
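`dotenv_values` only reads the `.env` file: a token already exported in your shell gets overwritten, and a missing key comes back as `None`. A more forgiving lookup, sketched with a hypothetical `resolve_secret` helper, prefers an existing environment variable and falls back to the `.env` values:

```python
import os
from typing import Mapping, Optional

def resolve_secret(key: str,
                   dotenv_config: Mapping[str, Optional[str]],
                   environ: Mapping[str, str] = os.environ) -> str:
    """Return key from the environment if set, else from the .env values; fail loudly if neither has it."""
    value = environ.get(key) or dotenv_config.get(key)
    if not value:
        raise KeyError(f"{key} is not set in the environment or the .env file")
    return value

# Dummy values for illustration; no real token involved.
print(resolve_secret("DEMO_TOKEN", {"DEMO_TOKEN": "from-dotenv"}, environ={}))
```

In the notebook this would read `os.environ["REPLICATE_API_TOKEN"] = resolve_secret("REPLICATE_API_TOKEN", config)`.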

In [11]:
from llama_index.llms.replicate import Replicate
from llama_index.core import Settings

llm = Replicate(
    model= MY_CONFIG.LLM_MODEL,
    temperature=0.1
)

Settings.llm = llm
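`temperature=0.1` makes generation close to greedy. As a rough illustration of the standard temperature-scaled softmax (not Replicate's or Llama's exact sampling code), dividing the logits by a small temperature concentrates almost all probability on the top token:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At `t=1.0` the top token gets roughly 63% of the mass; at `t=0.1` it gets nearly all of it, which is why low temperatures suit factual RAG answers.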

In [12]:
query_engine = index.as_query_engine()
res = query_engine.query("Summarize this document for me in one paragraph")
print(res)



Based on the provided context information, I will summarize the document for you in one paragraph. The document appears to be a report or paper on the Granite Foundation Models, which are a set of pre-trained language models. The report provides updates on the models, including changes to the evaluation results, new documentation, and updates to the language used. The report also includes information on the models' performance on various tasks, such as summarization, question answering, and language translation. Additionally, the report mentions the inclusion of new datasets and the use of red-teaming benchmarks to evaluate the models' safety. Overall, the report provides an update on the Granite Foundation Models and their capabilities.
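`as_query_engine()` retrieves only a few of the most similar chunks (the default `similarity_top_k` is 2 in the LlamaIndex versions we are aware of) and passes them to the LLM inside a QA prompt, so this "summary" reflects just those chunks rather than the whole document. The hypothetical `build_qa_prompt` below sketches that prompt-stuffing step; its template is loosely modeled on LlamaIndex's default text-QA prompt, not copied from it.

```python
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query}\n"
    "Answer: "
)

def build_qa_prompt(chunks: list[str], query: str, top_k: int = 2) -> str:
    """Stuff the top_k retrieved chunks (assumed already sorted by score) into the template."""
    context = "\n\n".join(chunks[:top_k])
    return QA_TEMPLATE.format(context=context, query=query)

prompt = build_qa_prompt(
    ["chunk A text", "chunk B text", "chunk C text"],
    "Summarize this document for me in one paragraph",
)
print(prompt)
```

For a summary that covers more of the document, you can retrieve more chunks with `index.as_query_engine(similarity_top_k=5)`, or try `response_mode="tree_summarize"`.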


In [13]:
query_engine = index.as_query_engine()
res = query_engine.query("What was the training dataset?")
print(res)



Based on the provided context, the training dataset for the Granite models is a combination of 14 datasets, which are:

1. arXiv
2. Common Crawl
3. DeepMind Mathematics
4. Free Law
5. GitHub Clean
6. Hacker News
7. OpenWeb Text
8. Project Gutenberg (PG-19)
9. Pubmed Central
10. SEC Filings
11. Stack Exchange
12. USPTO
13. Webhose
14. Wikimedia

Additionally, the second version of the base model, granite.13b.v2, continued pre-training on an additional 1.5T newly-curated tokens, which included a mixture of the same 14 datasets from granite.13b.v1, along with 6 new datasets.
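To check which chunks an answer like this came from, the response object exposes the retrieved nodes as `res.source_nodes`, each with a similarity `score` and a `get_content()` method. A hypothetical `format_sources` helper that prints them compactly, exercised here with stand-in objects instead of real LlamaIndex nodes:

```python
from dataclasses import dataclass
from typing import Optional

def format_sources(source_nodes, width: int = 60) -> list[str]:
    """One line per retrieved node: rank, score, and the start of its text."""
    lines = []
    for rank, node in enumerate(source_nodes, start=1):
        text = " ".join(node.get_content().split())  # collapse whitespace and newlines
        score = "n/a" if node.score is None else f"{node.score:.3f}"
        lines.append(f"{rank}. [{score}] {text[:width]}")
    return lines

@dataclass
class FakeNode:  # stand-in mimicking NodeWithScore's .score and .get_content()
    text: str
    score: Optional[float]

    def get_content(self) -> str:
        return self.text

for line in format_sources([FakeNode("training data\nincludes arXiv", 0.82), FakeNode("other text", None)]):
    print(line)
```

With a real response: `print("\n".join(format_sources(res.source_nodes)))`.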


In [14]:
query_engine = index.as_query_engine()
res = query_engine.query("When was the moon landing?")
print(res)



I'm happy to help! However, I don't see any information about the moon landing in the provided context. The text appears to be about AI models, training, and infrastructure, but it doesn't mention the moon landing. If you could provide more context or clarify the question, I'd be happy to try and assist you further!
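Here the LLM declined on its own because the retrieved chunks were irrelevant, but that relies on the model following the prompt. A stricter option is to skip the LLM call entirely when even the best retrieval score falls below a cutoff. The sketch below shows the idea with a hypothetical `answer_or_decline` helper and an arbitrary threshold of 0.5; a sensible cutoff depends on the embedding model.

```python
from typing import Callable, Sequence

def answer_or_decline(scores: Sequence[float], generate: Callable[[], str], cutoff: float = 0.5) -> str:
    """Call the LLM only if at least one retrieved chunk scores at or above the cutoff."""
    if not scores or max(scores) < cutoff:
        return "The indexed documents do not appear to cover this question."
    return generate()

print(answer_or_decline([0.31, 0.28], generate=lambda: "LLM answer"))
print(answer_or_decline([0.74, 0.40], generate=lambda: "LLM answer"))
```

LlamaIndex has a comparable built-in filter: `index.as_query_engine(node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.5)])`, with `SimilarityPostprocessor` imported from `llama_index.core.postprocessor`.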
