# Build a RAG application with Milvus Lite, Mistral and Llama-index

This notebook shows how to build a Retrieval Augmented Generation (RAG) application that answers questions about data from the French Parliament. It uses Ollama with Mistral as the LLM, Llama-index for orchestration, and Milvus Lite (a local database file) for vector storage.


### Install the dependencies

In [None]:
!pip install pymilvus ollama llama-index-llms-ollama llama-index-vector-stores-milvus

In [None]:
!pip install llama-index-embeddings-huggingface sentencepiece

### Download the Embedding model

Because the text is written in French, this notebook uses an embedding model trained specifically on French. Feel free to swap in a different model from HuggingFace; if you do, update the `dim` value of the vector store to match the new model's embedding size.

In [None]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embeddings = HuggingFaceEmbedding(model_name="dangvantuan/sentence-camembert-large")

### Prepare our data to be stored in Milvus

This code sets up the pieces needed to embed text with Sentence Camembert Large, store the embeddings in Milvus, and generate answers with Mistral-7B.

**Make sure Ollama is running on your machine before executing this cell.**

* LLM: Initialises the Mistral-7B model using Ollama
* Service Context: Configures a service context with Mistral and the embedding model defined above
* Vector Store: Sets up a collection in Milvus to store text embeddings, specifying the database file, collection name, and vector dimensions
* Storage Context: Configures a storage context with the Milvus vector store

Together, these pieces provide efficient storage and retrieval of vector embeddings for the text data.
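
To make the role of the vector dimension concrete, here is a minimal pure-Python sketch of a fixed-dimension collection. `ToyCollection` is a hypothetical stand-in and not the Milvus API; it only illustrates why the dimension declared for a collection has to match the size of the vectors the embedding model produces (the notebook sets `dim=1024` for the model chosen above).

```python
class ToyCollection:
    """Hypothetical in-memory stand-in for a vector collection (not the Milvus API)."""

    def __init__(self, dim):
        self.dim = dim
        self.rows = []  # list of (text, vector) pairs

    def insert(self, text, vector):
        # Reject vectors whose length differs from the declared dimension.
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.rows.append((text, list(vector)))


collection = ToyCollection(dim=4)  # toy dimension; the notebook uses 1024
collection.insert("bonjour", [0.1, 0.2, 0.3, 0.4])

try:
    collection.insert("salut", [0.1, 0.2])  # wrong size for this collection
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(len(collection.rows), mismatch_rejected)
```

In practice this is why switching to another embedding model usually means recreating the collection with the new dimension.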

In [None]:
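from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.milvus import MilvusVectorStore

# Note: ServiceContext is deprecated in recent llama-index releases in favour of
# llama_index.core.Settings; this notebook assumes a version that still provides it.
from llama_index.core import StorageContext, ServiceContext

# Mistral served by the local Ollama instance
llm = Ollama(model="mistral", request_timeout=120.0)

# Bundle the LLM, the embedding model, and the chunk size used when splitting documents
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embeddings, chunk_size=350)

vector_store = MilvusVectorStore(
    uri="milvus_mistral_rag.db",  # local database file
    collection_name="mistral_french_parliament",
    dim=1024,  # must match the embedding model's output size
    overwrite=True,  # drop the collection if it exists, then recreate it
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)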
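
The `chunk_size=350` setting caps how much text goes into each embedded chunk. LlamaIndex's default splitter counts tokens and tries to keep sentences together; the sketch below makes the simplifying assumption of splitting on whitespace only, just to show the idea. `chunk_words` is a hypothetical helper, not part of any library.

```python
def chunk_words(text, chunk_size):
    """Split text into chunks of at most chunk_size whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


# 800 dummy words stand in for a long parliamentary transcript.
sample = " ".join(f"mot{i}" for i in range(800))
chunks = chunk_words(sample, chunk_size=350)
chunk_lengths = [len(c.split()) for c in chunks]
print(chunk_lengths)  # [350, 350, 100]
```

Smaller chunks give more precise matches at retrieval time but carry less context each, so the value is a trade-off worth tuning for your data.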

### Process and load the data

In [5]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

docs = SimpleDirectoryReader(input_files=['data/french_parliament_discussion.xml']).load_data()
vector_index = VectorStoreIndex.from_documents(docs, storage_context=storage_context, service_context=service_context)

In [6]:
from llama_index.core.tools import RetrieverTool, ToolMetadata

# Wrap the index's retriever as a tool (for example, for use by an agent)
milvus_tool_openai = RetrieverTool(
    retriever=vector_index.as_retriever(similarity_top_k=3),  # return the 3 most similar chunks
    metadata=ToolMetadata(
        name="CustomRetriever",
        description="Retrieve relevant information from provided documents.",
    ),
)
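
`similarity_top_k=3` asks the retriever for the three stored chunks closest to the query embedding. Here is a minimal sketch of that ranking, assuming cosine similarity and tiny hand-made vectors. The metric Milvus actually uses depends on how the collection is configured, and `cosine`, `top_k`, and the sample rows are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query, rows, k):
    """Return the k texts whose vectors are most similar to the query."""
    ranked = sorted(rows, key=lambda row: cosine(query, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy 2-dimensional "embeddings"; real ones have 1024 dimensions here.
rows = [
    ("health crisis", [1.0, 0.0]),
    ("budget vote", [0.0, 1.0]),
    ("testing policy", [0.8, 0.6]),
    ("foreign affairs", [-1.0, 0.0]),
]
results = top_k([1.0, 0.1], rows, k=3)
print(results)  # ['health crisis', 'testing policy', 'budget vote']
```

A real vector database avoids this full scan by using an index, but the ranking it returns serves the same purpose.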

### Finally, ask our RAG system questions

In [7]:
query_engine = vector_index.as_query_engine()
response = query_engine.query("What did the French parliament talk about the last time?")
print(response)

 The French parliament discussed a proposed law and the ongoing health crisis management. They expressed their past efforts in debating and suggesting amendments, but faced obstacles due to partisan politics. They also welcomed the involvement of commission members Philippe Gosselin and Sacha Houlié in the discussions. Additionally, they noted the progress made in dealing with the health crisis issues more dispassionately, and looked forward to finding compromises through legislative work. Amendments related to this topic were identified for further examination.
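
Behind `as_query_engine()`, the retrieved chunks are combined with the question into a prompt for the LLM. The sketch below illustrates that step with a made-up template. It is not LlamaIndex's actual prompt, and `build_prompt` is a hypothetical helper.

```python
def build_prompt(question, chunks):
    """Combine retrieved chunks and the user question into a single prompt string."""
    context = "\n---\n".join(chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Using only this context, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What did the French parliament talk about?",
    ["Chunk about the health crisis.", "Chunk about free testing."],
)
print(prompt)
```

Because the model only sees what is in this prompt, the quality of the answer depends directly on which chunks the retriever returns.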


In [8]:
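query_engine = vector_index.as_query_engine()
# The query is in French: "Can you tell me what the parliament talked about recently? I want the answer in French"
response = query_engine.query("Peux tu me dire de quoi à parler le parliement dernièrement? Je veux la réponse en Français")
print(response.response)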

 In recent parliamentary discussions, there have been debates surrounding protective measures for the population regarding travel to and from French territory. The necessity of widespread testing has been emphasized due to the increase in contaminations. To ensure accessibility to these tests regardless of income, it is proposed that they be made free and available as soon as possible.
