Assumptions:
1. A local Milvus database file exists and contains a collection named rag_collection.
2. The rag_collection already has the embeddings stored.
3. The sentence transformer model is present in the local model path (used to create the query embeddings).
4. The gemini-1.0-pro model is used as the LLM through an API call.

In [None]:
import os
from pymilvus import MilvusClient 
from sentence_transformers import SentenceTransformer
import google.generativeai as genai
from IPython.display import Markdown

In [None]:
import os
from dotenv import load_dotenv
load_dotenv()
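API_KEY = os.getenv("GEMINI_API_KEY")  # make sure GEMINI_API_KEY is set in the environment or in a .env file
print("GEMINI_API_KEY loaded:", API_KEY is not None)  # confirm the key was found without printing the secret itself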
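
Because an API key is a secret, it can help to fail early when the variable is missing and to show only a masked form when confirming that it was loaded. The following is a minimal sketch of that idea; `require_env` and `mask_secret` are hypothetical helper names introduced here for illustration and are not part of the original notebook.

```python
import os


def mask_secret(value, visible=4):
    """Return the secret with all but the last `visible` characters hidden."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def require_env(name):
    """Read an environment variable and raise a clear error if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Example usage (hypothetical helpers, same variable name as the cell above):
# API_KEY = require_env("GEMINI_API_KEY")
# print(mask_secret(API_KEY))
```

With this approach a missing key surfaces as an explicit error in this cell instead of as an authentication failure later, when the LLM is first called.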

Load the sentence transformer model from the local path to generate the embeddings, and configure the Google Gemini model used as the LLM


In [None]:
# Define the local model name and path

model_name = "all-mpnet-base-v2"
modelPath = f"../model/{model_name}"
# Load the SentenceTransformer embedding model from the local path
embedding_model = SentenceTransformer(modelPath)

# Load the Gemini model

genai.configure(api_key=API_KEY)  # configure the genai module with the API key
chat_model = genai.GenerativeModel('gemini-1.0-pro')

Create the Milvus client object and list the collections already present

In [None]:
RAGchatclient = MilvusClient("../DB/RAG.db")  # ensure the local Milvus db file is present at this path

# list the collections
res = RAGchatclient.list_collections()
print(res)
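
The rest of the notebook assumes that rag_collection appears in this list. A small sketch of making that assumption explicit is shown below; `missing_collections` is a hypothetical helper, and the plain lists stand in for the output of `list_collections()`.

```python
def missing_collections(available, required=("rag_collection",)):
    """Return the required collection names that are absent from `available`."""
    present = set(available)
    return [name for name in required if name not in present]


# Plain lists standing in for the output of list_collections():
print(missing_collections(["rag_collection", "other"]))  # []
print(missing_collections(["other"]))  # ['rag_collection']
```

In the notebook this could be called as `missing_collections(res)`, stopping with an error when the returned list is not empty.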

Define a function that takes the user query as input and returns the top 2 citations and the top-ranked content chunk for further processing with the LLM

In [None]:
def query_index(question):
    """
    The user question is passed to this function, which returns the 1st chunk retrieved
    from the index together with the titles of the top 2 hits (used as citations).
    """
    q = [question]
    query_vector = embedding_model.encode(q).tolist()

    search_result = RAGchatclient.search(
        collection_name="rag_collection",  # target collection
        data=query_vector,  # query vector built from the user's question
        limit=3,  # retrieve the top 3 entities (only the first 2 are used below)
        output_fields=["Title", "Content"],  # fields to be returned with each hit
    )

    cit1 = search_result[0][0]["entity"]["Title"]
    cit2 = search_result[0][1]["entity"]["Title"]
    chunk1 = search_result[0][0]["entity"]["Content"]

    return chunk1, cit1, cit2
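
The indexing above raises an IndexError when the search returns fewer than two hits. A minimal sketch of a more defensive extraction step follows, assuming the result has the shape used above (one list of hits per query vector, each hit carrying an "entity" dict). `extract_hits` is a hypothetical helper, and `fake_result` is hand-written stand-in data, not output from the collection.

```python
def extract_hits(search_result, n_citations=2):
    """Return (top_chunk, citations) from a search result shaped like the one above.

    `search_result` is assumed to be a list with one entry per query vector,
    each entry a list of hits carrying an "entity" dict with "Title" and "Content".
    """
    hits = search_result[0] if search_result else []
    if not hits:
        # No hits at all: let the caller decide how to respond.
        return None, []
    top_chunk = hits[0]["entity"]["Content"]
    citations = [hit["entity"]["Title"] for hit in hits[:n_citations]]
    return top_chunk, citations


# Hand-written stand-in for a search result (not real data from the collection):
fake_result = [[
    {"entity": {"Title": "Doc A", "Content": "Text of chunk A"}},
]]
chunk, titles = extract_hits(fake_result)
print(chunk)   # Text of chunk A
print(titles)  # ['Doc A']
```

Returning the citations as a list rather than as two separate values lets the caller handle zero, one, or two references without special cases.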


Define a function that takes the chunk returned from the vector DB index and calls the LLM to tailor the answer to the question

In [None]:
def chat_llm(user_question):
    returned_chunk, citation1, citation2 = query_index(user_question)

    passage = returned_chunk
    query = user_question
    
    prompt = (f"""Behave like a teacher who answers questions using text from the passage included below. \
          Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. \
          Try to give short answers for direct questions.\
          If the passage is irrelevant to the answer, you may ignore it.
          QUESTION: '{query}'
          PASSAGE: '{passage}'
          ANSWER:
        """)
    
    response = chat_model.generate_content(prompt)
    
    return response, citation1, citation2
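
Keeping prompt assembly separate from the LLM call makes it possible to inspect the prompt without spending an API request. The sketch below shows one way to do that; `build_prompt` is a hypothetical helper that is not part of the original notebook, and it accepts several passages even though the function above passes only the first chunk. The example passage is made up for illustration.

```python
def build_prompt(query, passages):
    """Assemble a prompt from the question and one or more retrieved passages."""
    # Number the passages so the model can tell them apart.
    numbered = "\n".join(
        f"[{i}] {text.strip()}" for i, text in enumerate(passages, start=1)
    )
    return (
        "Behave like a teacher who answers questions using text from the passages below. "
        "Respond in complete sentences and include relevant background information. "
        "Give short answers to direct questions. "
        "If the passages are irrelevant to the question, you may ignore them.\n"
        f"QUESTION: '{query}'\n"
        f"PASSAGES:\n{numbered}\n"
        "ANSWER:"
    )


# Made-up example input, printed so the final prompt can be inspected:
print(build_prompt("What is Milvus?", ["Milvus is a vector database. "]))
```

The returned string could then be passed to `chat_model.generate_content` in the same way as the inline prompt above.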

Interact

In [None]:
print("How can I assist: ")
user_question = input()
print(user_question)
answer, citation1, citation2 = chat_llm(user_question)

print(f"Ref1: {citation1} \nRef2: {citation2}")
Markdown(answer.text) # answer is displayed as a markdown object