This RAG implementation is based on a simple observation: users often ask vague or underspecified questions. Instead of blindly trying to answer such queries, this system first checks whether the input is clear enough to proceed. If the query is ambiguous, it passes it through an LLM which generates a clarifying follow-up prompt - something that gets sent back to the user asking exactly what they meant. Only after the system has enough context does it move forward.

If the query is already clear, it is passed into a function called reformulate_query, which makes the question more specific and direct - improving its quality for downstream retrieval. This refined version is then used to fetch relevant context from the vector store and generate a final answer using the LLM.

In this implementation, I have used a government leave rules document that I found online.

This kind of setup helps prevent common problems like hallucination or irrelevant answers, which usually happen when the model has to “guess” what the user meant. By introducing this clarification and reformulation loop, the system ensures that the prompt reaching the answering model is as precise and context-aware as possible. It leads to more structured, relevant, and accurate answers.

This is also how most advanced LLM-based systems like ChatGPT and Perplexity handle vague queries - not just to improve answer quality, but to reduce unnecessary computation spent trying to figure out what the user was actually asking.

In [19]:
!pip install -q langchain chromadb cohere pypdf groq langchain-community

In [35]:
from langchain.embeddings import CohereEmbeddings
from langchain.vectorstores import Chroma
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain.chains import RetrievalQA

In [21]:
from google.colab import userdata
cohere_api_key = userdata.get('COHERE_API_KEY')
groq_api_key = userdata.get('GROQ_API_KEY')

In [22]:
loader = PyPDFLoader("/content/LeaveRulesRevised.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_documents(docs)

In [33]:
embeddings = CohereEmbeddings(model="embed-english-v3.0",cohere_api_key=cohere_api_key,user_agent='xyz')
texts = [doc.page_content for doc in chunks]
metadatas = [doc.metadata for doc in chunks]
db = Chroma.from_texts(texts=texts, embedding=embeddings, metadatas=metadatas, persist_directory="./leaverules_db")

retriever = db.as_retriever()

In [24]:
!pip install -q groq langchain_groq

In [25]:
from langchain_groq import ChatGroq

groq_api_key = userdata.get('GROQ_API_KEY')
llm = ChatGroq(groq_api_key=groq_api_key,
    model_name="llama3-8b-8192")

In [26]:
def is_query_ambiguous(query: str) -> bool:
    prompt = f"""You are a helpful assistant. A user asked: "{query}".
Is this query vague or ambiguous in a way that would make it hard to answer without asking them a follow-up question? Reply with only 'Yes' or 'No'."""
    res = llm.invoke(prompt)
    return "yes" in res.content.lower()

In [27]:
def ask_for_clarification(query: str) -> str:
    prompt = f"""The user query is unclear or ambiguous: '{query}'. Ask a clarifying question to get missing information."""
    return llm.invoke(prompt)

In [28]:
def get_clarification_prompt(query: str) -> str:
    prompt = f"""The user said: "{query}".
Ask a polite follow-up question to clarify their intent or provide missing details. Keep it short and direct."""
    return llm.invoke(prompt)

In [29]:
def reformulate_query(query: str) -> str:
    prompt = f"""Take the user's query: "{query}" and rewrite it to make it clear, specific, and helpful for retrieving information from a document about government leave rules."""
    return llm.invoke(prompt)

In [30]:
def active_rag_loop(user_query: str):

    if is_query_ambiguous(user_query):                                       #asking for clarification in case the prompt is unclear
        clarification = get_clarification_prompt(user_query)
        return {
            "action": "clarify_needed",
            "message": f"Clarification needed.\n{clarification}"
        }

    refined_query = reformulate_query(user_query)

    docs = retriever.invoke(refined_query)

    if docs == []:
        return {
            "action": "no_results",
            "message": " Couldn't find anything relevant. Try rephrasing or giving more context."
        }

    qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
    answer = qa_chain.run(refined_query)

    return {
        "action": "answer",
        "message": answer
    }

In [34]:
query = "tell me about leave"

response = active_rag_loop(query)

print("System:", response["message"])

System: Clarification needed.
content='"Leave" can refer to many things, such as vacation time, sick leave, or quitting a job. Can you please specify what type of "leave" you are referring to?' additional_kwargs={} response_metadata={'token_usage': {'completion_tokens': 38, 'prompt_tokens': 42, 'total_tokens': 80, 'completion_time': 0.153826892, 'prompt_time': 0.058908177, 'queue_time': 0.849473121, 'total_time': 0.212735069}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_24ec19897b', 'finish_reason': 'stop', 'logprobs': None} id='run--16de0a0c-403a-42a8-a0ab-608c8efa6a4c-0' usage_metadata={'input_tokens': 42, 'output_tokens': 38, 'total_tokens': 80}
