# Notebook 4 · Self-Reflective RAG

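In this notebook the agent critiques its own answer. After generating an initial response, a critic model reads the answer alongside the retrieved evidence and decides whether to re-query the knowledge base for better evidence. The idea is similar in spirit to the automated checks in evaluation frameworks like RAGAS, except that here the verdict feeds straight back into the pipeline instead of being logged as a score.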
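
Before wiring in real models, it can help to see the control flow on its own. The sketch below is a minimal, framework-free version of the generate, critique, retry loop. The names `reflect_loop`, `toy_generate`, and `toy_critique` are hypothetical stand-ins for illustration and are not part of `shared` or LangChain.

```python
from typing import Callable, Tuple


def reflect_loop(
    question: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Tuple[bool, str]],
    max_retries: int = 2,
) -> Tuple[str, int]:
    """Generate an answer, then regenerate while the critic asks for another pass.

    Returns the final answer and the number of retries actually used.
    """
    answer = generate(question)
    retries = 0
    while retries < max_retries:
        needs_retry, _reason = critique(question, answer)
        if not needs_retry:
            break
        answer = generate(question)
        retries += 1
    return answer, retries


# Toy stand-ins: the first draft fails the critic, the second one passes.
drafts = iter(["draft without evidence", "draft citing the billing policy"])


def toy_generate(question: str) -> str:
    return next(drafts)


def toy_critique(question: str, answer: str) -> Tuple[bool, str]:
    # Ask for a retry whenever the answer does not cite anything.
    return ("citing" not in answer, "needs a cited source")


final_answer, retries_used = reflect_loop(
    "What is the seat limit?", toy_generate, toy_critique
)
print(final_answer, retries_used)
```

Passing the generator and critic in as plain callables keeps the loop testable without API calls; in this notebook those roles are played by the QA chain and the critic model.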

In [None]:
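# ChatPromptTemplate lives in langchain_core; the langchain.prompts path is a legacy re-export.
from langchain_core.prompts import ChatPromptTemplate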
from langchain_openai import ChatOpenAI as LangChainChatOpenAI

from pprint import pprint

from shared import (
    DEFAULT_MODEL,
    RetrievalContext,
    build_baseline_chain,
    build_retrieval_context,
    pretty_print_json,
    time_execution,
)


In [None]:
context = build_retrieval_context(top_k=4)
qa_chain = build_baseline_chain(context.retriever)
critic = LangChainChatOpenAI(model=DEFAULT_MODEL, temperature=0.0)


In [None]:
critique_prompt = ChatPromptTemplate.from_template(
    'Answer:\n{answer}\n\nEvidence:\n{evidence}\n\nDoes the answer need another retrieval pass? Start your reply with YES or NO, then explain briefly.'
)
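
Because the critic replies in free text, the YES/NO verdict has to be parsed out of the reply. The helper below is one possible approach, assuming the verdict appears at the very start of the reply; `needs_another_pass` is a hypothetical name and not part of `shared`.

```python
import re


def needs_another_pass(critique_text: str) -> bool:
    """Return True only when the critic's reply leads with YES."""
    # Skip leading punctuation or markdown markers, then expect a whole-word verdict.
    match = re.match(r"\W*(yes|no)\b", critique_text, flags=re.IGNORECASE)
    # A missing or malformed verdict counts as NO, so noise cannot trigger retries.
    return bool(match) and match.group(1).lower() == "yes"


print(needs_another_pass("YES. The evidence never mentions seat limits."))
print(needs_another_pass("No. The answer says yes to upgrades, which matches."))
```

Matching only the leading word avoids a false retry when the word "yes" shows up later in the explanation, which a plain substring test would not catch.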


In [None]:
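def ask_with_reflection(question: str, max_retries: int = 2) -> str:
    """Answer `question`, letting the critic request up to `max_retries` extra passes."""
    # qa_chain is assumed to retrieve internally; this separate lookup only feeds the critic.
    evidence_docs = context.retriever.invoke(question)
    evidence_text = '\n\n'.join(doc.page_content for doc in evidence_docs)
    answer = qa_chain.run(question)
    critique_text = '(no critique was run)'  # covers max_retries=0

    for _ in range(max_retries):
        # Calling the model object directly is deprecated; .invoke() is the supported entry point.
        critique = critic.invoke(critique_prompt.format_messages(answer=answer, evidence=evidence_text))
        critique_text = critique.content
        # Check only the leading verdict, so a "yes" inside the explanation does not trigger a retry.
        if not critique_text.strip().upper().startswith('YES'):
            return answer + '\n\nCritique: ' + critique_text

        # The retry reuses the original question, so a deterministic retriever returns the same documents.
        evidence_docs = context.retriever.invoke(question)
        evidence_text = '\n\n'.join(doc.page_content for doc in evidence_docs)
        answer = qa_chain.run(question)

    return answer + '\n\nCritique after retries: ' + critique_text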


In [None]:
question = 'What happens if my workspace exceeds the seat limit mid-cycle?'
print(ask_with_reflection(question))


## Extension ideas

* Swap the critic for `gpt-4.1` (non-mini) to see if judgments improve.
* Compute a RAGAS faithfulness score after each iteration to guide retries.
* Cache retrieval results to avoid redundant FAISS lookups.
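
As a starting point for the caching idea, here is a minimal in-memory sketch. It wraps any function that maps a query string to a list of results, so it does not depend on FAISS or LangChain; `cached_retriever` and `fake_retrieve` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List


def cached_retriever(
    retrieve: Callable[[str], List[str]],
) -> Callable[[str], List[str]]:
    """Wrap `retrieve` so repeated queries are served from an in-memory dict."""
    cache: Dict[str, List[str]] = {}

    def lookup(query: str) -> List[str]:
        # Normalize case and whitespace so trivially different queries share a key.
        key = " ".join(query.lower().split())
        if key not in cache:
            cache[key] = retrieve(query)
        # Return a copy so callers cannot mutate the cached list.
        return list(cache[key])

    return lookup


# Toy backend that counts how often the underlying lookup runs.
call_count = {"n": 0}


def fake_retrieve(query: str) -> List[str]:
    call_count["n"] += 1
    return [f"doc about {query.strip().lower()}"]


lookup = cached_retriever(fake_retrieve)
first = lookup("Seat limit")
second = lookup("  seat   LIMIT ")
print(first, second, call_count["n"])
```

In this notebook the wrapped function could be something like `context.retriever.invoke`, assuming the retriever exposes the standard runnable interface. Note that a cache only pays off when retries repeat a query; if retries rewrite the query, most lookups will miss.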