### Introduction

This is a self-correcting RAG pattern that checks the retrieved contexts for relevancy and the generated answers for hallucinations.\
It is loosely based on the Self-RAG [paper](https://arxiv.org/abs/2310.11511).

![flow](resource/flow.png)

The LLM used here is llama3. The embedding model is mxbai-embed-large (embedding dimension 1024).\
Both are run locally using ollama:\
a) Install ollama\
b) Pull llama3 and mxbai-embed-large (ollama pull...)

Run the agentic_rag_index notebook before this one to index and persist the context docs.

### Build the Execution Graph

In [5]:
from langgraph.graph import END, StateGraph
from typing_extensions import TypedDict
from typing import List
from agentic_rag_helper import Helper

class GraphState(TypedDict):
    question: str
    answer: str
    context: List[str]
    quality: str


#retriever = retrieved_index.as_retriever()
helper = Helper()
helper.load_index("index")

workflow = StateGraph(GraphState)

# Define the nodes
workflow.add_node("check_guardrails", helper.guardtail_check) 
workflow.add_node("retrieve_context", helper.retrieve_context) 
workflow.add_node("grade_documents", helper.grade_chunks) 
workflow.add_node("generate", helper.generate) 
workflow.add_node("grade_hallucination", helper.grade_hallucination) 

workflow.set_entry_point("check_guardrails")
#workflow.add_edge("check_guardrails", "retrieve_context")
workflow.add_edge("retrieve_context", "grade_documents")
workflow.add_conditional_edges(
    "check_guardrails",
    helper.guardrail_decision,
    {
        "stop": END,
        "retrieve_context": "retrieve_context",
    }
)
workflow.add_conditional_edges(
    "grade_documents",
    helper.generation_decision,
    {
        "stop": END,
        "generate": "generate",
    }
)
workflow.add_edge("generate", "grade_hallucination")
workflow.add_edge("grade_hallucination", END)

---LOADING INDEX FROM PERSISTENNT STORE---
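
Each conditional edge in the graph above calls a routing function and uses the returned string as a key into the mapping dictionary (`"stop"` maps to `END`; the other key names the next node). The `agentic_rag_helper` module is not shown in this notebook, so the following is only a minimal sketch of what `guardrail_decision` and `generation_decision` might look like. They are written here as standalone functions rather than `Helper` methods. The `"TOXIC"` label and the idea that the guardrail node stores its label in the `quality` field are assumptions for illustration, not the helper's actual implementation.

```python
from typing import List, TypedDict


class GraphState(TypedDict, total=False):
    question: str
    answer: str
    context: List[str]
    quality: str


def guardrail_decision(state: GraphState) -> str:
    """Route to retrieval unless the guardrail node flagged the question."""
    # Assumption: the guardrail node writes its label into `quality`,
    # and "TOXIC" is the label used for rejected questions.
    if state.get("quality") == "TOXIC":
        return "stop"
    return "retrieve_context"


def generation_decision(state: GraphState) -> str:
    """Generate an answer only if some retrieved chunk survived grading."""
    # An empty or missing context list means no chunk was judged relevant.
    if not state.get("context"):
        return "stop"
    return "generate"
```

Keeping the routing functions free of side effects, so that they only read the state and return a key, makes each branch easy to test without running the LLM.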


In [6]:
app = workflow.compile()

from pprint import pprint

inputs = {"question": "Has the author been wrong before? Explain the incidence where the author was wrong"}
#inputs = {"question": "adjusting the heat using thermostats?"}
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Finished running: {key}")
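# `value` holds the state emitted by the last node that ran.
# Use .get() so a run that stops early (no "context" key) does not raise KeyError.
if not value.get("context"):
    pprint("No Relevant Chunks available in the Knowledgebase")
else:
    pprint(value["answer"])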

---CHECK FOR TOXICITY---
---CLASSIFICASTION is NON_TOXIC--
'Finished running: check_guardrails'
---RETRIEVE---
'Finished running: retrieve_context'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
And the 3rd and the 4th are fundamentally similar in the approach. Therefore we can now reduce the categories into two to move forward — 1/symbol manipulation and; 2/sequence transduction using pattern matching. Now, it is important to understand the limitations of both these approaches to avoid rude shocks. In the first symbol manipulation approach, the trouble is the knowledge engineering bottleneck. Until the system assembles a critical mass of axioms, it would not be useful. Humanly seeding such an axiom base has folded in the past (refer to Tim Berners Lee’s earlier attempt at creating the Semantic Web). As a side story, I attempted to solve this problem using NLP to create the axioms as mentioned in the beginning of this article and could only achieve a limited success in a couple of narrow d