### Introduction

This notebook implements a self-correcting RAG pattern that checks the retrieved context for relevance and the generated answer for hallucinations.\
It is loosely based on the [Self-RAG paper](https://arxiv.org/abs/2310.11511).
<img title="flow"  src="resource/flow.png">

The LLM is llama3:8b and the embedding model is mxbai-embed-large (1024 dimensions).\
Both are run locally using ollama:\
a) Install ollama\
b) Pull llama3 and mxbai-embed-large (ollama pull...)

Run the agentic_rag_index notebook first to index and persist the context docs.
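
Before building the graph, it may help to see the control flow of the pattern in plain Python. The sketch below is only an illustration under stated assumptions: the function name and the five injected callables (`is_safe`, `retrieve`, `is_relevant`, `generate`, `is_grounded`) are hypothetical stand-ins, not the API of `agentic_rag_helper`, and reporting the hallucination check as a `grounded` flag is an assumption about how the result is surfaced.

```python
from typing import Callable, Dict, List


def self_correcting_rag_sketch(
    question: str,
    is_safe: Callable[[str], bool],
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[str, List[str]], bool],
) -> Dict[str, object]:
    """Hypothetical plain-Python version of the flow; all callables are stand-ins."""
    stopped = {"answer": None, "context": [], "grounded": False}

    # 1. Guardrail: stop before retrieval if the question is rejected.
    if not is_safe(question):
        return stopped

    # 2. Retrieve candidate chunks and keep only those graded relevant.
    context = [chunk for chunk in retrieve(question) if is_relevant(question, chunk)]
    if not context:
        return stopped

    # 3. Generate an answer from the surviving chunks.
    answer = generate(question, context)

    # 4. Grade the answer against the context (assumed to be reported, not enforced).
    return {"answer": answer, "context": context, "grounded": is_grounded(answer, context)}
```

In the notebook itself each numbered step is a graph node, and the two early returns correspond to conditional edges that lead to `END`.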

### Build the Execution Graph

In [1]:
from langgraph.graph import END, StateGraph
from typing_extensions import TypedDict
from typing import List
from agentic_rag_helper import Helper

class GraphState(TypedDict):
    question: str
    answer: str
    context: List[str]
    quality: str


#retriever = retrieved_index.as_retriever()
helper = Helper()
helper.load_index("index")

workflow = StateGraph(GraphState)

# Define the nodes
workflow.add_node("check_guardrails", helper.guardtail_check) 
workflow.add_node("retrieve_context", helper.retrieve_context) 
workflow.add_node("grade_documents", helper.grade_chunks) 
workflow.add_node("generate", helper.generate) 
workflow.add_node("grade_hallucination", helper.grade_hallucination) 

workflow.set_entry_point("check_guardrails")
#workflow.add_edge("check_guardrails", "retrieve_context")
workflow.add_edge("retrieve_context", "grade_documents")
workflow.add_conditional_edges(
    "check_guardrails",
    helper.guardrail_decision,
    {
        "stop": END,
        "retrieve_context": "retrieve_context",
    }
)
workflow.add_conditional_edges(
    "grade_documents",
    helper.generation_decision,
    {
        "stop": END,
        "generate": "generate",
    }
)
workflow.add_edge("generate", "grade_hallucination")
workflow.add_edge("grade_hallucination", END)



---LOADING INDEX FROM PERSISTENNT STORE---
LLM is explicitly disabled. Using MockLLM.
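
The two conditional edges above depend on router functions that live in `agentic_rag_helper` and are not shown in this notebook. A router receives the current state and returns one of the keys in the mapping passed to `add_conditional_edges`. The following is a minimal sketch of what such routers might look like. The function names are hypothetical, and it is an assumption that the guardrail verdict is stored in `quality` with the labels `TOXIC` / `NON_TOXIC` and that irrelevant chunks are removed from `context` during grading.

```python
from typing import List, TypedDict


class SketchState(TypedDict, total=False):
    # Mirrors the GraphState fields used in this notebook.
    question: str
    answer: str
    context: List[str]
    quality: str


def guardrail_decision_sketch(state: SketchState) -> str:
    """Hypothetical router for the guardrail node."""
    # Assumption: the guardrail node wrote "TOXIC" or "NON_TOXIC" to state["quality"].
    if state.get("quality") == "TOXIC":
        return "stop"
    return "retrieve_context"


def generation_decision_sketch(state: SketchState) -> str:
    """Hypothetical router for the document-grading node."""
    # Assumption: grading leaves only relevant chunks in state["context"].
    if not state.get("context"):
        return "stop"
    return "generate"
```

Returning plain strings keeps the routing logic independent of the graph library, so it can be unit-tested without loading the index or calling a model.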


In [2]:
app = workflow.compile()

from pprint import pprint

inputs = {"question": "What is the author's current job scope?"}
#inputs = {"question": "adjusting the heat using thermostats?"}
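final_state = {}
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Finished running: {key}")
        final_state = value  # state update returned by the most recent node

# A missing or empty context means no retrieved chunk was available for answering.
if not final_state.get("context"):
    pprint("No Relevant Chunks available in the Knowledgebase")
else:
    pprint(final_state["answer"])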

---CHECK FOR TOXICITY---
---CLASSIFICASTION is NON_TOXIC--
'Finished running: check_guardrails'
---RETRIEVE---
'Finished running: retrieve_context'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
Before that, I want to dive into making this article a little more contextual for developers. My current job scope is to augment developer productivity and in that scope, code generation using GenAI is an important weapon in any developers’ armoury. And we developers need to know when and where to apply this technology safely!

Back to the 2 challenges of hallucination and lacking causality, the first problem is easy in the development domain. The generated code from the models can be easily fact checked — one just have to execute them. That is not hard. Most boiler plate codes have been working well in my tests. The hallucination flaws start to appear when you prompt for the not-so-common patterns (ex. generate a neural net algorithm for x inputs, y hidden nodes, z outputs using a certain activati