# 📄 Self-RAG Notebook Documentation
This notebook sets up a **Self-RAG (Self-Reflective Retrieval-Augmented Generation)** pipeline using `llama-index`.

## Structure
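1. **Language Model & Embedding Setup** - Initializes the Groq LLM and the Ollama embedding model.
2. **Data Loading, Chunking, and Indexing** - Loads documents from `./data`, splits them into chunks, and builds a vector index.
3. **Workflow Definition** - Defines the events and the `sRAG` workflow steps: retrieval decision, retrieval, relevance evaluation, and generation.
4. **Running the Workflow** - Runs one query that needs no retrieval and one that does.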

---

## Set Up Language Models

In [2]:
# Load environment variables from a .env file
from dotenv import load_dotenv

# Call to load environment variables into the environment
load_dotenv()

True
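
`load_dotenv()` returning `True` shows that at least one variable was loaded, not that the key the Groq client needs is among them. The following is a minimal sketch of an explicit check. The helper `missing_env_vars` is hypothetical, and `GROQ_API_KEY` is assumed to be the variable name the Groq integration reads.

```python
import os


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Assumption: the Groq client reads its API key from GROQ_API_KEY.
missing = missing_env_vars(["GROQ_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Running this right after `load_dotenv()` surfaces a missing key before the first LLM call rather than during it.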

In [3]:
from llama_index.llms.groq import Groq
from llama_index.embeddings.ollama import OllamaEmbedding

# Initialize a Groq-hosted LLM (Llama 3 70B, 8192-token context)
llm = Groq(model="llama3-70b-8192")

# Load a local embedding model (nomic-embed-text) served by Ollama
embed_model = OllamaEmbedding(
    model_name="nomic-embed-text:latest",
)



## Load, Chunk and Embed Data

In [4]:
from llama_index.core import SimpleDirectoryReader

# Read and load documents from the local './data' directory
documents = SimpleDirectoryReader("./data").load_data()

In [5]:
from llama_index.core.node_parser import SentenceSplitter

# Split the documents into chunks of up to 256 tokens, with 16 tokens of overlap between neighbors
parser = SentenceSplitter(chunk_size=256, chunk_overlap=16)
nodes = parser.get_nodes_from_documents(documents)
len(nodes)

11
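
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a simplified sketch of fixed-size windows with overlap. It is not `SentenceSplitter`'s actual algorithm, which also respects sentence boundaries. The function `sliding_chunks` and the integer stand-in tokens are illustrative assumptions.

```python
def sliding_chunks(tokens, chunk_size, chunk_overlap):
    """Split `tokens` into windows of `chunk_size` that share `chunk_overlap` items."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Stand-in tokens: the integers 0..599 instead of real text tokens.
tokens = list(range(600))
chunks = sliding_chunks(tokens, chunk_size=256, chunk_overlap=16)
for chunk in chunks:
    print(chunk[0], chunk[-1], len(chunk))
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so the last 16 tokens of one chunk reappear as the first 16 of the next.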

In [46]:
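from llama_index.core import VectorStoreIndex

# Build the index from the chunked nodes so the SentenceSplitter settings above take effect;
# from_documents(documents) would re-chunk the documents with the default splitter instead
index = VectorStoreIndex(nodes, embed_model=embed_model)
# index.storage_context.persist(persist_dir="./storage")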

## Create Workflow

In [14]:
from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
    Event,
)
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core.retrievers import VectorIndexRetriever
from typing import List
from llama_index.core.schema import TextNode

In [24]:
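# Route to generation without any retrieved context
class NoRetrieval(Event):
    query: str

# Route to the retrieval step
class RetrieveEvent(Event):
    query: str

# Carry the retrieved nodes to the relevance check
class RelevanceEval(Event):
    query: str
    retrieved_nodes: List[TextNode]

# Carry the contexts judged relevant to generation
class WithRetrieval(Event):
    query: str
    relevant_context: List[str]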

In [44]:
class sRAG(Workflow):
    @step
    async def decide_retrieval(self, ev: StartEvent) -> RetrieveEvent | NoRetrieval:
        query = ev.query

        prompt = f"Given the query: '{query}', determine if retrieval is necessary. Output only 'Yes' or 'No'."
        response = await llm.acomplete(prompt)
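        # Normalize the reply so variants such as "yes", "Yes." or " Yes\n" still match
        if str(response).strip().lower().startswith("yes"):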
            return RetrieveEvent(query=query)
        else:
            return NoRetrieval(query=query)
    @step
    async def retrieve(self, ev: RetrieveEvent) -> RelevanceEval:
        query = ev.query
        # storage_context = StorageContext.from_defaults(persist_dir="./storage")
        # index = load_index_from_storage(storage_context, embed_model=embed_model)
        retriever = VectorIndexRetriever(index=index, similarity_top_k=3)
        # Use the async retrieval call so this step does not block the event loop
        retrieved_nodes = await retriever.aretrieve(query)
        # TODO: pass the similarity scores along as well so the LLM can use them when evaluating relevance
        text_nodes = [n.node for n in retrieved_nodes]
        return RelevanceEval(query=query, retrieved_nodes=text_nodes)
    @step
    async def eval_relevance(self, ev: RelevanceEval) -> WithRetrieval | NoRetrieval:
        retrieved_nodes = ev.retrieved_nodes
        query = ev.query
        relevant_context = []
        for node in retrieved_nodes:
            context = node.get_content()
            prompt = f"Given the query: '{query}' and the context: '{context}', determine if the context is relevant. Output only 'Relevant' or 'Irrelevant'."
            response = await llm.acomplete(prompt)
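            # Normalize the reply; "irrelevant" does not start with "relevant", so it is rejected
            if str(response).strip().lower().startswith("relevant"):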
                relevant_context.append(context)
        if not relevant_context:
            return NoRetrieval(query=query)
        else:
            return WithRetrieval(query=query, relevant_context=relevant_context)
    @step
    async def generate_with_context(self, ev: WithRetrieval) -> StopEvent:
        query = ev.query
        relevant_context = ev.relevant_context
        context = "\n".join(f"- {doc}" for doc in relevant_context)
        # Known flaw: joining the contexts can introduce redundant text where chunks overlap.
        # An alternative is to generate a response over each context individually.
        prompt = f"Given the query '{query}' and the context '{context}', generate a response."
        response = await llm.acomplete(prompt)
        # print(str(response))
        return StopEvent(result=str(response))
    @step
    async def generate_without_context(self, ev: NoRetrieval) -> StopEvent:
        query = ev.query
        prompt = f"Given the query '{query}', generate a response."
        response = await llm.acomplete(prompt)
        # print(str(response))
        return StopEvent(result=str(response))
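
The comment in `generate_with_context` notes that joining contexts can repeat text where chunks overlap. One possible mitigation, sketched below, drops exact duplicates and trims a prefix that repeats the tail of the previously kept context. The helpers `overlap_length` and `merge_contexts` and the `min_overlap` threshold are hypothetical, and the sketch assumes the contexts arrive in document order, which the retriever does not guarantee.

```python
def overlap_length(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_contexts(contexts, min_overlap=10):
    """Drop duplicate contexts and trim text repeated from the previous one."""
    merged = []
    seen = set()
    for text in contexts:
        key = " ".join(text.split())  # whitespace-insensitive duplicate check
        if key in seen:
            continue
        seen.add(key)
        if merged:
            cut = overlap_length(merged[-1], text)
            # Ignore tiny overlaps, which are likely coincidental.
            if cut >= min_overlap:
                text = text[cut:]
        text = text.strip()
        if text:
            merged.append(text)
    return merged


example = [
    "The course covers signals and systems.",
    "signals and systems. Notes are not provided.",
    "The course covers signals and systems.",
]
print(merge_contexts(example))
```

The cleaned list could then be joined into the prompt in place of `relevant_context`.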

In [42]:
from llama_index.utils.workflow import (
    draw_all_possible_flows,
    draw_most_recent_execution,
)
from IPython.display import IFrame


# Draw every possible path through the workflow and display the generated HTML inline
draw_all_possible_flows(sRAG, filename="srag.html")
IFrame("srag.html", width=800, height=400)

srag.html
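
For readers without the rendered HTML, the same flows can be listed in plain Python. This sketch transcribes each step's accepted and produced events from the `sRAG` type annotations into a dictionary. The names `STEPS` and `enumerate_paths` are illustrative and not part of `llama-index`.

```python
# step name -> (event it accepts, events it may produce), copied from the sRAG annotations
STEPS = {
    "decide_retrieval": ("StartEvent", ["RetrieveEvent", "NoRetrieval"]),
    "retrieve": ("RetrieveEvent", ["RelevanceEval"]),
    "eval_relevance": ("RelevanceEval", ["WithRetrieval", "NoRetrieval"]),
    "generate_with_context": ("WithRetrieval", ["StopEvent"]),
    "generate_without_context": ("NoRetrieval", ["StopEvent"]),
}


def enumerate_paths(event="StartEvent", path=()):
    """Return every sequence of steps leading from `event` to StopEvent."""
    if event == "StopEvent":
        return [list(path)]
    paths = []
    for step_name, (accepted, produced) in STEPS.items():
        if accepted == event:
            for next_event in produced:
                paths.extend(enumerate_paths(next_event, path + (step_name,)))
    return paths


paths = enumerate_paths()
for p in paths:
    print(" -> ".join(p))
```

The graph has three routes: answer directly, retrieve and answer with context, or retrieve and fall back to answering without context when nothing is judged relevant.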


## Running the Workflow

Example where **no retrieval** is needed:

In [30]:
w = sRAG(timeout=120, verbose=True)
result = await w.run(query="Write a joke")
print(str(result))

Running step decide_retrieval
Step decide_retrieval produced event NoRetrieval
Running step generate_without_context
Step generate_without_context produced event StopEvent
Here's a joke for you:

Why couldn't the bicycle stand up by itself?

Because it was two-tired!

Hope that made you laugh!
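
Which branch runs depends entirely on the one-word reply the model gives in `decide_retrieval` and `eval_relevance`, and models often add punctuation, whitespace, or different casing. The sketch below shows a tolerant way to read such labels. The helper `parse_label` is hypothetical, and returning `None` for unclear replies is a design choice rather than something the notebook does.

```python
import re


def parse_label(response_text, positive, negative):
    """Return True for the positive label, False for the negative one, None if unclear."""
    # Look only at the first alphabetic word, ignoring case and punctuation.
    words = re.findall(r"[a-z]+", response_text.lower())
    if not words:
        return None
    first = words[0]
    if first == positive.lower():
        return True
    if first == negative.lower():
        return False
    return None


print(parse_label("Yes.", "Yes", "No"))
print(parse_label(" irrelevant\n", "Relevant", "Irrelevant"))
print(parse_label("I think so", "Yes", "No"))
```

A caller can treat `None` as a signal to retry the prompt or fall back to a default branch.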


Example where **retrieval** is needed:

In [45]:
w = sRAG(timeout=120, verbose=True)
result = await w.run(query="Review for EE200A course")
print(str(result))

Running step decide_retrieval
Step decide_retrieval produced event RetrieveEvent
Running step retrieve
Step retrieve produced event RelevanceEval
Running step eval_relevance
Step eval_relevance produced event WithRetrieval
Running step generate_with_context
Step generate_with_context produced event StopEvent
Here is a response to the query "Review for EE200A course":

The EE200A course, also known as Signals, Systems & Networks, is a fundamental course in the Electrical Engineering (EE) curriculum. According to the review, this course is the first true EE course that students will encounter on campus, and it deals with signals and their representations, systems, Fourier representations, and networks.

The instructor for this course is reportedly one of the best on campus, which is a positive aspect. However, the course has some drawbacks. Notes are not provided, and students have to write notes in class, which can be challenging. Additionally, the assignments are lengthy, and the quest

## ✅ Summary
The notebook builds a working Self-RAG pipeline as a `llama-index` workflow. It covers:
- Deciding whether a query needs retrieval
- Retrieving the top 3 chunks and keeping only those the LLM judges relevant
- Generating a response with or without retrieved context

Make sure your `.env` file and model dependencies (Groq API, Ollama) are properly configured before execution.