### Visual Representation

<img src="https://github.com/NirDiamant/RAG_Techniques/blob/main/images/reliable_rag.svg?raw=1" alt="Reliable-RAG" width="300">

# Package Installation and Imports

The cell below installs the packages required to run this notebook.


In [None]:
# Install required packages
!pip install langchain langchain-community langchain-cohere langchain-groq chromadb tiktoken beautifulsoup4 python-dotenv

In [None]:
### LLMs
import os
from dotenv import load_dotenv

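# Load environment variables from the '.env' file into os.environ
load_dotenv()

# Check that the required API keys are set (load_dotenv has already exported them to os.environ)
for key in ("GROQ_API_KEY",    # For LLMs: llama-3.1-8b-instant (small) & mixtral-8x7b-32768 (large)
            "COHERE_API_KEY"): # For embeddings
    if not os.getenv(key):
        raise EnvironmentError(f"{key} is not set; add it to your .env file or environment")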

### Create Vectorstore

In [None]:
### Build Index
from langchain.text_splitter import RecursiveCharacterTextSplitter # Import a tool to split text into smaller, manageable chunks
from langchain_community.document_loaders import WebBaseLoader # Import a tool to load documents from web pages
from langchain_community.vectorstores import Chroma # Import Chroma, a type of database optimized for storing numerical representations (embeddings) of text
from langchain_cohere import CohereEmbeddings # Import CohereEmbeddings to convert text into numerical representations (embeddings)

# Set embeddings
embedding_model = CohereEmbeddings(model="embed-english-v3.0") # Choose Cohere's embedding model to create numerical representations of text

# Docs to index
urls = [
    "https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/?ref=dl-staging-website.ghost.io", # List of web page addresses (URLs) to get content from
    "https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-2-reflection/?ref=dl-staging-website.ghost.io",
    "https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-3-tool-use/?ref=dl-staging-website.ghost.io",
    "https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-4-planning/?ref=dl-staging-website.ghost.io",
    "https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-5-multi-agent-collaboration/?ref=dl-staging-website.ghost.io"
]

# Load
docs = [WebBaseLoader(url).load() for url in urls] # Go to each URL and load its content as documents
docs_list = [item for sublist in docs for item in sublist] # Flatten the list of documents into a single list

# Split
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder( # Set up a text splitter to break down large documents
    chunk_size=500, chunk_overlap=0 # Each chunk will be up to 500 tokens (counted with tiktoken), with no overlap between them
)
doc_splits = text_splitter.split_documents(docs_list) # Split the loaded documents into smaller chunks

# Add to vectorstore
vectorstore = Chroma.from_documents( # Create a Chroma vector store (a database for embeddings)
    documents=doc_splits, # Add the document chunks to the vector store
    collection_name="rag", # Give a name to this collection of documents
    embedding=embedding_model, # Use the Cohere embedding model to convert chunks into embeddings
)

retriever = vectorstore.as_retriever( # Create a retriever that can fetch relevant documents from the vector store
                search_type="similarity", # Search for documents that are most similar to a given query
                search_kwargs={'k': 4}, # Retrieve the top 4 most similar documents
            )
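Because the splitter is built with `from_tiktoken_encoder`, `chunk_size` and `chunk_overlap` are counted in tokens rather than characters. The sketch below illustrates what the two parameters control using a simplified fixed-window splitter; it is not LangChain's recursive algorithm (which first tries to break on separators such as blank lines and newlines), and it treats whitespace-separated words as tokens purely for illustration. The function name `split_into_chunks` is hypothetical.

```python
from typing import List


def split_into_chunks(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split a token list into windows of at most chunk_size tokens.

    Consecutive windows share chunk_overlap tokens. This is a simplified
    fixed-window splitter, not LangChain's recursive separator-based algorithm.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window starts after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # this window already reaches the end
            break
    return chunks


# Whitespace-separated words stand in for tiktoken tokens (an assumption for illustration)
words = "one two three four five six seven".split()
print(split_into_chunks(words, chunk_size=3, chunk_overlap=0))
# → [['one', 'two', 'three'], ['four', 'five', 'six'], ['seven']]
print(split_into_chunks(words, chunk_size=3, chunk_overlap=1))
# → [['one', 'two', 'three'], ['three', 'four', 'five'], ['five', 'six', 'seven']]
```

With `chunk_overlap=0`, as in this notebook, a sentence that straddles a chunk boundary ends up split across two chunks; a small positive overlap repeats the boundary tokens in both.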

### Question

In [None]:
question = "what are the different kinds of agentic design patterns?"

### Retrieve docs

In [None]:
docs = retriever.invoke(question)
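Conceptually, `retriever.invoke(question)` embeds the question with the same Cohere model used for the chunks and returns the `k=4` chunks whose embeddings are closest to it. Below is a minimal pure-Python sketch of that top-k step, assuming cosine similarity as the score; the metric Chroma actually uses depends on how the collection is configured, and the 2-D vectors and helper names are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], index: Dict[str, List[float]], k: int = 4) -> List[Tuple[str, float]]:
    """Return the k chunk IDs whose vectors are most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings" (hypothetical values); real Cohere vectors have far more dimensions
index = {
    "reflection": [0.9, 0.1],
    "tool_use": [0.7, 0.7],
    "planning": [0.1, 0.9],
}
print([chunk_id for chunk_id, _ in top_k([1.0, 0.0], index, k=2)])
# → ['reflection', 'tool_use']
```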

### Check what a retrieved document looks like

In [None]:
print(f"Title: {docs[0].metadata['title']}\n\nSource: {docs[0].metadata['source']}\n\nContent: {docs[0].page_content}\n")

Title: Agentic Design Patterns Part 5, Multi-Agent Collaboration

Source: https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-5-multi-agent-collaboration/?ref=dl-staging-website.ghost.io

Content: mature patterns of Reflection and Tool Use are more reliable. I hope you enjoy playing with these agentic design patterns and that they produce amazing results for you! If you're interested in learning more, I recommend: “Communicative Agents for Software Development,” Qian et al. (2023) (the ChatDev paper)“AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation,” Wu et al. (2023) “MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework,” Hong et al. (2023)Keep learning!AndrewRead "Agentic Design Patterns Part 1: Four AI agent strategies that improve GPT-4 and GPT-3.5 performance"Read "Agentic Design Patterns Part 2: Reflection" Read "Agentic Design Patterns Part 3: Tool Use"Read "Agentic Design Patterns Part 4: Planning" ShareSubscribe 

### Check document relevancy

In [None]:
from langchain_core.prompts import ChatPromptTemplate # Import a tool to create structured chat prompts
from pydantic import BaseModel, Field # Import tools to define data structures with validation
from langchain_groq import ChatGroq # Import the Groq chat model for generating responses

# Data model
class GradeDocuments(BaseModel): # Define a structure for how we expect the grading result to look
    """Binary score for relevance check on retrieved documents."""

    binary_score: str = Field(
        description="Documents are relevant to the question, 'yes' or 'no'" # This field will store 'yes' if a document is relevant, or 'no' if it's not
    )


# LLM with function call
llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0) # Initialize a small Llama 3.1 model served by Groq; temperature=0 minimizes randomness so grades are as consistent as possible
structured_llm_grader = llm.with_structured_output(GradeDocuments) # Configure the LLM to output its answers in the 'GradeDocuments' structure we defined

# Prompt
system = """You are a grader assessing relevance of a retrieved document to a user question. \n
    If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n
    It does not need to be a stringent test. The goal is to filter out erroneous retrievals. \n
    Give a binary score, 'yes' or 'no', to indicate whether the document is relevant to the question.""" # Define the instructions for the LLM on how to grade document relevance
grade_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system), # Add the system instructions to the prompt
        ("human", "Retrieved document: \n\n {document} \n\n User question: {question}"), # Define the user's part of the prompt, including placeholders for the document and question
    ]
)

retrieval_grader = grade_prompt | structured_llm_grader # Combine the prompt and the structured LLM into a document-grading chain using the LangChain Expression Language (LCEL) pipe operator `|`.
# The output of grade_prompt is fed as input into structured_llm_grader, creating a sequential chain of operations.
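The `|` in `grade_prompt | structured_llm_grader` works because LangChain runnables overload Python's `__or__` operator. The toy sketch below (not LangChain's implementation) shows the idea: piping two steps produces a new step whose `invoke` runs the first and passes its output to the second. The `Step` class and the keyword-based `fake_grader` are hypothetical stand-ins for the prompt template and the LLM grader.

```python
from typing import Any, Callable


class Step:
    """A toy stand-in for a LangChain Runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output into b
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stages mirroring grade_prompt | structured_llm_grader
build_prompt = Step(lambda inputs: f"Document: {inputs['document']}\nQuestion: {inputs['question']}")
fake_grader = Step(lambda prompt: "yes" if "agent" in prompt.lower() else "no")  # keyword check, not an LLM

chain = build_prompt | fake_grader
print(chain.invoke({"document": "Multi-agent collaboration ...", "question": "What are agentic patterns?"}))
# → yes
```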

### Filter out the non-relevant docs

In [None]:
docs_to_use = []
for doc in docs:
    print(doc.page_content, '\n', '-'*50)
    res = retrieval_grader.invoke({"question": question, "document": doc.page_content})
    print(res,'\n')
    if res.binary_score == 'yes':
        docs_to_use.append(doc)

mature patterns of Reflection and Tool Use are more reliable. I hope you enjoy playing with these agentic design patterns and that they produce amazing results for you! If you're interested in learning more, I recommend: “Communicative Agents for Software Development,” Qian et al. (2023) (the ChatDev paper)“AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation,” Wu et al. (2023) “MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework,” Hong et al. (2023)Keep learning!AndrewRead "Agentic Design Patterns Part 1: Four AI agent strategies that improve GPT-4 and GPT-3.5 performance"Read "Agentic Design Patterns Part 2: Reflection" Read "Agentic Design Patterns Part 3: Tool Use"Read "Agentic Design Patterns Part 4: Planning" ShareSubscribe to The BatchStay updated with weekly AI News and Insights delivered to your inboxCoursesThe BatchCommunityCareersAbout 
 --------------------------------------------------
binary_score='yes' 

I recommend: “Gor
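The filter keeps a document only when `res.binary_score == 'yes'` holds exactly, so a grade such as `'Yes'` or `'yes.'` would silently drop a relevant chunk. Below is a minimal sketch of a more tolerant comparison, assuming the grader may vary case, whitespace, or trailing punctuation; `is_yes` and `filter_relevant` are hypothetical helpers. Constraining the field in the Pydantic model, for example with `typing.Literal['yes', 'no']`, is another option worth considering.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def is_yes(score: str) -> bool:
    """Interpret a free-form grader score, tolerating case, surrounding spaces, quotes, and punctuation."""
    cleaned = score.strip().strip(".!'\"").lower()
    return cleaned == "yes"


def filter_relevant(docs: Sequence[T], scores: Sequence[str]) -> List[T]:
    """Keep each document whose matching score reads as 'yes'."""
    return [doc for doc, score in zip(docs, scores) if is_yes(score)]


# In the notebook, scores would be [res.binary_score for each graded doc]
print(filter_relevant(["chunk A", "chunk B", "chunk C"], ["Yes", "no", "yes."]))
# → ['chunk A', 'chunk C']
```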

### Generate Result

In [None]:
from langchain_core.output_parsers import StrOutputParser # Import a tool to convert the LLM's output into a simple string

# Prompt
system = """You are an assistant for question-answering tasks. Answer the question based on the retrieved documents.
Use three to five sentences at most and keep the answer concise.""" # Define the instructions for the AI assistant
prompt = ChatPromptTemplate.from_messages( # Create a chat prompt using the system instructions and a human query
    [
        ("system", system), # Add the system instructions to the prompt
        ("human", "Retrieved documents: \n\n <docs>{documents}</docs> \n\n User question: <question>{question}</question>"), # Define the user's part of the prompt, including placeholders for documents and question
    ]
)

# LLM
llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0) # Initialize the Groq-hosted LLM; temperature=0 minimizes randomness in the answer

# Post-processing
def format_docs(docs): # Define a function to format the retrieved documents nicely
    return "\n".join(f"<doc{i+1}>:\nTitle:{doc.metadata['title']}\nSource:{doc.metadata['source']}\nContent:{doc.page_content}\n</doc{i+1}>\n" for i, doc in enumerate(docs)) # Loop through documents and format each with its title, source, and content

# Chain
rag_chain = prompt | llm | StrOutputParser() # Create a sequence: prompt -> LLM -> string parser to generate the answer

# Run
generation = rag_chain.invoke({"documents":format_docs(docs_to_use), "question": question}) # Run the chain with formatted documents and the user's question to get the answer
print(generation) # Display the generated answer

According to the retrieved documents, there are four main agentic design patterns:

1. **Reflection**: The LLM examines its own work to come up with ways to improve it.
2. **Tool Use**: The LLM is given tools such as web search, code execution, or any other function to help it gather information, take action, or process data.
3. **Planning**: The LLM comes up with, and executes, a multistep plan to achieve a goal.
4. **Multi-agent collaboration**: More than one AI agent work together, splitting up tasks and discussing and debating ideas, to come up with better solutions than a single agent would.
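To see exactly what the model receives as `{documents}`, here is `format_docs` copied from the cell above and applied to a minimal stand-in for LangChain's `Document` (only `page_content` and `metadata` are modeled; the titles and `example.com` URLs are placeholders). Each chunk is wrapped in numbered `<docN>` tags, which is presumably where the `doc3`-style IDs in the highlighting step come from.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (only the fields format_docs reads)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def format_docs(docs: List[Doc]) -> str:
    # Same formatting logic as the notebook's format_docs: one tagged block per chunk
    return "\n".join(
        f"<doc{i+1}>:\nTitle:{doc.metadata['title']}\nSource:{doc.metadata['source']}\nContent:{doc.page_content}\n</doc{i+1}>\n"
        for i, doc in enumerate(docs)
    )


# Placeholder documents for illustration only
sample = [
    Doc("Reflection: The LLM examines its own work ...", {"title": "Part 2: Reflection", "source": "https://example.com/reflection"}),
    Doc("Tool Use: The LLM is given tools ...", {"title": "Part 3: Tool Use", "source": "https://example.com/tool-use"}),
]
print(format_docs(sample))
```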


### Check for Hallucinations

In [None]:
# Data model
class GradeHallucinations(BaseModel):
    """Binary score for hallucination present in 'generation' answer."""

    binary_score: str = Field(
        ...,
        description="Answer is grounded in the facts, 'yes' or 'no'"
    )

# LLM with function call
llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0)
structured_llm_grader = llm.with_structured_output(GradeHallucinations)

# Prompt
system = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. \n
    Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts."""
hallucination_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "Set of facts: \n\n <facts>{documents}</facts> \n\n LLM generation: <generation>{generation}</generation>"),
    ]
)

hallucination_grader = hallucination_prompt | structured_llm_grader

response = hallucination_grader.invoke({"documents": format_docs(docs_to_use), "generation": generation})
print(response)

binary_score='yes'
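The cell only prints the grade. One way to act on it, assuming you want to withhold answers that are not grounded, is to regenerate a bounded number of times and fall back to a refusal. Below is a minimal sketch with hypothetical names (`answer_if_grounded`, the fallback text); the generator and grader are passed in as callables so the logic can be exercised without API calls.

```python
from typing import Callable


def answer_if_grounded(
    generate: Callable[[], str],
    is_grounded: Callable[[str], bool],
    max_attempts: int = 2,
    fallback: str = "I couldn't find a well-supported answer in the retrieved documents.",
) -> str:
    """Generate up to max_attempts answers; return the first grounded one, else the fallback."""
    for _ in range(max_attempts):
        candidate = generate()
        if is_grounded(candidate):
            return candidate
    return fallback


# In the notebook, the callables could wrap the existing chains, e.g.:
#   generate=lambda: rag_chain.invoke({"documents": format_docs(docs_to_use), "question": question})
#   is_grounded=lambda g: hallucination_grader.invoke(
#       {"documents": format_docs(docs_to_use), "generation": g}).binary_score == "yes"
attempts = iter(["Agents use magic.", "Reflection, Tool Use, Planning, and Multi-agent collaboration."])
print(answer_if_grounded(
    generate=lambda: next(attempts),
    is_grounded=lambda answer: "Reflection" in answer,  # stub grader for the demo
))
# → Reflection, Tool Use, Planning, and Multi-agent collaboration.
```

With `temperature=0`, a plain retry is likely to reproduce the same answer, so in practice the retry would need to change something, such as the prompt or the set of retrieved documents.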


### Highlight used docs

In [None]:
from typing import List
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate

# Data model
class HighlightDocuments(BaseModel):
    """Return the specific part of a document used for answering the question."""

    id: List[str] = Field(
        ...,
        description="List of IDs of the documents used to answer the question"
    )

    title: List[str] = Field(
        ...,
        description="List of titles of the documents used to answer the question"
    )

    source: List[str] = Field(
        ...,
        description="List of sources of the documents used to answer the question"
    )

    segment: List[str] = Field(
        ...,
        description="List of direct segments from the used documents that answer the question"
    )

# LLM
llm = ChatGroq(model="mixtral-8x7b-32768", temperature=0)

# parser
parser = PydanticOutputParser(pydantic_object=HighlightDocuments)

# Prompt
system = """You are an advanced assistant for document search and retrieval. You are provided with the following:
1. A question.
2. A generated answer based on the question.
3. A set of documents that were referenced in generating the answer.

Your task is to identify and extract the exact inline segments from the provided documents that directly correspond to the content used to
generate the given answer. The extracted segments must be verbatim snippets from the documents, ensuring a word-for-word match with the text
in the provided documents.

Ensure that:
- (Important) Each segment is an exact match to a part of the document and is fully contained within the document text.
- The relevance of each segment to the generated answer is clear and directly supports the answer provided.
- (Important) If you didn't use a specific document, don't mention it.

Used documents: <docs>{documents}</docs> \n\n User question: <question>{question}</question> \n\n Generated answer: <answer>{generation}</answer>

<format_instruction>
{format_instructions}
</format_instruction>
"""


prompt = PromptTemplate(
    template=system,
    input_variables=["documents", "question", "generation"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Chain
doc_lookup = prompt | llm | parser

# Run
lookup_response = doc_lookup.invoke({"documents":format_docs(docs_to_use), "question": question, "generation": generation})
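`PromptTemplate`'s `partial_variables` pre-fills `{format_instructions}` once, when the prompt is built, so each `invoke` only supplies `documents`, `question`, and `generation`. Conceptually this resembles the plain `str.format` sketch below; the shortened template, the placeholder instruction text, and the `render` helper are illustrative assumptions, not the real parser output.

```python
# Hypothetical, shortened template mirroring the structure of the highlight prompt
template = (
    "Used documents: <docs>{documents}</docs>\n"
    "User question: <question>{question}</question>\n"
    "Generated answer: <answer>{generation}</answer>\n"
    "<format_instruction>\n{format_instructions}\n</format_instruction>"
)

# Like partial_variables: fixed once when the prompt is built (placeholder text)
partials = {"format_instructions": "Return JSON with keys id, title, source, segment."}


def render(inputs: dict) -> str:
    """Fill the pre-set partials plus the per-call inputs, similar to invoking the prompt."""
    return template.format(**partials, **inputs)


print(render({"documents": "<doc1>...</doc1>", "question": "What patterns exist?", "generation": "Four patterns."}))
```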

In [None]:
for doc_id, title, source, segment in zip(lookup_response.id, lookup_response.title, lookup_response.source, lookup_response.segment):
    print(f"ID: {doc_id}\nTitle: {title}\nSource: {source}\nText Segment: {segment}\n")

ID: doc3
Title: Four AI Agent Strategies That Improve GPT-4 and GPT-3.5 Performance
Source: https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/?ref=dl-staging-website.ghost.io
Text Segment: Reflection: The LLM examines its own work to come up with ways to improve it.\nTool Use: The LLM is given tools such as web search, code execution, or any other function to help it gather information, take action, or process data.\nPlanning: The LLM comes up with, and executes, a multistep plan to achieve a goal (for example, writing an outline for an essay, then doing online research, then writing a draft, and so on).\nMulti-agent collaboration: More than one AI agent work together, splitting up tasks and discussing and debating ideas, to come up with better solutions than a single agent would.
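The prompt asks for verbatim segments, but nothing enforces it. Below is a minimal sketch of a post-check that flags any segment not found in the retrieved chunks, assuming differences in whitespace should be ignored. The printed segment above contains literal `\n` sequences rather than line breaks, so the sketch also treats those as whitespace; that normalization is a heuristic and could mask a real mismatch in text that genuinely contains a backslash. `normalize` and `unverified_segments` are hypothetical helpers.

```python
import re
from typing import List, Tuple


def normalize(text: str) -> str:
    """Collapse whitespace (and literal backslash-n sequences) so line breaks don't break matching."""
    text = text.replace("\\n", " ")  # literal two-character "\n", as seen in the printed segment
    return re.sub(r"\s+", " ", text).strip()


def unverified_segments(segments: List[str], documents: List[str]) -> List[Tuple[int, str]]:
    """Return (index, segment) for every segment not found verbatim in any document."""
    normalized_docs = [normalize(doc) for doc in documents]
    return [
        (i, seg)
        for i, seg in enumerate(segments)
        if not any(normalize(seg) in doc for doc in normalized_docs)
    ]


# In the notebook: unverified_segments(lookup_response.segment, [d.page_content for d in docs_to_use])
documents = ["Reflection: The LLM examines its own work to come up with ways to improve it.\nTool Use: The LLM is given tools."]
segments = [
    "Reflection: The LLM examines its own work to come up with ways to improve it.\\nTool Use: The LLM is given tools.",
    "Planning: a made-up claim.",
]
print(unverified_segments(segments, documents))
# → [(1, 'Planning: a made-up claim.')]
```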



### Text segment in the source

![image.png](attachment:image.png)
