<a href="https://colab.research.google.com/github/ramahasiba/NLP/blob/langGraph/Build_a_Retrieval_Augmented_Generation_App.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [Build a Retrieval Augmented Generation (RAG) App: Part 1](https://python.langchain.com/docs/tutorials/rag/)
In this tutorial, we design an application that can answer questions about specific source information. Such applications use a technique known as Retrieval Augmented Generation (RAG).

Below we build a simple Q&A application over a text data source.

## Setup

In [None]:
%pip install --quiet --upgrade langchain-text-splitters langchain-community langgraph

In [None]:
!pip install python-dotenv -q

In [None]:
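from dotenv import load_dotenv

# load_dotenv returns False when it loads no variables (for example, when
# the file is missing); it does not raise ImportError.
if not load_dotenv('.env'):
  print('No .env file found, or it defines no variables')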

### LangSmith

In [None]:
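import getpass
import os

os.environ["LANGSMITH_TRACING"] = "true"
# Assigning None to os.environ raises a TypeError, so prompt for the key
# only when it was not already loaded from .env.
if not os.environ.get("LANGSMITH_API_KEY"):
  os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")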

In [None]:
!pip install -qU "langchain[groq]"

### LLM - Groq

In [None]:
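# Prompt for the key only when it was not already loaded from .env.
if not os.environ.get("GROQ_API_KEY"):
  os.environ["GROQ_API_KEY"] = getpass.getpass("Enter your Groq API key: ")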

model_name = "llama3-70b-8192"

from langchain.chat_models import init_chat_model
llm=init_chat_model(model_name, model_provider="groq")

### Embeddings from Hugging Face

In [None]:
!pip install -qU langchain-huggingface

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

### Vector Database - Chroma DB

In [None]:
!pip install -qU langchain-chroma

In [None]:
from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="RAG",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",
    # client_settings=settings
)

This app answers questions about a website's content. The website we use is the [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post, which allows us to ask questions about the contents of the post.

In [None]:
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")

# Define state for application
class State(TypedDict):
  question: str
  context: List[Document]
  answer: str

# Define application steps
def retrieve(state: State):
  retrieved_docs = vector_store.similarity_search(state["question"])
  return {"context": retrieved_docs}

def generate(state: State):
  docs_content = "\n\n".join(doc.page_content for doc in state["context"])
  messages = prompt.invoke({"question": state["question"], "context": docs_content})
  response = llm.invoke(messages)
  return {"answer": response.content}

# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

In [None]:
response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])

## Indexing

### Loading Documents

In [None]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load() # this returns a list of Document objects

assert len(docs) == 1
print(f"Total characters: {len(docs[0].page_content)}")

In [None]:
print(docs[0].page_content[:500])

### Splitting Documents
The loaded document is over 42k characters long, which is too long to fit into the context window of many models. Even models that could fit the full post in their context window can struggle to find information in very long inputs.
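
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` additionally tries to split on natural separators such as paragraph and line breaks, and the helper name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.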

In [None]:
# RecursiveCharacterTextSplitter is recommended for generic text use cases.
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    add_start_index=True,
)

all_splits = text_splitter.split_documents(docs)

print(f"Split blog post into {len(all_splits)} sub-documents.")

### Storing Documents

Here we index the text chunks so that we can search over them at runtime. We embed the splits and then insert those embeddings into a vector store.
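
The idea behind embedding and similarity search can be sketched in plain Python. The example below is a toy under loud assumptions: `embed` is a bag-of-words counter rather than a learned model such as the Hugging Face embeddings used here, and `toy_index` and `toy_similarity_search` are hypothetical stand-ins for the vector store.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (not a learned model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# "Index" the chunks by storing each one next to its vector.
toy_chunks = [
    "task decomposition breaks a large task into smaller steps",
    "memory lets an agent recall past interactions",
    "tool use lets an agent call external APIs",
]
toy_index = [(chunk, embed(chunk)) for chunk in toy_chunks]


def toy_similarity_search(query: str, k: int = 1) -> list[str]:
    """Return the k chunks whose vectors are closest to the query vector."""
    query_vector = embed(query)
    ranked = sorted(
        toy_index,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]


print(toy_similarity_search("what is task decomposition"))
```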

In [None]:
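# Note: the quick-start cell above already indexed these chunks, so running
# both cells stores every chunk twice in the same collection.
document_ids = vector_store.add_documents(documents=all_splits)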

print(document_ids[:4])

## Retrieval and Generation
Here we use LangGraph to tie together the retrieval and generation steps into a single application.

In [None]:
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")

example_messages = prompt.invoke({
    "context": "(context goes here)",
    "question": "(question goes here)"
}).to_messages()

assert len(example_messages) == 1
print(example_messages[0].content)

## Define the State
The state of our application controls what data is input to the application, transferred between steps, and output by the application. For this simple RAG application we keep track of the following:
* input question
* retrieved context
* generated answer

In [None]:
from langchain_core.documents import Document
from typing_extensions import List, TypedDict

class State(TypedDict):
  question: str
  context: List[Document]
  answer: str

## Nodes (application steps)
We start with a simple sequence of two steps:
* Retrieval
* Generation

In [None]:
def retrieve(state: State):
  # run a similarity search using the input question
  retrieved_docs = vector_store.similarity_search(state["question"])
  return {"context": retrieved_docs}

def generate(state: State):
  docs_content = "\n\n".join(doc.page_content for doc in state["context"])
  # format the retrieved context and original question into a prompt for the chat model
  messages = prompt.invoke({"question": state["question"], "context": docs_content})
  response = llm.invoke(messages)
  return {"answer": response.content}

## Control Flow
Next, we compile the application into a single graph object. Here we just connect the retrieval and generation steps into a single sequence.
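
Conceptually, a linear graph like this runs each node in order and merges the partial update that the node returns into the shared state. The sketch below imitates that behaviour in plain Python. `run_sequence` and the stub nodes are hypothetical stand-ins, and this is not a description of how LangGraph is implemented internally.

```python
from typing import Callable


def run_sequence(nodes: list[Callable[[dict], dict]], state: dict) -> dict:
    """Run nodes in order, merging each node's partial update into the state."""
    state = dict(state)  # work on a copy so the caller's input is not mutated
    for node in nodes:
        state.update(node(state))
    return state


def stub_retrieve(state: dict) -> dict:
    # Stand-in for the vector store lookup.
    return {"context": [f"chunk relevant to: {state['question']}"]}


def stub_generate(state: dict) -> dict:
    # Stand-in for the LLM call; it only reports how much context it received.
    return {"answer": f"answered using {len(state['context'])} chunk(s)"}


final_state = run_sequence(
    [stub_retrieve, stub_generate],
    {"question": "What is task decomposition?"},
)
print(final_state["answer"])
```

Each node only returns the keys it is responsible for, which is why `retrieve` returns `context` and `generate` returns `answer`.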

In [None]:
from langgraph.graph import START, StateGraph

graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

In [None]:
graph

## Graph Usage - Testing

### Invoke:

In [None]:
# Invoke
result = graph.invoke({"question": "What is task decomposition?"})

print(f"Context: {result['context']}")
print(f"Answer: {result['answer']}")

In [None]:
# Invoke
result = graph.invoke({"question": "tell me about task decomposition?"})

print(f"Context: {result['context']}")
print(f"Answer: {result['answer']}")

### Streaming Tokens:

In [None]:
for step in graph.stream(
    {"question": "What is Task Decomposition?"}, stream_mode="updates"
):
    print(f"{step}\n\n----------------\n")

In [None]:
for message, metadata in graph.stream(
    {"question": "What is Task Decomposition?"}, stream_mode="messages"
):
    print(message.content, end="|")

### Prompt Customization
Instead of loading the prompt from the prompt hub, we can define a custom one.
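
A prompt template is essentially a string with named placeholders that are filled in at run time. As a rough illustration, the sketch below uses Python's built-in `str.format` on a shortened, made-up template rather than LangChain's `PromptTemplate`; the sample context and question are invented for the example.

```python
short_template = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}

Helpful Answer:"""

# Fill both placeholders, as the generate step does with the retrieved chunks.
filled_prompt = short_template.format(
    context="Task decomposition splits a complex task into smaller steps.",
    question="What is task decomposition?",
)
print(filled_prompt)
```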

In [None]:
from langchain_core.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
If there is no relevant context, just say that you don't know.

{context}

Question: {question}

Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)

## Query Analysis
Here the model rewrites user queries, which may be multifaceted or include irrelevant language, into more effective search queries.
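
To have something to filter on, the next cell tags every chunk with a coarse `section` label based on its position. As a small worked example of that integer-division rule, assuming a hypothetical list of 10 chunks and a hypothetical helper `section_for`:

```python
from collections import Counter


def section_for(index: int, total: int) -> str:
    """Label a chunk by position, using the same thirds rule as the tagging cell."""
    third = total // 3
    if index < third:
        return "beginning"
    if index < 2 * third:
        return "middle"
    return "end"


labels = [section_for(i, 10) for i in range(10)]
section_counts = Counter(labels)
print(section_counts)
```

Because `10 // 3` is 3, the first two sections receive 3 chunks each and the remainder lands in `end`, which receives 4.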

In [None]:
total_documents = len(all_splits)
third = total_documents // 3

for i, document in enumerate(all_splits):
    if i < third:
        document.metadata["section"] = "beginning"
    elif i < 2 * third:
        document.metadata["section"] = "middle"
    else:
        document.metadata["section"] = "end"


all_splits[0].metadata

In [None]:
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)
_ = vector_store.add_documents(all_splits)

### Schema Definition
Here we define a schema for the search query. We will use structured output for this purpose.

In [None]:
from typing import Literal

from typing_extensions import Annotated


class Search(TypedDict):
    """Search query."""

    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"],
        ...,
        "Section to query.",
    ]

Here the LangGraph application generates a query from the user's raw input.
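
The `retrieve` step in the following cell passes a `filter` callable so that only chunks from the requested section are considered. The idea can be sketched without a vector store; the `ToyDoc` class and the sample data below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDoc:
    page_content: str
    metadata: dict = field(default_factory=dict)


toy_docs = [
    ToyDoc("Intro to agents", {"section": "beginning"}),
    ToyDoc("Planning and task decomposition", {"section": "middle"}),
    ToyDoc("Challenges and references", {"section": "end"}),
]

# Shape of the structured query that the analysis step is expected to produce.
structured_query = {"query": "task decomposition", "section": "middle"}

# Same predicate shape as in retrieve: keep a document only if its section matches.
matches = [
    doc for doc in toy_docs
    if doc.metadata.get("section") == structured_query["section"]
]
print([doc.page_content for doc in matches])
```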

In [None]:
class State(TypedDict):
    question: str
    query: Search
    context: List[Document]
    answer: str


def analyze_query(state: State):
    structured_llm = llm.with_structured_output(Search)
    query = structured_llm.invoke(state["question"])
    return {"query": query}


def retrieve(state: State):
    query = state["query"]
    retrieved_docs = vector_store.similarity_search(
        query["query"],
        filter=lambda doc: doc.metadata.get("section") == query["section"],
    )
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


graph_builder = StateGraph(State).add_sequence([analyze_query, retrieve, generate])
graph_builder.add_edge(START, "analyze_query")
graph = graph_builder.compile()

In [None]:
for step in graph.stream(
    {"question": "What does the post say about Task Decomposition?"},
    stream_mode="updates",
):
    print(f"{step}\n\n----------------\n")

**Note that changing the user query may generate an error.**
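
One plausible source of such errors is the model returning a `section` value outside the allowed set, or omitting a field. A defensive sketch, assuming a hypothetical helper `normalize_search` that is not part of LangChain or LangGraph, could validate the structured output and fall back to an unfiltered search:

```python
ALLOWED_SECTIONS = {"beginning", "middle", "end"}


def normalize_search(raw, fallback_query: str) -> dict:
    """Return a dict with a usable 'query' and a 'section' that is valid or None."""
    if not isinstance(raw, dict):
        raw = {}
    query = raw.get("query") or fallback_query  # fall back to the user's question
    section = raw.get("section")
    if not isinstance(section, str) or section not in ALLOWED_SECTIONS:
        section = None  # None means: do not filter by section
    return {"query": query, "section": section}


good = normalize_search({"query": "task decomposition", "section": "end"}, "raw question")
bad = normalize_search({"section": "conclusion"}, "raw question")
print(good)
print(bad)
```

With a result like this, `retrieve` could skip the `filter` argument whenever `section` is `None`.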

In [None]:
for step in graph.stream(
    {"question": "What does the end of the post say about Task Decomposition?"},
    stream_mode="updates",
):
    print(f"{step}\n\n----------------\n")

In [None]:
graph