<a href="https://colab.research.google.com/github/ramahasiba/NLP/blob/langGraph/Build_a_Retrieval_Augmented_Generation_App_Part_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [Build a Retrieval Augmented Generation App Part 2](https://python.langchain.com/docs/tutorials/qa_chat_history/)

In this part of the RAG tutorial, we build an app that lets a user hold a back-and-forth conversation, which means the application must remember the conversation history.

We focus on the logic for incorporating earlier messages into each request, which requires managing a chat history.

## Setup and Installation

In [None]:
!pip install -qU "langchain[groq]"

In [None]:
!pip install -q python-dotenv
from dotenv import load_dotenv

# load_dotenv returns False when nothing was loaded (for example, the file is missing); it does not raise
if not load_dotenv('.env'):
  print('No .env file found')

### LangSmith

In [None]:
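import getpass
import os

os.environ["LANGSMITH_TRACING"] = "true"
# Prompt for the key only if it was not already loaded from .env
if not os.environ.get("LANGSMITH_API_KEY"):
  os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")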

### Groq

In [None]:
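from langchain.chat_models import init_chat_model

# Prompt for the key only if it was not already loaded from .env
if not os.environ.get("GROQ_API_KEY"):
  os.environ["GROQ_API_KEY"] = getpass.getpass("Enter your Groq API key: ")

model_name = "llama3-70b-8192"
llm = init_chat_model(model_name, model_provider="groq")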

### Hugging Face

In [None]:
!pip install -qU langchain-huggingface

from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

### Chroma DB

In [None]:
!pip install -qU langchain-chroma

from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",  # Where to save data locally, remove if not necessary
)

In [None]:
%%capture --no-stderr
%pip install --upgrade --quiet langgraph langchain-community beautifulsoup4

There are two ways to implement our application:
* Chains
* Agents

## Chains
Here we execute at most one retrieval step.

In [None]:
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))
    )
)

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
all_splits = text_splitter.split_documents(docs)
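
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal sketch that assumes plain fixed-width character windows. This is a deliberate simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, so its real output will differ. The helper `sliding_chunks` is hypothetical and not part of LangChain.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)  # ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

With the settings used above (1000 and 200), the same idea means neighbouring chunks can share up to 200 characters, so a sentence that straddles a boundary still appears whole in at least one chunk.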

In [None]:
# Index chunks
_ = vector_store.add_documents(all_splits)

In [None]:
from langgraph.graph import MessagesState, StateGraph
graph_builder = StateGraph(MessagesState)

In [None]:
from langchain_core.tools import tool

@tool(response_format="content_and_artifact")
def retrieve(query: str):
  """Retrieve information related to a query"""
  retrieved_docs = vector_store.similarity_search(query, k=2)
  serialized = "\n\n".join(
      (f"Source: {doc.metadata}\n" f"Content: {doc.page_content}") for doc in retrieved_docs
  )
  return serialized, retrieved_docs
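
With `response_format="content_and_artifact"`, the tool returns a two-item tuple: a string that is sent to the model and an artifact (here the raw documents) that stays attached to the tool message. The sketch below reproduces only the string formatting, using a hypothetical `FakeDoc` stand-in instead of real LangChain documents so it runs without a vector store.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing the two attributes the tool reads."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def serialize(docs):
    # Same formatting expression as in the retrieve tool above
    return "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}" for doc in docs
    )


docs = [
    FakeDoc("Chain of thought breaks hard tasks into steps.", {"source": "blog"}),
    FakeDoc("Tree of Thoughts explores several reasoning paths.", {"source": "blog"}),
]

# The tool would return this pair: text for the model, raw documents as the artifact
content, artifact = serialize(docs), docs
print(content)
```

Each retrieved document becomes one `Source` / `Content` pair, and pairs are separated by a blank line.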

The graph will consist of three nodes:
* A node that fields the user input, either generating a query for the retriever or responding directly.
* A node for the retriever tool that executes the retrieval step.
* A node that generates the final response using the retrieved context.

In [None]:
from langchain_core.messages import SystemMessage
from langgraph.prebuilt import ToolNode

# Step 1: generate AI Message that may include a tool-call to be sent.
def query_or_respond(state: MessagesState):
  """Generate tool call for retrieval or respond."""
  llm_with_tools = llm.bind_tools([retrieve])
  response = llm_with_tools.invoke(state["messages"])
  # MessagesState appends messages to state instead of overwriting
  return {"messages": [response]}

# Step 2: Execute the retrieval
tools = ToolNode([retrieve])

# Step 3: Generate a response using the retrieved content.
def generate(state: MessagesState):
  """Generate answer."""
  # Get the most recent ToolMessages (the unbroken run at the end of the history)
  recent_tool_messages = []
  for message in reversed(state["messages"]):
    if message.type == "tool":
      recent_tool_messages.append(message)
    else:
      break
  tool_messages = recent_tool_messages[::-1]

  # Format into prompt
  docs_content = "\n\n".join(doc.content for doc in tool_messages)
  system_message_content = (
      "You are an assistant for question-answering tasks. "
      "Use the following pieces of retrieved context to answer "
      "the question. If you don't know the answer, say that you "
      "don't know. Use three sentences maximum and keep the answer concise.\n\n"
      f"{docs_content}"
  )

  conversation_messages = [
      message
      for message in state["messages"]
      if message.type in ("human", "system")
      or (message.type == "ai" and not message.tool_calls)
  ]
  prompt = [SystemMessage(system_message_content)] + conversation_messages

  # Run
  response = llm.invoke(prompt)
  return {"messages": [response]}
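
The two filtering steps inside `generate` can be tried in isolation. The sketch below assumes a hypothetical `Msg` dataclass as a stand-in for LangChain message objects; it carries only the `type`, `content`, and `tool_calls` attributes that the logic inspects.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    """Hypothetical stand-in for a LangChain message."""
    type: str
    content: str
    tool_calls: list = field(default_factory=list)


def trailing_tool_messages(messages):
    """Return the unbroken run of tool messages at the end, in original order."""
    recent = []
    for message in reversed(messages):
        if message.type != "tool":
            break
        recent.append(message)
    return recent[::-1]


def conversation_only(messages):
    """Keep human/system messages and AI messages that carry no tool calls."""
    return [
        m for m in messages
        if m.type in ("human", "system") or (m.type == "ai" and not m.tool_calls)
    ]


history = [
    Msg("human", "What is Task Decomposition?"),
    Msg("ai", "", tool_calls=[{"name": "retrieve"}]),
    Msg("tool", "chunk A"),
    Msg("tool", "chunk B"),
]

context = [m.content for m in trailing_tool_messages(history)]
kept = [m.type for m in conversation_only(history)]
print(context)  # ['chunk A', 'chunk B']
print(kept)     # ['human']
```

The tool-calling AI message and the tool outputs are dropped from the conversation part of the prompt; the retrieved text reaches the model only through the system message.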

Here we connect the nodes into a single graph. We allow the first query_or_respond step to "short-circuit" and respond directly to the user if it does not generate a tool call.
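
The short-circuit decision can be sketched in plain Python. This is a simplified stand-in for the choice the prebuilt condition makes, not its actual implementation: messages are plain dicts, and `route`, `run`, and the local `END` constant are hypothetical names used only for illustration.

```python
END = "__end__"  # stand-in sentinel for this sketch


def route(last_message: dict) -> str:
    """Go to the tools node only if the model asked for a tool call."""
    return "tools" if last_message.get("tool_calls") else END


def run(first_reply: dict) -> list[str]:
    """Return the node names visited for a given first model reply."""
    visited = ["query_or_respond"]
    if route(first_reply) == "tools":
        # Retrieval happens once, then the final answer is generated
        visited += ["tools", "generate"]
    return visited


direct = run({"content": "Hello! How can I help?", "tool_calls": []})
with_retrieval = run({"content": "", "tool_calls": [{"name": "retrieve"}]})
print(direct)          # ['query_or_respond']
print(with_retrieval)  # ['query_or_respond', 'tools', 'generate']
```

A greeting therefore ends after one node, while a knowledge question passes through all three.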


In [None]:
from langgraph.graph import END
from langgraph.prebuilt import ToolNode, tools_condition

graph_builder.add_node(query_or_respond)
graph_builder.add_node(tools)
graph_builder.add_node(generate)

graph_builder.set_entry_point("query_or_respond")
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition,
    {END: END, "tools": "tools"}
)
graph_builder.add_edge("tools", "generate")
graph_builder.add_edge("generate", END)

graph = graph_builder.compile()

In [None]:
graph

In [None]:
system_message = SystemMessage(content="Only call tools if the user is asking a factual or knowledge-based question. Otherwise, respond directly.")

In [None]:
input_message = "Hello"

for step in graph.stream(
    {"messages": [system_message, {"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

In [None]:
input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

In [None]:
input_message = "What is the temperature today?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

In [None]:
from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)

# specify an ID for the thread
config = {"configurable": {"thread_id": "a"}}
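
The observable effect of compiling with a checkpointer is that calls sharing a `thread_id` see the earlier messages of that thread, while other threads start fresh. The toy below illustrates only that effect; `ToyCheckpointer` and `count_turns` are hypothetical and are not how `MemorySaver` is implemented.

```python
from collections import defaultdict


class ToyCheckpointer:
    """Hypothetical in-memory store: one message list per thread_id."""

    def __init__(self):
        self._threads = defaultdict(list)

    def run_turn(self, config: dict, user_text: str, reply_fn) -> list:
        thread_id = config["configurable"]["thread_id"]
        history = self._threads[thread_id]  # earlier turns of this thread, if any
        history.append({"role": "user", "content": user_text})
        history.append({"role": "assistant", "content": reply_fn(history)})
        return history


def count_turns(history):
    """Toy 'model' that reports how many user messages it can see."""
    seen = sum(1 for m in history if m["role"] == "user")
    return f"I can see {seen} user message(s)."


store = ToyCheckpointer()
thread_a = {"configurable": {"thread_id": "a"}}
thread_b = {"configurable": {"thread_id": "b"}}

store.run_turn(thread_a, "What is Task Decomposition?", count_turns)
second = store.run_turn(thread_a, "Can you look up some common ways of doing it?", count_turns)
other = store.run_turn(thread_b, "Hello", count_turns)
print(second[-1]["content"])  # I can see 2 user message(s).
print(other[-1]["content"])   # I can see 1 user message(s).
```

This is why a follow-up such as "some common ways of doing it" can be resolved: the earlier question is still in the thread's history.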

In [None]:
input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()

In [None]:
input_message = "Can you look up some common ways of doing it?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()

## Agents

Agents leverage the reasoning capabilities of LLMs to make decisions during execution.

Here the tool invocation loops back to the original LLM call. The model can either answer the question using the retrieved context or generate another tool call to obtain more information.

This gives the LLM discretion to execute multiple retrieval steps.
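
The loop can be sketched without any LLM. Everything in the block below is hypothetical: `fake_model` stands in for the LLM and is scripted to request two retrievals before answering, `fake_retrieve` stands in for the retriever tool, and `max_steps` is an illustrative guard against endless loops.

```python
def fake_model(messages):
    """Hypothetical model: requests up to two retrievals, then answers."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if len(tool_results) < 2:
        query = "task decomposition" if not tool_results else "extensions of chain of thought"
        return {"role": "assistant", "content": "", "tool_calls": [{"query": query}]}
    answer = f"Answer based on {len(tool_results)} retrievals."
    return {"role": "assistant", "content": answer, "tool_calls": []}


def fake_retrieve(query):
    """Hypothetical retriever that just echoes the query."""
    return f"documents about {query}"


def run_agent(question, max_steps=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = fake_model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:
            break  # no tool call means this is the final answer
        for call in reply["tool_calls"]:
            # Tool output is appended, then control returns to the model
            messages.append({"role": "tool", "content": fake_retrieve(call["query"])})
    return messages


transcript = run_agent("What is the standard method for task decomposition?")
roles = [m["role"] for m in transcript]
print(roles)  # ['user', 'assistant', 'tool', 'assistant', 'tool', 'assistant']
```

The alternating `assistant` and `tool` entries show the model being called again after each retrieval, which the chain version never does.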

In [None]:
from langgraph.prebuilt import create_react_agent

agent_executor = create_react_agent(llm, [retrieve], checkpointer=memory)

In [None]:
agent_executor

In [None]:
config = {"configurable": {"thread_id": "b"}}

input_message = (
    "what is the standard method for task decomposition?\n\n"
    "Once you get the answer, look up common extensions of that method"
)

for event in agent_executor.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config
):
  event["messages"][-1].pretty_print()

The key difference between the two implementations is what happens after retrieval: the chain finishes with a final generation step that ends the run, whereas the agent's tool invocation loops back to the original LLM call.