## RAG2

This version is similar to RAG1. The two compare as follows:

+ In RAG1, the first step is an LLM with structured output (llm.with_structured_output).
  + In RAG2, the first step is an LLM with tools bound (llm.bind_tools). This way, execution can be routed to a tool node.
+ In RAG1, the tool call (searching the vector store) is an explicit step, just like any other step.
  + In RAG2, the tool is wrapped in a ToolNode. Execution is routed to it through tools_condition. The tool arguments are
    structured output.
+ In RAG1, we combine all the outputs and generate a response in natural language.
  + In RAG2, we again combine all the outputs and generate a response in natural language.
 
+ RAG1 has a well-defined State, while RAG2 uses MessagesState. In a real application, we may want a well-defined state so that we can compose a unique prompt at each step from the data in the state.
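
To make the last point concrete, here is a minimal sketch of what a well-defined state could look like and how a step might compose its prompt from named fields. The `RagState` fields and the `build_answer_prompt` helper are hypothetical names chosen for illustration. They are not taken from RAG1 or from LangGraph.

```python
from typing import List, TypedDict


class RagState(TypedDict):
    """Hypothetical explicit state: one named field per pipeline step."""
    question: str        # the user's original question
    search_query: str    # query produced by the first (analysis) step
    context: List[str]   # text of the retrieved chunks
    answer: str          # final natural-language response


def build_answer_prompt(state: RagState) -> str:
    """Compose the generation prompt from named fields only."""
    context_block = "\n\n".join(state["context"])
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {state['question']}"
    )


state: RagState = {
    "question": "What is Task Decomposition?",
    "search_query": "task decomposition",
    "context": ["chunk A", "chunk B"],
    "answer": "",
}
prompt = build_answer_prompt(state)
```

With MessagesState, the same information has to be recovered by scanning the message list, which is what the `gen` step in this notebook does.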

In [1]:
## Basic setup: create the chat model and the vector store

from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore

llm = ChatVertexAI(model="gemini-2.0-flash-001")
embeddings = VertexAIEmbeddings(model="text-embedding-004")
vector_store = InMemoryVectorStore(embeddings)





In [2]:
## Index documents. This is the basis of all our queries. 
## This is like database setup. 
import bs4
from langchain_core.documents import Document
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader(web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
                      bs_kwargs=dict(parse_only=bs4.SoupStrainer(
                          class_=("post-content", "post-header", "post-title")
                      )))

docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200, add_start_index=True)
splits = splitter.split_documents(documents=docs)
_ = vector_store.add_documents(splits)


USER_AGENT environment variable not set, consider setting it to identify your requests.


In [None]:
# We are going to build a graph here.

# A graph always needs a state. MessagesState is so popular that it is provided out of the box.
from langgraph.graph import START, END, MessagesState, StateGraph
# Tool calling is part of the graph.
from langchain_core.tools import tool
from langgraph.prebuilt import ToolNode, tools_condition
from typing import Annotated
from langchain_core.messages import SystemMessage, HumanMessage

# The tool is just a regular Python function with an input spec.
@tool(response_format="content_and_artifact")
def retrieve(query: Annotated[str, ..., "The query to search external source"]):
    """Retrieve documents based on the query string."""
    
    docs = vector_store.similarity_search(query, k=2)

    # summary text
    text = "\n\n".join(f"source: {doc.metadata['source']} \n\n content: {doc.page_content} " for doc in docs)

    return text, docs


# Step 1: query -> [tool call | END]
def check_query(msg_state: MessagesState):
    # Only after binding tools can the LLM output a tool call.
    # The tool call is then executed by the graph (through the ToolNode).
    llm_with_tools = llm.bind_tools([retrieve])
    # Give all the messages to the LLM.
    ai_msg = llm_with_tools.invoke(msg_state['messages'])
    return {'messages': [ai_msg]}

# Step 2: tool call
tools = ToolNode([retrieve])

# Step 3: generate text based on the tool call results

def gen(msg_state: MessagesState):
    # Retrieve the most recent tool message.

    last_tool_msg = None
    for msg in reversed(msg_state['messages']):
        if msg.type == 'tool':
            last_tool_msg = msg
            break

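    # Use the tool output as context; fall back to an empty string if no tool message exists.
    docs_content = last_tool_msg.content if last_tool_msg is not None else ""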
    system_message_content = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise."
        "\n\n"
        f"{docs_content}"
    )

    msgs = [msg 
            for msg in msg_state['messages']
            if msg.type in ('human', 'system') # keep human and system
            or (msg.type == 'ai' and not msg.tool_calls) # keep AI messages, except those that request tool calls (already executed)
           ]
    ans = llm.invoke([SystemMessage(system_message_content)] + msgs)
    return {'messages': [ans]}

graph_builder = StateGraph(MessagesState)
graph_builder.add_node(check_query)
graph_builder.add_node(tools)
graph_builder.add_node(gen)
graph_builder.set_entry_point("check_query")
graph_builder.add_conditional_edges("check_query", tools_condition, {END:END, "tools": "tools"})
graph_builder.add_edge("tools", "gen")
graph_builder.add_edge("gen", END)
graph=graph_builder.compile()

from IPython.display import display, Image
display(Image(graph.get_graph().draw_mermaid_png()))
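
The conditional edge after `check_query` decides between the tool node and the end of the graph. The sketch below is a rough stand-in for that decision, not the library's implementation: plain dicts replace LangChain message objects, and `route_after_llm` and `END_SENTINEL` are hypothetical names.

```python
from typing import Any, Dict, List

END_SENTINEL = "__end__"  # assumption: stands in for langgraph's END


def route_after_llm(state: Dict[str, List[Dict[str, Any]]]) -> str:
    """Return the name of the next node based on the last message."""
    last = state["messages"][-1]
    # An AI message that carries tool calls must be executed by the tool node.
    if last.get("role") == "ai" and last.get("tool_calls"):
        return "tools"
    # Otherwise the model answered directly and the run can stop.
    return END_SENTINEL


with_call = {"messages": [
    {"role": "user", "content": "What is Task Decomposition?"},
    {"role": "ai", "content": "", "tool_calls": [
        {"name": "retrieve", "args": {"query": "task decomposition"}}]},
]}
without_call = {"messages": [
    {"role": "user", "content": "Hello"},
    {"role": "ai", "content": "Hi! How can I help?", "tool_calls": []},
]}

print(route_after_llm(with_call))
print(route_after_llm(without_call))
```

Note that in this notebook's graph a direct answer skips `gen` entirely, because the END branch leaves straight from `check_query`.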


In [None]:
import os
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_TRACING_V2"] = "true"

for step in graph.stream( {"messages": [{"role": "user", "content": "What is Task Decomposition?"}]}, stream_mode="values"):
    print(step['messages'][-1])


## How memory works

To remember what the graph has already done:

1.  We pass a checkpointer when compiling the graph. This is the destination (storage) where the state is saved.
2.  We pass a config with a thread_id when invoking the graph. This acts as a key into that storage (such as a user ID).


## What memory does
Once a checkpointer is in place, the saved state is loaded from storage at the start of each invocation. We can then build a unique context from the stored state for the current graph execution.

In this example, the new human question is appended to all existing messages. We then select all human and AI messages and use them as a prompt prefix that provides context for the current question.

Note that we do not use a multi-turn chat session from the LLM directly.
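
The mechanism can be sketched in plain Python, assuming a toy in-memory store keyed by thread_id. `ToyCheckpointer` and `build_context` are hypothetical names for illustration and do not reflect how MemorySaver is implemented. The filter is also simplified: the real `gen` step additionally drops AI messages that only request tool calls.

```python
from typing import Dict, List

Message = Dict[str, str]


class ToyCheckpointer:
    """Hypothetical in-memory store: one message list per thread_id."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[Message]] = {}

    def load(self, thread_id: str) -> List[Message]:
        # Unknown threads start with an empty history.
        return list(self._threads.get(thread_id, []))

    def save(self, thread_id: str, messages: List[Message]) -> None:
        self._threads[thread_id] = list(messages)


def build_context(history: List[Message], question: str) -> List[Message]:
    """Append the new question, then keep only human and AI turns."""
    messages = history + [{"role": "human", "content": question}]
    return [m for m in messages if m["role"] in ("human", "ai")]


store = ToyCheckpointer()
thread_id = "demo-thread"

# Turn 1: the stored state ends up holding human, tool, and AI messages.
store.save(thread_id, [
    {"role": "human", "content": "What is Task Decomposition?"},
    {"role": "tool", "content": "retrieved chunks ..."},
    {"role": "ai", "content": "It splits a task into smaller steps."},
])

# Turn 2: load the saved state and build the prompt prefix from it.
context = build_context(store.load(thread_id), "What are common ways of doing it?")
```

Because the first question and its answer are part of `context`, the model can resolve "it" in the follow-up question.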



In [None]:
from langgraph.checkpoint.memory import MemorySaver

memory=MemorySaver()
graph1 = graph_builder.compile(checkpointer=memory)
config= {"configurable": {"thread_id": "jjrag2_1"}}
graph1.invoke({'messages': [{'role': 'user', 'content': 'What is Task Decomposition?'}]}, config=config)


In [9]:
msg_state = graph1.invoke({'messages': [{'role': 'user', 'content': 'Can you lookup some common ways of doing it?'}]}, config=config)

In [10]:
msg_state['messages'][-1].content

'Task decomposition can be done (1) by LLM with simple prompting, (2) by using task-specific instructions, or (3) with human inputs.\n'

## An agent has the same logic as a manual StateGraph

+ The abstract agent decides which tool to call and how to process its result.
+ create_react_agent creates a compiled graph. The path through the graph is decided by the agent itself. We assume the LLM is smart
   enough to split the question into small tasks.

In our example below, the agent apparently cannot do so.
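
For intuition, the loop that a ReAct-style agent runs can be sketched as follows. This is a simplified illustration with a scripted stub in place of the LLM, not the code behind create_react_agent. `scripted_model`, `lookup`, and `run_agent` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def lookup(query: str) -> str:
    """Hypothetical tool standing in for `retrieve`."""
    return f"notes about {query}"


def scripted_model(messages: List[Message]) -> Message:
    """Stub model: requests one tool call, then answers from the tool result."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"role": "ai", "content": "",
                "tool_calls": [{"name": "lookup",
                                "args": {"query": "task decomposition"}}]}
    return {"role": "ai",
            "content": f"Answer based on: {tool_results[-1]['content']}",
            "tool_calls": []}


def run_agent(model: Callable[[List[Message]], Message],
              tools: Dict[str, Callable[..., str]],
              messages: List[Message],
              max_steps: int = 5) -> List[Message]:
    """Call the model, run any requested tools, and repeat until it answers."""
    messages = list(messages)
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:
            break  # no tool requested: this reply is the final answer
        for call in reply["tool_calls"]:
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": result})
    return messages


trace = run_agent(scripted_model, {"lookup": lookup},
                  [{"role": "human", "content": "What is Task Decomposition?"}])
```

The difference from the manual graph above is that tool results go back to the model instead of to a fixed `gen` step, so whether a second lookup happens depends entirely on the model requesting it.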

In [None]:
from langgraph.prebuilt import create_react_agent

memory1 = MemorySaver()
config1= {"configurable": {"thread_id": "jjrag2_2"}}
agent_executor = create_react_agent(llm, [retrieve], checkpointer=memory1)
question="What is Task Decomposition?\n\n Once you get the answer, look up some common ways of doing it."
for step_state in agent_executor.stream({'messages': [{'role': 'user', 'content': question}]}, stream_mode="values", config=config1):
    step_state['messages'][-1].pretty_print()
