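# Adaptive RAG with LangGraph

## 1. Introduction

Adaptive RAG is a RAG strategy that combines (1) [query analysis](https://blog.langchain.dev/query-construction/) with (2) [active / self-corrective RAG](https://blog.langchain.dev/agentic-rag-with-langgraph/).

The adaptive RAG implemented here can answer a question through one of three routes:

* No retrieval: the LLM answers directly
* Web search, then answer
* Vector database lookup, then answer

The execution flow is as follows:

* Question routing:
    * web_search - time-sensitive questions
    * vectorstore - questions covered by the knowledge base
    * LLM - everything else
* Answer generation: on the vectorstore route, the retrieved documents are first filtered by relevance
* Hallucination check: if the answer is not supported by the retrieved documents or by common knowledge (ground truth), the answer is regenerated
* Answer check: if the generated content does not adequately answer the question, fall back to web_search and answer again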
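
The flow above can be outlined in plain Python before any LangGraph code is involved. This is only a sketch: every callable passed in (`route`, `retrieve`, `search`, `generate`, `is_grounded`, `is_useful`) is a hypothetical stand-in for a component built later in this notebook, and the `max_attempts` cap is an assumption that the graph in this notebook does not have.

```python
from typing import Callable, List


def adaptive_rag(
    question: str,
    route: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[str, List[str]], bool],
    is_useful: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> str:
    """Hypothetical outline of the adaptive RAG flow; all callables are stand-ins."""
    choice = route(question)
    if choice == "llm":
        # No retrieval: answer directly.
        return generate(question, [])

    docs = retrieve(question) if choice == "vectorstore" else search(question)
    if not docs:
        # Nothing relevant survived filtering: fall back to web search.
        docs = search(question)

    answer = ""
    for _ in range(max_attempts):  # the cap is an assumption, not part of the notebook
        answer = generate(question, docs)
        if not is_grounded(answer, docs):
            continue  # hallucination: regenerate with the same documents
        if is_useful(question, answer):
            return answer
        docs = search(question)  # answer check failed: retry with web results
    return answer
```

Passing the components in as callables keeps the control flow testable with simple lambdas, independent of any model or vector store.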

## 2. Preparation

### 2.1 Install dependencies

In [1]:
with open('./requirements.txt', 'r') as file:
    for line in file:
        print(line.strip())

protobuf
langchain
langchain-openai
tiktoken
langchainhub
chromadb
langgraph
langchain-community
langchain-core
zhipuai
httpx_sse
bs4
lxml
faiss-cpu


In [2]:
# ! pip3 install -r ./requirements.txt --trusted-host pypi.org --trusted-host pypi.python.org --trusted-host files.pythonhosted.org

### 2.2 Prepare API keys

Obtain or purchase API keys:

* ChatGLM: [https://open.bigmodel.cn/](https://open.bigmodel.cn/)
* Open AI: [https://api.xty.app/](https://api.xty.app/)
* TAVILY: [https://tavily.com/](https://tavily.com/)
* KIMI: [https://platform.moonshot.cn/](https://platform.moonshot.cn/)
* Langchain (for Langsmith): see the next subsection

Set the API keys:

1. Replace the placeholder values below with your own keys, then add the commands to `~/.bash_profile`

~~~bash
export OPENAI_API_KEY="sk-mt...vjl" 
export ZHIPU_API_KEY="210...w5y"
export TAVILY_API_KEY="tvly...E1R" # free API key with 1000 requests 
export LANGCHAIN_API_KEY="lsv2...599"
export KIMI_API_KEY="sk-...h59"
~~~

2. In the shell, press `Ctrl + C` to stop Jupyter Notebook
3. Load the API keys, then restart `jupyter notebook`

~~~bash
source ~/.bash_profile
jupyter notebook 
~~~

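For the full configuration, see [setup.sh](setup.sh).
If you use an IDE, set the environment variables above in the IDE's run configuration instead.

In [1]:
# Check that the API keys are present in the environment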
import os
print(f"ZHIPU_API_KEY\t: {os.environ['ZHIPU_API_KEY'][:5]}... ")
print(f"OPENAI_API_KEY\t: {os.environ['OPENAI_API_KEY'][:5]}...")
print(f"TAVILY_API_KEY\t: {os.environ['TAVILY_API_KEY'][:5]}...")
print(f"LANGCHAIN_API_KEY\t: {os.environ['LANGCHAIN_API_KEY'][:5]}...")
print(f"KIMI_API_KEY\t: {os.environ['KIMI_API_KEY'][:5]}...")

ZHIPU_API_KEY	: 210ba... 
OPENAI_API_KEY	: sk-mt...
TAVILY_API_KEY	: tvly-...
LANGCHAIN_API_KEY	: lsv2_...
KIMI_API_KEY	: sk-Jb...
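
The check above raises `KeyError` on the first missing variable. As a sketch of a gentler alternative, the helper below reports every key at once and prints only a short prefix of each value. The names `mask` and `check_keys` are hypothetical and not part of this project's `lib` package.

```python
import os

REQUIRED_KEYS = [
    "ZHIPU_API_KEY",
    "OPENAI_API_KEY",
    "TAVILY_API_KEY",
    "LANGCHAIN_API_KEY",
    "KIMI_API_KEY",
]


def mask(value: str, visible: int = 5) -> str:
    """Return only the first `visible` characters so the secret is never printed in full."""
    return f"{value[:visible]}..."


def check_keys(names, env=None):
    """Map each variable name to a masked value, or '<missing>' if it is unset or empty."""
    env = os.environ if env is None else env
    report = {}
    for name in names:
        value = env.get(name)
        report[name] = mask(value) if value else "<missing>"
    return report


if __name__ == "__main__":
    for name, status in check_keys(REQUIRED_KEYS).items():
        print(f"{name}\t: {status}")
```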


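### 2.3 Prepare LangSmith

Visit [https://smith.langchain.com/](https://smith.langchain.com/)

* Register an account
* Click `Setting`->`API Key` to create an API key, then add it to your environment variables (see the previous subsection)
* Click `Projects`->`Create New Project` to view the project setup code, which mainly sets a group of environment variables: LANGCHAIN_TRACING_V2, LANGCHAIN_ENDPOINT, LANGCHAIN_API_KEY, and LANGCHAIN_PROJECT

With this setup, traces are sent to the hosted LangSmith service; open the corresponding project on the website to view the tracing data.

If you would rather not send traces to the hosted service, you can run the official LangSmith Docker deployment and store them locally. See: [https://docs.smith.langchain.com/self_hosting/installation](https://docs.smith.langchain.com/self_hosting/installation)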

In [2]:
import os

os.environ['LANGCHAIN_TRACING_V2'] = 'false' # set as true to enable tracing
os.environ['LANGCHAIN_PROJECT'] = "investigate_self_correct_rag"
os.environ['LANGCHAIN_ENDPOINT'] = "https://api.smith.langchain.com"

print(f"LANGCHAIN_TRACING_V2\t: {os.environ['LANGCHAIN_TRACING_V2']}")
print(f"LANGCHAIN_PROJECT\t: {os.environ['LANGCHAIN_PROJECT']}")
print(f"LANGCHAIN_API_KEY\t: {os.environ['LANGCHAIN_API_KEY'][:5]}...")  # show only a short prefix of the secret
print(f"LANGCHAIN_ENDPOINT\t: {os.environ['LANGCHAIN_ENDPOINT']}")

LANGCHAIN_TRACING_V2	: false
LANGCHAIN_PROJECT	: investigate_self_correct_rag
LANGCHAIN_API_KEY	: lsv2_...
LANGCHAIN_ENDPOINT	: https://api.smith.langchain.com


### 2.4 Knowledge base index

This step requires `zhipuai` and `langchain_community`; after installing them, restart the kernel so they can be imported.

Reference documentation:

* ZhipuAIEmbeddings: [https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.zhipuai.ZhipuAIEmbeddings.html](https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.zhipuai.ZhipuAIEmbeddings.html#langchain_community.embeddings.zhipuai.ZhipuAIEmbeddings)
* Chroma: [https://python.langchain.com/v0.2/docs/integrations/vectorstores/chroma/](https://python.langchain.com/v0.2/docs/integrations/vectorstores/chroma/)
* Load Html: [https://python.langchain.com/v0.2/docs/how_to/document_loader_html/](https://python.langchain.com/v0.2/docs/how_to/document_loader_html/)

In [3]:
### Build the index
from langchain.text_splitter import RecursiveCharacterTextSplitter
from lib.util.LLMUtils import EmbeddingUtil
from lib.util.LLMUtils import LLMVendor

# Initialize the embedding model
embd = EmbeddingUtil.getModel(LLMVendor.ZHIPU)

# Directory containing the files to load
file_dir = "../data/file"

from langchain_community.document_loaders import TextLoader
txt_files = [
    f"{file_dir}/2023-03-15-prompt-engineering.txt",
    f"{file_dir}/2023-06-23-agent.txt",
    f"{file_dir}/2023-10-25-adv-attack-llm.txt",
]
docs = [TextLoader(text_file).load() for text_file in txt_files]
docs_list = [item for sublist in docs for item in sublist]

# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=512, chunk_overlap=0)
doc_splits = text_splitter.split_documents(docs_list)
len(doc_splits)


# Load files (alternative loaders, commented out)
# from langchain_community.document_loaders import WebBaseLoader
# loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
# docs_list = loader.load()

# from langchain_community.document_loaders import BSHTMLLoader
# backlog_file_dir="../backlog/data/file"
# loader = BSHTMLLoader(file_path=f"{backlog_file_dir}/2023-03-15-prompt-engineering.html")
# docs_list = loader.load()

# from langchain_community.document_loaders import MHTMLLoader
# backlog_file_dir="../backlog/data/file"
# loader = MHTMLLoader(file_path=f"{backlog_file_dir}/2023-03-15-prompt-engineering.mhtml")
# docs_list = loader.load()

ZHIPU_API_KEY	: 210ba... 


62
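
To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sliding-window splitter over a list of tokens. It is not how `RecursiveCharacterTextSplitter` works internally (the real splitter also tries to break on separators such as paragraphs and sentences); `split_with_overlap` is a hypothetical helper used only to show what the two parameters control.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap=0):
    """Cut `tokens` into windows of `chunk_size`, each sharing `chunk_overlap` items with the previous one."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
        start += step
    return chunks


if __name__ == "__main__":
    print(split_with_overlap(list(range(10)), chunk_size=4, chunk_overlap=0))
    print(split_with_overlap(list(range(10)), chunk_size=4, chunk_overlap=1))
```

With `chunk_overlap=0`, as used above, no text is duplicated across chunks, at the cost of occasionally cutting a thought in two at a chunk boundary.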

In [4]:
from lib.util.VectorStoreUtil import VectorStoreUtil
from lib.util.VectorStoreUtil import VectorStoreVendor
from lib.util.LLMUtils import EmbeddingUtil,LLMVendor

# Base directory for the vector store dump files
vectorstore_dump_base_dir="../data/vector_store"

# Initialization
embd = EmbeddingUtil.getModel(LLMVendor.ZHIPU)
vectorstore_wrapper = VectorStoreUtil.create_wrapper(VectorStoreVendor.FAISS)

ZHIPU_API_KEY	: 210ba... 


In [5]:
# Embed the documents, add them to the vector store, and back it up to a local dump file
# vectorstore_wrapper.init_from_docs(docs=doc_splits, embedding=embd)
# vectorstore_wrapper.trigger_dump(base_dir=vectorstore_dump_base_dir)

load_chunk: 0
load_chunk: 1
load_chunk: 2
load_chunk: 3
load_chunk: 4
load_chunk: 5
load_chunk: 6
vector store load complete
vector store dump triggered, dir: ../data/vector_store/faiss_dump


In [5]:
# Load the vector store from the dump files
vectorstore_wrapper.init_from_dump(embedding=embd, base_dir=vectorstore_dump_base_dir)

# Expose the LangChain objects for use by the other modules
vectorstore = vectorstore_wrapper.get_vector_store()
retriever = vectorstore.as_retriever()
retriever

load from ../data/vector_store/faiss_dump
load compete


VectorStoreRetriever(tags=['FAISS', 'ZhipuAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x10a35ac30>)
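
Conceptually, a retriever embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below shows that idea with cosine similarity over tiny hand-written vectors. It is an illustration only: FAISS uses its own index structures and distance metric, and `cosine` and `top_k` are hypothetical helpers, not part of LangChain.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, doc_vecs, k=4):
    """Return the ids of the k documents most similar to the query, best first."""
    scored = sorted(
        ((cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:k]]


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(top_k([1.0, 0.1], docs, k=2))
```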

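## 3. RAG submodules
### 3.1 LLM query intent analysis and routing

A router chooses between tools: the LLM decides which route to take based on the user's question.

Reference documentation:

* ChatGLM with the LangChain framework: [https://open.bigmodel.cn/dev/api#langchain_sdk](https://open.bigmodel.cn/dev/api#langchain_sdk)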
* Langchain tool calling: [https://python.langchain.com/v0.2/docs/how_to/tool_calling/](https://python.langchain.com/v0.2/docs/how_to/tool_calling/)


In [7]:
from langchain_core.messages import SystemMessage
from langchain_core.tools import tool
from langchain.prompts import (ChatPromptTemplate, HumanMessagePromptTemplate)
from lib.util.LLMUtils import LLMVendor
from lib.util.LLMUtils import ChatModelUtil

# Data models: the tools only describe the routes; their bodies are never executed
@tool
def web_search(query: str):
    """The internet. Use web_search for questions that are related to anything else than agents, prompt engineering, and adversarial attacks.

    Args: 
        query: The query to use when searching the internet.
    """
    return

@tool
def vectorstore(query: str):
    """A vectorstore containing documents related to agents, prompt engineering, and adversarial attacks. Use the vectorstore for questions on these topics.
    
    Args:
        query: The query to use when searching the vectorstore.
    """
    return

# LLM with the routing tools bound
llm = ChatModelUtil.getChatOpenAIModel(vendor=LLMVendor.ZHIPU, temperature=0)
structured_llm_router = llm.bind_tools(tools=[web_search, vectorstore])

# Prompt
route_prompt = ChatPromptTemplate(
    messages=[
        SystemMessage(content = """You are an expert at routing a user question to a vectorstore or web search.\n The vectorstore contains documents related to agents, prompt engineering, and adversarial attacks. \n Use the vectorstore for questions on these topics. Otherwise, use web-search."""), 
        HumanMessagePromptTemplate.from_template("{question}")
    ]
)

question_router = route_prompt | structured_llm_router


In [8]:
response = question_router.invoke(
    {"question": "Who will the Bears draft first in the NFL draft?"}
)
print(response.additional_kwargs["tool_calls"])

[{'id': 'call_202408101809530993f5092dcd47e3', 'function': {'arguments': '{"query": "Who will the Bears draft first in the NFL draft?"}', 'name': 'web_search'}, 'type': 'function', 'index': 0}]


In [9]:
response = question_router.invoke({"question": "What are the types of agent memory?"})
print(response.additional_kwargs["tool_calls"])

[{'id': 'call_2024081018095675c5f3db2d334e07', 'function': {'arguments': '{"query": "types of agent memory"}', 'name': 'vectorstore'}, 'type': 'function', 'index': 0}]


In [10]:
response = question_router.invoke({"question": "Hi how are you?"})
print("tool_calls" in response.additional_kwargs)

False
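
The three responses above show the shape of the routing decision: a `tool_calls` list whose first entry names the chosen tool and carries a JSON-encoded `arguments` string, or no `tool_calls` key at all when the model answers directly. A sketch of reading that structure defensively follows. `parse_route` is a hypothetical helper; the dictionary layout is taken from the printed outputs above.

```python
import json


def parse_route(additional_kwargs: dict, default: str = "llm_fallback"):
    """Return (route_name, query) from a tool-calling response, or (default, None) if no tool was chosen."""
    tool_calls = additional_kwargs.get("tool_calls") or []
    if not tool_calls:
        return default, None
    function = tool_calls[0]["function"]
    arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
    return function["name"], arguments.get("query")


if __name__ == "__main__":
    sample = {
        "tool_calls": [
            {
                "function": {
                    "arguments": '{"query": "types of agent memory"}',
                    "name": "vectorstore",
                },
                "type": "function",
            }
        ]
    }
    print(parse_route(sample))
    print(parse_route({}))
```

Using `.get("tool_calls") or []` treats a missing key and an empty list the same way, so both cases fall back to the LLM route.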


### 3.2 Grading the relevance of retrieved documents to the question

In [11]:
from lib.util.LLMUtils import ChatModelUtil
from lib.util.LLMUtils import LLMVendor
from langchain_core.messages import SystemMessage
from langchain.prompts import (ChatPromptTemplate, HumanMessagePromptTemplate)
from lib.util.StructureOutputUtil import YesOrNoUtil

# LLM
llm = ChatModelUtil.getChatOpenAIModel(LLMVendor.ZHIPU, temperature=0) # model='glm-4-0520', 'glm-4-air'

# Structured output
structured_llm_grader = llm.with_structured_output(
    schema=YesOrNoUtil.YesOrNo, method="json_mode", include_raw=False)

# Assemble into a chain
grade_prompt = ChatPromptTemplate(
    messages=[
        SystemMessage(content=f"""You are a grader assessing relevance of a retrieved document to a user question. \n If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n {YesOrNoUtil.json_mode_prompt(yes_means="the document is relevant to the question")}."""), 
        HumanMessagePromptTemplate.from_template(
            "Retrieved document: \n\n {document} \n\n User question: {question}")
    ]
)
retrieval_grader = grade_prompt | structured_llm_grader

In [12]:
question = "introduce prompt engineering"
docs = retriever.invoke(question)
doc_txt = docs[0].page_content
f'{doc_txt[:500]}...'

'Prompt Engineering\n\nDate: March 15, 2023 | Estimated Reading Time: 21 min | Author: Lilian Weng\nTable of Contents\nPrompt Engineering, also known as In-Context Prompting, refers to methods for how to communicate with LLM to steer its behavior for desired outcomes without updating the model weights. It is an empirical science and the effect of prompt engineering methods can vary a lot among models, thus requiring heavy experimentation and heuristics.\n\nThis post only focuses on prompt engineering f...'

In [13]:
response = retrieval_grader.invoke({"question": "introduce prompt engineering", "document": doc_txt})
print(response)

binary_score='yes'


In [14]:
response = retrieval_grader.invoke({"question": "types of ice cream", "document": doc_txt})
print(response)

binary_score='no'
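
The source of `YesOrNoUtil` is not shown in this notebook. Judging only from the printed `binary_score='yes'` / `binary_score='no'` results, the grader returns an object with a single `binary_score` field. The sketch below shows what validating such a JSON reply could look like using only the standard library. The class and function here are hypothetical stand-ins, not the project's actual implementation.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class YesOrNo:
    """Hypothetical stand-in for the grader's structured result."""
    binary_score: str


def parse_yes_or_no(raw: str) -> YesOrNo:
    """Parse a JSON reply such as '{"binary_score": "yes"}' and reject anything else."""
    data = json.loads(raw)
    score = str(data.get("binary_score", "")).strip().lower()
    if score not in ("yes", "no"):
        raise ValueError(f"expected 'yes' or 'no', got {score!r}")
    return YesOrNo(binary_score=score)


if __name__ == "__main__":
    print(parse_yes_or_no('{"binary_score": "Yes"}'))
```

Rejecting unexpected values early keeps a malformed model reply from being silently treated as "no" further down the graph.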


### 3.3 Answering with the LLM: (1) RAG chain

Once we have the documents from the vector store and the user's question, we can use them to answer the question.

In [15]:
from langchain_core.output_parsers import StrOutputParser
from lib.util.LLMUtils import ChatModelUtil
from langchain_core.messages import SystemMessage,HumanMessage

# LLM
llm = ChatModelUtil.getChatOpenAIModel(LLMVendor.ZHIPU, temperature=0)

# Prompt
prompt = lambda x: ChatPromptTemplate.from_messages(
    [
        SystemMessage(content="""You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. \n Answer 'I don't know' if you do not have information to answer this question. \n Use three sentences maximum and keep the answer concise if you have information to answer this question."""),
        HumanMessage(
            f"Question: {x['question']} \nAnswer: ",
            additional_kwargs={"documents": x["documents"]},
        )
    ]
)
# Chain
rag_chain = prompt | llm | StrOutputParser()

In [16]:
# test: a question covered by the documents
question = "introduce prompt engineering"
generation = rag_chain.invoke({"documents": docs, "question": question})
print(generation)

Prompt engineering is the process of crafting and fine-tuning natural language prompts to guide AI models like GPT-3 towards generating desired outputs. It involves understanding the model's capabilities and limitations to elicit more accurate, relevant, and contextually appropriate responses. This technique is crucial for improving the performance and utility of AI in various applications.


In [17]:
# test: a question the model cannot answer
question = "What is the job of the person named RJXACAGEDSG LAMUX?"
generation = rag_chain.invoke({"documents": docs, "question": question})
print(generation)

I don't know.
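
The prompt above hands the documents to the model through `additional_kwargs`. Whether the chat model returned by `ChatModelUtil` actually reads that field is not shown in this notebook; some providers consume it, while others may ignore it, in which case the model would answer from its own knowledge rather than from the retrieved context. A sketch of an alternative that places the context directly in the message text follows. `Doc` and `build_human_message` are hypothetical names, and the `max_chars` truncation is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (only `page_content` is used here)."""
    page_content: str


def build_human_message(question: str, documents: List[Doc], max_chars: int = 2000) -> str:
    """Inline the (truncated) documents as numbered context ahead of the question."""
    context = "\n\n".join(
        f"[{i + 1}] {doc.page_content[:max_chars]}" for i, doc in enumerate(documents)
    )
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer: "


if __name__ == "__main__":
    print(build_human_message("introduce prompt engineering", [Doc("Prompt Engineering ...")]))
```

The returned string could be used as the content of the human message in place of the `additional_kwargs` approach.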


### 3.4 Answering with the LLM: (2) non-RAG chain

In [18]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import HumanMessage, SystemMessage

# llm
llm = ChatModelUtil.getChatOpenAIModel(LLMVendor.ZHIPU, temperature=0)

# prompt
prompt = lambda x: ChatPromptTemplate.from_messages(
    [
        SystemMessage(content="""You are an assistant for question-answering tasks. Answer the question based upon your knowledge. Use three sentences maximum and keep the answer concise."""),
        HumanMessage(f"Question: {x['question']} \nAnswer: ")
    ]
)
# chain
llm_chain = prompt | llm | StrOutputParser()

In [19]:
question = "Who are you?"
generation = llm_chain.invoke({"question": question})
print(generation)

I am an AI assistant designed to provide information and answer questions based on the knowledge programmed into me. I don't have personal experiences or consciousness. How can I assist you today?


### 3.5 Hallucination grading

In [20]:
from lib.util.StructureOutputUtil import YesOrNoUtil

# llm
llm = ChatModelUtil.getChatOpenAIModel(LLMVendor.ZHIPU, temperature=0)
structured_llm_grader = llm.with_structured_output(YesOrNoUtil.YesOrNo, method='json_mode', include_raw=False)

# prompt
hallucination_prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(content=f"""You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. \n Respond using a JSON that contains the key 'binary_score' with the value being 'yes' or 'no' \n {YesOrNoUtil.json_mode_prompt(yes_means="the answer is grounded in / supported by the set of facts.")}"""),
        HumanMessagePromptTemplate.from_template(
            "LLM generation: \n\n {generation} \n\n"
            "Set of facts: \n\n {documents}"),
    ]
)
reliability_grader = hallucination_prompt | structured_llm_grader

In [21]:
# test 1: an answer supported by the documents
response = reliability_grader.invoke({"documents": docs, "generation": "Prompt engineering is the process of crafting and fine-tuning natural language prompts to guide AI models like GPT-3 towards generating desired outputs. It involves understanding the model's capabilities and limitations to elicit more accurate, relevant, and contextually appropriate responses. This technique is crucial for improving the performance and utility of AI in various applications."})

print(f'Reliability: {response}\n')
print(f'Reference documents: \n {docs[0].to_json().get("kwargs")}\n'[:300])

Reliability: binary_score='yes'

Reference documents: 
 {'metadata': {'source': '../data/file/2023-03-15-prompt-engineering.txt'}, 'page_content': "Prompt Engineering\n\nDate: March 15, 2023 | Estimated Reading Time: 21 min | Author: Lilian Weng\nTable of Contents\nPrompt Engineering, also known as In-Context Prompting, refers to m


In [22]:
# test 2: an answer not supported by the documents

response = reliability_grader.invoke({"documents": docs, "generation": "Prompt engineering is a technique for promoting and motivating subordinates."})

print(f'Reliability: {response}\n')
print(f'Reference documents: \n {docs[0].to_json().get("kwargs")}\n'[:300])

Reliability: binary_score='no'

Reference documents: 
 {'metadata': {'source': '../data/file/2023-03-15-prompt-engineering.txt'}, 'page_content': "Prompt Engineering\n\nDate: March 15, 2023 | Estimated Reading Time: 21 min | Author: Lilian Weng\nTable of Contents\nPrompt Engineering, also known as In-Context Prompting, refers to m


### 3.6 Answer quality grading

In [23]:
from lib.util.StructureOutputUtil import YesOrNoUtil

# LLM with structured output
llm = ChatModelUtil.getChatOpenAIModel(LLMVendor.ZHIPU, temperature=0)
structured_llm_grader = llm.with_structured_output(YesOrNoUtil.YesOrNo, method='json_mode', include_raw=False)

# prompt
answer_prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(content=f"""You are a grader assessing whether an answer addresses / resolves a question \n {YesOrNoUtil.json_mode_prompt(yes_means="the answer resolves the question")}"""),
        HumanMessagePromptTemplate.from_template(
            "User question: \n\n {question} \n\n LLM generation: {generation}")
    ]
)

# chain
answer_relevance_grader = answer_prompt | structured_llm_grader

In [24]:
# test
response=answer_relevance_grader.invoke({"question": "who am i", "generation": "I am an AI assistant designed to provide information and answer questions based on the knowledge programmed into me. I don't have personal experiences or consciousness. How can I assist you today?"})
response

YesOrNo(binary_score='no')

In [25]:
response=answer_relevance_grader.invoke({"question": "who are u", "generation": "I am an AI assistant designed to provide information and answer questions based on the knowledge programmed into me. I don't have personal experiences or consciousness. How can I assist you today?"})
response

YesOrNo(binary_score='yes')

In [26]:
response=answer_relevance_grader.invoke({"question": "introduce prompt engineering", "generation": "Prompt engineering is the process of crafting and fine-tuning natural language prompts to guide AI models like GPT-3 towards generating desired outputs. It involves understanding the model's capabilities and limitations to elicit more accurate, relevant, and contextually appropriate responses. This technique is crucial for improving the performance and utility of AI in various applications."})
response

YesOrNo(binary_score='yes')

### 3.7 Web search tool

In [27]:
from langchain_community.tools.tavily_search import TavilySearchResults

# check API key
print(os.environ['TAVILY_API_KEY'][:5])

# sample search
web_search_tool = TavilySearchResults()

tvly-


In [35]:
# search_result=web_search_tool.invoke("Tom and Jerry")
# search_result[0]

tvly-


{'url': 'https://www.facebook.com/WarnerBrosME/videos/tom-jerry-official-trailer/492448011730010/',
 'content': 'Tom and Jerry take their cat and mouse game to the big screen. Watch the trailer for the new #TomAndJerryMovie now - coming to theaters 2021.'}

## 4. Graph components

Capture the workflow as a graph.

### 4.1 Define the graph state

#### (1) graph state

In [28]:
from typing_extensions import TypedDict
from typing import List

class GraphState(TypedDict):
    """
    表示图表的状态。在Graph运行过程中，存储各个Node产生的数据，
    - 用作Node的输入和输出。
    - 用作Edge的输入，帮助Edge决定路由到哪个Node上

    属性：
    - question：用户提问
    - generation：LLM生成的答案
    - documents：从向量数据库中检索到的文档列表
    """

    question: str
    generation: str
    documents: List[str]
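
Each node returns a dictionary containing only the keys it wants to change. Assuming LangGraph's default behaviour for a `TypedDict` state with no custom reducers, those keys overwrite the matching entries in the state and everything else is carried over. The sketch below mimics that merge in plain Python; `apply_update` is a hypothetical helper, not a LangGraph API.

```python
def apply_update(state: dict, update: dict) -> dict:
    """Return a new state in which the keys from `update` replace those in `state`."""
    return {**state, **update}


if __name__ == "__main__":
    state = {"question": "What are the types of agent memory?", "documents": ["doc-1", "doc-2"]}
    # A grading node keeps one document; a generation node then adds the answer.
    state = apply_update(state, {"documents": ["doc-1"]})
    state = apply_update(state, {"generation": "Sensory, short-term and long-term memory."})
    print(state)
```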

### 4.2 Define the graph nodes

#### (1) retrieve: query the vector store

In [29]:
from langchain.schema import Document

def retrieve(state):
    """
    Retrieve documents related to the user's question from the vector store.

    Args: state (dict): the current graph state
    Returns: state (dict): the state with a new `documents` key holding the retrieved documents
    """
    print("---RETRIEVE---")
    question = state["question"]

    # Retrieval
    documents = retriever.invoke(question)
    
    # Return the updated graph state
    return {"documents": documents, "question": question}

#### (2) generate: generate an answer with the LLM and the retrieved documents

In [30]:
def generate(state):
    """ 
    Generate an answer from the retrieved documents.

    Args: state (dict): the current graph state
    Returns: state (dict): the state with a new `generation` key holding the LLM generation
    """
    print("---GENERATE---")
    question = state["question"]
    documents = state["documents"]
    if not isinstance(documents, list):
        documents = [documents]

    # RAG generation
    generation = rag_chain.invoke({"documents": documents, "question": question})
    return {"documents": documents, "question": question, "generation": generation}

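#### (3) llm_fallback: generate an answer with the LLM only

In [31]:
def llm_fallback(state):
    """
    Generate an answer using only the LLM.

    Args: state (dict): the current graph state
    Returns: state (dict): the state with a new `generation` key holding the LLM generation
    """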
    print("---LLM Fallback---")
    question = state["question"]
    generation = llm_chain.invoke({"question": question})
    print(f"question: {question}")
    print(f"generation: {generation}")
    return {"question": question, "generation": generation}

#### (4) grade_documents: check whether the documents are relevant to the question

In [32]:
def grade_documents(state):
    """
    Determine whether the retrieved documents are relevant to the question.

    Args: state (dict): the current graph state
    Returns: state (dict): the state with the `documents` key updated to hold only the relevant documents
    """

    print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
    question = state["question"]
    documents = state["documents"]
    print(f"Question: {question}")
    print(f"Documents: {documents}"[:300])

    # Grade each document
    filtered_docs = []
    for d in documents:
        score = retrieval_grader.invoke(
            {"question": question, "document": d.page_content}
        )
        grade = score.binary_score
        if grade == "yes":
            print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        else:
            print("---GRADE: DOCUMENT NOT RELEVANT---")
            continue
    return {"documents": filtered_docs, "question": question}

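#### (5) web_search: web search

In [33]:
def web_search(state):
    """
    Run a web search for the question.

    Args: state (dict): the current graph state
    Returns: state (dict): the state with the `documents` key replaced by the web results
    """

    print("---WEB SEARCH---")
    question = state["question"]

    # Web search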
    docs = web_search_tool.invoke({"query": question})
    print(f"web_results: {docs}"[:300])
        
    web_results = "\n".join([d["content"] for d in docs])
    web_results = Document(page_content=web_results)
    return {"documents": web_results, "question": question}

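### 4.3 Define the graph edges

An edge takes the graph state as input and outputs a label that the graph maps to the key of the next node (nodes are registered in Section 5.1).

#### (1) route_question: route the question

Decide where to route the question (web_search / vectorstore / llm).

In [34]:
def route_question(state):
    """
    Route the question to web search, the vector store, or the LLM.

    Args: state (dict): the current graph state
    Returns: str: the next node to call
    """

    print("---ROUTE QUESTION---")
    question = state["question"]
    source = question_router.invoke({"question": question})

    # If the router made no tool call, fall back to the LLM; an empty tool call list is an error
    if "tool_calls" not in source.additional_kwargs:
        print("---ROUTE QUESTION TO LLM---")
        return "llm_fallback"
    if len(source.additional_kwargs["tool_calls"]) == 0:
        raise ValueError("Router could not decide on a source")

    # Choose the data source
    datasource = source.additional_kwargs["tool_calls"][0]["function"]["name"]
    if datasource == "web_search":
        print("---ROUTE QUESTION TO WEB SEARCH---")
        return "web_search"
    elif datasource == "vectorstore":
        print("---ROUTE QUESTION TO VECTORSTORE---")
        return "vectorstore"
    else:
        # Unknown tool name: fall back to the LLM, matching the log message
        print("---ROUTE QUESTION TO LLM---")
        return "llm_fallback"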

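#### (2) decide_to_generate: decide whether to generate an answer

In [35]:
def decide_to_generate(state):
    """
    Decide whether to generate an answer or fall back to web search.

    Args: state (dict): the current graph state
    Returns: str: binary decision on which node to call next
    """

    print("---ASSESS GRADED DOCUMENTS---")
    question = state["question"]
    filtered_documents = state["documents"]

    if not filtered_documents:
        # Every document was filtered out by the relevance check,
        # so fall back to web search
        print("---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, WEB SEARCH---")
        return "web_search"
    else:
        # We have relevant documents, so generate an answer
        print("---DECISION: GENERATE---")
        return "generate"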

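#### (3) grade_generation_v_documents_and_question: grade the generation against the documents and the question

In [36]:
def grade_generation_v_documents_and_question(state):
    """
    Determine whether the generation is grounded in the documents and answers the question.

    Args: state (dict): the current graph state
    Returns: str: the decision on which node to call next
    """

    print("---CHECK HALLUCINATIONS---")
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    print(f"question: {question}")
    print(f"generation: {generation}"[:300])

    score = reliability_grader.invoke(
        {"documents": documents, "generation": generation}
    )
    grade = score.binary_score

    # Check for hallucination
    if grade == "yes":
        print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
        # Check whether the answer addresses the question
        print("---GRADE GENERATION vs QUESTION---")
        score = answer_relevance_grader.invoke({"question": question, "generation": generation})
        grade = score.binary_score
        if grade == "yes":
            print("---DECISION: GENERATION ADDRESSES QUESTION---")
            return "useful"
        else:
            print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
            return "not useful"
    else:
        print("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")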
        return "not supported"

## 5. Build and run the graph

### 5.1 Build the graph

In [43]:
import pprint

from langgraph.graph import END, StateGraph

workflow = StateGraph(GraphState)

# Define the nodes and give each one a name (node key)
workflow.add_node("web_search", web_search)             # web search
workflow.add_node("retrieve", retrieve)                 # retrieval
workflow.add_node("grade_documents", grade_documents)   # document grading
workflow.add_node("generate", generate)                 # RAG
workflow.add_node("llm_fallback", llm_fallback)         # llm

# Build the graph
workflow.set_conditional_entry_point(
    route_question,
    {
        "web_search": "web_search",
        "vectorstore": "retrieve",
        "llm_fallback": "llm_fallback",
    },
)
workflow.add_edge("web_search", "generate")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "web_search": "web_search",
        "generate": "generate",
    },
)
workflow.add_conditional_edges(
    "generate",
    grade_generation_v_documents_and_question,
    {
        "not supported": "generate",  # hallucination: not supported by the documents or ground truth
        "not useful": "web_search",   # the generated answer does not address the question
        "useful": END,
    },
)
workflow.add_edge("llm_fallback", END)

# Compile
app = workflow.compile()
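
In this wiring, `"not supported"` loops back to `generate` and `"not useful"` loops through `web_search`, with no limit on how often either happens. LangGraph does stop a run once its recursion limit is reached, but that surfaces as an error rather than as an answer. One possible guard is sketched below: it assumes a hypothetical `attempts` counter added to `GraphState`, incremented by the `generate` node, and a hypothetical extra `"give_up"` label mapped to `END`. None of these exist in the graph above.

```python
def count_attempt(state: dict) -> dict:
    """State update a generate node could return alongside its other keys (hypothetical `attempts` field)."""
    return {"attempts": state.get("attempts", 0) + 1}


def capped_decision(label: str, state: dict, max_attempts: int = 3) -> str:
    """Turn a retry label into 'give_up' once the attempt budget is spent; pass other labels through."""
    if label != "useful" and state.get("attempts", 0) >= max_attempts:
        return "give_up"
    return label


if __name__ == "__main__":
    state = {"question": "q"}
    for _ in range(3):
        state.update(count_attempt(state))
    print(capped_decision("not supported", state))  # budget spent
    print(capped_decision("useful", state))         # success is never overridden
```

An edge function would call `capped_decision` on the label it was about to return; keeping the counter in the state, rather than in a global, means each run of the graph starts from zero.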

### 5.2 Tests

#### (1) A question that needs a web search to answer

In [44]:
inputs = {
    "question": "Who is the fastest man on earth?"
}
for output in app.stream(inputs):
    for key, value in output.items():
        # Node
        pprint.pprint(f"Node '{key}':")
        # Optional: print the full state at each node
    pprint.pprint("\n---\n")

# Final generation
pprint.pprint(value["generation"])

---ROUTE QUESTION---
---ROUTE QUESTION TO WEB SEARCH---
---WEB SEARCH---
web_results: [{'url': 'https://www.nbcnewyork.com/paris-2024-summer-olympics/worlds-fastest-man-noah-lyles-wins-gold-mens-100m-photo-finish/5669070/', 'content': "Noah Lyles can claim the title of the 'World's Fastest Man' after winning gold in the men's 100m race in a true photo finish on Sunday. L
"Node 'web_search':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
question: Who is the fastest man on earth?
generation: As of my last update, the title of the fastest man on earth is held by Usain Bolt, who set the world record for the 100m sprint in 2009 with a time of 9.58 seconds. However, newer athletes like Yohan Blake and Justin Gatlin have also achieved very fast times. The current world record hol
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
"Node 'generate':"
'\n---\n'
('As of my last update, the title of the fastest man on earth is held by Usain '
 'Bolt, who set the world record for the 100m sprint in 2009 with a time of '
 '9.58 seconds. However, 
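
The test cells read the answer from `value` after the loop, which works because the last streamed update comes from a node that produced a `generation`. Judging from the loops above, each streamed item is a dictionary mapping a node name to that node's state update. A sketch of extracting the final answer without relying on the leftover loop variable follows. `final_generation` is a hypothetical helper, and the sample events only imitate the shape of the stream.

```python
def final_generation(events):
    """Return the most recent 'generation' seen in a stream of {node_name: state_update} dicts, or None."""
    last = None
    for event in events:
        for _node, update in event.items():
            if isinstance(update, dict) and "generation" in update:
                last = update["generation"]
    return last


if __name__ == "__main__":
    sample_events = [
        {"web_search": {"documents": ["..."], "question": "q"}},
        {"generate": {"documents": ["..."], "question": "q", "generation": "an answer"}},
    ]
    print(final_generation(sample_events))
```

Returning `None` when no node produced a generation makes an empty or failed run easy to detect, instead of raising a `NameError` or `KeyError`.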

#### (2) A question that needs the vector store to answer

In [38]:
# Run
inputs = {"question": "What are the types of agent memory?"}
for output in app.stream(inputs):
    for key, value in output.items():
        # Node
        pprint.pprint(f"Node '{key}':")
        # Optional: print the full state at each node
        # pprint.pprint(value["keys"], indent=2, width=80, depth=None)
    pprint.pprint("\n---\n")

# Final generation
pprint.pprint(value["generation"])

---ROUTE QUESTION---
---ROUTE QUESTION TO VECTORSTORE---
---RETRIEVE---
"Node 'retrieve':"
'\n---\n'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
Question: What are the types of agent memory?
Documents: [Document(metadata={'source': '../data/file/2023-06-23-agent.txt'}, page_content='Sensory Memory: This is the earliest stage of memory, providing the ability to retain impressions of sensory information (visual, auditory, etc) after the original stimuli have ended. Sensory memory typicall
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: GENERATE---
"Node 'grade_documents':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
question: What are the types of agent memory?
generation: Agent memory types include sensory memory, short-term memory, and long-term memory. Sensory memory briefly holds sensory information, short-term memory temporarily stores and manipulates information, and long-term memory provides permanent storage of information.
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
"Node 'generate':"
'\n---\n'
('Agent memory 

#### (3) A question the LLM answers directly

In [39]:
# Run
inputs = {"question": "Hello, who I am talking to?"}
for output in app.stream(inputs):
    for key, value in output.items():
        # Node
        pprint.pprint(f"Node '{key}':")
        # Optional: print the full state at each node
        # pprint.pprint(value["keys"], indent=2, width=80, depth=None)
    pprint.pprint("\n---\n")

# Final generation
pprint.pprint(value["generation"])

---ROUTE QUESTION---
---ROUTE QUESTION TO LLM---
---LLM Fallback---
question: Hello, who I am talking to?
generation: Hello, you're talking to an AI assistant designed to help with question-answering tasks. How can I assist you today?
"Node 'llm_fallback':"
'\n---\n'
("Hello, you're talking to an AI assistant designed to help with "
 'question-answering tasks. How can I assist you today?')
