# Use Case: Chatbots QuickStart

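We will walk through an example of how to design and implement an LLM-powered chatbot. Here are some of the high-level components we will use:

- `Chat Models`: The chatbot interface is based on messages rather than raw text, so it is best suited to Chat Models rather than LLMs.
- `Prompt Templates`: These simplify assembling prompts that combine default messages, user input, chat history, and (optionally) additional retrieved context.
- `Chat History`: This lets a chatbot "remember" past interactions and take them into account when answering follow-up questions.
- `Retrievers`: These are useful if you want to build a chatbot that can use up-to-date, domain-specific knowledge as context to augment its responses.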

## QuickStart

In [1]:
%pip install --upgrade --quiet langchain langchain-google-genai

In [2]:
from google.colab import userdata

API_KEY = userdata.get('API_KEY')

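Let's initialize the chat model:

*`gemini-pro` does not currently support `SystemMessage`, but the system message can be merged into the first human message in the sequence. If you want this behavior, set `convert_system_message_to_human` to `True`.*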

In [3]:
from langchain_google_genai import ChatGoogleGenerativeAI

# chat = ChatGoogleGenerativeAI(model="gemini-pro", google_api_key=API_KEY, convert_system_message_to_human=True)
chat = ChatGoogleGenerativeAI(model="gemini-1.5-pro-latest", google_api_key=API_KEY)
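To make the idea of merging a system message into the first human message concrete, here is a minimal, library-free sketch. The `(role, content)` tuples and the `fold_system_into_first_human` helper are hypothetical stand-ins for illustration; the exact way the integration combines the two messages may differ.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content); a hypothetical stand-in for message objects


def fold_system_into_first_human(messages: List[Message]) -> List[Message]:
    """Merge a leading system message into the first human message."""
    if not messages or messages[0][0] != "system":
        return list(messages)
    system_text = messages[0][1]
    rest = list(messages[1:])
    for i, (role, content) in enumerate(rest):
        if role == "human":
            # Assumption: the system text is simply prepended on its own line.
            rest[i] = ("human", f"{system_text}\n{content}")
            break
    else:
        # No human message to merge into, so send the system text as one.
        rest.insert(0, ("human", system_text))
    return rest


folded = fold_system_into_first_human(
    [("system", "You are a helpful assistant."), ("human", "hi!")]
)
print(folded)
```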

If we invoke the chat model, the output is an `AIMessage`:

In [4]:
from langchain_core.messages import HumanMessage

response = chat.invoke(
    [
        HumanMessage(
            content="Translate this sentence from English to French: I love programming."
        )
    ]
)
response

AIMessage(content="J'adore programmer. \n", response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': [{'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}]}, id='run-7c90f01a-2e0b-44b3-96b0-e5eeba0cdc59-0')

In [5]:
print(response.content)

J'adore programmer. 



The model on its own has no concept of state. For example, if you ask a follow-up question:

In [6]:
response = chat.invoke([HumanMessage(content="What did you just say?")])
print(response.content)

I introduced myself as Gemini, a large language model created by Google. I also mentioned that my knowledge is only current up to November 2023 and that I will try my best to follow your instructions while prioritizing safety. Is there anything specific you would like to know or have me do? 



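We can see that it doesn't take the previous conversation turn into context, and so it cannot answer the question.

To get around this, we need to pass the entire conversation history into the model. Let's see what happens when we do that: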

In [7]:
from langchain_core.messages import AIMessage

response = chat.invoke(
    [
        HumanMessage(
            content="Translate this sentence from English to French: I love programming."
        ),
        AIMessage(content="J'adore la programmation."),
        HumanMessage(content="What did you just say?"),
    ]
)
print(response.content)

I said, "J'adore la programmation," which is French for "I love programming." 



Now we get a good response!

This is the basic idea underpinning a chatbot's ability to interact conversationally.
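The same contrast can be reproduced without an API key. The sketch below uses a hypothetical `toy_model` function as a stateless stand-in for the chat model: it can only answer from the messages it receives in that call.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content); a hypothetical stand-in for message objects


def toy_model(messages: List[Message]) -> str:
    """Stateless stand-in for a chat model: it sees only what is passed in."""
    last_user = messages[-1][1]
    if last_user == "What did you just say?":
        previous = [text for role, text in messages[:-1] if role == "ai"]
        return f'I said, "{previous[-1]}"' if previous else "I have not said anything yet."
    return f"Echo: {last_user}"


# Only the follow-up is sent, so there is nothing to refer back to.
without_history = toy_model([("human", "What did you just say?")])

# The full conversation is sent, so the earlier reply is visible.
with_history = toy_model(
    [
        ("human", "Translate this sentence from English to French: I love programming."),
        ("ai", "J'adore la programmation."),
        ("human", "What did you just say?"),
    ]
)

print(without_history)
print(with_history)
```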

## Prompt templates

Let's define a prompt template to make formatting a bit easier. We can create a chain by piping the prompt into the model:

In [8]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = prompt | chat

The `MessagesPlaceholder` above inserts the chat messages passed to the chain under the `messages` key directly into the prompt. We can then invoke the chain like this:

In [9]:
response = chain.invoke(
    {
        "messages": [
            HumanMessage(
                content="Translate this sentence from English to French: I love programming."
            ),
            AIMessage(content="J'adore la programmation."),
            HumanMessage(content="What did you just say?"),
        ],
    }
)
print(response.content)

I said, "J'adore la programmation," which is French for "I love programming." 
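Conceptually, the template produces a message list made of the system message followed by whatever was passed under `messages`. The following library-free sketch illustrates that shape; `format_prompt` and the `(role, content)` tuples are hypothetical stand-ins, not the library's implementation.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (role, content); a hypothetical stand-in for message objects

SYSTEM_TEXT = "You are a helpful assistant. Answer all questions to the best of your ability."


def format_prompt(inputs: Dict[str, List[Message]]) -> List[Message]:
    """Return the system message followed by everything in inputs["messages"]."""
    return [("system", SYSTEM_TEXT), *inputs["messages"]]


formatted = format_prompt(
    {
        "messages": [
            ("human", "Translate this sentence from English to French: I love programming."),
            ("ai", "J'adore la programmation."),
            ("human", "What did you just say?"),
        ]
    }
)
for role, content in formatted:
    print(f"{role}: {content}")
```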



## Message history

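To make chat history easier to manage, we can use a `MessageHistory` class, which is responsible for storing and loading chat messages. There are many built-in integrations that persist messages to a variety of databases, but for this quickstart we will use an in-memory demo message history called `ChatMessageHistory`.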

In [10]:
from langchain.memory import ChatMessageHistory

demo_ephemeral_chat_history = ChatMessageHistory()
demo_ephemeral_chat_history.add_user_message("hi!")
demo_ephemeral_chat_history.add_ai_message("whats up?")
demo_ephemeral_chat_history

InMemoryChatMessageHistory(messages=[HumanMessage(content='hi!'), AIMessage(content='whats up?')])

In [11]:
demo_ephemeral_chat_history.messages

[HumanMessage(content='hi!'), AIMessage(content='whats up?')]

We can pass the stored messages directly into our chain as a parameter:

In [12]:
demo_ephemeral_chat_history.add_user_message(
    "Translate this sentence from English to French: I love programming."
)

response = chain.invoke({"messages": demo_ephemeral_chat_history.messages})
print(response.content)

J'adore la programmation. 



In [13]:
demo_ephemeral_chat_history.add_ai_message(response)
demo_ephemeral_chat_history.add_user_message("What did you just say?")

response = chain.invoke({"messages": demo_ephemeral_chat_history.messages})
print(response.content)

I said, "J'adore la programmation," which is French for "I love programming."
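The add-invoke-add pattern used in the last two cells can be wrapped in a small helper. This is a library-free sketch: `ToyChatHistory`, `run_turn`, and `counting_chain` are hypothetical names, with `counting_chain` standing in for `chain.invoke`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content); a hypothetical stand-in for message objects


@dataclass
class ToyChatHistory:
    """Hypothetical in-memory history with the same two add methods used above."""

    messages: List[Message] = field(default_factory=list)

    def add_user_message(self, text: str) -> None:
        self.messages.append(("human", text))

    def add_ai_message(self, text: str) -> None:
        self.messages.append(("ai", text))


def run_turn(
    history: ToyChatHistory,
    chain: Callable[[List[Message]], str],
    user_text: str,
) -> str:
    """Record the user message, call the chain on the full history, record the reply."""
    history.add_user_message(user_text)
    reply = chain(history.messages)
    history.add_ai_message(reply)
    return reply


def counting_chain(messages: List[Message]) -> str:
    # Stand-in for `chain.invoke`: reports how many messages it received.
    return f"I can see {len(messages)} message(s)."


toy_history = ToyChatHistory()
print(run_turn(toy_history, counting_chain, "hi!"))
print(run_turn(toy_history, counting_chain, "still there?"))
```

Because every turn goes through `run_turn`, the history can never fall out of step with what the model was shown.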



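We now have a basic chatbot!

While this chain can serve as a useful chatbot on its own with just the model's internal knowledge, it is often useful to introduce some form of retrieval-augmented generation (RAG) over domain-specific knowledge to make the chatbot more focused. We cover this next.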

## Retrievers

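We can set up and use a retriever to pull domain-specific knowledge for a chatbot. To show this, let's expand the simple chatbot we created above so that it can answer questions about LangSmith.

We will use the LangSmith documentation as source material and store it in a vector database for later retrieval. Note that this example glosses over some of the specifics of parsing and storing a data source.

Let's set up our retriever. First, we install some required dependencies: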

In [14]:
%pip install --upgrade --quiet langchain-chroma beautifulsoup4

Next, we use a document loader to pull data from a web page:

In [15]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://web.archive.org/web/20231216051131/https://docs.smith.langchain.com/overview")
data = loader.load()

Next, we split it into smaller chunks that the LLM's context window can handle:

In [16]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
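To get a feel for what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified fixed-window splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (which prefers to break on separators such as paragraphs and sentences); `split_fixed` is a hypothetical helper for illustration only.

```python
from typing import List


def split_fixed(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of at most chunk_size characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window start advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "LangSmith helps with testing. " * 40  # 1200 characters
chunks = split_fixed(sample, chunk_size=500, chunk_overlap=0)
print([len(chunk) for chunk in chunks])
```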

Then we embed those chunks and store them in a vector database:

In [17]:
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=API_KEY)
vectorstore = Chroma.from_documents(documents=all_splits, embedding=embeddings)

Finally, let's create a retriever from our initialized `vectorstore`:

In [18]:
# k is the number of chunks to retrieve; it is passed through search_kwargs
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

docs = retriever.invoke("how can langsmith help with testing?")
docs

[Document(page_content='inputs, and see what happens. At some point though, our application is performing\nwell and we want to be more rigorous about testing changes. We can use a dataset\nthat we’ve constructed along the way (see above). Alternatively, we could spend some\ntime constructing a small dataset by hand. For these situations, LangSmith simplifies', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://web.archive.org/web/20231216051131/https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),
 Document(page_content='for testing the overall flow, while single, modular LLM Chain or LLM/Chat Model examples can be beneficial for testing the simplest

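We can see that invoking the retriever above returns parts of the LangSmith docs that contain information about testing, which our chatbot can use as context when answering questions.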
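Under the hood, a vector-store retriever ranks stored chunks by their similarity to the query and returns the top `k`. The sketch below shows that ranking step with a toy word-count "embedding" and cosine similarity; `embed`, `cosine`, and `retrieve` are hypothetical helpers, and a real embedding model returns dense vectors instead.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Dict[str, int]:
    """Toy embedding: lowercase word counts (a real model returns dense vectors)."""
    return Counter(text.lower().split())


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b.get(word, 0) for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], k: int = 4) -> List[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True)
    return ranked[:k]


toy_chunks = [
    "langsmith simplifies testing changes with a dataset",
    "tracing shows the exact input sent to the model",
    "the weather is sunny today",
]
top = retrieve("how can langsmith help with testing", toy_chunks, k=2)
print(top)
```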

### Handling documents

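Let's modify our previous prompt to accept documents as context. We will use the `create_stuff_documents_chain` helper function to "stuff" all of the input documents into the prompt; it also conveniently handles formatting. We use the `ChatPromptTemplate.from_messages()` method to format the message input passed to the model, including a `MessagesPlaceholder` where the chat history messages are injected directly: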

In [19]:
from langchain.chains.combine_documents import create_stuff_documents_chain

question_answering_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the user's questions based on the below context:\n\n{context}",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

document_chain = create_stuff_documents_chain(chat, question_answering_prompt)

We can invoke this `document_chain` with the raw documents we retrieved above:

In [20]:
from langchain.memory import ChatMessageHistory

demo_ephemeral_chat_history = ChatMessageHistory()
demo_ephemeral_chat_history.add_user_message("how can langsmith help with testing?")

response = document_chain.invoke(
    {
        "messages": demo_ephemeral_chat_history.messages,
        "context": docs,
    }
)
response

"## LangSmith and Testing: A Powerful Combination\n\nLangSmith offers several features that can significantly enhance your testing process, making it more efficient and effective. Here's how: \n\n**1. Data Generation and Augmentation:**\n\n*   **Generating Test Cases:** LangSmith can help you create diverse and realistic test cases by generating text that follows specific patterns or criteria. This is particularly useful for testing NLP models, chatbots, and other language-based systems. \n*   **Data Augmentation:** Expanding your existing test data set is crucial for robust testing. LangSmith can augment your data by paraphrasing, translating, or generating similar examples, improving your model'sgeneralizability.\n\n**2. Identifying Bias and Errors:**\n\n*   **Bias Detection:** LangSmith can analyze your training and testing data to identify potential biases. This helps ensure your models are fair and unbiased, leading to more reliable results.\n*   **Error Analysis:**  By analyzing 

We can see the answer the model synthesized from the information in the input documents.
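"Stuffing" simply means concatenating the text of every document and substituting the result for `{context}`. A minimal sketch of that step follows; `stuff_documents` is a hypothetical helper, and the blank-line separator is an assumption made for this example.

```python
from typing import List

SYSTEM_TEMPLATE = "Answer the user's questions based on the below context:\n\n{context}"


def stuff_documents(page_contents: List[str], separator: str = "\n\n") -> str:
    """Join every document's text into one context string."""
    # Assumption: documents are separated by a blank line.
    return separator.join(page_contents)


toy_docs = [
    "LangSmith simplifies testing changes against a dataset.",
    "Datasets can be built along the way or by hand.",
]
system_message = SYSTEM_TEMPLATE.format(context=stuff_documents(toy_docs))
print(system_message)
```

Because every document is placed in the prompt, this approach only works while the combined text fits in the model's context window.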

### Creating a retrieval chain

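Next, let's integrate our retriever into the chain. The retriever should fetch information relevant to the last message the user sent, so we extract that message and use it as the query to fetch relevant documents, which we then add to the current chain as context. We pass the context plus the previous messages into our document chain to generate a final answer.

We also use the `RunnablePassthrough.assign()` method to pass intermediate steps through at each invocation. Here is what it looks like: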

In [21]:
demo_ephemeral_chat_history.messages

[HumanMessage(content='how can langsmith help with testing?')]

In [22]:
from typing import Dict

from langchain_core.runnables import RunnablePassthrough


def parse_retriever_input(params: Dict):
    return params["messages"][-1].content


retrieval_chain = RunnablePassthrough.assign(
    context=parse_retriever_input | retriever,
).assign(
    answer=document_chain,
)

response = retrieval_chain.invoke(
    {
        "messages": demo_ephemeral_chat_history.messages,
    }
)

response

{'messages': [HumanMessage(content='how can langsmith help with testing?')],
 'context': [Document(page_content='inputs, and see what happens. At some point though, our application is performing\nwell and we want to be more rigorous about testing changes. We can use a dataset\nthat we’ve constructed along the way (see above). Alternatively, we could spend some\ntime constructing a small dataset by hand. For these situations, LangSmith simplifies', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://web.archive.org/web/20231216051131/https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),
  Document(page_content='for testing the overall flow, while singl

In [23]:
print(response['answer'])

LangSmith offers several features to assist with testing your language model applications:

* **Dataset Construction and Management:** You can either build datasets as you go or create smaller, targeted datasets manually. LangSmith helps manage these datasets for testing purposes.
* **Simplified Testing:** The platform allows for testing both the overall flow of your application and individual components like LLM Chain or Chat Model examples. This flexibility caters to different testing needs.
* **Evaluation and Feedback:** While initial evaluation might be manual, LangSmith provides tools to track performance over time, identify underperforming data points, and associate feedback with specific runs. This data can then be used to refine your datasets and improve future testing. 



In [24]:
demo_ephemeral_chat_history.add_ai_message(response["answer"])
demo_ephemeral_chat_history.add_user_message("tell me more about that!")

response = retrieval_chain.invoke(
    {
        "messages": demo_ephemeral_chat_history.messages,
    },
)
response

{'messages': [HumanMessage(content='how can langsmith help with testing?'),
  AIMessage(content='LangSmith offers several features to assist with testing your language model applications:\n\n* **Dataset Construction and Management:** You can either build datasets as you go or create smaller, targeted datasets manually. LangSmith helps manage these datasets for testing purposes.\n* **Simplified Testing:** The platform allows for testing both the overall flow of your application and individual components like LLM Chain or Chat Model examples. This flexibility caters to different testing needs.\n* **Evaluation and Feedback:** While initial evaluation might be manual, LangSmith provides tools to track performance over time, identify underperforming data points, and associate feedback with specific runs. This data can then be used to refine your datasets and improve future testing. \n'),
  HumanMessage(content='tell me more about that!')],
 'context': [Document(page_content='however, there 

In [25]:
print(response['answer'])

## LangSmith's Testing and Evaluation Features: A Deeper Dive

**Dataset Management:**

* **Flexibility:**  LangSmith allows you to use datasets built organically during development or create specific datasets for focused testing. 
* **Organization:**  The platform helps you manage and organize these datasets, making it easy to access and utilize them for various testing scenarios.

**Testing Options:**

* **Holistic Testing:**  LangSmith enables testing of the entire application flow, ensuring all components work together seamlessly.
* **Modular Testing:**  You can also test individual modules like LLM Chain or Chat Model examples. This is useful for pinpointing issues within specific components and making targeted improvements.

**Evaluation and Feedback Mechanisms:**

* **Performance Tracking:**  LangSmith lets you monitor your application's performance over time, providing insights into trends and potential areas for improvement.
* **Data Point Identification:**  The platform helps

The chatbot can now answer domain-specific questions in a conversational way.
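The responses above contain `messages`, `context`, and `answer` because each `.assign()` step copies its input dict and adds one key. The library-free sketch below mimics that behavior; `assign`, `toy_retriever`, and `toy_document_chain` are hypothetical stand-ins, and messages are plain strings here for brevity.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def assign(**steps: Callable[[State], Any]) -> Callable[[State], State]:
    """Return a function that copies its input dict and adds one key per step."""

    def run(state: State) -> State:
        return {**state, **{key: step(state) for key, step in steps.items()}}

    return run


def last_message(state: State) -> str:
    return state["messages"][-1]


def toy_retriever(query: str) -> List[str]:
    return [f"doc about: {query}"]  # stand-in for `retriever`


def toy_document_chain(state: State) -> str:
    return f"answer from {len(state['context'])} doc(s)"  # stand-in for `document_chain`


add_context = assign(context=lambda state: toy_retriever(last_message(state)))
add_answer = assign(answer=toy_document_chain)

inputs = {"messages": ["how can langsmith help with testing?"]}
full = add_answer(add_context(inputs))
print(sorted(full))
print(full["answer"])
```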

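As an aside, if you don't want to return all the intermediate steps, you can define the retrieval chain by piping directly into the document chain instead of using the final `.assign()` call: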

In [26]:
retrieval_chain_with_only_answer = (
    RunnablePassthrough.assign(
        context=parse_retriever_input | retriever,
    )
    | document_chain
)

response = retrieval_chain_with_only_answer.invoke(
    {
        "messages": demo_ephemeral_chat_history.messages,
    },
)
response

"## LangSmith's Testing and Evaluation Features: A Deeper Dive\n\nHere's a closer look at how LangSmith can assist you in testing and evaluating your language model applications:\n\n**Dataset Management:**\n\n* **Building Datasets:** LangSmith allows you to create datasets as you work, accumulating data points over time. Alternatively, you can build smaller, more focused datasets manually for specific testing scenarios.\n* **Organization and Tracking:**  The platform helps you keep your datasets organized and readily accessible for testing purposes. This includes features for labeling, filtering, and searching your data.\n\n**Testing Flexibility:**\n\n* **End-to-End Testing:** LangSmith enables you to test the complete flow of your application, ensuring all components work together seamlessly. This is crucial for identifying any integration issues or bottlenecks in the overall user experience.\n* **Modular Testing:** The platform also supports testing individual components like LLM Cha

## Query transformation

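In the example above, when we asked the follow-up question "tell me more about that!", you may notice that the retrieved documents do not directly include information about testing.

*This depends on the model: with Gemini 1.5 Pro, the run above still returned information about testing.*

This is because we pass "tell me more about that!" verbatim to the retriever as the query. The output of the retrieval chain is still acceptable because the document chain can generate an answer from the chat history, but we could be retrieving richer, more informative documents.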

To address this common problem, let's add a `query transformation` step. We will wrap our old retriever as follows:

In [27]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableBranch

# We need a prompt that we can pass into an LLM to generate a transformed search query

query_transform_prompt = ChatPromptTemplate.from_messages(
    [
        MessagesPlaceholder(variable_name="messages"),
        (
            "user",
            # Note the trailing space: adjacent string literals are concatenated as-is
            "Given the above conversation, generate a search query to look up in order to "
            "get information relevant to the conversation. Only respond with the query, nothing else.",
        ),
    ]
)

query_transforming_retriever_chain = RunnableBranch(
    (
        lambda x: len(x.get("messages", [])) == 1,
        # If only one message, then we just pass that message's content to retriever
        (lambda x: x["messages"][-1].content) | retriever,
    ),
    # If messages, then we pass inputs to LLM chain to transform the query, then pass to retriever
    query_transform_prompt | chat | StrOutputParser() | retriever,
).with_config(run_name="chat_retriever_chain")
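The branching logic is easier to see in plain Python. In this sketch, `build_query` mirrors the condition used above, while `toy_rewrite` is a hypothetical stand-in for the `query_transform_prompt | chat | StrOutputParser()` step; messages are plain strings for brevity.

```python
from typing import Callable, List


def build_query(messages: List[str], rewrite: Callable[[List[str]], str]) -> str:
    """Pick the search query the way the branch above does."""
    if len(messages) == 1:
        # Only one message: use it verbatim as the query.
        return messages[-1]
    # Otherwise: ask the rewrite step for a standalone query.
    return rewrite(messages)


def toy_rewrite(messages: List[str]) -> str:
    # Stand-in for the LLM rewrite: reuses the first user message
    # so the follow-up stays on topic.
    return f"{messages[0]} (follow-up: {messages[-1]})"


first_query = build_query(["how can langsmith help with testing?"], toy_rewrite)
followup_query = build_query(
    [
        "how can langsmith help with testing?",
        "LangSmith offers several features ...",
        "tell me more about that!",
    ],
    toy_rewrite,
)
print(first_query)
print(followup_query)
```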

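Now, let's recreate our earlier chain with this new `query_transforming_retriever_chain`. Note that this new chain accepts a dict as input and parses out a string to pass to the retriever, so we don't need any additional parsing at the top level: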

In [28]:
document_chain = create_stuff_documents_chain(chat, question_answering_prompt)

conversational_retrieval_chain = RunnablePassthrough.assign(
    context=query_transforming_retriever_chain,
).assign(
    answer=document_chain,
)

demo_ephemeral_chat_history = ChatMessageHistory()

In [29]:
demo_ephemeral_chat_history.add_user_message("how can langsmith help with testing?")

response = conversational_retrieval_chain.invoke(
    {"messages": demo_ephemeral_chat_history.messages},
)
response

{'messages': [HumanMessage(content='how can langsmith help with testing?')],
 'context': [Document(page_content='inputs, and see what happens. At some point though, our application is performing\nwell and we want to be more rigorous about testing changes. We can use a dataset\nthat we’ve constructed along the way (see above). Alternatively, we could spend some\ntime constructing a small dataset by hand. For these situations, LangSmith simplifies', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://web.archive.org/web/20231216051131/https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),
  Document(page_content='for testing the overall flow, while singl

In [30]:
print(response['answer'])

## LangSmith's Role in Testing LLMs

While LangSmith doesn't directly execute tests, it offers valuable tools to guide and enhance your LLM testing process, especially when dealing with large datasets.  Here's how:

**Focusing Your Evaluation Efforts:**

*   **Highlighting Potential Issues:** LangSmith's automatic evaluation metrics help identify potential problems or inconsistencies in your LLM's output. This allows you to prioritize manual review of specific examples rather than sifting through the entire dataset.
*   **Managing Complexity:**  LLM calls often involve complex combinations of user input, templates, and auxiliary functions. LangSmith helps you understand the exact input provided to the LLM, making it easier to pinpoint the source of any issues.

**Improving Test Coverage:**

*   **Testing Different Flows:** LangSmith supports testing both the overall flow of your application and individual LLM Chain or Chat Model components. This ensures comprehensive evaluation of your

In [31]:
demo_ephemeral_chat_history.add_ai_message(response["answer"])
demo_ephemeral_chat_history.add_user_message("tell me more about that!")

response = conversational_retrieval_chain.invoke(
    {"messages": demo_ephemeral_chat_history.messages}
)
response

{'messages': [HumanMessage(content='how can langsmith help with testing?'),
  AIMessage(content="## LangSmith's Role in Testing LLMs\n\nWhile LangSmith doesn't directly execute tests, it offers valuable tools to guide and enhance your LLM testing process, especially when dealing with large datasets.  Here's how:\n\n**Focusing Your Evaluation Efforts:**\n\n*   **Highlighting Potential Issues:** LangSmith's automatic evaluation metrics help identify potential problems or inconsistencies in your LLM's output. This allows you to prioritize manual review of specific examples rather than sifting through the entire dataset.\n*   **Managing Complexity:**  LLM calls often involve complex combinations of user input, templates, and auxiliary functions. LangSmith helps you understand the exact input provided to the LLM, making it easier to pinpoint the source of any issues.\n\n**Improving Test Coverage:**\n\n*   **Testing Different Flows:** LangSmith supports testing both the overall flow of your 

In [32]:
print(response['answer'])

## Deep Dive into LangSmith's Testing Capabilities

**Understanding the Context:**

*   **Input Clarity:** LangSmith provides detailed logs of the exact input sent to your LLM, including user queries, tool outputs, and any intermediate steps. This transparency is crucial for understanding how different inputs influence the LLM's responses and identifying potential issues. 
*   **Output Traceability:**  Similarly, LangSmith tracks the LLM's output at each stage, allowing you to analyze the reasoning process and pinpoint where errors or inconsistencies might occur. 

**Enhancing Evaluation:**

*   **Automatic Metrics:** LangSmith offers a range of automatic evaluation metrics, such as BLEU score and ROUGE score, to assess the quality and coherence of LLM-generated text. These metrics provide a quantitative measure of performance, supplementing your qualitative analysis.
*   **Custom Metrics:** Beyond the built-in metrics, LangSmith allows you to define and track custom metrics tailored t

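You can see that the user's initial query is passed directly to the retriever, which returns suitable documents.

The invocation for the follow-up question rephrases the conversation into a search query more relevant to testing with LangSmith, which yields higher-quality documents.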