# Build a retrieval question-answer chain

The previous chapter (C3, building the knowledge base) showed how to build a vector knowledge base from your own local documents. In this section, we use that vector database to retrieve content relevant to a query, combine the retrieved results with the query into a prompt, and pass the prompt to a large language model to answer the question.

## 1. Load the vector database

First, load the vector database built in the previous chapter. Note that you must use the same embedding model as when you built it.

In [1]:
import sys
sys.path.append("../C3 搭建知识库") # add the parent directory to the system path

# Use the Zhipu Embedding API. Note that you need to download the wrapper code implemented in the previous chapter to your local machine.
from zhipuai_embedding import ZhipuAIEmbeddings

from langchain.vectorstores.chroma import Chroma

Load your API_KEY from the environment variable

In [2]:
from dotenv import load_dotenv, find_dotenv
import os

_ = load_dotenv(find_dotenv())    # read local .env file
zhipuai_api_key = os.environ['ZHIPUAI_API_KEY']

Load the vector database, which contains the embeddings of multiple documents under ../../data_base/knowledge_db

In [3]:
# Define Embeddings
embedding = ZhipuAIEmbeddings()

# Vector database persistence path
persist_directory = '../../data_base/vector_db/chroma'

# Load the database
vectordb = Chroma(
    persist_directory=persist_directory,  # lets us persist the database to this directory on disk
    embedding_function=embedding
)

In [4]:
print(f"Number of vectors stored: {vectordb._collection.count()}")

Number of vectors stored: 925


We can test the loaded vector database and use a question query to perform vector retrieval. The following code will search the vector database based on similarity and return the top k most similar documents.

> ⚠️ Before running a similarity search, make sure the tiktoken package, OpenAI's open-source fast tokenizer, is installed: `pip install tiktoken`

In [5]:
question = "What is prompt engineering?"
docs = vectordb.similarity_search(question, k=3)
print(f"Number of documents retrieved: {len(docs)}")

Number of documents retrieved: 3


Print the retrieved content

In [6]:
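for i, doc in enumerate(docs):
    print(f"Retrieved document {i}:\n {doc.page_content}", end="\n-----------------------------------------------------\n")

Retrieved document 0:
 Instead, we should use the Prompt to guide the language model to think deeply. We can ask it to first list various views on the problem, explain the basis for its reasoning, and then reach a final conclusion. Adding a step-by-step reasoning requirement to the Prompt lets the language model spend more time on logical thinking, and its output will be more reliable and accurate.

In summary, giving the language model ample time to reason is a very important design principle in Prompt Engineering. It greatly improves the language model's performance on complex problems and is key to building high-quality Prompts. Developers should take care to leave the model room to think so as to bring out the language model's full potential.

2.1 Specify the steps required to complete the task

Next, we will give a complex task along with a series of steps for completing it, to demonstrate the effect of this strategy.

First, we describe the story of Jack and Jill and give a prompt that performs the following operations: first, summarize the text delimited by triple backticks in one sentence. Second, translate the summary into English. Third, list every name in the English summary. Fourth, output a JSON object containing the following keys: the English summary and the number of names. The output should be separated by line breaks.
-----------------------------------------------------
Retrieved document 1:
 The description is intended for furniture retailers, so it should be technical in nature and focus on the materials the product is constructed from.

At the end of the description, include every 7-character product ID in the technical specification.

Use at most 50 words.

Technical specifications: {fact_sheet_chair}
"""
response = get_completion(prompt)
print(response)
```

The example above shows the general process of iteratively optimizing a Prompt. As with training a machine learning model, designing an effective Prompt takes multiple rounds of trial-and-error adjustment.

Specifically, the first version of a Prompt should satisfy two principles: be clear, and give the model time to think. On that basis, the general iteration workflow is: try a first version, analyze the results, then keep improving the Prompt, gradually approaching the optimum. Many successful Prompts were arrived at through this kind of multi-round adjustment.

Later I will show a more complex Prompt example to give everyone a deeper understanding of the power of language models. But before that, I want to stress that Prompt design is a gradual process. Developers need to be mentally prepared for many rounds of trial and error; only through continuous adjustment and optimization can they find the Prompt form that best fits the needs of a specific scenario. This takes wisdom and perseverance, but the results are often worth it.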
----------
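
To make the idea behind `similarity_search` concrete, here is a minimal pure-Python sketch of top-k retrieval by cosine similarity. The three-dimensional vectors, the document labels, and the `top_k` helper are all made up for illustration; a real vector store works with embeddings produced by the embedding model, typically uses an index rather than a linear scan, and may use a different distance metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=3):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "vector database": (text, embedding) pairs with invented values.
toy_store = [
    ("prompt design principles", [0.9, 0.1, 0.0]),
    ("furniture fact sheet", [0.1, 0.9, 0.1]),
    ("iterating on prompts", [0.8, 0.2, 0.1]),
    ("unrelated cooking notes", [0.0, 0.1, 0.9]),
]

query_vec = [1.0, 0.0, 0.0]  # pretend embedding of the query
results = top_k(query_vec, toy_store, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The two entries whose vectors point in nearly the same direction as the query come back first, which is the behavior the `k` argument controls in the cell above.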

## 2. Create an LLM

Here, we call OpenAI's API to create an LLM. You can, of course, use another LLM API instead.

In [7]:
import os 
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

In [8]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name = "gpt-3.5-turbo", temperature = 0)

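llm.invoke("Please introduce yourself!")

AIMessage(content='Hello, I am an intelligent assistant dedicated to providing you with all kinds of services and help. I can answer your questions, provide information and suggestions, and help you solve problems. If you need anything, please feel free to tell me and I will do my best to help. Thank you for choosing me as your assistant!', response_metadata={'token_usage': {'completion_tokens': 95, 'prompt_tokens': 20, 'total_tokens': 115}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-bf5e56bd-a00f-4370-a1f3-6ce9712eaa65-0')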

## 3. Build a retrieval question-answer chain

In [9]:
from langchain.prompts import PromptTemplate

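template = """Use the following context to answer the question at the end. If you don't know the answer, just say that you don't know; do not try to make up an answer.
Use at most three sentences. Keep the answer as concise as possible. Always say "Thank you for your question!" at the end of the answer.
{context}
Question: {question}
"""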

QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context","question"],
                                 template=template)
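
As a rough illustration of what happens to such a template later, the sketch below fills `{context}` and `{question}` using plain `str.format`. The `fill_prompt` helper, the shortened template text, and the sample chunks are assumptions made for this example; joining chunks with blank lines is also just one plausible formatting choice, not a statement about how any particular chain formats its context.

```python
# Shortened stand-in for the template above (illustrative text only).
TEMPLATE = (
    "Use the following context to answer the question at the end.\n"
    "{context}\n"
    "Question: {question}\n"
)


def fill_prompt(chunks, question):
    """Join retrieved chunks into one context string and fill the template."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, question=question)


prompt = fill_prompt(
    ["Chunk A about prompts.", "Chunk B about iteration."],
    "What is prompt engineering?",
)
print(prompt)
```

The printed text is the full prompt a model would receive: the instruction, the retrieved chunks, and the question, in that order.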


Next, create a retrieval chain based on this template:

In [10]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(llm,
                                       retriever=vectordb.as_retriever(),
                                       return_source_documents=True,
                                       chain_type_kwargs={"prompt":QA_CHAIN_PROMPT})


`RetrievalQA.from_chain_type()` creates the retrieval QA chain. Its main options are:
- llm: the LLM to use.
- Chain type: set it with `RetrievalQA.from_chain_type(chain_type="map_reduce")`, or use the `load_qa_chain()` method to specify the chain type.
- Custom prompt: pass it through the `chain_type_kwargs` parameter, as in `chain_type_kwargs={"prompt": PROMPT}`.
- Return source documents: set `return_source_documents=True`, or use `RetrievalQAWithSourcesChain` to return a reference to each source document (coordinates, primary key, or index).
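
The flow that such a chain performs can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not LangChain's implementation: `retrieval_qa`, `fake_retriever`, and `fake_llm` are hypothetical stand-ins, and the sketch covers only the case where all retrieved chunks are placed into a single prompt.

```python
def retrieval_qa(query, retriever, llm, prompt_template,
                 return_source_documents=False):
    """Retrieve chunks, build one prompt from them, and ask the model."""
    docs = retriever(query)                       # 1. retrieve relevant chunks
    context = "\n\n".join(docs)                   # 2. merge them into one context
    prompt = prompt_template.format(context=context, question=query)
    answer = llm(prompt)                          # 3. ask the model
    output = {"query": query, "result": answer}
    if return_source_documents:                   # 4. optionally expose sources
        output["source_documents"] = docs
    return output


# Hypothetical stand-ins so the sketch runs without any API key.
def fake_retriever(query):
    return ["The Pumpkin Book explains formulas from the Watermelon Book."]


def fake_llm(prompt):
    return "stub answer"


PROMPT = "{context}\nQuestion: {question}\n"

output = retrieval_qa("What is the Pumpkin Book?", fake_retriever, fake_llm,
                      PROMPT, return_source_documents=True)
print(output["result"])
print(output["source_documents"])
```

Swapping `fake_retriever` and `fake_llm` for a real retriever and model is what the chain built above does for you.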

## 4. Test the retrieval question-answer chain

In [11]:
question_1 = "What is the Pumpkin Book?"
question_2 = "Who wrote Prompt Engineering for Developer?"

### 4.1 Answers based on the retrieved results and the query

In [12]:
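result = qa_chain.invoke({"query": question_1})
print("Answer to question_1 from the LLM + knowledge base:")
print(result["result"])

Answer to question_1 from the LLM + knowledge base:
The Pumpkin Book is a book that explains and supplements the derivations of hard-to-understand formulas in "Machine Learning" (the Watermelon Book).
Thank you for your question!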


In [13]:
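result = qa_chain.invoke({"query": question_2})
print("Answer to question_2 from the LLM + knowledge base:")
print(result["result"])

Answer to question_2 from the LLM + knowledge base:
Prompt Engineering for Developer was written jointly by Andrew Ng and Isa Fulford, a member of OpenAI's technical team. Thank you for your question!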


### 4.2 Answers from the LLM alone

In [14]:
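prompt_template = """Please answer the following question:
                            {}""".format(question_1)

# Answer using the LLM alone
llm.invoke(prompt_template).content

'The Pumpkin Book refers to a book whose content is simple and easy to understand and that is suitable for beginners. The term originated in Japan, where it first referred to a lantern made from a pumpkin, simple in shape and easy to make. Later, the term was extended to refer to books with simple, easy-to-understand content that are suitable for beginners getting started.'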

In [15]:
prompt_template = """Please answer the following question:
                            {}""".format(question_2)

# Answer using the LLM alone
llm.invoke(prompt_template).content

'This book was written by the author Chris Johnson.'

> ⭐ These two questions show that the LLM alone does not answer well when a question involves recent knowledge or specialized knowledge that is not common sense. Adding our local knowledge helps the LLM give better answers, and it also helps mitigate the "hallucination" problem of large models.

## 5. Add memory of the conversation history

So far, we have uploaded local knowledge documents, saved them to a vector knowledge base, and combined the query with the results retrieved from that knowledge base as input to the LLM. The answers are much better than those from asking the LLM directly. When interacting with language models, however, you may have noticed a key problem: **they do not remember your previous exchanges**. This is a major challenge when building applications such as chatbots, because the conversation lacks real continuity. How can we solve this problem?

### 5.1 Memory

This section introduces the memory module in LangChain, that is, how to pass previous turns of the conversation to the language model so that it can carry on a continuous conversation. We use `ConversationBufferMemory`, which keeps a list of chat messages. This history is passed to the chatbot together with each new question, adding it to the context.

In [16]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history",  # must match the input variable name in the prompt
    return_messages=True  # return the chat history as a list of messages instead of a single string
)

For more on using Memory, including keeping a fixed number of conversation rounds, keeping a fixed number of tokens, or saving summaries of past conversations, see the Memory section of the LangChain documentation.
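
The idea of a message buffer, and of keeping only a fixed number of recent rounds, can be sketched without LangChain. The `BufferMemory` class below is a hypothetical, simplified stand-in written for this illustration; it is not LangChain's `ConversationBufferMemory` and shares none of its code or interface.

```python
class BufferMemory:
    """Keeps chat messages in order; optionally only the last max_rounds rounds."""

    def __init__(self, max_rounds=None):
        self.messages = []            # list of (role, text) tuples
        self.max_rounds = max_rounds  # None means "keep everything"

    def save_turn(self, question, answer):
        """Store one round: the human question and the AI answer."""
        self.messages.append(("human", question))
        self.messages.append(("ai", answer))
        if self.max_rounds is not None:
            # One round is two messages; drop anything older than the window.
            start = max(0, len(self.messages) - 2 * self.max_rounds)
            self.messages = self.messages[start:]

    def load(self):
        """Return a copy of the stored history to prepend to the next prompt."""
        return list(self.messages)


demo_memory = BufferMemory(max_rounds=2)
demo_memory.save_turn("question 1", "answer 1")
demo_memory.save_turn("question 2", "answer 2")
demo_memory.save_turn("question 3", "answer 3")
for role, text in demo_memory.load():
    print(f"{role}: {text}")
```

With `max_rounds=2`, the first round is dropped once the third is saved; with `max_rounds=None`, the buffer grows without bound, which is simple but eventually exceeds the model's context window.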

### 5.2 ConversationalRetrievalChain

ConversationalRetrievalChain adds the ability to handle conversation history on top of the retrieval QA chain.

Its workflow is:

1. Merge the previous conversation with the new question to generate a complete query statement.

2. Search the vector database for relevant documents for the query.

3. After obtaining the answer, store it in the conversation memory.

4. Users can view the complete conversation process in the UI.

![](../../figures/Modular_components.png)

This chain places each new question in the context of the previous conversation before retrieval, so it can handle queries that depend on earlier turns. It also keeps all information in the conversation memory for easy tracking.
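
The first three workflow steps can be sketched as a single function. Everything here is a hypothetical stand-in for illustration: `conversational_retrieval` and the `toy_*` helpers are invented names, and in a real chain the rewriting and answering steps are LLM calls rather than string formatting.

```python
def conversational_retrieval(question, history, condense, retrieve, answer):
    """One turn: rewrite the question with history, retrieve, answer, remember."""
    # Step 1: merge history and the new question into a standalone query.
    # With no history yet, the question is used as-is.
    standalone = condense(history, question) if history else question
    # Step 2: search for documents relevant to the standalone query.
    docs = retrieve(standalone)
    # Step 3: produce the answer and store the turn in the history.
    reply = answer(standalone, docs)
    history.append((question, reply))
    return {"standalone_question": standalone, "answer": reply}


# Toy stand-ins so the sketch runs without a model or a vector database.
def toy_condense(history, question):
    previous_question, _ = history[-1]
    return f"{question} [follow-up to: {previous_question}]"


def toy_retrieve(query):
    return [f"document matching '{query}'"]


def toy_answer(query, docs):
    return f"answer to '{query}' using {len(docs)} document(s)"


chat_log = []
first = conversational_retrieval("Can I learn about prompt engineering?",
                                 chat_log, toy_condense, toy_retrieve, toy_answer)
second = conversational_retrieval("Why does this course teach it?",
                                  chat_log, toy_condense, toy_retrieve, toy_answer)
print(first["standalone_question"])
print(second["standalone_question"])
```

The second standalone question carries the earlier question along with it, which is why a follow-up containing only a pronoun can still retrieve the right documents.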

Next, let's test this conversational retrieval chain, using the vector database and LLM from the previous sections. Start with a question that has no history, "Can I learn about prompt engineering?", and look at the answer.

In [17]:
from langchain.chains import ConversationalRetrievalChain

retriever=vectordb.as_retriever()

qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory
)
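question = "Can I learn about prompt engineering?"
result = qa.invoke({"question": question})
print(result['answer'])

Yes, you can learn about prompt engineering. This module is based on Andrew Ng's course "Prompt Engineering for Developer" and aims to share with developers the best practices and techniques for using prompts to develop large language model applications. By studying this content, you will learn how to use prompts to build amazing large language model applications. I hope this helps!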


Then based on the answer, proceed to the next question: "Why does this course need to teach this knowledge?"

In [19]:
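question = "Why does this course need to teach this knowledge?"
result = qa.invoke({"question": question})
print(result['answer'])

The purpose of this course is to teach developers how to use prompt engineering to develop large language model (LLM) applications. By learning core principles such as writing clear and specific instructions and giving the model time to think, developers can make better use of large language models to solve a variety of tasks and improve their productivity and creativity.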


As you can see, the LLM correctly resolves "this knowledge" to prompt engineering, which shows that the conversation history was successfully passed to it. This ability to keep track of the conversation and relate a new question to earlier ones can greatly improve the continuity and intelligence of a question-answering system.