# QA over Documents
For an LLM to perform a QA task, two things are needed:
1. Pass the LLM the context it should refer to.
2. State the question to the LLM precisely.

In [11]:


# Imports
import os
from typing import Any, Dict, List, Mapping, Optional

from langchain_core.callbacks.manager import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from zhipuai import ZhipuAI


# Custom LLM wrapper; subclasses langchain_core.language_models.llms.LLM
class ZhipuAILLM(LLM):
    # Model name; defaults to glm-3-turbo
    model: str = "glm-3-turbo"
    # Sampling temperature
    temperature: float = 0.1
    # API key, read from the ZHIPUAI_API_KEY environment variable (do not hard-code credentials)
    api_key: str = os.environ.get("ZHIPUAI_API_KEY", "")
    
    def _call(self, prompt : str, stop: Optional[List[str]] = None,
                run_manager: Optional[CallbackManagerForLLMRun] = None,
                **kwargs: Any):
        client = ZhipuAI(
            api_key = self.api_key
        )

        def gen_glm_params(prompt):
            '''
            Build the `messages` request parameter for the GLM model.

            Args:
                prompt: the user prompt
            '''
            messages = [{"role": "user", "content": prompt}]
            return messages
        
        messages = gen_glm_params(prompt)
        response = client.chat.completions.create(
            model = self.model,
            messages = messages,
            temperature = self.temperature
        )

        if len(response.choices) > 0:
            return response.choices[0].message.content
        return "generate answer error"


    # Default parameters sent with each API call
    @property
    def _default_params(self) -> Dict[str, Any]:
        """Return the default parameters for API calls."""
        normal_params = {
            "temperature": self.temperature,
        }
        return {**normal_params}

    @property
    def _llm_type(self) -> str:
        return "Zhipu"

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {**{"model": self.model}, **self._default_params}

In [12]:
llm = ZhipuAILLM()

llm

ZhipuAILLM()

## 1. Short-Text QA

In short, building a QA system that uses documents as context boils down to llm(context + question) = answer.

In [13]:
context = """
Rachel is 30 years old
Bob is 45 years old
Kevin is 65 years old
"""

question = "Who is under 40 years old?"

In [14]:
final_prompt = context + question
print(final_prompt)


Rachel is 30 years old
Bob is 45 years old
Kevin is 65 years old
Who is under 40 years old?


In [15]:
output = llm.invoke(final_prompt)
print(output.strip())

Rachel is under 40 years old.


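## 2. Long-Text QA
For long texts, split the text into chunks, embed each chunk, store the embeddings in a vector database, and then query it.

The goal is to select the relevant chunks. But which chunks should be selected, and how? A widely used approach is to compare vector embeddings and pick the chunks most similar to the question.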
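
Selecting chunks by comparing embeddings can be sketched in plain Python. This is only an illustration: the three-dimensional vectors are hypothetical stand-ins for real embeddings, and `cosine_similarity` and `top_k` are helper names introduced here, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Hypothetical embeddings; real models produce hundreds of dimensions.
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
query_vec = [0.2, 1.0, 0.0]

best = top_k(query_vec, chunk_vecs, k=2)
print(best)  # indices of the two closest chunks, best first
```

A vector database does the same ranking, but with index structures that avoid scoring every stored vector.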

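### Main implementation steps
Building a document QA system can be broken into the following five steps, and LangChain provides tooling for each.
1. Document loading (Document Loader): a document loader reads documents into a form LangChain can work with. Different loaders handle different data sources, e.g. CSVLoader, PyPDFLoader, Docx2txtLoader, and TextLoader.
2. Text splitting: a text splitter cuts the documents into pieces of a specified size; the resulting pieces are called "chunks".
3. Vector storage: the chunks from the previous step are embedded and stored in a vector database.
4. Retrieval: the application retrieves chunks from the store, for example by comparing cosine similarity to find the embedded chunks closest to the input question.
5. Output: the question and the retrieved chunks (as text) are placed into a prompt and passed to the LLM, which generates the answer.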
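
The five steps can be sketched end to end without any external library. Everything below is a toy under stated assumptions: the sample text is made up, one line counts as one chunk, the "embedding" is a bag-of-words count rather than a trained model, and all function names are hypothetical. The final LLM call is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def load_text():
    """Step 1 stand-in: a real loader would read a file."""
    return (
        "Rachel is 30 years old.\n"
        "Bob is 45 years old.\n"
        "The rabbit carried a pocket watch."
    )


def split_into_chunks(text):
    """Step 2: here, one non-empty line is one chunk."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def embed(text):
    """Toy embedding: word counts (an assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_store(chunks):
    """Step 3: keep each chunk next to its embedding."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(store, question, k=1):
    """Step 4: rank stored chunks by similarity to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(chunks, question):
    """Step 5: put the retrieved text and the question into one prompt."""
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


store = build_store(split_into_chunks(load_text()))
question = "What did the rabbit carry?"
relevant = retrieve(store, question, k=1)
prompt = build_prompt(relevant, question)
print(prompt)
```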

In [17]:
!pip install faiss-cpu 
# Note: faiss comes in GPU and CPU builds; install the one that matches your runtime



In [18]:
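# Using embeddings
# Split the text, embed each chunk, store the embeddings in a vector store, then query it.
# The goal is to select the relevant chunks; a widely used approach is to compare
# vector embeddings and pick the chunks most similar to the question.

from langchain.vectorstores import FAISS  # vector store
from langchain.chains import RetrievalQA  # retrieval QA chain
from langchain.document_loaders import TextLoader  # document loader

# RecursiveCharacterTextSplitter splits recursively on a prioritized list of separators
# (["\n\n", "\n", " ", ""]), which keeps semantically related text together for as long
# as possible. It is the recommended splitter for most projects.
from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain.embeddings import SentenceTransformerEmbeddings  # embedding model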
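
The separator-priority idea can be illustrated with a simplified sketch. This is not LangChain's implementation: `recursive_split` is a hypothetical helper, it ignores overlap, and it drops empty pieces produced by repeated separators.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces of at most chunk_size characters,
    trying coarse separators first and finer ones only when needed."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, rest)

    chunks, current = [], ""
    for piece in text.split(sep):
        if not piece:
            continue  # skip empties from repeated separators
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging while the chunk still fits
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "aaa bbb\nccc ddd\n\neee fff ggg hhh"
pieces = recursive_split(sample, chunk_size=8)
print(pieces)
```

Paragraph breaks are tried first, then line breaks, then spaces, so a chunk boundary falls inside a word only when nothing coarser fits.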

In [24]:
# Load the text
loader = TextLoader('./data/wonderland.txt', 'utf-8')
doc = loader.load()

print(f"You have {len(doc)} document")
print(f"You have {len(doc[0].page_content)} characters in that document")

You have 1 document
You have 13637 characters in that document


In [25]:
# Split the document into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=3000,
    chunk_overlap=400
)
docs = text_splitter.split_documents(doc)
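
To see how `chunk_size` and `chunk_overlap` interact, here is a fixed-window sketch. It is an approximation only: `sliding_window_chunks` is a hypothetical helper that cuts at exact character offsets, whereas RecursiveCharacterTextSplitter prefers separator boundaries, so the real chunk count and boundaries will generally differ.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters, where each window
    starts chunk_size - chunk_overlap characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


small = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(small)

# Same settings as the cell above, applied to a dummy text of the same length.
dummy = "x" * 13637
windows = sliding_window_chunks(dummy, chunk_size=3000, chunk_overlap=400)
print(len(windows))
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.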

In [26]:
embeddings = SentenceTransformerEmbeddings(model_name="D:/code/models/M3E/xrunda/m3e-base")

vectorstore = FAISS.from_documents(docs, embeddings)

In [27]:
# Retrieval QA chain
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=vectorstore.as_retriever(), 
    return_source_documents=False
)
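
Conceptually, the `stuff` chain type places all retrieved chunks into a single prompt together with the question. A rough sketch of that idea follows; `stuff_documents` is a hypothetical helper, and the instruction wording is illustrative rather than LangChain's exact template.

```python
def stuff_documents(docs, question):
    """Join every retrieved chunk into one context block and append the question."""
    context = "\n\n".join(docs)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, just say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


# Hypothetical retrieved chunks, for illustration only.
retrieved = [
    "Alice was beginning to get very tired of sitting by her sister on the bank.",
    "Suddenly a White Rabbit with pink eyes ran close by her.",
]
stuffed = stuff_documents(retrieved, "Who ran close by Alice?")
print(stuffed)
```

Because every retrieved chunk goes into one prompt, the combined length has to fit within the model's context window, which is one reason chunk size and the number of retrieved chunks matter.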

In [28]:
import langchain
langchain.debug = True

query = "What does the author describe the Alice following with?"
qa.run({"query": query})
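# Here the retriever fetches the most similar parts of the document and combines them with your question for the LLM to reason over and produce the answer
# Many details in this step are worth tuning, such as the chunk size, the embedding model, and the retriever
# A cloud-hosted vector store is also an option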

[chain/start] [chain:RetrievalQA] Entering Chain run with input:
{
  "query": "What does the author describe the Alice following with?"
}
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
{
  "question": "What does the author describe the Alice following with?",
  "context": "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, “and what is the use of a book,” thought Alice “without pictures or conversations?” So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the tr

[llm/end] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain > llm:ZhipuAILLM] [4.77s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "The author describes Alice following the White Rabbit with curiosity and a sense of adventure. Alice is intrigued by the Rabbit's peculiar behavior and possessions, such as the waistcoat-pocket and watch, which sparks her interest and leads her to chase after the Rabbit. Her actions are driven by her curiosity and the desire to understand the strange occurrences around her. Alice's willingness to explore and follow the Rabbit reflects her sense of adventure and openness to the unexpected events in Wonderland.",
        "generation_info": null,
        "type": "Generation"
      }
    ]
  ],
  "llm_output": null,
  "run": null
}
[chain/end] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] [4.78s] Exiting Chain run with output:
{
  "text": "The 

"The author describes Alice following the White Rabbit with curiosity and a sense of adventure. Alice is intrigued by the Rabbit's peculiar behavior and possessions, such as the waistcoat-pocket and watch, which sparks her interest and leads her to chase after the Rabbit. Her actions are driven by her curiosity and the desire to understand the strange occurrences around her. Alice's willingness to explore and follow the Rabbit reflects her sense of adventure and openness to the unexpected events in Wonderland."