## Running LangChain's Question Answering Example

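The original example uses the OpenAI API as its LLM, which is inconvenient to access from within China, so this version swaps in a LLaMA model that can be deployed locally. To keep the barrier to entry as low as possible, it uses a quantized LLaMA model and runs it on the CPU.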

In [1]:
from langchain.llms import LlamaCpp
from langchain.vectorstores import Chroma
from langchain.prompts import PromptTemplate
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.question_answering import load_qa_chain
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

In [2]:
with open("data/knowledges/state_of_the_union/state_of_the_union.txt") as f:
    state_of_the_union = f.read()

text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)

embeddings = SentenceTransformerEmbeddings(model_name='all-MiniLM-L6-v2', model_kwargs={'device': 'cpu'})

docsearch = Chroma.from_texts(texts, embeddings, metadatas=[{"source": str(i)} for i in range(len(texts))]).as_retriever()

Created a chunk of size 215, which is longer than the specified 200
Created a chunk of size 232, which is longer than the specified 200
Created a chunk of size 242, which is longer than the specified 200
Created a chunk of size 219, which is longer than the specified 200
Created a chunk of size 304, which is longer than the specified 200
Created a chunk of size 205, which is longer than the specified 200
Created a chunk of size 332, which is longer than the specified 200
Created a chunk of size 215, which is longer than the specified 200
Created a chunk of size 203, which is longer than the specified 200
Created a chunk of size 281, which is longer than the specified 200
Created a chunk of size 201, which is longer than the specified 200
Created a chunk of size 250, which is longer than the specified 200
Created a chunk of size 325, which is longer than the specified 200
Created a chunk of size 242, which is longer than the specified 200
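
The warnings above appear because `CharacterTextSplitter` only cuts at its separator (a blank line, `"\n\n"`, by default), so a paragraph longer than `chunk_size` is kept whole. The block below is a simplified pure-Python sketch of that behavior, not LangChain's actual implementation; `split_on_separator` is a hypothetical helper introduced only for illustration.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size.

    A single piece longer than chunk_size is never cut; it is emitted whole.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            print(f"Created a chunk of size {len(piece)}, "
                  f"which is longer than the specified {chunk_size}")
        current = piece
    if current:
        chunks.append(current)
    return chunks


# Toy input: the middle paragraph (30 characters) exceeds chunk_size=20.
sample = "short one\n\n" + "x" * 30 + "\n\nshort two"
chunks = split_on_separator(sample, chunk_size=20)
```

If every chunk must stay under the limit, a splitter that falls back to finer separators, such as LangChain's `RecursiveCharacterTextSplitter`, is probably a better fit.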


In [3]:
query = "What did the president say about Justice Breyer"
docs = docsearch.get_relevant_documents(query)

for doc in docs:
    print(doc)

page_content='And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.' metadata={'source': '196'}
page_content='Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.' metadata={'source': '194'}
page_content='We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.' metadata={'source': '201'}
page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.' metadata={'source': '195'}


In [4]:
# Remember to change model_path to the location of your own model
llm = LlamaCpp(model_path="data/models/chinese_llama/quantize/7B_q5_1.bin", temperature=0, max_tokens=1024, n_ctx=4096)
llm.client.verbose = False

llama.cpp: loading model from data/models/chinese_llama/quantize/7B_q5_1.bin
llama_model_load_internal: format     = ggjt v2 (pre #1508)
llama_model_load_internal: n_vocab    = 49954
llama_model_load_internal: n_ctx      = 4096
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 6717.79 MB (+ 1026.00 MB per state)
.
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
llama_init_from_file: kv self size  = 2048.00 MB
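
The `kv self size` in the log can be reproduced from the other values it reports. The arithmetic below is a sketch that assumes the key/value cache is stored as float16 (2 bytes per element); that assumption is not stated in the log, but it gives exactly the reported figure.

```python
# Values copied from the llama.cpp log above.
n_layer = 32
n_ctx = 4096
n_embd = 4096
bytes_per_element = 2  # assumption: keys and values are stored as float16

# One key tensor and one value tensor per layer, each of shape n_ctx x n_embd.
kv_bytes = 2 * n_layer * n_ctx * n_embd * bytes_per_element
kv_mb = kv_bytes / (1024 * 1024)
print(f"kv self size = {kv_mb:.2f} MB")  # prints "kv self size = 2048.00 MB"
```

Under the same assumption, the cache grows linearly with `n_ctx`, so halving `n_ctx` to 2048 would halve it to 1024 MB.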


In [5]:
chain = load_qa_chain(llm, chain_type="stuff")

query = "What did the president say about Justice Breyer"
chain.run(input_documents=docs, question=query)

' The president said that Justice Stephen Breyer was an Army veteran, a Constitutional scholar, and a retiring Justice of the United States Supreme Court who dedicated his life to serving this country. He thanked him for his service.'

### Customization

#### The stuff Chain

In [6]:
chain = load_qa_chain(llm, chain_type="stuff")

query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': ' The president said that Justice Stephen Breyer was an Army veteran, a Constitutional scholar, and a retiring Justice of the United States Supreme Court who dedicated his life to serving this country. He thanked him for his service.'}
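
Conceptually, the `stuff` chain puts every retrieved document into one prompt and calls the model once. The block below is a minimal sketch of that idea in plain Python; `Doc` and `build_stuff_prompt` are hypothetical stand-ins, and the prompt wording is illustrative rather than LangChain's exact default.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a LangChain document."""
    page_content: str


def build_stuff_prompt(docs, question, separator="\n\n"):
    """Join every document into one context block and fill a single prompt."""
    context = separator.join(d.page_content for d in docs)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


toy_docs = [Doc("Justice Breyer is retiring."), Doc("Judge Jackson was nominated.")]
stuff_prompt = build_stuff_prompt(toy_docs, "What did the president say about Justice Breyer")
print(stuff_prompt)
```

Because everything goes into a single call, the combined documents plus the question have to fit within the model's context window (`n_ctx=4096` in this notebook).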

In [7]:
prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Answer in Italian:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
chain = load_qa_chain(llm, chain_type="stuff", prompt=PROMPT)
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': " Il presidente ha detto sul Giudice Stephen Breyer, un veterano dell'esercito, uno storico della Costituzione e il giudece uscito dal Senato della California che continua la sua tradizione di eccellente."}

#### The map_reduce Chain

In [8]:
chain = load_qa_chain(llm, chain_type="map_reduce")

query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

Token indices sequence length is longer than the specified maximum sequence length for this model (1566 > 1024). Running this sequence through the model will result in indexing errors


{'output_text': ' The president did not mention Justice Breyer.'}

In [9]:
# Return the intermediate steps of the map_reduce chain

chain = load_qa_chain(llm, chain_type="map_reduce", return_map_steps=True)

chain({"input_documents": docs, "question": query}, return_only_outputs=True)

Token indices sequence length is longer than the specified maximum sequence length for this model (1559 > 1024). Running this sequence through the model will result in indexing errors


{'intermediate_steps': [' The president said that Justice Breyer was a great jurist and would be missed by all. He also said that he had nominated Judge Ketanji Brown Jackson to fill the vacancy left by Justice Breyer’s departure.',
  ' The president said that Justice Breyer had dedicated his life to serve this country.',
  " We're putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.",
  ' The president said that he was pleased with the decision and that it would be up to the court to decide whether or not to take up the case.'],
 'output_text': ' The president did not mention Justice Breyer.'}
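
The shape of this output follows from how a map-reduce chain works: the model is queried once per document, and then once more over the collected per-document answers. The block below sketches that control flow with a stub model; `map_reduce_answer` and `toy_llm` are hypothetical names, and the prompts are simplified placeholders rather than LangChain's defaults.

```python
def map_reduce_answer(llm, docs, question):
    """Map: query each document separately. Reduce: answer from the map outputs."""
    summaries = [
        llm(f"Context: {doc}\nQuestion: {question}\nRelevant text, if any:")
        for doc in docs
    ]
    combined = "\n".join(summaries)
    final = llm(f"QUESTION: {question}\n=========\n{combined}\n=========\nFINAL ANSWER:")
    return {"intermediate_steps": summaries, "output_text": final}


def toy_llm(prompt):
    """Stub model: flags contexts mentioning 'Breyer', then counts the flagged ones."""
    if prompt.startswith("Context:"):
        context_line = prompt.splitlines()[0]
        return "relevant" if "Breyer" in context_line else "not relevant"
    hits = sum(1 for line in prompt.splitlines() if line == "relevant")
    return f"{hits} relevant passage(s)"


toy_docs = [
    "Justice Breyer, thank you for your service.",
    "Immigration judges will hear cases faster.",
]
result = map_reduce_answer(toy_llm, toy_docs, "What did the president say about Justice Breyer")
print(result)
```

The final call sees only the intermediate outputs, not the original documents, so the quality of `output_text` depends on both the map answers and how well the model handles the combine prompt.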

In [10]:
question_prompt_template = """Use the following portion of a long document to see if any of the text is relevant to answer the question. 
Return any relevant text translated into Chinese.
{context}
Question: {question}
Relevant text, if any, in Chinese:"""
QUESTION_PROMPT = PromptTemplate(
    template=question_prompt_template, input_variables=["context", "question"]
)

combine_prompt_template = """Given the following extracted parts of a long document and a question, create a final answer in Chinese.
If you don't know the answer, just say that you don't know. Don't try to make up an answer.

QUESTION: {question}
=========
{summaries}
=========
Answer in Chinese:"""
COMBINE_PROMPT = PromptTemplate(
    template=combine_prompt_template, input_variables=["summaries", "question"]
)
chain = load_qa_chain(llm, chain_type="map_reduce", return_map_steps=True, question_prompt=QUESTION_PROMPT, combine_prompt=COMBINE_PROMPT)
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'intermediate_steps': [' 总统说，“我提名美国最高法院上诉法庭法官凯特尼吉·布朗杰克逊。她是我们国家最杰出的法律头脑之一，将继续伯耶先生卓越的遗产。”',
  ' 总统对布莱耶尔法官的评论。',
  ' \n答：我们正在建立专门的移民法官，以便那些逃离迫害和暴力的家庭能够更快地审理他们的案件。',
  ' 总统说，关于布莱耶尔法官的任命。'],
 'output_text': ' \n\n根据提供的信息，无法确定总统对布莱耶尔法官的评价是什么。'}

#### The refine Chain

In [11]:
chain = load_qa_chain(llm, chain_type="refine")

query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': ''}

In [12]:
# Intermediate Steps

chain = load_qa_chain(llm, chain_type="refine", return_refine_steps=True)
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'intermediate_steps': ["A) He praised her for being a great judge\nB) He nominated her as a Supreme Court justice\nC) He said she was one of our nation's top legal minds\nD) He said he would miss her when she retired.",
  '',
  '',
  ''],
 'output_text': ''}
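
The empty `output_text` is consistent with how a refine loop behaves: each step replaces the running answer, so a single empty reply from the model discards whatever came before. The block below sketches that loop with a stub model; `refine_answer` and `flaky_llm` are hypothetical names and the prompts are simplified placeholders.

```python
def refine_answer(llm, docs, question):
    """Answer from the first document, then revise once per remaining document."""
    answer = llm(f"Context: {docs[0]}\nAnswer the question: {question}")
    steps = [answer]
    for doc in docs[1:]:
        answer = llm(
            f"Question: {question}\nExisting answer: {answer}\n"
            f"New context: {doc}\nRefine the existing answer if needed."
        )
        steps.append(answer)
    return {"intermediate_steps": steps, "output_text": answer}


def flaky_llm(prompt):
    """Stub model: answers the initial prompt but returns '' for refine prompts."""
    return "" if "Existing answer:" in prompt else "initial answer"


refined = refine_answer(flaky_llm, ["doc a", "doc b", "doc c"], "toy question")
print(refined)
# {'intermediate_steps': ['initial answer', '', ''], 'output_text': ''}
```

Setting `return_refine_steps=True`, as in the cell above, is therefore a useful way to see at which step the answer was lost.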

In [13]:
# Custom Prompts

refine_prompt_template = (
    "The original question is as follows: {question}\n"
    "We have provided an existing answer: {existing_answer}\n"
    "We have the opportunity to refine the existing answer "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{context_str}\n"
    "------------\n"
    "Given the new context, refine the original answer to better "
    "answer the question. "
    "If the context isn't useful, return the original answer. Reply in Chinese."
)
refine_prompt = PromptTemplate(
    input_variables=["question", "existing_answer", "context_str"],
    template=refine_prompt_template,
)


initial_qa_template = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {question}\nYour answer should be in Chinese.\n"
)
initial_qa_prompt = PromptTemplate(
    input_variables=["context_str", "question"], template=initial_qa_template
)
chain = load_qa_chain(llm, chain_type="refine", return_refine_steps=True,
                     question_prompt=initial_qa_prompt, refine_prompt=refine_prompt)
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'intermediate_steps': ['答：美国总统对大法官伯耶尔说什么？', '', '', ''], 'output_text': ''}

#### The map_rerank Chain

In [14]:
chain = load_qa_chain(llm, chain_type="map_rerank", return_intermediate_steps=True)

query = "What did the president say about Justice Breyer"
results = chain({"input_documents": docs, "question": query}, return_only_outputs=True)

ValueError: Could not parse output:  The president said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson as one of our nation's top legal minds, who will continue Justice Breyer's legacy of excellence.
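
The error arises because the map_rerank chain expects every model reply to contain an answer followed by a `Score:` line, and this reply has none. The block below reproduces the failure with the standard `re` module, assuming the default parser uses a pattern of the same shape as the `RegexParser` in the custom-prompt cell later in this section; `parse_answer_and_score` is a hypothetical helper, not LangChain code.

```python
import re

# Assumed pattern: answer text, a newline, then "Score: <value>".
SCORE_PATTERN = re.compile(r"(.*?)\nScore: (.*)")


def parse_answer_and_score(text):
    """Return the answer and score, or raise ValueError when no score line exists."""
    match = SCORE_PATTERN.search(text)
    if match is None:
        raise ValueError(f"Could not parse output: {text}")
    return {"answer": match.group(1), "score": match.group(2)}


# A reply that follows the expected format parses cleanly.
well_formed = " He thanked Justice Breyer for his service.\nScore: 90"
parsed = parse_answer_and_score(well_formed)

# A reply without a score line fails, as in the traceback above.
missing_score = " The president said that he nominated Judge Ketanji Brown Jackson."
try:
    parse_answer_and_score(missing_score)
    failed = False
except ValueError:
    failed = True
```

A small quantized model may not follow the output format reliably, which is one reason to restate the format explicitly in a custom prompt.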

In [None]:
results["output_text"]

In [None]:
results["intermediate_steps"]

In [16]:
# Custom Prompts

from langchain.output_parsers import RegexParser

output_parser = RegexParser(
    regex=r"(.*?)\nScore: (.*)",
    output_keys=["answer", "score"],
)

prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

In addition to giving an answer, also return a score of how fully it answered the user's question. This should be in the following format:

Question: [question here]
Helpful Answer In Chinese: [answer here]
Score: [score between 0 and 100]

Begin!

Context:
---------
{context}
---------
Question: {question}
Helpful Answer In Chinese:"""
PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"],
    output_parser=output_parser,
)

chain = load_qa_chain(llm, chain_type="map_rerank", return_intermediate_steps=True, prompt=PROMPT)
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'intermediate_steps': [{'answer': ' 总统说，“我非常感谢布莱尔法官为我们做出的卓越贡献。他将继续我的前任布莱尔法官的卓越工作。”',
   'score': '95'},
  {'answer': ' 特朗普总统对布莱耶尔法官的评论是：“他（布莱耶尔）是一个非常聪明的人，但是他在做出决定时总是会偏离他的立场。”',
   'score': '80'},
  {'answer': " The president said that Justice Breyer is a great judge who has made significant contributions to the court's decisions.",
   'score': '95'},
  {'answer': ' The president said that Justice Breyer was an outstanding jurist and would make a great addition to the court.',
   'score': '95'}],
 'output_text': ' 总统说，“我非常感谢布莱尔法官为我们做出的卓越贡献。他将继续我的前任布莱尔法官的卓越工作。”'}
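
The final `output_text` is the highest-scoring intermediate answer. The block below sketches that selection, assuming scores are compared as integers and that the earliest entry wins a tie; this matches the output above, where the first of three answers scored 95 was returned. `pick_best` is a hypothetical helper.

```python
def pick_best(intermediate_steps):
    """Return the answer with the highest integer score; earlier entries win ties."""
    # sorted() is stable, so entries with equal scores keep their original order.
    ranked = sorted(intermediate_steps, key=lambda step: -int(step["score"]))
    return ranked[0]["answer"]


steps = [
    {"answer": "first", "score": "95"},
    {"answer": "second", "score": "80"},
    {"answer": "third", "score": "95"},
]
best = pick_best(steps)
print(best)  # prints "first"
```

The scores are generated by the model itself, so a high score does not guarantee that the answer is faithful to the retrieved context.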