# A Complete Knowledge-Base Question-Answering Example

In [2]:
PDF_NAME = 'serverless-core.pdf'

Load the PDF document

In [3]:
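# Community integrations such as PyMuPDFLoader live in langchain_community
# in the LangChain releases that also ship langchain_openai.
from langchain_community.document_loaders import PyMuPDFLoader

# PyMuPDFLoader returns one Document per PDF page.
docs = PyMuPDFLoader(PDF_NAME).load()

print(f'There are {len(docs)} document(s) in {PDF_NAME}.')
print(f'There are {len(docs[0].page_content)} characters in the first page of your document.')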

There are 113 document(s) in serverless-core.pdf.
There are 112 characters in the first page of your document.


Split the document and store the text-embedding vectors

In [5]:
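from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

# Split each page into chunks of up to 1000 characters; neighbouring chunks
# may share up to 200 characters so context is not lost at a boundary.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(docs)

# Reads the API key from the OPENAI_API_KEY environment variable.
embeddings = OpenAIEmbeddings()

# Embed every chunk and store the vectors in a Chroma collection.
vectorstore = Chroma.from_documents(split_docs, embeddings, collection_name="serverless_guide")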
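
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping chunking in plain Python. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to cut at separators such as paragraph and line breaks, whereas the hypothetical helper `split_with_overlap` below simply slides a fixed-size window over the text.

```python
from __future__ import annotations


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters.

    Hypothetical helper for illustration only (not part of LangChain).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 2500-character sample made of repeating digits.
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample, chunk_size=1000, chunk_overlap=200)

print([len(c) for c in chunks])
print(chunks[0][-200:] == chunks[1][:200])
```

With these settings the window advances 800 characters at a time, so the last 200 characters of one chunk reappear at the start of the next.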

Create a QA chain based on OpenAI

In [7]:
from langchain_openai import OpenAI
from langchain.chains.question_answering import load_qa_chain

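# temperature=0 keeps the answers as deterministic as possible.
llm = OpenAI(temperature=0)
# "stuff" places all supplied documents into a single prompt.
chain = load_qa_chain(llm, chain_type="stuff")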
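
The `"stuff"` chain type puts every supplied document into one prompt and makes a single LLM call. The sketch below shows that idea in plain Python. The `Page` class, the `build_stuff_prompt` helper, and the template wording are all assumptions made for illustration; they are not LangChain's actual classes or prompt text.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


# Illustrative template only; the wording LangChain uses may differ.
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_stuff_prompt(question: str, documents: list) -> str:
    """Concatenate all document texts and place them in one prompt."""
    context = "\n\n".join(doc.page_content for doc in documents)
    return PROMPT_TEMPLATE.format(context=context, question=question)


pages = [
    Page("AWS Lambda for compute processing tasks"),
    Page("Amazon Cognito for authentication and authorization of users"),
]
prompt = build_stuff_prompt("What is the use case of AWS Serverless?", pages)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the combined text fits in the model's context window, which is one reason to retrieve just a few chunks per question.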

Ask a question and run a similarity search

In [9]:
query = "What is the use case of AWS Serverless?"
# Retrieve the 3 chunks whose embeddings are closest to the query embedding.
similar_docs = vectorstore.similarity_search(query, k=3)

In [10]:
similar_docs

[Document(page_content='Serverless\nDeveloper Guide\n• Mobile applications – Suppose you have a custom mobile application that produces events. \nYou can create a Lambda function to process events published by your custom application. For \nexample, you can conﬁgure a Lambda function to process the clicks within your custom mobile \napplication.\nServices you’ll likely use:\n• AWS Lambda for compute processing tasks\n• Amazon API Gateway for connecting and scaling inbound requests\n• AWS Step Functions for managing and orchestrating microservice workﬂows\n• Amazon DynamoDB & S3 for storing and retrieving data and ﬁles\n• Amazon Cognito for authentication and authorization of users\nStreaming\nStreaming data allows you to gather analytical insights and act upon them, but also presents a \nunique set of design and architectural challenges.\nLambda and Amazon Kinesis can process real-time streaming data for application activity tracking,', metadata={'author': 'AWS', 'creationDate': 'D:202
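
Conceptually, a similarity search embeds the query and returns the `k` stored vectors closest to it. The sketch below shows this with hand-written three-dimensional toy vectors and cosine similarity. All names and vectors are made up for illustration; real embeddings have far more dimensions, and the metric Chroma actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the names of the k vectors most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical toy "embeddings" keyed by a chunk label.
toy_vectors = {
    "lambda": [0.9, 0.1, 0.0],
    "kinesis": [0.7, 0.7, 0.0],
    "cognito": [0.2, 0.0, 0.9],
    "billing": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.0, 0.0]

nearest = top_k(toy_query, toy_vectors, k=3)
print(nearest)
```

A vector store does the same ranking, typically with an approximate nearest-neighbour index rather than a full scan so that it scales to many chunks.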

Answer the question with the QA chain, using the relevant documents

In [16]:
chain.invoke({"question": query, "input_documents": similar_docs})

{'question': 'What is the use case of AWS Serverless?',
 'input_documents': [Document(page_content='Serverless\nDeveloper Guide\n• Mobile applications – Suppose you have a custom mobile application that produces events. \nYou can create a Lambda function to process events published by your custom application. For \nexample, you can conﬁgure a Lambda function to process the clicks within your custom mobile \napplication.\nServices you’ll likely use:\n• AWS Lambda for compute processing tasks\n• Amazon API Gateway for connecting and scaling inbound requests\n• AWS Step Functions for managing and orchestrating microservice workﬂows\n• Amazon DynamoDB & S3 for storing and retrieving data and ﬁles\n• Amazon Cognito for authentication and authorization of users\nStreaming\nStreaming data allows you to gather analytical insights and act upon them, but also presents a \nunique set of design and architectural challenges.\nLambda and Amazon Kinesis can process real-time streaming data for applic