Analyzing long documents

In [2]:
with open("./state_of_the_union.txt", encoding="utf-8") as f:
    state_of_the_union = f.read()

In [3]:
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain

llm = OpenAI(temperature=0)
summary_chain = load_summarize_chain(llm, chain_type="map_reduce")
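With `chain_type="map_reduce"`, the chain first summarizes each chunk on its own (the map step) and then combines the partial summaries into one final summary (the reduce step). The sketch below shows only that control flow in plain Python. `map_reduce_summarize` and `first_sentence` are hypothetical helpers: `first_sentence` stands in for the per-chunk LLM call and `" ".join` stands in for the combining LLM call. This is not LangChain's implementation.

```python
from typing import Callable, List


def first_sentence(text: str) -> str:
    """Hypothetical stand-in for an LLM summary: keep only the first sentence."""
    return text.split(". ")[0].strip().rstrip(".") + "."


def map_reduce_summarize(
    chunks: List[str],
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[List[str]], str],
) -> str:
    # Map step: summarize every chunk independently (one call per chunk).
    partial_summaries = [map_fn(chunk) for chunk in chunks]
    # Reduce step: merge the partial summaries into a single result.
    return reduce_fn(partial_summaries)


chunks = ["Putin invaded Ukraine. More details follow.", "Inflation is high. Details."]
print(map_reduce_summarize(chunks, first_sentence, " ".join))
```

In the real chain both steps are LLM calls, so a long document costs one call per chunk plus at least one combining call.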


In [4]:
from langchain.chains import AnalyzeDocumentChain

summarize_document_chain = AnalyzeDocumentChain(combine_docs_chain=summary_chain)
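`AnalyzeDocumentChain` takes one long string, splits it into chunks with a text splitter (by default, as far as I know, a `RecursiveCharacterTextSplitter`), and passes the resulting chunks to `combine_docs_chain`. A minimal sketch of that wrapper pattern follows; `fixed_width_split` and `analyze_document` are hypothetical names, and the fixed-width splitter is a deliberately crude substitute for a real text splitter.

```python
from typing import Callable, List


def fixed_width_split(text: str, size: int) -> List[str]:
    """Hypothetical splitter: cut the text into pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def analyze_document(
    text: str,
    split: Callable[[str], List[str]],
    combine: Callable[[List[str]], str],
) -> str:
    # 1. Turn one long string into several chunks.
    chunks = split(text)
    # 2. Hand all chunks to the document-combining step (e.g. a map_reduce chain).
    return combine(chunks)


result = analyze_document("abcdefghij", lambda t: fixed_width_split(t, 4), "|".join)
print(result)  # abcd|efgh|ij
```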


In [5]:
summarize_document_chain.run(state_of_the_union)


" In this speech, the speaker addresses the American people and the world, discussing the recent aggression of Russia's Vladimir Putin in Ukraine and the US response. The speaker outlines the economic sanctions and other measures taken to hold Putin accountable, and announces the US Department of Justice's task force to go after the crimes of Russian oligarchs. The US is providing military, economic, and humanitarian assistance to Ukraine, and has mobilized forces to protect NATO countries. President Biden has passed the American Rescue Plan and the Bipartisan Infrastructure Law to help struggling Americans and create jobs. He is also proposing a plan to fight inflation and is announcing a crackdown on companies overcharging American businesses and consumers. He is proposing four steps to move forward safely from the COVID-19 pandemic, honoring the sacrifice of Officer Mora and Officer Rivera, and proposing a Unity Agenda for the Nation. He concludes by expressing his optimism for the 

In [7]:
from langchain.chains.question_answering import load_qa_chain
qa_chain = load_qa_chain(llm, chain_type="map_reduce")
qa_document_chain = AnalyzeDocumentChain(combine_docs_chain=qa_chain)
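The same map_reduce idea applies to question answering: the question is asked of every chunk separately, and the per-chunk answers are then combined into one final answer. In the sketch below, `answer_from_chunk` is a hypothetical keyword filter standing in for the per-chunk LLM call, and `map_reduce_qa` is a hypothetical name; the real chain uses prompts, not keyword matching.

```python
from typing import List


def answer_from_chunk(chunk: str, keyword: str) -> str:
    """Hypothetical per-chunk 'answer': the sentences that mention the keyword."""
    sentences = [s.strip() for s in chunk.split(".") if s.strip()]
    return ". ".join(s for s in sentences if keyword.lower() in s.lower())


def map_reduce_qa(chunks: List[str], keyword: str) -> str:
    # Map: ask the "question" of each chunk independently.
    partial_answers = [answer_from_chunk(c, keyword) for c in chunks]
    # Reduce: drop chunks that produced nothing and merge the rest.
    relevant = [a for a in partial_answers if a]
    return " / ".join(relevant) if relevant else "I don't know."


chunks = [
    "Justice Breyer served the country. Inflation is high.",
    "Nothing relevant here.",
    "I nominated a judge to continue Breyer's legacy.",
]
print(map_reduce_qa(chunks, "breyer"))
```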


In [8]:
qa_document_chain.run(input_document=state_of_the_union, question="what did the president say about justice breyer?")


" The president praised Justice Breyer for his service and dedication to the country, and mentioned that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to continue Justice Breyer's legacy of excellence."

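Chatting over documents with chat history
This tutorial shows how to use ConversationalRetrievalChain to build a chain that keeps track of the chat history during a conversation. The only difference from RetrievalQAChain is that this chain accepts the chat history as an input, which makes follow-up questions possible.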
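Chat history matters because a follow-up such as "Did he mention who she succeeded" cannot be looked up on its own: "he" and "she" only make sense given the previous turn. As I understand it, ConversationalRetrievalChain first asks the LLM to rewrite the follow-up plus the history into a standalone question, and only then retrieves documents. The sketch below only builds such a prompt string; the wording and the helper names `format_chat_history` and `build_condense_prompt` are approximations for illustration, not LangChain's exact template.

```python
from typing import List, Tuple


def format_chat_history(history: List[Tuple[str, str]]) -> str:
    # Render (question, answer) pairs as alternating Human/Assistant lines.
    lines = []
    for human, ai in history:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {ai}")
    return "\n".join(lines)


def build_condense_prompt(history: List[Tuple[str, str]], question: str) -> str:
    # Hypothetical prompt asking the LLM to turn the follow-up into a standalone question.
    return (
        "Given the following conversation and a follow up question, "
        "rephrase the follow up question to be a standalone question.\n\n"
        f"Chat History:\n{format_chat_history(history)}\n"
        f"Follow Up Input: {question}\n"
        "Standalone question:"
    )


history = [("What did the president say about Ketanji Brown Jackson", "He praised her.")]
print(build_condense_prompt(history, "Did he mention who she succeeded"))
```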


Storing the chat history in a memory object

In [12]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.document_loaders import TextLoader
loader = TextLoader("./state_of_the_union.txt", encoding="utf-8")
documents = loader.load()


In [13]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(documents)
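`CharacterTextSplitter` splits on a separator (`"\n\n"` by default) and then merges the pieces back into chunks of up to `chunk_size` characters. The function below is a simplified sketch of that idea, assuming no overlap (matching `chunk_overlap=0` above); `split_by_characters` is a hypothetical name and skips details of the real class such as overlap handling and length functions.

```python
from typing import List


def split_by_characters(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Simplified sketch: split on `separator`, then greedily merge pieces so each
    chunk stays within `chunk_size` characters where possible. No overlap."""
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either it still fits, or it starts a new chunk; an oversized
            # piece is kept whole rather than cut in the middle.
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


print(split_by_characters("aaaa\n\nbbbb\n\ncccccccccc\n\ndd", chunk_size=10))
```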

embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)
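`Chroma.from_documents` embeds every chunk and stores the vectors; the retriever later embeds the query and returns the most similar chunks. The toy version below illustrates that retrieval step under a loud assumption: word counts replace `OpenAIEmbeddings`, and `ToyVectorStore` is a hypothetical in-memory class, not Chroma.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Hypothetical toy embedding: lowercase word counts (stands in for OpenAIEmbeddings)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self, texts: List[str]):
        # Embed every chunk once, at indexing time.
        self.texts = texts
        self.vectors = [embed(t) for t in texts]

    def similarity_search(self, query: str, k: int = 2) -> List[str]:
        # Embed the query and return the k chunks with the highest cosine similarity.
        q = embed(query)
        ranked = sorted(range(len(self.texts)), key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return [self.texts[i] for i in ranked[:k]]


store = ToyVectorStore(["Ketanji Brown Jackson is a judge", "Inflation is rising", "Sanctions on Russia"])
print(store.similarity_search("who is Ketanji Brown Jackson", k=1))
```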

In [14]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
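`memory_key="chat_history"` is the name under which the stored history is handed to the chain, and `return_messages=True` asks for a list of messages rather than one formatted string. The hypothetical `ToyBufferMemory` below mimics only that behavior. Its method names `add_turn` and `as_variables` are invented for this sketch; the real `ConversationBufferMemory` exposes different methods that take dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class ToyBufferMemory:
    """Hypothetical stand-in for ConversationBufferMemory: keeps every turn verbatim."""
    memory_key: str = "chat_history"
    return_messages: bool = False
    turns: List[Tuple[str, str]] = field(default_factory=list)

    def add_turn(self, question: str, answer: str) -> None:
        # Append one (human, ai) exchange to the buffer.
        self.turns.append((question, answer))

    def as_variables(self) -> Dict[str, Union[str, List[Tuple[str, str]]]]:
        # Expose the history under `memory_key`, as role-tagged messages or as one string.
        if self.return_messages:
            messages: List[Tuple[str, str]] = []
            for human, ai in self.turns:
                messages.append(("human", human))
                messages.append(("ai", ai))
            return {self.memory_key: messages}
        text = "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)
        return {self.memory_key: text}


memory_sketch = ToyBufferMemory(return_messages=True)
memory_sketch.add_turn("Q1", "A1")
print(memory_sketch.as_variables())
```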

In [15]:
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), memory=memory)


In [16]:
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query})

In [17]:
result["answer"]


" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."

In [18]:
query = "Did he mention who she succeeded"
result = qa({"question": query})


In [19]:
result['answer']


' Justice Stephen Breyer'

Storing the chat history in a list and passing it in yourself

In [20]:
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever())


In [21]:
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history})


In [22]:
result["answer"]


" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."

The previous question and its answer have to be added to the history manually

In [23]:
chat_history = [(query, result["answer"])]
query = "Did he mention who she suceeded"
result = qa({"question": query, "chat_history": chat_history})
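The cell above rebuilds `chat_history` from the single previous turn. For a longer conversation you would normally keep appending each (question, answer) pair. A minimal sketch of that loop follows, assuming the chain is called as `qa({"question": ..., "chat_history": ...})` and returns a dict with an `"answer"` key, as in the cells above; `ask` and `fake_qa` are hypothetical helpers, and `fake_qa` only reports how many earlier turns it received.

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str]]


def ask(qa: Callable[[Dict], Dict], question: str, history: History) -> str:
    # Send the question together with a snapshot of everything asked so far ...
    result = qa({"question": question, "chat_history": list(history)})
    answer = result["answer"]
    # ... then record this turn so the next follow-up can refer back to it.
    history.append((question, answer))
    return answer


def fake_qa(inputs: Dict) -> Dict:
    """Hypothetical stand-in for the chain: reports how many earlier turns it received."""
    return {"answer": f"seen {len(inputs['chat_history'])} earlier turns"}


history: History = []
print(ask(fake_qa, "Q1", history))  # seen 0 earlier turns
print(ask(fake_qa, "Q2", history))  # seen 1 earlier turns
```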

In [24]:
print(chat_history)
print(result)

[('What did the president say about Ketanji Brown Jackson', " The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.")]
{'question': 'Did he mention who she suceeded', 'chat_history': [('What did the president say about Ketanji Brown Jackson', " The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Repu

In [29]:
from langchain.indexes import GraphIndexCreator
from langchain.llms import OpenAI
index_creator = GraphIndexCreator(llm=OpenAI(temperature=0))
with open("./state_of_the_union.txt", encoding='utf-8') as f:
    all_text = f.read()


In [32]:
text = "\n".join(all_text.split("  ")[55:58])
text

'\n\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \n\nLet’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret: community colleges. \n\nAnd let’s pass the PRO Act when a majority of workers want to form a union—they shouldn’t be stopped.\n\n\nWhen we invest in our workers, when we build the economy from the bottom up and the middle out together, we can do something we haven’t done in a long time: build a better America. \n\nFor more than two years, COVID-19 has impacted every decision in our lives and the life of the nation. \n\nAnd I know you’re tired, frustrated, and exhausted. \n\nBut I also know this. \n\nBecause of the progress we’ve made, because of your resilience and the tools we have, tonight I can say\n\nwe are moving forward safely, back to more normal routines.'

In [34]:
graph = index_creator.from_text(text)
graph.get_triples()


[('minimum wage', '$15 an hour', 'to be raised to'),
 ('Child Tax Credit',
  'so no one has to raise a family in poverty',
  'to be extended'),
 ('Pell Grants', 'to increase historic support of HBCUs', 'to be increased'),
 ('Jill',
  "America's best-kept secret: community colleges",
  'to be invested in'),
 ('PRO Act',
  'when a majority of workers want to form a union',
  'to be passed'),
 ('COVID-19',
  'every decision in our lives and the life of the nation',
  'has impacted')]
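Each triple printed above has the form (subject, object, relation). `GraphIndexCreator` keeps these in a graph object (NetworkX-backed, as far as I know). The sketch below shows how such triples become a queryable graph using only a plain dictionary; `build_graph` and `describe` are hypothetical helpers, not the library's API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, object, relation), the order printed above
Graph = Dict[str, List[Tuple[str, str]]]


def build_graph(triples: List[Triple]) -> Graph:
    # Map each subject to the (relation, object) edges leaving it.
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, obj, relation in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


def describe(graph: Graph, subject: str) -> List[str]:
    # Turn the outgoing edges of one node back into short statements.
    return [f"{subject} {relation} {obj}" for relation, obj in graph.get(subject, [])]


triples = [
    ("minimum wage", "$15 an hour", "to be raised to"),
    ("PRO Act", "when a majority of workers want to form a union", "to be passed"),
]
g = build_graph(triples)
print(describe(g, "minimum wage"))  # ['minimum wage to be raised to $15 an hour']
```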

In [36]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
loader = TextLoader("./state_of_the_union.txt", encoding="utf-8")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)


In [37]:
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=docsearch.as_retriever())
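`chain_type="stuff"` means all retrieved chunks are "stuffed" into a single prompt and the LLM is called once. The sketch below builds such a prompt; the wording and the name `stuff_prompt` are assumptions for illustration, not the exact template RetrievalQA uses.

```python
from typing import List


def stuff_prompt(docs: List[str], question: str) -> str:
    # "Stuff" every retrieved chunk into one context block ...
    context = "\n\n".join(docs)
    # ... and ask the question once, in a single LLM call.
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


print(stuff_prompt(["Chunk one.", "Chunk two."], "What did the president say?"))
```

Compared with map_reduce, this needs only one call, but every retrieved chunk must fit into the model's context window at once.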


In [38]:
query = "What did the president say about Ketanji Brown Jackson"
qa.run(query)


" The president said that she was one of the nation's top legal minds, a consensus builder, and had received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. He also said that she was a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers."