# QA and Chat over Documents

## Quickstart

In [48]:
# Set environment variables and get packages
# pip install langchain
# pip install openai
# pip install chromadb
# pip install bs4
# pip install tiktoken
# pip install python-dotenv
# export OPENAI_API_KEY="..."

In [49]:
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Now you can access the environment variable
import os
api_key = os.getenv('OPENAI_API_KEY')

In [50]:
from langchain.document_loaders import WebBaseLoader
from langchain.indexes import VectorstoreIndexCreator
# Document loader
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
# Index that wraps above steps
index = VectorstoreIndexCreator().from_loaders([loader])
# Question-answering
question = "What is Task Decomposition?"
index.query(question)

" Task Decomposition is a technique used to break down complex tasks into smaller and simpler steps. It is used to enhance model performance on complex tasks and to provide an interpretation of the model's thinking process."

## 1. Loading, Splitting, Storage

In [51]:
# Specify a Document loader
from langchain.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

In [52]:
# Split the Document into chunks for embedding and vector storage
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
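
To make the roles of chunk_size and chunk_overlap concrete, here is a minimal sketch of fixed-size character chunking. It is not the algorithm RecursiveCharacterTextSplitter uses (which also tries to split on separators); the split_text function is a hypothetical helper written only for illustration.

```python
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


chunks = split_text("a" * 1200, chunk_size=500, chunk_overlap=0)
print([len(c) for c in chunks])  # [500, 500, 200]

overlapping = split_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(overlapping[:2])  # ['abcd', 'cdef']
```

With an overlap of 0, as in the cell above, no text is repeated between chunks; a non-zero overlap repeats the tail of each chunk at the start of the next.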

In [53]:
# Embed and store the splits in a vector database (Chroma) 
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())

## 2. Retrieval

Retrieve relevant splits for any question using similarity_search.

In [54]:
question = "What are the approaches to Task Decomposition?"
docs = vectorstore.similarity_search(question)
len(docs)

4
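
As a rough picture of what a similarity search does, the sketch below ranks toy vectors by cosine similarity to a query vector and keeps the top k. The vectors, the top_k helper, and the choice of cosine similarity are all assumptions made for illustration; the metric a real vector store uses depends on its configuration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return the indices of the k vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-dimensional "embeddings"; real embeddings have many more dimensions.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
print(top_k([1.0, 0.0], toy_vectors, k=4))  # [0, 1, 3, 2]
```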

Vectorstores are commonly used for retrieval, but they are not the only option.

All retrievers implement some common methods, such as get_relevant_documents().

In [55]:
# pip install scikit-learn
from langchain.retrievers import SVMRetriever
svm_retriever = SVMRetriever.from_documents(all_splits,OpenAIEmbeddings())
docs_svm=svm_retriever.get_relevant_documents(question)
len(docs_svm)



4
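
To illustrate what a shared retriever interface buys you, here is a sketch of a retriever that uses no embeddings at all and ranks texts by word overlap with the query. KeywordRetriever is a hypothetical class, not part of LangChain; it only mimics the get_relevant_documents() method name and returns plain strings rather than Document objects.

```python
class KeywordRetriever:
    """Toy retriever: ranks texts by how many query words they contain."""

    def __init__(self, texts: list[str], k: int = 4):
        self.texts = list(texts)
        self.k = k

    def get_relevant_documents(self, query: str) -> list[str]:
        query_words = set(query.lower().split())
        # Score each text by word overlap; ties keep the original order.
        scored = [
            (len(query_words & set(text.lower().split())), i)
            for i, text in enumerate(self.texts)
        ]
        ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
        return [self.texts[i] for score, i in ranked[: self.k] if score > 0]


toy_texts = [
    "task decomposition with prompting",
    "memory in agents",
    "tree of thoughts extends chain of thought",
]
keyword_retriever = KeywordRetriever(toy_texts)
print(keyword_retriever.get_relevant_documents("approaches to task decomposition"))
# ['task decomposition with prompting']
```

Because the calling code only relies on the method name, a retriever like this could be swapped for a vector-store-backed one without changing the code that consumes the results.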

Several techniques improve on similarity_search:

- MultiQueryRetriever generates variants of the input question to improve retrieval.
- Max marginal relevance selects for relevance and diversity among the retrieved documents.
- Documents can be filtered during retrieval using metadata filters.
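
The max marginal relevance idea from the list above can be sketched as a greedy selection: each pick trades off similarity to the query against similarity to documents already chosen. The mmr function, the toy vectors, and the lambda_mult weighting below are illustrative assumptions, not the code any particular vector store runs.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k indices, balancing relevance against redundancy.

    lambda_mult=1.0 ranks purely by relevance; lower values favour diversity.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
# Index 1 is a near-duplicate of index 0; index 2 is less relevant but different.
vectors = [[1.0, 0.1], [1.0, 0.12], [0.8, -0.6]]
print(mmr(query, vectors, k=2, lambda_mult=0.5))  # [0, 2]
print(mmr(query, vectors, k=2, lambda_mult=1.0))  # [0, 1]
```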

In [56]:
# MultiQueryRetriever
import logging
from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever
logging.basicConfig()
logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)
retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectorstore.as_retriever(),
                                                  llm=ChatOpenAI(temperature=0))
unique_docs = retriever_from_llm.get_relevant_documents(query=question)
len(unique_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. How can Task Decomposition be approached?', '2. What are the different methods for Task Decomposition?', '3. What are the various approaches to decomposing tasks?']


3
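
One way to picture the multi-query step: run retrieval once per generated query, then keep each document only once. The sketch below assumes a hypothetical retrieve callable and toy results held in a dict; it does not call an LLM or reproduce MultiQueryRetriever's internals.

```python
from typing import Callable


def retrieve_union(queries: list[str], retrieve: Callable[[str], list[str]]) -> list[str]:
    """Run retrieve on every query and keep each document once, in first-seen order."""
    seen = set()
    unique = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique.append(doc)
    return unique


# Toy stand-in for a retriever: a fixed mapping from query to results.
toy_results = {
    "How can Task Decomposition be approached?": ["doc A", "doc B"],
    "What are the different methods for Task Decomposition?": ["doc B", "doc C"],
    "What are the various approaches to decomposing tasks?": ["doc A", "doc C"],
}

unique_toy_docs = retrieve_union(list(toy_results), lambda q: toy_results.get(q, []))
print(unique_toy_docs)  # ['doc A', 'doc B', 'doc C']
```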

## 3. QA

Distill the retrieved documents into an answer using an LLM (e.g., gpt-3.5-turbo) with a RetrievalQA chain.

In [57]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever())
qa_chain({"query": question})

{'query': 'What are the approaches to Task Decomposition?',
 'result': 'The approaches to task decomposition mentioned in the provided context are:\n\n1. LLM with simple prompting: This approach involves using a language model like LLM (Large Language Model) to prompt the user with simple instructions or questions to decompose a task. For example, asking "Steps for XYZ. 1." or "What are the subgoals for achieving XYZ?".\n\n2. Task-specific instructions: This approach involves providing task-specific instructions to guide the task decomposition process. For example, giving the instruction "Write a story outline." for decomposing the task of writing a novel.\n\n3. Human inputs: This approach involves involving human inputs in the task decomposition process. It could include collaborating with others, seeking advice or guidance from experts, or relying on human expertise to break down a task into subtasks.\n\nAdditionally, the context mentions the Tree of Thoughts approach, which extends 

### Customizing the prompt
The prompt in the RetrievalQA chain can be easily customized.

In [58]:
# Build prompt
from langchain.prompts import PromptTemplate
template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 
Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"],template=template,)

# Run chain
from langchain.chains import RetrievalQA
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(llm,
                                       retriever=vectorstore.as_retriever(),
                                       chain_type_kwargs={"prompt": QA_CHAIN_PROMPT})

result = qa_chain({"query": question})
result["result"]

'The approaches to task decomposition include using LLM with simple prompting, task-specific instructions, and human inputs. Thanks for asking!'
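
The template above has two placeholders, {context} and {question}, which must match the declared input_variables. As a small sketch using only the Python standard library (and an abbreviated copy of the template), the placeholders can be listed and filled like this; the template_variables helper is hypothetical and stands in for whatever validation the library performs.

```python
from string import Formatter

# Abbreviated version of the template used above.
short_template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def template_variables(template: str) -> list[str]:
    """Return the sorted placeholder names found in a format-style template."""
    return sorted({name for _, name, _, _ in Formatter().parse(template) if name})


print(template_variables(short_template))  # ['context', 'question']

filled = short_template.format(
    context="Task decomposition can be done with simple prompting.",
    question="What are the approaches to Task Decomposition?",
)
print(filled)
```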

### Returning source documents
The full set of retrieved documents used for answer distillation can be returned using return_source_documents=True.

In [59]:
from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever(),
                                       return_source_documents=True)
result = qa_chain({"query": question})
print(len(result['source_documents']))
result['source_documents'][0]

4


Document(page_content='Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.', metadata={'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log"})

### Citations
Answer citations can be returned using RetrievalQAWithSourcesChain.

In [60]:
from langchain.chains import RetrievalQAWithSourcesChain
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm,retriever=vectorstore.as_retriever())
result = qa_chain({"question": question})
result

{'question': 'What are the approaches to Task Decomposition?',
 'answer': 'The approaches to task decomposition include:\n1) Using LLM with simple prompting like "Steps for XYZ" or "What are the subgoals for achieving XYZ?"\n2) Using task-specific instructions, such as "Write a story outline" for writing a novel.\n3) Incorporating human inputs.\nSource: https://lilianweng.github.io/posts/2023-06-23-agent/',
 'sources': ''}

Note that the sources field is empty in this run; the source URL appears inside the answer text instead.
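
When the sources field comes back empty, as in the output above, one possible fallback is to read the source from the metadata of the retrieved documents (for example, those returned with return_source_documents=True). The sketch below uses plain dicts as stand-ins for Document objects, and collect_sources is a hypothetical helper.

```python
def collect_sources(docs: list[dict]) -> list[str]:
    """Gather the distinct 'source' values from document metadata, in order."""
    sources = []
    for doc in docs:
        source = doc.get("metadata", {}).get("source")
        if source and source not in sources:
            sources.append(source)
    return sources


# Plain dicts standing in for retrieved documents.
toy_docs = [
    {"page_content": "chunk 1", "metadata": {"source": "https://lilianweng.github.io/posts/2023-06-23-agent/"}},
    {"page_content": "chunk 2", "metadata": {"source": "https://lilianweng.github.io/posts/2023-06-23-agent/"}},
    {"page_content": "chunk 3", "metadata": {}},
]
print(collect_sources(toy_docs))
# ['https://lilianweng.github.io/posts/2023-06-23-agent/']
```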

### Customizing retrieved docs in the LLM prompt
Retrieved documents can be fed to an LLM for answer distillation in a few different ways.

The stuff, refine, map-reduce, and map-rerank chains offer different ways of passing documents to an LLM prompt.

stuff is commonly used because it simply "stuffs" all retrieved documents into the prompt.
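
The stuffing idea can be sketched in a few lines: concatenate the text of every retrieved document and place the result in a single prompt. The separator and the prompt wording below are assumptions chosen for illustration, and stuff_documents is a hypothetical helper rather than the chain's actual code.

```python
def stuff_documents(docs: list[str], question: str, separator: str = "\n\n") -> str:
    """Join all document texts and place them in one prompt."""
    context = separator.join(docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


stuffed_prompt = stuff_documents(
    ["Chain of thought asks the model to think step by step.",
     "Tree of Thoughts explores multiple reasoning paths."],
    "What are the approaches to Task Decomposition?",
)
print(stuffed_prompt)
```

Because everything goes into one prompt, this approach only works while the combined documents fit within the model's context window.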

load_qa_chain is an easy way to pass documents to an LLM using any of these approaches (see the chain_type argument).

In [61]:
from langchain.chains.question_answering import load_qa_chain
chain = load_qa_chain(llm, chain_type="stuff")
chain({"input_documents": unique_docs, "question": question},return_only_outputs=True)

{'output_text': 'The approaches to task decomposition mentioned in the given context are:\n\n1. Chain of thought (CoT): This approach involves instructing the language model (LLM) to "think step by step" and decompose complex tasks into smaller and simpler steps. It enhances model performance on complex tasks by utilizing more test-time computation.\n\n2. Tree of Thoughts: This approach extends CoT by exploring multiple reasoning possibilities at each step. It decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS or DFS, and each state is evaluated by a classifier or majority vote.\n\n3. Task-specific instructions: This approach involves providing specific instructions to the LLM based on the task at hand. For example, using the instruction "Write a story outline" for the task of writing a novel.\n\n4. Human inputs: Task decomposition can also be done with human inputs, where humans provide 

We can also pass the chain_type to RetrievalQA.

In [62]:
qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever(),
                                       chain_type="stuff")
result = qa_chain({"query": question})

## 4. Chat

To keep chat history, first specify a Memory buffer to track the conversation inputs and outputs.

In [63]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
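
To make the idea of a buffer concrete, the sketch below stores each turn as a (role, text) pair and shows one way the stored history could be combined with a follow-up question so that a pronoun can be resolved. BufferMemory, build_condense_prompt, and the prompt wording are hypothetical; the prompt that the library actually sends is not shown in this notebook.

```python
class BufferMemory:
    """Toy conversation buffer: keeps every turn in order."""

    def __init__(self):
        self.messages: list[tuple[str, str]] = []

    def save_turn(self, question: str, answer: str) -> None:
        self.messages.append(("human", question))
        self.messages.append(("ai", answer))

    def as_text(self) -> str:
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


def build_condense_prompt(memory: BufferMemory, follow_up: str) -> str:
    """Combine chat history with a follow-up so it can be rewritten standalone."""
    return (
        "Given the conversation below, rewrite the follow-up question "
        "as a standalone question.\n\n"
        f"Chat history:\n{memory.as_text()}\n\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )


toy_memory = BufferMemory()
toy_memory.save_turn(
    "What are some of the main ideas in self-reflection?",
    "Self-reflection lets agents improve iteratively by learning from mistakes.",
)
condense_prompt = build_condense_prompt(toy_memory, "How does the Reflexion paper handle it?")
print(condense_prompt)
```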

The ConversationalRetrievalChain uses the chat history stored in the Memory buffer.

In [64]:
from langchain.chains import ConversationalRetrievalChain
retriever=vectorstore.as_retriever()
chat = ConversationalRetrievalChain.from_llm(llm,retriever=retriever,memory=memory)

In [65]:
result = chat({"question": "What are some of the main ideas in self-reflection?"})
result['answer']

"Some of the main ideas in self-reflection include:\n\n1. Iterative improvement: Self-reflection allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes.\n\n2. Learning from mistakes: Self-reflection plays a crucial role in real-world tasks where trial and error are inevitable. By reflecting on failed trajectories and ideal reflections, agents can learn from their mistakes and make better decisions in the future.\n\n3. Contextual guidance: Self-reflection involves adding reflections into the agent's working memory, up to three, to be used as context for querying a Language Model (LLM). This contextual guidance helps the agent in making informed decisions and planning future actions.\n\nOverall, self-reflection enables agents to learn from their experiences, adapt their strategies, and continuously improve their performance."

The Memory buffer has the context needed to resolve "it" ("self-reflection") in the question below.

In [66]:
result = chat({"question": "How does the Reflexion paper handle it?"})
result['answer']

"The Reflexion paper addresses the main ideas in self-reflection by emphasizing its importance in allowing autonomous agents to improve iteratively. It highlights the role of self-reflection in refining past action decisions and correcting previous mistakes. The paper also introduces the concept of using two-shot examples, consisting of a failed trajectory and an ideal reflection, to create self-reflection in the agent's working memory. These reflections serve as context for querying the LLM (Language Model for Learning) and guiding future changes in the agent's plan."