# Chapter 6 Question and Answer

- [1. Introduction](#1-introduction)
- [2. Environment Configuration](#2-environment-configuration)
- [3. Loading vector database](#3-loading-vector-database)
- [4. Constructing a search-based question-answering chain](#4-constructing-a-search-based-question-answering-chain)
- [5. Deepen your exploration of the search-based question-answer chain](#5-deepen-your-exploration-of-the-search-based-question-answer-chain)
  - [5.1 Template-based retrieval question-answering chain](#51-template-based-retrieval-question-answering-chain)
  - [5.2 Retrieval-based question-answering chain based on MapReduce](#52-retrieval-based-question-answering-chain-based-on-mapreduce)
  - [5.3 Refine-based retrieval question-answering chain](#53-refine-based-retrieval-question-answering-chain)
- [6. Experiment: Status Recording](#6-experiment-status-recording)

## 1. Introduction

In the previous chapter, we discussed how to retrieve documents relevant to a given question. The next step is to pass these documents, together with the original question, to the language model and ask it to answer the question. In this chapter, we go through this process in detail and look at several different ways to accomplish it.

Having completed storage and retrieval, we now have the relevant document chunks and need to pass them to the language model to get an answer. The general flow is as follows: a question comes in, we look up the relevant documents, and we then pass these chunks, together with the system prompt and the question, to the language model to get the answer.

By default, we pass all the document chunks into the same context window, that is, into a single language model call. There are other approaches, each with its own advantages and disadvantages. Most of their advantages show up when there are so many documents that they cannot all fit into one context window. MapReduce, Refine, and MapRerank are three methods designed to work around this context window limit. This chapter briefly introduces MapReduce and Refine.
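
The default flow described above (retrieve, put everything into one prompt, call the model once) can be sketched without any library. This is only a minimal illustration: `retrieve`, `build_stuff_prompt`, and `fake_llm` are hypothetical stand-ins rather than LangChain APIs, retrieval is a naive word-overlap ranking instead of a vector search, and the "model" merely reports the size of the prompt it received.

```python
# Minimal sketch of the default "stuff" flow: retrieve, build one prompt, call the model once.
# All function names here are hypothetical stand-ins, not LangChain APIs.

def retrieve(question, chunks, k=2):
    """Rank chunks by naive word overlap with the question (stand-in for vector search)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(q_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_stuff_prompt(question, docs):
    """Put every retrieved chunk into a single prompt."""
    context = "\n\n".join(docs)
    return f"Use the context to answer the question.\n\n{context}\n\nQuestion: {question}\nAnswer:"

def fake_llm(prompt):
    """Stand-in for a language model call; it only reports the prompt size."""
    return f"(answer based on a prompt of {len(prompt)} characters)"

chunks = [
    "This class covers machine learning.",
    "Discussion sections review statistics and linear algebra.",
    "Homework is due on Fridays.",
]
question = "What does this class cover?"

docs = retrieve(question, chunks, k=2)       # step 1: find relevant chunks
prompt = build_stuff_prompt(question, docs)  # step 2: one prompt with all chunks
answer = fake_llm(prompt)                    # step 3: a single model call
print(answer)
```

Because every chunk goes into one prompt, the prompt grows with the number of retrieved chunks, which is exactly where the context window limit comes in.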

## 2. Environment Configuration

The method of configuring the environment is the same as before, so I will not repeat it here.

In [1]:
import os
import openai
import sys
sys.path.append('../..')

from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']


Because the GPT-3.5 API is updated after September 2, 2023, the code below checks the current date to decide which model name to use.

In [2]:
import datetime
current_date = datetime.datetime.now().date()
if current_date < datetime.date(2023, 9, 2):
    llm_name = "gpt-3.5-turbo-0301"
else:
    llm_name = "gpt-3.5-turbo"
print(llm_name)

gpt-3.5-turbo-0301
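
Since the cell above depends on the day it is run, the printed model name will differ on later dates. One way to make the choice explicit is to wrap it in a small function that takes the date as an argument. `choose_llm_name` is a hypothetical helper introduced only for illustration; the cutoff date and model names are the ones used in the cell above.

```python
import datetime

def choose_llm_name(today):
    """Return the model name this chapter uses for a given date (hypothetical helper)."""
    cutoff = datetime.date(2023, 9, 2)
    return "gpt-3.5-turbo-0301" if today < cutoff else "gpt-3.5-turbo"

print(choose_llm_name(datetime.date(2023, 9, 1)))  # gpt-3.5-turbo-0301
print(choose_llm_name(datetime.date(2023, 9, 2)))  # gpt-3.5-turbo
```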


## 3. Loading vector database

In [5]:
# Load the vector database that has been persisted before
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
persist_directory = 'docs/chroma/cs229_lectures/'
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)

In [6]:
# You can see that it contains the 209 documents we split before
print(vectordb._collection.count())

209


We can test the vector search with a question. The following code searches the vector database by similarity and returns the k most similar documents.

In [34]:
question = "What are major topics for this class?"
docs = vectordb.similarity_search(question,k=3)
len(docs)

3

In [35]:
# Chinese version of the question: "What is the main topic of this class?"
question = "这节课的主要话题是什么"
docs = vectordb.similarity_search(question,k=3)
len(docs)

3
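
Conceptually, a similarity search compares the embedded question with the stored document vectors and returns the k closest ones. The sketch below illustrates the idea with hand-written three-dimensional vectors and cosine similarity. The vectors, document ids, and helper functions are made up for illustration; real embeddings have far more dimensions, and Chroma's actual index and distance metric may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Toy "embeddings" (made-up values, for illustration only)
toy_vectors = {
    "syllabus": [0.9, 0.1, 0.0],
    "linear_algebra": [0.1, 0.9, 0.1],
    "logistics": [0.7, 0.2, 0.3],
    "matlab": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.1, 0.1]

nearest = top_k(query_vector, toy_vectors, k=3)
print(nearest)
```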

## 4. Constructing a search-based question-answering chain

With LangChain, we can build a retrieval-based question-answering chain that uses GPT-3.5, that is, a chain that answers questions with the help of a retrieval step. We create it by passing in a language model and the vector database wrapped as a retriever. We can then call the chain with the question as the query and get an answer back.

In [9]:
# Use GPT-3.5 through ChatOpenAI, with the temperature set to 0
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name=llm_name, temperature=0)

In [10]:
# Import the retrieval-based question-answering chain
from langchain.chains import RetrievalQA

In [11]:
# Declare a retrieval-based question-answering chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

In [12]:
# Call the chain with the question as the query
question = "What are major topics for this class?"
result = qa_chain({"query": question})

In [13]:
result["result"]

'The major topic for this class is machine learning. Additionally, there may be some discussion on statistics and algebra as a refresher, and later in the quarter, there may be some discussion on extensions for the material covered in the main lectures.'

In [32]:
# Call the chain with the Chinese version of the question:
# "What is the main topic of this class?"
question = "这节课的主要话题是什么"
result = qa_chain({"query": question})

In [33]:
print(result["result"])

从这些上下文来看，这节课的主要话题包括课程信息、在线资源和线性代数。


## 5. Deepen your exploration of the search-based question-answer chain

With the above code, we can implement a simple retrieval-based question-answering chain. Next, let’s dive into the details and see what LangChain does in this retrieval-based question-answering chain.

### 5.1 Template-based retrieval question-answering chain

We first define a prompt template. It contains instructions on how to use the context snippets that follow, and placeholders for the context and question variables.

In [14]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)


In [36]:
# Chinese Version
from langchain.prompts import PromptTemplate

# Build prompt
template = """使用以下上下文片段来回答最后的问题。如果你不知道答案，只需说不知道，不要试图编造答案。答案最多使用三个句子。尽量简明扼要地回答。在回答的最后一定要说"感谢您的提问！"
{context}
问题：{question}
有用的回答："""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
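
To see what the placeholders do, the sketch below fills a shortened version of the English template by hand. Plain `str.format` is used as a stand-in for the prompt template's own formatting, the two context chunks are made-up examples, and joining the chunks with a blank line is an assumption about how they are combined.

```python
# Sketch: how retrieved chunks and the question fill the template placeholders.
# Plain str.format is a stand-in for the prompt template's own formatting.
template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""

# Made-up chunks, standing in for the documents returned by the retriever
retrieved_chunks = [
    "Familiarity with basic probability and statistics is assumed.",
    "Familiarity with basic linear algebra is assumed.",
]

filled_prompt = template.format(
    context="\n\n".join(retrieved_chunks),  # assumed separator between chunks
    question="Is probability a class topic?",
)
print(filled_prompt)
```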


In [37]:
# Build the chain with the custom prompt and return the source documents
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

In [38]:
question = "Is probability a class topic?"

In [39]:
result = qa_chain({"query": question})

In [40]:
result["result"]

'Yes, probability is a class topic and the instructor assumes familiarity with basic probability and statistics.'

In [47]:
# Chinese version: "Is machine learning a topic of one of the lectures?"
question = "机器学习是其中一节的话题吗"

In [48]:
result = qa_chain({"query": question})

In [49]:
result["result"]

'是的，机器学习是其中一节的话题。感谢您的提问！'

In [50]:
result["source_documents"][0]

Document(page_content="So in this class, we've tried to convey to you a broad set of principl es and tools that will \nbe useful for doing many, many things. And ev ery time I teach this class, I can actually \nvery confidently say that af ter December, no matter what yo u're going to do after this \nDecember when you've sort of completed this  class, you'll find the things you learn in \nthis class very useful, and these things will be useful pretty much no matter what you end \nup doing later in your life.  \nSo I have more logistics to go over later, but let's say a few more words about machine \nlearning. I feel that machine learning grew out of  early work in AI, early work in artificial \nintelligence. And over the last — I wanna say last 15 or last 20 years or so, it's been viewed as a sort of growing new capability for computers. And in particular, it turns out \nthat there are many programs or there are many applications that you can't program by \nhand.  \nFor example, if you

This approach works well because it involves only a single call to the language model. Its limitation is that, if there are too many documents, they may not all fit into the context window. For that case we can use another technique for answering questions over documents: MapReduce.

### 5.2 Retrieval-based question-answering chain based on MapReduce

In the MapReduce technique, each document is first sent to the language model on its own to obtain an answer based on that document alone. These individual answers are then combined into a final answer through one more call to the language model. Although this involves more calls to the language model, it has the advantage of being able to process any number of documents.

In [51]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)

In [52]:
question = "Is probability a class topic?"
result = qa_chain_mr({"query": question})

In [53]:
result["result"]

'It is not clear from the given portion of the document whether probability is a class topic or not. The text only mentions that familiarity with basic probability and statistics is assumed as a prerequisite for the class.'

In [55]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)
# Chinese version: "Is probability theory a topic of one of the lectures?"
question = "概率论是其中一节的话题吗"
result = qa_chain_mr({"query": question})
result["result"]

'根据给出的文件部分，没有提到概率论。'

Running the previous question through this chain reveals two problems with the approach. First, it is much slower. Second, the result is actually worse: the chain reports that it is not clear from the given portion of the document whether probability is a class topic. This is probably because each document is answered individually, so if the information is spread across two documents, it is never seen together in the same context.
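
The pattern behind both problems can be sketched with a stand-in model that records every prompt it receives. This is a simplified illustration, not LangChain's implementation: `fake_llm` and `map_reduce_answer` are hypothetical names, and the prompts are much shorter than the real ones.

```python
# Sketch of the map-reduce pattern with a stand-in model that records its prompts.
prompts_seen = []

def fake_llm(prompt):
    """Hypothetical stand-in for a model call; returns a numbered placeholder."""
    prompts_seen.append(prompt)
    return f"response #{len(prompts_seen)}"

def map_reduce_answer(question, docs):
    # Map step: one independent call per document, each seeing only that document.
    partial_answers = [
        fake_llm(f"Context: {doc}\nQuestion: {question}") for doc in docs
    ]
    # Reduce step: one final call that sees only the partial answers.
    combined = "\n".join(partial_answers)
    return fake_llm(f"Partial answers:\n{combined}\nQuestion: {question}")

docs = ["chunk A", "chunk B", "chunk C", "chunk D"]
final_answer = map_reduce_answer("Is probability a class topic?", docs)

print(len(prompts_seen))  # 5: four map calls plus one reduce call
```

No map prompt ever contains more than one chunk, so a fact that only emerges from two chunks together cannot be recovered in the map step, and the extra calls explain the slowdown.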

In [None]:
#import os
#os.environ["LANGCHAIN_TRACING_V2"] = "true"
#os.environ["LANGCHAIN_ENDPOINT"] = "https://api.langchain.plus"
#os.environ["LANGCHAIN_API_KEY"] = "..." # replace dots with your api key

We can set the environment variables above to enable tracing and then explore the details of the MapReduce document chain. For example, the demonstration above actually involved four separate calls to the language model, one for each retrieved document. Once every document has been processed, the individual answers are combined in a final chain, the StuffDocumentsChain, which stuffs all of them into one final call.

### 5.3 Refine-based retrieval question-answering chain

We can similarly set the chain type to Refine. A Refine document chain resembles a MapReduce chain in that the LLM is called once for each document. The difference is that each prompt sent to the LLM combines the previous response with the new document and asks for a refined response. The idea is similar to an RNN: context is carried forward from step to step, which addresses the problem of information being spread across different documents.

In [56]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="refine"
)
question = "Is probability a class topic?"
result = qa_chain_mr({"query": question})
result["result"]

'Based on the additional context provided, probability is assumed to be a prerequisite and not a main topic of the class. The instructor assumes that students are familiar with basic probability and statistics, including random variables, expectation, variance, and basic linear algebra. The class will not be very programming-intensive, but some programming will be done in MATLAB or Octave. The instructor will provide a refresher course on the prerequisites in some of the discussion sections. The class also assumes familiarity with basic linear algebra, including matrices, vectors, matrix multiplication, and matrix inverse. Most undergraduate linear algebra courses, such as Math 51, 103, Math 113, or CS205 at Stanford, are more than enough. The instructor will also review the prerequisites in some of the discussion sections.'

In [57]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="refine"
)
# Chinese version: "Is probability theory a topic of one of the lectures?"
question = "概率论是其中一节的话题吗"
result = qa_chain_mr({"query": question})
result["result"]

'Based on the additional context provided, the instructor mentions that they will cover statistics and algebra in the discussion sections as a refresher, and will also use the discussion sections to go over extensions of the material taught in the main lectures. However, there is no explicit mention of probability theory being covered in the course. Therefore, the original answer still stands.'

You will notice that this result is better than the one from the MapReduce chain. This is because a Refine chain combines information step by step, which carries more information forward than a MapReduce chain does.
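
The step-by-step passing of information can be sketched as a simple loop in which each call receives the previous answer plus one new document. As before, this is a simplified illustration under assumptions: `fake_llm` and `refine_answer` are hypothetical names and the prompts are not LangChain's actual refine prompts.

```python
# Sketch of the refine pattern: each call sees the previous answer and one new document.
prompts_seen = []

def fake_llm(prompt):
    """Hypothetical stand-in for a model call; returns a versioned placeholder."""
    prompts_seen.append(prompt)
    return f"answer v{len(prompts_seen)}"

def refine_answer(question, docs):
    # First document: produce an initial answer.
    answer = fake_llm(f"Context: {docs[0]}\nQuestion: {question}")
    # Each later document: ask the model to refine the existing answer.
    for doc in docs[1:]:
        answer = fake_llm(
            f"Existing answer: {answer}\nNew context: {doc}\n"
            f"Refine the existing answer to the question: {question}"
        )
    return answer

docs = ["chunk A", "chunk B", "chunk C"]
final_answer = refine_answer("Is probability a class topic?", docs)

print(len(prompts_seen))  # 3: one call per document
print(final_answer)       # answer v3
```

Unlike the map step of MapReduce, every prompt after the first carries the earlier answer forward, so information from earlier documents can influence how later ones are read. The calls are sequential, however, so they cannot run in parallel.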

## 6. Experiment: Status Recording

Let's do an experiment here.

We create a QA chain with the default chain type, stuff. First we ask: is probability a class topic? The chain answers that familiarity with basic probability is assumed. Then we ask a follow-up: why are those prerequisites needed? The answer says that the course assumes basic knowledge of computer science and basic computer skills and principles, which has nothing to do with the earlier question about probability.

In [58]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

In [59]:
question = "Is probability a class topic?"
result = qa_chain({"query": question})
result["result"]

'Yes, probability is a topic in this class. The speaker assumes that students have familiarity with basic probability and statistics, and mentions that most undergraduate statistics classes will be more than enough preparation for this class.'

In [60]:
question = "why are those prerequisites needed?"
result = qa_chain({"query": question})
result["result"]

'The prerequisites are needed because in this class, the instructor assumes that all students have a basic knowledge of computer science and knowledge of basic computer skills and principles. This includes understanding of big-O notation and other fundamental concepts. Without this basic knowledge, it may be difficult to understand the material covered in the class.'

In [62]:
# Chinese version: "Is probability theory part of this class?"
question = "概率论是这节课的一个内容吗"
result = qa_chain({"query": question})
result["result"]

'是的，作者在文中提到了这门课程需要学生具备基本的概率论和统计学知识。'

In [63]:
# Chinese version of the follow-up: "Why is this knowledge needed?"
question = "为什么需要具备这些知识"
result = qa_chain({"query": question})
result["result"]

'在这段上下文中，作者提到这些知识是这门课程的先决条件，因为这门课程涉及到机器学习的基本概念和算法，需要学生具备计算机科学和基本计算机技能和原理的基本知识。如果学生没有这些基础知识，可能会很难理解和应用机器学习算法。因此，学生需要具备这些知识才能更好地学习和应用机器学习。'

Basically, the chain we are using has no concept of state: it does not remember previous questions or previous answers. To give it that ability, we need to introduce memory, which is what we will discuss in the next section.
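
The effect of missing state can be illustrated by comparing the prompt a stateless chain would build for the follow-up question with one that also includes the earlier exchange. This sketch only illustrates the problem. It is not how LangChain's memory works, and `stateless_prompt`, `prompt_with_history`, and the recorded answer are hypothetical.

```python
def stateless_prompt(question):
    """Each call builds its prompt from the current question only."""
    return f"Question: {question}"

def prompt_with_history(question, history):
    """Hypothetical variant that prepends earlier turns to the prompt."""
    past = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    return f"{past}\nQuestion: {question}" if past else f"Question: {question}"

# Made-up record of the first exchange
history = [("Is probability a class topic?", "Basic probability is assumed as a prerequisite.")]
follow_up = "Why are those prerequisites needed?"

without_state = stateless_prompt(follow_up)
with_state = prompt_with_history(follow_up, history)

print("probability" in without_state)  # False: the model cannot tell what "those" refers to
print("probability" in with_state)     # True: the earlier exchange is part of the prompt
```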