# QA using a Retriever
Here we build question answering (QA) step by step.  
https://python.langchain.com/docs/use_cases/question_answering/

**Load**: Specify a DocumentLoader to load in your unstructured data as Documents. A Document is a piece of text (the page_content) and associated metadata.  
There are many [Loaders](https://integrations.langchain.com/)  
`documents = TextLoader(....).load()` 

**Split**: Split the Document into chunks for embedding and vector storage.  
`texts = RecursiveCharacterTextSplitter(...).split_documents(documents)`
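
To make the Split step concrete, here is a simplified, hypothetical splitter in plain Python. It is not `RecursiveCharacterTextSplitter`, which first tries separators such as paragraphs and sentences; this sketch only cuts fixed-size character windows that overlap by `chunk_overlap` characters.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so text near a
    boundary appears in both neighbouring chunks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```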

**Store**: To look up our document splits later, we first need to store them somewhere we can search.  
   The most common way to do this is to embed the contents of each document then store the embedding and document in a vector store,  
   with the embedding being used to index the document.  
   `vectorstore = FAISS.from_documents(texts, embeddings)`
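
The Store step can be illustrated with a toy, in-memory vector store. This is a hypothetical sketch: a word-count vector stands in for the sentence-transformer embedding, and a linear scan with cosine similarity stands in for the index that FAISS, Chroma or DeepLake would build. `embed`, `cosine` and `ToyVectorStore` are illustrative names, not LangChain APIs.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (the notebook uses a sentence transformer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self, texts: list[str]):
        # keep each text together with its embedding; the embedding is the lookup key
        self.entries = [(embed(t), t) for t in texts]

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda entry: cosine(q, entry[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = ToyVectorStore([
    "The president nominated Ketanji Brown Jackson to the Supreme Court.",
    "We installed new scanners at the border.",
    "Inflation and prices were discussed.",
])
print(store.similarity_search("What about Ketanji Brown Jackson?", k=1))
# ['The president nominated Ketanji Brown Jackson to the Supreme Court.']
```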
   

### Step 1: retrieve data (VectorStore)
https://python.langchain.com/docs/use_cases/question_answering/how_to/vector_db_qa

In this example we check what the **VectorStore** returns for a query.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
import os
from mylib.MyModelUtils import MyModelUtils
from langchain.vectorstores import Chroma, FAISS, DeepLake

# 1. Load data
def get_documents():
    from langchain.document_loaders import TextLoader
    loader = TextLoader(os.path.abspath("./mydata/state_of_the_union.txt"), encoding="utf-8")
    documents = loader.load()
    return documents


# 2. Split data
def get_text_splits(documents):
    # chunk_size and chunk_overlap are counted in characters (len) by default
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=950, chunk_overlap=50)
    # text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    all_text_splits = text_splitter.split_documents(documents)
    return all_text_splits

# Embedding model (local copy of sentence-transformers/all-mpnet-base-v2)
def get_sentence_transformer():
    model_name = os.path.abspath("./models/sentence-transformers/all-mpnet-base-v2")
    model_kwargs = {"device": MyModelUtils.device()}
    embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)
    return embeddings

# 3. Store data
# store_type: "chroma", "deeplake" (default) or "faiss"
def get_vectorstore(all_text_splits, sentence_transformer, store_type="deeplake"):
    # embed the text splits and store them in the vectorstore
    if store_type == "chroma":
        vectorstore = Chroma.from_documents(all_text_splits, sentence_transformer)
    elif store_type == "deeplake":
        vectorstore = DeepLake.from_documents(all_text_splits, sentence_transformer,
                                              dataset_path="./.my_deeplake/", overwrite=True)
    elif store_type == "faiss":
        vectorstore = FAISS.from_documents(all_text_splits, sentence_transformer)
    else:
        raise ValueError(f"Unknown vectorstore type {store_type!r} (expected chroma, deeplake or faiss)")
    return vectorstore


# Shows what we get from the vectorstore, i.e. what will be sent to the LLM
def test_vectorstore(vectorstore):
    query = "What did the president say about Ketanji Brown Jackson?"
    docs = vectorstore.similarity_search(query)  # returns the top 4 chunks by default
    # __OR__ search with a precomputed query embedding
    # embedding_vector = sentence_transformer.embed_query(query)
    # docs = vectorstore.similarity_search_by_vector(embedding_vector)
    # __OR__ also return the similarity scores
    # docs_and_scores = vectorstore.similarity_search_with_score(query)
    print(f"What will we get from vectorstore with: \"{query}\"")
    print("===============================")
    print(docs[0].page_content)
    print("===============================")

documents = get_documents()  # 1. Load the document
all_text_splits = get_text_splits(documents)  # 2. Split the document
sentence_transformer = get_sentence_transformer()  # sentence transformer used for embedding

# This is what we want
vectorstore = get_vectorstore(all_text_splits, sentence_transformer)  # 3. Embed and store the splits

test_vectorstore(vectorstore)  # test it
```



```text
creating embeddings: 100% 5/5 [00:02<00:00,  1.97it/s]
100% 44/44 [00:00<00:00, 574.97it/s]


Dataset(path='./.my_deeplake/', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
   text       text      (44, 1)     str     None   
 metadata     json      (44, 1)     str     None   
 embedding  embedding  (44, 768)  float32   None   
    id        text      (44, 1)     str     None   
What will we get from vectorestore with: "What did the president say about Ketanji Brown Jackson?"
A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. 

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. 

We can do both. At our border, we’ve installed new technology like cu
```



```python
## We can also save and load a FAISS index (only if the vectorstore was built as "faiss")
# vectorstore.save_local("faiss_index")
# vectorstore = FAISS.load_local("faiss_index", sentence_transformer)
# test_vectorstore(vectorstore)

## We can also merge two FAISS vectorstores
# vectorstore1.merge_from(vectorstore2)
```

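### Step 2: Chain the retrieved documents into QA
`load_qa_chain` is the core function behind all the QA chains: both `RetrievalQA` and `ConversationalRetrievalChain` are built on it.  
Below we use `StuffDocumentsChain`, one of the four chain types of `load_qa_chain` (stuff, map_reduce, refine, map_rerank).
The only difference between them is how they combine the source documents; this is explained later.

For example, `StuffDocumentsChain` concatenates all retrieved documents into one string, `"\n\n".join(ALL_DATA)`, and puts it into the prompt.

The model then returns the concrete answer.

**Note: the LLM's `context_length` must be larger than the prompt (including all stuffed document splits) plus `max_new_tokens`, all counted in tokens.**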
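
A rough way to check the note above before running the chain. The helper is hypothetical, and it assumes about 4 characters per token for English text, which is only a rule of thumb; the exact count depends on the model's tokenizer. The numbers are the ones used in this notebook (950-character chunks, 4 retrieved chunks by default, `max_new_tokens=256`, `context_length=1000+256`) plus an assumed ~250 characters for the prompt template and question.

```python
def estimate_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate from a character count (rule of thumb, not a tokenizer)."""
    return int(num_chars / chars_per_token) + 1


def fits_in_context(chunk_chars: int, k: int, template_chars: int,
                    max_new_tokens: int, context_length: int) -> bool:
    """True if k stuffed chunks + prompt template + generated tokens fit in the context window."""
    prompt_tokens = estimate_tokens(chunk_chars * k + template_chars)
    return prompt_tokens + max_new_tokens <= context_length


for k in (1, 4):
    print(k, fits_in_context(950, k, 250, max_new_tokens=256, context_length=1000 + 256))
# 1 True
# 4 False
```

Under this crude estimate, four full-size chunks would slightly exceed 1256 tokens, so a larger `context_length` (Llama 2 was trained with a 4096-token context) or a smaller `k` may be safer. Real chunks are often shorter than `chunk_size`, so measure with the actual tokenizer before changing anything.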


### First, create an LLM

```python
from langchain.llms import CTransformers
import os

model_id = os.path.abspath('./models/Llama-2-7b-Chat-GGUF')

# context_length (in tokens) must cover the prompt, including the stuffed chunks
# (chunk_size=950 characters each in the text splitter), plus max_new_tokens
config = {'max_new_tokens': 256, 'repetition_penalty': 1.1, 'context_length': 1000 + 256}
# https://api.python.langchain.com/en/latest/llms/langchain.llms.ctransformers.CTransformers.html
cTransformers_llm = CTransformers(model=model_id, model_file="llama-2-7b-chat.Q4_K_M.gguf", config=config)
```


### load_qa_chain 

https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/retrieval_qa/prompt.py

`load_qa_chain` returns a `BaseCombineDocumentsChain`, such as a `StuffDocumentsChain`.

```python
from langchain.chains.question_answering import load_qa_chain  # returns a StuffDocumentsChain (a BaseCombineDocumentsChain)
from timeit import timeit

query = "What did the president say about Ketanji Brown Jackson?"
docs = vectorstore.similarity_search(query)  # same as the earlier example: shows what we retrieved from the vectorstore
print("========Document Content=======")
print(docs)
print("========Generate from Document=======")
doc_chain = load_qa_chain(llm=cTransformers_llm, chain_type="stuff")  # a StuffDocumentsChain: merges the documents into one string and puts it into the prompt
# timeit runs the lambda once (printing the answer) and returns the elapsed seconds
print(timeit(lambda: print(doc_chain.run(input_documents=docs, question=query)), number=1))
```


```text
[Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.  \n\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.  \n\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '/app/project/mydata/state_o
```

### How load_qa_chain() works

This may look like magic, but on closer inspection it raises some questions.  

How does the chain know which information to give the Language Model (LLM), and how does it pass that information to the LLM?

Why are the keys named `question` and `input_documents`?

This is the main job of `load_qa_chain`.  
Note that the chain does not search for documents itself: it takes the documents you pass in `input_documents` (here, the result of `vectorstore.similarity_search`), combines them, and passes the result on to the next step.

Taking `load_qa_chain(chain_type="stuff")` as an example, here is the [docstring of StuffDocumentsChain](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/combine_documents/stuff.py):  






```python
"""
    This chain takes a list of documents and first combines them into a single string.
    It does this by formatting each document into a string with the `document_prompt`
    and then joining them together with `document_separator`. It then adds that new
    string to the inputs with the variable name set by `document_variable_name`.
    Those inputs are then passed to the `llm_chain`.
"""
```
In other words, when you call `load_qa_chain(....).run(input_documents=docs, question=query)`,  
the `StuffDocumentsChain` takes the text content (`page_content`) of every document and joins them into a single string separated by `"\n\n"`.  
This concatenated text is then placed into a prompt and passed to the LLM, using the format below:


https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/question_answering/stuff_prompt.py  
```python
# stuff
from langchain.prompts import PromptTemplate

prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
```

As a result, the LLM receives this input:
```python
"""Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

DATA1\n\nDATA2\n\nDATA3\n\nDATA4..........\n\nDATAX

Question: What did the president say about Ketanji Brown Jackson?
Helpful Answer:"""
```
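
The stuff step can be reproduced in a few lines of plain Python. This is a simplified sketch of what `StuffDocumentsChain` does with its defaults (each document formatted as just its `page_content`, separator `"\n\n"`), not LangChain's code; `Doc` and `stuff_prompt` are hypothetical names.

```python
from dataclasses import dataclass, field

PROMPT_TEMPLATE = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""


@dataclass
class Doc:
    page_content: str
    metadata: dict = field(default_factory=dict)


def stuff_prompt(docs: list[Doc], question: str, separator: str = "\n\n") -> str:
    # 1. format each document (here simply its page_content)
    # 2. join all of them with the separator
    context = separator.join(doc.page_content for doc in docs)
    # 3. fill the {context} and {question} slots of the QA prompt
    return PROMPT_TEMPLATE.format(context=context, question=question)


docs = [Doc("DATA1"), Doc("DATA2"), Doc("DATA3")]
print(stuff_prompt(docs, "What did the president say about Ketanji Brown Jackson?"))
```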

That is why the input key is called `question`: it fills the `{question}` placeholder in the prompt.

-------------------------------
The other key, `input_documents`, is defined in `BaseCombineDocumentsChain`:  
https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/combine_documents/base.py

```python
input_key: str = "input_documents"  #: :meta private:
output_key: str = "output_text"  #: :meta private:
```
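
Chains exchange plain dictionaries, and `input_key`/`output_key` just name the entries they read and write; `run(input_documents=..., question=...)` is essentially a shortcut that builds that dictionary and returns the single output value. The hypothetical class below mimics this contract for a stuff-style chain with a fake LLM. It is a sketch of the idea, not `BaseCombineDocumentsChain` itself.

```python
from typing import Callable


class MiniStuffChain:
    """Hypothetical stand-in for a combine-documents chain: dict in, dict out."""

    input_key = "input_documents"  # same key names as in BaseCombineDocumentsChain
    output_key = "output_text"

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm  # any str -> str function stands in for the LLM

    def __call__(self, inputs: dict) -> dict:
        context = "\n\n".join(inputs[self.input_key])  # documents as plain strings here
        prompt = f"{context}\n\nQuestion: {inputs['question']}\nHelpful Answer:"
        # return the inputs plus the new output entry
        return {**inputs, self.output_key: self.llm(prompt)}

    def run(self, **kwargs) -> str:
        # keyword arguments become the input dict; only the output value is returned
        return self(kwargs)[self.output_key]


def fake_llm(prompt: str) -> str:
    return f"(answer based on a {len(prompt)}-character prompt)"


chain = MiniStuffChain(fake_llm)
result = chain({"input_documents": ["DATA1", "DATA2"], "question": "Who was nominated?"})
print(sorted(result))  # ['input_documents', 'output_text', 'question']
print(chain.run(input_documents=["DATA1"], question="Who was nominated?"))
```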

The whole QA workflow is as follows:    


| Vector Store | load_qa_chain | Llama2 |
|--------------|---------------|--------|
| DATA -> Loader -> Split -> Transformer + Embedding -> VectorStore -> Array(Document(),Document(), ....) | -> StuffDocumentsChain -> String -> Prompt | -> LLM |    


-------------------------------

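[chain_type](https://liaokong.gitbook.io/llm-kai-fa-jiao-cheng/) explained:  
- stuff (default): the simplest and bluntest approach; all documents are passed to the LLM at once.
- map_reduce: each document is first processed on its own (the map step), then all processed results are combined into the final output (the reduce step).
  
- refine: first summarizes the first document, then sends that summary together with the second document to the LLM to summarize again, and so on. The advantage is that each later document is summarized together with the result of the earlier ones, which adds context and makes the summary more coherent.

- map_rerank: usually used in question-answering chains rather than summarization chains; it is a way of searching for the best-matching answer. Given a question, the LLM answers it from each document separately and gives each answer a score for how well that document answers the question; the answer with the highest score is returned as the final answer (a String).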
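
The four strategies differ only in how they route documents through the LLM. The plain-Python sketch below shows that control flow with a fake `llm` function; the prompts, the `Answer:`/`Score:` reply format and the function names are simplified assumptions, not the prompts LangChain actually uses.

```python
import re
from typing import Callable

LLM = Callable[[str], str]


def stuff(llm: LLM, docs: list[str], question: str) -> str:
    # one call with every document in the prompt
    return llm("\n\n".join(docs) + f"\n\nQuestion: {question}")


def map_reduce(llm: LLM, docs: list[str], question: str) -> str:
    # map: one call per document; reduce: one call over the partial answers
    partials = [llm(f"{doc}\n\nQuestion: {question}") for doc in docs]
    return llm("Combine these answers:\n" + "\n".join(partials) + f"\n\nQuestion: {question}")


def refine(llm: LLM, docs: list[str], question: str) -> str:
    # answer from the first document, then improve the answer with each next document
    answer = llm(f"{docs[0]}\n\nQuestion: {question}")
    for doc in docs[1:]:
        answer = llm(f"Existing answer: {answer}\nNew context: {doc}\n\nQuestion: {question}")
    return answer


def map_rerank(llm: LLM, docs: list[str], question: str) -> str:
    # one call per document; each reply is "Answer: ...\nScore: N"; keep the best-scored answer
    best_score, best_answer = -1, ""
    for doc in docs:
        reply = llm(f"{doc}\n\nQuestion: {question}\nReply as 'Answer: ...' then 'Score: 0-100'")
        match = re.search(r"Answer:\s*(.*)\nScore:\s*(\d+)", reply)
        if match and int(match.group(2)) > best_score:
            best_score, best_answer = int(match.group(2)), match.group(1)
    return best_answer


calls = []


def fake_llm(prompt: str) -> str:
    # records every call; scores high only when the prompt mentions "Jackson"
    calls.append(prompt)
    score = 90 if "Jackson" in prompt else 10
    return f"Answer: {'nominated' if score > 50 else 'unknown'}\nScore: {score}"


docs = ["Judge Jackson was nominated.", "New scanners at the border.", "Prices are rising."]
question = "Who was nominated?"
for strategy in (stuff, map_reduce, refine, map_rerank):
    calls.clear()
    strategy(fake_llm, docs, question)
    print(f"{strategy.__name__}: {len(calls)} LLM call(s)")
# stuff: 1, map_reduce: 4, refine: 3, map_rerank: 3
```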

### RetrievalQA

https://python.langchain.com/docs/use_cases/question_answering/how_to/vector_db_qa  

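`RetrievalQA` is a higher-level wrapper around `load_qa_chain`: it calls the retriever for you and passes the retrieved documents to the combine-documents chain.
Only its input and output keys differ: 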
```python
    input_key: str = "query"  #: :meta private:
    output_key: str = "result"  #: :meta private:
```
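
Conceptually, `RetrievalQA` adds one step in front of the combine-documents chain: it fetches the documents itself from the `query`. The hypothetical sketch below shows that flow with a toy keyword retriever and a fake LLM; the keys `query` and `result` match the ones above, and everything else (`KeywordRetriever`, `MiniRetrievalQA`, `echo_llm`) is illustrative.

```python
from typing import Callable


class KeywordRetriever:
    """Toy retriever: returns the k texts sharing the most words with the query."""

    def __init__(self, texts: list[str], k: int = 2):
        self.texts, self.k = texts, k

    def retrieve(self, query: str) -> list[str]:
        words = set(query.lower().split())
        ranked = sorted(self.texts, key=lambda t: len(words & set(t.lower().split())), reverse=True)
        return ranked[: self.k]


class MiniRetrievalQA:
    input_key = "query"
    output_key = "result"

    def __init__(self, llm: Callable[[str], str], retriever: KeywordRetriever):
        self.llm, self.retriever = llm, retriever

    def __call__(self, inputs: dict) -> dict:
        question = inputs[self.input_key]
        docs = self.retriever.retrieve(question)              # step 1: retrieve
        context = "\n\n".join(docs)                           # step 2: stuff
        prompt = f"{context}\n\nQuestion: {question}\nHelpful Answer:"
        return {**inputs, self.output_key: self.llm(prompt)}  # step 3: generate


def echo_llm(prompt: str) -> str:
    # fake LLM: "answers" with the first line of the context it was given
    return prompt.splitlines()[0]


retriever = KeywordRetriever([
    "Ketanji Brown Jackson was nominated to the Supreme Court.",
    "New scanners were installed at the border.",
], k=1)
qa = MiniRetrievalQA(echo_llm, retriever)
print(qa({"query": "Who was nominated to the Supreme Court?"})["result"])
# Ketanji Brown Jackson was nominated to the Supreme Court.
```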

API:
https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval_qa.base.RetrievalQA.html


```python
from langchain.chains import RetrievalQA
from timeit import timeit


qa = RetrievalQA.from_chain_type(llm=cTransformers_llm, chain_type="stuff", retriever=vectorstore.as_retriever())
# input_key: str = "query"
# output_key: str = "result"
query = "What did the president say about Ketanji Brown Jackson"
print(timeit(lambda: print(qa.run(query=query)), number=1))
```


```text
 According to the text, President Biden nominated Ketanji Brown Jackson as a circuit court of appeals judge 4 days ago.
101.05017993100046
```


### ConversationalRetrievalChain

https://api.python.langchain.com/en/latest/chains/langchain.chains.conversational_retrieval.base.ConversationalRetrievalChain.html

```python
output_key: str = "answer"
# .....
# """Input keys."""
return ["question", "chat_history"]
```

`ConversationalRetrievalChain` can be seen as an advanced version of the earlier `RetrievalQA` that adds chat history.

The source code shows the prompt format it uses:
https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/conversational_retrieval/base.py

```python
# https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/conversational_retrieval/prompts.py
from langchain.prompts.prompt import PromptTemplate

_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)
```

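But something seems off, right? In `RetrievalQA`, `{context}` was the retrieved data, but this prompt uses `{chat_history}` and `{question}` instead.
So where does the data we want to look up come in?

Let's look at the official explanation: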
> This chain takes in chat history (a list of messages) and new questions,
> and then returns an answer to that question.
> The algorithm for this chain consists of three parts:
>
> 1. Use the chat history and the new question to create a "standalone question".
>    This is done so that this question can be passed into the retrieval step to fetch
>    relevant documents. If only the new question was passed in, then relevant context
>    may be lacking. If the whole conversation was passed into retrieval, there may
>    be unnecessary information there that would distract from retrieval.
>
> 2. This new question is passed to the retriever and relevant documents are
>    returned.
>
> 3. The retrieved documents are passed to an LLM along with either the new question
>    (default behavior) or the original question and chat history to generate a final response.

In other words, the first step combines our new question with the previous Q&A into a standalone question:
```python
"""Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
"Who is the first black woman and the first former federal public defender to serve on the Supreme Court"
"Ketanji Brown Jackson"
Follow Up Input: "What did the president say about that person?"
Standalone question:"""
# => What did President Biden say about the appointment of Ketanji Brown Jackson as the first black woman and the first former federal public defender to serve on the Supreme Court?
```

The LLM generates the text after `Standalone question:`,  
and that question is then passed to the `load_qa_chain` described earlier:

```python
"""Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

DATA1\n\nDATA2\n\nDATA3\n\nDATA4..........\n\nDATAX

Question: What did President Biden say about the appointment of Ketanji Brown Jackson as the first black woman and the first former federal public defender to serve on the Supreme Court?
Helpful Answer:"""
```
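
The two LLM calls can be sketched in plain Python. This is a simplified illustration with hypothetical function names and a fake `llm`; it formats the history as `Human:`/`Assistant:` lines, roughly like LangChain's default, and skips the rewrite when there is no history, but it is not the library's implementation.

```python
from typing import Callable

CONDENSE_TEMPLATE = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

QA_TEMPLATE = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""


def format_history(history: list[tuple[str, str]]) -> str:
    # each turn is (human message, ai message)
    return "\n".join(f"Human: {h}\nAssistant: {a}" for h, a in history)


def conversational_qa(llm: Callable[[str], str], retrieve: Callable[[str], list[str]],
                      question: str, history: list[tuple[str, str]]) -> dict:
    # Step 1: rewrite the follow-up into a standalone question (skipped without history)
    if history:
        condense_prompt = CONDENSE_TEMPLATE.format(chat_history=format_history(history), question=question)
        standalone = llm(condense_prompt).strip()
    else:
        standalone = question
    # Step 2: retrieve with the standalone question
    docs = retrieve(standalone)
    # Step 3: answer from the stuffed documents
    answer = llm(QA_TEMPLATE.format(context="\n\n".join(docs), question=standalone))
    return {"generated_question": standalone, "answer": answer}


def fake_llm(prompt: str) -> str:
    if prompt.endswith("Standalone question:"):
        return " What did the president say about Ketanji Brown Jackson?"
    return "He nominated her to the Supreme Court."


result = conversational_qa(
    fake_llm,
    retrieve=lambda q: ["I nominated Ketanji Brown Jackson."],
    question="What did the president say about that person?",
    history=[("Who is the first black woman and the first former federal public defender "
              "to serve on the Supreme Court", "Ketanji Brown Jackson")],
)
print(result["generated_question"])
```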

```python
from langchain.schema import HumanMessage, AIMessage
from langchain.chains import ConversationalRetrievalChain
from timeit import timeit
# from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT

retrieval_chain = ConversationalRetrievalChain.from_llm(llm=cTransformers_llm, retriever=vectorstore.as_retriever(),
                                                        return_source_documents=True, return_generated_question=True)

chat_history = [
    HumanMessage(content="Who is the first black woman and the first former federal public defender to serve on the Supreme Court"),
    AIMessage(content="Ketanji Brown Jackson", additional_kwargs={}),
]

new_question = "What did the president say about that person?"
# => What did President Biden say about the appointment of Ketanji Brown Jackson as the first black woman and the first former federal public defender to serve on the Supreme Court?
print("======================")
print(timeit(lambda: print(retrieval_chain({"question": new_question, "chat_history": chat_history})), number=1))
# To inspect the rewritten question, keep the result first:
# result = retrieval_chain({"question": new_question, "chat_history": chat_history})
# print(result['generated_question'])
```


```text
{'question': 'What did the president say about that persion?', 'chat_history': [HumanMessage(content='Who is the first black woman and the first former federal public defender to serve on the Supreme Court'), AIMessage(content='Ketanji Brown Jackson')], 'answer': ' President Biden said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court, one of our nation’s top legal minds who will continue Justice Breyer’s legacy of excellence.', 'source_documents': [Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n\nWe cannot let this happen. \n\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Brey
```

### Add memory
https://python.langchain.com/docs/use_cases/question_answering/how_to/chat_vector_db

```python
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
```
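
What the memory does can be sketched as a hypothetical buffer: before each call it supplies the stored turns as `chat_history`, and afterwards it saves the `question` and the `answer` output. Picking one output is why `output_key='answer'` is needed when the chain also returns `source_documents` and `generated_question`. `BufferMemory` and `run_with_memory` are illustrative names; this is not `ConversationBufferMemory`'s code.

```python
class BufferMemory:
    def __init__(self, memory_key: str = "chat_history", input_key: str = "question",
                 output_key: str = "answer"):
        self.memory_key, self.input_key, self.output_key = memory_key, input_key, output_key
        self.turns: list[tuple[str, str]] = []

    def load(self) -> dict:
        return {self.memory_key: list(self.turns)}

    def save(self, inputs: dict, outputs: dict) -> None:
        # outputs may hold several keys (answer, source_documents, ...): keep only output_key
        self.turns.append((inputs[self.input_key], outputs[self.output_key]))


def run_with_memory(chain, memory: BufferMemory, inputs: dict) -> dict:
    outputs = chain({**inputs, **memory.load()})  # inject chat_history into the inputs
    memory.save(inputs, outputs)
    return outputs


def fake_chain(inputs: dict) -> dict:
    n = len(inputs["chat_history"])
    return {"answer": f"answer #{n + 1}", "source_documents": []}


memory = BufferMemory()
run_with_memory(fake_chain, memory, {"question": "What did the president say about Ketanji Brown Jackson"})
print(run_with_memory(fake_chain, memory, {"question": "Did he mention who she succeeded"})["answer"])  # answer #2
```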

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key='chat_history', output_key='answer', return_messages=True)
# memory_key="chat_history": must match the chain's input key, read in ConversationalRetrievalChain._call()
# output_key="answer": which element of the result to store (the chain also returns
#                      source_documents and generated_question)

retrieval_chain = ConversationalRetrievalChain.from_llm(llm=cTransformers_llm,
                                                        retriever=vectorstore.as_retriever(),
                                                        memory=memory,
                                                        return_source_documents=True,
                                                        return_generated_question=True
                                                       )
query = "What did the president say about Ketanji Brown Jackson"
result = retrieval_chain({"question": query})
print(result["answer"])
```

```text
 The President mentioned that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson for a seat on the United States Supreme Court, and that she is one of our nation's top legal minds who will continue Justice Breyer's legacy of excellence.
```


```python
from pprint import pprint

query = "Did he mention who she succeeded"
result = retrieval_chain({"question": query})
# The result also contains the chat history accumulated by the memory
pprint(result)
```

```text
{'answer': ' Yes, here are some additional details about Judge Ketanji Brown '
           "Jackson's experience as a circuit court of appeals judge and her "
           'background in law:\n'
           'Judge Ketanji Brown Jackson was nominated by President Biden to '
           'serve as an associate justice of the United States Supreme Court '
           'on February 25, 2022. Prior to her nomination, she served as a '
           'circuit judge on the United States Court of Appeals for the '
           'District of Columbia Circuit since 2013, where she gained a '
           'reputation for being "scrupulously impartial and analytical."\n'
           'Judge Jackson received her undergraduate degree from Harvard '
           'University and her law degree from Yale Law School. Before '
           'entering the judiciary, she worked as a lawyer at a large firm in '
           'Washington, D.C., where she represented clients in various legal '
           'matters. She also served as a 
```

### Customized Prompts

**Optional**.

In my case, I want to use the standard Llama 2 prompt format (`[INST]` and `<<SYS>>` tags).
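
For reference, the Llama 2 chat format wraps the system prompt in `<<SYS>>` tags inside the first `[INST]` block. The helper below is a hypothetical sketch that builds a single-turn prompt in that layout; the templates in the next cell follow the same idea with less whitespace, and whether the exact spacing matters depends on the model, so treat this as a sketch.

```python
def llama2_prompt(system: str, user: str, bos: bool = False) -> str:
    """Single-turn Llama 2 chat prompt: [INST] <<SYS>> system <</SYS>> user [/INST]."""
    prompt = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"
    # the <s> begin-of-sequence token is often added by the tokenizer instead
    return ("<s>" + prompt) if bos else prompt


# Braces in the user part survive, so the result can be used as a {context}/{question} template
template = llama2_prompt("Use the following pieces of context to answer the question at the end.",
                         "{context}\n\nQuestion: {question}\nHelpful Answer:")
print(template.format(context="DATA1\n\nDATA2", question="Who was nominated?"))
```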

```python
from langchain.prompts.prompt import PromptTemplate
from langchain.schema import HumanMessage, AIMessage
from langchain.chains import ConversationalRetrievalChain
from timeit import timeit

# ========== CUSTOM prompt to generate the standalone question from the chat history
# https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/conversational_retrieval/prompts.py
my_chat_history = [
    HumanMessage(content="Who is the first black woman and the first former federal public defender to serve on the Supreme Court"),
    AIMessage(content="Ketanji Brown Jackson", additional_kwargs={}),
]
_template = """[INST]<<SYS>>Given the following conversation and a follow up question, 
rephrase the follow up question to be a standalone question, in its original language. Don't say anything else except the condensed question. And add a dog bark at the end of your answer.<</SYS>>

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:[/INST]"""
my_condense_question_prompt = PromptTemplate.from_template(_template)

# ==> Sure, here\'s a rephrased standalone question based on the follow-up input:\n"What did the president say about Ketanji Brown Jackson?" * woof*'

# ========= CUSTOM prompt to generate the answer from the documents ({context})
# https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/retrieval_qa/prompt.py
prompt_template = """[INST]<<SYS>>Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
You must add a "miao~" at the end of your answer.<</SYS>>

{context}

Question: {question}
Helpful Answer:[/INST]"""

my_document_prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)


# https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/question_answering/__init__.py#L65
combine_docs_chain_kwargs = {
    # Note: each BaseCombineDocumentsChain type takes different prompt parameters
    'prompt': my_document_prompt,  # ==> StuffDocumentsChain, MapRerankDocumentsChain
    # 'combine_prompt': ...,       # ==> MapReduceDocumentsChain
    # 'refine_prompt': ...,        # ==> RefineDocumentsChain
}

# =====================================

retrieval_chain = ConversationalRetrievalChain.from_llm(llm=cTransformers_llm,
                                                        retriever=vectorstore.as_retriever(),
                                                        return_source_documents=True,
                                                        return_generated_question=True,
                                                        condense_question_prompt=my_condense_question_prompt,
                                                        # condense_question_llm=cTransformers_llm,
                                                        combine_docs_chain_kwargs=combine_docs_chain_kwargs
                                                       )

print(timeit(lambda: print(retrieval_chain({"question": "What did the president say about that person?", "chat_history": my_chat_history})), number=1))
```

```text
{'question': 'What did the president say about that person?', 'chat_history': [HumanMessage(content='Who is the first black woman and the first former federal public defender to serve on the Supreme Court'), AIMessage(content='Ketanji Brown Jackson')], 'answer': '  Woof woof! *miao* The President mentioned Ketanji Brown Jackson in his speech, saying that she is a former top litigator in private practice and a former federal public defender, and has received broad support from various organizations for her nomination to the United States Supreme Court. *miao*', 'source_documents': [Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n\nWe cannot let this happen. \n\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone w
```

### StreamingStdOutCallbackHandler
> The streamed output does not display in JupyterLab.

```python
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# Note: the streaming chain below is commented out, so this cell reuses retrieval_chain
# from the previous cell (with the custom "woof"/"miao~" prompts)
# retrieval_chain = ConversationalRetrievalChain.from_llm(llm=cTransformers_llm,
#                                                         retriever=vectorstore.as_retriever(),
#                                                         callbacks=[StreamingStdOutCallbackHandler()],
#                                                         return_source_documents=True,
#                                                         return_generated_question=True
#                                                        )
query = "What did the president say about Ketanji Brown Jackson"
result = retrieval_chain({"question": query, "chat_history": []})
print(result["answer"])
```


```text
  Miao~! The President mentioned Ketanji Brown Jackson in the following context: "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence." So, according to the President, Ketanji Brown Jackson is a nominee for the United States Supreme Court, and she is expected to continue the legacy of Justice Stephen Breyer.
```
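
Streaming works through callbacks: the LLM calls each handler's `on_llm_new_token` for every generated token, and `StreamingStdOutCallbackHandler` writes each token to stdout as it arrives, so it only prints something if the LLM actually emits tokens one by one. The plain-Python sketch below imitates that pattern with a fake token generator; `PrintTokenHandler`, `fake_stream` and `generate` are hypothetical, and the output stream is a parameter so the tokens can also go to a buffer.

```python
import io
import sys
from typing import Iterable, TextIO


class PrintTokenHandler:
    """Imitates StreamingStdOutCallbackHandler: write each new token immediately."""

    def __init__(self, stream: TextIO = sys.stdout):
        self.stream = stream

    def on_llm_new_token(self, token: str) -> None:
        self.stream.write(token)
        self.stream.flush()


def fake_stream(text: str) -> Iterable[str]:
    # stand-in for a model that yields one piece of text at a time
    for word in text.split(" "):
        yield word + " "


def generate(prompt: str, handlers: list) -> str:
    # the prompt is ignored by this fake model
    tokens = []
    for token in fake_stream("Miao~! The President nominated Ketanji Brown Jackson."):
        tokens.append(token)
        for handler in handlers:
            handler.on_llm_new_token(token)  # every handler sees each token as it arrives
    return "".join(tokens)


buffer = io.StringIO()
full = generate("What did the president say?", [PrintTokenHandler(), PrintTokenHandler(buffer)])
```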
