# QA using a Retriever
https://python.langchain.com/docs/use_cases/question_answering/how_to/vector_db_qa

In [2]:
###### chromadb is not compatible with latest pydantic and fastapi
# pydantic.errors.PydanticImportError: `BaseSettings` has been moved to the `pydantic-settings` package. See https://docs.pydantic.dev/2.3/migration/#basesettings-has-moved-to-pydantic-settings for more details.
# !pip install chromadb
# !pip install fastapi==0.99.1
# !pip install pydantic==1.10.0
# NOTE: you have to restart jupyter kernel to reload lib

!pip install tiktoken  # for OpenAIEmbeddings.

Collecting tiktoken
  Downloading tiktoken-0.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.5.1


## Example 1: Basic QA with OpenAI
https://python.langchain.com/docs/use_cases/question_answering/

1. Load: Specify a DocumentLoader to load in your unstructured data as Documents. A Document is a piece of text (the page_content) and associated metadata.
   There are many [Loaders](https://integrations.langchain.com/)
   `documents = TextLoader(....).load()`  
2. Split: Split the Document into chunks for embedding and vector storage.  
   `texts = CharacterTextSplitter(...).split_documents(documents)`
3. Store: To be able to look up our document splits, we first need to store them where we can later look them up.  
   The most common way to do this is to embed the contents of each document then store the embedding and document in a vector store,  
   with the embedding being used to index the document.  
   `vectorstore = FAISS.from_documents(texts, embeddings)`

You can find the resources [here](https://integrations.langchain.com/):
- loader: such as TextLoader
- embeddings: such as OpenAIEmbeddings, HuggingFaceEmbeddings
- vectorstore: such as FAISS and Chromadb
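
To make the Split step more concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification and not how `CharacterTextSplitter` is implemented (the real splitter works on separators and merges pieces); `split_with_overlap` is a hypothetical helper that only illustrates what `chunk_size` and `chunk_overlap` roughly mean.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, so a sentence cut at a chunk boundary still appears with some of its context in the next chunk.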



In [7]:
import os

from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma, FAISS

# 1. Load the source text.
loader = TextLoader(os.path.abspath("./mydata/state_of_the_union.txt"), encoding="utf-8")
documents = loader.load()

# 2. Split it into chunks (chunk_size=1000, no overlap).
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# Read the OpenAI API key from a local file instead of hard-coding it.
with open('./mydata/openai_api_key.txt', 'r') as file:
    openai_api_key = file.read().strip()

# 3. Embed the chunks and store them in a vector store.
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
# vectorstore = Chroma.from_documents(texts, embeddings)  # Chroma alternative; see the chromadb/pydantic note at the top
vectorstore = FAISS.from_documents(texts, embeddings)

# The retriever fetches the relevant chunks; the "stuff" chain passes them to the LLM.
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(openai_api_key=openai_api_key),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
)

In [8]:
query = "What did the president say about Ketanji Brown Jackson"
qa.run(query)

" The President said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, from a family of public school educators and police officers, a consensus builder, and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."

## Example 2: QA with another LLM (Llama 2)

### Step 1: Build the vector store
Here we again build the QA pipeline step by step.  
https://python.langchain.com/docs/use_cases/question_answering/

**Load**: Specify a DocumentLoader to load in your unstructured data as Documents. A Document is a piece of text (the page_content) and associated metadata.
There are many [Loaders](https://integrations.langchain.com/)
`documents = TextLoader(....).load()` 

**Split**: Split the Document into chunks for embedding and vector storage.  
`texts = RecursiveCharacterTextSplitter(...).split_documents(documents)`

**Store**: To be able to look up our document splits, we first need to store them where we can later look them up.  
   The most common way to do this is to embed the contents of each document then store the embedding and document in a vector store,  
   with the embedding being used to index the document.  
   `vectorstore = FAISS.from_documents(texts, embeddings)`

In this example we also inspect what the **VectorStore** returns for a query.
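
Before running the real code, the idea behind a similarity search can be sketched without any model. The toy store below is an assumption-heavy stand-in: `embed` is a hypothetical bag-of-words "embedding" (a real embedding model returns a dense vector), and `ToyVectorStore` is not the FAISS or Chroma API. It only shows that a query is embedded, compared against every stored chunk, and the closest chunks are returned.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (embedding, text) pairs."""

    def __init__(self, texts):
        self.entries = [(embed(t), t) for t in texts]

    def similarity_search(self, query, k=4):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = ToyVectorStore([
    "I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson.",
    "Pass the Freedom to Vote Act.",
    "Justice Breyer, thank you for your service.",
])
top = store.similarity_search("What did the president say about Ketanji Brown Jackson", k=1)
print(top[0])  # I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson.
```

The retrieved chunks, not the whole document, are what later gets sent to the LLM.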

In [2]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
import os
from mylib.MyModelUtils import MyModelUtils
from langchain.vectorstores import Chroma, FAISS
# 1. Load data
def get_documents():
    from langchain.document_loaders import TextLoader
    loader = TextLoader(os.path.abspath("./mydata/state_of_the_union.txt"), encoding="utf-8")
    documents = loader.load()
    return documents


# 2. Split data
def get_text_splits(documents):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
    # text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    all_text_splits = text_splitter.split_documents(documents)
    return all_text_splits

# Embedding model used to embed the text splits
def get_sentence_transformer():
    model_name = os.path.abspath("./models/sentence-transformers/all-mpnet-base-v2")
    model_kwargs = {"device": MyModelUtils.device()}
    embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)
    return embeddings

# 3. Store data
# store_type: "faiss" (default) or "chroma"
def get_vectorstore(all_text_splits, sentence_transformer, store_type="faiss"):
    # Embed the text splits and store them in the vectorstore
    if store_type == "chroma":
        vectorstore = Chroma.from_documents(all_text_splits, sentence_transformer)
    else:  # "faiss"
        vectorstore = FAISS.from_documents(all_text_splits, sentence_transformer)
    return vectorstore


# Inspect what the vectorstore returns; this is what will be sent to the LLM
def test_vectorstore(vectorstore):
    query = "What did the president say about Ketanji Brown Jackson"
    docs = vectorstore.similarity_search(query)
    # __OR__ (embed the query with the same model that built the store)
    # embedding_vector = sentence_transformer.embed_query(query)
    # docs = vectorstore.similarity_search_by_vector(embedding_vector)
    # __OR__
    # docs_and_scores = vectorstore.similarity_search_with_score(query)
    print(f"What will we get from vectorstore with: \"{query}\"")
    print("===============================")
    print(docs[0].page_content)
    print("===============================")

documents = get_documents()  # 1. Load document
all_text_splits = get_text_splits(documents)  # 2. Split document
sentence_transformer = get_sentence_transformer()  # Transformer for embedding

# This is what we want
vectorstore = get_vectorstore(all_text_splits, sentence_transformer)  # 3. Store document

test_vectorstore(vectorstore)  # Test it

What will we get from vectorstore with: "What did the president say about Ketanji Brown Jackson"
In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. 

We cannot let this happen. 

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice B

In [None]:
## We can also save and load FAISS
# vectorstore.save_local("faiss_index")
# vectorstore = FAISS.load_local("faiss_index", sentence_transformer)
# test_vectorstore(vectorstore)

## We can also merge 2 vectorstores
# vectorstore1.merge_from(vectorstore2)

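### Step 2: Chain the document data into QA
Let's first introduce load_qa_chain. It is the core of all the QA-related functionality; RetrievalQA and ConversationalRetrievalChain, introduced later, are both built on it.
The example below uses one of the four chain types of load_qa_chain (stuff, map_reduce, refine, map_rerank).
The only difference between them is how they combine the source documents, which is explained later.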



In [None]:
from langchain.chains.question_answering import load_qa_chain  # returns a StuffDocumentsChain (a BaseCombineDocumentsChain) for chain_type="stuff"

# NOTE: langchain_hf_llm is created in the RetrievalQA cell below; run that cell first.
query = "What did the president say about Ketanji Brown Jackson, and how old is he?"
docs = vectorstore.similarity_search(query)  # Same as the earlier example: shows what we retrieve from the vectorstore
print(docs)
doc_chain = load_qa_chain(llm=langchain_hf_llm, chain_type="stuff")
print("===============")
result = doc_chain.run(input_documents=docs, question=query)
print(result)


### About load_qa_chain
After the example above, here is the [docstring](https://github.com/langchain-ai/langchain/blob/740eafe41da7317f42387bdfe6d0f1f521f2cafd/libs/langchain/langchain/chains/combine_documents/stuff.py#L20) of StuffDocumentsChain:

```python
"""
    This chain takes a list of documents and first combines them into a single string.
    It does this by formatting each document into a string with the `document_prompt`
    and then joining them together with `document_separator`. It then adds that new
    string to the inputs with the variable name set by `document_variable_name`.
    Those inputs are then passed to the `llm_chain`.
"""
```


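From this we can see that the StuffDocumentsChain obtained from load_qa_chain(....).run(input_documents=docs, question=query) joins the text (page_content) of all docs into a single string with `"\n\n"`, then puts that string into a prompt and sends it to the LLM. The prompt format is as follows:

*Appendix*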
```python
# stuff
prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""
```
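
The "stuff" behaviour described above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not LangChain's implementation: `Doc` is a hypothetical stand-in for LangChain's Document, and `build_stuff_prompt` is a made-up helper. Only the template text and the `"\n\n"` separator come from the material above.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


PROMPT_TEMPLATE = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""


def build_stuff_prompt(docs, question, separator="\n\n"):
    """Join every page_content into one context string and fill the template."""
    context = separator.join(doc.page_content for doc in docs)
    return PROMPT_TEMPLATE.format(context=context, question=question)


docs = [Doc("First chunk."), Doc("Second chunk.")]
prompt = build_stuff_prompt(docs, "What do the chunks say?")
print(prompt)
```

Because every retrieved chunk ends up in one prompt, the total length has to fit within the model's context window; that constraint is what the other chain types work around.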


-------------------------------

So the whole QA flow is as follows:

| Vector Store | load_qa_chain |
|--------------|---------------|
| DATA -> Loader -> Split -> Transformer + Embedding -> VectorStore -> Array(Document(),Document(), ....) | -> StuffDocumentsChain -> String -> Prompt -> LLM |    


-------------------------------

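[chain_type](https://liaokong.gitbook.io/llm-kai-fa-jiao-cheng/) explained:  
- stuff: (default) The simplest, most brute-force approach: all documents are passed to the LLM in a single call.

- map_reduce: Each document is processed on its own first (the map step), then all the processed results are combined to produce the final output (the reduce step).

- refine: The first document is summarized first; that summary is then sent to the LLM together with the second document to be summarized again, and so on. The benefit is that each document is summarized with the result from the previous documents as context, which makes the final summary more coherent.

- map_rerank: Generally used for question-answering chains rather than summarization chains; it is essentially a way of searching for the best-matching answer. Given a question, the LLM is asked to answer it from each document separately (question + document as the prompt) and to score how well that document answers the question; the answer with the highest score is returned.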
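
The control flow of the four chain types can be compared with a stubbed LLM. This is only a sketch under assumptions: the prompts are simplified placeholders rather than LangChain's real templates, and `fake_llm` and `fake_scored_llm` are hypothetical stubs that stand in for a model. What it shows is how many LLM calls each strategy makes and what each call sees.

```python
from typing import Callable, List, Tuple


def stuff(llm: Callable[[str], str], docs: List[str], question: str) -> str:
    """One call: all documents are concatenated into a single prompt."""
    return llm("\n\n".join(docs) + f"\n\nQuestion: {question}")


def map_reduce(llm: Callable[[str], str], docs: List[str], question: str) -> str:
    """One call per document (map), then one call over the partial answers (reduce)."""
    partials = [llm(f"{doc}\n\nQuestion: {question}") for doc in docs]
    return llm("\n\n".join(partials) + f"\n\nQuestion: {question}")


def refine(llm: Callable[[str], str], docs: List[str], question: str) -> str:
    """Answer from the first document, then revise the answer with each next document."""
    answer = llm(f"{docs[0]}\n\nQuestion: {question}")
    for doc in docs[1:]:
        answer = llm(f"Existing answer: {answer}\n\nNew context: {doc}\n\nQuestion: {question}")
    return answer


def map_rerank(scored_llm: Callable[[str], Tuple[str, int]], docs: List[str], question: str) -> str:
    """Answer from each document separately and keep the highest-scoring answer."""
    candidates = [scored_llm(f"{doc}\n\nQuestion: {question}") for doc in docs]
    best_answer, _score = max(candidates, key=lambda pair: pair[1])
    return best_answer


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stub LLM: records the prompt and returns a numbered placeholder answer."""
    calls.append(prompt)
    return f"answer#{len(calls)}"


def fake_scored_llm(prompt: str) -> Tuple[str, int]:
    """Stub LLM that pretends to be confident only when the context mentions Jackson."""
    context = prompt.split("\n\nQuestion:")[0]
    score = 90 if "Jackson" in context else 10
    return f"answer from: {context}", score


def count_calls(strategy) -> int:
    calls.clear()
    strategy(fake_llm, ["doc A", "doc B", "doc C"], "q?")
    return len(calls)


call_counts = {s.__name__: count_calls(s) for s in (stuff, map_reduce, refine)}
print(call_counts)  # {'stuff': 1, 'map_reduce': 4, 'refine': 3}

best = map_rerank(fake_scored_llm, ["Breyer is retiring.", "Jackson was nominated."], "Who was nominated?")
print(best)  # answer from: Jackson was nominated.
```

With three documents, stuff needs one call, refine three, and map_reduce four, which is the trade-off between prompt size and number of LLM calls.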

### RetrievalQA

In [4]:
from mylib.Utils import timeit, myprint
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA
model_id=os.path.abspath('./models/Llama-2-7b-chat-hf')
model_util = MyModelUtils(model_id = model_id)

@timeit
def langchain_pipeline_init_timeit():
    modelcfg_kwargs = model_util.default_modelconf_kwargs 
    pretrained_kwargs = model_util.make_model_kwargs_for_pretrained(**modelcfg_kwargs)
    hf_pipeline = model_util.init_hf_pipeline(pretrained_kwargs)
    
    langchain_hf_llm = HuggingFacePipeline(pipeline=hf_pipeline,
                            pipeline_kwargs={'batch_size':128},
                         )
    return langchain_hf_llm

langchain_hf_llm = langchain_pipeline_init_timeit()

# input_key: str = "query"  #: :meta private:
# output_key: str = "result"  #: :meta private:
qa = RetrievalQA.from_chain_type(llm=langchain_hf_llm, chain_type="stuff", retriever=vectorstore.as_retriever())
query = "What did the president say about Ketanji Brown Jackson"
qa.run(query)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Function langchain_pipeline_init_timeit() {} Took 36.6855 seconds




' The president said that he nominated Ketanji Brown Jackson to the United States Supreme Court.'

### Step 3: Chain the vectorstore data into the LLM

We will do two things here:
1. Load the language model for LangChain (langchain.llms.HuggingFacePipeline)
2. Chain the vectorstore data into the LLM (ConversationalRetrievalChain)

ConversationalRetrievalChain can be seen as an advanced version of the earlier RetrievalQA (it adds chat history).



In [9]:
from langchain.llms import HuggingFacePipeline
from langchain.schema import HumanMessage
from langchain.chains import ConversationalRetrievalChain
import os
from mylib.MyModelUtils import MyModelUtils
from mylib.Utils import timeit, myprint

model_id=os.path.abspath('./models/Llama-2-7b-chat-hf')
model_util = MyModelUtils(model_id = model_id)

@timeit
def langchain_pipeline_init_timeit():
    modelcfg_kwargs = model_util.default_modelconf_kwargs 
    pretrained_kwargs = model_util.make_model_kwargs_for_pretrained(**modelcfg_kwargs)
    hf_pipeline = model_util.init_hf_pipeline(pretrained_kwargs)
    
    langchain_hf_llm = HuggingFacePipeline(pipeline=hf_pipeline,
                            pipeline_kwargs={'batch_size':128},
                         )
    return langchain_hf_llm

langchain_hf_llm = langchain_pipeline_init_timeit()

retrieval_chain = ConversationalRetrievalChain.from_llm(llm=langchain_hf_llm, retriever=vectorstore.as_retriever(), return_source_documents=True)
query = "What did the president say about Ketanji Brown Jackson?\n"
result = retrieval_chain({"question": query, "chat_history": []})

print(result['answer'])
print("======================")
print(result)


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Function langchain_pipeline_init_timeit() {} Took 79.1457 seconds
{'question': 'What did the president say about Ketanji Brown Jackson?\n', 'chat_history': [], 'answer': " The president mentioned Ketanji Brown Jackson in the context of nominating someone to serve on the United States Supreme Court. The president stated that he nominated Jackson 4 days ago and that she is one of our nation's top legal minds who will continue Justice Breyer's legacy of excellence.", 'source_documents': [Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious cons

In [None]:
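Well, by now this probably looks like magic, but on closer thought it raises a lot of questions.  
How does the chain know how to extract our question for the LLM, and how is it passed to the LLM?  
Why do we pass in `{"question": query, "chat_history": []}`?  
Why is the key "question"?  
First, let's look at  
https://api.python.langchain.com/en/latest/_modules/langchain/chains/conversational_retrieval/base.html#ConversationalRetrievalChain  
It contains these lines of code:  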
```python
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
# .....
# CONDENSE_QUESTION_PROMPT is set as the default value
condense_question_prompt: BasePromptTemplate = CONDENSE_QUESTION_PROMPT,
# .....
# 
doc_chain = load_qa_chain(
            llm,
            chain_type=chain_type,
            verbose=verbose,
            callbacks=callbacks,
            **combine_docs_chain_kwargs,
        )
# ......
condense_question_chain = LLMChain(
    llm=_llm,
    prompt=condense_question_prompt,
    verbose=verbose,
    callbacks=callbacks,
)
```


Now look at the following code:

https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/conversational_retrieval/prompts.py
```py
from langchain.prompts.prompt import PromptTemplate

_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""
QA_PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
```
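
Putting the two templates together, the flow can be sketched with plain string formatting and stubs. This is a simplified illustration under assumptions, not the library's code: `conversational_qa`, `format_history`, `fake_llm` and `fake_retrieve` are hypothetical names, the history format is a guess at a reasonable rendering, and skipping the condense step on an empty history is how this sketch chooses to behave.

```python
CONDENSE_TEMPLATE = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

QA_TEMPLATE = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""


def format_history(chat_history):
    """Render (question, answer) pairs as text; the exact format is an assumption."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)


def conversational_qa(llm, retrieve, inputs):
    """Condense the follow-up into a standalone question, retrieve, then answer."""
    question = inputs["question"]
    history = inputs["chat_history"]
    if history:  # nothing to condense on the first turn
        question = llm(CONDENSE_TEMPLATE.format(
            chat_history=format_history(history), question=question))
    docs = retrieve(question)
    answer = llm(QA_TEMPLATE.format(context="\n\n".join(docs), question=question))
    return {"question": inputs["question"], "chat_history": history, "answer": answer}


prompts = []


def fake_llm(prompt):
    """Stub LLM: records each prompt and returns a canned reply."""
    prompts.append(prompt)
    if prompt.endswith("Standalone question:"):
        return "How old is Ketanji Brown Jackson?"
    return "stub answer"


def fake_retrieve(question):
    """Stub retriever: returns one fake chunk that echoes the question."""
    return [f"(chunk retrieved for: {question})"]


result = conversational_qa(
    fake_llm,
    fake_retrieve,
    {"question": "How old is she?",
     "chat_history": [("Who was nominated?", "Ketanji Brown Jackson.")]},
)
print(prompts[0])
print("======================")
print(prompts[1])
```

The keys `question` and `chat_history` in the input dict line up with the `{question}` and `{chat_history}` placeholders of the condense template, which is why the chain is called with exactly those two keys.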
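Does it make sense now? ConversationalRetrievalChain takes the keys "question" and "chat_history" because CONDENSE_QUESTION_PROMPT uses exactly those two variables to rewrite the follow-up input as a standalone question; the standalone question and the retrieved documents are then passed to the chain built by load_qa_chain.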