# Retrieval-Augmented Generation (RAG)

This notebook provides several prototypes of a Retrieval-Augmented Generation (RAG) system. We use product data analytics use cases as a running example.

### Use Case
We have a large collection of documents and want to use an LLM to summarize them, answer standalone questions based on their content, or answer questions in a conversational mode. Examples include a sales assistant that answers customers' questions about a company's products, a coding assistant that answers developers' questions about the codebase, and a legal assistant that answers questions about regulations.

### Prototype: Approach and Data
We start with a basic case where the input documents are small enough to fit the LLM context, and then develop more advanced solutions that can handle large document collections. We use small input documents that are available in the `tensor-house-data` repository.

### Usage and Productization
The implementation uses a production-grade framework, but production applications typically also need an external embedding store (vector store) and additional components such as caching.
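
As a rough illustration of the caching component mentioned above, the sketch below memoizes embedding calls so that repeated texts are not embedded twice. `CachedEmbedder` and `fake_embed` are hypothetical names introduced for this example, `fake_embed` merely stands in for a real embedding API call, and the in-memory dictionary is an assumption; a production system would more likely rely on an external cache.

```python
import hashlib
from typing import Callable, Dict, List


class CachedEmbedder:
    """Wraps an embedding function with an in-memory cache keyed by a text hash."""

    def __init__(self, embed_fn: Callable[[str], List[float]]):
        self._embed_fn = embed_fn
        self._cache: Dict[str, List[float]] = {}
        self.misses = 0  # number of calls forwarded to the underlying function

    def embed(self, text: str) -> List[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._embed_fn(text)
        return self._cache[key]


def fake_embed(text: str) -> List[float]:
    # Hypothetical placeholder for a real (slow, billed) embedding API call.
    return [float(len(text)), float(text.count(" "))]


embedder = CachedEmbedder(fake_embed)
embedder.embed("citric acid")
embedder.embed("citric acid")  # served from the cache
embedder.embed("aspartame")
print(embedder.misses)
```

Only two of the three calls reach the underlying function, because the repeated text is served from the cache.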

## Environment Setup and Initialization

In [19]:
#
# Imports
#
from langchain_community.llms.vertexai import VertexAI

from langchain_core.prompts import PromptTemplate

from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader

from langchain_community.embeddings import VertexAIEmbeddings
from langchain_community.vectorstores.chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

#
# Initialize LLM provider
# (google-cloud-aiplatform must be installed)
#
from google.cloud import aiplatform
aiplatform.init(
    project='<< specify your project name here >>',
    location='us-central1'
)

## Question Answering Using In-Prompt Documents

The most basic scenario is querying small documents that fit the LLM prompt.

In [4]:
query_prompt = """
Please read the following text and answer which fruits have the highest concentration of citric acid.

TEXT:
Citric acid occurs in a variety of fruits and vegetables, most notably citrus fruits. Lemons and limes have particularly 
high concentrations of the acid; it can constitute as much as 8% of the dry weight of these fruits (about 47 g/L in the juices).
The concentrations of citric acid in citrus fruits range from 0.005 mol/L for oranges and grapefruits to 0.30 mol/L in lemons 
and limes; these values vary within species depending upon the cultivar and the circumstances under which the fruit was grown.
"""

llm = VertexAI(temperature=0.7)
response = llm.invoke(query_prompt)
print(response)

 Lemons and limes have the highest concentration of citric acid among the mentioned fruits.


## Question Answering Using MapReduce

For large documents and collections of documents that do not fit the LLM context, we can apply the MapReduce pattern to independently extract relevant summaries from document parts, and then merge these summaries into the final answer. This approach is appropriate for summarization and summarization-like queries.
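
Before turning to the LangChain implementation, the pattern described above can be sketched in plain Python. This is a minimal illustration, not the library's internals: `map_reduce_answer` and `toy_llm` are hypothetical names, `toy_llm` is a stand-in that only records prompts so the control flow can be checked offline, and the sketch performs a single reduce step.

```python
from typing import Callable, List


def map_reduce_answer(chunks: List[str], question: str, llm: Callable[[str], str]) -> str:
    """Answers a question over chunks that do not fit into one prompt together."""
    # Map step: query each chunk independently for relevant notes.
    notes = [
        llm(f"Extract bullet points relevant to the question.\n\n{chunk}\n\nQuestion: {question}")
        for chunk in chunks
    ]
    # Reduce step: merge the per-chunk notes into one final answer.
    combined = "\n".join(notes)
    return llm(f"Create a final answer from these notes.\n\n{combined}\n\nQuestion: {question}")


calls: List[str] = []


def toy_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call: records the prompt
    # and returns a numbered placeholder note.
    calls.append(prompt)
    return f"- note {len(calls)}"


answer = map_reduce_answer(["chunk a", "chunk b", "chunk c"], "What is citric acid used for?", toy_llm)
print(len(calls))  # three map calls plus one reduce call
```

The number of LLM calls grows linearly with the number of chunks, which is the main cost of this approach compared with a single in-prompt query.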

In [17]:
from langchain.chains import LLMChain, StuffDocumentsChain, ReduceDocumentsChain, MapReduceDocumentsChain

#
# Load the input document
#
loader = TextLoader("../../tensor-house-data/search/food-additives/citric-acid-applications.txt")
documents = loader.load()

#
# Splitting
#
text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
print(f'The input document has been split into {len(texts)} chunks\n')

#
# Querying
#
map_prompt = """
Use the following portion of a long document to see if any of the text is relevant to answer the question. 
Return bullet points that help to answer the question.

{context}

Question: {question}
Bullet points:
"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["context", "question"])

combine_prompt = """
Given the following bullet points extracted from a long document and a question, create a final answer.
Question: {question}

=========
{context}
=========

Final answer:
"""
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["context", "question"])

llm = VertexAI(temperature=0.7)

document_prompt = PromptTemplate(
    input_variables=["page_content"],
    template="{page_content}"
)

document_variable_name = "context"

map_llm_chain = LLMChain(llm=llm, prompt=map_prompt_template)

reduce_llm_chain = LLMChain(llm=llm, prompt=combine_prompt_template)

combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_llm_chain,
    document_prompt=document_prompt,
    document_variable_name=document_variable_name
)
reduce_documents_chain = ReduceDocumentsChain(
    combine_documents_chain=combine_documents_chain,
)
chain = MapReduceDocumentsChain(
    llm_chain=map_llm_chain,
    reduce_documents_chain=reduce_documents_chain,
    document_variable_name=document_variable_name
)

question = "What are the three most important applications of citric acid? Provide a short justification for each application."

print(chain.invoke({'input_documents': texts, 'question': question})['output_text'])

The input document has been split into 2 chunks
 The three most important applications of citric acid are:

1. **Food and beverage industry:** Citric acid enhances flavors and acts as a preservative, making it a popular choice in various food and beverage products, especially soft drinks.

2. **Cleaning products:** Citric acid's ability to chelate metals and remove hard water stains makes it an effective component in soaps, detergents, and household cleaners, improving their performance in hard water conditions.

3. **Pharmaceutical and biotechnology industry:** Citric acid is used in various pharmaceutical and biotechnological applications, including hair care products to open hair cuticles and enhance treatment penetration, photography


## Question Answering Using Vector Search

For large documents and point questions that can be answered from specific document parts, LLMs can be combined with traditional information retrieval techniques. The input documents are split into chunks, which are then indexed in a vector store. To answer a user question, the most relevant chunks are retrieved and passed to the LLM.
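
The index-then-retrieve flow described above can be sketched without any external services. The example is a simplification under stated assumptions: `embed` is a toy bag-of-words function rather than a learned embedding model, the index is a plain list rather than a vector store, and the three chunks are hypothetical stand-ins for a split document.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, int]:
    # Toy bag-of-words "embedding" (an assumption made for illustration;
    # a real system would call a learned embedding model instead).
    return dict(Counter(re.findall(r"[a-z]+", text.lower())))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: List[str]) -> List[Tuple[str, Dict[str, int]]]:
    # Indexing: store every chunk together with its vector.
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: List[Tuple[str, Dict[str, int]]], query: str, k: int = 1) -> List[str]:
    # Retrieval: rank chunks by similarity to the query and keep the top k.
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Hypothetical chunks standing in for a split document.
chunks = [
    "Citric acid melts at about 153 degrees Celsius.",
    "Aspartame is an artificial sweetener used in soft drinks.",
    "Sodium benzoate is a common food preservative.",
]
index = build_index(chunks)
query = "What is the melting point of citric acid?"
context = "\n".join(retrieve(index, query, k=1))
prompt = f"{context}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```

Only the retrieved chunk is placed into the prompt, so the prompt size depends on `k` and the chunk size rather than on the size of the whole collection.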

In [31]:
#
# Load the input document
#
loader = TextLoader("../../tensor-house-data/search/food-additives/food-additives.txt")
documents = loader.load()

#
# Splitting
#
text_splitter = CharacterTextSplitter(chunk_size=3000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
print(f'The input document has been split into {len(texts)} chunks\n')

#
# Indexing and storing
#
embeddings = VertexAIEmbeddings(model_name="textembedding-gecko")
vectorstore = Chroma.from_documents(texts, embeddings)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 1})

#
# Querying
#
llm = VertexAI(temperature=0.7, verbose=True)
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.

{context}

Question: {question}

Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

query = "What is the melting point of citric acid?"
rag_chain.invoke(query)

The input document has been split into 5 chunks


' The melting point of citric acid is approximately 153 °C (307 °F).'

## Conversational Retrieval

In this section, we prototype a conversational retrieval system. It combines the chat history with the retrieved documents to answer the question.
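
The control flow of such a system can be sketched in plain Python before the LangChain version. This is an illustrative outline under stated assumptions: `answer_with_history`, `toy_llm`, and `toy_retrieve` are hypothetical names, and the two stubs return canned values so that the flow can be traced offline. The key idea is that a follow-up question is first rewritten into a standalone question, which is then used for retrieval.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)


def answer_with_history(
    question: str,
    chat_history: List[Message],
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> str:
    transcript = "\n".join(f"{role}: {content}" for role, content in chat_history)
    if chat_history:
        # A follow-up question may depend on earlier turns, so rewrite it first.
        standalone = llm(
            "Rewrite the last question as a standalone question.\n\n"
            f"{transcript}\nhuman: {question}"
        )
    else:
        # With no history the question is already standalone.
        standalone = question
    context = "\n".join(retrieve(standalone))
    return llm(f"Answer using this context:\n{context}\n\n{transcript}\nhuman: {question}")


llm_prompts: List[str] = []
retriever_queries: List[str] = []


def toy_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM; returns canned responses.
    llm_prompts.append(prompt)
    if prompt.startswith("Rewrite"):
        return "What is the caloric value of aspartame?"
    return "stub answer"


def toy_retrieve(query: str) -> List[str]:
    # Hypothetical stand-in for a vector store retriever.
    retriever_queries.append(query)
    return ["(retrieved chunk)"]


history: List[Message] = []
q1 = "How sweet is aspartame?"
history += [("human", q1), ("ai", answer_with_history(q1, history, toy_llm, toy_retrieve))]
q2 = "What is its caloric value?"
history += [("human", q2), ("ai", answer_with_history(q2, history, toy_llm, toy_retrieve))]
print(retriever_queries)
```

The second retrieval query is the rewritten question rather than the ambiguous "What is its caloric value?", which is what allows the retriever to find the relevant chunk.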

In [38]:
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

#
# Load the input document
#
loader = TextLoader("../../tensor-house-data/search/food-additives/food-additives.txt")
documents = loader.load()

#
# Splitting
#
text_splitter = CharacterTextSplitter(chunk_size=3000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
print(f'The input document has been split into {len(texts)} chunks\n')

#
# Indexing and storing
#
embeddings = VertexAIEmbeddings(model_name="textembedding-gecko")
vectorstore = Chroma.from_documents(texts, embeddings)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 1})

#
# Initialize new chat
#

contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{question}"),
    ]
)
contextualize_q_chain = contextualize_q_prompt | llm | StrOutputParser()

qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\

{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{question}"),
    ]
)

def contextualized_question(input: dict):
    if input.get("chat_history"):
        return contextualize_q_chain
    else:
        return input["question"]


rag_chain = (
    RunnablePassthrough.assign(
        context=contextualized_question | retriever
    )
    | qa_prompt
    | llm
)

chat_history = []

question1 = "How many times sweeter is aspartame than table sugar?"
ai_msg1 = rag_chain.invoke({"question": question1, "chat_history": chat_history})
chat_history.extend([HumanMessage(content=question1), AIMessage(content=ai_msg1)])

question2 = "What is its caloric value?"
ai_msg2 = rag_chain.invoke({"question": question2, "chat_history": chat_history})
chat_history.extend([HumanMessage(content=question2), AIMessage(content=ai_msg2)])

for m in chat_history:
    print(f'{m.type}: {m.content}')

The input document has been split into 5 chunks
human: How many times sweeter is aspartame than table sugar?
ai:  Aspartame is approximately 150 to 200 times sweeter than sucrose (table sugar), making it an intense sweetener.
human: What is its caloric value?
ai:  AI: Aspartame is virtually calorie-free, as the human body does not metabolize it into energy like sugars or carbohydrates.
