## RAG-Based Q&A for FNMA Selling Guide 
 
### Advanced RAG Approaches: Query Transformation -- Multiple Queries

> *This notebook should work well with the **`Amazon Bedrock and LangChain freamwork`** kernel in SageMaker Studio*

## Use Case
#### Purpose
To help answer questions based on the LLM and RAG architecture

The model will try to answer from the documents in easy language.

#### Dataset
Fannie Mae Selling Guide (PDF document)



### Query Transformation -- Multiple Queries
An LLM is used to automate the process of prompt tuning, to generate multiple queries from different perspectives for a given user input question.This call generates multiple versions of the initial question, then retrieval is performed on this set of questions.
<img src="./images/rag_multiqueries.jpg" width="800" height="600">



In [3]:
# !pip install langchain_openai
# !pip install chroma 
# !pip install chromadb

In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
import json
import os
import sys

import boto3

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww


# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "<REGION_NAME>"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "<YOUR_PROFILE>"
# os.environ["BEDROCK_ASSUME_ROLE"] = "<YOUR_ROLE_ARN>"  # E.g. "arn:aws:..."

boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None)
)

Create new client
  Using region: us-east-1
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-east-1.amazonaws.com)


## Configure langchain

We begin with instantiating the LLM and the Embeddings model. Here we are using Anthropic Claude for text generation and Amazon Titan for text embedding.

Note: It is possible to choose other models available with Bedrock. You can replace the `model_id` as follows to change the model.

`llm = Bedrock(model_id="amazon.titan-text-express-v1")`

In [5]:
# We will be using the Titan Embeddings Model to generate our Embeddings.
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock


# - create the Anthropic Model
llm = Bedrock(model_id="anthropic.claude-v2", client=boto3_bedrock, model_kwargs={'max_tokens_to_sample':1024, 'temperature':0.1,'top_p':0.5})
#llm = Bedrock(model_id="meta.llama2-13b-chat-v1", client=boto3_bedrock, model_kwargs={'temperature':0.1, 'max_gen_len':1024})

bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=boto3_bedrock)

## Data Preparation
Let's first transform files to build the document store and vector index. For this example we will be using public FNMA Selling Guidedocuments from

Leverage [DirectoryLoader from PyPDF available under LangChain](https://python.langchain.com/en/latest/reference/modules/document_loaders.html) and splitting them into smaller chunks.

Note: The retrieved document/text should be large enough to contain enough information to answer a question; but small enough to fit into the LLM prompt. Also the embeddings model has a limit of the length of input tokens limited to 8192 tokens, which roughly translates to ~32,000 characters. For the sake of this use-case we are creating chunks of roughly 1000 characters with an overlap of 100 characters using [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html).

In [6]:
import numpy as np
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

# loader = PyPDFDirectoryLoader("./data/")

# documents = loader.load()
import pickle

with open('./data/loaded_document.pkl', 'rb') as file:
    documents = pickle.load(file)

In [7]:
# - in our testing Character split works better with this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
     chunk_size =1250,
     chunk_overlap  = 125,
)
docs = text_splitter.split_documents(documents)

In [8]:
avg_doc_length = lambda documents: sum([len(doc.page_content) for doc in documents])//len(documents)
avg_char_count_pre = avg_doc_length(documents)
avg_char_count_post = avg_doc_length(docs)
print(f'Average length among {len(documents)} documents loaded is {avg_char_count_pre} characters.')
print(f'After the split we have {len(docs)} documents more than the original {len(documents)}.')
print(f'Average length among {len(docs)} documents (after split) is {avg_char_count_post} characters.')

Average length among 1199 documents loaded is 2087 characters.
After the split we have 2720 documents more than the original 1199.
Average length among 2720 documents (after split) is 969 characters.


Following the similar pattern embeddings could be generated for the entire corpus and stored in a vector store.

This can be easily done using [FAISS](https://github.com/facebookresearch/faiss) implementation inside [LangChain](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/faiss.html) which takes  input the embeddings model and the documents to create the entire vector store. Using the Index Wrapper we can abstract away most of the heavy lifting such as creating the prompt, getting embeddings of the query, sampling the relevant documents and calling the LLM. 

In [9]:
from langchain.chains.question_answering import load_qa_chain
from langchain.vectorstores import FAISS,Chroma
from langchain.indexes import VectorstoreIndexCreator
from langchain.indexes.vectorstore import VectorStoreIndexWrapper

# vectorstore_faiss = FAISS.from_documents(
#     docs,
#     bedrock_embeddings,
# )

if 'vectordb' in globals(): # If you've already made your vectordb this will delete it so you start fresh
    vectordb.delete_collection()
    
vectordb = Chroma.from_documents(documents=docs, embedding=bedrock_embeddings)



## Question Answering

Now that we have our vector store in place, we can start asking questions.

In [10]:
query = "What are acceptable flood insurance policies for the lender?"

The first step would be to create an embedding of the query such that it could be compared with the documents

### Quick way
You have the possibility to use the wrapper provided by LangChain which wraps around the Vector Store and takes input the LLM.
This wrapper performs the following steps behind the scences:
- Take the question as input
- Create question embedding
- Fetch relevant documents
- Stuff the documents and the question into a prompt
- Invoke the model with the prompt and generate the answer in a human readable manner.

In [11]:
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
prompt_template = """

Human: Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{context}
</context

Question: {question}

Answer:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

In [12]:
from typing import List
from pydantic import BaseModel, Field

from langchain.output_parsers import PydanticOutputParser
# Output parser will split the LLM result into a list of queries
class LineList(BaseModel):
    # "lines" is the key (attribute name) of the parsed output
    lines: List[str] = Field(description="Lines of text")

class LineListOutputParser(PydanticOutputParser):
    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

    def parse(self, text: str) -> LineList:
        lines = text.strip().split("\n")
        return LineList(lines=lines)

output_parser = LineListOutputParser()

In [13]:

llm_multiquery = Bedrock(model_id="anthropic.claude-v2", client=boto3_bedrock, model_kwargs={'max_tokens_to_sample':1024, 'temperature':0,'top_p':0.5})


In [14]:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_core.callbacks.manager import CallbackManagerForRetrieverRun
import logging
logger = logging.getLogger(__name__)
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

class MultiQueryRetriever_1(MultiQueryRetriever):
    logger = logging.getLogger(__name__)
    
    def __init__(self,*args, **kwargs):
        super().__init__(*args, **kwargs)
        
    def generate_queries(
        self, question: str, run_manager: CallbackManagerForRetrieverRun
    ) -> List[str]:
        """Generate queries based upon user input.

        Args:
            question: user query

        Returns:
            List of LLM generated queries that are similar to the user input
        """
        response = self.llm_chain(
            {"question": question}, callbacks=run_manager.get_child()
        )
        lines_0 = getattr(response["text"], self.parser_key, [])
        lines = [item for item in lines_0 if len(item)>0]
        if self.verbose:
            logger.info(f"Generated queries: {lines}")
        return lines

In [15]:
retriever_multiquery= MultiQueryRetriever_1.from_llm(
    retriever=vectordb.as_retriever(), llm=llm_multiquery, parser_key='lines'
)

In [16]:
qa_multiquery = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever_multiquery,
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)

  warn_deprecated(


What are acceptable flood insurance policies for the lender?
  Based on the context provided, acceptable flood insurance policies for the lender are:

- A standard policy issued under the National Flood Insurance Program (NFIP).

- A policy issued by a private insurer, provided the terms and amount of coverage are at least equal
to that provided under an NFIP policy based on a review of the full policy. The private insurer must
also meet Fannie Mae's rating requirements.

A Policy Declaration page is acceptable evidence of flood insurance.


In [None]:
answer =qa_multiquery({"query": query})
print_ww(answer['query'],'\n',answer['result'])

Let's ask a different question:

In [20]:
query_2 = "Can part-time income be used to qualify?"

In [21]:
answer_2 = qa({"query": query_2})
print_ww(answer_2['query'],'\n',answer_2['result'])

Can part-time income be used to qualify?
  Based on the provided context, I do not have enough information to determine if part-time income
can be used to qualify. The context discusses lead paint and environmental hazards, but does not
mention anything about part-time income or qualifying income sources. Without more details about the
loan application and income sources, I cannot conclusively answer whether part-time income can be
used.


In [None]:
## Retrieved documents as the context information
for document in answer_2['source_documents']:
    print_ww(document.page_content, '\n',document.metadata, '\n')

In [22]:
query_3 =  'When can rental income be used to qualify?'
answer_3 = qa({"query": query_3})
print_ww(answer_3['query'],'\n',answer_3['result'])

When can rental income be used to qualify?
  Based on the provided context, rental income can be used to qualify the borrower if:

- The rental income is derived from a property that is either a two- to four-unit principal
residence property in which the borrower occupies one of the units, or a one- to four-unit
investment property.

- The borrower has a history of receiving rental income from the subject or non-subject property, as
documented by tax returns or lease agreements.

- The rental income meets all other general requirements for documenting and calculating rental
income used for qualifying purposes.

Rental income generally cannot be used to qualify if it is derived from the borrower's principal
residence in a one-unit property or second home. However, there are some exceptions that allow
rental income from boarders or accessory units in certain cases.


In [23]:
query_4 ='What if a property has a roof leak?'
answer_4 = qa({"query": query_4})
print_ww(answer_4['query'],'\n',answer_4['result'])

What if a property has a roof leak?
  Based on the provided context, if a property has a roof leak, the lender should assess whether the
leak affects the safety, soundness, or structural integrity of the property.

Some key considerations:

- If the roof leak is minor and can be easily repaired, the lender may be able to deliver the loan
to Fannie Mae as long as the leak does not affect the safety, soundness, or structural integrity of
the property. The lender would need to obtain documentation of professional repair estimates and
ensure sufficient funds are available to guarantee completion of repairs.

- If the roof leak is more substantial and affects the safety, soundness, or structural integrity of
the property, the roof should be repaired before the loan is delivered to Fannie Mae. The lender
would need satisfactory evidence showing the repairs have been completed.

- For condos and co-ops, the condition of both the individual unit and the overall building where
the unit is locat

In [None]:
## Retrieved documents as the context information
for document in answer_3['source_documents']:
    print_ww(document.page_content, '\n',document.metadata, '\n')