## RAG-Based Q&A for FNMA Selling Guide 
 
### Advanced RAG Approaches: Retrieval Enhancement -- Fusion Retrieval/Hybrid Search

> *This notebook should work well with the **`Amazon Bedrock and LangChain framework`** kernel in SageMaker Studio*

### Retrieval Enhancement -- Fusion Retrieval/Hybrid Search

A relatively old idea is to take the best of both worlds: combine keyword-based, old-school search (sparse retrieval algorithms such as TF-IDF or the search-industry standard BM25) with modern semantic or vector search, and merge the two into a single retrieval result.
<img src="./images/fusion-retrieval.jpg" width="800" height="600">
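To make the "sparse" half concrete, here is a minimal, self-contained sketch of Okapi BM25 scoring in plain Python. It is not LangChain's code: `BM25Retriever` delegates to the `rank_bm25` package, whose IDF formula differs slightly. This sketch assumes naive lowercase whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, and a hypothetical three-sentence corpus.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` against `query` with Okapi BM25.

    Assumptions for this sketch: lowercase whitespace tokenization and the
    non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5)).
    """
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # exact keyword match required: no match, no score
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, loosely modeled on the Selling Guide topics.
corpus = [
    "flood insurance policy issued under the NFIP",
    "rental income used to qualify the borrower",
    "part-time income history of two years",
]
scores = bm25_scores("flood insurance", corpus)
print(scores)
```

Only the first document scores above zero because BM25 rewards exact term overlap. A paraphrase such as "coverage for water damage" would score zero here, which is the gap the dense (embedding) side of hybrid search is meant to fill.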

## Use Case
#### Purpose
To answer questions about the Fannie Mae Selling Guide using an LLM and a retrieval-augmented generation (RAG) architecture.

The model tries to answer from the retrieved documents in plain language.

#### Dataset
Fannie Mae Selling Guide (PDF document)



## Implementation
To implement the RAG approach, this notebook uses the LangChain framework, which integrates with many services and tools and makes it efficient to build patterns such as RAG. We use the following tools:

- **LLM (Large Language Model)**: Anthropic Claude v2 available through Amazon Bedrock

  This model will be used to understand the document chunks and provide an answer in a human-friendly manner.
- **Embeddings Model**: Amazon Titan Embeddings available through Amazon Bedrock

  This model will be used to generate a numerical representation of the textual documents
- **Document Loader**: PDF Loader available through LangChain

  This loader reads the documents from a source. For the sake of this notebook we load the sample files from a local path; it could easily be replaced with a loader that pulls documents from internal enterprise systems.

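- **Vector Store**: Chroma available through LangChain

  In this notebook we use an in-memory Chroma collection to store both the embeddings and the documents. In an enterprise context this could be replaced with a persistent store such as Amazon OpenSearch Service, Amazon RDS for PostgreSQL with pgvector, a persistent Chroma deployment, Pinecone, or Weaviate.

- **Keyword Retriever**: BM25Retriever available through LangChain (backed by the `rank_bm25` package)

  This sparse retriever scores the chunks by keyword overlap with the question.

- **Ensemble Retriever**: EnsembleRetriever available through LangChain

  This retriever merges the BM25 and vector-search results into one ranked list, which is the fusion/hybrid search this notebook demonstrates.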
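Conceptually, at query time a vector store embeds the question and returns the stored chunks whose vectors are closest to it. The sketch below is a brute-force stand-in for that lookup, assuming cosine similarity and hypothetical 3-dimensional "embeddings". Real stores use optimized indexes, and the distance metric (cosine, L2, inner product) depends on how the store is configured.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, index, k=2):
    """Return the k (doc_id, similarity) pairs most similar to query_vec."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy embeddings; a real embeddings model returns far more dimensions.
index = {
    "flood": [0.9, 0.1, 0.0],
    "rental": [0.1, 0.9, 0.2],
    "part_time": [0.0, 0.3, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of a flood-insurance question
hits = top_k(query_vec, index, k=2)
print(hits)
```

The "flood" chunk ranks first with a similarity of about 0.98, followed by "rental". In the notebook, `search_kwargs={"k": 4}` plays the role of `k` here.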


In [4]:
# !pip install rank_bm25

In [5]:
import warnings
warnings.filterwarnings('ignore')

In [6]:
import json
import os
import sys

import boto3

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww


# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "<REGION_NAME>"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "<YOUR_PROFILE>"
# os.environ["BEDROCK_ASSUME_ROLE"] = "<YOUR_ROLE_ARN>"  # E.g. "arn:aws:..."

boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None)
)

Create new client
  Using region: us-east-1
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-east-1.amazonaws.com)


## Configure LangChain

We begin with instantiating the LLM and the Embeddings model. Here we are using Anthropic Claude for text generation and Amazon Titan for text embedding.

Note: It is possible to choose other models available with Bedrock. You can replace the `model_id` as follows to change the model.

`llm = Bedrock(model_id="amazon.titan-text-express-v1", client=boto3_bedrock)`


In [7]:
# We will be using the Titan Embeddings Model to generate our Embeddings.
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock

# - create the Anthropic Model
llm = Bedrock(model_id="anthropic.claude-v2", client=boto3_bedrock, model_kwargs={'max_tokens_to_sample':1024, 'temperature':0.1,'top_p':0.5})
#llm = Bedrock(model_id="meta.llama2-13b-chat-v1", client=boto3_bedrock, model_kwargs={'temperature':0.1, 'max_gen_len':1024})

bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=boto3_bedrock)

## Data Preparation
Let's first transform the files to build the document store and vector index. For this example we use the public FNMA Selling Guide documents.

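We use [PyPDFDirectoryLoader, available in LangChain](https://python.langchain.com/en/latest/reference/modules/document_loaders.html), to load the PDFs and then split them into smaller chunks. (In the cell below the loader call is commented out, and previously loaded documents are read from a pickle file instead.)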

Note: The retrieved document/text should be large enough to contain enough information to answer a question, but small enough to fit into the LLM prompt. The embeddings model also limits input to 8,192 tokens, which roughly translates to ~32,000 characters. For this use case we create chunks of roughly 1,250 characters with an overlap of 125 characters using [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html).
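The ~32,000-character figure assumes about four characters per token (8,192 × 4 = 32,768). To show what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified fixed-window chunker, using the same 1250/125 values as the splitter cell below. It is an illustration, not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to break on paragraphs, lines, and words, so its chunks and overlaps are only approximately these sizes.

```python
def fixed_size_chunks(text, chunk_size=1250, chunk_overlap=125):
    """Split text into windows of at most chunk_size characters, where
    consecutive windows share exactly chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Hypothetical 3,000-character document made of repeating digits.
text = "".join(str(i % 10) for i in range(3000))
chunks = fixed_size_chunks(text)
print([len(c) for c in chunks])  # [1250, 1250, 750]
```

The overlap means a sentence that straddles a chunk boundary appears whole in at least one chunk, at the cost of embedding and storing ~10% duplicate text.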

In [8]:
import pickle

import numpy as np
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

# To parse the PDFs from scratch, un-comment the two lines below.
# loader = PyPDFDirectoryLoader("./data/")
# documents = loader.load()

# Otherwise, reuse the documents loaded earlier and cached with pickle.
with open('./data/loaded_document.pkl', 'rb') as file:
    documents = pickle.load(file)
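Parsing a large PDF is slow, which is why the cell above reads a pickled copy of the loaded documents. A minimal load-or-build cache is sketched below. The helper `load_or_cache` is hypothetical (not part of LangChain); in this notebook `build_fn` would be something like `lambda: PyPDFDirectoryLoader("./data/").load()`.

```python
import os
import pickle
import tempfile


def load_or_cache(cache_path, build_fn):
    """Return the object pickled at cache_path; if there is none yet,
    build it with build_fn(), pickle it to cache_path, and return it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    obj = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(obj, f)
    return obj


# Demonstration with a hypothetical builder that records how often it runs.
calls = []


def build():
    calls.append(1)
    return ["doc-1", "doc-2"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "docs.pkl")
    first = load_or_cache(path, build)   # cache miss: build() runs
    second = load_or_cache(path, build)  # cache hit: read from disk
print(first, second, len(calls))
```

Only unpickle files you created yourself or otherwise trust, because loading a pickle can execute arbitrary code.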

In [10]:
# In our testing, character-based splitting works better with this PDF data set.
# Chunks are roughly 1250 characters long, and neighboring chunks overlap by about 125 characters.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1250,
    chunk_overlap=125,
)
docs = text_splitter.split_documents(documents)

In [11]:
from langchain.chains.question_answering import load_qa_chain
from langchain_community.vectorstores import Chroma
from langchain.vectorstores import FAISS
from langchain.indexes import VectorstoreIndexCreator
from langchain.indexes.vectorstore import VectorStoreIndexWrapper

from langchain.storage import InMemoryStore

from langchain.retrievers import  BM25Retriever, EnsembleRetriever

# If this cell has already been run, delete the existing collection so you start fresh.
if 'vectordb' in globals():
    vectordb.delete_collection()

# Embed every chunk with Titan and index it in an in-memory Chroma collection.
vectordb = Chroma.from_documents(documents=docs, embedding=bedrock_embeddings)

## Question Answering

Now that we have our vector store in place, we can start asking questions.

In [12]:
query = "What are acceptable flood insurance policies for the lender?"


### Quick way
You can use LangChain's `RetrievalQA` chain, which wraps a retriever and takes the LLM as input.
Behind the scenes, this chain performs the following steps:
- Take the question as input
- Create the question embedding (for the vector search) and score the question against the chunks with BM25 (for the keyword search)
- Fetch the most relevant documents
- Stuff the documents and the question into a prompt
- Invoke the model with the prompt and generate the answer in a human-readable manner
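The "stuff" step above simply concatenates all retrieved chunks into one context block inside the prompt. The sketch below shows the idea with plain `str.format`, assuming a simplified hypothetical template and two hypothetical chunks. LangChain's stuff chain also joins documents with a blank line by default, as far as we know; check `document_separator` in your installed version.

```python
# Hypothetical, simplified template with the same placeholders as the notebook's prompt.
TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n"
    "<context>\n{context}\n</context>\n\n"
    "Question: {question}\n\n"
    "Answer:"
)


def stuff_prompt(question, chunks, separator="\n\n"):
    """Join every retrieved chunk into one context string and fill the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


chunks = [
    "A standard NFIP flood policy is acceptable.",
    "A private flood policy must match NFIP coverage.",
]
prompt = stuff_prompt("What are acceptable flood insurance policies?", chunks)
print(prompt)
```

Because every chunk goes into a single prompt, `k` and the chunk size together bound the prompt length: with two retrievers returning 4 chunks each, up to 8 chunks of ~1,250 characters can be stuffed in.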

In [13]:
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
prompt_template = """

Human: Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{context}
</context>

Question: {question}

Answer:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

In [14]:
# Initialize the BM25 (keyword) retriever and the vector (semantic) retriever;
# each returns its top 4 chunks.
bm25_retriever = BM25Retriever.from_documents(
    documents=docs, k=4
)
vector_retriever = vectordb.as_retriever(
    search_type="similarity", search_kwargs={"k": 4}
)

In [15]:
# initialize the ensemble retriever
retriever_ensemble = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever], weights=[0.4, 0.6]
)
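LangChain documents `EnsembleRetriever` as merging its retrievers' results with Reciprocal Rank Fusion (RRF), where `weights` scale each retriever's contribution. The standalone sketch below reimplements weighted RRF to show how `weights=[0.4, 0.6]` plays out. It assumes the smoothing constant `c=60`, which we believe is LangChain's default (verify against your installed version), and uses hypothetical document IDs in place of real chunks.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of doc ids with weighted Reciprocal Rank Fusion.

    A doc earns weight / (rank + c) from every list it appears in
    (ranks start at 1); docs are returned by descending fused score.
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


bm25_hits = ["A", "B", "C", "D"]    # hypothetical BM25 top-4
vector_hits = ["C", "E", "A", "F"]  # hypothetical vector-search top-4
fused = weighted_rrf([bm25_hits, vector_hits], weights=[0.4, 0.6])
print(fused)  # ['C', 'A', 'E', 'F', 'B', 'D']
```

Chunks found by both retrievers (`C`, `A`) rise to the top. Because the vector retriever carries the larger weight, `E` (ranked 2nd by vector search only) outranks `B` (ranked 2nd by BM25 only). RRF uses only ranks, so BM25 scores and embedding similarities never need to be put on a common scale.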

In [16]:
# bm25_retriever.get_relevant_documents(query)

In [17]:
# ## BM25 only (for comparison)

# qa_bm25 = RetrievalQA.from_chain_type(
#     llm=llm,
#     chain_type="stuff",
#     retriever=bm25_retriever,
#     return_source_documents=True,
#     chain_type_kwargs={"prompt": PROMPT}
# )
# answer_bm25 = qa_bm25({"query": query})
# print_ww(answer_bm25['query'],'\n',answer_bm25['result'])

In [18]:
# ## Vector search only (for comparison)
# qa_vector = RetrievalQA.from_chain_type(
#     llm=llm,
#     chain_type="stuff",
#     retriever=vector_retriever,
#     return_source_documents=True,
#     chain_type_kwargs={"prompt": PROMPT}
# )
# answer_vector = qa_vector({"query": query})
# print_ww(answer_vector['query'],'\n',answer_vector['result'])

In [19]:
qa_ensemble = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever_ensemble,
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)
answer_ensemble = qa_ensemble({"query": query})
print_ww(answer_ensemble['query'],'\n',answer_ensemble['result'])



What are acceptable flood insurance policies for the lender?
  Based on the context provided, acceptable flood insurance policies for the lender are:

- A standard policy issued under the National Flood Insurance Program (NFIP).

- A policy issued by a private insurer, provided the terms and amount of coverage are at least equal
to that provided under an NFIP policy based on a review of the full policy. The insurer must also
meet Fannie Mae's rating requirements.

- A Policy Declaration page is acceptable evidence of flood insurance.

The flood insurance policy must include the standard mortgagee clause naming Fannie Mae, the lender,
or the servicer. A mortgagee clause is not required for a master flood insurance policy issued by
the NFIP or a private insurer.


Let's ask a different question:

In [39]:
query_2 = 'When can rental income be used to qualify?'

In [40]:
answer_2_ensemble = qa_ensemble({"query": query_2})
print_ww(answer_2_ensemble['query'],'\n',answer_2_ensemble['result'])

When can rental income be used to qualify?
  Based on the context provided, rental income can be used to qualify the borrower if:

- The lender determines the rental income is stable and likely to continue.

- The rental income is derived from a property that is either a 2-4 unit principal residence with
the borrower occupying one unit, or a 1-4 unit investment property.

- The lender documents the rental income using the borrower's most recent signed federal income tax
return and schedules, current signed lease agreements, or Fannie Mae Form 1007 or 1025.

- For the subject property:

-- If an investment property, the net rental income is calculated using the proposed PITIA and added
to qualifying income.

-- If a 2-4 unit principal residence, the net rental income is calculated without subtracting the
proposed PITIA, added to qualifying income, and the PITIA is included in the debt-to-income ratio.

So in summary, rental income can be used for qualifying purposes if it meets Fannie M

In [41]:
query_3 = 'Can part-time income be used to qualify?'

In [42]:
answer_3 = qa_ensemble({"query": query_3})
print_ww(answer_3['query'],'\n',answer_3['result'])

Can part-time income be used to qualify?
  Based on the provided context, part-time income can be used to qualify as long as it meets certain
requirements. The key points are:

- Part-time, second job, or seasonal income can be used to qualify if it has been received for at
least 12 months and is likely to continue.

- There cannot be any gap in employment greater than one month in the most recent 12-month period,
unless the part-time income is considered seasonal.

- The lender must verify a minimum history of two years of part-time income, but income received for
a shorter period (at least 12 months) may be considered if there are positive factors to reasonably
offset the shorter income history.

So in summary, yes part-time income can be used to qualify as long as it meets the requirements
around length and continuity of receipt. The lender will verify it using methods like paystubs,
W-2s, and bank statements.
