## RAG-Based Q&A for FNMA Selling Guide 
 
### Advanced RAG Approaches: Query Transformation - Step-back Prompting

> *This notebook should work well with the **`Amazon Bedrock and LangChain framework`** kernel in SageMaker Studio*

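### Query Transformation - Step-back Prompting
Step-back prompting asks the LLM to paraphrase a specific question into a more generic "step-back" question. This technique can be combined with regular RAG by doing retrieval on both the original and the step-back question.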

<img src="./images/RAG-multiqueries.jpg" width="600" height="400">

## Use Case
#### Purpose
To answer questions about the Selling Guide using an LLM and a RAG architecture.

The model will try to answer from the retrieved documents in plain language.

#### Dataset
Fannie Mae Selling Guide (PDF document)



## Implementation
To follow the RAG approach, this notebook uses the LangChain framework, which integrates with different services and tools and allows efficient building of patterns such as RAG. We will be using the following tools:

- **LLM (Large Language Model)**: Anthropic Claude v2 available through Amazon Bedrock

  This model will be used to understand the document chunks and provide an answer in a human-friendly manner.
- **Embeddings Model**: Amazon Titan Embeddings available through Amazon Bedrock

  This model will be used to generate a numerical representation of the textual documents.
- **Document Loader**: PDF Loader available through LangChain

  This loader reads documents from a source. For this notebook we load the sample files from a local path; this could easily be replaced with a loader that reads documents from enterprise internal systems.

- **Vector Store**: Chroma available through LangChain

  In this notebook we use this in-memory vector store to hold both the embeddings and the documents (the code below builds it with `Chroma.from_documents`). In an enterprise context this could be replaced with a persistent store such as Amazon OpenSearch Service, RDS Postgres with pgVector, Pinecone or Weaviate.
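To make the roles of the embeddings model and the vector store concrete, here is a minimal plain-Python sketch. It is not how Titan or the vector store work internally: `toy_embed` is a hypothetical stand-in that uses word counts instead of a learned embedding, and `ToyVectorStore` is an illustrative class invented for this example. The passages are placeholder strings.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embeddings model: word counts as a sparse vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Illustrative in-memory store keeping each text next to its vector."""

    def __init__(self, texts):
        self.entries = [(t, toy_embed(t)) for t in texts]

    def search(self, query, k=2):
        # Embed the query, then rank stored texts by similarity to it.
        q = toy_embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore([
    "flood insurance policy requirements",
    "rental income documentation",
    "part-time income history",
])
top = store.search("flood insurance", k=1)
print(top)
```

A real embeddings model captures meaning rather than exact word overlap, but the store-then-rank-by-similarity flow is the same idea.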


In [3]:
# !pip install langchainhub

In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
import json
import os
import sys

import boto3

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww


# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "<REGION_NAME>"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "<YOUR_PROFILE>"
# os.environ["BEDROCK_ASSUME_ROLE"] = "<YOUR_ROLE_ARN>"  # E.g. "arn:aws:..."

boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None)
)

Create new client
  Using region: us-east-1
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-east-1.amazonaws.com)


## Configure LangChain

We begin with instantiating the LLM and the Embeddings model. Here we are using Anthropic Claude for text generation and Amazon Titan for text embedding.

Note: It is possible to choose other models available with Bedrock. To change the model, replace the `model_id` as follows:

`llm = Bedrock(model_id="amazon.titan-text-express-v1")`

Check the available text generation and embedding model IDs under Amazon Bedrock.


In [4]:
# We will be using the Titan Embeddings Model to generate our Embeddings.
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock


# - create the Anthropic Model
llm = Bedrock(model_id="anthropic.claude-v2", client=boto3_bedrock, model_kwargs={'max_tokens_to_sample':1024, 'temperature':0.1,'top_p':0.5})
#llm = Bedrock(model_id="meta.llama2-13b-chat-v1", client=boto3_bedrock, model_kwargs={'temperature':0.1, 'max_gen_len':1024})

bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=boto3_bedrock)

In [5]:
# - create a second Anthropic model instance (temperature 0), used below to generate the step-back question
llm_gen = Bedrock(model_id="anthropic.claude-v2", client=boto3_bedrock, model_kwargs={'max_tokens_to_sample':1024, 'temperature':0,'top_p':0.5})
#llm = Bedroc

## Data Preparation
Let's first transform files to build the document store and vector index. For this example we will be using the public FNMA Selling Guide documents.

We load them with the [PyPDFDirectoryLoader available under LangChain](https://python.langchain.com/en/latest/reference/modules/document_loaders.html) and split them into smaller chunks. The cell below loads a pickled copy of the already-loaded documents; un-comment the loader lines to parse the PDFs directly.

Note: The retrieved document/text should be large enough to contain enough information to answer a question, but small enough to fit into the LLM prompt. The embeddings model also limits its input to 8192 tokens, which roughly translates to ~32,000 characters. For this use case we create chunks of up to 1250 characters with an overlap of 125 characters using [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html).
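As a rough illustration of chunk size and overlap, the sketch below uses a simplified fixed-width sliding window. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraphs and sentences; `chunk_text` is a hypothetical helper, and the 4-characters-per-token ratio is only a rule of thumb assumed here.

```python
def chunk_text(text, chunk_size=1250, chunk_overlap=125):
    """Split text into fixed-width windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Assumed rule of thumb: ~4 characters per token, so 8192 tokens is ~32,000 characters.
approx_char_limit = 8192 * 4

sample = "".join(str(i % 10) for i in range(3000))
chunks = chunk_text(sample)
print([len(c) for c in chunks], approx_char_limit)
```

Each chunk begins with the last 125 characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk.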

In [6]:
import numpy as np
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

# loader = PyPDFDirectoryLoader("./data/")

# documents = loader.load()
import pickle

with open('./data/loaded_document.pkl', 'rb') as file:
    documents = pickle.load(file)

In [7]:
# - in our testing Character split works better with this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
    # Chunks of up to 1250 characters, sharing 125 characters with the previous chunk.
    chunk_size=1250,
    chunk_overlap=125,
)
docs = text_splitter.split_documents(documents)

In [8]:
avg_doc_length = lambda documents: sum([len(doc.page_content) for doc in documents])//len(documents)
avg_char_count_pre = avg_doc_length(documents)
avg_char_count_post = avg_doc_length(docs)
print(f'Average length among {len(documents)} documents loaded is {avg_char_count_pre} characters.')
print(f'After the split we have {len(docs)} documents more than the original {len(documents)}.')
print(f'Average length among {len(docs)} documents (after split) is {avg_char_count_post} characters.')

Average length among 1199 documents loaded is 2087 characters.
After the split we have 2720 documents more than the original 1199.
Average length among 2720 documents (after split) is 969 characters.


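### Step-back Prompting
The cells below implement step-back prompting in LangChain: a few-shot chat prompt asks the LLM to rewrite a question as a more generic step-back question.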

In [24]:

from langchain.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableLambda

######################################

# Few Shot Examples
examples = [
    {
        "input": "Can part-time income be used to qualify for the mortgage?",
        "output": "what kind of incomes can qualify the borrower for the mortgage?"
    },
    {
        "input": "Jan Sindel’s was born in what country?", 
        "output": "what is Jan Sindel’s personal history?"
    },
]
# We now transform these to example messages
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

######################################

prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:"""),
    # Few shot examples
    few_shot_prompt,
    # New question
    ("user", "{question}"),
])

######################################

question_gen = prompt | llm_gen | StrOutputParser()
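To make the structure of the prompt above concrete, here is a minimal plain-Python sketch of the message order the few-shot chat prompt is meant to produce: the system instruction, then each example as a human/ai pair, then the new question. `build_stepback_messages` is a hypothetical helper written for illustration and is not part of LangChain; the system text is shortened.

```python
def build_stepback_messages(system_text, examples, question):
    """Lay out (role, content) pairs: system, few-shot pairs, then the new question."""
    messages = [("system", system_text)]
    for ex in examples:
        messages.append(("human", ex["input"]))   # example question
        messages.append(("ai", ex["output"]))     # its step-back rewrite
    messages.append(("human", question))          # the question to rewrite
    return messages


examples = [
    {
        "input": "Can part-time income be used to qualify for the mortgage?",
        "output": "what kind of incomes can qualify the borrower for the mortgage?",
    },
]
messages = build_stepback_messages(
    "Step back and paraphrase a question to a more generic step-back question.",
    examples,
    "Can part-time income be used to qualify?",
)
for role, content in messages:
    print(f"{role}: {content}")
```

The example pairs show the model the expected style of rewrite before it sees the real question.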



In [25]:
######################################
question = "Can be a home for the mortgage be powered by solar panels only?"
######################################

question_gen.invoke({"question": question})

' What are the energy source requirements for a home to qualify for a mortgage?'

In [26]:
######################################
question = 'Can part-time income be used to qualify?'
######################################

question_gen.invoke({"question": question})

' What types of income can be used to qualify for a mortgage?'

In [11]:
from langchain.chains.question_answering import load_qa_chain
from langchain.vectorstores import FAISS,Chroma
from langchain.indexes import VectorstoreIndexCreator
from langchain.indexes.vectorstore import VectorStoreIndexWrapper


if 'vectordb' in globals(): # If you've already made your vectordb this will delete it so you start fresh
    vectordb.delete_collection()
    
vectordb = Chroma.from_documents(documents=docs, embedding=bedrock_embeddings)


In [15]:
# set retriever/index, including search methods and how many contexts will be retrieved
retriever_vector = vectordb.as_retriever(
    search_type="similarity", search_kwargs={"k": 4}
)

## Question Answering

Now that we have our vector store in place, we can start asking questions.

Once the relevant documents have been retrieved, we use the LLM to generate an answer based on them.


LangChain provides an abstraction that makes this easy.

### Quick way
You can use the wrapper provided by LangChain, which wraps around the vector store and takes the LLM as input.
This wrapper performs the following steps behind the scenes:
- Take the question as input
- Create question embedding
- Fetch relevant documents
- Stuff the documents and the question into a prompt
- Invoke the model with the prompt and generate the answer in a human readable manner.
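The steps above can be sketched in plain Python as follows. This is only an illustration of the flow, not LangChain's implementation: `retrieve` is a hypothetical word-overlap retriever standing in for embedding search, `fake_llm` is a stub in place of the real model call, and the corpus holds placeholder strings.

```python
PROMPT_TEMPLATE = (
    "Use the following context to answer the question.\n"
    "<context>\n{context}\n</context>\n\n"
    "Question: {question}\n\nAnswer:"
)


def retrieve(question, corpus, k=2):
    """Hypothetical retriever: rank passages by words shared with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def fake_llm(prompt):
    """Stub model call; a real chain would invoke the LLM here."""
    return f"[answer generated from a {len(prompt)}-character prompt]"


def stuff_and_answer(question, corpus, llm=fake_llm, k=2):
    passages = retrieve(question, corpus, k=k)   # take the question, fetch documents
    context = "\n\n".join(passages)              # "stuff" the documents together
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    return prompt, llm(prompt)                   # invoke the model with the prompt


corpus = [
    "passage about flood insurance",
    "passage about rental income",
    "passage about part-time income",
]
prompt, answer = stuff_and_answer("what about flood insurance", corpus, k=1)
print(prompt)
print(answer)
```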

In [13]:
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

prompt_template = """

Human: Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{normal_context}
</context>

Question: {question}

Answer:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["normal_context", "question"]
)

In [14]:
from langchain import hub

response_prompt = hub.pull("langchain-ai/stepback-answer")

response_prompt.template = 'Use the following pieces of context to provide a concise answer to the question at the end. \
If you do not know the answer, just say that you do not know, do not try to make up an answer. \
\n\n{normal_context}\n{step_back_context}\n\nOriginal Question: {question}\nAnswer:'


In [16]:
qa_chain = {
    # Retrieve context using the normal question
    "normal_context": RunnableLambda(lambda x: x['question']) | retriever_vector,
    # Retrieve context using the step-back question
    "step_back_context": question_gen | retriever_vector,
    # Pass on the question
    "question": lambda x: x["question"]
} | response_prompt | llm | StrOutputParser()
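The dict at the head of this chain fans the same input out to three branches and collects the results under their keys before the prompt is filled in. Below is a minimal plain-Python sketch of that data flow; `run_parallel`, `step_back` and `retrieve` are hypothetical stand-ins for illustration only and do not reproduce how LangChain runs the branches.

```python
def run_parallel(steps, inputs):
    """Call every branch on the same input and collect results under its key."""
    return {key: fn(inputs) for key, fn in steps.items()}


def step_back(question):
    """Hypothetical stand-in for the step-back question generator."""
    return "general question about: " + question


def retrieve(query):
    """Hypothetical stand-in for the retriever; returns placeholder documents."""
    return [f"<doc for '{query}'>"]


steps = {
    # Retrieve context using the normal question
    "normal_context": lambda x: retrieve(x["question"]),
    # Retrieve context using the step-back question
    "step_back_context": lambda x: retrieve(step_back(x["question"])),
    # Pass on the question unchanged
    "question": lambda x: x["question"],
}

template = "{normal_context}\n{step_back_context}\n\nOriginal Question: {question}\nAnswer:"
outputs = run_parallel(steps, {"question": "Can part-time income be used to qualify?"})
filled = template.format(**outputs)
print(filled)
```

The filled template is what would then be passed on to the LLM, so the answer can draw on context retrieved for both the original and the more general question.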


In [12]:
query = "What are acceptable flood insurance policies for the lender?"

In [17]:
answer = qa_chain.invoke({"question":query})
print_ww(answer)

 Based on the provided context, acceptable flood insurance policies for the lender are:

- A standard policy issued under the National Flood Insurance Program (NFIP).

- A policy issued by a private insurer, provided the terms and amount of coverage are at least equal
to that provided under an NFIP policy and the insurer meets Fannie Mae's rating requirements.

The flood insurance policy must include the standard mortgagee clause naming the lender. A Policy
Declaration page is acceptable evidence of flood insurance.


Let's ask a different question:

In [18]:
query_2 =  'When can rental income be used to qualify?'
#query_3 = 'What are lender incentives for mortgage borrowers?'

In [19]:
answer_2= qa_chain.invoke({"question":query_2})
print_ww(answer_2)

 Based on the provided context, rental income can be used to qualify if:

- It is derived from a property that is either the subject property (2-4 unit principal residence or
1-4 unit investment property) or a non-subject property (any property type).

- The borrower has at least a one-year history of receiving rental income or documented property
management experience.

- The income is likely to continue, as supported by documentation like tax returns, lease
agreements, etc.

- It meets all other standard income verification requirements based on the source and type of
rental income.

So in summary, rental income can be used for qualifying purposes if it comes from an eligible
property type, there is history/documentation to support it continuing, and it is verified following
standard income documentation requirements.


In [20]:
query_3 = 'Can part-time income be used to qualify?'

In [21]:
answer_3 = qa_chain.invoke({"question":query_3})
print_ww(answer_3)

 Based on the provided context, here are the key points for using part-time income to qualify:

- Part-time or secondary employment income can be used to qualify if it has been received for at
least 12 months and is stable. A 2-year history is recommended but not required.

- There cannot be any gaps in employment greater than 1 month in the most recent 12-month period,
unless the income is considered seasonal.

- Seasonal part-time income can be used if the borrower has at least a 2-year history of seasonal
employment and income, and it is properly documented.

- The income must be verified following standard documentation requirements based on the source and
type of income.

- The full amount of the part-time income is added to the borrower's total income when calculating
the qualifying income.

- The total qualifying income that includes part-time income cannot exceed the borrower's regular
employment income.

So in summary, yes part-time income can be used to qualify, provided it m