# Retrieval Augmented Generation (RAG) with FAISS

This example demonstrates Retrieval Augmented Generation (RAG) using LangChain, Amazon Bedrock, and FAISS.

## Install requirements

In [2]:
%pip install boto3 pypdf faiss-cpu langchain numpy

Note: you may need to restart the kernel to use updated packages.





## Set up Bedrock

Set up the Amazon Bedrock client and a helper function that wraps printed output.

In [3]:
import boto3
from io import StringIO
import sys
import textwrap

session = boto3.Session(profile_name='bach-dev', region_name='us-east-1')
boto3_bedrock = session.client(service_name='bedrock-runtime')

def print_ww(*args, width: int = 100, **kwargs):
    """Like print(), but wraps output to `width` characters (default 100)"""
    buffer = StringIO()
    try:
        _stdout = sys.stdout
        sys.stdout = buffer
        print(*args, **kwargs)
        output = buffer.getvalue()
    finally:
        sys.stdout = _stdout
    for line in output.splitlines():
        print("\n".join(textwrap.wrap(line, width=width)))
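
The helper above captures whatever `print()` would write and re-emits it wrapped to a fixed width. As a minimal sketch of the same idea, assuming we want the wrapped text back as a string rather than printed, the variant below uses `contextlib.redirect_stdout` to handle the capture. The name `wrap_printed` is made up for this illustration and is not used elsewhere in the notebook.

```python
import contextlib
import io
import textwrap


def wrap_printed(*args, width: int = 100, **kwargs) -> str:
    """Render print(*args, **kwargs) to a string, wrapping each line to `width`."""
    buffer = io.StringIO()
    # redirect_stdout restores sys.stdout on exit, even if print() raises.
    with contextlib.redirect_stdout(buffer):
        print(*args, **kwargs)
    wrapped_lines = []
    for line in buffer.getvalue().splitlines():
        # textwrap.wrap returns [] for an empty line, so the join yields ""
        # and blank lines are preserved.
        wrapped_lines.append("\n".join(textwrap.wrap(line, width=width)))
    return "\n".join(wrapped_lines)


sample = wrap_printed("alpha beta gamma delta", width=11)
print(sample)
```

Returning a string makes the wrapping easy to check in isolation; printing the result gives the same effect as `print_ww`.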

## Configure LangChain

In [4]:
# We will be using the Titan Embeddings Model to generate our Embeddings.
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock

# - create the Anthropic Model
llm = Bedrock(model_id="anthropic.claude-v2", client=boto3_bedrock, model_kwargs={'max_tokens_to_sample':200})
bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=boto3_bedrock)

## Data Preparation
Here we split the iMIS guide into chunks before inserting them into the vector store.

In [5]:
import numpy as np
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("data/imis_guide.pdf")
pages = loader.load()

# In our testing, character-based splitting worked better with this PDF data set.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
)
docs = text_splitter.split_documents(pages)

print(f"Split {len(pages)} pages into {len(docs)} chunks.")

Split 70 pages into 136 chunks.
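
To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch of fixed-size splitting with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraphs and sentences); the function `split_with_overlap` and the toy text are assumptions made for this illustration only.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that share
    `chunk_overlap` characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


toy_text = "abcdefghijklmnopqrstuvwxyz"
toy_chunks = split_with_overlap(toy_text, chunk_size=10, chunk_overlap=3)
print(toy_chunks)
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk.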


We can check the size of the document chunks.

In [7]:
def avg_doc_length(documents):
    """Return the average page_content length in characters (integer division)."""
    return sum(len(doc.page_content) for doc in documents) // len(documents)

avg_char_count_pre = avg_doc_length(pages)
avg_char_count_post = avg_doc_length(docs)
print(f'Average length among {len(pages)} pages loaded is {avg_char_count_pre} characters.')
print(f'After the split we have {len(docs)} chunks, up from the original {len(pages)} pages.')
print(f'Average length among {len(docs)} chunks (after split) is {avg_char_count_post} characters.')

Average length among 70 pages loaded is 1451 characters.
After the split we have 136 chunks, up from the original 70 pages.
Average length among 136 chunks (after split) is 762 characters.


Here is what the embedding of a single chunk looks like when generated with Amazon Bedrock.

In [8]:
try:
    sample_embedding = np.array(bedrock_embeddings.embed_query(docs[0].page_content))
    print("Sample embedding of a document chunk: ", sample_embedding)
    print("Size of the embedding: ", sample_embedding.shape)

except ValueError as error:
    if  "AccessDeniedException" in str(error):
        print(f"\x1b[41m{error}\
        \nTo troubleshoot this issue please refer to the following resources.\
         \nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html\
         \nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n")      
        class StopExecution(ValueError):
            def _render_traceback_(self):
                pass
        raise StopExecution        
    else:
        raise error

Sample embedding of a document chunk:  [ 0.10449219 -0.1484375  -0.31835938 ...  0.18554688 -0.38867188
 -0.484375  ]
Size of the embedding:  (1536,)
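
An embedding is only useful in comparison with other embeddings: texts with similar meaning should map to vectors that point in similar directions. As a rough illustration, the sketch below computes cosine similarity between small hand-written vectors. The two-dimensional vectors are made up for readability and are not Titan output; real embeddings here have 1536 dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (assumed for illustration only).
vec_a = [3.0, 4.0]
vec_b = [6.0, 8.0]    # same direction as vec_a, different length
vec_c = [-4.0, 3.0]   # perpendicular to vec_a

same_direction = cosine_similarity(vec_a, vec_b)
perpendicular = cosine_similarity(vec_a, vec_c)
print(same_direction, perpendicular)
```

Vectors pointing the same way score close to 1 regardless of their length, while unrelated (perpendicular) vectors score close to 0.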


## Save to FAISS
Here we embed the chunks and store the resulting vectors in a FAISS index.

In [9]:
from langchain.vectorstores import FAISS
from langchain.indexes.vectorstore import VectorStoreIndexWrapper

vectorstore_faiss = FAISS.from_documents(docs, bedrock_embeddings)

wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss)
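
Conceptually, a flat vector index keeps every vector alongside its chunk and answers a query by measuring the distance to each stored vector. The sketch below shows that idea in plain Python, assuming Euclidean (L2) distance, which to our knowledge is the default for LangChain's FAISS wrapper. `ToyFlatIndex` and its payload strings are hypothetical names for this illustration; FAISS itself is far faster and offers approximate index types as well.

```python
import math


class ToyFlatIndex:
    """Brute-force nearest-neighbour index: stores vectors with a payload."""

    def __init__(self):
        self._vectors = []
        self._payloads = []

    def add(self, vector, payload):
        """Store one vector together with the chunk (payload) it represents."""
        self._vectors.append(list(vector))
        self._payloads.append(payload)

    def search(self, query, k=4):
        """Return the k (distance, payload) pairs closest to `query`."""
        scored = [
            (math.dist(query, vector), payload)
            for vector, payload in zip(self._vectors, self._payloads)
        ]
        scored.sort(key=lambda pair: pair[0])  # smallest distance first
        return scored[:k]


index = ToyFlatIndex()
index.add([0.0, 0.0], "chunk A")
index.add([1.0, 1.0], "chunk B")
index.add([5.0, 5.0], "chunk C")

nearest = index.search([0.9, 0.8], k=2)
print([payload for _, payload in nearest])
```

The exhaustive scan is linear in the number of chunks, which is perfectly adequate for a corpus of a few hundred chunks like this one.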

## Question Answering
We can ask the vector store (FAISS) to retrieve the chunks most similar to a question. The question is embedded first; this is what the resulting query vector looks like.

In [10]:
query = """When are comments required?"""
query_embedding = vectorstore_faiss.embedding_function.embed_query(query)
np.array(query_embedding)

array([ 0.359375  , -0.27929688,  0.24511719, ...,  0.13671875,
       -0.5859375 ,  0.17871094])

When we query the vector store with that vector, the similarity search returns the chunks closest to the query, four by default.

In [11]:
relevant_documents = vectorstore_faiss.similarity_search_by_vector(query_embedding)
print(f'{len(relevant_documents)} documents are fetched which are relevant to the query.')
print('----')
for i, rel_doc in enumerate(relevant_documents):
    print(f'## Document {i+1}: {rel_doc.page_content}.......')
    print('---')

4 documents are fetched which are relevant to the query.
----
## Document 1: 18 | P a g e  
  
 
      Comments  
 
Comments are used to provide information that is not captured elsewhere in the system.  
E.g., “John Smith has a pre -existing breakdown on a 2003 Toyota Camry.  Advised him that he will be covered for 
this tow at the Basic coverage level, even though he purchased Plus.”  
When you add a Comment you have the option of also making this comment an ‘Alert’.   
1.) If there is a comment with an alert, it will be more prominent on the  HH Profile  
2.) Comments with alerts will be indicated on integrated systems e.g., D2000, AX etc.  
 
 
Only the Comment Title and the Date it was added will b e 
displayed on the HH Profile. To view comment details, double 
click on Comment and be redirected to Comments in HH 
Maintenance.                                
 
 
 
 
 
 
 
 
Comment s are written to H istory at the time the comment is created.  If you add an end date to the commen

### Generating the response with Amazon Bedrock
We build a prompt template, fill in the context retrieved from the vector store along with the question, and send the prompt to the LLM to generate a response.

In [12]:
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

prompt_template = """

Human: Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{context}
</context>

Question: {question}

Assistant:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore_faiss.as_retriever(search_type="similarity", search_kwargs={"k": 3}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)

answer = qa({"query": query})
print_ww(answer)



{'query': 'When are comments required?', 'result': ' Based on the context provided, comments are
required in the following situations:\n\n- Moves:\n- Swapping Primary & Associate members but not
cancelling the former Primary \n- ATP move without changing the address or adding a wrong address
flag to either household\n\n- Transfer In: \n- Manual Transfer In (not using the Transfer In Wizard)
because status will be F instead of T\n\n- Roadside:\n- Pre-existing condition (new membership or
upgrading)\n\n- Payments:  \n- Leaving a household in Partial Paid/Collect status (taking a partial
payment only)\n- Leaving a household in Prospect or Unpaid status (not taking payment)\n- Mailing a
postdated cheque to PCC to hold for processing date\n\n- Creation of a 2nd iMIS ID (intentional):\n-
Transfer In that was previously an AMA Member \n- Converting a Child to an Associate before they are
16 years', 'source_documents': [Document(page_content='18 | P a g e  \n  \n \n      Comments  \n
\nComment

Let's ask a different question and print the answer together with the source chunks it was drawn from, which point back to pages in the original PDF.

In [13]:
query_2 = "What is the difference between eBill and email address?"

answer_2 = qa({"query": query_2})

#print_ww(answer_2)
print_ww(answer_2['result'])
answer_2['source_documents']

 Based on the context provided, the main difference between eBill and email address is:

eBill refers to the electronic/paperless billing option that replaces the paper renewal bill sent
out approximately 1 month prior to a member's AMA membership expiry date.

Email address refers to a member's email contact information. The context indicates you may need to
verify both eBill preferences and email address when making updates for a member.


[Document(page_content='15 | P a g e  \n  \n‘eBill’ is different from ‘Email.’ If you are making updates to eBill or an email address, ensure that you’re asking the \nMember if both should be updated.   \n                           \n \nCards & Bills  \nYou may order ad hoc C ards and Bills for members if required. You may either order Cards for the entire HH by \nselecting Household Card , or for an individual (s) by selecting Individual Card.  \n                                              \n \n \n       Household Card E xpiry  \n \nCards have a 3 year expiry and we print this replacement  date on the card.  If you order cards for any reason the system \nwill automatically check to see if the HH card expiry date is within 6 months from today.  If it is, it will not only order t he \ncard(s) you are requesting, it will advance the card expiry dat e by 3 years and order new cards for the entire HH.  \n \nIf this HH has a donor and requested  ‘Cards to:  Donor’, all 3 year replacement 
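
Rather than dumping the full source documents, the page references can be pulled from each document's metadata. The sketch below assumes each source document carries a zero-based `page` entry in its `metadata` dict, which is what we expect from `PyPDFLoader`; verify this against your own output. `FakeDocument`, `cited_pages`, and the sample metadata are stand-ins invented for this illustration so the block runs without LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Minimal stand-in for a LangChain Document (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def cited_pages(source_documents) -> list[int]:
    """Return the sorted, de-duplicated, one-based page numbers cited."""
    pages = {
        doc.metadata["page"] + 1  # assumed zero-based in the metadata
        for doc in source_documents
        if "page" in doc.metadata
    }
    return sorted(pages)


# Made-up metadata; these are not pages retrieved from the iMIS guide.
fake_sources = [
    FakeDocument("...", {"source": "data/imis_guide.pdf", "page": 2}),
    FakeDocument("...", {"source": "data/imis_guide.pdf", "page": 0}),
    FakeDocument("...", {"source": "data/imis_guide.pdf", "page": 2}),
]
print(cited_pages(fake_sources))
```

With the real chain output, the same function could be called as `cited_pages(answer_2['source_documents'])`.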