# Retrieval Augmented Question & Answering with Amazon Bedrock using LangChain

> *This notebook should work well with the **`Data Science 3.0`** kernel in SageMaker Studio*

### Context
Previously we saw that the model told us how to change a tire; however, we had to manually provide the relevant data and context ourselves. We explored how to use the models available under Bedrock to answer questions based on the knowledge they learned during training, as well as on manually provided context. While that approach works for short documents or singleton applications, it fails to scale to enterprise-level question answering, where large enterprise documents cannot all fit into the prompt sent to the model.

### Pattern
We can improve upon this process by implementing an architecture called Retrieval Augmented Generation (RAG). RAG retrieves data from outside the language model (non-parametric) and augments the prompt by adding the relevant retrieved data as context.

In this notebook we explain how to approach the Question Answering pattern: finding the relevant documents and using them to answer user questions.

### Challenges
- How to manage large document(s) that exceed the token limit
- How to find the document(s) relevant to the question being asked

### Proposal
To address the above challenges, this notebook proposes the following strategy:
#### Prepare documents
![Embeddings](./images/Embeddings_lang.png)

Before questions can be answered, the documents must be processed and stored in a document store index:
- Load the documents
- Process and split them into smaller chunks
- Create a numerical vector representation of each chunk using Amazon Bedrock Titan Embeddings model
- Create an index using the chunks and the corresponding embeddings
#### Ask question
![Question](./images/Chatbot_lang.png)

Once the document index is prepared, you are ready to ask questions; relevant documents are fetched based on the question being asked. The following steps are executed:
- Create an embedding of the input question
- Compare the question embedding with the embeddings in the index
- Fetch the (top N) relevant document chunks
- Add those chunks as part of the context in the prompt
- Send the prompt to the model under Amazon Bedrock
- Get the contextual answer based on the documents retrieved
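
The question-time steps above can be sketched without any external services. This is a minimal illustration only: `toy_embed` is a hypothetical word-count stand-in for the Titan Embeddings model, the chunks are made-up sample sentences, and the final prompt is printed rather than sent to a model.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embeddings model: lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, index, top_n=2):
    """Embed the question, score every indexed chunk, return the top_n chunks."""
    q_vec = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_n]]


def build_prompt(question, chunks):
    """Add the retrieved chunks as context ahead of the question."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Made-up sample chunks, indexed as (chunk, embedding) pairs.
sample_chunks = [
    "choose a sunny location for the garden",
    "test the soil ph before planting",
    "the policy pays a lump sum on diagnosis",
]
index = [(chunk, toy_embed(chunk)) for chunk in sample_chunks]

question = "where should the garden go"
top_chunks = retrieve(question, index, top_n=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

A real embeddings model captures meaning rather than exact word overlap, so the toy scoring here only illustrates the flow of data, not retrieval quality.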

## Use case
#### Dataset
To explain this architecture pattern we are using documents from the IRS. These documents explain topics such as:
- Original Issue Discount (OID) Instruments
- Reporting Cash Payments of Over $10,000 to IRS
- Employer's Tax Guide

#### Persona
Let's assume the persona of a layperson who doesn't understand how the IRS works or whether certain actions have implications.

The model will try to answer from the documents in plain language.


## Implementation
To follow the RAG approach, this notebook uses the LangChain framework, which integrates with different services and tools that allow efficient building of patterns such as RAG. We will be using the following tools:

- **LLM (Large Language Model)**: Anthropic Claude V2 available through Amazon Bedrock

  This model will be used to understand the document chunks and provide an answer in a human-friendly manner.
- **Embeddings Model**: Amazon Titan Embeddings available through Amazon Bedrock

  This model will be used to generate a numerical representation of the textual documents
- **Document Loader**: PDF Loader available through LangChain

  This loader can load documents from a source. For the sake of this notebook we load the sample files from a local path; this could easily be replaced with a loader that reads documents from internal enterprise systems.

- **Vector Store**: FAISS available through LangChain

  In this notebook we use this in-memory vector store to store both the embeddings and the documents. In an enterprise context this could be replaced with a persistent store such as Amazon OpenSearch Service, Amazon RDS for PostgreSQL with pgvector, ChromaDB, Pinecone or Weaviate.
- **Index**: VectorIndex

  The index helps to compare the input embedding with the document embeddings to find relevant documents.
- **Wrapper**: wraps index, vector store, embeddings model and the LLM to abstract away the logic from the user.

## Setup

Before running the rest of this notebook, run the cells below to ensure the necessary libraries are installed and to connect to Bedrock.

For more details on how the setup works and ⚠️ **whether you might need to make any changes**, refer to the [Bedrock boto3 setup notebook](../00_Intro/bedrock_boto3_setup.ipynb).

In this notebook, we'll also need some extra dependencies:

- [FAISS](https://github.com/facebookresearch/faiss), to store vector embeddings
- [PyPDF](https://pypi.org/project/pypdf/), for handling PDF files

In [1]:
# Make sure you ran `download-dependencies.sh` from the root of the repository first!

%pip install --quiet "faiss-cpu>=1.7,<2" langchain==0.0.249 "pypdf>=3.8,<4"

Note: you may need to restart the kernel to use updated packages.


In [14]:
import json
import os
import sys

import boto3

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww


# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "<REGION_NAME>"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "<YOUR_PROFILE>"
# os.environ["BEDROCK_ASSUME_ROLE"] = "<YOUR_ROLE_ARN>"  # E.g. "arn:aws:..."
# os.environ["BEDROCK_ENDPOINT_URL"] = "<YOUR_ENDPOINT_URL>"  # E.g. "https://..."


boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    endpoint_url=os.environ.get("BEDROCK_ENDPOINT_URL", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None),
)

Create new client
  Using region: None
boto3 Bedrock client successfully created!
bedrock(https://bedrock.us-east-1.amazonaws.com)


## Configure langchain

We begin by instantiating the LLM and the Embeddings model. Here we are using Anthropic Claude for text generation and Amazon Titan for text embedding.

Note: It is possible to choose other models available with Bedrock. You can replace the `model_id` as follows to change the model.

`llm = Bedrock(model_id="amazon.titan-tg1-large")`

Available model IDs include:

- `amazon.titan-tg1-large`
- `ai21.j2-grande-instruct`
- `ai21.j2-jumbo-instruct`
- `anthropic.claude-instant-v1`
- `anthropic.claude-v1`

In [15]:
# We will be using the Titan Embeddings Model to generate our Embeddings.
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock

# - create the Anthropic Model
llm = Bedrock(model_id="anthropic.claude-v2", client=boto3_bedrock, 
              model_kwargs={'max_tokens_to_sample':200})
bedrock_embeddings = BedrockEmbeddings(client=boto3_bedrock)

In [98]:
llm, bedrock_embeddings

(Bedrock(cache=None, verbose=False, callbacks=None, callback_manager=None, tags=None, metadata=None, client=<botocore.client.Bedrock object at 0x7f073138ebc0>, region_name=None, credentials_profile_name=None, model_id='anthropic.claude-v2', model_kwargs={'max_tokens_to_sample': 200}, endpoint_url=None),
 BedrockEmbeddings(client=<botocore.client.Bedrock object at 0x7f073138ebc0>, region_name=None, credentials_profile_name=None, model_id='amazon.titan-e1t-medium', model_kwargs=None, endpoint_url=None))

## Data Preparation
Let's first download some of the files to build our document store. For this example we will be using public IRS documents from [here](https://www.irs.gov/publications).

After downloading, we can load the documents with the help of [PyPDFDirectoryLoader, available under LangChain](https://python.langchain.com/en/latest/reference/modules/document_loaders.html), and split them into smaller chunks.

Note: The retrieved document/text should be large enough to contain enough information to answer a question, but small enough to fit into the LLM prompt. The embeddings model also limits its input to 512 tokens, which roughly translates to ~2000 characters. A typical choice for this use case is chunks of roughly 1000 characters with an overlap of 100 characters, created with [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html); the cell below deliberately sets much smaller values (`chunk_size = 210`, `chunk_overlap = 50`) for demonstration.
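
To make the roles of the chunk size and the overlap concrete, here is a simplified sketch. It is a plain fixed-width character splitter, not the algorithm used by `RecursiveCharacterTextSplitter` (which additionally prefers to split on separators); the function name `split_with_overlap` and the sample text are hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample_text = "abcdefghijklmnopqrstuvwxyz"  # 26 characters of made-up input
chunks = split_with_overlap(sample_text, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(len(chunk), chunk)
```

The overlap means that a sentence cut at a chunk boundary still appears partly in the neighbouring chunk, at the cost of storing and embedding some text twice.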

In [161]:
import numpy as np
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

loader = PyPDFDirectoryLoader("./data_kr/")

documents = loader.load()
# - in our testing Character split works better with this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size = 210,
    chunk_overlap  = 50,
)
docs = text_splitter.split_documents(documents)

In [162]:
def avg_doc_length(documents):
    """Average page_content length in characters (integer division)."""
    return sum(len(doc.page_content) for doc in documents) // len(documents)

avg_char_count_pre = avg_doc_length(documents)
avg_char_count_post = avg_doc_length(docs)
print(f'Average length among {len(documents)} documents loaded is {avg_char_count_pre} characters.')
print(f'After the split we have {len(docs)} documents more than the original {len(documents)}.')
print(f'Average length among {len(docs)} documents (after split) is {avg_char_count_post} characters.')

Average length among 9 documents loaded is 1184 characters.
After the split we have 66 documents more than the original 9.
Average length among 66 documents (after split) is 176 characters.


The loaded PDF pages (9 in the run above) have been split into 66 smaller chunks.

Now we can see what a sample embedding looks like for one of those chunks.

In [163]:
print("docs[0].page_content: \n", docs[0].page_content)

docs[0].page_content: 
 새로운 정원을 시작하는 데 도움이 되는 몇 가지 팁을 알려드리겠습니다.  - 먼저 정원의 위치를 잘 선택하세요. 해당 장소가 충분한 햇빛을 받는지, 물 공급이 용이한지 확인하세요.  - 토양의 상태를 테스트하여 pH 수준과 영양분을 확인하세요. 필요하다면 토양 개선제를 추가하세요.  - 정원 계획을 세우세요. 어떤 종류의 식물을 키울지, 어디에 심을지 미리 계획하세요.  -


In [167]:
sample_embedding = np.array(bedrock_embeddings.embed_query(docs[0].page_content))
print("Sample embedding of a document chunk: ", sample_embedding)
print("Size of the embedding: ", sample_embedding.shape)

Sample embedding of a document chunk:  [-0.03857422 -0.13183594  0.23828125 ...  0.578125    0.04272461
  0.01708984]
Size of the embedding:  (4096,)


Following the same pattern, embeddings can be generated for the entire corpus and stored in a vector store.

This can be done easily using the [FAISS](https://github.com/facebookresearch/faiss) implementation inside [LangChain](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/faiss.html), which takes the embeddings model and the documents as input and creates the entire vector store. Using the index wrapper we can abstract away most of the heavy lifting, such as creating the prompt, getting embeddings of the query, retrieving the relevant documents and calling the LLM. [VectorStoreIndexWrapper](https://python.langchain.com/en/latest/modules/indexes/getting_started.html#one-line-index-creation) helps us with that.

**⚠️⚠️⚠️ NOTE: it might take a few minutes to run the following cell ⚠️⚠️⚠️**

In [168]:
from langchain.chains.question_answering import load_qa_chain
from langchain.vectorstores import FAISS
from langchain.indexes import VectorstoreIndexCreator
from langchain.indexes.vectorstore import VectorStoreIndexWrapper

vectorstore_faiss = FAISS.from_documents(
    docs,
    bedrock_embeddings,
)

wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss)

In [169]:
wrapper_store_faiss

VectorStoreIndexWrapper(vectorstore=<langchain.vectorstores.faiss.FAISS object at 0x7f0730654520>)

## Question Answering

Now that we have our vector store in place, we can start asking questions.

In [170]:
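# query = "Is it possible that I get sentenced to jail due to failure in filings?"
# query = "새로운 정원을 어떻게 시작할까요?"  # English: "How should I start a new garden?"
# English: "What is the surrender refund amount at the 3-year mark?"
query = "3년 시점의 해지 환급금은 얼마에요?"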

The first step is to create an embedding of the query so that it can be compared with the documents.

In [171]:
query_embedding = vectorstore_faiss.embedding_function(query)
np.array(query_embedding)

array([ 0.01770019, -0.03491211,  0.21191406, ...,  0.20996094,
       -0.01953125, -0.00665283])

Now that our query is represented as an embedding, we can run a similarity search against the vector store to fetch the most relevant documents.
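
Conceptually, a similarity search by vector ranks the stored chunk vectors by their distance to the query vector. The sketch below does this by brute force with squared Euclidean distance. It is an illustration under assumptions: the helper names and the 3-dimensional vectors are made up, the choice of Euclidean distance is assumed rather than taken from this notebook's index configuration, and `k=4` is used as the default only to mirror the four documents returned in this notebook's run.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_chunks(query_vec, indexed, k=4):
    """Return the k (distance, chunk) pairs closest to query_vec, nearest first."""
    scored = ((squared_l2(query_vec, vec), chunk) for chunk, vec in indexed)
    return heapq.nsmallest(k, scored)


# Made-up 3-dimensional vectors standing in for real embeddings.
indexed = [
    ("chunk A", [0.0, 0.0, 1.0]),
    ("chunk B", [1.0, 0.0, 0.0]),
    ("chunk C", [0.9, 0.1, 0.0]),
]
query_vec = [1.0, 0.0, 0.0]

results = nearest_chunks(query_vec, indexed, k=2)
for distance, chunk in results:
    print(f"{chunk}: {distance:.2f}")
```

A vector store library performs the same ranking far more efficiently over thousands of high-dimensional vectors, but the returned result has the same shape: the `k` closest chunks.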

In [172]:
relevant_documents = vectorstore_faiss.similarity_search_by_vector(query_embedding)
print(f'{len(relevant_documents)} documents are fetched which are relevant to the query.')
print('----')
for i, rel_doc in enumerate(relevant_documents):
    print_ww(f'## Document {i+1}: {rel_doc.page_content}.......')
    print('---')

4 documents are fetched which are relevant to the query.
----
## Document 1: 이미 납입한 보험료는 포함하지 않음) 중 큰 금액을 진단보험금으로 지급합니다.5대질병 집중보장
뇌출혈, 급성심근경색증, 말기신부전증, 말기간질환, 말기폐질환 진단시
안심하고 치료받을 수 있도록 일시금과 매월 생활자금(3년)을 보장해 드립니다.
맞춤형 종합보장
생활보장특약 3개플랜(암/간병/상해플랜) 외 각종 입원 및 수술보장 등.......
---
## Document 2: 일시금 및 월생활자금의 50% 지급
만기지급금만기 생존시
※ 최초계약(만기지급형)에 한함300만원
※ 보험기간 중 피보험자가 사망한 경우에는 사망 당시의 책임준비금을 계약자에게 지급하고 이 계약은 더 이상 효력을 가지지 않습니다.
※  보험료 납입기간 중 피보험자가 장해분류표 중 동일한 재해 또는 재해 이외의 동일한 원인으로 여러 신체부위의 장해지급률.......
---
## Document 3: 무배당  교보다이렉트플러스건강보험(갱신형)  3보장내용
주계약(갱신형) 보장내용                   보험가입금액 1,500만원 기준
급부명 지급사유 지급금액
뇌출혈
진단보험금뇌출혈로 진단확정시
※ 다만, 최초 1회의 진단확정에 한함
※ 갱신계약의 경우 [뇌출혈 보장계약]으로
     갱신된 경우에 한함일시금  1,500 만원
+.......
---
## Document 4: ※ 보험료 납입이 면제되었다 하더라도 갱신되는 경우 갱신계약의 보험료는 납입해야 함
아플 때 힘이 되도록 특정산정특례대상보장
특약가입시 중증질환자[뇌혈관 및 심장질환] · 희귀질환자 산정특례대상으로
등록되었을 경우 경제적 부담을 덜고 제대로 치료 받을 수 있도록 보장합니다........
---


Now we have the relevant documents, it's time to use the LLM to generate an answer based on these documents. 

We take our initial prompt together with the relevant documents retrieved by the similarity search, and combine them into a single prompt that we feed to the model. At this point the model should give us a well-informed answer grounded in the retrieved documents.

LangChain provides an abstraction of how this can be done easily.

### Quick way
You can use the wrapper provided by LangChain, which wraps around the vector store and takes the LLM as input.
This wrapper performs the following steps behind the scenes:
- Take the question as input
- Create the question embedding
- Fetch the relevant documents
- Stuff the documents and the question into a prompt
- Invoke the model with the prompt and generate the answer in a human-readable manner
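
The "stuff" step in the list above amounts to concatenating the retrieved chunks and the question into one prompt string. The sketch below shows that idea with a simple character budget. The function `stuff_prompt`, the template wording and the `max_context_chars` budget are all illustrative assumptions, not a description of what LangChain does internally.

```python
def stuff_prompt(question, chunks, max_context_chars=200):
    """Concatenate chunks in rank order until the character budget is used up."""
    selected = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > max_context_chars:
            break  # the next chunk would overflow the budget
        selected.append(chunk)
        used += len(chunk)
    context = "\n\n".join(selected)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up chunks, already ordered from most to least relevant.
ranked_chunks = ["x" * 120, "y" * 60, "z" * 50]
prompt = stuff_prompt("What does the policy pay?", ranked_chunks, max_context_chars=200)
print(prompt)
```

Because every retrieved chunk competes for the same prompt space, the chunk size chosen during data preparation and the number of retrieved chunks together determine how much context the model actually sees.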

In [173]:
answer = wrapper_store_faiss.query(question=query, llm=llm)
print_ww(answer)

 주어진 정보로는 3년 시점의 해지 환급금을 알 수 없습니다. 해지 환급금은 보험 가입 시점, 납입 보험료, 보험 기간 등 여러 요인에 따라 달라질 수 있기 때문에 확정적으로
말씀드리기 어렵습니다. 정확한 해지 환급금은 보험사에 문의하셔야 합니다.


Let's ask a different question:

In [174]:
# query_2 = "What is the difference between market discount and qualified stated interest"
# English: "How do I start a new garden?"
query_2 = "새로운 정원을 어떻게 시작해요?"

In [175]:
answer_2 = wrapper_store_faiss.query(question=query_2, llm=llm)
print_ww(answer_2)

 새로운 정원을 시작하는 데 이 글이 제안하는 몇 가지 유용한 팁은 다음과 같습니다:

- 정원의 위치를 잘 선택하세요. 햇빛과 물 공급이 충분한지 확인하세요.

- 토양의 상태를 테스트하세요. pH 수준과 영양분을 확인하세요.

- 정원 계획을 세우세요. 식물 종류와 위치를 미리 계획하세요.

- 정원 일지를 작성하여 식물의 성장과정을 기록하세요.

- 토


### Customisable option
In the above scenario you explored the quick and easy way to get a context-aware answer to your question. Now let's look at a more customizable option with the help of [RetrievalQA](https://python.langchain.com/en/latest/modules/chains/index_examples/vector_db_qa.html), where you can customize how the fetched documents are added to the prompt using the `chain_type` parameter. To control how many relevant documents are retrieved, change the `k` parameter in the cell below and compare the outputs. In many scenarios you might want to know which source documents the LLM used to generate the answer; setting `return_source_documents` returns the documents that were added to the context of the LLM prompt. `RetrievalQA` also allows you to provide a custom [prompt template](https://python.langchain.com/en/latest/modules/prompts/prompt_templates/getting_started.html), which can be specific to the model.

Note: In this example we are using Anthropic Claude as the LLM under Amazon Bedrock. This particular model performs best if the inputs are provided under `Human:` and the model is asked to generate its output after `Assistant:`. The cell below shows an example of how to control the prompt so that the LLM stays grounded and doesn't answer outside the context.

In [180]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

prompt_template = """Human: Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Assistant:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore_faiss.as_retriever(
        search_type="similarity", search_kwargs={"k": 3}
    ),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)
# query = "Is it possible that I get sentenced to jail due to failure in filings?"
# English: "How much is paid out when one of the five major diseases is diagnosed?"
query = "5대 질병 진단시 지급되는 금액은 얼마인가요?"
result = qa({"query": query})
print_ww(result['result'])

 일시금 1,500만원 + 월생활자금 50만원 * 36개월 = 1,800만원이 5대 질병 진단시 지급됩니다.


In [181]:
result['source_documents']

[Document(page_content='일시금 및 월생활자금의 50% 지급\n만기지급금만기 생존시\n※ 최초계약(만기지급형)에 한함300만원\n※ 보험기간 중 피보험자가 사망한 경우에는 사망 당시의 책임준비금을 계약자에게 지급하고 이 계약은 더 이상 효력을 가지지 않습니다.\n※  보험료 납입기간 중 피보험자가 장해분류표 중 동일한 재해 또는 재해 이외의 동일한 원인으로 여러 신체부위의 장해지급률', metadata={'source': 'data_kr/book_SMMDINLM239.pdf', 'page': 2}),
 Document(page_content='말기폐질환으로 진단확정시\n※ 다만, 최초 1회의 진단확정에 한함\n※ 갱신계약의 경우 [말기폐질환 보장계약]\n     으로 갱신된 경우에 한함일시금  1,500 만원 \n+ \n월생활자금  50만원 × 36개월 확정지급\n(총지급금액 1,800만원)  \n※ 다만, 최초 계약은 가입 후 1년 미만 진단확정시  \n     일시금 및 월생활자금의 50% 지급\n만기지급금만기 생존시', metadata={'source': 'data_kr/book_SMMDINLM239.pdf', 'page': 2}),
 Document(page_content='을 더하여 50% 이상 장해상태가 되었을 경우 또는 뇌출혈, 급성심근경색증, 말기신부전증, 말기간질환 또는 말기폐질환으로 \n진단이 확정된 경우에는 차회 이후의 보험료 납입을 면제합니다. 그러나 보험료의 납입이 면제되었다 하더라도 갱신시 갱신\n계약의 보험료는 납입하여야 하며, 최초계약의 보장개시일 이후 이미 보험료의 납입을 면제한 장해상태 또는 최초계약의', metadata={'source': 'data_kr/book_SMMDINLM239.pdf', 'page': 2})]

In [185]:
def show_context_used(context_list):
    for context in context_list:
        print(type(context))
        print(context)
        
show_context_used(result['source_documents'])        

<class 'langchain.schema.document.Document'>
page_content='일시금 및 월생활자금의 50% 지급\n만기지급금만기 생존시\n※ 최초계약(만기지급형)에 한함300만원\n※ 보험기간 중 피보험자가 사망한 경우에는 사망 당시의 책임준비금을 계약자에게 지급하고 이 계약은 더 이상 효력을 가지지 않습니다.\n※  보험료 납입기간 중 피보험자가 장해분류표 중 동일한 재해 또는 재해 이외의 동일한 원인으로 여러 신체부위의 장해지급률' metadata={'source': 'data_kr/book_SMMDINLM239.pdf', 'page': 2}
<class 'langchain.schema.document.Document'>
page_content='말기폐질환으로 진단확정시\n※ 다만, 최초 1회의 진단확정에 한함\n※ 갱신계약의 경우 [말기폐질환 보장계약]\n     으로 갱신된 경우에 한함일시금  1,500 만원 \n+ \n월생활자금  50만원 × 36개월 확정지급\n(총지급금액 1,800만원)  \n※ 다만, 최초 계약은 가입 후 1년 미만 진단확정시  \n     일시금 및 월생활자금의 50% 지급\n만기지급금만기 생존시' metadata={'source': 'data_kr/book_SMMDINLM239.pdf', 'page': 2}
<class 'langchain.schema.document.Document'>
page_content='을 더하여 50% 이상 장해상태가 되었을 경우 또는 뇌출혈, 급성심근경색증, 말기신부전증, 말기간질환 또는 말기폐질환으로 \n진단이 확정된 경우에는 차회 이후의 보험료 납입을 면제합니다. 그러나 보험료의 납입이 면제되었다 하더라도 갱신시 갱신\n계약의 보험료는 납입하여야 하며, 최초계약의 보장개시일 이후 이미 보험료의 납입을 면제한 장해상태 또는 최초계약의' metadata={'source': 'data_kr/book_SMMDINLM239.pdf', 'page': 2}
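
The raw dump above is hard to scan. A compact alternative is to group the retrieved chunks by source file and page and print only a short preview of each. The sketch below assumes only that each document exposes `page_content` and `metadata`, as in the output above; the `Doc` class is a hypothetical stand-in so the example runs on its own, and the sample documents are made up.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in with the same two attributes as the documents above."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(docs, preview_chars=30):
    """Group retrieved chunks by (source, page) and keep a short preview of each."""
    grouped = OrderedDict()
    for doc in docs:
        key = (doc.metadata.get("source", "unknown"), doc.metadata.get("page"))
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        grouped.setdefault(key, []).append(preview)
    return grouped


# Made-up documents; with the real result you would pass result['source_documents'].
docs = [
    Doc("first chunk text\nwith a line break", {"source": "a.pdf", "page": 2}),
    Doc("second chunk text", {"source": "a.pdf", "page": 2}),
    Doc("third chunk text", {"source": "b.pdf", "page": 0}),
]
summary = summarize_sources(docs)
for (source, page), previews in summary.items():
    print(f"{source} (page {page}): {len(previews)} chunk(s)")
    for preview in previews:
        print("  -", preview)
```

Grouping by page makes it easy to spot when all retrieved chunks come from the same page, as they do in the run above.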


## Conclusion
Congratulations on completing this module on retrieval augmented generation! This is an important technique that combines the power of large language models with the precision of retrieval methods. By augmenting generation with relevant retrieved examples, the responses we received become more coherent, consistent and grounded. You should feel proud of learning this innovative approach. I'm sure the knowledge you've gained will be very useful for building creative and engaging language generation systems. Well done!

In the above implementation of RAG-based Question Answering we have explored the following concepts and how to implement them using Amazon Bedrock and its LangChain integration.

- Loading documents and generating embeddings to create a vector store
- Retrieving documents relevant to the question
- Preparing a prompt which goes as input to the LLM
- Presenting an answer in a human-friendly manner

### Take-aways
- Experiment with different Vector Stores
- Leverage various models available under Amazon Bedrock to see alternate outputs
- Explore options such as persistent storage of embeddings and document chunks
- Integration with enterprise data stores

# Thank You