## Context
In this pattern we explore how to use Aurora PostgreSQL with the pgvector extension to store embeddings. The example stores a corpus as embeddings in the vector datastore and then uses the retrieved chunks as context for the query, so that the model can answer it. For embeddings we use Amazon Titan Embeddings, and for the LLM we use Anthropic Claude.


### Pattern
We can improve upon this process by implementing an architecture called Retrieval Augmented Generation (RAG). RAG retrieves data from outside the language model (non-parametric) and augments the prompt by adding the relevant retrieved data as context.

In this notebook we explain how to approach the question answering pattern: finding the relevant documents and leveraging them to answer the user's questions.

### Challenges
- How to manage large document(s) that exceed the token limit
- How to find the document(s) relevant to the question being asked

### Proposal
To address the above challenges, this notebook proposes the following strategy.
#### Prepare documents
![Embeddings](../embeddings_lang.png)

Before the questions can be answered, the documents must be processed and stored in a document store index:
- Load the documents
- Process and split them into smaller chunks
- Create a numerical vector representation of each chunk using the Amazon Bedrock Titan Embeddings model
- Create an index using the chunks and the corresponding embeddings
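
The preparation steps above can be sketched without any external service. This is a minimal illustration, not the notebook's implementation: `split_into_chunks`, `toy_embed`, and `build_index` are hypothetical helpers, and the toy embedding (a hashed bag of words) merely stands in for the Titan Embeddings model.

```python
import hashlib
import math


def split_into_chunks(text, chunk_size, overlap=0):
    """Split text into fixed-size character windows with optional overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


def toy_embed(text, dims=16):
    """Toy stand-in for an embedding model: hash words into buckets, then L2-normalise."""
    vector = [0.0] * dims
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def build_index(text, chunk_size, overlap=0):
    """Return a list of (chunk, embedding) pairs, i.e. a tiny in-memory index."""
    return [(chunk, toy_embed(chunk)) for chunk in split_into_chunks(text, chunk_size, overlap)]


index = build_index("alpha beta gamma delta", chunk_size=11)
for chunk, vector in index:
    print(repr(chunk), len(vector))
```

A real pipeline would split on natural boundaries such as paragraphs and persist the vectors in a database; the shape of the index (chunk text paired with its vector) is the part that carries over.
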
#### Ask question
![Question](../chatbot_lang.png)

Once the document index is prepared, you are ready to ask questions, and the relevant documents will be fetched based on the question being asked. The following steps are executed:
- Create an embedding of the input question
- Compare the question embedding with the embeddings in the index
- Fetch the (top N) relevant document chunks
- Add those chunks as part of the context in the prompt
- Send the prompt to the model under Amazon Bedrock
- Get the contextual answer based on the documents retrieved
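
The retrieval steps above amount to a nearest-neighbour lookup followed by prompt assembly. The sketch below illustrates the comparison and top-N selection in plain Python; `cosine_similarity` and `top_n_chunks` are hypothetical helper names, and the three-dimensional vectors are hand-written stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_chunks(question_vector, index, n=3):
    """Rank (chunk, embedding) pairs by similarity and return the n best as (score, chunk)."""
    scored = [(cosine_similarity(question_vector, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


# Hand-written example data, not real embeddings.
index = [
    ("Chunk about security", [1.0, 0.0, 0.0]),
    ("Chunk about reliability", [0.0, 1.0, 0.0]),
    ("Chunk about cost", [0.0, 0.0, 1.0]),
]
question_vector = [0.9, 0.1, 0.0]

best = top_n_chunks(question_vector, index, n=2)
context = "\n".join(chunk for _, chunk in best)
print(context)
```

The resulting `context` string is what gets added to the prompt before it is sent to the model.
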

### Pre-requisites

a.  You will need to have created an Amazon RDS PostgreSQL database.
b.  I executed this pattern against Aurora PostgreSQL Serverless v2, version 15.3, which supports the IVFFlat index by default.
c.  Once the PostgreSQL cluster is created, first make sure that the cluster's VPC security group allows access from your device. There are a number of ways to configure this, but we will not dive deep into that here.

     1. Connect to the database 
     psql -h <<hostname>> -U <<username>> -d <<databasename>>
     
     2. Create vector extensions
     CREATE EXTENSION vector;
     
     3. Validate the extension with the \dx command, which should list all installed extensions,
     e.g.:


     
     -[ RECORD 1 ]-------------------------------------------
     Name        | aws_commons
     Version     | 1.2
     Schema      | public
     Description | Common data types across AWS services

     -[ RECORD 2 ]-------------------------------------------
     Name        | aws_ml
     Version     | 1.0
     Schema      | public
     Description | ml integration

     -[ RECORD 3 ]-------------------------------------------
     Name        | plpgsql
     Version     | 1.0
     Schema      | pg_catalog
     Description | PL/pgSQL procedural language

     -[ RECORD 4 ]-------------------------------------------
     Name        | vector
     Version     | 0.4.1
     Schema      | public
     Description | vector data type and ivfflat access method
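
Prerequisite (b) mentions the IVFFlat index. The sketch below shows how such an index might be sized and created. The starting points for `lists` and `probes` follow the guidance in the pgvector README (rows / 1000 for up to 1M rows, sqrt(rows) beyond that, and probes of about sqrt(lists)). The table name `items` and column name `embedding` are placeholders rather than names used by this notebook, and the helpers only build SQL text; nothing is executed against a database.

```python
import math


def suggest_ivfflat_params(row_count):
    """Return (lists, probes) starting points for an IVFFlat index."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = max(1, math.isqrt(row_count))
    probes = max(1, math.isqrt(lists))
    return lists, probes


def ivfflat_index_sql(table, column, lists):
    """Build a CREATE INDEX statement for cosine distance.

    The identifiers are interpolated as-is, so they must come from trusted input.
    """
    return (
        f"CREATE INDEX ON {table} USING ivfflat "
        f"({column} vector_cosine_ops) WITH (lists = {lists});"
    )


lists, probes = suggest_ivfflat_params(50_000)
print(ivfflat_index_sql("items", "embedding", lists))  # placeholder names
print(f"SET ivfflat.probes = {probes};")
```

An IVFFlat index is usually created after the table holds data, since the lists are derived from the rows present at build time.
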


In [None]:
%pip install --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57"

In [None]:
%pip install "langchain>=0.1.11"
%pip install pypdf==4.1.0
%pip install langchain-community faiss-cpu==1.8.0 tiktoken==0.6.0 sqlalchemy==2.0.28

### Install the drivers required to store embedded data in the vector database

In [None]:
%pip install psycopg psycopg2-binary pgvector

In [None]:
import json
import os
import sys

import boto3
import botocore

boto3_bedrock = boto3.client('bedrock-runtime')

### In the next cell we choose Claude as the LLM and the Titan embedding model for embeddings. The embedding model is used to embed both the query and the corpus

In [None]:
from langchain.document_loaders import CSVLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores.pgvector import PGVector, DistanceStrategy
from langchain.docstore.document import Document
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock
import os

# Note: the best practice is to fetch these credentials from AWS Secrets Manager rather than hard-coding them

os.environ['PGVECTOR_DRIVER'] = 'psycopg2'
os.environ['PGVECTOR_USER'] = '<<postgres user>>'
os.environ['PGVECTOR_PASSWORD'] = '<<password>>'
os.environ['PGVECTOR_HOST'] = '<<host endpoint>>'
os.environ['PGVECTOR_PORT'] = '5432'
os.environ['PGVECTOR_DATABASE'] = '<<database name>>'

# Create the Anthropic Claude model for text generation
# and the Amazon Titan model for embeddings

llm = Bedrock(model_id="anthropic.claude-v2", client=boto3_bedrock, model_kwargs={'max_tokens_to_sample':200})
bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1",client=boto3_bedrock)
print(bedrock_embeddings.model_id)
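
The cell above notes that credentials are better fetched from Secrets Manager than hard-coded. The sketch below shows one way this could look. The helper name `load_db_settings`, the secret name `example/pgvector`, and the stub client are hypothetical. The JSON keys (`username`, `password`, `host`, `port`, `dbname`) are an assumption based on the layout commonly used for RDS credentials, so adjust them to match your secret.

```python
import json


def load_db_settings(secret_id, client=None):
    """Fetch a JSON secret and map it onto the PGVECTOR_* settings."""
    if client is None:
        import boto3  # imported lazily so the helper can be exercised with a stub

        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return {
        "PGVECTOR_DRIVER": "psycopg2",
        "PGVECTOR_USER": secret["username"],
        "PGVECTOR_PASSWORD": secret["password"],
        "PGVECTOR_HOST": secret["host"],
        "PGVECTOR_PORT": str(secret.get("port", 5432)),
        "PGVECTOR_DATABASE": secret["dbname"],
    }


class StubSecretsClient:
    """Offline stand-in for the Secrets Manager client (illustration only)."""

    def get_secret_value(self, SecretId):
        payload = {
            "username": "demo_user",
            "password": "demo_password",
            "host": "db.example.internal",
            "port": 5432,
            "dbname": "demo_db",
        }
        return {"SecretString": json.dumps(payload)}


settings = load_db_settings("example/pgvector", client=StubSecretsClient())
print(sorted(settings))
```

With a real secret, calling `os.environ.update(load_db_settings("<<secret id>>"))` could replace the hard-coded assignments.
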

In [None]:
import numpy as np
import os
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader
from langchain.vectorstores.pgvector import PGVector, DistanceStrategy
from typing import List, Tuple
from langchain.vectorstores import pgvector

loader = PyPDFDirectoryLoader("./data/")


connection_string = PGVector.connection_string_from_db_params(                                                  
    driver = os.environ.get("PGVECTOR_DRIVER"),
    user = os.environ.get("PGVECTOR_USER"),                                      
    password = os.environ.get("PGVECTOR_PASSWORD"),                                  
    host = os.environ.get("PGVECTOR_HOST"),                                            
    port = os.environ.get("PGVECTOR_PORT"),                                          
    database = os.environ.get("PGVECTOR_DATABASE")                                       
)

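# Load every PDF under ./data/ (the PDF loader yields one Document per page)
documents = loader.load()

# Split the pages into chunks of roughly 1000 characters with no overlap
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
print(len(documents))  # number of pages loaded
print(len(docs))       # number of chunks after splitting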

In [None]:
collection_name = "tbl_store_embedding"

# Caution: the connection string includes the database password
print(connection_string)
db = PGVector.from_documents(
     embedding=bedrock_embeddings,
     documents=docs,
     collection_name=collection_name,
     connection_string=connection_string
)
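
Once the embeddings are stored, retrieval ranks them by a distance measure. pgvector offers three operators: `<->` (Euclidean distance), `<#>` (negative inner product), and `<=>` (cosine distance). To my understanding, LangChain's PGVector uses cosine distance unless another `DistanceStrategy` is passed. The sketch below computes the three measures in plain Python to show how they differ; the function names are illustrative and are not part of either library.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, the measure behind pgvector's <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """Negative dot product, the measure behind pgvector's <#> operator."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity, the measure behind pgvector's <=> operator."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Choice made for this sketch: reject zero vectors explicitly.
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


a = [3.0, 4.0]
b = [4.0, 3.0]
print(l2_distance(a, b), negative_inner_product(a, b), cosine_distance(a, b))
```

For all three measures a smaller value means a closer match, which is why the inner product is negated.
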


### Quick way
You can use the wrapper provided by LangChain, which wraps around the vector store and takes the LLM as input.
This wrapper performs the following steps behind the scenes:
- Take the question as input
- Create question embedding
- Fetch relevant documents
- Stuff the documents and the question into a prompt
- Invoke the model with the prompt and generate the answer in a human readable manner.

In [None]:
from langchain.vectorstores.pgvector import PGVector
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.indexes import VectorstoreIndexCreator
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
query = "Tell me the summary or key takeaways from the AWS Well-Architected Framework in bullet points"


prompt_template = """

Human: Use the following pieces of context to provide a detailed response to the question at the end
<context>
{context}
</context>

Question: {question}

Assistant:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

#print("Prompt template looks like: ", PROMPT)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(
        search_type="similarity", search_kwargs={"k": 3}
    ),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)
# invoke() is the current entry point; calling the chain object directly is deprecated
answer = qa.invoke({"query": query})
print(answer['result'])

answer['source_documents']
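
The returned source documents can be condensed into a short citation list. The sketch below assumes each document carries `source` and `page` entries in its `metadata`, as PDF loaders typically set. `FakeDocument` is a hypothetical stand-in so that the snippet runs without a database, and the file name is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Minimal stand-in exposing the two attributes the helper relies on."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(documents):
    """Return sorted, de-duplicated (source, page) pairs from document metadata."""
    seen = set()
    for doc in documents:
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page", -1)  # -1 marks a missing page number
        seen.add((source, page))
    return sorted(seen)


example_docs = [
    FakeDocument("chunk a", {"source": "data/example.pdf", "page": 3}),
    FakeDocument("chunk b", {"source": "data/example.pdf", "page": 1}),
    FakeDocument("chunk c", {"source": "data/example.pdf", "page": 3}),
]
for source, page in summarize_sources(example_docs):
    print(f"{source}, page {page}")
```

In the notebook, the same helper could be applied to `answer['source_documents']`.
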