# Question Answering
[Retrieval Augmented Question & Answering with Amazon Bedrock using LangChain](https://github.com/aws-samples/amazon-bedrock-workshop/blob/main/03_QuestionAnswering/01_qa_w_rag_claude.ipynb)

In [1]:
from sagemaker import get_execution_role

In [2]:
strSageMakerRoleName = get_execution_role().rsplit('/', 1)[-1]
print(f"SageMaker Execution Role Name: {strSageMakerRoleName}")

SageMaker Execution Role Name: AmazonSageMakerServiceCatalogProductsUseRole


In [3]:
#!wget https://preview.documentation.bedrock.aws.dev/Documentation/SDK/bedrock-python-sdk.zip
#!unzip bedrock-python-sdk.zip -d bedrock-sdk
#!rm -rf bedrock-python-sdk.zip

--2023-07-26 15:48:37--  https://preview.documentation.bedrock.aws.dev/Documentation/SDK/bedrock-python-sdk.zip
Resolving preview.documentation.bedrock.aws.dev (preview.documentation.bedrock.aws.dev)... 18.65.168.46, 18.65.168.92, 18.65.168.52, ...
Connecting to preview.documentation.bedrock.aws.dev (preview.documentation.bedrock.aws.dev)|18.65.168.46|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 249246443 (238M) [application/zip]
Saving to: ‘bedrock-python-sdk.zip’


2023-07-26 15:48:48 (23.9 MB/s) - ‘bedrock-python-sdk.zip’ saved [249246443/249246443]

Archive:  bedrock-python-sdk.zip
   creating: bedrock-sdk/reviews/
  inflating: bedrock-sdk/.unit-crt   
  inflating: bedrock-sdk/awscli-bundle.zip  
  inflating: bedrock-sdk/awscli-1.27.162.tar.gz  
  inflating: bedrock-sdk/botocore-1.29.162.tar.gz  
  inflating: bedrock-sdk/AWSCLISetup.exe  
  inflating: bedrock-sdk/boto3-1.26.162.tar.gz  
  inflating: bedrock-sdk/botocore-1.29.162-py3-none-any.whl  
  inf

In [11]:
install_needed = False

In [5]:
import sys
import IPython

if install_needed:
    print("installing deps and restarting kernel")
    !{sys.executable} -m pip install -U pip
    !{sys.executable} -m pip install -U sagemaker
    !{sys.executable} -m pip install -U ./bedrock-sdk/botocore-1.29.162-py3-none-any.whl
    !{sys.executable} -m pip install -U ./bedrock-sdk/boto3-1.26.162-py3-none-any.whl
    !{sys.executable} -m pip install -U ./bedrock-sdk/awscli-1.27.162-py3-none-any.whl
    !{sys.executable} -m pip install -U langchain
    !rm -rf bedrock-sdk

    IPython.Application.instance().kernel.do_shutdown(True)

installing deps and restarting kernel
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting pip
  Downloading pip-23.2.1-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 14.7 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.1.2
    Uninstalling pip-23.1.2:
      Successfully uninstalled pip-23.1.2
Successfully installed pip-23.2.1
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting sagemaker
  Downloading sagemaker-2.173.0.tar.gz (854 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 854.4/854.4 kB 5.3 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting PyYAML~=6.0 (from sagemaker)
  Obtaining dependency information for PyYAML~=6.0 from https://files.py

In [2]:
import os
import sys
module_path = "."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

In [3]:
import boto3
import langchain

In [4]:
bedrock_region = "us-west-2" 
bedrock_config = {
    "region_name":bedrock_region,
    "endpoint_url":"https://prod.us-west-2.frontend.bedrock.aws.dev"
}

In [5]:
boto3_bedrock = bedrock.get_bedrock_client(
    region=bedrock_config["region_name"],
    url_override=bedrock_config["endpoint_url"])
    
modelInfo = boto3_bedrock.list_foundation_models()    
print('models: ', modelInfo)

Create new client
  Using region: us-west-2
assumed_role:  None
boto3 Bedrock client successfully created!
bedrock(https://prod.us-west-2.frontend.bedrock.aws.dev)
models:  {'ResponseMetadata': {'RequestId': '2ac2acef-cdae-451c-b084-39f99943d652', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Wed, 26 Jul 2023 15:52:09 GMT', 'content-type': 'application/json', 'content-length': '256', 'connection': 'keep-alive', 'x-amzn-requestid': '2ac2acef-cdae-451c-b084-39f99943d652'}, 'RetryAttempts': 0}, 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-tg1-large', 'modelId': 'amazon.titan-tg1-large'}, {'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-e1t-medium', 'modelId': 'amazon.titan-e1t-medium'}]}


In [6]:
from langchain.llms.bedrock import Bedrock

In [7]:
modelId = 'amazon.titan-tg1-large'
llm = Bedrock(model_id=modelId, client=boto3_bedrock)

In [8]:
llm('Who is the president of usa?')

'\nThe current President of the United States of America is Joe Biden.'

## Data Preparation

In [40]:
if install_needed:
    !pip install PyPDF2 --quiet

In [41]:
import PyPDF2
from io import BytesIO

In [42]:
import sagemaker, boto3, json
from sagemaker.session import Session

In [43]:
sess = sagemaker.Session()
s3_bucket = sess.default_bucket()
s3_prefix = 'docs'

In [46]:
s3_bucket

'sagemaker-ap-northeast-1-677146750822'

In [47]:
#s3_file_name = 'sample-blog.pdf'
s3_file_name = '2016-3series.pdf'
#s3_file_name = 'gen-ai-aws.pdf'

In [49]:
# Fetch the PDF object from S3 and read its bytes into memory
s3r = boto3.resource("s3")
doc = s3r.Object(s3_bucket, s3_prefix+'/'+s3_file_name)

contents = doc.get()['Body'].read()
reader = PyPDF2.PdfReader(BytesIO(contents))

# Extract the text of each page, then join the pages into one string
raw_text = []
for page in reader.pages:
    raw_text.append(page.extract_text())
contents = '\n'.join(raw_text)

In [50]:
#new_contents = str(contents[:8000]).replace("\n"," ") 
new_contents = str(contents).replace("\n"," ") 

#print('new_contents: ', new_contents)

In [115]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=5000,chunk_overlap=0)
texts = text_splitter.split_text(new_contents) 

In [116]:
texts[0]

"Owner's Manual for Vehicle The Ultimate Driving Machine® THE BMW 3 SERIES SEDAN. OWNER'S MANUAL. Contents A-Z Online Edition for Part no. 01 40 2 960 440 - II/15  3 Series Owner's Manual for Vehicle Thank you for choosing a BMW. The more familiar you are with your vehicle, the better control you will have on the road. We therefore strongly suggest: Read this Owner's Manual before starting off in your new BMW. Also use the Integrated Owner's Manual in your vehicle. It con‐ tains important information on vehicle operation that will help you make full use of the technical features available in your BMW. The manual also contains information designed to en‐ hance operating reliability and road safety, and to contribute to maintaining the value of your BMW. Any updates made after the editorial deadline for the printed or Integrated Owner's Manual are found in the appendix of the printed Quick Reference for the vehicle. Supplementary information can be found in the additional bro‐ chures in 

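### Sending text to Kendra
When sending data to Kendra, a single request can contain at most 10 documents. Exceeding the limit fails with the validation error "Member must have length less than or equal to 10".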
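
Because one request holds at most 10 documents, sending every chunk means splitting the document list into groups of 10. The following is a minimal sketch of that batching step, assuming a hypothetical helper named `batched_documents` (it is not part of boto3 or of this notebook) and a made-up list of 23 sample documents:

```python
def batched_documents(documents, batch_size=10):
    """Yield successive slices of `documents`, each with at most `batch_size` items.

    Hypothetical helper; the default of 10 matches the per-request limit noted above.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


# Illustrative stand-in for the real chunk list built from `texts`.
sample_docs = [
    {"Id": f"a123456_{i}", "Blob": f"chunk {i}", "ContentType": "PLAIN_TEXT"}
    for i in range(23)
]
batch_sizes = [len(batch) for batch in batched_documents(sample_docs)]
print(batch_sizes)  # [10, 10, 3]
```

Each yielded batch could then be passed as the `Documents` argument of a separate `kendra.batch_put_document` call.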

In [117]:
requestId = "a123456"

In [128]:
kendra = boto3.client("kendra")

In [156]:
kendraIndex = "50a29d7f-f091-4340-a2cd-fa62f4752e92"

In [191]:
documents = []

In [192]:
# Build one Kendra document per chunk (first 10 chunks only, to stay within the request limit)
for index, t in enumerate(texts[:10]):
    documents.append({
        "Id": f"{requestId}_{index}",
        "Blob": t,
        "ContentType": "PLAIN_TEXT",
    })

In [193]:
result = kendra.batch_put_document(
    IndexId = kendraIndex,
    Documents = documents
)

print(result)

{'FailedDocuments': [], 'ResponseMetadata': {'RequestId': 'eb10d8a8-5bf8-488c-80a1-4c87bdc1162a', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'eb10d8a8-5bf8-488c-80a1-4c87bdc1162a', 'content-type': 'application/x-amz-json-1.1', 'content-length': '22', 'date': 'Wed, 26 Jul 2023 23:06:56 GMT'}, 'RetryAttempts': 0}}
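
The response above reports rejected documents under `FailedDocuments`, which is empty here. A minimal sketch of inspecting that field, assuming a hypothetical helper named `summarize_failures`; the `Id`, `ErrorCode`, and `ErrorMessage` keys follow the Kendra response shape, while the sample entry is made up for illustration and is not output from a real call:

```python
def summarize_failures(response):
    """Return (id, error_code, error_message) tuples for rejected documents."""
    return [
        (item.get("Id"), item.get("ErrorCode"), item.get("ErrorMessage"))
        for item in response.get("FailedDocuments", [])
    ]


# Made-up response for illustration only.
sample_response = {
    "FailedDocuments": [
        {"Id": "a123456_3", "ErrorCode": "InvalidRequest", "ErrorMessage": "example message"}
    ]
}
failures = summarize_failures(sample_response)
print(failures)
```

An empty list from `summarize_failures(result)` would correspond to the `'FailedDocuments': []` seen in the output above.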


### Sending an S3 object to Kendra

In [217]:
doc_document = {
    "S3Path": {
        "Bucket": s3_bucket,
        "Key": s3_prefix+'/'+s3_file_name
    },
    "Title": "Document from client",
    "Id": requestId
}

In [218]:
documents = [
    doc_document
]

In [219]:
result = kendra.batch_put_document(
    Documents = documents,
    IndexId = kendraIndex,
    RoleArn = "arn:aws:iam::677146750822:role/service-role/AmazonSageMakerServiceCatalogProductsUseRole"
)

In [220]:
print(result)

{'FailedDocuments': [], 'ResponseMetadata': {'RequestId': 'f2de5b31-efab-4912-b9e6-0371743b9220', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'f2de5b31-efab-4912-b9e6-0371743b9220', 'content-type': 'application/x-amz-json-1.1', 'content-length': '22', 'date': 'Thu, 27 Jul 2023 00:12:29 GMT'}, 'RetryAttempts': 0}}


## Kendra

In [33]:
import boto3
from langchain.retrievers import AmazonKendraRetriever

In [34]:
kendraIndex = "50a29d7f-f091-4340-a2cd-fa62f4752e92"

In [73]:
retriever = AmazonKendraRetriever(index_id=kendraIndex)

In [78]:
query = "tell me the manual"

In [99]:
query = "tell me about kyoungsu"

In [205]:
relevant_documents = retriever.get_relevant_documents(query)

In [206]:
len(relevant_documents)

3

In [207]:
print(f'{len(relevant_documents)} documents are fetched which are relevant to the query.')
print('----')
for i, rel_doc in enumerate(relevant_documents):
    print_ww(f'## Document {i+1}: {rel_doc.page_content}.......')
    print('---')

3 documents are fetched which are relevant to the query.
----
## Document 1: Document Title: Document from client
Document Excerpt:
The vehicle identification number can also be found behind the windshield. Reporting safety defects
For US customers The following only applies to vehicles owned and operated in the US. If you believe
that your vehicle has a defect which could cause a crash or could cause in‐ jury or death, you
should immediately inform the National Highway Traffic Safety Adminis‐ tration NHTSA, in addition to
notifying BMW of North America, LLC, P.O. Box 1227, West‐ Seite 9 Notes 9 Online Edition for Part
no. 01 40 2 960 440 - II/15 wood, New Jersey 07675-1227, Telephone 1-800-831-1117. If NHTSA receives
similar complaints, it may open an investigation, and if it finds that a safety defect exists in a
group of vehicles, it may order a recall and remedy campaign. However, NHTSA cannot become involved
in individual problems between you, your dealer, or BMW of North America,

### Customisable option

In [208]:
from langchain.prompts import PromptTemplate

prompt_template = """Human: Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Assistant:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

In [209]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)
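
With `chain_type="stuff"`, the retrieved documents are combined and substituted into the `{context}` slot of the prompt, so the model is called once with everything at hand. The sketch below imitates that step in plain Python to show the shape of the final prompt. `build_stuffed_prompt` is a hypothetical helper, the excerpts are placeholders, and the blank-line separator is an assumption about LangChain's default rather than something this notebook states:

```python
PROMPT_TEMPLATE = """Human: Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Assistant:"""


def build_stuffed_prompt(page_contents, question, separator="\n\n"):
    """Join all retrieved excerpts and fill the template in one pass."""
    context = separator.join(page_contents)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Placeholder excerpts standing in for the page_content of each retrieved document.
excerpts = ["Excerpt one.", "Excerpt two."]
stuffed = build_stuffed_prompt(excerpts, "tell me the manual")
print(stuffed)
```

Because every excerpt lands in a single prompt, the combined length of the retrieved documents has to fit within the model's context window.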

In [216]:
query

'tell me about kyoungsu'

In [210]:
result = qa({"query": query})
print('result: ', result)

result:  {'query': 'tell me about kyoungsu', 'result': " Yes, he's my friend. He's from Korea. He's in the same class as me.\n", 'source_documents': [Document(page_content='Document Title: Document from client\nDocument Excerpt: \nThe vehicle identification number can also be found behind the windshield. Reporting safety defects For US customers The following only applies to vehicles owned and operated in the US. If you believe that your vehicle has a defect which could cause a crash or could cause in‐ jury or death, you should immediately inform the National Highway Traffic Safety Adminis‐ tration NHTSA, in addition to notifying BMW of North America, LLC, P.O. Box 1227, West‐ Seite 9 Notes 9 Online Edition for Part no. 01 40 2 960 440 - II/15 wood, New Jersey 07675-1227, Telephone 1-800-831-1117. If NHTSA receives similar complaints, it may open an investigation, and if it finds that a safety defect exists in a group of vehicles, it may order a recall and remedy campaign. However, NHT

In [211]:
source_documents = result['source_documents']
print(source_documents)

[Document(page_content='Document Title: Document from client\nDocument Excerpt: \nThe vehicle identification number can also be found behind the windshield. Reporting safety defects For US customers The following only applies to vehicles owned and operated in the US. If you believe that your vehicle has a defect which could cause a crash or could cause in‐ jury or death, you should immediately inform the National Highway Traffic Safety Adminis‐ tration NHTSA, in addition to notifying BMW of North America, LLC, P.O. Box 1227, West‐ Seite 9 Notes 9 Online Edition for Part no. 01 40 2 960 440 - II/15 wood, New Jersey 07675-1227, Telephone 1-800-831-1117. If NHTSA receives similar complaints, it may open an investigation, and if it finds that a safety defect exists in a group of vehicles, it may order a recall and remedy campaign. However, NHTSA cannot become involved in individual problems between you, your dealer, or BMW of North America, LLC. To contact NHTSA, you may call the Vehicle S

In [212]:
print('output: ', result['result'])

output:   Yes, he's my friend. He's from Korea. He's in the same class as me.



In [213]:
prompt_template = """

Human: This is a friendly conversation between a human and an AI. 
The AI is talkative and provides specific details from its context but limits it to 240 tokens.
If the AI does not know the answer to a question, it truthfully says it 
does not know.

Assistant: OK, got it, I'll be a talkative truthful AI assistant.

Human: Here are a few documents in <documents> tags:
<documents>
{context}
</documents>
Based on the above documents, provide a detailed answer for, {question} Answer "don't know" 
if not present in the document. 

Assistant:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)

In [214]:
query

'tell me about kyoungsu'

In [215]:
result = qa({"query": query})
print('output: ', result['result'])

output:   Don't know.
