## Refining Kendra Output with Language Models

In this notebook, we will query an existing Amazon Kendra index and refine the result with a language model from the Hugging Face model hub.
We will use the model for question answering: it takes the text returned by a Kendra query, together with the question, and produces a more focused answer.

In [2]:
import boto3
import sagemaker

# S3 resource, region, and execution role for this notebook session
s3 = boto3.resource('s3')
region = boto3.session.Session().region_name
role = sagemaker.get_execution_role()

# Default SageMaker bucket for this account and region
s3Bucket = sagemaker.Session().default_bucket()

In [3]:
# Define Kendra client
kendra = boto3.client('kendra')

In [4]:
indexId = '884609ed-c06c-452f-9383-b708df745995'   # replace with your own index ID

In [5]:
query='What is the purpose of SageMaker GroundTruth'

In [6]:
response = kendra.query(
    QueryText=query,
    IndexId=indexId,
)

In [7]:
for query_result in response['ResultItems']:
    result_type = query_result['Type']

    # Each result type stores its text in a different place.
    if result_type == 'QUESTION_ANSWER':
        document_text = query_result['AdditionalAttributes'][1]['Value']['TextWithHighlightsValue']['Text']
    elif result_type == 'ANSWER':
        document_text = query_result['AdditionalAttributes'][0]['Value']['TextWithHighlightsValue']['Text']
    elif result_type == 'DOCUMENT':
        document_text = query_result['DocumentExcerpt']['Text']
    else:
        continue

    print('Type: ' + str(result_type))
    for item in query_result['DocumentAttributes']:
        if item['Key'] == '_category':
            print('Document Category: ' + item['Value']['StringValue'] + '\n')
    print(document_text)
    print('-----------------------------------')

    # Stop at the first QUESTION_ANSWER or ANSWER result; DOCUMENT results keep printing.
    if result_type in ('QUESTION_ANSWER', 'ANSWER'):
        break

Type: ANSWER
Document Category: Machine Learning

Amazon SageMaker 


Amazon SageMaker is a fully-managed machine learning (ML) service that enables 


developers and data scientists to quickly and easily build, train, and deploy machine 


learning models at any scale. Amazon SageMaker Ground Truth helps build training 


data sets quickly and accurately using an active learning model to label data, combining 


machine learning and human interaction to make the model progressively better.  


SageMaker provides fully-managed and pre-built Jupyter notebooks to address 


common use cases. The services come with multiple built-in, high-performance 


algorithms, and there is the AWS Marketplace for Machine Learning containing more 


than 100 additional pre-trained ML models and algorithms. You can also bring your own 


algorithms and frameworks that are built into a Docker container.
-----------------------------------
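
The branching in the cell above can also be wrapped in a small helper, which makes it easier to reuse for later queries. This is a minimal sketch, not part of the original notebook: `extract_text` and `sample_item` are hypothetical names, and the attribute positions (index 1 for `QUESTION_ANSWER`, index 0 for `ANSWER`) are simply copied from the cell above.

```python
def extract_text(result_item):
    """Return the passage text of one Kendra result item, or None for other types."""
    result_type = result_item.get('Type')
    if result_type == 'DOCUMENT':
        return result_item['DocumentExcerpt']['Text']

    # Positions taken from the notebook cell: 1 for QUESTION_ANSWER, 0 for ANSWER.
    position = {'QUESTION_ANSWER': 1, 'ANSWER': 0}.get(result_type)
    if position is None:
        return None
    attribute = result_item['AdditionalAttributes'][position]
    return attribute['Value']['TextWithHighlightsValue']['Text']


# Hypothetical result item, trimmed to the fields the helper reads.
sample_item = {
    'Type': 'ANSWER',
    'AdditionalAttributes': [
        {'Value': {'TextWithHighlightsValue': {'Text': 'Amazon SageMaker Ground Truth helps build training data sets.'}}}
    ],
}
print(extract_text(sample_item))
```

With a real response, `extract_text(response['ResultItems'][0])` would return the text of the top result.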


In [8]:
document_text

'Amazon SageMaker \n\n\nAmazon SageMaker is a fully-managed machine learning (ML) service that enables \n\n\ndevelopers and data scientists to quickly and easily build, train, and deploy machine \n\n\nlearning models at any scale. Amazon SageMaker Ground Truth helps build training \n\n\ndata sets quickly and accurately using an active learning model to label data, combining \n\n\nmachine learning and human interaction to make the model progressively better.  \n\n\nSageMaker provides fully-managed and pre-built Jupyter notebooks to address \n\n\ncommon use cases. The services come with multiple built-in, high-performance \n\n\nalgorithms, and there is the AWS Marketplace for Machine Learning containing more \n\n\nthan 100 additional pre-trained ML models and algorithms. You can also bring your own \n\n\nalgorithms and frameworks that are built into a Docker container.'
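
The raw text contains runs of newlines and occasional double spaces. A later cell removes the newlines with `replace("\n","")`, which works here only because each line happens to end with a space. As a possibly more robust alternative, the sketch below collapses every run of whitespace into a single space; `normalize_whitespace` is a hypothetical helper, not part of the original notebook.

```python
def normalize_whitespace(text):
    """Collapse any run of spaces, tabs, and newlines into a single space."""
    return " ".join(text.split())


# Short excerpt of the Kendra text shown above.
sample = 'human interaction to make the model progressively better.  \n\n\nSageMaker provides fully-managed and pre-built Jupyter notebooks'
print(normalize_whitespace(sample))
```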

In [9]:
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Hub model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'Qiliang/bart-large-cnn-samsum-ChatGPT_v3',
    'HF_TASK': 'text2text-generation'
}

# Create the Hugging Face model class
huggingface_model = HuggingFaceModel(
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    env=hub,
    role=role,
)

# Deploy the model to a SageMaker inference endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,      # number of instances
    instance_type='ml.m5.4xlarge'  # instance type
)

----!

In [10]:
context = document_text.replace("\n","")
question = query
predictor.predict({"inputs": "context:"+ context + "question: "+ question})

[{'generated_text': 'The purpose of Amazon SageMaker GroundTruth is to help developers and data scientists build, train, and deploy machine learning models at any scale. The service provides fully-managed and pre-built Jupyter notebooks to address common use cases, as well as multiple built-in, high-performance algorithms. There is also the AWS Marketplace for Machine Learning containing more than 100 additional pre-trained ML models and algorithms.'}]
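
The prompt in the cell above is built by plain string concatenation, so the end of the context runs directly into the `question:` label. The sketch below shows one way to assemble the prompt with explicit separators. `build_prompt` is a hypothetical helper, and the spacing it adds is an assumption: it has not been checked whether this changes the model's output.

```python
def build_prompt(context, question):
    """Join context and question into one input string with clear separators."""
    # Collapse newlines and repeated spaces in the Kendra text.
    clean_context = " ".join(context.split())
    return "context: " + clean_context + " question: " + question


prompt = build_prompt(
    "Amazon SageMaker Ground Truth helps build training \n\n\ndata sets quickly and accurately.",
    "What is the purpose of SageMaker GroundTruth",
)
print(prompt)
```

The resulting string could then be passed as `predictor.predict({"inputs": prompt})`, in the same way as in the cell above.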

In [11]:
query = "What are the benefits of using SageMaker"

In [12]:
# Run the new query; without this, `response` still holds the results of the first query.
response = kendra.query(
    QueryText=query,
    IndexId=indexId,
)

for query_result in response['ResultItems']:
    result_type = query_result['Type']

    # Each result type stores its text in a different place.
    if result_type == 'QUESTION_ANSWER':
        document_text = query_result['AdditionalAttributes'][1]['Value']['TextWithHighlightsValue']['Text']
    elif result_type == 'ANSWER':
        document_text = query_result['AdditionalAttributes'][0]['Value']['TextWithHighlightsValue']['Text']
    elif result_type == 'DOCUMENT':
        document_text = query_result['DocumentExcerpt']['Text']
    else:
        continue

    print('Type: ' + str(result_type))
    for item in query_result['DocumentAttributes']:
        if item['Key'] == '_category':
            print('Document Category: ' + item['Value']['StringValue'] + '\n')
    print(document_text)
    print('-----------------------------------')

    # Stop at the first QUESTION_ANSWER or ANSWER result; DOCUMENT results keep printing.
    if result_type in ('QUESTION_ANSWER', 'ANSWER'):
        break

Type: ANSWER
Document Category: Machine Learning

Amazon SageMaker 


Amazon SageMaker is a fully-managed machine learning (ML) service that enables 


developers and data scientists to quickly and easily build, train, and deploy machine 


learning models at any scale. Amazon SageMaker Ground Truth helps build training 


data sets quickly and accurately using an active learning model to label data, combining 


machine learning and human interaction to make the model progressively better.  


SageMaker provides fully-managed and pre-built Jupyter notebooks to address 


common use cases. The services come with multiple built-in, high-performance 


algorithms, and there is the AWS Marketplace for Machine Learning containing more 


than 100 additional pre-trained ML models and algorithms. You can also bring your own 


algorithms and frameworks that are built into a Docker container.
-----------------------------------


In [14]:
context = document_text.replace("\n","")
question = query
response_LM = predictor.predict({"inputs": "context:"+ context + "question: "+ question})
response_LM[0]['generated_text']

'The Amazon SageMaker service is a fully-managed machine learning (ML) service that enables developers and data scientists to build, train, and deploy machine learning models at any scale. The service provides pre-built Jupyter notebooks to address common use cases and comes with multiple built-in, high-performance algorithms. There is also the AWS Marketplace for Machine Learning containing more than 100 additional pre-trained ML models and algorithms.'

In [17]:
endpoint_name = predictor.endpoint_name  # `endpoint` was renamed to `endpoint_name` in sagemaker>=2
endpoint_name

'huggingface-pytorch-inference-2023-03-27-14-38-49-570'
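
With the endpoint name in hand, the deployed model can also be called from outside this notebook, without the `predictor` object. The sketch below assumes a `sagemaker-runtime` client created with boto3 and reuses the prompt format from the cells above. `ask_endpoint` is a hypothetical helper, and it assumes the endpoint returns the same JSON shape seen earlier (a list with one `generated_text` entry).

```python
import json


def ask_endpoint(runtime_client, endpoint_name, context, question):
    """Send a context/question prompt to a deployed endpoint and return the generated text."""
    # Same prompt format as the notebook cells above.
    payload = {"inputs": "context:" + context + "question: " + question}
    result = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream holding a JSON list such as [{"generated_text": "..."}].
    return json.loads(result["Body"].read())[0]["generated_text"]


# Example usage (requires AWS credentials and a running endpoint):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# answer = ask_endpoint(runtime, endpoint_name, context, question)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object when no endpoint is running.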