### This notebook demonstrates how to use the JumpStart API to summarize interviewees' answers.

This code is inspired by [this AWS example notebook](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_text_summarization/Amazon_JumpStart_Text_Summarization.ipynb).

### Step 1: Set Up
---

This notebook requires the latest versions of `sagemaker` and `ipywidgets`.

---

In [30]:
!pip install sagemaker ipywidgets --upgrade --quiet

In [1]:
import sagemaker, boto3, json
from sagemaker import get_execution_role

aws_role = get_execution_role()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()


### Step 2: Select a model

---
We download the JumpStart `models_manifest.json` file from the JumpStart S3 bucket, keep only the text summarization models, and select one of them for inference.

---

In [2]:
from ipywidgets import Dropdown

# Download the JumpStart models_manifest.json file.
boto3.client("s3").download_file(
    f"jumpstart-cache-prod-{aws_region}", "models_manifest.json", "models_manifest.json"
)
with open("models_manifest.json", "rb") as json_file:
    model_list = json.load(json_file)

# Keep only the text summarization models from the manifest list (each model id once).
text_summarization_models = []
for model in model_list:
    model_id = model["model_id"]
    if "-summarization-" in model_id and model_id not in text_summarization_models:
        text_summarization_models.append(model_id)

# display the model-ids in a dropdown to select a model for inference.
model_dropdown = Dropdown(
    options=text_summarization_models,
    value="huggingface-summarization-distilbart-cnn-6-6",
    description="Select a model",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)
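
The filtering step above can also be tried offline, without S3 access. The sketch below runs the same substring test on a small hand-written manifest. The entries in `sample_manifest` (including the `version` field and the non-summarization id) and the helper name `select_summarization_models` are illustrative assumptions, not the real manifest contents.

```python
def select_summarization_models(manifest):
    """Return unique model ids containing "-summarization-", in first-seen order."""
    seen = []
    for entry in manifest:
        model_id = entry["model_id"]
        if "-summarization-" in model_id and model_id not in seen:
            seen.append(model_id)
    return seen


# Hypothetical manifest entries; a model id may appear more than once,
# which is why the duplicate check is needed.
sample_manifest = [
    {"model_id": "huggingface-summarization-distilbart-cnn-6-6", "version": "1.0.0"},
    {"model_id": "huggingface-summarization-distilbart-cnn-6-6", "version": "1.1.0"},
    {"model_id": "huggingface-summarization-distilbart-cnn-12-6", "version": "1.0.0"},
    {"model_id": "example-imageclassification-model", "version": "1.0.0"},
]

summarization_ids = select_summarization_models(sample_manifest)
print(summarization_ids)
```

Only the two summarization ids survive, each listed once, so the result can be passed directly as the `options` of the dropdown.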

In [33]:
display(model_dropdown)

Dropdown(description='Select a model', index=5, layout=Layout(width='max-content'), options=('huggingface-summ…

In [31]:
# model_version="*" fetches the latest version of the model
model_id, model_version = model_dropdown.value, "*"

### Step 3: Deploy an Endpoint

In [4]:
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base


endpoint_name = name_from_base(f"jumpstart-example-infer-{model_id}")

inference_instance_type = "ml.m5.xlarge"

# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)

# Retrieve the inference script uri. This includes all dependencies and scripts for model loading, inference handling etc.
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)


# Retrieve the model uri. This includes the pre-trained model and parameters.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

In [5]:
# Create the SageMaker model instance
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=model_uri,
    entry_point="inference.py",  # entry point file in source_dir and present in deploy_source_uri
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

# Deploy the model. The Predictor class must be passed when deploying through the Model class
# so that we can run inference through the SageMaker API.
model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)

-----!

### Step 4: Query endpoint and parse response

In [6]:
def query(model_predictor, text):
    """Query the model predictor."""

    encoded_text = text.encode("utf-8")

    query_response = model_predictor.predict(
        encoded_text,
        {
            "ContentType": "application/x-text",
            "Accept": "application/json",
        },
    )
    return query_response


def parse_response(query_response):
    """Parse response and return summary text."""

    model_predictions = json.loads(query_response)
    summary_text = model_predictions["summary_text"]
    return summary_text
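
To make the response shape concrete, here is a minimal sketch that parses a hand-written payload the same way `parse_response` does. The payload bytes are an invented example, assuming the endpoint returns a JSON object with a `summary_text` key, as the parsing code above expects.

```python
import json

# Invented example payload; a real response comes from model_predictor.predict().
fake_response = b'{"summary_text": "Benefits should appear within a decade."}'

# json.loads accepts bytes as well as str, so no explicit decode is needed.
parsed = json.loads(fake_response)
summary_text = parsed["summary_text"]
print(summary_text)
```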

### Step 5: Read the input file and summarise every response
---
We read the input file containing all responses and call the deployed model to summarise each one.

---


In [21]:
import pandas as pd
import os

In [22]:
BUCKET='cnatest' # Or whatever you called your bucket
data_key = 'transcript_with_mapped_questions_and_answers.csv' # Where the file is within your bucket
data_location = 's3://{}/{}'.format(BUCKET, data_key)
df = pd.read_csv(data_location)

In [23]:
def summarise_response(input_text):
    # Truncate the input text to its first 5000 characters before querying the endpoint.
    query_response = query(model_predictor, input_text[:5000])
    summary_text = parse_response(query_response)
    return summary_text
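
Slicing at a fixed offset can cut the last word in half. One possible refinement, sketched below, backs up to the last space before the limit. The helper name `truncate_at_word` is hypothetical, and the 5000-character default is simply the value used above, not a documented model limit.

```python
def truncate_at_word(text, limit=5000):
    """Truncate text to at most `limit` characters without splitting a word."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the character just past the limit is whitespace, the cut already ends on a word boundary.
    if text[limit].isspace():
        return cut.rstrip()
    # Otherwise drop the trailing partial word, provided there is a space to cut at.
    head, sep, _ = cut.rpartition(" ")
    return head.rstrip() if sep else cut


short_demo = truncate_at_word("regular screening and prevention", limit=20)
print(short_demo)
```

With a limit of 20 the plain slice would end in the fragment "an", while the helper returns the text up to the last complete word.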
    

In [24]:
df['response_summary'] = df['text'].apply(summarise_response)

In [25]:
df.head()

| Unnamed: 0 | responce_to_question | text | response_summary |
|---|---|---|---|
| 0 | As a national savings for the healthcare syste... | if this is running correctly? And we get buy i... | Minister says 20 years in the long run we will... |
| 1 | Fees will be paid to GPS based on the health r... | start first. What I used to hear a lot of is t... | A G. P. Clinic in one of the older estates in... |
| 2 | From your point of view What is Healthier SG | I think healthier S. G. Is a whole rethinking ... | S. G. Puts the emphasis back onto the page an... |
| 3 | If there's one thing that you could you know c... | for me I think information to unify all the in... | The home recovery program to have a national ... |
| 4 | Should the typical patient be concerned about ... | over time. We will see a general increase in c... | Health care must be seen as something that co... |


In [26]:
responses = df.text.values
summaries = df.response_summary

### Sneak peek
---
Overall, the summaries do not look very good.
We will save them anyway and then try a different model.

---

In [28]:
# Print each input text followed by its summary in bold.
newline, bold, unbold = "\n", "\033[1m", "\033[0m"
for response_i, summary_i in zip(responses,summaries):
    print(f"Input text: {response_i}{newline}" f"Summary text: {bold}{summary_i}{unbold}{newline}")
    print('---')

Input text: if this is running correctly? And we get buy in from the populist I would say in the long run we will see benefit you know a decade two decades down the road for sure.I feel that it takes a generation. That's the time it takes for chronic diseases. You have to actually set in and you know perhaps develop complications.I think for complications 10 to 15 years you should already see microvascular complications. Maybe about 5 to 10 years. Micro maybe about 10 15 then. But 20 years if you can set it attitude changes as well as the whole concept of regular screening regular prevention I think you see outcomes much earlier especially when community cares involved when you have all your fitness your dietetics all this coming in together. You should see much earlier Results. I don't think we're looking at 20 years investment here. I think the ministry look at a much earlier ri on the returns actuallyWe should be seeing some of that. But I think [PII] has probably mentioned some of 

### Step 6: Save results

In [43]:
df.to_csv('summarisation_with_model_distilbart_cnn_6_6.csv')
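
The output file name above is typed by hand for each model. A small helper could derive it from `model_id` instead. The function name `results_filename` and the assumption that the ids of interest start with `huggingface-summarization-` are illustrative only.

```python
def results_filename(model_id, prefix="huggingface-summarization-"):
    """Build a CSV file name such as summarisation_with_model_distilbart_cnn_6_6.csv."""
    # Drop the common prefix if present, then make the rest file-name friendly.
    short_name = model_id[len(prefix):] if model_id.startswith(prefix) else model_id
    return f"summarisation_with_model_{short_name.replace('-', '_')}.csv"


print(results_filename("huggingface-summarization-distilbart-cnn-6-6"))
```

Calling `df.to_csv(results_filename(model_id))` would then keep the file name in step with whichever model is selected in the dropdown.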

### Step 7: Clean up the endpoint

In [37]:
# Delete the SageMaker model and endpoint
model_predictor.delete_model()
model_predictor.delete_endpoint()
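
A running endpoint keeps incurring charges, so it can help to guarantee that cleanup happens even if an earlier cell fails. One way to do that is a context manager, sketched below. The name `cleanup_predictor` and the `FakePredictor` stand-in are hypothetical; the sketch only assumes the predictor exposes `delete_model()` and `delete_endpoint()`, as used above.

```python
from contextlib import contextmanager


@contextmanager
def cleanup_predictor(predictor):
    """Yield the predictor and always delete its model and endpoint afterwards."""
    try:
        yield predictor
    finally:
        # Same order as the cell above: model first, then endpoint.
        predictor.delete_model()
        predictor.delete_endpoint()


class FakePredictor:
    """Hypothetical stand-in that records calls instead of touching AWS."""

    def __init__(self):
        self.calls = []

    def delete_model(self):
        self.calls.append("delete_model")

    def delete_endpoint(self):
        self.calls.append("delete_endpoint")


fake = FakePredictor()
try:
    with cleanup_predictor(fake):
        raise RuntimeError("simulated failure during inference")
except RuntimeError:
    pass

print(fake.calls)
```

Both deletions run even though the body of the `with` block raised, which is the behaviour wanted for a real `model_predictor`.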

---

## Let's try a different model
---

In [47]:
display(model_dropdown)

Dropdown(description='Select a model', index=4, layout=Layout(width='max-content'), options=('huggingface-summ…

In [48]:
model_id, model_version = model_dropdown.value, "*"

In [49]:
# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)

# Retrieve the inference script uri. This includes all dependencies and scripts for model loading, inference handling etc.
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)


# Retrieve the model uri. This includes the pre-trained model and parameters.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

In [50]:
model_uri

's3://jumpstart-cache-prod-us-east-1/huggingface-infer/infer-huggingface-summarization-distilbart-cnn-12-6.tar.gz'

In [51]:
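# Generate a fresh endpoint name so that it reflects the newly selected model
endpoint_name = name_from_base(f"jumpstart-example-infer-{model_id}")

# Create the SageMaker model instance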
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=model_uri,
    entry_point="inference.py",  # entry point file in source_dir and present in deploy_source_uri
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

# Deploy the model. The Predictor class must be passed when deploying through the Model class
# so that we can run inference through the SageMaker API.
model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)

-----!

In [52]:
BUCKET='cnatest' # Or whatever you called your bucket
data_key = 'transcript_with_mapped_questions_and_answers.csv' # Where the file is within your bucket
data_location = 's3://{}/{}'.format(BUCKET, data_key)
df = pd.read_csv(data_location)

In [53]:
df['response_summary'] = df['text'].apply(summarise_response)

In [54]:
responses = df.text.values
summaries = df.response_summary
# Print each input text followed by its summary in bold.
newline, bold, unbold = "\n", "\033[1m", "\033[0m"
for response_i, summary_i in zip(responses,summaries):
    print(f"Input text: {response_i}{newline}" f"Summary text: {bold}{summary_i}{unbold}{newline}")
    print('---')

Input text: if this is running correctly? And we get buy in from the populist I would say in the long run we will see benefit you know a decade two decades down the road for sure.I feel that it takes a generation. That's the time it takes for chronic diseases. You have to actually set in and you know perhaps develop complications.I think for complications 10 to 15 years you should already see microvascular complications. Maybe about 5 to 10 years. Micro maybe about 10 15 then. But 20 years if you can set it attitude changes as well as the whole concept of regular screening regular prevention I think you see outcomes much earlier especially when community cares involved when you have all your fitness your dietetics all this coming in together. You should see much earlier Results. I don't think we're looking at 20 years investment here. I think the ministry look at a much earlier ri on the returns actuallyWe should be seeing some of that. But I think [PII] has probably mentioned some of 

In [55]:
df.to_csv('summarisation_with_model_distilbart_cnn_12_6.csv')

### Clean up the endpoint

In [56]:
# Delete the SageMaker model and endpoint
model_predictor.delete_model()
model_predictor.delete_endpoint()