# LLM Selection and Deployment with Prompt Engineering

---

This workshop runs in a SageMaker Notebook. Please ensure the kernel is set to **conda_python3**.

Many use cases, such as building a chatbot, require text-to-text generation models like **[BloomZ 7B1](https://huggingface.co/bigscience/bloomz-7b1)**, **[Flan T5 XXL](https://huggingface.co/google/flan-t5-xxl)**, and **[Flan UL2](https://huggingface.co/google/flan-ul2)** to respond to user questions with insightful answers.

While the **BloomZ 7B1**, **Flan T5 XXL**, and **Flan UL2** models have gained substantial general knowledge during training, applications often need them to ingest and use a large library of more specific information.

In this notebook we will demonstrate:

1. How to deploy Large Language Models (LLMs) in SageMaker JumpStart;

2. Common use cases of LLMs in the post-call scenario;

3. How to ask LLMs a question with or without providing examples (few-shot versus zero-shot prompting).

### Contents

- [1. Deploy Large Language Models in SageMaker JumpStart](#1.-Deploy-Large-Language-Models-in-SageMaker-JumpStart)
- [2. Common Use Cases of LLMs](#2.-Common-Use-Cases-of-LLMs)
- [3. Ask Your Questions to LLMs](#3.-Ask-Your-Questions-to-LLMs)
- [4. Delete the Endpoint](#4.-Delete-the-Endpoint)

**Note**
* This notebook serves as a template, so you can easily replace the example dataset with your own to build a custom question-answering application.
* This lab takes about 20 minutes (10 minutes for deployment and 10 minutes for testing the models).

## 1. Deploy Large Language Models in SageMaker JumpStart

---

To better illustrate the idea, let's first deploy all the models required for the demo.

In this workshop, we will deploy Flan T5 Small and Flan T5 Base and compare them.



In [None]:
!pip install --upgrade sagemaker --quiet
!pip install ipywidgets==7.0.0 --quiet

In [None]:
import time
import json
import boto3
import sagemaker
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.session import Session
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
model_version = "*"

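# Caution: this rebinds the name `sagemaker` (imported above as a module) to a low-level boto3 client.
sagemaker = boto3.client('sagemaker')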

In [None]:
def query_endpoint_with_json_payload(encoded_json, endpoint_name, content_type="application/json"):
    # "sagemaker-runtime" is the boto3 service name for invoking deployed endpoints.
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType=content_type, Body=encoded_json
    )
    return response


def parse_response_model_flan_t5(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_texts"]
    return generated_text


def parse_response_multiple_texts_bloomz(query_response):
    generated_text = []
    model_predictions = json.loads(query_response["Body"].read())
    for x in model_predictions[0]:
        generated_text.append(x["generated_text"])
    return generated_text
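
The two parsers expect different response shapes. The sketch below exercises the same parsing logic offline, without calling an endpoint. The `fake_response` helper and the sample text are hypothetical stand-ins; only the `"Body"` key of the real `invoke_endpoint` response is modelled, and the JSON shapes are inferred from the parsers above.

```python
import io
import json


def parse_flan_t5(query_response):
    # Same logic as parse_response_model_flan_t5, repeated so this cell runs on its own.
    return json.loads(query_response["Body"].read())["generated_texts"]


def parse_bloomz(query_response):
    # Same logic as parse_response_multiple_texts_bloomz.
    predictions = json.loads(query_response["Body"].read())
    return [item["generated_text"] for item in predictions[0]]


def fake_response(obj):
    # Hypothetical stand-in for an invoke_endpoint response: a dict whose
    # "Body" is a readable byte stream containing JSON.
    return {"Body": io.BytesIO(json.dumps(obj).encode("utf-8"))}


# Flan T5 style: a dict with a "generated_texts" list.
flan_out = parse_flan_t5(fake_response({"generated_texts": ["Refund request"]}))

# BloomZ style: a list whose first element is a list of {"generated_text": ...} dicts.
bloomz_out = parse_bloomz(fake_response([[{"generated_text": "Refund request"}]]))

print(flan_out, bloomz_out)
```

Both parsers return a list of strings, which is why the later cells can index the result with `generated_texts[0]` regardless of the model.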

You can also deploy other large language models (LLMs) yourself, such as Flan T5 XL, BloomZ 7B1, and Flan UL2, to compare their performance. To do so, modify the `_MODEL_CONFIG_` dictionary defined below.

Check the [available models](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html) on Amazon SageMaker JumpStart for the list of model IDs, and [g5 pricing](https://aws.amazon.com/ec2/instance-types/g5/) to estimate the budget.
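
As a rough illustration of adding a model, the sketch below builds one more entry with the same keys the configuration dictionary uses. The `add_model` helper and `placeholder_parser` are hypothetical, and both the model ID `huggingface-text2text-flan-t5-xl` and the instance type are assumptions; confirm them against the JumpStart model list and your account limits before deploying.

```python
import json


def placeholder_parser(query_response):
    # Hypothetical parser; in the notebook you would pass parse_response_model_flan_t5.
    return json.loads(query_response["Body"].read())["generated_texts"]


def add_model(config, model_id, instance_type, parse_function, env=None):
    """Return a copy of config with one extra entry using the notebook's key names."""
    updated = dict(config)
    updated[model_id] = {
        "instance type": instance_type,  # key contains a space, as in the notebook
        "env": dict(env or {}),
        "parse_function": parse_function,
        "prompt": "Answer based on context:\n\n{context}\n\n{question}",
    }
    return updated


base_config = {}
extended = add_model(
    base_config,
    model_id="huggingface-text2text-flan-t5-xl",  # assumed model ID
    instance_type="ml.g5.2xlarge",                # assumed instance type
    parse_function=placeholder_parser,
)
print(list(extended))
```

Returning a copy keeps the original dictionary untouched, so you can compare configurations side by side before deciding what to deploy.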

In [None]:
_MODEL_CONFIG_ = {
    "huggingface-text2text-flan-t5-small": {
        "instance type": "ml.g5.xlarge",
        "env": {"TS_DEFAULT_WORKERS_PER_MODEL": "1"},
        "parse_function": parse_response_model_flan_t5,
        "prompt": """Answer based on context:\n\n{context}\n\n{question}""",
    },
    "huggingface-text2text-flan-t5-base": {
        "instance type": "ml.g5.2xlarge",
        "env": {},
        "parse_function": parse_response_model_flan_t5,
        "prompt": """Answer based on context:\n\n{context}\n\n{question}""",
    }
}

In [None]:
newline, bold, unbold = "\n", "\033[1m", "\033[0m"

endpoints = sagemaker.list_endpoints()['Endpoints']

if not endpoints:
    
    for model_id in _MODEL_CONFIG_.keys():

        print(model_id)
        endpoint_name = name_from_base(f"jp-{model_id}", short=False)
        inference_instance_type = _MODEL_CONFIG_[model_id]["instance type"]

        print(endpoint_name)
        # Retrieve the inference container URI for the current model_id.
        deploy_image_uri = image_uris.retrieve(
            region=None,
            framework=None,  # automatically inferred from model_id
            image_scope="inference",
            model_id=model_id,
            model_version=model_version,
            instance_type=inference_instance_type,
        )
        # Retrieve the model uri.
        model_uri = model_uris.retrieve(
            model_id=model_id, model_version=model_version, model_scope="inference"
        )

        model_inference = Model(
            image_uri=deploy_image_uri,
            model_data=model_uri,
            role=aws_role,
            predictor_cls=Predictor,
            name=endpoint_name,
            env=_MODEL_CONFIG_[model_id]["env"],
        )
        model_predictor_inference = model_inference.deploy(
            initial_instance_count=1,
            instance_type=inference_instance_type,
            predictor_cls=Predictor,
            endpoint_name=endpoint_name,
        )

        print(f"{bold}Model {model_id} has been deployed successfully.{unbold}{newline}")
        _MODEL_CONFIG_[model_id]["endpoint_name"] = endpoint_name
        print("---")

else:
    for endpoint in endpoints:
        endpoint_name = endpoint['EndpointName']
        print(endpoint_name)
        
        for model_id in _MODEL_CONFIG_.keys():
            if model_id in endpoint_name:
                _MODEL_CONFIG_[model_id]["endpoint_name"] = endpoint_name
        
    print("---")
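
When the cell reuses existing endpoints, they may still be starting up. A minimal sketch of a status check is shown below; `wait_until_in_service` is a hypothetical helper, and the polling interval and retry count are arbitrary assumptions. It relies only on the boto3 SageMaker client's `describe_endpoint` call and its `EndpointStatus` field.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30, max_polls=40):
    """Poll an endpoint until it is InService.

    Returns True once the endpoint is ready, False if it is still not ready
    after max_polls checks, and raises RuntimeError if creation failed.
    """
    for _ in range(max_polls):
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return True
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy.")
        time.sleep(poll_seconds)
    return False
```

In the notebook you could call it as `wait_until_in_service(boto3.client("sagemaker"), endpoint_name)` for each endpoint name recorded in the configuration dictionary.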
        

## 2. Common Use Cases of LLMs

---

- Text summarization
- Common sense reasoning
- Question answering
- Sentiment classification
- Translation
- Pronoun resolution
- Text generation based on articles
- Imaginary article based on the title

Here are more sample queries for your reference: [Zero-shot prompting for the Flan-T5 foundation model in Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/zero-shot-prompting-for-the-flan-t5-foundation-model-in-amazon-sagemaker-jumpstart/)

### Compare the results from different LLMs

Now, let's test the two models, Flan T5 Small and Flan T5 Base, with three sample transcripts in the post-call scenario. You can check the `transcripts` folder for the raw data of all call transcripts.

In [None]:
# sample transcript 1
with open("transcripts/neutral-short.txt", "r") as f:
    transcript_account = f.read()

# sample transcript 2
with open("transcripts/negative-refund.txt", "r") as f:
    transcript_neg_refund = f.read()

# sample transcript 3
with open("transcripts/positive-partial-refund.txt", "r") as f:
    transcript_pos_refund = f.read()

Here, we have designed two prompts for different purposes: (1) intent detection in calls and (2) sentiment analysis.

In [None]:
# purpose 1
prompt_intent = "Based on the transcript, what's the purpose of the call?"

# purpose 2
prompt_sentiment = "Based on the transcript, what's the sentiment of the customer?"

Let's merge the transcript and the instruction into a single prompt.

To adjust how the LLM generates text, experiment with the values in the `payload` dictionary, such as `max_length`, `top_k`, `top_p`, and `do_sample`.

In [None]:
question = """
Here is what the customer said in the call:
{transcript}
    
{purpose}
"""

In [None]:
# TODO : Change the prompt
prompt = question.format(transcript=transcript_pos_refund, purpose=prompt_intent)

payload = {
    "text_inputs": prompt,
    "max_length": 100,
    "num_return_sequences": 1,
    "top_k": 10,
    "top_p": 0.95, #0.95,
    "do_sample": True,
}
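
The payload keys follow the Hugging Face text-generation parameters. As a hedged sketch, assuming the endpoint passes them through unchanged, the hypothetical `build_payload` helper below switches between sampling (varied output, restricted by `top_k` and `top_p`) and greedy decoding (the same output on every call).

```python
import json


def build_payload(prompt, sample=True, max_length=100):
    """Build a request payload for either sampling or greedy decoding."""
    payload = {
        "text_inputs": prompt,
        "max_length": max_length,      # upper bound on generated length
        "num_return_sequences": 1,
    }
    if sample:
        # Sample from the 10 most likely tokens, further limited to the
        # smallest set whose cumulative probability reaches 0.95.
        payload.update({"do_sample": True, "top_k": 10, "top_p": 0.95})
    else:
        # Greedy decoding: always pick the most likely next token.
        payload["do_sample"] = False
    return payload


sampling_payload = build_payload("What is the purpose of the call?")
greedy_payload = build_payload("What is the purpose of the call?", sample=False)

# The body sent to the endpoint is the UTF-8 encoded JSON string.
encoded = json.dumps(greedy_payload).encode("utf-8")
print(sampling_payload)
print(greedy_payload)
```

Greedy decoding is usually the easier setting when comparing two models on the same prompt, since differences in output then come from the models rather than from random sampling.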

Let's send the prompt to both models.

In [None]:
for model_id in _MODEL_CONFIG_:
    endpoint_name = _MODEL_CONFIG_[model_id]["endpoint_name"]
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = _MODEL_CONFIG_[model_id]["parse_function"](query_response)
    print(f"For model: {model_id}, the generated output is: {generated_texts[0]}\n")

print("---")
print(prompt)
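
To compare the models on every transcript and every question rather than one pair at a time, you could build all prompts up front. The sketch below is one way to do it; `build_prompt_grid` is a hypothetical helper and the short transcript strings are placeholders for the files in the `transcripts` folder.

```python
from itertools import product

TEMPLATE = "Here is what the customer said in the call:\n{transcript}\n\n{purpose}\n"


def build_prompt_grid(transcripts, purposes, template=TEMPLATE):
    """Return {(transcript_name, purpose_name): prompt} for every combination."""
    return {
        (t_name, p_name): template.format(transcript=t_text, purpose=p_text)
        for (t_name, t_text), (p_name, p_text) in product(
            transcripts.items(), purposes.items()
        )
    }


# Placeholder transcripts; in the notebook these would be the loaded file contents.
transcripts = {
    "neg_refund": "I want my money back.",
    "pos_refund": "Thanks for the partial refund.",
}
purposes = {
    "intent": "Based on the transcript, what's the purpose of the call?",
    "sentiment": "Based on the transcript, what's the sentiment of the customer?",
}

grid = build_prompt_grid(transcripts, purposes)
for key in grid:
    print(key)
```

Each value in `grid` can then be placed in the `text_inputs` field of the payload and sent to both endpoints.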

## 3. Ask Your Questions to LLMs

---

Experiment with your own prompts and observe the responses you receive from both models. Explore zero-shot and few-shot prompting strategies to assess the performance of the two models.
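
A zero-shot prompt contains only the instruction and the question, while a few-shot prompt also includes worked examples. The sketch below assembles both from the same function; `build_few_shot_prompt`, the `Statement:`/`Answer:` layout, and the example sentences are illustrative assumptions rather than a format the models require.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Build a prompt from an instruction, labelled examples, and a final query.

    Passing an empty list of examples yields a zero-shot prompt.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Statement: {text}", f"Answer: {label}", ""]
    lines += [f"Statement: {query}", "Answer:"]
    return "\n".join(lines)


instruction = "Classify the sentiment as POSITIVE, NEUTRAL, or NEGATIVE."
query = "The weather conditions are expected to remain stable with clear skies."

# Hypothetical labelled examples for the few-shot variant.
examples = [
    ("I love how quickly the refund arrived.", "POSITIVE"),
    ("The agent hung up on me twice.", "NEGATIVE"),
]

zero_shot = build_few_shot_prompt(instruction, [], query)
few_shot = build_few_shot_prompt(instruction, examples, query)
print(few_shot)
```

Either string can be assigned to `prompt` in the cells below to see whether the examples change the models' answers.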

In [None]:
WITH_TAG_INFO = True

if WITH_TAG_INFO:
    prompt = """
    This is a sentiment analysis. Please choose a tag provided below as your answer:
    POSITIVE
    NEUTRAL
    NEGATIVE

    What's the sentiment of the below statement:

    The weather conditions are expected to remain stable with clear skies and gentle breezes.

    Answer:
    """
else: 
    prompt = """
    This is a sentiment analysis. 

    What's the sentiment of the below statement:

    The weather conditions are expected to remain stable with clear skies and gentle breezes.

    Answer:
    """

In [None]:
payload = {
    "text_inputs": prompt,
    "max_length": 100,
    "num_return_sequences": 1,
    "top_k": 1,
    "top_p": 1.0, #0.95,
    "do_sample": False,
}

In [None]:
for model_id in _MODEL_CONFIG_:
    endpoint_name = _MODEL_CONFIG_[model_id]["endpoint_name"]
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = _MODEL_CONFIG_[model_id]["parse_function"](query_response)
    print(f"For model: {model_id}, the generated output is: {generated_texts[0]}\n")

print("---")
print(prompt)

---

### More examples

Below are some examples you can try:

**Example 1**
- Input text: "translate hello in french:"
- Model prediction: 'generated_text': 'Bonjour'

**Example 2**
- Input text: "A step by step recipe to make bolognese pasta:"
- Model prediction: 'generated_text': 'Toss the pasta with the sauce, then add the meat and toss again.'

**Example 3**
- Input text: "Tell me the steps to make a pizza"
- Model prediction: 'generated_text': 'Preheat oven to 400°F. Grease a 9x13-inch baking pan'

In [None]:
# TODO: replace with the given example or your own prompt
prompt = "Tell me the steps to make a pizza"

In [None]:
payload = {
    "text_inputs": prompt,
    "max_length": 100,
    "num_return_sequences": 1,
    "top_k": 1,
    "top_p": 1.0, #0.95,
    "do_sample": False,
}

In [None]:
for model_id in _MODEL_CONFIG_:
    endpoint_name = _MODEL_CONFIG_[model_id]["endpoint_name"]
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = _MODEL_CONFIG_[model_id]["parse_function"](query_response)
    print(f"For model: {model_id}, the generated output is: {generated_texts[0]}\n")

print("---")
print(prompt)

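## 4. Delete the Endpoint

---

Endpoints are billed while they are running, so delete the ones you no longer need. For this workshop:

- Keep: huggingface-text2text-flan-t5-base
- Delete: huggingface-text2text-flan-t5-small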

In [None]:
_MODEL_CONFIG_

In [None]:
# TODO: replace with the endpoint name
endpoint_name = None 
model_name = "huggingface-text2text-flan-t5-small"

In [None]:
if endpoint_name is not None:
    sagemaker.delete_endpoint(EndpointName=endpoint_name)
else:
    print("Please provide the endpoint name for deletion.")
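
Deleting an endpoint leaves its endpoint configuration and model registered in SageMaker. If you also want to remove those, the following sketch shows one possible cleanup order. `delete_endpoint_resources` is a hypothetical helper; it uses only the boto3 SageMaker client calls `describe_endpoint`, `describe_endpoint_config`, `delete_endpoint`, `delete_endpoint_config`, and `delete_model`.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint together with its endpoint config and model(s).

    Returns a summary of what was deleted.
    """
    # Look up the related resource names before deleting anything.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    # Delete the endpoint first, then the configuration and models it used.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)

    return {
        "endpoint": endpoint_name,
        "endpoint_config": config_name,
        "models": model_names,
    }
```

A call such as `delete_endpoint_resources(boto3.client("sagemaker"), endpoint_name)` would replace the single `delete_endpoint` call above; double-check the endpoint name first, because the deletion cannot be undone.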