Welcome to Amazon SageMaker JumpStart! You can use SageMaker JumpStart to solve many machine learning tasks with one click in SageMaker Studio, or through the SageMaker Python SDK.

In this demo notebook, we demonstrate how to fine-tune a pre-trained Large Language Model (LLM), using the FLAN T5 XL model as an example.

LLMs are pre-trained on a large corpus of text (e.g., a crawl of the Internet) and can often perform a variety of natural language tasks out of the box. However, task-specific fine-tuning (i.e., additional training) can further improve model performance on that particular task or language domain.

Note: This notebook was tested on an ml.t3.medium instance in Amazon SageMaker Studio with the Python 3 (Data Science) kernel, and on an Amazon SageMaker Notebook ml.t3.medium instance with the conda_python3 kernel.

In [None]:
# TODO: test in SM studio?

General setup

In [1]:
import sagemaker

# SageMaker estimator and model instances will use this role
aws_role = sagemaker.session.Session().get_caller_identity_arn()

# We will store fine-tuned models in this S3 bucket
output_bucket = sagemaker.Session().default_bucket()

# This will be useful for printing
newline, bold, unbold = "\n", "\033[1m", "\033[0m"

# We are using the XL version of FLAN T5 in this demo
model_size = "xl"

# Demo results were obtained using this training instance
training_instance_type = "ml.p3.16xlarge"
num_training_gpus = 8

Fine-tuning

In [2]:
# TODO this will be retrieved using SM API
train_image_uri = "763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04"
train_source_uri = "./text2text"

if model_size == "xl":
    # copied from s3://sagemaker-jumpstart-cache-contributor-staging/jumpstart-1p/text2text/training-huggingface-text2text-huggingface-text2text-flan-t5-xl-repack.tar.gz
    train_model_uri = "s3://sagemaker-us-west-2-802376408542/avkan/training-huggingface-text2text-huggingface-text2text-flan-t5-xl-repack.tar.gz"
elif model_size == "base":
    # copied from s3://sagemaker-jumpstart-cache-contributor-staging/jumpstart-1p/text2text/infer-huggingface-text2text-flan-t5-base.tar.gz
    train_model_uri = "s3://sagemaker-us-west-2-802376408542/avkan/infer-huggingface-text2text-flan-t5-base.tar.gz"
else:
    raise NotImplementedError

In [3]:
# TODO for faster trial, use dev data and 1 epoch (1 hour) / for demo we used train data and 3 epochs (5 hours)

In [4]:
# TODO what is the proper dataset path?
data_channel = "train"
training_dataset_s3_path = f"s3://sagemaker-us-west-2-802376408542/data/SQuADv2/genuq/{data_channel}/"

output_location = f"s3://{output_bucket}/demo-fine-tune-flan-t5/"

# TODO
hyperparameters = {
    "epochs": 3,                        # number of training epochs
    "learning_rate": 1e-4,              # learning rate used during training
    "batch_size": 8*num_training_gpus,  # (batch size per gpu) x (num gpus on training instance)
    "training_script": "train.py",      # training script
    # "max_input_length": 300,            # data inputs will be truncated at this length
    # "max_output_length": 40,           # data outputs will be truncated at this length
    # "generation_max_length": 40,       # max length of generated output
}
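
The `batch_size` hyperparameter above is a global batch size: the per-GPU batch size multiplied by the number of GPUs on the training instance. The following sketch works through that arithmetic. It assumes plain data-parallel training where each optimizer step consumes one global batch, and `num_training_examples` is a hypothetical figure chosen for illustration, not the size of the demo dataset.

```python
import math

# Values taken from the setup cells above
num_training_gpus = 8
per_gpu_batch_size = 8

# Global batch size, as passed in the "batch_size" hyperparameter
global_batch_size = per_gpu_batch_size * num_training_gpus

# Hypothetical dataset size, for illustration only
num_training_examples = 1000

# Assumption: one optimizer step per global batch; the last batch may be partial
steps_per_epoch = math.ceil(num_training_examples / global_batch_size)

print(f"global batch size: {global_batch_size}")
print(f"steps per epoch:   {steps_per_epoch}")
```

If you switch to an instance type with a different number of GPUs, updating `num_training_gpus` keeps the per-GPU batch size unchanged.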

The training job will take a while: the demo run on the training data with 3 epochs took about 5 hours.

In [5]:
from sagemaker.estimator import Estimator
from sagemaker.utils import name_from_base

training_job_name = name_from_base(f"jumpstart-demo-{model_size}-{hyperparameters['epochs']}")

# TODO check how are these used
training_metric_definitions = [
    {"Name": "val_loss", "Regex": "'eval_loss': ([0-9\\.]+)"},
    {"Name": "train_loss", "Regex": "'loss': ([0-9\\.]+)"},
    {"Name": "epoch", "Regex": "'epoch': ([0-9\\.]+)"},
]

print(f"{bold}model uri:{unbold} {train_model_uri}")
print(f"{bold}job name:{unbold} {training_job_name}")

# Create SageMaker Estimator instance
sm_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    volume_size=300,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=output_location,
    metric_definitions=training_metric_definitions,
)

# Launch a SageMaker training job over data located in the given S3 path.
# Training jobs can take hours, so we set wait=False and monitor the job
# status through the SageMaker console.
sm_estimator.fit(
    {"training": training_dataset_s3_path},
    job_name=training_job_name,
    wait=False
)

model uri: s3://sagemaker-us-west-2-802376408542/avkan/training-huggingface-text2text-huggingface-text2text-flan-t5-xl-repack.tar.gz
job name: jumpstart-demo-xl-3-2023-04-06-08-16-42-738


INFO:sagemaker:Creating training-job with name: jumpstart-demo-xl-3-2023-04-06-08-16-42-738
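
The `metric_definitions` passed to the estimator are regular expressions that SageMaker applies to the training job's log output. The first capture group of each match becomes the metric value. The sketch below applies the same three patterns locally with Python's `re` module. The two log lines are hypothetical examples of the dict-like format the patterns appear to expect, and `extract_metrics` is an illustrative helper, not part of the SageMaker SDK.

```python
import re

# Same patterns as training_metric_definitions in the cell above
metric_definitions = [
    {"Name": "val_loss", "Regex": "'eval_loss': ([0-9\\.]+)"},
    {"Name": "train_loss", "Regex": "'loss': ([0-9\\.]+)"},
    {"Name": "epoch", "Regex": "'epoch': ([0-9\\.]+)"},
]


def extract_metrics(log_line, definitions):
    """Return {metric name: value} for every pattern that matches the line."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], log_line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


# Hypothetical log lines, for illustration only
train_line = "{'loss': 1.8342, 'learning_rate': 9.5e-05, 'epoch': 0.25}"
eval_line = "{'eval_loss': 1.6021, 'eval_runtime': 42.7, 'epoch': 1.0}"

train_metrics = extract_metrics(train_line, metric_definitions)
eval_metrics = extract_metrics(eval_line, metric_definitions)
print(train_metrics)
print(eval_metrics)
```

Note that the `train_loss` pattern starts with a quote character, so it does not match inside `'eval_loss'`. This keeps the training and validation losses separate.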


The remainder of the notebook should be executed once the training job has completed successfully. The variable `training_job_name` should contain the job name, and `output_location` should point to the S3 location that holds the fine-tuned model artifact.
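
As an alternative to watching the console, the job status can be polled programmatically. The following is a minimal sketch, assuming a boto3 SageMaker client is passed in as `sm_client`. The helper name `wait_for_training_job` and its `poll_seconds` and `max_polls` parameters are hypothetical. It relies on the `TrainingJobStatus` and `ModelArtifacts` fields returned by `describe_training_job`.

```python
import time

# Statuses after which the job will not change any more
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(sm_client, job_name, poll_seconds=60, max_polls=600):
    """Poll describe_training_job until the job reaches a terminal state.

    Returns (status, model artifact S3 URI or None).
    """
    for _ in range(max_polls):
        description = sm_client.describe_training_job(TrainingJobName=job_name)
        status = description["TrainingJobStatus"]
        if status in TERMINAL_STATES:
            artifact = description.get("ModelArtifacts", {}).get("S3ModelArtifacts")
            return status, artifact
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_name} did not finish after {max_polls} polls")


# Example usage (requires AWS credentials, so it is left commented out):
# import boto3
# status, artifact = wait_for_training_job(boto3.client("sagemaker"), training_job_name)
```

Only a `Completed` status means the fine-tuned model artifact is available for deployment.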

We will create two inference endpoints, one for the original pre-trained model and one for the fine-tuned model. We will then run the same request against the two endpoints and compare the results.

In [6]:
# Preparing for model inference
model_id, model_version = f"huggingface-text2text-flan-t5-{model_size}", "*"
inference_instance_type = "ml.p3.2xlarge"

Each endpoint deployment can take a few minutes

In [7]:
from sagemaker import image_uris

# Retrieve the inference docker image URI. This is the base HuggingFace container image
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    model_id=model_id,
    model_version=model_version,
    image_scope="inference",
    instance_type=inference_instance_type,
)

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


In [8]:
from sagemaker import model_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

# Retrieve the URI of the pre-trained model
pre_trained_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

pre_trained_name = name_from_base(f"jumpstart-demo-pre-trained-{model_id}")

# TODO comment on repacking
# TODO no repacking for smaller models?

# Create the SageMaker model instance of the pre-trained model
pre_trained_model = Model(
    image_uri=deploy_image_uri,
    model_data=pre_trained_model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=pre_trained_name,
)

print(f"{bold}image URI:{unbold}{newline} {deploy_image_uri}")
print(f"{bold}model URI:{unbold}{newline} {pre_trained_model_uri}")
print("Deploying an endpoint ...")

# Deploy the pre-trained model. When deploying through the Model class, we need to
# pass the Predictor class to be able to run inference through the SageMaker API
pre_trained_predictor = pre_trained_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=pre_trained_name,
    volume_size=30,
)
print(f"{newline}Deployed an endpoint {pre_trained_name}")

INFO:sagemaker:Creating model with name: jumpstart-demo-pre-trained-huggingface--2023-04-06-15-23-00-516


image URI:
 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
model URI:
 s3://jumpstart-cache-prod-us-west-2/huggingface-infer/infer-huggingface-text2text-flan-t5-xl.tar.gz
Deploying an endpoint ...


INFO:sagemaker:Creating endpoint-config with name jumpstart-demo-pre-trained-huggingface--2023-04-06-15-23-00-516
INFO:sagemaker:Creating endpoint with name jumpstart-demo-pre-trained-huggingface--2023-04-06-15-23-00-516


------------!
Deployed an endpoint jumpstart-demo-pre-trained-huggingface--2023-04-06-15-23-00-516
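
The endpoint name in the output above looks cut off (`...huggingface--2023-...`) because `name_from_base` trims the base name so that the result, including the appended timestamp, stays within 63 characters. The sketch below mimics that trimming to show where the cut falls. `sketch_name_from_base` is an illustrative re-implementation under that assumption, not the SDK's own code, and the timestamp is copied from the log output above rather than generated.

```python
def sketch_name_from_base(base, timestamp, max_length=63):
    """Trim base so that '<base>-<timestamp>' is at most max_length characters."""
    trimmed_base = base[: max_length - len(timestamp) - 1]
    return f"{trimmed_base}-{timestamp}"


# Timestamp copied from the log output above
timestamp = "2023-04-06-15-23-00-516"
base = "jumpstart-demo-pre-trained-huggingface-text2text-flan-t5-xl"

name = sketch_name_from_base(base, timestamp)
print(name)
print(len(name))
```

A shorter base name avoids the truncation and keeps the model size visible in the endpoint name.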


In [9]:
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

fine_tuned_name = name_from_base(f"jumpstart-demo-fine-tuned-{model_id}")
fine_tuned_model_uri = f"{output_location}{training_job_name}/output/model.tar.gz"

# Create the SageMaker model instance of the fine-tuned model
fine_tuned_model = Model(
    image_uri=deploy_image_uri,
    model_data=fine_tuned_model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=fine_tuned_name,
)

print(f"{bold}image URI:{unbold}{newline} {deploy_image_uri}")
print(f"{bold}model URI:{unbold}{newline} {fine_tuned_model_uri}")
print("Deploying an endpoint ...")

# Deploy the fine-tuned model.
fine_tuned_predictor = fine_tuned_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=fine_tuned_name,
    volume_size=30,
)
print(f"{newline}Deployed an endpoint {fine_tuned_name}")

INFO:sagemaker:Creating model with name: jumpstart-demo-fine-tuned-huggingface-t-2023-04-06-15-29-03-700


image URI:
 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
model URI:
 s3://sagemaker-us-west-2-802376408542/demo-fine-tune-flan-t5/jumpstart-demo-xl-3-2023-04-06-08-16-42-738/output/model.tar.gz
Deploying an endpoint ...


INFO:sagemaker:Creating endpoint-config with name jumpstart-demo-fine-tuned-huggingface-t-2023-04-06-15-29-03-700
INFO:sagemaker:Creating endpoint with name jumpstart-demo-fine-tuned-huggingface-t-2023-04-06-15-29-03-700


-------------!
Deployed an endpoint jumpstart-demo-fine-tuned-huggingface-t-2023-04-06-15-29-03-700
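
The fine-tuned model URI used above follows the layout in which a training job writes its artifact: `<output path>/<job name>/output/model.tar.gz`. The sketch below builds that path with a small helper that also tolerates a missing trailing slash. The helper name `model_artifact_uri` is hypothetical, and the input values are copied from the outputs earlier in this notebook.

```python
def model_artifact_uri(output_location, training_job_name):
    """Build the S3 URI of the model artifact written by a training job."""
    return f"{output_location.rstrip('/')}/{training_job_name}/output/model.tar.gz"


# Values copied from the notebook outputs above
output_location = "s3://sagemaker-us-west-2-802376408542/demo-fine-tune-flan-t5/"
training_job_name = "jumpstart-demo-xl-3-2023-04-06-08-16-42-738"

fine_tuned_model_uri = model_artifact_uri(output_location, training_job_name)
print(fine_tuned_model_uri)
```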


In [10]:
import boto3
import json

# Helper functions for running inference queries
def query_endpoint_with_json_payload(payload, endpoint_name):
    encoded_json = json.dumps(payload).encode("utf-8")
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    return response

def parse_response_multiple_texts(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_texts"]
    return generated_text
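
To see what the parsing helper does without calling a live endpoint, the sketch below feeds it a stand-in response. `invoke_endpoint` returns the body as a stream with a `read()` method, so an `io.BytesIO` object is used here as a substitute. The generated texts are made-up placeholders, and the `generated_texts` key is the one read by the helper above.

```python
import io
import json


def parse_response_multiple_texts(query_response):
    """Read the streamed JSON body and return the list of generated texts."""
    model_predictions = json.loads(query_response["Body"].read())
    return model_predictions["generated_texts"]


# Stand-in for an endpoint response; the texts are placeholders, not model output
fake_body = json.dumps({"generated_texts": ["question 1", "question 2", "question 3"]})
fake_response = {"Body": io.BytesIO(fake_body.encode("utf-8"))}

generated_texts = parse_response_multiple_texts(fake_response)
print(generated_texts)
```

Because the body is a stream, it can be read only once. Keep the parsed result if you need it again.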

In [11]:
test_paragraphs = [
"""
Adelaide is the capital city of South Australia, the state's largest city and the fifth-most populous city in Australia.
"Adelaide" may refer to either Greater Adelaide (including the Adelaide Hills) or the Adelaide city centre.
The demonym Adelaidean is used to denote the city and the residents of Adelaide. The Traditional Owners of the Adelaide
region are the Kaurna people. The area of the city centre and surrounding parklands is called Tarndanya in the Kaurna language.

Adelaide is situated on the Adelaide Plains north of the Fleurieu Peninsula, between the Gulf St Vincent in the west and
the Mount Lofty Ranges in the east. Its metropolitan area extends 20 km (12 mi) from the coast to the foothills of
the Mount Lofty Ranges, and stretches 96 km (60 mi) from Gawler in the north to Sellicks Beach in the south.
""",
"""
Amazon Elastic Block Store (Amazon EBS) provides block level storage volumes for use with EC2 instances. EBS volumes behave like raw, unformatted block devices. You can mount these volumes as devices on your instances. EBS volumes that are attached to an instance are exposed as storage volumes that persist independently from the life of the instance. You can create a file system on top of these volumes, or use them in any way you would use a block device (such as a hard drive). You can dynamically change the configuration of a volume attached to an instance.

We recommend Amazon EBS for data that must be quickly accessible and requires long-term persistence. EBS volumes are particularly well-suited for use as the primary storage for file systems, databases, or for any applications that require fine granular updates and access to raw, unformatted, block-level storage. Amazon EBS is well suited to both database-style applications that rely on random reads and writes, and to throughput-intensive applications that perform long, continuous reads and writes.
""",
"""
Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases. 
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition. 
All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input. 
Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend's Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages.
"""
]

In [12]:
prompt = "Ask a question which is related to the following text, but cannot be answered based on the text. Text: {context}"

# TODO
parameters = {
    "max_length": 30,           # restrict the length of the answer
    "num_return_sequences": 3,
    "num_beams": 10,
    # "seed": 1, # 42, #146,                # for reproducibility
    # "do_sample": True,
    # "top_k": 50,
    # "top_p": 0.95,
    # "temperature": 1.3,
}

def generate_questions(endpoint_name, text):
    expanded_prompt = prompt.replace("{context}", text)
    payload = {"text_inputs": expanded_prompt, **parameters}
    query_response = query_endpoint_with_json_payload(payload, endpoint_name=endpoint_name)
    generated_texts = parse_response_multiple_texts(query_response)
    for i, generated_text in enumerate(generated_texts):
        print(f"Response {i}: {generated_text}{newline}")
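
The request sent to each endpoint is a single JSON object: the expanded prompt under `text_inputs`, plus the generation parameters as top-level keys. The sketch below builds that payload without calling an endpoint. `build_payload` is a hypothetical helper. Its sanity check assumes the container forwards these parameters to Hugging Face text generation, where beam search cannot return more sequences than it has beams.

```python
prompt = (
    "Ask a question which is related to the following text, "
    "but cannot be answered based on the text. Text: {context}"
)

parameters = {"max_length": 30, "num_return_sequences": 3, "num_beams": 10}


def build_payload(prompt_template, text, generation_parameters):
    """Expand the prompt template and merge in the generation parameters."""
    num_sequences = generation_parameters.get("num_return_sequences", 1)
    num_beams = generation_parameters.get("num_beams", 1)
    sampling = generation_parameters.get("do_sample", False)
    # Assumption: without sampling, at most num_beams sequences can be returned
    if not sampling and num_sequences > num_beams:
        raise ValueError("num_return_sequences must not exceed num_beams")
    expanded_prompt = prompt_template.replace("{context}", text)
    return {"text_inputs": expanded_prompt, **generation_parameters}


payload = build_payload(
    prompt, "Adelaide is the capital city of South Australia.", parameters
)
print(payload)
```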

In [13]:
print(f"{bold}Prompt:{unbold} {repr(prompt)}")
for paragraph in test_paragraphs:
    print("-" * 80)
    print(paragraph)
    print("-" * 80)
    print(f"{bold}pre-trained{unbold}")
    generate_questions(pre_trained_name, paragraph)
    print(f"{bold}fine-tuned{unbold}")
    generate_questions(fine_tuned_name, paragraph)

Prompt: 'Ask a question which is related to the following text, but cannot be answered based on the text. Text: {context}'
--------------------------------------------------------------------------------

Adelaide is the capital city of South Australia, the state's largest city and the fifth-most populous city in Australia.
"Adelaide" may refer to either Greater Adelaide (including the Adelaide Hills) or the Adelaide city centre.
The demonym Adelaidean is used to denote the city and the residents of Adelaide. The Traditional Owners of the Adelaide
region are the Kaurna people. The area of the city centre and surrounding parklands is called Tarndanya in the Kaurna language.

Adelaide is situated on the Adelaide Plains north of the Fleurieu Peninsula, between the Gulf St Vincent in the west and
the Mount Lofty Ranges in the east. Its metropolitan area extends 20 km (12 mi) from the coast to the foothills of
the Mount Lofty Ranges, and stretches 96 km (60 mi) from Gawler in the no

In [None]:
# Delete resources
pre_trained_predictor.delete_model()
pre_trained_predictor.delete_endpoint()
fine_tuned_predictor.delete_model()
fine_tuned_predictor.delete_endpoint()
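
If one of the deletions above fails (for example, because a resource was already removed), the remaining calls are skipped and an endpoint may keep running. The following sketch attempts every deletion and collects the failures instead of stopping at the first one. The `cleanup` helper is hypothetical. It relies only on the `delete_model` and `delete_endpoint` methods already used above.

```python
def cleanup(predictors):
    """Delete the model and endpoint behind each predictor.

    predictors maps a label to a predictor object.
    Returns a list of (label, action, error) tuples for deletions that failed.
    """
    failures = []
    for label, predictor in predictors.items():
        for action in ("delete_model", "delete_endpoint"):
            try:
                getattr(predictor, action)()
            except Exception as error:
                # Keep going so that one failure does not leave other resources running
                failures.append((label, action, repr(error)))
    return failures


# Example usage with the predictors created in this notebook:
# failures = cleanup({"pre-trained": pre_trained_predictor, "fine-tuned": fine_tuned_predictor})
# print(failures or "All resources deleted")
```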