![image source: https://prompthero.com/prompt/967d64692e0](images/2023-05-10-amazon-jumpstart-text2img-stablediffusion.jpg)

## Credits
This notebook takes inspiration from the [AWS Machine Learning Blog](https://aws.amazon.com/blogs/machine-learning/) post that announced the availability of [Stable Diffusion V1](https://stability.ai/blog/stable-diffusion-announcement) and [Stable Diffusion V2](https://stability.ai/blog/stable-diffusion-v2-release) models on [SageMaker JumpStart](https://aws.amazon.com/sagemaker/jumpstart/). You can find the original post here: [Generate images from text with the stable diffusion model on Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/generate-images-from-text-with-the-stable-diffusion-model-on-amazon-sagemaker-jumpstart/).

## Introduction

**What Is Amazon SageMaker?**

*Amazon SageMaker is a fully managed machine learning service*. With SageMaker, data scientists and developers can quickly and easily build and train machine learning models, and then directly deploy them into a production-ready hosted environment. It provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you don't have to manage servers. It also provides common machine learning algorithms that are optimized to run efficiently against extremely large data in a distributed environment. With native support for bring-your-own-algorithms and frameworks, SageMaker offers flexible distributed training options that adjust to your specific workflows. You can deploy a model into a secure and scalable environment by launching it with a few clicks from SageMaker Studio or the SageMaker console.

::: {.callout-note}

**Amazon SageMaker** introduction is taken from [SageMaker Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html). You may use *Developer Guide* for more details including [Get Started with Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/gs.html).

:::

**What is SageMaker JumpStart?**

*SageMaker JumpStart is the machine learning (ML) hub of SageMaker that provides hundreds of built-in algorithms, pre-trained models, and end-to-end solution templates to help you quickly get started with ML*. JumpStart also provides solution templates that set up infrastructure for common use cases, and executable example notebooks for machine learning with SageMaker.

::: {.callout-note}

**SageMaker JumpStart** introduction is taken from [SageMaker JumpStart Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html). You may use *Developer Guide* for more details including [Get Started](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html) and one-click, end-to-end [Solution Templates](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-solutions.html) for many common machine learning use cases.

:::

**What is Stable Diffusion?**

*Stable Diffusion is a text-to-image model that enables you to create photorealistic images from just a text prompt.* A diffusion model trains by learning to remove noise that was added to a real image. This de-noising process generates a realistic image. These models can also generate images from text alone by conditioning the generation process on the text. For instance, Stable Diffusion is a latent diffusion model: it learns to recognize shapes in a pure noise image and gradually brings these shapes into focus if they match the words in the input text.
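
To make the noise-removal idea concrete, here is a toy sketch in plain Python. It assumes the forward-noising formula commonly used by DDPM-style diffusion models, `x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise`; the function names and the four-value "image" are made up for illustration. A real model has to predict the noise with a neural network (and Stable Diffusion does this in a latent space rather than on pixels), whereas this sketch simply reuses the known noise to show that removing it recovers the original signal.

```python
import math
import random


def add_noise(x0, noise, alpha_bar):
    """Forward step: blend the clean signal with Gaussian noise."""
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [a * x + b * n for x, n in zip(x0, noise)]


def remove_noise(xt, predicted_noise, alpha_bar):
    """Reverse step: recover the clean signal given a noise estimate."""
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [(x - b * n) / a for x, n in zip(xt, predicted_noise)]


rng = random.Random(0)
image = [0.1, 0.5, 0.9, 0.3]  # a made-up 4-value "image"
noise = [rng.gauss(0.0, 1.0) for _ in image]

noisy = add_noise(image, noise, alpha_bar=0.5)
# A trained model would predict `noise` from `noisy`; here we cheat and reuse it.
restored = remove_noise(noisy, noise, alpha_bar=0.5)
```

With a perfect noise estimate, `restored` matches `image` up to floating-point error; the quality of a real model depends on how well it estimates that noise.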

**How does JumpStart simplify it?**

Training and deploying large models, and running inference on models such as Stable Diffusion, is often challenging and can run into issues such as CUDA out-of-memory errors, exceeded payload size limits, and so on. *JumpStart* simplifies this process by providing ready-to-use scripts that have been robustly tested. Furthermore, it provides guidance on each step of the process, including the recommended instance types, how to select parameters to guide the image generation process, prompt engineering, etc. Moreover, you can deploy and run inference on any of the 80+ Diffusion models from JumpStart without having to write any code of your own.

::: {.callout-note}

The Stable Diffusion introduction is taken from the [Amazon JumpStart Text To Image](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_text_to_image/Amazon_JumpStart_Text_To_Image.ipynb) notebook. For a more in-depth intro on this topic, I suggest reading *Jay Alammar's* [The Illustrated Stable Diffusion](https://jalammar.github.io/illustrated-stable-diffusion/) guide.

::: 

## Environment
This notebook was created with `Amazon SageMaker Studio` running on an `ml.t3.medium` instance with the `Python 3 (Base Python 2.0)` kernel.

* **GitHub**: [2023-05-10-amazon-jumpstart-text2img-stablediffusion.ipynb](https://github.com/hassaanbinaslam/myblog/blob/main/posts/2023-05-10-amazon-jumpstart-text2img-stablediffusion.ipynb)

![](images/2023-05-10-amazon-jumpstart-text2img-stablediffusion/notebook-env.png)

For deploying models and running inference, I recommend using `ml.p3.2xlarge` or `ml.g4dn.2xlarge`. For this notebook, I have relied on an `ml.p3.2xlarge` instance. When generating multiple images per prompt, `ml.g4dn.2xlarge` can be slow and you may get timeout errors as highlighted below.

(image of timeout)

**Important Note**

By default, `ml.p3.2xlarge` and `ml.g4dn.2xlarge` may not be available in your AWS account. To get access, you need to submit a `Request quota increase` ticket from *Service Quotas > AWS services > Amazon SageMaker > ml.p3.2xlarge for endpoint usage*. It may take up to 24 hours for the request to be approved.
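
If you prefer to check the quota from code before opening a ticket, the following is a minimal sketch. It assumes the Service Quotas API exposes SageMaker quotas under the service code `sagemaker` with names such as `ml.p3.2xlarge for endpoint usage`; the helper names `find_endpoint_quota` and `list_sagemaker_quotas` are mine, and the sample entries are hand-written stand-ins shaped like the API response, not real account data. Note that the API may only report a value for some quotas, so treat the result as a hint rather than a definitive answer.

```python
def find_endpoint_quota(quotas, instance_type):
    """Return the quota entry for '<instance_type> for endpoint usage', or None."""
    wanted = f"{instance_type} for endpoint usage"
    for quota in quotas:
        if quota.get("QuotaName") == wanted:
            return quota
    return None


def list_sagemaker_quotas(region_name=None):
    """Fetch all SageMaker quotas (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("service-quotas", region_name=region_name)
    paginator = client.get_paginator("list_service_quotas")
    quotas = []
    for page in paginator.paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas


# Offline example with hand-written entries shaped like the API response.
sample_quotas = [
    {"QuotaName": "ml.p3.2xlarge for endpoint usage", "Value": 0.0},
    {"QuotaName": "ml.g4dn.2xlarge for endpoint usage", "Value": 1.0},
]
p3_quota = find_endpoint_quota(sample_quotas, "ml.p3.2xlarge")
needs_increase = p3_quota is None or p3_quota["Value"] < 1
```

In a real account you would pass `list_sagemaker_quotas()` instead of `sample_quotas`; a value below 1 means an endpoint on that instance type cannot be started until the quota increase is approved.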

::: {.callout-tip}
**Why this timeout exception?**

When you deploy a model into production using Amazon SageMaker hosting services, you get an API endpoint. Your client applications use this API to get inferences from the model hosted at the specified endpoint. These API endpoints have a hard limit of 60 seconds per request.

*A customer’s model containers must respond to requests within 60 seconds. The model itself can have a maximum processing time of 60 seconds before responding to invocations. If your model is going to take 50-60 seconds of processing time, the SDK socket timeout should be set to be 70 seconds.*

To read more about it refer to the documentation [SageMakerRuntime.Client.invoke_endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime/client/invoke_endpoint.html)

**What to do when our model requires more than 60 seconds for inference?**

For such cases AWS recommends using *Amazon SageMaker Asynchronous Inference*. This option is ideal for inferences with large payload sizes (up to 1GB) and/or long processing times (up to 15 minutes). To read more about it use the following references.

* [Amazon SageMaker Asynchronous Inference announcement](https://aws.amazon.com/about-aws/whats-new/2021/08/amazon-sagemaker-asynchronous-new-inference-option/)
* [How Asynchronous inference works?](https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html)
* [GitHub Issue: Increasing the timeout for SageMaker InvokeEndpoint](https://github.com/aws/sagemaker-python-sdk/issues/1119#issuecomment-904414810)

:::
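
The two options from the tip above can be sketched as follows. The 60-second and 15-minute limits come from the text quoted above; the helper names `choose_inference_mode` and `make_runtime_client` are hypothetical. The client helper uses `botocore.config.Config` to raise the socket read timeout to 70 seconds, as the quoted documentation suggests, and disables automatic retries so that a slow request is not silently sent twice (that last choice is my assumption, not something the source prescribes).

```python
SYNC_LIMIT_SECONDS = 60        # hard limit for real-time endpoints
ASYNC_LIMIT_SECONDS = 15 * 60  # processing limit for asynchronous inference


def choose_inference_mode(expected_seconds):
    """Pick 'realtime' or 'async' from the expected processing time."""
    if expected_seconds <= SYNC_LIMIT_SECONDS:
        return "realtime"
    if expected_seconds <= ASYNC_LIMIT_SECONDS:
        return "async"
    raise ValueError("processing time exceeds the asynchronous inference limit")


def make_runtime_client(read_timeout=70, region_name=None):
    """Create a sagemaker-runtime client with a longer socket read timeout."""
    import boto3  # lazy imports: only needed when actually calling AWS
    from botocore.config import Config

    config = Config(read_timeout=read_timeout, retries={"max_attempts": 0})
    return boto3.client("sagemaker-runtime", region_name=region_name, config=config)


mode_fast = choose_inference_mode(55)   # fits within the real-time limit
mode_slow = choose_inference_mode(180)  # needs asynchronous inference
```

A client built this way can be handed to the SageMaker Python SDK through `sagemaker.Session(sagemaker_runtime_client=...)`, so that predictors created from that session use the longer timeout. It does not lift the 60-second server-side limit; it only keeps the client from giving up first.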

## Set up the environment

There are some initial steps required to execute this notebook.

In [3]:
%%capture
!pip install --upgrade sagemaker
!pip install matplotlib
!pip install watermark

# 1. Get the latest version of SageMaker Python SDK. https://github.com/aws/sagemaker-python-sdk
# 2. Install matplotlib. https://github.com/matplotlib/matplotlib
# 3. Install watermark. An IPython magic extension for printing date and time stamps, version numbers, and hardware information. https://github.com/rasbt/watermark

In [8]:
%load_ext watermark

# To load the watermark magic, execute the following line in your IPython notebook or current IPython shell
# to learn more about the usage: https://github.com/rasbt/watermark/blob/master/docs/watermark.ipynb

The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark


In [19]:
%watermark -v -m -p numpy,matplotlib,boto3,json,sagemaker

# watermark the notebook environment
# the watermark step is optional. It is done to make the environment details more transparent

Python implementation: CPython
Python version       : 3.8.12
IPython version      : 8.12.0

numpy     : 1.24.3
matplotlib: 3.7.1
boto3     : 1.26.111
json      : 2.0.9
sagemaker : 2.153.0

Compiler    : GCC 10.2.1 20210110
OS          : Linux
Release     : 4.14.311-233.529.amzn2.x86_64
Machine     : x86_64
Processor   : 
CPU cores   : 2
Architecture: 64bit



The first step is to initialize the SageMaker session. This session manages interactions with the Amazon SageMaker APIs and any other AWS services needed. It provides convenient methods for manipulating entities and resources that Amazon SageMaker uses, such as training jobs, endpoints, and input datasets in S3. AWS service calls are delegated to an underlying Boto3 session, which by default is initialized using the AWS configuration chain. When you make an Amazon SageMaker API call that accesses an S3 bucket location and one is not specified, the Session creates a default bucket based on a naming convention which includes the current AWS account ID.

To read more about *SageMaker Session* refer to the documentation [sagemaker.session.Session](https://sagemaker.readthedocs.io/en/stable/api/utility/session.html#sagemaker.session.Session)

In [20]:
import sagemaker, boto3, json
from sagemaker import get_execution_role

aws_role = get_execution_role()
aws_region = boto3.Session().region_name
sagemaker_session = sagemaker.Session()

aws_region

'us-east-1'
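
The default-bucket naming convention mentioned above can be illustrated with a small sketch. It assumes the convention `sagemaker-{region}-{account_id}` used by the SageMaker Python SDK; the account ID below is a placeholder, and both function names are hypothetical helpers. The second function needs the `sagemaker` package and AWS credentials, so it is only defined here and not called.

```python
def expected_default_bucket(region, account_id):
    """Build the bucket name the session is expected to use by default."""
    return f"sagemaker-{region}-{account_id}"


def actual_default_bucket():
    """Ask the SDK for the default bucket (creates it if it does not exist)."""
    import sagemaker  # imported lazily: requires the SDK and AWS credentials

    return sagemaker.Session().default_bucket()


# Placeholder account ID, used only to show the shape of the name.
example_bucket = expected_default_bucket("us-east-1", "123456789012")
```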

## Define functions to deploy models and get inference endpoints

In this section, we define some functions that make it easy to deploy JumpStart pretrained models and get inference endpoints for them.

In [26]:
## Select the diffusion model

# deploy the model. it may take 10 minutes to start
from sagemaker import image_uris, model_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base


def get_model_endpoint(model_id, sagemaker_session, instance_type="ml.p3.2xlarge"):
    endpoint_name = name_from_base(f"jumpstart-example-{model_id}")

    # Please use ml.g5.2xlarge instance type if it is available in your region. ml.g5.2xlarge has 24GB GPU compared to 16GB in ml.p3.2xlarge and supports generation of larger and better quality images.

    # inference_instance_type = "ml.g4dn.2xlarge"
    # inference_instance_type = "ml.g5.2xlarge"
    # inference_instance_type = "ml.p3.2xlarge"
    inference_instance_type = instance_type

    # Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.
    deploy_image_uri = image_uris.retrieve(
        region=None,
        framework=None,  # automatically inferred from model_id
        image_scope="inference",
        model_id=model_id,
        model_version="*",
        instance_type=inference_instance_type,
    )

    # Retrieve the model uri. This includes the pre-trained model and parameters as well as the inference scripts.
    # This includes all dependencies and scripts for model loading, inference handling etc..
    model_uri = model_uris.retrieve(
        model_id=model_id, model_version="*", model_scope="inference"
    )

    # To increase the maximum response size from the endpoint.
    env = {
        "MMS_MAX_RESPONSE_SIZE": "20000000",
    }

    # Create the SageMaker model instance
    model = Model(
        image_uri=deploy_image_uri,
        model_data=model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=endpoint_name,
        env=env,
    )

    # deploy the Model. Note that we need to pass Predictor class when we deploy model through Model class,
    # for being able to run inference through the sagemaker API.
    return model.deploy(
        initial_instance_count=1,
        instance_type=inference_instance_type,
        predictor_cls=Predictor,
        endpoint_name=endpoint_name,
        sagemaker_session=sagemaker_session,
    )

def remove_model_endpoint(model_predictor):
    # Delete the SageMaker endpoint
    model_predictor.delete_model()
    model_predictor.delete_endpoint()
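
Because GPU endpoints are billed while they are running, it can help to pair the deploy and remove helpers so that cleanup always happens, even when a query fails. The following is a minimal sketch of that idea using a context manager; `temporary_endpoint` is a hypothetical helper, and the demonstration uses stand-in functions rather than real AWS calls.

```python
from contextlib import contextmanager


@contextmanager
def temporary_endpoint(deploy, remove, *args, **kwargs):
    """Deploy an endpoint, yield its predictor, and always clean up."""
    predictor = deploy(*args, **kwargs)
    try:
        yield predictor
    finally:
        remove(predictor)


# Offline demonstration with stand-in functions instead of real AWS calls.
events = []


def fake_deploy(model_id):
    events.append(f"deploy {model_id}")
    return {"model_id": model_id}


def fake_remove(predictor):
    events.append(f"remove {predictor['model_id']}")


try:
    with temporary_endpoint(fake_deploy, fake_remove, "demo-model") as predictor:
        events.append("query")
        raise RuntimeError("simulated inference failure")
except RuntimeError:
    pass  # the endpoint was still removed before the error propagated
```

With the helpers defined above, the same pattern would read `with temporary_endpoint(get_model_endpoint, remove_model_endpoint, model_id, sagemaker_session) as model_predictor: ...`.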

## Define functions to query endpoints and display results

In this section, we define some functions that will be used to query the inference endpoint and display the results.

In [None]:
# following code is adapted from the notebook: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_text_to_image/JumpStart_Stable_Diffusion_Inference_Only.ipynb
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from io import BytesIO
import base64
import json
import re

# path to save the generated images
image_path = './images/2023-05-10-amazon-jumpstart-text2img-stablediffusion/generated/'

def display_img(img, filename):
    """Display and save the hallucinated image."""
    
    plt.figure(figsize=(7, 7), frameon=False)
    plt.imshow(np.array(img))
    plt.axis('off')
    plt.savefig(image_path+filename, bbox_inches='tight') # comment it to NOT save generated images
    plt.show()
    
    
def query_endpoint_with_json_payload(model_predictor, payload, content_type, accept):
    """Query the model predictor with json payload."""

    encoded_payload = json.dumps(payload).encode("utf-8")

    query_response = model_predictor.predict(
        encoded_payload,
        {
            "ContentType": content_type,
            "Accept": accept,
        },
    )
    return query_response

def display_encoded_images(generated_images, title):
    """Decode the images, convert them to RGB format, and display them.

    Args:
    generated_images: a list of JPEG images as bytes with base64 encoding.
    title: the prompt text, used to build the saved file names.
    """

    for count, generated_image in enumerate(generated_images):
        generated_image_decoded = BytesIO(base64.b64decode(generated_image.encode()))
        generated_image_rgb = Image.open(generated_image_decoded).convert("RGB")
        
        # prepare filename from the prompt to store the image
        temp = re.sub(r'[^a-zA-Z0-9\s]+', '', title) # remove special chars from prompt
        temp = temp.replace(' ', '-') # turn spaces to '-'
        temp = temp[:50] # limit the length of the string to 50 chars
        filename = temp + str(count) + '.jpg' # add count and extension to the image name
        
        # display the generated image
        display_img(generated_image_rgb, filename)

def parse_response_multiple_images(query_response):
    """Parse response and return generated image and the prompt"""

    response_dict = json.loads(query_response)
    return response_dict["generated_images"], response_dict["prompt"]


def query_model_and_display(payload, model_predictor):
    query_response = query_endpoint_with_json_payload(
        model_predictor, payload, "application/json", "application/json;jpeg"
    )
    generated_images, prompt = parse_response_multiple_images(query_response)

    display_encoded_images(generated_images, prompt)
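
To see what the parsing and file-naming steps do without calling an endpoint, here is a small offline sketch. It assumes the response body is JSON with the `generated_images` (base64 strings) and `prompt` keys used above; the response is faked with placeholder bytes rather than a real JPEG, and `prompt_to_filename` and `decode_images` are hypothetical helpers. The file-name helper is a slight variation on the logic above: it collapses runs of whitespace and puts a `-` before the image index.

```python
import base64
import json
import re


def prompt_to_filename(prompt, index, max_len=50, ext=".jpg"):
    """Turn a prompt into a filesystem-friendly image name."""
    slug = re.sub(r"[^a-zA-Z0-9\s]+", "", prompt)  # drop special characters
    slug = re.sub(r"\s+", "-", slug.strip())       # whitespace runs -> single '-'
    return f"{slug[:max_len]}-{index}{ext}"


def decode_images(response_body):
    """Return (list of raw image bytes, prompt) from a JSON response body."""
    response = json.loads(response_body)
    images = [base64.b64decode(img) for img in response["generated_images"]]
    return images, response["prompt"]


# Build a fake response with placeholder bytes instead of a real JPEG.
fake_body = json.dumps(
    {
        "generated_images": [base64.b64encode(b"not-a-real-jpeg").decode()],
        "prompt": "A photo of a cat, 4k!",
    }
).encode("utf-8")

images, prompt = decode_images(fake_body)
filenames = [prompt_to_filename(prompt, i) for i in range(len(images))]
```

Truncating the slug to a fixed length keeps file names manageable for long prompts, at the cost of possible collisions between prompts that share the same first 50 characters.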

## Selecting the SageMaker pretrained Diffusion model and Prompt Engineering

### Model selection
SageMaker provides many pretrained models. Use the following link to select a model:
https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html

In this notebook we will use the following models

### Prompt engineering

Writing a good prompt can sometimes be an art. It is often difficult to predict whether a certain prompt will yield a satisfactory image with a given model. However, there are certain templates that have been observed to work. Broadly, a prompt can be broken down into three pieces: (i) type of image (photograph/sketch/painting etc.), (ii) description (subject/object/environment/scene etc.) and (iii) the style of the image (realistic/artistic/type of art etc.). You can change each of the three parts individually to generate variations of an image. Adjectives are known to play a significant role in the image generation process. Adding more details also helps.

To generate a realistic image, you can use phrases such as “a photo of”, “a photograph of”, “realistic” or “hyper realistic”. To generate images in the style of an artist, you can use phrases like “by Pablo Picasso” or “oil painting by Rembrandt” or “landscape art by Frederic Edwin Church” or “pencil drawing by Albrecht Dürer”. You can also combine different artists. To generate an artistic image by category, you can add the art category to the prompt, such as “lion on a beach, abstract”. Some other categories include “oil painting”, “pencil drawing”, “pop art”, “digital art”, “anime”, “cartoon”, “futurism”, “watercolor”, “manga” etc. You can also include details such as lighting or camera lens, such as 35mm wide lens or 85mm wide lens, and details about the framing (portrait/landscape/close up etc.).
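
The three-part template described above can be sketched as a small helper that assembles a prompt from its pieces, which makes it easy to vary one part at a time. The function `build_prompt` and its parameter names are hypothetical; the example phrases are taken from the text above.

```python
def build_prompt(image_type, description, style=None, details=()):
    """Join the three prompt parts plus optional extra details."""
    parts = [f"{image_type} of {description}"]
    if style:
        parts.append(style)
    parts.extend(details)  # e.g. lens, lighting, framing
    return ", ".join(parts)


realistic = build_prompt(
    "a photo",
    "a lion on a beach",
    "hyper realistic",
    details=["35mm wide lens", "close up"],
)
artistic = build_prompt("oil painting", "a lion on a beach", "by Rembrandt")
```

Here `realistic` becomes `a photo of a lion on a beach, hyper realistic, 35mm wide lens, close up`, while `artistic` keeps the same subject but swaps the image type and style.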
