![image source: https://prompthero.com/prompt/967d64692e0](images/2023-05-10-amazon-jumpstart-text2img-stablediffusion.jpg)

## Credits
This notebook took inspiration from the [AWS Machine Learning Blog](https://aws.amazon.com/blogs/machine-learning/) post when they announced the availability of [Stable Diffusion V1](https://stability.ai/blog/stable-diffusion-announcement) and [Stable Diffusion V2](https://stability.ai/blog/stable-diffusion-v2-release) models on [SageMaker JumpStart](https://aws.amazon.com/sagemaker/jumpstart/). You may find the original post here [Generate images from text with the stable diffusion model on Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/generate-images-from-text-with-the-stable-diffusion-model-on-amazon-sagemaker-jumpstart/).

## Introduction

### What Is Amazon SageMaker?

*Amazon SageMaker is a fully managed machine learning service*. With SageMaker, data scientists and developers can quickly and easily build and train machine learning models, and then directly deploy them into a production-ready hosted environment. It provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you don't have to manage servers. It also provides common machine learning algorithms that are optimized to run efficiently against extremely large data in a distributed environment. With native support for bring-your-own-algorithms and frameworks, SageMaker offers flexible distributed training options that adjust to your specific workflows. You can deploy a model into a secure and scalable environment by launching it with a few clicks from SageMaker Studio or the SageMaker console.

::: {.callout-note}

The **Amazon SageMaker** introduction is taken from the [SageMaker Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html). You may use the *Developer Guide* for more details, including [Get Started with Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/gs.html).

:::

### What is SageMaker JumpStart?

*SageMaker JumpStart is the machine learning (ML) hub of SageMaker that provides hundreds of built-in algorithms, pre-trained models, and end-to-end solution templates to help you quickly get started with ML*. JumpStart also provides solution templates that set up infrastructure for common use cases, and executable example notebooks for machine learning with SageMaker.

::: {.callout-note}

The **SageMaker JumpStart** introduction is taken from the [SageMaker JumpStart Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html). You may use the *Developer Guide* for more details, including [Get Started](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html) and one-click, end-to-end [Solution Templates](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-solutions.html) for many common machine learning use cases.

:::

### What is Stable Diffusion?

*Stable Diffusion is a text-to-image model that enables you to create photorealistic images from just a text prompt.* A diffusion model trains by learning to remove noise that was added to a real image. This de-noising process generates a realistic image. These models can also generate images from text alone by conditioning the generation process on the text. For instance, Stable Diffusion is a latent diffusion model that learns to recognize shapes in a pure noise image and gradually brings these shapes into focus if they match the words in the input text.
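To make "learning to remove noise" more concrete, here is a minimal NumPy sketch of the *forward* (noising) process used by DDPM-style diffusion models; the denoiser is trained to undo exactly this corruption. The linear beta schedule, its endpoints, and the toy array shape are illustrative assumptions, not the settings of any JumpStart model. In Stable Diffusion, this noising is applied to the compressed latent representation produced by an autoencoder rather than to raw pixels.

```python
import numpy as np


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0, t, alpha_bars, rng):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alpha_bars[t]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    # During training, a network is asked to predict `eps` given (x_t, t)
    return x_t, eps


rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()

# Toy stand-in for an image latent (4 channels, 8x8 spatial size)
x0 = rng.uniform(-1.0, 1.0, size=(4, 8, 8))

x_early, _ = add_noise(x0, t=0, alpha_bars=alpha_bars, rng=rng)   # almost the original
x_late, _ = add_noise(x0, t=999, alpha_bars=alpha_bars, rng=rng)  # almost pure noise
```

Generation runs this in reverse: starting from pure noise, the model repeatedly predicts and removes noise, which is what `num_inference_steps` later controls.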

#### How does JumpStart simplify it?

Training and deploying large models and running inference on models such as Stable Diffusion is often challenging and can run into issues such as CUDA out-of-memory errors and exceeded payload size limits. *JumpStart* simplifies this process by providing ready-to-use scripts that have been robustly tested. Furthermore, it provides guidance on each step of the process, including the recommended instance types, how to select parameters to guide the image generation process, prompt engineering, etc. Moreover, you can deploy and run inference on any of the 80+ Diffusion models from JumpStart without writing any code of your own.

::: {.callout-note}

The Stable Diffusion introduction is taken from the [Amazon JumpStart Text To Image](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_text_to_image/Amazon_JumpStart_Text_To_Image.ipynb) notebook. For a more in-depth discussion of this topic, I suggest reading *Jay Alammar's* [The Illustrated Stable Diffusion](https://jalammar.github.io/illustrated-stable-diffusion/) guide.

::: 

## Environment
This notebook was created with `Amazon SageMaker Studio` running on an `ml.t3.medium` instance with the `Python 3 (Base Python 2.0)` kernel.

* **GitHub**: [2023-05-10-amazon-jumpstart-text2img-stablediffusion.ipynb](https://github.com/hassaanbinaslam/myblog/blob/main/posts/2023-05-10-amazon-jumpstart-text2img-stablediffusion.ipynb)

![](images/2023-05-10-amazon-jumpstart-text2img-stablediffusion/notebook-env.png)

For model deployment and inference, I recommend using the `ml.p3.2xlarge` or `ml.g4dn.2xlarge` instance type. I have used `ml.p3.2xlarge` for this notebook. When generating multiple images per prompt, `ml.g4dn.2xlarge` can be slow enough to hit the timeout error shown below.

::: {.callout-important}

By default, `ml.p3.2xlarge` and `ml.g4dn.2xlarge` may not be available in your AWS account. To get access, submit a `Request quota increase` from *Service Quotas > AWS services > Amazon SageMaker > ml.p3.2xlarge for endpoint usage*. A request may take up to 24 hours to be approved.

:::

(image of timeout)

::: {.callout-tip collapse="false"}
**Why this timeout exception?**

You get an API endpoint when you deploy a model into production using Amazon SageMaker hosting services. Your client applications use this API to get inferences from the model hosted at the specified endpoint. There is a 60-second hard limit on these API endpoints.

*A customer’s model containers must respond to requests within 60 seconds. The model itself can have a maximum processing time of 60 seconds before responding to invocations. If your model is going to take 50-60 seconds of processing time, the SDK socket timeout should be set to be 70 seconds.*

To read more about it, refer to the documentation for [SageMakerRuntime.Client.invoke_endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime/client/invoke_endpoint.html).

**What to do when your model requires more than 60 seconds for inference?**

For such cases, AWS recommends using *Amazon SageMaker Asynchronous Inference*. This option is ideal for inferences with large payload sizes (up to 1GB) or long processing times (up to 15 minutes). To read more about it, use the following references.

* [Amazon SageMaker Asynchronous Inference announcement](https://aws.amazon.com/about-aws/whats-new/2021/08/amazon-sagemaker-asynchronous-new-inference-option/)
* [How does Asynchronous inference work?](https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html)
* [GitHub Issue: Increasing the timeout for SageMaker InvokeEndpoint](https://github.com/aws/sagemaker-python-sdk/issues/1119#issuecomment-904414810)

:::

## Set up the environment

There are some initial steps required to execute this notebook. They mainly involve installing the needed packages and initializing the SageMaker session.

```python
%%capture
!pip install --upgrade sagemaker
!pip install matplotlib
!pip install watermark

# 1. Get the latest version of SageMaker Python SDK. https://github.com/aws/sagemaker-python-sdk
# 2. Install matplotlib. https://github.com/matplotlib/matplotlib
# 3. Install watermark. An IPython magic extension for printing date and time stamps, version numbers, and hardware information. https://github.com/rasbt/watermark
```

The [watermark](https://github.com/rasbt/watermark) extension is a great utility for exposing package, kernel, and hardware information. This step is optional and you may skip it, but it is a great way to report execution environment information and make it more transparent.

```python
%load_ext watermark

# To load the watermark magic, execute the following line in your IPython notebook or current IPython shell
# to learn more about the usage: https://github.com/rasbt/watermark/blob/master/docs/watermark.ipynb
```

```
The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark
```


```python
%watermark -v -m -p numpy,matplotlib,boto3,json,sagemaker

# watermark the notebook environment
# watermark step is optional. This is done to make the environment details more transparent
```

```
Python implementation: CPython
Python version       : 3.8.12
IPython version      : 8.12.0

numpy     : 1.24.3
matplotlib: 3.7.1
boto3     : 1.26.111
json      : 2.0.9
sagemaker : 2.153.0

Compiler    : GCC 10.2.1 20210110
OS          : Linux
Release     : 4.14.311-233.529.amzn2.x86_64
Machine     : x86_64
Processor   : 
CPU cores   : 2
Architecture: 64bit
```


Next, we will initialize the SageMaker session. This session manages interactions with the Amazon SageMaker APIs and any other AWS services needed. It provides convenient methods for manipulating entities and resources that Amazon SageMaker uses, such as training jobs, endpoints, and input datasets in S3. AWS service calls are delegated to an underlying Boto3 session, which is initialized using the AWS configuration chain by default. When you make an Amazon SageMaker API call that accesses an S3 bucket location, and one is not specified, the Session creates a default bucket based on a naming convention that includes the current AWS account ID.

To read more about *SageMaker Session*, refer to the documentation for [sagemaker.session.Session](https://sagemaker.readthedocs.io/en/stable/api/utility/session.html#sagemaker.session.Session).

```python
import sagemaker, boto3
from sagemaker import get_execution_role

aws_role = get_execution_role()
aws_region = boto3.Session().region_name
sagemaker_session = sagemaker.Session()

aws_region
```

```
'us-east-1'
```

## Define functions to deploy models and get inference endpoints

In this section, we will define some functions that make it easy to deploy JumpStart pre-trained models and get inference endpoints for them.

```python
#| code-fold: true
#| code-summary: "Show the code"
from sagemaker import image_uris, model_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base


def get_model_endpoint(
    model_id, sagemaker_session, instance_type="ml.p3.2xlarge", model_version="*"
):
    """Deploy the model on the provided instance type and return the inference endpoint.

    Uses the `aws_role` execution role defined in the session setup cell.
    `model_version="*"` selects the latest available version of the model.
    """

    # Get the endpoint name from the provided 'model_id'
    endpoint_name = name_from_base(f"jumpstart-example-{model_id}")

    # recommended `inference_instance_type` are
    # "ml.g4dn.2xlarge"
    # "ml.g5.2xlarge"
    # "ml.p3.2xlarge"
    inference_instance_type = instance_type

    # Retrieve the inference docker container uri.
    # This is the base HuggingFace container image for the given model_id.
    deploy_image_uri = image_uris.retrieve(
        region=None,
        framework=None,  # automatically inferred from model_id
        image_scope="inference",
        model_id=model_id,
        model_version=model_version,
        instance_type=inference_instance_type,
    )

    # Retrieve the model uri. This includes the pre-trained model and parameters as well as the inference scripts.
    # This includes all dependencies and scripts for model loading, inference handling etc..
    model_uri = model_uris.retrieve(
        model_id=model_id, model_version=model_version, model_scope="inference"
    )

    # To increase the maximum response size from the endpoint.
    # Response in our case will be generated images
    env = {
        "MMS_MAX_RESPONSE_SIZE": "20000000",
    }

    # Create the SageMaker model instance
    model = Model(
        image_uri=deploy_image_uri,
        model_data=model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=endpoint_name,
        env=env,
    )

    # Deploy the Model and return Inference endpoint. Note that we need to pass Predictor class when we deploy model through Model class,
    # for being able to run inference through the sagemaker API.
    return model.deploy(
        initial_instance_count=1,
        instance_type=inference_instance_type,
        predictor_cls=Predictor,
        endpoint_name=endpoint_name,
        sagemaker_session=sagemaker_session,
    )


def remove_model_endpoint(model_predictor):
    """Remove the model and deployed inference endpoint"""
    model_predictor.delete_model()
    model_predictor.delete_endpoint()
```

## Define functions to query endpoints and display results

In this section, we will define some functions that we will use to query the inference endpoint and display the results.

```python
#| code-fold: true
#| code-summary: "Show the code"
import base64
import json
import os
import re
from io import BytesIO

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

# Define a path to save the generated images and make sure it exists
image_path = "./images/2023-05-10-amazon-jumpstart-text2img-stablediffusion/generated/"
os.makedirs(image_path, exist_ok=True)


def display_and_save_img(image, filename):
    """Display and save the hallucinated image."""

    plt.figure(figsize=(7, 7), frameon=False)
    plt.imshow(np.array(image))
    plt.axis("off")
    plt.savefig(
        image_path + filename, bbox_inches="tight"
    )  # comment it to NOT save generated images
    plt.show()


def query_endpoint_with_json_payload(model_predictor, payload, content_type, accept):
    """Query the model predictor with json payload."""

    encoded_payload = json.dumps(payload).encode("utf-8")

    query_response = model_predictor.predict(
        encoded_payload, {"ContentType": content_type, "Accept": accept,},
    )
    return query_response


def display_encoded_images(generated_images, prompt):
    """Decode the images, convert them to RGB format, and display them.

    Args:
        generated_images: a list of JPEG images as base64-encoded strings.
        prompt: text string used to generate the images.
    """

    for count, generated_image in enumerate(generated_images):
        generated_image_decoded = BytesIO(base64.b64decode(generated_image.encode()))
        generated_image_rgb = Image.open(generated_image_decoded).convert("RGB")

        # prepare filename to store the image from the prompt
        temp = re.sub(
            r"[^a-zA-Z0-9\s]+", "", prompt
        )  # remove special chars from prompt

        temp = temp.replace(" ", "-")  # turn spaces to '-'
        temp = temp[:50]  # limit the length of the string to 50 chars

        filename = (
            temp + str(count) + ".jpg"
        )  # add count and extension to the image name

        # display the generated image
        display_and_save_img(generated_image_rgb, filename)


def parse_response_multiple_images(query_response):
    """Parse response and return generated image and the prompt"""

    response_dict = json.loads(query_response)
    return response_dict["generated_images"], response_dict["prompt"]


def query_model_and_display(payload, model_predictor):
    """Send the payload to the endpoint, then decode, display, and save the returned images."""
    query_response = query_endpoint_with_json_payload(
        model_predictor, payload, "application/json", "application/json;jpeg"
    )
    generated_images, prompt = parse_response_multiple_images(query_response)

    display_encoded_images(generated_images, prompt)
```
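The helpers above assume a specific response format: JPEG bytes, base64-encoded, inside a JSON body with `generated_images` and `prompt` keys. As a minimal offline sketch of that round trip, the snippet below builds a made-up stand-in response (no endpoint is called; the red square image and its prompt are placeholders) and decodes it the same way `display_encoded_images` does. It needs only Pillow and the standard library.

```python
import base64
import json
from io import BytesIO

from PIL import Image

# Stand-in "generated" image, encoded as JPEG bytes and then as base64 text
fake_image = Image.new("RGB", (64, 64), color=(200, 30, 30))
buffer = BytesIO()
fake_image.save(buffer, format="JPEG")
encoded = base64.b64encode(buffer.getvalue()).decode("utf-8")

# Hypothetical response body with the two keys parse_response_multiple_images reads
fake_response = json.dumps({"generated_images": [encoded], "prompt": "a red square"})

# Decode it back: JSON -> base64 string -> JPEG bytes -> RGB PIL image
response_dict = json.loads(fake_response)
decoded = [
    Image.open(BytesIO(base64.b64decode(img.encode()))).convert("RGB")
    for img in response_dict["generated_images"]
]
```

Because base64 inflates the payload by roughly a third, several large images per request can approach the response size limit, which is why the deployment helper raises `MMS_MAX_RESPONSE_SIZE`.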


### Supported Inference parameters

To get an inference from the model endpoint, we pass the generation parameters in the JSON request payload. The supported parameters are:

* `prompt`: prompt to guide the image generation. It must be specified and can be a string or a list of strings.
* `width`: width of the hallucinated image. If specified, it must be a positive integer divisible by 8.
* `height`: height of the hallucinated image. If specified, it must be a positive integer divisible by 8.
* `num_inference_steps`: Number of denoising steps during image generation. More steps lead to a higher-quality image. If specified, it must be a positive integer.
* `guidance_scale`: Higher guidance scale results in an image closely related to the prompt at the expense of image quality. If specified, it must be a float. guidance_scale<=1 is ignored.
* `negative_prompt`: guide image generation against this prompt. If specified, it must be a string or a list of strings used with guidance_scale. If guidance_scale is disabled, this is also disabled. Moreover, if a prompt is a list of strings, then negative_prompt must also be a list of strings.
* `num_images_per_prompt`: number of images returned per prompt. If specified, it must be a positive integer.
* `seed`: Fix the randomized state for reproducibility. If specified, it must be an integer.

An example request payload is provided below. The effect of these parameters will become apparent when we eventually generate images using them. 

```python
payload = {
    "prompt": "a portrait of a man",
    "negative_prompt": "beard",
    "width": 512,
    "height": 512,
    "num_images_per_prompt": 1,
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "seed": 1,
}
```
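Since an invalid payload is only rejected after a round trip to a (billed) endpoint, it can help to check it locally first. The following is a minimal sketch that encodes the constraints from the parameter list above; `validate_payload` is a hypothetical helper, not part of the SageMaker SDK, and it accepts integers as well as floats for `guidance_scale`, which is an assumption.

```python
def _is_positive_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def _is_str_or_str_list(value):
    return isinstance(value, str) or (
        isinstance(value, list) and all(isinstance(v, str) for v in value)
    )


def validate_payload(payload):
    """Return a list of problems with a txt2img request payload (empty if none)."""
    errors = []
    prompt = payload.get("prompt")
    if not _is_str_or_str_list(prompt):
        errors.append("prompt is required and must be a string or a list of strings")
    for key in ("width", "height"):
        if key in payload and not (_is_positive_int(payload[key]) and payload[key] % 8 == 0):
            errors.append(f"{key} must be a positive integer divisible by 8")
    for key in ("num_inference_steps", "num_images_per_prompt"):
        if key in payload and not _is_positive_int(payload[key]):
            errors.append(f"{key} must be a positive integer")
    if "guidance_scale" in payload:
        scale = payload["guidance_scale"]
        if isinstance(scale, bool) or not isinstance(scale, (int, float)):
            errors.append("guidance_scale must be a number")
    if "seed" in payload:
        seed = payload["seed"]
        if isinstance(seed, bool) or not isinstance(seed, int):
            errors.append("seed must be an integer")
    if "negative_prompt" in payload:
        negative = payload["negative_prompt"]
        if not _is_str_or_str_list(negative):
            errors.append("negative_prompt must be a string or a list of strings")
        elif isinstance(prompt, list) and not isinstance(negative, list):
            errors.append("negative_prompt must be a list of strings when prompt is a list")
    return errors


example_payload = {
    "prompt": "a portrait of a man",
    "negative_prompt": "beard",
    "width": 512,
    "height": 512,
    "num_images_per_prompt": 1,
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "seed": 1,
}
example_errors = validate_payload(example_payload)  # empty list: the payload is valid
```

A payload with `"width": 500`, for example, would be reported as not divisible by 8 before any request is sent.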

::: {.callout-tip collapse="false"}

* `prompt` and `negative_prompt`: Stable Diffusion models are not good at understanding negation words such as 'without', 'except', 'exclude', and 'not' in the prompt. For example, the prompt "a portrait of a man without a beard" may still generate an image of a man with a beard. However, passing the word "beard" in `negative_prompt` has much more influence on the model.
* `height` and `width`: The model often performs best when the generated image has dimensions similar to those of the images it was trained on. It is recommended to read the model card to get the correct dimensions.
* `num_images_per_prompt`: Generating many images in a single request to a real-time inference endpoint can exceed the 60-second timeout. From experience, a number between 1 and 5 works best.
* `num_inference_steps`: When experimenting with prompts, I tend to keep `num_inference_steps` under 50. At 20 steps, you may get a black-and-white image, but it still gives you an idea of the output. If I find an output with fine patterns, I try a higher `num_inference_steps` between 100 and 150 to improve the quality further.
 
::: 
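The behaviour of `guidance_scale` and `negative_prompt` described above matches *classifier-free guidance* as implemented in the Hugging Face `diffusers` Stable Diffusion pipeline, which the JumpStart containers are assumed (not confirmed here) to build on. At each denoising step, the model predicts the noise twice: once conditioned on the prompt and once on an "unconditional" input, and a negative prompt takes the place of that unconditional input. A minimal NumPy sketch of the combination step, with `apply_guidance` as a hypothetical name:

```python
import numpy as np


def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: push the prediction away from the unconditional one.

    noise_uncond: noise predicted for the empty (or negative) prompt
    noise_text:   noise predicted for the actual prompt
    """
    if guidance_scale <= 1.0:
        # Guidance is treated as disabled: only the prompt-conditioned prediction is used
        return noise_text
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)


# Toy predictions: the guided result moves 7.5x further in the "prompt" direction
uncond = np.zeros(3)
text = np.ones(3)
guided = apply_guidance(uncond, text, guidance_scale=7.5)
```

This also shows why a larger `guidance_scale` follows the prompt more closely: the difference between the two predictions is amplified, at the cost of pushing the sample further from what the model would produce naturally.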

## Selecting SageMaker pre-trained Diffusion model and Prompt Engineering

### Model selection
SageMaker JumpStart provides many pre-trained models. Use the following link to search for and select the correct *Model ID*: [Built-in Algorithms with pre-trained Model Table](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html). We are only interested in *text-to-image* models, which we can filter by typing `txt2img` in the search bar.

![](images/2023-05-10-amazon-jumpstart-text2img-stablediffusion/pretrained-model-table.png)

At the time of writing this notebook, 86 `txt2img` models are available on JumpStart, and we may use any of them to generate images. I have selected a few of the better-known ones, but you are welcome to experiment with any of them. The following models are used in the later part of this notebook.

* [model-txt2img-stabilityai-stable-diffusion-v2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1)
* [huggingface-txt2img-prompthero-openjourney](https://huggingface.co/prompthero/openjourney)
* [huggingface-txt2img-andite-anything-v4-0](https://huggingface.co/andite/anything-v4.0)
* [huggingface-txt2img-dreamlike-art-dreamlike-diffusion-1-0](https://huggingface.co/dreamlike-art/dreamlike-diffusion-1.0)
* [huggingface-txt2img-nitrosocke-mo-di-diffusion](https://huggingface.co/nitrosocke/mo-di-diffusion)
* [huggingface-txt2img-envvi-inkpunk-diffusion](https://huggingface.co/Envvi/Inkpunk-Diffusion)

### Prompt engineering

Writing a good prompt can be an art. Predicting whether a particular prompt will yield a satisfactory image with a given model is often difficult. However, specific templates have been observed to work. Broadly, a prompt can be broken down into three pieces:

1. Type of image (photograph/sketch/painting, etc.)
2. Description (subject/object/environment/scene, etc.)
3. The style of the image (realistic/artistic/type of art, etc.)

You can change each of the three parts individually to generate variations of an image. *Adjectives* have been known to play a significant role in the image-generation process. Also, adding more details about the scene helps in the generation process. Here are some suggestions that you may follow to generate good prompts.

* To generate a realistic image, you can use phrases such as “a photo of,” “a photograph of,” “realistic,” or “hyper-realistic.” 
* To generate images by artists, you can use phrases like “by Pablo Picasso,” “oil painting by Rembrandt,” “landscape art by Frederic Edwin Church,” or “pencil drawing by Albrecht Dürer.” 
* You can combine different artists as well. To generate artistic images by category, you can add the art category to the prompt, such as “lion on a beach, abstract.”
* Some other types include “oil painting,” “pencil drawing,” “pop art,” “digital art,” “anime,” “cartoon,” “futurism,” “watercolor,” “manga,” etc. 
* You can also include details such as lighting or camera lenses such as 35mm wide lens or 85mm wide lens, and information on the framing (portrait/landscape/close up, etc.).
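Since a prompt splits into type, description, and style, variations can be generated systematically by swapping one piece at a time. Below is a minimal sketch; `build_prompt` and `prompt_variations` are hypothetical helpers, and the comma-separated layout is one common convention rather than a required format.

```python
from itertools import product


def build_prompt(image_type, description, style, details=()):
    """Combine image type, description, style, and optional details into one prompt."""
    head = f"{image_type.strip()} {description.strip()}".strip()
    tail = [part.strip() for part in (style, *details) if part and part.strip()]
    return ", ".join([head, *tail])


def prompt_variations(image_types, descriptions, styles, details=()):
    """Build one prompt for every combination of the three parts."""
    return [
        build_prompt(t, d, s, details)
        for t, d, s in product(image_types, descriptions, styles)
    ]


single = build_prompt(
    "a photograph of", "a lion on a beach", "hyper-realistic", ["85mm lens", "close up"]
)
variations = prompt_variations(
    ["a photo of", "an oil painting of"],
    ["a lion on a beach"],
    ["realistic", "abstract", "pop art"],
)
```

Here `single` reads "a photograph of a lion on a beach, hyper-realistic, 85mm lens, close up", and `variations` holds six prompts (2 types × 1 description × 3 styles) that can each be sent as the `prompt` field of a payload.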


::: {.callout-tip collapse="false"}

The concise prompt engineering outline above is taken from the [AWS Blog Post](https://aws.amazon.com/blogs/machine-learning/generate-images-from-text-with-the-stable-diffusion-model-on-amazon-sagemaker-jumpstart/). For a more in-depth discussion of techniques for writing good prompts, you may consult the resources below.

* [Stable Diffusion Art: a definitive prompt guide](https://stable-diffusion-art.com/prompt-guide/). An excellent beginner-level guide.
* [Best Stable Diffusion Negative Prompt List To Use](https://thenaturehero.com/stable-diffusion-negative-prompt-list/). It highlights techniques to improve results by using negative prompts.
* [OpenArt.ai Prompt Book](https://openart.ai/promptbook). A very detailed guide covering many techniques and areas.

:::

## Stable Diffusion v2-1

Let's get our first model, [stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1), up and running. This general-purpose model, fine-tuned from Stable Diffusion 2.0, can be applied to various use cases. However, to use it well, it is also important to understand some of its limitations.

* The model does not achieve perfect photorealism
* The model cannot render legible text
* The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to "A red cube on top of a blue sphere"
* Faces and people in general may not be generated properly.
* The model was trained mainly with English captions and will not work as well in other languages.

### Realistic people

Let's see how well this model generates images of real people.

'''
positive
'''