# Introduction to JumpStart - Depth guided image generation

---
Welcome to Amazon [SageMaker JumpStart](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html)! You can use SageMaker JumpStart to solve many machine learning tasks with one click in SageMaker Studio, or through the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/overview.html#use-prebuilt-models-with-sagemaker-jumpstart).

In this demo notebook, we introduce a feature that lets you generate depth-aware images with Stable Diffusion models. You can generate images radically different from an existing image while preserving structural coherence and depth. This is useful in a variety of applications, including product design, digital advertisements, interior design, landscape design, style transfer from one image to another, and generating an image in a specific pose.

---

1. [Set Up](#1.-Set-Up)
2. [Run inference on the pre-trained model](#2.-Run-inference-on-the-pre-trained-model)
3. [Query endpoint and parse response](#3.-Query-endpoint-and-parse-response)
4. [Use Cases](#4.-Use-Cases)
5. [Impact of parameters on performance](#5.-Impact-of-parameters-on-performance)
6. [Clean up the endpoint](#6.-Clean-up-the-endpoint)
7. [Conclusion](#7.-Conclusion)

Note: This notebook was tested on an ml.t3.medium instance in Amazon SageMaker Studio with the Python 3 (Data Science) kernel and in an Amazon SageMaker Notebook instance with the conda_python3 kernel.

Note: This notebook requires an accelerated computing instance to deploy the model. Please make sure you have sufficient quota to execute the notebook.

Note: After you’re done running the notebook, make sure to delete all the resources you created so that billing stops. The code in [Clean up the endpoint](#6.-Clean-up-the-endpoint) deletes the model and endpoint that are created.

### 1. Set Up

---
Before executing the notebook, a few initial setup steps are required. This notebook requires ipywidgets and the latest version of sagemaker.

---

In [None]:
!pip install ipywidgets==7.0.0 --quiet

#### Permissions and environment variables

---
To host on Amazon SageMaker, we need to set up and authenticate the use of AWS services. Here, we use the execution role associated with the current notebook as the AWS account role with SageMaker access. 

---

In [None]:
import sagemaker, boto3, json
from sagemaker import get_execution_role

aws_role = get_execution_role()

### 2. Run inference on the pre-trained model

#### 2.1. Select a Model

***
You can continue with the default model, or choose a different model from the dropdown generated by running the next cell. A complete list of SageMaker pre-trained models is also available at SageMaker [pre-trained Models](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html#). For this lab, we recommend using the default model_id.

***

In [None]:
from ipywidgets import Dropdown
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# Retrieve all depth-to-image (depth2img) generation models.
filter_value = "task == depth2img"
depth2img_models = list_jumpstart_models(filter=filter_value)

# display the model-ids in a dropdown to select a model for inference.
model_dropdown = Dropdown(
    options=depth2img_models,
    value="model-depth2img-stable-diffusion-2-depth-fp16",
    description="Select a model",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)
display(model_dropdown)

In [None]:
# model_version="*" fetches the latest version of the model
model_id, model_version = model_dropdown.value, "*"

# Every model other than the default Stable Diffusion depth model is treated as a ControlNet model.
is_controlnet_model = model_id != "model-depth2img-stable-diffusion-2-depth-fp16"

#### 2.2. Retrieve JumpStart Artifacts & Deploy an Endpoint

***

Using SageMaker, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. We start by retrieving the `deploy_image_uri` and `model_uri` for the pre-trained model; the model artifact also contains the inference scripts. To host the pre-trained model, we create an instance of [`sagemaker.model.Model`](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) and deploy it. This may take a few minutes.

***

In [None]:
%%time
from sagemaker import image_uris, model_uris, instance_types
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

endpoint_name = name_from_base(f"jumpstart-example-{model_id}")

# Instances with more GPU memory support generation of larger images.
inference_instance_type = instance_types.retrieve_default(
    region=None,
    model_id=model_id,
    model_version=model_version,
    scope="inference"
)

# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)

# Retrieve the model uri. This includes the pre-trained model and parameters, as well as
# all dependencies and scripts for model loading, inference handling, etc.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

# Create the SageMaker model instance
model = Model(
    image_uri=deploy_image_uri,
    model_data=model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

# Deploy the model. We pass the Predictor class when deploying through the Model class
# so that we can run inference through the SageMaker API.
model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)

### 3. Query endpoint and parse response

---
The input to the endpoint is a prompt, an image, and image generation parameters, serialized as JSON and encoded in `utf-8`. The output of the endpoint is a `json` containing the generated images and the input prompt.

---
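The request and response formats described above can be tried out without a deployed endpoint. The following is a minimal sketch, assuming the image travels base64-encoded under the `image` key and the response carries base64-encoded images under `generated_images` (both keys appear in the helper code in this notebook). The function names `build_payload` and `decode_images` are hypothetical and are not part of the SageMaker SDK.

```python
import base64
import json


def build_payload(image_bytes: bytes, parameters: dict) -> bytes:
    """Return the UTF-8 encoded JSON body for a depth-to-image request."""
    payload = dict(parameters)  # copy so the caller's dict is not modified
    payload["image"] = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(payload).encode("utf-8")


def decode_images(response_body: bytes) -> list:
    """Return the raw image bytes for each entry in `generated_images`."""
    response = json.loads(response_body)
    return [base64.b64decode(img) for img in response["generated_images"]]


# Placeholder bytes stand in for a real JPEG file here.
fake_image = b"not-a-real-jpeg"
body = build_payload(fake_image, {"prompt": "contemporary style, marble floor"})
print(sorted(json.loads(body).keys()))
```

Keeping serialization separate from the network call makes it easy to inspect exactly what is sent to the endpoint.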

We start by writing some helper functions for querying the endpoint, parsing the response, and displaying the generated images.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from typing import List, Union
from PIL import Image


def query(model_predictor, payload):
    """Query the model predictor."""
    query_response = model_predictor.predict(
        payload,
        {
            "ContentType": "application/json",
            "Accept": "application/json",
        },
    )
    return query_response


def parse_response(query_response):
    """Parse the response and return the list of generated images (base64-encoded strings)."""

    response_dict = json.loads(query_response)
    return response_dict["generated_images"]


def display_img_and_titles(
    img_list: List[Union[str, Image.Image]], titles: List[str], num_images_per_row: int = 1
):
    """Display images in rows, with one title per image.

    img_list: list of image file names or PIL images.
    titles: list of strings, one per image.
    num_images_per_row: number of images shown in each row.
    """
    f = plt.figure(figsize=(30, 30))

    for i, img in enumerate(img_list):
        title = titles[i]
        if isinstance(img, str):
            img = Image.open(img).convert("RGB")

        # Start a new figure (row) once the current one is full.
        if i > 0 and i % num_images_per_row == 0:
            plt.show(block=True)
            f = plt.figure(figsize=(30, 30))

        ax = f.add_subplot(1, num_images_per_row, i % num_images_per_row + 1)
        plt.imshow(img)
        ax.title.set_text(title)
        ax.axis("off")

    plt.show(block=True)

def download_image_from_jumpstart_bucket(input_img_file_name):
    """Download an example image from the regional JumpStart bucket to the working directory."""
    region = boto3.Session().region_name
    s3_bucket = f"jumpstart-cache-prod-{region}"
    key_prefix = "model-metadata/assets"
    s3 = boto3.client("s3")
    s3.download_file(s3_bucket, f"{key_prefix}/{input_img_file_name}", input_img_file_name)
    


---
Below, we provide an example input image and a prompt. You can supply any text and any image, and the model generates a corresponding image with similar spatial features.

You may also test it with your own images! Simply put the image into the folder _images/input/_. Your output image will be saved in _images/output_; download it to your local machine to keep it.

---
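To check which files are available before querying the endpoint, you can list the contents of the input folder. This is a small sketch; the helper name `list_input_images` and the set of accepted file suffixes are assumptions made for illustration, not requirements stated by the endpoint.

```python
from pathlib import Path

# Assumed set of usable image formats; adjust to what your images actually are.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_input_images(folder: str = "images/input") -> list:
    """Return sorted paths of image files in `folder` (empty list if it is missing)."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        str(p) for p in root.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )


print(list_input_images())
```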

In [None]:
import base64
from PIL import Image
from io import BytesIO
from pathlib import Path

# parents=True also creates the "images" folder if it does not exist yet.
Path("images/output").mkdir(parents=True, exist_ok=True)

def download_query_parse_response_and_display(
    input_img_file_name,
    parameters,
    num_images_per_row=2,
    original_image_display_title="original",
    generated_image_display_title: Union[str, List[str]] = "generated",
    override_parameter_choices=None,  # avoid a mutable default argument
    skip_display=False,
):
    """Query the endpoint with a local image, save the generated images, and display them."""

    # endpoint expects payload to be a json with the low resolution jpeg image as bytes encoded with base64.b64 encoding.
    with open(input_img_file_name, "rb") as f:
        input_image_bytes = f.read()
    encoded_image = base64.b64encode(bytearray(input_image_bytes)).decode()
    payload = parameters.copy()
    payload["image"] = encoded_image
    generated_images = []
    if override_parameter_choices:
        for parameter, parameter_choices in override_parameter_choices.items():
            for parameter_choice in parameter_choices:
                payload[parameter] = parameter_choice
                query_response = query(model_predictor, json.dumps(payload).encode("utf-8"))
                generated_images += parse_response(query_response)
    else:
        query_response = query(model_predictor, json.dumps(payload).encode("utf-8"))
        # endpoint returns the jpeg image as bytes encoded with base64.b64 encoding.
        generated_images= parse_response(query_response)

    generated_images_rgb = []
    count = 1
    for generated_image in generated_images:
        generated_image_decoded = BytesIO(base64.b64decode(generated_image.encode()))
        generated_images_rgb.append(Image.open(generated_image_decoded).convert("RGB"))
        
        
        filename = input_img_file_name.split(".")[0].split("/")[-1] + str(count) + ".jpg"
        count+=1
        
        image_path = f'images/output/{filename}' 
        decoded_image_data = base64.b64decode(generated_image)
        with open(image_path, 'wb+') as file:
            file.write(decoded_image_data)

        
    if isinstance(generated_image_display_title, str):
        generated_image_display_title = [generated_image_display_title]*len(generated_images_rgb)
    if not skip_display:
        display_img_and_titles([input_img_file_name] + generated_images_rgb, [original_image_display_title]+generated_image_display_title, num_images_per_row=num_images_per_row)

In [None]:
filename = 'room.jpg'
input_img_file_name = f'images/input/{filename}'

parameters = {
    "prompt": "contemporary style,  marble floor",
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "num_images_per_prompt":2,
}

download_query_parse_response_and_display(input_img_file_name, parameters, num_images_per_row=1)

### Supported parameters

***
This model supports many parameters while performing inference. They include:

* **prompt**: prompt to guide the image generation. Must be specified and can be a string or a list of strings.
* **image**: The original image.
* **num_inference_steps** (optional): number of denoising steps during image generation. More steps lead to a higher quality image. If specified, it must be a positive integer.
* **guidance_scale** (optional): a higher guidance scale results in an image more closely related to the prompt, at the expense of image quality. If specified, it must be a float. guidance_scale<=1 is ignored.
* **negative_prompt** (optional): guide image generation against this prompt. If specified, it must be a string or a list of strings and is used with guidance_scale. If guidance_scale is disabled, this is also disabled. Moreover, if prompt is a list of strings then negative_prompt must also be a list of strings.
* **num_images_per_prompt** (optional): number of images returned per prompt. If specified, it must be a positive integer.
* **seed** (optional): fix the randomized state for reproducibility. If specified, it must be an integer.
* **batch_size** (optional): number of images to generate in a single forward pass. If using a smaller instance or generating many images, reduce batch_size to a small number (1-2). Number of images = number of prompts*num_images_per_prompt.
* **strength** (optional, only for sd-depth, not applicable to controlnet depth): amount of noise to add to the original image initially. If specified, it must be between 0 and 1. If strength is 1, maximum noise is added to the input image before the denoising process starts, which effectively ignores the input image except for the depth map. If strength is 0, no noise is added to the input image before the denoising process starts.
* **scheduler** (optional): scheduler (also known as sampler) to use during the denoising process. It controls the tradeoff between denoising speed and denoising quality. You are encouraged to try different schedulers to figure out which works best for your purpose. If specified, it must be from the following list [`PNDMScheduler`, `EulerAncestralDiscreteScheduler`, `KDPM2AncestralDiscreteScheduler`, `UniPCMultistepScheduler`, `DEISMultistepScheduler`, `DDIMScheduler`, `KDPM2DiscreteScheduler`, `EulerDiscreteScheduler`, `HeunDiscreteScheduler`, `DDPMScheduler`]. Note that once you change the scheduler, all subsequent inference calls will use that scheduler. You can change the scheduler again by setting a different value for the scheduler. To learn more, please see [this documentation](https://huggingface.co/docs/diffusers/using-diffusers/schedulers) and the [blog post](https://stable-diffusion-art.com/samplers/).

***
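The constraints listed above can be checked on the client side before a request is sent, which gives faster feedback than a failed endpoint call. The sketch below is illustrative only: `validate_parameters` is a hypothetical helper, it covers just a subset of the rules, and the endpoint itself remains the authority on what it accepts.

```python
# Scheduler names copied from the list of supported parameters.
SCHEDULERS = {
    "PNDMScheduler", "EulerAncestralDiscreteScheduler", "KDPM2AncestralDiscreteScheduler",
    "UniPCMultistepScheduler", "DEISMultistepScheduler", "DDIMScheduler",
    "KDPM2DiscreteScheduler", "EulerDiscreteScheduler", "HeunDiscreteScheduler",
    "DDPMScheduler",
}


def validate_parameters(parameters: dict) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    prompt = parameters.get("prompt")
    if not isinstance(prompt, (str, list)):
        problems.append("prompt must be a string or a list of strings")
    for name in ("num_inference_steps", "num_images_per_prompt"):
        value = parameters.get(name)
        if value is not None and not (isinstance(value, int) and value > 0):
            problems.append(f"{name} must be a positive integer")
    strength = parameters.get("strength")
    if strength is not None and not 0 <= strength <= 1:
        problems.append("strength must be between 0 and 1")
    scheduler = parameters.get("scheduler")
    if scheduler is not None and scheduler not in SCHEDULERS:
        problems.append(f"unknown scheduler: {scheduler}")
    negative = parameters.get("negative_prompt")
    if isinstance(prompt, list) and negative is not None and not isinstance(negative, list):
        problems.append("negative_prompt must be a list when prompt is a list")
    return problems


print(validate_parameters({"prompt": "marble floor", "strength": 1.5}))
```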

In [None]:
parameters = { 
    "prompt":"European style, marble floor, minimalist lifestyle, nature and wood, magical house",
    "num_inference_steps":30,
    "guidance_scale":7.5,
    "negative_prompt":"poor quality",
    "num_images_per_prompt":2,
    "seed": 1,
    "batch_size":2,
    "strength":0.5,
    "scheduler": "DDIMScheduler"
}
download_query_parse_response_and_display(input_img_file_name, parameters, num_images_per_row=1)

### 4. Use Cases
***

Stable Diffusion Depth-to-Image can be useful in a variety of creative applications, generating images radically different from the original while preserving coherence and depth. It takes away the hassle of extensive Photoshop work for idea exploration. Simply use a short description to guide the image generation, and a novel image is presented to you within seconds. Here are some possible use cases.

***

#### 4.1. Marketing and Branding
***
You are taking photos of your product to be placed in digital advertisements or brochures and are tasked with coming up with a photo that brings a unique ‘feel and message’ to the audience. The original photos are good but lack creativity. Pass your photo to Depth-to-Image with interesting prompts and see how it can generate intriguing ideas for you. Here is an example of how a photo of a simple beverage can be elevated into a stunning photo.

***

In [None]:
filename = "beverage.jpg"
input_img_file_name = f'images/input/{filename}'

parameters = { 
    "prompt":"a glass of cocktail, intimate and romantic ambience",
    "seed": 2,
    "strength":0.7,
    "num_inference_steps": 100,
    "num_images_per_prompt": 5,
    "batch_size":2
}

download_query_parse_response_and_display(input_img_file_name, parameters, num_images_per_row=3)

A denoising strength slightly lower than 1 helps to retain similarity to the original photo, while giving Depth-to-Image sufficient room to generate novel ideas. Retaining the characteristics of the original beverage makes these ideas feasible to execute on your product.

#### 4.2. Interior Designs
***
Depth-to-Image works well for exploring different interior design styles while keeping the interior space and boundaries coherent with your input image. This enables you to quickly mock up how the space will look in many different styles. You can also specify particular features you wish to see in your space. Here are some examples we have generated.
***

In [None]:
filename = "room.jpg"
input_img_file_name = f'images/input/{filename}'

parameters = { 
    "prompt":"Scandinavian style, majestic and luxurious, chandelier lights, warm lighting",
    "seed": 1,
    "strength":1,
    "num_images_per_prompt":5,
    "batch_size":2
}
download_query_parse_response_and_display(input_img_file_name, parameters, num_images_per_row=3)

Here is another prompt for you to try out: “European style, marble floor, minimalist lifestyle, nature and wood, magical house”.

#### 4.3. Game Development

***
Graphics and themes in a game can have a huge impact on players’ experience. To make a game more captivating, companies strive to create the most appealing in-game landscapes. Using Depth-to-Image, you can provide a base image that contains some elements you want to include, and generate an image that has an entirely different style. Here are some examples:

***

In [None]:
filename = "mountain.jpg"
input_img_file_name = f'images/input/{filename}'

parameters = { 
    "prompt": "a dragon mountain range and river, magical hut, dark and stormy",
    "seed": 1,
    "strength":0.75,
    "num_images_per_prompt":5,
    "batch_size":2
}
download_query_parse_response_and_display(input_img_file_name, parameters, num_images_per_row=3)

### 5. Impact of parameters on performance
In this section, we explore the impact each parameter has on the generated images. Understanding these effects helps us guide the generation toward our desired goals.

In [None]:
# Example images on which to do evaluation

example_images_and_prompts = [
    ["images/input/mountain.jpg", "god of thunder, mysterious cottage, snowy mountains with ice golems, ultra realistic, sci-fi movie"],
    ["images/input/room.jpg", "European style, marble floor, minimalist lifestyle, nature and wood, magical house" ],
    ["images/input/beverage.jpg", "a glass of cocktail with intimate and romantic ambience"],
    ["images/input/bottle.jpg", "modern fragrance scent bottle, flower petals and baby breath, ultra realistic, 8k, scandinavian style, warm wood background"]
]


#### 5.1. Strength

***
_Note that strength is not applicable to ControlNet-based depth-to-image models_.

Strength determines the amount of noise added to the image, generated from a seed. A value of 0 adds no noise to the original image, while a value of 1 completely replaces the original image with noise.

The strength parameter can be used to control how much the output image resembles the original image. Using depth-to-image helps ensure that the objects in the original image retain their shape and size even with a strength of 1. To completely change the style of an object, set the strength to 1. If you wish to retain some characteristics of the original object, a strength of 0.3 to 0.5 is recommended.
***
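Strength can also be read as the fraction of the denoising schedule that is actually run. In the Hugging Face diffusers image-to-image pipelines, the number of executed steps is roughly `int(num_inference_steps * strength)`. Whether the JumpStart container behaves identically is an assumption, and `effective_denoising_steps` is a hypothetical helper used only to illustrate the relationship.

```python
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps run for a given strength.

    Mirrors the rule used by diffusers image-to-image pipelines; this is an
    assumption about the deployed model, not documented endpoint behavior.
    """
    if not 0 <= strength <= 1:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


for strength in (0.1, 0.3, 0.5, 0.7, 1):
    print(strength, effective_denoising_steps(50, strength))
```

Under this reading, a low strength both keeps more of the original image and shortens the denoising run, which is one reason low-strength results stay close to the input.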

In [None]:
# Strength does not apply to ControlNet models, so this section is skipped for them.
# is_controlnet_model is set in section 2.1 from the selected model_id.

if not is_controlnet_model:
    for example_img, prompt in example_images_and_prompts:
        parameters = {"prompt": prompt, "seed":1}
        strength_choices = [0.1,0.3,0.5,0.7,1]
        download_query_parse_response_and_display(example_img, parameters, num_images_per_row=3, 
                                                  override_parameter_choices = {"strength":strength_choices}, 
                                                 generated_image_display_title = [f"strength:{strength}" for strength in strength_choices]
                                                 )
else:
    print("Strength is not an applicable parameter for controlnet models")

#### 5.2. Guidance Scale/CFG scale

***
Guidance scale (also called CFG scale) controls how much influence the prompt has on the image generation process, with higher values giving the prompt more influence. This parameter can range from -999 to 999. A negative value makes the prompt work as a "negative prompt" instead, so for values below 0 you should use a negative prompt rather than a negative guidance scale. In practice, values between 1 and 30 are common, and values above 30 will likely result in an over-contrasted image. Note that, as listed under Supported parameters, guidance_scale<=1 is ignored by this endpoint.
***
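The effect of the guidance scale follows from the standard classifier-free guidance rule, in which the final noise prediction extrapolates from the unconditional prediction toward the prompt-conditioned one. The sketch below shows that rule on plain Python lists; `apply_guidance` is a hypothetical name, and it is an assumption that the deployed model combines predictions in exactly this way.

```python
def apply_guidance(uncond: list, cond: list, guidance_scale: float) -> list:
    """Combine unconditional and prompt-conditioned noise predictions.

    Standard classifier-free guidance: uncond + scale * (cond - uncond).
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # toy prediction without the prompt
cond = [1.0, 3.0]    # toy prediction with the prompt

for scale in (1, 7.5, 30):
    print(scale, apply_guidance(uncond, cond, scale))
```

At a scale of 1 the result equals the conditioned prediction, so the unconditional pass adds nothing. Larger scales push the result further from the unconditional prediction, which is consistent with very high values producing exaggerated, over-contrasted images.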

In [None]:
for example_img, prompt in example_images_and_prompts:
    parameters = {"prompt": prompt, "seed":1}
    guidance_scale_choices = [1,7.5,15,22.5,30]
    download_query_parse_response_and_display(example_img, parameters, num_images_per_row=3, 
                                              override_parameter_choices = {"guidance_scale":guidance_scale_choices}, 
                                             generated_image_display_title = [f"guidance_scale:{guidance_scale}" for guidance_scale in guidance_scale_choices]
                                             )

#### 5.3. Number of steps

***
Stable Diffusion works by starting from seemingly random noise and iteratively reducing that noise under the guidance of the prompt, until it produces an output image that is recognizable to humans. As a general rule, a higher number of steps results in a more detailed image.

Note that a higher number of steps also results in longer processing time. A value above 100 typically does not improve the details of the output further, and the quality may even start to degrade. The recommended setting for the number of steps is any value between 10 and 100.
***

In [None]:
for example_img, prompt in example_images_and_prompts:
    parameters = {"prompt": prompt, "seed":1}
    num_inference_steps_choices = [10,30,50,75,100]
    download_query_parse_response_and_display(example_img, parameters, num_images_per_row=3, 
                                              override_parameter_choices = {"num_inference_steps":num_inference_steps_choices}, 
                                             generated_image_display_title = [f"num_inference_steps:{num_inference_steps}" for num_inference_steps in num_inference_steps_choices]
                                             )

#### 5.4. Seed
***
Stable Diffusion generates an output image from noise, and the noise is generated from a seed value. The seed is useful in several ways. By fixing the seed value, you can ensure that your generated images can be replicated with the same prompt and parameters. With a fixed seed, you can also regenerate the image with a slightly different prompt or parameters while keeping the overall composition similar to the initially generated image.
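The reproducibility that a seed provides can be illustrated with Python's standard library alone. This is an analogy rather than the model's actual noise sampler: `initial_noise` is a hypothetical function that draws Gaussian values from a seeded generator, in the same spirit as seeding the starting noise of the diffusion process.

```python
import random


def initial_noise(seed: int, size: int = 4) -> list:
    """Draw `size` Gaussian samples from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, does not touch global state
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


# The same seed gives the same starting noise; a different seed does not.
print(initial_noise(100) == initial_noise(100))
print(initial_noise(100) == initial_noise(101))
```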

In [None]:
example_images_and_prompts = [
    ["images/input/room.jpg", "European style, marble floor, minimalist lifestyle, nature and wood, magical house"],
    ["images/input/room.jpg", "European style, wooden floor, minimalist lifestyle, nature and wood, magical house"],
    ["images/input/room.jpg", "European style, vinyl floor, minimalist lifestyle, nature and wood, magical house"]
]

for example_img, prompt in example_images_and_prompts:
    parameters = {"prompt": prompt, "strength": 1}
    seed_choices = [100, 101, 102]
    download_query_parse_response_and_display(example_img, parameters, num_images_per_row=3, 
                                              override_parameter_choices = {"seed":seed_choices}, 
                                             generated_image_display_title = [f"seed:{seed}" for seed in seed_choices]
                                             )

### 6. Clean up the endpoint

***
After you’re done running the notebook, make sure to delete all resources created in the process to ensure that the billing is stopped.
***

In [None]:
# Delete the SageMaker model and endpoint
model_predictor.delete_model()
model_predictor.delete_endpoint()

### 7. Conclusion

In this notebook, we learned how to deploy the Stable Diffusion depth-to-image model. We also explored possible use cases and the effect of the model's various parameters. Think out of the box and bring your ideas to life!