# ControlNet-enhanced OctoShop Pipeline
In this IPython notebook:
* We'll test a self-authored SDXL+ControlNet container locally
* We'll then provide instructions to update the model container running on an OctoAI endpoint
* You'll learn to use ControlNets in the SDXL pipeline to generate images guided by input images
* Finally, you'll test your SDXL endpoint within a complete OctoShop pipeline that now features a ControlNet and AI-based face swapping for better consistency

In [None]:
# Let's import some useful libraries
import requests
import json
from PIL import Image
from io import BytesIO
from base64 import b64encode, b64decode
from IPython.display import display

# Let's import the OctoAI Python SDK
from octoai.client import Client

# A helper function that reads a PIL Image object and returns a base64-encoded string
def encode_image(image: Image) -> str:
    buffer = BytesIO()
    image.save(buffer, format="png")
    im_base64 = b64encode(buffer.getvalue()).decode("utf-8")
    return im_base64

# A helper function that reads a base64 encoded string and returns a PIL Image object
def decode_image(image_str: str) -> Image:
    return Image.open(BytesIO(b64decode(image_str)))

# A helper function that rescales images to a resolution that is known to work great in SDXL
def rescale_image(image: Image) -> Image:
    w, h = image.size
    if w == h:
        width = 1024
        height = 1024
    elif w > h:
        width = 1024
        height = 1024 * h // w
    else:
        width = 1024 * w // h
        height = 1024
    image = image.resize((width, height))
    return image

# Initialize the OctoAI Client
# This will make it easier to interface with the model containers
client = Client()

# Please ignore the warning below - we don't need the token because we've enabled public access on our endpoints
# WARNING:root:OCTOAI_TOKEN environment variable is not set. You won't be able to reach OctoAI endpoints.

## A. Test your SDXL+ControlNet container locally
Make sure you've completed Sections 1 and 2 of Lab 2 described in the README.md.

As a recap, the SDXL model container takes as input a dictionary with the following keys:
* `image` (string) - a base64-encoded image
* `prompt` (string) - the SDXL text prompt
* `negative_prompt` (string) - the SDXL negative text prompt (what the image should not contain)
* `guidance_scale` (float) - the guidance scale (a.k.a. the configuration scale) of SDXL
* `num_inference_steps` (int) - the number of SDXL denoising steps
* `seed` (int) - seed of the image generation
* `controlnet_conditioning_scale` (float) - controls how strong of an effect the ControlNet has on SDXL generation
* `control_guidance_start` (float) - on a scale from 0 to 1, determines when the ControlNet kicks in during the diffusion process
* `control_guidance_end` (float) - on a scale from 0 to 1, determines when the ControlNet stops having an effect during the diffusion process
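
To make `control_guidance_start` and `control_guidance_end` more concrete, here is a minimal sketch of how a start/end fraction could map onto individual denoising steps. The helper `controlnet_active_steps` is hypothetical, and the rule it applies (a step counts as active when it falls fully inside the window) is an assumption; the model container may round the boundaries differently.

```python
def controlnet_active_steps(num_inference_steps: int, start: float, end: float) -> list:
    """Return the 0-based denoising steps during which the ControlNet is applied.

    Assumption of this sketch: step i is active when
    i / N >= start and (i + 1) / N <= end, with N the total number of steps.
    """
    if not 0.0 <= start <= end <= 1.0:
        raise ValueError("expected 0 <= start <= end <= 1")
    n = num_inference_steps
    return [i for i in range(n) if i / n >= start and (i + 1) / n <= end]


# With 20 steps and a window of 0.0 to 0.5, the first half of the steps is guided
steps = controlnet_active_steps(20, 0.0, 0.5)
print(len(steps), steps[0], steps[-1])  # 10 0 9
```

Under this interpretation, ending the window early lets the remaining steps refine details freely, without the control image constraining them.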

The SDXL model container returns the following as outputs:
* `image` (string) - a base64-encoded image

Note that because we now pass an image to the SDXL+ControlNet model, the dimensions (width, height) of the generated image are derived from the input image.
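
As a rough illustration of where those dimensions come from, the sketch below reproduces the arithmetic of the `rescale_image` helper defined earlier, without touching any pixels. The function name `sdxl_dimensions` and the sample sizes are made up for this example.

```python
def sdxl_dimensions(w: int, h: int, long_side: int = 1024) -> tuple:
    """Scale (w, h) so the longer side equals long_side, preserving the aspect ratio."""
    if w >= h:
        return long_side, long_side * h // w
    return long_side * w // h, long_side


print(sdxl_dimensions(1920, 1080))  # landscape -> (1024, 576)
print(sdxl_dimensions(600, 800))    # portrait  -> (768, 1024)
print(sdxl_dimensions(512, 512))    # square    -> (1024, 1024)
```

Because the division is an integer division, the shorter side is rounded down, so the aspect ratio is preserved only approximately for unusual input sizes.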

In [None]:
# Let's grab the Docker logo
r = requests.get('https://raw.githubusercontent.com/vegaluisjose/blob/main/docker.jpeg')
image = Image.open(BytesIO(r.content))

# Rescale the image
image = rescale_image(image)

# Display the Docker logo
display(image)

In [None]:
# Let's prepare our SDXL+ControlNet inference endpoint payload
# We'll intentionally turn off the ControlNet, so the input image has no effect on the output image.
# We hard code the user-prompt to "an ultrarealistic photo of a whale with shipping containers on its back" (no use of CLIP Interrogator)
SDXL_payload = {
    "image": encode_image(image),
    "prompt": "an ultrarealistic photo of a whale with shipping containers on its back",
    "negative_prompt": "blurry photo, distortion, low-res, bad quality",
    "num_inference_steps": 20,
    "guidance_scale": 7.5,
    "seed": 1,
    "controlnet_conditioning_scale": 0.0,  # Determines how strongly the ControlNet affects the generation process - let's start at 0
    "control_guidance_start": 0.0,         # At 0, means start applying the ControlNet when 0% of the steps have completed
    "control_guidance_end": 0.5,           # At 0.5, means stop applying the ControlNet when 50% of the steps have completed
}

# Run inference on the OctoAI SDXL+ControlNet model container running locally
output = client.infer(
    endpoint_url="http://localhost:8080/predict",
    inputs=SDXL_payload
)

# Get the base64 encoded image string
image_string = output["completion"]["image"]

# Convert to a PIL image
sdxl_image = decode_image(image_string)

# Display your masterpiece!
display(sdxl_image)

In [None]:
# As a comparison, let's see what SDXL would generate if the ControlNet was intentionally applied very strongly
SDXL_payload["controlnet_conditioning_scale"] = 1.25

# Run inference on the OctoAI SDXL+ControlNet model container running locally
output = client.infer(
    endpoint_url="http://localhost:8080/predict",
    inputs=SDXL_payload
)

# Get the base64 encoded image string
image_string = output["completion"]["image"]

# Convert to a PIL image
sdxl_image = decode_image(image_string)

# Display your masterpiece!
display(sdxl_image)

In [None]:
# As a comparison, let's see what SDXL would generate if the ControlNet was applied at "just the right" strength
SDXL_payload["controlnet_conditioning_scale"] = 0.5

# Run inference on the OctoAI SDXL+ControlNet model container running locally
output = client.infer(
    endpoint_url="http://localhost:8080/predict",
    inputs=SDXL_payload
)

# Get the base64 encoded image string
image_string = output["completion"]["image"]

# Convert to a PIL image
sdxl_image = decode_image(image_string)

# Display your masterpiece!
display(sdxl_image)

## B. Summary of the SDXL ControlNet experiments

In this first part, we evaluated the impact of ControlNets on SDXL-generated images:
* First, we turned off the ControlNet entirely. This means that we generated an image completely unconstrained by the input control image (the image of Moby).
* Second, we turned the ControlNet conditioning scale all the way up. This applied a constraint that was a bit too strong, producing an image that looked too close to the input image (the image of the Docker logo). We lost all of the realism that we're trying to attain.
* Finally, we set the ControlNet to just the right strength to get a photo that is both realistic and close to the input control image (the image of the Moby logo).
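
The three experiments differ only in `controlnet_conditioning_scale`, so they can be expressed as a small sweep. The sketch below is one way to do that; `sweep_conditioning_scale` and `fake_infer` are hypothetical names, and the inference call is passed in as a plain callable so the example runs without a container.

```python
import copy


def sweep_conditioning_scale(payload: dict, scales: list, infer) -> dict:
    """Call infer once per scale and collect the results keyed by scale.

    infer is any callable that takes a payload dict and returns a result.
    """
    results = {}
    for scale in scales:
        trial = copy.deepcopy(payload)  # leave the caller's payload untouched
        trial["controlnet_conditioning_scale"] = scale
        results[scale] = infer(trial)
    return results


# Stand-in for the real endpoint call so the sketch runs offline
def fake_infer(p: dict) -> str:
    return "scale={}".format(p["controlnet_conditioning_scale"])


base = {"prompt": "a whale", "controlnet_conditioning_scale": 0.0}
results = sweep_conditioning_scale(base, [0.0, 0.5, 1.25], fake_infer)
print(results)
```

In this notebook, the callable could wrap the client, for instance `lambda p: client.infer(endpoint_url="http://localhost:8080/predict", inputs=p)`, with each result decoded through `decode_image`.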

## C. Upload the image to your DockerHub
Now sign onto your DockerHub in a browser: https://hub.docker.com/

Create a repository by clicking the blue `Create repository` button. Name it `dockercon-sdxl-canny`, and provide a short description as you see fit. Leave it public. Hit the blue `Create` button.

Once that's done, note the full path to the repo, as `<dockerhub-username>/dockercon-sdxl-canny`.

Under `dockercon23/lab2/dockercon-sdxl-canny`, run the following to tag the Docker image we just tested as a versioned image that we'll push to the newly created DockerHub repository.
```
docker tag sdxl-canny:latest <dockerhub-username>/dockercon-sdxl-canny:v0.1.0
```

Then push the tagged SDXL Canny model image!

```
docker push <dockerhub-username>/dockercon-sdxl-canny:v0.1.0
```

The upload should take under 10 minutes. The container image is quite large, which is pretty common for Generative AI models with their huge sets of weights!

Refresh the DockerHub page of the `dockercon-sdxl-canny` repository, and you should see the newly uploaded `v0.1.0` image!
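
If you would rather script the tag and push steps from Python, a minimal sketch could look like the following. The helper `docker_publish_commands` and the username `my-dockerhub-user` are placeholders invented for this example; the commands themselves are the same `docker tag` and `docker push` invocations shown above.

```python
import subprocess


def docker_publish_commands(local_image: str, username: str, repo: str, version: str) -> list:
    """Build the argument lists for `docker tag` and `docker push`."""
    remote = "{}/{}:{}".format(username, repo, version)
    return [
        ["docker", "tag", local_image, remote],
        ["docker", "push", remote],
    ]


commands = docker_publish_commands(
    "sdxl-canny:latest", "my-dockerhub-user", "dockercon-sdxl-canny", "v0.1.0"
)
for cmd in commands:
    print(" ".join(cmd))
    # Uncomment to actually run the command (requires Docker and a prior `docker login`):
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if an image name ever contains unusual characters.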

![Docker](https://raw.githubusercontent.com/vegaluisjose/blob/main/docker_sdxl_canny.png)

If you don't feel like waiting for the full image to upload, you can go ahead and use this image that we've prebuilt for step D: [tmoreau89octo/dockercon-sdxl-canny:v0.1.0](https://hub.docker.com/layers/tmoreau89octo/dockercon-sdxl-canny/v0.1.0/images/sha256-ee91b22e1329f484593a9d59ff52c4a4bc751e7a80385a1da72ac359541d3490?context=repo).


## D. Update the image on an already running OctoAI endpoint
Sign onto your OctoAI account in a browser: https://octoai.cloud/endpoints

Click on the endpoint that you created in Lab 1, e.g. `dockercon23-sdxl`.

Click on the `Edit endpoint` button.

Edit the `Container image` under `Model container` to point to the new image you've just uploaded to DockerHub: `<dockerhub-username>/dockercon-sdxl-canny:v0.1.0`.

The endpoint will swap the old SDXL container for the new SDXL-Canny container. This swap operation will take a few minutes the first time since it needs to download the container image from DockerHub, then start the container and initialize the SDXL+ControlNet pipeline.

![OctoAI](https://raw.githubusercontent.com/vegaluisjose/blob/main/octoai_sdxl_canny.png)
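
Since the swap takes a few minutes, it can help to poll the endpoint until it responds before sending inference requests. The sketch below shows a generic polling loop; `wait_until_ready` and `fake_probe` are hypothetical helpers, and the readiness check is passed in as a callable so the example runs without network access.

```python
import time


def wait_until_ready(probe, timeout_s: float = 600.0, interval_s: float = 10.0, sleep=time.sleep) -> bool:
    """Poll probe() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception:
            pass  # treat errors (e.g. connection refused) as "not ready yet"
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Stand-in probe that reports ready on the third attempt
attempts = {"n": 0}

def fake_probe() -> bool:
    attempts["n"] += 1
    return attempts["n"] >= 3

ready = wait_until_ready(fake_probe, timeout_s=5.0, interval_s=0.0)
print(ready, attempts["n"])  # True 3
```

A real probe might be `lambda: requests.get(sdxl_endpoint_url + "/healthcheck", timeout=10).status_code == 200`, assuming your SDXL container exposes a `/healthcheck` route; that path is an assumption here, so use whatever health check path you configured for the endpoint.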

## E. Test your SDXL container served on an OctoAI endpoint
In this step, we'll test the SDXL container in the exact same way as we did when we ran the container locally on the AWS dev instance, except that now we'll be sending a POST request to a remote endpoint.

You'll need to change the SDXL endpoint URL from `http://localhost:8080` to your unique endpoint URL.
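
When pasting the endpoint URL, a stray trailing slash or surrounding whitespace would produce a malformed route such as `//predict`. The helper below is a small, hypothetical guard against that; `predict_url` and the example hostname are not part of the lab code.

```python
def predict_url(endpoint_url: str) -> str:
    """Return the /predict route for an endpoint, tolerating a trailing slash."""
    base = endpoint_url.strip().rstrip("/")
    if not base.startswith(("http://", "https://")):
        raise ValueError("endpoint URL should start with http:// or https://")
    return base + "/predict"


print(predict_url("https://my-endpoint.example.com/"))  # https://my-endpoint.example.com/predict
```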

In [None]:
# FIXME: Replace "http://localhost:8080" with your endpoint URL below
sdxl_endpoint_url = "http://localhost:8080"
# Make sure you've overwritten the URL!!!
assert sdxl_endpoint_url != "http://localhost:8080"

# Compared to Step A, we've replaced the http://localhost:8080 with
# the URL of your newly launched OctoAI endpoint
output = client.infer(
    endpoint_url="{}/predict".format(sdxl_endpoint_url),
    inputs=SDXL_payload
)

# Get the base64 encoded image string
image_string = output["completion"]["image"]

# Convert to a PIL image
sdxl_image = decode_image(image_string)

# Display your masterpiece!
display(sdxl_image)

## F. Let's test the OctoShop workflow enhanced by ControlNets

We are using the exact same workflow as in Lab 1 but it's been tweaked to make use of a ControlNet in the SDXL stage.

1. User provides an image as input (Docker logo) and a prompt (set in space).

2. Image goes through CLIP interrogator and produces a text description of the image.

3. The user prompt gets fed along with the CLIP interrogator description into Llama 2 that describes a new scene.

4. That textual description of the scene is fed into SDXL. SDXL image generation is constrained by a user-provided image (Docker logo) via a ControlNet. This lets us generate a brand new photo of the Docker logo set in space that is more consistent with the original logo compared to Lab 1!

We've provided the whole OctoShop flow below for you to play around with the Generative AI workflow!
* Try changing the URL of the input image to a different image!
* Try changing the user prompt to a different prompt!
* Try changing the user style to a different style!
* Try changing any combination of the above at the same time!
* And feel free to tweak the various settings to familiarize yourself a bit more with the different models that are being invoked.
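
Before changing prompts and styles, it may help to see how the text pieces fit together. The sketch below mirrors the string handling of the workflow in isolation: the user prompt and the CLIP label form the Llama 2 instruction, and the Llama 2 answer is substituted into the style template. The helper names and all sample strings are made up for illustration.

```python
def build_instruction(user_prompt: str, clip_label: str) -> str:
    """Combine the user prompt and the CLIP label into one instruction."""
    return "In a single sentence, {}: {}".format(user_prompt, clip_label)


def apply_style(style: dict, scene: str) -> dict:
    """Substitute the scene description into the style's prompt template."""
    return {
        "prompt": style["prompt"].replace("{prompt}", scene),
        "negative_prompt": style["negative_prompt"],
    }


# Hypothetical style and texts, standing in for real model outputs
style = {
    "name": "demo-style",
    "prompt": "cinematic film still {prompt} . film grain",
    "negative_prompt": "cartoon, text",
}
instruction = build_instruction("set in outer space", "a blue whale carrying containers")
styled = apply_style(style, "A blue whale drifts past Saturn.")
print(instruction)
print(styled["prompt"])
```

Note that `{prompt}` is replaced with `str.replace` rather than `str.format`, so any other curly braces in a style template are left untouched.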

In [None]:
# Let's define the OctoShop as a self-contained function
def octoshop(image: Image, user_prompt: str, user_style: dict) -> (Image, str, str):

    # OctoAI endpoint URLs
    clip_endpoint_url = "https://dockercon23-clip-4jkxk521l3v1.octoai.run"
    llama2_endpoint_url = "https://dockercon23-llama2-4jkxk521l3v1.octoai.run/v1/chat/completions"
    sdxl_endpoint_url = "http://localhost:8080" # ADD_YOUR_SDXL_ENDPOINT_URL_HERE
    assert sdxl_endpoint_url != "http://localhost:8080"

    # STEP 0
    # Rescale the image
    image = rescale_image(image)

    # STEP 1
    # Feed that image into CLIP interrogator
    clip_request = {
        "mode": "fast",
        "image": encode_image(image),
    }
    output = client.infer(
        endpoint_url="{}/predict".format(clip_endpoint_url),
        inputs=clip_request
    )
    clip_labels = output["completion"]["labels"]
    clip_labels = clip_labels.split(',')[0]

    # STEP 2
    # Feed that CLIP label and the user prompt into a LLAMA model
    llama_prompt = "\
    ### Instruction: In a single sentence, {}: {}\
    ### Response:".format(user_prompt, clip_labels)
    llama_inputs = {
      "model": "llama-2-7b-chat",
      "messages": [
        {
          "role": "system",
          "content": "Below is an instruction that describes a task. Write a response that appropriately completes the request."
        },
        {
          "role": "user",
          "content": "{}".format(llama_prompt)
        }
      ],
      "stream": False,
      "max_tokens": 70,
      "temperature": 0
    }
    outputs = client.infer(endpoint_url=llama2_endpoint_url, inputs=llama_inputs)
    llama2_text = outputs.get('choices')[0].get("message").get('content')

    # STEP 3
    # Feed the Llama 2 text into the SDXL model
    SDXL_payload = {
        "image": encode_image(image),
        "prompt": user_style["prompt"].replace("{prompt}", llama2_text),
        "negative_prompt": user_style["negative_prompt"],
        "num_inference_steps": 20,
        "guidance_scale": 7.5,
        "seed": 1,
        "controlnet_conditioning_scale": 0.5,
        "control_guidance_start": 0.0,
        "control_guidance_end": 0.5,
    }
    # Run inference on the SDXL+ControlNet model container served on your OctoAI endpoint
    output = client.infer(
        endpoint_url="{}/predict".format(sdxl_endpoint_url),
        inputs=SDXL_payload
    )
    image_string = output["completion"]["image"]
    sdxl_image = decode_image(image_string)

    return sdxl_image, clip_labels, llama2_text

In [None]:
# Set the input image URL
image_url = 'https://raw.githubusercontent.com/vegaluisjose/blob/main/docker.jpeg'

# Download the input image and load it as a PIL image
r = requests.get(image_url)
image = Image.open(BytesIO(r.content))

# Set the user prompt
user_prompt = "set in outer space"

# Set the style of SDXL
user_style = {
    "name": "sai-cinematic",
    "prompt": "cinematic film still {prompt} . shallow depth of field, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy",
    "negative_prompt": "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
}

# Invoke OctoShop
sdxl_image, clip_labels, llama2_output = octoshop(image, user_prompt, user_style)

# Display the image
print("CLIP Interrogator output: {}".format(clip_labels))
print("Llama 2 output: {}".format(llama2_output))
display(sdxl_image)

In [None]:
# Let's try it with a person this time around
image_url = 'https://media.snl.no/media/7650/standard_compressed_Mona_Lisa.jpg'
r = requests.get(image_url)
image = Image.open(BytesIO(r.content))
display(image)

In [None]:
# Invoke OctoShop
sdxl_image, clip_labels, llama2_output = octoshop(image, user_prompt, user_style)

# Display the image
print("CLIP Interrogator output: {}".format(clip_labels))
print("Llama 2 output: {}".format(llama2_output))
display(sdxl_image)

## G. Let's test the OctoShop workflow enhanced by AI-based FaceSwap

ControlNet gives us pose consistency for people, but the faces lose their likeness. A simple solution is AI-based face swapping: we brought up an endpoint that performs face swaps, which lets us bring the likeness of the Mona Lisa back into our final result.

For this DockerCon23 workshop, we've pre-allocated a face swap model endpoint pool available at the following URL: https://dockercon23-faceswap-4jkxk521l3v1.octoai.run


**If you try this tutorial after October 3rd 2023**, this face swap endpoint will have been taken down. You can still create and manage your own by going to https://octoai.cloud/endpoints
* Click on the `Create a Custom Endpoint` blue button.
* Name your endpoint, e.g. `dockercon23-faceswap`.
* Under the `Model container` details:
    * Set the `Container image` to `tmoreau89octo/faceswap:v0.1.3`
    * Leave the `Container port` at its default value of `8080`.
    * Leave the `Registry credential` set to `Public`.
    * Set the `Health check path` to `/healthcheck`.
    * Enable public access by toggling the switch (usually we'd recommend leaving it disabled but for the purpose of this lab, let's keep things simple).
    * No need to specify secrets.
    * No need to specify environment variables.
* Under `Hardware tier`, select `Medium`.
* Under `Configure autoscaling`:
    * Change Min replicas to `1`. This ensures that at least one replica remains up and running.
    * Change Max replicas to `1`. This ensures that no more than one replica runs at a time.
    * Leave the timeout at `300` seconds.
* Now hit the `Create` button!

![OctoAI](https://raw.githubusercontent.com/vegaluisjose/blob/main/octoai_faceswap.png)


In [None]:
# Let's define a Face Swap function
def faceswap(src: Image, dest: Image) -> (Image):

    # OctoAI endpoint URLs
    faceswap_endpoint_url = "https://dockercon23-faceswap-4jkxk521l3v1.octoai.run"

    output = client.infer(
        endpoint_url="{}/predict".format(faceswap_endpoint_url),
        inputs={
            # Use the src argument (the face to copy), not the global `image` variable
            "src_image": encode_image(src),
            "dst_image": encode_image(dest)
        }
    )
    fs_sdxl_image_string = output["completion"]["image"]
    fs_sdxl_image = decode_image(fs_sdxl_image_string)

    return fs_sdxl_image

In [None]:
# Add the FaceSwap
sdxl_fs_image = faceswap(image, sdxl_image)

# Display the image
display(sdxl_fs_image)