# Baseline OctoShop Pipeline
In this IPython notebook:
* We'll test a newly minted SDXL container locally
* We'll then provide instructions to launch it on OctoAI compute services
* You'll learn to craft different prompts for SDXL to get amazing images
* Finally, you'll test your SDXL endpoint within a "Baseline OctoShop" pipeline, composed of a CLIP Interrogator, LLAMA2, and SDXL

In [None]:
# Let's import some useful libraries
import requests
import json
from PIL import Image
from io import BytesIO
from base64 import b64encode, b64decode
from IPython.display import display

# Let's import the OctoAI Python SDK
from octoai.client import Client

# A helper function that takes a PIL Image object and returns a base64-encoded string
def encode_image(image: Image.Image) -> str:
    buffer = BytesIO()
    image.save(buffer, format="png")
    im_base64 = b64encode(buffer.getvalue()).decode("utf-8")
    return im_base64

# A helper function that reads a base64 encoded string and returns a PIL Image object
def decode_image(image_str: str) -> Image.Image:
    return Image.open(BytesIO(b64decode(image_str)))

# Initialize the OctoAI Client
# This will make it easier to interface with the model containers
client = Client()

## A. Test your SDXL container locally
Make sure you've completed Sections 1 and 2 of Lab 1 described in the README.md.

As a recap, the SDXL model container takes as input a dictionary with the following keys:
* `prompt` (string) - the SDXL text prompt
* `negative_prompt` (string) - the SDXL negative text prompt, describing what the image should not contain
* `guidance_scale` (float) - the guidance scale (a.k.a. the configuration scale) of SDXL
* `num_inference_steps` (int) - the number of SDXL denoising steps
* `width` (int) - the width of the SDXL output image
* `height` (int) - the height of the SDXL output image
* `seed` (int) - seed of the image generation
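
As a rough sketch, the input keys listed above can be assembled by a small helper that fills in defaults and rejects obviously invalid values. The name `make_sdxl_payload` and its validation rules are illustrative assumptions, not part of the OctoAI SDK or the SDXL container; the default values mirror the ones used in this notebook.

```python
from typing import Any, Dict


def make_sdxl_payload(
    prompt: str,
    negative_prompt: str = "",
    num_inference_steps: int = 20,
    guidance_scale: float = 7.5,
    width: int = 1024,
    height: int = 1024,
    seed: int = 1,
) -> Dict[str, Any]:
    """Build a request dictionary with the keys the SDXL container expects.

    Hypothetical helper: the checks below are illustrative assumptions.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    if width < 1 or height < 1:
        raise ValueError("width and height must be positive")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "num_inference_steps": int(num_inference_steps),
        "guidance_scale": float(guidance_scale),
        "width": int(width),
        "height": int(height),
        "seed": int(seed),
    }


# Changing only the seed gives a different image for the same prompt
payload = make_sdxl_payload("a photo of an octopus playing chess", seed=42)
print(sorted(payload))
```
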

The SDXL model container returns the following output:
* `image` (string) - a base64-encoded image
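
To make the output format concrete, here is a minimal sketch that pulls the base64 string out of a response and turns it back into raw bytes using only the standard library. The `completion` wrapper matches how responses are read elsewhere in this notebook; the helper name `image_bytes_from_response` and the stand-in response are assumptions for illustration.

```python
from base64 import b64decode, b64encode

# The first eight bytes of every PNG file
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_bytes_from_response(response: dict) -> bytes:
    """Return the decoded image bytes from an SDXL-style response (hypothetical helper)."""
    try:
        image_str = response["completion"]["image"]
    except (KeyError, TypeError) as err:
        raise ValueError("response has no completion.image field") from err
    return b64decode(image_str)


# A stand-in response, so the sketch runs without a container
fake_response = {
    "completion": {"image": b64encode(PNG_SIGNATURE + b"fake-pixels").decode("utf-8")}
}
raw = image_bytes_from_response(fake_response)
print(raw.startswith(PNG_SIGNATURE))
```

The decoded bytes can be written to a file, or wrapped in `BytesIO` and opened with PIL as `decode_image` does.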

In [None]:
# Let's prepare our SDXL inference endpoint payload
SDXL_payload = {
    "prompt": "a photo of an octopus playing chess",
    "negative_prompt": "blurry photo, distortion, low-res, bad quality",
    "num_inference_steps": 20,
    "guidance_scale": 7.5,
    "width": 1024,
    "height": 1024,
    "seed": 1
}

# Run inference on the OctoAI SDXL model container running locally
output = client.infer(
    endpoint_url="http://localhost:8080/predict",
    inputs=SDXL_payload
)

# Get the base64 encoded image string
image_string = output["completion"]["image"]

# Convert to a PIL image
sdxl_image = decode_image(image_string)

# Display your masterpiece!
display(sdxl_image)

## B. Upload the image to your DockerHub
Now sign onto your DockerHub in a browser: https://hub.docker.com/

Create a repository by clicking on the `Create repository` blue button. Name it `dockercon-sdxl`, and provide a short description as you see fit. Leave it public. Hit the `Create` blue button.

Once that's done, note the full path to the repo, as `<dockerhub-username>/dockercon-sdxl`.

Under `dockercon23/lab1/sdxl`, run the following to tag the Docker image we just tested to a versioned image we'll push to the newly created DockerHub repository.
```
docker tag sdxl:latest <dockerhub-username>/dockercon-sdxl:v0.1.0
```

Then push the tagged SDXL model image!

```
docker push <dockerhub-username>/dockercon-sdxl:v0.1.0
```

This should take about 10 minutes given that the image is quite voluminous (that's pretty common for Generative AI models with their huge sets of weights!).

Refresh the DockerHub page of the `dockercon-sdxl` repository, and you should see a new `v0.1.0` image that was uploaded just now!

![Docker](https://raw.githubusercontent.com/vegaluisjose/blob/main/docker_sdxl.png)

If you don't feel like waiting for the full image to upload, you can go ahead and use this image that we've prebuilt for step C: [tmoreau89octo/dockercon-sdxl:v0.1.0](https://hub.docker.com/layers/tmoreau89octo/dockercon-sdxl/v0.1.0/images/sha256-b6d5d858e98fc9fb6482a52d7b8ec47a73c631614a5a99ec8e890bb83f15a277?context=repo).

## C. Deploy the SDXL image on an OctoAI endpoint
Sign onto your OctoAI account in a browser: https://octoai.cloud/endpoints

Click on the `Create a Custom Endpoint` blue button.

Name your endpoint, e.g. `dockercon23-sdxl`.

Under the `Model container` details:
* Set the `Container image` to `<dockerhub-username>/dockercon-sdxl:v0.1.0`
* Leave the `Container port` at its default value of `8080`.
* Leave the `Registry credential` set to `Public`.
* Set the `Health check path` to `/healthcheck`.
* Enable public access by toggling the switch (usually we'd recommend leaving it disabled but for the purpose of this lab, let's keep things simple).
* No need to specify secrets.
* No need to specify environment variables.

Under `Hardware tier`, select `Medium`. The `Small` tier is unfortunately not powerful enough to run SDXL.

Under `Configure autoscaling`:
* Change Min replicas to `1`. This will ensure at least one replica remains up and running.
* Change Max replicas to `1`. This will ensure no more than one replica remains up and running.
* Leave the timeout at `300` seconds.

![OctoAI](https://raw.githubusercontent.com/vegaluisjose/blob/main/octoai_sdxl.png)

Now hit the `Create` button!

## D. Manage your SDXL OctoAI endpoint

The endpoint will need to "cold start" and this could take about 10 minutes.

On the OctoAI endpoint Info view, you'll know that your endpoint is warming up because the endpoint status will be set to `Starting`. You can click on the blinking square under `Replicas` to see that the container image is being pulled onto the OctoAI endpoint replica.

Once the status is set to `Running`, you can view the logs by clicking on the `View logs` button.
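
If you would rather wait from code than watch the status in the browser, a polling loop along these lines is one option. This is a sketch under assumptions: that the `/healthcheck` path configured in Section C answers with HTTP 200 once the replica is ready, and that public access is enabled. The names `http_probe` and `wait_until_ready` are hypothetical helpers, not OctoAI SDK calls.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and HTTP errors all mean "not ready yet"
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 900.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `probe` repeatedly until it succeeds or `timeout_s` has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example (commented out because it needs a live endpoint):
# ready = wait_until_ready(lambda: http_probe("<your-endpoint-url>/healthcheck"))
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.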

You can ramp your endpoint down at any time by clicking on the `Pause endpoint` button.

***Last but not least, save the `Endpoint URL` that's displayed in the `dockercon23-sdxl` model endpoint Info view. We'll use it in the next step, and when you launch your Discord bot***

![OctoAICreated](https://raw.githubusercontent.com/vegaluisjose/blob/main/octoai_sdxl_created.png)

You can go to Section E to experiment with SDXL styles while you wait for your endpoint to warm up, as it doesn't require the SDXL OctoAI endpoint to be up and running.

## E. Play with SDXL Styles!

While you wait for steps B and D, which each take a while, you can experiment with SDXL styles.

The key to getting beautiful, stylized images with SDXL is to provide the right prompt and negative prompt. We refer to this process as "prompt engineering".

For inspiration on the kinds of images you can generate with SDXL, browse this image gallery: https://moby-dock.vercel.app/

SDXL style prompts are available from this open-source project:
* https://github.com/twri/sdxl_prompt_styler/blob/main/sdxl_styles_sai.json
* https://github.com/twri/sdxl_prompt_styler/blob/main/sdxl_styles_twri.json
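
Each entry in those files pairs a `name` with a `prompt` template containing a `{prompt}` placeholder and a `negative_prompt`. The sketch below looks up a style by name and applies it to a prompt. The two entries are copied from the styles used in this notebook and combined into one inline excerpt for convenience; `find_style` and `apply_style` are hypothetical helper names.

```python
import json

# Inline excerpt standing in for the downloaded style files
STYLES_JSON = """
[
  {
    "name": "sai-cinematic",
    "prompt": "cinematic film still {prompt} . shallow depth of field, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy",
    "negative_prompt": "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
  },
  {
    "name": "game-retro arcade",
    "prompt": "retro arcade style {prompt} . 8-bit, pixelated, vibrant, classic video game, old school gaming, reminiscent of 80s and 90s arcade games",
    "negative_prompt": "modern, ultra-high resolution, photorealistic, 3D"
  }
]
"""


def find_style(styles: list, name: str) -> dict:
    """Return the style entry whose "name" matches, or raise KeyError."""
    for style in styles:
        if style.get("name") == name:
            return style
    raise KeyError("unknown style: {}".format(name))


def apply_style(style: dict, prompt: str) -> dict:
    """Substitute the user prompt into the style's prompt template."""
    return {
        "prompt": style["prompt"].replace("{prompt}", prompt),
        "negative_prompt": style.get("negative_prompt", ""),
    }


styles = json.loads(STYLES_JSON)
styled = apply_style(find_style(styles, "game-retro arcade"), "a photo of an octopus playing chess")
print(styled["prompt"])
```
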

To apply one of these styles, use the code below to perform some prompt engineering, and try out different styles!

In [None]:
# The original SDXL prompt
prompt = "a photo of an octopus playing chess"

# We copy the style entry for the game retro arcade from https://github.com/twri/sdxl_prompt_styler/blob/0664f1e378661888bc0f0fc101c98a1a696e658e/sdxl_styles_twri.json#L222-L226
sdxl_style = {
    "name": "game-retro arcade",
    "prompt": "retro arcade style {prompt} . 8-bit, pixelated, vibrant, classic video game, old school gaming, reminiscent of 80s and 90s arcade games",
    "negative_prompt": "modern, ultra-high resolution, photorealistic, 3D"
}

# Let's go ahead and apply the style to our SDXL payload
SDXL_payload = {
    "prompt": sdxl_style["prompt"].replace("{prompt}", prompt),
    "negative_prompt": sdxl_style["negative_prompt"],
    "num_inference_steps": 20,
    "guidance_scale": 7.5,
    "width": 1024,
    "height": 1024,
    "seed": 1
}

# Run inference on the OctoAI SDXL model container running locally
output = client.infer(
    endpoint_url="http://localhost:8080/predict",
    inputs=SDXL_payload
)

# Get the base64 encoded image string
image_string = output["completion"]["image"]

# Convert to a PIL image
sdxl_image = decode_image(image_string)

# Display your masterpiece!
display(sdxl_image)

## E. Test your SDXL container served on an OctoAI endpoint
In this step, we'll test the SDXL container in the exact same way as we did when we ran the container locally on the AWS dev instance, except that now we'll be sending a POST request to a remote endpoint.

You'll need to change the SDXL endpoint URL from `http://localhost:8080` to your unique endpoint URL.
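
A pasted URL often carries a trailing slash or stray whitespace, so it can help to normalize it before appending `/predict`. The helper below is a sketch; `predict_url` is a hypothetical name and the example hostname is a placeholder, not a real endpoint.

```python
from urllib.parse import urlsplit


def predict_url(endpoint_url: str, route: str = "predict") -> str:
    """Normalize an endpoint URL and append the inference route."""
    base = endpoint_url.strip().rstrip("/")
    parts = urlsplit(base)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("not a valid endpoint URL: {!r}".format(endpoint_url))
    if parts.hostname in ("localhost", "127.0.0.1"):
        # Same intent as the assert in the cell below: catch a forgotten edit
        raise ValueError("still pointing at the local container")
    return "{}/{}".format(base, route.lstrip("/"))


# Placeholder hostname for illustration only
url = predict_url(" https://my-sdxl-endpoint.example.com/ ")
print(url)
```
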

In [None]:
# FIXME: Replace "http://localhost:8080" with your OctoAI SDXL endpoint URL below
sdxl_endpoint_url = "http://localhost:8080"
# Make sure you've overwritten the URL!!!
assert sdxl_endpoint_url != "http://localhost:8080"

# Run inference on the SDXL model container served on your OctoAI endpoint
output = client.infer(
    endpoint_url="{}/predict".format(sdxl_endpoint_url),
    inputs=SDXL_payload
)

# Get the base64 encoded image string
image_string = output["completion"]["image"]

# Convert to a PIL image
sdxl_image = decode_image(image_string)

# Display your masterpiece!
display(sdxl_image)

## F. Test the CLIP Interrogator Model on the Docker Logo
Now that we've tested that our SDXL model endpoint works, we'll proceed to testing the other models used in the "Baseline OctoShop" pipeline. Let's start with the CLIP Interrogator model.

This model takes in an image, and produces a text-based description of the image. Think of it as reverse Stable Diffusion, which takes in text and produces an image.

Note that for this DockerCon23 workshop, we've pre-allocated a CLIP Interrogator endpoint pool available at the following URL: https://dockercon23-clip-4jkxk521l3v1.octoai.run
You don't need to do anything!

**If you try this tutorial after October 3rd 2023**, this CLIP endpoint will have been taken down. You can still create and manage your own by going to https://octoai.cloud/templates
* Under the list of Example Models, select `Image captioning (CLIP)`, and click on `Clone`
* Name your endpoint, set Min replicas and Max replicas to 1, and Enable public access with the toggle
* You can then launch your endpoint by hitting `Clone`
* Once the endpoint is running, you can copy the URL of the OctoAI endpoint, and set `clip_endpoint_url` to it below

![OctoAICreated](https://raw.githubusercontent.com/vegaluisjose/blob/main/octoai_clip.png)

We'll test the CLIP Interrogator in an image-to-text and text-to-image workflow using SDXL to see what we end up with! Let's try this out.
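
The image-to-text half of that workflow boils down to two small steps: base64-encode the image into a request, then keep the first comma-separated label from the response. The sketch below shows both with the standard library only. It assumes `image_bytes` already holds an encoded image file such as a PNG; the helper names and the stand-in response text are invented for illustration.

```python
from base64 import b64encode


def make_clip_request(image_bytes: bytes, mode: str = "fast") -> dict:
    """Build a CLIP Interrogator request from encoded image bytes."""
    return {"mode": mode, "image": b64encode(image_bytes).decode("utf-8")}


def first_label(response: dict) -> str:
    """Keep only the first comma-separated label to keep the prompt simple."""
    labels = response["completion"]["labels"]
    return labels.split(",")[0].strip()


request = make_clip_request(b"\x89PNG\r\n\x1a\nfake-pixels")

# Stand-in response text; a real endpoint will return different labels
fake_response = {"completion": {"labels": "a whale carrying containers, logo, flat design"}}
print(first_label(fake_response))
```
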

In [None]:
# We provide a CLIP Interrogator endpoint for you, but with OctoAI you can launch your own by using one of OctoAI's many model templates
clip_endpoint_url = "https://dockercon23-clip-4jkxk521l3v1.octoai.run"

# Let's grab the Docker logo
r = requests.get('https://raw.githubusercontent.com/vegaluisjose/blob/main/docker.jpeg')
image = Image.open(BytesIO(r.content))

# Display the Docker logo
display(image)

In [None]:
# The CLIP interrogator request is simple
# We use CLIP interrogator's fast mode to get a response quickly
clip_request = {
    "mode": "fast",
    "image": encode_image(image),
}

# Run inference on the CLIP model container served on the OctoAI endpoint
output = client.infer(
    endpoint_url="{}/predict".format(clip_endpoint_url),
    inputs=clip_request
)

# Get the labels from the output dictionary
clip_labels = output["completion"]["labels"]

# Print the CLIP labels
print("Here is what CLIP interrogator sees: {}".format(clip_labels))

In [None]:
# Let's just keep the first CLIP comma-separated answer here to keep the prompt simple
clip_labels = clip_labels.split(',')[0]
print("Shortened CLIP interrogator labels: {}".format(clip_labels))

In [None]:
# Next let's try to generate a cinematic SDXL image out of those CLIP labels!

# Let's use a cinematic style: https://github.com/twri/sdxl_prompt_styler/blob/0664f1e378661888bc0f0fc101c98a1a696e658e/sdxl_styles_sai.json#L17-L21
sdxl_style = {
    "name": "sai-cinematic",
    "prompt": "cinematic film still {prompt} . shallow depth of field, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy",
    "negative_prompt": "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
}

# Update the SDXL prompts
SDXL_payload["prompt"] = sdxl_style["prompt"].replace("{prompt}", clip_labels)
SDXL_payload["negative_prompt"] = sdxl_style["negative_prompt"]

# Run inference on the SDXL model container served on your OctoAI endpoint
output = client.infer(
    endpoint_url="{}/predict".format(sdxl_endpoint_url),
    inputs=SDXL_payload
)

# Get the base64 encoded image string
image_string = output["completion"]["image"]

# Convert to a PIL image
sdxl_image = decode_image(image_string)

# Display your masterpiece!
display(sdxl_image)

## G. Using LLMs to manipulate the Docker Logo based on a User Prompt
In section F, we turned the Docker logo into a hyperrealistic photo by using a CLIP Interrogator model to obtain a text-based description of the logo, then fed that text into an SDXL model to obtain a hyperrealistic version of the logo.

In this section we'll use an LLM, specifically LLAMA2-7B, to alter the image we're generating even more. We'll start from a user prompt, which asks to set the image in outer space. We'll feed that prompt and the CLIP labels into the LLM to obtain a richer description of our whale in space, which we then feed into SDXL.
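
Combining the user prompt with the CLIP labels can be sketched as a small function. The instruction template is the one used later in this notebook; `build_llama_prompt` is a hypothetical name, and the label text in the example is invented.

```python
def build_llama_prompt(user_prompt: str, clip_labels: str) -> str:
    """Merge the user's request and the CLIP description into one instruction."""
    return "### Instruction: In a single sentence, {}: {}\n### Response:".format(
        user_prompt.strip(), clip_labels.strip()
    )


# Invented label text, standing in for real CLIP Interrogator output
llama_prompt_example = build_llama_prompt("set in outer space", "a whale carrying containers ")
print(llama_prompt_example)
```
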

Here again, for this DockerCon23 workshop, we've pre-allocated a LLAMA2 model endpoint pool available at the following URL: https://dockercon23-llama2-4jkxk521l3v1.octoai.run
You don't need to do anything!

**If you try this tutorial after October 3rd 2023**, however, this LLAMA2 endpoint will have been taken down. You can still create and manage your own by going to https://octoai.cloud/templates
* Under the list of Example Models, click on `LLama 2 7B Chat`
* Then you can hit the big `Clone Template` button
* Once the endpoint is running, you can copy the URL of the OctoAI endpoint, and set `llama2_endpoint_url` to it below

![OctoAICreated](https://raw.githubusercontent.com/vegaluisjose/blob/main/octoai_llama2.png)

Let's go ahead and learn how to use the LLAMA2 model!
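
The endpoint is called with a chat-style request and answers with a list of `choices`. The sketch below builds such a request and reads the generated text defensively. The field names mirror the cells that follow; `make_chat_request`, `extract_text`, and the stand-in response are assumptions for illustration.

```python
def make_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build a non-streaming chat request for the LLAMA2 endpoint."""
    return {
        "model": "llama-2-7b-chat",
        "messages": [
            {
                "role": "system",
                "content": "Below is an instruction that describes a task. "
                "Write a response that appropriately completes the request.",
            },
            {"role": "user", "content": prompt},
        ],
        "stream": False,
        "max_tokens": max_tokens,
    }


def extract_text(response: dict) -> str:
    """Return the first choice's message content, or raise if it is missing."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"].strip()


chat_request = make_chat_request("### Instruction: ...\n### Response:")

# Stand-in response with invented text
fake_chat_response = {
    "choices": [{"message": {"role": "assistant", "content": " A whale drifts among the stars. "}}]
}
print(extract_text(fake_chat_response))
```
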

In [None]:
# We provide a LLAMA2 endpoint for you, but with OctoAI you can launch your own by using one of OctoAI's many model templates
llama2_endpoint_url = "https://dockercon23-llama2-4jkxk521l3v1.octoai.run/v1/chat/completions"

# Let's start with the user prompt which we'll set as follows
user_prompt = "set in outer space"

# Now let's engineer a prompt that applies the user prompt to what CLIP Interrogator sees
llama_prompt = "\
### Instruction: In a single sentence, {}: {}\n\
### Response:".format(user_prompt, clip_labels)

# Let's print the LLAMA2 prompt before we feed it into LLAMA2
print("LLAMA2 prompt:\n{}".format(llama_prompt))

In [None]:
# Now let's prepare the request that describes the task for our LLAMA2 model
# You can leave the parameters below as-is.
llama_inputs = {
    "model": "llama-2-7b-chat",
    "messages": [
        {
            "role": "system",
            "content": "Below is an instruction that describes a task. Write a response that appropriately completes the request."
        },
        {
            "role": "user",
            "content": "{}".format(llama_prompt)
        }
    ],
    "stream": False,
    "max_tokens": 256
}

# Send the request to the LLAMA2 endpoint; with "stream" set to False, the full response arrives at once
outputs = client.infer(endpoint_url=llama2_endpoint_url, inputs=llama_inputs)

# Get the LLAMA2 output
llama2_text = outputs.get('choices')[0].get("message").get('content')

# Print the LLAMA2 story
print("LLAMA2 generated text below:\n{}".format(llama2_text))

In [None]:
# Next let's feed this LLAMA2 generated story into SDXL
SDXL_payload["prompt"] = sdxl_style["prompt"].replace("{prompt}", llama2_text)

# Run inference on the SDXL model container served on your OctoAI endpoint
output = client.infer(
    endpoint_url="{}/predict".format(sdxl_endpoint_url),
    inputs=SDXL_payload
)

# Get the base64 encoded image string
image_string = output["completion"]["image"]

# Convert to a PIL image
sdxl_image = decode_image(image_string)

# Display your masterpiece!
display(sdxl_image)

## H. Let's recap the entire Baseline OctoShop workflow now

Now that we've introduced each model consecutively, let's tie it all together into the OctoShop workflow.

1. The user provides an image as input (the Docker logo) and a prompt ("set in outer space").

2. The image goes through the CLIP Interrogator, which produces a text description of the image.

3. The user prompt is fed along with the CLIP Interrogator description into LLAMA2, which describes a new scene.

4. That textual description of the scene is fed into SDXL to generate a brand new photo of the Docker logo set in space!
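
The four steps above can be sketched as a composition of three callables, one per model. In this sketch the models are replaced by stand-in functions so the data flow is visible without any endpoint. `run_pipeline` and its parameter names are hypothetical, and the stub outputs are invented.

```python
from typing import Any, Callable, Tuple


def run_pipeline(
    image: Any,
    user_prompt: str,
    style: dict,
    caption: Callable[[Any], str],
    rewrite: Callable[[str], str],
    render: Callable[[str, str], Any],
) -> Tuple[Any, str, str]:
    """Chain image captioning, LLM rewriting and image generation."""
    # Step 2: describe the image, keeping the first comma-separated label
    labels = caption(image).split(",")[0].strip()
    # Step 3: ask the LLM for a new scene built from the prompt and the label
    scene = rewrite("In a single sentence, {}: {}".format(user_prompt, labels))
    # Step 4: apply the style template and generate the picture
    prompt = style["prompt"].replace("{prompt}", scene)
    picture = render(prompt, style["negative_prompt"])
    return picture, labels, scene


# Stand-ins for the CLIP, LLAMA2 and SDXL endpoints
picture, labels, scene = run_pipeline(
    image=b"fake-image",
    user_prompt="set in outer space",
    style={"prompt": "cinematic film still {prompt} . film grain", "negative_prompt": "cartoon"},
    caption=lambda img: "a whale logo, blue background",
    rewrite=lambda instruction: "A whale logo drifting among the stars.",
    render=lambda prompt, negative: {"prompt": prompt, "negative_prompt": negative},
)
print(labels, "|", scene)
```

Passing the models in as callables keeps each stage swappable, which is handy when you want to try a different captioner or style without touching the rest.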

We've provided the whole OctoShop flow below for you to play around with the Generative AI workflow!
* Try changing the URL of the input image to a different image!
* Try changing the user prompt to a different prompt!
* Try changing the user style to a different style!
* Try changing any combination of the above at the same time!
* And feel free to tweak the various settings to familiarize yourself a bit more with the different models that are being invoked.
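
One way to organize such experiments is to enumerate every combination of user prompt and style up front, then loop over the result. This is only a sketch: the second user prompt is an invented example, and the style names refer to the entries shown earlier in this notebook.

```python
from itertools import product

user_prompts = ["set in outer space", "set underwater"]  # second prompt is an invented example
style_names = ["sai-cinematic", "game-retro arcade"]

# Every (prompt, style) pair becomes one experiment to run through the pipeline
experiments = [
    {"user_prompt": user_prompt, "style_name": style_name}
    for user_prompt, style_name in product(user_prompts, style_names)
]

for experiment in experiments:
    print(experiment)
```
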

In [None]:
# Let's define the OctoShop as a self-contained function
def octoshop(image: Image.Image, user_prompt: str, user_style: dict) -> tuple:

    # OctoAI endpoint URLs
    clip_endpoint_url = "https://dockercon23-clip-4jkxk521l3v1.octoai.run"
    llama2_endpoint_url = "https://dockercon23-llama2-4jkxk521l3v1.octoai.run/v1/chat/completions"
    sdxl_endpoint_url = "http://localhost:8080" # ADD_YOUR_SDXL_ENDPOINT_URL_HERE
    assert sdxl_endpoint_url != "http://localhost:8080"

    # STEP 1
    # Feed that image into CLIP interrogator
    clip_request = {
        "mode": "fast",
        "image": encode_image(image),
    }
    output = client.infer(
        endpoint_url="{}/predict".format(clip_endpoint_url),
        inputs=clip_request
    )
    clip_labels = output["completion"]["labels"]
    clip_labels = clip_labels.split(',')[0]

    # STEP 2
    # Feed that CLIP label and the user prompt into a LLAMA model
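    # Build the prompt without the indentation whitespace, and keep the newline before "### Response:"
    llama_prompt = (
        "### Instruction: In a single sentence, {}: {}\n"
        "### Response:"
    ).format(user_prompt, clip_labels)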
    llama_inputs = {
        "model": "llama-2-7b-chat",
        "messages": [
            {
                "role": "system",
                "content": "Below is an instruction that describes a task. Write a response that appropriately completes the request."
            },
            {
                "role": "user",
                "content": "{}".format(llama_prompt)
            }
        ],
        "stream": False,
        "max_tokens": 256
    }
    outputs = client.infer(endpoint_url=llama2_endpoint_url, inputs=llama_inputs)
    llama2_text = outputs.get('choices')[0].get("message").get('content')

    # STEP 3
    # Feed the LLAMA2 text into the SDXL model
    SDXL_payload = {
        "prompt": user_style["prompt"].replace("{prompt}", llama2_text),
        "negative_prompt": user_style["negative_prompt"],
        "num_inference_steps": 20,
        "guidance_scale": 7.5,
        "width": 1024,
        "height": 1024,
        "seed": 1
    }
    # Run inference on the SDXL model container served on your OctoAI endpoint
    output = client.infer(
        endpoint_url="{}/predict".format(sdxl_endpoint_url),
        inputs=SDXL_payload
    )
    image_string = output["completion"]["image"]
    sdxl_image = decode_image(image_string)

    return sdxl_image, clip_labels, llama2_text

In [None]:
# Set the image URL
image_url = 'https://raw.githubusercontent.com/vegaluisjose/blob/main/docker.jpeg'

# Download the input image and load it as a PIL image
r = requests.get(image_url)
image = Image.open(BytesIO(r.content))

# Set the user prompt
user_prompt = "set in outer space"

# Set the style of SDXL
user_style = {
    "name": "sai-cinematic",
    "prompt": "cinematic film still {prompt} . shallow depth of field, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy",
    "negative_prompt": "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
}

# Invoke OctoShop
sdxl_image, clip_labels, llama2_output = octoshop(image, user_prompt, user_style)

# Display the image
print("CLIP Interrogator output: {}".format(clip_labels))
print("LLAMA2 output: {}".format(llama2_output))
display(sdxl_image)