# Build, test and deploy a Stable Diffusion 2 endpoint in Paperspace

This notebook is a walkthrough of how to use the simple `api-deployment` repo to create a Docker image with a Stable Diffusion 2 endpoint which can be deployed locally and then in Paperspace. 

The container created using the cloned repository is also available in the Graphcore public registry on Docker Hub and can be used to directly launch a deployment locally or on Paperspace by skipping to Step 2 or 3 respectively. The minor changes needed to access and run the publicly available image, rather than the locally built image from Step 1, will be outlined for each of these steps.

Here, we'll cover:
* Cloning and running up the FastAPI service from this notebook to create a locally hosted endpoint.
* How to access and send requests to the endpoint and receive model output.
* How to build a container to deploy the endpoint with Paperspace deployments, and access the model.

The public model inference images available on Graphcore's Docker Hub have all of the necessary dependencies 'baked in', including executables and model binaries, to make serving an endpoint as smooth as possible. The internals of the image are based on the model-serving architecture of the [api-deployment](https://github.com/graphcore/api-deployment) repository. This is designed to be a straightforward example of serving a model with FastAPI and running a local endpoint. Once you've tested your local endpoint functionality, you can use the same container to launch a deployment in [Paperspace](https://www.paperspace.com/)!

First, install all required dependencies for this notebook:

In [None]:
# Buildah installation: https://fabianlee.org/2022/08/02/buildah-installing-buildah-and-podman-on-ubuntu-20-04/
! chmod +x setup.sh
! ./setup.sh > /dev/null
! buildah version

! pip install gradient
! pip install gradio
! pip install matplotlib

## Clone the repo

First, clone the repository containing Stable Diffusion and the files for serving.

In [1]:
! git clone https://github.com/graphcore/stable-diffusion

Cloning into 'stable-diffusion'...
remote: Enumerating objects: 176, done.
remote: Counting objects: 100% (176/176), done.
remote: Compressing objects: 100% (115/115), done.
remote: Total 176 (delta 86), reused 124 (delta 50), pack-reused 0
Receiving objects: 100% (176/176), 636.90 KiB | 1.72 MiB/s, done.
Resolving deltas: 100% (86/86), done.


The repo contains the basic necessities for serving an endpoint using Stable Diffusion. The `Dockerfile` specifies the requirements and defines how the Docker container is built and run. `src` contains the model and its model-specific requirements, the FastAPI-based endpoint for the model, and the model-independent files for creating the server. `run_server.sh` is a script that launches the server; it can be run directly, and it also runs automatically as part of the Docker container.

In [4]:
! cd stable-diffusion && ls -l

total 56
-rw-r--r-- 1 arsalanu all   576 May  9 11:15 docker-compose.yml
-rw-r--r-- 1 arsalanu all   512 May  9 11:15 Dockerfile
-rw-r--r-- 1 arsalanu all  1066 May  9 11:15 LICENSE
-rw-r--r-- 1 arsalanu all   578 May  9 11:15 ps-deploy-config.yaml
-rw-r--r-- 1 arsalanu all 10335 May  9 11:15 README.md
-rw-r--r-- 1 arsalanu all    40 May  9 11:15 requirements.txt
-rw-r--r-- 1 arsalanu all 19617 May  9 11:15 running_local_endpoint.ipynb
-rwxr-xr-x 1 arsalanu all  2378 May  9 11:15 run_server.sh
drwxr-xr-x 4 arsalanu all   112 May  9 11:15 src
drwxr-xr-x 2 arsalanu all   146 May  9 11:15 utils


## Run a local endpoint

While building the container with all features/executables baked in is needed for launching a public Paperspace deployment, it is not a necessary step if you want to test and run the endpoint locally, as we can do this directly from the repo using the `run_server.sh` script. 

First, install all of the dependencies for serving the model, as well as for the model itself.

In [None]:
! cd stable-diffusion && pip install -r requirements.txt && pip install -r src/models/stable_diffusion_2_txt2img_512/requirements.txt

The terminal output when running the server is 'endless' and would block the cell from finishing, so for the purposes of this notebook we run the server as a background process. Once this command is run, the endpoint server starts warming up, performing any preparation required to use the endpoint, such as building the model executables or creating and loading any other required binaries.

In [None]:
import os
_STORE_SYSTEM = get_ipython().system
get_ipython().system = os.system # Allows running server processes in background in notebook

! cd stable-diffusion && ./run_server.sh &

get_ipython().system = _STORE_SYSTEM # Revert back to original iPython shell after starting server

Before sending any requests to the endpoint, we need to wait for the server to be ready. We can wait for the built-in server health-check feature to return a positive status using a simple looping function. For Stable Diffusion, this step may take up to a few minutes. First, import the necessary packages for the function:

In [None]:
import requests
import json
import random
import time

Then we can instantiate our simple function which waits for the readiness status:

In [None]:
def wait_for_readiness(url, poll_interval=2):
    """Poll the server's /readiness route until it reports success."""
    while True:
        try:
            message = requests.get(f"{url}/readiness").json()['message']
            if message == 'Readiness check succeeded.':
                print(f"Server ready - {message}")
                return True
            print(f"Server waiting - {message}")
        except Exception:
            # Server not reachable yet, or the response was not the expected JSON
            pass
        time.sleep(poll_interval)
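
The loop above never gives up, so a server that fails to start leaves the cell running indefinitely. A minimal sketch of a bounded variant follows. The names `wait_until_ready`, `check`, `timeout_s` and `interval_s` are hypothetical and not part of the repo; `check` is any zero-argument callable that returns the readiness message, which in this notebook could be `lambda: requests.get(f"{url}/readiness").json()["message"]`.

```python
import time


def wait_until_ready(check, timeout_s=600.0, interval_s=2.0,
                     ready_message="Readiness check succeeded."):
    """Poll `check()` until it returns `ready_message` or the timeout expires.

    `check` is a hypothetical zero-argument callable that returns the server's
    readiness message and may raise while the server is still starting.
    Returns True when ready, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if check() == ready_message:
                return True
        except Exception:
            pass  # Server not reachable yet; treat as "not ready"
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Returning `False` rather than raising lets the calling cell decide whether to retry, inspect the server logs, or stop.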

Next, we time and call the function:

In [None]:
print("Waiting for readiness...")

warmup_start = time.perf_counter()
ready = wait_for_readiness("http://0.0.0.0:8100")

print(f"Warm up time: {time.perf_counter() - warmup_start}s")

The message should say 'Readiness check succeeded', which means we are ready to start generating images with the model using the live endpoint.

Let's create a dictionary of parameters to send to the model. These are specific to, and defined by, the model endpoint that has been created. For Stable Diffusion, we must pass:

* `prompt`: Main body of text describing the image we want to create.
* `random_seed`: Fixing this value gives a deterministic image output from the same prompt each time (we set it randomly to observe variation in the image).
* `guidance_scale`: Specific to Stable Diffusion, it controls how strongly the generated image follows the text prompt.
* `return_json`: Defines whether to return a JSON object in the response. To receive an encoded image, we set this to `True`.
* `negative_prompt`: Defines any aspects we don't want to see in the image.
* `num_inference_steps`: The number of sampling steps undertaken by the model. Increasing this, up to a point, should improve the quality of the generated image; 25-50 steps is a reasonable range.

In [None]:
model_params = {
      "prompt": "big red dog",
      "random_seed": random.randint(0,99999999),
      "guidance_scale": 9,
      "return_json": True,
      "negative_prompt": "string",
      "num_inference_steps": 25
}
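
As a sketch, the same dictionary can be assembled by a small helper that fills in defaults and rejects obviously invalid values before a request is sent. The function name, the default values (including the empty `negative_prompt`) and the validation rules are assumptions made for illustration; the endpoint itself defines what it accepts.

```python
import random


def build_sd2_params(prompt, negative_prompt="", guidance_scale=9,
                     num_inference_steps=25, random_seed=None):
    """Build the request body described above (hypothetical helper).

    The checks below are illustrative assumptions, not the endpoint's own rules.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    if random_seed is None:
        # A fresh seed per call gives a different image for the same prompt
        random_seed = random.randint(0, 99999999)
    return {
        "prompt": prompt,
        "random_seed": random_seed,
        "guidance_scale": guidance_scale,
        "return_json": True,
        "negative_prompt": negative_prompt,
        "num_inference_steps": num_inference_steps,
    }
```

Passing an explicit `random_seed` makes a result reproducible, while omitting it keeps the random behaviour used in the cell above.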

Next, we can use `requests` to send a POST call to the REST endpoint at the IP address that the endpoint is running on. This will return an image in the response JSON body.

In [None]:
response = requests.post("http://0.0.0.0:8100/stable_diffusion_2_txt2img_512", json=model_params)

# Raise on a 4xx/5xx response rather than continuing with an error body
response.raise_for_status()

response = response.json()

The image is returned in Base64-encoded form within the JSON. We can decode it using the `base64` and `io` libraries in order to visualise it. First, we decode the images returned by the model and open them as PIL images; in this case there is only one image.

In [None]:
from PIL import Image
import base64
import io

# The response holds a list of Base64-encoded image strings
images_b64 = response['images']

pil_images = []
for b64_img in images_b64:
    # Decode to raw bytes, then let PIL read them from an in-memory buffer
    image_bytes = base64.b64decode(b64_img)
    img = Image.open(io.BytesIO(image_bytes))
    pil_images.append(img)

print("Number of images returned: ", len(pil_images))
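
If you want to keep the generated images, the decoded bytes can also be written straight to disk. The sketch below is a hypothetical helper, not part of the repo: it guesses the file extension from the well-known PNG and JPEG file signatures and falls back to `.bin` for anything else, since the notebook does not state which format the endpoint returns.

```python
import base64
from pathlib import Path


def save_b64_images(images_b64, out_dir="outputs", stem="image"):
    """Decode Base64 image strings and write each one to `out_dir`.

    Hypothetical helper: the extension is guessed from the file signature.
    Returns the list of written paths.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, b64_img in enumerate(images_b64):
        data = base64.b64decode(b64_img)
        if data.startswith(b"\x89PNG\r\n\x1a\n"):
            suffix = ".png"
        elif data.startswith(b"\xff\xd8\xff"):
            suffix = ".jpg"
        else:
            suffix = ".bin"  # Unknown format: keep the raw bytes
        target = out_path / f"{stem}_{index}{suffix}"
        target.write_bytes(data)
        written.append(target)
    return written
```

In this notebook it could be called as `save_b64_images(response['images'])` once the response JSON has been parsed.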

Finally, we can view the images with `matplotlib`:

In [None]:
import matplotlib.pyplot as plt

plt.axis('off')
plt.imshow(pil_images[0])
plt.show()

## Deploy on Paperspace

To deploy on Paperspace, we need a container image that includes the server files and the model we used previously to run the local endpoint. The container image for this example is already available in the Graphcore public registry and can be directly defined in the Paperspace deployment configuration, which will pull and run the image.

Alternatively, you can build the container and push it to your own Docker Hub registry. The address of that container can then be used in the deployment configuration instead, and the deployment will run using your container.

### (Optional) Build and upload the image manually

In a local workspace, we can build the container by running the following from the root directory of the repository:
```
docker build -t <local_container_name> .
```

As this notebook itself runs inside a container on a Paperspace notebook VM, it is preferable to use an alternative container manager that can build the image with the available user privileges. For this purpose, we can use `buildah` and `podman` to run Docker-equivalent commands. To build the container, we use `buildah bud` rather than `docker build`:

In [None]:
! buildah bud -t local-sd2-endpoint stable-diffusion/

Next, tag the image with your Docker Hub username and the name under which to upload the image.

In [None]:
username = input("Enter your Docker Hub username: ")

In [None]:
container_name = input("Enter an image name for the container to be uploaded to your Docker Hub registry: ")

In [None]:
! buildah tag local-sd2-endpoint $username/$container_name

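Finally, push the built image to your personal Docker Hub registry. This requires being logged in to Docker Hub, for example with `buildah login docker.io`.

In [None]:
! buildah push $username/$container_name docker://docker.io/$username/$container_name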

Now that we have tested the deployment using our local endpoint, we can create a full deployment on IPUs in Paperspace. This stage requires a built container serving an endpoint, as created in the previous steps. The key step is to create a Paperspace deployment specification (a `.yaml` file) containing the information needed to launch the deployment. Our `stable-diffusion` repo includes a ready-to-go specification, `ps-deploy-config.yaml`, which can be used to generate a deployment. Let's have a look at the contents of this spec:

```
enabled: true
image: gcapidev/stable-diffusion-2-512-deployment
port: 8100
env:
  - name: SERVER_MODELS
    value: '[{"model":"stable_diffusion_2_txt2img_512", "replicas":"2"}]'
  - name: POPTORCH_CACHE_DIR
    value: /src/model_cache
  - name: HUGGINGFACE_HUB_CACHE
    value: /src/model_cache
  - name: HF_HOME
    value: /src/model_cache
resources:
  replicas: 1
  instanceType: Bow-POD16
  autoscaling:
    enabled: true
    maxReplicas: 2
    metrics:
      - metric: requestDuration
        summary: average
        value: 2
healthChecks:
  readiness:
    path: /readiness
```

The key things to specify in the file are:
* `image`: The address of the container to be deployed from Docker Hub, in the format `<username>/<container_name>`. For example, if deploying from the Graphcore public registry, the username will be `graphcore`.
* `SERVER_MODELS` environment variable:
    * `"model"`: A container may have multiple model endpoints within it. This field specifies which of the models to start the deployment with.
    * `"replicas"`: The maximum number of replicas of the model that can be created within a single machine when under load (internal autoscaling). For example, if the model uses 8 IPUs and you are deploying on a number of IPU-POD16s, each POD16 can launch up to 2 instances of the model if `"replicas"` is set to 2. This allows for internal scaling (within a machine) in addition to the existing external scaling (over multiple machines).
* `replicas` in `resources`: The number of IPU machines the endpoint is replicated over.
* `instanceType`: Which IPU machine to use. Available options include IPU-POD4, IPU-POD16 and Bow-POD16.
* `autoscaling`: Sets the maximum number of replicas to scale over *externally*, and the metrics used to decide when to scale. In the spec above, the number of replicas is increased if the average request duration exceeds 2 seconds.
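
To make the two levels of scaling concrete, the sketch below computes an upper bound on the number of model instances running at once. The function and argument names are hypothetical. The machine and replica counts come from the spec above, while the figure of 8 IPUs per model is the illustrative example from the list, not a stated property of this model.

```python
def max_model_instances(machines, replicas_per_machine, ipus_per_machine, ipus_per_model):
    """Upper bound on model instances across a deployment (illustrative)."""
    # Internal scaling cannot exceed what fits on one machine's IPUs
    fit_per_machine = ipus_per_machine // ipus_per_model
    per_machine = min(replicas_per_machine, fit_per_machine)
    return machines * per_machine


# maxReplicas: 2 machines, "replicas": "2", a 16-IPU POD16, and an assumed 8 IPUs per model
print(max_model_instances(2, 2, 16, 8))  # 4
```

External scaling multiplies the number of machines, while internal scaling is capped by how many copies of the model fit on one machine.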

**NOTE: By default, this notebook uses the Graphcore public registry image for Stable Diffusion. If you would like to deploy the custom image uploaded to your personal registry, open `ps-deploy-config.yaml` in the cloned repository from the folder tree on the left and modify the `image` field to point to your container.**
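
The `image` field can also be changed programmatically rather than by hand. This is a minimal sketch, assuming the spec keeps `image:` as an unindented top-level key as in the listing above. `set_spec_image` is a hypothetical helper, not part of the repo or the Gradient CLI.

```python
import re


def set_spec_image(spec_text, image):
    """Return `spec_text` with its top-level `image:` value replaced.

    Hypothetical helper; it only touches an unindented `image:` line.
    """
    new_text, count = re.subn(
        r"(?m)^image:.*$",
        lambda match: f"image: {image}",  # Callable avoids backslash escapes in `image`
        spec_text,
        count=1,
    )
    if count == 0:
        raise ValueError("no top-level 'image:' field found")
    return new_text
```

To apply it, read `stable-diffusion/ps-deploy-config.yaml` as text, pass the contents through `set_spec_image` with `f"{username}/{container_name}"`, and write the result back to the same file.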

### Launch the deployment

Before deploying, ensure you have an active project on the Paperspace console which you will be deploying from. Then, we can use the Gradient CLI to deploy directly from the command line.

In the Paperspace console, generate an API key while logged in to your account from under `Team settings`. Use this API key to log in to your account from the Gradient CLI:

In [5]:
import getpass

TOKEN = getpass.getpass(prompt='Enter your Gradient API key:')

In [4]:
! gradient apiKey $TOKEN

value


Next, check which clusters contain IPU machines: 

In [9]:
! gradient clusters machineTypes list

+-----------+-----------+-----------+--------------+-----------+-----------------+-----------+
| Name      | Kind      | CPU Count | RAM [Bytes]  | GPU Count | GPU Model       | Clusters  |
+-----------+-----------+-----------+--------------+-----------+-----------------+-----------+
| A100      | a100      | 12        | 96636764160  | 1         | Ampere A100     | clg07azjl |
| A100-80G  | a100-80g  | 12        | 96636764160  | 1         | Ampere A100 80G | clg07azjl |
| A4000     | a4000     | 8         | 48318382080  | 1         | Ampere A4000    | clg07azjl |
| A4000x2   | a4000     | 16        | 96636764160  | 2         | Ampere A4000    | clg07azjl |
| A5000     | a5000     | 8         | 48318382080  | 1         | Ampere A5000    | clg07azjl |
| A5000x2   | a5000     | 16        | 96636764160  | 2         | Ampere A5000    | clg07azjl |
| A6000     | a6000     | 8         | 48318382080  | 1         | Ampere A6000    | clg07a

Deploy on Paperspace using `gradient deployments create` with the arguments:
* `--name`: The desired name for your deployment.
* `--projectId`: Your project ID (shown on the project page on Paperspace and also printed when a project is created from the terminal).
* `--spec`: The path to the specification file.
* `--clusterId`: The cluster ID, taken from the `Clusters` column of the machine types listed above.

Let's set these values and create the deployment:

In [15]:
deployment_name = input('Enter a name for your deployment: ')

Enter a name for your deployment: aa


In [16]:
project_id = input('Enter your Paperspace project ID: ')

Enter your Paperspace project ID: jjn


Then launch the deployment with the following command, replacing the `--clusterId` value with the ID of your own IPU cluster.

In [None]:
! gradient deployments create \
    --name $deployment_name \
    --projectId $project_id \
    --spec "./stable-diffusion/ps-deploy-config.yaml" \
    --clusterId "clehbtvty"

This will return a unique deployment ID for your deployment. You can view the spec, URL, and deployment run status with:
```
gradient deployments get --id <your_deployment_id>
```

You can also view current metrics, logs and deployment status in the Paperspace console, inside your created project by switching to the *Deployments* tab and clicking on the running deployment.

The URL is the address you should use to send requests to the endpoint. The process for sending requests is the same as for the local endpoint; simply replace `http://0.0.0.0:8100` with the Gradient deployment URL generated for your endpoint.
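
One way to keep the request code identical for both targets is to build the route from a base URL, as sketched below. `endpoint_url` is a hypothetical helper; the model route is the one used for the local endpoint earlier in this notebook.

```python
def endpoint_url(base_url, model="stable_diffusion_2_txt2img_512"):
    """Join a server base URL and a model route, tolerating a trailing slash."""
    return f"{base_url.rstrip('/')}/{model}"


print(endpoint_url("http://0.0.0.0:8100"))
# http://0.0.0.0:8100/stable_diffusion_2_txt2img_512
```

Passing your deployment URL as `base_url` then yields the address to use in `requests.post`.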

In [17]:
your_deployment_id = input('Enter your deployment ID: ')

Enter your deployment ID: mnj


In [None]:
! gradient deployments get --id $your_deployment_id

You can update the deployment, for example to change the spec or stop the deployment with:
```
gradient deployments update \
    --id <your_deployment_id> \
    --name <your_deployment_name> \
    --projectId <your_project_id> \
    --spec <updated_deployment_config_spec> \
    --clusterId <cluster_id>
```
To stop the deployment, update the spec with the `enabled` value set to `false`.
You can also change the environment variables in the spec by modifying the `SERVER_MODELS` variable.

## (Optional) Create a simple demo frontend for your deployment with **Gradio**

You can create a simple frontend demo for your deployment using Gradio. All you need is the URL of your deployment and a function that processes the Stable Diffusion 2 input parameters, sends the request to the model and decodes the output image. This is the same process we set out for the local endpoint earlier.

This notebook points by default to the locally hosted endpoint. If you would like to run the Gradio app with the launched Paperspace deployment, change the following cell to point to the generated Paperspace deployment URL.

In [None]:
URL = "http://0.0.0.0:8100"

In [None]:
import gradio as gr
import numpy as np

def stable_diffusion_2_inference(prompt, guidance_scale, num_inference_steps):
    model_params = {
      "prompt": prompt,
      "random_seed": random.randint(0,99999999),
      "guidance_scale": guidance_scale,
      "return_json": True,
      "negative_prompt": "string",
      "num_inference_steps": num_inference_steps
    }
    
    response = requests.post(f"{URL}/stable_diffusion_2_txt2img_512", json=model_params)
    response = response.json()
    
    images_b64 = [i for i in response['images']]
    pil_images = []
    for b64_img in images_b64:
        base64bytes = base64.b64decode(b64_img)
        bytesObj = io.BytesIO(base64bytes)
        img = Image.open(bytesObj)

        pil_images.append(img)
    
    return np.array(pil_images[0])

Then, we can initialise the Gradio app to launch a GUI from inside this notebook by defining our inputs, outputs and the processing function:

In [None]:
gr.close_all()
demo = gr.Interface(
    fn=stable_diffusion_2_inference, 
    inputs=[gr.Textbox(value="Ice skating on the moon"),
            gr.Slider(1,50,value=9, step=1, label='Guidance scale'),
            gr.Slider(1,100,value=25, step=1, label='Number of steps')
           ], 
    outputs=gr.Image(shape=(512,512))
    )

demo.launch(share=True)

## Deleting the deployment

Finally, to delete your deployment completely, simply run the next cell:

In [None]:
! gradient deployments delete --id $your_deployment_id