# Finetune Stable Diffusion and generate images with Intel Habana Gaudi
In this tutorial, we will go through how to fine-tune a Stable Diffusion model and generate images with Intel Habana Gaudi (HPU). We will also show how to adapt an existing HPU example to use Ray. Once you have learned how to make the adaptation, you can easily access more models and optimizations that have been developed for HPU by "Ray-ifying" examples from [optimum-habana/examples](https://github.com/huggingface/optimum-habana/tree/main/examples) and [Model References](https://github.com/HabanaAI/Model-References)!

Now, let's see how we can "Ray-ify" this [stable diffusion example](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion).

## Fine-tuning with Textual Inversion
First, let's start with fine-tuning. Download this [script](https://github.com/huggingface/optimum-habana/blob/main/examples/stable-diffusion/training/textual_inversion.py) which fine-tunes a Stable Diffusion model on HPU. You can follow [this document](https://github.com/huggingface/optimum-habana/blob/main/examples/stable-diffusion/training/README.md#textual-inversion) and try it once without using Ray.

In order to run this script on Ray, we need to make some changes. But don't worry, it's actually pretty simple. Basically, we just need to identify the main training loop, and run it in TorchTrainer.

First, check this block at the end of the file:
```python
if __name__ == "__main__":
    main()
```

Originally, this script is launched by MPI when multiple workers are used. With Ray, we instead set up a `TorchTrainer` and supply it with a training function, which is `main()` in this example.

It is therefore straightforward to make these changes:
```python
if __name__ == "__main__":
    import ray
    from ray import train
    from ray.train import ScalingConfig, Checkpoint, CheckpointConfig, RunConfig
    from ray.train.torch import TorchTrainer, TorchConfig

    ray.init(address="auto")

    # Configure computation resources
    # In ScalingConfig, require an HPU for each worker
    scaling_config = ScalingConfig(num_workers=1, resources_per_worker={"CPU": 1, "HPU": 1})
    # Set backend to hccl in TorchConfig
    torch_config = TorchConfig(backend = "hccl")
    # Initialize a Ray TorchTrainer
    trainer = TorchTrainer(
        train_loop_per_worker=main,
        torch_config=torch_config,
        scaling_config=scaling_config,
    )

    result = trainer.fit()
```

Before we try to run it, we need to inspect the `main` function to see whether it can work this way. Skimming through the function, it is clear that it takes no input parameters and instead calls `parse_args` to get all configurations. Originally, these configurations are set on the command line by MPI. Because we now use Ray to start the workers, the command-line arguments are no longer accessible inside them. Therefore, `parse_args` should be called in the main program and its result passed to the `main` function.

Apart from this, no other changes are necessary. By inserting the following code, you can now run the script on Ray.

```python
# Replace lines below:
# def main():
#     args = parse_args()
# with these lines:
def main(config):
    args = config["args"]
```

```python
# Replace lines below:
# if __name__ == "__main__":
#     main()
# with these lines:
if __name__ == "__main__":
    import ray
    from ray import train
    from ray.train import ScalingConfig, Checkpoint, CheckpointConfig, RunConfig
    from ray.train.torch import TorchTrainer, TorchConfig

    ray.init(address="auto")

    # Configure computation resources
    # In ScalingConfig, require an HPU for each worker
    scaling_config = ScalingConfig(num_workers=1, resources_per_worker={"CPU": 1, "HPU": 1})
    # Set backend to hccl in TorchConfig
    torch_config = TorchConfig(backend="hccl")
    # Initialize a Ray TorchTrainer
    trainer = TorchTrainer(
        train_loop_per_worker=main,
        # Parse the command line in the driver and hand the result to each worker
        train_loop_config={"args": parse_args()},
        torch_config=torch_config,
        scaling_config=scaling_config,
    )

    result = trainer.fit()
```
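
To see why handing the parsed arguments to `main` through a config dictionary works, note that an `argparse.Namespace` is a plain picklable object, so it can be sent from the driver to the workers. The following is a minimal sketch using only the standard library. The `parse_args` here is a hypothetical two-flag stand-in for the script's real parser, and `pickle` is used only to illustrate a serialization round trip; it is an assumption for illustration, not a description of Ray's internals.

```python
import argparse
import pickle


def parse_args(argv=None):
    # Hypothetical stand-in for the script's parse_args(); only two flags.
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_dir", default="/tmp/textual_inversion_cat")
    parser.add_argument("--max_train_steps", type=int, default=3000)
    return parser.parse_args(argv)


def main(config):
    # Same shape as the modified main(): read the arguments from the config dict.
    args = config["args"]
    return args.output_dir, args.max_train_steps


# Driver side: parse once and wrap the result in a config dict.
config = {"args": parse_args(["--max_train_steps", "10"])}

# Simulate shipping the config to a worker process with a pickle round trip.
worker_config = pickle.loads(pickle.dumps(config))
result = main(worker_config)
print(result)
```

The worker never touches `sys.argv`; everything it needs arrives in `config`.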

One last thing: remember to use absolute paths in the command-line arguments. The reason is similar to why we moved `parse_args` out: Ray's workers do not share the current working directory. Now you can run the fine-tuning of Stable Diffusion on Ray! Below is an example output:

```bash
python ~/optimum-habana/examples/stable-diffusion/training/textual_inversion.py \
  --pretrained_model_name_or_path runwayml/stable-diffusion-v1-5 \
  --train_data_dir "/root/cat" \
  --learnable_property object \
  --placeholder_token "<cat-toy>" \
  --initializer_token toy \
  --resolution 512 \
  --train_batch_size 4 \
  --max_train_steps 3000 \
  --learning_rate 5.0e-04 \
  --scale_lr \
  --lr_scheduler constant \
  --lr_warmup_steps 0 \
  --output_dir /tmp/textual_inversion_cat \
  --save_as_full_pipeline \
  --gaudi_config_name Habana/stable-diffusion \
  --throughput_warmup_steps 3
```

```
  torch.utils._pytree._register_pytree_node(
  torch.utils._pytree._register_pytree_node(
  torch.utils._pytree._register_pytree_node(
2024-05-09 05:21:02,195 INFO worker.py:1564 -- Connecting to existing Ray cluster at address: 172.17.0.2:6379...
2024-05-09 05:21:02,200 INFO worker.py:1740 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265

View detailed results here: /root/ray_results/TorchTrainer_2024-05-09_05-21-02
To visualize your results with TensorBoard, run: `tensorboard --logdir /tmp/ray/session_2024-05-09_05-19-45_397857_439/artifacts/2024-05-09_05-21-02/TorchTrainer_2024-05-09_05-21-02/driver_artifacts`
(TrainTrainable pid=15530)   torch.utils._pytree._register_pytree_node(

Training started with configuration:
╭───────────────────────────────────────────────╮
│ Training config                               │
├───────────────────────────────────────────────┤
│ train_loop_config/args   ...t_warmup_steps=3) │
╰──────────────────────
```
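
Because the workers do not share the current working directory, one way to guard against relative paths is to normalize them in the driver before the arguments are passed on. The sketch below assumes a hypothetical helper, `absolutize_paths`, and assumes that `train_data_dir` and `output_dir` are the only filesystem paths among the arguments; neither is part of the original script.

```python
import argparse
import os


def absolutize_paths(args, path_fields):
    """Return a copy of args with the named fields turned into absolute paths."""
    updated = argparse.Namespace(**vars(args))
    for field in path_fields:
        value = getattr(updated, field, None)
        if value is not None:
            # Expand "~" and resolve relative paths against the driver's cwd.
            setattr(updated, field, os.path.abspath(os.path.expanduser(value)))
    return updated


# Example: a relative data directory and an already absolute output directory.
args = argparse.Namespace(train_data_dir="cat", output_dir="/tmp/textual_inversion_cat")
fixed = absolutize_paths(args, ["train_data_dir", "output_dir"])
print(fixed.train_data_dir)
print(fixed.output_dir)
```

Arguments such as `--pretrained_model_name_or_path` should be left out of `path_fields` when they hold a model ID rather than a local path.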

One of the advantages of Ray is that it scales easily. In this example, we can scale the training to multiple workers by changing `num_workers` in `ScalingConfig`. Ray initializes the Torch distributed environment automatically.
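
As a rough illustration of what changes when scaling out, the sketch below builds the keyword arguments for `ScalingConfig` from a worker count. The helper `scaling_kwargs` is hypothetical, and the choice of 8 workers assumes a cluster with at least 8 HPUs available; adjust it to your hardware.

```python
def scaling_kwargs(num_workers, cpus_per_worker=1, hpus_per_worker=1):
    """Build keyword arguments for ray.train.ScalingConfig (hypothetical helper)."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return {
        "num_workers": num_workers,
        # Same per-worker resources as the single-worker example.
        "resources_per_worker": {"CPU": cpus_per_worker, "HPU": hpus_per_worker},
    }


kwargs = scaling_kwargs(num_workers=8)
# Total number of HPUs the cluster must provide for this configuration.
total_hpus = kwargs["num_workers"] * kwargs["resources_per_worker"]["HPU"]
print(kwargs, total_hpus)
# In the training script, this would be used as: ScalingConfig(**kwargs)
```

Only `num_workers` differs from the single-worker setup; the training function itself stays unchanged.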

## Serve the fine-tuned model on Ray
Now that we have fine-tuned a Stable Diffusion model, we can serve it for image generation. The code below loads the fine-tuned model and generates an image.

```python
import torch
from optimum.habana.diffusers import GaudiStableDiffusionPipeline

model_id = "/tmp/textual_inversion_cat/"
pipe = GaudiStableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
prompt = "A <cat-toy> backpack"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("cat-backpack.png")
```

We can easily use Ray Serve to deploy it as an HTTP service. The code below is modified from this [example](https://docs.ray.io/en/master/serve/tutorials/stable-diffusion.html). Save it to `gaudi_sd_deploy.py`, and use `serve run gaudi_sd_deploy:entrypoint` to start the Serve application.

```python
import torch
from optimum.habana.diffusers import GaudiStableDiffusionPipeline
from io import BytesIO
from fastapi import FastAPI
from fastapi.responses import Response

from ray import serve
from ray.serve.handle import DeploymentHandle


app = FastAPI()


@serve.deployment(num_replicas=1)
@serve.ingress(app)
class APIIngress:
    def __init__(self, diffusion_model_handle: DeploymentHandle) -> None:
        self.handle = diffusion_model_handle

    @app.get(
        "/imagine",
        responses={200: {"content": {"image/png": {}}}},
        response_class=Response,
    )
    async def generate(self, prompt: str, img_size: int = 512):
        assert len(prompt), "prompt parameter cannot be empty"

        # Forward the request to the model deployment and wait for the image
        image = await self.handle.generate.remote(prompt, img_size=img_size)
        file_stream = BytesIO()
        image.save(file_stream, "PNG")
        return Response(content=file_stream.getvalue(), media_type="image/png")


@serve.deployment(
    ray_actor_options={"resources": {"HPU": 1}}
)
class GaudiStableDiffusion:
    def __init__(self, model_id):
        self.pipe = GaudiStableDiffusionPipeline.from_pretrained(
            model_id,
            torch_dtype=torch.bfloat16,
            use_habana=True,
            use_hpu_graphs=True,
            gaudi_config="Habana/stable-diffusion",
        )

    def generate(self, prompt: str, img_size: int = 512):
        assert len(prompt), "prompt parameter cannot be empty"

        # Use img_size for both output dimensions
        image = self.pipe(
            prompt,
            height=img_size,
            width=img_size,
            num_inference_steps=50,
            guidance_scale=7.5,
        ).images[0]
        return image


entrypoint = APIIngress.bind(GaudiStableDiffusion.bind("/tmp/textual_inversion_cat/"))
```

After you have successfully deployed this Serve application, run the code below to generate an image.

```python
import requests

prompt = "a <cat-toy> is dancing on the grass."
# Let requests URL-encode the prompt, including "<" and ">"
resp = requests.get("http://127.0.0.1:8000/imagine", params={"prompt": prompt})
with open("output.png", "wb") as f:
    f.write(resp.content)
```
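
If you build the request URL by hand, keep in mind that the placeholder token contains `<` and `>`, which should be percent-encoded along with the spaces. A minimal sketch using only the standard library is shown below; the explicit `img_size` value of 512 is an assumption that simply mirrors the endpoint's default.

```python
from urllib.parse import quote, urlencode

prompt = "a <cat-toy> is dancing on the grass."

# Percent-encode reserved characters, using %20 (not "+") for spaces.
query = urlencode({"prompt": prompt, "img_size": 512}, quote_via=quote)
url = f"http://127.0.0.1:8000/imagine?{query}"
print(url)
```

The resulting `url` can be passed to any HTTP client.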

Here is an example image: ![output](link to image)