# Running Dreambooth Stable Diffusion on SageMaker
- https://github.com/XavierXiao/Dreambooth-Stable-Diffusion

In [None]:
%load_ext autoreload
%autoreload 2

## Prepare environment

In [None]:
import os
from os.path import exists as path_exists

In [None]:
try:
    import torch
except ImportError:
    # Install PyTorch if it is missing, then import it so later cells can use it.
    !pip install torch
    import torch

In [None]:
if torch.cuda.is_available():
    print('CUDA available.')
    device = 'cuda'
else:
    print('CUDA not available. Please connect to a GPU instance if possible.')
    device = 'cpu'

In [None]:
!nvidia-smi

In [None]:
# Clone the repository once, then make it the working directory.
if 'Dreambooth-Stable-Diffusion' not in os.getcwd():
    if not path_exists('Dreambooth-Stable-Diffusion'):
        os.system("git clone https://github.com/XavierXiao/Dreambooth-Stable-Diffusion")
    os.chdir("Dreambooth-Stable-Diffusion")

## Data preparation

[@XavierXiao](https://github.com/XavierXiao/Dreambooth-Stable-Diffusion) wrote:
> To fine-tune a stable diffusion model, you need to obtain the pre-trained stable diffusion models following their instructions. Weights can be downloaded on HuggingFace. You can decide which version of checkpoint to use, but I use `sd-v1-4-full-ema.ckpt`.

> We also need to create a set of images for regularization, as the fine-tuning algorithm of Dreambooth requires that. Details of the algorithm can be found in the paper. Note that in the original paper, the regularization images seem to be generated on-the-fly. However, here I generated a set of regularization images before the training. The text prompt for generating regularization images can be photo of a `<class>`, where `<class>` is a word that describes the class of your object, such as dog. The command is:

```bash
python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 8 --n_iter 1 --scale 10.0 --ddim_steps 50 --ckpt /path/to/original/stable-diffusion/sd-v1-4-full-ema.ckpt --prompt "a photo of a <class>"
```

In [None]:
class_word = "yerba mate"

In [None]:
ddim_eta = 0.0
n_samples = 8
n_iter = 1
scale = 10.0
ddim_steps = 50
ckpt = '/path/to/original/stable-diffusion/sd-v1-4-full-ema.ckpt'
prompt = f"a photo of a {class_word}"

In [None]:
# The prompt contains spaces, so it must be quoted to reach the script as one argument.
os.system(f"python scripts/stable_txt2img.py --ddim_eta {ddim_eta} " + 
          f"--n_samples {n_samples} --n_iter {n_iter} --scale {scale} " + 
          f"--ddim_steps {ddim_steps} --ckpt \"{ckpt}\" --prompt \"{prompt}\"")
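
Building the command as a shell string makes it easy to lose arguments that contain spaces, such as the prompt. As a minimal sketch, the same invocation can be assembled as an argument list instead; the helper name `build_txt2img_args` is hypothetical and only the flags shown in the command above are used.

```python
import shlex


def build_txt2img_args(ckpt, prompt, ddim_eta=0.0, n_samples=8, n_iter=1,
                       scale=10.0, ddim_steps=50):
    """Return the argument list for scripts/stable_txt2img.py (hypothetical helper)."""
    return [
        "python", "scripts/stable_txt2img.py",
        "--ddim_eta", str(ddim_eta),
        "--n_samples", str(n_samples),
        "--n_iter", str(n_iter),
        "--scale", str(scale),
        "--ddim_steps", str(ddim_steps),
        "--ckpt", str(ckpt),
        "--prompt", prompt,  # stays a single argument even though it has spaces
    ]


args = build_txt2img_args(
    "/path/to/original/stable-diffusion/sd-v1-4-full-ema.ckpt",
    "a photo of a yerba mate",
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(args))
```

The list can then be passed to `subprocess.run(args, check=True)`, which bypasses the shell and raises an error if the script exits with a non-zero status.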

## Model training

[@XavierXiao](https://github.com/XavierXiao/Dreambooth-Stable-Diffusion) wrote:
> Training can be done by running the following command

```bash
python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml \
                -t \
                --actual_resume /path/to/original/stable-diffusion/sd-v1-4-full-ema.ckpt \
                -n <job name> \
                --gpus 0, \
                --data_root /root/to/training/images \
                --reg_data_root /root/to/regularization/images \
                --class_word <xxx>
```

> Detailed configuration can be found in `configs/stable-diffusion/v1-finetune_unfrozen.yaml`. In particular, the default learning rate is `1.0e-6` as I found the `1.0e-5` in the Dreambooth paper leads to poor editability. The parameter `reg_weight` corresponds to the weight of regularization in the Dreambooth paper, and the default is set to `1.0`.
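
To make the role of `reg_weight` concrete, here is a toy sketch of a weighted sum of two reconstruction losses: one on the subject images and one on the regularization (class prior) images. It is not the repository's implementation; the function names and the plain-list mean squared error are assumptions chosen for illustration.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target,
                    prior_pred, prior_target, reg_weight=1.0):
    """Subject loss plus reg_weight times the regularization loss (illustrative only)."""
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(prior_pred, prior_target)
    return instance_loss + reg_weight * prior_loss


total = dreambooth_loss([0.5, 1.0], [0.0, 1.0], [2.0, 2.0], [1.0, 2.0], reg_weight=1.0)
print(total)
```

Setting `reg_weight` to `0.0` in this sketch removes the regularization term entirely, while larger values pull training toward preserving the original class.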

> Dreambooth requires a placeholder word [V], called identifier, as in the paper. This identifier needs to be a relatively rare token in the vocabulary. The original paper approaches this by using a rare word in the `T5-XXL` tokenizer. For simplicity, here I just use a random word `sks` and hard-coded it. If you want to change that, simply make a change in this file.

> Training will be run for 800 steps, and two checkpoints will be saved at `./logs/<job_name>/checkpoints`, one at 500 steps and one at the final step. Typically the one at 500 steps works well enough. I trained the model using two A6000 GPUs, and it takes ~15 mins.
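
Once training finishes, the saved checkpoints need to be located for the generation step. The sketch below lists `.ckpt` files under a `logs` directory, newest first. The wildcard over the run directory is an assumption, since the exact folder name may differ from the job name, and `find_checkpoints` is a hypothetical helper.

```python
from pathlib import Path


def find_checkpoints(log_root="logs"):
    """Return .ckpt files under <log_root>/*/checkpoints, newest first."""
    root = Path(log_root)
    if not root.is_dir():
        return []  # nothing has been trained yet, or the path is wrong
    paths = root.glob("*/checkpoints/*.ckpt")
    return sorted(paths, key=lambda p: p.stat().st_mtime, reverse=True)


for ckpt_path in find_checkpoints():
    print(ckpt_path)
```

One of the printed paths can then be assigned to `saved_ckpt` in the image generation section.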

In [None]:
base = "configs/stable-diffusion/v1-finetune_unfrozen.yaml"
n = 'test'
gpus = "0,"  # must be a string: a bare `0,` is the tuple (0,) and would render as "(0,)"
data_root = "/root/to/training/images"
reg_data_root = '/path/to/regularization/images'
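
Before launching a run, it can help to confirm that both image folders exist and actually contain images. This is a minimal sketch; `count_images` is a hypothetical helper, and the set of file extensions is an assumption rather than the list the training script accepts.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the files you actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(folder):
    """Count image files directly inside `folder`; return 0 if it does not exist."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


for name, folder in [
    ("data_root", "/root/to/training/images"),
    ("reg_data_root", "/path/to/regularization/images"),
]:
    print(f"{name}: {count_images(folder)} image(s) in {folder}")
```

A count of zero for either folder usually means the placeholder path has not been replaced yet.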

In [None]:
# Quote values that may contain spaces (e.g. class_word = "yerba mate").
os.system(f"python main.py --base {base} -t --actual_resume \"{ckpt}\" -n {n} " + 
          f"--gpus {gpus} --data_root \"{data_root}\" --reg_data_root \"{reg_data_root}\" " + 
          f"--class_word \"{class_word}\"")

## Image generation from saved checkpoint

In [None]:
saved_ckpt = "/path/to/saved/checkpoint/from/training"
new_prompt = f"photo of a wooden {class_word}"

In [None]:
# The prompt contains spaces, so it must be quoted to reach the script as one argument.
os.system(f"python scripts/stable_txt2img.py --ddim_eta {ddim_eta} " + 
          f"--n_samples {n_samples} --n_iter {n_iter} --scale {scale} " + 
          f"--ddim_steps {ddim_steps} --ckpt \"{saved_ckpt}\" --prompt \"{new_prompt}\"")
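
The quoted README notes that the identifier `sks` is hard-coded for training. Assuming the fine-tuned subject is addressed by placing that identifier directly before the class word, a prompt could be composed as in the sketch below. The helper name and the template are hypothetical.

```python
def personalized_prompt(class_word, identifier="sks", template="photo of a {subject}"):
    """Compose a prompt that names the fine-tuned subject as '<identifier> <class_word>'."""
    subject = f"{identifier} {class_word}"
    return template.format(subject=subject)


print(personalized_prompt("yerba mate"))
# prints: photo of a sks yerba mate
print(personalized_prompt("yerba mate", template="photo of a wooden {subject}"))
# prints: photo of a wooden sks yerba mate
```

A prompt without the identifier would describe the general class rather than the specific subject the model was fine-tuned on.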