In [None]:
!rm -rf user_uploaded_training_images
!rm -rf regularization_images

!mkdir user_uploaded_training_images
!mkdir -p regularization_images/samples

In [None]:
%pip install -r requirements.txt

To fine-tune a Stable Diffusion model, we first need the pre-trained Stable Diffusion weights, obtained by following the model authors' instructions. You can download `sd-v1-4.ckpt` from Google Drive with the cell below.

In [None]:
!gdown "https://drive.google.com/uc?id=1n1vWu1EL3UveBH60bBq2HGsZR669LnZJ"

We also need to create a set of images for regularization, because the Dreambooth fine-tuning algorithm requires it. Details of the algorithm can be found in the paper. Note that in the original paper the regularization images seem to be generated on the fly, whereas here I generate a set of regularization images before training. The text prompt for generating regularization images can be `photo of a <class>`, where `<class>` is a word that describes the class of your object, such as `dog`.

More regularization images may lead to stronger regularization and better editability. Try 100 or 200 images to better align with the original paper.

Then save the generated images (one image per `.png` file) at `/workspace/Dreambooth-Textual-Inverstion/regularization_images/samples`.

In some cases the generated regularization images are highly unrealistic (this happens with classes such as "man" or "woman"). If so, you can find a diverse set of images of that class online and use them as regularization images instead.
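As a quick sanity check on the advice above, the following sketch counts the image files in the regularization folder. The helper name `count_images` and the list of accepted suffixes are assumptions made for illustration; they are not part of the repository.

```python
from pathlib import Path


def count_images(folder, suffixes=(".png", ".jpg", ".jpeg")):
    """Count image files directly inside `folder` (non-recursive)."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(
        1
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


# Path relative to the notebook's working directory.
n = count_images("regularization_images/samples")
print(f"{n} regularization images found")
if n < 100:
    print("Fewer than 100 images; consider generating or downloading more.")
```

The threshold of 100 only mirrors the suggestion to try 100 or 200 images; it is not a hard requirement.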

Upload your training images to the folder named `user_uploaded_training_images`.

In [None]:
class_word = "<class>"  # Replace <class> with the type of subject you are training, e.g. "person" or "waterbottle".
job_name = "<job_name>"  # Replace <job_name> with the name of the job. It is used when naming generated files, for better housekeeping.
path_to_pretrained_ckpt = "/workspace/Dreambooth-Textual-Inverstion/sd-v1-4.ckpt"  # Replace with the path to the pretrained model you downloaded from Google Drive.
unique_token_name = "<sks>"  # Replace <sks> with something that won't clash with other subjects in the model. This is the keyword you'll use in your prompts.
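Before running the later cells, it can help to confirm that none of the settings above still hold their `<...>` placeholder. This is a minimal sketch; the helper `unreplaced_placeholders` is hypothetical, and it assumes a real value never has the form `<something>` (if you deliberately chose such a token, ignore the warning).

```python
import re

# A value that is entirely "<...>" is treated as an unfilled placeholder.
PLACEHOLDER = re.compile(r"^<[^<>]*>$")


def unreplaced_placeholders(settings):
    """Return the names of settings whose value still looks like <placeholder>."""
    return [
        name
        for name, value in settings.items()
        if PLACEHOLDER.match(str(value))
    ]


# Example values; in the notebook you would pass the real variables instead.
example = {
    "class_word": "dog",
    "job_name": "<job_name>",
    "unique_token_name": "<sks>",
}
for name in unreplaced_placeholders(example):
    print(f"Setting {name!r} still contains a placeholder")
```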

If you're training on pictures of a man, you can use the repository below, which contains 1500 images of men.

In [None]:
dataset="man_unsplash"
!git clone https://github.com/djbielejeski/Stable-Diffusion-Regularization-Images-{dataset}.git

!mv -v Stable-Diffusion-Regularization-Images-{dataset}/{dataset}/*.* regularization_images/samples

In [2]:
# generate the regularization images
!python scripts/stable_txt2img.py \
    --ddim_eta 0.0 \
    --n_samples 200 \
    --n_iter 1 \
    --scale 10.0 \
    --ddim_steps 50  \
    --ckpt {path_to_pretrained_ckpt} \
    --prompt "a photo of a {class_word}" \
    --outdir "regularization_images"



Open `/workspace/Dreambooth-Textual-Inverstion/ldm/data/personalized.py` and, in the line that says `photo of a sks {}`, replace `sks` with the `unique_token_name` you set in the cell above (something that won't clash with other subjects in the model). The final line should then read `photo of a <unique name> {}`, where `<unique name>` is your keyword.
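If you would rather script this edit than do it by hand, a sketch along the following lines may work. The helpers `replace_identifier` and `patch_file` are hypothetical, and the `sample` string is only a stand-in for the relevant lines of `personalized.py`, whose exact contents are not shown here. The function refuses to change anything if the expected template is missing.

```python
from pathlib import Path


def replace_identifier(source, new_token, old_token="sks"):
    """Swap the identifier inside the 'photo of a <token> {}' template."""
    old_template = f"photo of a {old_token} {{}}"
    new_template = f"photo of a {new_token} {{}}"
    if old_template not in source:
        raise ValueError(f"template {old_template!r} not found; edit the file by hand")
    return source.replace(old_template, new_template)


def patch_file(path, new_token, old_token="sks"):
    """Apply replace_identifier to a file in place (defined but not called here)."""
    path = Path(path)
    path.write_text(replace_identifier(path.read_text(), new_token, old_token))


# Hypothetical stand-in for the relevant lines of personalized.py.
sample = 'templates = [\n    "photo of a sks {}",\n]\n'
print(replace_identifier(sample, "zwx"))
```

To apply it for real, you would call `patch_file` with the path to `personalized.py` and your `unique_token_name`, ideally after making a backup copy of the file.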

In [None]:
!rm -rf user_uploaded_training_images/.ipynb_checkpoints #clear old data if any

!python main.py \
    --base "configs/stable-diffusion/v1-finetune_unfrozen.yaml" \
    --train \
    --actual_resume {path_to_pretrained_ckpt} \
    --name {job_name} \
    --data_root "/workspace/Dreambooth-Textual-Inverstion/user_uploaded_training_images" \
    --reg_data_root "/workspace/Dreambooth-Textual-Inverstion/regularization_images/samples" \
    --class_word {class_word}

The detailed configuration can be found in `configs/stable-diffusion/v1-finetune_unfrozen.yaml`. In particular, the default learning rate is 1.0e-6, because I found that the 1.0e-5 used in the Dreambooth paper leads to poor editability. The parameter `reg_weight` corresponds to the weight of the regularization term in the Dreambooth paper, and its default is 1.0.
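To make the role of `reg_weight` concrete, here is a minimal sketch of how a regularization weight enters the objective described in the Dreambooth paper: the loss on the subject images plus a weighted loss on the regularization (class) images. The function name and the scalar inputs are assumptions for illustration; this is not the repository's training code.

```python
def dreambooth_loss(instance_loss, prior_loss, reg_weight=1.0):
    """Combine the subject loss with the weighted regularization loss."""
    return instance_loss + reg_weight * prior_loss


# With the default weight, both terms count equally.
print(dreambooth_loss(0.5, 0.25))
# A smaller weight reduces the influence of the regularization images.
print(dreambooth_loss(0.5, 0.25, reg_weight=0.5))
```

A larger `reg_weight` pulls the model more strongly toward its original behaviour on the class, which is the trade-off against fitting the subject closely.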

Dreambooth requires a placeholder word `[V]`, called the identifier in the paper. The identifier needs to be a relatively rare token in the vocabulary. The original paper handles this by choosing a rare word from the T5-XXL tokenizer. For simplicity, here I just use the random word `sks` and hard-code it. If you want to change it, edit the file mentioned above (`ldm/data/personalized.py`).

Training runs for 800 steps, and two checkpoints are saved at `./logs/<job_name>/checkpoints`: one at 500 steps and one at the final step. Typically the one at 500 steps works well enough. I trained the model on two A6000 GPUs, which took ~15 minutes.
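To see which checkpoints a run produced, a small sketch like the one below lists the `.ckpt` files in the directory named above. The helper `list_checkpoints` and the job name `my_job` are hypothetical, and the `logs/<job_name>/checkpoints` layout is taken from the text; if your run directory is named differently, adjust the path.

```python
from pathlib import Path


def list_checkpoints(job_name, logs_dir="logs"):
    """Return the .ckpt files for a job, sorted by file name."""
    ckpt_dir = Path(logs_dir) / job_name / "checkpoints"
    if not ckpt_dir.is_dir():
        return []
    return sorted(ckpt_dir.glob("*.ckpt"))


# Prints nothing if the directory does not exist yet.
for ckpt in list_checkpoints("my_job"):
    print(ckpt)
```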

After training, personalized samples can be obtained by running the command below.

In [None]:
path_to_newly_trained_ckpt = f"/workspace/Dreambooth-Textual-Inverstion/logs/{job_name}/checkpoints/last.ckpt"

!python scripts/stable_txt2img.py \
    --ddim_eta 0.0 \
    --n_samples 8 \
    --n_iter 1 \
    --scale 10.0 \
    --ddim_steps 100 \
    --ckpt {path_to_newly_trained_ckpt} \
    --prompt "photo of a {unique_token_name} {class_word}" \