# Fine-tuning of Stable Diffusion models

A notebook for fine-tuning Stable Diffusion models with Dreambooth or Low-Rank Adaptation (LoRA).

Tested with [Stable Diffusion v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) and [Stable Diffusion v2-base](https://huggingface.co/stabilityai/stable-diffusion-2-base).

Stay up-to-date on [GitHub](https://github.com/brian6091/Dreambooth), and open an [issue](https://github.com/brian6091/Dreambooth/issues) if you run into problems.

[![Brian6091's GitHub stats](https://github-readme-stats.vercel.app/api?username=brian6091&hide=contribs&theme=onedark&show_icons=true)](https://github.com/brian6091/Dreambooth)

[<a href="https://www.buymeacoffee.com/jvsurfsqv" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" height="40px" width="140px" alt="Buy Me A Coffee"></a>](https://www.buymeacoffee.com/jvsurfsqv)

# Install dependencies (takes about 1 minute)

In [None]:
%%capture
# !cd content/
# !git clone https://github.com/brian6091/Dreambooth --branch main --single-branch
# !pip install -r "Dreambooth/requirements.txt"
# !pip install -U --pre triton
# !pip install torchinfo

# !git clone https://github.com/brian6091/lora --branch v0.0.5 --single-branch
# !python -m pip install content/lora/

In [None]:
#@title xformers
#%%capture

# !nvidia-smi -L

# Tested with Tesla T4 and A100 GPUs
# !pip install xformers==0.0.16rc425
# May complain about some incompatibilities, which are resolved by upgrading the following:
#!pip install -U --pre torchvision
#!pip install -U --pre torchtext
#!pip install -U --pre torchaudio

# Which model to train from?

In [1]:
#@title ## Name or path to initial model and VAE
#@markdown Obligatory (e.g., runwayml/stable-diffusion-v1-5, stabilityai/stable-diffusion-2-base, or full path to model in diffusers format)
MODEL_NAME_OR_PATH = "runwayml/stable-diffusion-v1-5" #@param {type:"string"}

#@markdown Optional (e.g., stabilityai/sd-vae-ft-mse), leaving empty will default to VAE packaged with the model
VAE_NAME_OR_PATH = "" #@param {type:"string"}
#if VAE_NAME_OR_PATH=="":
#  VAE_NAME_OR_PATH = None

#@markdown (Not yet implemented), leaving empty will default to text encoder packaged with the model
TEXT_ENCODER_NAME_OR_PATH = "" #@param {type:"string"}

In [2]:
#@title ## Hugging Face 🤗 credentials

#@markdown If initiating training from official stable diffusion checkpoints (e.g., [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)), you must accept the license before using the model. You'll need a [🤗 Hugging Face](https://huggingface.co/) account to do so, after which you can [generate a login token](https://huggingface.co/settings/tokens) and paste it here.
from huggingface_hub import login

HUGGINGFACE_TOKEN = "" #@param {type:"string"}
# Paste your own token above; an empty value falls back to an interactive login prompt
login(HUGGINGFACE_TOKEN or None)

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /home/saemeechoi/.cache/huggingface/token
Login successful
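
A token typed directly into a cell is easy to leak when the notebook is shared. As a minimal sketch of one alternative, the token can be read from an environment variable when the form field is left empty. The variable name `HF_TOKEN` and the helper `resolve_token` are illustrative assumptions, not part of this notebook.

```python
import os

def resolve_token(form_value, env=None, var_name="HF_TOKEN"):
    """Return the token from the form field, else from the environment, else None."""
    env = os.environ if env is None else env
    token = (form_value or "").strip() or env.get(var_name, "").strip()
    return token or None

# Usage (requires huggingface_hub; login(None) opens an interactive prompt):
# from huggingface_hub import login
# login(resolve_token(HUGGINGFACE_TOKEN))
```

Returning `None` when nothing is set means the notebook never silently sends an empty token.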


In [3]:
# #@title ## Mount Google Drive if initial model stored there (or you want to direct outputs there)
# from google.colab import drive
# drive.mount('/content/gdrive')

# Set up experiment parameters

In [4]:
#@title ## Training parameters

import os
from IPython.display import Markdown as md

#@markdown Unique token for specific subject
INSTANCE_TOKEN= "raretoken" #@param{type: 'string'}

#@markdown Use image captions? Captions can be either the image filename, or a separate text file (that must be named identically to the image but w/ extension .txt). If a separate .txt file exists, filename is ignored.
USE_IMAGE_CAPTIONS = False #@param {type:"boolean"}
USE_IMAGE_CAPTIONS_FLAG = ""
if USE_IMAGE_CAPTIONS:
  USE_IMAGE_CAPTIONS_FLAG='--use_image_captions'

#@markdown Path to instance images. Filenames are irrelevant, unless images are captioned *and* captions are not separate textfiles, in which case INSTANCE_TOKEN should appear in relevant filenames as part of the caption.
INSTANCE_DIR="content/InstanceImages" #@param{type: 'string'}

RESOLUTION = 512 #@param{type: 'number'}

TRAIN_BATCH_SIZE = 1 #@param{type: 'number'}

GRADIENT_ACCUMULATION_STEPS = 1  #@param{type: 'number'}

GRADIENT_CHECKPOINTING = True #@param {type:"boolean"}
GRADIENT_CHECKPOINTING_FLAG=""
if GRADIENT_CHECKPOINTING:
  GRADIENT_CHECKPOINTING_FLAG='--gradient_checkpointing'

ENABLE_PRIOR_PRESERVATION = True #@param {type:"boolean"}
ENABLE_PRIOR_PRESERVATION_FLAG=""
if ENABLE_PRIOR_PRESERVATION:
  ENABLE_PRIOR_PRESERVATION_FLAG='--with_prior_preservation'

#@markdown Prior loss weight. Note that if you set this to 0, but enable prior preservation and provide a CLASS_DIR, you can still monitor class loss.
PRIOR_LOSS_WEIGHT = 1.0 #@param {type:"number"}

#@markdown If using prior preservation, specify a path to class images
CLASS_DIR="content/RegularizationImages" #@param{type: 'string'}
# Ask for a corrected path only if a non-empty CLASS_DIR does not exist
if (CLASS_DIR !="") and not os.path.exists(str(CLASS_DIR)):
  CLASS_DIR=input('\033[1;31mThe folder specified does not exist, use the colab file explorer to copy the path :')

#@markdown Prompt for prior preservation class (e.g., 'person', 'a photo of a man', 'dog'). Ignored if USE_IMAGE_CAPTIONS checked.
CLASS_PROMPT="a photo of a person" #@param {type:"string"}
#@markdown Instance prompt, {SKS} will be automatically replaced by INSTANCE_TOKEN defined above.  Ignored if USE_IMAGE_CAPTIONS checked.
INSTANCE_PROMPT="a photo of {SKS} person" #@param {type:"string"}
INSTANCE_PROMPT=INSTANCE_PROMPT.replace("{SKS}",INSTANCE_TOKEN)

#@markdown Specify the number of class images used if prior preservation is enabled. If there are not enough images in CLASS_DIR (or CLASS_DIR is empty), additional images will be generated.
MIN_NUM_CLASS_IMAGES=1500 #@param{type: 'number'}

#@markdown Batch size for generating class images
SAMPLE_BATCH_SIZE = 1 #@param{type: 'number'}

#@markdown Number of training iterations, e.g., # instance images * 100
STEPS = 10000 #@param{type: 'number'}

#@markdown Random number generator seed
SEED = 1275017 #@param{type: 'number'}

#@markdown Enable text encoder training?
TRAIN_TEXT_ENCODER = False #@param{type: 'boolean'}
TRAIN_TEXT_ENCODER_FLAG=""
if TRAIN_TEXT_ENCODER:
  TRAIN_TEXT_ENCODER_FLAG="--train_text_encoder"

#@markdown ## ADAM optimizer settings

#@markdown Use 8-bit ADAM
USE_8BIT_ADAM = True #@param {type:"boolean"}
USE_8BIT_ADAM_FLAG=""
if USE_8BIT_ADAM:
  USE_8BIT_ADAM_FLAG='--use_8bit_adam'

#@markdown The exponential decay rate for the 1st moment estimates (the beta1 parameter for the Adam optimizer).
ADAM_BETA1 = 0.9 #@param {type:"number"}

#@markdown The exponential decay rate for the 2nd moment estimates (the beta2 parameter for the Adam optimizer).
ADAM_BETA2 = 0.999 #@param {type:"number"}

#@markdown Weight decay magnitude for the Adam optimizer.
ADAM_WEIGHT_DECAY = 1e-2 #@param {type:"number"}

#@markdown Epsilon value for the Adam optimizer.
ADAM_EPSILON = 1e-08 #@param {type:"number"}

#@markdown "fp16", "bf16", or "no" according to available VRAM
MIXED_PRECISION = "fp16" #@param{type: 'string'}

#@markdown ## Learning rate parameters
LR_SCHEDULE = "cosine" #@param ["linear", "cosine", "cosine_with_restarts", "polynomial", "constant", "constant_with_warmup"]
LR = 2e-6 #@param{type: 'number'}
#@markdown If training the text encoder, a different learning rate can be applied
LR_TEXT_ENCODER = 5e-5 #@param{type: 'number'}
LR_WARMUP_STEPS = 50 #@param{type: 'number'}
#@markdown Applies only for cosine_with_restarts schedule
LR_COSINE_NUM_CYCLES = 5 #@param{type: 'number'}
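
To get a feel for how `LR`, `LR_WARMUP_STEPS` and `STEPS` interact under the `cosine` schedule, the learning rate can be tabulated before training. This is a sketch under an assumption: that the training script uses the common linear-warmup plus half-cosine decay (the shape used by the `get_cosine_schedule_with_warmup` helpers in `transformers` and `diffusers`). The function `cosine_lr` is hypothetical and only mirrors the default values set above.

```python
import math

def cosine_lr(step, base_lr=2e-6, warmup_steps=50, total_steps=10000, num_cycles=0.5):
    """Learning rate at `step`: linear warmup, then cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))

# Print the learning rate at the start, mid-warmup, peak, midpoint and end
for s in (0, 25, 50, 5025, 10000):
    print(s, f"{cosine_lr(s):.3e}")
```

With these defaults the rate peaks at `LR` once warmup ends and is back to half of `LR` halfway through the remaining steps.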

In [5]:
#@title # (Experimental) [Data augmentation](https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0197-0/)
#@markdown Transformations to apply to images (both instance and class).
#@markdown I find this useful to minimize the work of cropping and manually preparing images.
#@markdown This may be useful for certain applications, such as training a style, where there may not be a specific subject in each image.
#@markdown In this case, I don't crop images, and I enable random cropping, which presents to the network a randomly cropped (RESOLUTION X RESOLUTION) chunk of the original image selected for that iteration.
#@markdown AUGMENT_MIN_RESOLUTION allows you to adjust how much of the image you will crop. So if you are training for RESOLUTION=512, setting AUGMENT_MIN_RESOLUTION=1024 will give you two crops (on average) for the shortest image dimension.

#@markdown Resize image so that smallest dimension = AUGMENT_MIN_RESOLUTION (maintaining aspect ratio). Leave empty to skip.
AUGMENT_MIN_RESOLUTION = None #@param{type: 'number'}
AUGMENT_MIN_RESOLUTION_FLAG = ""
if AUGMENT_MIN_RESOLUTION is not None:
  AUGMENT_MIN_RESOLUTION = int(AUGMENT_MIN_RESOLUTION)
  AUGMENT_MIN_RESOLUTION_FLAG = f"--augment_min_resolution={AUGMENT_MIN_RESOLUTION}"

#@markdown If not enabled, defaults to center crop (which will do nothing if your images are already square at the RESOLUTION set above).
AUGMENT_RANDOM_CROP = False #@param{type: 'boolean'}
AUGMENT_CENTER_CROP_FLAG="--augment_center_crop"
if AUGMENT_RANDOM_CROP:
  AUGMENT_CENTER_CROP_FLAG=""

#@markdown Randomly flip image horizontally. Not recommended if asymmetry is important (e.g., faces).
AUGMENT_HFLIP = False #@param{type: 'boolean'}
AUGMENT_HFLIP_FLAG=""
if AUGMENT_HFLIP:
  AUGMENT_HFLIP_FLAG="--augment_hflip"
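
The resize-then-crop behaviour described above can be checked with a few lines of arithmetic. The sketch below reproduces only the geometry (how `train.py` implements the transforms is not shown in this notebook), and the helper names `resize_to_min_side` and `random_crop_box` are made up for illustration.

```python
import random

def resize_to_min_side(width, height, min_side):
    """Scale (width, height) so the shorter side equals min_side, keeping aspect ratio."""
    scale = min_side / min(width, height)
    return round(width * scale), round(height * scale)

def random_crop_box(width, height, size, rng=random):
    """Pick a (left, top, right, bottom) box for a size x size crop inside the image."""
    if width < size or height < size:
        raise ValueError("image is smaller than the crop size")
    left = rng.randint(0, width - size)
    top = rng.randint(0, height - size)
    return left, top, left + size, top + size

# A 3000 x 2000 landscape photo, resized so the short side is 1024, then cropped to 512
w, h = resize_to_min_side(3000, 2000, min_side=1024)
box = random_crop_box(w, h, size=512, rng=random.Random(0))
print((w, h), box)
```

A larger minimum side relative to `RESOLUTION` means each crop covers a smaller fraction of the original image.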

In [6]:
#@title # (Experimental) other training parameters

#@markdown ## [LoRA: Low-Rank Adaptation](https://arxiv.org/abs/2106.09685v2)
#@markdown Uses [cloneofsimo's implementation](https://github.com/cloneofsimo/lora)
USE_LORA = False #@param{type: 'boolean'}
USE_LORA_FLAG=""
if USE_LORA:
  USE_LORA_FLAG="--use_lora"

#@markdown Rank of LoRA update matrix
LORA_RANK = 4 #@param{type: 'number'}

#@markdown ## [Drop text-conditioning to improve classifier-free guidance sampling](https://arxiv.org/abs/2207.12598)

#@markdown Probability that image (applies to both instance and class images) will be selected for dropout (INSTANCE_PROMPT/CLASS_PROMPT will be replaced with UNCONDITIONAL_PROMPT)
CONDITIONING_DROPOUT_PROB = 0.0 #@param{type: 'number'}
#@markdown Defaults to an empty prompt. Unsure whether anything else would be useful.
UNCONDITIONAL_PROMPT = " " #@param{type: 'string'}

#@markdown ## Exponentially weighted moving average (EMA) of weights (unet only). Will not run on Tesla T4 (out of memory).
USE_EMA = False #@param{type: 'boolean'}
USE_EMA_FLAG=""
if USE_EMA:
  USE_EMA_FLAG="--use_ema"
EMA_INV_GAMMA = 1.0 #@param{type: 'number'}
EMA_POWER = 0.75 #@param{type: 'number'}
EMA_MIN_VALUE = 0 #@param{type: 'number'}
EMA_MAX_VALUE = 0.9999 #@param{type: 'number'}
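
To see what `LORA_RANK` trades off, it helps to count parameters. LoRA replaces the update of a `d_out x d_in` weight matrix with the product of two thin matrices of rank `r`, so the trainable parameter count per layer is `r * (d_in + d_out)` instead of `d_in * d_out`. The layer size below is an illustrative assumption, not read from the model, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters in a rank-`rank` update B @ A for a d_out x d_in weight."""
    return rank * (d_in + d_out)

d_in = d_out = 768  # illustrative layer size, not read from the model
full = d_in * d_out
for rank in (1, 4, 16):
    lora = lora_param_count(d_in, d_out, rank)
    print(f"rank={rank}: {lora} trainable vs {full} full ({100 * lora / full:.2f}%)")
```

Higher ranks give the update more capacity at the cost of larger LoRA files and more trainable parameters.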

In [7]:
#@title # Where should outputs get saved?

#@markdown Trained models (and intermediates) saved here
OUTPUT_DIR="content/models/" #@param{type: 'string'}

#@markdown Training logs saved here
LOGGING_DIR="content/logs/" #@param{type: 'string'}

if not os.path.exists(LOGGING_DIR):
  !mkdir -p "$LOGGING_DIR"

LOG_GPU = True #@param{type: 'boolean'}
if LOG_GPU:
  LOG_GPU_FLAG="--log_gpu"
else:
  LOG_GPU_FLAG=""


In [8]:
#@title # Setup saving of intermediate models
#@markdown To save intermediate checkpoints, set START_SAVING_FROM_STEP < STEPS

#@markdown Number of steps between intermediate saves
SAVE_CHECKPOINT_EVERY = 500 #@param{type: 'number'}
if SAVE_CHECKPOINT_EVERY is None:
  SAVE_CHECKPOINT_EVERY = STEPS+1

START_SAVING_FROM_STEP=500 #@param{type: 'number'}
if START_SAVING_FROM_STEP is None:
  START_SAVING_FROM_STEP=STEPS

#@markdown At each intermediate checkpoint, infer this many samples using SAVE_SAMPLE_PROMPT
N_SAVE_SAMPLES=2 #@param{type: 'number'}

#@markdown {SKS} is automatically replaced by INSTANCE_TOKEN. Give multiple prompts using // as a separator
SAVE_SAMPLE_PROMPT= "a photo of {SKS} // a painting of {SKS} person by Picasso" #@param{type: 'string'}
if SAVE_SAMPLE_PROMPT=="":
  SAVE_SAMPLE_PROMPT=None
else:
  SAVE_SAMPLE_PROMPT=SAVE_SAMPLE_PROMPT.replace("{SKS}",INSTANCE_TOKEN)

#@markdown The negative prompt, on the other hand, applies to all SAVE_SAMPLE_PROMPTs
SAVE_SAMPLE_NEGATIVE_PROMPT="border" #@param{type: 'string'}
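
Before launching a long run it can be useful to preview which checkpoints will be written and which sample prompts will be rendered. The sketch below assumes a checkpoint is saved whenever the step is at least `START_SAVING_FROM_STEP` and a multiple of `SAVE_CHECKPOINT_EVERY`; the exact rule inside `train.py` may differ, and both helper functions are hypothetical.

```python
def checkpoint_steps(total_steps, save_min_steps, save_interval):
    """Steps at which a checkpoint would be written under the assumed rule."""
    return [s for s in range(1, total_steps + 1)
            if s >= save_min_steps and s % save_interval == 0]

def split_prompts(sample_prompt, token, placeholder="{SKS}", sep="//"):
    """Split a multi-prompt string on `sep` and substitute the instance token."""
    return [p.strip().replace(placeholder, token)
            for p in sample_prompt.split(sep) if p.strip()]

steps = checkpoint_steps(total_steps=10000, save_min_steps=500, save_interval=500)
prompts = split_prompts("a photo of {SKS} // a painting of {SKS} person by Picasso", "raretoken")
print(len(steps), steps[:3], steps[-1])
print(prompts)
```

Each saved checkpoint costs disk space and sampling time, so the number of steps listed here is a quick check on both.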

# Train!

In [9]:
#@title ## (optional) Tensorboard visualization of loss and learning rate
#@markdown Once the Tensorboard panel is launched (takes a good 10 seconds), click on the gear icon in upper right, and check Reload data. Then, after launching training in the next cell, click on TIME SERIES in upper left to see updates.
#%load_ext tensorboard
!rm -rf content/logs
%reload_ext tensorboard
%tensorboard --logdir $LOGGING_DIR

Reusing TensorBoard on port 6007 (pid 194067), started 0:47:47 ago. (Use '!kill 194067' to kill it.)

In [10]:
#@title ## Launch training
!lsb_release -a | grep Description
!pip freeze | grep diffusers
!pip freeze | grep lora-diffusion
!pip freeze | grep torchvision
!pip freeze | grep transformers
!pip freeze | grep xformers
!accelerate env

!accelerate launch \
    --mixed_precision=$MIXED_PRECISION \
    --num_machines=1 \
    --num_processes=1 \
    --dynamo_backend="no" \
    content/Dreambooth/train.py \
    $USE_LORA_FLAG \
    --lora_rank=$LORA_RANK \
    $TRAIN_TEXT_ENCODER_FLAG \
    --pretrained_model_name_or_path=$MODEL_NAME_OR_PATH \
    --pretrained_vae_name_or_path=$VAE_NAME_OR_PATH \
    --instance_data_dir="$INSTANCE_DIR" \
    --class_data_dir="$CLASS_DIR" \
    --output_dir="$OUTPUT_DIR" \
    --logging_dir="$LOGGING_DIR" \
    $LOG_GPU_FLAG \
    $ENABLE_PRIOR_PRESERVATION_FLAG \
    --prior_loss_weight=$PRIOR_LOSS_WEIGHT \
    --instance_prompt="$INSTANCE_PROMPT" \
    --class_prompt="$CLASS_PROMPT" \
    $USE_IMAGE_CAPTIONS_FLAG \
    --conditioning_dropout_prob=$CONDITIONING_DROPOUT_PROB \
    --unconditional_prompt="$UNCONDITIONAL_PROMPT" \
    --seed=$SEED \
    --resolution=$RESOLUTION \
    --train_batch_size=$TRAIN_BATCH_SIZE \
    --gradient_accumulation_steps=$GRADIENT_ACCUMULATION_STEPS \
    $GRADIENT_CHECKPOINTING_FLAG \
    --mixed_precision=$MIXED_PRECISION \
    $USE_8BIT_ADAM_FLAG \
    --adam_beta1=$ADAM_BETA1 \
    --adam_beta2=$ADAM_BETA2 \
    --adam_weight_decay=$ADAM_WEIGHT_DECAY \
    --adam_epsilon=$ADAM_EPSILON \
    --learning_rate=$LR \
    --learning_rate_text=$LR_TEXT_ENCODER \
    --lr_scheduler=$LR_SCHEDULE \
    --lr_warmup_steps=$LR_WARMUP_STEPS \
    --lr_cosine_num_cycles=$LR_COSINE_NUM_CYCLES \
    $USE_EMA_FLAG \
    --ema_inv_gamma=$EMA_INV_GAMMA \
    --ema_power=$EMA_POWER \
    --ema_min_value=$EMA_MIN_VALUE \
    --ema_max_value=$EMA_MAX_VALUE \
    --max_train_steps=$STEPS \
    --num_class_images=$MIN_NUM_CLASS_IMAGES \
    --sample_batch_size=$SAMPLE_BATCH_SIZE \
    --save_min_steps=$START_SAVING_FROM_STEP \
    --save_interval=$SAVE_CHECKPOINT_EVERY \
    --n_save_sample=$N_SAVE_SAMPLES \
    --save_sample_prompt="$SAVE_SAMPLE_PROMPT" \
    --save_sample_negative_prompt="$SAVE_SAMPLE_NEGATIVE_PROMPT" \
    $AUGMENT_MIN_RESOLUTION_FLAG \
    $AUGMENT_CENTER_CROP_FLAG \
    $AUGMENT_HFLIP_FLAG

No LSB modules are available.
Description:	Ubuntu 18.04.5 LTS
diffusers==0.11.1
lora-diffusion @ file:///home/nas4_user/saemeechoi/course/AdvancedDeepLearning/notebooks/content/lora
torchvision==0.15.2
transformers==4.25.1
xformers==0.0.23

Copy-and-paste the text below in your GitHub issue

- `Accelerate` version: 0.15.0
- Platform: Linux-4.15.0-175-generic-x86_64-with-glibc2.27
- Python version: 3.9.18
- Numpy version: 1.26.2
- PyTorch version (GPU?): 2.0.1 (True)
- `Accelerate` default config:
	Not found
    PyTorch 2.1.1+cu121 with CUDA 1201 (you have 2.0.1)
    Python  3.9.18 (you have 3.9.18)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
safety_checker/pytorch_model.fp16.safetensors not found
Fetching 29 files: 100%|██████████████████████| 29/29 [00:00<00:00, 8099.27it/s]
You have disabled

# Do inference with trained model(s)

Cells in this section can be run to generate grids of images using the trained model(s). I find this useful for probing overtraining, concept bleeding, quality, etc.

In [11]:
#@title Some imports and utility functions
import torch
from diffusers import DiffusionPipeline, StableDiffusionPipeline, DPMSolverMultistepScheduler, AutoencoderKL
from PIL import Image
import os
import json
import random
import string
from lora_diffusion import monkeypatch_lora, tune_lora_scale

device = "cuda"

def image_grid(imgs, rows, cols):
    assert len(imgs) == rows*cols
    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))
    grid_w, grid_h = grid.size
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i%cols*w, i//cols*h))
    return grid

def get_pipeline(model_name_or_path,
                 vae_name_or_path=None,
                 text_encoder_name_or_path=None,
                 feature_extractor_name_or_path=None,
                 revision="fp16"):
    #scheduler = DPMSolverMultistepScheduler.from_pretrained(model_name_or_path, subfolder="scheduler")
    scheduler = DPMSolverMultistepScheduler(
        beta_start=0.00085,
        beta_end=0.012,
        beta_schedule="scaled_linear",
        num_train_timesteps=1000,
        trained_betas=None,
        prediction_type="epsilon",
        thresholding=False,
        algorithm_type="dpmsolver++",
        solver_type="midpoint",
        lower_order_final=True,
    )

    pipe = DiffusionPipeline.from_pretrained(
        model_name_or_path,
        custom_pipeline="lpw_stable_diffusion",
        safety_checker=None,
        revision=revision,
        scheduler=scheduler,
        vae=AutoencoderKL.from_pretrained(
            vae_name_or_path or model_name_or_path,
            subfolder=None if vae_name_or_path else "vae",
            revision=None if vae_name_or_path else revision,
            torch_dtype=torch.float16,
        ),
        feature_extractor=feature_extractor_name_or_path,
        torch_dtype=torch.float16
    ).to("cuda")

    #https://github.com/huggingface/diffusers/issues/1552
    #pipe.enable_attention_slicing()
    pipe.enable_xformers_memory_efficient_attention()
    return pipe

# Monkey patch LoRA pt files
# Returns pipeline
def get_lora_pipeline(model_dir, scale_unet=1.0, scale_text_encoder=1.0):
    # Load untrained original model
    pipe = get_pipeline(MODEL_NAME_OR_PATH, vae_name_or_path=VAE_NAME_OR_PATH)

    print('Monkey patching unet pt file')
    monkeypatch_lora(pipe.unet, torch.load(os.path.join(model_dir, "lora_unet.pt")))

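    # The text encoder LoRA file may be absent if the text encoder was not trained
    text_encoder_path = os.path.join(model_dir, "lora_text_encoder.pt")
    if os.path.exists(text_encoder_path):
        print('Monkey patching text encoder pt file')
        monkeypatch_lora(pipe.text_encoder, torch.load(text_encoder_path), target_replace_module=["CLIPAttention"])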

    tune_lora_scale(pipe.unet, scale_unet)
    tune_lora_scale(pipe.text_encoder, scale_text_encoder)

    return pipe

def get_config(filename=None,
               save_dir=None,
               prompt=None, negative_prompt=None,
               seeds=None,
               num_samples=4,
               width=512, height=512,
               inference_steps=20,
               guidance_scale=7.5,
               ):
    if filename is None:
        num_prompts = len(prompt)
        if seeds is None:
            seeds = []
            # fixed value seeds for easier comparison between subsequent runs/config files
            for i in range(num_samples):
                seeds.append(i * 1000000)
        else:
            num_samples = len(seeds)

        tag = ''.join(random.choice(string.ascii_letters) for _ in range(8))
        config = {
            "tag": tag,
            "prompt": prompt,
            "negative_prompt": negative_prompt,
            "num_prompts": num_prompts,
            "num_samples": num_samples,
            "seeds": seeds,
            "height": height,
            "width": width,
            "inference_steps": inference_steps,
            "guidance_scale": guidance_scale,
        }

        with open(os.path.join(save_dir, "config_"+tag+".json"), "w") as outfile:
            json.dump(config, outfile)
    else:
        # Use a context manager so the file handle is closed after loading
        with open(filename) as f:
            config = json.load(f)

    return config

def get_images(pipe, sample_config, device="cuda"):
    generator = torch.Generator(device)
    with torch.autocast(device):
        num_cfg = len(sample_config['guidance_scale'])
        # Loop in order to use defined seed for each image in a batch
        all_images = []
        for i in range(sample_config['num_samples']):
            for cfg in sample_config['guidance_scale']:
                # Manually generate latent
                seed = sample_config['seeds'][i]
                generator = generator.manual_seed(seed)
                latent = torch.randn(
                    (1, pipe.unet.in_channels, sample_config['height'] // 8, sample_config['width'] // 8),
                    generator = generator,
                    device = device
                )
                images = pipe(sample_config['prompt'],
                    negative_prompt=sample_config['negative_prompt'],
                    num_inference_steps=int(sample_config['inference_steps']),
                    guidance_scale=cfg,
                    latents=latent.repeat(sample_config['num_prompts'], 1, 1, 1),
                ).images
                all_images.extend(images)

    grid = image_grid(all_images, rows=num_cfg*sample_config['num_samples'], cols=sample_config['num_prompts'])
    return grid
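
When reading the saved grids it helps to know which cell belongs to which seed, guidance scale and prompt. `get_images` appends images seed by seed, then guidance scale by guidance scale, with one image per prompt, and `image_grid` fills the grid row by row. The helper `grid_cell` below is a hypothetical illustration of that ordering.

```python
def grid_cell(seed_idx, cfg_idx, prompt_idx, num_cfg, num_prompts):
    """Return (row, col) in the grid for a given seed, guidance scale, and prompt."""
    # Position in all_images: seeds outermost, then guidance scales, then prompts
    flat = (seed_idx * num_cfg + cfg_idx) * num_prompts + prompt_idx
    # image_grid places image i at row i // cols, column i % cols, with cols = num_prompts
    return flat // num_prompts, flat % num_prompts

# Example: 4 guidance scales and 6 prompts
first = grid_cell(seed_idx=0, cfg_idx=0, prompt_idx=0, num_cfg=4, num_prompts=6)
other = grid_cell(seed_idx=1, cfg_idx=2, prompt_idx=3, num_cfg=4, num_prompts=6)
print(first, other)
```

In other words, each column is one prompt, and each block of `num_cfg` consecutive rows shares one seed with the guidance scale increasing down the block.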

In [12]:
#@title Specify which models to do inference with
model_list = [os.path.join(OUTPUT_DIR,'500'),
              os.path.join(OUTPUT_DIR,'1000'),
              os.path.join(OUTPUT_DIR,'2500'),
              ]

print(model_list)

['content/models/500', 'content/models/1000', 'content/models/2500']


In [None]:
#@title Generate or load a configuration for inference

config_name = None
#config_name = os.path.join(OUTPUT_DIR, "config_ZMasiqkP.json")

if config_name is None:
    num_samples = 6
    prompt = ["photo of a cat",
              "photo of a person",
              "close-up studio portrait photo of Keanu Reeves, film, detail, studio lighting",
              "close-up studio portrait photo of {SKS} person, film, detail, studio lighting",
              "beautiful white (marble:1.1) bust of {SKS} person, highly detailed",
              "oil painting of {SKS} person on the beach",
    ]
    negative_prompt = "hands, nude, nudity, duplicate, frame, border"
    guidance_scale = [1.0, 3.0, 7.0, 15.0]

    config = get_config(save_dir=OUTPUT_DIR,
                        prompt=prompt, negative_prompt=negative_prompt,
                        num_samples=num_samples,
                        width=512, height=512,
                        inference_steps=20, guidance_scale=guidance_scale
                        )
else:
    config = get_config(filename=config_name)

config['prompt'] = [sub.replace('{SKS}', INSTANCE_TOKEN) for sub in config['prompt']]
print(config)

In [None]:
#@title Infer!

LORA_SCALE_UNET = 1.0 #@param {type:"slider", min:0.0, max:2.0}
LORA_SCALE_TENC = 1.0 #@param {type:"slider", min:0.0, max:2.0}

for model in model_list:
    print(model)
    pipe = get_pipeline(model) if not USE_LORA else get_lora_pipeline(model, scale_unet=LORA_SCALE_UNET, scale_text_encoder=LORA_SCALE_TENC)
    grid = get_images(pipe, config)
    grid.save(os.path.join(OUTPUT_DIR, "grid_"+os.path.split(model)[1]+"_"+config['tag']+".jpg"), quality=90, optimize=True)
    del pipe
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

In [None]:
#@title Generate grids for base model using same config
model_name_or_path = MODEL_NAME_OR_PATH #'runwayml/stable-diffusion-v1-5'
vae_name_or_path = VAE_NAME_OR_PATH #'stabilityai/sd-vae-ft-mse'
pipe = get_pipeline(model_name_or_path, vae_name_or_path=vae_name_or_path)
grid = get_images(pipe, config)
grid.save(os.path.join(OUTPUT_DIR, "grid_"+os.path.split(model_name_or_path)[1]+"_"+config['tag']+".jpg"), quality=90, optimize=True)

del pipe
if torch.cuda.is_available():
    torch.cuda.empty_cache()

# Convert to checkpoint (ckpt) format

In [None]:
# Download into content/ so the path matches the python call below
!wget -q -P content https://raw.githubusercontent.com/huggingface/diffusers/main/scripts/convert_diffusers_to_original_stable_diffusion.py

MODEL_PATH = os.path.join(OUTPUT_DIR, '500')
CKPT_PATH = os.path.join(OUTPUT_DIR, '500.ckpt')

!python content/convert_diffusers_to_original_stable_diffusion.py \
  --model_path $MODEL_PATH \
  --checkpoint_path $CKPT_PATH \
  --half

# Close Colab instance

In [None]:
# from google.colab import runtime
# runtime.unassign()