<a href="https://colab.research.google.com/github/ritwikraha/LoRA-SDXL-FineTuning/blob/main/notebooks/Image_maker_LoRA_w_LLava.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Image Style Creator with SDXL LoRA Adapter



### Setup and Installations

NOTE: Make sure to install `diffusers` from `main`, then download the diffusers SDXL DreamBooth LoRA training script.

In [None]:
# Check the GPU
!nvidia-smi

In [None]:
# Install dependencies.
!pip install bitsandbytes transformers accelerate peft -q
!pip install git+https://github.com/huggingface/diffusers.git -q
!pip install datasets -q
!wget https://raw.githubusercontent.com/huggingface/diffusers/main/examples/dreambooth/train_dreambooth_lora_sdxl.py

In [None]:
import gc
import os
from google.colab import files
from huggingface_hub import snapshot_download
from PIL import Image
import glob
import requests
from transformers import AutoProcessor, LlavaForConditionalGeneration
import torch
import json
import locale
from huggingface_hub import whoami
from pathlib import Path

from train_dreambooth_lora_sdxl import save_model_card
from huggingface_hub import upload_folder, create_repo
from huggingface_hub import upload_file
from diffusers import DiffusionPipeline, AutoencoderKL

from IPython.display import display, Markdown

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"

In [None]:
concept_name= "comics"

## Dataset

We will upload custom images and use LLaVA to caption them.

In [None]:
# pick a name for the image folder
local_dir = "./"+concept_name+"/"
os.makedirs(local_dir, exist_ok=True)  # do not fail if the cell is re-run
os.chdir(local_dir)

# choose and upload local images into the newly created directory
uploaded_images = files.upload()
os.chdir("/content") # back to parent directory

Preview the images:

In [None]:
def image_grid(imgs, rows, cols, resize=256):
    # optionally resize every image to a square thumbnail
    if resize is not None:
        imgs = [img.resize((resize, resize)) for img in imgs]
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    # paste images left to right, top to bottom
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

In [None]:
# change path to display images from your local dir
img_paths = "./"+concept_name+"/*.png"
imgs = [Image.open(path) for path in glob.glob(img_paths)]

num_imgs_to_preview = 10
image_grid(imgs[:num_imgs_to_preview], 1, num_imgs_to_preview)

### Generate custom captions with LLaVA
Load LLaVA to auto-caption your images:

In [None]:
model_id = "llava-hf/llava-1.5-7b-hf"

llava_model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    load_in_4bit=True
)
llava_processor = AutoProcessor.from_pretrained(model_id)

# captioning utility
def caption_images(prompt, input_image):
    # pass text and images by keyword so the call does not depend on the processor's positional order
    inputs = llava_processor(text=prompt, images=input_image, return_tensors="pt").to("cuda", torch.float16)

    # max_new_tokens limits only the generated caption, not the prompt and image tokens
    generated_ids = llava_model.generate(**inputs, max_new_tokens=60, do_sample=False)
    generated_caption = llava_processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

    # keep only the assistant's reply
    answer = generated_caption.partition("ASSISTANT:")[2].strip()
    # drop the "In the image," lead-in when present; otherwise return the reply as-is
    if answer.startswith("In the image,"):
        answer = answer.partition("In the image,")[2].strip()
    return answer

In [None]:
# create a list of (path, PIL.Image) pairs
local_dir = "./"+concept_name+"/"
imgs_and_paths = [(path,Image.open(path)) for path in glob.glob(f"{local_dir}*.png")]

Now let's add the concept token identifier (e.g. TOK) to each caption using a caption prefix.
Change the prefix to match the concept you are training on:
- For this example we can use "a photo of TOK,"; other options include:
    - For styles - "In the style of TOK"
    - For faces - "photo of a TOK person"
- You can add additional identifiers to the prefix that can help steer the model in the right direction.
    - e.g. for this example, instead of "a photo of TOK" we can use "a photo of TOK comics" / "In the style of TOK comics"
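
As a rough illustration of the prefixing step, the sketch below joins a prefix with the first line of a raw caption. The helper name `build_caption` and the sample caption text are hypothetical and only serve the example; they are not part of the notebook.

```python
def build_caption(prefix: str, raw_caption: str) -> str:
    """Prepend the concept prefix to the first line of a generated caption."""
    # keep only the first line of the caption and trim surrounding whitespace
    first_line = raw_caption.strip().split("\n")[0].strip()
    return f"{prefix.strip()} {first_line}"


concept_name = "comics"  # same value as the concept used in this notebook
style_prefix = f"in the style of TOK {concept_name},"

# made-up caption text, standing in for a captioner's output
print(build_caption(style_prefix, " a man is reading a newspaper.\nExtra text"))
```

Trimming both parts before joining avoids doubled or missing spaces between the prefix and the caption.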

In [None]:
caption_prefix = "in the style of TOK "+concept_name+", "
prompt_to_captioner = "USER: <image>\nWhat is happening in this image?\nASSISTANT:"
with open(f'{local_dir}metadata.jsonl', 'w') as outfile:
  for img in imgs_and_paths:
      caption = caption_prefix + caption_images(prompt_to_captioner,img[1]).split("\n")[0]
      entry = {"file_name":img[0].split("/")[-1], "prompt": caption}
      json.dump(entry, outfile)
      outfile.write('\n')

Free up some GPU memory:

In [None]:
# delete the LLaVA model and processor to free up some memory
del llava_processor, llava_model
gc.collect()
torch.cuda.empty_cache()

## Prepare for Training

Initialize `accelerate`:

In [None]:
locale.getpreferredencoding = lambda: "UTF-8"
!accelerate config default

### Log into your Hugging Face account
Pass [your **write** access token](https://huggingface.co/settings/tokens) so that we can push the trained checkpoints to the Hugging Face Hub:

In [None]:
from huggingface_hub import notebook_login
notebook_login()

## Train!

#### Set Hyperparameters
To make DreamBooth training with LoRA feasible on a heavy pipeline like Stable Diffusion XL, we're using:

* Gradient checkpointing (`--gradient_checkpointing`)
* 8-bit Adam (`--use_8bit_adam`)
* Mixed-precision training (`--mixed_precision="fp16"`)
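
The launch command in this notebook also sets `--train_batch_size=1`, `--gradient_accumulation_steps=3`, and `--max_train_steps=500`. The sketch below works out what those values imply, assuming a single process and assuming that each training step corresponds to one optimizer update after accumulation. The function names are hypothetical.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Number of images contributing to each optimizer update."""
    return train_batch_size * gradient_accumulation_steps * num_processes


def images_processed(max_train_steps, train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Total images seen, assuming one optimizer update per training step."""
    per_update = effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes)
    return max_train_steps * per_update


print(effective_batch_size(1, 3))   # images per optimizer update
print(images_processed(500, 1, 3))  # images seen over the whole run
```

With a small dataset, this gives a rough idea of how many times each image is revisited during training.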

### Launch training

To allow for custom captions we need the `datasets` library (installed above). You can skip it if you want to train solely with `--instance_prompt`; in that case, specify `--instance_data_dir` instead of `--dataset_name`.

 - Use `--output_dir` to specify your LoRA model repository name!
 - Use `--caption_column` to specify the name of the caption column in your dataset. In this example we used "prompt" to save our captions in the metadata file; change this according to your needs.
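
Before launching, it can help to confirm that every record in the metadata file carries both `file_name` and the caption column. The following is a minimal sketch of such a check. The helper `check_metadata` is hypothetical, and the demo runs on a throwaway file with one made-up record rather than on the real dataset.

```python
import json
import tempfile
from pathlib import Path


def check_metadata(metadata_path, caption_column="prompt"):
    """Return all records of a JSON Lines file, raising KeyError if a key is missing."""
    records = []
    lines = Path(metadata_path).read_text().splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        for key in ("file_name", caption_column):
            if key not in record:
                raise KeyError(f"line {line_no}: missing '{key}'")
        records.append(record)
    return records


# Demo on a temporary file; the record below is invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "metadata.jsonl"
    demo_record = {"file_name": "example.png", "prompt": "in the style of TOK comics, a city street"}
    demo_path.write_text(json.dumps(demo_record) + "\n")
    demo_records = check_metadata(demo_path)
    print(len(demo_records), "record(s) OK")
```

To check the real dataset, the same function could be pointed at `./comics/metadata.jsonl`, passing whichever column name you give to `--caption_column`.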

In [None]:
#!/usr/bin/env bash
!accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --dataset_name="comics" \
  --output_dir="comics_style_LoRA" \
  --caption_column="prompt" \
  --mixed_precision="fp16" \
  --instance_prompt="an illustration in the style of TOK comics" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=3 \
  --gradient_checkpointing \
  --learning_rate=1e-4 \
  --snr_gamma=5.0 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --use_8bit_adam \
  --max_train_steps=500 \
  --checkpointing_steps=717 \
  --seed="97"

### Save the model to the hub and check it out

NOTE: make sure the `output_dir` you specify here is the same as the one used for training

Sometimes training finishes successfully (i.e. a **.safetensors** file with the LoRA weights is saved properly to your local `output_dir`) but there's not enough RAM in the free tier to push the model to the hub 🙁
To mitigate this, run the cells below with your training arguments to make sure your model is uploaded! 🤗
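
One way to confirm that the weights were saved before uploading is to look for `.safetensors` files in the output directory. The sketch below does this with a hypothetical helper, `find_lora_weights`, and demonstrates it on a temporary directory containing placeholder files. The file names in the demo are invented and do not reflect what the training script writes.

```python
import tempfile
from pathlib import Path


def find_lora_weights(output_dir):
    """Return the sorted .safetensors files found directly inside output_dir."""
    return sorted(Path(output_dir).glob("*.safetensors"))


# Demo on a temporary directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "weights_example.safetensors").write_bytes(b"")
    (Path(tmp) / "README.md").write_text("placeholder")
    found = [p.name for p in find_lora_weights(tmp)]
    print(found)
```

An empty result for your real `output_dir` would suggest that training did not finish saving, in which case uploading the folder would not publish usable weights.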


In [None]:
output_dir = concept_name+"_style_LoRA"
username = whoami()["name"]  # uses the token stored by notebook_login()
repo_id = f"{username}/{output_dir}"

In [None]:
# push to the hub🔥
repo_id = create_repo(repo_id, exist_ok=True).repo_id

In [None]:
print(repo_id)

In [None]:
token=""

In [None]:
# change the params below according to your training arguments
save_model_card(
    repo_id = repo_id,
    images=[],
    use_dora=False,
    base_model="stabilityai/stable-diffusion-xl-base-1.0",
    train_text_encoder=False,
    instance_prompt="an illustration in the style of TOK " + concept_name,
    validation_prompt=None,
    repo_folder=output_dir,
    vae_path="madebyollin/sdxl-vae-fp16-fix",
)

In [None]:
upload_folder(
    repo_id=repo_id,
    token=token,
    folder_path=output_dir,
    commit_message="End of training",
    ignore_patterns=["step_*", "epoch_*"],
)

In [None]:
link_to_model = f"https://huggingface.co/{repo_id}"
display(Markdown("### Your model has finished training.\nAccess it here: {}".format(link_to_model)))

## Inference

In [None]:
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True
)
pipe.load_lora_weights(repo_id)
_ = pipe.to("cuda")

In [None]:
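# placeholder: replace with your own prompt and include the trigger phrase used in training,
# e.g. "in the style of TOK comics"
prompt = "some generic but specialized prompt, 8k"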

image = pipe(prompt=prompt, num_inference_steps=50).images[0]
image