---

📌 **This notebook has been updated in [jhj0517/finetuning-notebooks](https://github.com/jhj0517/finetuning-notebooks) repository!**

## Version : 0.0.1
---

In [1]:
#@title #(Optional) Check GPU

#@markdown To fine-tune the full Flux model, more than 24GB of VRAM is recommended.
#@markdown <br>You can check your GPU setup before starting.
!nvidia-smi

Sun Jan 26 09:28:44 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA L4                      Off | 00000000:00:03.0 Off |                    0 |
| N/A   35C    P8              11W /  72W |      1MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
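
The same check can be done programmatically. Below is a minimal sketch that reads total VRAM through `nvidia-smi`'s CSV query mode and compares it with a threshold. The helper names and the reading of "24GB" as 24 GiB (24576 MiB) are assumptions made for illustration, not part of the notebook.

```python
import subprocess

# Assumption: "24GB" is read as 24 GiB, i.e. 24 * 1024 MiB.
RECOMMENDED_MIB = 24 * 1024


def parse_total_mib(query_output):
    """Parse one integer (MiB) per GPU from the CSV query output."""
    return [int(line.strip()) for line in query_output.splitlines() if line.strip()]


def meets_recommendation(total_mib, recommended_mib=RECOMMENDED_MIB):
    """Return True if a GPU has at least the recommended amount of VRAM."""
    return total_mib >= recommended_mib


def query_total_mib():
    """Ask nvidia-smi for the total memory of every visible GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_total_mib(result.stdout)
```

In a Colab cell, `[meets_recommendation(m) for m in query_total_mib()]` gives one boolean per GPU. Under the threshold assumed here, the 23034MiB reported above would not pass.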
                                                                    

In [1]:
#@title #1. Install Dependencies
#@markdown This notebook is powered by https://github.com/ostris/ai-toolkit
!git clone https://github.com/ostris/ai-toolkit
!cd ai-toolkit && git submodule update --init --recursive && pip install -r requirements.txt

Cloning into 'ai-toolkit'...
remote: Enumerating objects: 4152, done.
remote: Counting objects: 100% (2565/2565), done.
remote: Compressing objects: 100% (301/301), done.
remote: Total 4152 (delta 2441), reused 2290 (delta 2263), pack-reused 1587 (from 3)
Receiving objects: 100% (4152/4152), 29.84 MiB | 29.66 MiB/s, done.
Resolving deltas: 100% (3134/3134), done.
Submodule 'repositories/batch_annotator' (https://github.com/ostris/batch-annotator) registered for path 'repositories/batch_annotator'
Submodule 'repositories/ipadapter' (https://github.com/tencent-ailab/IP-Adapter.git) registered for path 'repositories/ipadapter'
Submodule 'repositories/leco' (https://github.com/p1atdev/LECO) registered for path 'repositories/leco'
Submodule 'repositories/sd-scripts' (https://github.com/kohya-ss/sd-scripts.git) registered for path 'repositories/sd-scripts'
Cloning into '/content/ai-toolkit/repositories/batch_annotator'...
Cloning into '/content/ai-toolkit/repositories/ipadapter'.

In [None]:
#@title # 2. (Optional) Mount Google Drive

#@markdown Mounting Google Drive is not mandatory, but it is recommended so that you can use a Google Drive path for your training image dataset.

#@markdown The dataset should have the following structure:

#@markdown Each image file should have a corresponding text file (`.txt`) with the same name.
#@markdown The text file contains the prompt associated with the image.

#@markdown ### Example File Structure:
#@markdown ```
#@markdown your-dataset/
#@markdown ├── a (1).png         # Image file
#@markdown ├── a (1).txt         # Corresponding prompt for a (1).png
#@markdown ├── a (2).png         # Another image file
#@markdown ├── a (2).txt         # Corresponding prompt for a (2).png
#@markdown ```

from google.colab import drive
import os
drive.mount('/content/drive')
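
Before training, it can help to confirm that every image really has a same-named caption file. The following is a minimal sketch of such a check. The function name and the set of image extensions are assumptions for illustration; adjust them to your dataset.

```python
from pathlib import Path

# Assumption: these are the image extensions present in the dataset.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def find_missing_captions(dataset_dir, caption_ext="txt"):
    """Return the names of image files that lack a same-named caption file."""
    root = Path(dataset_dir)
    missing = []
    for path in sorted(root.iterdir()):
        if path.suffix.lower() not in IMAGE_EXTS:
            continue
        # "a (1).png" -> "a (1).txt"
        caption = path.with_suffix("." + caption_ext)
        if not caption.exists():
            missing.append(path.name)
    return missing
```

For example, `find_missing_captions("/content/drive/MyDrive/finetuning-notebooks/dataset/dog")` returns an empty list when the folder follows the structure described above.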

In [3]:
#@title # 3. (Optional) Register Hugging Face Token To Download Base Model

#@markdown If you don't have the entire set of base model files ([black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)) in your Drive, you need to sign in to Hugging Face to download the model.

#@markdown Get your token from https://huggingface.co/settings/tokens and register it in Colab's secrets as **`HF_TOKEN`**; it can then be used in any notebook. ('Read' permission is enough.)

#@markdown To register secrets in Colab, click the key-shaped icon in the left panel and enter your **`HF_TOKEN`** like this:

#@markdown ![image](https://media.githubusercontent.com/media/jhj0517/finetuning-notebooks/master/docs/screenshots/colab_secrets.png)

import os
from google.colab import userdata

hf_token = userdata.get('HF_TOKEN')
os.environ['HF_TOKEN'] = hf_token

print("HF_TOKEN environment variable has been set.")

HF_TOKEN environment variable has been set.
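
If the secret has not been registered, the cell above fails before the environment variable is set. A minimal sketch of a more forgiving lookup follows. It prefers an existing `HF_TOKEN` environment variable and falls back to a caller-supplied getter. The function `resolve_hf_token` is hypothetical and not part of Colab or the toolkit.

```python
import os


def resolve_hf_token(secret_getter, env=None):
    """Return HF_TOKEN from the environment, else from secret_getter.

    secret_getter is any callable that takes the secret name and returns
    its value, for example `userdata.get` from `google.colab`.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if token:
        return token
    try:
        token = secret_getter("HF_TOKEN")
    except Exception as exc:  # the getter may raise if the secret is missing
        raise RuntimeError("HF_TOKEN was not found; register it first.") from exc
    if not token:
        raise RuntimeError("HF_TOKEN is empty; register a valid token.")
    return token
```

In Colab this could be used as `os.environ['HF_TOKEN'] = resolve_hf_token(userdata.get)`.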


In [8]:
#@title # 4. Train with Parameters
import os
import sys
from collections import OrderedDict

# Set before the toolkit import so that libraries reading this variable at import time pick it up.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

sys.path.append('/content/ai-toolkit')
from toolkit.job import run_job
from PIL import Image

#@markdown ## Paths Configuration

#@markdown Set your dataset path and the output path for the fine-tuned model here.
DATASET_DIR = "/content/drive/MyDrive/finetuning-notebooks/dataset/dog" # @param {type:"string"}
OUTPUT_DIR = '/content/drive/MyDrive/finetuning-notebooks/flux/outputs'  # @param {type:"string"}
MODEL_NAME = 'Your-Finetuned-Flux-v1'  # @param {type:"string"}
os.makedirs(OUTPUT_DIR, exist_ok=True)

#@markdown ## Base Model Configuration
#@markdown If you use the default repo ID here, you need to register a Hugging Face token in the previous section.
repo_id_or_path = 'black-forest-labs/FLUX.1-dev' # @param {type:"string"}
is_flux = True
quantize_te = True # @param {type:"boolean"}
# Only train the transformer blocks
only_if_contains = ["transformer.transformer_blocks.", "transformer.single_transformer_blocks."]

#@markdown ## Process Settings
#@markdown (`max_step_saves_to_keep` = how many checkpoints to keep during training.)
save_dtype = "bf16" # @param {type:"string"}
save_every = 250 # @param {type:"number"}
max_step_saves_to_keep = 3 # @param {type:"number"}
#@markdown Every `sample_every` steps, samples are generated in the output directory from the prompts below so you can track your results.
#@markdown <br>Below is an example with the trigger word "sks dog". A trigger word is not required.
# Set to True to skip the pre-training sample
skip_first_sample = True  # @param {type:"boolean"}
# Set to True to completely disable sampling
disable_sampling = False # @param {type:"boolean"}
sample_every = 250 # @param {type:"number"}
sample_seed = 77 # @param {type: "number"}
sample_steps = 20 # @param {type: "number"}
sample_prompt_1 = "A sks dog is looking above in bedroom" # @param {type: "string"}
sample_prompt_2 = "A sks dog is playing with balls on the grass on a sunny day" # @param {type: "string"}
sample_prompt_3 = "A sks dog is sleeping on the cushion, the room is dark without light at night" # @param {type: "string"}
### Add as many `sample_prompts` as you need ###
sample_prompts = [sample_prompt_1, sample_prompt_2, sample_prompt_3]

performance_log_every = 1000 # @param {type:"number"}

#@markdown ## Dataset Settings
caption_ext = "txt" # @param {type:"string"}
caption_dropout_rate = 0.05 # @param {type:"number"}
shuffle_tokens = False # @param {type:"boolean"}
cache_latents_to_disk = True # @param {type:"boolean"}
resolution = [512] # @param {type:"raw"}

#@markdown ## Training Settings
batch_size = 1 # @param {type:"number"}
# IMPORTANT! For Flex, you must bypass the guidance embedder during training
bypass_guidance_embedding = True  # @param {type:"boolean"}
timestep_type = 'sigmoid'  # @param ['sigmoid', 'linear', 'lognorm_blend']

# Recommended range is 500 ~ 4000
steps = 1000 # @param {type:"number"}
gradient_accumulation = 1 # @param {type:"number"}
train_dtype = "bf16" # @param {type:"string"}
lr = 3e-5 # @param {type:"number"}
train_unet = True # @param {type:"boolean"}
train_text_encoder = False # @param {type:"boolean"}
gradient_checkpointing = True # @param {type:"boolean"}
noise_scheduler = 'flowmatch' # @param {type:"string"}
optimizer = 'adafactor' # @param {type:"string"}
# Parameter swapping can reduce VRAM requirements. Set the factor between 0.0 and 1.0.
# 0.1 means 10% of parameters are active at each step. Only works with adafactor.
# The `paramiter` spelling in the names below is kept as-is because they are passed through as config keys.
do_paramiter_swapping = True # @param {type:"boolean"}
paramiter_swapping_factor = 0.5 # @param {type:"number"}
# EMA settings
# EMA smooths out learning but can slow it down. Recommended to leave on if you have the VRAM.
use_ema = False # @param {type:"boolean"}
ema_decay = 0.99 # @param {type:"number"}

# Training
job_to_run = OrderedDict([
    ('job', 'extension'),
    ('config', OrderedDict([
        # This name is used for the output folder and file names
        ('name', MODEL_NAME),
        ('process', [
            OrderedDict([
                ('type', 'sd_trainer'),
                ('training_folder', OUTPUT_DIR),
                ('performance_log_every', performance_log_every),
                ('device', 'cuda:0'),
                ('save', OrderedDict([
                    ('dtype', save_dtype),
                    ('save_every', save_every),
                    ('max_step_saves_to_keep', max_step_saves_to_keep),
                    ('save_format', 'diffusers'),
                ])),
                ('datasets', [
                    OrderedDict([
                        ('folder_path', DATASET_DIR),
                        ('caption_ext', caption_ext),
                        ('caption_dropout_rate', caption_dropout_rate),
                        ('shuffle_tokens', shuffle_tokens),
                        ('cache_latents_to_disk', cache_latents_to_disk),
                        ('resolution', resolution)
                    ])
                ]),
                ('train', OrderedDict([
                    ('batch_size', batch_size),
                    ('bypass_guidance_embedding', bypass_guidance_embedding),
                    ('timestep_type', timestep_type),
                    ('steps', steps),
                    ('gradient_accumulation', gradient_accumulation),
                    ('train_unet', train_unet),
                    ('train_text_encoder', train_text_encoder),
                    ('gradient_checkpointing', gradient_checkpointing),
                    ('noise_scheduler', noise_scheduler),
                    ('optimizer', optimizer),
                    ('lr', lr),
                    ('ema_config', OrderedDict([
                        ('use_ema', use_ema),
                        ('ema_decay', ema_decay)
                    ])),
                    ('dtype', train_dtype),
                    ("do_paramiter_swapping", do_paramiter_swapping),
                    ("paramiter_swapping_factor", paramiter_swapping_factor),
                ])),
                ('model', OrderedDict([
                    ('name_or_path', repo_id_or_path),
                    ('is_flux', is_flux),
                    ('quantize_te', quantize_te),
                    ('only_if_contains', only_if_contains)
                ])),
                ('sample', OrderedDict([
                    ('sampler', 'flowmatch'),
                    ('sample_every', sample_every),
                    ('width', 1024),
                    ('height', 1024),
                    ('prompts', sample_prompts),
                    ('neg', ''),
                    ('seed', sample_seed),
                    ('walk_seed', True),
                    ('guidance_scale', 4),
                    ('sample_steps', sample_steps),
                    ('skip_first_sample', skip_first_sample),
                    ('disable_sampling', disable_sampling)
                ]))
            ])
        ])
    ])),
    ('meta', OrderedDict([
        ('name', '[name]'),
        ('version', '1.0')
    ]))
])

run_job(job_to_run)
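
To get a feel for how `steps`, `save_every`, and `max_step_saves_to_keep` interact, here is a minimal sketch that lists the steps at which checkpoints would be written and which of them would remain at the end. It assumes a checkpoint is written at every multiple of `save_every` and that only the most recent ones are retained. This is an illustration of the settings as described above, not a reproduction of the toolkit's internal logic.

```python
def checkpoint_steps(steps, save_every):
    """Steps at which a checkpoint is assumed to be written."""
    return list(range(save_every, steps + 1, save_every))


def kept_checkpoints(steps, save_every, max_step_saves_to_keep):
    """Checkpoints assumed to remain after training (most recent ones only)."""
    saved = checkpoint_steps(steps, save_every)
    if max_step_saves_to_keep <= 0:
        return []
    return saved[-max_step_saves_to_keep:]


# Values used in this notebook: steps=1000, save_every=250, keep 3.
print(checkpoint_steps(1000, 250))      # [250, 500, 750, 1000]
print(kept_checkpoints(1000, 250, 3))   # [500, 750, 1000]
```

Under these assumptions, the checkpoint from step 250 would be removed once the fourth one is written.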


{
    "type": "sd_trainer",
    "training_folder": "/content/drive/MyDrive/finetuning-notebooks/flux/outputs",
    "performance_log_every": 1000,
    "device": "cuda:0",
    "save": {
        "dtype": "bf16 ",
        "save_every": 250,
        "max_step_saves_to_keep": 3,
        "save_format": "diffusers"
    },
    "datasets": [
        {
            "folder_path": "/content/drive/MyDrive/finetuning-notebooks/dataset/dog",
            "caption_ext": "txt",
            "caption_dropout_rate": 0.05,
            "shuffle_tokens": false,
            "cache_latents_to_disk": true,
            "resolution": [
                512
            ]
        }
    ],
    "train": {
        "batch_size": 1,
        "bypass_guidance_embedding": true,
        "timestep_type": "sigmoid",
        "steps": 1000,
        "gradient_accumulation": 1,
        "train_unet": true,
        "train_text_encoder": false,
        "gradient_checkpointing": true,
        "noise_scheduler": "flowmatch",
        "opt

Your-Finetuned-Flux-v1:   0%|          | 0/1000 [00:29<?, ?it/s]


Loading vae
Loading t5


Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Quantizing T5
Loading clip
making pipe
preparing
Found 1140 trainable parameter in unet
Total training paramiters: 11,837,283,328
Dataset: /content/drive/MyDrive/finetuning-notebooks/dataset/dog
  -  Preprocessing image dimensions


100%|██████████| 5/5 [00:00<00:00, 17955.07it/s]

  -  Found 5 images
Bucket sizes for /content/drive/MyDrive/finetuning-notebooks/dataset/dog:
448x512: 1 files
512x512: 4 files
2 buckets made
Caching latents for /content/drive/MyDrive/finetuning-notebooks/dataset/dog
 - Saving latents to disk



Caching latents to disk: 100%|██████████| 5/5 [00:00<00:00, 2611.00it/s]


Generating baseline samples before training


Your-Finetuned-Flux-v1:   0%|          | 0/1000 [00:00<?, ?it/s]

OutOfMemoryError: CUDA out of memory. Tried to allocate 216.00 MiB. GPU 0 has a total capacity of 39.56 GiB of which 112.81 MiB is free. Process 40981 has 39.45 GiB memory in use. Of the allocated memory 38.69 GiB is allocated by PyTorch, and 252.18 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
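
A rough estimate helps explain why memory is tight. The log above reports 11,837,283,328 trainable parameters, and the sketch below computes how much memory the weights alone occupy at a given precision, before gradients, optimizer state, and activations are counted. It also sets the allocator option that the error message itself suggests. The helper name `weights_gib` is an assumption for illustration, and the allocator option only addresses fragmentation, so it may not be enough on its own.

```python
import os

# Suggested by the error message above. It must be set before the first
# CUDA allocation; setting it before importing torch is the safest place.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}


def weights_gib(n_params, dtype="bf16"):
    """Memory taken by the weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


TRAINABLE_PARAMS = 11_837_283_328  # from the training log above
print(f"bf16 weights: {weights_gib(TRAINABLE_PARAMS):.2f} GiB")  # 22.05 GiB
```

Compared with the 39.56 GiB total capacity in the error message, the weights alone take more than half of the GPU, which leaves limited room for everything else. Lowering `paramiter_swapping_factor` or the sample resolution are the knobs this notebook exposes for reducing memory use.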