<a href="https://colab.research.google.com/github/p3bozuric/headshot_generator/blob/main/headshot_generator_finetuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# No-code fine-tuning for a headshot generator AI model

This notebook is built to be run in Google Colab.

An A100 GPU is the best choice for this training run. Depending on the settings you choose, training takes a couple of hours, so make sure the session stays connected the whole time.
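
Before starting, it can help to confirm which GPU the session actually received. The following is a minimal sketch that assumes the `nvidia-smi` command-line tool is available on GPU runtimes; the helper name `describe_gpu` is made up for this example.

```python
import shutil
import subprocess


def describe_gpu():
    """Return the GPU name and total memory reported by nvidia-smi, or None."""
    # nvidia-smi is only on PATH when an NVIDIA driver is installed.
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    # One line per GPU; an empty result is treated as "no GPU".
    return result.stdout.strip() or None


gpu = describe_gpu()
print(gpu if gpu else "No NVIDIA GPU detected. In Colab, choose Runtime > Change runtime type.")
```

If the printed name is not the GPU you expected, switch the runtime type before running the remaining cells.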

## Giving access to Google Drive

After you run the next cell, you must manually approve access to your Google Drive.




In [1]:
from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive


## Preparing AI toolkit environment

In [2]:
!git clone https://github.com/ostris/ai-toolkit.git
!cd /content/ai-toolkit && git checkout 86b5938cf35bde7b1eab33a001515433b01a7b63
!cd /content/ai-toolkit && git submodule update --init --recursive
!cd /content/ai-toolkit && pip install -r requirements.txt
!pip install optimum-quanto==0.2.4
!pip install -U --no-cache-dir timm

Cloning into 'ai-toolkit'...
remote: Enumerating objects: 3911, done.
remote: Counting objects: 100% (3910/3910), done.
remote: Compressing objects: 100% (974/974), done.
remote: Total 3911 (delta 2987), reused 3725 (delta 2848), pack-reused 1 (from 1)
Receiving objects: 100% (3911/3911), 29.65 MiB | 26.96 MiB/s, done.
Resolving deltas: 100% (2987/2987), done.
Note: switching to '86b5938cf35bde7b1eab33a001515433b01a7b63'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 86b5938 Fixed the webp bug fin

## Hugging Face token preparation (HF_TOKEN)

How to get and prepare HF_TOKEN:
1. Log in to Hugging Face.
2. Create a token here: https://huggingface.co/settings/tokens
3. In Colab, click the key icon in the left sidebar and add your token there under the name 'HF_TOKEN'.

In [3]:
import os
from google.colab import userdata

hf_token = userdata.get('HF_TOKEN')

# Set the environment variable
os.environ['HF_TOKEN'] = hf_token

print("HF_TOKEN environment variable has been set.")

HF_TOKEN environment variable has been set.
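
A quick sanity check on the token can catch copy-and-paste mistakes before the model download starts. This is only a sketch: `check_hf_token` is a hypothetical helper, and the `hf_` prefix test is an assumption about how user access tokens usually look, not a guarantee. The token itself is never printed.

```python
import os


def check_hf_token(env=None):
    """Return a list of problems with the HF_TOKEN variable (empty if none found)."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "")
    problems = []
    if not token:
        problems.append("HF_TOKEN is not set or is empty")
    else:
        if token != token.strip():
            problems.append("HF_TOKEN has leading or trailing whitespace")
        # Assumption: user access tokens normally start with "hf_".
        if not token.strip().startswith("hf_"):
            problems.append("HF_TOKEN does not start with 'hf_'")
    return problems


for problem in check_hf_token():
    print("Warning:", problem)
```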


## Importing packages


In [4]:
import os
import sys
sys.path.append('/content/ai-toolkit')
from toolkit.job import run_job
from collections import OrderedDict
from PIL import Image
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

## Dataset preparation

1. The dataset must be in a folder named **flux_dataset** inside the output directory you set in the hyperparameter form below.

2. The dataset should contain 20-30 images of your face, taken from different angles, against different backgrounds, in various situations, and with various facial expressions.

3. Images should be named like this: image001.jpg

4. Each image can have a matching .txt file that describes its content, named like this: image001.txt

The .txt files are optional, but fine-tuning works better when each image has a written description.
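
The naming rules above can be checked with a short script. This is a minimal sketch: `check_dataset` is a hypothetical helper, the set of accepted image extensions is an assumption, and the path matches the default `output_dir` used in this notebook.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}  # assumed set of image types


def check_dataset(folder):
    """Return (images, images_without_caption) for a dataset folder."""
    folder = Path(folder)
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    # A caption shares the image's base name: image001.jpg -> image001.txt
    missing = [p for p in images if not p.with_suffix(".txt").is_file()]
    return images, missing


dataset_dir = Path("/content/drive/MyDrive/headshot-generator/flux_dataset")
if dataset_dir.is_dir():
    images, missing = check_dataset(dataset_dir)
    print(f"{len(images)} images, {len(missing)} without a caption file")
    if not 20 <= len(images) <= 30:
        print("Note: 20-30 images are recommended.")
else:
    print(f"Dataset folder not found: {dataset_dir}")
```

Change `dataset_dir` if you plan to use a different output directory.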

## Hyperparameter setup

Fill out the form and run the cell.


In [5]:
#@markdown ---
#@markdown ## **Project Configuration**
#@markdown Title of your project
project_name = 'professional_headshot_generator' # @param {type:"string"}
#@markdown Model you'll be fine-tuning. This code is optimized for FLUX.1-dev.
model_name = 'black-forest-labs/FLUX.1-dev' # @param ["black-forest-labs/FLUX.1-dev"]
#@markdown Where your model will be saved. Keep the "/content/drive/MyDrive/" prefix unchanged.
output_dir = '/content/drive/MyDrive/headshot-generator' # @param {type:"string"}
#@markdown The instance prompt is a unique token that refers to you when you prompt the generator at inference time.
instance_prompt = "pa3k" # @param {type:"string"}

#@markdown ---
#@markdown ## **Training Configuration**
#@markdown These parameters control the training process of your model.

#@markdown Number of images processed in one iteration.
batch_size = 2 # @param {type:"integer"}

#@markdown Total number of training iterations. 1000-4000 is a good range.
total_steps = 2000 # @param {type:"integer"}

#@markdown Rate at which the model learns. Higher values may lead to faster learning but potential instability.
learning_rate = 1e-4 # @param {type:"number"}

#@markdown Image resolutions to use during training. FLUX model benefits from multiple resolutions.
resolution = [512, 768, 1024] # @param {type:"raw"}

#@markdown Whether to train the main denoising network. The config key is `train_unet`; in FLUX this network is a transformer rather than a U-Net. Usually kept True.
train_unet = True # @param {type:"boolean"}

#@markdown Whether to train the text encoder. Usually False for FLUX models.
train_text_encoder = False # @param {type:"boolean"}

#@markdown Use 8-bit Adam optimizer for reduced memory usage. Recommended if your GPU supports it.
use_8bit_adam = True # @param {type:"boolean"}

#@markdown Saves memory by recomputing activations during the backward pass instead of storing them. Needed unless you have a lot of VRAM.
use_gradient_checkpointing = True # @param {type:"boolean"}

#@markdown Use Exponential Moving Average for more stable training. Recommended to leave on.
use_ema = True # @param {type:"boolean"}

#@markdown EMA decay rate. Higher values give more weight to past iterations, so the averaged weights change more slowly. Best to leave at 0.99.
ema_decay = 0.99 # @param {type:"number"}

#@markdown Use bfloat16 precision. Speeds up training if your GPU supports it.
use_bf16 = True # @param {type:"boolean"}

#@markdown ---
#@markdown ## **Sampling Configuration**
#@markdown These settings control the generation of test images during the training process.

#@markdown Generate sample images every N steps.
sample_every = 500 # @param {type:"integer"}

#@markdown Width of the generated sample images.
sample_width = 1024 # @param {type:"integer"}

#@markdown Height of the generated sample images.
sample_height = 1024 # @param {type:"integer"}

#@markdown How closely the image adheres to the prompt. Higher values = closer adherence.
guidance_scale = 4 # @param {type:"number"}

#@markdown Number of denoising steps in image generation. More steps = potentially higher quality but slower.
sample_steps = 40 # @param {type:"integer"}

#@markdown ---
#@markdown ## **Advanced Configuration**
#@markdown These are additional parameters for fine-tuning the training process.

#@markdown Log performance stats in the terminal every N steps
performance_log_every = 500 # @param {type:"integer"}

#@markdown Device to use for training
device = 'cuda:0' # @param {type:"string"}

#@markdown Linear layer rank for LoRA
lora_rank = 16 # @param {type:"integer"}

#@markdown Linear layer alpha value for LoRA
lora_alpha = 16 # @param {type:"integer"}

#@markdown Precision to save the model
save_precision = 'float16' # @param ["float16", "float32"]

#@markdown Save the model every N steps
save_every = 500 # @param {type:"integer"}

#@markdown Number of intermediate saves to keep
max_save_keeps = 4 # @param {type:"integer"}

#@markdown Rate at which captions are dropped during training
caption_dropout_rate = 0.05 # @param {type:"number"}

#@markdown Whether to shuffle tokens in captions
shuffle_caption_tokens = False # @param {type:"boolean"}

#@markdown Cache latents to disk for faster loading
cache_latents = True # @param {type:"boolean"}

#@markdown Number of gradient accumulation steps
gradient_accumulation_steps = 2 # @param {type:"integer"}

#@markdown Training focus: content, style, or balanced
content_or_style = 'content' # @param ["content", "style", "balanced"]

#@markdown Noise scheduler to use during training
noise_scheduler = 'flowmatch' # @param ["flowmatch", "ddpm", "ddim"]

#@markdown Skip the first sample generation
skip_first_sample = True # @param {type:"boolean"}

#@markdown Whether the model is a FLUX model
is_flux_model = True # @param {type:"boolean"}

#@markdown Use 8-bit quantization
use_quantization = True # @param {type:"boolean"}

#@markdown Use low VRAM mode (slower but uses less memory)
low_vram_mode = False # @param {type:"boolean"}

#@markdown Sampler to use for generating samples
sample_sampler = 'flowmatch' # @param ["flowmatch", "ddpm", "ddim"]

#@markdown Seed for random number generation
random_seed = 42 # @param {type:"integer"}

#@markdown Whether to use different seeds for each sample
walk_seed = True # @param {type:"boolean"}

from collections import OrderedDict

job_to_run = OrderedDict([
    ('job', 'extension'),
    ('config', OrderedDict([
        ('name', project_name),
        ('process', [
            OrderedDict([
                ('type', 'sd_trainer'),
                ('training_folder', output_dir),
                ('performance_log_every', performance_log_every),
                ('device', device),
                ('trigger_word', instance_prompt),
                ('network', OrderedDict([
                    ('type', 'lora'),
                    ('linear', lora_rank),
                    ('linear_alpha', lora_alpha)
                ])),
                ('save', OrderedDict([
                    ('dtype', save_precision),
                    ('save_every', save_every),
                    ('max_step_saves_to_keep', max_save_keeps)
                ])),
                ('datasets', [
                    OrderedDict([
                        ('folder_path', f'{output_dir}/flux_dataset'),
                        ('caption_ext', "txt"),
                        ('caption_dropout_rate', caption_dropout_rate),
                        ('shuffle_tokens', shuffle_caption_tokens),
                        ('cache_latents_to_disk', cache_latents),
                        ('resolution', resolution)
                    ])
                ]),
                ('train', OrderedDict([
                    ('batch_size', batch_size),
                    ('steps', total_steps),
                    ('gradient_accumulation_steps', gradient_accumulation_steps),
                    ('train_unet', train_unet),
                    ('train_text_encoder', train_text_encoder),
                    ('content_or_style', content_or_style),
                    ('gradient_checkpointing', use_gradient_checkpointing),
                    ('noise_scheduler', noise_scheduler),
                    ('optimizer', 'adamw8bit' if use_8bit_adam else 'adamw'),
                    ('lr', learning_rate),
                    ('skip_first_sample', skip_first_sample),
                    ('ema_config', OrderedDict([
                        ('use_ema', use_ema),
                        ('ema_decay', ema_decay)
                    ])),
                    ('dtype', 'bf16' if use_bf16 else 'float32')
                ])),
                ('model', OrderedDict([
                    ('name_or_path', model_name),
                    ('is_flux', is_flux_model),
                    ('quantize', use_quantization),
                    ('low_vram', low_vram_mode),
                ])),
                ('sample', OrderedDict([
                    ('sampler', sample_sampler),
                    ('sample_every', sample_every),
                    ('width', sample_width),
                    ('height', sample_height),
                    ('prompts', [
                        f'professional headshot of {instance_prompt} in a suit, studio lighting, neutral background',
                        f'business portrait of {instance_prompt} smiling, office setting, soft lighting',
                        f'corporate headshot of {instance_prompt} with confident expression, blurred office background',
                        f'professional profile picture of {instance_prompt} in business casual attire, outdoors',
                        f'LinkedIn profile photo of {instance_prompt} with friendly expression, solid color background',
                        f'{instance_prompt} giving a presentation in a conference room, professional attire',
                        f'close-up portrait of {instance_prompt} for company website, modern office background',
                        f'{instance_prompt} in a casual business meeting, gesturing while speaking, natural light'
                    ]),
                    ('neg', ''),
                    ('seed', random_seed),
                    ('walk_seed', walk_seed),
                    ('guidance_scale', guidance_scale),
                    ('sample_steps', sample_steps)
                ]))
            ])
        ])
    ])),
    ('meta', OrderedDict([
        ('name', project_name),
        ('version', '1.0')
    ]))
])
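
The effect of the `ema_decay` setting can be illustrated with the textbook exponential-moving-average update, `ema = decay * ema + (1 - decay) * value`. This is a generic sketch of that formula on plain numbers, not the toolkit's own EMA code; `ema_update` and `run_ema` are names made up for the example.

```python
def ema_update(ema, value, decay):
    """One exponential-moving-average step: keep `decay` of the old average."""
    return decay * ema + (1.0 - decay) * value


def run_ema(values, decay):
    """Fold a sequence of values into a single EMA, starting from the first value."""
    ema = values[0]
    for v in values[1:]:
        ema = ema_update(ema, v, decay)
    return ema


# A quantity that jumps from 0.0 to 1.0 and then stays there for 10 steps.
values = [0.0] + [1.0] * 10
print(f"decay=0.50 -> {run_ema(values, 0.50):.3f}")
print(f"decay=0.99 -> {run_ema(values, 0.99):.3f}")
```

With the lower decay the average has almost reached 1.0 after ten steps, while with 0.99 it has moved only about a tenth of the way, which is why a high decay smooths out step-to-step noise.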

## Start training with the cell below when you're ready

This might take a while. Keep the session running while training.
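
To get a rough idea of what a run will produce, the step-based settings from the form can be combined into a small summary. This is a sketch under stated assumptions: it assumes checkpoints and samples are written exactly at multiples of `save_every` and `sample_every`, and that one optimizer update covers `batch_size * gradient_accumulation_steps` images. The toolkit may behave differently, for example by writing an extra final checkpoint. `training_plan` is a hypothetical helper, and the values below are the form defaults.

```python
def training_plan(total_steps, batch_size, grad_accum, save_every,
                  sample_every, num_prompts, max_keeps, skip_first_sample=True):
    """Summarize what a run with these settings is expected to produce."""
    save_steps = list(range(save_every, total_steps + 1, save_every))
    sample_steps = list(range(sample_every, total_steps + 1, sample_every))
    if not skip_first_sample:
        # An additional sampling pass at step 0, before any training.
        sample_steps = [0] + sample_steps
    return {
        # Images contributing to one optimizer update.
        "effective_batch_size": batch_size * grad_accum,
        "checkpoints_written": len(save_steps),
        "checkpoints_kept": min(len(save_steps), max_keeps),
        "sample_images": len(sample_steps) * num_prompts,
    }


plan = training_plan(total_steps=2000, batch_size=2, grad_accum=2,
                     save_every=500, sample_every=500, num_prompts=8,
                     max_keeps=4)
for key, value in plan.items():
    print(f"{key}: {value}")
```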

In [6]:
run_job(job_to_run)

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


0it [00:00, ?it/s]

  check_for_updates()
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
  return register_model(fn_wrapper)
  return register_model(fn_wrapper)
  return register_model(fn_wrapper)
  return register_model(fn_wrapper)
  return register_model(fn_wrapper)
  self.scaler = torch.cuda.amp.GradScaler()


{
    "type": "sd_trainer",
    "training_folder": "/content/drive/MyDrive/headshot-generator",
    "performance_log_every": 500,
    "device": "cuda:0",
    "trigger_word": "pa3k",
    "network": {
        "type": "lora",
        "linear": 16,
        "linear_alpha": 16
    },
    "save": {
        "dtype": "float16",
        "save_every": 500,
        "max_step_saves_to_keep": 4
    },
    "datasets": [
        {
            "folder_path": "/content/drive/MyDrive/headshot-generator/flux_dataset",
            "caption_ext": "txt",
            "caption_dropout_rate": 0.05,
            "shuffle_tokens": false,
            "cache_latents_to_disk": true,
            "resolution": [
                512,
                768,
                1024
            ]
        }
    ],
    "train": {
        "batch_size": 2,
        "steps": 2000,
        "gradient_accumulation_steps": 2,
        "train_unet": true,
        "train_text_encoder": false,
        "content_or_style": "content",
        "

KeyboardInterrupt: 
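
The run shown above was interrupted before it finished. After a run that reaches at least the first save step, the saved weights can be located by searching the output directory. This sketch assumes the LoRA weights are written as `.safetensors` files somewhere under `output_dir`; it searches recursively so that it does not depend on the exact folder layout. `find_checkpoints` is a hypothetical helper.

```python
from pathlib import Path


def find_checkpoints(output_dir):
    """Return all .safetensors files under output_dir, oldest first."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.safetensors"), key=lambda p: p.stat().st_mtime)


for path in find_checkpoints("/content/drive/MyDrive/headshot-generator"):
    size_mb = path.stat().st_size / 1e6
    print(f"{path}  ({size_mb:.1f} MB)")
```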