
How to train LoRA


What is LoRA

LoRA stands for Low-Rank Adaptation. It's a robust, quick, and easy way of training your model on a particular style, object, or person. There are other options like Dreambooth, Hypernetworks, and Embeddings (textual inversion), but LoRA is the most convenient and universal one. LoRA files are relatively small (4-150 MB), you can use them in Automatic1111 directly in prompts, and they can be mixed with other extra networks. A LoRA can be trained relatively fast: from about 5 minutes on a 4090 to about 40 minutes on a 1080 Ti.
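For background (this comes from the original LoRA paper rather than this page): instead of fine-tuning a model's full weight matrices, LoRA freezes them and trains a small low-rank update for each one, which is why the files are so small and why the Network rank parameter described later matters. A rough sketch of the parameter count, with hypothetical sizes:

```python
# Rough illustration of why LoRA files are small (sizes are hypothetical).
# LoRA replaces a full weight update dW (d x k) with two low-rank factors:
# dW = B @ A, where B is (d x r) and A is (r x k), and the rank r is small.
d, k = 768, 768        # dimensions of one attention weight matrix
r = 64                 # LoRA network rank

full_update = d * k            # parameters to fine-tune this matrix directly
lora_update = r * (d + k)      # parameters in the two LoRA factors

print(full_update)  # 589824
print(lora_update)  # 98304 -- six times fewer, for this matrix and rank
```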

What is Kohya_ss and why use it

Kohya_ss is a popular toolset for training Dreambooth models, LoRAs, and Textual Inversions. Although Automatic1111 has its own tools for training, they are not as robust and stable as Kohya_ss, so for Stable Houdini I decided to make a Kohya_ss connector.

How to install Kohya_ss

Clone this repository: https://github.com/bmaltais/kohya_ss. You will need around 8 GB of disk space.

Run setup.bat (Windows) or setup.sh (Linux and Mac). It will download all the necessary libraries. Choose the Torch 2.0 option; it works faster on modern cards. After the installation it will ask some questions; answer like this:

  • This machine
  • No distributed training
  • NO
  • NO
  • NO
  • all (or choose the desired GPU)
  • fp16

After that you're good to go. You can run the webui version of Kohya with gui-user.bat (.sh) to check that it works, but you don't need to run it to use Kohya from Houdini.

How to set up Kohya in Houdini

In Houdini, install the Stable Houdini toolset and add an SD Trainer Kohya node to a TOP network. After that, config.ini in the /hda/Config folder will be appended with a default path for Kohya. Edit this file and type in the real path to your Kohya_ss installation directory, without quotes. Restart Houdini. Note that from Houdini you can currently only work with Kohya locally.


Dataset Preparation

The dataset is the most important part of the training process; the "garbage in, garbage out" principle applies here. Images should be clear and sharp, and they should include variations of your subject or style. For example, if you train on a person, include different camera angles, facial expressions, poses, backgrounds, clothes, and lighting conditions.

You only need 10-20 images for style or simple-object training. For training on a person's appearance, 15-30 will do.

Images can have non-square aspect ratios (more on this later). For easier training you can scale them to 512 pixels (or any other resolution, like 768). For most models 512,512 is enough.
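The SD Dataset Preparation node described below handles cropping and scaling for you, but if you prefer to preprocess images yourself, here is a minimal sketch using Pillow (the folder names are hypothetical):

```python
# Minimal sketch: center-crop each image to a square and scale it to 512x512.
# This mirrors what the SD Dataset Preparation node can do for you; the
# folder names are hypothetical examples.
from pathlib import Path
from PIL import Image, ImageOps

src = Path("raw_images")
dst = Path("dataset_images")
dst.mkdir(exist_ok=True)

for path in src.glob("*.jpg"):
    img = Image.open(path).convert("RGB")
    # ImageOps.fit crops to the target aspect ratio around the center,
    # then resizes to the requested resolution.
    img = ImageOps.fit(img, (512, 512))
    img.save(dst / path.name)
```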

SD Dataset Preparation node


  • Folders
    • Input folder - the folder with your source images.
    • Output folder - the dataset folder. A new folder will be created inside it.
    • Clear Output folder erases everything from the Output folder. Be careful: don't keep anything valuable inside it!
    • Backup output folder content copies all the files from the output folder each time you cook this node.
  • Dataset
    • Add folder prefix adds Prefix_ to the folder name. Kohya uses it to determine the number of repeats for each image in the folder, so put a number here, like 150.
    • Dataset name.
      • From input folder uses the name of your input folder as the dataset name.
      • Custom lets you choose Instance and Class names. They will be added to the name of your folder like this: "150_instance class". Instance is the name of this particular object or style; it's recommended to use a short, unique name like 'sls' or 'sks'. Class is the name of the class: for a person it would be 'Man', 'Woman', 'Boy' and so on; for a style it should be 'style'. This is important if you train a LoRA on more than one concept.
  • Image Processing
    • Add text captions adds a text file with a description of each image next to the image file. If you use captions, the custom Instance and Class names will not be used, so you can add them to the captions as 'Additional text'. Note that for training on a person's likeness you really don't need captions; they can be important when training on a particular style or a rare object.
    • Crop to square cuts a square from the center of the image.
    • Scale type lets you choose how the image is scaled (by the long or short side), as well as the resolution.
    • Additional text is added before and after the text caption. You can add the instance and class names here.

After this node is cooked, you'll find a new folder inside the Dataset folder, ready for training. The name of the folder is also saved to the dataset_folder variable on the task. Note that the Dataset folder can contain more than one folder; this way you can train one LoRA on several concepts. But since LoRAs are mixable, it's usually easier to train each LoRA on a single concept.
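As an illustration of the naming convention the node produces (a minimal sketch; the node builds this name for you, and the values are hypothetical):

```python
# Sketch of the Kohya dataset folder naming convention: "repeats_instance class".
repeats = 150        # how many times each image is repeated per epoch
instance = "sls"     # short, unique token for your subject
cls = "dog"          # the broader class the subject belongs to

folder_name = f"{repeats}_{instance} {cls}"
print(folder_name)   # -> "150_sls dog"
```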

SD Trainer Kohya

It has two sections: Settings, with common settings, and the Kohya script area, which is populated with the corresponding script parameters.


Settings

  • Source type - here you can choose the dataset directory. It can use an upstream attribute or a folder path. Note that you should choose the parent dataset directory containing folders named like '150_sls dog', not the folder with the images themselves.
  • Override repeats lets you rename all folders inside the source directory and change (or add) the numeric repeat prefixes.
  • LoRA name is the name of the file to save the LoRA to, WITHOUT EXTENSION.
  • Output directory - where to save your LoRAs.
  • Copy LoRA to - after training, your file will be copied to this directory. It could be the A1111 webui/models/Lora folder.
  • Pretrained model is the model to fine-tune. I recommend choosing the model you use in A1111, like webui/models/StableDiffusion/Dreamshaper.safetensors, or you can choose a standard model from the list. Note that if you don't have the standard model installed, it will be downloaded to drive C, so it's better to use a custom model.
  • Number of CPU threads per process - can potentially improve the speed of various CPU-heavy tasks.

Kohya script

Here you can choose a script to use. At the moment there are two: Lora - Easy and Lora - Medium. They expose different numbers of parameters; start with the easy one. Once you select a script, its set of parameters appears.

Lora parameters

Files

  • Existing LoRA to resume from. You can choose an existing file and continue training from it.
  • Use the dimensions from the existing LoRA. Only check this if you use an existing LoRA; it will try to determine the network Rank from it.
  • Save precision. The precision used when saving the file.
  • Save every N epochs. If you have 10 epochs and save every 1 epoch, you'll end up with 10 files.
  • Caption extension - the extension of the caption text files, like .txt
  • Regularization images directory - a folder that contains regularization images. They help the model tell your object apart from other objects of the same class. For most LoRAs this parameter can be safely skipped.

Model

  • Stable Diffusion 2+ model. Check this if the model you selected is 2.0 or 2.1 or based on one.
  • V parametrization. Check this if the model you selected is of version 2.1 or based on one.
  • Save model as - lets you choose the file type of the final model. Safetensors works for most cases.

Training

  • Max training epochs determines how many epochs your model will be trained for. One epoch is Number of images X Number of repeats steps. It's safe to say that 1500-3000 steps is enough for most LoRAs, so if you have 15-30 images, 100 repeats in the folder name will do (see the worked example under "How to calculate the number of repeats and epochs" below).
  • Learning rate - the rate of the gradient descent.
  • Text encoder learning rate - the consensus is that it should be half of the Learning rate.
  • Network rank - the number of dimensions for your network. The file size is directly influenced by this parameter. In theory, the higher the number, the more detail is captured, but the chance of bad generations also grows. For style, 64 is a sweet spot; for a person's likeness, use 128.
  • Network Alpha - the alpha for LoRA scaling. Half of the rank works best for style, and the same value as the rank works for a person's likeness.
  • Train batch size - how many images are processed simultaneously. It depends on the amount of VRAM you have. Don't use too many; 2 is usually OK.
  • Seed - just a seed for the random generators inside Kohya. The same seed should give you the same results, which is important if you're experimenting with parameters and want everything else to stay the same.
  • Clip Skip - the number of network layers to skip at the end. For most anime-style models it's 2; for others, 1. Tweak it only if you know what you're doing.

Images

  • Augmentations - these options alter the images on each repeat to extract more information from them.
  • Training resolution. Set it as two numbers separated by a comma: 512,512 by default, but it could be anything like 512,640.
  • Aspect ratio bucketing. If you have non-square images in your dataset, use this option. It groups your images into buckets by aspect ratio while training. Note that it also raises the step count, and it's hard to predict the exact number.

Optimization

  • Memory efficient attention. Check this if you don't use XFormers.
  • XFormers increases speed while lowering VRAM usage.
  • Use 8bit Adam decreases memory usage.
  • Mixed precision. While 'no' gives you faster training, fp16 and bf16 save VRAM. bf16 is only available on some GPUs.
  • Training with fp16 gradients can potentially save VRAM.
  • Low RAM can be used if your computer has little RAM (not GPU VRAM).
  • Gradient checkpointing reduces speed but saves VRAM.
  • Cache latents can save some VRAM, but it doesn't work with image augmentations.
  • Cache latents to disk is the same, but the cache is stored on the hard drive.

The default settings of this script are well optimized for most LoRAs. If you get CUDA memory errors, try different options in the Optimization tab.

Note that a LoRA will be trained for each incoming task, so you can wedge different parameters and create several LoRAs for testing (don't forget to append the LoRA name with @pdg_index or another expression).

While the node is cooking, you will see the progress in Houdini's status bar. You can interrupt the process, but sometimes you will have to wait a little for the task to finish.

How to calculate the number of repeats and epochs

There are different approaches, so feel free to experiment. The consensus is that a properly trained LoRA should go through 1500-3000 iterations.

Number of training images X Number of repeats X Number of epochs should land around these numbers.

A lower number leads to underfitting, where the model just doesn't work as expected. A higher number leads to overfitting, where you only get results from your dataset and overburnt images.

The total number of steps is printed in the Houdini Console as "total optimization steps". Note that the real number of steps is the product above divided by the batch size.

My advice: use 15-30 images, 100 repeats and one epoch. If the image count is lower, increase the number of repeats slightly.
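A quick worked example of that arithmetic (the numbers are only an illustration):

```python
# Worked example: estimating total optimization steps for one training run.
images = 20        # training images in the dataset folder
repeats = 100      # numeric folder prefix, e.g. "100_sls dog"
epochs = 1
batch_size = 1     # Train batch size

steps = images * repeats * epochs // batch_size
print(steps)       # 2000 -- inside the recommended 1500-3000 range
```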

How to use LoRA

Just put it into your prompt like this: <lora:filename:1>, where 1 is the weight. Note that most LoRAs work better with a weight lower than 1; try 0.7-0.9. In Houdini you can add it from the SD Prompt node.

LoRAs also sometimes require a lower CFG scale, like 4-6, to get rid of artifacts and bring the result closer to the dataset. And if you feel that the final image drifts away from the dataset, add the instance word to your prompt; it sometimes helps.
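For example, a prompt using a hypothetical LoRA file myLora.safetensors trained with the instance word 'sls' might look like this:

```
portrait of sls man, detailed face, studio lighting <lora:myLora:0.8>
```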

How to wedge LoRAs

If you want to create LoRAs in bulk with different parameters, put down a Wedge node, set the Wedge count, add a Wedge attribute of the proper type, choose Value list, and add as many values as there are wedges in the Wedge count. Append an SD Trainer Kohya node and reference the attribute with @ at the beginning, like this: @lr. Include @pdg_index in the filename so all the files get unique names.
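As a sketch of how the wedge expands into tasks (the attribute name and values are hypothetical):

```python
# Sketch of how a wedged attribute maps to tasks. Each task receives one
# value of the "lr" attribute and a unique @pdg_index; names are hypothetical.
learning_rates = [1e-4, 5e-5, 1e-5]    # Value list on the Wedge node

for pdg_index, lr in enumerate(learning_rates):
    lora_name = f"myLora_{pdg_index}"  # @pdg_index keeps file names unique
    print(f"task {pdg_index}: Learning rate = {lr}, file = {lora_name}.safetensors")
```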

Here, I wedged the learning rate.

After you cook the last node, you'll get three LoRAs on disk, each trained with a different learning rate.