In [None]:
!pip install -qU diffusers accelerate datasets huggingface_hub

# Training

Diffusers provides a collection of training scripts that we can use to train our own diffusion models.

Each training script is:
* **Self-contained**: the training script does not depend on any local files, and all packages required to run the script are installed from the `requirements.txt` file.
* **Easy-to-tweak**: the training scripts are an example of how to train a diffusion model for a specific task and will not work out-of-the-box for every training scenario.
* **Beginner-friendly**: the training scripts are designed to be beginner-friendly and easy to understand, rather than including the latest state-of-the-art methods to get the best and most competitive results.
* **Single-purpose**: each training script is expressly designed for only one task to keep it readable and understandable.

To install the latest version of the Diffusers library from source:
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
```

To speed up training and reduce memory usage, we should:
* use PyTorch 2.0 or higher to automatically use `scaled_dot_product_attention` during training
* install `xFormers` to enable memory-efficient attention
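
Before launching a training run, it can help to confirm which of these speedups the current environment actually offers. The following is a minimal sketch, not part of the training scripts: `attention_backends` is a hypothetical helper name, and it only checks that the packages can be found, not that they are configured correctly for our GPU.

```python
import importlib.util


def attention_backends():
    """Report which attention speedups appear to be available here.

    Hypothetical helper for illustration; it is not part of Diffusers.
    """
    report = {"sdpa": False, "xformers": False}

    # PyTorch 2.0+ exposes scaled_dot_product_attention in torch.nn.functional.
    if importlib.util.find_spec("torch") is not None:
        import torch.nn.functional as F

        report["sdpa"] = hasattr(F, "scaled_dot_product_attention")

    # Only checks that the xformers package is importable.
    report["xformers"] = importlib.util.find_spec("xformers") is not None
    return report


print(attention_backends())
```

If `sdpa` is reported as `False`, upgrading PyTorch is usually the simpler fix; xFormers is a separate install.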

# Create a dataset for training

There are many datasets on the [Hugging Face Hub](https://huggingface.co/datasets?task_categories=task_categories:text-to-image&sort=downloads) to train a model on, but if we cannot find one we are interested in, or we want to use our own data, we can create a dataset with the Hugging Face Datasets library.

## Provide a dataset as a folder

For unconditional generation, we can provide our own dataset as a folder of images. The training script uses the `ImageFolder` builder from Hugging Face Datasets to automatically build a dataset from the folder.

Our directory structure should look like:
```
data_dir/xxx.png
data_dir/xxy.png
...
data_dir/[...]/xxs.png
```

We then pass the path to the dataset directory to the `--train_data_dir` argument, and start training:
```bash
accelerate launch train_unconditional.py --train_data_dir <path-to-train-directory> <other-arguments>
```
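
Since the folder may contain nested subdirectories, a quick check of what it holds can catch an empty or mistyped path before training starts. This is a rough sketch under stated assumptions: `list_images` and `IMAGE_EXTENSIONS` are hypothetical names, and the extension set is a small illustrative subset rather than the full list that the `ImageFolder` builder accepts.

```python
from pathlib import Path

# Assumption: a small illustrative subset of image extensions,
# not the complete list supported by the ImageFolder builder.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}


def list_images(data_dir):
    """Return all image files under data_dir, including nested folders."""
    root = Path(data_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


# Usage (the path is a placeholder):
# images = list_images("data_dir")
# print(f"Found {len(images)} images")
```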

## Upload our data to the Hub

We start by creating a dataset with the `ImageFolder` feature, which creates an `image` column containing the PIL-encoded images.

We can use either the `data_dir` or the `data_files` parameter to specify the location of the dataset. The `data_files` parameter also supports mapping specific files to dataset splits such as `train` or `test`:

In [None]:
from datasets import load_dataset

# Example 1: local folder
dataset = load_dataset(
    'imagefolder',
    data_dir='path_to_our_folder'
)

# Example 2: local files (supported formats are tar, gzip, zip, xz, rar, zstd)
dataset = load_dataset(
    'imagefolder',
    data_files='path_to_zip_file'
)

# Example 3: remote files (supported formats are tar, gzip, zip, xz, rar, zstd)
dataset = load_dataset(
    'imagefolder',
    data_files='https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip'
)

# Example 4: providing several splits
dataset = load_dataset(
    'imagefolder',
    data_files={
        'train': ['path/to/file1', 'path/to/file2'],
        'test': ['path/to/file3', 'path/to/file4']
    }
)
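
# The split mapping in Example 4 can also be built programmatically.
# A minimal sketch, assuming a hypothetical layout with one subfolder per
# split (root/train, root/test); `build_data_files` is an illustrative
# helper, not part of the Datasets library.
from pathlib import Path


def build_data_files(root, splits=("train", "test")):
    """Map each split name to the sorted list of files under root/<split>."""
    root = Path(root)
    data_files = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            continue  # skip splits that have no folder
        files = sorted(str(p) for p in split_dir.rglob("*") if p.is_file())
        if files:  # skip splits whose folder is empty
            data_files[split] = files
    return data_files


# Usage (the path is a placeholder):
# dataset = load_dataset('imagefolder', data_files=build_data_files('path_to_our_folder'))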

Then we use the `push_to_hub` method to upload the dataset to the Hub:

In [None]:
dataset.push_to_hub('name_of_our_dataset')

# if we want to push to a private repo
dataset.push_to_hub('name_of_our_dataset', private=True)

Now the dataset is available for training by passing the dataset name to the `--dataset_name` argument:
```bash
accelerate launch --mixed_precision="fp16" train_text_to_image.py \
  --pretrained_model_name_or_path="stable-diffusion-v1-5/stable-diffusion-v1-5" \
  --dataset_name="name_of_our_dataset" \
  <other-arguments>
```

# Adapt a model to a new task

Many diffusion systems share the same components, allowing us to adapt a pretrained model for one task to an entirely different task.

## Configure `UNet2DConditionModel` parameters

The `UNet2DConditionModel` by default accepts 4 channels in the input sample.

In [None]:
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    'stable-diffusion-v1-5/stable-diffusion-v1-5',
    use_safetensors=True
)

In [None]:
pipeline.unet.config['in_channels']

Inpainting requires 9 channels in the input sample.

In [None]:
pipeline = StableDiffusionPipeline.from_pretrained(
    'runwayml/stable-diffusion-inpainting',
    use_safetensors=True
)

In [None]:
pipeline.unet.config['in_channels']

To adapt our text-to-image model for inpainting, we need to change the `in_channels` from 4 to 9.

We can initialize a `UNet2DConditionModel` with the pretrained text-to-image model weights and change `in_channels` to 9. Because the input layer then has a different shape from the one stored in the checkpoint, we need to set `ignore_mismatched_sizes=True` and `low_cpu_mem_usage=False` to avoid a size mismatch error.

In [None]:
from diffusers import UNet2DConditionModel

model_id = 'stable-diffusion-v1-5/stable-diffusion-v1-5'
unet = UNet2DConditionModel.from_pretrained(
    model_id,
    subfolder='unet',
    in_channels=9,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
    use_safetensors=True
)

The pretrained weights of the other components from the text-to-image model are initialized from their checkpoints, but the input channel weights (`conv_in.weight`) of the `unet` are randomly initialized. We need to finetune the model for inpainting because otherwise it will return nothing but noise.
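
To see why only `conv_in.weight` is affected, it helps to compare the weight shapes of the input convolution before and after the change. The sketch below is plain shape arithmetic and does not load any model. It assumes 320 output channels and a 3x3 kernel for the first convolution, which is my understanding of the Stable Diffusion v1.5 UNet configuration and worth checking against `unet.conv_in`; `conv_weight_shape` is a hypothetical helper.

```python
def conv_weight_shape(in_channels, out_channels=320, kernel_size=3):
    """Shape of a 2D convolution weight: (out_channels, in_channels, kH, kW).

    The defaults are assumptions about the Stable Diffusion v1.5 input layer.
    """
    return (out_channels, in_channels, kernel_size, kernel_size)


# Inpainting input as commonly described for Stable Diffusion inpainting:
# 4 noisy latent channels + 4 masked-image latent channels + 1 mask channel.
inpaint_channels = 4 + 4 + 1

text_to_image_shape = conv_weight_shape(4)
inpainting_shape = conv_weight_shape(inpaint_channels)

print(text_to_image_shape)  # (320, 4, 3, 3)
print(inpainting_shape)     # (320, 9, 3, 3)

# The shapes differ, so the checkpoint tensor cannot be copied into the new layer.
print(text_to_image_shape == inpainting_shape)  # False
```

Every other layer keeps its original shape, so its pretrained weights load as usual.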