Error when setting num_single_layers=0 while training flux-controlnet on a multi-GPU server using a single GPU

### Describe the bug

When training flux-controlnet on a multi-GPU server with training restricted to a single GPU, setting `--num_single_layers=0` fails with the following error:

`[rank0]: Parameter indices which did not receive grad for rank 0: 64 65 72 73 74 75`



### Reproduction

```shell
accelerate launch --gpu_ids='0,' --num_processes=1 --num_machines=1 --main_process_port 28700 train_controlnet_flux.py \
    --pretrained_model_name_or_path="black-forest-labs/FLUX.1-schnell" \
    --dataset_name="lucataco/fill1k" \
    --conditioning_image_column=conditioning_image \
    --image_column=image \
    --caption_column=text \
    --output_dir="logs" \
    --mixed_precision="bf16" \
    --resolution=512 \
    --learning_rate=1e-5 \
    --max_train_steps=15000 \
    --validation_steps=100 \
    --checkpointing_steps=200 \
    --validation_image "./example_images/conditioning_image_1.png" "./example_images/conditioning_image_2.png" \
    --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
    --train_batch_size=1 \
    --gradient_accumulation_steps=1 \
    --report_to="tensorboard" \
    --num_double_layers=2 \
    --num_single_layers=0 \
    --seed=42 \
    --enable_model_cpu_offload \
    --use_8bit_adam \
    --use_adafactor \
    --gradient_checkpointing
```


### Logs

```shell
[rank0]: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
[rank0]: making sure all `forward` function outputs participate in calculating loss.
[rank0]: If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
[rank0]: Parameter indices which did not receive grad for rank 0: 64 65 72 73 74 75
[rank0]:  In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
```
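
To find out which parameters sit behind those indices, one option is to run a single forward/backward pass without the DDP wrapper and list every trainable parameter whose `.grad` is still `None`. The sketch below shows the idea on a hypothetical toy module. `TinyNet` and `params_without_grad` are illustrative names and are not part of `train_controlnet_flux.py`. The same helper could in principle be called on the ControlNet right after `loss.backward()`.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a model with a layer that forward() never uses."""

    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 4)
        self.unused = nn.Linear(4, 4)  # created, but never called in forward()

    def forward(self, x):
        return self.used(x)


def params_without_grad(model):
    """Return names of trainable parameters that received no gradient."""
    return [
        name
        for name, param in model.named_parameters()
        if param.requires_grad and param.grad is None
    ]


model = TinyNet()
loss = model(torch.randn(2, 4)).sum()
loss.backward()

# Parameters listed here did not contribute to the loss.
missing = params_without_grad(model)
print(missing)
```

If the unused parameters turn out to be expected for this configuration, `accelerate` can forward `find_unused_parameters=True` to DDP through `DistributedDataParallelKwargs`, passed in the `kwargs_handlers` argument of `Accelerator`. Whether that is the right fix here is untested.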


### System Info

- 🤗 Diffusers version: 0.31.0.dev0
- Platform: Linux-5.14.0-427.33.1.el9_4.x86_64-x86_64-with-glibc2.34
- Running on Google Colab?: No
- Python version: 3.12.4
- PyTorch version (GPU?): 2.4.1+cu121 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.24.7
- Transformers version: 4.45.0
- Accelerate version: 0.33.0
- PEFT version: 0.12.0
- Bitsandbytes version: 0.44.1
- Safetensors version: 0.4.4
- xFormers version: 0.0.28
- Accelerator: 3x NVIDIA RTX A6000 (49140 MiB each)
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: Yes

### Who can help?

@sayakpaul
