"Raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) " happened when starting studing

### Describe the bug

I ran `train_text_to_image.py` with the command described in the instructions. However, during VAE encoding the process ran for a long time and then crashed with the error shown below.

01/24/2024 11:00:18 - INFO - __main__ - ***** Running training *****
01/24/2024 11:00:18 - INFO - __main__ -   Num examples = 834
01/24/2024 11:00:18 - INFO - __main__ -   Num Epochs = 72
01/24/2024 11:00:18 - INFO - __main__ -   Instantaneous batch size per device = 1
01/24/2024 11:00:18 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
01/24/2024 11:00:18 - INFO - __main__ -   Gradient Accumulation steps = 4
01/24/2024 11:00:18 - INFO - __main__ -   Total optimization steps = 15000
Steps:   0%|          | 0/15000 [00:00<?, ?it/s]
Shape of pixel_values: torch.Size([1, 3, 176, 512, 512])
Shape of rearrange pixel_values: torch.Size([176, 3, 512, 512])
Traceback (most recent call last):
  File "/home/vipuser/.conda/envs/diffusion/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/vipuser/.conda/envs/diffusion/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/vipuser/.conda/envs/diffusion/lib/python3.8/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "/home/vipuser/.conda/envs/diffusion/lib/python3.8/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/vipuser/.conda/envs/diffusion/bin/python', '/home/vipuser/Downloads/diffusers/examples/text_to_image/temp_3D_v2.py', '--pretrained_model_name_or_path=/home/vipuser/Downloads/stable-diffusion-v1-4', '--train_data_dir=/data/dataset-NKI', '--use_ema', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--gradient_checkpointing', '--max_train_steps=15000', '--learning_rate=1e-05', '--max_grad_norm=1', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--output_dir=output_3D']' died with <Signals.SIGSEGV: 11>.
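
For anyone triaging this: the `died with <Signals.SIGSEGV: 11>` suffix means the child Python process was killed by a segmentation fault, so the traceback above only shows the `accelerate` launcher noticing the crash, not where the crash happened. The following is a minimal sketch, using only the standard library, of how a negative return code maps to a signal name and how the training script could be made to print a Python-level traceback when it segfaults. The helper names `describe_returncode` and `enable_crash_traceback` are hypothetical and not part of `accelerate` or `diffusers`.

```python
import faulthandler
import signal


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a readable description.

    On POSIX, a negative return code -N means the child was
    terminated by signal N.
    """
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # The number does not correspond to a known signal.
            name = "unknown signal"
        return f"killed by {name} ({-returncode})"
    return f"exited with status {returncode}"


def enable_crash_traceback() -> None:
    """Ask the interpreter to dump the Python traceback of all threads
    to stderr on fatal signals such as SIGSEGV."""
    faulthandler.enable(all_threads=True)


# Hypothetical usage at the very top of the training script,
# before any heavy imports:
#
#     enable_crash_traceback()
```

Calling `enable_crash_traceback()` at the start of the script, or setting the environment variable `PYTHONFAULTHANDLER=1` before launching (which the child process should inherit), is one way to see which Python line was executing when the segfault occurred.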

### Reproduction

accelerate launch --mixed_precision="fp16" /home/Downloads/diffusers/examples/text_to_image/train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir="/data/dataset" \
  --use_ema \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="output"
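
Note that the traceback shows a modified script (`temp_3D_v2.py`) being run, and the log reports `pixel_values` rearranged to `[176, 3, 512, 512]`, so 176 images of 512x512 reach the VAE in a single call even though `--train_batch_size=1`. The sketch below is only a back-of-the-envelope aid for narrowing the problem down: it estimates the size of that input tensor and splits the 176 slices into smaller index ranges that could be encoded one chunk at a time. The function names and the chunk size of 16 are assumptions for illustration; whether the batch size is actually what triggers the segfault has not been verified.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor (4 bytes per element = float32)."""
    return prod(shape) * bytes_per_element


def chunk_ranges(total, chunk_size):
    """Split range(total) into consecutive (start, stop) index pairs."""
    return [
        (start, min(start + chunk_size, total))
        for start in range(0, total, chunk_size)
    ]


# Shape taken from the log line "Shape of rearrange pixel_values".
shape = (176, 3, 512, 512)
size_mib = tensor_bytes(shape) / 2**20
print(f"input tensor: {size_mib:.1f} MiB in float32")  # 528.0 MiB

# Hypothetical chunk size; each (start, stop) pair would select
# pixel_values[start:stop] for a separate VAE encode call.
for start, stop in chunk_ranges(shape[0], 16):
    print(start, stop)
```

The input tensor itself is modest in size; the intermediate activations produced while encoding 176 images at once will be larger, so trying a reduced number of slices is a cheap way to check whether the crash depends on it.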

### Logs

_No response_

### System Info

- `diffusers` version: 0.26.0.dev0
- Platform: Linux-5.4.0-164-generic-x86_64-with-glibc2.17
- Python version: 3.8.18
- PyTorch version (GPU?): 2.3.0.dev20240123+cu118 (True)
- Huggingface_hub version: 0.20.3
- Transformers version: 4.37.0
- Accelerate version: 0.26.1
- xFormers version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

### Who can help?

@sayakpaul @patrickvonplaten 
