-
Notifications
You must be signed in to change notification settings - Fork 6.3k
Closed
Labels
bugSomething isn't workingSomething isn't workingstaleIssues that haven't received updatesIssues that haven't received updates
Description
Describe the bug
When trying to run train_dreambooth.py with --enable_xformers_memory_efficient_attention the process exits with this error:
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Steps: 0%| | 0/400 [00:07<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/*****/anaconda3/envs/sd-gpu/bin/accelerate:8 in <module> │
│ │
│ 5 from accelerate.commands.accelerate_cli import main │
│ 6 if __name__ == '__main__': │
│ 7 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) │
│ ❱ 8 │ sys.exit(main()) │
│ 9 │
│ │
│ /home/*****/anaconda3/envs/sd-gpu/lib/python3.10/site-packages/accelerate/commands/accelerate_c │
│ li.py:45 in main │
│ │
│ 42 │ │ exit(1) │
│ 43 │ │
│ 44 │ # Run │
│ ❱ 45 │ args.func(args) │
│ 46 │
│ 47 │
│ 48 if __name__ == "__main__": │
│ │
│ /home/*****/anaconda3/envs/sd-gpu/lib/python3.10/site-packages/accelerate/commands/launch.py:11 │
│ 04 in launch_command │
│ │
│ 1101 │ elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA │
│ 1102 │ │ sagemaker_launcher(defaults, args) │
│ 1103 │ else: │
│ ❱ 1104 │ │ simple_launcher(args) │
│ 1105 │
│ 1106 │
│ 1107 def main(): │
│ │
│ /home/*****/anaconda3/envs/sd-gpu/lib/python3.10/site-packages/accelerate/commands/launch.py:56 │
│ 7 in simple_launcher │
│ │
│ 564 │ process = subprocess.Popen(cmd, env=current_env) │
│ 565 │ process.wait() │
│ 566 │ if process.returncode != 0: │
│ ❱ 567 │ │ raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) │
│ 568 │
│ 569 │
│ 570 def multi_gpu_launcher(args): │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
Reproduction
accelerate launch train_dreambooth.py --pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4 --instance_data_dir=./inputs --output_dir=./outputs --instance_prompt="a photo of sks dog" --resolution=512 --train_batch_size=1 --gradient_accumulation_steps=1 --learning_rate=5e-6 --lr_scheduler="constant" --lr_warmup_steps=0 --max_train_steps=400 --enable_xformers_memory_efficient_attention
Logs
No response
System Info
diffusers
version: 0.12.0.dev0- Platform: Linux-5.15.79.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.10.8
- PyTorch version (GPU?): 1.13.0 (True)
- Huggingface_hub version: 0.11.1
- Transformers version: 0.15.0
- Accelerate version: not installed
- xFormers version: 0.0.15.dev395+git.7e05e2c
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: single GPU
davidpfahler
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't workingstaleIssues that haven't received updatesIssues that haven't received updates