I get an error when I try to train a Dreambooth model #10
Comments
Solved: I was using PyTorch 1.13.0 when it failed; 1.12 did not have this problem. However, on the V100 the CUDA memory was exceeded, so I dropped the sample steps from 8 to 4.
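For the record, pinning the working versions looks something like this (the cu113 wheel index is an assumption; use the index that matches your CUDA toolkit):
$ pip install torch==1.12.1 torchvision==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu113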
Hi @shi3z, thank you for sharing the solution. The code works well on my V100 32GB with xformers and fp16 enabled.
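For anyone reproducing that setup, the pieces involved are roughly the following sketch (the checkpoint path is a placeholder, and enable_xformers_memory_efficient_attention requires the xformers package to be installed):

from accelerate import Accelerator
from diffusers.utils.import_utils import is_xformers_available
from tuneavideo.models.unet import UNet3DConditionModel

# fp16 mixed precision, as shown in the training logs below
accelerator = Accelerator(mixed_precision="fp16")

# load the 2D UNet weights into the inflated 3D model (path is a placeholder)
unet = UNet3DConditionModel.from_pretrained_2d("./checkpoints/stable-diffusion-v1-4", subfolder="unet")

if is_xformers_available():
    # memory-efficient attention, which helps keep the V100 within its 32GB budget
    unet.enable_xformers_memory_efficient_attention()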
Hi @shi3z, I still got the same problem even though I completely followed your package settings, as follows: Traceback (most recent call last): Did you change any other settings in your work?
Thank you for the great work.
I can train with the normal model without problems, but training with a Dreambooth model fails. The Mr Potato Head example doesn't work either, so I'd like to identify the cause.
$ accelerate launch train_tuneavideo.py --config="configs/mr-potato-head.yaml"
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
02/01/2023 10:24:30 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
{'variance_type', 'prediction_type', 'clip_sample'} was not found in config. Values will be initialized to default values.
{'use_linear_projection', 'resnet_time_scale_shift', 'num_class_embeds', 'class_embed_type', 'mid_block_type', 'only_cross_attention', 'dual_cross_attention', 'upcast_attention'} was not found in config. Values will be initialized to default values.
{'prediction_type', 'clip_sample'} was not found in config. Values will be initialized to default values.
/home/ubuntu/Tune-A-Video/tuneavideo/pipelines/pipeline_tuneavideo.py:82: FutureWarning: The configuration file of this scheduler: DDIMScheduler {
"_class_name": "DDIMScheduler",
"_diffusers_version": "0.12.1",
"beta_end": 0.012,
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"clip_sample": true,
"num_train_timesteps": 1000,
"prediction_type": "epsilon",
"set_alpha_to_one": false,
"skip_prk_steps": true,
"steps_offset": 1,
"trained_betas": null
}
has not set the configuration `clip_sample`. `clip_sample` should be set to False in the configuration file. Please make sure to update the config accordingly as not setting `clip_sample` in the config might lead to incorrect results in future versions. If you have downloaded this checkpoint from the Hugging Face Hub, it would be very nice if you could open a Pull request for the `scheduler/scheduler_config.json` file
deprecate("clip_sample not set", "1.0.0", deprecation_message, standard_warn=False)
02/01/2023 10:24:42 - INFO - __main__ - ***** Running training *****
02/01/2023 10:24:42 - INFO - __main__ - Num examples = 1
02/01/2023 10:24:42 - INFO - __main__ - Num Epochs = 500
02/01/2023 10:24:42 - INFO - __main__ - Instantaneous batch size per device = 1
02/01/2023 10:24:42 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
02/01/2023 10:24:42 - INFO - __main__ - Gradient Accumulation steps = 1
02/01/2023 10:24:42 - INFO - __main__ - Total optimization steps = 500
Steps: 0%| | 0/500 [00:00<?, ?it/s]/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 352, in
main(**OmegaConf.load(args.config))
File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 284, in main
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/utils/operations.py", line 490, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet.py", line 364, in forward
sample, res_samples = downsample_block(
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet_blocks.py", line 301, in forward
hidden_states = torch.utils.checkpoint.checkpoint(
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet_blocks.py", line 294, in custom_forward
return module(*inputs, return_dict=return_dict)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/attention.py", line 111, in forward
hidden_states = block(
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/attention.py", line 243, in forward
hidden_states = self.attn1(norm_hidden_states, attention_mask=attention_mask, video_length=video_length) + hidden_states
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/attention.py", line 283, in forward
query = self.reshape_heads_to_batch_dim(query)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1269, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'SparseCausalAttention' object has no attribute 'reshape_heads_to_batch_dim'
The environment is as follows:
Tesla V100(32GB)
Python 3.10.9
torch 1.13.1
torchaudio 0.13.1
torchtext 0.14.1
torchvision 0.14.1
transformers 4.26.0
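For reference, the method the traceback reports as missing simply folds the attention heads into the batch dimension. If pinning PyTorch back to 1.12 is not an option, a compatibility shim along these lines might work around it (untested; it assumes SparseCausalAttention keeps its head count in self.heads, as diffusers' CrossAttention does):

import torch
from tuneavideo.models.attention import SparseCausalAttention

def reshape_heads_to_batch_dim(self, tensor: torch.Tensor) -> torch.Tensor:
    # (batch, seq_len, heads * head_dim) -> (batch * heads, seq_len, head_dim)
    batch_size, seq_len, dim = tensor.shape
    head_size = self.heads  # assumption: head count lives in self.heads
    tensor = tensor.reshape(batch_size, seq_len, head_size, dim // head_size)
    return tensor.permute(0, 2, 1, 3).reshape(batch_size * head_size, seq_len, dim // head_size)

# attach the shim so the lookup in attention.py line 283 succeeds again
SparseCausalAttention.reshape_heads_to_batch_dim = reshape_heads_to_batch_dim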
Also, when I tried to train Tune-A-Video with a model I trained myself using the Diffusers examples, I got a different error.
$ accelerate launch train_tuneavideo.py --config="configs/man-surfing.yaml"
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
02/01/2023 10:07:58 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
{'variance_type'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 352, in
main(**OmegaConf.load(args.config))
File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 107, in main
unet = UNet3DConditionModel.from_pretrained_2d(pretrained_model_path, subfolder="unet")
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet.py", line 440, in from_pretrained_2d
model = cls.from_config(config)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 210, in from_config
model = cls(**init_dict)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 567, in inner_init
init(self, *args, **init_kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet.py", line 158, in init
raise ValueError(f"unknown mid_block_type : {mid_block_type}")
ValueError: unknown mid_block_type : UNetMidBlock2DCrossAttn
Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/bin/accelerate", line 8, in
sys.exit(main())
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/bin/python', 'train_tuneavideo.py', '--config=configs/man-surfing.yaml']' returned non-zero exit status 1.
Any hint would be appreciated.
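My own guess on the second error: the config.json written by the Diffusers training examples records the 2D block classes (e.g. UNetMidBlock2DCrossAttn), while UNet3DConditionModel only accepts the 3D variants. A rough workaround sketch, assuming a plain rename of the block types is all that is missing (the model path is a placeholder):

import json
import os

unet_dir = "./my-dreambooth-model/unet"  # placeholder: the self-trained model's unet folder
config_path = os.path.join(unet_dir, "config.json")

with open(config_path) as f:
    config = json.load(f)

# rename the 2D block classes to the 3D counterparts expected by UNet3DConditionModel
for key in ("mid_block_type", "down_block_types", "up_block_types"):
    value = config.get(key)
    if isinstance(value, str):
        config[key] = value.replace("2D", "3D")
    elif isinstance(value, list):
        config[key] = [v.replace("2D", "3D") for v in value]

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)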