
I get an error when I try to learn Dreambooth model #10

Closed · shi3z opened this issue on Feb 1, 2023 · 3 comments
shi3z commented Feb 1, 2023

Thank you for the great work.
I can train with a normal model without problems, but I'm having trouble training with a DreamBooth model. Mr Potato Head doesn't work either, so I'd like to identify the cause.


```
$ accelerate launch train_tuneavideo.py --config="configs/mr-potato-head.yaml"
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
02/01/2023 10:24:30 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16

{'variance_type', 'prediction_type', 'clip_sample'} was not found in config. Values will be initialized to default values.
{'use_linear_projection', 'resnet_time_scale_shift', 'num_class_embeds', 'class_embed_type', 'mid_block_type', 'only_cross_attention', 'dual_cross_attention', 'upcast_attention'} was not found in config. Values will be initialized to default values.
{'prediction_type', 'clip_sample'} was not found in config. Values will be initialized to default values.
/home/ubuntu/Tune-A-Video/tuneavideo/pipelines/pipeline_tuneavideo.py:82: FutureWarning: The configuration file of this scheduler: DDIMScheduler {
"_class_name": "DDIMScheduler",
"_diffusers_version": "0.12.1",
"beta_end": 0.012,
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"clip_sample": true,
"num_train_timesteps": 1000,
"prediction_type": "epsilon",
"set_alpha_to_one": false,
"skip_prk_steps": true,
"steps_offset": 1,
"trained_betas": null
}
has not set the configuration clip_sample. clip_sample should be set to False in the configuration file. Please make sure to update the config accordingly as not setting clip_sample in the config might lead to incorrect results in future versions. If you have downloaded this checkpoint from the Hugging Face Hub, it would be very nice if you could open a Pull request for the scheduler/scheduler_config.json file
deprecate("clip_sample not set", "1.0.0", deprecation_message, standard_warn=False)
02/01/2023 10:24:42 - INFO - __main__ - ***** Running training *****
02/01/2023 10:24:42 - INFO - __main__ - Num examples = 1
02/01/2023 10:24:42 - INFO - __main__ - Num Epochs = 500
02/01/2023 10:24:42 - INFO - __main__ - Instantaneous batch size per device = 1
02/01/2023 10:24:42 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
02/01/2023 10:24:42 - INFO - __main__ - Gradient Accumulation steps = 1
02/01/2023 10:24:42 - INFO - __main__ - Total optimization steps = 500
Steps: 0%| | 0/500 [00:00<?, ?it/s]/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 352, in
main(**OmegaConf.load(args.config))
File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 284, in main
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/utils/operations.py", line 490, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet.py", line 364, in forward
sample, res_samples = downsample_block(
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet_blocks.py", line 301, in forward
hidden_states = torch.utils.checkpoint.checkpoint(
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet_blocks.py", line 294, in custom_forward
return module(*inputs, return_dict=return_dict)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/attention.py", line 111, in forward
hidden_states = block(
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/attention.py", line 243, in forward
hidden_states = self.attn1(norm_hidden_states, attention_mask=attention_mask, video_length=video_length) + hidden_states
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/attention.py", line 283, in forward
query = self.reshape_heads_to_batch_dim(query)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1269, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'SparseCausalAttention' object has no attribute 'reshape_heads_to_batch_dim'
```


My environment is as follows:

```
Tesla V100 (32GB)
Python 3.10.9
torch 1.13.1
torchaudio 0.13.1
torchtext 0.14.1
torchvision 0.14.1
transformers 4.26.0
```

Also, when I tried to train Tune-A-Video with a model I trained myself using the Diffusers examples, I got a different error.

```
$ accelerate launch train_tuneavideo.py --config="configs/man-surfing.yaml"
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
02/01/2023 10:07:58 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16

{'variance_type'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 352, in
main(**OmegaConf.load(args.config))
File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 107, in main
unet = UNet3DConditionModel.from_pretrained_2d(pretrained_model_path, subfolder="unet")
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet.py", line 440, in from_pretrained_2d
model = cls.from_config(config)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 210, in from_config
model = cls(**init_dict)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 567, in inner_init
init(self, *args, **init_kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet.py", line 158, in init
raise ValueError(f"unknown mid_block_type : {mid_block_type}")
ValueError: unknown mid_block_type : UNetMidBlock2DCrossAttn
Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/bin/accelerate", line 8, in
sys.exit(main())
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/bin/python', 'train_tuneavideo.py', '--config=configs/man-surfing.yaml']' returned non-zero exit status 1.
```


Any hints would be appreciated.
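For the second error: the config saved by the diffusers 2D training examples contains mid_block_type: "UNetMidBlock2DCrossAttn", while Tune-A-Video's UNet3DConditionModel, judging from the ValueError it raises, only accepts the 3D block name. Not a confirmed fix from this thread, but a minimal workaround sketch would be to rewrite that one field in the saved UNet config before training (the path below is hypothetical; back up config.json first):

```python
# Sketch: rename the 2D mid-block in a DreamBooth-trained UNet config so
# Tune-A-Video's 3D UNet accepts it. "UNetMidBlock3DCrossAttn" is an
# assumption inferred from the error in tuneavideo/models/unet.py.
import json

config_path = "my-dreambooth-model/unet/config.json"  # hypothetical path
with open(config_path) as f:
    config = json.load(f)

if config.get("mid_block_type") == "UNetMidBlock2DCrossAttn":
    config["mid_block_type"] = "UNetMidBlock3DCrossAttn"
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
```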

shi3z changed the title from "I get an error when I try to learn Dreambooth modelthank you for a great job" to "I get an error when I try to learn Dreambooth model" on Feb 1, 2023
shi3z (Author) commented Feb 1, 2023

Solved. I found the cause, so I'll write it here in case someone else has the same symptoms.

I was using PyTorch 1.13.0 when it failed; 1.12 did not have this problem. However, on the V100 the CUDA memory was exceeded, so I reduced the sample steps from 8 to 4.
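In case it helps someone reproduce this, pinning the working PyTorch line could look like the following (torch 1.12.x pairs with torchvision 0.13.x and torchaudio 0.12.x; the exact CUDA wheel tag depends on your setup):

```
$ pip install torch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1
```

And assuming the "sample steps" above refers to the n_sample_frames field of the training config, reducing it from 8 to 4 there is what keeps the run within the V100's 32 GB.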

shi3z closed this as completed on Feb 1, 2023
zhangjiewu (Collaborator) commented Feb 1, 2023

> Solved. I found the cause, so I'll write it here in case someone else has the same symptoms.
>
> I was using PyTorch 1.13.0 when it failed; 1.12 did not have this problem. However, on the V100 the CUDA memory was exceeded, so I reduced the sample steps from 8 to 4.

Hi @shi3z, thank you for sharing the solution. The code works well on my V100 32GB with xformers and fp16 enabled.
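For anyone unsure what "xformers enabled" means in practice here: the training script's toggle boils down to diffusers' ModelMixin helper. A minimal sketch, assuming a working xformers install and the method name from diffusers 0.12.x (the model path is hypothetical):

```python
# Sketch: turn on xformers memory-efficient attention for the UNet before
# training. Per the comment above, this plus fp16 fits on a V100 32GB.
from tuneavideo.models.unet import UNet3DConditionModel

pretrained_model_path = "checkpoints/stable-diffusion-v1-4"  # hypothetical path
unet = UNet3DConditionModel.from_pretrained_2d(pretrained_model_path, subfolder="unet")

try:
    unet.enable_xformers_memory_efficient_attention()  # inherited from diffusers' ModelMixin
except Exception as exc:
    print(f"xformers unavailable, falling back to vanilla attention: {exc}")
```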

Randle-Github commented

Hi @shi3z, I still get the same error even though I followed your package settings exactly:

```
Traceback (most recent call last):
File "/data/home/lyc/Tune-A-Video/train_tuneavideo.py", line 367, in
main(**OmegaConf.load(args.config))
File "/data/home/lyc/Tune-A-Video/train_tuneavideo.py", line 289, in main
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/accelerate/utils/operations.py", line 489, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "/data/home/lyc/Tune-A-Video/tuneavideo/models/unet.py", line 364, in forward
sample, res_samples = downsample_block(
File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data/home/lyc/Tune-A-Video/tuneavideo/models/unet_blocks.py", line 301, in forward
hidden_states = torch.utils.checkpoint.checkpoint(
File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 235, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 96, in forward
outputs = run_function(*args)
File "/data/home/lyc/Tune-A-Video/tuneavideo/models/unet_blocks.py", line 294, in custom_forward
return module(*inputs, return_dict=return_dict)
File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data/home/lyc/Tune-A-Video/tuneavideo/models/attention.py", line 111, in forward
hidden_states = block(
File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data/home/lyc/Tune-A-Video/tuneavideo/models/attention.py", line 243, in forward
hidden_states = self.attn1(norm_hidden_states, attention_mask=attention_mask, video_length=video_length) + hidden_states
File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, **kwargs)
File "/data/home/lyc/Tune-A-Video/tuneavideo/models/attention.py", line 283, in forward
query = self.reshape_heads_to_batch_dim(query)
File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1207, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'SparseCausalAttention' object has no attribute 'reshape_heads_to_batch_dim'
```

Did you change any other settings in your setup?
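One thing worth checking that isn't confirmed in this thread: this exact AttributeError also appears when the installed diffusers is newer than what Tune-A-Video targets, since around diffusers 0.12 CrossAttention's reshape_heads_to_batch_dim / reshape_batch_dim_to_heads were renamed to head_to_batch_dim / batch_to_head_dim. If downgrading isn't an option, a compatibility shim sketch (assuming SparseCausalAttention subclasses diffusers' CrossAttention, and the 0.12.x import path) could alias the old names back:

```python
# Sketch: alias the pre-0.12 method names back onto diffusers' CrossAttention
# so SparseCausalAttention, which subclasses it, can find them. Run once at
# startup, before the UNet is built.
from diffusers.models.cross_attention import CrossAttention

if not hasattr(CrossAttention, "reshape_heads_to_batch_dim"):
    CrossAttention.reshape_heads_to_batch_dim = CrossAttention.head_to_batch_dim
    CrossAttention.reshape_batch_dim_to_heads = CrossAttention.batch_to_head_dim
```

Otherwise, pinning diffusers to the version in the repo's requirements should avoid the rename entirely.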
