
Conversation

mnslarcher (Contributor)

What does this PR do?

As discussed in issue #4736, this PR tackles the following tasks:

  • Increases the minimum required version of accelerate from 0.16.0 to 0.22.0. This prevents unnecessary peaks in GPU memory consumption during checkpointing with mixed precision.
  • Removes the cast to float32 before saving, avoiding an unnecessary increase in GPU memory consumption.
  • Removes the redundant re-instantiation of the VAE at the end (one is already defined earlier in the script).
  • Deletes unused models before creating the final pipeline (see the sketch after this list).
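
For illustration, a minimal sketch of the cleanup pattern in the last point, assuming the training-only models are held in local variables (the helper and variable names are hypothetical, not the exact PR diff):

```python
import gc

import torch


def release_gpu_memory():
    """Collect dropped references and release cached CUDA blocks so the
    final pipeline can be created without a memory spike."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


# Hypothetical usage: drop the training-only models, then release memory.
# del unet, text_encoder_one, text_encoder_two
# release_gpu_memory()
```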

The command `make test-examples` completed successfully.

The following images were generated using the weights saved in fp32:

[image: image_fp32]

The following images were generated using the same weights as above, but saved in fp16:

[image: image_fp16]

Previously, peak memory consumption on an RTX 4090 was borderline during checkpointing and final weight saving. Now, it stays consistently below 75%.

Saving and loading also work with bf16.
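
As a quick sanity check of loading, a hedged sketch (the model ID and path are illustrative; `load_lora_weights` is the usual diffusers entry point):

```python
import torch

from diffusers import StableDiffusionXLPipeline

# The saved LoRA weights load the same way whether they were written in
# fp32, fp16, or bf16; the pipeline dtype is chosen independently here.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("path/to/output_dir")
```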

Fixes #4736


Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul

```python
# Final inference
# Load previous pipeline
vae = AutoencoderKL.from_pretrained(
```
Contributor:

is this already defined?

mnslarcher (Contributor, Author):

Yes, we never `del` the one defined here, so it is still there even at the end. And... it works without it :)
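
For context, a rough sketch of what "already defined" means here (model IDs are illustrative, not necessarily the script's defaults):

```python
import torch

from diffusers import AutoencoderKL, StableDiffusionXLPipeline

# The vae is already instantiated earlier in the script...
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)

# ...so final inference can reuse it rather than creating a second copy:
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
)
```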

@patrickvonplaten (Contributor) left a comment:

Generally looks good to me - wdyt @sayakpaul?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

```diff
@@ -1,4 +1,4 @@
-accelerate>=0.16.0
+accelerate>=0.22.0
```
Member:

Is this necessary?

mnslarcher (Contributor, Author):

Unfortunately, without this change there is still a spike in GPU memory during checkpointing :/ From Zach's message on the issue, my understanding is that there was a bug when saving with the accelerator while using mixed precision.
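
For reference, a minimal sketch of the checkpointing path in question (assumes a CUDA GPU; the directory name is illustrative, and `Accelerator.save_state` is the standard Accelerate API):

```python
import torch

from accelerate import Accelerator

# Mirrors the training setup: fp16 mixed precision requires a GPU.
accelerator = Accelerator(mixed_precision="fp16")
model = accelerator.prepare(torch.nn.Linear(8, 8))

# With accelerate < 0.22.0, saving state under mixed precision could
# spike GPU memory; bumping to >= 0.22.0 avoids it.
accelerator.save_state("checkpoint-0")
```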

@sayakpaul (Member):
Looks amazing actually. Thanks for delving deep here and for the fixes!

@sayakpaul merged commit 87ae330 into huggingface:main on Aug 28, 2023.
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request on Apr 26, 2024:
…4791)

* Increase min accelerate ver to avoid OOM when mixed precision

* Rm re-instantiation of VAE

* Rm casting to float32

* Del unused models and free GPU

* Fix style
Development

Successfully merging this pull request may close these issues.

CUDA out of memory and invalid value encountered in cast with train_text_to_image_lora_sdxl.py