
[Optimization] Use torch.bfloat16 on cuda systems #5410

Merged
merged 1 commit into main from feat/lstein/bfloat16 on Jan 5, 2024

Conversation

@lstein (Collaborator) commented Jan 5, 2024

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update
  • Community Node Submission

Have you discussed this change with the InvokeAI team?

  • Yes
  • No, because:

Have you updated all relevant documentation?

  • Yes
  • No

Description

On CUDA systems, torch.bfloat16 is supposed to be ~25% faster than torch.float16, according to this article: https://pytorch.org/blog/accelerating-generative-ai-3/. This small PR substitutes bfloat16 for float16 when half-precision is requested on a CUDA system.

Unfortunately I don't see any change in speed in my benchmarking. The article says that the speedup is GPU dependent, so perhaps others will have more luck (I'm using an RTX 4070).
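For illustration, here is a minimal sketch of the substitution described above. The helper name and the bf16 support check are mine, not the actual InvokeAI code:

```python
import torch

def choose_precision_dtype(device: torch.device, half_precision: bool) -> torch.dtype:
    """Hypothetical helper: pick an inference dtype for the given device."""
    if not half_precision:
        return torch.float32
    # On CUDA, prefer bfloat16 when the GPU supports it (Ampere and newer);
    # otherwise fall back to float16.
    if device.type == "cuda" and torch.cuda.is_bf16_supported():
        return torch.bfloat16
    return torch.float16

# Example: dtype = choose_precision_dtype(torch.device("cuda"), half_precision=True)
```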

Related Tickets & Documents

  • Related Issue #
  • Closes #

QA Instructions, Screenshots, Recordings

Merge Plan

Can merge when tested and approved. Please try on different GPUs and OSs. Should make no difference on Macs.
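For testers: since the speedup appears to be GPU dependent, a quick way to check whether a given CUDA GPU reports bfloat16 support before benchmarking (standard PyTorch calls, not part of this PR):

```python
import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    print(f"{name}: bf16 supported = {torch.cuda.is_bf16_supported()}")
else:
    print("No CUDA device; this PR should make no difference here.")
```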

Added/updated tests?

  • Yes
  • No: not really needed

[optional] Are there any post deployment tasks we need to perform?

@hipsterusername (Member)
On a 4090, I'm also not seeing the advertised speed increases (goes from ~7.5 to ~7, not below 5).

Something seems off - could ask the diffusers team

@hipsterusername (Member)
After some testing, this does seem to be improving things. Logging what changed the most:

  • Disabled xformers
  • Installed torch nightlies.

The Diffusers team suggested maxing clocks, but I figured that was a step too far to recommend for the average user, so I'm leaving it at what I've done to validate that it's 👍

@hipsterusername hipsterusername merged commit 6460dcc into main Jan 5, 2024
7 checks passed
@hipsterusername hipsterusername deleted the feat/lstein/bfloat16 branch January 5, 2024 04:25