[Optimization] Use torch.bfloat16 on cuda systems #5410

lstein · 2024-01-05T01:47:14Z

What type of PR is this? (check all applicable)

Have you discussed this change with the InvokeAI team?

Yes
No, because:

Have you updated all relevant documentation?

Yes
No

Description

On CUDA systems, torch.bfloat16 is supposed to be ~25% faster than torch.float16, according to this article: https://pytorch.org/blog/accelerating-generative-ai-3/. This small PR substitutes bfloat16 for float16 when half precision is requested on a CUDA system.
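For reviewers, here is a minimal sketch of the substitution described above. It is not the actual diff: the function names `resolve_dtype_name` and `to_torch_dtype`, and the "float32"/"float16" precision strings, are assumptions made for illustration. Only `torch.float16`, `torch.bfloat16`, and `torch.float32` are real PyTorch names.

```python
def resolve_dtype_name(device_type: str, precision: str) -> str:
    """Map a requested precision to the name of a torch dtype.

    Hypothetical helper: `device_type` is a string such as "cuda", "mps",
    or "cpu"; `precision` is "float32" or "float16" (half precision).
    """
    if precision == "float32":
        return "float32"
    if precision == "float16":
        # The substitution from the PR description: bfloat16 on CUDA,
        # plain float16 on every other device type.
        return "bfloat16" if device_type == "cuda" else "float16"
    raise ValueError(f"unsupported precision: {precision!r}")


def to_torch_dtype(name: str):
    """Turn a dtype name into the torch dtype object, e.g. torch.bfloat16."""
    import torch  # imported lazily so the selection logic runs without torch

    return getattr(torch, name)
```

With a `torch.device` in hand, this would be called as `to_torch_dtype(resolve_dtype_name(device.type, "float16"))`. Non-CUDA devices such as "mps" keep float16, which matches the expectation that Macs see no difference.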

Unfortunately I don't see any change in speed in my benchmarking. The article says that the speedup is GPU dependent, so perhaps others will have more luck (I'm using an RTX 4070).

Related Tickets & Documents

Related Issue #
Closes #

QA Instructions, Screenshots, Recordings

Merge Plan

Can merge when tested and approved. Please try on different GPUs and OSs. Should make no difference on Macs.

Added/updated tests?

Yes
No : not really needed

[optional] Are there any post deployment tasks we need to perform?

hipsterusername · 2024-01-05T02:21:44Z

On a 4090, also not seeing the advertised speed increases (goes from ~7.5 to ~7, not below 5).

Something seems off - could ask the diffusers team

hipsterusername · 2024-01-05T04:25:48Z

After some testing, it seems to be improving things. Logging what changed the most:

Disabled xformers
Installed torch nightlies.

The Diffusers team suggested maxing clocks, but I figured that was a step too far to recommend for the average user, so I'm leaving it at what I've done to validate that it's 👍

use torch.bfloat16 on cuda systems

003b4c2

lstein requested review from damian0815, blessedcoolant, GreggHelt2, StAlKeR7779, brandonrising, RyanJDick and hipsterusername as code owners January 5, 2024 01:47

hipsterusername approved these changes Jan 5, 2024

View reviewed changes

hipsterusername merged commit 6460dcc into main Jan 5, 2024
7 checks passed

hipsterusername deleted the feat/lstein/bfloat16 branch January 5, 2024 04:25
