[bug]: I don't think the bfloat16 config option is being used. #5799
Comments
Adding another bfloat16 support issue:

```
[2024-03-09 12:06:28,029]::[InvokeAI]::ERROR --> Traceback (most recent call last):
  File "/Volumes/SSD2TB/AI/InvokeAI/lib/python3.10/site-packages/invokeai/app/services/invocation_processor/invocation_processor_default.py", line 134, in __process
    outputs = invocation.invoke_internal(
  File "/Volumes/SSD2TB/AI/InvokeAI/lib/python3.10/site-packages/invokeai/app/invocations/baseinvocation.py", line 669, in invoke_internal
    output = self.invoke(context)
  File "/Volumes/SSD2TB/AI/InvokeAI/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Volumes/SSD2TB/AI/InvokeAI/lib/python3.10/site-packages/invokeai/app/invocations/latent.py", line 773, in invoke
    ) = pipeline.latents_from_embeddings(
  File "/Volumes/SSD2TB/AI/InvokeAI/lib/python3.10/site-packages/invokeai/backend/stable_diffusion/diffusers_pipeline.py", line 381, in latents_from_embeddings
    latents, attention_map_saver = self.generate_latents_from_embeddings(
  File "/Volumes/SSD2TB/AI/InvokeAI/lib/python3.10/site-packages/invokeai/backend/stable_diffusion/diffusers_pipeline.py", line 441, in generate_latents_from_embeddings
    callback(
  File "/Volumes/SSD2TB/AI/InvokeAI/lib/python3.10/site-packages/invokeai/app/invocations/latent.py", line 704, in step_callback
    self.dispatch_progress(context, source_node_id, state, self.unet.unet.base_model)
  File "/Volumes/SSD2TB/AI/InvokeAI/lib/python3.10/site-packages/invokeai/app/invocations/latent.py", line 318, in dispatch_progress
    stable_diffusion_step_callback(
  File "/Volumes/SSD2TB/AI/InvokeAI/lib/python3.10/site-packages/invokeai/app/util/step_callback.py", line 85, in stable_diffusion_step_callback
    image = sample_to_lowres_estimated_image(sample, sdxl_latent_rgb_factors, sdxl_smooth_matrix)
  File "/Volumes/SSD2TB/AI/InvokeAI/lib/python3.10/site-packages/invokeai/app/util/step_callback.py", line 13, in sample_to_lowres_estimated_image
    latent_image = samples[0].permute(1, 2, 0) @ latent_rgb_factors
RuntimeError: Undefined type BFloat16
[2024-03-09 12:06:28,029]::[InvokeAI]::ERROR --> Error while invoking:
Undefined type BFloat16
```

EDIT: raised a PyTorch issue.
EDIT2: Now fixed in PyTorch nightlies, I think; the current nightly totally breaks InvokeAI by breaking Accelerate.
This seems to be much improved in version 4, I only had to change
Is there an existing issue for this problem?
Operating system
macOS
GPU vendor
Apple Silicon (MPS)
GPU model
M3 (base, 10-core GPU)
GPU VRAM
24 GB
Version number
v3.7.0
Browser
Safari 17.2.1
Python dependencies
```json
{
  "accelerate": "0.27.2",
  "compel": "2.0.2",
  "cuda": null,
  "diffusers": "0.26.3",
  "numpy": "1.26.4",
  "opencv": "4.9.0.80",
  "onnx": "1.15.0",
  "pillow": "10.2.0",
  "python": "3.10.13",
  "torch": "2.3.0.dev20240221",
  "torchvision": "0.18.0.dev20240221",
  "transformers": "4.37.2",
  "xformers": null
}
```
What happened
Recent PyTorch nightlies have added some bfloat16 support for MPS, and testing Diffusers with them showed there is enough support for Stable Diffusion to run, with a small but statistically significant decrease in seconds per iteration.
I wanted to see if I could use the nightlies with InvokeAI now that the basicSR dependency has been removed, and that testing worked fine. So I set the config to bfloat16.
Everything worked, but I saw no change to render times.
Digging through the code, I spotted a few bits that look like they force the use of float16, including one
that I think prevents the use of bfloat16 on all platforms.
I made the following code changes:
invokeai/backend/util/devices.py
Made choose_precision allow the use of bfloat16 for MPS.
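Roughly, the idea is the following (a minimal sketch only; the real choose_precision in devices.py takes a torch.device and its logic may differ):

```python
# Minimal sketch of the change, not InvokeAI's actual code: let a
# configured bfloat16 pass through on MPS instead of always using float16.
def choose_precision(device_type: str, configured: str = "auto") -> str:
    """Pick a precision name for the given device type."""
    if device_type == "mps":
        if configured == "bfloat16":
            return "bfloat16"  # needs a recent PyTorch nightly
        return "float16"       # previous behaviour: always float16 on MPS
    if device_type == "cuda":
        return configured if configured in ("float16", "bfloat16") else "float16"
    return "float32"           # CPU fallback

print(choose_precision("mps", "bfloat16"))  # bfloat16
```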
This broke Compel, so I had to do this quick fix; it needs a better fix, as I think the underlying issue is purely an MPS torch issue.
invokeai/app/invocations/compel.py
Commented out both occurrences of dtype_for_device_getter (lines 125 and 248).
And finally, the one I think prevents bfloat16 usage on all platforms:
invokeai/app/services/model_manager/model_manager_default.py
Line 65 replaced with
and that seems to do the trick; I see the small speed-up I'd expect.
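The replacement code itself isn't captured above; as a purely hypothetical illustration of the kind of mapping involved (function names are mine, not InvokeAI's):

```python
# Hypothetical before/after, illustrating a guard that collapses every
# non-float32 precision down to float16 and so silently discards bfloat16.
def dtype_name_before(precision: str) -> str:
    return "float32" if precision == "float32" else "float16"

def dtype_name_after(precision: str) -> str:
    # pass bfloat16 through instead of downgrading it
    if precision in ("float32", "float16", "bfloat16"):
        return precision
    return "float16"

print(dtype_name_before("bfloat16"))  # float16 (bfloat16 lost)
print(dtype_name_after("bfloat16"))   # bfloat16
```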
I suspect there are other places where similar changes need to be made. For example,
invokeai/app/invocations/latent.py
loads the VAE as either float16 or float32, not bfloat16. Not sure if that is actually necessary.
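If the VAE is meant to follow the configured precision, the selection might be sketched as follows (hypothetical names, not the actual latent.py code):

```python
def vae_dtype_name(configured_precision: str, force_float32: bool) -> str:
    """Hypothetical sketch: some VAEs overflow in half precision, so a
    forced fp32 path stays available, but otherwise the configured dtype
    passes through, including bfloat16."""
    if force_float32:
        return "float32"
    if configured_precision in ("float16", "bfloat16"):
        return configured_precision
    return "float32"

print(vae_dtype_name("bfloat16", force_float32=False))  # bfloat16
```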
Model installation
invokeai/app/services/model_install/model_install_default.py
_guess_variant()
looks like it will try to get the 32-bit version of models for bfloat16 instead of the fp16 variant by default (and I've noticed one or two bf16 variants recently).

Model loading
invokeai/backend/model_management/models/base.py
I think line 285 means that for bfloat16 it will load the fp32 model in preference to the fp16 variant if both are installed.

What you expected to happen
Modifying the config file to use bfloat16 should be enough to use bfloat16 precision.
How to reproduce the problem
Change the invoke.yaml file to use bfloat16. You may need some debug code to confirm it is actually still using float16, and to run a Diffusers script to test exactly how bfloat16 should behave with your hardware.
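For reference, the setting being changed is something like the following invoke.yaml fragment (section nesting varies between InvokeAI versions, so treat this as a sketch rather than an exact config):

```yaml
# invoke.yaml sketch; the precision key is the one under test here
InvokeAI:
  Generation:
    precision: bfloat16   # was float16 / auto
```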
Additional context
No response
Discord username
No response