
Conversation

Disty0 (Contributor) commented Oct 11, 2025

What does this PR do?

Adds update_expected_keys and update_unexpected_keys APIs to DiffusersQuantizer.
Makes load_model_dict_into_meta compatible with updated unexpected / expected keys added in DiffusersQuantizer.

Fixes #12470
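
As a rough illustration of how a backend could use these hooks, here is a minimal sketch. It is not code from this PR: the method signatures are assumed to mirror the corresponding HfQuantizer hooks in Transformers and may differ from the final DiffusersQuantizer API, and the ".scale" suffix is just a stand-in for a parameter a quantizer creates at load time.

from typing import List

from diffusers.quantizers.base import DiffusersQuantizer


class ExampleQuantizer(DiffusersQuantizer):
    # (other required DiffusersQuantizer methods omitted for brevity)

    def update_expected_keys(self, model, expected_keys: List[str], loaded_keys: List[str]) -> List[str]:
        # Keep checkpoint keys created by quantization (e.g. ".scale") in the expected set
        # so the loader does not skip them.
        extra = [k for k in loaded_keys if k.endswith(".scale") and k not in expected_keys]
        return expected_keys + extra

    def update_unexpected_keys(self, model, unexpected_keys: List[str]) -> List[str]:
        # Drop quantization-created keys from the unexpected list so no warning is emitted
        # and they still reach create_quantized_param during loading.
        return [k for k in unexpected_keys if not k.endswith(".scale")]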

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Core library:

Disty0 (Contributor, Author) commented Oct 11, 2025

Here is a use case of this PR with SDNQ:

pip install git+https://github.com/Disty0/sdnq
import torch
import diffusers
from sdnq import SDNQConfig # import sdnq to register it into diffusers and transformers

pipe = diffusers.FluxPipeline.from_pretrained("Disty0/FLUX.1-dev-SDNQ-uint4-svd-r32", torch_dtype=torch.bfloat16)

if (
    hasattr(pipe.text_encoder_2.encoder.block[0].layer[0].SelfAttention.k, "scale")
    and pipe.text_encoder_2.encoder.block[0].layer[0].SelfAttention.k.scale.device.type != "meta"
    and hasattr(pipe.transformer.single_transformer_blocks[0].attn.to_k, "scale")
    and pipe.transformer.single_transformer_blocks[0].attn.to_k.scale.device.type != "meta"
):
    print("SDNQ model loaded succesfully")
else:
    print("SDNQ model failed to load")
    exit()

pipe.enable_model_cpu_offload()
prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("flux-dev-sdnq-uint4-svd-r32.png")

(This result was generated on an Intel Arc A770, with the FP64 RoPE downcast to FP32 so it works on Alchemist, which doesn't support FP64.)
[Image: generated output for the prompt above]

sayakpaul (Member) commented Oct 13, 2025

Thanks for the work!

I am still a bit confused about the utility of the APIs. Possible to explain it in simpler terms?

Cc: @SunMarc as well.

Disty0 (Contributor, Author) commented Oct 13, 2025

Related PR on Transformers: huggingface/transformers#41138

Currently I have to access the state dict to load the params newly created during quantization, and this can break if the state dict is sharded. Diffusers will also throw an unexpected-keys warning regardless of whether those keys were actually used, as seen in the screenshot:

[Screenshot: unexpected-keys warning emitted during loading]

This PR makes it possible to update the expected and unexpected keys, so the params that will be added during quantization are not skipped by Diffusers, and Diffusers does not throw an unnecessary unexpected-keys warning.
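
To make the intent concrete, here is a hypothetical sketch of the flow described above. The helper name resolve_keys and the variable names are illustrative only and do not come from the PR.

def resolve_keys(model, state_dict, hf_quantizer):
    # Hypothetical illustration of the flow described above; not the PR's actual code.
    expected_keys = list(model.state_dict().keys())
    if hf_quantizer is not None:
        # Let the backend add keys it will create during quantization (e.g. scales).
        expected_keys = hf_quantizer.update_expected_keys(model, expected_keys, list(state_dict.keys()))
    unexpected_keys = [k for k in state_dict if k not in expected_keys]
    if hf_quantizer is not None:
        # Let the backend drop keys it actually consumes, so no warning is raised for them.
        unexpected_keys = hf_quantizer.update_unexpected_keys(model, unexpected_keys)
    return expected_keys, unexpected_keys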

sayakpaul (Member) left a comment

Looks okay to me! Thanks!

@SunMarc could you review as well?


for param_name, param in state_dict.items():
    if param_name not in empty_state_dict:
        if param_name in unexpected_keys:
Member:

Should this not be?

Suggested change:
- if param_name in unexpected_keys:
+ if param_name not in empty_state_dict or param_name in unexpected_keys:

Disty0 (Contributor, Author) replied Oct 13, 2025:

Parameters that will be added during quantization aren't in the empty_state_dict yet. They will be added to the model in create_quantized_param within this loop.

Transformers uses param_name not in expected_keys for this check. I used the unexpected keys here instead because Diffusers doesn't pass the expected keys to this loop.
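
For clarity, a simplified, purely illustrative comparison of the two checks (the function names are made up for this example):

def should_skip_transformers_style(param_name, expected_keys):
    # Transformers can consult expected_keys directly inside the loading loop.
    return param_name not in expected_keys


def should_skip_diffusers_style(param_name, empty_state_dict, unexpected_keys):
    # Diffusers only has the model's empty_state_dict and the unexpected_keys list here,
    # so a key missing from the model is skipped only when it is also marked unexpected;
    # quantizer-created keys removed from unexpected_keys fall through and get loaded.
    return param_name not in empty_state_dict and (
        unexpected_keys is not None and param_name in unexpected_keys
    )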

Comment on lines 269 to 270
# hf_quantizer can add parameters that don't exist yet
# they will be in the loaded_state_dict when pre_quantized
Member:

I would also provide more details on when this can arise.

Disty0 (Contributor, Author):

Pushed a new commit that fixes the failing pipeline tests when unexpected_keys is None. Also added more details to these comment lines.

sayakpaul requested reviews from DN6 and SunMarc, October 13, 2025 11:17
Disty0 (Contributor, Author) commented Oct 13, 2025

Also, Transformers has a requires_parameters_quantization flag for HfQuantizer classes that require creating a new parameter that doesn't exist in the model before the create_quantized_param step. We could add this flag to DiffusersQuantizer as well.

From Transformers:
requires_parameters_quantization (bool):
Whether the quantization method requires to create a new Parameter. For example, for bitsandbytes, it is required to create a new xxxParameter in order to properly quantize the model.

SunMarc (Member) left a comment

Thanks for adding this! Eager to see the integration with SDNQ!

Comment on lines +263 to +272
if param_name in empty_state_dict:
    old_param = model
    splits = param_name.split(".")
    for split in splits:
        old_param = getattr(old_param, split)
else:
    # hf_quantizer can add parameters that don't exist yet in the model and the empty_state_dict
    # they will be created in create_quantized_param and hf_quantizer should handle the loading of these parameters
    # these parameters will be in the loaded_state_dict from the model file instead when loading a pre_quantized model
    old_param = None
Member:

Yeah, indeed, this is kind of what we did in _infer_parameter_dtype in transformers.

  # bnb params are flattened.
  # gguf quants have a different shape based on the type of quantization applied
- if empty_state_dict[param_name].shape != param.shape:
+ if param_name in empty_state_dict and empty_state_dict[param_name].shape != param.shape:
Member:

Just add a small comment for that, as we will probably refactor the loading at some point to match what we have in transformers.


for param_name, param in state_dict.items():
    if param_name not in empty_state_dict:
        if unexpected_keys is not None and param_name in unexpected_keys:
Member:

Yeah, that's better; actually, in transformers we rely on unexpected keys.

SunMarc (Member) commented Oct 13, 2025

> Also, Transformers has a requires_parameters_quantization flag for HfQuantizer classes that require creating a new parameter that doesn't exist in the model before the create_quantized_param step. We could add this flag to DiffusersQuantizer as well.
>
> From Transformers:
> requires_parameters_quantization (bool):
> Whether the quantization method requires to create a new Parameter. For example, for bitsandbytes, it is required to create a new xxxParameter in order to properly quantize the model.

Actually this is not that useful and we will probably remove it in transformers; check_quantized_param should be enough. This is why we didn't add it here.
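
For context, a minimal sketch of leaning on check_quantized_param instead of a dedicated flag; the signature is assumed from the existing DiffusersQuantizer base class, and the ".scale" suffix is only an example:

from diffusers.quantizers.base import DiffusersQuantizer


class ParamCreatingQuantizer(DiffusersQuantizer):
    # (other required DiffusersQuantizer methods omitted for brevity)

    def check_quantized_param(self, model, param_value, param_name, state_dict, **kwargs) -> bool:
        # Returning True routes the parameter through create_quantized_param, which is where a
        # backend can build any new Parameter it needs, so no requires_parameters_quantization
        # flag is necessary.
        return param_name.endswith((".weight", ".scale"))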
