Conversation

sayakpaul (Member)

What does this PR do?

Many models carry "# Copied from ..." implementations of attn_processors and set_attn_processor that are essentially identical to what is already implemented in AttentionMixin.

This PR makes those models inherit from AttentionMixin and removes the copied-over implementations.

I decided to leave fuse_qkv_projections and unfuse_qkv_projections out of this PR because some models don't implement their attention processors in a way that would make that change seamless. The methods removed in this PR, however, should be entirely harmless.
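
For reference, the shape of the change is roughly as follows (a minimal sketch with a hypothetical MyTransformer2DModel; the AttentionMixin import path is an assumption, not a diff from this PR):

# Sketch of the refactor pattern, not an actual diff from this PR.
# Assumes AttentionMixin is importable from diffusers.models.attention.
from diffusers.models.attention import AttentionMixin
from diffusers.models.modeling_utils import ModelMixin


class MyTransformer2DModel(ModelMixin, AttentionMixin):
    # Previously a class like this would carry "# Copied from ..." bodies for
    # attn_processors and set_attn_processor; now both come from
    # AttentionMixin, so the duplicated implementations can be deleted.
    ...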

@sayakpaul sayakpaul requested review from DN6 and dg845 October 10, 2025 15:39
@sayakpaul sayakpaul marked this pull request as draft October 10, 2025 16:18
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul sayakpaul marked this pull request as ready for review October 11, 2025 03:47
from ...models.attention_processor import (
    ADDED_KV_ATTENTION_PROCESSORS,
    CROSS_ATTENTION_PROCESSORS,
    AttentionProcessor,
Collaborator

I see that many of the tests in AudioLDM2PipelineFastTests currently fail in the CI with the following error, e.g.:

FAILED tests/pipelines/audioldm2/test_audioldm2.py::AudioLDM2PipelineFastTests::test_inference_batch_consistent - AttributeError: type object 'ClapConfig' has no attribute 'from_text_audio_configs'

This method is called in AudioLDM2PipelineFastTests.get_dummy_components:

text_encoder_config = ClapConfig.from_text_audio_configs(
    text_config=text_branch_config,
    audio_config=audio_branch_config,
    projection_dim=16,
)

It looks like the ClapConfig.from_text_audio_configs method exists in transformers==4.57.0 but has been removed in main. Given that this method is going away, should we replace the call with something like

class AudioLDM2PipelineFastTests(PipelineTesterMixin, unittest.TestCase):
    ...
    def get_dummy_components(self):
        ...
        text_encoder_config = ClapConfig(
            text_config=text_branch_config,
            audio_config=audio_branch_config,
            projection_dim=16,
        )
        ...
    ...

?

Collaborator

Similarly, the following tests fail because CLIPFeatureExtractor has been removed:

FAILED tests/pipelines/test_pipelines.py::DownloadTests::test_download_bin_only_variant_exists_for_model - AttributeError: module transformers has no attribute CLIPFeatureExtractor
FAILED tests/pipelines/test_pipelines.py::DownloadTests::test_download_bin_variant_does_not_exist_for_model - AttributeError: module transformers has no attribute CLIPFeatureExtractor
FAILED tests/pipelines/test_pipelines.py::PipelineFastTests::test_wrong_model - AttributeError: module transformers has no attribute CLIPFeatureExtractor

Should we replace the calls to CLIPFeatureExtractor with CLIPImageProcessor, or do you think that should be separated into a new PR?
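
If we do it here, the swap should be mechanical, e.g. (a sketch; CLIPImageProcessor is the maintained replacement, and the crop_size kwarg below is just a stand-in for whatever the existing CLIPFeatureExtractor calls pass):

from transformers import CLIPImageProcessor

# Drop-in replacement: CLIPImageProcessor exposes the same preprocessing
# interface CLIPFeatureExtractor did before its removal.
feature_extractor = CLIPImageProcessor(crop_size=32)  # kwargs mirror the old call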

Member Author

For #12463 (comment) see #12455

nit: Unrelated to this PR. I'd prefer to discuss these separately.

for name, module in self.named_children():
    fn_recursive_attn_processor(name, module, processor)

# Copied from diffusers.models.unets.unet_2d_condition.UNet2DConditionModel.set_default_attn_processor
Collaborator

Perhaps it's out of scope for this PR, but I see that a lot of models additionally have a set_default_attn_processor method, usually "# Copied from diffusers.models.unets.unet_2d_condition.UNet2DConditionModel.set_default_attn_processor". Do you think it makes sense to add this method to AttentionMixin?

Member Author

IMO, not yet, since AttentionMixin is fairly agnostic to the model type, whereas set_default_attn_processor relies on model-specific attention processor types. For UNet2DConditionModel, we have:

if all(proc.__class__ in ADDED_KV_ATTENTION_PROCESSORS for proc in self.attn_processors.values()):
    processor = AttnAddedKVProcessor()
elif all(proc.__class__ in CROSS_ATTENTION_PROCESSORS for proc in self.attn_processors.values()):
    processor = AttnProcessor()
else:
    raise ValueError(
        f"Cannot call `set_default_attn_processor` when attention processors are of type {next(iter(self.attn_processors.values()))}"
    )

However, for AutoencoderKLTemporalDecoder:

if all(proc.__class__ in CROSS_ATTENTION_PROCESSORS for proc in self.attn_processors.values()):
    processor = AttnProcessor()
else:
    raise ValueError(
        f"Cannot call `set_default_attn_processor` when attention processors are of type {next(iter(self.attn_processors.values()))}"
    )

I'd be down for the refactoring, though. Cc: @DN6
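
One direction that refactoring could take (just a sketch with a hypothetical _default_processor_candidates hook, not what's implemented): let AttentionMixin own the dispatch loop and have each model declare which processor sets it supports:

from diffusers.models.attention_processor import (
    ADDED_KV_ATTENTION_PROCESSORS,
    CROSS_ATTENTION_PROCESSORS,
    AttnAddedKVProcessor,
    AttnProcessor,
)


class AttentionMixin:
    # (attn_processors and set_attn_processor are the existing mixin members.)
    # Hypothetical hook: (candidate set, default processor class), checked in
    # order. UNet2DConditionModel would keep both entries, while
    # AutoencoderKLTemporalDecoder would override this with only the
    # CROSS_ATTENTION_PROCESSORS entry.
    _default_processor_candidates = (
        (ADDED_KV_ATTENTION_PROCESSORS, AttnAddedKVProcessor),
        (CROSS_ATTENTION_PROCESSORS, AttnProcessor),
    )

    def set_default_attn_processor(self):
        procs = self.attn_processors.values()
        for candidate_set, default_cls in self._default_processor_candidates:
            if all(proc.__class__ in candidate_set for proc in procs):
                self.set_attn_processor(default_cls())
                return
        raise ValueError(
            f"Cannot call `set_default_attn_processor` when attention processors are of type {next(iter(procs))}"
        )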

dg845 (Collaborator) left a comment

Looks good to me! I think AuraFlowTransformer2DModel and AudioLDM2UNet2DConditionModel have their attn_processors/set_attn_processor methods deleted but are missing the corresponding change to inherit from AttentionMixin.

sayakpaul (Member Author)

Thanks for those catches, @dg845. They should be fixed now.

dg845 (Collaborator) commented Oct 14, 2025

LGTM :)
