
🚨 Delete duplicate code in backbone utils #43323

Merged
zucchini-nlp merged 43 commits into huggingface:main from zucchini-nlp:backbone on Feb 4, 2026

Conversation

@zucchini-nlp (Member) commented Jan 16, 2026

What does this PR do?

This PR cleans up the backbone utilities. We currently have five different config attributes that decide which backbone to load; most of them can be merged into one and are redundant.
After this PR, we'll have only one, config.backbone_config, as a single source of truth. Models will build the backbone from_config and load pretrained weights only if the checkpoint has backbone weights saved. The overall idea is the same as in other composite models.

I removed these config attrs (a before/after sketch follows the list):

  • backbone - the backbone model id is now used to create a backbone_config by loading it from the hub or from timm
  • backbone_kwargs - it was used to update backbone_config with user-provided kwargs (i.e. backbone_config = CONFIG_MAPPING[model_type](**backbone_kwargs))
  • use_pretrained_backbone - we don't load a pretrained backbone anymore unless the user calls from_pretrained. The default is to initialize the model with random weights and let users either train it from scratch or load pretrained weights themselves
  • use_timm_backbone - we can infer the model type from the config and the requested backbone type, so this arg was redundant
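
A minimal before/after sketch, using MaskFormer as the example model (as in the review thread below); the removed kwargs match the list above, the specific values are illustrative, and the "before" call only works on versions prior to this PR:

```py
from transformers import MaskFormerConfig, ResNetConfig

# before this PR: several overlapping attributes selected the backbone
# (these kwargs are the ones removed by this PR)
old_config = MaskFormerConfig(
    backbone="microsoft/resnet-50",
    use_pretrained_backbone=True,
    use_timm_backbone=False,
    backbone_kwargs={"out_indices": [1, 2, 3, 4]},
)

# after this PR: config.backbone_config is the single source of truth
new_config = MaskFormerConfig(
    backbone_config=ResNetConfig(out_indices=[1, 2, 3, 4])
)
```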

Along the way, I also updated the tests and docs. Recommended review path: modeling_backbone_utils.py -> auto_factory.py -> timm backbone model files -> a couple of other models of your choice

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions (Contributor)

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43323&sha=754b61

Comment on lines 77 to 78
config = MaskFormerConfig(backbone="microsoft/resnet-50", use_pretrained_backbone=True)
config = MaskFormerConfig(backbone="microsoft/resnet-50")
model = MaskFormerForInstanceSegmentation(config)
zucchini-nlp (Member Author):

imo it doesn't serve much purpose to load a randomly initialized model with a pretrained backbone; the user still has to tune the model before it can be used.

Therefore I deleted this feature. Pretrained weights are loaded with from_pretrained and random weights with from_config, the same way as for any other model.
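
A minimal sketch of the two paths described above (the checkpoint id is a placeholder, not a real repo):

```py
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

# random weights: build the model (backbone included) from a config
config = MaskFormerConfig()
model = MaskFormerForInstanceSegmentation(config)

# pretrained weights: only via from_pretrained, like any other model
model = MaskFormerForInstanceSegmentation.from_pretrained("some-org/some-maskformer-checkpoint")
```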

```py
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

config = MaskFormerConfig(backbone="resnet50", use_timm_backbone=True, use_pretrained_backbone=True)
```
zucchini-nlp (Member Author):

use_timm_backbone is not really needed imo. We can infer whether the requested checkpoint is from timm or from HF by checking whether a repo with a valid config exists on the hub.

Deleted it as well, as a redundant arg.
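
A hypothetical sketch of that check (the helper name and exact logic are my assumptions, not the PR's actual code):

```py
from huggingface_hub import file_exists

def is_hf_backbone(repo_id: str) -> bool:
    # an HF checkpoint is expected to ship a transformers config.json;
    # otherwise the id can be treated as a timm model name instead
    return file_exists(repo_id, "config.json")
```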

Comment on lines 189 to 191
```diff
-self._out_features, self._out_indices = get_aligned_output_features_output_indices(
-    out_features=out_features, out_indices=out_indices, stage_names=self.stage_names
-)
+out_indices = list(out_indices) if out_indices is not None else None
+self._out_features, self._out_indices = out_features, out_indices
+self.align_output_features_output_indices()
```
zucchini-nlp (Member Author):

The feature-index alignment happens in the mixin when we call align_output_features_output_indices

Member:

Nit: can we set self._out_features and self._out_indices inside align_output_features_output_indices(out_features, out_indices)? Then backbone configs would only call self.align_output_features_output_indices(out_features, out_indices) instead of these three lines. It would also simplify the setters.

zucchini-nlp (Member Author):

sure, I actually added a set_output_features_output_indices as well so we can use that to "set+align" values
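
A hypothetical sketch of what that could look like, following the names used in this thread (the PR's actual implementation may differ):

```py
class BackboneConfigMixin:
    def set_output_features_output_indices(self, out_features, out_indices):
        # "set + align" in one call, so backbone configs don't need to
        # repeat the three lines shown in the diff above
        out_indices = list(out_indices) if out_indices is not None else None
        self._out_features, self._out_indices = out_features, out_indices
        self.align_output_features_output_indices()

    def align_output_features_output_indices(self):
        # placeholder: reconciles self._out_features with self._out_indices
        # against self.stage_names (the actual logic lives in the mixin)
        ...
```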

@zucchini-nlp changed the title from [WIP] Attempt at cleaning backbone utils to Attempt at cleaning backbone utils on Jan 27, 2026
@zucchini-nlp changed the title from Delete duplicate code in backbone utils to 🚨 Delete duplicate code in backbone utils on Feb 2, 2026
@zucchini-nlp (Member Author)

run-slow: auto, beit, bit, conditional_detr, convnext, convnextv2, d_fine, dab_detr, deformable_detr, depth_anything, detr, timm_backbone

@github-actions (Contributor) commented Feb 2, 2026

This comment contains run-slow, running the specified jobs:

models: ["models/auto", "models/beit", "models/bit", "models/conditional_detr", "models/convnext", "models/convnextv2", "models/d_fine", "models/dab_detr", "models/deformable_detr", "models/depth_anything", "models/detr", "models/timm_backbone"]
quantizations: []

@github-actions (Contributor) commented Feb 2, 2026

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit   | Description   |
|---------|----------|---------------|
| RUN     | 176e9614 | merge commit  |
| PR      | eb88a4c1 | branch commit |
| main    | e5c8b0d1 | base commit   |

✅ No failing test specific to this PR 🎉 👏 !

@Cyrilvallez (Member) commented Feb 3, 2026

Hey! Very nice initiative - indeed backbones have been quite annoying and have no real reason to exist in general (we can simply use the usual composite models)! Especially timm_backbone and its pretrained-weight loading, which does not work correctly with our from_pretrained.
In general, I believe we could completely remove timm_backbone in favor of timm_wrapper if we remove use_pretrained_weights anyway 🤗
And about

> use_pretrained_backbone - we don't load a pretrained backbone anymore unless the user is calling from_pretrained

I believe we need to remove it completely - otherwise the loading happens in __init__, and when from_pretrained is called, if those weights are not in the main HF repo (because they are assumed to live only in the timm repo), they will be considered missing and reinitialized... So we can fully remove it IMO and assume weights need to live in the HF repo, which will solve a long-overdue issue!

Comment on lines 6 to 10
```py
class BackboneConfigMixin(BackboneConfigMixin):
    warnings.warn(
        "Importing `BackboneConfigMixin` from `utils/backbone_utils.py` is deprecated and will be removed in "
        "Transformers v5.10. Import as `from transformers.modeling_backbone_utils import BackboneConfigMixin` instead.",
        FutureWarning,
```
Member:

I don't really mind the renaming in general, but not sure if it's really needed

zucchini-nlp (Member Author):

you mean moving it to a different place? It was hitting a circular import with PreTrainedConfig; atm I import it lazily in the new file so it doesn't get imported at the top

Member:

No I meant you renamed backbone_utils -> modeling_backbone_utils haha

zucchini-nlp (Member Author):

oh, that! No reason behind it, just a matter of taste. Can rename it back for sure

Member:

Yes, I believe it's a little easier since we want to deprecate the backbone API - it avoids maintaining yet another BC entry point!

"""
)
class BitBackbone(BitPreTrainedModel, BackboneMixin):
class BitBackbone(BackboneMixin, BitPreTrainedModel):
Member:

Why do we need to switch the order here? Cause of __init__?

zucchini-nlp (Member Author):

yeah, to init the backbone stuff first and then call BitPreTrainedModel.__init__ from within it
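
A minimal, self-contained sketch of the MRO behavior being described (stub classes, not the actual transformers ones):

```py
class BitPreTrainedModelStub:
    def __init__(self, config):
        print("model init")

class BackboneMixinStub:
    def __init__(self, config):
        print("backbone init first")  # backbone bookkeeping happens here
        super().__init__(config)      # next in MRO: BitPreTrainedModelStub

# mixin listed first, so its __init__ runs before the model base class
class BitBackboneStub(BackboneMixinStub, BitPreTrainedModelStub):
    pass

BitBackboneStub(config=None)  # prints "backbone init first", then "model init"
```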

@Cyrilvallez (Member)

In general we want to move away from the backbone API as much as possible in favor of standard composite models so this will help 🤗

@zucchini-nlp (Member Author)

> In general, we could completely remove timm_backbone in favor of timm_wrapper I believe if we remove the use_pretrained_weights anyway 🤗

100%, that is my goal for a subsequent PR. This one is already hard to manage with GH conflicts across models, so we first need to clean up the extra kwargs and keep it as timm_backbone for now

> I believe we need to remove it completely. So we can fully remove it IMO and assume weights need to live in the HF repo, which will solve a long-overdue issue!

It is already completely removed. We pop the kwarg and never use it; it's hardcoded as False in the timm backbone modeling file. Atm the slow tests are passing, which I believe means the weights are already in the HF repo for official releases

@Cyrilvallez (Member) left a comment

Alright, very nice! Super happy to gradually move away from the backbone API! Feel free to merge once CI is back to green (after the conflict-handling issues)

@zucchini-nlp (Member Author)

run-slow: auto, beit, bit, conditional_detr, convnext, convnextv2, d_fine, dab_detr, deformable_detr, depth_anything

@github-actions (Contributor) commented Feb 4, 2026

This comment contains run-slow, running the specified jobs:

models: ["models/auto", "models/beit", "models/bit", "models/conditional_detr", "models/convnext", "models/convnextv2", "models/d_fine", "models/dab_detr", "models/deformable_detr", "models/depth_anything"]
quantizations: []

@github-actions (Contributor) commented Feb 4, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, beit, bit, conditional_detr, convnext, convnextv2, d_fine, dab_detr, deformable_detr

@zucchini-nlp (Member Author)

run-slow: beit, bit, conditional_detr, convnext, convnextv2, d_fine, dab_detr, deformable_detr, depth_anything, timm_backbone, detr, pp_doclayout_v3, resnet, vitpose

@github-actions (Contributor) commented Feb 4, 2026

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit   | Description   |
|---------|----------|---------------|
| RUN     | 18774550 | merge commit  |
| PR      | ea9ac12d | branch commit |
| main    | 480ed54e | base commit   |

⚠️ No test being reported (jobs are skipped or cancelled)!

@github-actions (Contributor) commented Feb 4, 2026

This comment contains run-slow, running the specified jobs:

models: ["models/beit", "models/bit", "models/conditional_detr", "models/convnext", "models/convnextv2", "models/d_fine", "models/dab_detr", "models/deformable_detr", "models/depth_anything", "models/detr", "models/pp_doclayout_v3", "models/resnet", "models/timm_backbone", "models/vitpose"]
quantizations: []

@github-actions (Contributor) commented Feb 4, 2026

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit   | Description   |
|---------|----------|---------------|
| RUN     | 674802ce | merge commit  |
| PR      | 509ffc48 | branch commit |
| main    | 480ed54e | base commit   |

✅ No failing test specific to this PR 🎉 👏 !

@zucchini-nlp (Member Author)

Nice, the slow CI passes and the failing jobs are just flaky

@zucchini-nlp zucchini-nlp enabled auto-merge (squash) February 4, 2026 10:27
@zucchini-nlp zucchini-nlp merged commit 0c4fe5c into huggingface:main Feb 4, 2026
26 checks passed