🚨 Delete duplicate code in backbone utils #43323
zucchini-nlp merged 43 commits into huggingface:main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43323&sha=754b61
```diff
- config = MaskFormerConfig(backbone="microsoft/resnet-50", use_pretrained_backbone=True)
+ config = MaskFormerConfig(backbone="microsoft/resnet-50")
  model = MaskFormerForInstanceSegmentation(config)
```
IMO it doesn't serve much purpose to load a randomly initialized model with a pretrained backbone: the user still has to tune the model before it can be used anyway.
Therefore I deleted this feature. Pretrained weights are loaded with `from_pretrained` and random weights with `from_config`, the same way as for any other model.
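A minimal sketch of the two loading paths described above; the MaskFormer checkpoint name is only illustrative:

```py
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

# Random weights: instantiate from a config, nothing pretrained is pulled in for the backbone.
config = MaskFormerConfig()
model = MaskFormerForInstanceSegmentation(config)

# Pretrained weights: go through from_pretrained, exactly like any other model.
model = MaskFormerForInstanceSegmentation.from_pretrained("facebook/maskformer-swin-base-ade")
```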
```py
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

config = MaskFormerConfig(backbone="resnet50", use_timm_backbone=True, use_pretrained_backbone=True)
```
`use_timm_backbone` is not really needed IMO. We can infer whether the requested checkpoint is from timm or from HF by checking if a repo with a valid config exists on the Hub.
Deleted it as well, since it was a redundant arg.
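Purely as an illustration, one way such an inference could be implemented; the helper name and exact checks are assumptions, not necessarily what this PR does:

```py
import timm
from huggingface_hub import file_exists, repo_exists


def infer_backbone_source(backbone_id: str) -> str:
    """Guess whether a backbone identifier points to a HF Hub repo or a timm model."""
    # A Hub backbone is a repo that exists and ships a config.json we can load.
    if repo_exists(backbone_id) and file_exists(backbone_id, "config.json"):
        return "hub"
    # Otherwise fall back to timm's model registry.
    if timm.is_model(backbone_id):
        return "timm"
    raise ValueError(f"Could not resolve backbone {backbone_id!r} on the Hub or in timm.")
```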
```diff
- self._out_features, self._out_indices = get_aligned_output_features_output_indices(
-     out_features=out_features, out_indices=out_indices, stage_names=self.stage_names
- )
+ out_indices = list(out_indices) if out_indices is not None else None
+ self._out_features, self._out_indices = out_features, out_indices
+ self.align_output_features_output_indices()
```
The feature-index aligning now happens in the mixin, when we call `align_output_features_output_indices`.
Nit: can we set `self._out_features` and `self._out_indices` inside `align_output_features_output_indices(out_features, out_indices)`? Then backbone configs would only need to call `self.align_output_features_output_indices(out_features, out_indices)` instead of these three lines. It would also simplify the setters.
Sure, I actually added a `set_output_features_output_indices` as well, so we can use that to "set + align" values.
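A hedged sketch of what such a combined "set + align" helper could look like on the config mixin; the body is a guess based on this thread, reusing the pre-existing `get_aligned_output_features_output_indices` helper:

```py
from transformers.utils.backbone_utils import get_aligned_output_features_output_indices


class BackboneConfigMixin:
    def set_output_features_output_indices(self, out_features, out_indices):
        # Align the two representations against the stage names, then store both,
        # so backbone configs only need this single call instead of three lines.
        out_features, out_indices = get_aligned_output_features_output_indices(
            out_features=out_features, out_indices=out_indices, stage_names=self.stage_names
        )
        self._out_features = out_features
        self._out_indices = list(out_indices) if out_indices is not None else None
```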
run-slow: auto, beit, bit, conditional_detr, convnext, convnextv2, d_fine, dab_detr, deformable_detr, depth_anything, detr, timm_backbone

This comment contains models: ["models/auto", "models/beit", "models/bit", "models/conditional_detr", "models/convnext", "models/convnextv2", "models/d_fine", "models/dab_detr", "models/deformable_detr", "models/depth_anything", "models/detr", "models/timm_backbone"]
Hey! Very nice initiative, indeed backbones have been quite annoying and have no reason to exist at all in general (we can simply do usual composite models)! Especially the
I believe we need to remove it completely, otherwise the loading is being done in the
```py
class BackboneConfigMixin(BackboneConfigMixin):
    warnings.warn(
        "Importing `BackboneConfigMixin` from `utils/backbone_utils.py` is deprecated and will be removed in "
        "Transformers v5.10. Import as `from transformers.modeling_backbone_utils import BackboneConfigMixin` instead.",
        FutureWarning,
```
I don't really mind the renaming in general, but not sure if it's really needed
You mean moving it to a different place? It was hitting a circular import with `PreTrainedConfig`; atm I import it lazily in the new file so it doesn't get imported at the top.
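For context, this is the usual lazy-import pattern for breaking such a cycle; the function and class names below are placeholders, not the exact code in this PR:

```py
def _backbone_config_to_dict(backbone_config):
    # Importing the config base class at module level would create an import cycle
    # (configuration_utils imports the backbone utils and vice versa), so the import
    # is deferred to call time, once both modules are fully initialized.
    from transformers.configuration_utils import PretrainedConfig

    if isinstance(backbone_config, PretrainedConfig):
        return backbone_config.to_dict()
    return backbone_config  # already a plain dict (or None)
```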
No, I meant you renamed `backbone_utils` -> `modeling_backbone_utils` haha
Oh, that! No reason behind it, just a matter of taste. Can rename it back for sure.
Yes, I believe it's a little easier since we want to deprecate the backbone API anyway; it avoids maintaining yet another BC entry point!
| """ | ||
| ) | ||
| class BitBackbone(BitPreTrainedModel, BackboneMixin): | ||
| class BitBackbone(BackboneMixin, BitPreTrainedModel): |
There was a problem hiding this comment.
Why do we need to switch the order here? Because of `__init__`?
Yeah, to init the backbone stuff first and then call `BitPreTrainedModel.__init__` from within it.
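A toy illustration of the effect (stand-in classes, not the actual Bit code): with the mixin listed first in the bases, Python's MRO runs its `__init__` first, and it can then hand off to the model's `__init__` via `super()`.

```py
class PreTrainedModelStub:
    def __init__(self, config):
        print("model init")
        self.config = config


class BackboneMixinStub:
    def __init__(self, config):
        print("backbone init")    # runs first thanks to the MRO
        self.stage_names = ["stem", "stage1"]
        super().__init__(config)  # then delegates to PreTrainedModelStub.__init__


class Backbone(BackboneMixinStub, PreTrainedModelStub):
    pass


Backbone(config={})  # prints "backbone init" then "model init"
```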
In general we want to move away from the backbone API as much as possible in favor of standard composite models, so this will help 🤗
100%, that is my goal for the subsequent PR. This is already hard to manage with GH conflicts and models, so we need to first clean up extra kwargs and keep it as
It is already removed completely. We pop the kwarg and never use it; it's hardcoded as
Cyrilvallez left a comment
Alright very nice! Super happy to gradually move away from the backbone API! Feel free to merge once CI is back to green (after the conflict handling issues)
run-slow: auto, beit, bit, conditional_detr, convnext, convnextv2, d_fine, dab_detr, deformable_detr, depth_anything

This comment contains models: ["models/auto", "models/beit", "models/bit", "models/conditional_detr", "models/convnext", "models/convnextv2", "models/d_fine", "models/dab_detr", "models/deformable_detr", "models/depth_anything"]

[For maintainers] Suggested jobs to run (before merge)
run-slow: auto, beit, bit, conditional_detr, convnext, convnextv2, d_fine, dab_detr, deformable_detr
run-slow: beit, bit, conditional_detr, convnext, convnextv2, d_fine, dab_detr, deformable_detr, depth_anything, timm_backbone, detr, pp_doclayout_v3, resnet, vitpose

This comment contains models: ["models/beit", "models/bit", "models/conditional_detr", "models/convnext", "models/convnextv2", "models/d_fine", "models/dab_detr", "models/deformable_detr", "models/depth_anything", "models/detr", "models/pp_doclayout_v3", "models/resnet", "models/timm_backbone", "models/vitpose"]
Nice, slow CI passes and the failing ones are just flaky.
What does this PR do?

This PR cleans up backbone utilities. Specifically, we currently have 5 different config attributes to decide which backbone to load, most of which can be merged into one and seem redundant.

After this PR, we'll have only one, `config.backbone_config`, as a single source of truth. The models will load the backbone `from_config` and load pretrained weights only if the checkpoint has any weights saved. The overall idea is the same as in other composite models.

I removed these config attributes:

- `backbone` - the backbone model id is now used to create a `backbone_config` by loading it from the Hub or from timm
- `backbone_kwargs` - it is used to update `backbone_config` with user-provided kwargs (i.e. `backbone_config = CONFIG_MAPPING[model_type](**backbone_kwargs)`)
- `use_pretrained_backbone` - we don't load a pretrained backbone anymore unless the user is calling `from_pretrained`. The default is to initialize a model with random weights and let users either tune it from scratch or load pretrained weights themselves
- `use_timm_backbone` - we can infer the model type from the config and the requested backbone type, so this arg was redundant

Along the way, I also updated the tests and docs. A usage sketch of the resulting API is shown below. Recommended review path: `modeling_backbone_utils.py` -> `auto_factory.py` -> timm backbone model files -> a couple of other models of your choice
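A hedged sketch of how usage could look after this PR; the model and checkpoint names are illustrative, and the exact constructor kwargs may differ:

```py
from transformers import AutoConfig, MaskFormerConfig, MaskFormerForInstanceSegmentation

# `backbone_config` is the single source of truth for the backbone architecture.
backbone_config = AutoConfig.from_pretrained("microsoft/resnet-50", out_features=["stage4"])
config = MaskFormerConfig(backbone_config=backbone_config)

# Instantiating from the config gives random weights for the whole model, backbone included.
model = MaskFormerForInstanceSegmentation(config)

# Pretrained weights only come from from_pretrained, as with any other composite model.
model = MaskFormerForInstanceSegmentation.from_pretrained("facebook/maskformer-swin-base-ade")
```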