Attempt to fix VLM gradient enabling #41993
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
if vision_module is not None:
    for parameter in vision_module.parameters():
        parameter.requires_grad = True
From my understanding of peft#2880, the problem is mainly that the entry point of the model doesn't require gradients (not as a trainable parameter, just for gradient checkpointing), so targeting modules after that point doesn't work with reentrant gradient checkpointing. Isn't setting all vision parameters to requires_grad=True masking the changes done in enable_input_require_grads, so the check is always true regardless of what that helper function does? Maybe targeting something that is clearly not an input, something resembling an attention layer for example, would be better?
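For context, the usual workaround for that reentrant-checkpointing limitation is to force the output of the input embeddings to require grad. A minimal sketch of the pattern (the model name is only an example, and this is not the exact code from this PR):

```python
from transformers import AutoModelForCausalLM

# Minimal sketch: hook the input embeddings so the first activation requires grad,
# which lets checkpointed segments backpropagate even when all the early
# parameters (e.g. the embedding weights) stay frozen.
model = AutoModelForCausalLM.from_pretrained("gpt2")

def make_inputs_require_grads(module, inputs, output):
    output.requires_grad_(True)

hook = model.get_input_embeddings().register_forward_hook(make_inputs_require_grads)
model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": True})
# ... fine-tune ...
hook.remove()  # detach the hook once training is done
```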
I see, hmm. I followed the implementation of idefics2/smolvlm as I remembered they faced this issue at the time. You're right that this isn't necessary; we register twice. The lowest-module trick should work though, and I'm not sure targeting an attention layer works either. Currently @BenjaminBossan 's script outputs grad norms properly with gradient checkpointing enabled and PEFT disabled on this branch, so it seems to do the trick?
no GC:
{'loss': 9.4971, 'grad_norm': 23.421083450317383, 'learning_rate': 2e-05, 'epoch': 0.33}
{'loss': 7.9526, 'grad_norm': 675.1868896484375, 'learning_rate': 1.866666666666667e-05, 'epoch': 0.67}

with GC:
{'loss': 9.4971, 'grad_norm': 23.421083450317383, 'learning_rate': 2e-05, 'epoch': 0.33}
{'loss': 7.9526, 'grad_norm': 675.1868896484375, 'learning_rate': 1.866666666666667e-05, 'epoch': 0.67}

In either case: agree the double registering is useless, will remove!
Yeah, I think the implementation is fine. I'm just worried that the test is masking the behavior of the fix and is therefore not honest enough. Sorry if I didn't make that clear.
No that's fair, I'll revamp the test for a narrower scope!
I think this solution works only for VLMs and also depends a lot on how the vision model is named. I'm sure we listed all possible names, but new models can get creative with it.
So I'm thinking that we could potentially make it work OOTB for all MLLMs (audio/vision/omni) by checking for each PreTrainedModel within the model and then setting grads on that model's inputs (model.get_input_embeddings()).
We use a similar trick when setting attention implementations and checking for PreTrainedModel instances, so it could be a good option. WDYT?
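Roughly, that proposal could look like the sketch below (the helper names and the defensive try/except are assumptions, not the final PR code): walk every PreTrainedModel submodule and hook its own input embeddings, instead of guessing how the vision tower is named.

```python
from transformers import PreTrainedModel


def make_inputs_require_grads(module, inputs, output):
    output.requires_grad_(True)


def enable_grads_on_all_submodel_inputs(model):
    """Illustrative sketch: hook the input embeddings of every PreTrainedModel
    submodule (text, vision, audio, ...), deduplicated by embedding identity."""
    hooks, seen = [], set()
    for submodule in model.modules():  # includes `model` itself
        if not isinstance(submodule, PreTrainedModel):
            continue
        try:
            embeddings = submodule.get_input_embeddings()
        except NotImplementedError:  # some submodels don't expose input embeddings
            continue
        if embeddings is None or id(embeddings) in seen:
            continue
        seen.add(id(embeddings))
        hooks.append(embeddings.register_forward_hook(make_inputs_require_grads))
    return hooks
```

Deduplicating by id matters because nested wrappers (e.g. a text model inside a conditional-generation class) can return the same embedding module twice.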
Thanks, yes, it's a far less brittle option. There are a few (really a few, and hopefully zero after v5) modules that were just …
Sorry, I may misunderstand the proposed solution, but this doesn't seem to solve the problem? In a VLM where I target a module in the vision stack, I need the vision model's inputs to require grads, not the language model's input (…)
@githubnemo …
BTW, while working on something else I noticed that we have code like the below, which can be deleted after this PR: transformers/src/transformers/models/idefics2/modeling_idefics2.py, lines 1026 to 1030 at 80134e6.
Yes, for all idefics/smolvlm there'll be no need for that. Should ship that today (finally).
Iterated a bit on that and hit a dead end on idefics2/3 code, back at it tomorrow!
The failing tests appear unrelated (I rebased on main). What do you think of the new method, @zucchini-nlp? Also @githubnemo, I updated the test a tad, let me know.
zucchini-nlp
left a comment
Love the clean-up! Only one major question about tests: it would be super super cool to have a common test imo, though I realize it can be hard with multimodal models.
def enable_input_require_grads(self):
    """
    Enables the gradients for the input embeddings.
    This is useful for lora when using gradient checkpointing.
    c.f. https://github.com/huggingface/peft/issues/1402#issuecomment-1913675032
    """
nice 🔪
if hooks:
    # for BC
    self._require_grads_hook = hooks[0]
Aren't we ignoring all hooks except the first one in this case? I.e., when we disable, will it disable the text model hook but not the vision model hook?
I don't think so; this is just because we used to remove _require_grads_hook. Now we always iterate over the full list _require_grads_hooks (with an "s"), so every registered hook (vision, text, or whatever) should be removed.
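In other words, something shaped like this sketch (an illustrative free function, not the PR's method):

```python
def remove_all_require_grads_hooks(model):
    """Illustrative helper: remove every forward hook registered by
    enable_input_require_grads, assuming they are kept in a list attribute."""
    for hook in getattr(model, "_require_grads_hooks", []):
        hook.remove()
    model._require_grads_hooks = []
```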
ahh my bad, didn't see the "s" at the end
might be a bad naming then haha
for module in self.modules():
    if not (isinstance(module, PreTrainedModel) and hasattr(module, "get_input_embeddings")):
        continue

    input_embeddings = module.get_input_embeddings()

    if input_embeddings is None:
        continue

    embedding_id = id(input_embeddings)
    if embedding_id in seen_modules:
        continue

    seen_modules.add(embedding_id)
    hooks.append(input_embeddings.register_forward_hook(make_inputs_require_grads))
super clean!
self._require_grads_hooks = []
if hasattr(self, "_require_grads_hook"):
    del self._require_grads_hook
out of curiosity, is it required to explicitly delete?
Just out of safety. I'm not certain it's always necessary, but not knowing what people were doing with that hook in their FT scripts, I think it's safer to remove it so no reference remains.
def test_multi_gpu_data_parallel_forward(self):
    pass


def test_enable_input_require_grads_with_gradient_checkpointing(self):
I'm thinking, can we make a common test for all models?
eeeh... I think we should :D yes
will look before EOD if I have time
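For what it's worth, a common test could boil down to a check along these lines (the names, the reentrant flag, and the loss fallback are assumptions, not the test that landed in this PR): enable gradient checkpointing and input-require-grads, run one backward pass, and assert that gradients actually flow.

```python
def check_grads_flow_with_gradient_checkpointing(model, inputs):
    """Illustrative check, not the PR's test: with reentrant gradient
    checkpointing and enable_input_require_grads(), one backward pass should
    leave a gradient on at least one trainable parameter."""
    model.train()
    model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": True})
    model.enable_input_require_grads()

    outputs = model(**inputs)
    loss = getattr(outputs, "loss", None)
    if loss is None:
        loss = outputs.logits.float().mean()  # fallback when no labels are in `inputs`
    loss.backward()

    assert any(p.grad is not None for p in model.parameters() if p.requires_grad)
```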
ArthurZucker
left a comment
Very nice!
run-slow: bart, blip_2, idefics2, idefics3, qwen2_5_omni, qwen2_5_vl, qwen2_vl, qwen3_omni_moe, smolvlm, timm_wrapper
This comment contains models: ["models/bart", "models/blip_2", "models/idefics2", "models/idefics3", "models/qwen2_5_omni", "models/qwen2_5_vl", "models/qwen2_vl", "models/qwen3_omni_moe", "models/smolvlm", "models/timm_wrapper"]
CI Results: ✅ No failing test specific to this PR 🎉!
[For maintainers] Suggested jobs to run (before merge): run-slow: bart, blip_2, idefics2, idefics3, qwen2_5_omni, qwen2_5_vl, qwen2_vl, qwen3_omni_moe, smolvlm, timm_wrapper
ArthurZucker
left a comment
What does this PR do?
As per title. Linked to huggingface/peft#2880.
Follows more or less closely the already existing implementations for idefics2/3 and smolvlm, trying to cover several types of VLMs (they are named differently across the library).
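For illustration, the end-user flow this is meant to unblock looks roughly like the snippet below (the model id and LoRA targets are placeholders, not taken from the PR or the linked issue):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForImageTextToText

# Placeholder checkpoint and targets: the point is that LoRA on vision-tower
# modules combined with reentrant gradient checkpointing should now yield
# non-zero grad norms instead of silently producing no gradients.
model = AutoModelForImageTextToText.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": True})
model.enable_input_require_grads()

lora_config = LoraConfig(target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```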