
Decorate cache updates with no_grad, just in case #43897

Merged
ArthurZucker merged 2 commits into main from no_cache_grad on Feb 11, 2026


@Rocketknight1
Member

Although our cache update methods are usually called during inference, when gradients are already disabled, there seem to be some edge cases where they cause problems with compilation and gradient computation. Since we never want gradients to propagate through these steps, adding @no_grad decorators shouldn't hurt, and doing so fixes those edge cases.

Fixes #43010
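A minimal sketch of what the change amounts to, assuming a hypothetical static-style layer cache that writes new key/value states into preallocated buffers. ToyLayerCache and its update signature are illustrative only, not the transformers Cache API.

```python
import torch


class ToyLayerCache:
    """Hypothetical cache with preallocated buffers filled in place."""

    def __init__(self, batch, heads, max_len, head_dim):
        self.keys = torch.zeros(batch, heads, max_len, head_dim)
        self.values = torch.zeros(batch, heads, max_len, head_dim)
        self.seen = 0  # number of positions written so far

    @torch.no_grad()
    def update(self, key_states, value_states):
        # Autograd is disabled here, so the in-place writes below are not
        # recorded and the buffers never join the autograd graph.
        n = key_states.shape[-2]
        self.keys[:, :, self.seen:self.seen + n].copy_(key_states)
        self.values[:, :, self.seen:self.seen + n].copy_(value_states)
        self.seen += n
        # Return views covering only the filled positions.
        return self.keys[:, :, :self.seen], self.values[:, :, :self.seen]


cache = ToyLayerCache(batch=1, heads=2, max_len=8, head_dim=4)
k = torch.randn(1, 2, 3, 4, requires_grad=True)
v = torch.randn(1, 2, 3, 4, requires_grad=True)
keys, values = cache.update(k, v)
print(keys.shape)          # torch.Size([1, 2, 3, 4])
print(keys.requires_grad)  # False
```

Without the decorator, copying a tensor that requires grad into a slice of the buffer would be recorded by autograd (as a CopySlices op), pulling the cache buffer itself into the graph. Under torch.no_grad() the buffer and the returned views stay outside it.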

Rocketknight1 marked this pull request as ready for review on February 10, 2026 at 17:17
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: bamba, falcon_h1, falcon_mamba, granitemoehybrid, jamba, mamba, mamba2

@Rocketknight1
Member Author

Not sure who the right person to ping for generate/caches is now, so cc @remi-or @ArthurZucker @McPatate

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

ArthurZucker left a comment


ty

ArthurZucker merged commit 6c4710b into main on Feb 11, 2026
24 of 27 checks passed
ArthurZucker deleted the no_cache_grad branch on February 11, 2026 at 10:29
@Cyrilvallez
Member

Once again, the test fetcher bites us... This PR causes a few test_retain_grad_hidden_states_attentions and test_training_ci tests to fail.
cc @ArthurZucker @ydshieh @tarekziade

Rocketknight1 mentioned this pull request on Feb 11, 2026
Cyrilvallez pushed a commit that referenced this pull request Feb 11, 2026
* Remove the no_grad decorator when we actually create new tensors

* Just fully revert that PR
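The first commit message above points at where the decorator hurts: update methods that create new tensors. A minimal sketch, assuming a hypothetical dynamic-style cache that grows by torch.cat (the function names and the stand-in k_proj layer are illustrative, not transformers APIs). Under no_grad, the concatenated keys come back with requires_grad=False, so nothing downstream can backpropagate into the projection that produced them, which plausibly explains the gradient and training test failures reported earlier in the thread.

```python
import torch


def update_tracked(cached_keys, new_keys):
    # torch.cat builds a new tensor that stays connected to the autograd graph.
    return torch.cat([cached_keys, new_keys], dim=-2)


@torch.no_grad()
def update_untracked(cached_keys, new_keys):
    # Same concatenation, but autograd is disabled, so the result has no grad_fn.
    return torch.cat([cached_keys, new_keys], dim=-2)


k_proj = torch.nn.Linear(4, 4)       # stands in for a key projection layer
past_keys = torch.randn(1, 2, 5, 4)  # (batch, heads, seq_len, head_dim)
hidden = torch.randn(1, 2, 3, 4)

keys_tracked = update_tracked(past_keys, k_proj(hidden))
keys_untracked = update_untracked(past_keys, k_proj(hidden))

print(keys_tracked.requires_grad)    # True
print(keys_untracked.requires_grad)  # False: the path back to k_proj is cut
```

In-place writes into preallocated buffers and freshly allocated results behave differently here, which is the distinction the first commit tried to draw before the PR was fully reverted.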


Development

Successfully merging this pull request may close these issues.

Cache's (and Layer's) update(...) method to be decorated with @torch.no_grad

4 participants