
hooks/pyramid_attention_broadcast: fix redundant recompute at iteration 0 and free stale cache when outside timestep range#13467

Closed
GitGlimpse895 wants to merge 3 commits into huggingface:main from GitGlimpse895:fix/pab-cache-logic

Conversation


@GitGlimpse895 GitGlimpse895 commented Apr 14, 2026

What does this PR do?

Fixes two bugs in PyramidAttentionBroadcastHook.new_forward:

  1. Redundant `iteration == 0` condition: `self.state.cache is None` already
    covers the first-call case after every `reset_state`, making the extra guard
    dead code that creates a misleading impression of two independent invariants.

  2. Stale cache leaking GPU VRAM: when outside the active timestep range,
    the hook was still writing `self.state.cache = output`, holding a full
    hidden-state activation tensor on GPU until the next generation's
    `reset_state` call. For video transformers with dozens of PAB-hooked layers
    this accumulates hundreds of MBs of unreleased VRAM. The fix sets
    `self.state.cache = None` immediately when outside the range (see the
    sketch after this list).
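
For reviewers, here is a minimal sketch of the control flow after both fixes. This is illustrative, not the actual diffusers code: `PABHookSketch`, `PABState`, `in_range_fn`, and `original_forward` are stand-in names, and the skip heuristic is reduced to a plain modulo check.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class PABState:
    iteration: int = 0
    cache: Optional[Any] = None  # last attention output, kept for broadcast


class PABHookSketch:
    """Stand-in for the hook's new_forward logic (names are illustrative)."""

    def __init__(self, block_skip_range: int,
                 in_range_fn: Callable[[], bool],
                 original_forward: Callable[..., Any]):
        self.block_skip_range = block_skip_range
        self.in_range_fn = in_range_fn          # timestep-range check
        self.original_forward = original_forward
        self.state = PABState()

    def reset_state(self) -> None:
        self.state = PABState()                 # iteration=0, cache=None

    def new_forward(self, *args: Any, **kwargs: Any) -> Any:
        if not self.in_range_fn():
            # Fix 2: outside the active range, always recompute and drop the
            # cache so the stale activation tensor can be freed instead of
            # pinning VRAM until the next reset_state().
            output = self.original_forward(*args, **kwargs)
            self.state.cache = None
        elif self.state.cache is None or self.state.iteration % self.block_skip_range == 0:
            # Fix 1: `cache is None` alone covers the first call after
            # reset_state(), so a separate `iteration == 0` guard is dead code.
            output = self.original_forward(*args, **kwargs)
            self.state.cache = output
        else:
            output = self.state.cache           # broadcast the cached output
        self.state.iteration += 1
        return output
```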

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you read our philosophy doc (important for complex PRs)?
  • Was this discussed/approved via a GitHub issue or the forum?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@yiyixuxu @sayakpaul @DN6

@github-actions github-actions bot added hooks size/S PR with diff < 50 LOC labels Apr 14, 2026
@sayakpaul sayakpaul requested a review from DN6 April 14, 2026 04:53
@GitGlimpse895
Author

@sayakpaul @DN6 — Checking in on this! I've updated the branch to stay current with the base. Could you please take a look and also approve the workflows so the CI tests can run?

@sayakpaul
Member

@claude could you do a first round of reviews?


github-actions bot commented Apr 17, 2026

Claude finished @sayakpaul's task in 1m 27s.


Review of PR #13467

  • Read repository review guidelines
  • Read the PR diff
  • Read full file for context
  • Post review feedback

Thanks for the PR. I walked through the logic for both changes — here's my review.

Change 1: Removing self.state.iteration == 0

Correct. After reset_state() (lines 125–126), both iteration and cache are reset to 0 and None respectively. So at iteration 0, self.state.cache is None is always True, making self.state.iteration == 0 dead code. Good cleanup.
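
In other words, the reset amounts to this (a sketch of the relevant effect, assuming only the state fields named above):

```python
def reset_state(self):
    # After this runs, iteration == 0 always implies cache is None, so a
    # separate `iteration == 0` check in new_forward can never fire on its own.
    self.state.iteration = 0
    self.state.cache = None
```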

Change 2: Freeing cache when outside timestep range

Correct and well-reasoned. I traced the transition scenarios:

  • Outside→inside range: cache is None, so `self.state.cache is None` triggers compute → cache gets populated. ✓
  • Inside→outside range: not is_within_timestep_range triggers compute → cache set to None. ✓
  • Staying inside range: cache skip logic works as before, self.state.cache = output updates on compute. ✓
  • Staying outside range: every call computes fresh (because not is_within_timestep_range is always True), cache stays None. ✓

Also note: in the original else branch, output = self.state.cache followed by self.state.cache = output was a no-op, so removing the unconditional assignment is clean.
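
To make the no-op concrete, the pre-PR branch had roughly this shape (illustrative, not the exact diff):

```python
def pre_pr_branch(should_compute: bool, state, compute_fn):
    # Pre-PR shape (illustrative): the trailing write is redundant whenever
    # the else branch ran, because it stores back the object it just read.
    if should_compute:
        output = compute_fn()
    else:
        output = state.cache
    state.cache = output  # no-op in the else case
    return output
```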

Nit: missing blank line (style)

The diff removes a blank line at line 183 of the original, reducing the gap between the end of PyramidAttentionBroadcastHook and the module-level apply_pyramid_attention_broadcast function from two blank lines to one. PEP 8 / ruff E302 expects two blank lines before a top-level definition. Running make style should catch this — please run it before merge.
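
For reference, this is the spacing E302 asks for between the class and the module-level function (signatures here are illustrative):

```python
class PyramidAttentionBroadcastHook:
    def new_forward(self, module, *args, **kwargs): ...


def apply_pyramid_attention_broadcast(*args, **kwargs):  # two blank lines above
    ...
```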


Verdict

Logic is sound, VRAM improvement is real. One minor style fix needed (blank line). LGTM otherwise.


@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul
Member

Any reason why this was closed?

@GitGlimpse895
Author

@sayakpaul So sorry about that — I accidentally closed it myself while
trying to update the branch. My sincerest apologies for the confusion
and the noise! I've opened a new PR with the same fix here: #13497.

