hooks/pyramid_attention_broadcast: fix redundant recompute at iteration 0 and free stale cache when outside timestep range #13467
Conversation
@sayakpaul @DN6, checking in on this! I've updated the branch to stay current with the base. Could you please take a look and also approve the workflows so the CI tests can run?
@claude could you do a first round of reviews?
Claude finished @sayakpaul's task in 1m 27s. Review of PR #13467
Thanks for the PR. I walked through the logic for both changes; here's my review. Change 1: Removing
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Any reason why this was closed?
@sayakpaul So sorry about that, I accidentally closed it myself while
What does this PR do?
Fixes two bugs in `PyramidAttentionBroadcastHook.new_forward`:

1. Redundant `iteration == 0` condition: `self.state.cache is None` already covers the first-call case after every `reset_state`, making the extra guard dead code that creates a misleading impression of two independent invariants.
2. Stale cache leaking GPU VRAM: when outside the active timestep range, the hook was still writing `self.state.cache = output`, holding a full hidden-state activation tensor on GPU until the next generation's `reset_state` call. For video transformers with dozens of PAB-hooked layers this accumulates hundreds of MBs of unreleased VRAM. The fix sets `self.state.cache = None` immediately when outside the range.

Fixes # (issue)
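To make the two fixes concrete, here is a minimal, self-contained sketch of the corrected branch logic. This is not the actual diffusers implementation; the `PABHookSketch` class, its `State` container, and the `compute_fn` callback are simplified stand-ins for illustration, assuming the control flow described above.

```python
class State:
    """Per-hook state, mirroring the cache/iteration fields described above."""
    def __init__(self):
        self.cache = None
        self.iteration = 0

class PABHookSketch:
    """Hypothetical stand-in for PyramidAttentionBroadcastHook.new_forward."""
    def __init__(self, timestep_range):
        self.state = State()
        self.timestep_range = timestep_range  # (low, high), inclusive

    def new_forward(self, timestep, compute_fn):
        low, high = self.timestep_range
        in_range = low <= timestep <= high
        if not in_range:
            # Bug 2 fix: drop any stale cached activation immediately so the
            # tensor can be freed, instead of pinning VRAM until reset_state.
            self.state.cache = None
            self.state.iteration += 1
            return compute_fn()
        # Bug 1 fix: `cache is None` alone covers the first call after
        # reset_state; no redundant `iteration == 0` guard is needed.
        if self.state.cache is None:
            output = compute_fn()
            self.state.cache = output
        else:
            output = self.state.cache  # broadcast the cached output
        self.state.iteration += 1
        return output
```

A short usage walk-through: outside the range the hook always recomputes and keeps no cache; inside the range it computes once, caches, and reuses the cached output on subsequent calls.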
Before submitting
Who can review?
@yiyixuxu @sayakpaul @DN6