[ET-VK] Lower reduce_peak_memory threshold from 500 MB to 10 MB #18816
SS-JIA wants to merge 1 commit into `gh/SS-JIA/519/base`
During prepack, staging buffers accumulate in `buffers_to_clear_` until `flush()` is called. Previously, the `reduce_peak_memory` path (which calls `submit_and_wait` + `flush` to free staging buffers incrementally) was only taken when the total constant data exceeded 500 MB. As a result, models with moderate weight sizes (e.g. 42 MB) never benefited from incremental cleanup: all of their staging buffers coexisted in memory until the final flush.

Lowering the threshold to 10 MB enables incremental staging buffer cleanup for any model with more than 10 MB of constant data, which covers most models. On SceneX V9 FP16 (42 MB weights, Samsung S24, Adreno 750), this reduces the transient VMA peak during prepack from 89.6 MB to 57.3 MB (-36%), at a cost of ~15 ms of additional load latency (+4.4%). Steady-state memory and inference performance are unaffected.

Authored with Claude.

Differential Revision: [D100332227](https://our.internmc.facebook.com/intern/diff/D100332227/)

[ghstack-poisoned]
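To make the trade-off concrete, here is a toy Python simulation of staging-buffer lifetime during prepack. It is a minimal sketch, not the ET-VK implementation: `PrepackSim`, `batch_threshold`, the 8 MB batch size, and the uniform 2 MB weight sizes are all hypothetical. A plain list stands in for `buffers_to_clear_`, and `_flush()` only models the effect of `submit_and_wait` + `flush` rather than calling any Vulkan API. The sketch tracks staging bytes only, so its numbers are not comparable to the measured VMA peaks, which also include the persistent weight allocations.

```python
# Toy model of staging-buffer lifetime during prepack.
# Hypothetical sketch: names and sizes are illustrative, not ET-VK's real code.
from dataclasses import dataclass, field
from typing import List

MB = 1024 * 1024


@dataclass
class PrepackSim:
    # Gate for the incremental-cleanup path (this PR lowers 500 MB -> 10 MB).
    reduce_peak_memory_threshold: int = 10 * MB
    # Hypothetical: flush once more than this many staging bytes are pending.
    batch_threshold: int = 8 * MB
    # Stands in for buffers_to_clear_: staging buffers awaiting flush().
    pending: List[int] = field(default_factory=list)
    live_bytes: int = 0
    peak_bytes: int = 0

    def _stage(self, nbytes: int) -> None:
        # A staging buffer stays allocated until the next flush.
        self.pending.append(nbytes)
        self.live_bytes += nbytes
        self.peak_bytes = max(self.peak_bytes, self.live_bytes)

    def _flush(self) -> None:
        # Models submit_and_wait() + flush(): copies are done, staging memory is freed.
        self.live_bytes -= sum(self.pending)
        self.pending.clear()

    def run(self, weight_sizes: List[int]) -> int:
        """Returns the peak number of live staging bytes during prepack."""
        reduce_peak_memory = sum(weight_sizes) > self.reduce_peak_memory_threshold
        staged_since_flush = 0
        for nbytes in weight_sizes:
            if reduce_peak_memory and staged_since_flush > self.batch_threshold:
                self._flush()  # incremental cleanup path
                staged_since_flush = 0
            self._stage(nbytes)
            staged_since_flush += nbytes
        self._flush()  # final flush at the end of prepack
        return self.peak_bytes


weights = [2 * MB] * 21  # 42 MB of constants in 2 MB tensors (illustrative)
old = PrepackSim(reduce_peak_memory_threshold=500 * MB).run(weights)
new = PrepackSim(reduce_peak_memory_threshold=10 * MB).run(weights)
print(old // MB, new // MB)  # prints: 42 10
```

With the old 500 MB gate, the 42 MB model never takes the incremental path, so every staging buffer is alive at once; with the 10 MB gate, live staging memory stays near one batch. The price, presumably the source of the extra load latency, is that each incremental flush blocks the CPU until the GPU has finished the pending copies.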
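The headline percentages follow from simple arithmetic. The second line assumes the +4.4% is relative to the previous load latency, so the implied baseline of roughly 340 ms is an estimate derived from the rounded figures in the description, not a measurement.

```math
\begin{aligned}
\Delta_{\text{peak}} &= \frac{57.3 - 89.6}{89.6} = \frac{-32.3}{89.6} \approx -0.360 \;(-36\%) \\
t_{\text{load, before}} &\approx \frac{15\ \text{ms}}{0.044} \approx 341\ \text{ms}
\end{aligned}
```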
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18816

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV
There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 2 Pending, 3 Unrelated Failures
As of commit a0e7458 with merge base 930ecfd:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 365456277
Pull Request resolved: #18816