[CUDA graphs] Make sure graph mempool cudaMalloc_count decrement pairs with cudaFree for all allocations #61567
Conversation
💊 CI failures summary and remediations, as of commit ef19382 (more details on the Dr. CI page and at hud.pytorch.org/pr/61567):

🕵️ 3 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages:

| Job | Step |
|---|---|
| pytorch_xla_linux_bionic_py3_6_clang9_build | (Optional) Merge target branch |
| Lint / quick-checks | Ensure no trailing spaces |
| Lint / shellcheck | Assert that regenerating the workflows didn't change them |
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Graph mempools aren't deleted until all their allocations have been cudaFreed. `PrivatePool::cudaMalloc_count` tracks the number of outstanding (not-yet-cudaFreed) allocations.

#44742 moved `cudaFree` into `release_block`, while the `cudaMalloc_count` decrement (if needed) remains in a caller, `release_blocks`. But I noticed there's also a path (`release_available_cached_blocks`) that calls `release_block` without calling `release_blocks`; in other words, it calls `cudaFree` but dodges any potential `cudaMalloc_count` decrement, as sketched below.
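To make the two call paths concrete, here's a minimal sketch of the pre-fix shape. The types, signatures, and bodies are simplified assumptions for illustration, not the actual allocator code; only the names `PrivatePool`, `cudaMalloc_count`, `release_block`, `release_blocks`, and `release_available_cached_blocks` come from the source.

```cpp
// Simplified sketch (not the real CUDACachingAllocator): after #44742,
// cudaFree lives in release_block, but the cudaMalloc_count decrement
// lives only in one caller, release_blocks.
#include <cuda_runtime.h>
#include <vector>

struct PrivatePool {
  int cudaMalloc_count = 0;  // outstanding (not-yet-cudaFreed) allocations
};

struct Block {
  void* ptr = nullptr;
  PrivatePool* pool = nullptr;  // non-null for graph-private allocations
};

void release_block(Block* block) {
  cudaFree(block->ptr);  // physical free, but no bookkeeping here
  delete block;
}

void release_blocks(std::vector<Block*>& blocks) {
  for (Block* block : blocks) {
    PrivatePool* pool = block->pool;  // saved before the block is deleted
    release_block(block);
    if (pool) {
      pool->cudaMalloc_count--;  // decrement happens only on this path
    }
  }
  blocks.clear();
}

// The second path: frees a single cached block without going through
// release_blocks, so cudaFree runs but the decrement is skipped.
void release_available_cached_blocks(Block* candidate) {
  release_block(candidate);  // cudaMalloc_count is never decremented
}
```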
In practice, the way the code is currently organized, I don't think this second path can cause the pool to become a zombie whose `cudaMalloc_count` never reaches zero: that could only happen if `release_available_cached_blocks` were called on a private pool, the only way it's called on a private pool is if capture is underway, and if capture is underway the `cudaFree` call will hard-error. Regardless, I feel much more comfortable keeping the `cudaMalloc_count` decrement right next to the `cudaFree`.
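For illustration, here's what that pairing looks like, again as a hedged sketch under the same simplified types rather than the actual diff:

```cpp
// Simplified sketch of the fixed pairing: every physical free goes
// through release_block, and the bookkeeping lives in the same function,
// so no call path can cudaFree a private-pool block without decrementing
// cudaMalloc_count.
#include <cuda_runtime.h>

struct PrivatePool {
  int cudaMalloc_count = 0;
};

struct Block {
  void* ptr = nullptr;
  PrivatePool* pool = nullptr;
};

void release_block(Block* block) {
  cudaFree(block->ptr);
  if (block->pool) {
    block->pool->cudaMalloc_count--;  // decrement pairs with the cudaFree above
  }
  delete block;
}

// release_blocks and release_available_cached_blocks can now both call
// release_block without duplicating, or forgetting, the counter logic.
```

With this shape, the invariant "one decrement per cudaFree of a private-pool block" holds by construction, regardless of which caller triggers the free.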