CUDA CachingHostAllocator tracks registrations to call correct free #146520
Conversation
Users may change the allocator config at will; the torch unit tests do this. However, allocations made via cudaHostRegister must be released with the corresponding cudaHostUnregister, and similarly cudaHostAlloc allocations must be released with cudaFreeHost.
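To make the idea concrete, here is a minimal, hypothetical sketch (not the actual CachingHostAllocator implementation) of how an allocator might record which API pinned each host block so the matching release call is used, regardless of what the config says at free time:

```C++
// Hypothetical sketch: each cached host block remembers how it was pinned,
// so the matching free/unregister API is called even if the allocator
// config changes later. Names (HostBlock, HostBlockTable) are illustrative.
#include <cassert>
#include <cstdlib>
#include <unordered_map>
#include <cuda_runtime.h>

struct HostBlock {
  void* ptr;
  bool registered;  // true: malloc + cudaHostRegister; false: cudaHostAlloc
};

class HostBlockTable {
 public:
  void* allocate(size_t size, bool use_register) {
    void* ptr = nullptr;
    if (use_register) {
      ptr = std::malloc(size);
      assert(cudaSuccess == cudaHostRegister(ptr, size, cudaHostRegisterDefault));
    } else {
      assert(cudaSuccess == cudaHostAlloc(&ptr, size, cudaHostAllocDefault));
    }
    blocks_[ptr] = HostBlock{ptr, use_register};
    return ptr;
  }

  void free(void* ptr) {
    auto it = blocks_.find(ptr);
    assert(it != blocks_.end());
    if (it->second.registered) {
      // Pinned via cudaHostRegister: unregister, then free the host memory.
      assert(cudaSuccess == cudaHostUnregister(ptr));
      std::free(ptr);
    } else {
      // Allocated via cudaHostAlloc: must be released with cudaFreeHost.
      assert(cudaSuccess == cudaFreeHost(ptr));
    }
    blocks_.erase(it);
  }

 private:
  std::unordered_map<void*, HostBlock> blocks_;
};
```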
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146520
Note: Links to docs will display an error until the docs builds have been completed.
⏳ 1 Pending, 2 Unrelated Failures as of commit b6ad576 with merge base fa0fdc0. UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Notably, ROCm PyTorch would fail to run test_cuda.py as a single module without sharding. The root cause was a sequence of tests changing the allocator config, which resulted in the host allocator's empty_cache() segfaulting because memory allocated with hipHostMalloc() was later released with hipHostUnregister().
Alternative approaches would be to have the host caching allocator empty itself whenever the allocator config changes, or to have the unit tests empty the cache to ensure a consistent state; a sketch of the first alternative follows.
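A minimal, hypothetical sketch of that first alternative (flushing the cache when the config changes), assuming an illustrative AllocatorConfig type and set_config hook that are not PyTorch's actual internals:

```C++
// Hypothetical sketch: drop all cached blocks before the pinning policy
// changes, so no block can outlive the config that produced it.
struct AllocatorConfig {
  bool pin_with_register = false;  // cudaHostRegister vs. cudaHostAlloc
};

class CachingHostAllocator {
 public:
  void set_config(const AllocatorConfig& new_config) {
    empty_cache();         // free cached blocks using their matching API
    config_ = new_config;  // subsequent allocations follow the new policy
  }
  void empty_cache() { /* release every cached block with the API that created it */ }

 private:
  AllocatorConfig config_;
};
```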
@zdevito can I get a review and an opinion on this approach vs. the others suggested?
@mikaylagawarecki / @zdevito ping -- still waiting for a review
@ngimel perhaps I could get your opinion on this PR, since it affects CUDA too?
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Rebase failed due to Command
Raised by https://github.com/pytorch/pytorch/actions/runs/14207074188
Good catch!
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…ytorch#146520)

Allocations using cudaHostRegister should use the corresponding cudaHostUnregister, and similarly for cudaHostAlloc / cudaFreeHost. In test_cuda.py, the allocator config changes from test to test but the cache is not emptied prior to changing the config. This results in the wrong free being called later. Unit test sharding avoids this issue, but running test_cuda.py as a single shard will fail.

The following reproducer demonstrates the problem.

```C++
#include <cassert>
#include <cstdlib>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
  void *ptr;
  assert(cudaSuccess == cudaHostAlloc(&ptr, 1024, cudaHostAllocDefault));
  assert(cudaSuccess == cudaHostUnregister(ptr));
  std::free(ptr);
  return 0;
}
```

The above code results in the following failure because ptr is an invalid argument to cudaHostUnregister.

```
a.out: test.cpp:53: int main(int, char**): Assertion `cudaSuccess == cudaHostUnregister(ptr)' failed.
```

Pull Request resolved: pytorch#146520
Approved by: https://github.com/ngimel
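For contrast, a minimal sketch of the correctly paired calls, assuming the same CUDA runtime APIs as the reproducer above: memory from cudaHostAlloc is released with cudaFreeHost, while memory pinned with cudaHostRegister is unpinned with cudaHostUnregister and then freed by the allocator that produced it.

```C++
#include <cassert>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
  // Case 1: cudaHostAlloc pairs with cudaFreeHost.
  void *p1 = nullptr;
  assert(cudaSuccess == cudaHostAlloc(&p1, 1024, cudaHostAllocDefault));
  assert(cudaSuccess == cudaFreeHost(p1));

  // Case 2: malloc + cudaHostRegister pairs with cudaHostUnregister + free.
  void *p2 = std::malloc(1024);
  assert(cudaSuccess == cudaHostRegister(p2, 1024, cudaHostRegisterDefault));
  assert(cudaSuccess == cudaHostUnregister(p2));
  std::free(p2);
  return 0;
}
```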