
Conversation

@jeffdaily (Collaborator) commented Feb 5, 2025

Allocations made with cudaHostRegister should be released with the corresponding cudaHostUnregister, and similarly for cudaHostAlloc / cudaFreeHost. In test_cuda.py, the allocator config changes from test to test, but the cache is not emptied before the config changes. This results in the wrong free being called later. Unit test sharding happens to hide this issue, but running test_cuda.py as a single shard will fail.

The following reproducer demonstrates the problem.

```C++
#include <cassert>
#include <cstdlib>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    void *ptr;
    assert(cudaSuccess == cudaHostAlloc(&ptr, 1024, cudaHostAllocDefault));
    // Mismatched free path: memory from cudaHostAlloc must be released with cudaFreeHost.
    assert(cudaSuccess == cudaHostUnregister(ptr));
    std::free(ptr);
    return 0;
}
```

The above code fails because ptr was never registered with cudaHostRegister and is therefore an invalid argument to cudaHostUnregister:

```
a.out: test.cpp:53: int main(int, char**): Assertion `cudaSuccess == cudaHostUnregister(ptr)' failed.
```
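For contrast, here is a minimal sketch of the two valid pairings (illustrative only, not code from this PR; error handling is reduced to asserts):

```C++
#include <cassert>
#include <cstdlib>
#include <cuda_runtime.h>

int main()
{
    // Pairing 1: cudaHostAlloc / cudaFreeHost.
    void *a = nullptr;
    assert(cudaSuccess == cudaHostAlloc(&a, 1024, cudaHostAllocDefault));
    assert(cudaSuccess == cudaFreeHost(a));

    // Pairing 2: malloc + cudaHostRegister / cudaHostUnregister + free.
    void *b = std::malloc(1024);
    assert(cudaSuccess == cudaHostRegister(b, 1024, cudaHostRegisterDefault));
    assert(cudaSuccess == cudaHostUnregister(b));
    std::free(b);
    return 0;
}
```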

Users may change the allocator config at will; the torch unit tests do this. However, allocations using cudaHostRegister should use the corresponding cudaHostUnregister, and similarly for cudaHostAlloc / cudaFreeHost.

pytorch-bot bot commented Feb 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146520

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 1 Pending, 2 Unrelated Failures

As of commit b6ad576 with merge base fa0fdc0:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jeffdaily (Collaborator Author)

Notably, ROCm PyTorch would fail to run test_cuda.py as a single module without sharding. The root cause was a sequence of tests changing the allocator config, which resulted in the host allocator's empty_cache() seg faulting due to allocations made with hipHostMalloc() later being freed with hipHostUnregister().
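Per the PR title, the fix is to track how each block was allocated so the matching free is always called. Below is a minimal sketch of that idea, using a hypothetical HostBlock record and helper functions; it is not the actual CachingHostAllocator implementation, and it omits error handling and caching.

```C++
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical bookkeeping: remember how each pinned block was created so the
// matching free path is used even if the allocator config changes later.
struct HostBlock {
  void *ptr = nullptr;
  size_t size = 0;
  bool registered = false;  // true: malloc + cudaHostRegister, false: cudaHostAlloc
};

HostBlock allocate_pinned(size_t size, bool use_host_register) {
  HostBlock block;
  block.size = size;
  block.registered = use_host_register;
  if (use_host_register) {
    block.ptr = std::malloc(size);
    cudaHostRegister(block.ptr, size, cudaHostRegisterDefault);
  } else {
    cudaHostAlloc(&block.ptr, size, cudaHostAllocDefault);
  }
  return block;
}

void free_pinned(const HostBlock &block) {
  // Free according to how the block was allocated, not the current config.
  if (block.registered) {
    cudaHostUnregister(block.ptr);
    std::free(block.ptr);
  } else {
    cudaFreeHost(block.ptr);
  }
}
```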

@jeffdaily added the "topic: not user facing" label Feb 5, 2025
@mikaylagawarecki added the "triaged" label Feb 7, 2025
@jeffdaily (Collaborator Author)

Alternative approaches would be to have the host caching allocator empty itself whenever the allocator config changes, or to have the unit tests empty the cache to ensure a consistent state.
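For illustration, a minimal sketch of the first alternative, using a hypothetical HostCachingAllocator interface; the names and the set_config() hook are placeholders, not the real ATen or allocator-config API.

```C++
#include <string>

// Hypothetical interface; names are placeholders, not the actual
// CachingHostAllocator or allocator-config API.
struct HostCachingAllocator {
  // Release every cached block using the free path recorded when it was allocated.
  void empty_cache() { /* ... */ }

  // Alternative: flush the cache whenever the allocator config changes, so that
  // no block allocated under the old config is ever freed under the new one.
  void set_config(const std::string& new_config) {
    empty_cache();
    config_ = new_config;
  }

 private:
  std::string config_;
};
```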

@jeffdaily (Collaborator Author)

@zdevito can I get a review and opinion on this approach vs others suggested?

@jeffdaily (Collaborator Author)

@mikaylagawarecki / @zdevito ping -- still waiting for a review

@pruthvistony added the "rocm", "rocm priority", "ciflow/rocm", and "ciflow/rocm-mi300" labels Mar 21, 2025
@jeffdaily (Collaborator Author)

@ngimel could I perhaps get your opinion on this PR, since it affects CUDA too?

@jeffdaily (Collaborator Author)

@pytorchbot rebase

@pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot (Collaborator)

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/146520/head returned non-zero exit code 1

Rebasing (1/1)
Auto-merging aten/src/ATen/cuda/CachingHostAllocator.cpp
CONFLICT (content): Merge conflict in aten/src/ATen/cuda/CachingHostAllocator.cpp
error: could not apply 6ff673ddd1c... CUDA CachingHostAllocator tracks registrations to call correct free
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 6ff673ddd1c... CUDA CachingHostAllocator tracks registrations to call correct free

Raised by https://github.com/pytorch/pytorch/actions/runs/14207074188

@ngimel (Collaborator) left a comment


Good catch!

@jeffdaily (Collaborator Author)

@pytorchbot merge

@pytorch-bot bot added the "ciflow/trunk" label Apr 3, 2025
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

timocafe pushed a commit to timocafe/pytorch that referenced this pull request Apr 16, 2025

CUDA CachingHostAllocator tracks registrations to call correct free (pytorch#146520)
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025

CUDA CachingHostAllocator tracks registrations to call correct free (pytorch#146520)

Labels

ciflow/rocm - Trigger "default" config CI on ROCm
ciflow/rocm-mi300 - Trigger "default" config CI on ROCm MI300
ciflow/trunk - Trigger trunk jobs on your pull request
Merged
open source
rocm priority - high priority ROCm PRs from performance or other aspects
rocm - This tag is for PRs from ROCm team
topic: not user facing - topic category
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module


6 participants