
[REVIEW] Allow construction of cuda_async_memory_resource from existing pool #889

Merged: 10 commits merged into rapidsai:branch-22.04 on Mar 23, 2022

Conversation

@fkallen (Contributor) commented on Oct 11, 2021:

Adds a new MR type, cuda_async_view_memory_resource, with a constructor cuda_async_view_memory_resource(cudaMemPool_t valid_pool_handle). The memory resource uses this pool for allocation and deallocation instead of managing its own pool.

Refactors cuda_async_memory_resource to hold an instance of the above, constructed with a cudaMemPool_t that the outer resource owns.
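
A minimal usage sketch of the view resource this PR adds (not taken from the PR itself), assuming CUDA 11.2+ and the rmm::mr::cuda_async_view_memory_resource API as merged; the pool properties shown are illustrative:

#include <rmm/mr/device/cuda_async_view_memory_resource.hpp>

#include <cuda_runtime_api.h>

int main()
{
  // Create a CUDA pool that we (the caller) own.
  cudaMemPoolProps props{};
  props.allocType     = cudaMemAllocationTypePinned;
  props.handleTypes   = cudaMemHandleTypeNone;
  props.location.type = cudaMemLocationTypeDevice;
  props.location.id   = 0;  // pool lives on device 0
  cudaMemPool_t pool{};
  cudaMemPoolCreate(&pool, &props);

  {
    // The view allocates and deallocates from our pool but never owns it.
    rmm::mr::cuda_async_view_memory_resource mr{pool};
    void* p = mr.allocate(1024);
    mr.deallocate(p, 1024);
  }  // destroying the view does not destroy the pool

  cudaMemPoolDestroy(pool);  // ownership stays with the caller
  return 0;
}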

@fkallen requested reviews from a team (as code owner) and from @rongou on October 11, 2021.
@GPUtester (Contributor) commented:

Can one of the admins verify this patch?

(1 similar comment)

The github-actions bot added the cpp (Pertains to C++ code) label on Oct 11, 2021.
@jrhemstad (Contributor) commented:

This makes me uncomfortable. I don't like having a type that is "sometimes owning, sometimes not owning".

I'd rather see a new resource for wrapping an existing cudaMemPool_t.

@harrism (Member) left a review comment:

Mostly doc changes.

Two review threads on include/rmm/mr/device/cuda_async_memory_resource.hpp (outdated, resolved).
@harrism added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels on Oct 11, 2021.
@harrism added this to PR-WIP in the v21.12 Release project via automation on Oct 11, 2021.
@harrism (Member) left a review comment:

Discussed this with @jrhemstad and we don't think we should have classes with unclear ownership of resources like this. Perhaps this class should always own its pool. So if you pass in an existing pool, this class should take ownership, and destroy the pool in its destructor. But we may still need a version of async_memory_resource that does not own its pool, and if so, then maybe that should be a different class. I'm interested in discussion on this point.

v21.12 Release automation moved this from PR-WIP to PR-Needs review Oct 11, 2021
@fkallen (Contributor, Author) commented on Oct 18, 2021:

I see, that's a good point. The owning resource could be the cuda_async_memory_resource from this PR with the is_owner_of_pool_ flag removed; it would take ownership of the pool passed to its constructor. One has to check that that pool is not the default pool of the current device, since a default pool cannot be destroyed via cudaMemPoolDestroy.

Do you have a name in mind for the non-owning class? How about cuda_async_non_owning_memory_resource? The non-owning class could also be used with the default memory pool.
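
A short sketch (not from the PR) of that default-pool use case, written against the view type eventually adopted later in this thread; assumes CUDA 11.2+ and device 0:

#include <rmm/mr/device/cuda_async_view_memory_resource.hpp>

#include <cuda_runtime_api.h>

void use_default_pool()
{
  // Fetch the device's default pool; it must never be passed to
  // cudaMemPoolDestroy, so only a non-owning resource can wrap it safely.
  cudaMemPool_t default_pool{};
  cudaDeviceGetDefaultMemPool(&default_pool, /*device=*/0);

  rmm::mr::cuda_async_view_memory_resource mr{default_pool};
  void* p = mr.allocate(256);
  mr.deallocate(p, 256);
}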

@jrhemstad (Contributor) commented:

> Do you have a name in mind for the non-owning class? How about cuda_async_non_owning_memory_resource?

Hm, in some sense a resource that wraps an existing cudaMemPool_t is kind of like a resource adaptor.

The difference is the "upstream" isn't a device_memory_resource, but the CUDA pool represented by the cudaMemPool_t.

So we could call it cuda_pool_adaptor, but that could be misleading, as all the other adaptor types adapt a device_memory_resource upstream.

cuda_pool_wrapper could be a good pick, as it's still descriptive and shouldn't be confused with the other adaptor types.

Also, we should refactor the current cuda_async_memory_resource to be implemented in terms of the cuda_pool_wrapper type, e.g.:

class cuda_async_memory_resource final : public device_memory_resource {
  cuda_pool_wrapper pool_;

 public:
  cuda_async_memory_resource(thrust::optional<std::size_t> initial_pool_size = {},
                             thrust::optional<std::size_t> release_threshold = {})
  {
    ...
    cudaMemPool_t cuda_pool_handle{};
    RMM_CUDA_TRY(cudaMemPoolCreate(&cuda_pool_handle, &pool_props));
    pool_ = cuda_pool_wrapper(cuda_pool_handle);
    ...
  }

  void* do_allocate(std::size_t bytes, rmm::cuda_stream_view stream) override
  {
#ifdef RMM_CUDA_MALLOC_ASYNC_SUPPORT
    return pool_.allocate(bytes, stream);
#else
    (void)bytes;
    (void)stream;
    return nullptr;
#endif
  }
  ...
};

@harrism (Member) commented on Oct 19, 2021:

Another approach could be to have an owning cuda_pool type that is simply a RAII wrapper for cudaMemPool_t, and a non-owning cuda_pool_view type that can be constructed either from a cuda_pool or a cudaMemPool_t. Then have cuda_async_memory_resource take a cuda_pool_view and never own the pool.
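
A hypothetical sketch of that alternative (these types were never added to RMM; all names here are illustrative only):

#include <cuda_runtime_api.h>

// Owning RAII wrapper: creates the pool on construction, destroys it on destruction.
class cuda_pool {
 public:
  explicit cuda_pool(cudaMemPoolProps const& props) { cudaMemPoolCreate(&pool_, &props); }
  ~cuda_pool() { cudaMemPoolDestroy(pool_); }
  cuda_pool(cuda_pool const&)            = delete;
  cuda_pool& operator=(cuda_pool const&) = delete;
  cudaMemPool_t handle() const noexcept { return pool_; }

 private:
  cudaMemPool_t pool_{};
};

// Non-owning view: constructible from a cuda_pool or a raw cudaMemPool_t.
class cuda_pool_view {
 public:
  cuda_pool_view(cudaMemPool_t pool) : pool_{pool} {}
  cuda_pool_view(cuda_pool const& pool) : pool_{pool.handle()} {}
  cudaMemPool_t handle() const noexcept { return pool_; }

 private:
  cudaMemPool_t pool_{};
};

Under this design cuda_async_memory_resource would store only a cuda_pool_view, and keeping the underlying cuda_pool alive would be the caller's job, which is the objection raised in the next comment.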

@jrhemstad (Contributor) commented:

> Another approach could be to have an owning cuda_pool type that is simply a RAII wrapper for cudaMemPool_t, and a non-owning cuda_pool_view type that can be constructed either from a cuda_pool or a cudaMemPool_t. Then have cuda_async_memory_resource take a cuda_pool_view and never own the pool.

I don't think that's the direction we want to go because that introduces another object that a caller would have to keep alive outside of the usual device_memory_resource hierarchy.

@fkallen (Contributor, Author) commented on Oct 19, 2021:

I have implemented the cuda_pool_wrapper idea.

Do you think cuda_async_mr should check that the pool it is given ownership of is not a default pool (since a default pool cannot be destroyed), or should that be left to the user? If we do check, the current check is not sufficient: the handle could still be the default pool of a different device that has access enabled for the current device.

Should there be a requirement on the device location of the pool? Does it have to be the same device on which the memory resource is used?
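
A sketch of the kind of check under discussion, with a hypothetical helper name; as noted above, comparing against the current device's default pool alone is not sufficient in the multi-device case:

#include <cuda_runtime_api.h>

// Hypothetical helper: true iff `pool` is the default pool of the current device.
bool is_default_pool_of_current_device(cudaMemPool_t pool)
{
  int device{};
  cudaGetDevice(&device);
  cudaMemPool_t default_pool{};
  cudaDeviceGetDefaultMemPool(&default_pool, device);
  return pool == default_pool;  // misses default pools of *other* devices
}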

* @param valid_pool_handle Handle to a CUDA memory pool which will be used to
* serve allocation requests.
*/
cuda_async_memory_resource(cudaMemPool_t valid_pool_handle)
A reviewer (Contributor) commented on this constructor:

Do we still need this ctor with the cuda_pool_wrapper?

@harrism (Member) commented on Oct 19, 2021:

This still feels clunky to me, and I think it's mostly because of the name cuda_pool_wrapper, which is vague. To me a "pool wrapper" sounds like it's not a memory_resource at all, just some RAII wrapper for something to be owned. It's especially confusing since we already have another resource in RMM called owning_wrapper, and memory resources so far always have memory_resource or resource_adaptor in the name. cuda_pool_wrapper does exactly what cuda_async_memory_resource does, except that it doesn't own the pool. Since the only difference between the two resources is that one is owning and the other is non-owning, the naming should reflect that (e.g. shared vs. unique, or somehow making one a view).

@harrism (Member) commented on Nov 10, 2021:

Due to the open discussion, I'm moving this to the next release. @fkallen, can you merge rapidsai:branch-22.02 into your branch so this targets the right branch?

@harrism removed this from PR-Needs review in the v21.12 Release project on Nov 10, 2021.
@harrism added this to PR-WIP in the v22.02 Release project via automation on Nov 10, 2021.
@harrism changed the base branch from branch-21.12 to branch-22.02 on November 10, 2021.
@fkallen (Contributor, Author) commented on Jan 11, 2022:

@harrism Sorry, I lost focus on this PR; thanks for reminding me. I agree with your points on the naming of the new class and will change cuda_pool_wrapper to cuda_async_view_memory_resource.

One more question: at the moment I have added the constructor cuda_async_memory_resource(cudaMemPool_t), which takes ownership of the pool. Do we still need it if we have a dedicated view type? My original intention was to be able to use a raw cudaMemPool_t with RMM, and the view is now sufficient for that.

@harrism (Member) commented on Jan 11, 2022:

I agree, I think the view is sufficient. No need for the constructor that takes ownership.

@harrism (Member) left a review comment:

Looks great! Just a few unnecessary blank lines.

Review threads (outdated, resolved) on tests/mr/device/cuda_async_mr_tests.cpp and tests/mr/device/cuda_async_view_mr_tests.cpp.
v22.04 Release automation moved this from PR-WIP to PR-Reviewer approved Jan 12, 2022
Co-authored-by: Mark Harris <mharris@nvidia.com>
@github-actions (bot) commented:
This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@harrism changed the base branch from branch-22.02 to branch-22.04 on February 15, 2022.
@harrism (Member) commented on Feb 15, 2022:

ok to test

@harrism (Member) commented on Feb 25, 2022:

@fkallen can you fix the style failures? It helps to run clang-format locally (e.g. by enabling format-on-save).

@fkallen (Contributor, Author) commented on Feb 25, 2022:

I hope this fixes the issues. I could not use the AlignConsecutiveBitFields and AllowShortEnumsOnASingleLine options with clang-format 10 on my machine.

@harrism (Member) commented on Mar 21, 2022:

@fkallen we've merged some major changes from @robertmaynard that now load the symbols for cudaMallocAsync and related functions using dlopen; see #990. Unfortunately this means your PR may need some changes, or at least a conflict resolution. We only have two days before the 22.04 code freeze. If you have time to get this working before then, it may still make it, but we need to move fast; otherwise it slips to the next release.
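
For context, an illustrative sketch of dlopen-based symbol loading (heavily simplified; the actual mechanism from #990 differs in structure, library lookup, and error handling):

#include <dlfcn.h>

#include <cuda_runtime_api.h>

#include <cstddef>

using cudaMallocAsync_fn = cudaError_t (*)(void**, std::size_t, cudaStream_t);

// Resolve cudaMallocAsync at runtime so the binary also runs against
// CUDA runtimes that predate the API (the symbol is simply absent there).
cudaMallocAsync_fn load_cuda_malloc_async()
{
  void* lib = dlopen("libcudart.so", RTLD_LAZY | RTLD_LOCAL);
  if (lib == nullptr) { return nullptr; }
  return reinterpret_cast<cudaMallocAsync_fn>(dlsym(lib, "cudaMallocAsync"));
}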

@fkallen (Contributor, Author) commented on Mar 22, 2022:

I have merged your changes. In tests/mr/device/cuda_async_view_mr_tests.cpp I have disabled a test because cudaDeviceGetDefaultMemPool is not available through the new mechanism. This should not be an issue, since the view is effectively exercised by the cuda_async_memory_resource tests.

@harrism (Member) commented on Mar 22, 2022:

> I have merged your changes. In tests/mr/device/cuda_async_view_mr_tests.cpp I have disabled a test because cudaDeviceGetDefaultMemPool is not available using the new mechanism. This should not be an issue since the view is effectively tested when testing the cuda_async_memory_resource.

Should probably add that API to the new mechanism and enable the test...

@harrism (Member) commented on Mar 23, 2022:

Thanks @fkallen !

@harrism (Member) commented on Mar 23, 2022:

@gpucibot merge

The rapids-bot merged commit 220ba88 into rapidsai:branch-22.04 on Mar 23, 2022.
v22.04 Release automation moved this from PR-Reviewer approved to Done Mar 23, 2022
@fkallen deleted the non-owning-cuda-async-mr branch on August 9, 2022.
Labels: cpp (Pertains to C++ code), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)

5 participants