Migrate RMM usage to CCCL MR design #1990
Conversation
Adapt cuVS to RMM breaking changes: removal of the `device_memory_resource` base class, de-templated resource/adaptor types, new per-device resource ref APIs, and CCCL resource concept requirements. Key changes:

- `get_workspace_resource()` → `get_workspace_resource_ref()` (44 sites)
- `get_large_workspace_resource()` → `get_large_workspace_resource_ref()` (21 sites)
- `get_current_device_resource()` → `get_current_device_resource_ref()`
- `device_memory_resource*` params → `device_async_resource_ref` (see the sketch below)
- Remove `&resource` pointer patterns (resources are now value types)
- Migrate `cuda_huge_page_resource` to the CCCL concept
- De-template `pool_memory_resource`, `failure_callback_resource_adaptor`
- Rewrite C API pool resource management without `owning_wrapper`
- Remove deleted `rmm/mr/device_memory_resource.hpp` includes
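As a rough illustration of the parameter migration above, a minimal sketch: `search_impl` and `caller` are hypothetical names, and the free-function accessor spelling follows raft's existing convention rather than the exact new API.

```cpp
#include <raft/core/device_resources.hpp>
#include <raft/core/resource/device_memory_resource.hpp>
#include <rmm/resource_ref.hpp>

// Before: void search_impl(raft::device_resources const& res,
//                          rmm::mr::device_memory_resource* mr);
// After: a copyable, non-owning resource ref passed by value.
void search_impl(raft::device_resources const& res,
                 rmm::device_async_resource_ref mr)
{
  // allocate/deallocate through `mr` as before; no pointer indirection.
}

void caller(raft::device_resources const& res)
{
  // The workspace accessor now returns a resource ref directly, so the
  // old `&resource` address-of pattern disappears at the call site.
  search_impl(res, raft::resource::get_workspace_resource_ref(res));
}
```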
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
```
# Conflicts:
#	cpp/src/neighbors/ivf_flat/ivf_flat_search.cuh
```
Force-pushed from 942f4e8 to 4c59082
Force-pushed from 4c59082 to ae4f344
Force-pushed from 4df8847 to e45a83e
/ok to test
```diff
@@ -17,37 +19,25 @@
 namespace raft::mr {
```

Why is cuVS defining classes in the `raft::` namespace? (This is not in scope for this PR.)
```diff
   indices_topk + offset * k,
-  type != cuvs::distance::DistanceType::InnerProduct,
-  mr);
+  type != cuvs::distance::DistanceType::InnerProduct);
```

Need to check if this change is intentional or a mistake.

Seems like this was changed in rapidsai/raft#1786?

Maybe this was a dead code path. I can't see how it would've been building successfully prior to this migration otherwise.
```diff
   indices_topk + offset * k,
-  cuvs::distance::is_min_close(type),
-  mr);
+  cuvs::distance::is_min_close(type));
```

Need to check if this API change was intentional.

Seems like this was changed in rapidsai/raft#1786?

Maybe this was a dead code path. I can't see how it would've been building successfully prior to this migration otherwise.
Force-pushed from 1428e7c to 423fa1c
Force-pushed from 423fa1c to 8f50dc0
Add an explicit `operator!=` (required when `_CCCL_HAS_CONCEPTS` is 0, e.g. nvcc C++17 mode, since `operator!=` is not auto-synthesized from `operator==`). Use a `constexpr friend` for `get_property` to match the RMM resource pattern. Add `synchronous_resource` and `resource` static_asserts for finer-grained diagnostics.
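A minimal sketch of that pattern. The class name is hypothetical (the real change is to `cuda_huge_page_resource`), the allocation logic is elided, and the exact CCCL concept spellings are taken from the commit message above rather than verified against a specific CCCL version.

```cpp
#include <cuda/memory_resource>

#include <cstddef>

class example_resource {
 public:
  void* allocate(std::size_t bytes, std::size_t alignment) { /* huge-page mmap elided */ return nullptr; }
  void deallocate(void* ptr, std::size_t bytes, std::size_t alignment) noexcept { /* munmap elided */ }

  bool operator==(example_resource const&) const noexcept { return true; }
  // Explicit operator!=: when _CCCL_HAS_CONCEPTS is 0 (nvcc C++17 mode),
  // this is not synthesized from operator==, and the concept check fails
  // without it.
  bool operator!=(example_resource const& other) const noexcept { return !(*this == other); }

  // constexpr friend get_property, matching the RMM resource pattern,
  // advertises the device_accessible property.
  friend constexpr void get_property(example_resource const&,
                                     cuda::mr::device_accessible) noexcept {}
};

// Asserting the narrower concept separately gives a finer-grained
// diagnostic than one assert on the full concept. The actual migration
// also asserts the full `resource` concept (the stream-ordered
// allocate_async/deallocate_async members are elided in this sketch).
static_assert(cuda::mr::synchronous_resource<example_resource>);
```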
Force-pushed from 8f50dc0 to 06e0ae5
Force-pushed from 06e0ae5 to fe7ca1b
Requesting admin-merge. Most CI is passing; the one failure seems like a fluke. I am not going to wait for the slow L4 C++ job since all others have passed and this is blocking builds for cuML and cuGraph.
## Summary

- Migrate all raw RMM `allocate`/`deallocate` calls to the new CCCL 3-argument API that requires explicit alignment (see the sketch below)
- Replace the removed `rmm.librmm.per_device_resource` Cython import with `rmm.pylibrmm.memory_resource` and use `make_any_device_resource` to obtain the resource for `device_buffer` construction

Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.
Depends on rapidsai/raft#2996.
Depends on rapidsai/cuvs#1990.

## Changes

- **`cpp/src/genetic/genetic.cu`**: Add explicit `alignof(node)` / `alignof(program)` to all `allocate` and `deallocate` calls in `parallel_evolve` and `symFit`; fix a deallocation bug in `parallel_evolve` where `h_nextprogs[i].len` was incorrectly used instead of `tmp.len` to compute the buffer size being freed
- **`cpp/examples/symreg/symreg_example.cpp`**: Use `params.population_size * sizeof(cg::program)` and `alignof(cg::program)` for `allocate`/`deallocate` calls, fixing incorrect byte-size computation; remove unused `<rmm/aligned.hpp>` include
- **`cpp/tests/sg/genetic/evolution_test.cu`**: Add alignment arguments to `allocate`/`deallocate` in the `SymReg` test
- **`cpp/tests/sg/genetic/program_test.cu`**: Add alignment arguments to `SetUp`/`TearDown` allocate/deallocate calls
- **`python/cuml/cuml/manifold/umap/umap.pyx`**: Replace `get_current_device_resource()` with `make_any_device_resource(get_current_device_resource().get_mr())` for `device_buffer` construction

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Simon Adorf (https://github.com/csadorf)
- Divye Gala (https://github.com/divyegala)
- Victor Lafargue (https://github.com/viclafargue)

URL: #7951
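A rough sketch of the explicit-alignment allocation pattern from that change. The `program` struct and sizes are illustrative stand-ins, not the actual cuML code, and the `allocate_async(bytes, alignment, stream)` member spelling assumes the CCCL `async_resource_ref` interface.

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/mr/device/per_device_resource.hpp>
#include <rmm/resource_ref.hpp>

#include <cstddef>

struct program {  // illustrative stand-in for the cuML genetic program type
  int len;
};

void example(std::size_t n, rmm::cuda_stream_view stream)
{
  rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource_ref();
  std::size_t bytes = n * sizeof(program);

  // Alignment is now an explicit argument rather than an implicit default.
  void* d_progs = mr.allocate_async(bytes, alignof(program), stream);

  // ... launch work on `stream` that uses d_progs ...

  // Deallocate with the same byte count and alignment passed to allocate;
  // the bug fixed above freed with a mismatched byte size.
  mr.deallocate_async(d_progs, bytes, alignof(program), stream);
}
```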
## Summary

- Replace the removed `rmm::mr::device_memory_resource` base class, `owning_wrapper`, `shared_ptr`-based resource management, and deprecated per-device resource APIs with CCCL-native memory resource types
- Use `cuda::mr::any_resource<cuda::mr::device_accessible>` for owning type-erased storage, `rmm::device_async_resource_ref` for non-owning references, and value-typed resources (`cuda_memory_resource`, `pinned_host_memory_resource`)
- Pass the memory resource to `raft::handle_t` as the `workspace_resource` (3rd) constructor argument, matching the new raft API (`stream_view`, `stream_pool`, `std::optional<raft::mr::device_resource>`)

Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.
Depends on rapidsai/raft#2996.
Depends on rapidsai/cuvs#1990.

## Files changed

**Headers:**
- `algorithms.hpp`, `dendrogram.hpp`, `legacy/graph.hpp`, `legacy/functions.hpp`: `get_current_device_resource()` → `get_current_device_resource_ref()` in default argument expressions
- `host_staging_buffer_manager.hpp`: Remove `owning_wrapper`, store `pool_memory_resource` by value in a `std::optional`, accept `pinned_host_memory_resource` by value in `init()`
- `large_buffer_manager.hpp`: Store `pinned_host_memory_resource` by value (not `shared_ptr`), return `device_async_resource_ref` from `get()`, `std::move` the resource into storage (see the sketch below)
- `mtmg/resource_manager.hpp`: Use `cuda::mr::any_resource<device_accessible>` instead of `shared_ptr<device_memory_resource>` for `per_device_rmm_resources_`, use non-deprecated `set_per_device_resource`, pass the resource as `workspace_resource` to `raft::handle_t`

**Tests:**
- `base_fixture.hpp`: Return `any_resource<device_accessible>` from `create_memory_resource()`, use value-typed MR factory helpers (`make_cuda`, `make_managed`, `make_pool`, `make_binning`), switch to non-deprecated `set_current_device_resource` / `get_current_device_resource_ref`
- `multi_node_threaded_test.cpp`: Switch to non-deprecated `set_current_device_resource(resource)`
- `mg_graph500_bfs_test.cu`, `mg_graph500_sssp_test.cu`: Store `pinned_mr_` as `optional<pinned_host_memory_resource>` by value, prefer `.value()` over `operator*` for optional access

**Examples:**
- All 4 example files (`sg_graph_algorithms.cpp`, `mg_graph_algorithms.cpp`, `vertex_and_edge_partition.cu`, `graph_operations.cu`): Use value-typed `cuda_memory_resource`, non-deprecated `set_current_device_resource`, pass the resource to `raft::handle_t` as the `workspace_resource` (3rd positional arg, with `nullptr` for the unused `stream_pool`)

Authors:
- Bradley Dice (https://github.com/bdice)
- Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
- Chuck Hastings (https://github.com/ChuckHastings)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #5483
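A condensed sketch of the owning-value / non-owning-ref split that the `large_buffer_manager.hpp` item describes. The class name is illustrative, not the actual cuGraph code; it relies on `rmm::mr::pinned_host_memory_resource` being a value type that satisfies the device-accessible async resource concept, as the change above states.

```cpp
#include <rmm/mr/pinned_host_memory_resource.hpp>
#include <rmm/resource_ref.hpp>

#include <utility>

class buffer_manager {
 public:
  // Store the resource by value instead of via shared_ptr + owning_wrapper.
  explicit buffer_manager(rmm::mr::pinned_host_memory_resource mr)
    : resource_{std::move(mr)}
  {
  }

  // Hand out a non-owning, copyable reference; callers no longer receive
  // a device_memory_resource* or share ownership.
  rmm::device_async_resource_ref get() { return resource_; }

 private:
  rmm::mr::pinned_host_memory_resource resource_;
};
```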
## Summary

- Migrate cuVS to RMM's CCCL memory resource design (`device_async_resource_ref` instead of `device_memory_resource*`, value semantics)
- Replace `get_workspace_resource()`/`get_large_workspace_resource()` with `_ref()` variants across 65 call sites
- Migrate `cuda_huge_page_resource` to satisfy the CCCL `resource` concept directly
- Remove `owning_wrapper`/`dynamic_cast` patterns in the C API and benchmarks

Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.
Depends on rapidsai/raft#2996.

## Changes

- `device_memory_resource*` params → `device_async_resource_ref` (ivf_common, ivf_pq, naive_knn)
- `get_current_device_resource()` → `get_current_device_resource_ref()`
- `set_current_device_resource()` → `set_current_device_resource_ref()`
- De-template `pool_memory_resource`, `failure_callback_resource_adaptor` in bench utils
- Remove `&resource` pointer patterns (resources are now copyable value types)
- Drop the stale `mr` arg from `select_k` calls (previously compiled due to implicit pointer→bool conversion; see the sketch below)
- Rewrite C API pool resource management without `owning_wrapper`
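Why the stale `mr` argument compiled at all: a simplified stand-in (not the real `select_k` signature) showing the implicit pointer→bool conversion.

```cpp
// Simplified stand-in for the old call shape; not the actual raft/cuVS API.
void select_k(float const* in, int k, bool select_min) {}

void caller(float const* in, int k, void* mr)  // `mr` stands in for the old resource pointer
{
  // This compiled before the migration: the trailing pointer argument
  // implicitly converted to bool (true for any non-null pointer) and
  // silently filled the select_min parameter.
  select_k(in, k, mr);

  // The fix: drop the stale argument and pass the flag explicitly.
  select_k(in, k, /*select_min=*/true);
}
```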