Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Callback memory resource #980

Merged
merged 39 commits into from
Apr 19, 2022

Conversation

shwina
Copy link
Contributor

@shwina shwina commented Feb 15, 2022

This PR adds a CallbackMemoryResource that accepts Python callback functions that are responsible for allocating and deallocating memory (warning: this should not be used in production for performance reasons).

import rmm

# using a CudaMemoryResource as the backing MR,
# define allocation and deallocation functions that
# print the amount of memory being (de)allocated

base_mr = rmm.mr.CudaMemoryResource()

def allocate(size):
    print(f"Allocating {size}")
    return base_mr.allocate(size)

def deallocate(ptr, size):
    print(f"Deallocating {size}")    
    return base_mr.deallocate(ptr, size)

# create a CallbackMemoryResource and set it to be
# the default memory resource used by RMM:

mr = rmm.mr.CallbackMemoryResource(allocate, deallocate)
rmm.mr.set_current_device_resource(mr)
# All allocations/deallocations go through the callback:
s = cudf.Series([0, 1, 2]) 
# prints "Allocating 24"

@github-actions github-actions bot added cpp Pertains to C++ code Python Related to RMM Python API labels Feb 15, 2022
@github-actions
Copy link

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@harrism
Copy link
Member

harrism commented Mar 21, 2022

@shwina should we push this to next release?

@shwina shwina changed the base branch from branch-22.04 to branch-22.06 April 8, 2022 17:50
@shwina shwina marked this pull request as ready for review April 8, 2022 17:51
@shwina shwina requested review from a team as code owners April 8, 2022 17:51
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No other comments from me -- thanks again. Eager to make use of all the functionality this enables!

Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing left are nits and slight improvements to testing. I'm happy to see this merged.

Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. This is pretty close to what cuQuantum expects too (except for the missing stream argument which we can work around at the Python layer):
https://docs.nvidia.com/cuda/cuquantum/python/api/generated/cuquantum.cutensornet.set_device_mem_handler.html#cuquantum.cutensornet.set_device_mem_handler
so we probably could use RMM at least for testing purpose in the next release.

Comment on lines +24 to +25
void* allocate(size_t bytes) except +
void deallocate(void* ptr, size_t bytes) except +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT these methods are being added just for the purpose of the test. I'm not opposed to exposing these functions in the Python API, but that seems like it merits discussion beyond this largely unrelated PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also fine if you want to push back here and just get this done, but I wanted to at least have that discussion recorded if so.

Copy link
Contributor Author

@shwina shwina Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably a good idea to expose these anyway for users that want to use RMM but not necessarily construct memory owning DeviceBuffers. Also see my comment below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that. Could we add some explicit tests of these APIs for other memory resources then? It seems awkward that the only test of the allocate function of memory allocators (which sounds like a pretty core feature...) would only be tested in this one callback test where we don't even actually validate the allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test for allocate/deallocate

python/rmm/_lib/memory_resource.pyx Outdated Show resolved Hide resolved
include/rmm/mr/device/callback_memory_resource.hpp Outdated Show resolved Hide resolved
include/rmm/mr/device/callback_memory_resource.hpp Outdated Show resolved Hide resolved
include/rmm/mr/device/callback_memory_resource.hpp Outdated Show resolved Hide resolved
python/rmm/_lib/memory_resource.pyx Show resolved Hide resolved
Comment on lines +496 to +499
``CallbackMemoryResource`` should really only be used for
debugging memory issues, as there is a significant performance
penalty associated with using a Python function for each memory
allocation and deallocation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably out of scope for this PR, but would it be possible to instead accept a cdef function as the allocator (as a void * pointer at that point) that wouldn't have these performance implications?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would preclude passing Python functions as the callbacks, which is the primary motivation for the CallbackMemoryResource.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I'm not suggesting that we could do it with this same class or in this PR. I'm asking if this is a useful feature for future work and another class CythonCallbackMemoryResource (or if there's some way to make this signature polymorphic). Mostly asking if we should open a follow-up issue.

python/rmm/_lib/memory_resource.pyx Outdated Show resolved Hide resolved
python/rmm/tests/test_rmm.py Show resolved Hide resolved
Comment on lines +739 to +740
dbuf = rmm.DeviceBuffer(size=256)
del dbuf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we actually check the total memory allocation here instead of just looking for printed output?

Copy link
Contributor Author

@shwina shwina Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for any arbitrary base_mr.

My thought here is that this test doesn't need to check that base_mr is behaving correctly, or really test what happens inside the callbacks. This test should just ensure that the callbacks are indeed invoked as expected.

Maybe there's a better way to do that I'm missing. Modifying a global is one approach I guess?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a reasonable expectation for the test, but in that case why even have a base mr? The test could remove the actual allocation from the callback entirely. Is calling the allocate function really testing anything other than the fact that you can run arbitrary Python from the callback?

Setting a global would work, but I'm also OK with output capturing for this purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly because it serves as a useful example of how to use CallbackMemoryResource, although I realized that probably belongs in the docstring, so I added it there.

shwina and others added 7 commits April 14, 2022 08:53
Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
@shwina
Copy link
Contributor Author

shwina commented Apr 14, 2022

rerun tests

Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through and resolved all the conversations that have been addressed. There are still a few outstanding items around adding tests. Also, many of these files need their copyrights updated. mock_resource.hpp needs the entire license header added, it's not currently present. I'll be out tomorrow and Monday and I don't want to be the only blocking review in case you get a chance to wrap things up, and these issues are pretty minor, so I'm going to approve assuming those remaining issues get addressed. Thanks! Super cool feature.

@shwina
Copy link
Contributor Author

shwina commented Apr 19, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 17bdbcb into rapidsai:branch-22.06 Apr 19, 2022
wence- added a commit to wence-/rmm that referenced this pull request Mar 5, 2024
In rapidsai#980, the DeviceMemoryResource class in Python gained allocation
and deallocation routines. This was to facility writing Python
allocate/deallocate callbacks for the CallbackMemoryResource.

These routines should, to match the C++ API, accept a stream parameter
such that one can use them for stream-ordered allocation. Although we
recommend that users allocate on the Python side using the
DeviceBuffer interface, exposing these routines implicitly makes them
public.

To fix this, add an optional stream argument defaulting to the default
stream.

- Closes rapidsai#1493
rapids-bot bot pushed a commit that referenced this pull request Mar 7, 2024
…1494)

In #980, the DeviceMemoryResource class in Python gained allocation and deallocation routines. This was to facilitate writing Python allocate/deallocate callbacks for the CallbackMemoryResource.

These routines should, to match the C++ API, accept a stream parameter such that one can use them for stream-ordered allocation. Although we recommend that users allocate on the Python side using the DeviceBuffer interface, exposing these routines implicitly makes them public.

To fix this, add an optional stream argument defaulting to the default stream.

- Closes #1493

Authors:
  - Lawrence Mitchell (https://github.com/wence-)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Mark Harris (https://github.com/harrism)

URL: #1494
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp Pertains to C++ code feature request New feature or request non-breaking Non-breaking change Python Related to RMM Python API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants