-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Callback memory resource #980
Callback memory resource #980
Conversation
This PR has been labeled |
@shwina should we push this to next release? |
… callback-memory-resource
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
… callback-memory-resource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No other comments from me -- thanks again. Eager to make use of all the functionality this enables!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing left are nits and slight improvements to testing. I'm happy to see this merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. This is pretty close to what cuQuantum expects too (except for the missing stream argument which we can work around at the Python layer):
https://docs.nvidia.com/cuda/cuquantum/python/api/generated/cuquantum.cutensornet.set_device_mem_handler.html#cuquantum.cutensornet.set_device_mem_handler
so we probably could use RMM at least for testing purpose in the next release.
void* allocate(size_t bytes) except + | ||
void deallocate(void* ptr, size_t bytes) except + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT these methods are being added just for the purpose of the test. I'm not opposed to exposing these functions in the Python API, but that seems like it merits discussion beyond this largely unrelated PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also fine if you want to push back here and just get this done, but I wanted to at least have that discussion recorded if so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably a good idea to expose these anyway for users that want to use RMM but not necessarily construct memory owning DeviceBuffers. Also see my comment below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with that. Could we add some explicit tests of these APIs for other memory resources then? It seems awkward that the only test of the allocate
function of memory allocators (which sounds like a pretty core feature...) would only be tested in this one callback test where we don't even actually validate the allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test for allocate/deallocate
``CallbackMemoryResource`` should really only be used for | ||
debugging memory issues, as there is a significant performance | ||
penalty associated with using a Python function for each memory | ||
allocation and deallocation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably out of scope for this PR, but would it be possible to instead accept a cdef
function as the allocator (as a void *
pointer at that point) that wouldn't have these performance implications?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would preclude passing Python functions as the callbacks, which is the primary motivation for the CallbackMemoryResource
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I'm not suggesting that we could do it with this same class or in this PR. I'm asking if this is a useful feature for future work and another class CythonCallbackMemoryResource
(or if there's some way to make this signature polymorphic). Mostly asking if we should open a follow-up issue.
dbuf = rmm.DeviceBuffer(size=256) | ||
del dbuf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we actually check the total memory allocation here instead of just looking for printed output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for any arbitrary base_mr
.
My thought here is that this test doesn't need to check that base_mr
is behaving correctly, or really test what happens inside the callbacks. This test should just ensure that the callbacks are indeed invoked as expected.
Maybe there's a better way to do that I'm missing. Modifying a global is one approach I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a reasonable expectation for the test, but in that case why even have a base mr? The test could remove the actual allocation from the callback entirely. Is calling the allocate function really testing anything other than the fact that you can run arbitrary Python from the callback?
Setting a global would work, but I'm also OK with output capturing for this purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly because it serves as a useful example of how to use CallbackMemoryResource
, although I realized that probably belongs in the docstring, so I added it there.
Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
… callback-memory-resource
rerun tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through and resolved all the conversations that have been addressed. There are still a few outstanding items around adding tests. Also, many of these files need their copyrights updated. mock_resource.hpp
needs the entire license header added, it's not currently present. I'll be out tomorrow and Monday and I don't want to be the only blocking review in case you get a chance to wrap things up, and these issues are pretty minor, so I'm going to approve assuming those remaining issues get addressed. Thanks! Super cool feature.
@gpucibot merge |
In rapidsai#980, the DeviceMemoryResource class in Python gained allocation and deallocation routines. This was to facility writing Python allocate/deallocate callbacks for the CallbackMemoryResource. These routines should, to match the C++ API, accept a stream parameter such that one can use them for stream-ordered allocation. Although we recommend that users allocate on the Python side using the DeviceBuffer interface, exposing these routines implicitly makes them public. To fix this, add an optional stream argument defaulting to the default stream. - Closes rapidsai#1493
…1494) In #980, the DeviceMemoryResource class in Python gained allocation and deallocation routines. This was to facilitate writing Python allocate/deallocate callbacks for the CallbackMemoryResource. These routines should, to match the C++ API, accept a stream parameter such that one can use them for stream-ordered allocation. Although we recommend that users allocate on the Python side using the DeviceBuffer interface, exposing these routines implicitly makes them public. To fix this, add an optional stream argument defaulting to the default stream. - Closes #1493 Authors: - Lawrence Mitchell (https://github.com/wence-) - Bradley Dice (https://github.com/bdice) Approvers: - Mark Harris (https://github.com/harrism) URL: #1494
This PR adds a
CallbackMemoryResource
that accepts Python callback functions that are responsible for allocating and deallocating memory (warning: this should not be used in production for performance reasons).