
[REVIEW] Improve Cython Lifetime Management by Adding References in DeviceBuffer #661

Merged

Conversation

mdemoret-nv
Contributor

As discussed with @shwina @harrism and @kkraus14, this PR adds 2 properties to DeviceBuffer to allow for automatic reference counting of MemoryResource and Stream objects. This will prevent any MemoryResource from being destructed while any DeviceBuffer that needs the MR for deallocation is still alive.

There are a few outstanding issues I could use input on:

  1. The test `test_rmm_device_buffer` is failing on the line `sys.getsizeof(b) == b.size`. I need input on the best way forward.
    1. This test fails because `DeviceBuffer` now participates in GC. Python automatically adds the GC memory overhead to the result of `__sizeof__` (see here), which makes it hard for the check to keep working the way it did before.
    2. The only options I can think of are:
      1. Remove this check from the test, or alter the "correct" value.
      2. Add `@cython.no_gc`, which is very risky.
  2. The current PR implementation includes CUDA stream object reference counting but treats all `Stream` objects the same. @harrism mentioned only streams owned by RMM should be tracked this way, but I am not sure if that's necessary or how to distinguish them at this point.

Other than the above items, all tests are passing and I ran this through the cuML test suite without any issues. Thanks for your help.
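The lifetime guarantee this PR adds can be illustrated with a pure-Python sketch (the `MemoryResource` and `DeviceBuffer` classes below are simplified stand-ins, not the real RMM types):

```python
import weakref


class MemoryResource:
    """Stand-in for an RMM memory resource (hypothetical simplification)."""

    def deallocate(self, ptr):
        pass


class DeviceBuffer:
    """Holds a reference to the MR used for allocation, so the MR
    cannot be garbage-collected while this buffer is alive."""

    def __init__(self, mr):
        self.mr = mr  # the new property added by this PR, conceptually


mr = MemoryResource()
ref = weakref.ref(mr)
buf = DeviceBuffer(mr)

del mr  # the buffer's reference keeps the MR alive
assert ref() is not None

del buf  # with the last buffer gone, the MR can finally be collected
assert ref() is None
```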

@mdemoret-nv mdemoret-nv requested a review from a team as a code owner December 15, 2020 04:02
@shwina
Contributor

shwina commented Dec 15, 2020

Only options I can think of are:
Remove this check from the test or alter the "correct" value
Add @cython.no_gc which is very risky.

We should remove this test entirely. The overhead added to the result of __sizeof__() is very much an implementation detail. Previous discussion here: #654 (comment)
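For reference, the overhead in question is easy to demonstrate in plain CPython: for GC-tracked objects, `sys.getsizeof()` reports `__sizeof__()` plus the collector's per-object header, while untracked objects show no difference:

```python
import sys

# Ints are not GC-tracked, so getsizeof matches __sizeof__ exactly.
assert sys.getsizeof(1) == (1).__sizeof__()

# Lists are GC-tracked, so getsizeof additionally includes the
# garbage collector's per-object header and reports a larger number.
lst = []
assert sys.getsizeof(lst) > lst.__sizeof__()
```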

@shwina
Contributor

shwina commented Dec 15, 2020

The addition of Python attributes to DeviceBuffer means that it can now technically participate in reference cycles. It also has an attribute (Stream) that can participate in reference cycles. Objects in reference cycles are cleaned up by Python's GC.

My question is: does this open the possibility of Python's GC destroying the Stream before the underlying device buffers are deallocated? (see: cython/cython#3946)

If we're sure that typical usage patterns won't result in reference cycles, then maybe this is not worth worrying about :)
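A minimal pure-Python illustration of the kind of cycle in question (using hypothetical stand-ins for `Stream` and `DeviceBuffer`, not the real RMM types):

```python
import gc
import weakref


class Stream:
    """Hypothetical stand-in for RMM's Stream wrapper."""


class Buffer:
    """Hypothetical stand-in for DeviceBuffer, which now holds a
    reference to the stream used for allocation."""

    def __init__(self, stream):
        self.stream = stream


gc.disable()  # make the demonstration deterministic

s = Stream()
s_ref = weakref.ref(s)
buf = Buffer(s)
s.last_buffer = buf  # creates a cycle: buf -> s -> buf

del s, buf
assert s_ref() is not None  # refcounting alone cannot reclaim the cycle
gc.collect()                # only the cycle collector can
assert s_ref() is None

gc.enable()
```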

@mdemoret-nv
Contributor Author

We should remove this test entirely. The overhead added to the result of __sizeof__() is very much an implementation detail. Previous discussion here: #654 (comment)

Sounds good. Using __sizeof__ this way seems odd to me anyway, and I can't think of a situation where a user would query the size of a buffer with sys.getsizeof(buf) instead of buf.size. Leaving it up to the implementation seems like the right move.

The addition of Python attributes to DeviceBuffer means that it can now technically participate in reference cycles. It also has an attribute (Stream) that can participate in reference cycles. Objects in reference cycles are cleaned up by Python's GC.

My question is: does this open the possibility of Python's GC destroying the Stream before the underlying device buffers are deallocated? (see: cython/cython#3946)

That's a great question. The Cython docs have a section on how to handle that scenario. I couldn't completely rule out the possibility, so I had considered adding @cython.no_gc_clear to DeviceBuffer. I ultimately didn't include it in the first iteration because I couldn't think of a situation where there would be a reference cycle. Glad you brought this up. Can you post an example reference cycle I could add to the tests? I'm not too familiar with the new stream objects just yet, but it would be great to test this experimentally.

If we're sure that typical usage patterns won't result in reference cycles, then maybe this is not worth worrying about :)

I am going to defer to you on this one. I couldn't think of a situation, but I am hardly an advanced user. Additionally, it's worth considering whether this may come up in the future given the RMM team's current plans.

@harrism
Member

harrism commented Dec 15, 2020

2. The current PR implementation includes CUDA stream object reference counting but treats all `Stream` objects the same. @harrism mentioned only streams owned by RMM should be tracked this way, but I am not sure if that's necessary or how to distinguish them at this point.

Not sure when I mentioned this to you. However, it's not that you shouldn't hold a reference to a Python stream object for a stream you don't own. It's just that if RMM doesn't own the underlying cudaStream_t, there's a chance the actual owner might cudaStreamDestroy it out from under you, regardless of whether you hold a reference to the Python object or not. It is what it is, and documentation is our only way to avoid this case.

Does this PR overlap #650 ?

Contributor

@kkraus14 kkraus14 left a comment


LGTM, thanks @mdemoret-nv

@@ -81,6 +84,10 @@ cdef class DeviceBuffer:
        if stream.c_is_default():
            stream.c_synchronize()

        # Save a reference to the MR and stream used for allocation
        self.mr = get_current_device_resource()
Contributor


Do we want to allow passing a MR to the constructor?

Contributor

@shwina shwina Jan 9, 2021


For now, I'd say let's not do that. I'd much rather we support multiple MRs with a context manager. That is, I prefer the first approach below to the second:

mr = MemoryResource(...)
with using_memory_resource(mr):
    dbuf_1 = DeviceBuffer(...)
    dbuf_2 = DeviceBuffer(...)

vs.

mr = MemoryResource(...)
dbuf_1 = DeviceBuffer(..., mr=mr)
dbuf_2 = DeviceBuffer(..., mr=mr)

There are arguments in favour of both though.

Contributor


The context manager approach also follows the Zen of Python: "There should be one -- and preferably only one -- obvious way to do it."

Contributor Author


For now, I have left out passing the MR in the constructor. If the situation has changed, @kkraus14 or @shwina please let me know.

v0.18 Release automation moved this from PR-WIP to PR-Reviewer approved Jan 7, 2021
@kkraus14
Contributor

kkraus14 commented Jan 7, 2021

Sounds good. Using __sizeof__ this way seems odd to me anyway, and I can't think of a situation where a user would query the size of a buffer with sys.getsizeof(buf) instead of buf.size.

If I remember correctly, Dask calls __sizeof__() for tracking memory usage, which is why we test the method. I would suggest calling the function directly in the test instead of via sys.getsizeof(...).
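A test along those lines might look like this (using a hypothetical stand-in buffer class, not RMM's actual DeviceBuffer):

```python
import sys


class FakeBuffer:
    """Hypothetical buffer that reports its device allocation size."""

    def __init__(self, size):
        self.size = size

    def __sizeof__(self):
        return self.size


buf = FakeBuffer(1024)

# Test the method directly rather than via sys.getsizeof(), which
# adds the GC header overhead for GC-tracked types.
assert buf.__sizeof__() == 1024
assert sys.getsizeof(buf) > buf.__sizeof__()  # overhead is present
```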

@harrism
Member

harrism commented Feb 2, 2021

@mdemoret-nv your current PRs seem to have stalled. Today is code freeze for 0.18, so I'm going to push this to 0.19.

@harrism harrism removed this from PR-Reviewer approved in v0.18 Release Feb 2, 2021
@harrism harrism added this to PR-WIP in v0.19 Release via automation Feb 2, 2021
@harrism harrism changed the base branch from branch-0.18 to branch-0.19 February 2, 2021 22:59
@mdemoret-nv
Contributor Author

@mdemoret-nv your current PRs seem to have stalled. Today is code freeze for 0.18 so I'm going to push this to 0.19

Apologies, my GitHub notifications significantly increased in 0.18 and these PRs slipped through the cracks while I worked on cuML PRs. I think it would be better to get these PRs into a release cycle early anyway, just to ensure there are no unintended side effects.

I will wrap these up once 0.18 has shipped.

v0.19 Release automation moved this from PR-WIP to PR-Reviewer approved Feb 24, 2021
@mdemoret-nv
Contributor Author

PR is ready for re-review/merging.

@kkraus14
Contributor

Would like to let @shwina review again before merging

@kkraus14
Contributor

rerun tests

@kkraus14
Contributor

Going to merge this as the changes were minimal. Thanks @mdemoret-nv!

@kkraus14
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 230369d into rapidsai:branch-0.19 Feb 25, 2021
v0.19 Release automation moved this from PR-Reviewer approved to Done Feb 25, 2021
rapids-bot bot pushed a commit that referenced this pull request Mar 1, 2021
(#662)

Depends on #661 

While working on PR #661, it looked like it was possible to remove the "owning" C++ memory resource adaptors in `memory_resource_adaptors.hpp`. This PR is a quick implementation of what that would look like to run through CI and get feedback.

The main driving factor of this PR is to eliminate the need for 2 layers of wrappers around every memory resource in the library. When adding new memory resources, C++ wrappers must be created in `memory_resource_adaptors.hpp` and Cython wrappers must be created in `memory_resource.pyx` for any property/function that needs to be exposed at the Python level. This removes the C++ wrappers in favor of using Python's reference counting for lifetime management.

A few notes:

1. `MemoryResource` was renamed `DeviceMemoryResource` to more closely match the C++ class names. This can easily be changed back.
2. Upstream MRs are kept alive by a base class `UpstreamResourceAdaptor` that stores a single property `upstream_mr`. Any MR that has an upstream needs to inherit from this class.
3. Once the `UpstreamResourceAdaptor` was created, most of the work/changes were updating the Cython imports to use the C++ classes instead of the C++ wrappers.
4. This should make it easier to expose more methods/properties at the Python layer in the future.

Would appreciate any feedback.
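The upstream-keepalive scheme described above can be sketched in pure Python (all classes here are simplified stand-ins; the `PoolMemoryResourceLike` name is illustrative):

```python
import weakref


class DeviceMemoryResource:
    """Simplified stand-in for the base MR class (renamed from
    MemoryResource in this PR)."""


class UpstreamResourceAdaptor(DeviceMemoryResource):
    """Base class for MRs that wrap an upstream MR. Storing the
    upstream as a Python attribute keeps it alive via reference
    counting, replacing the owning C++ wrapper adaptors."""

    def __init__(self, upstream_mr):
        self.upstream_mr = upstream_mr


class PoolMemoryResourceLike(UpstreamResourceAdaptor):
    """Hypothetical pool-style MR that allocates from an upstream."""


upstream = DeviceMemoryResource()
u_ref = weakref.ref(upstream)
pool = PoolMemoryResourceLike(upstream)

del upstream  # still alive: pool.upstream_mr holds a reference
assert u_ref() is not None
assert pool.upstream_mr is u_ref()
```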

Authors:
  - Michael Demoret (@mdemoret-nv)

Approvers:
  - Christopher Harris (@cwharris)
  - Keith Kraus (@kkraus14)
  - Ashwin Srinath (@shwina)

URL: #662
rapids-bot bot pushed a commit that referenced this pull request Mar 4, 2021
This PR adds support for CuPy streams in `rmm_cupy_allocator`. It works by getting CuPy's current stream and passing that to the `DeviceBuffer` constructor.

There's also a fix for the casting of CuPy/Numba streams to `cudaStream_t`: the stream handle needs to be cast to `uintptr_t` first. Without that, the resulting pointer would be wrong and cause a segfault.

Depends on #661

Authors:
  - Peter Andreas Entschev (@pentschev)
  - Keith Kraus (@kkraus14)

Approvers:
  - Keith Kraus (@kkraus14)
  - @jakirkham
  - Mark Harris (@harrism)

URL: #654
Labels
- 3 - Ready for review: Ready for review by team
- improvement: Improvement / enhancement to an existing function
- non-breaking: Non-breaking change
- Python: Related to RMM Python API