Refactoring of Buffers (last step towards unifying COW and Spilling) #13801

madsbk · 2023-08-02T12:43:27Z

This PR decouples buffer slices/views from owning buffers. As it is now, all buffer classes (ExposureTrackedBuffer, BufferSlice, SpillableBuffer, SpillableBufferSlice) inherit from Buffer; however, they are not Liskov substitutable, as pointed out by @wence- and @vyasr (here and here).

To fix this, we now have a Buffer and a BufferOwner class. We still use Buffer throughout cuDF, but it now points to a BufferOwner.

We have the following class hierarchy:

ExposureTrackedBufferOwner -> BufferOwner 
SpillableBufferOwner -> BufferOwner 
ExposureTrackedBuffer -> Buffer 
SpillableBuffer -> Buffer

With the following relationship:

Buffer -> BufferOwner 
ExposureTrackedBuffer -> ExposureTrackedBufferOwner 
SpillableBuffer -> SpillableBufferOwner
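
To illustrate the "points to" relationship, here is a minimal sketch and not the cuDF implementation. It assumes a `bytearray` as a stand-in for device memory; the constructor signature, `tobytes`, and the slicing behaviour are hypothetical and chosen only to show that a Buffer is a view that never owns memory itself.

```python
from __future__ import annotations


class BufferOwner:
    """Owns a contiguous allocation (a bytearray stands in for device memory)."""

    def __init__(self, data: bytearray) -> None:
        self._data = data

    @property
    def size(self) -> int:
        return len(self._data)


class Buffer:
    """A view of (part of) a BufferOwner; it never owns memory itself."""

    def __init__(
        self, owner: BufferOwner, offset: int = 0, size: int | None = None
    ) -> None:
        if size is None:
            size = owner.size - offset
        if offset < 0 or size < 0 or offset + size > owner.size:
            raise ValueError("slice is out of bounds of the owner")
        self._owner = owner
        self._offset = offset
        self._size = size

    @property
    def owner(self) -> BufferOwner:
        return self._owner

    @property
    def size(self) -> int:
        return self._size

    def __getitem__(self, key: slice) -> Buffer:
        # Slicing a view yields another view of the *same* owner.
        start, stop, step = key.indices(self._size)
        if step != 1:
            raise ValueError("slice must be contiguous")
        return type(self)(self._owner, self._offset + start, max(stop - start, 0))

    def tobytes(self) -> bytes:
        # Hypothetical helper: copy out the bytes this view covers.
        return bytes(self._owner._data[self._offset : self._offset + self._size])


owner = BufferOwner(bytearray(b"0123456789"))
whole = Buffer(owner)
part = whole[2:6][1:3]  # a slice of a slice still points at `owner`
```

In this sketch a slice of a slice accumulates offsets but keeps a single owner, which is the property that removes the need for separate slice classes.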

Unify COW and Spilling

In a follow-up PR, the spilling buffer classes will inherit from the exposure tracked buffer classes so we get the following hierarchy:

SpillableBufferOwner -> ExposureTrackedBufferOwner -> BufferOwner 
SpillableBuffer -> ExposureTrackedBuffer -> Buffer
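
A rough sketch of what that planned hierarchy implies, using empty placeholder classes rather than the real cuDF ones. The `owner_type` class attribute and the `describe` helper are hypothetical; they are only there to show each Buffer class pairing with its matching owner class while remaining usable wherever a base Buffer is expected.

```python
class BufferOwner:
    """Owns memory."""


class ExposureTrackedBufferOwner(BufferOwner):
    """Planned: adds exposure tracking to the owner."""


class SpillableBufferOwner(ExposureTrackedBufferOwner):
    """Planned: adds spilling on top of exposure tracking."""


class Buffer:
    # Hypothetical attribute: the owner class this buffer class requires.
    owner_type = BufferOwner

    def __init__(self, owner: BufferOwner) -> None:
        if not isinstance(owner, self.owner_type):
            raise TypeError(
                f"{type(self).__name__} requires a {self.owner_type.__name__}"
            )
        self._owner = owner


class ExposureTrackedBuffer(Buffer):
    owner_type = ExposureTrackedBufferOwner


class SpillableBuffer(ExposureTrackedBuffer):
    owner_type = SpillableBufferOwner


def describe(buf: Buffer) -> str:
    """Accepts any Buffer; subclasses can be passed without special cases."""
    return f"{type(buf).__name__} -> {type(buf._owner).__name__}"


spillable = SpillableBuffer(SpillableBufferOwner())
label = describe(spillable)
```

With a single inheritance chain, code written against ExposureTrackedBuffer also accepts a SpillableBuffer, which is the substitutability the refactoring is after.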

copy-pr-bot · 2023-11-02T15:33:52Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

madsbk · 2023-11-03T10:20:58Z

/ok to test

wence-

Mostly requesting changes for discussion points

docs/cudf/source/developer_guide/library_design.md

python/cudf/cudf/core/buffer/buffer.py

python/cudf/cudf/core/buffer/exposure_tracked_buffer.py

python/cudf/cudf/core/buffer/utils.py

Co-authored-by: Lawrence Mitchell <wence@gmx.li>

madsbk · 2023-11-23T15:22:04Z

/ok to test

Co-authored-by: Lawrence Mitchell <wence@gmx.li>

vyasr

First pass was pretty high-level to reacquaint myself with the concepts here. My main open question is around the separation between ExposureTrackedBufferOwner and BufferOwner and whether they should maybe be unified if we have to maintain the exposure properties at the BufferOwner level for the two to satisfy the LSP. Maybe we only want to distinguish at the Buffer, not BufferOwner, level (I think the SpillableBufferOwner should stay separate though).

python/cudf/cudf/core/buffer/buffer.py

python/cudf/cudf/core/abc.py

python/cudf/cudf/core/buffer/buffer.py

python/cudf/cudf/core/buffer/exposure_tracked_buffer.py

madsbk · 2023-11-30T07:48:09Z

> First pass was pretty high-level to reacquaint myself with the concepts here. My main open question is around the separation between ExposureTrackedBufferOwner and BufferOwner and whether they should maybe be unified if we have to maintain the exposure properties at the BufferOwner level for the two to satisfy the LSP. Maybe we only want to distinguish at the Buffer, not BufferOwner, level (I think the SpillableBufferOwner should stay separate though).

Agree, I think merging ExposureTrackedBufferOwner and BufferOwner is a good idea.
@wence-, I think you had a similar idea?

python/cudf/cudf/core/buffer/buffer.py


madsbk · 2023-12-14T10:08:42Z

/ok to test

vyasr

This is very close! Thanks for all the iterations.

python/cudf/cudf/core/buffer/buffer.py

vyasr · 2024-01-11T21:58:28Z

python/cudf/cudf/core/buffer/exposure_tracked_buffer.py

-            base=self._base, offset=offset + self._offset, size=size
-        )
+    @property
+    def exposed(self) -> bool:


It's a little odd that all BufferOwner objects know their exposed status, but only a subclass of Buffer does. However, I think I'm OK with that for now. In the next PR when more of these types get unified further we can see exactly what separation of concerns makes the most sense between the final set of classes.
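
The asymmetry described here can be sketched as follows. This is an illustrative toy, not the code under review: `mark_exposed` is a hypothetical name, and the only point is that the exposure flag lives on the owner while only the tracked Buffer subclass surfaces it.

```python
class BufferOwner:
    """Every owner records whether its memory has been exposed."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._exposed = False

    @property
    def exposed(self) -> bool:
        return self._exposed

    def mark_exposed(self) -> None:
        # Hypothetical name: called once a raw pointer has been handed out.
        self._exposed = True


class Buffer:
    """The plain view does not surface exposure at all."""

    def __init__(self, owner: BufferOwner) -> None:
        self._owner = owner


class ExposureTrackedBuffer(Buffer):
    """Only this subclass forwards the owner's exposure state."""

    @property
    def exposed(self) -> bool:
        return self._owner.exposed


owner = BufferOwner(b"abc")
a = ExposureTrackedBuffer(owner)
b = ExposureTrackedBuffer(owner)
plain = Buffer(owner)

owner.mark_exposed()
```

Because the flag is stored once on the shared owner, every tracked view of that owner observes the same state.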

vyasr · 2024-01-11T22:02:17Z

python/cudf/cudf/core/buffer/spillable_buffer.py

+        The sound solution is to modify Dask et al. so that they access the
+        frames through `.get_ptr()` and holds on to the `spill_lock` until
+        the frame has been transferred. However, until this adaptation we
+        use a hack where the frame is a `Buffer` with a `spill_lock` as the
+        owner, which makes `self` unspillable while the frame is alive but
+        doesn't expose `self` when `__cuda_array_interface__` is accessed.


We should reassess this evaluation after the next PR unifying COW and spilling. I don't think I agree with this statement: it implies leaking knowledge of cudf buffer internals to dask. Once we've finished the unification we should revisit whether there's a more API-friendly way of doing this. If not, we need to think about the appropriate generalization of our exposure semantics to generic CAI usage.

wence- · Jan 12, 2024

FWIW, the approach the code in this PR uses is (morally) the way PyTorch ensures the lifetime of a CAI-supporting object that is being turned into a torch tensor.
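
The lifetime trick discussed in this thread can be sketched in plain Python. This is a toy model under stated assumptions, not cuDF's or PyTorch's code: `SpillLock`, `SpillableOwner`, `Frame`, and `get_frame` are hypothetical names, and a `weakref.WeakSet` stands in for however the real owner tracks live locks.

```python
import gc
import weakref


class SpillLock:
    """Token: while any registered instance is alive, the owner stays unspillable."""


class SpillableOwner:
    def __init__(self, data: bytes) -> None:
        self._data = data
        # Weak references only: the owner must not keep the locks alive itself.
        self._locks = weakref.WeakSet()

    @property
    def spillable(self) -> bool:
        return len(self._locks) == 0

    def get_frame(self, spill_lock: SpillLock) -> "Frame":
        self._locks.add(spill_lock)
        # The frame's "owner" is the lock, so the lock lives as long as the frame.
        return Frame(owner=spill_lock, data=memoryview(self._data))


class Frame:
    """Stand-in for the object handed to a consumer such as a serializer."""

    def __init__(self, owner: SpillLock, data: memoryview) -> None:
        self._owner = owner  # strong reference keeps the lock alive
        self.data = data


owner = SpillableOwner(b"abc")
frame = owner.get_frame(SpillLock())
locked = owner.spillable  # the frame is alive, so spilling is blocked

del frame      # dropping the frame drops the last reference to the lock
gc.collect()   # not required under CPython refcounting; added for other runtimes
unlocked = owner.spillable
```

The consumer never needs to know about the lock: holding the frame is enough to block spilling, and releasing the frame is enough to allow it again.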

python/cudf/cudf/core/buffer/utils.py


Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

wence- · 2024-01-12T09:17:39Z

/ok to test

wence-

I think I am happy with this! 🎉

madsbk · 2024-01-12T10:12:33Z

/ok to test

madsbk · 2024-01-12T13:13:32Z

/ok to test

madsbk · 2024-01-15T08:32:51Z

Thanks for the reviews. All tests marked spilling pass for me locally, so let's merge this PR:

CUDF_SPILL=on CUDF_SPILL_DEVICE_LIMIT=1 py.test -m spilling python/cudf/cudf/tests/

madsbk · 2024-01-15T08:33:06Z

/merge

madsbk added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Aug 2, 2023

github-actions bot added the Python (Affects Python cuDF API) label Aug 2, 2023

madsbk force-pushed the buffer_owner branch from ba3f507 to 9d04a0a August 3, 2023 08:36

madsbk added 7 commits August 3, 2023 10:54

Introduce and use BufferOwner

4273e04

rename ExposureTrackedBuffer => ExposureTrackedBufferOwner

6201b6a

rename SpillableBuffer => SpillableBufferOwner

06ab42e

fix test_buffer_creation_from_any

fa076ef

doc

b37425c

clean up inits

d6fc8b4

unify as_spillable_buffer, as_exposure_tracked_buffer, and as_buffer

3b49970

madsbk force-pushed the buffer_owner branch from 9d04a0a to 3b49970 August 3, 2023 08:54

madsbk marked this pull request as ready for review August 3, 2023 11:33

madsbk requested a review from a team as a code owner August 3, 2023 11:33

madsbk requested review from vyasr and mroeschke August 3, 2023 11:33

Merge branch 'branch-23.10' into buffer_owner

25603ef

madsbk changed the base branch from branch-23.10 to branch-23.12 November 3, 2023 10:19

Merge branch 'branch-23.12' into buffer_owner

a9c38d9

wence- requested changes Nov 23, 2023


Apply suggestions from code review

3196670

Co-authored-by: Lawrence Mitchell <wence@gmx.li>

madsbk changed the base branch from branch-23.12 to branch-24.02 November 23, 2023 15:20

madsbk and others added 2 commits November 23, 2023 16:20

Merge branch 'branch-24.02' into buffer_owner

de0cc61

Update python/cudf/cudf/core/buffer/utils.py

1f82e1b

Co-authored-by: Lawrence Mitchell <wence@gmx.li>

madsbk and others added 3 commits November 24, 2023 10:19

fix errror check

d22370c

clean up __str__ and __repr__

3053e3a

Update python/cudf/cudf/core/buffer/buffer.py

63382d9

Co-authored-by: Lawrence Mitchell <wence@gmx.li>

vyasr reviewed Nov 30, 2023

shwina reviewed Dec 12, 2023

python/cudf/cudf/core/buffer/buffer.py (outdated, resolved)

shwina reviewed Dec 12, 2023

python/cudf/cudf/core/buffer/buffer.py (resolved)

shwina reviewed Dec 12, 2023

python/cudf/cudf/core/buffer/buffer.py (resolved)

madsbk added 4 commits December 13, 2023 08:41

doc

c423812

Merge branch 'branch-24.02' of github.com:rapidsai/cudf into buffer_o…

2b70e7a

…wner

merge ExposureTrackedBufferOwner and BufferOwner

d81cd3b

spill serialize now returns a Buffer

89b45fd

madsbk requested a review from vyasr January 11, 2024 07:23

vyasr requested changes Jan 11, 2024


madsbk and others added 3 commits January 12, 2024 08:19

Merge branch 'branch-24.02' of github.com:rapidsai/cudf into buffer_o…

a428f15

…wner

doc

5ffba0c

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

Merge branch 'buffer_owner' of github.com:madsbk/cudf into buffer_owner

1606ccf

Buffer.memoryview(): removed the size and offset arguments

c7d6378

wence- approved these changes Jan 12, 2024


madsbk added 2 commits January 12, 2024 11:09

Merge branch 'branch-24.02' into buffer_owner

d1cb966

copyrights

a00a12b

moved get_buffer_owner to utils.py

6a9597a

madsbk requested a review from vyasr January 12, 2024 12:18

vyasr approved these changes Jan 12, 2024


rapids-bot bot merged commit 0710335 into rapidsai:branch-24.02 Jan 15, 2024
67 checks passed

madsbk deleted the buffer_owner branch January 15, 2024 08:34

vyasr mentioned this pull request Apr 10, 2024

Unify Copy-On-Write and Spilling #15436

Merged

3 tasks
