-
Notifications
You must be signed in to change notification settings - Fork 900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring of Buffers (last step towards unifying COW and Spilling) #13801
Conversation
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly requesting changes for discussion points
Co-authored-by: Lawrence Mitchell <wence@gmx.li>
Co-authored-by: Lawrence Mitchell <wence@gmx.li>
/ok to test |
Co-authored-by: Lawrence Mitchell <wence@gmx.li>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass was pretty high-level to reacquaint myself with the concepts here. My main open question is around the separation between ExposureTrackedBufferOwner and BufferOwner and whether they should maybe be unified if we have to maintain the exposure properties at the BufferOwner level for the two to satisfy the LSP. Maybe we only want to distinguish at the Buffer, not BufferOwner, level (I think the SpillableBufferOwner should stay separate though).
Agree, I think merging |
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very close! Thanks for all the iterations.
base=self._base, offset=offset + self._offset, size=size | ||
) | ||
@property | ||
def exposed(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little odd that all BufferOwner objects know their exposed status, but only a subclass of Buffer does. However, I think I'm OK with that for now. In the next PR when more of these types get unified further we can see exactly what separation of concerns makes the most sense between the final set of classes.
The sound solution is to modify Dask et al. so that they access the | ||
frames through `.get_ptr()` and holds on to the `spill_lock` until | ||
the frame has been transferred. However, until this adaptation we | ||
use a hack where the frame is a `Buffer` with a `spill_lock` as the | ||
owner, which makes `self` unspillable while the frame is alive but | ||
doesn't expose `self` when `__cuda_array_interface__` is accessed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should reassess this evaluation after the next PR unifying COW and spilling. I don't think I agree with this statement, it implies leaking knowledge of cudf buffer internals to dask. Once we've finished the unification we should revisit whether there's a more API-friendly way of doing this. If not, we need to think about the appropriate generalization of our exposure semantics to generic CAI usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this approach (as the code in this PR uses) is (morally) the way pytorch ensures lifetime of a CAI
-supporting object being turned into a torch tensor
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am happy with this! 🎉
/ok to test |
/ok to test |
Thanks for the reviews. All tests marked
|
/merge |
This PR de-couples buffer slices/views from owning buffers. As it is now, all buffer classes (
ExposureTrackedBuffer
,BufferSlice
,SpillableBuffer
,SpillableBufferSlice
) inherent fromBuffer
, however they are not Liskov substitutable as pointed by @wence- and @vyasr (here and here).To fix this, we now have a
Buffer
and aBufferOwner
class. We still use theBuffer
throughout cuDF but it now points to anBufferOwner
.We have the following class hierarchy:
With the following relationship:
Unify COW and Spilling
In a follow-up PR, the spilling buffer classes will inherent from the exposure tracked buffer classes so we get the following hierarchy: