-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store xnd objects in plasma #4
Comments
Originally I thought I could store an This does make sense, since Plasma stores immutable data. |
I think the idea is that you create an object in plasma, then seal it.
Plasma becomes the "malloc". It is the owner of the shared memory.
This makes sense.
1) We need to make sure xnd can take a reference to other memory and
2) We need to make sure that plasma can accept ndtypes descriptions for the
memory
…-Travis
---
Travis Oliphant
Quansight
On Mon, Mar 5, 2018 at 11:25 AM, Saul Shanabrook ***@***.***> wrote:
Originally I thought I could store an xnd object in Plasma without
copying, but I no longer believe this to be the case. The create command
returns a new memory buffer that Plasma creates and hands you a pointer to.
It doesn't seem to support taking ownership over an existing piece of
memory.
This does make sense, since Plasma stores immutable data.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#4 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAPjoL45TQuxr8rAWByFsriZPnRU2ubTks5tbXTvgaJpZM4Scd2O>
.
|
Yeah that makes sense. So do we have to be able to serialize the typedef struct {
PyObject_HEAD
MemoryBlockObject *mblock; /* owner of the primary type and memory block */
PyObject *type; /* owner of the current type */
xnd_t xnd; /* typed view, does not own anything */
} XndObject; In trying to get a sense of exactly where the memory is stored, is it possible to inspect these fields |
On Mon, Mar 05, 2018 at 06:18:54PM +0000, Saul Shanabrook wrote:
So do we have to be able to serialize the `XndObject` to that contiguous buffer?
As I understand this https://arrow.apache.org/docs/python/plasma.html#putting-and-getting-python-objects
"This works with all Python objects supported by the Arrow Python object serialization.",
we'd need Arrow serialization. I'm not familiar with plasma though.
|
We don't want to serialize anything (except the description of the
buffer). We want to
1) allocate memory in plasma and ensure that the ndtypes description is
also stored.
2) get a handle the to the buffer and place it in the xnd container
3) get the ndtypes description of the buffer and ensure the xnd container
has the same one.
…-Travis
---
Travis Oliphant
Quansight
On Mon, Mar 5, 2018 at 12:18 PM, Saul Shanabrook ***@***.***> wrote:
Yeah that makes sense.
So do we have to be able to serialize the XndObject to that contiguous
buffer?
typedef struct {
PyObject_HEAD
MemoryBlockObject *mblock; /* owner of the primary type and memory block */
PyObject *type; /* owner of the current type */
xnd_t xnd; /* typed view, does not own anything */
} XndObject;
In trying to get a sense of exactly where the memory is stored, is it
possible to inspect these fields
in a Python REPL on an an xnd instance?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#4 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAPjoD4qwbJ2jUel_gPf4f2m5RuHOUykks5tbYE4gaJpZM4Scd2O>
.
|
So would the desired user flow be something like:
Right now, memory allocation in xnd uses the If we could get a pointer back from Plasma's create function, then we could use that instead of Is it right to assume that different plasma clients could have different pointer addresses for the same plasma object? Because we are using virtual memory? If this is the case, then it might not be good to store the pointer for a |
I opened an issue asking if it was possible to get a pointer back from the plasma store, to write to apache/arrow#1706 |
Here is my current take on it. Option 1: I would not like to replace existing allocation functions. Instead, I'd like to proceed cautiously and create a separate We would only support pointer-free types. The Creating this object would allocate a data block in plasma. I assume that the picklable In such a process, the consumer would call something like That is to say, the memory in the This all hinges on the possibility of getting a raw pointer from the plasma data store. If Arrow headers are required for this, I'd like to check for these headers in In general, I'd like to keep the Option 2: Only allow NumPy types and hope that the existing plasma functions take a PEP-3118 buffer. |
Option 2 is a non-starter in my mind as we need something that extends
NumPy types. The data-shape and xnd should be usable where PEP-3118 is not.
A variant for Option 1 is fine in my mind too. Plasma *must* be an
optional part. It is only an important use-case and not something xnd
should rely on.
Because XND can accept foreign buffers as it's memory, this should be
straightforward. You just have another module for interfacing with
plasma. Introducing ndtypes to plasma is a good opportunity to see how
ndtypes and Arrow interact.
…---
Travis Oliphant
Quansight
On Tue, Mar 6, 2018 at 6:19 AM, Stefan Krah ***@***.***> wrote:
Here is my *current* take on it.
Option 1:
I would not like to replace existing allocation functions. Instead, I'd
like to proceed cautiously and create a separate pxnd (p for parallel)
object.
We would only support pointer-free types. The pxnd object would have a
datashape Python string object and a plasma object id and could therefore
be pickled.
Creating this object would allocate a data block in plasma.
I assume that the picklable `pxnd``object would be sent via a pipe to
consumer processes.
In such a process, the consumer would call something like pxnd.get(),
which would use the existing MemoryBlockObject in the same way as the
from_buffer function and return a regular xnd object.
That is to say, the memory in the xnd_master buffer is marked as
external, but creation of the xnd object largely proceeds in the usual
way. This is the same way the existing PEP-3118 imports are handled.
This all hinges on the possibility of getting a raw pointer from the
plasma data store.
If Arrow headers are required for this, I'd like to check for these
headers in ./configure and make the compilation of the pxnd object
conditional on the presence of the headers.
In general, I'd like to keep the pxnd object in a separate section or
file and be able to compile and use xnd without plasma support.
Option 2:
Only allow NumPy types and hope that the existing plasma functions take a
PEP-3118 buffer.
In that case, we'd need to add PEP-3118 exporting facilities to xnd.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#4 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAPjoEbuyKM_mcI1lN7_BkSLhfYfTogDks5tbn7tgaJpZM4Scd2O>
.
|
On Tue, Mar 06, 2018 at 01:12:04PM +0000, Travis E. Oliphant wrote:
Option 2 is a non-starter in my mind as we need something that extends
NumPy types. The data-shape and xnd should be usable where PEP-3118 is not.
Agreed. It's more of a fallback if some unknown problems arise.
|
The C++ plasma code looks like it can give you back a pointer to the buffer(https://arrow.apache.org/docs/cpp/classplasma_1_1_plasma_client.html#ac18ab9cc792c620a97a3dcb165e0ecd7), but I am not sure if it's possible for use that, since we are using C and they don't export anything to C. The Cython wrapper might also be helpful, but I also don't think it's possible for us to use that in our C code, since they don't make anything public. I opened a thread on the arrow listserve to see if they have any guidance on using the plasma API from C. |
We may have to make a pull request to add a simple C-export of the buffer.
Alternatively the C++ code that binds libxnd to plasma can create that
API. We can start there. This should be a separate library from libxnd.
…-Travis
---
Travis Oliphant
Quansight
On Tue, Mar 6, 2018 at 4:01 PM, Saul Shanabrook ***@***.***> wrote:
The C++ plasma code looks like it can give you back a pointer to the
buffer(https://arrow.apache.org/docs/cpp/classplasma_1_1_
plasma_client.html#ac18ab9cc792c620a97a3dcb165e0ecd7), but I am not sure
if it's possible for use that, since we are using C and they don't export
anything to C.
The Cython wrapper
<https://github.com/apache/arrow/blob/master/python/pyarrow/plasma.pyx>
might also be helpful, but I also don't think it's possible for us to use
that in our C code, since they don't make anything public. I opened a
thread
<https://lists.apache.org/thread.html/0ba7b568fd29a1250c73103cf7da9e6f5cd54498184ef8a79d49b408@%3Cdev.arrow.apache.org%3E>
on the arrow listserve to see if they have any guidance on using the plasma
API from C.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#4 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAPjoFW0fo8oRHDjrdot_CGa12pq9A1bks5tbwcggaJpZM4Scd2O>
.
|
adding @robertnishihara @pcmoritz @pitrou to go along with the mailing list thread. Plasma buffers implement the buffer protocol so it seems like you should be able to use that |
Is xnd able to use foreign buffers as backing memory? If not, is there a reason why? Basically Plasma needs to allocate the memory itself in a certain way, to be able to hand it to its various client processes. (I'm assuming there may be limitations such as contiguity or restriction to certain well-behaved types) |
To be clear, my question is whether xnd supports the equivalent of |
There are some comments here about type metadata:
Plasma is unconcerned with the contents of the memory. So you will need to take responsibility for writing a serialized metadata header to prefix the xnd memory. We are using Flatbuffers for our serialized metadata for Arrow |
Yes. So we should be able to call the Plasma Python API from our C code that exposes our own Python API. By accessing that buffer in C, then we can get access to the pointer. I will try that approach before moving forward on any kind of C wrapper around the Plasma C++ API. |
On Wed, Mar 07, 2018 at 04:03:56PM +0000, Antoine Pitrou wrote:
Is xnd able to use foreign buffers as backing memory?
Yes, there's xnd.from_buffer(), using PEP-3118. xnd has the MemoryBlockObject
that manages foreign memory.
|
The problem with using from_buffer is that PEP-3118 does not support full ndtypes. What I'd like to see is plasma support for any structure that can be described by libndtypes (i.e. datashape). We will need this eventually. It would be good to get collaborating early if possible. |
Does plasma set the |
No -- the Plasma data is just bytes. You'll need to define a serialized data header for xnd data. The way we do it for Arrow objects is:
See https://github.com/apache/arrow/blob/master/format/IPC.md#encapsulated-message-format |
@wesm OK, that makes sense. Just to make sure I understand correctly, the plasma python API provides a PEP 3118 buffer but without the shape, format, strides, ndim, suboffets fields, so it's really just a pointer to the data. So you can write whatever data format you want, and any type information if you want to use that later to read the data. i.e. there isn't anything arrow specific in plasma.
@pitrou Ideally, we would access Plasma with |
Yep, that's right. We aren't forcing anyone to use any particular metadata representation. So you're free to define a metadata prefix that uses libndtypes, or anything you like |
I am closing this for now, since I moved the plasma integration to it's own repo. https://github.com/plures/pxnd If I hit any roadblocks there that I think need to be addressed on the Arrow side, I will open an issue on that repo and reach out on the mailing list. |
My two cents: I suspect that a more monorepo-like approach will make your life much easier rather than having to maintain interconnected C APIs amongst various Python packages with C libraries / C extensions. You can also make changes to multiple components at once without causing your CI setup to fall apart (i.e. if you have lots of repos, you may end up with multiple PRs related to a single change, each of which may cause the CI for any of the repos in the stack to break temporarily) |
This is true. It worked because I was really the only developer up to now. :) I would like to stabilize gumath a bit -- currently it's sometimes convenient to change ndtypes/xnd and worry later about gumath. But in a couple of months I think we can have something like a monorepo. |
Also, if we want to add |
Hey Stefan,
Why don't we move the plasma C bindings into the arrow repo as a first step
now? I'm happy to help with setting up the build system. They might also be
useful for other people!
…-- Philipp.
On Fri, Mar 16, 2018 at 10:27 AM, Stefan Krah ***@***.***> wrote:
Also, if we want to add __truediv__ etc. to xnd objects instead of just
libgumath functions like truediv(), we have a circular relationship
between xnd/gumath.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAG6pMQq7TZtWjBxuXi-1LqUHAOf_fStks5te_XsgaJpZM4Scd2O>
.
|
@pcmoritz Do you mean the part that is independent from xnd? I think that is a good idea, it would be very nice to have general plasma C bindings. |
Yeah, I mean this part here: https://github.com/plures/pxnd/tree/master/
libplasma
We can put it here: https://github.com/apache/arrow/tree/master/cpp/src/
plasma (for example as libplasma.h)
Feel free to create a PR and ping me if you need help with CMake.
…On Fri, Mar 16, 2018 at 10:58 AM, Stefan Krah ***@***.***> wrote:
@pcmoritz <https://github.com/pcmoritz> Do you mean the part that is
independent from xnd? I think that is a good idea, it would be very nice to
have general plasma C bindings.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAG6pN_s6EiH6vgMOKqBRFEVKI_Z0E9Eks5te_1CgaJpZM4Scd2O>
.
|
@wesm Yeah I am finding I might need to make a few changes to @pcmoritz Sure, I can do that soon. I am still iterating on them a lot to make sure they work correctly for us. It's my first time writing this sort of code, so I wanna see it working end to end before I believe it's correct. Once they are stable, I am happy to move them out, probably next week or the week after. |
We should be able to store xnd objects in the Plasma store, similar to how you can store arrow memory. In the blog post introducing Apache Arrow's usage with Plasma, they outline the motivation for the store. Similar to arrow, it would be beneficial to store xnd objects, so that multiple processes on on one computer could all access them without copying or serialization.
The text was updated successfully, but these errors were encountered: