Store xnd objects in plasma #4

saulshanabrook · 2018-03-05T15:22:32Z

We should be able to store xnd objects in the Plasma store, similar to how you can store arrow memory. In the blog post introducing Apache Arrow's usage with Plasma, they outline the motivation for the store. Similar to arrow, it would be beneficial to store xnd objects, so that multiple processes on on one computer could all access them without copying or serialization.

saulshanabrook · 2018-03-05T17:25:02Z

Originally I thought I could store an xnd object in Plasma without copying, but I no longer believe this to be the case. The create command returns a new memory buffer that Plasma creates and hands you a pointer to. It doesn't seem to support taking ownership over an existing piece of memory.

This does make sense, since Plasma stores immutable data.

teoliphant · 2018-03-05T17:54:07Z

I think the idea is that you create an object in plasma, then seal it. Plasma becomes the "malloc". It is the owner of the shared memory. This makes sense. 1) We need to make sure xnd can take a reference to other memory and 2) We need to make sure that plasma can accept ndtypes descriptions for the memory

…

-Travis

--- Travis Oliphant Quansight

On Mon, Mar 5, 2018 at 11:25 AM, Saul Shanabrook ***@***.***> wrote: Originally I thought I could store an xnd object in Plasma without copying, but I no longer believe this to be the case. The create command returns a new memory buffer that Plasma creates and hands you a pointer to. It doesn't seem to support taking ownership over an existing piece of memory. This does make sense, since Plasma stores immutable data. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#4 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAPjoL45TQuxr8rAWByFsriZPnRU2ubTks5tbXTvgaJpZM4Scd2O> .

saulshanabrook · 2018-03-05T18:17:28Z

Yeah that makes sense.

So do we have to be able to serialize the XndObject to that contiguous buffer?

typedef struct {
    PyObject_HEAD
    MemoryBlockObject *mblock; /* owner of the primary type and memory block */
    PyObject *type;            /* owner of the current type */
    xnd_t xnd;                 /* typed view, does not own anything */
} XndObject;

In trying to get a sense of exactly where the memory is stored, is it possible to inspect these fields
in a Python REPL on an an xnd instance?

skrah · 2018-03-05T18:26:01Z

On Mon, Mar 05, 2018 at 06:18:54PM +0000, Saul Shanabrook wrote: So do we have to be able to serialize the `XndObject` to that contiguous buffer?

As I understand this https://arrow.apache.org/docs/python/plasma.html#putting-and-getting-python-objects "This works with all Python objects supported by the Arrow Python object serialization.", we'd need Arrow serialization. I'm not familiar with plasma though.

teoliphant · 2018-03-05T18:52:21Z

We don't want to serialize anything (except the description of the buffer). We want to 1) allocate memory in plasma and ensure that the ndtypes description is also stored. 2) get a handle the to the buffer and place it in the xnd container 3) get the ndtypes description of the buffer and ensure the xnd container has the same one.

…

-Travis

--- Travis Oliphant Quansight

On Mon, Mar 5, 2018 at 12:18 PM, Saul Shanabrook ***@***.***> wrote: Yeah that makes sense. So do we have to be able to serialize the XndObject to that contiguous buffer? typedef struct { PyObject_HEAD MemoryBlockObject *mblock; /* owner of the primary type and memory block */ PyObject *type; /* owner of the current type */ xnd_t xnd; /* typed view, does not own anything */ } XndObject; In trying to get a sense of exactly where the memory is stored, is it possible to inspect these fields in a Python REPL on an an xnd instance? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#4 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAPjoD4qwbJ2jUel_gPf4f2m5RuHOUykks5tbYE4gaJpZM4Scd2O> .

saulshanabrook · 2018-03-05T22:11:12Z

So would the desired user flow be something like:

Create plasma client connected to store
Tell xnd to use this client
Create some containers with xnd
Retrieve their plasma ids
Start another process
Create plasma client again
Get access to xnd containers using plasma ids

Right now, memory allocation in xnd uses the ndt_aligned_calloc function. This happens when creating a new xnd container, when creating a Ref, and when creating bytes.

If we could get a pointer back from Plasma's create function, then we could use that instead of ndt_aligned_calloc. If we are storying bytes or references, then this means storing multiple Plasma objects.

Is it right to assume that different plasma clients could have different pointer addresses for the same plasma object? Because we are using virtual memory?

If this is the case, then it might not be good to store the pointer for a Ref directly in the xnd data, but instead store the plasma ID instead (same with bytes).

saulshanabrook · 2018-03-05T22:44:20Z

I opened an issue asking if it was possible to get a pointer back from the plasma store, to write to apache/arrow#1706

skrah · 2018-03-06T12:19:57Z

Here is my current take on it.

Option 1:

I would not like to replace existing allocation functions. Instead, I'd like to proceed cautiously and create a separate pxnd (p for parallel) object.

We would only support pointer-free types. The pxnd object would have a datashape Python string object and a plasma object id and could therefore be pickled.

Creating this object would allocate a data block in plasma.

I assume that the picklable pxndobject would be sent via a pipe to consumer processes.

In such a process, the consumer would call something like pxnd.get(), which would use the existing MemoryBlockObject in the same way as the from_buffer function and return a regular xnd object.

That is to say, the memory in the xnd_master buffer is marked as external, but creation of the xnd object largely proceeds in the usual way. This is the same way the existing PEP-3118 imports are handled.

This all hinges on the possibility of getting a raw pointer from the plasma data store.

If Arrow headers are required for this, I'd like to check for these headers in ./configure and make the compilation of the pxnd object conditional on the presence of the headers.

In general, I'd like to keep the pxnd object in a separate section or file and be able to compile and use xnd without plasma support.

Option 2:

Only allow NumPy types and hope that the existing plasma functions take a PEP-3118 buffer.
In that case, we'd need to add PEP-3118 exporting facilities to xnd.

teoliphant · 2018-03-06T13:12:03Z

Option 2 is a non-starter in my mind as we need something that extends NumPy types. The data-shape and xnd should be usable where PEP-3118 is not. A variant for Option 1 is fine in my mind too. Plasma *must* be an optional part. It is only an important use-case and not something xnd should rely on. Because XND can accept foreign buffers as it's memory, this should be straightforward. You just have another module for interfacing with plasma. Introducing ndtypes to plasma is a good opportunity to see how ndtypes and Arrow interact.

…

--- Travis Oliphant Quansight

On Tue, Mar 6, 2018 at 6:19 AM, Stefan Krah ***@***.***> wrote: Here is my *current* take on it. Option 1: I would not like to replace existing allocation functions. Instead, I'd like to proceed cautiously and create a separate pxnd (p for parallel) object. We would only support pointer-free types. The pxnd object would have a datashape Python string object and a plasma object id and could therefore be pickled. Creating this object would allocate a data block in plasma. I assume that the picklable `pxnd``object would be sent via a pipe to consumer processes. In such a process, the consumer would call something like pxnd.get(), which would use the existing MemoryBlockObject in the same way as the from_buffer function and return a regular xnd object. That is to say, the memory in the xnd_master buffer is marked as external, but creation of the xnd object largely proceeds in the usual way. This is the same way the existing PEP-3118 imports are handled. This all hinges on the possibility of getting a raw pointer from the plasma data store. If Arrow headers are required for this, I'd like to check for these headers in ./configure and make the compilation of the pxnd object conditional on the presence of the headers. In general, I'd like to keep the pxnd object in a separate section or file and be able to compile and use xnd without plasma support. Option 2: Only allow NumPy types and hope that the existing plasma functions take a PEP-3118 buffer. In that case, we'd need to add PEP-3118 exporting facilities to xnd. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#4 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAPjoEbuyKM_mcI1lN7_BkSLhfYfTogDks5tbn7tgaJpZM4Scd2O> .

skrah · 2018-03-06T13:29:01Z

On Tue, Mar 06, 2018 at 01:12:04PM +0000, Travis E. Oliphant wrote: Option 2 is a non-starter in my mind as we need something that extends NumPy types. The data-shape and xnd should be usable where PEP-3118 is not.

Agreed. It's more of a fallback if some unknown problems arise.

saulshanabrook · 2018-03-06T22:01:04Z

The C++ plasma code looks like it can give you back a pointer to the buffer(https://arrow.apache.org/docs/cpp/classplasma_1_1_plasma_client.html#ac18ab9cc792c620a97a3dcb165e0ecd7), but I am not sure if it's possible for use that, since we are using C and they don't export anything to C.

The Cython wrapper might also be helpful, but I also don't think it's possible for us to use that in our C code, since they don't make anything public. I opened a thread on the arrow listserve to see if they have any guidance on using the plasma API from C.

teoliphant · 2018-03-07T08:30:49Z

We may have to make a pull request to add a simple C-export of the buffer. Alternatively the C++ code that binds libxnd to plasma can create that API. We can start there. This should be a separate library from libxnd.

…

-Travis

--- Travis Oliphant Quansight

On Tue, Mar 6, 2018 at 4:01 PM, Saul Shanabrook ***@***.***> wrote: The C++ plasma code looks like it can give you back a pointer to the buffer(https://arrow.apache.org/docs/cpp/classplasma_1_1_ plasma_client.html#ac18ab9cc792c620a97a3dcb165e0ecd7), but I am not sure if it's possible for use that, since we are using C and they don't export anything to C. The Cython wrapper <https://github.com/apache/arrow/blob/master/python/pyarrow/plasma.pyx> might also be helpful, but I also don't think it's possible for us to use that in our C code, since they don't make anything public. I opened a thread <https://lists.apache.org/thread.html/0ba7b568fd29a1250c73103cf7da9e6f5cd54498184ef8a79d49b408@%3Cdev.arrow.apache.org%3E> on the arrow listserve to see if they have any guidance on using the plasma API from C. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#4 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAPjoFW0fo8oRHDjrdot_CGa12pq9A1bks5tbwcggaJpZM4Scd2O> .

wesm · 2018-03-07T16:00:45Z

adding @robertnishihara @pcmoritz @pitrou to go along with the mailing list thread. Plasma buffers implement the buffer protocol so it seems like you should be able to use that

pitrou · 2018-03-07T16:03:54Z

Is xnd able to use foreign buffers as backing memory? If not, is there a reason why? Basically Plasma needs to allocate the memory itself in a certain way, to be able to hand it to its various client processes.

(I'm assuming there may be limitations such as contiguity or restriction to certain well-behaved types)

pitrou · 2018-03-07T16:04:54Z

To be clear, my question is whether xnd supports the equivalent of np.frombuffer() to create xnd objects.

wesm · 2018-03-07T16:08:27Z

There are some comments here about type metadata:

We need to make sure that plasma can accept ndtypes descriptions for the
memory

Plasma is unconcerned with the contents of the memory. So you will need to take responsibility for writing a serialized metadata header to prefix the xnd memory. We are using Flatbuffers for our serialized metadata for Arrow

saulshanabrook · 2018-03-07T16:38:09Z

Is xnd able to use foreign buffers as backing memory?

Yes. So we should be able to call the Plasma Python API from our C code that exposes our own Python API. By accessing that buffer in C, then we can get access to the pointer.

I will try that approach before moving forward on any kind of C wrapper around the Plasma C++ API.

skrah · 2018-03-07T16:43:25Z

On Wed, Mar 07, 2018 at 04:03:56PM +0000, Antoine Pitrou wrote: Is xnd able to use foreign buffers as backing memory?

Yes, there's xnd.from_buffer(), using PEP-3118. xnd has the MemoryBlockObject that manages foreign memory.

teoliphant · 2018-03-07T21:11:44Z

The problem with using from_buffer is that PEP-3118 does not support full ndtypes. What I'd like to see is plasma support for any structure that can be described by libndtypes (i.e. datashape). We will need this eventually. It would be good to get collaborating early if possible.

saulshanabrook · 2018-03-07T21:27:19Z

Does plasma set the format of the the buffer that is returned?

wesm · 2018-03-07T21:31:15Z

No -- the Plasma data is just bytes. You'll need to define a serialized data header for xnd data. The way we do it for Arrow objects is:

4 bytes for metadata size
serialized metadata (we use Flatbuffers)
message body

See https://github.com/apache/arrow/blob/master/format/IPC.md#encapsulated-message-format

saulshanabrook · 2018-03-07T21:51:58Z

No -- the Plasma data is just bytes. You'll need to define a serialized data header for xnd data.

@wesm OK, that makes sense. Just to make sure I understand correctly, the plasma python API provides a PEP 3118 buffer but without the shape, format, strides, ndim, suboffets fields, so it's really just a pointer to the data. So you can write whatever data format you want, and any type information if you want to use that later to read the data. i.e. there isn't anything arrow specific in plasma.

Is xnd able to use foreign buffers as backing memory? If not, is there a reason why? Basically Plasma needs to allocate the memory itself in a certain way, to be able to hand it to its various client processes.

@pitrou Ideally, we would access Plasma with libxnd, which isn't specific to Python. That is more direct than going through the Python API and also other language bindings for libxnd will be able to use that integration.

wesm · 2018-03-08T04:40:52Z

So you can write whatever data format you want, and any type information if you want to use that later to read the data. i.e. there isn't anything arrow specific in plasma.

Yep, that's right. We aren't forcing anyone to use any particular metadata representation. So you're free to define a metadata prefix that uses libndtypes, or anything you like

saulshanabrook · 2018-03-16T05:26:53Z

I am closing this for now, since I moved the plasma integration to it's own repo. https://github.com/plures/pxnd If I hit any roadblocks there that I think need to be addressed on the Arrow side, I will open an issue on that repo and reach out on the mailing list.

wesm · 2018-03-16T17:10:58Z

My two cents: I suspect that a more monorepo-like approach will make your life much easier rather than having to maintain interconnected C APIs amongst various Python packages with C libraries / C extensions. You can also make changes to multiple components at once without causing your CI setup to fall apart (i.e. if you have lots of repos, you may end up with multiple PRs related to a single change, each of which may cause the CI for any of the repos in the stack to break temporarily)

skrah · 2018-03-16T17:24:39Z

This is true. It worked because I was really the only developer up to now. :)

I would like to stabilize gumath a bit -- currently it's sometimes convenient to change ndtypes/xnd and worry later about gumath. But in a couple of months I think we can have something like a monorepo.

skrah · 2018-03-16T17:27:06Z

Also, if we want to add __truediv__ etc. to xnd objects instead of just using libgumath functions like truediv(), we have a circular relationship between xnd/gumath.

pcmoritz · 2018-03-16T17:29:38Z

Hey Stefan, Why don't we move the plasma C bindings into the arrow repo as a first step now? I'm happy to help with setting up the build system. They might also be useful for other people!

…

-- Philipp.

On Fri, Mar 16, 2018 at 10:27 AM, Stefan Krah ***@***.***> wrote: Also, if we want to add __truediv__ etc. to xnd objects instead of just libgumath functions like truediv(), we have a circular relationship between xnd/gumath. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#4 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAG6pMQq7TZtWjBxuXi-1LqUHAOf_fStks5te_XsgaJpZM4Scd2O> .

skrah · 2018-03-16T17:58:21Z

@pcmoritz Do you mean the part that is independent from xnd? I think that is a good idea, it would be very nice to have general plasma C bindings.

pcmoritz · 2018-03-16T18:13:42Z

Yeah, I mean this part here: https://github.com/plures/pxnd/tree/master/ libplasma We can put it here: https://github.com/apache/arrow/tree/master/cpp/src/ plasma (for example as libplasma.h) Feel free to create a PR and ping me if you need help with CMake.

…

On Fri, Mar 16, 2018 at 10:58 AM, Stefan Krah ***@***.***> wrote: @pcmoritz <https://github.com/pcmoritz> Do you mean the part that is independent from xnd? I think that is a good idea, it would be very nice to have general plasma C bindings. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#4 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAG6pN_s6EiH6vgMOKqBRFEVKI_Z0E9Eks5te_1CgaJpZM4Scd2O> .

saulshanabrook · 2018-03-16T20:25:15Z

@wesm Yeah I am finding I might need to make a few changes to xnd as I am developing pxnd, so a monorepo might make sense eventually. It is nice to have the build separate so that xnd doesn't have to depend on Plasma and since authorship is so far split between them.

@pcmoritz Sure, I can do that soon. I am still iterating on them a lot to make sure they work correctly for us. It's my first time writing this sort of code, so I wanna see it working end to end before I believe it's correct. Once they are stable, I am happy to move them out, probably next week or the week after.

saulshanabrook mentioned this issue Mar 13, 2018

WIP: Allow storing xnd in plasma #7

Closed

11 tasks

saulshanabrook closed this as completed Mar 16, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Store xnd objects in plasma #4

Store xnd objects in plasma #4

saulshanabrook commented Mar 5, 2018

saulshanabrook commented Mar 5, 2018

teoliphant commented Mar 5, 2018 via email

saulshanabrook commented Mar 5, 2018 •

edited

Loading

skrah commented Mar 5, 2018 via email

teoliphant commented Mar 5, 2018 via email

saulshanabrook commented Mar 5, 2018

saulshanabrook commented Mar 5, 2018 •

edited

Loading

skrah commented Mar 6, 2018 •

edited

Loading

teoliphant commented Mar 6, 2018 via email

skrah commented Mar 6, 2018 via email

saulshanabrook commented Mar 6, 2018

teoliphant commented Mar 7, 2018 via email

wesm commented Mar 7, 2018 •

edited

Loading

pitrou commented Mar 7, 2018

pitrou commented Mar 7, 2018

wesm commented Mar 7, 2018

saulshanabrook commented Mar 7, 2018 •

edited

Loading

skrah commented Mar 7, 2018 via email

teoliphant commented Mar 7, 2018

saulshanabrook commented Mar 7, 2018

wesm commented Mar 7, 2018

saulshanabrook commented Mar 7, 2018

wesm commented Mar 8, 2018

saulshanabrook commented Mar 16, 2018

wesm commented Mar 16, 2018

skrah commented Mar 16, 2018 •

edited

Loading

skrah commented Mar 16, 2018 •

edited

Loading

pcmoritz commented Mar 16, 2018 via email

skrah commented Mar 16, 2018

pcmoritz commented Mar 16, 2018 via email

saulshanabrook commented Mar 16, 2018 •

edited

Loading

Store xnd objects in plasma #4

Store xnd objects in plasma #4

Comments

saulshanabrook commented Mar 5, 2018

saulshanabrook commented Mar 5, 2018

teoliphant commented Mar 5, 2018 via email

saulshanabrook commented Mar 5, 2018 • edited Loading

skrah commented Mar 5, 2018 via email

teoliphant commented Mar 5, 2018 via email

saulshanabrook commented Mar 5, 2018

saulshanabrook commented Mar 5, 2018 • edited Loading

skrah commented Mar 6, 2018 • edited Loading

teoliphant commented Mar 6, 2018 via email

skrah commented Mar 6, 2018 via email

saulshanabrook commented Mar 6, 2018

teoliphant commented Mar 7, 2018 via email

wesm commented Mar 7, 2018 • edited Loading

pitrou commented Mar 7, 2018

pitrou commented Mar 7, 2018

wesm commented Mar 7, 2018

saulshanabrook commented Mar 7, 2018 • edited Loading

skrah commented Mar 7, 2018 via email

teoliphant commented Mar 7, 2018

saulshanabrook commented Mar 7, 2018

wesm commented Mar 7, 2018

saulshanabrook commented Mar 7, 2018

wesm commented Mar 8, 2018

saulshanabrook commented Mar 16, 2018

wesm commented Mar 16, 2018

skrah commented Mar 16, 2018 • edited Loading

skrah commented Mar 16, 2018 • edited Loading

pcmoritz commented Mar 16, 2018 via email

skrah commented Mar 16, 2018

pcmoritz commented Mar 16, 2018 via email

saulshanabrook commented Mar 16, 2018 • edited Loading

saulshanabrook commented Mar 5, 2018 •

edited

Loading

saulshanabrook commented Mar 5, 2018 •

edited

Loading

skrah commented Mar 6, 2018 •

edited

Loading

wesm commented Mar 7, 2018 •

edited

Loading

saulshanabrook commented Mar 7, 2018 •

edited

Loading

skrah commented Mar 16, 2018 •

edited

Loading

skrah commented Mar 16, 2018 •

edited

Loading

saulshanabrook commented Mar 16, 2018 •

edited

Loading