
Add frombytes to convert bytes-like to DeviceBuffer #253

Merged: 14 commits into rapidsai:branch-0.13 on Jan 27, 2020

Conversation

@jakirkham (Member) commented Jan 25, 2020

Makes it easier to build a DeviceBuffer object from an existing bytes-like object in pure Python code.

@jakirkham jakirkham requested a review from a team as a code owner January 25, 2020 23:59
@jakirkham jakirkham force-pushed the add_devbuf_frombytes branch 27 times, most recently from 7d043ba to 1da0aa3 on January 26, 2020 21:09
@jakirkham jakirkham changed the title Add frombytes to convert bytes to DeviceBuffer Add frombytes to convert bytes-like to DeviceBuffer Jan 26, 2020
Instead of strictly requiring that `frombytes` take a `bytes` object, allow
other `bytes`-like objects as well. This continues to support `bytes`,
but could allow things like `bytearray`s, `memoryview`s, NumPy arrays,
etc. Further, all of these are supported without needing to copy the
underlying data.
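
For illustration, a minimal sketch of how the new method can be called from Python. The inputs shown are just examples of bytes-like objects; NumPy is assumed to be installed.

import numpy as np
import rmm

# Any 1-D, contiguous buffer of unsigned char works without an extra host-side copy.
db1 = rmm.DeviceBuffer.frombytes(b"abc")                          # immutable bytes
db2 = rmm.DeviceBuffer.frombytes(bytearray(b"abc"))               # mutable bytearray
db3 = rmm.DeviceBuffer.frombytes(memoryview(b"abc"))              # memoryview
db4 = rmm.DeviceBuffer.frombytes(np.full(3, 97, dtype=np.uint8))  # NumPy array

print(db1.size)  # 3 (the bytes now reside in device memory)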
python/rmm/_lib/device_buffer.pyx
@@ -57,17 +59,36 @@ cdef class DeviceBuffer:
buf.c_obj = move(ptr)
return buf

@staticmethod
@cython.boundscheck(False)
cdef DeviceBuffer c_frombytes(const unsigned char[::1] b,
Contributor:
Does this handle memoryview objects in general? If so, it would be nice if we could have an overloaded non-const version to handle things like bytearray or other non-read-only objects.

Member Author (@jakirkham):

It handles memoryview objects as long as they are 1-D, contiguous, and of type unsigned char.

The const just means that we won't modify the underlying buffer (much like if we took const char* and were given char*). So any mutable object meeting the constraints above could be passed in as well. This just promises we won't modify the underlying data. IOW bytearray is permissible and we test for this below.

As to other memory types and layouts that don't meet these requirements, users could coerce them into something that does (with a minimal amount of copying to ensure contiguous data). I decided not to do that for the user, as I wanted them to know we are not preserving the type or layout of their data. They could, however, take the resulting DeviceBuffer and layer this information back on top by constructing some other array (like a CuPy or Numba array) and adding this info to that object.
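
A hedged sketch of that workflow: the contiguous coercion and the CuPy wrapping are user code, not part of this PR, and this assumes a CuPy build that consumes __cuda_array_interface__.

import numpy as np
import cupy
import rmm

# A host array that is neither uint8 nor contiguous...
a = np.arange(20, dtype=np.float64)[::2]

# ...is coerced by the user (one small host copy) into a contiguous unsigned-char view.
contig = np.ascontiguousarray(a)
db = rmm.DeviceBuffer.frombytes(contig.view(np.uint8))

# The original dtype/shape can then be layered back on top of the device memory.
d_bytes = cupy.asarray(db)                    # 1-D uint8 view of the DeviceBuffer
d_a = d_bytes.view(a.dtype).reshape(a.shape)  # same data, original type and layout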

Contributor:

Agreed with not implicitly making things contiguous at this level. I'm surprised that a bytearray is allowed for a const unsigned char[::1]. We previously ran into issues with using const vs. non-const when handling NumPy arrays that were marked read-only, which is what prompted me to question an overload.

Member Author (@jakirkham):

Ok cool.

Yeah, I recall some cases where Cython didn't work as well with const memoryviews. Issue ( cython/cython#1772 ) comes to mind (though that's with fused types). I'm not sure whether there were issues with const memoryviews on a specific type. Was that what you were thinking of?

Member Author (@jakirkham):

Note: After discussion offline, we decided to add one more test using a read-only NumPy array, which has been included in commit ( 2234e5a ).
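
For context, a minimal sketch of what such a test might look like (the actual test in 2234e5a may differ; DeviceBuffer.tobytes() is assumed here for the round trip):

import numpy as np
import rmm

def test_frombytes_readonly_numpy_array():
    a = np.arange(10, dtype=np.uint8)
    a.setflags(write=False)             # mark the host array read-only
    db = rmm.DeviceBuffer.frombytes(a)  # the const memoryview accepts it
    assert db.size == a.nbytes
    result = np.frombuffer(db.tobytes(), dtype=np.uint8)
    np.testing.assert_array_equal(result, a)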

@kkraus14 kkraus14 added the 2 - In Progress Currently a work in progress label Jan 27, 2020
@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 2 - In Progress Currently a work in progress labels Jan 27, 2020
@kkraus14 kkraus14 merged commit 71ffd0d into rapidsai:branch-0.13 Jan 27, 2020
@jakirkham jakirkham deleted the add_devbuf_frombytes branch January 27, 2020 17:54
@jakirkham (Member Author):

Here's a rough benchmark.

In [1]: import numpy                                                            

In [2]: import numba.cuda                                                       

In [3]: import rmm                                                              

In [4]: rmm.reinitialize(pool_allocator=True, 
   ...:                  initial_pool_size=int(2**30))                          
Out[4]: 0

In [5]: hb = numpy.asarray(memoryview(50_000_000 * b"a"))                       

In [6]: %timeit numba.cuda.to_device(hb)                                        
19.6 ms ± 175 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: %timeit rmm.DeviceBuffer.frombytes(hb)                                  
4.69 ms ± 33.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [8]: %timeit numba.cuda.as_cuda_array(rmm.DeviceBuffer.frombytes(hb))        
5.22 ms ± 234 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

IOW one gets a ~4x improvement using frombytes. The cost of converting to a Numba array afterwards is also pretty small (if that is needed for some applications).

@jakirkham (Member Author):

I should add that we may be able to speed this up further by using hugepages.

@kkraus14 (Contributor):

~5 ms still seems really high from my perspective; have you confirmed via a Python profile that there's no silly Python overhead?

@jakirkham (Member Author) commented Jan 29, 2020:

That's ~10 GB/s. What would you expect instead?

Edit: This is from host memory to device memory on a DGX-1.

@jakirkham (Member Author):

Reading the DGX-1's specs, they state, "Pascal also supports 16 lanes of PCIe 3.0. In DGX-1, these are used for connecting between the CPUs and GPUs." According to Wikipedia, that should give us a throughput of 15.75 GB/s. So it's possible to get ~1.5x faster, but it doesn't seem like we are way slower. Please feel free to correct me.
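
For reference, the back-of-the-envelope arithmetic behind those figures:

nbytes = 50_000_000                 # size of hb in the benchmark above
seconds = 4.69e-3                   # %timeit result for DeviceBuffer.frombytes
measured = nbytes / seconds / 1e9   # ~10.7 GB/s achieved
pcie3_x16_peak = 15.75              # GB/s, theoretical PCIe 3.0 x16 (per Wikipedia)
print(measured, pcie3_x16_peak / measured)  # ~10.7 GB/s, ~1.5x theoretical headroom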

@kkraus14 (Contributor):

According to Wikipedia, this should give us a throughput of 15.75 GB/s. So it's possible to get 1.5x faster, but it doesn't seem like we are way slower. Please feel free to correct me.

I brainfarted and went off by an order of magnitude 😄. In practice that tends to go more towards ~12 GB/s, so we're pretty close here, which looks good.

@jakirkham (Member Author):

No worries. It was good to work through the numbers. 🙂

Am curious how we might get closer to 12 GB/s (or even more) if possible. Do you have any ideas? 😉

@kkraus14 (Contributor):

No worries. It was good to work through the numbers. 🙂

Am curious how we might get closer to 12 GB/s (or even more) if possible. Do you have any ideas? 😉

I'm guessing that would involve lower-level things like pinned memory, using gdrcopy, etc.
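
As a hedged illustration of the pinned-memory idea (using Numba's pinned allocator; whether frombytes actually takes the faster copy path from pinned memory is not verified here, so treat this only as a sketch):

import numpy as np
import numba.cuda
import rmm

hb = np.frombuffer(50_000_000 * b"a", dtype=np.uint8)

# Page-locked (pinned) host staging buffer; host-to-device copies from pinned
# memory can run closer to the PCIe peak than copies from pageable memory.
pinned = numba.cuda.pinned_array(hb.shape, dtype=hb.dtype)
pinned[:] = hb

db = rmm.DeviceBuffer.frombytes(pinned)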
