Use limited API-compatible APIs in arr#600
Conversation
pentschev
left a comment
There was a problem hiding this comment.
Thanks @vyasr for working on this, I'm very much supportive of those changes if we can have the tests fixed and clarify why the changes to the benchmark were made, all those changes at first glance seem undesirable to me and possibly indicative of a regression in Array support. If we can have all tests passing, hopefully without requiring the changes to benchmarks, and can still maintain the performance from the description, I'm +1 on getting these changes in.
| length = ((self.args.n_bytes + 3) // 4) * 4 | ||
| shape = (length // 4, 2, 2) | ||
| if not self.args.enable_am: | ||
| if self.args.reuse_alloc and self.args.n_buffers == 1: | ||
| reuse_msg = Array(xp.zeros(self.args.n_bytes, dtype="u1")) | ||
| reuse_msg = Array(xp.reshape(xp.zeros(length, dtype="u1"), shape)) |
There was a problem hiding this comment.
Why do we need this change with all the cryptic constants and the ultimate reshape? Note we don't do that in other implementations like https://github.com/rapidsai/ucxx/blob/main/python/ucxx/ucxx/benchmarks/backends/ucxx_core.py, and I would very much like preventing comparing apples to oranges in the results.
Sorry, I should have read the description more attentively, I see what's the reason for the shape changes now. If we can get the tests fixed, I'm definitely fine with the performance results which are essentially unchanged. I would like us not to commit the 3D changes to benchmark, or make it configurable, but if you'd like to make that configurable we should do it to other backends as well, which I think it's not too much worth the trouble. |
|
The failing test is IMHO a bit too strong. It's checking whether the obj under the Array is the same as the obj under the memoryview. The failure is occurring because arr = Array(buffer)
mv = memoryview(obj=buffer)
test_obj = arr.obj
# General solution
while(isinstance(test_obj, memoryview)):
test_obj = test_obj.obj
# Or, more specifically
test_obj = arr.obj.obj
# Now this will pass
assert obj is mv.objIMO changing the test is the right answer since we shouldn't promise that we don't wind up with the object chain happening i.e. However, if we feel strongly otherwise, it's straightforward enough to fix this in the code with the following diff ❯ git diff
diff --git a/python/ucxx/ucxx/_lib/arr.pyx b/python/ucxx/ucxx/_lib/arr.pyx
index 6bf0d00..e3411a5 100644
--- a/python/ucxx/ucxx/_lib/arr.pyx
+++ b/python/ucxx/ucxx/_lib/arr.pyx
@@ -142,6 +142,8 @@ cdef class Array:
self.ptr = <uintptr_t>pybuf.buf
self.obj = pybuf.obj
+ while isinstance(self.obj, memoryview):
+ self.obj = (<memoryview>self.obj).obj
self.readonly = <bint>pybuf.readonly
self.ndim = <Py_ssize_t>pybuf.ndim
self.itemsize = <Py_ssize_t>pybuf.itemsize@pentschev what do you think? |
|
Thanks for investigating @vyasr . I'm in agreement with your assessment and I'm fine with your proposal of adjusting the test alone. I think the original implementation was done by @jakirkham so I'm pinging here in case he has strong opinions. |
This reverts commit 5904f16.
|
Great. I've updated the test and reverted the benchmark changes. |
pentschev
left a comment
There was a problem hiding this comment.
LGTM, thanks Vyas! I'll leave this open until EOD in case @jakirkham has comments, otherwise we're good to go.
|
/merge |
| while isinstance(obj, memoryview): | ||
| obj = obj.obj |
There was a problem hiding this comment.
memoryviews are actually pretty clever in that they avoid multiple nesting
They do this by making all memoryviews views that share an underlying buffer
We can see this with a simple example
In [1]: b = b"abc"
In [2]: mv1 = memoryview(b)
In [3]: mv2 = memoryview(mv1)
In [4]: mv1.obj is b
Out[4]: True
In [5]: mv2.obj is b
Out[5]: TrueSo I don't think this while is needed. As it will resolve in the first loop. Though it isn't harming anything (particularly in a test)
There was a problem hiding this comment.
You're right that we don't really need a while loop, we just need one level of extraction (see the explanation at #600 (comment)). The top level of extraction is sufficient.
|
It looks like the Conda build is failing on |
Yes, see rapidsai/ci-imgs#376 |
|
Great thanks Vyas! 🙏 After restarting, CI passed So this change should now be available in packages |
xref rapidsai/build-planning#42 Changes made: - picked up the changes in #600 so we are now limited API compatible - split `libucxx` and `ucxx` `rattler-build` recipes into separate folder - needed to do this because we have different matrix filters for the two build-types and we need/want to build `cp311` packages using Python 3.11 Ops-Bot-Merge-Barrier: true Authors: - Gil Forsyth (https://github.com/gforsyth) Approvers: - Bradley Dice (https://github.com/bdice) URL: #564
This PR isolates the changes to only use symbols that are part of the limited API from #564. The second commit includes the test changes that I used to benchmark, which should be reverted
I ran the
send_recvbenchmark (ucxx-asyncbackend, TAG transfer API, NumPy objects, 1 B payload, 100,000 iterations, TCP transport) across 8 configurations:I was looking to see if the limited API changes meaningfully affected performance, particularly in the 3D case since the changes were around the array's allocation of shapes, which are more relevant for higher dimensional arrays. What I found was that there are no meaningful performance differences across any of the 8 configurations. Bandwidth and latency numbers are consistent within normal run-to-run variance across all combinations of progress mode, code version, and array dimensionality. In particular, the limited API changes show no measurable overhead on either linear or multi-dimensional arrays. All the differences seem to be within the margin of noise.
Raw Results
blocking · original · linear array
blocking · original · nd array
blocking · limited API · linear array
blocking · limited API · nd array
polling · original · linear array
polling · original · nd array
polling · limited API · linear array
polling · limited API · nd array