Use CuPy array in `pip_bitmap_column_to_binary_array` #1418

isVoid · 2024-07-30T14:09:55Z

Description

The performance regression in #1413 occurs because slicing numba's DeviceNDArray is slow.
Recent versions of cudf simplified DataFrame construction and now delegate it to the same logic
that handles objects exposing `__cuda_array_interface__`. Because that construction slices the array,
slicing needs to be fast. This PR therefore converts the DeviceNDArray to a CuPy array, which supports
fast slicing.
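
A minimal sketch of the conversion described above, not the code in this PR. The helper name `as_sliceable` is hypothetical. It relies on `cupy.asarray` accepting objects that expose `__cuda_array_interface__` (as numba's DeviceNDArray does), and it falls back to NumPy so the sketch also runs on a machine without a GPU.

```python
import numpy as np


def as_sliceable(arr):
    """Hypothetical helper: wrap device memory in a CuPy array, else use NumPy."""
    if hasattr(arr, "__cuda_array_interface__"):
        # Imported lazily so the sketch still runs where CuPy is not installed.
        import cupy as cp

        # cupy.asarray wraps the existing device buffer instead of copying it.
        return cp.asarray(arr)
    return np.asarray(arr)


# Host-only stand-in for the kernel output (4 rows, width 3).
out = as_sliceable(np.zeros((4, 3), dtype="int8"))

# Column-by-column slicing is the access pattern that needs to be fast.
columns = [out[:, i] for i in range(out.shape[1])]
print(len(columns), columns[0].shape)
```

With a real DeviceNDArray as input, `columns` would hold CuPy views rather than NumPy views; the slicing code itself stays the same.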

closes #1413

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

trxcllnt · 2024-07-30T17:30:39Z

Looks like the test needs to be updated as well:

    def test_pip_bitmap_column_to_binary_array():
        col = cudf.Series([0b00000000, 0b00001101, 0b00000011, 0b00001001])._column
        got = pip_bitmap_column_to_binary_array(col, width=4)
        expected = np.array(
            [[0, 0, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [1, 0, 0, 1]], dtype="int8"
        )
>       np.testing.assert_array_equal(got.copy_to_host(), expected)
E       AttributeError: 'ndarray' object has no attribute 'copy_to_host'
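
One possible way to make such an assertion independent of the returned array type is sketched below. The helper `to_host` is hypothetical and is not necessarily how the test was updated in this PR. It uses `copy_to_host()` for numba device arrays and `get()` for CuPy arrays, and a NumPy array stands in for the function's return value so the sketch runs without a GPU.

```python
import numpy as np


def to_host(arr):
    """Hypothetical test helper: return a NumPy copy of a host or device array."""
    if hasattr(arr, "copy_to_host"):  # numba DeviceNDArray
        return arr.copy_to_host()
    if hasattr(arr, "get"):  # cupy.ndarray
        return arr.get()
    return np.asarray(arr)  # already on the host


expected = np.array(
    [[0, 0, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [1, 0, 0, 1]], dtype="int8"
)

# Stand-in for pip_bitmap_column_to_binary_array(col, width=4); assumption for
# illustration only, so that the comparison can run on the host.
got = expected.copy()

np.testing.assert_array_equal(to_host(got), expected)
```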

mroeschke · 2024-07-30T18:22:52Z

python/cuspatial/cuspatial/utils/join_utils.py

@@ -31,7 +32,7 @@ def apply_binarize(in_col, width):
    if out.size > 0:
        out[:] = 0
        binarize.forall(out.size)(in_col, out, width)
-    return out
+    return cp.asarray(out)


If this is 2D, you could use asfortranarray so that each 1D column slice is contiguous in memory. (I'll also ensure this on the cudf side.)

asfortranarray reorders the data to F-contiguous with a new allocation, which increases peak memory usage. It would be nice if we didn't have to assume a memory order for return values. CuPy supports fast indexing for any memory order anyway.
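
The trade-off in this exchange can be illustrated with a small sketch. NumPy is used as a stand-in so it runs without a GPU; `cupy.asfortranarray` and the contiguity flags are assumed to behave the same way on device arrays. The shape `(4, 3)` is arbitrary.

```python
import numpy as np

a = np.zeros((4, 3), dtype="int8")  # C-contiguous (row-major) by default
f = np.asfortranarray(a)            # F-contiguous (column-major) copy

col_c = a[:, 0]  # strided view: elements are 3 bytes apart
col_f = f[:, 0]  # contiguous view: elements are adjacent

print(col_c.flags["C_CONTIGUOUS"])  # False
print(col_f.flags["C_CONTIGUOUS"])  # True
print(np.shares_memory(a, f))       # False: the reorder allocated a new buffer
```

Both points hold at once: column slices of the F-ordered array are contiguous, and producing that array from a C-ordered one costs a second allocation.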

If we would like to change the order of the array, this could be done by passing the `order` argument to reshape.

That said, given this wasn't done before with the DeviceNDArray (and may involve other subtle changes in associated code), maybe this should be converted to a new issue/PR?
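
A sketch of the reshape suggestion and of why it could require changes elsewhere. NumPy is again a stand-in for CuPy (`cupy.reshape` is assumed to accept the same `order` argument), and the 12-element buffer is made up for illustration.

```python
import numpy as np

flat = np.arange(12, dtype="int8")   # pretend this is the flat kernel output

c = flat.reshape((4, 3))             # default order="C" (row-major)
f = flat.reshape((4, 3), order="F")  # column-major view of the same buffer

print(np.shares_memory(flat, f))     # True: no new allocation
print(f[:, 0].flags["C_CONTIGUOUS"])  # True: columns are contiguous

# The same buffer is indexed differently, so code that fills it in
# row-major order would have to be adapted.
print(c[1, 0], f[1, 0])              # 3 1
```

Unlike asfortranarray on a C-ordered array, this does not copy, but it changes which flat element each `(row, column)` index refers to.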

> maybe this should be converted to a new issue/PR?

Yes. Again, unless downstream libraries explicitly require it, I would rather cuspatial not dictate a memory order here and just stick to the defaults. Switching to CuPy means we don't have to be explicit about memory order, and cudf is adding back support for arrays of different memory orders. That takes the responsibility off cuspatial.

jakirkham · 2024-07-31T21:05:33Z

Have retitled the PR to reflect the current state. Hope that is ok 🙂

harrism · 2024-07-31T21:38:12Z

/merge

#1407 skipped the taxi dropoffs notebook due to the performance regression fixed in #1418, so this PR re-enables the notebook in CI by removing it from SKIPNBS in ci/test_notebooks.sh.

Authors:
- Mark Harris (https://github.com/harrism)

Approvers:
- https://github.com/jakirkham
- Michael Wang (https://github.com/isVoid)
- James Lamb (https://github.com/jameslamb)

URL: #1422

cast numba devicendarray to cupy array

3eddae7

isVoid requested a review from a team as a code owner July 30, 2024 14:09

isVoid requested review from trxcllnt and thomcom July 30, 2024 14:09

isVoid changed the title ~~Cast result of pip_bitmap_column_to_binary_array to cupy array~~ Cast result of pip_bitmap_column_to_binary_array to cupy array Jul 30, 2024

github-actions bot added the Python Related to Python code label Jul 30, 2024

style

5e72f84

trxcllnt approved these changes Jul 30, 2024

mroeschke reviewed Jul 30, 2024

jakirkham reviewed Jul 30, 2024

harrism added bug Something isn't working non-breaking Non-breaking change labels Jul 30, 2024

isVoid added 3 commits July 31, 2024 03:19

update tests

9d0ce42

style

d88054d

use cupy array from the start

1590b70

harrism approved these changes Jul 31, 2024

jakirkham approved these changes Jul 31, 2024

isVoid mentioned this pull request Jul 31, 2024

Ensure objects with __interface__ are converted to cupy/numpy arrays rapidsai/cudf#16436

Merged

3 tasks

jakirkham changed the title ~~Cast result of pip_bitmap_column_to_binary_array to cupy array~~ Use CuPy array in pip_bitmap_column_to_binary_array Jul 31, 2024

rapids-bot bot merged commit fe3b0c9 into rapidsai:branch-24.08 Jul 31, 2024
69 checks passed

harrism mentioned this pull request Jul 31, 2024

Unskip taxi notebook from CI #1420

Closed

3 tasks

jakirkham mentioned this pull request Jul 31, 2024

Readd nyc_taxi_years_correlation.ipynb test #1421

Closed

3 tasks

harrism mentioned this pull request Jul 31, 2024

Unskip taxi notebook from CI #1422

Merged

3 tasks
