[REVIEW] Handle index when dispatching __array_function__ and __array_ufunc__ to cupy for cudf.Series #6839

VibhuJawa · 2020-11-24T01:57:48Z

This PR adds index handling when a cudf.Series is dispatched to cupy functions through __array_ufunc__ and __array_function__.

This PR does the following:

Adds index handling for __array_ufunc__ and __array_function__ (when being dispatched to cupy)
Adds tests to ensure the same results as pandas when the indexes are aligned
Adds tests for the appropriate errors when the indexes are not aligned
Removes support for list inputs (these should not have been supported in the first place)

Please note that I am unsure how to handle list inputs here.
The problem being solved here is below:

With this PR #6839 we get the correct index when we do the following:

>>> cudf_s1 = cudf.Series(data=[-1, 2, 3, 0], index=[2, 3, 1, 0])
>>> cudf_s2 = cudf.Series(data=[-1, -2, 3, 0], index=[2, 3, 1, 0])
>>> o = np.logaddexp(cudf_s1, cudf_s2)
>>> o.index
Int64Index([2, 3, 1, 0], dtype='int64')
>>> print(o)
2   -0.306853
3    2.018150
1    3.693147
0    0.693147
dtype: float64

On master, the result drops the input index and comes back with a default RangeIndex:

>>> cudf_s1 = cudf.Series(data=[-1, 2, 3, 0], index=[2, 3, 1, 0])
>>> cudf_s2 = cudf.Series(data=[-1, -2, 3, 0], index=[2, 3, 1, 0])
>>> o = np.logaddexp(cudf_s1, cudf_s2)
>>> o.index
RangeIndex(start=0, stop=4, step=1)
>>> print(o)
0   -0.306853
1    2.018150
2    3.693147
3    0.693147
dtype: float64
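
The behaviour described above can be sketched with a small NumPy-only example. This is not the cuDF implementation: `ToySeries` is a hypothetical stand-in for cudf.Series, NumPy stands in for cupy, and the choice of `ValueError` for non-aligned indexes is an assumption made for illustration.

```python
import numpy as np


class ToySeries:
    """Hypothetical stand-in for cudf.Series: values plus an index."""

    def __init__(self, data, index):
        self.values = np.asarray(data)
        self.index = list(index)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Keep the sketch small: only plain calls without keyword arguments.
        if method != "__call__" or kwargs:
            return NotImplemented
        # Decline anything that is not a ToySeries (for example a list);
        # NumPy then raises TypeError on our behalf.
        if not all(isinstance(x, ToySeries) for x in inputs):
            return NotImplemented
        # Assumption: every operand must carry exactly the same index.
        if any(x.index != self.index for x in inputs):
            raise ValueError("indexes are not aligned")
        # Dispatch on the raw values, then reattach the shared index.
        result = ufunc(*(x.values for x in inputs))
        return ToySeries(result, self.index)


s1 = ToySeries([-1, 2, 3, 0], index=[2, 3, 1, 0])
s2 = ToySeries([-1, -2, 3, 0], index=[2, 3, 1, 0])
out = np.logaddexp(s1, s2)
print(out.index)  # [2, 3, 1, 0]
```

The point of the sketch is the last two lines of `__array_ufunc__`: the ufunc runs on the bare arrays, and the index is put back on the result instead of being replaced by a fresh default one.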

GPUtester · 2020-11-24T01:58:21Z

Please update the changelog in order to start CI tests.

codecov · 2020-11-24T15:47:38Z

Codecov Report

Merging #6839 (7a12226) into branch-0.17 (cdc53b7) will increase coverage by 0.01%.
The diff coverage is 93.75%.

@@               Coverage Diff               @@
##           branch-0.17    #6839      +/-   ##
===============================================
+ Coverage        81.94%   81.95%   +0.01%     
===============================================
  Files               96       96              
  Lines            16164    16168       +4     
===============================================
+ Hits             13246    13251       +5     
+ Misses            2918     2917       -1

Impacted Files	Coverage Δ
python/cudf/cudf/utils/utils.py	84.93% <93.75%> (+0.68%)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

VibhuJawa · 2020-11-24T19:36:18Z

python/cudf/cudf/tests/test_array_function.py

-    expect = np.concatenate([ar, ar, ar])
-    got = np.concatenate([s, s, s])
-    assert_eq(expect, got.to_array())
+    with pytest.raises(TypeError):


Removed this because the index of the result is undefined when these functions are dispatched to cupy.
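
One way a container can decline such functions is sketched below with NumPy only. This is an illustration under assumptions, not the code in this PR: `ToySeries` and its `_HANDLED` allow-list are hypothetical, and NumPy stands in for cupy. Returning `NotImplemented` from `__array_function__` makes NumPy raise `TypeError`, which is the error the updated test expects.

```python
import numpy as np


class ToySeries:
    """Hypothetical stand-in for cudf.Series: values plus an index."""

    # Assumption: only reductions, whose result needs no index, are handled.
    _HANDLED = (np.mean, np.sum)

    def __init__(self, data, index):
        self.values = np.asarray(data)
        self.index = list(index)

    def __array_function__(self, func, types, args, kwargs):
        # Decline functions whose result index would be ambiguous,
        # such as np.concatenate; NumPy turns this into a TypeError.
        if func not in self._HANDLED:
            return NotImplemented
        # Unwrap ToySeries arguments and call the function on raw arrays.
        unwrapped = [a.values if isinstance(a, ToySeries) else a for a in args]
        return func(*unwrapped, **kwargs)


s = ToySeries([1, 2, 3], index=[2, 1, 0])
print(np.mean(s))  # 2.0

try:
    np.concatenate([s, s, s])
except TypeError as err:
    print("declined:", type(err).__name__)  # declined: TypeError
```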

VibhuJawa · 2020-11-24T20:56:50Z

rerun tests

brandon-b-miller · 2020-11-30T20:58:39Z

In the process of looking this over - I am building this branch to better understand the problem being solved here.

VibhuJawa · 2020-11-30T21:13:59Z

> In the process of looking this over - I am building this branch to better understand the problem being solved here.

Below might help. With this PR we get the correct index on the following:

>>> cudf_s1 = cudf.Series(data=[-1, 2, 3, 0], index=[2, 3, 1, 0])
>>> cudf_s2 = cudf.Series(data=[-1, -2, 3, 0], index=[2, 3, 1, 0])
>>> o = np.logaddexp(cudf_s1, cudf_s2)
>>> o.index
Int64Index([2, 3, 1, 0], dtype='int64')
>>> print(o)
2   -0.306853
3    2.018150
1    3.693147
0    0.693147
dtype: float64

On master, the result drops the input index and comes back with a default RangeIndex:

>>> cudf_s1 = cudf.Series(data=[-1, 2, 3, 0], index=[2, 3, 1, 0])
>>> cudf_s2 = cudf.Series(data=[-1, -2, 3, 0], index=[2, 3, 1, 0])
>>> o = np.logaddexp(cudf_s1, cudf_s2)
>>> o.index
RangeIndex(start=0, stop=4, step=1)
>>> print(o)
0   -0.306853
1    2.018150
2    3.693147
3    0.693147
dtype: float64

isVoid

Just some general questions.

…ugfix

VibhuJawa · 2020-12-01T03:31:33Z

All reviews have been addressed, this should be ready for another review.

CC: @brandon-b-miller , @isVoid

brandon-b-miller · 2020-12-01T16:59:03Z

One last question otherwise LGTM.

VibhuJawa · 2020-12-01T22:16:59Z

@isVoid, Please let me know if there is anything else I need to address.

isVoid

lgtm!

initial index handling with cupy fall back functions

b6325e7

VibhuJawa changed the title ~~[WIP]Index handling with cupy fallback functions~~ [WIP]Index handling with cupy fallback functions[Skip-CI] Nov 24, 2020

VibhuJawa changed the title ~~[WIP]Index handling with cupy fallback functions[Skip-CI]~~ [WIP]Index handling with cupy fallback functions [skip ci] Nov 24, 2020

cleaned up tests and code

ccd5275

VibhuJawa changed the title ~~[WIP]Index handling with cupy fallback functions [skip ci]~~ [WIP]Handle index when dispatching __array_function__ and __array_ufunc__ to cupy for cudf.Series [skip ci] Nov 24, 2020

added changelog

0f30525

VibhuJawa changed the title ~~[WIP]Handle index when dispatching __array_function__ and __array_ufunc__ to cupy for cudf.Series [skip ci]~~ [REVIEW]Handle index when dispatching __array_function__ and __array_ufunc__ to cupy for cudf.Series [skip ci] Nov 24, 2020

VibhuJawa changed the title ~~[REVIEW]Handle index when dispatching __array_function__ and __array_ufunc__ to cupy for cudf.Series [skip ci]~~ [REVIEW]Handle index when dispatching __array_function__ and __array_ufunc__ to cupy for cudf.Series Nov 24, 2020

VibhuJawa marked this pull request as ready for review November 24, 2020 20:53

VibhuJawa requested a review from a team as a code owner November 24, 2020 20:53

VibhuJawa requested review from isVoid and brandon-b-miller November 24, 2020 20:53

VibhuJawa added the "4 - Needs cuDF (Python) Reviewer" and "3 - Ready for Review" labels Nov 24, 2020