Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Allow "LabelEncoder" to accept cupy and numpy arrays as input. #4620

Merged
merged 38 commits into from
May 23, 2022

Conversation

daxiongshu
Copy link
Contributor

This is to resolve issue #4589

@daxiongshu daxiongshu requested a review from a team as a code owner March 7, 2022 17:06
@daxiongshu daxiongshu added the non-breaking Non-breaking change label Mar 7, 2022
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Mar 7, 2022
@daxiongshu daxiongshu added the feature request New feature or request label Mar 7, 2022
@caryr35 caryr35 added this to PR-WIP in v22.04 Release via automation Mar 7, 2022
@caryr35 caryr35 moved this from PR-WIP to PR-Needs review in v22.04 Release Mar 7, 2022

def get_param_names(self):
return super().get_param_names() + [
"handle_unknown",
]

def _to_cudf_series(self, y):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tvdboom @lowener @dantegd Thank you all so much for the comments. I've made changes accordingly. Are there existing functions to replace this function? def _to_cudf_series

Especially when y is string datatype. It seems input_to_cuml_array(y) doesn't work:

/raid/data/ml/anaconda3/envs/cuml_dev/lib/python3.8/site-packages/cudf/core/column/column.py in __cuda_array_interface__(self)
   1025     @property
   1026     def __cuda_array_interface__(self):
-> 1027         raise NotImplementedError(
   1028             f"dtype {self.dtype} is not yet supported via "
   1029             "`__cuda_array_interface__`"

NotImplementedError: dtype object is not yet supported via `__cuda_array_interface__`

@daxiongshu
Copy link
Contributor Author

rerun tests

@daxiongshu daxiongshu changed the base branch from branch-22.04 to branch-22.06 April 1, 2022 03:29
@daxiongshu
Copy link
Contributor Author

@lowener @tvdboom @dantegd Please help me fix the current CI test errors. When the output of LabelEncoder.transform is consumed inside another cuml.Base module’s function, the decorator @cuml.internals.api_base_return_array I added in LabelEncoder no longer works.
CI error
Another reproducer
Thank you very much!

@caryr35 caryr35 added this to PR-WIP in v22.06 Release via automation Apr 12, 2022
@caryr35 caryr35 moved this from PR-WIP to PR-Needs review in v22.06 Release Apr 12, 2022
@caryr35 caryr35 removed this from PR-Needs review in v22.04 Release Apr 12, 2022
Copy link
Contributor

@lowener lowener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codecov-commenter
Copy link

Codecov Report

Merging #4620 (a6d091b) into branch-22.06 (7e91505) will decrease coverage by 7.69%.
The diff coverage is 94.11%.

@@               Coverage Diff                @@
##           branch-22.06    #4620      +/-   ##
================================================
- Coverage         96.56%   88.87%   -7.70%     
================================================
  Files               113      371     +258     
  Lines             13699    34695   +20996     
================================================
+ Hits              13229    30835   +17606     
- Misses              470     3860    +3390     
Flag Coverage Δ
dask 34.23% <29.41%> (+16.64%) ⬆️
non-dask 84.20% <94.11%> (-12.22%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/cuml/preprocessing/LabelEncoder.py 93.24% <85.71%> (ø)
python/cuml/tests/test_label_encoder.py 100.00% <100.00%> (ø)
python/cuml/dask/decomposition/__init__.py 100.00% <0.00%> (ø)
python/cuml/common/numba_utils.py 0.00% <0.00%> (ø)
python/cuml/dask/naive_bayes/__init__.py 100.00% <0.00%> (ø)
python/cuml/benchmark/bench_helper_funcs.py 64.53% <0.00%> (ø)
python/cuml/pipeline/__init__.py 100.00% <0.00%> (ø)
python/cuml/dask/preprocessing/LabelEncoder.py 93.33% <0.00%> (ø)
...l/_thirdparty/sklearn/preprocessing/_imputation.py 86.32% <0.00%> (ø)
python/cuml/dask/common/__init__.py 100.00% <0.00%> (ø)
... and 254 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e91505...a6d091b. Read the comment docs.

@daxiongshu daxiongshu added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 4 - Waiting on Author Waiting for author to respond to review labels May 7, 2022
v22.06 Release automation moved this from PR-Needs review to PR-Reviewer approved May 23, 2022
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cjnolet
Copy link
Member

cjnolet commented May 23, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 70105ad into rapidsai:branch-22.06 May 23, 2022
v22.06 Release automation moved this from PR-Reviewer approved to Done May 23, 2022
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Waiting on Reviewer Waiting for reviewer to review or respond Cython / Python Cython or Python issue feature request New feature or request non-breaking Non-breaking change
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

6 participants