[FEA] Add "median" to TargetEncoder #4722

Merged
merged 31 commits into rapidsai:branch-22.10 on Sep 7, 2022

daxiongshu
Contributor

This PR enables `TargetEncoder` to encode the `median` of the target column grouped by one or more categorical columns. The `for loop` logic it uses is slower than the existing optimized path for `mean` and `var`, but it can easily be reused for other statistic functions.
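To make the idea concrete, here is a minimal pure-Python sketch of out-of-fold median target encoding. It does not use cuML or cuDF. The function name `median_target_encode`, the list-based inputs, and the global-median fallback for categories missing from the other folds are assumptions for illustration only. Grouping by several categorical columns is modeled by passing tuples as keys. One general reason a median path tends to loop over groups: a mean or variance can be assembled from per-group sums and counts, while a median needs every value in the group.

```python
from collections import defaultdict
from statistics import median
from typing import Hashable, List, Sequence


def median_target_encode(
    keys: Sequence[Hashable],
    y: Sequence[float],
    fold_ids: Sequence[int],
    n_folds: int,
) -> List[float]:
    """Out-of-fold median target encoding (illustrative sketch, not cuML code).

    keys:     one category key per row; use tuples to group by several columns
    y:        target value per row
    fold_ids: fold index per row, each in range(n_folds)
    """
    # Assumed fallback: categories absent from the other folds get the
    # global median of the target.
    global_median = median(y)
    encoded: List[float] = [global_median] * len(y)

    # Loop over folds: each row is encoded with statistics computed only from
    # rows in *other* folds, so a row never sees its own target value.
    for fold in range(n_folds):
        groups = defaultdict(list)
        for key, target, f in zip(keys, y, fold_ids):
            if f != fold:
                groups[key].append(target)
        # A median needs the full list of values per group, not running sums.
        group_median = {k: median(v) for k, v in groups.items()}
        for i, (key, f) in enumerate(zip(keys, fold_ids)):
            if f == fold:
                encoded[i] = group_median.get(key, global_median)
    return encoded


if __name__ == "__main__":
    keys = ["a", "a", "a", "a", "b", "b"]
    y = [1, 2, 3, 10, 5, 7]
    folds = [0, 1, 0, 1, 0, 1]
    print(median_target_encode(keys, y, folds, n_folds=2))
    # prints [6.0, 2.0, 6.0, 2.0, 7, 5]
```

The per-fold loop is deliberately generic: swapping `median` for another reducer that takes a list of values would reuse the same structure, which mirrors the reuse argument in the PR description.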

daxiongshu requested a review from a team as a code owner May 4, 2022 12:05
daxiongshu added the non-breaking label May 4, 2022
github-actions bot added the Cython / Python label May 4, 2022
@github-actions

github-actions bot commented Jul 1, 2022

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

caryr35 removed this from PR-WIP in v22.08 Release Aug 9, 2022
caryr35 added this to PR-WIP in v22.10 Release via automation Aug 9, 2022
daxiongshu requested review from a team as code owners September 1, 2022 01:17
github-actions bot added the CMake, conda, CUDA/C++, and gpuCI labels Sep 1, 2022
daxiongshu changed the base branch from branch-22.06 to branch-22.10 September 1, 2022 01:46
github-actions bot removed the CMake, CUDA/C++, conda, and gpuCI labels Sep 2, 2022
@codecov-commenter

Codecov Report

Base: 78.02% // Head: 78.07% // Increases project coverage by +0.04% 🎉

Coverage data is based on head (1dbe878) compared to base (7a0ab85).
Patch coverage: 88.60% of modified lines in pull request are covered.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.10    #4722      +/-   ##
================================================
+ Coverage         78.02%   78.07%   +0.04%     
================================================
  Files               180      180              
  Lines             11385    11442      +57     
================================================
+ Hits               8883     8933      +50     
- Misses             2502     2509       +7     
Flag        Coverage Δ
dask        46.23% <48.10%> (+0.01%) ⬆️
non-dask    67.37% <88.60%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                                 Coverage Δ
python/cuml/common/array.py                    95.10% <85.10%> (-2.88%) ⬇️
python/cuml/preprocessing/TargetEncoder.py     85.07% <92.59%> (+1.00%) ⬆️
python/cuml/cluster/__init__.py                100.00% <100.00%> (ø)
python/cuml/metrics/__init__.py                100.00% <100.00%> (ø)
python/cuml/thirdparty_adapters/adapters.py    91.54% <100.00%> (+0.05%) ⬆️

@@ -233,7 +241,7 @@ def _fit_transform(self, x, y, fold_ids):
self.n_folds = min(self.n_folds, len(train))
train[self.fold_col] = self._make_fold_column(len(train), fold_ids)

-self.y_stat_val = eval(f'train[self.y_col].{self.stat}()')
+self.y_stat_val = get_stat_func(self.stat)(train[self.y_col])
Contributor Author

@dantegd what do you think of the change here? Thank you.
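The diff swaps `eval` on a formatted attribute string for a lookup through `get_stat_func`. The sketch below shows one way such a helper could work. Its body, the `_STAT_FUNCS` table, and the use of Python's `statistics` module on plain lists (instead of cuDF column methods) are assumptions for illustration, not cuML's actual implementation.

```python
import statistics
from typing import Callable, Dict, Sequence

StatFunc = Callable[[Sequence[float]], float]

# Hypothetical whitelist of supported statistics. This sketch works on plain
# Python sequences; cuML's own helper operates on its own column types and may differ.
_STAT_FUNCS: Dict[str, StatFunc] = {
    "mean": statistics.fmean,
    "var": statistics.variance,  # sample variance (n - 1 denominator)
    "median": statistics.median,
}


def get_stat_func(stat: str) -> StatFunc:
    """Return the reducer registered for `stat`, or raise for unsupported names."""
    try:
        return _STAT_FUNCS[stat]
    except KeyError:
        raise ValueError(
            f"unsupported stat {stat!r}; expected one of {sorted(_STAT_FUNCS)}"
        ) from None


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 10.0]
    print(get_stat_func("median")(y))  # prints 2.5
```

With a whitelist lookup, an unsupported statistic name fails with a `ValueError` that lists the accepted names instead of being interpolated into code and executed, and supporting another statistic becomes a one-line table entry.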

ajschmidt8 removed the request for review from a team September 6, 2022 14:31
v22.10 Release automation moved this from PR-WIP to PR-Reviewer approved Sep 7, 2022
@dantegd
Member

dantegd commented Sep 7, 2022

@gpucibot merge

rapids-bot bot merged commit e89e591 into rapidsai:branch-22.10 Sep 7, 2022
v22.10 Release automation moved this from PR-Reviewer approved to Done Sep 7, 2022
jakirkham pushed a commit to jakirkham/cuml that referenced this pull request Feb 27, 2023
This PR enables `TargetEncoder` to encode the `median` of the target column with respect to one or multiple categorical columns. The `for loop` logic used in this PR is not as fast as the previous optimization for `mean` and `var` but it can be easily reused for more stat functions.

Authors:
  - Jiwei Liu (https://github.com/daxiongshu)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4722
Labels
4 - Waiting on Author (Waiting for author to respond to review)
Cython / Python (Cython or Python issue)
feature request (New feature or request)
non-breaking (Non-breaking change)
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants