Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine and expose SVC's support vectors when fitting multi-class data #4454

Merged

Conversation

NV-jpt
Copy link
Contributor

@NV-jpt NV-jpt commented Dec 16, 2021

The purpose of this PR is to resolve issue #4206; filling SVC's support_ attribute by combining the support_ attribute from each of the estimators used in a multi-class one-versus-one SVC fit.

This is a new PR, now updated to be current with branch-21.12; this replaces this PR, this PR and this PR, all of which have now been closed.

This change will allow libraries that rely on sklearn's SVC attribute support_, like imbalanced-learn, to utilize cuML's SVC in place of sklearn's SVC.

In order to properly fill the support_ indices, we must first extract the support_ indices from each estimator in the multi-class wrapper. Then, these indices must be aligned with the full multi-class dataset, as each estimator only receives a binary (ovo) dataset that has certain classes removed by the multi-class wrapper.

Here is a gist that displays and compares the behavior of cuml with these changes to that of sklearn (prior to the changes in this PR, clf_cuml.support_ simply returned None).

@NV-jpt NV-jpt requested a review from a team as a code owner December 16, 2021 17:26
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Dec 16, 2021
Copy link
Contributor

@wphicks wphicks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! You'll need to take care of some style stuff reported by the linter, but otherwise this seems great. Had just one question for my own edification but not a blocker for merging.

python/cuml/test/test_svm.py Outdated Show resolved Hide resolved
@wphicks wphicks added feature request New feature or request non-breaking Non-breaking change labels Dec 16, 2021
@github-actions
Copy link

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@caryr35 caryr35 added this to PR-WIP in v22.02 Release via automation Jan 18, 2022
@caryr35 caryr35 moved this from PR-WIP to PR-Reviewer approved in v22.02 Release Jan 18, 2022
@dantegd dantegd changed the title Final PR for "Combine and expose SVC's support vectors when fitting multi-class data" Combine and expose SVC's support vectors when fitting multi-class data Jan 24, 2022
@dantegd
Copy link
Member

dantegd commented Jan 25, 2022

rerun test

@wphicks
Copy link
Contributor

wphicks commented Jan 25, 2022

Stylistic changes LGTM. I think we're good to merge as soon as we know what's up with CI.

@dantegd
Copy link
Member

dantegd commented Jan 27, 2022

rerun tests

@dantegd
Copy link
Member

dantegd commented Jan 28, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit abae602 into rapidsai:branch-22.02 Jan 28, 2022
v22.02 Release automation moved this from PR-Reviewer approved to Done Jan 28, 2022
cjnolet added a commit to cjnolet/cuml that referenced this pull request Feb 4, 2022
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
rapidsai#4454)

The purpose of this PR is to resolve issue rapidsai#4206;  filling SVC's `support_` attribute by combining the support_ attribute from each of the estimators used in a multi-class one-versus-one SVC fit.  

This is a new PR, now updated to be current with branch-21.12;  this replaces [this PR](rapidsai#4308), [this PR](rapidsai#4218) and [this PR](rapidsai#4305), all of which have now been closed.

This change will allow libraries that rely on sklearn's SVC attribute `support_`, [like imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/56eefdf3d92afca77bc16fc13d315db5287df2fa/imblearn/over_sampling/_smote/filter.py#L366), to utilize cuML's SVC in place of sklearn's SVC.

In order to properly fill the `support_` indices, we must first extract the `support_` indices from each estimator in the multi-class wrapper. Then, these indices must be aligned with the full multi-class dataset, as each estimator only receives a binary (ovo) dataset that has certain classes removed by the multi-class wrapper.

[Here is a gist](https://gist.github.com/NV-jpt/48c324cd2cf3b972af32c2913f6c1b35) that displays and compares the behavior of cuml with these changes to that of sklearn (prior to the changes in this PR, `clf_cuml.support_` simply returned `None`).

Authors:
  - https://github.com/NV-jpt
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - William Hicks (https://github.com/wphicks)

URL: rapidsai#4454
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cython / Python Cython or Python issue feature request New feature or request non-breaking Non-breaking change
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants