Combine and expose SVC's support vectors when fitting multi-class data #4454
Looks good! You'll need to take care of some style stuff reported by the linter, but otherwise this seems great. Had just one question for my own edification but not a blocker for merging.
This PR has been labeled
rerun test
Stylistic changes LGTM. I think we're good to merge as soon as we know what's up with CI.
rerun tests
@gpucibot merge
…lass data (rapidsai#4454)" This reverts commit abae602.
rapidsai#4454)

The purpose of this PR is to resolve issue rapidsai#4206: filling SVC's `support_` attribute by combining the `support_` attribute from each of the estimators used in a multi-class one-versus-one SVC fit.

This is a new PR, now updated to be current with branch-21.12; it replaces [this PR](rapidsai#4308), [this PR](rapidsai#4218) and [this PR](rapidsai#4305), all of which have now been closed.

This change will allow libraries that rely on sklearn's SVC attribute `support_`, [like imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/56eefdf3d92afca77bc16fc13d315db5287df2fa/imblearn/over_sampling/_smote/filter.py#L366), to use cuML's SVC in place of sklearn's SVC.

To fill the `support_` indices properly, we must first extract the `support_` indices from each estimator in the multi-class wrapper. These indices must then be aligned with the full multi-class dataset, because each estimator receives only a binary (one-versus-one) subset from which the multi-class wrapper has removed all other classes.

[Here is a gist](https://gist.github.com/NV-jpt/48c324cd2cf3b972af32c2913f6c1b35) that displays and compares the behavior of cuML with these changes to that of sklearn (prior to the changes in this PR, `clf_cuml.support_` simply returned `None`).

Authors:
- https://github.com/NV-jpt
- Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
- William Hicks (https://github.com/wphicks)

URL: rapidsai#4454
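The index alignment described above can be sketched in plain NumPy. This is a minimal illustration, not cuML's actual implementation: the function name `combine_ovo_support` and the `pair_support` dict are hypothetical, and the sketch assumes the one-versus-one wrapper builds each binary subset by keeping the rows whose label is one of the pair's two classes, in their original order (scikit-learn's `OneVsOneClassifier` builds its pairwise subsets this way). Grouping the result by class is also an assumption, chosen to resemble how libsvm-based SVCs lay out support vectors per class.

```python
import numpy as np


def combine_ovo_support(y, pair_support):
    """Map per-pair support indices back to rows of the full dataset.

    Hypothetical helper for illustration only.
    y            : 1-D array of labels for the full multi-class training set.
    pair_support : dict mapping a class pair (a, b) to the ``support_``
                   indices of the binary estimator trained on that pair;
                   those indices are positions within the binary subset.
    """
    y = np.asarray(y)
    all_rows = np.arange(y.shape[0])
    global_idx = []
    for (a, b), local_idx in pair_support.items():
        # Assumption: the wrapper kept exactly these rows, in original order.
        subset_rows = all_rows[(y == a) | (y == b)]
        # Translate positions in the binary subset into full-dataset rows.
        global_idx.append(subset_rows[np.asarray(local_idx, dtype=np.intp)])
    if not global_idx:
        return np.empty(0, dtype=np.intp)
    # A row can be a support vector for several pairs; keep it only once.
    idx = np.unique(np.concatenate(global_idx))
    # Group by class label, with ascending row index inside each class.
    return idx[np.lexsort((idx, y[idx]))]


# Usage: three classes, one binary estimator per class pair.
y = np.array([0, 1, 2, 0, 1, 2])
pair_support = {(0, 1): [0, 3], (0, 2): [1, 2], (1, 2): [2]}
print(combine_ovo_support(y, pair_support).tolist())  # [0, 3, 4, 2]
```

In this example row 4 is a support vector for both the (0, 1) and (1, 2) estimators but appears once in the combined result, which is why deduplication is needed before the indices can stand in for a single multi-class `support_`.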