FIX Prevent incorrect class category resampling in SMOTENC when median_std_ == 0 #675
Could you retrigger the CIs? We solved the issue with the dependencies.
Codecov Report
@@          Coverage Diff           @@
##          master     #675   +/-   ##
=======================================
  Coverage   96.48%   96.49%
=======================================
  Files          82       82
  Lines        5035     5043    +8
=======================================
+ Hits         4858     4866    +8
  Misses        177      177
Continue to review full report at Codecov.
I added your non-regression test, and I think we are good to merge.
@bganglia Thanks for the contribution!
Fixes #662
What does this implement/fix? Explain your changes.
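If the median standard deviation (median_std_) is 0, the SMOTENC class now stores the one-hot encoded categorical features before multiplying their 1 entries by the median standard deviation. Without this, multiplying by 0 turns every categorical entry into 0 and the category information is lost. With the stored copy, information about the most common category labels can still be used in _get_samples.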
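A minimal NumPy sketch of the idea described above, assuming a toy minority class with one constant continuous feature (so its standard deviation, and therefore the median standard deviation, is 0) and one categorical feature that is always category 2. The variable names are hypothetical and this is not the imbalanced-learn implementation; it only shows why scaling the one-hot block by 0 erases the categories and how keeping an unscaled copy preserves them.

```python
import numpy as np

# Hypothetical toy minority class: one continuous feature that is constant
# (std == 0) and one categorical feature that is always category 2
# out of the categories {0, 1, 2}, stored one-hot encoded.
X_cont = np.array([[1.0], [1.0], [1.0]])
X_ohe = np.array([[0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0]])

# SMOTENC-style scaling: the one-hot 1 entries are multiplied by the median
# of the per-feature standard deviations of the continuous features.
median_std = np.median(X_cont.std(axis=0))
X_scaled = X_ohe * median_std

# Fix idea: when median_std is 0, keep the unscaled encoding so the
# categories can still be recovered when generating new samples.
X_stored = X_ohe.copy() if np.isclose(median_std, 0) else None

print(median_std)               # 0.0
print(X_scaled.any())           # False: every categorical entry is now 0
print(X_stored.argmax(axis=1))  # [2 2 2]: the stored copy keeps category 2
```

Storing the copy only in the degenerate case keeps the usual path unchanged; when median_std is non-zero the scaled one-hot entries still encode the category and no extra memory is needed.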
Checklist:
Example:
Output on master:
Only the last row is new. It has category 1 in the fourth column, even though every row of the minority class has category 2 in that column, so the resampled category is incorrect.
Output on this fork:
Here, the resampled row correctly has category 2 in the fourth column.
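The difference between the two outputs can be illustrated with a simplified NumPy sketch, assuming the category of a new sample is chosen by summing the one-hot rows of its nearest neighbors and picking the largest column, with ties broken at random. The helper vote_category is hypothetical and not the library's code. When the one-hot block has been zeroed out, every category ties, so the vote is effectively random and can return category 1; with the unscaled encoding, category 2 always wins.

```python
import numpy as np


def vote_category(neighbor_ohe, rng):
    """Return the most frequent category among neighbors (hypothetical helper).

    neighbor_ohe has shape (n_neighbors, n_categories) and holds the
    (possibly scaled) one-hot rows of the nearest neighbors. Ties between
    categories are broken uniformly at random.
    """
    counts = neighbor_ohe.sum(axis=0)
    tied = np.flatnonzero(np.isclose(counts, counts.max()))
    return int(rng.choice(tied))


rng = np.random.default_rng(0)
# Three neighbors from the minority class, all with category 2.
neighbors = np.array([[0.0, 0.0, 1.0]] * 3)

# Behaviour on master: the one-hot block was multiplied by median_std == 0,
# so all counts are 0, every category ties and the choice is random.
zeroed = neighbors * 0.0
print(sorted({vote_category(zeroed, rng) for _ in range(50)}))

# Behaviour with the stored unscaled encoding: category 2 wins every time.
print({vote_category(neighbors, rng) for _ in range(50)})  # {2}
```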