[BUG] NCL - class should be cleaned if number of samples is 0.5 * minority samples, not if 0.5 * data.shape[0] #764

solegalli · 2020-10-14T19:06:47Z

Describe the bug

Neighbourhood cleaning rule procedure:

Split data T into the class of interest C (minority) and the rest of data O.
Identify noisy data A1 in O with the edited nearest neighbor rule.
For each class Ci in O (that is, for each observation in the majority class(es)):
    if ( x ∈ Ci in 3-nearest neighbors of misclassified y ∈ C )
    and ( |Ci| ≥ 0.5 · |C| ) then A2 = { x } ∪ A2
Reduced data S = T - ( A1 ∪ A2 )

The above is a copy of the pseudo code in the article. There, C is the minority class or class of interest.
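
As a rough illustration, the pseudocode above can be read as the following pure-Python sketch. It is not the imbalanced-learn implementation: the function names `k_nearest` and `ncl_sketch`, the brute-force Euclidean neighbour search, and the toy data are all assumptions made for this example, and the minority class is taken as the class of interest C.

```python
from collections import Counter
import math


def k_nearest(points, i, k=3):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def ncl_sketch(points, labels, k=3, threshold=0.5):
    """Return the indices kept after one reading of the NCL pseudocode."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)  # class of interest C (assumed: smallest class)
    # Classes Ci in O that are large enough to be cleaned: |Ci| >= threshold * |C|
    cleanable = {
        c for c, n in counts.items()
        if c != minority and n >= threshold * counts[minority]
    }
    a1, a2 = set(), set()
    for i, label in enumerate(labels):
        neighbours = k_nearest(points, i, k)
        votes = Counter(labels[j] for j in neighbours)
        predicted = votes.most_common(1)[0][0]
        if predicted == label:
            continue  # correctly classified by its neighbours
        if label != minority:
            a1.add(i)  # noisy example in O (edited nearest neighbour rule)
        else:
            # misclassified y in C: remove its neighbours from cleanable classes
            a2.update(j for j in neighbours if labels[j] in cleanable)
    removed = a1 | a2
    return [i for i in range(len(labels)) if i not in removed]


# Toy 1-D data: "c" is the minority class, "o" the rest.
points = [(0,), (1,), (2,), (50,), (47,), (49,), (52,), (56,),
          (100,), (101,), (103,), (3,)]
labels = ["c", "c", "c", "c", "o", "o", "o", "o", "o", "o", "o", "o"]
kept = ncl_sketch(points, labels)
print(kept)
```

In this toy set, the "o" point at 3 sits inside the minority cluster and lands in A1, while the three "o" neighbours of the minority point at 50 land in A2.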

To quote the article further:
"To avoid excessive reduction of small classes, only examples from classes larger or equal to 0.5 * | C | are considered while forming A2." The article states earlier that C is the minority class, and it refers to the entire dataset as T.

solegalli · 2021-08-11T10:50:15Z

I renamed the issue because, after reading the paper further, I realised my original interpretation was wrong: the implementation in imbalanced-learn reflects what is proposed in the paper, apart from the criterion used to exclude observations from the cleaning procedure.

solegalli · 2021-08-11T11:57:01Z

@glemaitre @chkoar was this parameter set up as n_samples > X.shape[0] * self.threshold_cleaning for some reason?

Otherwise, I am happy to pick this up. Please let me know.

glemaitre · 2023-07-10T07:54:40Z

n_samples > X.shape[0] * self.threshold_cleaning

It corresponds to C_i > C * t, where t defaults to 0.5 as in the paper. We then exposed it as a parameter so that users can control which other classes get cleaned.

I will add some additional tests now but the algorithm looks fine to me.

glemaitre · 2023-07-10T07:55:22Z

Oh no, I see your point. Indeed, it should be the minority class.
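
To make the difference concrete, here is a small sketch comparing the two criteria on made-up class counts. The helper names and the counts are hypothetical, and excluding the minority class from both selections is an assumption of this example rather than something stated in the thread.

```python
def cleaned_by_dataset_size(class_counts, threshold=0.5):
    """Comparison quoted in the thread: n_samples > X.shape[0] * threshold."""
    n_total = sum(class_counts.values())  # plays the role of X.shape[0]
    minority = min(class_counts, key=class_counts.get)
    return sorted(
        c for c, n in class_counts.items()
        if c != minority and n > n_total * threshold
    )


def cleaned_by_minority_size(class_counts, threshold=0.5):
    """Criterion from the paper: |Ci| >= threshold * |C|, C being the minority."""
    minority = min(class_counts, key=class_counts.get)
    n_minority = class_counts[minority]
    return sorted(
        c for c, n in class_counts.items()
        if c != minority and n >= n_minority * threshold
    )


# Hypothetical three-class problem with 100 samples in total.
class_counts = {"minority": 10, "majority_a": 45, "majority_b": 45}
by_dataset = cleaned_by_dataset_size(class_counts)
by_minority = cleaned_by_minority_size(class_counts)
print(by_dataset, by_minority)
```

With these counts, no class exceeds half of the dataset, so the dataset-based comparison selects nothing for cleaning, while the minority-based criterion keeps both majority classes eligible.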

solegalli changed the title ~~[BUG] Neighbourhood cleaning rule algo - CNN should fit to O~~ [BUG] NCL - class should be cleaned if number of samples is 0.5 * minority samples, not if 0.5 * data.shape[0] Aug 11, 2021

solegalli mentioned this issue Aug 11, 2021

[MRG] update user guide, docstrings and comments from undersampling methods #853

Closed

solegalli mentioned this issue Aug 11, 2021

[WIP] update doscstrings NCL #854

Closed

glemaitre closed this as completed Jul 10, 2023

glemaitre reopened this Jul 10, 2023

glemaitre mentioned this issue Jul 10, 2023

FIX/DEPR follow literature for the implementation of NCR #1012

Merged

glemaitre closed this as completed in #1012 Jul 10, 2023
