AdasynClassif - Error in matrix(nrow = sum(unlist(g)), ncol = nC) : invalid 'nrow' value (too large or NA) #4

ghost · 2019-04-19T15:17:02Z

Hi,

I have used the AdasynClassif function several times without any problem, but now I am seeing strange behavior. I get this error:

"Error in matrix(nrow = sum(unlist(g)), ncol = nC) : invalid 'nrow' value (too large or NA)"

It only happens for some combinations of samples and features (please find attached: "a", "b" and "c" are features and "target" is the class, baseClass = 4). Setting a different base class also affects it. I wanted to dive deeper into the function to find out what is going on, but I couldn't figure out where the class.freq function comes from.
(That is another mystery to me: how can AdasynClassif run when no installed package provides a class.freq function?)
ADAS_examples.zip

ghost · 2019-04-19T15:29:00Z

I'm adding one more example. It has the same samples as ADAS_fail.csv plus an extra feature column. In this case, the AdasynClassif function works.
ADAS_ok2.csv.zip

paobranco · 2019-04-25T05:07:44Z

Hi,

I found the problem. It is related to the nearest neighbours in your data.
None of your class 3 examples has nearest neighbours that belong to the majority class.
These neighbours are used to derive the number of new examples to generate for each base example of class 3.
Because these class 3 examples have no nearest neighbours in the majority class (or baseClass), the algorithm assigns an importance of zero to each case. This importance is then normalized...

I'm attaching part of the ADASYN algorithm below to make this discussion clearer:

So, when the algorithm tries to generate new cases for examples of class 3, it fails because they all have $\hat{r}_i =0$.
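
For readers without the attachment, here is a sketch of the relevant step as it appears in the original ADASYN formulation. UBL's multi-class implementation may differ in detail, and the symbols $K$, $\Delta_i$, $r_i$, $g_i$ and $G$ are taken from that formulation, not from this thread. For each base example $x_i$, $\Delta_i$ is the number of its $K$ nearest neighbours that belong to the majority class, and $G$ is the total number of synthetic examples to generate:

$$
r_i = \frac{\Delta_i}{K}, \qquad
\hat{r}_i = \frac{r_i}{\sum_{j} r_j}, \qquad
g_i = \hat{r}_i \, G
$$

If every $\Delta_i = 0$, then every $r_i = 0$ and the normalizing sum $\sum_j r_j$ is also zero. The normalization then evaluates $0/0$, which R returns as `NaN`, so the per-example counts $g_i$ are not usable numbers. That is consistent with the `invalid 'nrow' value (too large or NA)` message from `matrix(nrow = sum(unlist(g)), ...)`.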

This is a special case where the ADASYN algorithm is not capable of generating a non-uniform distribution for the $\hat{r}_i$ values.
I hadn't thought about that, so AdasynClassif fails when this happens because it generates NAs for these cases.
I think this can be solved by simply generating new cases uniformly, because we cannot bias the generation of examples when these cases have no nearest neighbours in the majority class.
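
A minimal sketch of that uniform fallback, to make the idea concrete. This is not the code that ended up in UBL: the helper name `allocate_synthetic` and its arguments `r` (per-example importance) and `G` (total number of new cases for the class) are hypothetical, chosen only for illustration.

```r
# Hypothetical helper, NOT UBL's actual implementation.
# r: importance of each base example (share of its nearest neighbours
#    that belong to the majority class).
# G: total number of synthetic cases to generate for this class.
allocate_synthetic <- function(r, G) {
  total <- sum(r)
  if (total == 0) {
    # No example has a majority-class neighbour: r / total would be 0/0 (NaN),
    # so spread the new cases uniformly over the base examples instead.
    r_hat <- rep(1 / length(r), length(r))
  } else {
    # Usual ADASYN-style normalization.
    r_hat <- r / total
  }
  round(r_hat * G)
}

r_zero <- c(0, 0, 0, 0)
print(r_zero / sum(r_zero))                  # NaN NaN NaN NaN: the failing case
print(allocate_synthetic(r_zero, G = 8))     # 2 2 2 2: uniform fallback
print(allocate_synthetic(c(0.2, 0.6, 0, 0.2), G = 10))
```

Because of rounding, the counts may not sum exactly to `G`; a real implementation would have to decide how to distribute the remainder.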

In effect, when you add more features the nearest neighbours change and the problem does not occur because all classes have nearest neighbours that belong to the majority class!

I'll try to push an updated version of ADASYN to GitHub as soon as possible. I'll use the development branch.
Would that be OK for you?

Regarding the "mystery" of the class.freq function, it is implemented in UBL but it is an auxiliary function, so it is not exported. UBL knows what it is but the end-users shouldn't need to know ;-)
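
If you still want to read it, R lets you look up non-exported objects in a package namespace. The snippet below is a sketch that assumes UBL is installed and that the helper is still named `class.freq` in your version; it is guarded so that it does nothing otherwise. Interactively, `UBL:::class.freq` is the usual shorthand for the same lookup.

```r
# Look up a non-exported function in the UBL namespace.
# Assumption: the installed UBL version still defines class.freq internally.
src <- NULL
if (requireNamespace("UBL", quietly = TRUE)) {
  ns <- asNamespace("UBL")
  if (exists("class.freq", envir = ns, inherits = FALSE)) {
    src <- get("class.freq", envir = ns)
  }
}
print(src)  # the function source when available, otherwise NULL
```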

I'm sorry for the inconvenience.
Thank you for your interest in UBL!

ghost · 2019-04-26T19:47:09Z

Thank you so much for your great explanation!

ghost · 2019-05-06T14:43:06Z

Hi, I was just wondering when you think the updated version will be available?
Thank you!

paobranco · 2019-05-07T13:45:42Z

Hi,
Thank you for your patience!
A new version is now available on the development branch.
Please let me know if you have any more issues.

paobranco closed this as completed May 7, 2019
