robust irmi, bug with bcancer data #70
Comments
I would not remove rlm, because this fallback is often used when lmrob does not work. The problem is also not specific to rlm; it occurs with rlm, with lmrob, and with any method, since the algorithm decided for some reason that the variable to impute is numeric, although it is not.
So either the print output is wrong, or the algorithm picked the wrong regression method for a categorical variable. I can't go into details before Christmas, but I can check it in more detail later. What I propose is not to exclude rlm. |
P.S. And it happens in iteration 3; the first two iterations were fine...
|
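For reference, a minimal sketch of how one might try to reproduce this. It is not from the original report: it assumes the bcancer data shipped with VIM, and since the failure is intermittent, it may only show up for some seeds or samples.

```r
## Not from the original report: a reproduction sketch, assuming the
## bcancer data shipped with VIM. The failure is intermittent, so it
## may only appear for some seeds / samples.
library(VIM)

data(bcancer, package = "VIM")
set.seed(123)                        # arbitrary seed, for repeatability
imp <- irmi(bcancer, robust = TRUE)  # robust imputation with the default robMethod
str(imp)
```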
The problem is happening in rlm, probably in the init part, where a sample of the data is fitted with LS regression.
Btw, we can support all the methods provided by rlm with lmrob, and I would not say that rlm is more stable than lmrob. |
The init actually happens in lqs, which rlm uses as its init method. |
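To make that call chain concrete, here is a small sketch (standard R example data, not VIM code or the bcancer case): MASS::rlm() with method = "MM" gets its starting values from an S-estimate that MASS::lqs() computes on subsamples of the data.

```r
## Sketch of the call chain discussed above: rlm(method = "MM") takes
## its initial coefficients from an S-estimate computed by lqs() on
## subsamples, which is roughly equivalent to the explicit call below.
library(MASS)

init <- lqs(stack.loss ~ ., data = stackloss, method = "S")   # subsample-based initial fit
fit  <- rlm(stack.loss ~ ., data = stackloss, method = "MM")  # uses such an init internally
coef(init)
coef(fit)
```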
Or simply use the following as the function argument of … This way, rlm is still inside as a fallback, and … That would actually be the fastest solution, wouldn't it? I have committed it this way. |
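The committed snippet is not shown in this issue. Purely as an illustration of the pattern being described (the helper name is made up, not VIM's internals), a fallback could look roughly like this:

```r
## Illustration only (the actual committed code is not shown here):
## try robustbase::lmrob first and fall back to MASS::rlm, which also
## performs MM regression, if lmrob fails.
library(robustbase)
library(MASS)

robust_fit <- function(formula, data) {
  tryCatch(
    lmrob(formula, data = data),                                  # preferred: lmrob (MM)
    error = function(e) rlm(formula, data = data, method = "MM")  # fallback: rlm (MM)
  )
}
```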
Yes, but the parameter robMethod is quite confusing as it is. We could improve it by removing rlm; then the parameter would actually state only the method to be used for the robust estimation, not the function, because robMethod = "lmrob" is actually doing an MM estimation. |
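That point can be checked directly: lmrob's default fitting method is the MM estimator, so "lmrob" names a function while the statistical method behind it is MM regression.

```r
## robustbase::lmrob fits an MM-estimator by default, so
## robMethod = "lmrob" names a function, not the estimation method.
library(robustbase)

lmrob.control()$method                          # "MM"
fit <- lmrob(stack.loss ~ ., data = stackloss)  # default settings => MM regression
fit$control$method                              # "MM"
```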
If I remember correctly, the aim was always to use lmrob in the first place and rlm as a fallback, because rlm was (at least) more robust in terms of its implementation than lmrob. So there were (at least in the past) a lot of situations where lmrob did not give a solution but rlm did. It seems that at some point we even made rlm the default. To not risk more failures for other situations/data, I recommend just updating the documentation instead of kicking out rlm. We can write that, when the fallback to rlm is used, MM regression is also used, only via the function rlm. I still think it is a good fallback (when setting … |
yeah, ok. Let's do this. |
Ok. Thanks for your efforts! I will update the documentation to be more precise on this. |
This is an unpleasant bug, because it is very hard to debug. It does not happen all the time.
What will change: since there are so few different values in the individual variables, they will no longer be interpreted as numeric.
If you add a very small amount of noise, everything fits without errors:
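The snippet that followed here did not survive the export. As a sketch of the described workaround (again assuming the bcancer data shipped with VIM), adding negligible noise to the numeric columns before imputation avoids the error:

```r
## Sketch of the workaround described above (the original snippet is
## not shown in this issue): add negligible noise to the numeric
## columns so their values are no longer just a handful of repeated
## integers, then impute as before.
library(VIM)

data(bcancer, package = "VIM")
num_cols <- vapply(bcancer, is.numeric, logical(1))

bc_noisy <- bcancer
bc_noisy[num_cols] <- lapply(bc_noisy[num_cols], function(v)
  v + rnorm(length(v), sd = 1e-6))   # missing values stay NA

imp <- irmi(bc_noisy, robust = TRUE)
```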