argument recode.target of getTaskData is not consistent with the documentation #2552

Closed
HerrMo opened this issue Mar 8, 2019 · 4 comments

@HerrMo

HerrMo commented Mar 8, 2019

Hi,

In the documentation of getTaskData, the entry for recode.target says: "For survival, you may choose to recode the survival times to 'left', 'right' or 'interval2' censored times using 'lcens', 'rcens' or 'icens', respectively."
This is not consistent with the implementation. Choosing one of these options, e.g. recode.target = "rcens", raises an error, whereas recode.target = "surv" works. See the example:

library(mlr)
library(prodlim)

X <- cbind(matrix(nrow=40, ncol=40, data=rnorm(40*40)),
           matrix(nrow=40, ncol=30, data=rnorm(40*30, mean=1, sd=2)),
           matrix(nrow=40, ncol=100, data=rnorm(40*100, mean=2, sd=3)))
colnames(X) <- paste0("X", 1:dim(X)[2])

ysurv <- prodlim::SimSurv(40)[, c(3,7)]

df_data <- cbind(ysurv, X)
task <- makeSurvTask(id = "exampletask", data = df_data, target = c("time", "status"))

# does not work
tt <- getTaskData(task, target.extra = TRUE, recode.target = "icens")

# works
tt <- getTaskData(task, target.extra = TRUE, recode.target = "surv")

As far as I can tell, the options listed in the documentation are simply not implemented, as the source code shows:

mlr/R/Task_operators.R

Lines 333 to 335 in 1a2d34b

if (recode.target %nin% c("no", "surv")) {
  res[, tn] = recodeY(res[, tn], type = recode.target, task$task.desc)
}

mlr/R/Task_operators.R

Lines 340 to 354 in 1a2d34b

recodeY = function(y, type, td) {
  if (type == "no")
    return(y)
  if (type == "drop.levels")
    return(factor(y))
  if (type == "01")
    return(as.numeric(y == td$positive))
  if (type == "-1+1")
    return(as.numeric(2L * (y == td$positive) - 1L))
  if (type == "surv")
    return(Surv(y[, 1L], y[, 2L], type = "right"))
  if (type == "multilabel.factor")
    return(lapply(y, function(x) factor(x, levels = c("TRUE", "FALSE"))))
  stopf("Unknown value for 'type': %s", type)
}
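
The fall-through in this if-chain can be reproduced with a small standalone sketch. `recode_y_sketch` is a hypothetical stand-in, not mlr's function: it keeps only the set of type strings handled in the listing above, replaces the actual recoding with a placeholder string, and uses base `stop()` instead of `stopf()`. It does not establish whether the error seen in the example is raised at this point or by an earlier argument check in getTaskData.

```r
# Hypothetical stand-in for the if-chain in recodeY(); only the set of
# handled type strings is taken from the listing above.
recode_y_sketch <- function(type) {
  handled <- c("no", "drop.levels", "01", "-1+1", "surv", "multilabel.factor")
  if (type %in% handled) {
    return(sprintf("handled: %s", type))
  }
  # Anything else reaches the final error, as in the original function.
  stop(sprintf("Unknown value for 'type': %s", type))
}

# Compare the working option with the three documented survival options.
for (type in c("surv", "lcens", "rcens", "icens")) {
  result <- tryCatch(recode_y_sketch(type),
                     error = function(e) conditionMessage(e))
  cat(type, "->", result, "\n")
}
```

None of 'lcens', 'rcens' or 'icens' appears among the handled strings, so each of them ends in the "Unknown value" branch of the sketch.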

Kind regards

HerrMo changed the title from "argument target.extra of getTaskData is not consistent with the documentation" to "argument recode.target of getTaskData is not consistent with the documentation" on Mar 8, 2019
pat-s added the type-bug label on May 18, 2019
@pat-s
Member

pat-s commented May 18, 2019

Thanks, looks like the doc is not up to date then.

@pat-s
Member

pat-s commented Jun 1, 2019

I've taken a look but couldn't find the commit that changed the implementation. I just found that there were valid options for 'lcens', 'rcens' or 'icens' in the package at some point.

Could you briefly explain what the current option "surv" does? I cannot infer it just from the example and the transformation.

@HerrMo
Author

HerrMo commented Jun 3, 2019

Many methods require an object of class Surv (see survival::Surv). The option "surv" converts the target into such a Surv object, with the censoring type set to right censoring (the most common censoring type) via type = "right":

return(Surv(y[, 1L], y[, 2L], type = "right"))

Otherwise you would get a data.frame as the target, with the survival times in the first column and the censoring indicators in the second.
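
A minimal sketch of that difference, using only the survival package. The toy data frame `y` is made up for illustration and stands in for the two-column target that getTaskData would return. The last step is not something mlr does: it is an assumption about how an 'interval2' encoding could be built by hand with survival::Surv, where an NA upper bound marks a right-censored observation.

```r
library(survival)

# Hypothetical toy target: survival times and event indicators
# (1 = event observed, 0 = right-censored).
y <- data.frame(time = c(5, 8, 12, 3), status = c(1, 0, 1, 0))

# Same call as in recodeY(): a right-censored Surv object.
s_right <- Surv(y[, 1L], y[, 2L], type = "right")
class(s_right)          # the class mentioned above, "Surv"
attr(s_right, "type")   # censoring type stored on the object
print(s_right)          # censored observations carry a "+" marker

# Hand-built alternative (assumption, not mlr behavior): the same data in
# 'interval2' form. Exact events get equal bounds, right-censored
# observations get NA as the upper bound.
upper <- ifelse(y$status == 1, y$time, NA)
s_interval <- Surv(y$time, upper, type = "interval2")
print(s_interval)
```

Without the recoding, the learner would only see the plain data.frame `y`; with it, the censoring information travels inside a single Surv object.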

@stale

stale bot commented Mar 29, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Mar 29, 2020
stale bot closed this as completed on Apr 5, 2020