
Competing Risks - Predicting on dataset without response can crash R #29

Closed
jatherrien opened this issue Feb 12, 2019 · 3 comments


@jatherrien

Hello, I've noticed that with competing risk data, if I first predict on a dataset that has a response and then predict on a dataset without one, R crashes entirely. Here's a script that reliably triggers it. It causes a crash on all three computers I tested it on, but they all run Linux, so I don't know whether the problem is cross-platform.

library(randomForestSRC)

set.seed(500)
n <- 1500

# Two competing causes (delta = 1 or 2) with exponential event times
data <- data.frame(x = rnorm(n), delta = sample(1:2, replace = TRUE, size = n))
data$T <- rexp(n, rate = ifelse(data$delta == 1, 1/10, 1/15))

# Independent exponential censoring: the event is observed only if it occurs
# before the censoring time; otherwise the subject is censored (delta = 0)
censorTimes <- rexp(n, rate = 1/9)
data$delta <- ifelse(data$T < censorTimes, data$delta, 0)
data$T <- pmin(data$T, censorTimes)

trainingData <- data[1:1000, ]
testData <- data[1001:1500, ]

# New data with covariates only (no T/delta response columns)
newData <- data.frame(x = rnorm(20))

# The log-rank split rule is only used for speed; it still crashes with the default splitrule
modelRfsrc <- rfsrc(Surv(T, delta) ~ x, trainingData,
                    ntree = 1000, nodesize = 10, mtry = 1,
                    nsplit = 0, splitrule = "logrank")

testSetPredictions <- predict(modelRfsrc, testData)

# This line triggers the crash. If I run it before the testData predictions, it often
# *won't* crash, but it sometimes still does. It always crashes if I've already run the
# testData predictions, even if I had successfully run this line before that.
newDataPredictions <- predict(modelRfsrc, newData)

Here's my sessionInfo():

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8    LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] randomForestSRC_2.8.0

loaded via a namespace (and not attached):
[1] compiler_3.4.4 parallel_3.4.4 tools_3.4.4    yaml_2.2.0   
@kogalur
Owner

kogalur commented Feb 12, 2019

Thank you for the excellent example. There was indeed a bug related to the absence of responses in a competing risk setting. If it's convenient, you can find a beta build with the fix here:

https://www.dropbox.com/s/4esx9qvft9ah6fb/randomForestSRC_2.8.0.11.tar.gz?dl=0

We'll likely push a new build up to CRAN in a couple of weeks.
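For anyone who wants to try the beta build before it reaches CRAN, here is a minimal sketch of installing it from the source tarball. It assumes the file `randomForestSRC_2.8.0.11.tar.gz` has already been downloaded into the working directory and that a C compiler is available, since the package is built from source. `install_beta_tarball()` is a hypothetical helper written for illustration; it is not part of randomForestSRC.

```r
# Hypothetical helper (not part of randomForestSRC): install a locally
# downloaded source tarball and return the version recorded for the package.
install_beta_tarball <- function(tarball = "randomForestSRC_2.8.0.11.tar.gz") {
  # Fail early with a clear message if the file has not been downloaded yet
  if (!file.exists(tarball)) {
    stop("tarball not found: ", tarball)
  }
  # repos = NULL makes install.packages() treat 'tarball' as a local file;
  # type = "source" compiles it, which needs a working C toolchain
  utils::install.packages(tarball, repos = NULL, type = "source")
  # Report the installed version so you can check it is no longer 2.8.0
  utils::packageVersion("randomForestSRC")
}

# Usage, after downloading the tarball:
# install_beta_tarball()
```

Restart R after installing so that `library(randomForestSRC)` attaches the new build rather than a copy of 2.8.0 already loaded in the session, then rerun the reproduction script.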

@kogalur kogalur closed this as completed Mar 16, 2019
@skvempati

Hi, I'm having the same issue, but I don't see a new build on CRAN (v2.8.0 is still the version available to install). Are there still plans to push a new build to CRAN, or should I use the beta build?

Thanks!

@kogalur
Owner

kogalur commented Apr 12, 2019

Sorry, I shouldn't have closed this issue until we had updated CRAN. We were delayed, but we're very close to posting, maybe a few days away. If you need the fix immediately, I'd use the beta build.
