protection stack overflow #103

ccshao · 2016-08-11T15:44:44Z

ranger (R version) gives "Error: protect(): protection stack overflow" on a 141 x 17222 data frame.
I used an mtry of 131 and 1000 trees. Setting save.memory = TRUE does not help.

I can provide the data if needed.

mnwright · 2016-08-12T12:21:57Z

This is probably due to the formula interface. Try to use the alternative interface with dependent.variable.name.
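A minimal sketch of the two call styles, using the built-in iris data as a stand-in. The dataset and the num.trees value are placeholders for illustration, not taken from the original report:

```r
library(ranger)

# Formula interface: convenient, but the probable source of the
# protection stack overflow on very wide data.
rf_formula <- ranger(Species ~ ., data = iris, num.trees = 100)

# Alternative interface: name the response column directly,
# so no formula has to be processed.
rf_alt <- ranger(dependent.variable.name = "Species", data = iris, num.trees = 100)

# Out-of-bag prediction error of the second forest
rf_alt$prediction.error
```

Both calls fit the same kind of forest; only the way the response is specified differs.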

shihuang047 · 2019-04-04T21:25:29Z

I tried dependent.variable.name = "y", but it still does not work for me.

mnwright · 2019-04-05T05:12:29Z

With the same error message? That should only occur with the formula interface. Please give a reproducible example, or at least your ranger call and some information about the data.

shihuang047 · 2019-04-05T05:46:42Z

I just solved the problem by converting the matrix into a sparse matrix. But I also encountered a similar problem when using importance_pvalues(). It only accepts a data.frame/formula as input. Is it possible to make it compatible with sparse matrices as well?

The code I used:
library(Matrix)  # provides Matrix() for the sparse conversion

# Simulated count data: 60000 samples x 5 features
x <- data.frame(rbind(t(rmultinom(7000, 75000, c(.201,.5,.02,.18,.099))),
                      t(rmultinom(8000, 75000, c(.201,.4,.12,.18,.099))),
                      t(rmultinom(15000, 75000, c(.011,.3,.22,.18,.289))),
                      t(rmultinom(15000, 75000, c(.091,.2,.32,.18,.209))),
                      t(rmultinom(15000, 75000, c(.001,.1,.42,.18,.299)))))
y <- factor(c(rep("A", 15000), rep("B", 15000), rep("C", 15000), rep("D", 15000)))
data <- data.frame(y, x)
# data.matrix() converts the factor y to integer codes
sparse_data <- Matrix(data.matrix(data), sparse = TRUE)
rf.model <- ranger::ranger(dependent.variable.name = "y", data = sparse_data, keep.inbag = TRUE, importance = "permutation")

mnwright · 2019-04-05T12:48:54Z

The importance_pvalues() function only needs the formula/data when using the permutation approach ("altmann"). The idea was that this method is so slow that no one wants to run it on data that's too large for the formula interface. I will check whether this is possible with sparse data.

In the meantime, the permutation p-values are so simple, you can just do it yourself:

library(ranger)

num.permutations <- 100

# Run RF
rf <- ranger(dependent.variable.name = "Species", data = iris, importance = "permutation")

# Permute and compute importance again (be sure to use same parameters as above)
vimp <- replicate(num.permutations, {
  dat <- iris
  dat[, "Species"] <- dat[sample(nrow(dat)), "Species"]
  ranger(dependent.variable.name = "Species", data = dat, importance = "permutation")$variable.importance
})

# Compute p-values
pval <- sapply(1:nrow(vimp), function(i) {
  (sum(vimp[i, ] >= rf$variable.importance[i]) + 1)/(ncol(vimp) + 1)
})

res <- cbind(rf$variable.importance, pval)
colnames(res) <- c("importance", "pvalue")

res

(this is just copy&paste from the importance_pvalues() function)
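
For comparison, when the data does fit in a data frame, the built-in call could look roughly like the sketch below. It again uses iris as a stand-in, and the number of permutations is only an illustrative value:

```r
library(ranger)

# Fit with permutation importance, as required for the p-value computation
rf <- ranger(Species ~ ., data = iris, importance = "permutation")

# Altmann permutation p-values; this method needs the formula and the data
pvals <- importance_pvalues(rf, method = "altmann", num.permutations = 100,
                            formula = Species ~ ., data = iris)

# Matrix with one row per variable: importance and p-value
pvals
```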

shihuang047 · 2019-04-08T23:50:02Z

Thanks so much!

mnwright closed this as completed Aug 22, 2016
