protection stack overflow #103

Closed
ccshao opened this issue Aug 11, 2016 · 6 comments

ccshao commented Aug 11, 2016

ranger (R version) gives the error "Error: protect(): protection stack overflow" on a 141 x 17222 data frame.
I used an mtry of 131 and 1000 trees. Setting save.memory = TRUE does not help.

I can provide the data if needed.

@mnwright
Member

This is probably due to the formula interface. Try the alternative interface, which takes the name of the response column via dependent.variable.name.
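As a rough illustration of the two interfaces mentioned above, the sketch below fits the same forest both ways. The iris data and num.trees = 50 are assumptions chosen only to keep the example small; they are not from the original report.

```r
library(ranger)

# Formula interface: the response and predictors are given as a formula.
rf_formula <- ranger(Species ~ ., data = iris, num.trees = 50)

# Alternative interface: the response column is named directly,
# so no formula has to be built over all columns.
rf_named <- ranger(dependent.variable.name = "Species", data = iris,
                   num.trees = 50)

rf_named$treetype
```

Both calls fit a classification forest on the same columns; only the way the response is specified differs.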

shihuang047 commented Apr 4, 2019

I tried dependent.variable.name="y", but it still does not work for me...

mnwright (Member) commented Apr 5, 2019

With the same error message? That should only occur with the formula interface. Please give a reproducible example, or at least your ranger call and some information about the data.

@shihuang047

I just solved the problem by converting the matrix into a sparse matrix. However, I ran into a similar problem when using importance_pvalues(): it only accepts a data.frame/formula as input. Would it be possible to make it compatible with sparse matrices as well?

The code I used:
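library(Matrix)

# Simulated count data: 60,000 rows of multinomial draws over 5 categories
x <- data.frame(rbind(t(rmultinom(7000, 75000, c(.201, .5, .02, .18, .099))),
                      t(rmultinom(8000, 75000, c(.201, .4, .12, .18, .099))),
                      t(rmultinom(15000, 75000, c(.011, .3, .22, .18, .289))),
                      t(rmultinom(15000, 75000, c(.091, .2, .32, .18, .209))),
                      t(rmultinom(15000, 75000, c(.001, .1, .42, .18, .299)))))
y <- factor(c(rep("A", 15000), rep("B", 15000), rep("C", 15000), rep("D", 15000)))
data <- data.frame(y, x)
# data.matrix() turns the factor y into integer codes (1 to 4), so ranger sees a
# numeric response here; set classification = TRUE if a classification forest is intended
sparse_data <- Matrix(data.matrix(data), sparse = TRUE)
rf.model <- ranger::ranger(dependent.variable.name = "y", data = sparse_data,
                           keep.inbag = TRUE, importance = "permutation")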

mnwright (Member) commented Apr 5, 2019

The importance_pvalues() function only needs the formula/data when using the permutation approach ("altmann"). The idea was that this method is so slow that no one would want to run it on data that is too large for the formula interface. I will check whether this is possible with sparse data.
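For reference, a call using the permutation approach described above might look like the following sketch. The iris data, the small num.trees, and num.permutations = 20 are assumptions picked to keep the run short; they are not recommendations.

```r
library(ranger)

# Fit with permutation importance, then pass the same formula and data
# to importance_pvalues() so it can refit on permuted responses.
rf <- ranger(Species ~ ., data = iris, importance = "permutation",
             num.trees = 50)

pv <- importance_pvalues(rf, method = "altmann", num.permutations = 20,
                         formula = Species ~ ., data = iris)
pv
```

The result is a matrix with one row per predictor and the columns importance and pvalue.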

In the meantime, the permutation p-values are simple enough that you can compute them yourself:

library(ranger)

num.permutations <- 100

# Run RF
rf <- ranger(dependent.variable.name = "Species", data = iris, importance = "permutation")

# Permute and compute importance again (be sure to use same parameters as above)
vimp <- replicate(num.permutations, {
  dat <- iris
  dat[, "Species"] <- dat[sample(nrow(dat)), "Species"]
  ranger(dependent.variable.name = "Species", data = dat, importance = "permutation")$variable.importance
})

# Compute p-values
pval <- sapply(1:nrow(vimp), function(i) {
  (sum(vimp[i, ] >= rf$variable.importance[i]) + 1)/(ncol(vimp) + 1)
})

res <- cbind(rf$variable.importance, pval)
colnames(res) <- c("importance", "pvalue")

res

(this is just copy&paste from the importance_pvalues() function)
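The same recipe can probably be adapted to sparse input, which is the case asked about earlier in the thread. The following is only a sketch under several assumptions: the toy data, the helper name fit_importance, the 0/1 coding of the response, and the small tree and permutation counts are all hypothetical, and it assumes a ranger version that accepts dgCMatrix data.

```r
library(ranger)
library(Matrix)

set.seed(1)

# Hypothetical toy data: sparse counts with a 0/1 response in column "y".
n <- 200
x <- matrix(rpois(n * 5, lambda = 0.3), nrow = n,
            dimnames = list(NULL, paste0("X", 1:5)))
y <- as.numeric(x[, 1] + x[, 2] > 0)
sparse_data <- Matrix(cbind(y = y, x), sparse = TRUE)

# Hypothetical helper: fit a forest and return its permutation importance.
# classification = TRUE is set because the response is stored as numbers.
fit_importance <- function(d) {
  ranger(dependent.variable.name = "y", data = d, classification = TRUE,
         importance = "permutation", num.trees = 50)$variable.importance
}

observed <- fit_importance(sparse_data)

# Permute the response column and recompute the importance each time.
vimp <- replicate(20, {
  dat <- sparse_data
  dat[, "y"] <- dat[sample(nrow(dat)), "y"]
  fit_importance(dat)
})

# Same p-value formula as above, vectorised over the rows of vimp.
pval <- (rowSums(vimp >= observed) + 1) / (ncol(vimp) + 1)
res <- cbind(importance = observed, pvalue = pval)
res
```

With only 20 permutations the smallest possible p-value is 1/21, so a real analysis would use many more.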

@shihuang047

Thanks so much!
