protection stack overflow #103

Closed
ccshao opened this issue Aug 11, 2016 · 6 comments


@ccshao
commented Aug 11, 2016

ranger (R version) gives the error "Error: protect(): protection stack overflow" with a 141 x 17222 data frame.
I used mtry = 131 and 1000 trees; setting save.memory = TRUE does not help.

If needed, I can provide the data.

@mnwright

Member

commented Aug 12, 2016

This is probably due to the formula interface. Try the alternative interface with dependent.variable.name instead.
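A minimal sketch of the two call styles, assuming the ranger package is installed. The data set `wide` is hypothetical (random predictors, 141 rows as in the report but far fewer columns so it runs quickly), and the formula call is left commented out because whether it actually overflows depends on the number of columns.

```r
library(ranger)

set.seed(1)
# Hypothetical wide data: 141 rows like the report, but only 2000 predictors
n <- 141
p <- 2000
wide <- data.frame(matrix(rnorm(n * p), nrow = n))
wide$y <- factor(sample(c("a", "b"), n, replace = TRUE))

# Formula interface: y ~ . is expanded into one term per column, the step
# suspected of exhausting R's protection stack on very wide data
# rf_formula <- ranger(y ~ ., data = wide, num.trees = 100)

# Alternative interface: name the response column instead of passing a formula
rf <- ranger(dependent.variable.name = "y", data = wide, num.trees = 100)
rf$treetype
```

Both calls fit the same kind of forest; the second simply avoids building a model formula from thousands of column names.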

@mnwright mnwright closed this Aug 22, 2016

@shihuang047


commented Apr 4, 2019

I tried dependent.variable.name = "y", but it still does not work for me...

@mnwright

Member

commented Apr 5, 2019

With the same error message? That should only occur with the formula interface. Please give a reproducible example, or at least your ranger call and some information about the data.

@shihuang047


commented Apr 5, 2019

I just solved the problem by converting the matrix into a sparse matrix. However, I encountered a similar problem when using importance_pvalues(): it only accepts a data.frame/formula as input. Is it possible to make it also compatible with sparse matrices?

The code I used:
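library(Matrix)

# Five blocks of multinomial counts: 60,000 rows, count columns X1..X5
x <- data.frame(rbind(t(rmultinom(7000, 75000, c(.201,.5,.02,.18,.099))),
  t(rmultinom(8000, 75000, c(.201,.4,.12,.18,.099))),
  t(rmultinom(15000, 75000, c(.011,.3,.22,.18,.289))),
  t(rmultinom(15000, 75000, c(.091,.2,.32,.18,.209))),
  t(rmultinom(15000, 75000, c(.001,.1,.42,.18,.299)))))
y <- factor(c(rep("A", 15000), rep("B", 15000), rep("C", 15000), rep("D", 15000)))
data <- data.frame(y, x)
# data.matrix() turns the factor y into integer codes 1-4
sparse_data <- Matrix(data.matrix(data), sparse = TRUE)
# classification = TRUE: the response is numeric in the matrix, so ranger
# would otherwise grow a regression forest
rf.model <- ranger::ranger(dependent.variable.name = "y", data = sparse_data,
  classification = TRUE, keep.inbag = TRUE, importance = "permutation")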

@mnwright

Member

commented Apr 5, 2019

The importance_pvalues() function only needs the formula/data when using the permutation approach ("altmann"). The idea was that this method is so slow that no one would want to run it on data that is too large for the formula interface. I will check whether this is possible with sparse data.
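For data small enough for the formula interface, the built-in "altmann" route looks roughly like this. It is a sketch on iris, assuming a ranger version whose importance_pvalues() takes method = "altmann" together with formula and data; num.permutations is kept small only to make it fast, so the resulting p-values are coarse.

```r
library(ranger)

set.seed(1)
# Fit with permutation importance, as in the rest of this thread
rf <- ranger(Species ~ ., data = iris, importance = "permutation")

# "altmann" refits the forest on data with a permuted response,
# which is why the formula and data have to be passed in again
res <- importance_pvalues(rf, method = "altmann", num.permutations = 20,
                          formula = Species ~ ., data = iris)
res
```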

In the meantime, the permutation p-values are so simple, you can just do it yourself:

library(ranger)

num.permutations <- 100

# Run RF
rf <- ranger(dependent.variable.name = "Species", data = iris, importance = "permutation")

# Permute and compute importance again (be sure to use same parameters as above)
vimp <- replicate(num.permutations, {
  dat <- iris
  dat[, "Species"] <- dat[sample(nrow(dat)), "Species"]
  ranger(dependent.variable.name = "Species", data = dat, importance = "permutation")$variable.importance
})

# Compute p-values
pval <- sapply(1:nrow(vimp), function(i) {
  (sum(vimp[i, ] >= rf$variable.importance[i]) + 1)/(ncol(vimp) + 1)
})

res <- cbind(rf$variable.importance, pval)
colnames(res) <- c("importance", "pvalue")

res

(this is just copy&paste from the importance_pvalues() function)
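The same loop can be pointed at sparse data by permuting the response and rebuilding the matrix on each iteration. This is a minimal sketch, not a feature of importance_pvalues(): it assumes ranger accepts a Matrix dgCMatrix via data together with dependent.variable.name (as in the sparse call earlier in this thread). The small data set and the helper fit_vimp() are hypothetical, and num.permutations is reduced to keep it quick.

```r
library(ranger)
library(Matrix)

set.seed(42)
# Hypothetical sparse count data: 5 predictors, class (coded 1/2) driven by X1
n <- 200
x <- matrix(rpois(n * 5, 0.3), ncol = 5, dimnames = list(NULL, paste0("X", 1:5)))
y <- as.numeric(x[, 1] > 0) + 1

# Hypothetical helper: build a sparse matrix with the given response and
# return permutation importance. classification = TRUE because the response
# column is numeric inside the matrix.
fit_vimp <- function(response) {
  dat <- Matrix(cbind(y = response, x), sparse = TRUE)
  ranger(dependent.variable.name = "y", data = dat, classification = TRUE,
         importance = "permutation", num.trees = 100)$variable.importance
}

# Observed importance
obs <- fit_vimp(y)

# Null distribution: refit with a permuted response, same parameters each time
num.permutations <- 20
vimp <- replicate(num.permutations, fit_vimp(sample(y)))

# Same p-value formula as above: (number of null values >= observed + 1) / (B + 1)
pval <- sapply(seq_len(nrow(vimp)), function(i) {
  (sum(vimp[i, ] >= obs[i]) + 1) / (ncol(vimp) + 1)
})

res <- cbind(importance = obs, pvalue = pval)
res
```

Permuting the response vector and rebuilding the matrix avoids assigning into a column of the sparse matrix, which keeps the loop simple at the cost of one extra conversion per permutation.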

@shihuang047


commented Apr 8, 2019

Thanks so much!

3 participants