Retrieving individual predictions for each tree #23

Closed
mllg opened this issue Nov 23, 2015 · 10 comments

Comments

@mllg

mllg commented Nov 23, 2015

Possibly related to #18.

Something like predict.all in predict.randomForest would be great, i.e. return a [obs]x[tree] matrix. We need this to estimate the standard deviation of the predictions.

mnwright pushed a commit that referenced this issue Dec 1, 2015
mnwright added a commit that referenced this issue Dec 1, 2015
Option to get individual tree predictions (issues #18, #23)
@mnwright
Member

mnwright commented Dec 1, 2015

Added for classification and regression. Is this also needed for probability prediction and survival?
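
A minimal sketch of the use case from the original request, assuming predict.all = TRUE returns an [obs] x [tree] matrix for regression forests as described above. The helper name tree_spread and the toy matrix are made up for illustration. The spread across trees is only a rough measure of variability, not a calibrated standard error.

```r
# Per-observation mean and standard deviation of the individual tree predictions.
# `tree_preds` is assumed to be an [obs] x [tree] numeric matrix.
tree_spread <- function(tree_preds) {
  data.frame(
    mean = rowMeans(tree_preds),
    sd   = apply(tree_preds, 1, sd)
  )
}

# Toy matrix standing in for per-tree predictions: 3 observations, 4 trees
toy <- matrix(c(1, 2, 3, 4,
                5, 5, 5, 5,
                0, 2, 0, 2), nrow = 3, byrow = TRUE)
toy_spread <- tree_spread(toy)

# With ranger installed, the same helper can be applied to a real regression forest
if (requireNamespace("ranger", quietly = TRUE)) {
  rf <- ranger::ranger(mpg ~ ., data = mtcars, num.trees = 50)
  per_tree <- predict(rf, data = mtcars, predict.all = TRUE)$predictions
  print(head(tree_spread(per_tree)))
}
```

Keeping the aggregation in a plain function on a matrix means it does not depend on how the forest was fitted.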

@mllg
Author

mllg commented Dec 1, 2015

Thanks! I personally need it only for regression at the moment, but I guess it would be consistent to have this for all forests.

@krz

krz commented Feb 20, 2016

This may be related, so I'm posting it here. Are the out-of-bag predictions stored in $predictions of the ranger object also majority votes (for classification) or averages (for regression)?
I would be interested in the OOB predictions from all individual trees.

@mnwright
Member

mnwright commented Mar 2, 2016

Sorry for the delay. Yes, these are also majority votes or averages. I will try to make the predict.all option available when growing the forest, too.

@ajai-sharma-backup

Is it possible for the predict method to use fewer trees than the original model contains? This may be useful for model selection.

@mnwright
Member

You could use predict.all = TRUE and compute manual majority votes or averages on a subset of the trees.
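
The suggestion above could be sketched as follows. The helpers subset_average and subset_vote are hypothetical names, and both assume an [obs] x [tree] matrix of per-tree predictions such as the one predict.all = TRUE is described as returning. The vote helper works on whatever labels or codes the matrix holds.

```r
# Average of the first `k` trees of an [obs] x [tree] regression matrix.
subset_average <- function(tree_preds, k) {
  rowMeans(tree_preds[, seq_len(k), drop = FALSE])
}

# Majority vote over the first `k` trees of an [obs] x [tree] matrix of labels.
# Returns the winning label per observation as a character vector.
subset_vote <- function(tree_preds, k) {
  apply(tree_preds[, seq_len(k), drop = FALSE], 1, function(votes) {
    counts <- table(votes)
    names(counts)[which.max(counts)]  # ties go to the first label in sorted order
  })
}

# Toy inputs: 2 observations, 4 trees
toy_reg <- matrix(c(1, 3, 5, 7,
                    2, 2, 4, 4), nrow = 2, byrow = TRUE)
toy_cls <- matrix(c("a", "b", "a", "b",
                    "b", "b", "a", "a"), nrow = 2, byrow = TRUE)
avg_first2  <- subset_average(toy_reg, 2)
vote_first3 <- subset_vote(toy_cls, 3)

# With ranger installed: predict with only the first 25 of 100 trees
if (requireNamespace("ranger", quietly = TRUE)) {
  rf <- ranger::ranger(mpg ~ ., data = mtcars, num.trees = 100)
  per_tree <- predict(rf, data = mtcars, predict.all = TRUE)$predictions
  pred_25 <- subset_average(per_tree, 25)
}
```

Any set of column indices could be used instead of the first k, for example a random sample of trees.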

@crarlus

crarlus commented Oct 11, 2016

Hi, I was wondering if "predict.all" could work for probability trees as well, i.e. return a probability result per tree. This would be useful for statistical stability analysis via sub-sampling and aggregating results, as well as for easy estimation of the optimal number of trees (similar to the initial post). Thanks.
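
One way to read "estimation of the optimal number of trees" is as an error curve over the number of trees used. The sketch below does this for a regression forest, assuming an [obs] x [tree] matrix of per-tree predictions. The helper error_by_num_trees is a made-up name, and the error should be computed on held-out data rather than on the training set.

```r
# RMSE of the forest prediction when only trees 1..k are averaged, for every k.
# `tree_preds`: [obs] x [tree] numeric matrix, `truth`: observed responses.
error_by_num_trees <- function(tree_preds, truth) {
  sapply(seq_len(ncol(tree_preds)), function(k) {
    pred_k <- rowMeans(tree_preds[, seq_len(k), drop = FALSE])
    sqrt(mean((pred_k - truth)^2))
  })
}

# Toy input: 2 observations, 2 trees
toy_preds <- matrix(c(1, 3,
                      2, 2), nrow = 2, byrow = TRUE)
toy_errs <- error_by_num_trees(toy_preds, truth = c(2, 2))

# With ranger installed: error on a small held-out set as trees are added
if (requireNamespace("ranger", quietly = TRUE)) {
  set.seed(1)
  test_rows <- sample(nrow(mtcars), 10)
  rf <- ranger::ranger(mpg ~ ., data = mtcars[-test_rows, ], num.trees = 200)
  per_tree <- predict(rf, data = mtcars[test_rows, ], predict.all = TRUE)$predictions
  errs <- error_by_num_trees(per_tree, mtcars$mpg[test_rows])
  print(round(errs[c(1, 10, 50, 200)], 2))
}
```

The curve typically flattens once enough trees are included, which gives a rough guide for choosing num.trees.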

@mnwright
Member

Yes, that's possible, and it is the reason why this issue is still open. I'll definitely add it for probability prediction and survival.

@crarlus

crarlus commented Oct 12, 2016

Ok, great.
I devised an interim solution by slicing the forest into smaller sub-forests and making individual predictions on these sub-forests. One can then aggregate a number of sub-predictions for robustness studies etc.

A quick implementation reads as follows:

# The original forest is called "rf" (a ranger probability forest)

slice.size <- 10 # number of trees per sub-forest
slice.number <- ceiling(rf$num.trees / slice.size) # number of sub-forests

rf_slice <- list()
for (slice.index in seq_len(slice.number)) {

  print(paste("Building slice", slice.index, "of", slice.number))

  From <- (slice.index - 1) * slice.size + 1
  To <- min(slice.index * slice.size, rf$num.trees)
  n.trees <- To - From + 1 # the last slice may hold fewer than slice.size trees

  rf_temp <- rf # copy forest object

  # keep only trees From:To of the original forest
  rf_temp$num.trees <- n.trees
  rf_temp$forest$child.nodeIDs <- rf$forest$child.nodeIDs[From:To]
  rf_temp$forest$split.values <- rf$forest$split.values[From:To]
  rf_temp$forest$split.varIDs <- rf$forest$split.varIDs[From:To]
  rf_temp$forest$terminal.class.counts <- rf$forest$terminal.class.counts[From:To]
  rf_temp$forest$num.trees <- n.trees

  rf_slice[[slice.index]] <- rf_temp

}

# Prediction

Prediction_slice <- list()
for (slice in seq_along(rf_slice)) {

  print(paste("Predicting slice", slice, "of", length(rf_slice)))
  # keep only the predicted probability of the second class
  Prediction_temp <- predict(object = rf_slice[[slice]], data = testFeatures, num.threads = 1, predict.all = FALSE)$predictions[, 2]
  Prediction_slice[[slice]] <- Prediction_temp
}
Prediction_slice <- do.call(cbind, Prediction_slice) # matrix: one row per observation, one column per sub-forest

I opted for a slice.size of 10, i.e. each of the sub-forests contains 10 trees (the last one may contain fewer). Note that this piece of code only extracts the probabilities of the second class; extending it to more classes is straightforward.

Marvin, do you think this is a possible (quick) solution?

@mnwright
Member

mnwright commented Dec 9, 2016

Done.
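
A hedged sketch of how per-tree probabilities might be aggregated, assuming (as the ranger documentation describes) that predict.all = TRUE on a probability forest returns a 3-d array indexed [obs, class, tree]. The helper subset_prob and the toy array are illustrative only.

```r
# Average class probabilities over the first `k` trees.
# `tree_probs` is assumed to be an [obs] x [class] x [tree] array.
subset_prob <- function(tree_probs, k) {
  apply(tree_probs[, , seq_len(k), drop = FALSE], c(1, 2), mean)
}

# Toy array: 2 observations, 2 classes, 3 trees (filled column-major per tree)
toy_probs <- array(c(0.2, 0.6, 0.8, 0.4,   # tree 1
                     0.4, 0.8, 0.6, 0.2,   # tree 2
                     0.9, 0.1, 0.1, 0.9),  # tree 3
                   dim = c(2, 2, 3))
avg_first2 <- subset_prob(toy_probs, 2)

# With ranger installed: per-tree class probabilities for a probability forest
if (requireNamespace("ranger", quietly = TRUE)) {
  rf <- ranger::ranger(Species ~ ., data = iris, probability = TRUE, num.trees = 50)
  per_tree <- predict(rf, data = iris, predict.all = TRUE)$predictions
  print(dim(per_tree))  # dimensions are expected to be obs, class, tree
  prob_10 <- subset_prob(per_tree, 10)  # probabilities from the first 10 trees
}
```

With such an array available, the sub-forest slicing workaround above is no longer needed for stability studies.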

mnwright pushed a commit to rcannood/ranger that referenced this issue Oct 23, 2019
mnwright added a commit to rcannood/ranger that referenced this issue Oct 23, 2019
Option to get individual tree predictions (issues imbs-hl#18, imbs-hl#23)