Retrieving individual predictions for each tree #23
Comments
Added for classification and regression. Is this also needed for probability prediction and survival?
Thanks! I personally need it only for regression at the moment, but I guess it would be consistent to have this for all forests.
This may be related, so I'm posting it here. Are the out-of-bag predictions in $predictions of the ranger object also majority votes (for classification) or averages (for regression)?
Sorry for the delay. Yes, these are also majority votes or averages. I will try to add the predict.all option for growing, too.
Is it possible for the predict method to predict with fewer trees than the original model contains? This may be useful for model selection.
You could use |
Hi, I was wondering if "predict.all" could work for probability trees as well, i.e. return a probability result per tree. This would be useful for statistical stability analysis via sub-sampling and aggregating results, as well as for easy estimation of the optimal number of trees (similar to the initial post). Thanks.
Yes, that's possible, and it is the reason why this issue is still open. I'll definitely add it for probability prediction and survival.
Ok, great. A quick implementation reads as follows:
# the original forest is called "rf"
slice.size <- 10 # number of trees per sub-forest
slice.number <- ceiling(rf$num.trees / slice.size) # number of sub-forests
rf_slice <- list()
for (slice.index in seq_len(slice.number)) {
  print(paste("Extracting slice", slice.index, "of", slice.number))
  From <- (slice.index - 1) * slice.size + 1
  To <- min(slice.index * slice.size, rf$num.trees)
  # the last slice may hold fewer than slice.size trees
  n.slice.trees <- To - From + 1
  rf_temp <- rf # copy forest object
  # extract trees From..To from the original forest
  rf_temp$num.trees <- n.slice.trees
  rf_temp$forest$child.nodeIDs <- rf$forest$child.nodeIDs[From:To]
  rf_temp$forest$split.values <- rf$forest$split.values[From:To]
  rf_temp$forest$split.varIDs <- rf$forest$split.varIDs[From:To]
  rf_temp$forest$terminal.class.counts <- rf$forest$terminal.class.counts[From:To]
  rf_temp$forest$num.trees <- n.slice.trees
  rf_slice[[slice.index]] <- rf_temp
}
# Prediction
Prediction_slice <- list()
for (slice in seq_along(rf_slice)) {
  print(paste("Predicting slice", slice, "of", length(rf_slice)))
  # keep only the predicted probability of the second class
  Prediction_temp <- predict(object = rf_slice[[slice]], data = testFeatures, num.threads = 1, predict.all = FALSE)$predictions[, 2]
  Prediction_slice[[slice]] <- Prediction_temp
}
Prediction_slice <- do.call(cbind, Prediction_slice) # results in an [obs] x [slice] matrix of predictions
I opted for a slice.size of 10, i.e. each of the sub-forests contains 10 trees. Note that this piece of code only extracts the probabilities of the second class; however, extension to more classes is straightforward. Marvin, do you think this is a possible (quick) solution?
Done.
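For readers who want to use the resulting Prediction_slice matrix to judge how many trees are enough, here is a minimal sketch in base R. It is not part of the original post: the matrix below is simulated as a stand-in for real sub-forest output, and the names n_obs, n_slices and cum_mean are hypothetical. It also assumes all slices hold the same number of trees, so a plain mean over slices equals the prediction of the combined trees.

```r
# Simulated stand-in for the [obs] x [slice] matrix built above (assumption:
# real values would come from the sub-forest predictions).
set.seed(1)
n_obs <- 5
n_slices <- 8
Prediction_slice <- matrix(runif(n_obs * n_slices), nrow = n_obs)

# Running mean over the first k slices, for k = 1..n_slices.
# Column k approximates the prediction of a forest made of the first k slices.
cum_sum <- t(apply(Prediction_slice, 1, cumsum))
cum_mean <- sweep(cum_sum, 2, seq_len(n_slices), "/")

# Largest change in any observation's prediction when one more slice is added;
# small values towards the end suggest that adding trees no longer matters much.
max_change <- apply(abs(cum_mean[, -1, drop = FALSE] -
                        cum_mean[, -n_slices, drop = FALSE]), 2, max)
print(round(max_change, 4))
```

If the last slice is smaller than the others, the running mean would need to be weighted by the number of trees per slice.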
Option to get individual tree predictions (issues imbs-hl#18, imbs-hl#23)
Possibly related to #18.
Something like predict.all in predict.randomForest would be great, i.e. return a [obs]x[tree] matrix. We need this to estimate the standard deviation of the predictions.
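As an illustration of what the requested [obs]x[tree] matrix allows, the following base R sketch computes a per-observation spread across trees. The matrix is simulated here so that the snippet runs on its own; with a fitted regression forest it would presumably come from a call such as predict(rf, data = newdata, predict.all = TRUE)$predictions, where rf and newdata are placeholder names. Taking the standard deviation across trees is only one simple reading of "standard deviation of the predictions"; it measures disagreement between trees, not a formal standard error of the forest prediction.

```r
# Simulated stand-in for an [obs] x [tree] matrix of per-tree predictions
# (assumption: real values would come from predict(..., predict.all = TRUE)).
set.seed(42)
n_obs <- 4
n_trees <- 100
tree_preds <- matrix(rnorm(n_obs * n_trees, mean = 10, sd = 2),
                     nrow = n_obs, ncol = n_trees)

# Forest prediction for regression: the average over trees, one value per row.
forest_pred <- rowMeans(tree_preds)

# Spread of the individual tree predictions around that average.
tree_sd <- apply(tree_preds, 1, sd)

print(data.frame(prediction = forest_pred, sd_across_trees = tree_sd))
```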