-
My random forest is being used to calculate predictions in a Shiny app, but the forest object is very large. I'm not sure how the terminal-node quantities are stored on the backend in
-
So we do have ways of extracting terminal node information from the forest object on the R side. When we care less about the time it takes to create a model, but want rapid prediction, we typically save terminal node information in a fatter R-side object. This lets us restore the topology of the forest together with the terminal node quantities, without having to re-send the training data down the forest during model restoration and prediction. The downside is that the forest object is large; the upside is that terminal node restoration occurs on an as-needed, just-in-time basis. If you only have one individual you want to send down the forest, we restore only the path from the root to the terminal node associated with that individual. Once we know the terminal node identifier, we access the previously saved information for that terminal node and calculate the predicted value.

What is clear here is that we need both the forest topology and the terminal node predicted values. We don't have a way of storing the split information (i.e. the topology) and the terminal node quantities separately in the way you suggest.

This discussion somewhat reminds me of an issue where a user was complaining about slow prediction times. It may not help you, but here is the link:

I think it would be worth pursuing the anonymous forest object approach above and then reevaluating your prediction times. It would also help if you told us something about your data and what family it comprises. Dimensions like n, p, ntree, and the family would help.
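A minimal sketch of what the reply describes, assuming the randomForestSRC package: the `terminal.qualts`/`terminal.quants` options ask `rfsrc()` to store terminal-node information in the (fatter) R-side forest object, and `rfsrc.anonymous()` grows a forest that does not retain the training data. The dataset, formula, and `ntree` value here are placeholders, not taken from the original thread.

```r
library(randomForestSRC)

## Fatter forest object: save terminal-node qualitative and quantitative
## information at grow time, so prediction restores terminal nodes
## just-in-time without re-sending the training data down the forest.
o <- rfsrc(Ozone ~ ., data = na.omit(airquality),
           ntree = 500,
           terminal.qualts = TRUE,
           terminal.quants = TRUE)

## Sending one individual down the forest only restores the paths from
## the roots to that individual's terminal nodes.
p <- predict(o, newdata = na.omit(airquality)[1, , drop = FALSE])
p$predicted

## Anonymous forest: the returned object carries the topology and
## terminal-node summaries but not the original training rows, which can
## shrink what the Shiny app needs to hold.
a  <- rfsrc.anonymous(Ozone ~ ., data = na.omit(airquality), ntree = 500)
pa <- predict(a, newdata = na.omit(airquality)[1, , drop = FALSE])
pa$predicted
```

Timing both `predict()` calls (e.g. with `system.time()`) against your current setup would show whether the anonymous-forest approach improves your prediction latency.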