Normalization of ensemble estimators #401
Replies: 1 comment
To compute your own terminal node outcomes, you will need to look at $membership and $inbag. Both are 2D arrays of dimension n x ntree. The first gives the terminal node identifier of each case in each tree. In the second, a cell of zero means the case is out-of-bag (OOB) for that tree, and a cell greater than zero tells you how many times the case was selected as in-bag in the sample for that tree. You define the predicted value of a terminal node from its in-bag elements (making sure your statistic accounts for multiple selections of the same case). The OOB cases that fall in that terminal node are then assigned this value. The ensemble for a case takes every tree in which the case is OOB, sums the predicted values, and then normalizes by the number of times the case is OOB in the forest. This is the ensemble predicted value over the forest for that case. I would recommend starting with a simple regression problem on mtcars or your favorite test data set. Don't bother writing a custom split rule. Just try to reconstruct the ensemble from first principles as described above. Then apply the technique to your custom split rule using your survival scenario.
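The reconstruction described above can be sketched on toy data. This is only an illustration, written in plain Python rather than R so that it is self-contained: the function name `oob_ensemble` and the small `membership` and `inbag` lists are made up for the example and merely stand in for the `$membership` and `$inbag` arrays of a fitted forest. The terminal node statistic used here is the in-bag mean, as for a regression problem.

```python
from collections import defaultdict


def oob_ensemble(y, membership, inbag):
    """Rebuild OOB ensemble predictions from membership and in-bag counts.

    y          : list of n responses
    membership : n x ntree, terminal node id of each case in each tree
    inbag      : n x ntree, times each case was drawn in-bag (0 means OOB)
    """
    n = len(y)
    ntree = len(membership[0])
    oob_sum = [0.0] * n    # running sum of tree predictions where case is OOB
    oob_count = [0] * n    # number of trees in which the case is OOB

    for b in range(ntree):
        # Terminal node statistic: in-bag mean, weighted by selection counts.
        node_sum = defaultdict(float)
        node_weight = defaultdict(float)
        for i in range(n):
            w = inbag[i][b]
            if w > 0:
                node = membership[i][b]
                node_sum[node] += w * y[i]
                node_weight[node] += w

        # Assign the node value to every OOB case that lands in that node.
        for i in range(n):
            node = membership[i][b]
            if inbag[i][b] == 0 and node_weight[node] > 0:
                oob_sum[i] += node_sum[node] / node_weight[node]
                oob_count[i] += 1

    # Normalize by the number of times each case was OOB (NaN if never OOB).
    return [s / c if c > 0 else float("nan")
            for s, c in zip(oob_sum, oob_count)]


# Toy stand-ins for $membership and $inbag: 5 cases, 2 trees (hypothetical).
y = [1.0, 2.0, 3.0, 4.0, 6.0]
membership = [[1, 1], [1, 2], [1, 2], [2, 1], [2, 2]]
inbag = [[2, 0], [1, 1], [0, 1], [1, 1], [0, 0]]

pred = oob_ensemble(y, membership, inbag)
```

In this toy forest the last case is OOB in both trees, so its ensemble value is the average of two node means, while cases that are never OOB get NaN. For a survival or other custom outcome, the same in-bag counts would weight whatever terminal node statistic you compute in place of the mean.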
I am implementing a proposed splitting rule in splitCustom.c. The objective is to calculate a survival indicator that is not available in the package (net survival). In order to do that, I am (i) building a forest; (ii) extracting the members of a terminal node of the forest object; and (iii) using them to estimate the quantity I need.
I did a little exercise to check my approach using the standard survival measure of the package. I compared the estimates provided by randomForestSRC with those I obtained by following steps (i)-(iii) above. The survival curves, however, were different. Going through the implementation, I could see that there is a sort of "normalization of the ensemble estimates". Am I wrong? If not, how is the normalization done?