Differences in permutation variable importance between ranger and randomForestSRC
#453
Unanswered
mickeycampbell asked this question in Q&A
Replies: 0 comments
Hello,
Looking through the VIMP vignette, I have come to understand (to the extent that I am reading the mathematical notation correctly) that `rfsrc()`'s variable importance measure for a regression problem with `importance = "permute"` should be the difference in OOB MSE between the case where a predictor variable's values are randomly permuted and the case where the original values are retained.

Assuming this is correct, that appears to be the same measure used in the `ranger` package, as described here.

However, when I run the same dataset through `ranger()` and `rfsrc()` while attempting to mirror the two models' parameters (e.g., same number of trees, mtry, nodesize), I get substantially different variable importance values. The variable importance values from `rfsrc()` are, on average, about 50% lower than those from `ranger()`. The order of importance is very similar, which is reassuring, and model performance is nearly identical in terms of OOB R² and MSE. I would expect some inherent, minor differences due to the randomness of random forests, but the systematically lower values from `rfsrc()` (or, conversely, the systematically higher values from `ranger()`) have me a bit puzzled.

FWIW, I also explored variable importance from `randomForest()`, but understand that those values are normalized by the standard deviation of the MSE differences, making direct comparison to the outputs of `ranger()` and `rfsrc()` a bit more challenging.

I will attach the dataset I am using in case anyone wants to try to reproduce exactly what I am finding. `bio` is the response variable (biomass, in Mg/ha), and all other fields are airborne lidar structural predictor variables.

biomass.csv
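To be explicit about the definition I have in mind, here is a rough sketch in my own notation. It restates the verbal description above and is not a formula taken from either package's documentation; the symbols $\tilde{x}_j$ and $\hat{y}_i^{\mathrm{OOB}}$ are my assumptions for illustration.

$$
\mathrm{VIMP}(x_j) \;=\; \mathrm{MSE}_{\mathrm{OOB}}\big(\tilde{x}_j\big) \;-\; \mathrm{MSE}_{\mathrm{OOB}},
\qquad
\mathrm{MSE}_{\mathrm{OOB}} \;=\; \frac{1}{n}\sum_{i=1}^{n}\big(y_i - \hat{y}_i^{\mathrm{OOB}}\big)^2
$$

Here $\tilde{x}_j$ denotes predictor $x_j$ with its values randomly permuted among the OOB cases, $\mathrm{MSE}_{\mathrm{OOB}}(\tilde{x}_j)$ is the OOB MSE recomputed with that permuted predictor, and $\hat{y}_i^{\mathrm{OOB}}$ is the OOB prediction for observation $i$. A larger positive value would indicate a more important predictor.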
Here is a quick code snippet to highlight my approach:
Here is what is returned:
Thanks to anyone who can provide some insights!