I would suggest using the missForest imputation method available in the package. The code for implementing this on your example would involve two calls.

In the first call, each response is fit univariately, which means the prediction model conditions on all other species (and features). This is therefore a univariate regression in which spp1 is regressed against everything, then spp2 is regressed against everything, and finally spp3 is regressed against everything, in a cyclical fashion.

The second call groups the responses into blocks of 3. Because your data has only 3 columns with missing values, this invokes multivariate regression using spp1-spp3 as the outcomes and regressing them on the remaining variables.

PS: there is a useful helper function …
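The code from this reply is not shown above, so the following is only a minimal sketch of what the two calls might look like. It assumes they differ only in the `mf.q` argument of `impute` (1 for univariate missForest, 3 for a block of three multivariate responses). The data frame `dta`, its size, and the column names `env1` and `env2` are hypothetical stand-ins for the example data in the question.

```r
library(randomForestSRC)

set.seed(1)
n <- 100

# Hypothetical stand-in data: 2 environmental covariates and 6 species
env1 <- rnorm(n)
env2 <- rnorm(n)
dta <- data.frame(env1 = env1, env2 = env2)
for (j in 1:6) {
  dta[[paste0("spp", j)]] <- exp(0.5 * env1 - 0.3 * env2 + rnorm(n, sd = 0.5))
}

# Treat spp1-spp3 as unknown in the final 5 rows
dta[(n - 4):n, c("spp1", "spp2", "spp3")] <- NA

# Assumed first call: missForest with each variable fit univariately
imp.uni <- impute(data = dta, mf.q = 1)

# Assumed second call: multivariate missForest with 3 responses per block
imp.mv <- impute(data = dta, mf.q = 3)

# Imputed values for the masked cells
imp.uni[(n - 4):n, c("spp1", "spp2", "spp3")]
imp.mv[(n - 4):n, c("spp1", "spp2", "spp3")]
```

Comparing the two sets of imputed values against held-back true values is one way to judge whether the multivariate blocking helps on a given data set.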
Thank you for creating and maintaining randomForestSRC. I’m currently testing it for modelling and predicting multi-species communities, and I have a question about whether randomForestSRC is suitable for conditional prediction.
I use 'conditional prediction' to mean fitting a multivariate model with all species, then using this model to predict a subset of those species for new outcomes. Predictions for this unknown species subset are informed by any environmental predictors included in the model as well as by the abundance of the known subset of co-occurring species. I have used joint species modelling (the R package Hmsc) to do this; it uses residual correlations among species estimated in the full model to inform the unknown species from the known species at prediction time. More info on conditional prediction can be found in: Wilkinson et al. (2021). Defining and evaluating predictions of joint species distribution models. Methods in Ecology and Evolution, 12(3), 394-404.
I currently compare the Hmsc joint model against a set of univariate random forests. The univariate forests can create similar predictions simply by adding the ‘known’ species as predictor variables. But I’m wondering whether multivariate random forests can do something similar? I'm wondering if imputing missing values is a kind of conditional prediction. Does ‘impute’ in randomForestSRC use information from known species to inform species which have ‘NA’s?
Consider the following example data:
The final 5 rows for spp1, spp2, and spp3 are assumed unknown, and conditional prediction would predict those 15 values using their environmental responses as well as their correlation with the remaining 3 species. Normally I would first fit a model on training data with all species known then predict a species subset in new test data.
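As a point of comparison, the train-then-predict workflow with univariate forests described above could be sketched roughly as follows. This is an illustration under assumptions, not the code used in the comparison: the data frame `dta`, its size, and the covariate names `env1` and `env2` are hypothetical, and `ntree = 200` is an arbitrary choice.

```r
library(randomForestSRC)

set.seed(1)
n <- 100

# Hypothetical stand-in data: 2 environmental covariates and 6 species
env1 <- rnorm(n)
env2 <- rnorm(n)
dta <- data.frame(env1 = env1, env2 = env2)
for (j in 1:6) {
  dta[[paste0("spp", j)]] <- exp(0.5 * env1 - 0.3 * env2 + rnorm(n, sd = 0.5))
}

targets <- c("spp1", "spp2", "spp3")   # species to predict
known <- setdiff(names(dta), targets)  # covariates plus the known species

train <- dta[1:(n - 5), ]              # all species observed
test <- dta[(n - 4):n, known]          # target species unknown

# One univariate forest per target species, with the known
# species included as ordinary predictor variables
cond.pred <- sapply(targets, function(sp) {
  fit <- rfsrc(as.formula(paste(sp, "~ .")),
               data = train[, c(known, sp)], ntree = 200)
  predict(fit, newdata = test)$predicted
})

cond.pred  # 5 x 3 matrix of predictions for the unknown cells
```

Dropping spp4-spp6 from `known` gives the matching unconditional prediction, which makes it easy to check how much the co-occurring species actually contribute.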
However, I can generate these missing 'NA' values using impute, and I've tried the following 3 impute options, each of which gives similar predictive performance:
My questions are these:
I do note that when I make all species NA for the 'new' n rows, the predictive performance is only a little worse, so perhaps the information provided by the biomass of my co-occurring species is not very valuable.
Thanks for any advice!