I am writing to report an issue I encountered while using SSDM for a large modelling project. The project covers 10 species, each modelled with species-specific variables in each of the 14 regions of a stratified study area (140 model outputs in total). I used the ensemble modelling function with the RF, MARS, SVM, and ANN algorithms and 5 repeats for all model runs. Note that I only have presence data, so pseudo-absences were generated automatically for all models.
For the first ~70 models, I used a loop to iterate over all combinations of species, regions, and species-specific variables with the same parameters as the following code snippet:
However, I ran into memory and speed issues, so I built the remaining 70 ESDMs manually via the GUI. In the GUI, I used the same parameters and left everything else at its default value.
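For the memory problems in a long loop like this, one common pattern is to write each fitted model to disk as soon as it is done and release it before the next iteration. The sketch below uses base R only; `fit_one()` is a hypothetical placeholder for the real model call, and the file naming scheme is an assumption, not something SSDM prescribes.

```r
# Hypothetical stand-in for the real model fit (e.g. an ensemble model call).
fit_one <- function(species, region) {
  list(species = species, region = region)
}

run_all <- function(species, regions, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  # One row per species x region combination.
  combos <- expand.grid(species = species, region = regions,
                        stringsAsFactors = FALSE)
  for (i in seq_len(nrow(combos))) {
    out_file <- file.path(
      out_dir, sprintf("%s_%s.rds", combos$species[i], combos$region[i])
    )
    # Skip combinations that already finished, so an interrupted run can resume.
    if (file.exists(out_file)) next
    model <- fit_one(combos$species[i], combos$region[i])
    saveRDS(model, out_file)
    # Drop the fitted object and ask R to release memory before the next fit.
    rm(model)
    invisible(gc())
  }
  invisible(combos)
}
```

Calling `run_all(c("sp1", "sp2"), c("r1", "r2", "r3"), "models")` would leave one `.rds` file per combination in `models/`, and rerunning it only fits whatever is still missing.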
All model projections, evaluation results, and other files came out fine. However, the variable importance outputs for some models looked like this:
| Presence | Presence | Presence | Presence | Presence | Presence | Presence | Presence | Presence | Presence | Presence | Presence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Axes.evaluation | 6.06506260362906 | 3.22714547662735 | 12.9818109189702 | 4.25058836645605 | 6.9628498591444 | 2.60560023216018 | 5.63802728195239 | 26.7530966936013 | 13.3709808348051 | 5.19605061823909 | 9.42792640373865 |
The number of 'Presence' columns is the same as the number of original predictor variables I used for that particular model, and this is the same for other outputs too, indicating that these numbers are likely the variable importance numbers for the predictor variables.
I tested some single algorithm SDM models in the GUI with the MARS algorithm on a number of species and regions, and the variable importance table always came out looking like that. The issue with this is that now I am writing a report and producing summary statistics on the modelling and have no idea how to explain the chunk of models which have 'Presence' variable importance names. Below is a boxplot I created which shows a high-level overview of variable importance across all the species and regions, and as you can see, 'Presence' stands out clearly and accounts for 32.58% of the frequency of variables for all models.
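To make the share of 'Presence' labels reproducible for the report, something like the following could be used. It is only a sketch on made-up toy data: the `vi_long` table and its column names are assumptions about how the importance values were collected, not SSDM output.

```r
# Toy long-format table: one row per model and variable (hypothetical values).
vi_long <- data.frame(
  model      = c("m1", "m1", "m2", "m2", "m3", "m3"),
  variable   = c("bio1", "bio12", "Presence", "Presence", "bio1", "elevation"),
  importance = c(60, 40, 55, 45, 30, 70),
  stringsAsFactors = FALSE
)

# Percentage of rows carrying each variable name.
share <- round(100 * prop.table(table(vi_long$variable)), 2)

# Models whose importance table came out with 'Presence' as a column name.
unlabeled_models <- unique(vi_long$model[vi_long$variable == "Presence"])
```

Listing `unlabeled_models` also shows which runs are affected, which makes it easier to check whether they all came from the GUI or from a particular algorithm.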
I would like to ask for help understanding if this is a result of my configuration, or if it is really a bug. I am not very experienced with these algorithms, especially MARS. Ideally, I would like to be able to link these 'Presence' columns to their respective variable names.
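As a stopgap, the columns could be relabelled by position. This is a sketch that rests on an unverified assumption: that the 'Presence' columns appear in the same order as the predictors passed to the model. The predictor names below are hypothetical and the values are rounded from the table above.

```r
# Importance table as exported, with every column named 'Presence'.
vi <- data.frame(matrix(c(6.07, 3.23, 12.98), nrow = 1))
names(vi) <- rep("Presence", 3)
rownames(vi) <- "Axes.evaluation"

# Hypothetical predictor names, in the order they were given to the model.
predictors <- c("bio1", "bio12", "elevation")

# Relabel by position; refuse to guess if the counts do not match.
relabel_importance <- function(vi, predictor_names) {
  stopifnot(ncol(vi) == length(predictor_names))
  names(vi) <- predictor_names
  vi
}

vi_named <- relabel_importance(vi, predictors)
```

Before relying on this, it would be worth rerunning one affected model with a different algorithm and the same predictors to see whether the ordering assumption is plausible.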
Thank you for your time and for developing such an awesome package! :)
Hi,
To me this sounds like a bug in the evaluate.axes function when it creates the variable importance data.frame, probably here: names(obj@variable.importance) <- names(obj@data)[4:(length(obj@data)-1)]
I'll try to look into it.
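To illustrate why a positional slice like that is fragile, here is a small sketch on a made-up data.frame. The column layout (two coordinates, a response, predictors, one trailing bookkeeping column called `flag`) is an assumption for illustration only and is not taken from the SSDM source, so this shows one way such a slice can go wrong rather than a diagnosis of the actual bug.

```r
# Assumed layout: coordinates, response, predictors, one trailing column.
d_full <- data.frame(X = 1, Y = 2, Presence = 1,
                     bio1 = 0.1, bio12 = 0.2, flag = TRUE)
names_full <- names(d_full)[4:(length(d_full) - 1)]
# names_full is c("bio1", "bio12"), as intended.

# Same slice on a narrower table: 4:3 counts DOWN, so it picks columns 4 and 3.
d_short <- d_full[, c("X", "Y", "Presence", "bio1")]
names_short <- names(d_short)[4:(length(d_short) - 1)]
# names_short is c("bio1", "Presence"): the response name leaks in.

# Selecting by name instead of position avoids depending on the layout.
predictor_names <- function(d, non_predictors = c("X", "Y", "Presence", "flag")) {
  setdiff(names(d), non_predictors)
}
```

With the name-based helper, both `d_full` and `d_short` yield only predictor names, whatever the number of trailing columns.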
Best,
Lukas
The code snippet referred to in the original post:

```r
ESDM <- ensemble_modelling(
  c('ANN', 'SVM', 'CTA', 'MARS', 'RF'),
  ensemble.metric = c('prop.correct'),
  ensemble.thresh = c(0.75),
  occurrence_data,
  predictor_variables,
  rep = 5,
  Xcol = 'x',
  Ycol = 'y'
)
```