Variable Importance Output Issue with MARS Algorithm in SSDM package #124

Closed
Montyx23 opened this issue Apr 11, 2023 · 3 comments
@Montyx23
Montyx23 commented Apr 11, 2023

Hi there @sylvainschmitt, @lukasbaumbach,

I am writing to report an issue I encountered while using SSDM for a large modelling project. The project covers 10 species, each modelled with species-specific variables in each of the 14 regions of a stratified study area, giving 140 model outputs in total. I used the ensemble modelling function with the RF, MARS, SVM, and ANN algorithms and 5 repeats for all model runs. Note that I only have presence data, so pseudo-absences were generated automatically for every model.

For the first ~70 models, I used a loop to iterate over all combinations of species, regions, and species-specific variables with the same parameters as the following code snippet:

ESDM <- ensemble_modelling(
  c('ANN', 'SVM', 'CTA', 'MARS', 'RF'),
  ensemble.metric = c('prop.correct'),
  ensemble.thresh = c(0.75),
  occurrence_data,
  predictor_variables,
  rep = 5,
  Xcol = 'x',
  Ycol = 'y'
)

However, I ran into memory and speed issues, so I built the remaining 70 ESDMs manually in the GUI. There I used the same parameters and left everything else at its default.

All model projections, evaluation results, and other files came out fine. However, the variable importance outputs for some models looked like this:

Presence Presence Presence Presence Presence Presence Presence Presence Presence Presence Presence Presence
Axes.evaluation 6.06506260362906 3.22714547662735 12.9818109189702 4.25058836645605 6.9628498591444 2.60560023216018 5.63802728195239 26.7530966936013 13.3709808348051 5.19605061823909 9.42792640373865

The number of 'Presence' columns is the same as the number of original predictor variables I used for that particular model, and this is the same for other outputs too, indicating that these numbers are likely the variable importance numbers for the predictor variables.

I tested some single algorithm SDM models in the GUI with the MARS algorithm on a number of species and regions, and the variable importance table always came out looking like that. The issue with this is that now I am writing a report and producing summary statistics on the modelling and have no idea how to explain the chunk of models which have 'Presence' variable importance names. Below is a boxplot I created which shows a high-level overview of variable importance across all the species and regions, and as you can see, 'Presence' stands out clearly and accounts for 32.58% of the frequency of variables for all models.

[Figure: Rplot01, boxplot of variable importance across all species and regions]

I would like to ask for help understanding if this is a result of my configuration, or if it is really a bug. I am not very experienced with these algorithms, especially MARS. Ideally, I would like to be able to link these 'Presence' columns to their respective variable names.
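As a stopgap for the tables already exported, the relabelling asked about here could be sketched as below. This is only a sketch: it assumes that the 'Presence' columns appear in the same order as the predictors that were passed to the model, which the thread does not confirm. The table `vi`, the vector `predictors`, and the helper `relabel_importance` are hypothetical names with made-up values, not part of SSDM.

```r
# Hypothetical stand-in for one exported variable-importance table:
# every column is named "Presence" instead of the predictor it belongs to.
vi <- data.frame(matrix(c(6.07, 3.23, 12.98), nrow = 1,
                        dimnames = list("Axes.evaluation", NULL)))
names(vi) <- rep("Presence", 3)

# Hypothetical predictor names, in the order they were given to the model.
predictors <- c("bio1", "bio12", "elevation")

# ASSUMPTION: column order matches predictor order. Verify before relying on it.
relabel_importance <- function(vi, predictors) {
  # Refuse to guess when the number of columns and predictors differ.
  stopifnot(ncol(vi) == length(predictors))
  mislabelled <- names(vi) == "Presence"
  names(vi)[mislabelled] <- predictors[mislabelled]
  vi
}

fixed <- relabel_importance(vi, predictors)
print(fixed)
```

A cautious way to check the order assumption would be to refit one affected model with an algorithm whose importance table is labelled correctly and compare which predictors rank highest.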

Thank you for your time and for developing such an awesome package! :)

@lukasbaumbach
Collaborator

Hi,
to me this sounds like a bug in the evaluate.axes function when it creates the variable importance data.frame, probably here: names(obj@variable.importance) <- names(obj@data)[4:(length(obj@data)-1)]
I'll try to look into it.
Best,
Lukas
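To illustrate why assigning names by column position can be fragile, here is a small base-R sketch of the indexing pattern quoted above. The data frame layout (coordinates, response, predictors, one trailing column named `weight`) is hypothetical and is not taken from the SSDM source, so this shows a general pitfall rather than the confirmed cause of the bug.

```r
# Hypothetical layout: X, Y, response, two predictors, one trailing column.
d <- data.frame(X = 1:2, Y = 1:2, Presence = c(1, 0),
                bio1 = c(10, 12), bio12 = c(300, 450), weight = c(1, 1))

# Positional selection, same shape as the quoted expression.
by_position <- names(d)[4:(length(d) - 1)]

# If the data has fewer leading columns than assumed, the positions shift.
# Here 4:(4 - 1) is 4:3, which counts DOWN and picks columns 4 and 3.
d2 <- d[, c("Presence", "bio1", "bio12", "weight")]
shifted <- names(d2)[4:(length(d2) - 1)]

# Name-based selection does not depend on where the columns sit.
by_name <- setdiff(names(d), c("X", "Y", "Presence", "weight"))

print(by_position)
print(shifted)
print(by_name)
```

Selecting predictor columns by name, as in `by_name`, avoids depending on a fixed number of leading and trailing columns.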

@Montyx23
Author

Thanks Lukas, hope you manage to fix it.

@sylvainschmitt
Owner

@Montyx23, this is fixed. I'll push the updated version to GitHub today.

sylvainschmitt added a commit that referenced this issue May 15, 2024