Implement spatial block validation in ibis.iSDM? #89

Open
6 tasks
Martin-Jung opened this issue Jan 29, 2024 · 1 comment
Labels: enhancement (New feature or request), Feature request (Any new feature that would be nice to have), Quality of life (Quality of life improvement)

Comments

@Martin-Jung (Collaborator)

(Spatial) block validation has so far not been added to the package, given the complexity of assigning blocks to the single or multiple datasets that might be specified in a model. Consequently, in most projects we implement cross-validation externally, e.g. by providing subsets of the training data to individual ibis fits and then validating them outside the package. I still think leaving validation to the user makes the most sense.
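A minimal sketch of this external workflow, assuming point records with planar coordinates: points are grouped into square grid blocks, whole blocks are shuffled into k folds (similar in spirit to random block assignment), and a user-supplied `fit`/`score` pair is run once per held-out fold. It is written in Python purely for illustration; `assign_spatial_blocks()` and `cross_validate_externally()` are hypothetical helpers, and `fit`/`score` are placeholders for an ibis fit and an external validation step.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def assign_spatial_blocks(xy: Sequence[Point], block_size: float, k: int, seed: int = 42) -> List[int]:
    """Assign each point to one of k folds via square spatial blocks.

    All points in the same grid cell share a fold, so training and test
    data are separated at the scale of one block.
    """
    # Grid cell (block) index of each point
    cells = [(math.floor(x / block_size), math.floor(y / block_size)) for x, y in xy]
    # Shuffle the unique blocks, then deal them round-robin into k folds
    unique_cells = sorted(set(cells))
    random.Random(seed).shuffle(unique_cells)
    fold_of_cell = {cell: i % k for i, cell in enumerate(unique_cells)}
    return [fold_of_cell[c] for c in cells]


def cross_validate_externally(xy, y, folds, fit: Callable, score: Callable) -> list:
    """Fit once per fold on all other folds and score on the held-out fold."""
    results = []
    for f in sorted(set(folds)):
        train = [i for i, fi in enumerate(folds) if fi != f]
        test = [i for i, fi in enumerate(folds) if fi == f]
        model = fit([xy[i] for i in train], [y[i] for i in train])
        results.append(score(model, [xy[i] for i in test], [y[i] for i in test]))
    return results


# Toy usage with made-up data: a 4 x 4 grid of points, "presence" where x > 1.5
xy = [(float(x), float(y)) for x in range(4) for y in range(4)]
y = [1 if x > 1.5 else 0 for x, _ in xy]
folds = assign_spatial_blocks(xy, block_size=2.0, k=2)


def fit(train_xy, train_y):
    # Placeholder for an ibis fit: the "model" is just the training prevalence
    return sum(train_y) / len(train_y)


def score(model, test_xy, test_y):
    # Placeholder validation metric: mean absolute error of that prevalence
    return sum(abs(model - v) for v in test_y) / len(test_y)


print(cross_validate_externally(xy, y, folds, fit, score))
```

Because folds are drawn over blocks rather than individual points, neighbouring (and likely autocorrelated) records cannot end up on both sides of a split, which is the main reason to prefer block over random cross-validation here.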

However, given that an increasing number of projects need to rely on this, we could brainstorm how best to support this functionality within ibis.
I consider this a relatively large overhaul if it is to be implemented well.

Possible implementation steps:

  • I would suggest an implementation using the new spatialsample package, which is relatively clean and aligns with the tidy philosophy.
  • The idea would be to have a new function called cross_validate() (or another name?), as opposed to just validate(). This function would need to store the method and the blocks in some form in the BiodiversityDistribution-class object so that they can be queried from within the object.
  • During the setup and training stage of each engine, these attributes could be queried to create the sets, run the model per set and store the validation statistics. Note that some engines support internal cross-validation (XGBoost, for example).
  • The metric to be used for cross-validation would need to be saved in the BiodiversityDistribution-class object and also in the resulting object containing the fits.
  • (Optional) Functionality to not only perform a cross-validation but also build an ensemble of the various distributions fitted within the object.
  • The whole pipeline requires several unit tests and likely its own vignette article ("Cross-validation and ensemble modelling") to demonstrate the procedure.
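The steps above could fit together roughly as follows. This is a Python sketch of the control flow only (ibis.iSDM itself is written in R): `DistributionSetup`, `CVSettings`, `FitResult`, `train()` and the `METRICS` table are hypothetical names, not ibis.iSDM or spatialsample API, and the "model" is reduced to a training prevalence so that the structure stays visible. The fold ids are assumed to be precomputed, e.g. by a spatial blocking step.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Optional


@dataclass
class CVSettings:
    method: str        # label of the blocking method, e.g. "spatial_block"
    folds: List[int]   # precomputed fold id per observation
    metric: str        # key into METRICS, stored so results stay traceable


@dataclass
class DistributionSetup:
    """Hypothetical stand-in for a BiodiversityDistribution-like object."""
    y: List[float]
    cv: Optional[CVSettings] = None

    def cross_validate(self, method: str, folds: List[int], metric: str) -> "DistributionSetup":
        # Only records the settings on the object; nothing is fitted yet
        self.cv = CVSettings(method, list(folds), metric)
        return self


@dataclass
class FitResult:
    model: float
    cv: Optional[CVSettings] = None          # settings copied into the result
    fold_models: Dict[int, float] = field(default_factory=dict)
    fold_scores: Dict[int, float] = field(default_factory=dict)

    def ensemble(self) -> float:
        # Optional step: combine the per-fold fits (here a simple mean)
        return mean(self.fold_models.values()) if self.fold_models else self.model


# Hypothetical metric registry: mean absolute error of a constant prediction
METRICS: Dict[str, Callable] = {"mae": lambda pred, obs: mean(abs(pred - o) for o in obs)}


def train(setup: DistributionSetup, fit: Callable) -> FitResult:
    """Engine-side training: query the setup for CV settings before fitting."""
    result = FitResult(model=fit(setup.y), cv=setup.cv)
    if setup.cv is None:
        return result  # no cross-validation requested
    metric = METRICS[setup.cv.metric]
    for f in sorted(set(setup.cv.folds)):
        train_y = [v for v, fi in zip(setup.y, setup.cv.folds) if fi != f]
        test_y = [v for v, fi in zip(setup.y, setup.cv.folds) if fi == f]
        m = fit(train_y)
        result.fold_models[f] = m
        result.fold_scores[f] = metric(m, test_y)
    return result


setup = DistributionSetup(y=[1, 0, 1, 1, 0, 0]).cross_validate(
    method="spatial_block", folds=[0, 0, 1, 1, 2, 2], metric="mae")
res = train(setup, fit=lambda ys: sum(ys) / len(ys))
print(res.fold_scores, res.ensemble())
```

Keeping `cross_validate()` as a pure setter means the object only carries the method, blocks and metric, while each engine decides how to use them; an engine with built-in cross-validation (such as XGBoost) could map the stored folds onto its own routine instead of looping itself.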

Thoughts?

Martin-Jung added the enhancement, Quality of life and Feature request labels on Jan 29, 2024.
@Martin-Jung (Collaborator, Author)

This has been dormant for a while, and since we already implement this in a range of processing pipelines, I am still in favour of not adding it directly.
Instead, providing an extra vignette that highlights (spatial) cross-validation approaches might be an idea. The vignette could simply import and use the spatialsample package.
