Documenting how to access data for benchmarking #121

matt-graham · 2023-11-22T14:52:20Z

Raising as part of JOSS review openjournals/joss-reviews/issues/5901

As the data files are stored on Git LFS and the free LFS quota for this account seems to be regularly exceeded (see openjournals/joss-reviews#5901 (comment)), it would be useful to document an alternative approach for accessing the data, ideally one which uses an open data repository that doesn't require registering for an account to download. While the datasets have been made available on Kaggle (openjournals/joss-reviews#5901 (comment)), this is not currently documented in this repository and a Kaggle account is required to download them. An open research data repository / archive like Zenodo would seem to be a better fit with the JOSS requirement that the software should be stored in a repository that can be cloned without registration. While I don't think this requirement strictly extends to data associated with the software, from a FAIR data and reproducibility perspective a service like Zenodo is much better than Kaggle.

A potentially even nicer approach would be to use a tool like pooch to automate getting the data from a remote repository as part of running the benchmarks.
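
To illustrate what automating the download could look like, here is a minimal sketch of the fetch-once-and-verify pattern that pooch provides (for example through `pooch.retrieve(url, known_hash=...)`). It deliberately uses only the Python standard library rather than pooch itself, so it is an approximation of the idea, not pooch's implementation. The names `fetch`, `sha256_of`, and the cache directory layout are hypothetical, and the URL and checksum would need to come from wherever the data ends up being archived.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(url: str, known_sha256: str, cache_dir: Path) -> Path:
    """Download `url` into `cache_dir` unless a verified copy is already there.

    Hypothetical helper: the file name is taken from the last URL segment.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]

    # Reuse the cached copy only if its checksum still matches.
    if target.exists() and sha256_of(target) == known_sha256:
        return target

    # Download to a temporary name so a failed transfer never looks complete.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, partial.open("wb") as out:
        while chunk := response.read(1 << 20):
            out.write(chunk)

    # Refuse to keep a file whose contents differ from what was published.
    if sha256_of(partial) != known_sha256:
        partial.unlink()
        raise ValueError(f"checksum mismatch for {url}")

    partial.replace(target)
    return target
```

A benchmark script could then call `fetch(...)` before loading each dataset, so the data is downloaded on first use and read from the local cache afterwards. Pinning the checksum means a changed or corrupted remote file is detected rather than silently used.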

vivekjoshy mentioned this issue Dec 11, 2023

Update paper to include suggestions by reviewer #116

Merged

vivekjoshy closed this as completed Jan 6, 2024
