
[REVIEW]: Fast Resampling and Monte Carlo Methods in Python #5092

Closed
editorialbot opened this issue Jan 22, 2023 · 94 comments
Assignees
Labels
accepted Meson published Papers published in JOSS Python recommend-accept Papers recommended for acceptance in JOSS. review Starlark Track: 5 (DSAIS) Data Science, Artificial Intelligence, and Machine Learning

Comments

@editorialbot
Collaborator

editorialbot commented Jan 22, 2023

Submitting author: @mdhaber (Matt Haberland)
Repository: https://github.com/mdhaber/scipy
Branch with paper.md (empty if default branch): joss_resampling
Version: v1.10.0-joss-article
Editor: @jbytecode
Reviewers: @SaranjeetKaur, @kose-y
Archive: 10.5281/zenodo.8031631

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4"><img src="https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4/status.svg)](https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@coatless & @SaranjeetKaur, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review.
First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @jbytecode know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @SaranjeetKaur

📝 Checklist for @kose-y

@editorialbot editorialbot added Meson Python review Starlark Track: 5 (DSAIS) Data Science, Artificial Intelligence, and Machine Learning waitlisted Submissions in the JOSS backlog due to reduced service mode. labels Jan 22, 2023
@editorialbot
Collaborator Author

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

@editorialbot

Software report:

github.com/AlDanial/cloc v 1.88  T=5.24 s (485.0 files/s, 163896.3 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Python                          960          80209         144387         221351
C                               313          13419          33524          77926
Fortran 77                      409           3798          71804          73352
reStructuredText                403           9312           6466          29912
C/C++ Header                    135           3937           7263          18535
Cython                          124           5953           9618          17108
C++                              30           1680           2011          10682
Meson                            88            397            192           4878
JSON                              5             15              0           3303
YAML                             25            217            293           1723
TeX                               5            141            161           1289
diff                              2             44            572            731
INI                               4            221              0            507
Pascal                            3            115              0            466
Bourne Shell                      9             59             89            264
Markdown                          9             64              0            201
SVG                               1              4              0            133
make                              4             40             30            111
TOML                              1             19             46            110
CSS                               1             31             20            106
MATLAB                            5             42             45             94
R                                 1              5             12             67
Bourne Again Shell                2             14             26             48
HTML                              2              5              0             21
Dockerfile                        1              5             31             16
Unity-Prefab                      1              0              0              2
--------------------------------------------------------------------------------
SUM:                           2543         119746         276590         462936
--------------------------------------------------------------------------------


gitinspector failed to run statistical information for the repository

@editorialbot

Wordcount for paper.md is 866

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.2307/2331554 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1214/aoms/1177729437 is OK
- 1544-6115.1585 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@jbytecode

Dear @coatless, Dear @SaranjeetKaur

This is the review thread. Firstly, type

@editorialbot generate my checklist

to generate your own checklist. In that checklist, there are 23 check items. Whenever you complete the corresponding task, you can check it off.

Please write your comments as separate posts and do not modify your checklist descriptions.

The review process is interactive, so you can always interact with the authors, reviewers, and the editor. You can also create issues and pull requests in the target repository. Please mention this thread's URL in those issues so we can keep track of what is going on outside this thread.

Please do not hesitate to ask me about anything, anytime.

Thank you in advance!

@jbytecode jbytecode removed the waitlisted Submissions in the JOSS backlog due to reduced service mode. label Jan 22, 2023
@jbytecode

@coatless, @SaranjeetKaur - could you please generate your task list and update your status? thank you in advance.

@jbytecode

@coatless, @SaranjeetKaur - Three weeks after you were assigned as reviewers, I still have not received a sign of life from you (are you all right?). Please generate your task lists to start your review. If you are not available, please ping me so I can find other reviewers to proceed. Thank you in advance.

@jbytecode

@coatless - no worries, thank you for the response

@SaranjeetKaur - We are waiting to hear from you also.

Thank you in advance.

@SaranjeetKaur

@jbytecode - I am looking at this thread now

@SaranjeetKaur

SaranjeetKaur commented Feb 13, 2023

Review checklist for @SaranjeetKaur

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at https://github.com/mdhaber/scipy?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@mdhaber) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
  • Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
  • Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
  • Human and animal research: If the paper contains original research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

@mdhaber

mdhaber commented Feb 26, 2023

Since the paper is about specific contributions to a much larger library (as discussed in #5047 (comment)), I thought it might be helpful to point out some relevant files.

License: SciPy's license file is available at https://github.com/scipy/scipy/blob/main/LICENSE.txt.

Contributions: The "blame" of scipy.stats._resampling.py and scipy.stats.tests.test_resampling.py reveal some of my contributions to these features over the past two years.

Installation: Installation instructions for the whole library are at https://scipy.org/install/. The list of dependencies is available in the environment.yml file. Please use the latest version of SciPy (1.10.1) for best results.

Documentation: The scipy.stats API reference includes both input/output documentation and examples. More realistic usage examples are available in the scipy.stats tutorials.

Tests: Automated tests are in scipy.stats.tests.test_resampling.py. Continuous integration test results can be seen in any pull request to SciPy, or tests can be run locally using import scipy.stats; scipy.stats.test(verbose=2).

Community Guidelines: Community guidelines are available in the Developer Documentation.

Functionality: One way of verifying the functionality would be to follow the examples in the API reference. For example, the documentation of bootstrap shows that the true (population) value of the statistic lies within the 90% confidence interval approximately 900 times out of 1000 trials. The documentation of permutation_test and monte_carlo_test each compare the p-value produced by the function against the p-value of a comparable hypothesis test function. Many similar examples of real-world use cases are available in the tutorials. Of course, the functionality was also verified when the PRs introducing the functionally were reviewed by SciPy maintainers. The original PRs were:
scipy/scipy#13371
scipy/scipy#13899
scipy/scipy#14576
Additional functionality has been added over the past few years; the relevant PRs can be found by reviewing the "blame" of scipy.stats._resampling.py.
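As a concrete illustration of the coverage check described above, a small simulation can estimate how often the 90% confidence interval contains the true value. This is my own sketch, not code from the submission; trial and resample counts are reduced for speed (the documentation uses 1000 trials and the default 9999 resamples), and `method='percentile'` is chosen here for speed rather than the default BCa.

```python
import numpy as np
from scipy.stats import bootstrap, norm

rng = np.random.default_rng(2023)
dist = norm(loc=2, scale=4)
true_std = dist.std()  # population standard deviation, 4.0

# Estimate coverage of the 90% CI over repeated trials.
n_trials = 200
hits = 0
for _ in range(n_trials):
    data = (dist.rvs(size=100, random_state=rng),)
    res = bootstrap(data, np.std, confidence_level=0.9,
                    n_resamples=999, method='percentile',
                    random_state=rng)
    ci = res.confidence_interval
    hits += ci.low < true_std < ci.high

coverage = hits / n_trials  # should be roughly 0.9
```

With these reduced settings the observed coverage will fluctuate around 0.9 from run to run; the documentation's larger experiment tightens that estimate.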

Performance: The paper claims that these methods are "fast" because "the functions take advantage of vectorized user code, avoiding slow Python loops". There is an easy way to verify this. All three functions accept an argument statistic (callable) and a parameter vectorized (boolean). All of the examples show how the function can be used with a vectorized statistic; that is, one that accepts an $N$ dimensional array as input and returns the statistic of each slice defined by the parameter axis. To see that the functions take advantage of vectorized statistics to improve performance, simply replace statistic with a function that does not have an axis argument, and ensure that vectorized is not passed as an argument.

For example, the relevant parts of the `scipy.stats.bootstrap` example are:
import numpy as np
rng = np.random.default_rng()
from scipy.stats import norm
dist = norm(loc=2, scale=4)  # our "unknown" distribution
data = dist.rvs(size=100, random_state=rng)

from scipy.stats import bootstrap
data = (data,)  # samples must be in a sequence

# np.std is vectorized
%timeit bootstrap(data, statistic=np.std, confidence_level=0.9, random_state=rng)
# 12.2 ms ± 2.01 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

If we redefine the statistic so that it is not vectorized:

# remove the axis argument so that the statistic can only operate on one 1d sample at a time
%timeit bootstrap(data, statistic=lambda x: np.std(x), confidence_level=0.9, random_state=rng)
# 266 ms ± 14.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As an independent check, a naive implementation of the (percentile) bootstrap would be:

n_resamples = 9999  # default for `bootstrap`

def naive_bootstrap(data):
    bootstrap_distribution = []
    for j in range(n_resamples):
        i = rng.integers(len(data), size=len(data))
        bootstrap_distribution.append(np.std(data[i]))
    # the 5th and 95th percentiles bound a 90% confidence interval
    return np.percentile(bootstrap_distribution, [5, 95])

%timeit naive_bootstrap(data[0])
# 265 ms ± 1.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I hope this helps!

@jbytecode would it be appropriate for me to also suggest how some functionality and performance claims might be verified? Done!

@jbytecode

@mdhaber - of course. In addition to this, you can provide reproducible benchmarks if they exist. Thank you.

@mdhaber

mdhaber commented Feb 27, 2023

Thanks @jbytecode I added notes on how one might begin to verify the functionality and performance claims. I'd be happy to provide other examples if it would help!

@jbytecode

@coatless, @SaranjeetKaur - Could you please update your status and let us know how your review is going? Thank you in advance.

@SaranjeetKaur

Hi @jbytecode - I have reviewed it for general checks and am yet to test the software functionality and documentation part.

@jbytecode

@SaranjeetKaur - Thank you for responding and updating your status.

@jbytecode

@coatless, @SaranjeetKaur - Could you please update your status and tell us how your review is going? Thank you in advance.

@jbytecode

@coatless, @SaranjeetKaur - Is it possible to get a sign of life? Could you please update your status? Thank you in advance.

@jbytecode

Dear @coatless and @SaranjeetKaur,

I am so sorry if I am bothering you. We are a little beyond normal review times. Moreover, I can only reach the reviewers by email, which so far has not produced a response.

Please declare your availability and set a deadline in a few days. If you are not able to review this manuscript let me know, so I'll find new reviewers.

Thank you in advance.

@SaranjeetKaur

@mdhaber - it might be helpful to add links to the references in the section "Statement of need" (that is, for all 3 questions, when you are sharing examples which were introduced in the Summary).

Also, when you describe the state of the ecosystem before the release of SciPy 1.9.0 and how the need was partially met by tutorials, blog posts, medium.com articles, and niche packages, please share some references there too, if possible.

@jbytecode

@mdhaber - sure

@jbytecode

@mdhaber - if commits are only on manuscript and bibtex, it is okay, we can proceed with the current one.

@mdhaber

mdhaber commented Jun 13, 2023

Good to know. But I went ahead and recreated it. Same URL.
https://github.com/mdhaber/scipy/releases/tag/v1.10.0-joss-article

@jbytecode

@editorialbot set v1.10.0-joss-article as version

@mdhaber - we can now go on with the archive creation. Most authors choose Zenodo. Please create an archive associated with this version tag and report the DOI here.

@editorialbot

Done! version is now v1.10.0-joss-article

@mdhaber

mdhaber commented Jun 13, 2023

10.5281/zenodo.8031631

@jbytecode

@mdhaber - In the software repo, the license is BSD-3-Clause, whereas in the Zenodo archive it is Creative Commons Attribution 4.0 International. Could you please correct that and report the changes here?

@mdhaber

mdhaber commented Jun 13, 2023

Thanks. I must have been fixing that as you were writing this. I think that fixed the license at the original DOI.
[screenshot of the updated Zenodo record]

@jbytecode

@editorialbot set 10.5281/zenodo.8031631 as archive

@editorialbot

Done! archive is now 10.5281/zenodo.8031631

@jbytecode

@editorialbot check references

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.2307/2331554 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1214/aoms/1177729437 is OK
- 1544-6115.1585 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@jbytecode

@editorialbot generate pdf

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@jbytecode

@mdhaber - Everything looks fine to me. I am now recommending acceptance. The track editor or editor-in-chief will have the final decision. Thank you!

@jbytecode

@editorialbot recommend-accept

@editorialbot

Attempting dry run of processing paper acceptance...

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.2307/2331554 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1214/aoms/1177729437 is OK
- 1544-6115.1585 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@editorialbot

👋 @openjournals/dsais-eics, this paper is ready to be accepted and published.

Check final proof 👉📄 Download article

If the paper PDF and the deposit XML files look good in openjournals/joss-papers#4302, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

@editorialbot editorialbot added the recommend-accept Papers recommended for acceptance in JOSS. label Jun 13, 2023
@gkthiruvathukal

@editorialbot accept

@editorialbot

Doing it live! Attempting automated processing of paper acceptance...

@editorialbot

Ensure proper citation by uploading a plain text CITATION.cff file to the default branch of your repository.

If using GitHub, a Cite this repository menu will appear in the About section, containing both APA and BibTeX formats. When exported to Zotero using a browser plugin, Zotero will automatically create an entry using the information contained in the .cff file.

You can copy the contents for your CITATION.cff file here:

CITATION.cff

cff-version: "1.2.0"
authors:
- family-names: Haberland
  given-names: Matt
  orcid: "https://orcid.org/0000-0003-4806-3601"
contact:
- family-names: Haberland
  given-names: Matt
  orcid: "https://orcid.org/0000-0003-4806-3601"
doi: 10.5281/zenodo.8031631
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Haberland
    given-names: Matt
    orcid: "https://orcid.org/0000-0003-4806-3601"
  date-published: 2023-06-24
  doi: 10.21105/joss.05092
  issn: 2475-9066
  issue: 86
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 5092
  title: Fast Resampling and Monte Carlo Methods in Python
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.05092"
  volume: 8
title: Fast Resampling and Monte Carlo Methods in Python

If the repository is not hosted on GitHub, a .cff file can still be uploaded to set your preferred citation. Users will be able to manually copy and paste the citation.

Find more information on .cff files here and here.

@editorialbot

🐘🐘🐘 👉 Toot for this paper 👈 🐘🐘🐘

@editorialbot

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited 👉 Creating pull request for 10.21105.joss.05092 joss-papers#4344
  2. Wait a couple of minutes, then verify that the paper DOI resolves https://doi.org/10.21105/joss.05092
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

Any issues? Notify your editorial technical team...

@editorialbot editorialbot added accepted published Papers published in JOSS labels Jun 24, 2023
@editorialbot

🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](https://joss.theoj.org/papers/10.21105/joss.05092/status.svg)](https://doi.org/10.21105/joss.05092)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.05092">
  <img src="https://joss.theoj.org/papers/10.21105/joss.05092/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: https://joss.theoj.org/papers/10.21105/joss.05092/status.svg
   :target: https://doi.org/10.21105/joss.05092

This is how it will look in your documentation:

[DOI badge]

We need your help!

The Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us, please consider doing either one (or both) of the following:

@mdhaber

mdhaber commented Jun 24, 2023

Thanks, everyone!
