
[REVIEW]: Fast Resampling and Monte Carlo Methods in Python #5092

Closed
editorialbot opened this issue Jan 22, 2023 · 94 comments
Assignees
Labels
accepted Meson published Papers published in JOSS Python recommend-accept Papers recommended for acceptance in JOSS. review Starlark Track: 5 (DSAIS) Data Science, Artificial Intelligence, and Machine Learning

Comments

@editorialbot
Collaborator

editorialbot commented Jan 22, 2023

Submitting author: @mdhaber (Matt Haberland)
Repository: https://github.com/mdhaber/scipy
Branch with paper.md (empty if default branch): joss_resampling
Version: v1.10.0-joss-article
Editor: @jbytecode
Reviewers: @SaranjeetKaur, @kose-y
Archive: 10.5281/zenodo.8031631

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4"><img src="https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4/status.svg)](https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@coatless & @SaranjeetKaur, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review.
First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @jbytecode know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @SaranjeetKaur

📝 Checklist for @kose-y

@editorialbot editorialbot added Meson Python review Starlark Track: 5 (DSAIS) Data Science, Artificial Intelligence, and Machine Learning waitlisted Submissions in the JOSS backlog due to reduced service mode. labels Jan 22, 2023
@editorialbot
Collaborator Author

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

@editorialbot

Software report:

github.com/AlDanial/cloc v 1.88  T=5.24 s (485.0 files/s, 163896.3 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Python                          960          80209         144387         221351
C                               313          13419          33524          77926
Fortran 77                      409           3798          71804          73352
reStructuredText                403           9312           6466          29912
C/C++ Header                    135           3937           7263          18535
Cython                          124           5953           9618          17108
C++                              30           1680           2011          10682
Meson                            88            397            192           4878
JSON                              5             15              0           3303
YAML                             25            217            293           1723
TeX                               5            141            161           1289
diff                              2             44            572            731
INI                               4            221              0            507
Pascal                            3            115              0            466
Bourne Shell                      9             59             89            264
Markdown                          9             64              0            201
SVG                               1              4              0            133
make                              4             40             30            111
TOML                              1             19             46            110
CSS                               1             31             20            106
MATLAB                            5             42             45             94
R                                 1              5             12             67
Bourne Again Shell                2             14             26             48
HTML                              2              5              0             21
Dockerfile                        1              5             31             16
Unity-Prefab                      1              0              0              2
--------------------------------------------------------------------------------
SUM:                           2543         119746         276590         462936
--------------------------------------------------------------------------------


gitinspector failed to run statistical information for the repository

@editorialbot

Wordcount for paper.md is 866

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.2307/2331554 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1214/aoms/1177729437 is OK
- 1544-6115.1585 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@jbytecode

Dear @coatless, Dear @SaranjeetKaur

This is the review thread. Firstly, type

@editorialbot generate my checklist

to generate your own checklist. In that checklist, there are 23 check items. Whenever you complete the corresponding task, you can check it off.

Please write your comments as separate posts and do not modify your checklist descriptions.

The review process is interactive, so you can always interact with the authors, reviewers, and the editor. You can also create issues and pull requests in the target repository. Please mention this thread's URL in those issues so we can keep track of what is going on outside this thread.

Please do not hesitate to ask me about anything, anytime.

Thank you in advance!

@jbytecode jbytecode removed the waitlisted Submissions in the JOSS backlog due to reduced service mode. label Jan 22, 2023
@jbytecode

@coatless, @SaranjeetKaur - could you please generate your task list and update your status? thank you in advance.

@jbytecode

@coatless, @SaranjeetKaur - Three weeks after you were assigned as reviewers, I still have not received a sign of life from you (are you all right?). Please generate your task lists to start your review. If you are not available, please ping me so I can find other reviewers to proceed. Thank you in advance.

@jbytecode

@coatless - no worries, thank you for the response

@SaranjeetKaur - We are waiting to hear from you also.

Thank you in advance.

@SaranjeetKaur

@jbytecode - I am looking at this thread now

@SaranjeetKaur

SaranjeetKaur commented Feb 13, 2023

Review checklist for @SaranjeetKaur

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at https://github.com/mdhaber/scipy?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@mdhaber) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
  • Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
  • Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
  • Human and animal research: If the paper contains original research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

@mdhaber

mdhaber commented Feb 26, 2023

Since the paper is about specific contributions to a much larger library (as discussed in #5047 (comment)), I thought it might be helpful to point out some relevant files.

License: SciPy's license file is available at https://github.com/scipy/scipy/blob/main/LICENSE.txt.

Contributions: The "blame" of scipy.stats._resampling.py and scipy.stats.tests.test_resampling.py reveal some of my contributions to these features over the past two years.

Installation: Installation instructions for the whole library are at https://scipy.org/install/. The list of dependencies is available in the environment.yml file. Please use the latest version of SciPy (1.10.1) for best results.

Documentation: The scipy.stats API reference includes both input/output documentation and examples. More realistic usage examples are available in the scipy.stats tutorials.

Tests: Automated tests are in scipy.stats.tests.test_resampling.py. Continuous integration test results can be seen in any pull request to SciPy, or tests can be run locally using import scipy.stats; scipy.stats.test(verbose=2).

Community Guidelines: Community guidelines are available in the Developer Documentation.

Functionality: One way of verifying the functionality would be to follow the examples in the API reference. For example, the documentation of bootstrap shows that the true (population) value of the statistic lies within the 90% confidence interval approximately 900 times out of 1000 trials. The documentation of permutation_test and monte_carlo_test each compare the p-value produced by the function against the p-value of a comparable hypothesis test function. Many similar examples of real-world use cases are available in the tutorials. Of course, the functionality was also verified when the PRs introducing the functionally were reviewed by SciPy maintainers. The original PRs were:
scipy/scipy#13371
scipy/scipy#13899
scipy/scipy#14576
Additional functionality has been added over the past few years; the relevant PRs can be found by reviewing the "blame" of scipy.stats._resampling.py.
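As a concrete illustration of the coverage check described above, a small simulation can estimate how often the 90% confidence interval contains the true value. This is my own sketch, not code from the submission; trial and resample counts are reduced for speed (the documentation uses 1000 trials and the default 9999 resamples), and `method='percentile'` is chosen here for speed rather than the default BCa.

```python
import numpy as np
from scipy.stats import bootstrap, norm

rng = np.random.default_rng(2023)
dist = norm(loc=2, scale=4)
true_std = dist.std()  # population standard deviation, 4.0

# Estimate coverage of the 90% CI over repeated trials.
n_trials = 200
hits = 0
for _ in range(n_trials):
    data = (dist.rvs(size=100, random_state=rng),)
    res = bootstrap(data, np.std, confidence_level=0.9,
                    n_resamples=999, method='percentile',
                    random_state=rng)
    ci = res.confidence_interval
    hits += ci.low < true_std < ci.high

coverage = hits / n_trials  # should be roughly 0.9
```

With these reduced settings the observed coverage will fluctuate around 0.9 from run to run; the documentation's larger experiment tightens that estimate.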

Performance: The paper claims that these methods are "fast" because "the functions take advantage of vectorized user code, avoiding slow Python loops". There is an easy way to verify this. All three functions accept an argument statistic (callable) and a parameter vectorized (boolean). All of the examples show how the function can be used with a vectorized statistic; that is, one that accepts an $N$ dimensional array as input and returns the statistic of each slice defined by the parameter axis. To see that the functions take advantage of vectorized statistics to improve performance, simply replace statistic with a function that does not have an axis argument, and ensure that vectorized is not passed as an argument.

For example, the relevant parts of the `scipy.stats.bootstrap` example are:
import numpy as np
rng = np.random.default_rng()
from scipy.stats import norm
dist = norm(loc=2, scale=4)  # our "unknown" distribution
data = dist.rvs(size=100, random_state=rng)

from scipy.stats import bootstrap
data = (data,)  # samples must be in a sequence

# np.std is vectorized
%timeit bootstrap(data, statistic=np.std, confidence_level=0.9, random_state=rng)
# 12.2 ms ± 2.01 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

If we redefine the statistic so that it is not vectorized:

# remove the axis argument so that the statistic can only operate on one 1d sample at a time
%timeit bootstrap(data, statistic=lambda x: np.std(x), confidence_level=0.9, random_state=rng)
# 266 ms ± 14.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As an independent check, a naive implementation of the (percentile) bootstrap would be:

n_resamples = 9999  # default for `bootstrap`

def naive_bootstrap(data):
    bootstrap_distribution = []
    for j in range(n_resamples):
        i = rng.integers(len(data), size=len(data))
        bootstrap_distribution.append(np.std(data[i]))
    # the 5th and 95th percentiles bound a 90% confidence interval
    return np.percentile(bootstrap_distribution, [5, 95])

%timeit naive_bootstrap(data[0])
# 265 ms ± 1.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I hope this helps!

@jbytecode would it be appropriate for me to also suggest how some functionality and performance claims might be verified? Done!

@jbytecode

@mdhaber - of course. In addition to this, you can provide reproducible benchmarks if they exist. Thank you.

@mdhaber

mdhaber commented Feb 27, 2023

Thanks @jbytecode I added notes on how one might begin to verify the functionality and performance claims. I'd be happy to provide other examples if it would help!

@jbytecode

@coatless, @SaranjeetKaur - Could you please update your status and let us know how your review is going? Thank you in advance.

@SaranjeetKaur

Hi @jbytecode - I have reviewed it for general checks and am yet to test the software functionality and documentation part.

@jbytecode

@SaranjeetKaur - Thank you for responding and updating your status.

@jbytecode

@coatless, @SaranjeetKaur - Could you please update your status and tell us how your review is going? Thank you in advance.

@jbytecode

@coatless, @SaranjeetKaur - Is it possible to get a sign of life? Could you please update your status? Thank you in advance.

@jbytecode

Dear @coatless and @SaranjeetKaur,

I am so sorry if I am bothering you. We are a little beyond normal review times. Moreover, I can only reach the reviewers by email, which so far has not produced a response.

Please declare your availability and set a deadline in a few days. If you are not able to review this manuscript let me know, so I'll find new reviewers.

Thank you in advance.

@SaranjeetKaur

@mdhaber - it might be helpful to add links to the references in the section "Statement of need" (that is, for all 3 questions, when you are sharing examples which were introduced in the Summary).

Also, when you describe the state of the ecosystem before the release of SciPy 1.9.0 and how the need was partially met by tutorials, blog posts, medium.com articles, and niche packages, please share some references there too, if possible.

@jbytecode

@mdhaber - sure

@jbytecode

@mdhaber - if commits are only on manuscript and bibtex, it is okay, we can proceed with the current one.

@mdhaber

mdhaber commented Jun 13, 2023

Good to know. But I went ahead and recreated it. Same URL.
https://github.com/mdhaber/scipy/releases/tag/v1.10.0-joss-article

@jbytecode

@editorialbot set v1.10.0-joss-article as version

@mdhaber - we can now go on with the archive creation. Most authors choose Zenodo. Please create an archive associated with this version tag and report the DOI here.

@editorialbot

Done! version is now v1.10.0-joss-article

@mdhaber

mdhaber commented Jun 13, 2023

10.5281/zenodo.8031631

@jbytecode

@mdhaber - In the software repo, the license is BSD-3-Clause, whereas in the Zenodo archive it is Creative Commons Attribution 4.0 International. Could you please correct that and report the changes here?

@mdhaber

mdhaber commented Jun 13, 2023

Thanks. I must have been fixing that as you were writing this. I think that fixed the license at the original DOI.
[screenshot of the updated Zenodo record]

@jbytecode

@editorialbot set 10.5281/zenodo.8031631 as archive

@editorialbot

Done! archive is now 10.5281/zenodo.8031631

@jbytecode

@editorialbot check references

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.2307/2331554 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1214/aoms/1177729437 is OK
- 1544-6115.1585 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@jbytecode

@editorialbot generate pdf

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@jbytecode

@mdhaber - Everything looks fine to me. I am now recommending acceptance. The track editor or editor-in-chief will have the final decision. Thank you!

@jbytecode

@editorialbot recommend-accept

@editorialbot

Attempting dry run of processing paper acceptance...

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.2307/2331554 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1214/aoms/1177729437 is OK
- 1544-6115.1585 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@editorialbot

👋 @openjournals/dsais-eics, this paper is ready to be accepted and published.

Check final proof 👉📄 Download article

If the paper PDF and the deposit XML files look good in openjournals/joss-papers#4302, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

@editorialbot editorialbot added the recommend-accept Papers recommended for acceptance in JOSS. label Jun 13, 2023
@gkthiruvathukal

@editorialbot accept

@editorialbot

Doing it live! Attempting automated processing of paper acceptance...

@editorialbot

Ensure proper citation by uploading a plain text CITATION.cff file to the default branch of your repository.

If using GitHub, a Cite this repository menu will appear in the About section, containing both APA and BibTeX formats. When exported to Zotero using a browser plugin, Zotero will automatically create an entry using the information contained in the .cff file.

You can copy the contents for your CITATION.cff file here:

CITATION.cff

cff-version: "1.2.0"
authors:
- family-names: Haberland
  given-names: Matt
  orcid: "https://orcid.org/0000-0003-4806-3601"
contact:
- family-names: Haberland
  given-names: Matt
  orcid: "https://orcid.org/0000-0003-4806-3601"
doi: 10.5281/zenodo.8031631
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Haberland
    given-names: Matt
    orcid: "https://orcid.org/0000-0003-4806-3601"
  date-published: 2023-06-24
  doi: 10.21105/joss.05092
  issn: 2475-9066
  issue: 86
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 5092
  title: Fast Resampling and Monte Carlo Methods in Python
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.05092"
  volume: 8
title: Fast Resampling and Monte Carlo Methods in Python

If the repository is not hosted on GitHub, a .cff file can still be uploaded to set your preferred citation. Users will be able to manually copy and paste the citation.

Find more information on .cff files here and here.

@editorialbot

🐘🐘🐘 👉 Toot for this paper 👈 🐘🐘🐘

@editorialbot

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited 👉 Creating pull request for 10.21105.joss.05092 joss-papers#4344
  2. Wait a couple of minutes, then verify that the paper DOI resolves https://doi.org/10.21105/joss.05092
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

Any issues? Notify your editorial technical team...

@editorialbot editorialbot added accepted published Papers published in JOSS labels Jun 24, 2023
@editorialbot

🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](https://joss.theoj.org/papers/10.21105/joss.05092/status.svg)](https://doi.org/10.21105/joss.05092)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.05092">
  <img src="https://joss.theoj.org/papers/10.21105/joss.05092/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: https://joss.theoj.org/papers/10.21105/joss.05092/status.svg
   :target: https://doi.org/10.21105/joss.05092

This is how it will look in your documentation:

[DOI badge]

We need your help!

The Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us, please consider doing either one (or both) of the following:

@mdhaber

mdhaber commented Jun 24, 2023

Thanks, everyone!
