
[REVIEW]: pyPLNmodels: A Python package to analyse multivariate high-dimensional count data #6969

Open
editorialbot opened this issue Jul 9, 2024 · 31 comments
Labels
review Track: 5 (DSAIS) Data Science, Artificial Intelligence, and Machine Learning

Comments

@editorialbot
Collaborator

editorialbot commented Jul 9, 2024

Submitting author: @Bastien-mva (Bastien Batardiere)
Repository: https://github.com/PLN-team/pyPLNmodels.git
Branch with paper.md (empty if default branch): joss
Version: 0.0.78
Editor: @likeajumprope
Reviewers: @LingfengLuo0510, @mrazomej
Archive: Pending

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/1f6f471f36670700a9d70ccd03654038"><img src="https://joss.theoj.org/papers/1f6f471f36670700a9d70ccd03654038/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/1f6f471f36670700a9d70ccd03654038/status.svg)](https://joss.theoj.org/papers/1f6f471f36670700a9d70ccd03654038)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@LingfengLuo0510 & @mrazomej, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review.
First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @likeajumprope know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @mrazomej

@editorialbot
Collaborator Author

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

@editorialbot
Collaborator Author

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.3389/fevo.2021.588292 is OK
- 10.25080/majora-92bf1922-011 is OK
- 10.5281/zenodo.5765804 is OK
- 10.1093/biomet/76.4.643 is OK
- 10.1214/18-aoas1177 is OK

MISSING DOIs

- No DOI given, and none found for title: PyTorch: An Imperative Style, High-Performance Dee...
- No DOI given, and none found for title: gllvm - Fast analysis of multivariate abundance da...
- No DOI given, and none found for title: Zero-inflation in the Multivariate Poisson Lognorm...
- No DOI given, and none found for title: The NLopt nonlinear-optimization package

INVALID DOIs

- None

@editorialbot
Collaborator Author

Software report:

github.com/AlDanial/cloc v 1.90  T=0.09 s (702.9 files/s, 170255.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          30           1380           3087           6043
CSV                             10              0              0           1948
Markdown                         3             65              0            287
R                                2             30              9            171
TeX                              1             12              0             85
YAML                             2              8             17             79
Jupyter Notebook                 1              0           1026             64
TOML                             1              6              8             53
DOS Batch                        1              8              1             26
reStructuredText                 7             13             56             17
Bourne Shell                     1              2              0             13
make                             1              4              7              9
-------------------------------------------------------------------------------
SUM:                            60           1528           4211           8795
-------------------------------------------------------------------------------

Commit count by author:

   636	bastien-mva
    11	Julien Chiquet
     9	Jean-Benoist Leger
     1	Joon Kwon

@editorialbot
Collaborator Author

Paper file info:

📄 Wordcount for paper.md is 1064

✅ The paper includes a Statement of need section

@editorialbot
Collaborator Author

License info:

✅ License found: MIT License (Valid open source OSI approved license)

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@likeajumprope

@LingfengLuo0510 @mrazomej this is the issue for the review! Let me know if you have any questions - more info can also be found here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html

@Bastien-mva

@editorialbot generate pdf

Figures were not displayed.

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@Bastien-mva

@editorialbot generate pdf

nicer color in figure.

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@Bastien-mva

@editorialbot generate pdf

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@mrazomej

mrazomej commented Aug 5, 2024

Review checklist for @mrazomej

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at https://github.com/PLN-team/pyPLNmodels.git?
  • License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@Bastien-mva) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
  • Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
  • Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
  • Human and animal research: If the paper contains original data or research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

@mrazomej

mrazomej commented Aug 5, 2024

First, I must apologize for the delayed comments on this submission.

There are two main comments I would like to make about the state of the software and the paper.

  1. The software documentation needs significant work. As it stands, no user without previous experience with the methods offered by the package--and I would say even with previous knowledge of the methods--could get started with the information provided. There are no instructions on how to get started (a detailed quickstart section would be greatly appreciated) or the differences between the different models. The documentation only contains the automatically generated API information from the function docstrings. Even the quickstart section on the README.md file does not explain anything being done.
    1.a. On that note, between the paper and the documentation, it was not clear to me what the software's capabilities are. I felt that without reading the previous publication I stand no chance of understanding what your package can do, even though I have enough background knowledge on related areas. As the JOSS guidelines indicate:

Documentation

There should be sufficient documentation for you, the reviewer, to understand the core functionality of the software under review. A README file (or equivalent) should include a high-level overview of this documentation.
  2. The paper also needs significant work. For example, the Summary section does not feel like an actual summary of the functionality of the software. @likeajumprope can correct me if I'm wrong, but I think of this section as the Abstract of the paper, where one can quickly assess whether the paper is relevant to one's needs. Even the JOSS guidelines indicate that the paper should include:
A summary describing the high-level functionality and purpose of the software for a diverse, non-specialist audience.

2.a Fig. 2 does not have a caption that explains, among other things, what the legend labels are. Furthermore, when doing this kind of comparison, it would be handy to highlight which of the curves are directly related to your software. This could be added in a detailed caption or in the legend, e.g., "model name (ours)", to guide the reader.

These are my first two initial comments. These changes are necessary for me to properly assess the software and the entire submission, as I need the most basic tools to understand the software's core functionality. From what I can get from glancing at the previous publications, this is an exciting piece of software. But my understanding is that, as a JOSS publication, it needs to exist as a stand-alone piece of work where users (and reviewers) can utilize the software solely from the paper and the documentation information.

@Bastien-mva

Thank you for your comments and suggestions. We have carefully considered them and made the necessary amendments to our manuscript and its accompanying documentation:

Documentation:

  • The README.md file has been enriched with comprehensive explanations, including the differences between the various models offered by the package. The README is also included in the online documentation.
  • We have included a link to a more comprehensive notebook (getting_started.ipynb) in the README.md that provides further explanations; a minimal sketch of the kind of quickstart now covered is shown below.
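
For illustration, a minimal quickstart sketch (the names `Pln`, `load_oaks`, `fit`, and `show` follow the README at the time of writing and are assumptions that may differ between versions; please refer to the documentation for the exact API):

```python
# Hypothetical quickstart sketch: the names Pln, load_oaks, fit and show are
# taken from the README and may differ between package versions.
from pyPLNmodels import Pln, load_oaks

oaks = load_oaks()               # example count dataset shipped with the package
model = Pln(oaks["counts"])      # Poisson log-normal model on an (n x p) count matrix
model.fit()                      # variational inference, maximizing the ELBO
print(model)                     # textual summary of the fitted model
model.show()                     # diagnostic plots of the fit
```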

Manuscript:

  • The Summary section has been revised to provide a clear and concise overview of the software's functionality and purpose. Nevertheless, we have made the assumption that readers are familiar with Poisson and Gaussian distributions.
  • Figure 1 has been updated to include a caption that explains the legend labels. For Figure 2, we mentioned which curves are related to our software. We have ensured that the models implemented within our package are emphasized.

We hope these revisions enhance the clarity and comprehensibility of our work.

Thank you for your time.

@Bastien-mva

@editorialbot generate pdf

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@Bastien-mva

@editorialbot generate pdf

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@mrazomej

@Bastien-mva, thanks for the changes. In my opinion, the software is much better.
@likeajumprope, I have completed my checklist.

@LingfengLuo0510

Sorry for the delayed comments.

Here are my comments:
(1) Figure 2 compares the running time of different packages; could you provide the code used to generate it?
(2) Figure 2 shows the fast computation speed of your proposed approach; it would also be beneficial to include additional performance metrics.
(3) Line 21: Consider rephrasing to "necessitating dimension reduction techniques suitable for count data."
(4) Line 25: Remove the unnecessary brackets.
(5) Lines 34-37: Double-check the punctuation.
(6) When discussing GPU and CPU calculations for time comparison, please specify the exact machine type (e.g., GPU model, CPU cores) used.
(7) Increase the font size in Figure 1 for better readability.
(8) Line 47: Verify the accuracy of the citation.
(9) At the beginning of lines 52 and 56, consider adding a few sentences to introduce the other packages mentioned.
(10) Rephrase the sentences in lines 72-76 for improved clarity and coherence.
(11) Line 80: as it begins a paragraph, it should be more detailed.
(12) Line 80: Choose between "maximizing" and "to maximize" based on the intended meaning.
(13) Line 80, "Bound" or "BOund"?
(14) Maintain a consistent citation style throughout the paper, including journal names, volume numbers, and issue numbers.

@Bastien-mva

Thank you for your comments. We have carefully considered them and made the necessary amendments to our manuscript:
(1) The code used to generate Figure 2 is available in the benchmark folder of the joss branch. Only the dataset must be downloaded; it is available here: https://zenodo.org/record/5765804/files/2k_cell_per_study_10studies.tar.bz2?download=1.
(2) We considered incorporating figures showing the Root Mean Squared Error (RMSE) between the estimated parameters and the true parameters, for different numbers of samples $n$ and dimensions $p$. However, we do not think this is necessary, as the performance claims of this paper focus mainly on the computational efficiency of the proposed method. Moreover, the RMSE is systematically verified each time the package is uploaded to PyPI via automated testing procedures (see https://github.com/PLN-team/pyPLNmodels/blob/main/tests/test_common.py); a simplified sketch of this kind of check is shown below.
(13) It is "Evidence Lower BOund", to make the acronym "ELBO"; the variational objective in question is recalled at the end of this comment.
(14) We have maintained a uniform citation format, which includes the title, journal, and DOI.

All points not explicitly mentioned above were addressed as suggested.
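
For illustration, a simplified sketch of the kind of RMSE check performed by the automated tests (not the actual test file; the `Pln` class and its `covariance` attribute are assumptions based on the package API):

```python
# Simplified sketch of the RMSE sanity check (not the actual test file);
# the Pln class and its `covariance` attribute are assumed from the package API.
import numpy as np
from pyPLNmodels import Pln  # assumed import

rng = np.random.default_rng(0)
n, p = 500, 20

# Simulate counts from a Poisson log-normal model with known parameters.
mu = rng.normal(0.0, 1.0, size=p)               # true log-scale means
A = rng.normal(0.0, 0.3, size=(p, p))
sigma = A @ A.T + 0.1 * np.eye(p)               # true covariance (positive definite)
Z = rng.multivariate_normal(mu, sigma, size=n)  # latent Gaussian layer
Y = rng.poisson(np.exp(Z))                      # observed counts

model = Pln(Y)   # assumed constructor taking an (n x p) count matrix
model.fit()

# RMSE between the estimated and true covariance matrices.
sigma_hat = np.asarray(model.covariance)        # assumed attribute, converted to NumPy
rmse = np.sqrt(np.mean((sigma_hat - sigma) ** 2))
assert rmse < 0.5, f"RMSE too large: {rmse:.3f}"  # illustrative threshold only
```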

@mrazomej @LingfengLuo0510,
We have rectified an error in the description of the optimization process in GLLVM: the optimization actually relies on the TMB library (https://cran.r-project.org/web/packages/TMB/index.html), rather than the alternative method previously mentioned. The necessary amendments have been made in the paper.
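
For reference, regarding items (12) and (13) above: in generic variational-inference notation (a standard recap, not the package-specific derivation), the quantity maximized when fitting the models is the Evidence Lower BOund,

$$
\mathrm{ELBO}(q) = \mathbb{E}_{q}\big[\log p(Y, Z)\big] - \mathbb{E}_{q}\big[\log q(Z)\big]
= \log p(Y) - \mathrm{KL}\big(q(Z)\,\Vert\,p(Z \mid Y)\big) \le \log p(Y),
$$

where $Y$ denotes the observed counts, $Z$ the latent Gaussian variables of the Poisson log-normal model, and $q$ the variational approximation of the posterior; maximizing the ELBO over $q$ and the model parameters tightens this lower bound on the log-likelihood.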

@Bastien-mva

@editorialbot generate pdf

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@LingfengLuo0510

@Bastien-mva
My comments have been addressed sufficiently.
Just two more minor things:
(1) Do we need to capitalize the "O" in “BOund”?
(2) Line 115, the citation style is not consistent (no link provided).

Other than that, good to go here. @likeajumprope

@Bastien-mva

I have added the link in the bibliography and corrected the capitalization in 'Bound'.

@Bastien-mva

@editorialbot generate pdf

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@Bastien-mva

@likeajumprope I believe I have addressed all the comments. Is there anything further I need to do at this stage?

@likeajumprope

> @likeajumprope I believe I have addressed all the comments. Is there anything further I need to do at this stage?

No thanks, I will have a look!
