
[REVIEW]: ddtlcm: An R package for overcoming weak separation in Bayesian latent class analysis via tree-regularization #6220

Open
editorialbot opened this issue Jan 12, 2024 · 30 comments
Assignees
Labels
R review TeX Track: 5 (DSAIS) Data Science, Artificial Intelligence, and Machine Learning waitlisted Submissions in the JOSS backlog due to reduced service mode.

Comments

@editorialbot

editorialbot commented Jan 12, 2024

Submitting author: @limengbinggz (Mengbing Li)
Repository: https://github.com/limengbinggz/ddtlcm
Branch with paper.md (empty if default branch): main
Version: 0.1.2
Editor: @Nikoleta-v3
Reviewers: @jamesuanhoro, @larryshamalama
Archive: Pending

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/2cffa48f57384769d086069f824fd020"><img src="https://joss.theoj.org/papers/2cffa48f57384769d086069f824fd020/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/2cffa48f57384769d086069f824fd020/status.svg)](https://joss.theoj.org/papers/2cffa48f57384769d086069f824fd020)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@jamesuanhoro & @larryshamalama, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review.
First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @Nikoleta-v3 know.

Please start on your review when you are able, and be sure to complete your review within the next six weeks at the very latest.

Checklists

📝 Checklist for @larryshamalama

📝 Checklist for @jamesuanhoro

@editorialbot

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

@editorialbot

Software report:

github.com/AlDanial/cloc v 1.88  T=0.03 s (1033.3 files/s, 170689.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
R                               24            322           1377           1629
XML                              1              0            184           1513
Markdown                         5             96              0            259
TeX                              2              8              0            105
Rmd                              2            120            160            102
YAML                             2             11              6             55
-------------------------------------------------------------------------------
SUM:                            36            557           1727           3663
-------------------------------------------------------------------------------


gitinspector failed to run statistical information for the repository

@editorialbot

Wordcount for paper.md is 1596

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.18637/jss.v061.i13 is OK
- 10.18637/jss.v042.i10 is OK
- 10.1109/TPAMI.2014.2313115 is OK
- 10.1080/03610918.2012.718840 is OK

MISSING DOIs

- 10.1093/oso/9780198526155.003.0042 may be a valid DOI for title: Density modeling and clustering using Dirichlet diffusion trees

INVALID DOIs

- None

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@Nikoleta-v3

Hey @jamesuanhoro, @larryshamalama this is the review thread for the paper. All of our communications will happen here from now on.

As a reviewer, the first step is to create a checklist for your review by entering

@editorialbot generate my checklist

at the top of a new comment in this thread.

These checklists contain the JOSS requirements ✅ As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention #6220 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@Nikoleta-v3) if you have any questions/concerns. 😄 🙋🏻

@Nikoleta-v3

@limengbinggz, could you please add the DOI for one of your references? See: #6220 (comment)

@larryshamalama

larryshamalama commented Jan 12, 2024

Review checklist for @larryshamalama

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at https://github.com/limengbinggz/ddtlcm?
  • License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@limengbinggz) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
  • Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
  • Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
  • Human and animal research: If the paper contains original data or research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

@larryshamalama

Hi @limengbinggz, great software and great method! From one fellow biostats PhD to another, congratulations on your hard work :)

Below are my questions/comments, on top of some other quick things that I am opening as issue(s)/PR(s) in your repo

1. Weak Separation and State of the Field

State of the field: Do the authors describe how this software compares to other commonly-used packages?

It took me a while to understand what is meant by "weak separation", which is the focal point of this work. Perhaps this is because I am not so familiar with Bayesian tree-based methods... I was initially not sure if using other software that you mentioned (e.g. poLCA, BayesLCA, randomLCA) would yield worse results on your simulated data but Figure 3 in your methods paper seems to confirm this. I am thinking that it would be beneficial to clarify this result here since it would motivate further why we should use your package and not the other ones. What do you think?

Minor comment: lines 21, 22: "classes that share proximity to one another in the tree are shrunk towards ancestral classes a priori" Do you think that you can massage this a bit? I'm not sure if I fully understand, but this sentence seems important since it highlights on a higher level what this method is doing under the hood (re: summary bullet point above).

2. 50 burn-in, 100 posterior draws

In your example, you seem to use 50 burn-in draws and 100 posterior draws. Is that sufficient? As a user, how would I know when the MCMC converges with your software? If I am thinking of conventional MCMC-ing, these seem like low numbers, especially since, in your example, you use $N = 496$, which is larger than the total number of draws. In section 5.1 of your methods paper, you seem to use many more draws in the Gibbs sampler (7,000 and 12,000).

3. Singleton node warnings

I am getting many "In checkTree(object) : Tree contains singleton nodes." warnings when running your quickstart. I imagine that this means that, for a specific category, there is only one member, which can happen, but I just wanted to double check that this can be normal/okay. Can you briefly comment on this?

@jamesuanhoro

jamesuanhoro commented Feb 22, 2024

Review checklist for @jamesuanhoro

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at https://github.com/limengbinggz/ddtlcm?
  • License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@limengbinggz) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
  • Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
  • Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
  • Human and animal research: If the paper contains original data or research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Additional notes

Hello @limengbinggz:

  • First, this is extensive work :)
  • I have created some issues, I am open to discussion about them
  • I have also made some pull requests re minor suggestions for the code
  • I tried to reference the issues and PRs below.
  • I also noticed the warnings @larryshamalama noted.

@Nikoleta-v3

Hey @limengbinggz 👋🏻 did you get a chance to look over the comments/issues that the reviewers raised? 😄

@Nikoleta-v3

👋🏻 @limengbinggz

@limengbinggz

👋🏻 @limengbinggz

Thank you for checking in. We are working on incorporating the reviewers' comments into the revision, and will push to the repo when we are ready. Thanks for waiting.

@Nikoleta-v3

Thank you for the update!

@limengbinggz

@limengbinggz, could you please add the DOI for one of your references? See: #6220 (comment)

Thank you for pointing this out. We have added the DOI for the reference. The paper is updated in commit
4f9a157

@limengbinggz

@editorialbot generate pdf

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@limengbinggz

limengbinggz commented Mar 25, 2024

(quoting @larryshamalama's review comments above)

Thank you very much for reviewing the software as well as the methods paper!

  1. Thanks for pointing out the problem in the writing. As you suggested, we have added a clarifying sentence to specifically point out why the existing packages fail, by referring to the results in our methods paper: "These phenomena have been demonstrated using poLCA and BayesLCA, which are more relevant to our stated problem of weak class separation, in Figures 2 and 3 of @li2023tree."

As for the sentence regarding how classes are shrunk, we have clarified this point by changing the original sentence to "classes that are closer to one another in the binary tree are encouraged to share more similar profiles, and their profiles are shrunk towards their common ancestral classes \textit{a priori}, with the degree of shrinkage varying across pre-specified item groups defined thematically with clinical significance."

  2. I definitely agree that 100 posterior draws are far from sufficient. However, the sampling algorithm implemented in this package is quite computationally intensive. For the sake of time and for the purpose of brief illustration, we want to demonstrate the functionality of the sampling function ddtlcm_fit. To your good point that a comprehensive view requires a full posterior chain, we have added a pre-saved dataset named "result_diet_1000iters", containing 1000 posterior samples, to accompany the package. In the paper, we have added a sentence to clarify this point in the last paragraph of the "Model Fitting" section: "To have a more comprehensive view of the results obtained from 1000 posterior draws, we can load the data named result_diet_1000iters to perform posterior summaries as described by the steps in the following sections."

  3. Thank you for pointing this out. In fact, the trees we are dealing with in our model all have singleton nodes by design. This warning message originates from the checkPhylo4 function in the phylobase package, which performs basic checks on the validity of S4 phylogenetic objects; phylogenetic trees usually avoid singleton nodes. Therefore, this warning should not be concerning. To clarify this point, we have added the following Note section to the README of the package:

  • When running some functions in the package, such as ddtlcm_fit, a warning that "Tree contains singleton nodes" may be displayed. This warning originates from the checkPhylo4 function in the phylobase package, which performs basic checks on the validity of S4 phylogenetic objects. We would like to point out that such warnings should not raise concerns about the statistical validity of the implemented algorithm. This is because, by definition of the DDT process, any tree generated from a DDT process contains a singleton node (having only one child node) as the root node. To avoid repeated appearances of this warning, we recommend either of the following:

    • Wrapping the code in suppressWarnings({ code_that_will_generate_singleton_warning });

    • Setting options(warn = -1) globally. This may be dangerous because other meaningful warnings may be ignored.
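
The two suppression options above can be sketched in R as follows. This is an illustrative sketch only: the ddtlcm_fit arguments shown (and the my_data object) are hypothetical placeholders, not the package's documented interface; consult ?ddtlcm_fit for the actual signature.

```r
library(ddtlcm)

# Option 1 (preferred): suppress warnings only around the fitting call.
# NOTE: the arguments below are hypothetical placeholders.
result <- suppressWarnings({
  ddtlcm_fit(K = 6, data = my_data, total_iters = 100)
})

# Option 2: silence all warnings globally.
# Risky: unrelated, meaningful warnings are also hidden.
old_warn <- getOption("warn")
options(warn = -1)
result <- ddtlcm_fit(K = 6, data = my_data, total_iters = 100)
options(warn = old_warn)  # restore the previous warning setting

# The pre-saved chain of 1000 posterior samples mentioned above can be
# loaded instead of re-running the sampler:
data("result_diet_1000iters", package = "ddtlcm")
```

Restoring the old value of `warn` (rather than hard-coding `options(warn = 0)`) keeps the global option change local to this script.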

@limengbinggz

limengbinggz commented Mar 26, 2024

Hi @Nikoleta-v3 @larryshamalama @jamesuanhoro 👋🏻 Thank you all for reviewing our submission and the nice comments! We have responded to all issues. We would appreciate it if you could take a look at our responses and let us know if further improvements are needed. Thank you!

@limengbinggz

@larryshamalama @jamesuanhoro 👋🏻 Have you got a chance to review our responses? 😄 @Nikoleta-v3

@larryshamalama

@larryshamalama @jamesuanhoro 👋🏻 Have you got a chance to review our responses? 😄 @Nikoleta-v3

I'm happy with the responses! Thanks for addressing them :)

@limengbinggz

limengbinggz commented Apr 22, 2024


Thank you for your reply and all the suggestions! Would you mind completing the checklist once you've got time? Thanks! @larryshamalama

@jamesuanhoro

jamesuanhoro commented Apr 22, 2024 via email

@larryshamalama


done!

@limengbinggz


Thank you thank you!

@jamesuanhoro

@larryshamalama @jamesuanhoro 👋🏻 Have you got a chance to review our responses? 😄 @Nikoleta-v3

Done, happy with the responses to my comments :). And completed my checklist.
James.

@limengbinggz


Thank you so much!!

@Nikoleta-v3

Thank you to both reviewers for your time and efforts! @limengbinggz, please give me one week to also have a final look over the submission, and then we can move forward to the next steps!

@Nikoleta-v3

@editorialbot generate pdf

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈
