[REVIEW]: quanteda: An R package for the quantitative analysis of textual data #774

Closed
whedon opened this Issue Jun 11, 2018 · 33 comments

Collaborator

whedon commented Jun 11, 2018

Submitting author: @kbenoit (Kenneth Benoit)
Repository: https://github.com/quanteda/quanteda
Version: v1.3
Editor: @arfon
Reviewer: @lmullen, @borishejblum, @alexgarciac
Archive: 10.5281/zenodo.1447219

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/40b988ba4827f8fdc07a29351c2f74b8"><img src="http://joss.theoj.org/papers/40b988ba4827f8fdc07a29351c2f74b8/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/40b988ba4827f8fdc07a29351c2f74b8/status.svg)](http://joss.theoj.org/papers/40b988ba4827f8fdc07a29351c2f74b8)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@lmullen & @borishejblum & @alexgarciac, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. If you have any questions or concerns, please let @arfon know.

Review checklist for @lmullen

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: Does the release version given match the GitHub release (v1.2)?
  • Authorship: Has the submitting author (@kenbenoit) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) contribute to the software, 2) report issues or problems with the software, or 3) seek support?

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Review checklist for @borishejblum

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: Does the release version given match the GitHub release (v1.2)?
  • Authorship: Has the submitting author (@kenbenoit) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) contribute to the software, 2) report issues or problems with the software, or 3) seek support?

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Review checklist for @alexgarciac

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: Does the release version given match the GitHub release (v1.2)?
  • Authorship: Has the submitting author (@kenbenoit) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) contribute to the software, 2) report issues or problems with the software, or 3) seek support?

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Collaborator

whedon commented Jun 11, 2018

Hello human, I'm @whedon. I'm here to help you with some common editorial tasks. @lmullen, it looks like you're currently assigned as the reviewer for this paper 🎉.

⭐️ Important ⭐️

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that under GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' at https://github.com/openjournals/joss-reviews

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

Collaborator

whedon commented Jun 11, 2018

Attempting PDF compilation. Reticulating splines etc...

Collaborator

whedon commented Jun 11, 2018


Member

arfon commented Jul 3, 2018

@lmullen & @alexgarciac - please remember to review this package when you get a chance.


Collaborator

borishejblum commented Jul 3, 2018

My comments are here: quanteda/quanteda#1393


Collaborator

kbenoit commented Jul 17, 2018

Just to flag this for the reviewers, in case it makes a difference: Since submission, we have updated the CRAN version to 1.3.4. Looking forward to your reviews.


Member

arfon commented Jul 22, 2018

@lmullen @alexgarciac - do you think you could both complete your reviews this week?


Collaborator

lmullen commented Jul 23, 2018


Collaborator

kbenoit commented Aug 1, 2018

Looking forward to your comments @lmullen @alexgarciac. 😄


Collaborator

lmullen commented Aug 14, 2018

@arfon I am working on this review. Technically I have a conflict of interest according to the JOSS policies, since Ken and I published a JOSS paper together earlier this year. I had assumed that JOSS already knew about this, but I guess not. I do not think this COI should preclude me from submitting my review, but that's your editorial call. What do you think?


Member

arfon commented Aug 14, 2018

> @arfon I am working on this review. Technically I have a conflict of interest according to the JOSS policies, since Ken and I published a JOSS paper together earlier this year. I had assumed that JOSS already knew about this, but I guess not. I do not think this COI should preclude me from submitting my review, but that's your editorial call. What do you think?

Thanks for disclosing this @lmullen. As we have multiple reviewers here I'm happy for you to proceed.


Member

arfon commented Sep 10, 2018

👋 @alexgarciac @lmullen - when do you think you might be able to complete your reviews?


Collaborator

lmullen commented Sep 10, 2018


Collaborator

lmullen commented Sep 14, 2018

quanteda is an impressive package, well thought out and well implemented. The package is well documented and the test suite is quite extensive.

On the test suite:

  • The test suite emits numerous warnings. Some of these can obviously be ignored by comparing what is being tested to the warning message. For others, it is not clear why a warning message is being given.
  • I found at least three test failures, all relating to textplot_wordcloud, with all suggested packages installed.
  • I installed all the suggested packages listed in the DESCRIPTION. Should the austin package also be listed as a suggested package? Should the wordcloud package?

JOSS specific comments:

  • The repository does not have "a plain-text LICENSE file" so I have not checked that box. However, following R package convention the license is indicated in the DESCRIPTION, and I don't think there is a need for a separate file.
  • The version in the repository (1.3.7) is several minor releases ahead of the published CRAN version (1.3.4).
  • The JOSS paper (at least, based on the PDF proof and on the BibTeX file) does not have DOIs for a number of the articles, though a spot-check confirms that those articles have been assigned DOIs.

Comments on the text of the paper:

  • "on entirely sparse operations" should be "entirely on sparse operations"
  • "quanteda serializes tokens into integers." Is serialization what is happening, or would "hashes" be a better verb?
  • "make use of the sparseness document-feature matrices": word missing?
  • Describing families of functions as "textplot" (in quotation marks) seems odd to me. It might be clearer that those are prefixes for the functions by writing something like textplot_*. That format is used later on in the design section.
  • "These materials help beginner users to understand how to use these functions for basic operations, and expert users to combine the functions for advanced text processing and analysis" should be "These materials help beginner users understand how to use these functions for basic operations and expert users combine the functions for advanced text processing and analysis"
  • "process large textual data that are difficult for other R packages (such as computing distances or scoring collocations)": This is a misplaced modifier. The "such as" describes analytical computations, not "large textual data." I'd recommend a revision like "perform analyses on large textual data ... such as"
  • The readtext package should have a citation like the others, even though it is by the same authors.
  • There are a number of small typographical errors in the PDF proof that should be checked.

Member

arfon commented Sep 16, 2018

> • The repository does not have "a plain-text LICENSE file" so I have not checked that box. However, following R package convention the license is indicated in the DESCRIPTION, and I don't think there is a need for a separate file.

@kbenoit - please add a plain text LICENSE file that also has a copy of the license. You will then need to tell R CMD check to ignore this by adding LICENSE to your .Rbuildignore file
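
A minimal sketch of that step, assuming a POSIX shell; a scratch directory stands in for the package root and the licence text is a placeholder. Entries in `.Rbuildignore` are regular expressions matched against paths relative to the package root, so the anchored pattern `^LICENSE$` should exclude only the top-level file:

```shell
# Scratch directory standing in for the package root (hypothetical location).
pkg_root="$(mktemp -d)"
cd "$pkg_root"

# Placeholder licence text; the real file would hold the full licence.
printf 'Full text of the licence goes here\n' > LICENSE

# Add the anchored pattern only if it is not already present,
# so repeated runs do not duplicate the entry.
touch .Rbuildignore
grep -qxF '^LICENSE$' .Rbuildignore || printf '%s\n' '^LICENSE$' >> .Rbuildignore

# Show the resulting ignore file.
cat .Rbuildignore
```

In a real package the two file operations would be run from the repository root rather than a temporary directory.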


Collaborator

kbenoit commented Sep 17, 2018

Thanks @arfon, that's easily done and we are happy to work on the other comments and the statement of need. I can also eliminate the local CHECK failures that are caused by tests that try to open graphics devices. Should I consider the reviews complete now, and proceed with the revision?


Member

arfon commented Sep 17, 2018

> Should I consider the reviews complete now, and proceed with the revision?

Yes, please go ahead.

kbenoit added a commit to quanteda/quanteda that referenced this issue Sep 19, 2018

Address @lmullen JOSS review comments
From openjournals/joss-reviews#774 (comment)

- Adds `LICENSE` to `.Rbuildignore`
- Adds package citations for **readtext** and **spacyr**
- Fixes awkward or misleading sentences
- Updates a few URLs that have been updated since initial submission.
- Fixes small typos found in the pdf.

Collaborator

kbenoit commented Sep 19, 2018

Please see our revised submission in quanteda/quanteda#1431 in the branch joss-review-response.

@lmullen you are right that the local tests fail (on macOS, anyway) for textplot_wordcloud(); this is due to the tests trying to open a graphics device. It does not occur on CI or CRAN, and we will fix this slightly annoying result soon. (quanteda/quanteda#1427)

The warnings from the tests are usually tests of our own warnings, but we will try to clean these up too (quanteda/quanteda#1430).

@alexgarciac we are eager to receive your comments as well, if you have any not already covered by the other two reviewers and @arfon.


Collaborator

kbenoit commented Oct 2, 2018

@arfon what are the next steps?


Member

arfon commented Oct 2, 2018

@lmullen, @borishejblum - could you both confirm that the changes @kbenoit has made in quanteda/quanteda#1431 have addressed your feedback?


Collaborator

lmullen commented Oct 2, 2018

@arfon Yes, I will do this soon.


Collaborator

borishejblum commented Oct 2, 2018

@arfon @kbenoit it seems that my comment regarding the definition of who the target audience is has not been addressed (or did I miss something?).
Thanks


Collaborator

kbenoit commented Oct 2, 2018

@borishejblum We’d be happy to add that. Are we talking about the article or the README.md on the website? (Or both?)


Collaborator

borishejblum commented Oct 3, 2018

@kbenoit my understanding of JOSS requirements would be both.
Thanks


Collaborator

kbenoit commented Oct 3, 2018


Collaborator

borishejblum commented Oct 3, 2018

Thanks @kbenoit. All my comments have now been successfully addressed by quanteda's authors.


Collaborator

lmullen commented Oct 5, 2018

I've reviewed the diffs of the changes and re-run tests, R CMD check, and so on. The improvements to the paper have been made, and the improvements to the local tests in particular make a huge difference for checking the package. I'm satisfied that the changes have been made and the paper can be accepted.

Very much looking forward to citing this paper as I continue to use quanteda in my own work. Congratulations to @kbenoit and team on such fantastic software.


Member

arfon commented Oct 5, 2018

@kbenoit - At this point could you make an archive of the reviewed software in Zenodo/figshare/other service and update this thread with the DOI of the archive? I can then move forward with accepting the submission.

arfon added the accepted label Oct 5, 2018


Collaborator

kbenoit commented Oct 5, 2018

Thanks @arfon (and once again, to all the reviewers for your excellent and supportive comments).

I've updated the release to v1.3.10 (also under CRAN review) and Zenodo has updated the DOI to https://doi.org/10.5281/zenodo.1447219.


Member

arfon commented Oct 6, 2018

@whedon set 10.5281/zenodo.1447219 as archive


Collaborator

whedon commented Oct 6, 2018

OK. 10.5281/zenodo.1447219 is the archive.


Member

arfon commented Oct 6, 2018

@lmullen, @borishejblum - many thanks for your reviews here

@kbenoit - your paper is now accepted into JOSS and your DOI is https://doi.org/10.21105/joss.00774 ⚡️ 🚀 💥

arfon closed this Oct 6, 2018


Collaborator

whedon commented Oct 6, 2018

🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](http://joss.theoj.org/papers/10.21105/joss.00774/status.svg)](https://doi.org/10.21105/joss.00774)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.00774">
  <img src="http://joss.theoj.org/papers/10.21105/joss.00774/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: http://joss.theoj.org/papers/10.21105/joss.00774/status.svg
   :target: https://doi.org/10.21105/joss.00774

This is how it will look in your documentation:

DOI

We need your help!

Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us please consider doing either one (or both) of the following:
