Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW]: Graph Transliterator: a graph-based transliteration tool #1717

Open
whedon opened this issue Sep 7, 2019 · 14 comments

Comments

@whedon
Copy link
Collaborator

commented Sep 7, 2019

Submitting author: @seanpue (A. Sean Pue)
Repository: https://github.com/seanpue/graphtransliterator
Version: v0.2.7
Editor: @gkthiruvathukal
Reviewer: @rlskoeser, @vc1492a
Archive: Pending

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/a9ae921a0402eadf367b888c96d6cfca"><img src="https://joss.theoj.org/papers/a9ae921a0402eadf367b888c96d6cfca/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/a9ae921a0402eadf367b888c96d6cfca/status.svg)](https://joss.theoj.org/papers/a9ae921a0402eadf367b888c96d6cfca)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@rlskoeser & @vc1492a, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @gkthiruvathukal know.

Please try and complete your review in the next two weeks

Review checklist for @rlskoeser

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@seanpue) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @vc1492a

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@seanpue) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
@whedon

This comment has been minimized.

Copy link
Collaborator Author

commented Sep 7, 2019

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @rlskoeser, @vc1492a it looks like you're currently assigned to review this paper 🎉.

⭐️ Important ⭐️

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
@whedon

This comment has been minimized.

Copy link
Collaborator Author

commented Sep 7, 2019

Attempting PDF compilation. Reticulating splines etc...
@whedon

This comment has been minimized.

Copy link
Collaborator Author

commented Sep 7, 2019

@vc1492a

This comment has been minimized.

Copy link
Collaborator

commented Sep 13, 2019

Overview: Of the JOSS reviews I have been a part of, this project has (to date) been the most well-documented and polished in terms of code readability and test coverage which is great to see. I think the author(s) did a good job on the software and in providing examples and documentation in order to get started easily. The software is easy to use and I have even shared the software with some of my colleagues who will be exploring its usability in some of our team's projects - I think it could have some utility! Very much appreciated the thorough documentation in the code and tests written to verify functionality as well as with how to contribute, and this will go a long way towards communicating the work but also inviting open-source contributions. I have included some commentary on checkboxes below.

Installation and Installation Instructions: The project documentation provide instructions for installation via PyPi or by cloning the Github repository. After examining both setup.py, requirements.txt, and the source code from the project repository and attempting an installation in my environment, the installation completed successfully however when using import graphtransliterator the import failed due to marshmallow being a undefined import. It is one of the project's dependencies and is noted in requirements.txt, but it was not installed during installation as it was missing from a list of requirements in setup.py. I have corrected this minor error and have provided the fix in this pull request. It may help to also have the project's dependencies noted somewhere in the documentation beyond requirements.txt in the project repository, but I don't think it is a high priority. It may also be nice to list which specific versions of Python 3 have support and are tested for in readme.md and/or in the documentation - in this case, Python versions 3.5 through 3.7. The version numbers are listed in the contribution guidelines, but that's the only location I see the Python versions mentioned outside of configuration files.

Functionality: I have verified the functionality presented through the examples presented in the documentation, and the software's functionality operates as the author describes. The ambiguity checking is a particularly nice feature which could perhaps be used as a verification check if automatically generated transliteration rules. I also liked how the author provided a means to expose the underlying graph of the transliteration rules to the user for examination - whether through visualization our outside analyses. I currently do not have my own use case in which to test the software outside of the examples presented in the documentation - the editor is free to determine whether a more in-depth look at the functionality would be beneficial as part of the review.

Automated Tests: I was able to run the automated tests easily, and the tests are well documented which allowed me to quickly understand their purpose and which functionality in the software they are designed to test. I did notice a bit of unused tests which I removed in the aforementioned pull request just to clean things up a little bit.

References: The author did a good job of pointing out related software with citation and noting the differences between GraphTransliterator and those software. I did however expect some citations for the following sentences:

It [transliteration] enables the standardized organization and search of resources, as in library
systems. It also permits the encoding of additional information, which enables disambiguation
and advanced linguistic analysis, including natural language processing tasks, that are often
not possible in the original script.

A reader which is new to the field may want to explore works in which transliteration is used in these areas - citations would provide this background and ground the paper/software under review with some real-world applications. Providing some references here would provide a more complete paper around the software and its use cases.

@seanpue

This comment has been minimized.

Copy link

commented Sep 17, 2019

@whedon generate pdf

@whedon

This comment has been minimized.

Copy link
Collaborator Author

commented Sep 17, 2019

Attempting PDF compilation. Reticulating splines etc...
@whedon

This comment has been minimized.

Copy link
Collaborator Author

commented Sep 17, 2019

@seanpue

This comment has been minimized.

Copy link

commented Sep 17, 2019

Thanks @vc1492a for your generous and helpful review!

I have made the following changes to the oode and documentation:

  • Fixed installation issues (using your pull request)
  • Added list of dependencies to docs/installation.rst under subsection "Required Modules"
  • Added badge for supported Python versions to README.rst

I have also added references and made changes to the first paragraph of the paper following your suggestion to better elaborate the use cases of transliteration.

I hope that resolves the issues. Thanks so much. (P.S. Would love to hear later about the potential usages!)

@rlskoeser

This comment has been minimized.

Copy link
Collaborator

commented Sep 19, 2019

I have reviewed the software, documentation, and paper. The only checklist item I'm not able to sign off on is a statement of need included with the project documentation. As I understand it, a statement of need should be included in the readme or project documentation, but the only place I could find this was in the article. I suggest adapting the summary information from the beginning of the article.

Other things I noted:

  • I was surprised not to find installation instructions included in the readme (at least the most common version); would be better if people didn't have to click through to read the docs to find that. Based on the reviewing guidelines it sounds like this should be included in the readme, but I checked it off since installation is well-documented in the read the docs documentation.
  • The detailed tutorial is well-written and very helpful, but it's difficult to cut and paste examples from the python console because of the ... included on the beginning of continuing lines. I tried to generate a file I could load using the from_yaml_file method based on the annotated sections you provide in a tutorial. First I got an error because I included the short set of rules, which looks like it has a typo (PR opened). Now that I've corrected that, I'm getting a validation error that I'm not sure how to troubleshoot. (I can't tell if this is a discrepancy between the versions in the tutorial or something I've done wrong.)
marshmallow.exceptions.ValidationError: {'rules': ['Invalid token "," in tokens of rule TransliterationRule(production=\',\', prev_classes=None, prev_tokens=None, tokens=[\',\'], next_tokens=None, next_classes=None, cost=0.5849625007211562)']}

Perhaps you could include the complete yaml file as a link that could be downloaded for use with the tutorial, and then people could compare your working version with the annotated samples in the tutorial? That would also allow you to omit the full lists of tokens rules from the tutorial, which might make it easier to follow.

  • In your contributing doc, the link on GitHub is broken but it works fine on readthedocs. I'm not sure if there's a way to make it work in both places
  • The directions for contributing say to use the Black code formatter, but it isn't included in the dev requirements. You have a note about pip installing them, but it seems like it would be better to add black to the dev requirements and update the contributing instructions to install development dependencies from that file.
@rlskoeser

This comment has been minimized.

Copy link
Collaborator

commented Sep 19, 2019

Forgot to say: this is a very well-documented and interesting tool! I haven't had the chance to work much with languages that require transliteration, but we're starting to work on a project involving Ethiopian Ge`ez. I haven't learned much about the language yet, but will keep this in mind if a use case comes up. It's important to put a statement of need on the readme so that people like me, who don't usually work with languages that require transliteration, will be able to understand immediately the use and power of your tool.

@seanpue

This comment has been minimized.

Copy link

commented Sep 19, 2019

Thank you for these excellent corrections and comments, @rlskoeser! I will work on fixing them over the weekend and will let you know when I'm done.

Do let me know how it goes with Ethiopian Ge`ez. I am working on developing a standard for Urdu and some other South Asian languages that will generate the Perso-Arabic text, with or without the Arabic diacritics, and also output in the scholarly transliteration that we use in academic writing as well as in library of congress romanization format, which I need for an archive project.

I also want to figure out a good way to bundle YAML files to the module, too, so that people can contribute them. I might add a transliterators module that reads from a directory of YAML files (or JSON) and can produce an accessible list and GraphTransliterator objects, maybe by setting __all__ programmatically in __init__.py. That will require setting some standards for the metadata field. I may model it on the fields in setup.py. I am not sure if there is any standard for that sort of thing. If you have any thoughts about that, please let me know. Thanks!

@rlskoeser

This comment has been minimized.

Copy link
Collaborator

commented Sep 19, 2019

Bundled transliterator configurations sounds valuable. Maybe you could define a subclass of the GraphTransliterator object that looks for a named config file in a specified path (or possibly multiple paths).

If/when you add that, you'll probably also want to update your contributing document to explain how to add a new transliterator configuration. You'll also want think about what kind of testing a bundled transliterator needs (probably at least as some sanity checking?).

@seanpue

This comment has been minimized.

Copy link

commented Sep 30, 2019

I'm close to finishing these corrections. It will probably take at least another week.

I have added bundled transliterators requiring tests that cover all nodes and edges of graphs, as well as OnMatchRules, if applicable. I will have to update the essay, as well.

I will also be cleaning up the documentation and other points following the excellent recommendations of @rlskoeser.

This review process has been so helpful! Thanks, @vc1492a and @rlskoeser!

@gkthiruvathukal

This comment has been minimized.

Copy link

commented Sep 30, 2019

@seanpue: Thanks for your follow up and willingness to address all review feedback.

@vc1492a and @rlskoeser: Thanks for your thorough reviews. This feels like a model of how reviews should be done. Even though the work is clearly well-done and polished, there were minor issues raised to ensure the submission leads to a great outcome.

I feel optimistic this is heading toward acceptance and look forward to recommending same when @seanpue finishes making the minor revisions. Separately, I rarely jump in with any feedback before the reviewers do, but I felt pretty optimistic about this submission when I first saw it. It's nice to see this research area represented within JOSS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.