[REVIEW]: AncesTrim - a tool for trimming complex pedigrees #179

whedon · 2017-02-07T05:12:22Z

Submitting author: @JNisk (Julia Niskanen)
Repository: https://github.com/JNisk/AncesTrim
Version: v1.0
Editor: @mgymrek
Reviewer: @jyuan1322
Archive: 10.5281/zenodo.375807

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/a9ee6cf921ca992227ec1ac0a19bac5d"><img src="http://joss.theoj.org/papers/a9ee6cf921ca992227ec1ac0a19bac5d/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/a9ee6cf921ca992227ec1ac0a19bac5d/status.svg)](http://joss.theoj.org/papers/a9ee6cf921ca992227ec1ac0a19bac5d)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer questions

Conflict of interest

As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

General checks

Repository: Is the source code for this software available at the repository url?
License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
Version: Does the release version given match the GitHub release (v1.0)?
Authorship: Has the submitting author (JNisk) made major contributions to the software?

Functionality

Installation: Does installation proceed as outlined in the documentation?
Functionality: Have the functional claims of the software been confirmed?
Performance: Have any performance claims of the software been confirmed?

Documentation

A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g. API method documentation)?
Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
Community guidelines: Are there clear guidelines for third parties wishing to 1) contribute to the software, 2) report issues or problems with the software, and 3) seek support?

Software paper

Authors: Does the paper.md file include a list of authors with their affiliations?
A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
References: Do all archival references that should have a DOI list one (e.g. papers, datasets, software)?

whedon · 2017-02-07T05:12:24Z

Hello human, I'm @whedon. I'm here to help you with some common editorial tasks for JOSS. @jyuan1322 it looks like you're currently assigned as the reviewer for this paper 🎉.

⭐ Important ⭐

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that under GitHub's default behaviour you will receive notifications (emails) for all JOSS reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' at https://github.com/openjournals/joss-reviews

You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

jyuan1322 · 2017-02-18T03:21:35Z

Overall I found AncesTrim easy to install and run, and would only suggest minor changes to code documentation and error handling.

Functionality:
I tested AncesTrim using Python 2.7 on both an Ubuntu 14.04 and a MacOSX 10.11 machine and found no issues. The code functions as expected and performance is fast.

Documentation:

The statement of need could be expanded to make the utility of the tool more apparent. The initial paragraph of the README.md is brief and lacking in specifics about what exactly the tool does. Simply moving some of the information under "Pedigree Trimming Principles" into this section might help, but it would be great if the authors could include a visualization of a pedigree before and after pruning with AncesTrim to effectively demonstrate functionality.

Code:

There should be more extensive error checking when parsing input files. For example, submitting an empty register file will throw a "list index out of range" error.
In general, the code in ancestrim.py is difficult to follow. While steps are broken into distinct blocks of code, I think these should be wrapped into functions, with clear descriptions of the inputs and outputs of each step.
The command line options can be improved in a few ways. The authors have written a parser from scratch, but using an existing module like argparse would improve understandability and maintenance of the code and reduce the possibility of error. Also, I had to look up command line options by looking in the code - there should be a way to print a manual page either by running the script without arguments or by supplying a "--help" argument. Lastly, the "--folder" argument should only pertain to an output directory, not one for both input and output files. Since the input filenames need to be supplied anyway, this is a bit redundant and prevents tab-completion of filenames.
The raw input file appears to follow a custom specification, so users will need to preprocess their data into the expected format. The usability of the tool could be improved by adding a parser for existing pedigree data formats, such as GEDCOM, as this is already generated as an output file. (Although I wouldn't consider this a necessary addition.)
The code and column headers of the input/output files make multiple references to dogs, when presumably AncesTrim can also operate on pedigrees of humans or other animals.
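
To illustrate the argparse suggestion above, here is a minimal sketch. The option names `--raw`, `--register`, and `--outdir` are hypothetical placeholders, not AncesTrim's actual flags; the point is only that argparse generates the `--help` page and the missing-argument messages on its own.

```python
import argparse


def build_parser():
    """Build a command-line parser; argparse adds -h/--help automatically."""
    parser = argparse.ArgumentParser(
        description="Trim a pedigree (illustrative sketch, not AncesTrim itself).")
    # Hypothetical option names, chosen only for this example.
    parser.add_argument("--raw", required=True,
                        help="path to the raw pedigree file")
    parser.add_argument("--register", required=True,
                        help="path to the register file")
    parser.add_argument("--outdir", default=".",
                        help="directory for output files (default: current directory)")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["--raw", "pedigree.txt", "--register", "register.txt"])
print("%s %s %s" % (args.raw, args.register, args.outdir))
```

Because the input files are passed as full paths, shell tab-completion works on them, and only the output location needs its own directory option.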

Minor issues:

Typo in README.md under Usage: "(the raw rile)"

mgymrek · 2017-02-20T15:05:35Z

Thanks @jyuan1322 for the review!

@JNisk: it would be great if you could respond to these points. Most importantly:

making the suggested improvements to the README
including a "--help" option to list all possible command-line arguments

JNisk · 2017-02-20T15:52:50Z

Dear @jyuan1322 and @mgymrek,

thank you very much for your constructive criticism. I will now continue working with the script to address these issues.

Best regards,
JNisk

JNisk · 2017-02-23T11:32:11Z

To respond to the points made by @jyuan1322:

> The statement of need could be expanded to make the utility of the tool more apparent. The initial paragraph of the README.md is brief and lacking in specifics about what exactly the tool does. Simply moving some of the information under "Pedigree Trimming Principles" into this section might help, but it would be great if the authors could include a visualization of a pedigree before and after pruning with AncesTrim to effectively demonstrate functionality.

The first section of README has been modified to be more descriptive. In addition, a "before and after" image has been added for demonstration. I had thought of an image before and hesitated, but your suggestion made it apparent that this improves the README quite a bit. Also, the "Pedigree trimming principles" section now includes an example image.

> There should be more extensive error checking when parsing input files. For example, submitting an empty register file will throw a "list index out of range" error.

Error checking has been improved. Both raw file and register file are checked and an error message is displayed if either file is empty. Empty lines in register file are skipped, and lines in raw file are checked for missing columns. Also, an error message is displayed if an individual exists in the register file but not in the raw file.
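
The checks described above could look roughly like the following sketch. It is not the code in ancestrim.py: the function names, the tab delimiter, the assumption that the individual's ID is the first column of the raw file, and the assumption that the register file holds one ID per line are all illustrative.

```python
def nonblank(lines):
    """Return the lines that contain something other than whitespace."""
    return [line.rstrip("\r\n") for line in lines if line.strip()]


def check_inputs(raw_lines, register_lines, n_columns, delimiter="\t"):
    """Validate both inputs and return the list of register IDs.

    raw_lines and register_lines are any iterables of text lines,
    for example open file handles. Raises ValueError on the first problem.
    """
    raw = nonblank(raw_lines)
    if not raw:
        raise ValueError("raw file is empty")

    # Blank lines in the register file are skipped rather than rejected.
    register = [line.strip() for line in nonblank(register_lines)]
    if not register:
        raise ValueError("register file is empty")

    raw_ids = set()
    for record, line in enumerate(raw, start=1):
        fields = line.split(delimiter)
        if len(fields) < n_columns:
            raise ValueError("raw file record %d has %d columns, expected %d"
                             % (record, len(fields), n_columns))
        raw_ids.add(fields[0])  # assumption: ID is the first column

    missing = [name for name in register if name not in raw_ids]
    if missing:
        raise ValueError("in register file but not in raw file: "
                         + ", ".join(missing))
    return register
```

Raising a descriptive `ValueError` up front replaces the bare "list index out of range" failure with a message that names the offending file or record.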

> In general, the code in ancestrim.py is difficult to follow. While steps are broken into distinct blocks of code, I think these should be wrapped into functions, with clear descriptions of the inputs and outputs of each step.

I agree with this. From my point of view, the complexity stems from the numerous pairwise comparisons that are made while searching for relatedness and when trimming the relationship paths. I understand that wrapping these into functions could make the script easier to read. However, I imagine this tool being used as a whole, which lessens the need to make particular steps available as functions. But in general this is a very valuable piece of advice and I aim to be more function-orientated in the future.

> The command line options can be improved in a few ways. The authors have written a parser from scratch, but using an existing module like argparse would improve understandability and maintenance of the code and reduce the possibility of error. Also, I had to look up command line options by looking in the code - there should be a way to print a manual page either by running the script without arguments or by supplying a "--help" argument. Lastly, the "--folder" argument should only pertain to an output directory, not one for both input and output files. Since the input filenames need to be supplied anyway, this is a bit redundant and prevents tab-completion of filenames.

The command line options have been improved. A “--help” argument has been added, and it will display information about all the available parameters. Also, if a parameter is missing, messages about both the missing argument and the “--help” option will be displayed. This will also happen if no arguments are provided.

The “--folder” argument has been modified. The script now only requires an “--outfolder” parameter that designates the destination of the output files. An optional parameter “--infolder” can be used in case the user wants to specify the input file folder instead of tab-completion.
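
As a rough illustration of how an optional input folder can coexist with tab-completed paths, consider the sketch below. The helper name `resolve_input` is hypothetical and is not taken from ancestrim.py; it only shows the idea that a filename is used as given unless an input folder is supplied.

```python
import os


def resolve_input(filename, infolder=None):
    """Return the path to read from.

    Without infolder, the filename is used exactly as typed (so a
    tab-completed relative or absolute path works). With infolder,
    the filename is looked up inside that folder.
    """
    if infolder is None:
        return filename
    return os.path.join(infolder, filename)


print(resolve_input("pedigree.txt"))
print(resolve_input("pedigree.txt", infolder="data"))
```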

This custom command line parser was constructed as practice, but I now realize that it would indeed have been better to use the argparse module. I am not keen on completely changing the parser at this point, but I will definitely change my approach for future projects. It is a very good point.

> The raw input file appears to follow a custom specification, so users will need to preprocess their data into the expected format. The usability of the tool could be improved by adding a parser for existing pedigree data formats, such as GEDCOM, as this is already generated as an output file. (Although I wouldn't consider this a necessary addition.)

This is an excellent suggestion, and I had not thought of this before. The lack of input file parsing is partly because this script is part of a custom pedigree manipulation pipeline, but I understand that it would be a valuable addition to this tool.

> The code and column headers of the input/output files make multiple references to dogs, when presumably AncesTrim can also operate on pedigrees of humans or other animals.

This is something that I considered, and it is true that AncesTrim can process any pedigree data regardless of species. I decided against changing the column names (and, additionally, the variable names in the script itself) as I thought that it is a minor issue that does not affect the performance of the script. I agree that the columns seem a bit silly, though, if one is working with something other than dog data.

> Typo in README.md under Usage: "(the raw rile)"

The typo has been fixed.

In addition, I made some changes in the files to reflect the version change from 1.0 to 1.1. Changelog has been updated appropriately.

I want to sincerely thank @jyuan1322 for your hard work and great comments!
Best regards,
JNisk

jyuan1322 · 2017-02-25T01:26:46Z

Great! The readme is excellent. Everything looks good to me. :)

JNisk · 2017-02-27T13:58:41Z

Thank you @jyuan1322! Should I now deposit the up-to-date repository to an archive or do I wait for the editor @mgymrek to accept this?

Best regards,
JNisk

jyuan1322 · 2017-02-27T21:41:46Z

I think wait for @mgymrek - I'm not sure exactly what the procedure is.

mgymrek · 2017-03-07T18:56:44Z

@arfon I think we can accept this.

@JNisk I believe next is to make an archive of the reviewed software in Zenodo/figshare/other service and update this thread with the DOI of the archive.

JNisk · 2017-03-08T10:20:15Z

I have now archived version 1.1, and the DOI is http://doi.org/10.5281/zenodo.375807. I also updated the codemeta.json with the DOI.

Best regards,
JNisk

arfon · 2017-03-08T11:08:27Z

@whedon set 10.5281/zenodo.375807 as archive

whedon · 2017-03-08T11:08:29Z

OK. 10.5281/zenodo.375807 is the archive.

arfon · 2017-03-08T11:18:00Z

@jyuan1322 many thanks for your review here and @mgymrek for editing this submission ✨

@JNisk - your paper is now accepted into JOSS and your DOI is http://dx.doi.org/10.21105/joss.00179 ⚡️ 🚀 💥

whedon added the review label Feb 7, 2017

whedon mentioned this issue Feb 7, 2017

[PRE REVIEW]: AncesTrim - a tool for trimming complex pedigrees #152

Closed

arfon added the accepted label Mar 8, 2017

arfon closed this as completed Mar 8, 2017

whedon added the published (Papers published in JOSS) and recommend-accept (Papers recommended for acceptance in JOSS) labels Mar 2, 2020
