Related to: openjournals/joss-reviews#1540
General feedback
Firstly, I really enjoyed this, so thank you for asking me to review. As I'm late to the party, I see that you have already substantially improved the documentation and paper thanks to a great review by Gavin.
I have no major objections to this paper and I think the software is a valuable contribution to the field. I've enjoyed playing with it so far. I would have liked a comparison against other tools, partly as I think this would attract users and increase the impact of sourcepredict, but I accept this is outside the scope of the paper.
Minor comments
- File format descriptor for the taxonomic classification tables
  You use several terms to refer to the same thing throughout the documentation, such as OTU count table, TAXID abundance count table and abundance table. Please unify these.
- Input filename in example
  Rather than calling the example input data `dog_example.csv`, could you make it a more informative filename? Even something like `dog_sink_sample.csv`, to tie it in with your source-sink narrative.
- Reference formatting issue in the paper
  The formatting for the Kraken reference in the second paragraph of the paper needs fixing; at the moment the authors are outside the reference parentheses.
Suggestions
- Continuity in syntax/documentation
  To be clear, I don't need to see this changed, but I just wanted to raise it. I found the usage documentation a little confusing when you used the term `abundance_table`, for instance as the positional argument tag. Throughout the paper and documentation you have nicely set up the source-to-sink logic, and I think it is a shame not to have this continuity here.
- Standardised format for input taxonomic classification tables
  Again, just a suggestion, maybe for a future release. It may be worth considering a standardised format, e.g. biom format. I only suggest this as the parsing required to get the TAXID and abundance values from different taxonomic classifiers may make this tool less appealing or inaccessible to some users. If you could get the standard output from one or more taxonomic classifiers (kraken/kaiju/metaphlan etc.) and run them straight into sourcepredict, that would lower the barrier to entry. Your kraken pipeline is a great step in this direction. You could maybe think about adding the kraken_parse script to this repo too, particularly as people often receive kraken reports as part of sequencing results and so may not want or need to re-run kraken (or use mini kraken) before using sourcepredict.
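
To illustrate the kind of parsing I mean in the last suggestion, here is a minimal sketch of turning a standard kraken report into a TAXID/count table. It is not your kraken_parse script and I have not checked it against sourcepredict's expected input. It assumes the usual six tab-separated report columns (percentage, clade reads, directly assigned reads, rank code, NCBI taxonomy ID, name). The function names, the choice of directly assigned reads, and the two-column CSV layout are all my own assumptions for illustration.

```python
import csv
import io


def kraken_report_to_counts(report_text, min_reads=1):
    """Map each TAXID to the number of reads assigned directly to it.

    Assumes the standard six-column, tab-separated kraken report layout.
    Using the 'directly assigned' column (index 2) rather than the clade
    total (index 1) is an illustrative choice, not sourcepredict's rule.
    """
    counts = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        direct_reads = int(fields[2])
        taxid = int(fields[4])
        if taxid == 0:
            continue  # TAXID 0 is the 'unclassified' row, not a taxon
        if direct_reads >= min_reads:
            counts[taxid] = direct_reads
    return counts


def counts_to_csv(counts, sample_name):
    """Render the counts as a two-column CSV (hypothetical layout)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["TAXID", sample_name])
    for taxid in sorted(counts):
        writer.writerow([taxid, counts[taxid]])
    return out.getvalue()


# Tiny made-up report, only to exercise the functions above.
EXAMPLE_REPORT = (
    " 10.00\t10\t10\tU\t0\tunclassified\n"
    " 90.00\t90\t5\tR\t1\troot\n"
    " 60.00\t60\t60\tS\t562\t        Escherichia coli\n"
    " 25.00\t25\t25\tS\t1280\t        Staphylococcus aureus\n"
)

if __name__ == "__main__":
    example_counts = kraken_report_to_counts(EXAMPLE_REPORT)
    print(counts_to_csv(example_counts, "dog_sink_sample"), end="")
```

Something along these lines shipped with the repo (or the existing kraken_parse script itself) would let users who already have kraken reports skip re-running the classifier.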