Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

phylotaR -- submission #187

Closed
10 of 19 tasks
DomBennett opened this issue Jan 23, 2018 · 52 comments
Closed
10 of 19 tasks

phylotaR -- submission #187

DomBennett opened this issue Jan 23, 2018 · 52 comments

Comments

@DomBennett
Copy link

DomBennett commented Jan 23, 2018

Summary

  • What does this package do? (explain in 50 words or less):

R implementation of the PhyLoTa pipeline. It identifies orthologous sequences in GenBank using BLAST searches. It consists of three automated stages: taxise, download and cluster.

  • Paste the full DESCRIPTION file inside a code block below:
Package: phylotaR
Type: Package
Title: Automated phylogenetic sequence cluster identification from GenBank
Version: 0.1
Date: 2017-11-17
Author: Hannes Hettling, Rutger Vos, Alexander Zika, Dominic J. Bennett, Alexandre Antonelli
Maintainer: D.J. Bennett <dominic.john.bennett@gmail.com>
Description: PhyLoTa identifies orthologous sequence clusters from data readily available in GenBank. This is an R implementation of this process.
License: GPL-2
URL: https://github.com/dombennett/phylotaR#readme
BugReports: https://github.com/dombennett/phylotaR/issues
SystemRequirements: BLAST+ (>2.7.1)
Depends:
    R (>= 3.3.0),
    methods,
    foreach
Imports:
    curl,
    doMC,
    doSNOW,
    igraph,
    CHNOSZ,
    rentrez,
    DBI,
    taxize,
    XML
Suggests:
    testthat,
    knitr,
    rmarkdown
RoxygenNote: 6.0.1
VignetteBuilder: knitr
  • URL for the package (the development repository, not a stylized html page): https://github.com/DomBennett/phylotaR

  • Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

data retrieval, it downloads sequences from genbank and identifies sequence clusters that may be of use for phylogenetic analysis

  •   Who is the target audience and what are scientific applications of this package?  

Evolutionary biologists, phylogeneticists. Determining sequence orthology is the essential first step of any phylogenetic analysis. Usually a phylogeneticist may search GenBank for sequences of the same gene, download and align. Due to naming problems, however, potential relevant sequences may be missed or sequences that have been mislabelled are downloaded. PhyLoTa gets around this by performing all-v-all BLAST searches and does not rely on gene names. PhyLoTa is often the first step of any large-scale automated phylogenetic pipeline (e.g. SUPERSMART). Potentially, this R package could be central to many future phylogenetic analyses.

The only current approach for identifying PhyLoTa clusters is the PhyLoTa browser. The data underlying its current release, however, is five years old. In those five years there have been over 40 million new sequences added to GenBank. An alternative is needed to make use of the latest data. Additionally, phylotaR would allow a user to specify their own parameters when running the pipeline.

  •   If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
    • The package is novel and will be of interest to the broad readership of the journal.
    • The manuscript describing the package is no longer than 3000 words.
    • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal.
    • (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

No errors, but plenty of warnings to do with documentation.

Not just yet. Outstanding: conversion to snake_case, generate README from rmd, create website with pkgdown, NEWS required, authors@R syntax, depends to imports and cat to message (although is that necessary given the logging methods?)

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Pre-submission Note

Hi!
I've produced this pre-submission issue to determine whether phylotaR would be something you'd like added to your roster? On our side, we would be very keen as we think future phylogenetic analyses will depend on making software open-source, modular and collaborative. I did feel, however, that it may not quite fit into the ropensci family, because it's a pipeline and it depends on software external to R (BLAST). Please be aware that phylotaR is still only being developed and a lot of testing still needs to be performed. If you are interested, I'd be happy to update the naming style (I prefer camelcase), improve the documentation and whatever else is needed to meet your criteria.
Thanks,
Dom

@sckott
Copy link
Contributor

sckott commented Jan 30, 2018

thanks for your submission @DomBennett we've been discussing and will get back to you asap

@sckott sckott self-assigned this Jan 30, 2018
@sckott sckott changed the title phylotaR -- presubmission phylotaR -- submission Jan 30, 2018
@sckott
Copy link
Contributor

sckott commented Jan 30, 2018

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for your submission @DomBennett !!

  • You didn't check the top level MEE box - i've checked that box. You didn't check the box for The manuscript describing the package is no longer than 3000 words - will manuscript length be a problem?
  • You didn't check the box for has a CRAN and OSI accepted license. I think you do, can you check that box?
  • You didn't check the box for does not violate the Terms of Service of any service it interacts with. Is that going to be a problem? Can you check on that?
  • I see you checked the submit to MEE box. Note that we can not make any gaurantee about MEE considering your paper a good fit for them, or them accepting the paper
  • Seems like blast on Homebrew (installed via brew install blast) is the same as at the NCBI link you gave at https://www.ncbi.nlm.nih.gov/books/NBK279671/ - Is that right? homebrew is for OSX platforms only
  • It would indeed be ideal if the pkg didn't have to use system calls to external tools, but there are times when there's no other way.
  • In terms of style, we prefer snake case but as long as you are consistent within the package that is THE most important thing (i.e, don't use a combination of camelCase, TitleCase and snake_case).
  • You did not check the box for Does the package conform to rOpenSci packaging guidelines? Can you do that and describe any exceptions
  • You can put a ropensci review badge in your README

[![](https://badges.ropensci.org/187_status.svg)](https://github.com/ropensci/onboarding/issues/187)

  • Goodpractice output for you and reviewers to consider:
── GP phylotaR ────────────

It is good practice towrite unit tests for all functions, and all package code
    in general. 75% of code lines are covered by test cases.

    R/base-wrappers.R:7:NA
    R/blast-tools.R:10:NA
    R/blast-tools.R:16:NA
    R/blast-tools.R:28:NA
    R/blast-tools.R:34:NA
    ... and 325 more linesnot use "Depends" in DESCRIPTION, as it can cause name
    clashes, and poor interaction with other packages. Use "Imports"
    instead.omit "Date" in DESCRIPTION. It is not required and it
    gets invalid quite often. A build date will be added to the package
    when you perform `R CMD build` on it.use '<-' for assignment instead of '='. '<-' is the
    standard, and R users and developers are used it and it is easier
    to read your code for them if you use '<-'.

    R/ncbi-tools.R:156:9avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/blast-tools.R:59:1
    R/taxise-tools.R:361:1omit trailing semicolons from code lines. They are not
    needed and most R coding standards forbid them

    R/ncbi-tools.R:147:55avoid sapply(), it is not type safe. It might return a
    vector, or a list, depending on the input data. Consider using
    vapply() instead.

    R/blast-tools.R:32:13
    R/blast-tools.R:33:11
    R/cache-tools.R:253:13
    R/cache-tools.R:256:13
    R/cache-tools.R:317:12
    ... and 32 more linesavoid 1:length(...), 1:nrow(...), 1:ncol(...),
    1:NROW(...) and 1:NCOL(...) expressions. They are error prone and
    result 1:0 if the expression on the right hand side is zero. Use
    seq_len() or seq_along() instead.

    R/cluster-tools.R:345:12
    R/ncbi-tools.R:162:20
    R/pipeline.R:208:12
    tests/testthat/test-cluster-tools.R:123:17
    tests/testthat/test-download-tools.R:81:17
    ... and 2 more linesfix this R CMD check WARNING: '::' or ':::' imports not
    declared from:ggplot2’ ‘plyrNamespaces in Imports field not
    imported from:DBI’ ‘taxizeAll declared Imports should be used.
    Packages in Depends field not imported from:foreach’ ‘methodsThese packages need to be imported from (in the NAMESPACE file) for
    when this namespace is loaded but not attached. Missing or
    unexported object:doSNOW::registerDoMC’
  ✖ fix this R CMD check NOTE: blstN: no visible binding for
    global variablecmdblstN: no visible global function definition
    forread.tablebyLineage : .calc : <anonymous>: no visible
    binding for global variablevaluefltr: no visible global
    function definition forheadfltr: no visible global function
    definition fortailgenClstrsObj: no visible global function
    definition fornewgetBestClstrs: no visible global function
    definition fornTaxagetBestClstrs: no visible global function
    definition forseqLengthgetClMdLn : <anonymous>: no visible
    global function definition formediangetMngblIds: no visible
    global function definition forheadgetMngblIds: no visible
    global function definition fortailmkBlstDB: no visible binding
    for global variablecmdnDscndnts: no visible global function
    definition for ‘%dopar%’ nDscndnts: no visible global function
    definition forforeachnDscndnts: no visible binding for global
    variableiplotTable: no visible binding for global variableclstrnmplotTable: no visible binding for global variabletxidnmplotTable: no visible binding for global variablevaluerunStgs: no visible global function definition forissafeSrch:
    no visible global function definition forissetUp: no visible
    global function definition forpackageVersionwriteClstr: no
    visible global function definition forwrite.tablewriteSqs: no
    visible binding for global variableclstrs_objwriteTax: no
    visible global function definition forwrite.tablesummary,ClstrsObj: no visible binding for global variablexUndefined global functions or variables: %dopar% clstrnm clstrs_obj
    cmd foreach head i is median nTaxa new packageVersion read.table
    seqLength tail txidnm value write.table x Consider adding
    importFrom("methods", "is", "new") importFrom("stats", "median")
    importFrom("utils", "head", "packageVersion", "read.table", "tail",
    "write.table") to your NAMESPACE file (and ensure that your
    DESCRIPTION Imports field contains 'methods').fix this R CMD check WARNING: Undocumented data sets:blst_rs’ ‘phylt_nds’ ‘sqs’ ‘tdobjAll user-level objects in a
    package should have documentation entries. See chapterWriting R
    documentation filesin theWriting R Extensionsmanual.fix this R CMD check NOTE: Warning: no function found
    corresponding to methods exports fromphylotaRfor:showA
    namespace must be able to be loaded with just the base namespace
    loaded: otherwise if the namespace gets loaded by a saved object,
    the session will be unable to start. Probably some imports need to
    be declared in the NAMESPACE file.

You don't have to address these now.

Seeking reviewers now 🕐


Reviewers: @arendsee @naupaka
Due date: 2018-03-12

@rvosa
Copy link

rvosa commented Jan 31, 2018

@sckott - what would you recommend for a journal to publish a short application note about this package?

@sckott
Copy link
Contributor

sckott commented Jan 31, 2018

@rvosa possibly: we have a sort of pairing with JOSS and MEE. Also maybe: The R Journal, F1000Research

@rvosa
Copy link

rvosa commented Jan 31, 2018

@sckott
Copy link
Contributor

sckott commented Jan 31, 2018

http://joss.theoj.org/

@sckott
Copy link
Contributor

sckott commented Feb 6, 2018

@DomBennett I had a number of questions for you above in my comments. Please respond

@DomBennett
Copy link
Author

DomBennett commented Feb 7, 2018

@sckott Sorry, I thought I could wait for the reviewers and respond to everything wholesale.

You didn't check the top level MEE box - i've checked that box. You didn't check the box for The manuscript describing the package is no longer than 3000 words - will manuscript length be a problem?

We hadn't decided at that point which journal to send it to. We're not likely to submit to MEE now. All MEE boxes have been unticked.

You didn't check the box for has a CRAN and OSI accepted license. I think you do, can you check that box?

It does now. I added an MIT license.

You didn't check the box for does not violate the Terms of Service of any service it interacts with. Is that going to be a problem? Can you check on that?

I've read the terms now and it won't. Box ticked.

I see you checked the submit to MEE box. Note that we can not make any gaurantee about MEE considering your paper a good fit for them, or them accepting the paper.

It'd be nice if you did guarantee! But we're not going with MEE....

Seems like blast on Homebrew (installed via brew install blast) is the same as at the NCBI link you gave at https://www.ncbi.nlm.nih.gov/books/NBK279671/ - Is that right? homebrew is for OSX platforms only

It's just the link to the general description of how to install BLAST tools across all systems. Not specific to OSX.

It would indeed be ideal if the pkg didn't have to use system calls to external tools, but there are times when there's no other way.

Yeah it sucks big time. We're currently working on an approach to avoid using standalone BLAST, in which case a user would not need to install it. The BLAST tools are written in C++, so in theory it should be possible to package the code into the R package. I'm not too au fait on how to do this though. I've written external C scripts into R packages before, but never have I taken someone else's C code entirely. There may be problems with modifying the C++ code to interact with the R. Do you have any ideas on workarounds? Otherwise we could simply give the user the option to send their BLAST jobs to online servers (e.g. NCBI or EBI).

In terms of style, we prefer snake case but as long as you are consistent within the package that is THE most important thing (i.e, don't use a combination of camelCase, TitleCase and snake_case).

Yeah I know..... sigh..... Coding style can be real bugbears. I try and use a sort of PEP8 style with a mix of all three (variables snake_case, functions camelCase and classes TitleCase). I can go through and make it all snake_case. I'll probably rechristen some elements where I think the names don't quite work.

You did not check the box for Does the package conform to rOpenSci packaging guidelines? Can you do that and describe any exceptions

Not just yet. See above. Also I need to go through your more specific recommendations above as well. Do I need to make these changes before the reviewers look over the package? It's just the pipeline is still being tested and developed and it may be a while before these changes can be made.

You can put a ropensci review badge in your README

Dun'd.

@sckott
Copy link
Contributor

sckott commented Feb 7, 2018

We hadn't decided at that point which journal to send it to. We're not likely to submit to MEE now. All MEE boxes have been unticked.

Okay, sounds good.

It does now. I added an MIT license.

thanks.

I've read the terms now and it won't. Box ticked.

thanks.

We're currently working on an approach to avoid using standalone BLAST ...

Glad you're looking into some options. It's possible reviewers will have some ideas as well. One idea: if you do shell out, you could use the sys package made by Jeroen on our team https://cran.rstudio.com/web/packages/sys/

I need to go through your more specific recommendations above as well. Do I need to make these changes before the reviewers look over the package?

If you're talking about the goodpractice output, no, not now. That's just information for you and reviewers to consider.

@arendsee
Copy link

arendsee commented Feb 12, 2018

Hi @DomBennett, I'm working on reviewing phylotaR, but ran into a few little issues.

  1. When I run the vignette, I get the following error:
> clstr_id <- getBestClstrs(clstrs_obj, n=1)                        
Error in nTaxa(clstrs_obj) : could not find function "nTaxa"
  1. Fix the BLAST version handling

Currently the BLAST version requirement is hard coded to greater than 2.7.1
(setup-tools.R:118). I have 4 issues with this:

  • Hard-coding of values is generally bad.

  • BLAST version 2.7.1 is pretty restrictive. For example, the Ubuntu package
    manager provides BLAST version 2.2.28. Many researchers work in paranoid
    agencies that don't give us admin privileges, so installing the latest and
    greatest is often a pain. Anyway, unless your algorithm depends on new
    features, you should relax the version requirement.

  • The code for testing version is brittle. The version vector is parsed into a 3-tuple (e.g. 2.7.1 ==> c(2,7,1)), then tested as below

  # setup-tools.R:118
  tst <- vrsn[1] >= 2 & vrsn[2] >= 7 & vrsn[3] >= 1

However, this would lead to versions such as 2.8.0 being rejected. You can
fix this by dropping the last comparison:

  # setup-tools.R:118
  tst <- vrsn[1] >= 2 & vrsn[2] >= 7

Also, in DESCRIPTION, the SystemRequirements specifies a BLAST version of >2.7.1, but your actual version checking function requires >=2.7.1.

  1. Fix the warnings and notes in check()

When I run devtools::check() I get 5 warnings, and 3 notes. The issues all look pretty normal (easy to fix). I don't see anything too serious here. Below I have outlined the main issues and how to fix them (I might have missed one or two).

  • Remove the warnings_are_errors: false clause from your .travis.yml
    file. CRAN will generally not accept a package if any warnings are raised by
    CHECK. So you probably should not ignore warnings in your continuous
    integration.

  • DESCRIPTION says that your license is GPL-2, but the actual LICENSE file is
    MIT. You need to choose one of them. devtools makes license handling pretty
    easy. Just delete your current LICENSE file and the License entry in your
    DESCRIPTION file, and then call either devtools::use_mit_license() or
    devtools::use_gpl3_license(). This will create the license and update the
    DESCRIPTION accordingly.

  • You need to add ggplot2 and plyr to your Imports list in DESCRIPTION

  • You need to add the statements:

#' @importFrom methods is new
#' @importFrom stats median 
#' @importFrom utils heads packageVersion read.table tail write.table

Somewhere in your documentation (I usually put these lines in the same place
where I put package level documentation).

  • Describe each parameter in the documentation

Cheers!

@DomBennett
Copy link
Author

DomBennett commented Feb 13, 2018

Thanks @arendsee for the detailed and early feedback. Dealing with your specific issues:

  1. nTaxa
    The functions and classes associated with the clusters object are still a little open-ended at this stage, hence why their names may change. The equivalent to nTaxa is now getClNTx - get n taxa by cluster. We're still trying to determine the sorts of ways users will want to interrogate the clusters, and the best sort of language to use when naming functions.

I've updated the vignette. But the code is likely to change in the near future and will need changed again.

  1. BLAST
    OK. Thanks for the tips. Fixed and updated. Now using v 2.0 and up. How would you test the versioning without hard coding the values?

  2. Check warnings

  • travis warnings - removed from yaml
  • licensing updated to MIT
  • ggplot2 and plyr - added
  • statements added to phylotaR.R
  • parameters are being worked and should be updated soon!

@sckott
Copy link
Contributor

sckott commented Feb 13, 2018

Both reviewers now assigned: @arendsee @naupaka
Due date: 2018-03-12

@DomBennett
Copy link
Author

DomBennett commented Feb 14, 2018

@arendsee @naupaka Thanks for taking time to review the package.

@arendsee
Copy link

arendsee commented Feb 14, 2018

OK. Thanks for the tips. Fixed and updated. Now using v 2.0 and up. How would you test the versioning without hard coding the values?

Hmm, good point. @sckott Any thoughts on best practices for confirming the version of an outside tool (BLAST, in this case)?

I can tell you what I do, though I am not entirely happy with my approach. I simply don't check the version of my tools, instead I just print the version output into the log (e.g. system call to blastp -version).

If you want to check the version, you could write a dedicated function to extract a version string from a version statement (maybe get_version or something), and then write a dedicated version comparison function (maybe cmp_version).

I don't really have a good general solution. But in your case, I would recommend not checking the version. There are differences between the BLAST versions, but I don't think any of the BLAST+ versions anyone will actually use (i.e. >=2.2) will change the results. Perhaps it would be worthwhile to test several versions of BLAST on Travis.

Thoughts? @sckott @naupaka?

@sckott
Copy link
Contributor

sckott commented Feb 14, 2018

I don't know off hand about best practices for checking versions of cli tools. Thoughts @jeroen @richfitz ?

Perhaps it would be worthwhile to test several versions of BLAST on Travis.

this seems like a really good idea - and then any changes that affect certain BLAST versions and not others will become immediately apparent

@naupaka
Copy link
Member

naupaka commented Feb 14, 2018

Insofar as I can recall, the big breaking changes happened during the switch from BLAST to BLAST+ that happened ~6 years ago. So as long as the version is newer than that it should be ok. Some of the other NCBI software like sra-tools need to have more recent versions than are currently available in some apt repositories, because of changes in NCBI's servers, but I haven't seen problems with BLAST itself.

It seems (in my personal opinion) that trying to pull the source for BLAST into R is a bad idea, because then it's a ton of extra maintenance to make sure that stays up to date as things change on NCBI's side. Most folks who would use this package should probably be able to install a modern copy of ncbi-blast, so I would argue for spending effort on increasing the normal codebase and adding documentation, etc, instead of trying to pull BLAST in to remove the dependency.

@arendsee
Copy link

arendsee commented Feb 14, 2018

@naupaka I agree on pulling BLAST into R being a bad idea. Unless NCBI was interested in doing the work and maintaining the code.

As for the legacy BLAST versus BLAST+, the names of the executable have all changed. For example, blastp is not defined in legacy, they used blastall with some protein-vs-protein argument, I think. So if someone tried to use legacy BLAST the commands required simply would not be found.

@DomBennett
Copy link
Author

DomBennett commented Mar 27, 2018

Thanks very much @arendsee and @naupaka for your detailed reviews!

I feel I may have already addressed some of the run issues raised by @naupaka, as I encountered some of them over the past few weeks myself. The hanging is particularly annoying and seems to be due to Entrez, I've added a timeout so that it will retry after one hour. Also, I wonder if the BLAST errors might be due to using Windows, I've not tested the pipeline at all on that OS. Lots of things to be getting on with!

In terms of next steps @sckott , do I just update the package and report what I did here?

@sckott
Copy link
Contributor

sckott commented Mar 27, 2018

@DomBennett So we don't have a rule on how exactly to respond to reviewers. This is how it's best approached though I think:

Open issues in your repo that address issues raised here - Then when you reply to reviewers you can link to those - That's what I do when I bring my own pkgs through review, and I use a label or wording in the issue title to indicate its from the review, e.g. https://github.com/ropensci/charlatan/issues?q=is%3Aissue+is%3Aopen+label%3Areview What's good about opening issues is you can reference those in your commits so that everything is all linked up. It's likely that you'll want to ask for additional feedback from reviewers - so being able to link to an issue that is linked to the code itself is a nice approach

@DomBennett
Copy link
Author

DomBennett commented Apr 4, 2018

Hi @sckott,

I've copied your format and created issues. Also, FYI we have now submitted a manuscript describing phylotaR to MDPI Life, preprint available here.

Is there a deadline for responding to the reviewers? I was thinking of taking a break and coming back to phylotaR with a little more distance.

@sckott
Copy link
Contributor

sckott commented Apr 5, 2018

Not really a deadline. If you'll put it off for a while, we have a holding label that we'll use, and then we'll know to not ask you about progress until you're ready to come back to it

@sckott sckott added the holding label Apr 6, 2018
@DomBennett
Copy link
Author

DomBennett commented Jun 16, 2018

Hi @sckott, @arendsee and @naupaka,

I’ve made the following major changes as per your comments and reviews:

  • Restyled: All functions and variables are snake case with the exception of the S4 classes that remain camelcase. I’ve also updated the names to make things a little more explicit (e.g. ClstrRec rather than ClRec).
  • Tested: I now have 89% coverage
  • Examples: All ‘public’, exported functions and data have examples.
  • Pkgdown: I have created a website.
  • Documentation: all functions have return parameters, I have checked spelling with spell_check.
  • Timings: The demonstration phylotaR runs now have the timings of each stage on the README.md
  • Contributing: There is now a CONTRIBUTING.md
  • Logging: There is now more salient information printed to screen, blast versions and session info are also recorded in a user’s wd.
  • Windows: The pipeline has been run on a windows machine
  • Makeblastdb error: I think this was due to a ‘~’ in @naupaka’s filepath. I now halt the pipeline if a user’s does this.

Things I have not done:

  • Additional packages: I have not incorporated taxize, taxizeDB, tm or Biostrings into the package. In the case of the latter, I feel it best to avoid Bioconductor packages as they are a little more tricky to install and I would like to keep phylotaR light, even though it requires BLAST+. In the case of the others, I see no immediate reason to make use of these additional packages while I currently have no need for their additional capabilities. I have fixed the 80 column width issue of the user generated .fasta output. In the future, as new features are requested (e.g. alternative taxonomies), then they should be incorporated. Right now though, I feel I should follow the doctrine of if it ain’t broke, why fix it?
  • Advanced vignette: I have not added this as a lot of the additional information can be found in the recently published article on the pipeline. In the future I can imagine a new vignette containing more detailed information on new features with more complex options. This is likely to occur since I have been contacted a fair number of times with requests for new options and features.

If any of you feel that these last two points should be acted upon in some form, then I will be happy to look at these again.

Many thanks once more for taking the time to review the package!

Best wishes,
Dom

@DomBennett
Copy link
Author

DomBennett commented Jun 16, 2018

Oh also... I should add that I cannot rename the first stage 'taxise' as it is already published (I interpret taxise to just mean grab taxonomic information or place in a taxonomic context.). To compromise I have updated the stage diagram in the README to be a more explicit.

@naupaka
Copy link
Member

naupaka commented Jun 16, 2018

Sounds great @DomBennett!

@sckott shall we have another go through of it with the new version? What's the ROpenSci SOP on review after revisions?

@sckott
Copy link
Contributor

sckott commented Jun 20, 2018

thanks @DomBennett !

@naupaka @arendsee

If you're completely happy with changes to the package and responses from the maintainer then just say so. If not, please do comment here with any additional comments/suggestions/etc.

@arendsee
Copy link

arendsee commented Jun 20, 2018

@DomBennett @sckott Looks good to me.

I would recommend eventually refactoring with taxize et al, but I wouldn't consider it a necessity.

Good job!

@sckott
Copy link
Contributor

sckott commented Jun 20, 2018

@naupaka do you want to make further comments, or are you happy?

@naupaka
Copy link
Member

naupaka commented Jun 20, 2018

I'd like to put the new version through its paces and haven't had the time to sit down and do it. Will get back to you in the next couple days if that's ok?

@sckott
Copy link
Contributor

sckott commented Jun 20, 2018

@naupaka sounds good

@naupaka
Copy link
Member

naupaka commented Jul 8, 2018

@DomBennett Just ran through a couple testing workflows and everything works smoothly for me! Thanks for this very useful tool.

@naupaka
Copy link
Member

naupaka commented Jul 8, 2018

@sckott I'm happy to give this my stamp of approval 👍

@sckott
Copy link
Contributor

sckott commented Jul 9, 2018

Great, thanks again @arendsee and @naupaka for your reviews!

@sckott
Copy link
Contributor

sckott commented Jul 9, 2018

@DomBennett we're nearly there. just a few comments:

On running the example from the vignette, I get an error:

---------------------------------------------------
Running pipeline on [unix] at [2018-07-09 10:22:31]
---------------------------------------------------
Running stages: taxise, download, cluster, cluster2
--------------------------------------------
Starting stage TAXISE: [2018-07-09 10:22:31]
--------------------------------------------
Searching taxonomic IDs ...
Downloading taxonomic records ...
. [1-21]
Generating taxonomic dictionary ...
---------------------------------------------
Completed stage TAXISE: [2018-07-09 10:22:33]
---------------------------------------------
----------------------------------------------
Starting stage DOWNLOAD: [2018-07-09 10:22:33]
----------------------------------------------
Identifying suitable clades ...
Identified [1] suitable clades.
Downloading hierarchically ...
Working on parent [id 9504]: [1/1] ...
. + whole subtree ...
. . Getting [2700 sqs] ...
. . . [1-300]
Unexpected Error : NCBI is limiting the size of your request. Consider reducing btchsz.


Occurred [2018-07-09 10:22:58]
Contact package maintainer for help.
Error in stages_run(wd = wd, frm = 1, to = nstages, stgs_msg = stgs_msg) :
  Unexpected Error : NCBI is limiting the size of your request. Consider reducing btchsz.


Occurred [2018-07-09 10:22:58]
Contact package maintainer for help.

It looks like I need to fiddle with btchsz but I'm not sure where to do that. Would it make sense to say what function has that parameter? It looks like it's the function parameters()?

running goodpractice again: i'd encourage you to avoid sapply where possible and avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) type expressions where possible, but not there aren't required before approval.

@DomBennett
Copy link
Author

DomBennett commented Jul 9, 2018

Thanks again to @arendsee and @naupaka for their helpful reviews! So nice to see this package improve with all your helpful suggestions and feedback.

@sckott, as per your points ....

The default btchsz is 300 records per request. I find this is OK initially but if you use Entrez a lot they seem to impose a limit, changing it to 100 seems to always work. I will make 100 the default instead.

To change parameters you can either delete the folder and set up the folder again with different parameter values using the setup(). Or you can use parameters_reset() to change an already set-up folder. I will add a section to the vignette on editing the parameters and point users to the appropriate function in the error message.

Will run goodpractice myself again and avoid sapply and the 1: stuff.

After I've made those changes I will ping you again here.

@DomBennett
Copy link
Author

DomBennett commented Jul 12, 2018

@sckott changes made!

@sckott
Copy link
Contributor

sckott commented Jul 12, 2018

Approved! Thanks again for your submission @DomBennett

  • Please transfer the package to ropensci- I've invited you to a team within our ropensci organization. After you accept you should be able to move it. You'll be made admin once you do.
  • Make sure to
    • add rOpenSci footer to README
      [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
    • Change any needed links, such those for CI badges
    • Travis CI should update to the new location automatically - you may have to update other CI systems manually
  • In a new .github folder in the root of the pkg, add a CONTRIBUTING.md file, an issue template file, and a PR template file, see https://github.com/ropensci/dotgithubfiles/ for examples, you can copy them directly and edit as you wish
  • We're starting to roll out software metadata files to all ropensci pkgs via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your pkg, after installing the pkg - should be easy as running codemetar::write_codemeta() in the root of your pkg

@DomBennett
Copy link
Author

DomBennett commented Jul 13, 2018

Done!
https://github.com/ropensci/phylotaR

@sckott
Copy link
Contributor

sckott commented Jul 13, 2018

@DomBennett and the question about a blog post?

@stefaniebutland
Copy link
Member

stefaniebutland commented Jul 18, 2018

@DomBennett this link will give you many examples of blog posts by authors of onboarded packages so you can get an idea of the style and length: https://ropensci.org/tags/review/. Technotes are here: https://ropensci.org/technotes/.

Here are some technical and editorial guidelines for contributing a blog post: https://github.com/ropensci/roweb2#contributing-a-blog-post.

Happy to answer any questions.

@sckott sckott closed this as completed Jul 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants