Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

submission - treestartr #239

Closed
12 of 18 tasks
wrightaprilm opened this issue Jul 23, 2018 · 73 comments
Closed
12 of 18 tasks

submission - treestartr #239

wrightaprilm opened this issue Jul 23, 2018 · 73 comments

Comments

@wrightaprilm
Copy link

Summary

  • What does this package do? (explain in 50 words or less):
    This package helps users to build a reasonable starting phylogeny for phylogenetic analysis.
    In particular, this package allows researchers to add tips to a tree through either a graphical
    user interface, an automated system based on taxonomy, or by inputting a dataframe of taxa to
    be placed.

  • Paste the full DESCRIPTION file inside a code block below:

Package: treestartr
Type: Package
Title: Generate Starting Trees For Combined Molecular, Morphological and Stratigraphic Data
Version: 0.1.0
Authors@R: person("April", "Wright", role = c("aut", "cre"),
	email = "april.wright@selu.edu")
Description: Combine a list of taxa with a phylogeny to generate a starting tree for use in
    total evidence dating analyses.
URL: https://wrightaprilm.github.io/treeStartR/
BugReports: https://github.com/wrightaprilm/treeStartR/issues
License: MIT + file LICENSE
Depends:
	R (>= 3.0.0)
Imports:
	phytools (>= 0.6-4),
	ape
Suggests:
    knitr,
    testthat,
    covr,
    utils
VignetteBuilder: utils
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1

  • URL for the package (the development repository, not a stylized html page):
    https://github.com/wrightaprilm/treeStartR

  • Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

[e.g., "data extraction, because the package parses a scientific data file format"]

data munging, combining data sources for use with phylogenetic analysis programs (in particular RevBayes & Mr. Bayes)

  • Who is the target audience and what are scientific applications of this package?

As the complexity of phylogenetic models increases, it can be difficult to find starting phylogenetic trees for Bayesian MCMC that have computable likelihoods. Most programs start from a random addition tree, which, for large datasets, may not produce reasonable likelihoods. Most software allows users to input a starting tree. The starting tree is often gleaned from the literature. However, for people performing "total evidence" or "joint" phylogenetic analysis, there may not be a published tree with all the relevant taxa. This package allows users to easily create one. Fossil record taxa are commonly left off phylogenies, so I expect biologists and paleontologists working with fossils to be the chief users of this package.

Not to my knowledge.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Emailed @sckott to discuss how to know if the package is ready, and if reviewers can help with a WARNING, but did not discuss specific scope questions.

Requirements

Confirm each of the following by checking the box. This package:

  • [x ] does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
    • The package is novel and will be of interest to the broad readership of the journal.
    • The manuscript describing the package is no longer than 3000 words.
    • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
    • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no gaurantee that your manuscript willl be within MEE scope.)
    • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
    • (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
    I have two, and I'm unsure what they mean:

Undocumented code objects:
‘absent_list’ ‘ant_mrca_df’ ‘ant_taxa’ ‘ant_tree’ ‘mm’ ‘mol’
‘mrca_df’ ‘new_tree’ ‘tax_file’ ‘tax_list’ ‘tree’
Files in the 'vignettes' directory but no files in 'inst/doc':
‘vignette.Rmd’, ‘vignette.pdf’

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Klaus Schliep - KlausVigo
David Bapst - dwbapst
Rachel Warnock - rachelwarnock

@sckott
Copy link
Contributor

sckott commented Jul 27, 2018

thanks for your submission @wrightaprilm !

Doing editor checks right now. On trying to install I get

treeStartR git:(master) Rscript -e 'library(devtools); document(); install()'
Updating treestartr documentation
Loading treestartr
Error: 'ants' is not an exported object from 'namespace:treestartr'

can you fix? then i'll continue with editor checks

@wrightaprilm
Copy link
Author

I misunderstood some stuff about data packaging. But I think it's working now!

@wrightaprilm
Copy link
Author

Quick question - a user noted a (both run time and unit) test I should add for their use case, and I have added it. Should I push that now, or wait?

@sckott
Copy link
Contributor

sckott commented Jul 31, 2018

Yes, go ahead! Let me know when that's pushed

@wrightaprilm
Copy link
Author

Alright, pushed.
Thanks!

@sckott
Copy link
Contributor

sckott commented Aug 2, 2018

thanks!, will have another look

@sckott
Copy link
Contributor

sckott commented Aug 3, 2018

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for your submission @wrightaprilm !

Here's the output from goodpractice. If you haven't used goodpractice it's an R package that checks a number of things with another package - most of which we agree with and want authors to follow. You don't need to address these now, it's info for reviewers to use to get started.

── GP treestartr ─────────
It is good practice towrite unit tests for all functions, and all package code in general. 76% of code lines are covered by test cases.

    R/absent_tippR.R:15:NA
    R/absent_tippR.R:16:NA
    R/absent_tippR.R:19:NA
    R/absent_tippR.R:21:NA
    R/absent_tippR.R:22:NA
    ... and 28 more linesavoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines
    shorter than 80 characters

    R/taxon_testr.R:11:1
    R/tree_df.R:20:1avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead.

    R/make_absent_df.R:14:23
    R/tree_df.R:24:21not import packages as a whole, as this can cause name clashes between the imported packages. Instead, import only the specific functions you need.fix this R CMD check NOTE: absent_tippr: no visible global function definition forplotdataf_parsr: no visible global function definition forread.csvtest_dataf: no visible global function definition forassert_thatUndefined global functions or variables: assert_that plot read.csv Consider
    adding importFrom("graphics", "plot") importFrom("utils", "read.csv") to your NAMESPACE file.fix this R CMD check WARNING: Undocumented code objects:absent_list’ ‘absent_tippr’ ‘dataf_parsr’ ‘genera_strippr’ ‘mrca_df’ ‘present_tippr’
    ‘rand_absent_tippr’ ‘text_placr’ ‘treeUndocumented data sets:absent_list’ ‘mrca_df’ ‘treeAll user-level objects in a package should have documentation
    entries. See chapterWriting R documentation filesin theWriting R Extensionsmanual.fix this R CMD check WARNING: Data with usage in documentation object 'ants' but not in code: ants

Seeking reviewers now 🕐


Reviewers:

@sckott
Copy link
Contributor

sckott commented Aug 10, 2018

Reviewers:

@sckott sckott added the pub:mee label Aug 16, 2018
@sckott
Copy link
Contributor

sckott commented Sep 12, 2018

@dwbapst @rachelwarnock Will you be able to get your review in soon? We're about 2 weeks past due date. Thanks!

@dwbapst
Copy link

dwbapst commented Sep 12, 2018 via email

@sckott
Copy link
Contributor

sckott commented Sep 12, 2018

Can you send that email again to myrmecocystus@gmail.com

@dwbapst
Copy link

dwbapst commented Sep 13, 2018 via email

@rachelwarnock
Copy link

rachelwarnock commented Sep 19, 2018 via email

@dwbapst
Copy link

dwbapst commented Sep 19, 2018

Hi, I feel the same as Rachel. I'm new to rOpenSci reviewing and it just came at a bad time when I was teaching two graduate classes. In order to force myself to work on this, I'm going to work the draft live here in the issue thread.

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
    • Before getting the request, Ih ad already read the manuscript for this software, and promised I'd eventually look at the software, so this is what I was going to do anyway.

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README

    • yeah this exists
  • Installation instructions: for the development version of package and any non-standard dependencies in README

  • Vignette(s) demonstrating major functionality that runs successfully locally

    • the vignette is basically identical to the top-level readme, which does seem to be demonstrating major functionality, yes
  • Function Documentation: for all exported functions in R help
    - yes... dunno why you're asking this as R CHECK would complain about this if it weren't true

  • [?] Examples for all exported functions in R Help that run successfully locally

    • Some are not run, e.g. absent_tippr.
  • [?] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
    -Most of this is here, but I don't see anything on contributing to the project - I don't usually include a section on that in packages myself, so I am not sure what ropensci wants here.

Functionality

  • Installation: Installation succeeds as documented.
    -Yeah, it installed. I would point out though that R, silly enough, is a case-sensitive language, and so although the package is referred to at times as 'treeStartR', it can only be installed or loaded into the library as 'treestartr'. Please correct all case uses. (I realize that git, annoyingly, isn't case sensitive, so nothing can be done about the repo name.)

  • Functionality: Any functional claims of the software been confirmed.
    -I would say the software ran on the provided datasets, but more noisly (with more random messages, warnings, etc) than are needed and to the detriment of the software.

  • Performance: Any performance claims of the software been confirmed.
    - Not sure this package is making any performance claim, other than its easier than constructing this sort of a data structure by hand. (Which I've done. Yuck.)

  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
    --There are tests, and they are reasonable tests, but they fail on my local machine.

tools::test("d://dave//workspace//treeStartR")
Loading treestartr
Testing treestartr
√ | OK F W S | Context
Error in x[[method]](...) : attempt to apply non-function

== Results ===========================================================================
OK:       0
Failed:   4
Warnings: 1
Skipped:  0

Actually... I am beginning to wonder if there is something broken with my local installation of testthat. Hmm. Maybe disregard this comment for now if you can't reproduce it.

Anyway, this led me to running a local R CHECK --as-CRAN, where, oddly, the tests pass. It did produce some warnings (only one of which Scott notes above, that I'd like to comment on how to fix, as you indicate you would like to publish this on CRAN.

* checking top-level files ... NOTE
Non-standard files/directories found at top level:
  'README.Rmd' 'README.html' 'bibliography.bib' 'deprecated' 'docs'
  'images' 'paper' 'refs.bib'

These should be added to your .Rbuildignore file.

* checking R code for possible problems ... NOTE
absent_tippr: no visible global function definition for 'plot'
dataf_parsr: no visible global function definition for 'read.csv'
test_dataf: no visible global function definition for 'assert_that'
Undefined global functions or variables:
  assert_that plot read.csv
Consider adding
  importFrom("graphics", "plot")
  importFrom("utils", "read.csv")
to your NAMESPACE file.

CRAN policy increasingly wants us, when we use functions readily available in packages included in a base installation (but not in the base package) to directly list these as imports, and indicate that dependency in our DESCRIPTION file. Without looking though, I think the plot it is referring to here is probably ape's plot.phylo for plotting a phylogeny. I think best practice is that you should indicate this is ape::plot.phylo when you use that function, even in examples (I haven't done this, but am trying to).

* checking for missing documentation entries ... WARNING
Undocumented code objects:
  'absent_list' 'mrca_df' 'tree'
Undocumented data sets:
  'absent_list' 'mrca_df' 'tree'
All user-level objects in a package should have documentation entries.
See chapter 'Writing R documentation files' in the 'Writing R
Extensions' manual.

Obviously these need to be covered. I think they are objects in the bears dataset - all objects within data files need to be listed as aliases, e.g. in roxygen2 you can do something like #' @aliases RaiaCopesRule ceratopsianTreeRaia cervidTreeRaia to indicate all the objects and their associated help file.

* checking for code/documentation mismatches ... WARNING
Data with usage in documentation object 'ants' but not in code:
  ants

You need an example that makes use of the ant data - I don't usually get this warning, but my interpretation is that CRAN doesn't like packages with example datasets that aren't used - there was an issue a few years ago of people using CRAN as basically a repository for datasets for classes/textbooks, which isn't its intended use (and meant very large packages, size-wise, with almost no code base).

  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
    - @sckott it would be great if there was a succinct list somewhere for this. Or just add it to this template, which is pretty great already! This is really silly, if rOpenSci has a great big list of 'packaging guidelines' then it should be part of this checklist, not its own, lone item.
    - ANYWAY, it seems to be a great big list, the first few being if the package and function names are okay. I guess so. I'm not certain what I'd think a priori what a package named treeStartR was, but I think the functionality in this package here is a reasonable association.
    - You don't seem to have a NEWS or CHANGES or CHANGELOG file.
    - 🤷

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:


Review Comments

Review Suggestions from rOpenSci:
    Does the code comply with general principles in the Mozilla reviewing guide?
    Does the package comply with the rOpenSci packaging guide?
    Are there improvements that could be made to the code style?
    Is there code duplication in the package that should be reduced?
    Are there user interface improvements that could be made?
    Are there performance improvements that could be made?
    Is the documentation (installation instructions/vignettes/examples/demos) clear and sufficient?
    If you have your own relevant data/problem, work through it with the package. You may find rough edges and use-cases the author didn’t think about.
    Checking the package’s logs on its continuous integration services (Travis-CI, Codecov, etc.)
    Running devtools::check() and devtools::test() on the package to find any errors that may be missed on the author’s system.
    Using the covr package to examine the extent of test coverage.
    Using the goodpractice package (goodpractice::gp()) to identify likely sources of errors and style issues. Most exceptions will need to be justified by the author in the particular context of their package.
    Using spelling::spell_check_package() (and spelling::spell_check_files("README.Rmd")) to find spelling errors.

So here's some random comments from playing with the package.

First, I strongly making a manual page named treeStartR or with that alias so that users who cannot remember a function name but can remember the package name can quickly access the documentation. This page also serves as a good starting point for users interested in the package, and you could put a lot of information from your readme there, such as directing them to the vignette.

absent_list <- genera_strippr(tree, bears)

This function expects a data.frame and not any other sort of object - you should be explicit about this in the help file. It also expets that the variable containing the taxon names will be labeled taxon and nothing else. This should also be laid out, possibly even allow users to vary it by adding an argument like `taxonNamesVariable = "taxon"' so the default behavior stays the same.

The manual page for this function refers to 'morphological & stratigraphic genera'. Its not clear to what the distinction is between these.

new_tree <- present_tippr(tree, absent_list)

The function present_tippr is very noisy. Why is the message Tree tip names formatted correctly repeated so many times? The message is not very informative to the user. The same issue occurs with function absent_tippr.

More messages:

Adding tips with congeners on tree:Indarctos_punjabiensis
Adding tip via MRCA at 25
Adding tips with congeners on tree:Ursus_abstrusus
Adding tip via MRCA at 30

What are these number codes? Are they node numbers 25 and 30? I'd clarify the message to the console, or clarify the documentation, or both.

Another message, this time a warning:

Warning message:
 In bind.tree(tree, tip, where = where, position = pp) :
  one tree has no branch lengths, they have been ignored

I would suggest suppressing this expected, non-informative warning from bind.tree by masking it with function suppressWarnings. This also happens with functions absent_tippr and text_placr.

More useful information to report to the user, perhaps, would be the number of OTUs added and the number of OTUs not added, perhaps also giving user a taxon list of the still-absent OTUs, to use with the other taxon-placing methods.

Also, if only a single congener is on the tree, the the species gets added, but to the root node:

tree2<-drop.tip(tree,"Indarctos_vireti")
new_tree <- present_tippr(tree2, absent_list)

We can compare the two trees with:

layout(1:2)
plot(tree);plot(new_tree)

I don't this this behavior is desired, and even if it is (sometimes) desired in certain use-cases, it should be described in the manual and perhaps toggle-able with an argument.

new_tree <- absent_tippr(tree, absent_list)

It is not immediately clear when you use this function that the prompt is asking for me to type in the number of the node, rather than interactively click on the plot (as in some ape and phytools functions). You may want to clarify the prompt to users so they know they need to enter the number.

new_tree <- rand_absent_tippr(tree, absent_list)

Where would you like to place the tip?Zaragocyon_daamsi
Error in if (where <= length(tree$tip.label) && position == 0) { : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In bind.tree(tree, tip, where = where, position = pp) :
  one tree has no branch lengths, they have been ignored
2: In absent_tippr(tree, absent_list) : NAs introduced by coercion

In the above code example, apparently you cannot make two species sister to each other, You can only place them in a polytomy together with a third taxon. This seems like it might be an unintentional design oversight.

If only a single relative is listed with text_placr, you get something like

new_tree <- text_placr(tree, mrca_df)
 Placing tip Kretzoiarctos_beatrix
 via relatives Indarctos_arctoides
 at node 

...And then the taxon gets added to the root node, bizarrely. Nothing in the documentation about more than one relative needing to be specified, or of what the algorithm's behavior is when only a single relative is specified. Again, this means you cannot make add a taxon as sister to another taxon, and it seems undesirable for users.

I tried generating an example dataset to see what happened.

library(paleotree)

set.seed(444)
# generate Genus Names
genNames<-paste0("R",
	apply(combn(c("a","b","c","d","e"),2),
		2,paste0,collapse="")
	)
# generate species names
specNames<-apply(combn(c("a","b","c","d"),3),
		2,paste0,collapse="")
#
#combine
extraSpec<-as.vector(sapply(genNames,function(x) 
	paste0(x,"_",specNames)))
tipSpec<-paste0(genNames,"_regularis")
#
#lets make our hypothetical simulated tree of higher taxa
singleSpeciesTree <- ape::rtree(10, tip.label=tipSpec)
#
# format extraSpec correctly
taxonDF<-data.frame(taxon=extraSpec)
#
library(treestartr)
absentList <- genera_strippr(singleSpeciesTree , taxonDF)

In the above, genera_strippr seems to work, although I note that it prints the entire absentList to the console, which is overly noisy.

@sckott
Copy link
Contributor

sckott commented Sep 19, 2018

thanks @dwbapst

@sckott
Copy link
Contributor

sckott commented Sep 19, 2018

@rachelwarnock okay, if it will be longer than that please let me know so I can get someone else

@dwbapst
Copy link

dwbapst commented Sep 20, 2018

Okay @sckott @wrightaprilm I think my review is as complete as its going to be for now.

@wrightaprilm
Copy link
Author

Thanks for that, @dwbapst. Really helpful stuff here (and thankfully all addressable)! @sckott, presumably I wait for @rwarnock, in case any of her review conflicts with Dave's. But after her's comes in, it looks like in the review guidelines, I can just start work on revision without waiting on an "Editor decision" in the way of a more traditional journal?

[If I'm wrong about the above, I'm certainly happy to contribute different wording to policies.html]

@sckott
Copy link
Contributor

sckott commented Sep 21, 2018

@wrightaprilm correct, let's wait for @rachelwarnock , then do revisions. correct, you don't need to wait for an editor decision

@rachelwarnock
Copy link

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work.

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
    • it doesn't say how to contribute but I'm not sure this is necessary

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package.
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
    • I'm not sure the tests do run; see my comment towards the end about running devtools::check
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:


Review Comments

Comments based on the README

Love the logo!

The main functionality of the package could be stated earlier on in the README. Its not until the end of third paragraph that you learn the main purpose of the package. You could include a sentence at the beginning, saying something like, "This package produces starting trees incorporating fossils for phylogenetic analysis", before jumping into your nice intro. Or the title of the README could be "treeStartR: generating starting trees for phylogenetic analysis".

The package depends on the package phytools, but since you use the phytools:: and ape:: function calling format (which is great programming practice) you don't have to load the phytools library on start up for the treestartr functions to work (I tested to make sure).

Its useful to have the function names within text using the markdown coding format, i.e. dataf_parser instead of dataf_parser. That way, the functions are highlighted if you want to quickly scan the README to locate info.

I don't think you need to use data(bears) since this dataset is already loaded into the environment when you load the package. data(ants) returns "data set 'ants' not found", although this dataset has a help page.

With the bears dataset, the absent_list, mrca_df and tree components of the data are available, but not tax_list. Instead, the info that should be represented by this variable appears to represented by the bears variable. i.e. the vector output by genera_strippr(tree, bears) matches absent_list. This doesn't seem intentional.

With new_tree <- present_tippr(tree, absent_list), I don't really follow the screen output from this command. Why is "Tree tip names formatted correctly" printed to screen so many times (= 33)? length(absent_list) + length(tree$tip.label) = 22 only. With a very large dataset you could end up generating a lot of unneccessary screen output. Why not just print "all tip names formatted correctly" once after checking? or just return incorrectly formatted names. This also applies to other functions that generate this output (absent_tippr, rand_absent_tippr, get_lost, get_found).

Unclear what the source of this warning message means:
Warning message:
In bind.tree(tree, tip, where = where, position = pp) :
one tree has no branch lengths, they have been ignored

I suppose this has something to do with the fact that the functions are returning trees with no branch lengths. This suggests that the fossil ages are not being taken into account. Will this not be important for generating starting trees for fbd/tip-dating analyses?

The manual option is tricky for trees even as large as the example dataset. For example, on my small laptop screen, some of the branching events are too close for me to be able to distinguish the node labels (i.e. many are plotted on top of each other). Not sure if there's anything you can do about this though.

Where it says TSV file, do you mean CSV? or CSV/TSV

Perhaps it would be useful to have an example, where multiple options are used to add taxa to the tree.

Comments based on the help pages

bears - tax\_list not available
ants - has a help page but I can't seem to load the data

absent_tippr - the documentation say tips are added "via user input" but its not clear that this is function is interactive.

dataf_parsr - The function description mentions morphological and molecular data but I think this function is designed to read the file containing the stratigraphic info only.
Maybe include a url to the relevant RB tutorial? Where it describes the function argument dataf, I think this should be "file name" not "data frame" (which is the output). In addition, because the functions don't return time scaled trees, I'm sure why the fossil ages are required.

genera_strippr - no issues here.

get_found, get_lost - The distinction between these functions is a bit confusing. Could the dataframe output by this function have more useful column headers? Or perhaps these could be private internal functions.

make_absentdf - Perhaps this could also be a private internal function.

present_tippr - I would add the more comprehensive description of this function from the README here.

rand_absent_tippr - no issues here.

text_placr - maybe some additional clarification would be useful here. Am I right in thinking the clade column really refers to a branch/tip? And that the function then builds a clade from the tips (e.g. Ursus_arctos, Ursus_spelaeus, Ursus_americanus) from which it works out where to place the taxon (e.g. Ursus_abstrusus). At first when I looked at mrca_df I wasn't clear why there were multiple entries per taxon in the table.

I think it would be useful to have more complete examples in the documentation. e.g. for rand_absent_tippr.

data(bears)
absent_list <- genera_strippr(tree, tax_list)
new_tree <- rand_absent_tippr(tree, absent_list)
write.nexus(new_tree)

That way, if the user wanted to use this function, they would know which other function help pages to refer to get all the info necessary to generate input.

On a related note, I initially found the relationship between the package functions hard to get my head around, while just scrolling through the help pages. I produced the following for reference.

dataf_parsr (load a total list of taxa) ->
genera_strippr (find out which taxa are already on the tree) -> 
absent_tippr (add tips manually) / 
present_tippr (add tips with congeners)/ 
rand_absent_tippr (add tips randomly) / 
text_placr (add tips using csv) -> 
write.nexus

Perhaps you could consider including a similar flow chart or table in the README. Definitely not necessary, esp. if you included more start-to-finish examples, so its just a thought!

Comments based on the code

My comments on the code are extremely minor. The functions all appear to be concise and the more complicated functions are nicely commented. I can't see any obvious room for improvement in terms of efficiency or redundancy, apart from one small thing noted below.

I'm not sure each function needs to be in a separate file (i.e. related functions could potentially live in the same file) but this is down to programmer's preference and since the package isn't massive it doesn't really matter.

genera_strippr - why does this function print and return absent?

Other comments

Note that running devtools::check() produces 1 error, 4 warnings and 3 notes. Might be good to double check these issues. devtools::spell_check() identified one misspelling of phylogeny.

I simulated a dataset to explore the functions a bit more. Everything seems to work well but I noted the following:

  • it seems to be the case that the fossil ages are not needed in dataf (i.e. the age column can be absent and the functions still work).
  • when the fossil ages are included, I can't see that the functions use this info.
  • I couldn't reproduce the errors observed with the bears dataset. i.e. the functions output trees with branch lengths and don't return a warning about this.
  • I just noticed the functions return non-binary trees. Unsure if this is a problem.

If the functions could return time scaled starting trees that respect the fossil ages I think this would be extremely useful.

If I had to pick a favourite function it would be text_placr. This is super useful :)

Note, I didn't look at Dave's review before writing this, but I can see some overlap. Hope this helps!

@wrightaprilm
Copy link
Author

I think I may have done it - all NOTEs, ERRORs and WARNINGs silenced, and build is passing.

Now what?

@sckott
Copy link
Contributor

sckott commented Oct 24, 2018

Nice work @wrightaprilm I see a few more things:

  • add these to your .Rbuildignore file: README.Rmd, README.html, bibliography.bib, deprecated, docs, images, paper, refs.bib
  • There are some undocumented objects still:
checking for missing documentation entries (1s)
   Undocumented code objects:
     ‘absent_tippr’ ‘genera_strippr’ ‘present_tippr

There's some remaining issues from my comment above #239 (comment) Did you see it yet?

@wrightaprilm
Copy link
Author

  • Realized I should have been using a inst/doc folder all along.
  • Had copied the vignettbuilder from another package, now changed to knitr, but this seems to generate another error:
Files named as vignettes but with no recognized vignette engine:
   ‘vignettes/vignette.Rmd’
(Is a VignetteBuilder field missing?).
  • I don't see the undocumented code objects issue - is there a flag I need to be using to get that?

@sckott
Copy link
Contributor

sckott commented Oct 26, 2018

I'd recommend running check with cran set to TRUE, e.g., I run from the command line like

Rscript -e 'devtools::check(cran = TRUE)'

In your vignette you're using rmarkdown - you need to add that pkg to your Suggests

@wrightaprilm
Copy link
Author

Hi Scott, I don't get that same error you were getting when I run that check. My output is below.

crancheck

I did edit some documentation, and fix a DESCRIPTION error, and change a behavior after a student ran into a problem. Undergrads are the best testers besides maybe @dwbapst ;).

Let me know if you still get the same thing. I don't have anything that is modified and not pushed in my workspace.

Thanks!

@dwbapst
Copy link

dwbapst commented Nov 2, 2018 via email

@sckott
Copy link
Contributor

sckott commented Nov 8, 2018

@wrightaprilm looking good now!

I see that phytools and ape have moved into Depends. I see @dwbapst comment above about this. it seems like you can just import the s3 methods, like @importFrom ape plot.phylo, etc. or is there some reason that doesn't work. Depends surely fixes this, but is it really necessary?

@dwbapst
Copy link

dwbapst commented Nov 8, 2018 via email

@wrightaprilm
Copy link
Author

I'm going to lean hard into the real answer on this one: I misunderstood how imports had to be done with respect to the test suite. I do some reading of nexus files in the test suite, so I load ape in the tests. I assumed that meant it had to be a depend; it turns out that wasn't right. I well and truly have no feelings on this point, so I switched to @importFrom for the S3 methods.

@sckott
Copy link
Contributor

sckott commented Nov 8, 2018

Given the arguments, I can go either way, so if you think Depends is easier go for it. or Imports.


@dwbapst and @rachelwarnock are you both happy with the changes @wrightaprilm has made ?

@sckott
Copy link
Contributor

sckott commented Nov 14, 2018

@dwbapst and @rachelwarnock are you both happy with the changes @wrightaprilm has made ?

@dwbapst
Copy link

dwbapst commented Nov 14, 2018 via email

@sckott
Copy link
Contributor

sckott commented Nov 14, 2018

thanks @dwbapst

@rachelwarnock
Copy link

rachelwarnock commented Nov 15, 2018 via email

@sckott
Copy link
Contributor

sckott commented Nov 15, 2018

thanks @rachelwarnock

@sckott
Copy link
Contributor

sckott commented Nov 15, 2018

Approved! Thanks again for your submission @wrightaprilm !

To-dos:

  • Please transfer the package to ropensci- I've invited you to a team within our ropensci organization. After you accept you should be able to move it. You'll be made admin once you do.
  • add rOpenSci footer to README
    [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
  • Change any needed links, such those for CI badges
  • Travis CI should update to the new location automatically - you may have to update other CI systems manually
  • add ropensci review badge to your readme [![](https://badges.ropensci.org/239_status.svg)](https://github.com/ropensci/onboarding/issues/239)
  • We're starting to roll out software metadata files to all ropensci pkgs via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your pkg, after installing the pkg - should be easy as running codemetar::write_codemeta() in the root of your pkg
  • In a new .github folder in the root of the pkg, add a CONTRIBUTING.md file, an issue template file, and a PR template file, see https://github.com/ropensci/dotgithubfiles/tree/master/dotgithub for examples, you can copy them directly and edit as you wish
  • add CONDUCT.md to your .Rbuildignore file

We've started putting together a bookdown with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved. The repo is at https://github.com/ropensci/dev_guide

Are you interested in doing a blog post for our blog https://ropensci.org/blog/ ? either a short-form intro to it (https://ropensci.org/technotes/) or long-form post with more narrative about its development (https://ropensci.org/blog/). If so, we'll have our community manager @stefaniebutland get in touch with you on that

@wrightaprilm
Copy link
Author

Hurray! I'll take care of the TODO over the next couple days.

I would definitely be interested in a blog post - I found this process really helpful and well-organized. I think there's a lot of value to the community even for people who aren't "developers", or whatever, and I'd like to express that in some form.

@stefaniebutland
Copy link
Member

Hi @wrightaprilm.
I'm thrilled to hear you're interested in contributing a blog post.

I think there's a lot of value to the community even for people who aren't "developers", or whatever, and I'd like to express that in some form.

That sentiment reflects part of our mission "Building capacity of software users and developers and fostering a sense of pride in their work" so will be very welcome in your post!

This link will give you many examples of blog posts by authors of onboarded packages so you can get an idea of the style and length you prefer.

Here are some technical and editorial guidelines for contributing a blog post: https://github.com/ropensci/roweb2#contributing-a-blog-post. We ask that you submit your draft post via pull request a week before the planned publication date so we can give you some feedback.
I have 2018-12-11 available for publication. Would submission by 2018-12-04 work for you?

Happy to answer any questions.

@wrightaprilm
Copy link
Author

Oh, yeah, I'm giving a final that week, so I'll definitely be down to procrastinate with a blog post;)

@wrightaprilm
Copy link
Author

Real quick question - planning on putting some work in on this this afternoon. I had a Code of Conduct statement - Am I now covered under y'all's code of conduct? Should I edit my statement to reflect that ROpen Sci would now be in a reporting structure?

Thanks! I've never done something like this before.

@sckott
Copy link
Contributor

sckott commented Nov 19, 2018

We do have a COC for our events and any communications in our forums/slack/etc., BUT for repos, each repo should have their own COC file - which then also shows up on the right hand side of an issue when someone opens an issue. I'd suggest changing the file name to CODE_OF_CONDUCT.md instead of CONDUCT.md as I think github is looking for the former to link to

@wrightaprilm
Copy link
Author

Sorry to belabor the point - I have this sort of thing on my mind a bit lately. If someone were to feel that I was the source of unwelcoming behavior on the repository, would they be able to report it to y'all? I found the CoC site, but I didn't see an email on it for someone to make a report at.

Thx!

@sckott
Copy link
Contributor

sckott commented Nov 19, 2018

thanks @wrightaprilm - will get back to you asap

@sckott
Copy link
Contributor

sckott commented Nov 20, 2018

@wrightaprilm we have an email address now -> https://ropensci.org/coc/

and yes, they could report it to us and vice versa

@wrightaprilm
Copy link
Author

Great, thanks. I transferred the repo, updated the CI badges in the READMEs, updated them in the website directory, and updated my Code of Conduct.

@sckott sckott closed this as completed Nov 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants