Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission: treedata.table #367

Closed
14 of 29 tasks
uyedaj opened this issue Jan 31, 2020 · 32 comments
Closed
14 of 29 tasks

Submission: treedata.table #367

uyedaj opened this issue Jan 31, 2020 · 32 comments

Comments

@uyedaj
Copy link

uyedaj commented Jan 31, 2020

Submitting Author: Josef C Uyeda (@uyedaj)
Repository: https://github.com/uyedaj/treedata.table
Version submitted: 0.1.0
Editor: @jooolia
Reviewer 1: @Bisaloo
Reviewer 2: @karinorman
Archive: TBD
Version accepted: TBD


  • Paste the full DESCRIPTION file inside a code block below:
Package: treedata.table
Type: Package
Title: A Wrapper For data.table For Fast Manipulation Of Phylogenetic Trees Matched To Data
Version: 0.1.0
Authors@R: c(
    person("Josef Uyeda", email= "juyeda@vt.edu", role=c("aut", "cre")),
    person("Cristian Roman-Palacios", email= "cromanpa94@email.arizona.edu", role=c("aut")), 
    person("April Wright", email= "april.wright@southeastern.edu", role=c("aut"))
  )
Maintainer: Cristian Roman-Palacios <cromanpa94@email.arizona.edu>
Description: This package combines trait data and a phylogenetic tree (or trees) into a 
    single object of class treedata.table. The resulting object can be easily 
    manipulated to simultaneously change the trait- and tree-level sampling.
    Currently implemented functions allow users to use a data.table syntax when
    performing operations on the trait dataset within the treedata.table object.
License: MIT + file LICENSE
Depends:
  data.table, R (>= 2.10)
Imports:
  lazyeval, ape, geiger, utils
RoxygenNote: 6.1.1
Suggests: 
    knitr,
    rmarkdown,
    testthat
VignetteBuilder: knitr

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • database access
    • data munging
    • data deposition
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):
    The package allows manipulation of data frames with data.table while maintaining a matched ordering to an ape phylogenetic tree. This allows rapid data munging without worrying about a
    mismatch with the tree order.

  • Who is the target audience and what are scientific applications of this package?
    Researchers doing phylogenetic analyses.

  • Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
    treeplyr works similarly but is built on the package dplyr rather than data.table, which has some advantages to dplyr.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Technical checks

Confirm each of the following by checking the box. This package:

Publication options

JOSS Options
  • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@maelle
Copy link
Member

maelle commented Feb 12, 2020

👋 @uyedaj! Thanks for your submission and sorry for the delay. I'll assign an editor to this submission shortly, who'll get started on the editor checks next week at the latest.

@jooolia
Copy link
Contributor

jooolia commented Feb 19, 2020

Hi @uyedaj!
Thanks for the submission and for your work on this package. I will be your editor for this submission. I have done a few of the initial editor checks.

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license (updated 20 June 2020)
  • Repository: The repository link resolves correctly (updated 20 June 2020)
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

  • For the licence, the description says MIT, but the LICENCE file is not complete. This needs to be completed.
  • I found the repository to be https://github.com/uyedaj/treedata.table, could you add the link at the top of the submission info so that it is easier to find the repo?
  • Spell-checking: Looks good.
  • Below are some automated checks performed on the package. Please address those without the tag reviewers.

goodpractice::gp output
── GP treedata.table

 It is good practice to

✖ write unit tests for all functions, and all package code in
 general. 34% of code lines are covered by test cases.

 R/as.treedata.table.R:40:NA
 R/as.treedata.table.R:43:NA
 R/as.treedata.table.R:44:NA
 R/as.treedata.table.R:49:NA
 R/as.treedata.table.R:55:NA
 ... and 151 more lines
  • reviewers will determine if this appropriate amount of test coverage

  • not use "Depends" in DESCRIPTION, as it can cause name clashes, and
    poor interaction with other packages. Use "Imports" instead.
    This can be argued if necessary. For more info see http://r-pkgs.had.co.nz/description.html

  • add a "URL" field to DESCRIPTION. It helps users find information
    about your package online. If your package does not have a
    homepage, add an URL to GitHub, or the CRAN package package page. (Updated 20 June 2020)

  • add a "BugReports" field to DESCRIPTION, and point it to a bug
    tracker. Many online code hosting services provide bug trackers for
    free, https://github.com, https://gitlab.com, etc. (Updated 20 June 2020)

  • use '<-' for assignment instead of '='. '<-' is the standard, and R
    users and developers are used it and it is easier to read your code
    for them if you use '<-'. (Updated 20 June 2020)

    R/detectCharacterType.R:76:8
    R/detectCharacterType.R:104:8
    R/extractVector.R:31:10
    R/tdt_methods.R:16:5
    
  • avoid long code lines, it is bad for readability. Also, many people
    prefer editor windows that are about 80 characters wide. Try make
    your lines shorter than 80 characters

    R/as.treedata.table.R:1:1
    R/as.treedata.table.R:3:1
    R/as.treedata.table.R:4:1
    R/as.treedata.table.R:5:1
    R/as.treedata.table.R:11:1
    ... and 55 more lines
    
  • avoid sapply(), it is not type safe. It might return a vector, or a
    list, depending on the input data. Consider using vapply() instead. (Updated 20 June 2020)

    R/as.treedata.table.R:71:16
    R/as.treedata.table.R:118:8
    R/pull.treedata.table.R:20:18
    
  • avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and
    1:NCOL(...) expressions. They are error prone and result 1:0 if the
    expression on the right hand side is zero. Use seq_len() or
    seq_along() instead. (Updated 20 June 2020)

    R/as.treedata.table.R:59:38
    R/treedata.table.R:67:33
    R/treedata.table.R:69:47
    
  • reviewers to consider and give feedback on
    fix this R CMD check WARNING: LaTeX errors when creating PDF
    version. This typically indicates Rd problems. LaTeX errors found:
    !pdfTeX error: pdflatex (file ts1-zi4r): Font ts1-zi4r at 600 not
    found ==> Fatal error occurred, no output PDF file produced!

  • reviewers to consider and give feedback on
    R CMD check ERROR: Error: object 'head' not found whilst loading namespace 'treedata.table'

  • avoid 'T' and 'F', as they are just variables which are set to the
    logicals 'TRUE' and 'FALSE' by default, but are not reserved words
    and hence can be overwritten by the user. Hence, one should always
    use 'TRUE' and 'FALSE' for the logicals. (Updated 20 June 2020)

    R/as.treedata.table.R:NA:NA
    R/as.treedata.table.R:NA:NA
    R/tdt_methods.R:NA:NA
    

─────────────────────────────────────────────────────────────


I will look for reviewers after these small issues have been addressed. Feel free to let me know if you have any questions.
Thanks! Julia

@wrightaprilm
Copy link

wrightaprilm commented Mar 2, 2020

Hey team, I have some free time to tackle this tomorrow! I'll push these changes and post an update here then. Thanks @jooolia for getting back with us!

@wrightaprilm
Copy link

wrightaprilm commented Mar 10, 2020

I think between my commit (7764ef2) and Scott's edit of the initial PR, we should have all these notes handled!

@jooolia
Copy link
Contributor

jooolia commented Mar 13, 2020

Thanks @wrightaprilm! I will have a look tomorrow. Cheers, Julia

@noamross
Copy link
Contributor

noamross commented Mar 17, 2020

⚠️⚠️⚠️⚠️⚠️

In the interest of reducing load on reviewers and editors as we manage the COVID-19 crisis, rOpenSci is temporarily pausing new submissions for software peer review for 30 days (and possibly longer). Please check back here again after 17 April for updates.

In this period new submissions will not be handled, nor new reviewers assigned. Reviews and responses to reviews will be handled on a 'best effort' basis, but no follow-up reminders will be sent.

Other rOpenSci community activities continue. We express our continued great appreciation for the work of our authors and reviewers. Stay healthy and take care of one other.

The rOpenSci Editorial Board

⚠️⚠️⚠️⚠️⚠️

@annakrystalli
Copy link
Contributor

annakrystalli commented May 19, 2020

⚠️⚠️⚠️⚠️⚠️
In the interest of reducing load on reviewers and editors as we manage the
COVID-19 crisis, rOpenSci new submissions for software peer review are paused.

In this period new submissions will not be handled, nor new reviewers assigned.
Reviews and responses to reviews will be handled on a 'best effort' basis, but
no follow-up reminders will be sent.
Other rOpenSci community activities continue.

Please check back here again after 25 May when we will be announcing plans to slowly start back up.

We express our continued great
appreciation for the work of our authors and reviewers. Stay healthy and take
care of one other.

The rOpenSci Editorial Board
⚠️⚠️⚠️⚠️⚠️

@jooolia
Copy link
Contributor

jooolia commented Jun 14, 2020

Hi @wrightaprilm and @uyedaj I will check the changes you made in March and get back to you. Then I will look for reviewers for the package. Thanks for your patience.
Cheers, Julia

@jooolia
Copy link
Contributor

jooolia commented Jun 20, 2020

Hi @wrightaprilm and @uyedaj,
We are now looking for reviewers. Please add the rOpenSci review badge to your README.
You can do this with by adding to your README:
[![](https://badges.ropensci.org/367_status.svg)](https://github.com/ropensci/software-review/issues/367)

Thanks! Julia

@jooolia
Copy link
Contributor

jooolia commented Jul 14, 2020

Reviewer: @karinorman
Reviewer: @Bisaloo
Due date: 2020-08-04

@Bisaloo
Copy link
Member

Bisaloo commented Jul 23, 2020

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed. -> not tested
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

  • first round: 2h30 + 1h30 + 1h

  • second round: 1h

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.


Review Comments

I've added some (optional) comments. These are not necessary for your package to follow the current best practices but standards are always evolving and I believe addressing these now will save you some trouble later down the road.

It would be super useful for me if you could use a different commit for each the points raised in my review 👍.

I hope everything makes sense. Please let me know if something is not clear.

DESCRIPTION and other packaging files

I recommend you have a look at the packaging guidelines in rOpenSci's development guide. It lists a number of points that are currently missing from your package. Among others:

  • It's recommended practice to add an ORCiD for the authors if they have one. Source

  • Please add a contributing guide.

  • There is a mismatch in the maintainer according to the Authors@R and the Maintainer fields in the DESCRIPTION. It's recommended to leave out the Maintainer field entirely and it will be automatically generated from Authors@R.

  • As far as I know, the Type field in the DESCRIPTION is not standard and in all cases unnecessary

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/DESCRIPTION#L2

  • The package Title is not actually using title case. I don't care much personally but in my experience, this can cause issues when you submit to CRAN

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/DESCRIPTION#L3

You can verify this with the tools::toTitleCase() function.

  • Please avoid as much as possible to use Depends when you can use Imports. Here is a relevant quote from Hadley Wickham and Jenny Bryan's book on R package development:

Depends: Prior to the rollout of namespaces in R 2.14.0, Depends was the only way to “depend” on another package. Now, despite the name, you should almost always use Imports, not Depends. You’ll learn why, and when you should still use Depends, in namespaces.

README

  • The README does not contain enough information about the package, how to install and use it. You may find this relevant chapter of rOpenSci's devguide useful.
  • Since this package is presented as addressing a performance need, it would be very nice to see some benchmarks to back this up. You can use the microbenchmark package for this.
  • (optional) it can be nice to add more badges to your README to advertise the fact that you follow the current best practices in package development. In particular, I like badges with code coverage. Here are the instructions on how to do this with codecov.

Code

  • Please avoid multiline instructions (separated by ;). It makes the code harder the read. This will also help with the long lines warning reported by goodpratice in @jooolia's comment:

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/as.treedata.table.R#L118

  • When you perform tests on objects that are already booleans (TRUE/FALSE), you don't need to write == TRUE. E.g.:

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/as.treedata.table.R#L44

should be

if (!equal_T) { ... }
  • Class testing should be done with inherits() (for S3 objects, which is the case for ape objects), or is() (for S4 objects). Please see this blog post by Martin Maechler for more info. Here is a concrete example of an issue caused by this in your package:
data(anolis)
td <- as.treedata.table(anolis$phy, anolis$dat)
p <- pull.treedata.table(td, type = "phy")
d <- pull.treedata.table(td, type = "dat")
td2 <- as.treedata.table(p, d)
# You would expect to get identical(td, td2) but this line actually throws an error

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/as.treedata.table.R#L59

should read

colnames(data) <- paste("trait", seq_len(length(data)), sep="")

or better

colnames(data) <- paste("trait", seq_along(data), sep="")
  • Communicate information to users via the message() function instead of cat()

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/as.treedata.table.R#L83

This will always surely cause issues upon CRAN submission if not corrected. See this detailed SO answer for more info about the differences between message() and cat().

  • On a related note, you don't need to use paste0() in message() calls. The message() will automatically concatenate the arguments you give it:
message("this ", "is ", "a ", "test!")
#> this is a test!
  • There is a problem in as.treedata.table() with the name_column argument not working. No matter what the user enters in name_column or what the auto-detection code finds, you always use the first column for tip.labels:

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/as.treedata.table.R#L84-L90

Reproducible example showing the bug:

library(treedata.table)
#> Loading required package: lazyeval
#> Loading required package: ape
#> Loading required package: geiger
#> Loading required package: data.table

data(anolis)

set.seed(20200723)

tre <- anolis$phy
dat <- anolis$dat

# shuffle columns
dat <- dat[, sample(ncol(dat), ncol(dat))]

as.treedata.table(tre, dat)
#> Phylo object detected
#> 100 tip(s) dropped from the original tree
#> 100 tip(s)  dropped from the original dataset
#> Warning in ape::drop.tip(tree, tree_not_data): drop all tips of the tree:
#> returning NULL
#> $phy 
#> 0 phylogenetic tree
#> 
#> $dat 
#> Empty data.table (0 rows and 11 cols): PCIV_lamella_num,tip.label,island,hostility,ecomorph,PCI_limbs...

Created on 2020-07-23 by the reprex package (v0.3.0)

This behaviour should be tested in tests.

  • (optional) it would be nice to have a message indicating which column was auto-detected as containing the tip.labels when name_column = "detect" in as.treedata.table()

  • (optional) uniformise the indents in your code. You sometimes use tabs and sometimes spaces. This can be fixed by running styler::style_pkg() in the root of your RStudio project.

  • As far as I can tell, the repeatsAsDiscrete is not used in detectCharacterType() or detectAllCharacters()

  • This line will cause wrong identifications since the conversion of a data.frame to a matrix forces all elements to the same type.

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/detectCharacterType.R#L34

This can be seen by using the two examples you provide

data(anolis)

detectCharacterType(anolis$dat[,1])
#> [1] "discrete"

detectAllCharacters(anolis$dat)
#> [1] "continuous" "continuous" "continuous" "continuous" "continuous" "continuous" "continuous" "continuous"
#> [9] "continuous" "discrete"   "discrete"

detectCharacterType(anolis$dat[,1]) == detectAllCharacters(anolis$dat)[1]
#> FALSE

Regarding my later point about tests, this would make a good test: ensuring that you find the correct number of discrete/continuous characters in the anolis dataset.

  • I think you missed some 1: instances from @jooolia's comment. E.g.,

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/detectCharacterType.R#L108

  • It may be my unfamiliarity with data.table but I don't understand the warning/prompt in droptreedata.table(). As far as I can tell, the original data is NOT modified. Using your example:
data(anolis)
td <- as.treedata.table(anolis$phy, anolis$dat)
td_old <- td
td_new <- droptreedata.table(tdObject = td, taxa = c("chamaeleonides" ,"eugenegrahami"))

identical(td, td_old)
#> TRUE

identical(td, td_new)
#> FALSE
  • The dots handling in extractVector() may probably be simplified with lazyeval since you already have it as a dependency

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/extractVector.R#L22

  • It would be useful to add a match.arg(type, c("dat", "phy")) in pull.treedata.table()

  • In tdt, do you really need ...? Wouldn't an FUN arg be sufficient?

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/tdt.R#L4

  • Since you provide a head() method for treedata.table, it would be nice to have a tail() method as well

  • paste()s are unnecessary here since cat() will already do this:

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/tdt_methods.R#L62-L74

  • The output of summary.treedata.table() is slightly confusing in my opinion. On the example you provide (with anolis), where no taxa are dropped from the tree or the data, you get:

These taxa were dropped from the tree: OK
These taxa were dropped from the data: OK

It's not obvious what's this OK is supposed to mean, I think it would be simpler if it was empty in this case.

  • When I run examples from [.treedata.table(), I get the following warning:

Warning message:
In 1:seq_along(x$dat) :
numerical expression has 11 elements: only the first used

Could you look into it please? I assume you meant seq_along(x$dat)

Tests

  • (optional) your tests are not compatible with the upcoming v3 of the testthat package. In particular, you could replace the (soon to be) deprecated expect_is() function by expect_s3_class(). Excepted these two lines

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/tests/testthat/test.treedata.table.R#L62-L63

that need to use expect_type()

  • I think the test coverage needs to be increased a lot, as shown by the bugs uncovered during this review. I sometimes noted what would be good candidates for tests. As a good starting point, you can also look at which lines are not covered by your tests on codecov: https://codecov.io/github/uyedaj/treedata.table?branch=master. I see no technical limitations that would prevent you to reach 100% coverage for this package but for now, I think you should at least try to reach 80% coverage.

Documentation

  • (optional) you may want to use markdown syntax in your Roxygen comments. This produces more readable documentation in the source file in my opinion. Automatic conversion of your current Roxygen comments to markdown can be done with the roxygen2md package.
  • Please add a sentence for the name_column argument to explain that "detect" (the default) will auto-detect this column:

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/as.treedata.table.R#L13

  • In as.treedata.table(), you declare

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/as.treedata.table.R#L10

but this is unnecessary since you don't use setDT() here and as.data.table() is prefixed with the data.table:: namespace

  • I'm not sure what the first word means in your @return roxygen comments. I think it will be less confusing if you remove these(?)

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/as.treedata.table.R#L14

  • (optional) you can use the @inheritParams detectCharacterType roxygen comment in the documentation of detectAllCharacters() to avoid duplicating the documentation of the function arguments. This is useful because it ensures the documentation of these functions will always stay in sync in the future. No risk of forgetting to update one of the two!

  • (optional) I find it useful to explicitly state which is default for all arguments (when it exists) in the documentation. E.g., in

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/detectCharacterType.R#L47

I would change it to something like

#' @param returnType Either discrete (the default) or continuous
  • Please expand the documentation of the hasNames()/forceNames() functions by providing a short explanation of the function purpose and/or a @return roxygen comment.

  • The documentation fo extractVector() should be updated to indicate that multiple column names can be passed to ..., which means you don't necessarily get a named vector but a list of named vectors, as opposed to what the documentation says:

extractVector(td, "SVL", "island")
  • I think the function name pull.treedata.table is slightly confusing as it sounds like a S3 and it's not. Something along the lines of pull_treedata.table() or pulldata.data.table() (you already have a dropdatatree.table() function so this would make sense) or ??? would probably be better.

  • Shouldn't this just be "If negative, all but the n last rows of x" (i.e., remove "first"):

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/tdt_methods.R#L4-L6

  • This doesn't seem correct. The dots are ignored in head.treedata.table() as far as I can tell

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/R/tdt_methods.R#L7

  • To fix the NOTE in R CMD check, you need to importFrom(utils,head) before you try to define another method for this generic. This can be done by adding a #' @importFrom utils head roxygen comment in the roxygen chunk of head.treedata.table() for example.

Vignette

  • Please update the list of authors in the vignette:

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/vignettes/treedata.table.Rmd#L3

  • (optional) it may useful for users to mention the differences between [[.treedata.table() and extractVector(), namely that [[.treedata.table() has an extra exact argument to enable partial match while extractVector() can extract multiple columns and accepts non-standard evaluation.

  • If you remove the "[1] FALSE FALSE" (which are actually output, and not R code) from this chunk, you will be able to remove the eval = FALSE. It's awlays better when all chunks run and can be copied/pasted in the console directly

https://github.com/uyedaj/treedata.table/blob/7764ef2c4a0e56f1cb5fbcef6d910fb8ba0eea8c/vignettes/treedata.table.Rmd#L102-L116

Misc

  • I can't reproduce the other R CMD check error mentioned by @jooolia (related to LaTeX). It might be version specific???

@wrightaprilm
Copy link

wrightaprilm commented Jul 30, 2020

Hey @uyedaj and @cromanpa94, I'll start tackling these early next week. I'm devoting 100% of my time to class prep this week.

@cromanpa94
Copy link

cromanpa94 commented Aug 1, 2020

Sounds perfect @wrightaprilm!! I'm also go over the @Bisaloo's comments during the weekend!

@Bisaloo
Copy link
Member

Bisaloo commented Aug 3, 2020

BTW, feel of course free to challenge any of my comments. You have spent more time thinking about this package than me so it's entirely possible that some of my concerns are misplaced.

@karinorman
Copy link

karinorman commented Aug 4, 2020

Hi @cromanpa94 - I really enjoyed exploring your package. It seems like a really nice tool for making phylogeny data easier to work with and I'm sure it's going to get a lot of use! See below for my comments.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally

Consider putting a link in the README.

  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • The data manipulation in as.treedata.table() example should be explained, or you could do it behind the scenes and just import the objects with an explanation of what they are/their structure.
  • No example for print.treedata.table
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

The maintainer denoted in Authors@Ris defferent from Maintainer: in DESCRIPTION. I don't see an explaination of contribution guidlines, the ROpenSci package documentation (https://devguide.ropensci.org/collaboration.html#friendlyfiles) gives good guidance on adding it.

Functionality

  • Installation: Installation succeeds as documented.

Not necessarily something to be fixed, but so you're aware, I'm still running R 3.6.3 and the dependency mnormt will not install from CRAN unless you're running R >4.0.0, so I had to manually install from an old github version (e.g. devtools::install_github('cran/mnormt@R-3.0.3')) before your install would work.

  • Functionality: Any functional claims of the software been confirmed.

  • Performance: Any performance claims of the software been confirmed.

  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.

    goodpractice::gp() output:

    • linebreaks under 80 characters
    • poor unit test coverage (only 34%)
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:
4 hrs

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

This is my first review for ROpenSci and I tried to approach it from the perspective of a naive user, so most feedback is related to things that may be difficult with initial use. Of course you are the expert on your users and this kind of data so I'll look forward to learning more from you about what does and doesn't make sense in this context.

General Comments:

  • The warnings for co-indexing calls (e.g. td[, head(.SD, 1), by = "ecomorph"] from the vignette don't seem particularly informative, I would suppress unless necessary.
  • I think having the user confirm the changes for droptreedata.table is unnecessary. For me at least it made me assume that the function was modifying an object in place, which I realized it wasn't after some exploration.

Functions:

  • Consider changing the naming convention to *.td.table rather than *.treedata.table for the sake of brevity. You could also add a td to other exported functions (e.g. td.extractVector).
  • The detectCharacter functions, filterMatrix, forceNames, and hasNames seem like they could be helper functions that facilitate the other main functions, but maybe don't need to be exported. If it makes sense I would not export them. If not, a little more explanation of how they fit into a workflow in the vignette would be helpful.
  • forceNames and hasNames could use more detailed description and/or justification, I'm not sure their functionality even after running the example. data(anolis) and forceNames(anolis$dat, "row") return seemingly the same object.
  • It could be useful to make dropping tips or dataframe entries optional when matching trees/dataframes instead of automatically dropping from either to match, so that the user has the option to preserve all data.
  • filterMatrix would be cleaner if charType wasn't required as an argument but instead calculated within the function. I have a hard time imagining a scenario in which you would want a vector of character types that didn't match the matrix you were already giving the function.
  • I'm curious about the pull.treedata.table function. Is there some utitlity in having a function that mimics the $ operator? Or maybe I'm missing some of the applications of this function?

detectCharacter Functions:

  • Please explain the definition of continuous or discrete in the descriptions of detectCharacterType, detectCharacterChanges, and filterMatrix.
  • I would change the language around "character" which is a specific object type, whereas this function appears to perform on multiple vector types.detectVectorType or detectColumnType may be more intuitive.
  • There is strange behavior in the examples for these functions. For example detectCharacterType(anolis$dat[,1]) returns "discrete", but detectAllCharacters(anolis$dat[,1:3]) returns three "continuous" entries. From my understanding of how the functions work I would expect the first entry to be "discrete" to match detectCharacterType(anolis$dat[,1]).

@jooolia
Copy link
Contributor

jooolia commented Aug 8, 2020

@cromanpa94
Copy link

cromanpa94 commented Aug 19, 2020

Thank you so much @karinorman and @Bisaloo for your comments 😄

We have addressed your concerns in a new branch (https://github.com/uyedaj/treedata.table/tree/cristian). Our answers are presented under issues in our repository (ropensci/treedata.table#1) but we're more than happy to post the same information here if necessary.

@Bisaloo
Copy link
Member

Bisaloo commented Aug 20, 2020

Thank you for your work! I think the package has much improved.

The authors have addressed all my comment either by updating the package or making a receivable answer as to why they'd rather not, so I recommend approving this package.

I have submitted a PR with a couple of very minor changes. I hope you don't mind. It seems like it will save time to everyone to make the changes directly instead of copy/pasting the relevant lines here.

Two last minor comments:

  • the first one is related to the filterMatrix() function. The @return value should be updated since it doesn't return a matrix but a data.frame. Maybe a name change is warranted as well(?)
  • I think you did well by removing the prompt in droptreedata.table() since both Kari and I were confused by this and thought it was modifying in place. In the same vein, I recommend you update the message and leave out the 'ORIGINAL' adjective:

https://github.com/uyedaj/treedata.table/blob/3f308e6cda841ca2f4557b07c81f972ecf67b1bc/R/droptreedata.table.R#L73

@cromanpa94
Copy link

cromanpa94 commented Aug 24, 2020

Thank you so much @Bisaloo! We just addressed all the remaining comments int the following commits:

ropensci/treedata.table@4917b4a

We also included additional details on why to talk about Matrix in filterMatrix(). The documentation now distinguishes between character matrix (describing the features of organisms) and the class of the input (a data.frame).

We're happy to make any other changes if necessary!

@karinorman
Copy link

karinorman commented Aug 24, 2020

@cromanpa94 Thank you for your response and especially for additionally explanation of domain-specific language.

@jooolia The reviewers have addressed my comments and I am happy to recommend the the package for approval.

@jooolia
Copy link
Contributor

jooolia commented Aug 25, 2020

Great! Thanks @cromanpa94 , @wrightaprilm and @uyedaj for the submission and @karinorman and @Bisaloo for the reviews. I am a guest editor so I need to get help creating the github team that will allow us to transfer the repository over to ropensci and the next steps. I will follow up soon. Thanks, Julia

@cromanpa94
Copy link

cromanpa94 commented Aug 25, 2020

Thanks @jooolia, @Bisaloo, and @karinorman for all the help and feedback!

@wrightaprilm
Copy link

wrightaprilm commented Aug 25, 2020

Awesome, thanks everyone for your hard work, especially during such a challenging time!

@jooolia
Copy link
Contributor

jooolia commented Aug 31, 2020

Thanks again @uyedaj, @wrightaprilm,and @cromanpa94 for submitting and @karinorman and @Bisaloo for your reviews! :)

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. Maelle Salmon has invited you to a team that should allow you to do so. Let us know when you've done this and I can let her know to make you admins once you do.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname).
  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We've put together an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

Thanks a lot for your contribution and your patience with the process during this difficult time.

Cheers, Julia

@uyedaj
Copy link
Author

uyedaj commented Sep 4, 2020

Apologies @jooolia, can you resend the invitation to ropensci? It has just expired for me.

@jooolia
Copy link
Contributor

jooolia commented Sep 6, 2020

No prob @uyedaj I will ask again. :) Thanks, Julia

@stefaniebutland
Copy link
Member

stefaniebutland commented Sep 6, 2020

Hi @uyedaj, I sent a new invitation just now to juyeda@vt.edu.
See you there!

@maelle
Copy link
Member

maelle commented Sep 7, 2020

@cromanpa94 @uyedaj I re-sent an invitation to the ropensci GitHub organization. @wrightaprilm was already in the team and org.

@stefaniebutland I think you mean the slack workspace, which is useful in any case!

@uyedaj
Copy link
Author

uyedaj commented Sep 7, 2020

Thank you @maelle and @jooolia, I've transferred the repo to ropensci's github organization and need to be made an admin now to do the next steps.

@maelle
Copy link
Member

maelle commented Sep 8, 2020

Yay! I made the three of you admin again. ✔️

@uyedaj
Copy link
Author

uyedaj commented Sep 11, 2020

I hope we have completed all of these satisfactorily:

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. Maelle Salmon has invited you to a team that should allow you to do so. Let us know when you've done this and I can let her know to make you admins once you do.

  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.

  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,

    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname).

  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

  • Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Thank you for your help!

@jooolia
Copy link
Contributor

jooolia commented Sep 17, 2020

Great thanks @uyedaj ! It looks good and I will close this issue. I think that you can also close your issue related to this review in your repo.
If you or @wrightaprilm or @cromanpa94 are interested in writing a blog post to promote your package that is a possibility, but it is optional and up to you.
Thanks again for the package and the nice submission and review process.
Cheers, Julia

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests