Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

submission: babette #209

Closed
11 of 19 tasks
richelbilderbeek opened this issue Apr 5, 2018 · 42 comments
Closed
11 of 19 tasks

submission: babette #209

richelbilderbeek opened this issue Apr 5, 2018 · 42 comments

Comments

@richelbilderbeek
Copy link
Member

Summary

  • What does this package do? (explain in 50 words or less):

babette provides for a full workflow alternative of four different
GUI and command-line tools.

  • Paste the full DESCRIPTION file inside a code block below:
Package: babette
Title: Control BEAST2 from R
Version: 1.2.2
Author: Richel J.C. Bilderbeek <richel@richelbilderbeek.nl>
Maintainer: Richel J.C. Bilderbeek <richel@richelbilderbeek.nl>
Description: 'BEAST2' (<http://www.beast2.org>) is a widely used
  Bayesian phylogenetic tool, that uses DNA/RNA/protein data
  and many model priors to create a posterior of jointly estimated 
  phylogenies and parameters.
  'BEAST2' is commonly accompanied by 'BEAUti 2' (<http://www.beast2.org>), 
  'Tracer' <http://tree.bio.ed.ac.uk/software/tracer/> 
  and Densitree (<http://www.beast2.org>).
  'babette' provides for an alternative workflow of using 
  all these tools seperately. This allows doing complex Bayesian
  phylogenetics easily and reproducibly from R. 
License: GPL-3 | file LICENSE
LazyData: true
RoxygenNote: 6.0.1
VignetteBuilder: knitr
URL: https://github.com/richelbilderbeek/babette
BugReports: https://github.com/richelbilderbeek/babette/issues
Imports:
    beautier,
    beastier,
    phangorn,
    tracerer,
    testit,
    xml2
Suggests:
    devtools,
    ggplot2,
    knitr,
    lintr,
    nLTT,
    rmarkdown,
    testthat
Remotes: 
    jimhester/covr,
    jimhester/lintr,
    richelbilderbeek/beautier,
    richelbilderbeek/beastier,
    richelbilderbeek/nLTT,
    richelbilderbeek/tracerer,
    MangoTheCat/goodpractice,
    KlausVigo/phangorn
  • URL for the package (the development repository, not a stylized html page):

https://github.com/richelbilderbeek/babette

  • Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

[e.g., "data extraction, because the package parses a scientific data file format"]

reproducibility, as it provides a way to script three error-prone (and tedius) GUI
and one command-line tool.

  • Who is the target audience and what are scientific applications of this package?

Scientists that use the tool 'BEAST2' to do a Bayesian phylogenetic inference on DNA data.

No

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
    • The package is novel and will be of interest to the broad readership of the journal.
    • The manuscript describing the package is no longer than 3000 words.
    • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
    • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no gaurantee that your manuscript willl be within MEE scope.)
    • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
    • (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

Deviation from this guideline

There is one point I deviate from this guideline:

When using roxygen2, add #' @noRd to internal functions.

There are two functions beautier::default_params_doc and beastier::default_params_doc
that are empty internal function. Their purpose is to have their parameters inherited.

Adding #' @noRd stops internal and exported functions to inherit these parameters.

If there is alternative, I'd be happy to hear it.

Deviate from goodpractice::gp()

There is one goodpractice::gp() warning I ignore. About the 'Date' field in
the DESCRIPTION file. goodpractice suggest to remove it, R CMD check --as-cran
forces to add it. I followed the CRAN standards,

If there is way to satisfy both checks, I'd love to hear it.

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Klaus Schliep, KlausVigo
Luke Harmon, lukejharmon
Jonathan Eastman, eastman
Joseph W. Brown, josephwb

@richelbilderbeek
Copy link
Member Author

This submission is a continuation of #208

@noamross noamross self-assigned this Apr 9, 2018
@noamross
Copy link
Contributor

noamross commented Apr 9, 2018

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for this (continued) submission, @richelbilderbeek! I have two initial pieces of feedback, neither of which preclude moving forward, so I'll go ahead and look for reviewers.

  • First, BEAST2 is platform-agnostic, but as it is often easy to run into platform-specific issues while making system calls, please set up Windows CI via Appveyor.

  • Second, this came up as I was doing local testing and figuring out paths: I suggest you use the rappdirs to set a platform-agnostic default path for the BEAST2 .jar and have install_beast2() place it there by default. The recently accepted rtika does this. For users who already have BEAST2 installed, let them set an environment variable in .Renviron such as BEAST2_PATH. This way users can 'set it and forget it' once on package setup, no matter what platform they are on. You can either implement this now or wait until reviews are in.

  • I won't include goodpractice::gp() checks as they are all clean! Nice work including these in your CI setup.


Reviewers:
Due date:

@richelbilderbeek
Copy link
Member Author

richelbilderbeek commented Apr 12, 2018

Thanks for the great feedback! Will do so at latest at 2018-04-16:

  • Add AppVeyor
  • Use rappdir

@GuangchuangYu
Copy link

GuangchuangYu commented Apr 12, 2018

The example on tracerer README is broken since the function get_path was renamed to get_tracerer_path. Please update documentations accordingly.

The parse_beast_trees internally uses rBEAST::beast2out.read.trees, which output multiPhylo object without any BEAST inferences. I think you need to describe this in vignette and a link to treeio is helpful, see also #179. treeio is able to parse BEAST tree with corresponding statistics.

The output of the example is 10 phylogenetic trees:

trees <- parse_beast_trees(
  get_tracerer_path("beast2_example_output.trees")
)

while treeio outputs 11 trees.

After looking at the source code, there is an option:

opt.burnin: how many phylogenies to discard,

which by default is setting to 0, in the function rBEAST::beast2out.read.trees.

Although by default it is expected to discard 0 tree, it actually discard the first one. Is this a bug or intended for some reasons?


> rBEAST::beast2out.read.trees
function (file, opt.rescale.edge.length = 1, opt.burnin = 0)
{
    tmp <- readLines(file, n = 2000, warn = FALSE)
    tmp <- which(grepl("#NEXUS", tmp))
    if (length(tmp) > 1) {
        cat(paste("\nFound #NEXUS headers, n=", length(tmp),
            ".\nDiscard all lines before last entry on line",
            tail(tmp, 1)))
        cmd <- paste("sed -i\".bak\" 1,", tail(tmp, 1) - 1, "d ",
            file, sep = "")
        system(cmd)
        cmd <- paste("sed -i\".bak2\" 1s/\\;// ", file, sep = "")
        system(cmd)
        cmd <- list.files(paste(rev(rev(strsplit(file, "/")[[1]])[-1]),
            collapse = "/"), pattern = "*bak*", full.names = TRUE)
        cat(paste("\nrm files\n", paste(cmd, collapse = "\n")))
        file.remove(cmd)
    }
    mph <- ape::read.nexus(file)
    tmp <- regexpr("[0-9]+", names(mph))
    if (any(tmp < 0))
        stop("unexpected nexus file without STATE iteration numbers")
    mph.it <- as.numeric(regmatches(names(mph), tmp))
    mph <- lapply(which(mph.it > opt.burnin), function(j) mph[[j]])
    mph.it <- mph.it[mph.it > opt.burnin]
    names(mph) <- paste("STATE_", mph.it, sep = "")
    if (opt.rescale.edge.length != 1)
        for (j in seq_along(mph)) mph[[j]]$edge.length <- mph[[j]]$edge.length *
            opt.rescale.edge.length
    mph
}

The implementation of rBEAST::beast2out.read.trees is not robust.

  • It only read first 2000 lines.
  • use system command (i.e. sed, which may not be available for Windows and may have different behaviors in Unix-like system for there are different implementations of sed, e.g. GNU or BSD implementations).
### something like this in pure R is more favorable.

    con <- readLines(file, n = 2000, warn = FALSE)
    tmp <- which(grepl("#NEXUS", con))
    if (length(tmp) > 1) {
        cat(paste("\nFound #NEXUS headers, n=", length(tmp),
            ".\nDiscard all lines before last entry on line",
            tail(tmp, 1)))
########
        con <- con[tmp[length(tmp)]:length(con)]
  }
  mph <- ape::read.nexus(textConnection(con))
########
  ...

@noamross
Copy link
Contributor

Thanks @GuangchuangYu! @richelbilderbeek, for clarification, Guangchuang wasn't able to do a full review, but posted this contribution anyway - the benefit of open review is that anyone from the community can contribute! I'm still hunting for full reviewers.

@richelbilderbeek
Copy link
Member Author

richelbilderbeek commented Apr 13, 2018

Thanks for your review @GuangchuangYu!

  • Update doc in README
  • Fix rBEAST::beast2out.read.trees

EDIT: I use ape::read.nexus instead of rBEAST::beast2out.read.trees, which is cleaner, simpler, etc.
EDIT: Removed earlier (imperfect) ideas about treeio

BEAST .trees file

From the treeio test file:

#NEXUS

Begin taxa;
	Dimensions ntax=15;
	Taxlabels
		A_1995
    [some more]
		O_1998
		;
End;

Begin trees;
	Translate
		1 A_1995,
    [some more]
		15 O_1998
		;
tree TREE1 = [&R] (((11[&rate_range={0.0018985405694149748,0.005499473294363702},length_range={5.941039739620447,15.103974853365651},length_95%_HPD={7.322099111116467,11.624410805545125},height=0.0,rate=0.0029201381990576446,rate_median=0.00290446666537452,length=9.470517085164637,length_median=9.385096430786298,rate_95%_HPD={0.002327042663307498,0.003545195845584415}]:9.385096430786298,14[&rate_range={0.0016286170228414286,0.004924065374051862},length_range={2.9410397396204466,12.103974853365651},length_95%_HPD={4.322099111116467,8.624410805545125},height=3.0,rate=0.0029046712345038575,rate_median=0.002894040280462541,length=6.47051708516462,length_median=6.385096430786298,rate_95%_HPD={0.0022442766825820585,0.0035385035360908155}]:6.385096430786298)[&height_95%_HPD={7.322099111116469,11.624410805545125},length_range={15.724675549523745,36.72403651916805},length_95%_HPD={22.048771671275617,30.115400624425263},height_range={5.941039739620447,15.10397485336565},rate_95%_HPD={0.00237138786088183,0.0034784158116179195},rate_range={0.0018410123833317064,0.004221113122306114},height_median=9.385096430786298,rate=0.0029074382448820114,height=9.470517085164637,posterior=1.0,rate_median=0.002896909674386843,length=25.719898216070327,length_median=25.579167352568767]:25.43656682196353,4[&rate_range={0.0012952503972904472,0.00607832777082667},height_95%_HPD={26.0,26.0},length_range={3.1994774159685413,18.82787328822321},height_median=26.0,length_95%_HPD={5.308840020390441,13.438764103034949},height=26.0,rate=0.002924273388338787,height_range={25.999999999999993,26.000000000000007},rate_median=0.0029060937027348497,length=9.14131056894944,length_median=8.97486801838459,rate_95%_HPD={0.0021595629511043674,0.0035790878327513283}]:8.821663252749829)[&height_95%_HPD={31.424491578793983,38.72173639220357},length_range={0.006034151816699307,7.710713928814901},length_95%_HPD={0.3064036539985011,5.177672993823904},height_range={29.19947741596854,42.86605037129657},rate_95%_HPD={0.0023114970475423416,0.0035889042843810733},rate_range={0.0013736989022372617,0.0050234426880329435},height_median=34.82166325274983,rate=0.002922564275661852,height=34.892481158094604,posterior=0.941124194623417,rate_median=0.002909586916036731,length=3.0146714387172726,length_median=3.014437920595718]:3.1044254070145954,(12[&rate_range={0.0011894655609126846,0.005063562483853383},height_95%_HPD={33.0,33.0},length_range={0.004867974022467081,2.4165284569556675},height_median=33.0,length_95%_HPD={0.06521578858480126,1.2807198792124836},height=33.0,rate=0.0029035800134507893,height_range={32.99999999999999,33.00000000000001},rate_median=0.00289561150523617,length=0.6159932996275818,length_median=0.5651787519661866,rate_95%_HPD={0.002247982925766173,0.003563006335474016}]:0.5651787519661866,(10[&rate_range={0.0011908072712212856,0.005499473294363702},height_95%_HPD={30.0,30.0},length_range={0.32050714734836205,3.540496218322332},height_median=30.0,length_95%_HPD={0.7237604360518439,2.5793552536628965},height=30.0,rate=0.002904168059219689,height_range={29.999999999999993,30.000000000000007},rate_median=0.0028922772557908624,length=1.6002660228850776,length_median=1.5614319656278823,rate_95%_HPD={0.0022606850856099767,0.0035983221916164447}]:1.5614319656278823,(7[&rate_range={0.001416128514121561,0.004834328103917153},height_95%_HPD={21.0,21.0},length_range={2.933492707218676,8.35046204253208},height_median=21.0,length_95%_HPD={4.027377213294393,6.644084283024775},height=21.0,rate=0.0029126485028421432,height_range={20.999999999999993,21.000000000000007},rate_median=0.0029006798521738174,length=5.212768669347575,length_median=5.193972254430712,rate_95%_HPD={0.002302987481745105,0.003542607833130218}]:5.193972254430712,((((2[&rate_range={0.0017598392811698684,0.005016409554424679},height_95%_HPD={17.0,17.0},length_range={2.0792675983921,5.066305444453668},height_median=17.0,length_95%_HPD={2.5335883290778085,4.140102928571505},height=17.0,rate=0.0029313752237024346,height_range={16.999999999999993,17.000000000000007},rate_median=0.00291116543054281,length=3.301747717349106,length_median=3.282059786897289,rate_95%_HPD={0.0022724788824010993,0.0035412350438672234}]:3.26195447365863,(1[&rate_range={0.0013915163850006223,0.005278699019523131},height_95%_HPD={18.0,18.0},length_range={0.2653240053303314,3.0738477704317866},height_median=18.0,length_95%_HPD={0.6851789188686901,2.103544837145563},height=18.0,rate=0.0029102477038226823,height_range={17.999999999999993,18.000000000000007},rate_median=0.0028943272379052384,length=1.344156943808417,length_median=1.3212658013496608,rate_95%_HPD={0.002231134756681312,0.003564513436078316}]:1.3203366363911968,(6[&rate_range={0.0011894655609126846,0.00607832777082667},height_95%_HPD={16.0,16.0},length_range={0.24483144082356034,1.9701428525089781},height_median=16.0,length_95%_HPD={0.40058088989495744,1.315798331432319},height=16.0,rate=0.0029081788066754573,height_range={15.999999999999993,16.000000000000007},rate_median=0.002894097329244072,length=0.8488313569365579,length_median=0.8311577761397366,rate_95%_HPD={0.0022201346143224635,0.0035242778400975967}]:0.8311577761397366,13[&rate_range={0.0015728359240668728,0.00555337355224827},height_95%_HPD={16.0,16.0},length_range={0.24483144082356034,1.9701428525089781},height_median=16.0,length_95%_HPD={0.40058088989495744,1.315798331432319},height=16.0,rate=0.0029246220910024556,height_range={15.999999999999993,16.000000000000007},rate_median=0.0029069548667521017,length=0.8488313569365579,length_median=0.8311577761397366,rate_95%_HPD={0.002249781933330923,0.0035483940916491096}]:0.8311577761397366)[&height_95%_HPD={16.400580889894957,17.31579833143232},length_range={1.2231773219784436,4.032958219353446},length_95%_HPD={1.7792296041973366,3.2608835731005925},height_range={16.24483144082356,17.970142852508978},rate_95%_HPD={0.00227965222299582,0.003576307560149318},rate_range={0.0017946022132817136,0.005146224881072777},height_median=16.831157776139737,rate=0.0029551874634580187,height=16.848831356936557,posterior=1.0,rate_median=0.002925361229387638,length=2.4956333051536737,length_median=2.4780659228870086]:2.48917886025146)[&height_95%_HPD={18.63819059801841,20.05197466991173},length_range={0.1205288444743644,2.5979500096199004},length_95%_HPD={0.35697990621313025,1.6401968202036592},height_range={18.26532400533033,21.073847770431787},rate_95%_HPD={0.002256986324754042,0.0035653117333142074},rate_range={0.001713585012610335,0.005024147132398496},height_median=19.320336636391197,rate=0.0029094346531172914,height=19.34277851261509,posterior=0.9973339257942679,rate_median=0.002897624341643719,length=0.9702651579685516,length_median=0.9375831548872569]:0.9416178372674331)[&height_95%_HPD={19.521269584334302,21.088314837098206},length_range={0.012495731157944334,1.9646785292453188},length_95%_HPD={0.03574255355387024,1.0118409713578593},height_range={19.10808638959738,22.06630544445367},rate_95%_HPD={0.0022654873814480724,0.0035804161698284485},rate_range={0.001346350067772623,0.0050346787742291815},height_median=20.26195447365863,rate=0.0028946330260915033,height=20.280410550993075,posterior=0.9264607864918907,rate_median=0.002892318731009283,length=0.4968851336260279,length_median=0.451701891919436]:0.495333557442585,9[&rate_range={0.0011908072712212856,0.006158974088921042},height_95%_HPD={19.0,19.0},length_range={0.6277151355336628,4.047137030209772},height_median=19.0,length_95%_HPD={0.9782142552884494,2.5615372871148914},height=19.0,rate=0.002896673919570971,height_range={18.999999999999993,19.000000000000007},rate_median=0.0028823021762461264,length=1.7580745767926749,length_median=1.7398766578776765,rate_95%_HPD={0.0022237504040379997,0.00352292455340037}]:1.757288031101215)[&height_95%_HPD={20.0037706982471,21.559620473137144},length_range={1.006123470821656,4.693968745411883},length_95%_HPD={1.5142486048284063,3.3218072412753052},height_range={19.724429025801697,23.047137030209772},rate_95%_HPD={0.0022442766825820585,0.003507068659905805},rate_range={0.001432002889201383,0.004754358664737898},height_median=20.757288031101215,rate=0.002893996035162651,height=20.77855703124838,posterior=1.0,rate_median=0.0028879954727052365,length=2.4096117187521866,length_median=2.382702944731438]:2.3552865076271097,8[&rate_range={0.001432002889201383,0.005444269121522517},height_95%_HPD={21.0,21.0},length_range={0.642036036245635,5.621609741456432},height_median=21.0,length_95%_HPD={1.2523223624936506,3.1348224624464933},height=21.0,rate=0.002926318733025422,height_range={20.999999999999993,21.000000000000007},rate_median=0.0028992668335772655,length=2.1905515431219182,length_median=2.1612447399353485,rate_95%_HPD={0.0022026213674528677,0.0035573238126482454}]:2.1125745387283246)[&height_95%_HPD={22.238115785711244,24.058900846198604},length_range={7.752388515314124E-5,1.6789395568599588},length_95%_HPD={7.752388515314124E-5,0.7048357330995856},height_range={21.642036036245635,25.03409696956285},rate_95%_HPD={0.002252593209641964,0.0035607317234495795},rate_range={0.0015361649779566762,0.0049998262194013016},height_median=23.112574538728325,rate=0.002889014554875568,height=23.141147933114226,posterior=0.5754276827371695,rate_median=0.002878652352213215,length=0.27003604715973273,length_median=0.21598705558424847]:0.2367141542428186,(3[&rate_range={0.0013736989022372617,0.004912336326819793},height_95%_HPD={18.0,18.0},length_range={1.7197570169468612,6.508160170660414},height_median=18.0,length_95%_HPD={2.273070223646858,4.315544383431487},height=18.0,rate=0.002897274065826662,height_range={17.999999999999993,18.000000000000007},rate_median=0.002891623815438905,length=3.328657272519285,length_median=3.3119924392930997,rate_95%_HPD={0.002274531438568342,0.0035191061760105505}]:3.3119924392930997,(15[&rate_range={0.0017951712924853507,0.0046540212963077626},height_95%_HPD={15.0,15.0},length_range={3.6473240769007305,7.672938989003395},height_median=15.0,length_95%_HPD={4.221065409566133,6.332542698012862},height=15.0,rate=0.0029232886699236413,height_range={14.999999999999993,15.000000000000007},rate_median=0.002910300207749866,length=5.286264802419297,length_median=5.271310736897195,rate_95%_HPD={0.0022864023115947963,0.003529043819333591}]:5.2710481368304585,5[&rate_range={0.001416128514121561,0.005579851345674038},height_95%_HPD={17.0,17.0},length_range={1.6473240769007305,5.672938989003395},height_median=17.0,length_95%_HPD={2.2210654095661333,4.332542698012862},height=17.0,rate=0.0029307412825096777,height_range={16.999999999999993,17.000000000000007},rate_median=0.0029163497249873977,length=3.286215610322119,length_median=3.2713107368971954,rate_95%_HPD={0.002288661916548849,0.0035608320911959426}]:3.2710481368304585)[&height_95%_HPD={19.221065409566133,21.332542698012862},length_range={0.104529280272736,3.3149321561154217},length_95%_HPD={0.2764436861184514,1.7990230654952768},height_range={18.64732407690073,22.672938989003395},rate_95%_HPD={0.002277702565277827,0.0035595014163803595},rate_range={0.0013239531631633217,0.005884988518816431},height_median=20.27104813683046,rate=0.0028979861379948182,height=20.28617159766615,posterior=0.9997778271495223,rate_median=0.002888770492468453,length=1.0426733158998849,length_median=1.0067614447779913]:1.0409443024626412)[&height_95%_HPD={20.273070223646858,22.315544383431487},length_range={0.7030455347794238,4.044228639469452},length_95%_HPD={1.0570831434749905,2.896763220383612},height_range={19.71975701694686,24.508160170660414},rate_95%_HPD={0.002292056302021302,0.0035588170121507103},rate_range={0.0013424278954501671,0.0049998262194013016},height_median=21.3119924392931,rate=0.002888936557203454,height=21.32870646461646,posterior=1.0,rate_median=0.002881047504512494,length=1.9785573920978363,length_median=1.9454816826417627]:2.0372962536780435)[&height_95%_HPD={22.446849872897925,24.343871241291684},length_range={1.0699847447024595,5.8252599198970145},length_95%_HPD={1.6547568623171074,3.93005947799395},height_range={21.918722044938995,26.64317513574968},rate_95%_HPD={0.002264444816964439,0.0035248007690664588},rate_range={0.0015361649779566762,0.005585782485453258},height_median=23.349288692971143,rate=0.002898021617845291,height=23.382228870153373,posterior=1.0,rate_median=0.002885304679275195,length=2.8305397991941903,length_median=2.8138956389726033]:2.8446835614595685)[&height_95%_HPD={25.027377213294393,27.644084283024775},length_range={2.907180781056212,7.895871987833431},length_95%_HPD={3.9687063175210184,6.661889769976522},height_range={23.933492707218676,29.35046204253208},rate_95%_HPD={0.0022839388726537246,0.0034425830819287857},rate_range={0.0016321351405092822,0.004895096656896942},height_median=26.19397225443071,rate=0.0028948500744252928,height=26.21276866934752,posterior=1.0,rate_median=0.002882853706734117,length=5.3874973535374995,length_median=5.391141840619291]:5.367459711197171)[&height_95%_HPD={30.723760436051844,32.5793552536629},length_range={0.46640752949194564,3.946713637842631},length_95%_HPD={1.125494195892351,2.925649750391468},height_range={30.320507147348362,33.54049621832233},rate_95%_HPD={0.002258427762120399,0.0034988070422015296},rate_range={0.001346350067772623,0.004809967053355128},height_median=31.561431965627882,rate=0.002887185398498846,height=31.600266022885073,posterior=1.0,rate_median=0.0028830207420000985,length=2.0157272767425005,length_median=2.0105096016512505]:2.0037467863383043)[&height_95%_HPD={33.0652157885848,34.280719879212484},length_range={0.4234374999534438,11.003812116300786},length_95%_HPD={1.879699798035304,6.667501697042617},height_range={33.00486797402247,35.41652845695567},rate_95%_HPD={0.002277911029297807,0.0035746687789887446},rate_range={0.0011908072712212856,0.00638839390906598},height_median=33.56517875196619,rate=0.00290988119793417,height=33.61599329962763,posterior=1.0,rate_median=0.0028978880417879216,length=4.351763425728975,length_median=4.284004563912951]:4.360909907798238)[&height_95%_HPD={35.30577531964485,40.78227312901716},height_median=37.926088659764424,height=38.038329604461964,posterior=1.0,height_range={33.96543055073909,46.856513344806125},length=0.0];
End;

BEAST2 .trees file

From here:

#NEXUS

Begin taxa;
	Dimensions ntax=5;
		Taxlabels
			t1 
      [some more]
			t2
			;
End;
Begin trees;
	Translate
		   1 t1,
       [some more]
		   5 t2
;
tree STATE_0 = ((1:0.4796940031825733,4:0.4796940031825733):0.15868524443147886,(2:0.47305110906182046,(3:0.43063449002200127,5:0.43063449002200127):0.04241661903981919):0.16532813855223172):0.0;
[some more]
tree STATE_10000 = (1:0.7693617408342179,(2:0.6138287429830985,((3:0.11136214949831788,4:0.11136214949831788):0.22798021162858859,5:0.33934236112690647):0.27448638185619206):0.1555329978511194):0.0;
End;

richelbilderbeek pushed a commit to ropensci/tracerer that referenced this issue Apr 13, 2018
richelbilderbeek pushed a commit to ropensci/tracerer that referenced this issue Apr 13, 2018
@richelbilderbeek
Copy link
Member Author

Added AppVeyor continuous integration, used rappdir for using the correct folders and got everything to work under both Linux and Windows 👍

@noamross
Copy link
Contributor

First reviewer: @dwinter
Due date: 2018-05-18

@noamross
Copy link
Contributor

Second reviewer: @bjoelle
Due date: 2018-05-18

@noamross
Copy link
Contributor

It took a bit to get reviewers, @richelbilderbeek, but the search made it clear that there is a lot of interest in these packages! Like Guangchuang, there were several folks who had short comments but were not able to do a full review. One comment that came up is that one might want to use the xml2 package to generate and validate XML inputs rather than manipulating strings. I suggest that the reviewers consider this and determine whether they agree.

@richelbilderbeek
Copy link
Member Author

The reason of not using xml2 is in the history of the package. It started as a hackerish thing, not intended for re-use. After a while, when babette was in her final phase, I have tried out xml2 for a short while. But I could not convince myself to redo the architecture of the back-end, when everything already worked with 100% code coverage. If reviewers insist, I do have a reason to redo the back-end, but I do not see additional added value (yet?).

@richelbilderbeek
Copy link
Member Author

There are some requests by reviewers of the manuscript of babette [1] for renaming the function babette::run to babette::bbt_run, as the latter name is less likely to conflict with other packages. I agree, but would like permission to do so while under code review. As the deadline for the manuscript resubmission is already next Monday, I cannot wait until the end of the code review.

@noamross : do I have permission to change this function's name while under code review? Of course, all usages of it will also be updated and tested.

References

@noamross
Copy link
Contributor

noamross commented May 3, 2018

@richelbilderbeek That's fine, and is in line with our own guidelines.

@bjoelle
Copy link

bjoelle commented May 17, 2018

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 25-30


Review Comments

This package and its dependencies beautier, beastier and tracerer provide a R interface for setting up a BEAST2 analysis, running it and processing the results. I think this interface is a good alternative to the existing graphical interfaces, especially BEAUti which requires a lot of manipulations to set up even a simple analysis. In particular these packages will be very useful for people who want to integrate BEAST2 in an existing R pipeline.

Overall I think the packages are well done, although the documentation is lacking in some places. All the packages are missing package documentation (required by rOpenSci guidelines).
Detailed comments are below, sorted by the package to which they apply. Please also check devtools::spell_check, which shows some typos in documentation in all packages.

Regarding the suggestion about using xml2 instead of strings, the only real advantage I see to it is making it possible to load BEAST2 configuration files to modify or add an element instead of having to rebuild the model in babette, which might help/encourage people to transition to using babette. I am happy to leave the decision to the author on that.

Installation

The installation worked fine on Mac and Windows.

babette

  • this package needs a clearer statement of need (as per rOpenSci guidelines): some of the text from the onboarding issue and the description would fit nicely.

  • bbt_run: the @return documentation is missing.

  • bbt_run: The default behaviour is to place the output in temporary files which are deleted at the end of the run. I would add a warning about that in the doc or in the code, as I don't think this would be expected by most users.

  • bbt_run: the doc should mention that rng_seed can be NA and what happens in that case (i.e the seed will be set to random).

  • bbt_run.R: typos l119 "Must have as many FASTA filenames as BEAST2 output trees filenames"

beautier

Structure
  • The structure is my main concern with this package, as it contains a lot of code duplication and many functions need to run through all possible types of a certain object to figure out what to do with it. I think this makes the package hard to maintain and to extend to additional models.
    The main issue in my opinion is that the objects (site models, clock models, etc) do not store their dependencies in a generic way, so that there is no way to know what objects a model depends on unless you know which specific model it is. My suggestion would be to add a list of components to all objects - for instance a TN93 site model would have components = c("kappa_1", "kappa_2"), a HKY would have components = "kappa" and so on. This would allow you to save a lot of code in functions like init_site_models, which could do things such as for(x in model$components) model[[paste0(x,"_param")]] = init_param(model[[paste0(x,"_param")]]) instead of needing one subfunction per type of model.
    Separating the different parameters (alpha, beta, gamma, etc) also leads to a lot of code duplication. It is useful for creating parameters as their default priors are different, but I would remove it for the other functions: the many functions is_%sth%_param(x) can be replaced by one function is_named_param(x, name) and the parameter_to_xml functions also have a lot of redundancy.

  • I found it confusing that the code relative to the interaction of MRCA priors and clock models is in a different place for strict and relaxed clocks, and it seems to have led to different behaviour between the 2 clocks (see comments in Functionality section). I would recommend putting everything in the mrca_prior_to_xml functions.

  • get_id, get_ids: These functions appear to be duplicated by get_alignment_id and get_alignment_ids.

  • I think this package contains too many files, many of which contain only one small function. Gathering functions which are strongly linked to each other (for instance, are_site_models, is_site_model, is_site_model_name and get_site_model_names) in one file would make the package easier to read and search in my opinion.

Functionality
  • the relaxed clock model doesn't work with MRCA priors that have a time calibration distribution: the state, operators, distribution and tracelog sections are all missing elements compared to an XML produced with BEAUti. This issue appears to be linked to clock_models_to_xml_state l24-25: the comment is unclear as to whether it is supposed to be checking for any or no MRCA priors, and no check is performed here for either.

  • mrca_prior_to_xml_tracelog, mrca_prior_to_xml_operators, mrca_prior_to_xml_prior_distr, mrca_prior_to_xml_state: these functions only add the parts relative to the strict clock if the MRCA prior is monophyletic, whereas in BEAUTi these parts are added if the MRCA prior has a distribution associated with it, whether it is monophyletic or not.

  • the distribution hyper parameters (alpha, beta, mean etc - all except lambda) can be set with estimate = TRUE, but the xml will not be created correctly with those settings (missing distribution, operators and state sections). I think the code should not allow this setting until this functionality is fully implemented.

  • creating a TN93 model with the default settings creates two log-normal distributions as priors for kappa1 and kappa2 where the S parameter has lower=0, upper=0 and value=1.25. This works if S is not estimated, because BEAST2 only checks upper and lower when modifying the value (not when initialising it) but it will break if S is estimated.

  • get_phylo_crown_age will not work with sampling through time/non-ultrametric trees. I think these are not supported at the moment anyway, but it might be good to keep it in mind.

  • is_index_of_first_shared_clock_model: this function will return TRUE if the tested model is the 2nd of 3 shared clock models, in contradiction with the doc. It will also always return TRUE if the clock model at index i has id=NA.

  • is_distr: this function just checks if the input has a name attribute, which doesn't match with the name or documentation of the function

  • get_site_model_n_params and get_site_model_n_distrs do not check if the site model has gamma rate categories, they just assume it does

Documentation
  • kappa_1, kappa_2 and the tree prior parameters are always estimated and this cannot be changed - I think this should be mentioned in the corresponding doc.
  • the doc should make it clearer that the function used to build the branchRateModel section is clock_model_to_xml_lh_distr if there are no MRCA priors, and mrca_prior_to_xml_lh_distr if there are MRCA priors, and that mrca_prior_to_xml_lh_distr only supports strict clocks at the moment
  • fasta_file_to_sequences: both the documentation and the input variable (fasta_filenames) wrongly indicate that this function can take multiple files
  • get_id, get_ids: I don't understand what "conclude the ID" means in this doc. I would replace that with "extract the file names without extensions" or something similar.
Minor comments
  • all the functions clock_models_to_xml start by calling get_unlinked_clock_models. This call could be placed earlier (create_beast2_input_run is my suggestion) which would save the repetition.
  • there is an empty file to_do.R in the package
  • fastas_to_phylos: the main description and @return say this creates a random phylogeny instead of one or more depending on the input
  • is_index_of_non_first_shared_clock_model, create_beast2_input_init: wrong doc
  • get_site_model_n_params: the doc refers to number of distributions instead of parameters.
  • is_in_patterns: mismatched doc ("Create a random beta distribution")
  • remove_first_brm_clock_rate: missing doc
  • remove_multiline: consequetive -> consecutive
  • clock_model_to_xml_tracelog: the seealso links to the same function
  • has_xml_closing_tag: opening -> closing
  • init_gamma_site_model: the doc refers to "all site models" but the function takes only one as input
  • mrca_prior_to_xml_taxonset, mrca_priors_to_xml_prior_distr: the top description of these functions is confusing
  • has_xml_short_closing_tag: at the end of the text -> in one of the lines of text
  • create_beast2_input_run: state section -> run section

beastier

  • run_beast2: I think that this function should give a warning before erasing beast_log_filename and beast_trees_filenames if they exist, because I wouldn't have expected those files to be touched at all if I have specified different values for output_log_filename and output_trees_filenames. A better solution would be to move/rename these files temporarily if they already exist, and restore them once the run is done.

  • run_beast2, create_beast2_run_cmd: the -overwrite option for BEAST2 is not for overwriting the state file (it will always be overwritten), it is for overwriting the log & trees files. Thus at the moment setting overwrite_state_file does nothing (it's also always set back to TRUE at l107 in run_beast2).

  • run_beast2, create_beast2_run_cmd: the doc should mention that rng_seed can be NA and what happens in that case (i.e the seed will be set to random).

  • gives_beast2_warning: This function has a lot of code in common with is_beast2_input_file, so I would suggest putting the common code in a separate function that would be called by both.

  • has_unique_ids: the doc of this function implies that it checks both parameter and distribution ids, but get_duplicate_ids only checks real parameters

  • is_beast2_input_file: l38-40 should be removed - they don't do anything and status_code is 0 even if the file is invalid

  • create_posterior: it's unclear what this function is meant to be used for, as it's not exported or called from anywhere else in the package

  • gives_beast2_warning: this doc should link to is_beast2_input_file rather than are_beast2_input_lines since the input is a file in both.

tracerer

  • calc_stderr_mean: this function asks for the sample_interval as input but doesn't seem to use it. To simplify the code I would also suggest using this formula instead of the C code: stderr_mean = stats::sd(trace)/sqrt(length(trace)).
  • parse_beast_output_files, parse_beast_posterior: these functions need more detailed @return documentation
  • calc_summary_stats.R: the doc in this entire file refers to calculating ESS but this function calculates other things too: a list of all the stats calculated should be included in the @return doc. The @param traces doc for calc_summary_stats is also confusing ("a numeric vector or data frame or numeric vector")

@noamross
Copy link
Contributor

Thank you for the thorough review, @bjoelle! FYI @richelbilderbeek, the second reviewer requested an additional week to finish, so expect the next review next week.

@richelbilderbeek
Copy link
Member Author

I am impressed by @bjoelle's review, thanks so much! I will happily implement that feedback. Sure, I will wait after the other reviewer is finished, to do so in one go.

@richelbilderbeek
Copy link
Member Author

@noamross how is the state of the second review (as it is overdue)? I don't want to rush our beloved volunteers; I just want to know for sure nothing ended up in a state of limbo.

@dwinter
Copy link
Member

dwinter commented May 30, 2018

Hi all, the outstanding review is mine and it is definitely on it's way. Apologies for the delay, I will try to get something by the end of the week.

@richelbilderbeek
Copy link
Member Author

All beastier feedback has been addressed. Only beautier (the biggest!) to go... 🌈

@richelbilderbeek
Copy link
Member Author

Worked on this September 14th... just to let you know that there is progress...

@richelbilderbeek
Copy link
Member Author

All feedback has been addressed, last (and biggest) part was about beautier. Rounding things off now, then will let know.

@richelbilderbeek
Copy link
Member Author

@noamross: I have processed all feedback! Blimey, the feedback from @dwinter and @bjoelle heavily improved the packages. Sure, I've put both of them in the DESCRIPTION file of all packages.

So, where do we go from here? 👍

@noamross
Copy link
Contributor

noamross commented Nov 7, 2018

@richelbilderbeek I've followed up with both reviewers and they are reviewing your changes.

@bjoelle
Copy link

bjoelle commented Nov 13, 2018

@richelbilderbeek @noamross I have reviewed the changes. One additional thing I would suggest would be to include the overwrite parameter in babette::bbt_run (it's only present in beastier::run_beast2 at present), and to set it to FALSE by default (as it is in BEAST2).
Otherwise I am happy with the state of the package and would recommend its approval.

@dwinter
Copy link
Member

dwinter commented Nov 29, 2018

@richelbilderbeek @noamross ,

I have a chance now to check out the changes to all these packages. I want to say @richelbilderbeek should be commended for the work he has done to take our comments on board and improve the packages. I particularly like the way he has used the issue tracker to explain the rationale for changes that don't' exactly match the reviewer's suggestions.

I think this is an excellent suite of packages that will support reproducible approaches in an important field, so I'm happy to recommend it be approved.

@noamross
Copy link
Contributor

noamross commented Nov 29, 2018

Approved! Thanks @bjoelle and @dwinter for your thorough reviews and follow-ups, and @richelbilderbeek for your diligent work. It's been a long road and a lot of code to go through and I'm glad to have this suite as part of rOpenSci.

To-dos:

  • Wrap up Add overwrite parameter to bbt_run babette#38

  • Transfer the repos to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.

  • Add the rOpenSci badge to top and footer to the bottom of your READMEs
    " [![Peer Review Status](https://badges.ropensci.org/209_status.svg)](https://github.com/ropensci/onboarding/issues/209)"
    " [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)"

  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname).

  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd also love a blog post about your packages, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.

We've started putting together a gitbook with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

@richelbilderbeek
Copy link
Member Author

richelbilderbeek commented Dec 3, 2018

Hi @noamross,

I am in a dilemma here, as I want to remove some half-baked functionality. This will change the interface a lot:

bbt_run <- function(
  fasta_filename, # Was 'fasta_filenames' (plural)
  tipdates_filename = NA, # Added one week ago
  site_model, # Was 'site_models' (plural)
  clock_model, # Was 'clock_models' (plural)
  tree_prior, # Was 'tree_priors' (plural) 
  # posterior_crown_age = NA, # Remove
  # rest as-is
)

If rOpenSci is flexible in allowing me to change the interface, I will first round my to-dos. After that, I will modify the interface as above, before I submit the package to CRAN.

Is that the preferred way to go? If no, what is?

@noamross
Copy link
Contributor

noamross commented Dec 3, 2018

These changes are fine, and I suggest you make the changes, transfer, set the version, then submit to CRAN with the new URLs in DESCRIPTION, so that the accepted version is the same as what is on CRAN. (We are started to roll out versioned peer-review badges so that it is clear what the version was at acceptance.) You are free to continue to develop and make both small and big changes after acceptance, we have a section of our dev guide dedicated to ongoing changes and maintenance: https://ropensci.github.io/dev_guide/evolution.html

@stefaniebutland
Copy link
Member

Hello @richelbilderbeek and thank you for all your work on having babette go through peer review.

We'd also love a blog post about your packages, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. (https://ropensci.org/blog/).

We would love to help get more eyes on your work. This link - https://ropensci.org/tags/software-peer-review/ - will give you examples of blog posts by authors of peer-reviewed packages so you can get an idea of the style and length you prefer. Technotes are here: https://ropensci.org/technotes/.

Here are some technical and editorial guidelines for contributing a blog post: https://github.com/ropensci/roweb2#contributing-a-blog-post.

If you are interested in contributing a blog post, please let me know what timing might work for you. Happy to answer any questions.

@richelbilderbeek
Copy link
Member Author

@stefaniebutland: I will happily create a blog post and am already working on it...

@stefaniebutland
Copy link
Member

@richelbilderbeek Do you have an idea of when you might submit a draft for review? Then we can choose a potential publication date.

@richelbilderbeek
Copy link
Member Author

@stefaniebutland: I can submit a draft before the Friday 25th of January, 19:00 (Amsterdam time zone, UTC +1). Would that work out?

@stefaniebutland
Copy link
Member

Sounds great! I'll mark my calendar and we can confirm a publication date at that time. Thank you.

@richelbilderbeek
Copy link
Member Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants