Feature request: tip dates functionality #27

richelbilderbeek · 2018-10-15T04:21:53Z

Great resource! Do you have plans for a function to import tip dates?
Thank you!

richelbilderbeek · 2018-10-15T04:29:50Z

If I have a user with a use case: definitely! Could I use you for this?

If yes, I'll take a peek at how to implement this on Friday, October 19th 2018.

If it is easy, I may add it that day. If it is too hard, I will give priority to getting babette accepted by rOpenSci and then to be put on CRAN.

richelbilderbeek · 2018-10-20T13:30:25Z

No use case, so worked on getting babette accepted by rOpenSci.

If someone volunteers for a use case, let me know.

richelbilderbeek · 2018-10-26T09:51:19Z

Peter Durr has volunteered to help. 🎉

richelbilderbeek · 2018-11-11T06:18:27Z

Email from Peter Durr and example files:

[...] I appreciate that you probably only wanted some example files.
But when I started looking at the problem, I then realized that it was quite complex, due to the challenges of getting the date file working.

Anyway, attached are three files which will give you - I trust - a good example on which to base the tip dating function within your Beautier library:

a fasta alignment file of 58 sequences of an important virus that causes epidemics in chickens ("Newcastle disease virus"): G_VII_pre2003_msa.fasta
a tab-separated list of the fasta headers in the alignment file plus the year of isolation of the virus: G_VII_pre2003_dates_4.txt
an XML file generated from BEAUti using the above two files - to check the data was OK:

The challenge I found was that creating the date file for BEAUti is very crude.

To upload the dates and build the height (the number of decimal years before the most recent sample), the user must upload a file with two columns/fields:

the name field, which must be exactly identical to the sequence header in the fasta file.
the date field, which must follow the name field, separated by a TAB.
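The two-column file and the height calculation described above can be sketched as follows. This is an illustration in Python only, not BEAUti's or beautier's implementation (beautier is an R package); the function names `read_tip_dates` and `tip_heights` are hypothetical, and the dates are assumed to already be decimal years.

```python
import csv
import io


def read_tip_dates(text):
    """Parse 'name<TAB>date' lines into a dict of name -> decimal year."""
    dates = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            # Unlike a silent failure, say which line is not tab-separated
            raise ValueError(f"expected 2 tab-separated fields, got: {row!r}")
        name, date = row
        dates[name] = float(date)
    return dates


def tip_heights(dates):
    """Height of a tip: years between it and the most recent sample."""
    newest = max(dates.values())
    return {name: newest - year for name, year in dates.items()}


# Hypothetical sequence names, for illustration only
example = "seq_a\t1998\nseq_b\t2002.5\nseq_c\t2000\n"
heights = tip_heights(read_tip_dates(example))
```

The most recent sample gets height 0 and older samples get positive heights. A line that uses spaces instead of a TAB raises an error that names the offending line.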

This very restrictive nature of the permissive upload file means that it often fails - with no error message of why it failed! This is especially a problem with the requirement for tab-separation, as BEAUti does not accept a simple TSV export from Excel. Instead I needed to run it through various steps to get it to work - thus the file has a "4" in its name!

In practice, because uploading a separate date file is so hard, all of the tutorials on producing a time-tree in BEAST I have seen use the tip-dating tool which extracts the date from the fasta header.

This does have the advantage that the fasta sequence file and the date file will always be in the correct order, which is a potential problem if the two files are uploaded separately. However, this then puts the effort back into producing a complex header - with all the risk of introducing errors when manipulating the concatenation.

I am also guessing that implementing this complex interface using R functions will be a lot of work for you, as well as needing a complex R function with lots of arguments.

So thinking it through, I would like to recommend the following for babette/beautier:

The input date file:

The preferred (maybe only) way beautier accepts date input will be by a separate file upload - as this avoids the need to implement a tip-date parsing tool
The date format is to be restricted to dd/M/yyyy, M/yyyy or yyyy
The date upload file must contain two comma-separated columns: the sequence ID and the date.
The sequence ID must be contained in the fasta file header, but the sequence ID (in the date file) does not have to be identical to the header
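As an illustration of the three proposed date formats, the sketch below converts each of them to a decimal year. It is written in Python for illustration and is not beautier's code. The function name `to_decimal_year` is hypothetical, and two conventions are assumptions not stated in this thread: a bare year or month maps to its first day, and the fraction of the year is (day of year - 1) divided by the number of days in that year.

```python
import calendar
from datetime import date


def to_decimal_year(text):
    """Convert 'dd/M/yyyy', 'M/yyyy' or 'yyyy' to a decimal year."""
    parts = [int(p) for p in text.strip().split("/")]
    if len(parts) == 1:
        day, month, year = 1, 1, parts[0]  # assumption: start of the year
    elif len(parts) == 2:
        day = 1  # assumption: first day of the month
        month, year = parts
    elif len(parts) == 3:
        day, month, year = parts
    else:
        raise ValueError(f"unsupported date format: {text!r}")
    d = date(year, month, day)  # raises ValueError for impossible dates
    days_in_year = 366 if calendar.isleap(year) else 365
    return year + (d.timetuple().tm_yday - 1) / days_in_year
```

For example, `to_decimal_year("2001")` and `to_decimal_year("1/2001")` both give the start of 2001, while an impossible date such as `"31/2/2001"` raises a `ValueError`.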

This I think will make the preparation of the date file very easy, but more importantly it would allow for some validation at import/parsing:

the number of dates in the date file corresponds to the number of sequences in the fasta file. Validation by counting the number of records in each file. Error message example: "number of sequences: 58; number of tip dates: 61"
all of the dates follow one (and only one) of the three allowable formats. Validation: each of the entries in the second column is checked against the three formats to confirm a permissible format has been entered. Error message example: "Two date formats detected - only one date format is allowed"
each ID in the date file can be matched with the corresponding fasta sequence header. Validation: the date ID is used to query the fasta header and to confirm it is present within it. Error message example: "The following date IDs could not be matched to a sequence: ........."
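The three checks above could be sketched as follows. This is a Python illustration under stated assumptions, not what beautier does: the names `detect_format` and `validate_tip_dates` are hypothetical, the inputs are assumed to be already read from the two files, "matched" is taken to mean that the ID occurs as a substring of a header, and the message for an unrecognized format is an addition of this sketch.

```python
import re

# The three allowed formats from the proposal
DATE_FORMATS = {
    "dd/M/yyyy": re.compile(r"\d{1,2}/\d{1,2}/\d{4}"),
    "M/yyyy": re.compile(r"\d{1,2}/\d{4}"),
    "yyyy": re.compile(r"\d{4}"),
}


def detect_format(date_text):
    """Return the name of the matching format, or None if none matches."""
    for name, pattern in DATE_FORMATS.items():
        if pattern.fullmatch(date_text):
            return name
    return None


def validate_tip_dates(fasta_headers, tip_dates):
    """fasta_headers: list of str; tip_dates: list of (id, date) pairs.

    Returns a list of error messages; an empty list means all checks pass.
    """
    errors = []
    # Check 1: one date per sequence
    if len(fasta_headers) != len(tip_dates):
        errors.append(
            f"number of sequences: {len(fasta_headers)}; "
            f"number of tip dates: {len(tip_dates)}"
        )
    # Check 2: exactly one of the allowed formats is used throughout
    formats = {detect_format(d) for _, d in tip_dates}
    if None in formats:
        errors.append("Unrecognized date format detected")
        formats.discard(None)
    if len(formats) > 1:
        errors.append(
            "More than one date format detected - "
            "only one date format is allowed"
        )
    # Check 3: every ID occurs in some fasta header (substring match)
    unmatched = [
        i for i, _ in tip_dates if not any(i in h for h in fasta_headers)
    ]
    if unmatched:
        errors.append(
            "The following date IDs could not be matched to a sequence: "
            + ", ".join(unmatched)
        )
    return errors


# Hypothetical headers and IDs, for illustration only
headers = ["AB123|chicken|China", "CD456|chicken|Korea"]
ok = validate_tip_dates(headers, [("AB123", "1998"), ("CD456", "2001")])
bad = validate_tip_dates(
    headers, [("AB123", "1998"), ("CD456", "3/2001"), ("ZZ999", "2002")]
)
```

Collecting all messages in a list, rather than stopping at the first failure, lets the user fix the date file in one pass.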

To make the above practical, I have attached as the fourth file a CSV date file exported from Excel containing just the Genbank accession ID and the year (date).

[...]

richelbilderbeek · 2018-11-11T07:51:05Z

This is very helpful!

I will add an argument called 'tip_dates' that requires a data frame. Let the parsing be done by the caller 🌈

[edit: will follow Peter's idea to use a filename instead]

richelbilderbeek · 2018-11-12T09:19:44Z

Came halfway, will finish on the 16th (p = 25%), 23rd (p = 50%) or 30th (p = 99%) of November.

richelbilderbeek · 2018-11-30T14:33:16Z

Done. Not tested to the bone, but I was able to reproduce the file supplied by Peter.

richelbilderbeek mentioned this issue Oct 15, 2018

Tip dates functionality? ropensci/beastier#18

Closed

richelbilderbeek changed the title ~~Feature request: Tip dates functionality~~ Feature request: tip dates functionality Nov 11, 2018

richelbilderbeek pushed a commit to ropensci/beautier that referenced this issue Nov 11, 2018

Expose ropensci/babette#27

1f97d68

richelbilderbeek pushed a commit to ropensci/beautier that referenced this issue Nov 11, 2018

'create_beast2_input' has an extra argument: 'tipdates_filename', progress ropensci/babette#27

74ed642

richelbilderbeek pushed a commit to ropensci/beautier that referenced this issue Nov 23, 2018

'state' section creates a statenode for the clock rate when using tip dating, progress ropensci/babette#27

49a23b0

richelbilderbeek pushed a commit to ropensci/beautier that referenced this issue Nov 23, 2018

'state' section correct for tip-dating, progress ropensci/babette#27

2484e04

richelbilderbeek pushed a commit to ropensci/beautier that referenced this issue Nov 30, 2018

Adding tip dating, progress ropensci/babette#27

374c04c

richelbilderbeek pushed a commit to ropensci/beautier that referenced this issue Nov 30, 2018

correct 'distribution' section for tip dating, progress ropensci/babette#27

0601f91

richelbilderbeek pushed a commit to ropensci/beautier that referenced this issue Nov 30, 2018

Correct 'operators' XML section when using tip-dating, progress ropensci/babette#27

a4d8db8

richelbilderbeek pushed a commit to ropensci/beautier that referenced this issue Nov 30, 2018

Reproduce Peter Durr's tip dating file, progress ropensci/babette#27

3d23e5b

richelbilderbeek pushed a commit to ropensci/beautier that referenced this issue Nov 30, 2018

Reproduce Peter Durr's tip dating file completely, progress ropensci/babette#27

f1220b2

richelbilderbeek closed this as completed Nov 30, 2018
