augur treetime is too complex #133
Comments
I strongly agree with what you've proposed @trvrb. Unlike other augur commands, the treetime command is modal and bundles several things into one. It'd be good to unbundle those. |
Oh... and I thought of two other things that bothered me here:
|
that was the initial idea, but that doesn't work since rerooting and polytomy resolution changes topology/adds/removes nodes.
yes and no. timetree inference via treetime will in most cases do ancestral inference. One could do it twice, but then it would be crucial to do it after timetree inference.
initially node_data.json was the only one. agree that the name doesn't make much sense anymore. but we also don't want a separate file for each individual attribute. if you call it dates, then branch lengths should be somewhere else and we get a massive proliferation of files.
but this only works if we basically hide all method specific commands from the user. Otherwise, we get a big confusion with the hundreds of options that these different tools take. In this spirit, the current treetime module is just a wrapper around treetime that happens to produce output compatible with augur. |
I agree with @rneher. Also, moving I think given what treetime does, it's acceptable that it has a lot of arguments, but I'd agree that they're not so straightforward at the moment and could be made clearer, possibly by renaming some, better help info, and mostly, good documentation somewhere. I also agree that timetree/treetime is confusing. I mess this up on the regular 🙃 |
I think this is a false dilemma. It's possible to expose only the most common method-specific options, if any, and also provide a general mechanism for passing any arbitrary option into specific methods without enumerating all of them. There are several approaches we could take. Using
Again, there are other ways to approach the syntax here, but the general idea is the same.
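As a rough illustration of the general idea (not the syntax proposed in this comment, which did not survive here), one way to forward arbitrary options is to treat everything after a bare `--` as belonging to the underlying method. The parser name, the `--tree` option, and the function `parse_with_passthrough` are all hypothetical:

```python
import argparse


def parse_with_passthrough(argv):
    """Split argv at the first bare "--".

    Everything before it is parsed as the wrapper's own options;
    everything after it is returned untouched so it can be handed
    to the underlying method. All names here are hypothetical.
    """
    if "--" in argv:
        split_at = argv.index("--")
        own, passthrough = argv[:split_at], argv[split_at + 1:]
    else:
        own, passthrough = list(argv), []

    parser = argparse.ArgumentParser(prog="wrapper-sketch")
    # Only the most common options are exposed explicitly.
    parser.add_argument("--tree", required=True, help="input tree file")
    args = parser.parse_args(own)
    return args, passthrough
```

With this shape the wrapper documents only a handful of options, while a call such as `wrapper-sketch --tree t.nwk -- --some-option value` still reaches every option the wrapped tool understands.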
On what order of magnitude would "massive" be? |
the number of files is ultimately not the issue (would be probably just 2-3 more). The more central issue is how to address nodes across steps. |
Nod. It didn't seem like the number of files would actually be an issue, that's why I asked. For the core issue of internal nodes, it seems that node annotations must happen after the topology is fixed, whichever step that may be. Is that correct? |
what we could do are simple consistency checks on load:
|
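The specific checks listed in this comment were not preserved, so the following is only one plausible example of a consistency check on load: comparing the node names in a tree against the names used as keys in a node annotation file. The function name `check_node_names` and the flat `{node name: attributes}` layout are assumptions.

```python
def check_node_names(tree_names, node_data):
    """Compare tree node names with the keys of a node-data mapping.

    Returns two sorted lists: names present in the tree but missing
    from the annotations, and annotation keys that match no node in
    the tree. Both empty means the two inputs are consistent.
    """
    tree_set = set(tree_names)
    data_set = set(node_data)
    missing = sorted(tree_set - data_set)
    unknown = sorted(data_set - tree_set)
    return missing, unknown


def assert_consistent(tree_names, node_data):
    """Raise a ValueError describing any mismatch (hypothetical helper)."""
    missing, unknown = check_node_names(tree_names, node_data)
    if missing or unknown:
        raise ValueError(
            f"node names do not match: missing={missing}, unknown={unknown}"
        )
```

A mismatch here would typically mean the annotations were computed before a step that changed the topology, such as rerooting or polytomy resolution.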
Sounds reasonable, @rneher, except for one bit:
Overwriting an input file in-place is a huge taboo. Even with an all caps warning, it could easily be a nasty surprise. I strongly recommend not doing this, but finding another way of allowing on-the-fly node naming if a tree without them is provided. This could be, for example, But also, |
I agree. (but hey, gzip does it...)
yes, I was thinking something similar.
it sort of strikes me as clunky to have such a book-keeping step explicit. |
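For concreteness, on-the-fly node naming that never touches the input file could be sketched as below. This is a simplified illustration, not what augur or treetime does: it assumes plain Newick without quoted labels, comments, or whitespace after `)`, and it numbers nodes in the order their closing parentheses appear, which may differ from treetime's numbering. The names `label_internal_nodes` and `write_labelled_copy` are hypothetical.

```python
import os


def label_internal_nodes(newick, prefix="NODE_", width=7):
    """Return a copy of a Newick string with unnamed internal nodes labelled."""
    out = []
    counter = 0
    for i, ch in enumerate(newick):
        out.append(ch)
        if ch == ")":
            nxt = newick[i + 1] if i + 1 < len(newick) else ";"
            # A ")" followed directly by ":", ",", ")" or ";" has no label.
            if nxt in ":,);":
                out.append(f"{prefix}{counter:0{width}d}")
                counter += 1
    return "".join(out)


def write_labelled_copy(in_path, out_path):
    """Write the labelled tree to a new file; refuse to overwrite the input."""
    if os.path.abspath(in_path) == os.path.abspath(out_path):
        raise ValueError("refusing to overwrite the input tree in place")
    with open(in_path) as fh:
        labelled = label_internal_nodes(fh.read().strip())
    with open(out_path, "w") as fh:
        fh.write(labelled + "\n")
```

Nodes that already carry a label are left alone, so running the function on an already labelled tree returns it unchanged.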
re: creating an
For what it's worth, I like the requirement for "new" layers to simply write a flat JSON with the |
Okay. Latest thoughts... It's clear that we need to support at least 3 big use cases.
The current setup is actually not so bad for these three aims.
I think the core operations could be made more obvious by a simple rename of In one example... there was the issue of wanting to add traits to an ML tree (option 2 above). In this case, I'd recommend to enforce I know this is basically exactly as things currently exist. Would just work on the exact interface to |
an argument in favor of pulling sequences out of this: it would align the file I/O pattern between the vcf/fasta workflows. I am fine with splitting sequence reconstruction from rerooting/timetree if this makes it more transparent. the computational overhead is small. |
392c17c checks whether names in |
I hadn't noticed that VCF version of Looking at Edit: I see... The difference is that |
I agree this needs streamlining. |
I'm going to close this now given that #145 is merged. There were some issues surfaced above (like passing arguments to |
@rneher ---
I'm of the opinion that the augur treetime module is overly complex. If you just look at the arguments we see 28 lines and significantly more complexity than the other modules. Much of this complexity seems necessary, but I would try to tame things a bit.

I had initially been reading the command as augur timetree and expecting it to do the one Unix-y task of taking a ML subs tree and outputting a timetree. I would then have augur ancestral to do ancestral state inference. This streamlines both modules and surfaces the hidden ancestral state functionality.

Right now, after augur treetime is run there will be node_data.json with things like sequence, mutations, clock_length and mutation_length. I like the pattern where augur export stitches together multiple JSON files keyed off of NODE_0000071, etc... node_data.json is opaque, but if augur timetree produced node_dates.json and augur ancestral produced node_nt_seqs.json it would be immediately clear what sort of data it is we're talking about. This would also make it clearer that to run augur translate you need node_nt_seqs.json as input, which would produce node_aa_seqs.json as output.

Also, there is an issue with BEAST tree input. It should be obvious how to proceed from BEAST Newick. I don't think having one function like this is obvious, especially when almost everyone will think that treetime is just to get a clock (the name doesn't suggest at all that it can do ancestral state reconstruction).

Along these lines, what about having augur tree label nodes in the output Newick as NODE_0000071, etc... to prevent any issues downstream? Then augur ancestral could be run without having to run treetime first to get labels.

Also, checking with @emmahodcroft, @huddlej, @jameshadfield and @tsibley to see how they feel about augur treetime vs augur timetree / augur ancestral
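
The stitching pattern described in this post, where several per-node JSON files are combined by node name, might look roughly like the following. It is a sketch only: the function names and the flat `{node name: {attribute: value}}` layout are assumptions for illustration, not the actual implementation of augur export.

```python
import json


def merge_node_layers(layers):
    """Merge several per-node annotation mappings keyed by node name.

    Each layer maps a node name (e.g. "NODE_0000071") to a dict of
    attributes. Later layers add to, or override, earlier ones.
    """
    merged = {}
    for layer in layers:
        for node_name, attrs in layer.items():
            merged.setdefault(node_name, {}).update(attrs)
    return merged


def load_and_merge(paths):
    """Load each JSON file (assumed to be a flat mapping) and merge them."""
    layers = []
    for path in paths:
        with open(path) as fh:
            layers.append(json.load(fh))
    return merge_node_layers(layers)
```

Under this scheme a dates file and a sequences file would each contribute their own attributes to the same node entry, which is what makes descriptive file names workable without changing how the final export is assembled.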