Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement composable auspice config file for augur export v2 #298

Open
jameshadfield opened this issue Jun 18, 2019 · 15 comments
Open

implement composable auspice config file for augur export v2 #298

jameshadfield opened this issue Jun 18, 2019 · 15 comments
Assignees
Labels
v6 Issues relating to augur v6 / auspice v2

Comments

@jameshadfield
Copy link
Member

Please read nextstrain/seasonal-flu#5 for background information.

We need to decide on the config file structure, which can then be built into augur export after #297 has been closed.

@tsibley tsibley removed their assignment Jun 19, 2019
@emmahodcroft
Copy link
Member

emmahodcroft commented Jun 20, 2019

Apologies if this isn't the correct place, but here are some issues identified by @jameshadfield and I where we would like different functionality for running with/without a config file.

The goal, as I see it, is to tackle one of the most common problems people have: When running tutorials and from ppl trying out on their own, they add a trait to their run, but don't change the config file - then they have no idea why the trait isn't showing up.*

We should have a very easy way to run export command-line-only where everything you supply is transparent. It will be less fancy, but it will allow people to easily visualise what they've run. Then, if they want to 'prettify' it, they can explore also including a config file.

Any extended config stuff (flu-specific, etc) should sit on top of the 'default' config file which can be supplied, so it's not required. This should make it fairly easy for those familiar with the current config file use to adapt to the new use.

I would propose (if possible) the general rule is 'override command-line if present in config file', but that a 'complete' config is not necessary if something has been supplied via command-line. So, in the config I maybe just supply colorings with my trait data (see below), but as I've given the title and maintainers via --title and --maintainers, these do not need to be in the config file. If I were to supply title and maintainers in the config as well, the ones in the config would override the command-line. This will make it easier for people to transition from command-line to config file as-needed (without having to write up a whole config file just to specify a title for one trait).

Traits

Keys, Legend, & Menu Items
Current: In config.json color_options users can specify menuItem (what shows up in Color By drop-down), legendTitle (what appears at top of legend box), and key (the column header in metadata file of the trait being referred to).
Future:
Without config: There is no way to specify a menuItem (title) or legendTitle. If a config file isn't passed in, whatever the key is (column header of the trait) will be used for both. This will allow simple, though not ideal, functionality for beginners/simple runs.
With config: If a config file is supplied, format should be very similar to how it's used currently, modified for the new schema:

   "colorings": { 
      "age_range1": {
         "title": "Age range 1",
         "type": "discrete",
    },
...

Note that there is no property for legendTitle - this will be same as title. key is not needed as it is the property name (is this correct, @jameshadfield ?)
What needs to be done: Modify augur export v2 so that if config is supplied, values passed in under 'colorings' in that file override the default behaviour of making key the title and auto-guessing the type.
Should we also allow users to specify domain in this part of the config?

Excluding Traits from Color By
Current: In export v1, only traits listed in the config file are Color Bys in auspice. In export v2, all traits in the traits.json (output from augur traits) are automatically Color Bys, plus anything passed in with export v2 argument --extra-traits (from metadata file). There is no way to exclude a trait run in augur traits from being a Color By.
Future:
Without config: As current/above. All traits in traits.json and --extra-traits will be Color Bys, with no way to exclude.
With config: As export v1 - only traits colorings in the supplied config file will be Color Bys. But perhaps excluding gt (genotype) and num_date - these should not have to be included in colorings, and will automatically be Color Bys if the information is present?
What needs to be done: Modify augur export v2 so that if config is supplied, only traits listed in colorings will be made Color Bys. gt and num_date should not have to be in colorings in the config file, and should be included automatically unless the data is missing. (This may be default behaviour right now already.)

Excluding Filter Bys
Similar to the above (but less pressing) - every trait is also automatically a Filter By. Propose that in export v2 with config, only traits listed in filters will be Filter By options (this is as it works in export v1 currently).

Other notes:

  • How do we currently guess 'type' for keys when no config supplied?
  • Should 'domain' ever be guessed or is this entirely handled by auspice?

*There was a guy at ABPHM who downloaded augur/auspice ran his own TB data in the time between my talk finishing and the end of the next talk. (Amazing) Even he didn't realise how to modify the config file to make his stuff show up.

@emmahodcroft
Copy link
Member

We should probably work to include footer text in the new unified JSON and thus have a way to specify this (perhaps only via config file, since could be large?) in augur export v2. See issue 707 in auspice.

@emmahodcroft
Copy link
Member

I've worked a bit on how to guess 'type' for key when it's not passed in a config (which is impossible at the moment). I'll push these soon. But it is not complete as it should be, probably.

@tsibley
Copy link
Member

tsibley commented Jun 20, 2019

This is all great, @emmahodcroft! Thanks for all the work you've done thinking about this. :-)

There's only one thing I'd like to suggest we do differently:

I would propose (if possible) the general rule is 'override command-line if present in config file', but that a 'complete' config is not necessary if something has been supplied via command-line. So, in the config I maybe just supply colorings with my trait data (see below), but as I've given the title and maintainers via --title and --maintainers, these do not need to be in the config file. If I were to supply title and maintainers in the config as well, the ones in the config would override the command-line.

This is the opposite of the Unix conventions for commands which take a config file. The conventional order of overriding things goes like this, from lowest priority to highest priority:

  1. Defaults (e.g. default Auspice config as known to augur export)
  2. Config file(s) (e.g. additional user-specified config)
  3. Command-line options
  4. Environment variables

This is often accomplished by building a config data structure for each level (1-4) and then doing a deep merge starting with 1, then overlaying 2, etc. For example, the command-line option processing would build a partial config data structure which would then be merged over levels 1+2.

@emmahodcroft
Copy link
Member

Ok, I think this is fine - in this case I'd suggest some messages to remind the user of this - maybe at the end of export if both things like --title/--extra-traits and a config file have been specified, we can print a message reminding the user that command-line will override config. (Which actually would probably be sensible whichever way the override works.)

@tsibley
Copy link
Member

tsibley commented Jun 20, 2019

Agreed about the usefulness of warnings like that!

@jameshadfield
Copy link
Member Author

Note that there is no property for legendTitle - this will be same as title. key is not needed as it is the property name (is this correct, @jameshadfield ?)

Yup, that's it! -- https://github.com/nextstrain/augur/blob/master/augur/data/schema.json#L156

This is a great summary of how the command line args & config should interact. @huddlej what are your thoughts on the config file design & composability? If we can agree on something before next wednesday I hope we can implement it as part of #297

@huddlej
Copy link
Contributor

huddlej commented Jun 21, 2019

I completely agree with @tsibley's recommended approach for composable config files and would propose the same thing for this current issue.

@jameshadfield and @emmahodcroft you bring up a lot of important details about how the config files need to be structured and each of these likely warrant their own separate issue so they can be addressed by whoever is most able to work on them. I am way behind on the changes to the auspice schema and will need to catch up a little, but I don't think the specifics prevent me from writing up a proposed interface for augur export that could address the current issue.

Can we plan to:

If this plan won't work because command line options to override config settings is tightly coupled to this composable config feature, we could consider moving the command line arguments component of #297 into this issue. My hope generally is that an individual issue can be addressed by one or two people at most by one or two commits, so each of us can take on bite-sized pieces of work that address a bigger feature.

@rneher
Copy link
Member

rneher commented Jul 4, 2019

Looking at the discussion here and in the flu repo, it seems that the consensus option is to allow multiple config files via --auspice-config cfile1.json cfile2.json. what is unclear to me is which one takes precedence both files contain the same field? My intuition would be that later overwrites earlier. Within augur, this would be super simple to just update dicts.

@trvrb
Copy link
Member

trvrb commented Jul 4, 2019

This would be exactly my expectation. The last file passed in takes precedence.

And fields passed directly in the command line (like --maintainer) would take precedence over the same fields in the config file.

@tsibley
Copy link
Member

tsibley commented Jul 5, 2019

Yep, those were my expectations.

I do think that the config overwriting should be a deep merge of the data structures, not just a top-level update.

@rneher
Copy link
Member

rneher commented Jul 5, 2019

yes, hadn't thought of nested structures here. something like this should do though, right?
https://stackoverflow.com/questions/20656135/python-deep-merge-dictionary-data

@tsibley
Copy link
Member

tsibley commented Jul 5, 2019

Yep, the recursive function to do a dictionary merge is often pretty short to write. Many languages have the function in their standard library or in a canonical package already. One option for Python would be deepmerge (though I think we'd want to customize its default append-to-list behaviour).

@jameshadfield jameshadfield added the v6 Issues relating to augur v6 / auspice v2 label Jul 18, 2019
@jameshadfield jameshadfield changed the title Define (composable) auspice config file for unified JSON production implement composable auspice config file for augur export v2 Jul 30, 2019
@emmahodcroft
Copy link
Member

I think most of the issues here are resolved by the latest PR on this - but it's a rather long convo so I could be missing something.

One decision that is left (from above, and generally) is whether we want to have this 'extra' config option (and/or, do we want this now, in v6) - where more complicated options can be imported in a second config file. We haven't spoken about this much, and I'm afraid I don't have a good idea of what kind of things were meant to be implemented here?

If it's just that the second config is very similar to the first but has a few different options, that shouldn't be too hard to implement, I imagine.
(Ex: a 'main' config for flu that specifies the colorbys, geo, panels, then specific ones for builds that give different titles)

But was there more?

@jameshadfield
Copy link
Member Author

One decision that is left ... is whether we want to have this 'extra' config option ... where more complicated options can be imported in a second config file. We haven't spoken about this much

As this isn't yet started, let's implement this in a future release (but please reopen this issue if you think it can be implemented now!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v6 Issues relating to augur v6 / auspice v2
Projects
None yet
Development

No branches or pull requests

6 participants