Expanded configuration mechanisms #147


Closed
15 of 17 tasks
sophacles opened this issue Jun 13, 2014 · 70 comments

@sophacles


Overall TODO generated by huge arse discussion thread.

  • Add new concepts doc on configuration, including a clearly marked hierarchy definition, and a big WARNING about how multiple files in same location w/ diff extensions are not merged.
  • Make sure to clarify .yaml, not .yml.
  • Probably also needs an update to the CLI docs section re: flags and env vars (new subsection). Punting on user-driven config updates that aren't env vars; env vars have their own section in the config docs, so no need to duplicate.
  • Re: env vars, need to figure out best way for user-driven config settings to work too, eventually punting
  • Brief mention of concepts doc in the tutorial - actually, I don't see an elegant way to work it in; feels best just as a standalone regular concepts doc for now.
  • Make sure API in docs gets updated to be consistent & 'modern' - e.g. obj['foo.bar'] should turn into obj['foo']['bar'] or ideally with the option of obj.foo.bar.
  • Update concepts/contexts to adapt, possibly paring it way down or removing entirely, tl;dr it's just about explaining how contexts perform shared state.
    • The 'nested collection configs get merged' explanation could still live here but probably lives better in the new config page, or the namespacing page if it exists.
  • Vendorize etcaetera
  • Use its Config object in place of Context's existing inner dicts, get tests passing again referencing it (it's dict-like).
    • Decide whether the current run/general/etc splits become sub-config objects or just nested namespaces within the primary config object
    • Leaning towards the latter so people can override them everywhere
  • Add new CLI flags for setting config options (primarily -f or whatnot for "specific config file")
  • Implement hierarchy re: multiple conf files + env vars + cli flags + in-module configs
  • Conf file loading of different extensions at the same 'location' is exclusive - no merging of eg /etc/invoke.yaml and /etc/invoke.json.
  • Debugging of conf option overrides must be easy
    • either via a no-op CLI flag showing what the tree would be for a given task module (or task invocation?), or
    • via regular debug output (ideally + a no-op option)
    • really, those go to the same place
  • If not covered in the previous point: it must be easy to troubleshoot stupid typos in config file paths.
    • e.g. if you accidentally created a /etc/invok.yaml, or fatfingered the arg to invoke -f <path> - it should be clear when running with -d which config files it attempted to load and which ones failed to load.
    • Main problem is that for the first 3-4 levels we have 3x file paths we end up searching for, so it will be somewhat verbose :( but no obvious solution offhand, better to add it to debug output (or some other level?) than try to be clever.
  • Update invocations to use this new setup instead of dotted names - they still technically work but are invalid for eg env var translation.
  • Context objects (& subclasses like Fabric2's Connection) need a clean API for this configuration stuff - either Context directly grows all the config-related methods (perhaps becoming an etcaetera.Config subclass itself?) or it accepts some config object (again, perhaps etc.Config) as an instance member.
    • The current driving need here is actually testing related, I need to inject a config value override or two in Connection tests.
  • Fill in any remaining skipped tests related to this feature set.
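To illustrate the "modern" access style asked for in the API bullet above (obj['foo']['bar'] with optional obj.foo.bar), here is a minimal sketch; the AttrDict name and implementation are invented for illustration and are not Invoke's actual class:

```python
# Hypothetical sketch only - not Invoke's real config object.
class AttrDict(dict):
    """Dict whose string keys are also readable as attributes, recursively."""

    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name)
        # Wrap nested plain dicts on the fly so chained access works.
        if isinstance(value, dict) and not isinstance(value, AttrDict):
            value = AttrDict(value)
        return value


config = AttrDict({"foo": {"bar": 42}})
assert config["foo"]["bar"] == 42  # item access
assert config.foo.bar == 42        # attribute access
```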

Original ticket description from sophacles

It would be nice to be able to pass context variables on the command line to invoke scripts. Something like:

> inv -X var1=val1 -Xvar2=val2 target --target1opt=val

And perhaps even allow targeting variables to specific namespaces, ala:

inv -X ns1.var1=val1 -Xvar2=val2 ns1.target  target2

for those cases where target2 should use the default for var1, but ns.target1 needs the overridden value.

There are of course concerns about how command line set options override things in a call to Collection.configure... take for instance this script:

@ctask
def task1(ctx):
    run("echo {var1}".format(**ctx.config))

@ctask
def task2(ctx):
    run("echo {var1} and {var2}".format(**ctx.config))

ns1 = Collection(task1)
ns1.configure({"var1": "ns1var1"})
ns = Collection(ns1, task2)
ns.configure({"var1": "globalvar1", "var2": "globalvar2"})

Which is the right output:

option 1:

# inv -X ns1.var1=clival1 -X var2=clivar2 ns1.task1 task2
clival1
globalvar1 and clivar2

(note - because generally it's considered cli options override anything in-program)

option 2:

# inv -X ns1.var1=clival1 -X var2=clivar2 ns1.task1 task2
globalvar1
globalvar1 and clivar2

(note because of the way overrides work in invoke already)

I think everyone would agree that:

# inv -X var1=clival1 ns1.task1 task2
clival1
clival1 and globalvar2

is correct.
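The precedence these examples assume (outer-collection config, then inner-collection config, then CLI -X flags) amounts to a simple layered merge; the resolve function below is an illustrative sketch, not Invoke's implementation:

```python
# Sketch: later layers win on key conflicts, matching the precedence
# outer collection < inner collection < CLI -X overrides.
def resolve(outer, inner, cli):
    merged = {}
    for layer in (outer, inner, cli):
        merged.update(layer)
    return merged

# ns1's var1 beats the global one; CLI var2 beats everything.
result = resolve(
    {"var1": "globalvar1", "var2": "globalvar2"},  # ns.configure(...)
    {"var1": "ns1var1"},                            # ns1.configure(...)
    {"var2": "clivar2"},                            # inv -X var2=clivar2
)
assert result == {"var1": "ns1var1", "var2": "clivar2"}
```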

@bitprophet
Member

I haven't yet sat down and pondered how I want ye olde config override hierarchy to work here, but this is definitely part of it. Then there's also the possibility of env vars (#22) and config files, possibly multiple layers of those as well (system/global, per user, local to project, etc).

I'm already semi unhappy with how the existing config mechanism can be unintuitive and hard to reason about, so I'd prefer to really hash this out sometime with pros/cons instead of iterating. Would welcome more ideas if you have them, of course.

@sophacles
Author

Generally speaking.. (leaving aside for a moment the namespace stuff in inv) in unix land the hierarchy goes something like this (lowest priority to highest priority):

  • default value
  • general environment variable (e.g. $EDITOR)
  • configuration file (first global, then local - e.g. ~/.gitconfig is overridden by $project/.git/config)
  • program specific environment variables - e.g. $GIT_EDITOR
  • command line options.

Of course, some programs swap the config files with one of the environment variables - because strong consistency is not a unixy concept.

So I think in the large, that would be a good idea to go with as a start.

Now, about configs files, and how they override each other, and so on - for an extreme case of what could happen I like to refer to vim's runtimepath. It does (lowest precedence to highest):

  • /usr/share/vim (app defaults)
  • /etc/vim (system defaults)
  • ~/.vim (local settings)

(it also then has "after/" stuff for which it does in the reverse order, with all sorts of other exceptions and other rules (in vim do - :he runtimepath for the gory details). I think that's not the goal here :))

So about invoke... The way collection namespacing and config overriding works now is pretty nice (once the multiple paths to a collection issue is figured out that is). It's basically the same as the ones mentioned above...

  • submodule
  • importing module
  • importing importing module... etc

The common thing about all of these is they basically order precedence on a concept of "locality" or "immediacy". Effectively, precedence is given to the option that was most likely to have been changed most recently. In the invoke case this is "I imported already-written code, then called configure on a collection that had the imported stuff"... it's most likely I won't import a library that isn't written yet.

All that is a really long-winded way of suggesting this type of "how to set variables" precedence order:

  • module level settings
  • importing module settings
  • importing importing module settings... (etc)
  • /etc/invoke.conf
  • ~/.invoke set variables
  • ${DIRECTORY_CONTAINING_TASKFILE}/config.py (or whatever)[1]
  • environment vars
  • command line -f path/to/config.py
  • command line -X var=val options (with the semantics from above hashed out for namespace overrides)

[1]With the option of having ~/.invoke or /etc/invoke.conf specify a search path for relevant config files. Also note - then of course naming config files conf.py is not a good idea anymore.

Anyway that turned out longer than expected. hah!
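The precedence list above boils down to folding an ordered sequence of layers where later (higher-priority) layers win. A toy sketch, with invented layer contents:

```python
# Sketch of the lowest-to-highest merge order proposed above.
def merge_layers(layers):
    """Fold an ordered list of dicts; later layers win on conflicts."""
    result = {}
    for layer in layers:
        result.update(layer)
    return result

layers = [
    {"editor": "vi", "port": 22},  # module-level defaults
    {"port": 2222},                # /etc/invoke.conf
    {"editor": "emacs"},           # ~/.invoke
    {"port": 8022},                # environment vars
    {"editor": "nano"},            # CLI -X options
]
assert merge_layers(layers) == {"editor": "nano", "port": 8022}
```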

@bitprophet
Member

Hah, no, that's perfect - overall it obviously fits what I or any other longtime Unix user would have come up with if we'd spent a few minutes thinking about it. Thanks!

The only parts I have any quarrel with:

  • The 'local to tasks file config.py' - I'm not sure what this gives users over setting the equivalent values within their tasks.py instead? If it were a YAML or JSON file, maybe, but even then I wonder if it is worth the conceptual/implementation overhead.
  • The config.py's vs invoke.conf's - shouldn't those all be the same type/format? (and curious if you've thoughts on what format - personally I strongly prefer YAML despite its shortcomings, but barring that I'd use JSON; ini and configobj are both terribad at nested dicts.)

@matleh

matleh commented Jun 16, 2014

+1 for this - to be able to override configuration settings and for the use of JSON and/or (JSON-subset style) YAML for configuration files.

@sophacles
Author

Not sure why I put config.py instead of invoke.conf - I'm just going to blame "Friday afternoonism". I agree that they should all be the same format. (And no preference on what the actual format is).

As for the use case of task local invoke.conf vs having it built into the tasks.py: for me it comes down to separation of concerns - I just like having config details different than the code that executes them. It's not a strong technical argument, just a preference - in the end either way would be fine.

@bitprophet bitprophet changed the title Pass context variables on the command line. Expanded configuration mechanisms Aug 15, 2014
@bitprophet
Member

Repurposing this ticket to be a general config overhaul/expansion one. I need to start poking at this to support Fabric 2 development, where things like "storing the default SSH port and local username, then allowing overrides in various ways" come into play.

I skimmed tickets on Fabric's tracker and most PRs or issues mentioning the topic, and specifically JSON/YAML support (which I think is absolutely required) just tie such file loads into Fab 1's single level env dict.

I googled for Python libs dealing with this to see if we can avoid reinventing the wheel (tho as usual I worry that we'd have the usual fork/collaborate/fuckit problem trying to hone the inevitable 10% of functionality that we need to change).

I'll edit this post to add more links, but so far I've found:

@sophacles
Author

+1 for etcaetera but all of them look pretty good. (note: my vote is based only on random details in my first impression and is totally subjective. I have no arguments for or against any of them)

@bitprophet
Member

Cool, thanks @sophacles. Digging deeper into etcaetera now (tho partly just as it's first). For any of these, our specific needs seem to be:

  • Ability to handle arbitrarily nested config to match nested namespaces (unless we can make dotted names work well enough or whatnot)
  • At least one level of nesting to handle different areas of config (general, run() related, etc)
  • Strong control over hierarchy, as per earlier comments about flags vs env vs multiple config locations - we want it to work our way and not be foisted on us.
  • JSON support
  • YAML
  • environment variables
  • some way of manually injecting values from our CLI parser at a specific hierarchy level
  • (ideally/optionally) Python file support

@bitprophet
Member

Notes from etcaetera:

  • Typing the name is a pain :)

  • It interprets a single nested config file (eg YAML) as nested subdicts, which is good.

  • You can add additional config objects (each with their own set of adapters) as sub-configs; these do not get "merged" but are instead available as sub-objects via attribute. E.g.:

    config['foobar']
    config['foobar']['bizbaz']
    config.subconfig['whee'] # This comes from some other source, eg another conf file
    # NOT VALID
    config['subconfig']['whee']
    
    • This may or may not be useful for us depending how much we directly expose Etcaetera objects to our users.

So far it fits all of my previous bullet point needs except possibly the "inject our own CLI flag values" - I need to see if we can stick those in the middle or if they can only be added via the 'defaults' adapter. Optimistic tho.

EDIT: looks like the only builtin way to do that is the Overrides adapter which always comes last in the hierarchy, but that may be appropriate for CLI flags anyway. Gotta doublecheck against the above notes. We could also probably make a custom adapter, have to look into that.

@bitprophet
Member

There are no explicit docs on custom adapters, but it's pretty basic, the AdapterSet accepts any number of adapter objects between the defaults (always forced first in the deque) and overrides (always last) and as long as one implements a few basic "here's my settings" methods, should be easy to subclass. And again that's if Overrides don't work for us.
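A from-scratch sketch of the adapter-set shape described above - deliberately not written against etcaetera's real API, and with all class and function names invented - might look like:

```python
# Illustrative only: models "Defaults always first, Overrides always
# last, anything else in between" as described for the AdapterSet.
class Adapter:
    """Anything that can produce a flat dict of settings."""
    def load(self):
        raise NotImplementedError

class Defaults(Adapter):
    """Always applied first (lowest priority)."""
    def __init__(self, data):
        self.data = data
    def load(self):
        return dict(self.data)

class Overrides(Adapter):
    """Always applied last (highest priority), e.g. CLI flag values."""
    def __init__(self, data):
        self.data = data
    def load(self):
        return dict(self.data)

class FileStub(Adapter):
    """Stand-in for a File adapter; `contents` plays the parsed file."""
    def __init__(self, contents):
        self.contents = contents
    def load(self):
        return dict(self.contents)

def resolve_adapters(defaults, middle, overrides):
    """Defaults first, custom adapters in given order, Overrides last."""
    config = defaults.load()
    for adapter in middle:
        config.update(adapter.load())
    config.update(overrides.load())
    return config

config = resolve_adapters(
    Defaults({"port": 22, "user": "root"}),
    [FileStub({"port": 2222})],      # e.g. a parsed /etc config file
    Overrides({"user": "deploy"}),   # e.g. values from CLI flags
)
assert config == {"port": 2222, "user": "deploy"}
```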

@bitprophet
Member

Taking the above hierarchy list, here's how it could work with an etcaetera based Config object + adapters. Note: it might replace the existing namespace driven hierarchy, or it might augment it, I need to revisit the corner cases of that design to see if it works ok in what etc gives us.

  • module level settings - anything defined in Invoke itself in the default context/collection objects; could be an etc Defaults adapter obj
  • importing module settings - 'importing module' means a sub-collection, so that falls under the 'note' above. Could be we just take our existing "stuff from the namespace" and end up putting it all in the initial Defaults once we've internally collapsed it.
  • importing importing module settings... (etc) - ditto
  • /etc/invoke.conf - setting aside default format & multiple formats for now (tho see below) this would simply be the next adapter in the set, a File.
  • ~/.invoke set variables - again setting aside specifics, this would be another File adapter
  • ${DIRECTORY_CONTAINING_TASKFILE}/config.py (or whatever) - yet another File adapter
  • environment vars - an Env adapter
  • command line -f path/to/config.py - another File adapter
  • command line -X var=val options - this would be an Overrides adapter (or a custom adapter) generated out of our existing parser stuff. Presently, these values are (like the Collection namespace configs) self-merged into the core Context object, so this is another spot where we need to figure out how much we delegate to etcaetera.

@bitprophet
Member

Re: file formats, random thoughts:

  • Use of etc or the other libs lets us deal with Python module files, YAML and JSON roughly equally/transparently (etc's File adapter selects how to load data based on file extension, for example)
  • The easiest-implementation option is to say "we only support one type, we will seek out only config.<that type>, deal with it" and then there's no ambiguity about ordering.
  • That's not too friendly, and etc makes it easy to load more than one, so we could instead add adapters for each possible extension - i.e. the /etc/invoke.conf "item" becomes 2 or 3 adapters: File('/etc/invoke.yml'), File('/etc/invoke.json'), File('/etc/invoke.py').
    • HOWEVER the problem is if some poor sod ends up with >1 of those on the FS - due to etc's implementation (really - any implementation) there has to be some sort of order and there's no clear one right intuitive answer.
    • Feel it's best to go for practicality over purity (hi Zen of Python) so off the cuff I'd just go with yaml -> json -> py (if we offer py).
  • At first I thought etcaetera's Python-module support was single-level only (module level constants) but no, it happily consumes dict values. So Python ends up being just as flexible as YAML/JSON in terms of nesting; we might as well support all 3.
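A sketch of the extension-driven loading described above, using only the stdlib (so the YAML branch is stubbed out); the function name is invented:

```python
import json
import os

def load_config_file(path):
    """Parse a config file into a (possibly nested) dict by extension."""
    ext = os.path.splitext(path)[1]
    if ext == ".json":
        with open(path) as fh:
            return json.load(fh)
    if ext in (".yaml", ".yml"):
        # A real implementation would hand this to a YAML library.
        raise NotImplementedError("would use a YAML parser here")
    if ext == ".py":
        namespace = {}
        with open(path) as fh:
            exec(fh.read(), namespace)
        # Keep module-level names, including dict values, so Python files
        # are as nestable as YAML/JSON; drop dunders and privates.
        return {k: v for k, v in namespace.items() if not k.startswith("_")}
    raise ValueError("unsupported config extension: %s" % ext)
```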

@sophacles
Author

  • Some thoughts on file format things -

If it's decided that invoke should handle all the formats, then I agree there should be a strict preference ordering. That is: /etc/invoke.yml always wins; if it isn't present, then /etc/invoke.json always wins, and so on. (note: the actual ordering choice is irrelevant to me, the one I presented was an example only). When wearing my devops and/or sysadmin hats, I really really want this to be deterministic and simple.

This should be true any time invoke finds a file rather than being explicitly told "use this file".

Additionally, whichever format gets the "always wins" choice, should be the format distributed with the pypi and github packages for things like default values and primary example scripts. Secondary examples in the other formats can be provided in a contrib area, and maybe in one documented example each.

I can see the strong ordering being a bit confusing at first, and there being a bit of pushback and FAQ answering, but in the long run, I think it's far simpler than trying to deal with >1 "winning" config files and all the weird merging cases.
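The strict "first extension found wins, rest ignored" rule could be as small as this sketch (the suffix order shown is just an example, as noted above):

```python
import os

SUFFIXES = (".yaml", ".json", ".py")  # strict preference order (example)

def pick_config_file(prefix):
    """Return the single winning config file for a path prefix, or None.

    e.g. pick_config_file("/etc/invoke") checks /etc/invoke.yaml, then
    /etc/invoke.json, then /etc/invoke.py, and stops at the first hit.
    Later candidates are ignored entirely, never merged.
    """
    for suffix in SUFFIXES:
        candidate = prefix + suffix
        if os.path.exists(candidate):
            return candidate
    return None
```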

  • Some thoughts in general on the above discussion:
  • I want to retract my preference for ${DIRECTORY_CONTAINING_TASKFILE}/config.py in favor of an invoke approved method of Defaults(). See my next comment for my thinking...
  • I am beginning to wonder if we want an "include" directive for each of the supported formats. This way we could allow people like me, who have a large collection of tasks, modularized heavily, to not need a new full config file for different formats. For example, I use the exact same nginx and uwsgi deployment options/variables in development, testing, and deployment, so including my sophacles-nginx.conf would be nice in my scripts. (other devs have different development settings for their environment, so they would not want this... for example the guy to my left uses the same nginx options, but different uwsgi options in development). Generally speaking, this would probably simplify a lot of usage scenarios more than some of the other options discussed.

@sophacles
Author

Deeper on: ${DIRECTORY_CONTAINING_TASKFILE}/config.py being a potentially bad route...

  • The idea of an include directive in .conf files is better.
  • Having only a Defaults() object in any given tasks module also allows discoverability:
    • as sysadmin/devops person I know what I can do easily with inv --list, just like with per-task arguments
    • as a developer, it keeps everything clean, and since they are called defaults (semantics, I know, but sometimes it matters...) I know that they may change, and should actually work with that in mind, just like with kwargs.
  • It makes overrides of defaults by containing modules more difficult. (Conversely, overrides are discoverable by --list on the overriding taskfile too. Again, sysadmin and devops me is very happy about this thought, not to mention the zen of python stamp of explicit approval)
  • Speaking of the zen - I'm pretty sure not having a separate .conf is more zeny (particularly with the include directive mentioned above - explicit, simple, obviously right, etc)

@bitprophet
Member

[...] I agree there should be a strict preference ordering [...]

To clarify, do you mean that if there's both a /etc/invoke.yaml and /etc/invoke.json that we load invoke.yaml and do not load invoke.json at all?

I had been imagining we'd load anything that exists but allow yaml to 'win' over json where they conflict - which is what etcaetera's API makes easiest to do.

If you did mean the 'winning' should be more exclusive, I'm torn, I could see users being confused with either implementation. Leaning towards the exclusive option though - it is simpler to reason about as you said.

[...] whichever format gets the "always wins" choice, should be the format distributed with the pypi and github packages for things like default values and primary example scripts [...]

Excellent point, I agree. I've always (lately anyway) been of the opinion one should 'bless' a single option in any situation when there's multiple equally-valid options. Makes it easier for folks who have no strong pre-existing preference, and for us.

stuff about local-to-project conf files and includes

Random response thoughts:

  • Includes, in general, are always awesome. We as sysadmins know this :)
  • Having includes in this situation requires custom logic (and probably hacks to etcaetera) because JSON doesn't support it by default, and I don't know if YAML does either (tho it's more likely to.)
    • This could be a small or large stumbling block, depending. Off the cuff I'm thinking a "special" top level key like _include that we look for & honor, and if users needed to use that key themselves, tough noogies. (This approach is inspired by Chef role files, for better or worse...)
  • That said, I do not quite grok your meaning re: using Defaults objects in tasks modules instead of local-to-project conf files. Can you give a concrete example of what you're thinking?
    • Generally, local-to-proj conf files strikes me as a useful thing to have; they aren't strictly necessary because they serve identical function to in-tasks-module configuration (which may be what you mean?) but a lot of the time it's nice to have those defaults expressed in a non-Python config format, and this would enable that option.
  • I also don't get how you feel this ties into inv --list since currently that only displays the tasks that exist, not their config values.
    • Unless you mean the kwargs/flags specifically, which is a valid point - but those are only one part of what can be configured at this level (i.e. all kwargs can be considered "config options" but not all config options are, or should be, exposed as kwargs.)
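Back on the includes idea: the special top-level _include key floated above could work roughly like this sketch (the key name and merge semantics are provisional; the merge here is shallow, so the including file's keys fully replace conflicting included ones):

```python
# Hypothetical sketch of an "_include" directive handled in our own
# loading layer, since JSON has no native include support.
def expand_includes(data, loader):
    """Replace a top-level "_include" list with merged included configs.

    `loader` maps an include name to a dict (in real life it would read
    and parse a file). Included data merges first, so the including
    file's own keys win on conflict.
    """
    includes = data.pop("_include", [])
    merged = {}
    for name in includes:
        merged.update(loader(name))
    merged.update(data)
    return merged

# Example: a shared nginx/uwsgi config pulled into a personal config
# that overrides just the uwsgi section.
shared = {"sophacles-nginx.conf": {"nginx": {"workers": 4},
                                   "uwsgi": {"procs": 2}}}
result = expand_includes(
    {"_include": ["sophacles-nginx.conf"], "uwsgi": {"procs": 8}},
    shared.__getitem__,
)
assert result == {"nginx": {"workers": 4}, "uwsgi": {"procs": 8}}
```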

@sophacles
Author

Oh man, so I made a bunch of notes a couple weeks back, which I thought I had fleshed out into a comment here. Since there is no evidence of such a comment, I'll do it here. Hopefully that will help clarify all of the above. I'll also try and address the things specifically mentioned above. (Warning - this will probably be long).

The way I'm using invoke is this:

  • There are several python modules in a python package with some subpackages. They are grouped by concept. So within the package I have a subpackage called build, and a module for python, a module for java, a module for C and so on. The tasks in each module correspond to the relevant tooling - the maven calls for each package in java, the cmake (etc) commands for the C code, and so on. Similarly I have subpackages for install, deploy, test and so on. (Not necessarily organized by language, but by some scheme consistent within the subpackage).
  • Each python module exports a single invoke collection/namespace.
  • Each subpackage exports a "composite" collection (e.g. the build collection is composed of the java, python, c, etc collections)
  • There is a utils directory in our tree that contains "top-level" invoke scripts. These are a bit ad-hoc, in that they define some common groups of tasks, based on actual deployment needs, or build needs or whatever. So I have a task file called ci-tasks.py which imports the entire build subpackage, and the entire test subpackage, and has a few targets, e.g. buildall that has all of the build tasks listed in pre=[...], and e.g. run_travis which has some setup stuff and has pre=[buildall, testall]. A different taskfile has all the build steps and some installation steps, and so on.
  • Each of my per-module collections heavily use context variables. Some of course only matter to the module itself (e.g. pip_binary) others are used nearly universally (e.g. source_tree_root).
  • the very commonly used context variables are actually defined in a commonly imported dict that keeps defaults consistent, but I consider this outside the scope of invoke - just mentioning it here for clarity.
  • all of the context variables are given a default value in a dict, and passed to some_collection.configure()
  • each of the taskfiles in utils above will also call configure with relevant "secondary defaults". (e.g. the source directory on test instances is /usr/local/src/our_stuff/, but the build modules define the source directory as os.path.expanduser('~/src/our_stuff') since the most common use case of individual build tasks is on developers' machines)
  • since we're still working out various deployment options, I have defined an invoke script in utils/ for each thing we try. That is: deploy_ubuntu_tasks.py, deploy_redhat_tasks.py, deploy_docker_tasks.py, and so on. This is because each has differences for some (~10%) of the context variables.

OK - so that's some background on how we currently use invoke. There are a few major pain points:

  • If a common variable changes it can be a pain to track down where it is set, and where it is overridden.
  • similarly, with many possible collections to invoke via invoke -c foo_tasks, it is hard to remember what each one has set as a "script default" for the various variables, so I don't know what to override where. (The variables themselves are super useful - using contexts has reduced my code a significant amount - so they're the right thing.)
  • It's annoying to figure out each variable that exists without going through each python module, or getting a bunch of specific documentation listings.

With those pain points in mind, there are a few things I actually see getting worse by adding configuration files as an option:

  • Knowing or discovering what configuration directives are even available.
  • Knowing the value of a context variable after invoke pulls in defaults (as set by collection.configure()), the various config files, and environment variables.
  • Understanding the results of various overrides in a "multiple inclusion" setting for collections.

So in order to mitigate these I propose that invoke --list start displaying all of the known context for the relevant collection. This includes every variable that is in a context based on the environment and command line options that are passed. Alternately, another command called invoke --show-vars or similar could do that job.

OK - that covers most of the things I think are implicit to my comments from yesterday (whew). To address your specific points (in a slightly different than asked order, because I feel they build on each other a bit):

  • When I mention "using Defaults objects in tasks modules instead of local-to-project conf files", I had in mind my library of tasks mentioned above. Rather than have a config file for each of the modules (and associated tasks and collections) imported from various packages above, I meant just having a default configuration similar to collection.configure. One way to make this work relatively simply is to have a class of defaults that is defined to go with each collection. (or just use the dict as the object containing the defaults).
  • I feel this ties to --list because of the points expanded above; additionally, without a target to show all the context (or other config variables available), we require a bunch of implicit knowledge. Further, the origins of this ticket are to pass context variables on the command line, so those should have a method of discovery similar to the tasks which are passed on the command line.
    • I do also really like the idea of current --list behavior expanding to show each task's kwargs tho!
  • I agree with your assessments and comments on include directives.

About config file format and naming:

  • I did mean "if there's both a /etc/invoke.yaml and /etc/invoke.json that we load invoke.yaml and do not load invoke.json at all"
    • in a larger context, I have no problem having yaml in one place, and JSON in another and merging, that is explicitly:
# Files in /etc:
/etc/invoke.yml
/etc/invoke.json

# files in my home dir:
~/.invoke/invoke.json

# files in my task dir:
/path/to/tasks/invoke.yml

Then this happens within invoke:

load_file('/etc/invoke.yml')
load_file('~/.invoke/invoke.json')
load_file('/path/to/tasks/invoke.yml')

The exception to the above is when there is an explicit command line arg stating which invoke file to use, at the path/to/tasks level. That is - if I pass e.g. inv -f /path/to/tasks/invoke.py it will be loaded rather than /path/to/tasks/invoke.yml. But the same files from /etc and ~/.invoke would be loaded.
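The loading walk described here, including the -f exception, could be sketched as follows (the function, location list, and `find` callback are all illustrative):

```python
# Sketch: one winning file per location, lowest priority first. An
# explicit -f path replaces only the task-directory slot; the /etc and
# ~/.invoke slots still load as usual.
def config_files_to_load(find, explicit_task_file=None):
    """Yield config file paths in merge order.

    `find` maps a location prefix to the file that exists there (or
    None), standing in for the exclusive extension search.
    """
    for prefix in ("/etc/invoke", "~/.invoke/invoke"):
        path = find(prefix)
        if path:
            yield path
    if explicit_task_file:
        yield explicit_task_file          # inv -f <path>
    else:
        path = find("/path/to/tasks/invoke")
        if path:
            yield path

existing = {"/etc/invoke": "/etc/invoke.yml",
            "~/.invoke/invoke": "~/.invoke/invoke.json",
            "/path/to/tasks/invoke": "/path/to/tasks/invoke.yml"}
assert list(config_files_to_load(existing.get)) == [
    "/etc/invoke.yml", "~/.invoke/invoke.json", "/path/to/tasks/invoke.yml"]
assert list(config_files_to_load(existing.get,
                                 "/path/to/tasks/invoke.py"))[-1] \
    == "/path/to/tasks/invoke.py"
```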

I hope this giant wall of text makes my thoughts a bit clearer for you.

@bitprophet
Member

Oh boy oh boy reply time~~

The tl;dr I hear for your setup is:

  • You've got the meat of your task definitions in a set of 'topic' module trees / namespaces.
  • Those tasks are prepared for invocation by inclusion in user-facing top level task files that collect the task namespaces in various cross-sections.
  • Both the inner namespaces and the top level task files set config values, with the latter overriding the former.

That strikes me as a reasonable enough 'big site deployment' of invoke.

It mirrors how I perform code reuse in Invoke to a degree - the invocations set of modules is imported into my various OSS projects' top level tasks files. See especially invocations.docs and its use in the tasks.py of fabric/invoke/paramiko, which was the impetus for the entire existing config override setup.


What I hear for your pain points is really one thing: it's difficult to introspect the overall state of the configuration. What are the defaults, and where; how are they overridden, and by whom; what is the final state?

I agree that exposing that info would be useful, IMHO as a separate flag (bundling into --list feels like it'd be too noisy for the common case) or alternately just as debug level logging output.

A sub-issue there is your mention of knowing "what settings are available" - which really means "what settings are being consumed anywhere in the code". Dunno if there's any good way to check on this that isn't static analysis? Unless we were to set up a declaration style mechanism where you must declare settings at a high level (which could be framed as "you must always have some default value for every setting"?)

Finally, re: task kwargs, I totally flaked and forgot that we already do that, inv --help <taskname> displays all the flags for the task named. :)


Re: config file loading, I agree with most of your points re: selecting files when there are conflicts, that's all what was in my head too.

Your points re: conf files vs in-python structures, I think you may have misinterpreted something, my intention here was to add config files as another option for configuration. In-Python dicts would not be removed in any way; for folks who find conf files (or env vars or etc) an easier alternative (or who want the 'extra levels' of merging) they'd simply gain the option whereas right now it's not possible w/o ugly manual hacks.

@sophacles
Author

re: conf files vs in-python-structures - yeah, I was confusing myself - rereading this thread seems to have put me on the same page with what you're saying.

re: --help, --list and kwargs, I think the idea in #163 might actually cover what I was thinking.

re: introspection things and showing the configuration options and what's available in contexts.
and
re:

A sub-issue there is your mention of knowing "what settings are available" - which really means "what settings are being consumed anywhere in the code"

I didn't intend that - bad wording on my part. I really just meant introspection of what a context would be given the "current" tree. Assuming there's an argument --show-config, and that there are several task files as well as tasks imported from Python packages, here's a set of examples (some output from existing commands isn't real; it's just to get the idea across, not a suggestion to change --list):

$ inv -c taskfile_A --list
foo.collection1.task1
foo.collection1.task2
bar.collection2.task3
bar.collection2.task4
$ inv -c taskfile_A --show-config
foo.ctx_var1 = "value" [set in collection: foo]
bar.ctx_var1 = "different" [set in collection: collection2]
ctx_var2 = "value" [set in collection: collection1]
ctx_var3 = "value" [set in config: /etc/invoke.yml]

$ inv -c taskfile_A --show-config foo.collection1
ctx_var1 = "value" [set in collection: foo]
ctx_var2 = "value" [set in collection: collection1]
ctx_var3 = "value" [set in config: /etc/invoke.yml]

$ inv -c taskfile_A --show-config bar
ctx_var1 = "different" [set in collection: collection2]
ctx_var2 = "value" [set in collection: collection1]
ctx_var3 = "value" [set in config: /etc/invoke.yml]

$ inv -c taskfile_A -f configfile.yml --show-config
ctx_var1 = "value" [set in config: configfile.yml]
ctx_var2 = "value" [set in collection: collection1]
ctx_var3 = "value" [set in config: /etc/invoke.yml]

$ inv -c taskfile_B --list
collection1.task1
collection1.task2
collection3.task5

$ inv -c taskfile_B --show-config
ctx_var2 = "value" [set in collection: collection1]
ctx_var3 = "value" [set in config: /etc/invoke.yml]
ctx_var4 = "value" [set in collection: collection3]
ctx_var5 = "value" [set in collection: ns]

This would require some merge tracking, but the way overrides work in the "etc" package would facilitate this.

If I use a variable ctx_var_hidden in some of the tasks, and don't define it anywhere, it will raise a runtime error either way, and it won't be listed in the --show-config output.
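A minimal sketch of that merge tracking (hypothetical helper, not part of Invoke or the "etc" package): merge layers lowest-precedence-first and record which source last set each key, which is exactly what a --show-config style report needs:

```python
# Sketch of the merge tracking --show-config would need: merge config
# layers in precedence order while recording which source "won" each key.
def merge_with_provenance(layers):
    """layers: list of (source_name, flat_dict), lowest precedence first."""
    values, sources = {}, {}
    for source, data in layers:
        for key, value in data.items():
            values[key] = value
            sources[key] = source  # later (higher) layers overwrite earlier
    return values, sources

values, sources = merge_with_provenance([
    ("config: /etc/invoke.yml", {"ctx_var3": "value"}),
    ("collection: collection1", {"ctx_var2": "value"}),
    ("collection: foo", {"ctx_var1": "value"}),
])
for key in sorted(values):
    print('%s = "%s" [set in %s]' % (key, values[key], sources[key]))
```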

@bitprophet
Member

OK, I get all that. We've definitely words'd enough and I think we're mostly on the same page now too, so here's what I think needs doing for this (so I can get down to brass tacks): EDIT: moved to description, derp.

@bitprophet
Member

As per the just-closed #101 we need to make sure that defaults for things like run(echo=True/False) are considered here. In fact that need is what prompted me to return to this ticket, since I need Fabric 2 to have a way of setting defaults for things like SSH port number, user name, etc.
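For instance, the lookup order for such a default could be sketched like this (hypothetical helper; the real API may well differ): an explicit kwarg beats a config file value, which beats the hard-coded default:

```python
# Hedged sketch of the lookup order a run(echo=...) default would follow:
# explicit kwarg > config file value > hard-coded default.
HARD_DEFAULTS = {"echo": False, "warn": False}

def resolve_run_option(name, config, kwargs):
    if name in kwargs:                  # explicit run(echo=True)
        return kwargs[name]
    if name in config.get("run", {}):   # e.g. from a conf file: run: {echo: true}
        return config["run"][name]
    return HARD_DEFAULTS[name]          # library fallback
```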

@bitprophet
Member

Type-casting down.

@bitprophet
Member

A bunch of hierarchy tests already pass, but the first failure was unexpected: env vars are not overriding the per-project conf file (the only thing that should override env vars is the runtime conf file). Wondering if the "double-load" setup isn't working the way I expect and env isn't getting applied - though it was working fine in the simpler tests.

@bitprophet
Member

Interesting - when I introspect the AdapterSet at the fail point, the NestedEnv adapter is ending up 2nd in the list, after Defaults, instead of at the end where I expected it. Need to dig and figure out why, perhaps some sort of "where's the next spot to insert an adapter" pointer is getting reset on that first load.

EDIT: Looks like a bug in etc.Config.insert: when it is a "true insert" (vs an insert that would be an append) it does this strange rotate (feels like a reverse), append, rotate-back process. Probably only shows up in this kind of situation when one adds a new non-special adapter after adding Overrides (since if there are no overrides it's just an append).

EDIT 2: The rotate is https://docs.python.org/2.6/library/collections.html#collections.deque.rotate
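For reference, the rotate/append/rotate-back pattern, done correctly, looks like this (a sketch of the general technique, not etcaetera's actual code):

```python
from collections import deque

# Correct rotate-based insert into a deque: rotating by -index brings
# position `index` to the left end, appendleft() inserts there, and
# rotating back by +index restores the original order.
def deque_insert(d, index, value):
    d.rotate(-index)
    d.appendleft(value)
    d.rotate(index)
```

Getting either rotation direction or amount wrong lands the new element in the wrong slot, which matches the "adapter ends up 2nd instead of last" symptom above.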

@bitprophet
Member

OK, fixed boog, submitted oleiade/etcaetera#19

@bitprophet
Member

Now I am at the expected failure where env/runtime conflict:

  • Env must be added last so it can 'see' all the other values (via an initial load() call)
  • But by being added last, it is now higher than the runtime config file in the hierarchy
  • Feels like the 'best' way to fix it is to manually call .insert on the Config adapters object, instead of calling register on the Config itself.
    • This breaks encapsulation, which is less than ideal.
    • The only other way to affect the adapters deque without doing so is to override the entire adapters list with Config.adapters = which feels just as gross if not moreso.

@bitprophet
Member

OK, the entire config test module is green; time to add a handful of CLI module integration-style tests (and probably something to the actual integration suite, heh) and this might be like...done. For real. EDIT: except I forgot the debuggery stuff in the checklist, and an attempted real-world use to e.g. override the -e flag didn't work. Whee. Those dovetail well though.

Realized that super low level debugging might mean even more invasive changes to etcaetera, hopefully not the case (and I might skip it if so, given how long this has taken).

@bitprophet
Member

Wow, I think I got all the way here only to figure out that etcaetera doesn't perform deep nested merging (i.e. anything deeper than top level is wholly overridden by higher levels). Double checking this but it's kind of a major WTF if so. Why is this feature cursed :(

@bitprophet
Member

Yup, etc.Config just does .update in a for loop. Updating it to perform recursive merging and will see if upstream accepts it - doing it inline for now; could make it controllable pretty easily if that makes it more palatable.
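The recursive merge in question boils down to something like this sketch (a generic helper, not etcaetera's actual patch):

```python
# Minimal sketch of the recursive merge etc.Config lacks: nested dicts are
# merged key-by-key instead of being wholly replaced by the higher level,
# as a flat dict.update() in a for loop would do.
def deep_merge(base, override):
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into subdicts
        else:
            merged[key] = value  # leaves (or type mismatches): override wins
    return merged
```

With plain .update, an override of `{"run": {"echo": True}}` would clobber a default `{"run": {"echo": False, "warn": True}}` entirely, losing the `warn` key; the recursive version keeps it.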

@bitprophet
Member

Have that generally working, but now running into other issues, I suspect because it's not performing tilde expansion within File. (Also, yes, printing debug values about what was loaded or not loaded looks like it might be a pain too.)
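The tilde issue is the usual one: without os.path.expanduser, a path like ~/.invoke.yaml is treated as a literal relative path and never matches the real file in the user's home directory. A sketch of the normalization a File-style adapter would need (hypothetical helper name):

```python
import os.path

# Normalize a user-supplied config path before existence checks:
# expand "~" to the home directory, then make it absolute.
def candidate_path(path):
    return os.path.abspath(os.path.expanduser(path))
```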

At this point, really starting to suspect it's time to cut losses and just remove etcaetera from the loop, as I'm spending more time working around it / modifying it than actually using it :( Will see though; if I can just get this tiny bit farther, leaving it in would work for now.

@bitprophet
Member

Have even more tiny baby branches ready to submit upstream, tho I am going to ask oleiade what he thinks of both my existing PRs and these new ones, and whether it's worth even submitting stuff or if I should just keep using my fork as-is (and probably wean myself back off of etcaetera if I find the time).

Also realized while poking at the debuggery - I somehow misplaced the tests that proved "YAML prevents loading of JSON or Python, etc"!! Poking at that now :( EDIT: tests in; naturally the implementation requires Yet More Etcaetera Munging, because I can't really do it as I am now w/ all 3 explicit file paths. So, a File subclass (or something) it is.

@bitprophet
Member

While writing out the docstring for this, it does make me wonder if it's worth revisiting our logic here, now that we have an existing implementation that makes it much easier to have all 3 file suffixes in play. Rereading our arguments above, though, it still feels like the best option for users as well - nobody is likely to have two types of this file in a single prefix/location, and this does make things easier to reason about when troubleshooting. Will soldier on.
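The "one suffix per location" rule being described could be sketched as: for a given prefix, the first existing file in a fixed suffix order wins and shadows the rest (hypothetical helper; the exact suffix order here is an assumption):

```python
import os.path

# Hypothetical suffix precedence: an existing invoke.yaml shadows
# invoke.json and invoke.py at the same prefix, instead of merging them.
SUFFIXES = ("yaml", "json", "py")

def select_config_file(prefix, exists=os.path.exists):
    # `exists` is injectable so the selection logic is testable without
    # touching the real filesystem.
    for suffix in SUFFIXES:
        path = "%s.%s" % (prefix, suffix)
        if exists(path):
            return path
    return None
```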

@bitprophet
Member

Welp, also realized while debugging that I never implemented the INVOKE_ prefix the docs claim we look for, for env vars. Hooray! Easy enough to make it an option which I can disable for testing, though - adding it to every test string feels kinda gross.
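A sketch of that prefixed env var loading, with the prefix as a parameter so tests can disable it (hypothetical helper; mapping underscores onto nested config keys is left out, since that's ambiguous without knowing the config's shape):

```python
# Hypothetical prefixed env var loader: only vars starting with the prefix
# are considered; the prefix is stripped and the remainder lowercased into
# a flat config key. Pass prefix="" to disable prefixing in tests.
def env_config(environ, prefix="INVOKE_"):
    config = {}
    for name, value in environ.items():
        if prefix and not name.startswith(prefix):
            continue  # ignore unrelated environment variables
        config[name[len(prefix):].lower()] = value
    return config
```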

@bitprophet
Member

Continuing to discover various new exciting things that were either forgotten, or unknowable until implementation time.

Latest is that if we want to allow configuration of the --no-dedupe flag, it causes more config ordering issues: Executor wants to call load on the configuration right before execution (due to loading collection-driven defaults), but we also need to know whether config files said to perform deduplication or not - and deduping occurs before execution, of course.

This is now the third or fourth instance of having to bend over backwards to meet etcaetera's view of config loading :( I think I am going to keep digging this hole deeper & add another 'pre-load load()' but at this point it's clear I'd have spent much less effort just writing something from scratch.

@bitprophet
Member

Now running into the 4th or 5th instance: dealing with how etc's AdapterSet is a deque and makes reasoning about the hierarchy and inserting/overwriting things a big PITA. Time to branch and try ripping it out.

@bitprophet
Member

A ways into that effort (been doing other things lately too). Realizing my original idea of a truly lazy config object is too much magic/code/etc and is really not necessary, so I'm tweaking things to be a little simpler.

@bitprophet
Member

Wow. The tests pass. And then they pass under Python 3. And so do the integration tests. And the docs build. Is this nightmare over?!

Need to revisit the checklist up top. E.g. debugging is still possibly hairy - with debug output in every config merge, and merging running many times during a single CLI session (due to reloading for env vars, re-merging after cloning, etc.), it becomes very verbose; and this isn't even with per-variable logging, which might be useful in some deep cases.

But really, I've spent so much God damn time on this I can't see myself waiting much longer to merge & move onwards. It's been literally months of spare time down the drain.

@sophacles
Author

W00t. For the record... It's not down the drain. We've been building our scripts around the eventual availability of this feature and are super excited for it!

@bitprophet
Member

Yea it was a poor choice of words, but still, there's just no way this feature - however complex - should have taken literally months >_< at least I learned some hard lessons about using third party code. (Which is sad, usually I'm super aggro about people not reusing libraries.)

@bitprophet
Member

I'm currently back looking at my nascent Fabric 2 branch to make sure that this new API works for its needs, since that was the entire reason I started poking this :) assuming it looks passable I'll merge to master and put out a new Invoke release.

@sophacles
Author

W00t!!

@bitprophet
Member

I haven't actually finished the Fab 2 integration, but I figured any changes would be minimal and it wasn't worth having this mega huge branch floating outside master :) whee.

@pieterlukasse

Hi, what is the conclusion regarding

inv -X var1=clival1 ns1.task task.2

I am struggling to find the current way of setting global arguments that are valid for all tasks.
Thanks!

@TaiSHiNet

@pieterlukasse I believe this is being followed on #153
Although namespaces/collections can have configs, you would still (as of now) have to define the parameters on every task.

@LecrisUT

  • Context objects (& subclasses like Fabric2's Connection) need a clean API for this configuration stuff - either Context directly grows all the config related methods (perhaps becoming a etcaetera.Config subclass itself?) or it accepts some config object (again perhaps etc.Config) as an instance member.

    • The current driving need here is actually testing related, I need to inject a config value override or two in Connection tests.

@sophacles are there any linked issues for this?

@sophacles
Author

Hi @LecrisUT - I don't think there are. I've forgotten most of the context (pun not intended) around this at this point, and any issues I would have linked to it went away when the startup I was working for at the time closed down.
