Configuration defaults set per-module/etc #89

bitprophet opened this Issue Oct 1, 2013 · 9 comments


None yet

1 participant


Problem: distributing general tasks like in Invocations' docs module. can store default options for shit like 'sphinx source/build targets' in the module, and override at runtime, already. cannot obtain the docs module and attach it to my namespace with a different set of defaults, without doing stupid/fragile wrapping of the tasks.

Solution: get configuration working in the context that can be set in ways besides what the parser generates. feels like best way is to go through Collection since that's really the only link from a task to the context.

what happens now:

  • cli creates context (generated from some parser args affecting run's behavior which get turned into a top level 'run' subdict), obtains collection, hands both to executor
  • executor hands context to contextualized tasks
  • cli calls executor.execute() passing in the name of the task, the parser args given, etc

So we can implement updating the context w/ data from the collection in a few ways:

  • in the cli module, before creating the executor. not great, esp since this action should not require using the CLI itself - ought to be lib friendly
  • in executor, probably in __init__, maybe in execute
    • doing later allows more lazy tricks, but otoh we probably don't want to encourage modification of the context obj between tasks...
    • either way doing in executor is as low as we can go. probably fine.

In terms of how to set the info we'd need to expand the api of Collection somehow. not sure if constructor kwargs would work since those are used as shorthand for creating the namespace.


There aren't any 'regular' kwargs for Collection right now, but "handle **kwargs but work around 'real' kwargs" is a not entirely shitty pattern, so could get away with following the rest of its API and offer both an explicit e.g. .add_config() (or .configure(), .set_config(), etc) + convenience config={...} kwarg?


Need to consider how this might work in nontrivial situations, i.e. nested namespaces.

In the base case it's "obvious" - the config dict given to the collection is merged into the context.

If a sub, or sub-sub, or etc, collection defines a config, is that also merged directly into the context (single flat namespace)? Do we follow task namespacing (outer.inner) in a flat dict (e.g. ctx['outer.inner'])? Do we use nested dicts (so e.g. ctx['outer']['inner'])?

Feels like the latter is more sensible, though I worry it's letting my Chef experience show through too strongly. But it means that a given "to be shared" collection module doesn't have to worry about conflicts.

Though that also means that the context has to know "where" the task is that it's being given to - otherwise the task very much needs to know "where" it is in the namespace or it won't know where to find its config options! I.e. if I'm collection, my task body would have to know to access ctx['']; and if my collection got re-rooted (which is part of the point) that would break.

That in turn implies we do need a single "flat" or "global" namespace and must rely on tasks guarding against conflict by naming their stuff in a nested fashion (i.e. in my docs module I might name Sphinx options as sphinx.*).


In addition, anything handed to the Collection must be easily overridden by the "downstream" user. Seems the obvious solution there is for downstream to clone+modify the upstream Collection object - meaning we need to add a "from another Collection" variant of the constructor (which is distinct from "create a Collection with another Collection inside it").

If you assume it's unlikely to want >1 copy of a given Collection within one's process space (which seems relatively valid*), one could simply use Collection.from_module(module) (which, in this case where there's always an explicit namespace obj within the module, ends up just returning that object) and then modify that object with more .configure() (or w/e) calls.

Otherwise we'll need an actual .clone method somehow. I think that can be done by simply reconstructing the object - initialize w/ name arg if it was given, then for each task in the others' tasks, add_task; ditto for collections; done.

* if one is going down the road towards importing an upstream tasks module 2+ times and fiddling with their config options to differentiate...I feel like they're past what we can reasonably design around anyway.


Cloning feels doable but I'll punt for now until the need comes up, don't wanna get too bogged down with idealism at this stage.


Was wondering about using Executor+Context to perform merging between runtime + config (+ others?) values for a given arg, prior to actually calling the task. But this was predicated on having 3 'levels' - runtime, config, lowest-level-default - meaning shit like ctx.get('var', kwarg, default) would pop up a lot.

However I realized that's false, one could/should always put the 'local' default into the config instead, meaning it's only a two-level 'get', e.g.:

def mytask(ctx, arg=None):
    arg = ctx.get('mymodule.arg', arg)
    # OR EVEN
    arg = arg or ctx['mymodule.arg']

ns = Collection(mytask)
ns.configure({'mymodule.arg': 'default'})

Have this working well enough to use in and then using it in a newer project w/ nonstandard docs dir. Woot.

@bitprophet bitprophet closed this Oct 5, 2013

Whoops - don't think I implemented recursive merging with this. E.g.'s repo (which is what I designed the feature for) overrides the docs conf options (like sphinx.source) in its own root ns.

But e.g. Invoke's does not - and since that has its own root NS, and we don't have recursive merging, that NS has no sphinx.* options and thus the docs module's attempt to access these options finds nothing (and gets a KeyError).

@bitprophet bitprophet reopened this Oct 12, 2013

While poking more, realized we have to handle both recursive & sibling merging (e.g. outermost NS may have N>1 subcollections, in which case the subcollections' configs must get merged in some order.)

Thinking of going alpha merging for now but I suspect the "right" approach is to keep track of when collections were added to their parent & merge in that order instead. This means telling Collection to keep track of the subcollections in order (right now they're in a dictlike object).

Feel like this merging behavior is likely to confuse users no matter how it's done, too :(


Ugh - and in some cases it's not possible to determine sibling add order - if one uses kwargs, e.g. c = Collection(inner_one=inner1, inner_two=inner2) there's no way to tell what order the user intended (or rather - they cannot have intended one). Compared to doing c = Collection(); c.add_collection(inner1); ... or c = Collection(inner1, inner2) where ordering could be implied.

So I guess I will do alpha order just so some explicit order is documented, and if somebody else finds some requirement for a different ordering type in the (rare! or should be) collision/override use case - then we can rehash it.

@bitprophet bitprophet reopened this Nov 27, 2013
@bitprophet bitprophet closed this Nov 27, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment