Config5 #400

Merged
merged 11 commits into from May 17, 2011

Projects

None yet

3 participants

@ellisonbg
Member

The first stage of updating our configuration system. I have added a new command line option parser that integrates with our config system. To see how all of this works, look at docs/examples/core/appconfig.py and run it like this:

python appconfig.py -h

Options can be set using the syntax:

python appconfig.py Foo.i=10 Bar.enabled=False log_level=50

This branch should be reviewed and merged into trunk and then we all can start to work on updating IPython's main apps to use this new API.

ellisonbg added some commits Mar 25, 2011
@ellisonbg ellisonbg Added KeyValueConfigLoader with tests. 58fc715
@ellisonbg ellisonbg Starting help in InteractiveShell. 7a726af
@ellisonbg ellisonbg Started to move config docs to objects. 939184c
@ellisonbg ellisonbg Ongoing work on the config system.
At this point, I am mostly testing out various approaches, but:

* class_traits and class_trait_names added.
* appconfig example created for testing new config ideas.
* InteractiveShell traits updated.
* shortname handling added to command line config loader.
4999878
@ellisonbg ellisonbg Created config.application and updated Configurable. 6c3563d
@ellisonbg ellisonbg Added new tests for config.loader and configurable. e766cc8
@ellisonbg ellisonbg Added SingletonConfigurable with instance method.
I have added a singleton configurable class for objects that
should have only a single instance, such as InteractiveShell and
the main application objects. For these classes the .instance()
method should be used to create/retrieve the instance.
a8f1082
@ellisonbg ellisonbg Made InteractiveShell a SingletonConfigurable.
I have moved the instance method to a new subclass of Configurable
so we can use it on other configurable classes, like Application.
InteractiveShell now just inherits from this class.
a93fd45
@ellisonbg ellisonbg IPython.config.application.Application updates.
* Application now subclasses SingletonConfigurable.
* Added logging support.
6e90172
@ellisonbg ellisonbg Added tests for IPython.config.application.Application 605614c
@ellisonbg ellisonbg Adding more documentation to config.application related files. 4a3f8d5
@rkern
Contributor
rkern commented Apr 24, 2011

I have to say that I really dislike that InteractiveShell is currently a singleton. I don't think it's necessary, and it introduces artificial limitations. I would like to remove that, but it gets harder if we keep adding features that use it.

@ellisonbg
Member

I am not attached to InteractiveShell being a singleton. The only class that absolutely need to be a singleton is the main Application object, which represents the entire process. In fact, two summers ago, I spent quite a bit making changes to move us in that direction. But there are still a few things we would need to figure out before we can remove this restriction entirely.

  • How to handle the cases where we are injecting things into Python's global infrastructure (sys.displayhook, sys.excepthook, sys.stdout|err|in, builtins, etc.).
  • What API to use to allow parts of the code base to get the active InteractiveShell when those parts don't hold a reference to that object.
  • How to handle references so that we can actually trigger Python's garbage collection when an InteractiveShell is done being used.

I think that fixing these things would improve the core greatly. Just out of curiosity, what usage cases do you have for multiple InteractiveShell instances?

@minrk
Member
minrk commented Apr 24, 2011
  • How to handle the cases where we are injecting things into Python's global infrastructure (sys.displayhook, sys.excepthook, sys.stdout|err|in, builtins, etc.).

We would have to register/unregister these overrides/redirections before/after every execution.

  • What API to use to allow parts of the code base to get the active InteractiveShell when those parts don't hold a reference to that object.

Are any of these methods not called from inside the user_ns?

  • How to handle references so that we can actually trigger Python's garbage collection when an InteractiveShell is done being used.

This is less clear to me

I would be interested in an example of having multiple simultaneous InteractiveShells in one process, because I can't think of any.

@rkern
Contributor
rkern commented Apr 24, 2011
  1. As Min says. Also, you never have to modify the __builtins__. You can always inject a crafted __builtins__ into the namespace.
  2. Give them a reference. As Min suggests, most of the current uses of InteractiveShell.instance() are injected into user_ns anyways (like the formatting display functions) or are used in the remote 0MQ process (which can remain a singleton, just with the singleton-enforcement logic moved over to that part of the code).
  3. To some extent, it'd be nice to have that problem. You don't have to solve it in order for the singletonness to be valuable, and it doesn't cause any additional problems that aren't already there. It would be nice to clean up the cycles eventually, but that can just be something that the user needs to deal with. It should be straightforward to add an InteractiveShell.finalize() that will explicitly break cycles in the object graph as a stopgap.

As for use cases, we have a CodeEditor in Traits UI that provides a shell widget that lets you run commands in a namespace. There can be any number of these in the GUI each editing different namespaces. We can use our own hacked up shell using Cmd, but IPython would be infinitely better. And before you ask, yes, IPython's Qt frontend does work just fine in-process (with some hacks around 0MQ). The only thing interfering with an IPython implementation of CodeEditor is the singleton nature of InteractiveShell.

@minrk
Member
minrk commented May 4, 2011

SingletonConfigurable does not enforce uniqueness, it just has a notion of a 'current instance'. You can create as many as you want, but cls.instance() will always refer to the same one. If we allowed InteractiveShell.select_instance(inst) to change the active instance (and return the previous), then with multiple instances, inside run_cell we could do:

saved_instance = InteractiveShell.select_instance(self)
...
InteractiveShell.select_instance(saved_instance)

That way, InteractiveShell.instance() would always return the calling instance, replacing get_ipython() in __builtins__.

Can we gather the singleton-requiring code here, so we know what needs to change?

As I understand it, there are:

  • utils.io.stdout etc. (and callers of it)
    • use shell.stdout etc. instead (easy)
  • lib.inputhook
    • I don't know, and possibly not necessary?
  • core.display_trap, core.ultratb
    • set/unset hooks in each shell.run_cell call. This is already done, as far as I can tell.
  • __builtins__
    • use InteractiveShell.instance() or require passing of instances everywhere
  • ref cleanup
    • When the Shell instance is gone, the user_ns links also go away, the HistoryManager closes, etc. What might be hanging on to references?
    • as @rkern suggested, a finalize() method might get us close enough.

What other code assumes IPython == InteractiveShell?

@rkern
Contributor
rkern commented May 4, 2011
  • ref cleanup: the problem is that there are (or would be) lots of references to the InteractiveShell instance creating cycles. It won't go away cleanly all the time, especially since there can be arbitrary gc-breaking objects in the user_ns. But of course, a finalize() method would break those cycles. Since InteractiveShell is a real heavyweight object that people will not be making many of, explicit finalization will probably work fine.
@minrk
Member
minrk commented May 6, 2011

Is it possible to have in the config (and traitlets) for collections of a type, along the lines of argparse's nargs. For instance, I have many Traits that are pairs of ints, but that just means I have to use a list. It would be nice if I could have something like:

class C(HasTraits):
    n = Int(n=2) # this results in a length-2 list, or 2-tuple or something
    ns = Int(n='?') # extendable list of ints

Where I would still get type checking, etc.

@ellisonbg
Member

enthoughts.traits actually subclasses the fundamental containers (list, tuple, dict) to provide typed and traited containers for this type of thing. When I wrote traitlets, Fernando and I talked about this and decided that it is overkill for ipython and that we didn't like the idea of having to use special list/tuple/dict subclasses everywhere. I know this doesn't help your situation though. Maybe a simple way of handling it is this: set the type to List and then define an _on_foo_changed method that will get called when the trait is set. You can do the type checking at that point.

@rkern
Contributor
rkern commented May 6, 2011

Note that you could still let the List and Tuple traits take type arguments (e.g. List(Int) is a list of integers and Tuple(Str, Int) is a 2-tuple of a string and an integer) that only type-check on assignment. It's not quite as thorough as being able to type-check what you append to that list, but it solves most of the configuration use cases.

@ellisonbg
Member

That is a good idea and would cover most of our usage cases.

On Thu, May 5, 2011 at 10:12 PM, rkern
reply@reply.github.com
wrote:

Note that you could still let the List and Tuple traits take type arguments (e.g. List(Int) is a list of integers and Tuple(Str, Int) is a 2-tuple of a string and an integer) that only type-check on assignment. It's not quite as thorough as being able to type-check what you append to that list, but it solves most of the configuration use cases.

Reply to this email directly or view it on GitHub:
#400 (comment)

Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu and ellisonbg@gmail.com

@minrk
Member
minrk commented May 10, 2011

@ellisonbg The old Application had a crash_handler_class. Do we not want this to be part of the top-level Application anymore?

@ellisonbg
Member

I left out attributes and capabilities in config.application.Application that are IPython specific. In IPython.core.application, we can have a subclass with those things still there. I imagine that IPython.config could be useful outside of IPython, so I am trying to keep it clean of IPython specific logic.

@ellisonbg
Member

This code review has gotten off topic to the issue of InteractiveShell being a singleton. I am wondering if there are other comments related to the actual config changes in this branch.

@minrk
Member
minrk commented May 10, 2011

I know, sorry about that. I'm currently working on adapting the parallel applications to the new config branch (that's where the crash handler question came from).

So far, it looks very nice and clean!

I am having some issues with init_logging. What is the right model for logging that requires other objects to be constructed first? A separate 'reinit_logging', so logging gets initialized twice?

Yes, I imagine we will want IPython.core.application that includes crash_handler, and various general-IPython bits.

@minrk
Member
minrk commented May 10, 2011

I see that you manually call init_foo(), init_bar(). Do we want a List of init methods, so we can have a single 'init' or 'construct' method that executes them in order?

Otherwise, code that wants to construct an app needs to know about all the init methods.

Or should we expect the __init__ method to produce a fully configured object?

@ellisonbg
Member

On logging: good point. Possibly not have init call init_logging? Either that or your reinit logging (or maybe config_logging).

On calling init_foo, init_bar by hand. We want the main init method to only instantiate the app with the defaults. So init should not read the command line or config files. The reason for this is that in many cases third parties want to customize the app and not do all of the logic. An example is sympy, which may want to start ipython with command line args, but no config files. We want to have Application methods with that type of granularity. Now that doesn't mean we can't also have something like init_all() as a shorthand for "call all the init methods"

@minrk minrk commented on the diff May 10, 2011
IPython/config/application.py
+
+ def update_config(self, config):
+ """Fire the traits events when the config is updated."""
+ # Save a copy of the current config.
+ newconfig = deepcopy(self.config)
+ # Merge the new config into the current one.
+ newconfig._merge(config)
+ # Save the combined config as self.config, which triggers the traits
+ # events.
+ self.config = config
+
+ def parse_command_line(self, argv=None):
+ """Parse the command line arguments."""
+ argv = sys.argv[1:] if argv is None else argv
+
+ if '-h' in argv or '--h' in argv:
@minrk
minrk May 10, 2011 Member

'--help' should probably trigger here as well.

@minrk
minrk May 10, 2011 Member

That is, it should probably be '-h', '--help'. I don't think anyone expects '--' with abbreviated names.

@minrk
Member
minrk commented May 10, 2011

The logging bit is tricky, because you want some kind of logging ready as soon as __init__ fires, but sometimes the real logging configuration requires other objects to be instantiated first. I don't think there is a problem simply reassigning self.log to a differently configured logger, but the issue is that the logical name for that step ('init_logging') is taken.

I'm fine with having a custom 'really_init_logging' in my App, or pulling init_logging out of __init__, but still having the most basic possible version still fire inside __init__.

Re: init_stages, I understand the desire for breaking it up into steps, that makes sense. I am just thinking that maybe a simplest-case where there is one method that calls the steps in the right order could be part of the official API.

@ellisonbg
Member

Yes, the logging stuff is tricky. Can we build a traitlets based API that when some attribute is set, the logging is immediately updated? Sort of like we do with the log_level.

Yes, I think we do want a single method (something other than init) that does all of the init stuff. As an aside. Previously, the individual steps in the init logic had to be done in a very specific order. Now, that is not the case. I think the traits machinery will simplify this and allow any thing to be called in a more flexible order.

@minrk
Member
minrk commented May 10, 2011

Collision detection on shortnames probably shouldn't catch inherited Traits. For instance, with:

class A(Configurable):
    a = Int(1, config=True, shortname='a')

class B(A):
    b = Int(2, config=True)

class C(A):
    c = Int(3, config=True)

No two of those classes can be used together in an Application due to a conflict on 'a'. This is a pretty big problem in the parallel code, since most objects inherit from a SessionFactory. It means that I can have no shortnames there, in one of the most commonly configured objects in the entire code.

It seems like a safe assumption that if I set 'a=5', I want that set in the parent class that defines it.

@ellisonbg
Member

Good catch. So the collision detection code needs to see if the collision is in a parent/child class relationship and let it pass.

@minrk
Member
minrk commented May 11, 2011

Not just parent/child, but also siblings - B&C should work together. If that's too complicated, it's probably okay to require that A (the mutual ancestor) be in the class list as well, though inspecting Class.mro(), it's pretty easy to find the first common ancestor.

What precisely should the resolution be?

Case 1: A&B

  • A.a=5
  • A.a=B.a=5

Case 2: B&C

  • A.a=5
  • B.a=C.a=5
  • A.a=B.a=C.a=5

The effect of all of these is going to be the same in almost all cases. However, if there is also a D(a) class instantiated that is not in the class list, the cases that set A.a will affect D.a, and those that don't will not.

If we require that A be included in the class list, then case 2 is irrelevant.

@rkern
Contributor
rkern commented May 11, 2011

I would recommend against configuring the shortnames in the trait declarations themselves. The abbreviations should not be under the control of the individual components. It requires coordination between components to contribute to the global shortname namespace. Subclassing just adds more problems.

Instead, the shortname abbreviations should be owned by a top-level object, maybe the Application (I've only taken a brief look at the diffs; I'm not sure I have a complete picture, yet). For example, it could have a dictionary mapping shortnames to their full "addresses": dict(a='A.a').

This also allows the Application writer greater control rather than the component author. For example, you may write the component A and want a particular shortname for the main IPython script. I may want to use component A for some other application and do not want to emphasize that particular option.

@minrk
Member
minrk commented May 11, 2011

I think I agree about the shortnames. A simple dict of the form @rkern suggested: {'shortname' : 'Configurable.trait'} defined by Applications would make this cleaner and remove the duplicate issue entirely.

This comes up a lot in the parallel code, which has 4 scripts, all of which involve some overlap in Configurables, but have dramatically different priorities of what to configure.

The shortname dict would also allow you to organize the help output by placing the options with a shortname in a higher priority location than those without. This is possible with the current code but doesn't make any sense to do because, as Robert pointed out, the component shouldn't dictate trait prominence to the Application.

@ellisonbg
Member

I think this makes sense as well. It will also help the design in the following way. Currently, we have to track all of the Configurables for an app and pass those to the command line parser to handle the shortnames. With this change we won't have to track all of that. Instead, we can just pass the dict. The only thing we will loose is the ability to have the auto-generated help strings for each component include the shortnames. But we can probably figure out a way of handling that without addiing to much complexity. I am in the middle of midterms through the end of next week, so I am not sure I will have much time to devote to making these changes immediately.

@minrk
Member
minrk commented May 11, 2011

Can you have the Application construct a first-priority section with just the shortname-aliased configurables?

I think it would be helpful to have the shortname options (which are presumably the high priority ones) get their own first section, followed by everything else with their full names (the current output). It's great that we now expose every configurable, but having more options makes it significantly harder for users to actually find what they are looking for. Allowing the Application to define a first-priority section with the shortnames will help alleviate that.

Is there an easy mechanism for traditional --opt/--no-opt set/unset flags for boolean configurables?

@ellisonbg
Member

Yes, I was thinking the same thing that the shortnames doc info could go beforehand.

In terms of the --opt/--no-opt style, I don't see how we can support that easily in the current model. Also, I have never liked that style. I much prefer opt=True.

@minrk
Member
minrk commented May 11, 2011

I strongly disagree on opt=True vs. store_true/false flags, but if there's no clean way to do it, let's keep it simple.

@minrk
Member
minrk commented May 12, 2011

@ellisonbg see my macro branch for what the flags/config look like.

@ellisonbg
Member

Min, I thought the values in the macros dict were going to be Config
objects with values set?

On Wed, May 11, 2011 at 11:22 PM, minrk
reply@reply.github.com
wrote:

@ellisonbg see my macro branch for what the flags/config look like.

Reply to this email directly or view it on GitHub:
#400 (comment)

Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu and ellisonbg@gmail.com

@minrk
Member
minrk commented May 12, 2011

They can be Config objects. There are two valid types:

  • Config objects or dicts that get passed to self.config.update
  • strings of the key-value long form: 'Class.trait=value'

The reason I allowed the strings is that probably the most common case is setting a single value (e.g. Bool traits). And it's cleaner to write enable="Bar.enabled=True" than the equivalent enable={'Bar' : {'enabled' : True}}. But both work.

I can remove it, so there's only one valid type if you prefer.

@ellisonbg
Member

OK, thanks for clarifying this. I guess I like to "one way of doing something" idea, but I don't feel too strongly about this.

@minrk
Member
minrk commented May 12, 2011

I made the decision I was thinking it was Config or nothing - and adding Config({...}) to that is just too much. However, since dicts work fine and aren't too much more clumsy than strings, I'm okay with dropping the strings.

@ellisonbg
Member

OK, I think that is fine.

On Thu, May 12, 2011 at 12:48 PM, minrk
reply@reply.github.com
wrote:

I made the decision I was thinking it was Config or nothing - and adding Config({...}) to that is just too much.  However, since dicts work fine and aren't too much more clumsy than strings, I'm okay with dropping the strings.

Reply to this email directly or view it on GitHub:
#400 (comment)

Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu and ellisonbg@gmail.com

@minrk
Member
minrk commented May 17, 2011

I added a short summary explanation for each section, and removed the string alias, so the macros are strictly config objects or the corresponding dicts.

The current output of the appconfig example help:

$> python appconfig.py -h
This is an application.

Flags
-----
Flags are command-line arguments passed as '--<flag>'.
These take no parameters, unlike regular key-value arguments.
They are typically used for setting boolean flags, or enabling
modes that involve setting multiple options together.

--enable
    Set Bar.enabled to True
--disable
    Set Bar.enabled to False

Aliases
-------
These are commonly set parameters, given abbreviated aliases for convenience.
They are set in the same `name=value` way as class parameters, where
<name> is replaced by the real parameter for which it is an alias.

i (Foo.i) : Int
    The integer i.
running (MyApp.running) : Bool
    Is the app running?
j (Foo.j) : Int
    The integer j.
enabled (Bar.enabled) : Bool
    Enable bar.
name (Foo.name) : Unicode
    First name.

Class parameters
----------------
Parameters are set from command-line arguments of the form:
`Class.trait=value`.  Parameters will *never* be prefixed with '-'.
This line is evaluated in Python, so simple expressions are allowed, e.g.
    `C.a='range(3)'`   For setting C.a=[0,1,2]

MyApp options
-------------
MyApp.log_level : Enum
    Set the log level (0,10,20,30,40,50).
MyApp.running : Bool
    Is the app running?
MyApp.config_file : Unicode
    Load this config file

Bar options
-----------
Bar.enabled : Bool
    Enable bar.

Foo options
-----------
Foo.i : Int
    The integer i.
Foo.j : Int
    The integer j.
Foo.name : Unicode
    First name.

Should I do a PR against your Config5 branch?

What's preventing us merging this code into master now?

I already have ipcontroller and ipengine working with the new config system in a local branch. ipcontroller especially can be cleaner after merging #423. The resulting application scripts are so much cleaner now, thanks for the great work!

@ellisonbg
Member

Great work! Yes, let's do a PR against config5. Once that is done,
let's setup an IRC with @fperez to go over our decisions and then we
can merge.

On Tue, May 17, 2011 at 9:41 AM, minrk
reply@reply.github.com
wrote:

I added a short summary explanation for each section, and removed the string alias, so the macros are strictly config objects or the corresponding dicts.

The current output of the appconfig example help:

$> python appconfig.py -h
This is an application.

Flags
-----
Flags are command-line arguments passed as '--<flag>'.
These take no parameters, unlike regular key-value arguments.
They are typically used for setting boolean flags, or enabling
modes that involve setting multiple options together.

--enable
   Set Bar.enabled to True
--disable
   Set Bar.enabled to False

Aliases
-------
These are commonly set parameters, given abbreviated aliases for convenience.
They are set in the same `name=value` way as class parameters, where
<name> is replaced by the real parameter for which it is an alias.

i (Foo.i) : Int
   The integer i.
running (MyApp.running) : Bool
   Is the app running?
j (Foo.j) : Int
   The integer j.
enabled (Bar.enabled) : Bool
   Enable bar.
name (Foo.name) : Unicode
   First name.

Class parameters
----------------
Parameters are set from command-line arguments of the form:
`Class.trait=value`.  Parameters will *never* be prefixed with '-'.
This line is evaluated in Python, so simple expressions are allowed, e.g.
   `C.a='range(3)'`   For setting C.a=[0,1,2]

MyApp options
-------------
MyApp.log_level : Enum
   Set the log level (0,10,20,30,40,50).
MyApp.running : Bool
   Is the app running?
MyApp.config_file : Unicode
   Load this config file

Bar options
-----------
Bar.enabled : Bool
   Enable bar.

Foo options
-----------
Foo.i : Int
   The integer i.
Foo.j : Int
   The integer j.
Foo.name : Unicode
   First name.

Should I do a PR against your Config5 branch?

What's preventing us merging this code into master now?

I already have ipcontroller and ipengine working with the new config system in a local branch.  ipcontroller especially can be cleaner after merging #423.  The resulting application scripts are so much cleaner now, thanks for the great work!

Reply to this email directly or view it on GitHub:
#400 (comment)

Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu and ellisonbg@gmail.com

@minrk
Member
minrk commented May 17, 2011

PR issued. I should be available pretty much any time (except for the next ~60 minutes because I will be in transit).

@ellisonbg
Member

OK, I will ping @fperez about an IRC session, but I know he has a lot
going on this week.

On Tue, May 17, 2011 at 10:02 AM, minrk
reply@reply.github.com
wrote:

PR issued.  I should be available pretty much any time (except for the next ~60 minutes because I will be in transit).

Reply to this email directly or view it on GitHub:
#400 (comment)

Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu and ellisonbg@gmail.com

@ellisonbg ellisonbg merged commit 4a3f8d5 into ipython:master May 17, 2011
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment