Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

function specific tab completion via annotations #2701

Closed
wants to merge 13 commits into from

8 participants

@rmcgibbo

This is a new attempt at #2636. The feature enables custom tab completion for different arguments of a function, and also for the return value of a function (without calling it).

It's easier to show in action though. Check it out!


Here's tab completion on the return value of a function, without ever evaling the function

In [1]: from IPython.extensions.completion import tab_complete, globs_to,  literal

In [2]: @tab_complete
  ...: def baz(x) -> str:
  ...:     pass
  ...:

In [3]: baz(notevaluated).<TAB>
.capitalize  .find        .isspace     .partition   .rstrip      .translate
.center      .format      .istitle     .replace     .split       .upper
.count       .index       .isupper     .rfind       .splitlines  .zfill
.decode      .isalnum     .join        .rindex      .startswith
.encode      .isalpha     .ljust       .rjust       .strip
.endswith    .isdigit     .lower       .rpartition  .swapcase
.expandtabs  .islower     .lstrip      .rsplit      .title

Note that sense this never calls the function, all the tab completion is based on methods defined on the str class. If you notate the return value as some class that dynamically defines its methods when it is
instantiated, the tab complete engine is not going to be able to find them. There's no way around this without actually invoking the code on the TAB keypress, which is not desired (and is already possible by configuring your ipython config file).


Here, tab completion on the first argument only shows literal strings and directories that match a glob pattern. For the second argument, it shows some string literals.

In [4]: @tab_complete
   ...: def foo(x : globs_to('*.txt'), mode : literal('r', 'w')):
   ...:     pass
   ...:

In [5]: foo(<TAB>
'COPYING.txt'        'dist/'              'setupext/'
'IPython/'           'docs/'              'tools/'
'__pycache__/'       'ipython.egg-info/'
'build/'             'scripts/'

In[6]: foo('COPYING.txt', <TAB>
'r', 'w'

The default type of tab completion is based on isinstance matching. You could also be explicit about it:

In [7]: @tab_complete
   ...: def bar(x : str, x : instance_of(str)):
   ...:     pass
   ...:

In [8]: string1 = string2 = 'some string'

In [9]: bar(<TAB>
string1  string2

In [9]: bar('a', <TAB>
string1  string2

All of this functionality is possible in python2 syntax (without PEP3107 annotations) by calling the @tab_complete decorator with keyword arguments.

The design here was influenced by Nick Coghlan's emails on python-ideas, and an annotation-based typechecking library on activestate.

@Carreau
Owner

That's nice !

Not much time to look at it, but the travis faillure are mostly issues with <tab>in doctest than can be fixed with the @skipdoctestdecorator (or something like that).

@rmcgibbo

@Carreau, unfortunately, there doesn't seem to be a way to skip the doctests that run on the module docstring, since you can't decorate the module docstring.

@Carreau
Owner

Oh, it was on the module docstring, I didn't saw that, I only saw doctest errors :-)

@bfroehle

Wow, fantastic! I'll give this a whirl in a few days.

@takluyver
Owner

You could try setting skip_doctest=True at the module level - that's all the decorator does to functions. I'm curious about how your 'sleazy' fix works.

@asmeurer

It's unfortunate that the easiest way to add return completions in Python 2 is @tab_complete(**{'return': completer}). Would it make sense to add a convenience decorator @tab_complete_return, which would work like

@tab_complete(a=str, **{'return': str})
def f(a):
    return a + '1'

is the same as

@tab_complete_return(str)
@tab_complete(a=str)
def f(a):
    return a + '1'

?

@asmeurer

Are there supposed to be four " before "IPython"? Also, I learned something new about Python here, which is that """"hi""" produces the string "hi with the quotation mark!

Good spot. I fixed the doctest issue by skip_doctest=True at the module level.

@rmcgibbo

@takluyver, good idea. Setting skip_doctest=True at the module level did the trick.

@asmeurer, I added the python2 convenience decorator.

@rmcgibbo

In the qtconsole, the tooltip and the tab completions come up on top of each other for entries like foo(<TAB>, and it doesn't look great.

@rmcgibbo

A few unordered thoughts:

  • I'm sure there are Windows-related bugs in the globs_to annotation. There are most likely issues with filenames that have spaces in them as well.
  • When tab completing for a class constructor, under what conditions should the code check __new__ vs. __init__? What is the correct behavior when both methods exist and take different arguments? Is that even allowed?
  • There might be a nicer way to, given a class, infer (approximately) what methods are available on the instances of that class. Currently, I'm just using all of the entries of utils.dir2() on the class, minus the mro. But it should instead try to filter out class methods. I think the best way to do this (at least in python2) is with inspect.classify_class_attrs.
  • Using inspect.getcallargs could make the logic of utils.tokens.cursor_argument simpler and more robust.
  • A nice extension to the return value tab completion would be some specific support for return values that are a tuple of different types, like
@tab complete 
def f() -> (str, int):
    return 'hello', 5
f()[1].<TAB>
.bit_length   .conjugate    .denominator  .imag         .numerator    .real
@takluyver
Owner

I wouldn't filter out class methods - they can still actually be called on the instance, so it's valid for them to show up in tab completions.

The question about tuples nrings us back to the discussion on python-ideas - should that syntax be for "2-tuple of str, int", or for "either str or int"? In any case, I suspect the complexity of tracking those variables isn't worth it.

@rmcgibbo

Maybe you're right. I feel like the most common thing is 2-tuple of str, int. But perhaps it's not worth it.

A new issue with architectural implications is that most of this code fails with functions defined in C (or cython) since they can't be properly introspected. You also can't set random attributes on functions defined in C trying to set f._tab_completions = data is a nonstarter. Since this is where the tab complete data is stored, it's not a good situation.

I was trying to manually "decorate" numpy by parsing the docstrings, but since so many of the functions are defined in C, the possibilities are really limited.

@rmcgibbo

Also, it would be possible to have both semantics with something like

@tab_complete
def f(x) -> (str, int):
    return 'a', 5

@tab_complete
def g(x) -> one_of(str, int):
    if np.random.randn() < 0:
        return 'a'
    return 5

given them the two different semantics

But I am -1 on this idea.

@bfroehle

Or would it be the much less satisfying:

@tab_complete
def f(x) -> tuple:
    return 'a', 5
@rmcgibbo

right. that doesn't provide much functionality with the current code (just .count and .index)

@takluyver
Owner

I think that unpacking return values quickly gets complex. E.g. if the return value is guaranteed to be a 2-tuple, I would often unpack it: message, result = f(x). Now you could parse that, figure out that message should be a string, and show me tab completions next time I reference that variable.

But before long, you're almost reimplementing Python to keep track of variable changes. And if that's the route we want to go down, we should look more closely at existing refactoring and completion solutions (like rope and jedi). And in any case, let's not try to do too much in one pull request.

@rmcgibbo

Yeah. I concur. I think we're on the same page.

@rmcgibbo

@takluyver, do you think it's better to store the function tab completion info on the function (e.g. f._tab_completions) or in an IPython.core.completer module level dict/db?

The first seems less spooky, but doesn't let you do builtins like np.ndarray. The code complexity is about equal both ways.

@bfroehle
@takluyver
Owner

Hmm, it's annoying that you can't attach arbitrary information to a compiled function. I wonder if it's worth looking for information both on the function, and then in a global registry? E.g. our display machinery looks for methods like _repr_html_() on the object, and then looks in a dictionary of formatters to see if one matches the type at hand.

Alternatively, perhaps we should look at a module-level structure - then packages like numpy could add a dictionary of annotation-like info to the module, rather than having to play the 'are we inside IPython' game to add the information to an IPython registry.

@rmcgibbo

Okay. As long as you don't think it's too... "dirty" (for lack of a better word), I will make the decorator try to attach the completion info to the function and if it fails (due to this issue) try instead to register the completion info inside of an IPython registry.

@asmeurer

Yes, both __new__ and __init__ can be called. __new__ exists so that objects can return different types of objects. If it does indeed return the same type, __init__ is called. To quote http://docs.python.org/3/reference/datamodel.html?highlight=__new__#object.__new__:

If __new__() returns an instance of cls, then the new instance’s __init__() method will be invoked like __init__(self[, ...]), where self is the new instance and the remaining arguments are the same as were passed to __new__().

A simple example:

In [3]: class Test(object):
    def __new__(cls):
        print "calling __new__"
        return object.__new__(cls)
    def __init__(self):
        print "calling __init__"
   ...:         

In [4]: Test()
calling __new__
calling __init__
Out[4]: <__main__.Test object at 0x108a29a90>

As far as tab completion, it's impossible to know what will be returned if an object has __new__ without running __new__. In general, a completely different object will be returned. __init__ is only run if the current class is returned:

If __new__() does not return an instance of cls, then the new instance’s __init__() method will not be invoked.

Anyway, when __new__ does call __init__, it is with the exact same arguments. So probably the correct thing to do is to just ignore __init__ when __new__ is defined. For interactive purposes, this is correct (the only exception I think is a subclass calling the base class's constructor). By the way, how do you handle multiple possible return types? I would do

def f(x): -> (str, int)
    # f(x) can return str or int
    ...

def f(x): -> tuple
    # f(x) returns a tuple
    ...

def f(x): -> tuple_of(str, int)
    # f(x) returns a tuple of a string and an int
    ...

where the third one I would hold off on actually implementing for now. It's a more advanced completion feature to work with f(x)[0]<TAB>.

On the other hand, having multiple possible return types is important, especially for classes that define __new__.

@bfroehle
@asmeurer

Your example is wrong. __init__ should not return anything. And by the way, somehow GitHub decided to completely ignore the backtiks in your comment.

@asmeurer

But you're right. It does seem to ignore it. In a way, it is correct. __init__ is the constructor. __new__ is just a static method (this is also how help(A) views things.

@bfroehle

A digression: @asmeurer, as you know there is no concept of void in Python -- every function has an implicit return None if no other return statement is given. So in this case it doesn't really matter if we return object.__init__(...) or not.

@asmeurer

Oh, I get it. object.__init__() is None. It's a very confusing way to put it. Really, there should be no return statement at all in the __init__ method.

@rmcgibbo

Thanks for the input. I think ignoring __new__ makes sense for now. It's less common in user code, and the feature would be tricky. If you can't reason about __new__ without executing it, that's suboptimal. And when both are defined but one takes tab-completions annotated positional argument but the other takes *args and **kwargs, having the tab completion code do the right thing is nontrivial.

@rmcgibbo

Here's some code to automatically add tab completion by parsing numpy format docstrings. https://github.com/rmcgibbo/autotab

It doesn't work in a lot of cases (if the docstrings aren't formatted well enough). But when it does, it lets you do fun stuff like this:

In [1]: import numpy as np

In [2]: import autotab; autotab.install(np)

In [3]: np.dot(np.ones(not_evaluated), whatever).transpose().<TAB>
.T             .base          .copy          .dtype         .getfield      .min           .put           .searchsorted  .squeeze       .tolist
.all           .byteswap      .ctypes        .dump          .imag          .nbytes        .ravel         .setasflat     .std           .tostring
.any           .choose        .cumprod       .dumps         .item          .ndim          .real          .setfield      .strides       .trace
.argmax        .clip          .cumsum        .fill          .itemset       .newbyteorder  .repeat        .setflags      .sum           .transpose
.argmin        .compress      .data          .flags         .itemsize      .nonzero       .reshape       .shape         .swapaxes      .var
.argsort       .conj          .diagonal      .flat          .max           .prod          .resize        .size          .take          .view
.astype        .conjugate     .dot           .flatten       .mean          .ptp           .round         .sort          .tofile        
@takluyver
Owner

That's pretty cool, but I think the docstring parsing should definitely stay separate from the main codebase. We should encourage numpy & other packages to add type information in a structured form - whether via decorators, or in module-level data structures.

@rmcgibbo

I agree. That's why I put it on my personal GitHub and not in the PR. I just wanted to reference it here so that someone testing out this PR could play with it.

@Carreau
Owner

Pinging the status of this PR. I haven't followed a lot, but it seem done to me.

As for the autocompletion base and docstring, would ou like to add this to the wiki ?

@minrk
Owner

Extension tests do not appear to be passing on Python 3.

@minrk
Owner

What's the decision here on whether this extension belongs in IPython vs separately maintainted? It seems like bundling every new extension in IPython misses much of the point of extensions. Pinging @ellisonbg and @fperez on that one.

@takluyver
Owner

At a glance, I don't think that module is really an extension (as in something that you would do %load_ext foo with). It looks more like infrastructure for extensions, in which case I'd say it belongs in IPython.lib.

I also think we might have to rationalise our tab-completion infrastructure a bit to better cope with more complex parsing.

@asmeurer

I would argue for this one to be included.

  • It is general. It's not like this only applies to numpy functions or something like that. In fact, by itself, it applies to nothing, because it rather sets up an API for others to use. To me, that indicates it should be part of IPython. On the other hand, specific decorations belong in their respective libraries (e.g., decorating numpy functions' return types).

  • From my understanding, tab completion is a pretty fundamental part of the IPython mission (@fperez can correct me if I'm wrong). I personally would like to see much better tab completion in IPython in the future, especially in non-terminal clients where more is possible.

But it's your decision. I could also see very nice tab completion being an external library that plugs in to IPython.

@Carreau
Owner

I think there are 2 things here.

  • The modification of the completer
  • The "tab_completion" extension decorator.

So I think the question of having this part with the decorator "separate" make sens, because it can be used without the rest of this PR or IPython.

@Carreau
Owner

BTW, it conflicts now.

@ellisonbg
Owner

I agree with @minrk that most "extension" like things should not be in IPython itself. My preferences would be:

  • If it is really important, we include it into the core, not as an extension.
  • If it is not that important, it is third party extension.

I am -1 on the in between of having it as an extension that is shipped with IPython.

@rmcgibbo

Sure. I realize that this PR is called an "extension" (and there's code in the 'extension' directory), but most of it is additions to the completer module that enable the tab completion mechanism to be extended via annotations. That's the core feature. The question, to me, is, do you want that in IPython?

@ellisonbg
Owner

I like the functionality, but this is a lot of added code and complexity to an already messy and complex part of our code base. It adds a lot of potential for future breakage points and we will have to maintain it. All of that points to not including it in IPython. I don't know enough about our completer logic to know if/how this could be simplified. I would like to get a good read on these issues from the other core devs.

@takluyver
Owner

On the one hand, it is quite a bit of extra, non-trivial code for something that users aren't especially crying out for. On the other hand, tab completion is often one of those things that you don't know how valuable it is until you're using it heavily. And this wouldn't be much use as an extension - it's only likely to see much use if it's a default component of IPython.

This is made harder by the fact that the completer stuff is some of the older, thornier code, that I don't think any of us are especially familiar with.

I think I'd regretfully defer this for now, but hope that parts of it can be reintroduced when someone gets to spring-cleaning the completer stuff.

@rmcgibbo

Is that spring cleaning on anyone's todo list? I agree that the existing completer code is quite obtuse. I had to learn its ins-and-outs to write this PR. I have some ideas about how it could be improved, that I can share with whoever.

@ellisonbg
Owner
@takluyver
Owner

It's on my 'maybe get to this one day' list. ;-)

You might be the world expert on it after doing this pull request, so if you want to have a go at it, be our guest. Off the top of my head, I think the things to look at include:

  • It was originally tightly coupled to readline, which provides the interface for tab completion in the terminal. Now completions can either go to readline or over the wire to frontends. I think there has been some separation of concerns to allow that, but it can probably go further.

  • Like other bits of our code, the IPython completion stuff appears to be built over a plain Python version (IPython.core.completer.Completer). We're moving away from the 'pure Python' modes, which tended to be little used and poorly tested, so if that's not architecturally useful, we might get rid of it.

No doubt there'll be plenty of other fun things waiting to be discovered ;-)

@fperez
Owner

After reading the whole code, here's my take on the matter: this is simply too complex to put into IPython right now, and part of the complexity stems from the fact that it's hacking around a sub-optimal completion API that doesn't return the right pieces of information in various places.

There are also a number of minor issues with the code itself, but since I don't think it can go in at all, I don't want to get into line-by-line review.

The functionality is potentially quite useful if adopted widely, but because people are very reluctant to put ipython dependencies into their own codes, I'm somewhat skeptical of how well that will go. Obviously we want to break the chicken-and-egg problem, but right now the cost/benefit analysis just isn't there.

Furthermore, the 2.x version of this is hideous (no fault of yours, @rmcgibbo, its' just how the language goes). I don't think many people will adopt it for 2.x, but I can see it working much better in 3.x.

For all these reasons, here's the path forward I see:

  1. A new PR that focuses only on refactoring the completion API to make something like this possible. It cleans up any limitations of the current API, such as returning full line information to the completers, etc. We'll need to see how much backwards compatibility we break by doing so.

  2. This is released as a standalone, small package that contains the api specification for how people should annotate their functions and the ipython extension they can install.

Note that the two parts above should be separate: people should be able to annotate their codes without creating any dependency on IPython. The annotations are just metadata, possibly aided by your tool. And only users of IPython would load the extension and thus take advantage of the metadata.

In doing so, I'd try to see if it's possible for the py3 version to require zero imports and rely strictly on the annotation protocol (maybe that's already the case, I didn't study all your examples in detail). That way we'd have a protocol for this, rather than a dependency.

@rmcgibbo, you've done a terrific job and we really appreciate it. So I hope you can continue working with us, we just need to find the right path forward.

@fperez
Owner

For reference, the h5py custom completers do exist and rely on our API. So let's make every reasonable effort to come up with a solution that doesn't break their code, even if it means making a slightly uglier API (such as having the new one use different names and keep deprecated methods with the old functionality).

@fperez
Owner

Also @rmcgibbo, have a look at that h5py completer to see if a similar regex-based hook isn't sufficient to trigger your completer machinery without further changes.

@ellisonbg
Owner

@rmcgibbo I think the first step will be to remove everything from this PR that is not related to your refactoring of the core completer logic. I am hopeful that we can come up with a good API that doesn't require completely rewriting the inner workings of everything. I would try to think about what the simplest API is that will work for you. I am hopeful that a small amount of refactoring will suffice. I don't want you to feel like you have to completely rewrite the completer logic to implement your stuff.

As Fernando said, we really appreciate your work - please don't hesitate to bug us with questions...

@takluyver
Owner
@rmcgibbo rmcgibbo closed this
@fperez
Owner

@rmcgibbo, just read your summary of the issues, this is fantastic work. Thanks so much, we'll continue the discussion off that. I'm very impressed.

@rmcgibbo

@fperez, where is the best forum in which to continue the discussion? Once I've finished that document (I want to add more on the proposal), do you want me to ping this thread, or is another format preferable, such as using that text as the basis of a new "Proposal" GH issue?

@takluyver
Owner

To echo @fperez , that looks like an excellent examination of the problems. If we can flesh out the proposed solutions a bit, I think making an IPEP for it would be the way forwards.

@rmcgibbo

Got it.

@fperez
Owner

Yup, go ahead and start an IPEP, it's the perfect way to move forward these kinds of more complex discussions. I just noticed typeahead.js from Twitter, I'd suggest thinking about what could be done with this kind of tool on the client side if we had a richer completion API.

We can always flatten the returned object into a simple list for readline, but our internal design shouldn't be constrained by that at all.

@rmcgibbo

Thanks @fperez. That's awesome. So many good ideas.

@fperez
Owner

@rmcgibbo, I'm going to be at Stanford giving a talk about IPython on Friday at 10am, in the Stats dept. How about we book some time later that day to pound on this IPEP?

@rmcgibbo

@fperez, yeah, sounds good. I have a meeting with my advisor from 2-3pm, but otherwise I'm free. Your schedule is probably more constrained than mine.

Where is the talk? If it's a public seminar, I think some lab members and I might like to come.

@fperez
Owner

SPECIAL TALK
DEPARTMENT OF STATISTICS
10:00AM, Friday, March 8, 2013
Huang Engineering 18

@rmcgibbo

Awesome. I'll come up & find you after the talk and we can figure something out?

@fperez
Owner

yes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.