Stateful line transformers #2447

Merged
27 commits merged on Apr 11, 2013

9 participants
Owner

takluyver commented Sep 29, 2012

The big step for IPEP2 (#2293). I anticipate there may be more work to do on this, but I think it's ready for other people to start playing with it.

What does this solve?

  • De-denting indented code, but only by as much as the first line of the cell is indented. See PR #2436.
  • Line magics can have line continuations within them (#2164)
  • The architecture for transforming cell magics is much cleaner.
  • We can clean prompts within multiline strings, e.g.
>>> a = """123
... 456"""

How does it work?

You can create 3 kinds of transformers:

  • Stateless transformers can be defined as a simple function with a decorator @stateless_input_transformer. The transformer that recognises ? at the end of lines is defined like this.
  • Stateful transformers can be defined as a generator with a decorator @coroutine_input_transformer. Most of our own transformers are written like this, because the problem lends itself neatly to using coroutines. Coroutines can save up input by yielding None until they're ready to return it. They're reset (at present) by sending None in, but I'm thinking of changing that mechanism, perhaps to throwing and catching an exception.
  • Finally you can subclass IPython.core.inputtransformer.InputTransformer directly, implementing push(line) and reset() methods.
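For illustration, here is a minimal standalone transformer implementing that push/reset interface (this is a sketch, not IPython's actual classes; the prompt-stripping behaviour is just an example):

```python
class StripClassicPrompt:
    """Illustrative transformer with the push()/reset() interface
    described above; a real one would subclass
    IPython.core.inputtransformer.InputTransformer."""

    def push(self, line):
        """Transform one pushed line and return the result.
        (A stateful transformer could return None to hold input back.)"""
        for prompt in ('>>> ', '... '):
            if line.startswith(prompt):
                return line[len(prompt):]
        return line

    def reset(self):
        """Clear accumulated state; this sketch stores nothing."""
        return None

t = StripClassicPrompt()
assert t.push('>>> a = 1') == 'a = 1'
assert t.push('... b = 2') == 'b = 2'
assert t.push('plain line') == 'plain line'
```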

Transformers have an attribute look_in_string, which if set to True allows them to be called even if the start of the line is within a multiline string (or another multiline expression). This is set for the leading_indent, classic_prompt and ipy_prompt transformers.

Owner

jasongrout commented Oct 9, 2012

This may make gh-2402 obsolete. I like your coroutine idea.

Owner

takluyver commented Oct 9, 2012

Thanks Jason. Yes, I intend this to offer the same possibilities as #2402 in a more joined-up way - I was particularly keen to handle cell magics without the current special casing in inputsplitter.

Owner

takluyver commented Oct 19, 2012

Also fixes #2507

Contributor

rmcgibbo commented Nov 1, 2012

This PR also fixes #2539. Looking forward to the merge.

Owner

takluyver commented Nov 1, 2012

What we really need is code review. I suspect a change of this magnitude in the core code needs at least a quick look over by @fperez , but he's very busy at present. If people would like to help, it would be good to get #2301 reviewed and landed. That's a much less invasive change, although it's not trivial.

Contributor

rmcgibbo commented Nov 13, 2012

I've been using this branch for a while personally, and I just noticed when trying to launch the qtconsole that the initialization fails with the traceback below. I'm sure this is a pretty easy fix.

$ ipython qtconsole
Traceback (most recent call last):
  File "/Library/Frameworks/EPD64.framework/Versions/Current/bin/ipython", line 7, in <module>
    launch_new_instance()
  File "/Users/rmcgibbo/local/ipython/IPython/frontend/terminal/ipapp.py", line 388, in launch_new_instance
    app.initialize()
  File "<string>", line 2, in initialize
  File "/Users/rmcgibbo/local/ipython/IPython/config/application.py", line 84, in catch_config_error
    return method(app, *args, **kwargs)
  File "/Users/rmcgibbo/local/ipython/IPython/frontend/terminal/ipapp.py", line 313, in initialize
    super(TerminalIPythonApp, self).initialize(argv)
  File "<string>", line 2, in initialize
  File "/Users/rmcgibbo/local/ipython/IPython/config/application.py", line 84, in catch_config_error
    return method(app, *args, **kwargs)
  File "/Users/rmcgibbo/local/ipython/IPython/core/application.py", line 325, in initialize
    self.parse_command_line(argv)
  File "/Users/rmcgibbo/local/ipython/IPython/frontend/terminal/ipapp.py", line 308, in parse_command_line
    return super(TerminalIPythonApp, self).parse_command_line(argv)
  File "<string>", line 2, in parse_command_line
  File "/Users/rmcgibbo/local/ipython/IPython/config/application.py", line 84, in catch_config_error
    return method(app, *args, **kwargs)
  File "/Users/rmcgibbo/local/ipython/IPython/config/application.py", line 420, in parse_command_line
    return self.initialize_subcommand(subc, subargv)
  File "<string>", line 2, in initialize_subcommand
  File "/Users/rmcgibbo/local/ipython/IPython/config/application.py", line 84, in catch_config_error
    return method(app, *args, **kwargs)
  File "/Users/rmcgibbo/local/ipython/IPython/config/application.py", line 352, in initialize_subcommand
    subapp = import_item(subapp)
  File "/Users/rmcgibbo/local/ipython/IPython/utils/importstring.py", line 40, in import_item
    module = __import__(package,fromlist=[obj])
  File "/Users/rmcgibbo/local/ipython/IPython/frontend/qt/console/qtconsoleapp.py", line 63, in <module>
    from IPython.frontend.qt.console.frontend_widget import FrontendWidget
  File "/Users/rmcgibbo/local/ipython/IPython/frontend/qt/console/frontend_widget.py", line 15, in <module>
    from IPython.core.inputsplitter import InputSplitter, transform_classic_prompt
ImportError: cannot import name transform_classic_prompt

Owner

takluyver commented Nov 13, 2012

Thanks, @rmcgibbo , good catch. I've fixed that now.

Owner

jasongrout commented Dec 6, 2012

As I mention on gh-2612, a lot of the string-based line transformers might be better done as tokenizer transforms. For example, %magic transformations can automatically take into account line continuations and strings. See #2612 (comment) for an example. In fact, this would probably be a better way to do all of the line-prefix IPython syntax ('%', '!!', etc.). Just look for the prefix as either the first token or as a token immediately following a newline token.
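That token-based check can be sketched with the stdlib tokenize module (the function name and details here are illustrative, not IPython code): only a NEWLINE token ends a logical line, so a % inside brackets or strings is never treated as a line prefix.

```python
import io
import tokenize

def starts_logical_line_with_percent(src):
    """Is '%' the first token, or the first token after a NEWLINE
    (i.e. at the start of a logical line)?  Illustrative sketch."""
    at_line_start = True
    try:
        for tok in tokenize.generate_tokens(io.StringIO(src).readline):
            if tok.type == tokenize.NEWLINE:
                at_line_start = True      # a logical line just ended
            elif tok.type in (tokenize.NL, tokenize.COMMENT,
                              tokenize.INDENT, tokenize.DEDENT):
                continue                  # these don't end a logical line
            else:
                if at_line_start and tok.string == '%':
                    return True
                at_line_start = False
    except tokenize.TokenError:
        pass  # incomplete input; fall through
    return False

assert starts_logical_line_with_percent('%time x')
# '%' inside parentheses is not at the start of a logical line:
assert not starts_logical_line_with_percent('a = ("%s"\n% foo)\n')
```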

Contributor

dwf commented Jan 7, 2013

Just ran into this myself, with ! at the beginning of a line being transformed if inside a multiline string, but this discussion seems to have died. What's the current state of this pull request? Has it been superseded by something else?

Owner

takluyver commented Jan 7, 2013

It's not superseded. I'd like to get it merged, but it's a big enough change that I'm hoping @fperez will be able to look at it first. He's rather busy at the moment, though - maybe we should just land it and look out for breakages, before something comes along to conflict with it.

Thoughts, anyone?

Contributor

rmcgibbo commented Jan 7, 2013

If there's any work that needs doing (by a novice), I'd be happy to contribute a pair of eyes, write some tests, or whatever.

Contributor

bfroehle commented Jan 7, 2013

@takluyver I think another consideration is where in the release cycle we are. Given the scope, we probably want to land this a few months before a new point release so we have ample time to work out any kinks.

Owner

takluyver commented Jan 7, 2013

I think what I'm mainly after is a review of the design itself - see the initial description of this PR. For instance, Jason has pushed me to think about token-based transformers, which I think it should be possible to add to the same conceptual framework.

Anyone's welcome to make suggestions like that, but I have a lot of respect for Fernando's judgement in particular about what's required and what's unnecessary. But I agree that we need to get big changes in to give us time to find the problems before 0.14. I'll email Fernando and see if he'll have time soon.

Contributor

rmcgibbo commented Jan 7, 2013

Sweet. All of this sounds good.

Owner

fperez commented Jan 8, 2013

Just catching up here... I've been away for so long that I need to fit everything back into my mental cache, which right now is very, very cold. I've given everything a first read and will sleep on it and come back tomorrow to work further on it. But definitely as @takluyver says, now is the time to pitch in with further feedback on how well this works, ideas/problems, etc.

@bfroehle is 100% right that we need to be very careful regarding our release cycle. We haven't really had that discussion yet, but indeed merging this close to a release would be a bad idea. This touches the very heart of the system in major ways, so it should be done early in a full cycle, not late. But let's bang on it and review it for now, we can see what the right approach re. release will be once we're happy with the approach and code.

Thanks again to @takluyver for the enormous amount of work you've put into this: this stuff is not easy, so I really appreciate you working on it with such care.

Owner

jasongrout commented Jan 8, 2013

This may or may not make a difference, but right now the transformer system is the major holdup of getting the new IPython in Sage. In particular, right now there isn't a nice way for us to pass everything on to the input transformers (including things inside of strings). We can patch our IPython, but having a more flexible official system like this pull request starts giving would be better, of course.

Owner

fperez commented Jan 8, 2013

@jasongrout, definitely lay out for us anything in the planned new design that would be insufficient/suboptimal for you guys. Feedback like that is precisely what we need at this point, and Sage and Sympy are probably our highest priority real-world use cases.

Owner

jasongrout commented Jan 8, 2013

I think having the ideas in this pull request, plus token-based transformers, handles our use cases.

My comment above was an encouragement to get this pull request in (or at least the ideas behind this pull request). I haven't converted our transformers over to use AST-based transforms, but hope to in the near future.

Owner

fperez commented Jan 15, 2013

@takluyver, this is looking very solid, but I think it needs a bit of documentation explaining how the system is supposed to work and the coroutine logic. Coroutines aren't one of the most widely used parts of the language, so it's a good idea to provide a little extra explanation on how they fit into the problem here.

I know our docs are a little messy right now so even finding the right place for new information is painful. But having the explanation means we can always move it once we reorganize the docs (something I hope will happen soon).

IPython/core/inputtransformer.py
+ except tokenize.TokenError:
+ pass
+ return(tokenize.COMMENT in toktypes)
+
Owner

fperez commented Jan 15, 2013 (inline review comment)

blank line missing here (and in a few more places - 2 blank lines between top-level function defs)

Owner

fperez commented Jan 15, 2013

On a general note, there's a ton of code with minimal/no docstrings there, which makes it hard for other developers to dig into later on. I know the old code wasn't perfect either, but now's the time to make an improvement on that front too :)

I'm a little unhappy about the logic implementing the coroutine machinery via a factory that creates a class at runtime: it's complex and requires calling this multiple times (once for each coroutine) at init time. This will likely add a small amount of startup time (likely very little, but I'm worried about our startup time going up and up; these small changes add up in the long run and are hard to push back later).

I don't have a simpler solution in my head right now, but I wonder if we can't keep the coroutine design with an implementation that has less moving parts... Ideas?

Owner

minrk commented Jan 19, 2013

If I'm not mistaken, this should also close #678 when it's done.

Owner

takluyver commented Jan 20, 2013

@fperez : I can make CoroutineInputTransformer a simple class which is passed the coroutine definition at instantiation. But then, if it's used as a decorator, the result is already an instance, so there's no neat way to create a second instance of it. We could leave the coroutine undecorated, and instantiate it as CoroutineInputTransformer(cellmagic) when we're setting up the transformers. I'll have a think about it, but I think that might be the best way.

I'll try to add some docstrings.

Owner

takluyver commented Jan 24, 2013

I've rearranged things a bit, so that you can decorate a function with a class method called wrap, i.e.

@CoroutineInputTransformer.wrap
def classic_prompt():
    ...

I've also added a prototype TokenInputTransformer for @jasongrout. There's an example use in the test module. The API for that might well need to change, though.
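For a flavour of what token-based transforms make easy, here is a standalone sketch using the stdlib tokenize module (not the TokenInputTransformer API itself) of the kind of rewrite Sage wants: wrapping integer literals in Integer(...) calls by editing the token stream, so string contents and comments are untouched automatically.

```python
import io
import tokenize

def wrap_integers(src):
    """Illustrative token transform: rewrite integer literals as
    Integer(...) calls.  Tokens inside strings/comments are unaffected
    because they never appear as NUMBER tokens."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type == tokenize.NUMBER and tok.string.isdigit():
            out.extend([(tokenize.NAME, 'Integer'),
                        (tokenize.OP, '('),
                        (tokenize.NUMBER, tok.string),
                        (tokenize.OP, ')')])
        else:
            out.append((tok.type, tok.string))
    # untokenize with 2-tuples: spacing is approximate, tokens are exact
    return tokenize.untokenize(out)

transformed = wrap_integers('1 + 2\n')
assert eval(transformed, {'Integer': int}) == 3
```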

Owner

jasongrout commented Jan 24, 2013

I notice that several times, you have stateful transformers for the sole reason of supporting line continuations (\). It would be nice to abstract that pattern out; I can imagine lots of people will want to automatically support line continuations in their line-by-line transforms. One way to abstract the pattern, just like the look_in_string variable, is that you could have a logical_line variable that would automatically concatenate lines that had continuations before feeding them into the transformer, or a line transformer could actually take a list of lines that were one "logical line".

In fact, that's one of the big reasons I advocated for tokenizer transforms---line continuations and things inside of strings (i.e., the 'logical line' issues) are automatically taken care of then.

Owner

jasongrout commented Jan 24, 2013

Above (#2447 (comment)) I commented that the prefix transformers (%, ;, etc.) would probably be a lot more elegant done as token-based transformers (then there is no need to do line continuation tricks, etc.). What do you think?

Owner

takluyver commented Jan 25, 2013

Thoughts on line continuations so far:

  • One way would be to provide a separate routine for accumulating a logical line, that transformers could call. But before 3.3 (with yield from), the code to delegate to another coroutine is probably as long as the half-a-dozen lines to actually do this.
  • Another approach would be to add a default transformer which assembled logical lines, and acted before the line magic & similar transformers. The latter could then be implemented as stateless transformers again. This is what I currently favour, but I want to think it through for a bit longer.
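The second approach can be sketched as a coroutine that yields None while a line ends in a backslash (standalone and illustrative; the real assemble_logical_lines differs in detail):

```python
def coroutine(gen_func):
    """Instantiate and prime a generator so it's waiting at its first
    yield, ready to receive lines via .send()."""
    gen = gen_func()
    next(gen)
    return gen

@coroutine
def assemble_logical_lines():
    """Join '\\'-continued physical lines into one logical line,
    yielding None (hold the input) until the line is complete."""
    line = None
    while True:
        line = yield line            # emit last result, receive next line
        if line is None:
            continue                 # reset signal
        parts = []
        while line is not None and line.endswith('\\'):
            parts.append(line[:-1])  # drop the backslash, keep accumulating
            line = yield None
        if line is not None:
            parts.append(line)
        line = ''.join(parts)

assert assemble_logical_lines.send('a = 1 + \\') is None
assert assemble_logical_lines.send('2') == 'a = 1 + 2'
```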

With that done, I think doing the prefix transformers based on tokens would be unnecessary complexity.

(The thought has also just occurred to me: could we do the same to multi-line strings? i.e. instead of a look_in_string attribute, put those transformers before the transformer that joins multi-line strings? Still thinking about this.)

Owner

takluyver commented Jan 25, 2013

In fact, I think one transformer can easily do line continuations and multi-line strings. I love it when a plan comes together.

Owner

takluyver commented Jan 25, 2013

The only slight problem I can see with this approach is that %magic/!system commands etc. can't use an unpaired triple-quote in their syntax, because it will be treated as the start of a multi-line string before they get to process it. But I've never seen anything that uses unpaired triple-quotes, so I think we can cross that bridge when we come to it.

Owner

jasongrout commented Jan 25, 2013

It sounds like you are basically rewriting the tokenizer. One other thing token-based magics would allow is a logical line containing tuples or parenthesized expressions that span multiple lines. For example:

%time [i*i for i
    in range(10)]

would work fine if %time was a tokenizing transformer. Do we want to support something like that?

Owner

takluyver commented Jan 25, 2013

Not at all - I'm using Python's tokenizer to find logical lines. I haven't reimplemented any of that functionality.

I'm inclined to say that no, we don't want to support that case. Only a couple of magics take Python code as input, and the input transformation framework isn't, and shouldn't be, aware of what specific magics do. Other magics could accept unbalanced parentheses, without consuming the following lines.

I do see a problem, though, with code like:

a = ("%s"
%foo
)

It's somewhat contrived, but it is valid, and at present we avoid transforming %foo, although I'm not quite sure how. I'll have a think about this.

Owner

takluyver commented Jan 25, 2013

I recall now, at present we skip the transformations if the input isn't complete, and the last line didn't end with : or ,. Which has various problems, but it handles the example I gave correctly. I've got the rough shape of an idea to deal with this, but I think I shall sleep on it.

Owner

jasongrout commented Jan 25, 2013

You can also use the tokenizer to split the input into logical lines, and have a (new?) type of transformer that takes logical lines, or maybe a list of physical lines that form one logical line.

Owner

jasongrout commented Jan 25, 2013

If I understand your comment right, then really the code-accepting transformers (like %time) should be reimplemented as tokenizer transforms so that they take logical lines. But then it's confusing that some magics use physical lines and some magics use logical lines. Maybe it's not worth the confusion...

Owner

jasongrout commented Jan 25, 2013

So it seems that one of the real difficulties here is dealing with the difference between logical lines and physical lines. We want magics to only work if they are at the start of a logical line, not just if they start a physical line. Sometimes we want transforms to operate on a logical line, sometimes on a physical line, sometimes on something between (e.g., line continuations work, but a list spanning multiple lines doesn't).

I guess the trick here is deciding on good conventions. I'm going to have to think about this more too.

Owner

takluyver commented Jan 25, 2013

Well, that's roughly the current state (different transformers that handle physical/logical lines) - transformers before assemble_logical_lines will be passed physical lines, those after it will get logical lines. But at the moment, logical lines are based on explicit \ continuations or triple-quoted strings, not parentheses. It's tricky because I want an open bracket to make a logical line when it's outside a magic, but not when it starts inside a magic.

There aren't separate transformers for separate magics, and I don't want there to be. The transformers just see the % prefix and turn the rest of the line into a magic.

Owner

jasongrout commented Jan 25, 2013

So now we have:

  • physical lines
  • semi-logical lines (only triple-quoted strings and line continuations span lines)
  • Python logical lines (semi-logical lines, plus matching brackets/parentheses)

I'd say that if you insist on matching string delimiters, it'll be confusing not to match brackets too. Maybe it makes sense to:

  1. run transformers acting on true physical lines
  2. assemble_logical_lines only concatenates lines with line continuations (\)
  3. run transformers acting on this variant of "logical lines"
  4. run tokenizing transformers
  5. parse AST and run ast transformers

I'm thinking more about this too...

Owner

jasongrout commented Jan 25, 2013

hehe...did you notice that my definition of a semi-logical line was a semi-logical line, but not a logical line :) No pun intended...

Owner

takluyver commented Jan 26, 2013

Roughly, but at step 3, I'd like to run line-based transformers for things like magics, which shouldn't act if we're within a logical line.

To give concrete examples: in this case, I want the logical line to be assembled before the magic transformer acts:

a = ('%s'
%foo )

whereas in this case, I want the magic transformer to act before it tries to assemble a logical line:

%foo (
blah

i.e. this should get translated to:

get_ipython().magic(u'foo (')
blah
Owner

takluyver commented Jan 27, 2013

OK, I think I'm getting somewhere. Transformers are now in 3 groups:

  • Physical line: act per line as the user enters it, e.g. stripping >>> prompts.
  • Logical line: act on lines as joined by explicit \ line continuations. Skipped when we're within a multi-line statement. E.g. %magic recognition.
  • Python line: act on lines containing whole Python statements, i.e. they get multi-line strings, lists, function calls etc. in one go. We don't currently use any of these in IPython.

Cell magics are currently recognised at the logical-line stage, rather than the physical-line stage. The advantage is that the first line (which is passed separately) can be extended using explicit line continuations. The downside is that the content of the cell magic can't use a \ at the end of a line to mean anything other than transparent line continuation.

Contributor

asmeurer commented Jan 27, 2013

I've only been half following this, so pardon if this has already been answered, but will this make it possible to directly paste something like

def f(x):
    a = x + 1

    return a

with the blank line before the return?

Contributor

asmeurer commented Jan 27, 2013

I guess to do so, you would have to recompile the function if there are more indented lines immediately after the definition. There shouldn't be any code reexecution issues with this, no? The downside is that you wouldn't really be able to do this with classes, where it would be more useful since people tend to put blank lines between methods, because you can get code reexecution in that case. You also shouldn't delay class creation because arbitrary things can happen then with metaclasses.

Owner

takluyver commented Jan 27, 2013

@asmeurer : No, that's not currently in the plan for the terminal - it should already work in the Qt console and the notebook. We'll continue to point people to %paste and %cpaste for such cases.

Besides the technical complexity of the suggestion, it's not clear that it's the expected behaviour in all cases. e.g.

def f(x):
    a = x + 1

[At this point, the cell closes and IPython defines the function]

    print a

[Does the user want this to be part of the function in the previous cell, or do they want to execute it directly, and the indentation is just an artefact of copying it?]

Of course, you can come up with heuristics to try to guess that, but control interfaces should not be intelligent.

Also, I've just recalled that defining a function can have arbitrary effects, because default parameters are evaluated when the function is defined. So automatic redefinitions are a no go.

Contributor

asmeurer commented Jan 27, 2013

Ah, you're right. I was thinking that default parameters wouldn't have an effect, because they are only on the first line, but of course that doesn't make a difference here. So you can ignore the suggestion.

Owner

takluyver commented Jan 28, 2013

I've now bundled patched copies of tokenize for Python 2 and 3 into utils, because there are fixes we need in both tokenizing and untokenizing.

The Travis failures are real, but they're a test added in master for a particular error message, and with these changes the relevant code doesn't raise an error. I'll have to tweak that separately.

One remaining corner case I'm aware of is that, unlike in regular Python, a backslash at the end of a comment now extends the comment to the next physical line. i.e.:

In [8]: #foo\
   ...: a=1

In [9]: a
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-9-60b725f10c9c> in <module>()
----> 1 a

NameError: name 'a' is not defined

I'm still thinking through the best way to fix this.

Owner

takluyver commented Jan 28, 2013

I've fixed the comment thing, although that means that magic commands can't be continued by a \ if they contain a #, i.e. !echo foo#bar \ will execute directly, rather than letting you continue the magic command on a new physical line. But before this PR, you can't continue magic commands at all, so this restriction doesn't seem like a big loss.

Owner

minrk commented Jan 28, 2013

Why are magics and comments treated the same wrt line continuation?

Owner

takluyver commented Jan 28, 2013

It's not the magic, it's the # (i.e. the comment marker). Line continuations don't work within comments in Python:

>>> # comment\
... print(2)
2

So the test for whether to continue the line is if line.endswith('\\') and (not has_comment(line)):. But this is running before magic commands are transformed, to solve other problems, so has_comment() picks up on a # character in the magic command (unless it's in a "string#literal"), and cancels the line continuation. i.e.

In [1]: !echo foobar \
   ...: baz
foobar baz

In [2]: !echo foo#bar \
foo#bar \

(!system commands behave in essentially the same way as magics, and they're easy to demonstrate)
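The has_comment() check referred to here (and in the inline review snippet earlier) can be sketched with the stdlib tokenizer; this is illustrative, not the actual IPython code:

```python
import io
import tokenize

def has_comment(src):
    """True if src contains a real comment token, i.e. a '#' that is
    not inside a string literal."""
    try:
        for tok in tokenize.generate_tokens(io.StringIO(src).readline):
            if tok.type == tokenize.COMMENT:
                return True
    except tokenize.TokenError:
        pass  # incomplete input: no comment token found before the error
    return False

assert has_comment('a = 1  # note')          # '#' starts a comment
assert not has_comment('x = "string#literal"')  # '#' inside a string
```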

Owner

jasongrout commented Jan 28, 2013

I like your thinking regarding line terminology, Thomas. So are you saying that logical lines always start at the start of a python line? Does that handle your "skipped when in multiline statement" case? So a line in a block of code will always be a physical line, and will always be part of a python line, but may or may not be part of a logical line?

Owner

minrk commented Jan 28, 2013

oh, I get it. I thought you were saying the first case wouldn't work. That seems totally reasonable, then.

Owner

takluyver commented Jan 28, 2013

Jason, yes, that's right. So when a line is input, it goes (summarising IPythonInputSplitter.push_line):

-> physical line transformers (strip prompts & leading whitespace)
if not within_python_line:
  -> assemble_logical_lines
  -> logical line transformers (transform magics etc.)
-> assemble_python_lines
-> python_line_transformers (any 'almost Python' transformations, like your [1..2] syntax)

Both of the assemble_ transformers act by saving up lines until they judge it complete, at which point they send the lot downstream.

So yes, physical and python line transformers see everything (eventually, in the latter case), while logical line transformers don't see subsequent lines in a multi-line Python statement.

Owner

jasongrout commented Jan 29, 2013

It's still vaguely bothersome to me that logical line transformers miss some of the input. What if you pass logical line transformers a sequence of logical lines that together form a single python line? The transformers are then responsible to pass back a sequence of logical lines.

Is there still a place for tokenizer transformers? After the python_line_transformers? And then ast transformers after that?

Owner

takluyver commented Jan 29, 2013

Well, missing some of the input is the same idea as the transformers with look_in_string = False in the previous iteration of the design. I think it's a reasonable feature.

A series of logical lines can naturally make a Python line. For example, if I enter:

a = 1 + \
2 + \
3

assemble_logical_lines will turn that into a = 1 + 2 + 3, it will pass through the logical line transformers without any change, and continue to the Python line transformers.

If I enter:

a = \
!ls

Note that after the first line, it could be valid Python. It will be assembled into the logical line a = !ls, translated into an assignment from a system command a = get_ipython().getoutput('ls') by one of the logical line transformers, and then carry on to the Python line transformers.

Likewise, in the case of a cell magic, it will accumulate lines, and then spit out a single Python expression with its first line and cell contained in string literals. That expression will go through Python line transformers.

Token transformers can go in python_line_transforms. For now, the TokenInputTransformer wrapper tokenizes and untokenizes the code each time it's used, but we can optimise that later.

Owner

Carreau commented Feb 11, 2013

Link to #2539 (enable cell magic in ipy script) which I assigned to @takluyver

Owner

Carreau commented Mar 27, 2013

FYI, this needs a rebase.

Owner

minrk commented Mar 29, 2013

We wanted this one to live in master for a while, so that kinks could be worked out, but the window between now and 1.0 is shrinking. What is left to do here?

Owner

takluyver commented Mar 31, 2013

I've rebased it - the conflict was just in the what's new document.

I'd still like to get this in sooner rather than later. I'd really like @fperez to have a look over it, though.

Owner

takluyver commented Apr 6, 2013

On reflection, I'd like to push gently to get this in sooner rather than later. I've had some productive discussions about how to implement it, and I think I'm happy with the result. I don't want us to ship 1.0 with an input transformation framework that we're already planning to scrap.

Owner

minrk commented Apr 11, 2013

So @takluyver, you would say it's ready for merge as-is, nothing left to do? If so, I'm okay with merging now.

Owner

takluyver commented Apr 11, 2013

Yes. I anticipate that once more people start using it, we may discover some other problems that we need to fix, but that's always the case with big changes, and I'm happy with the way it's working.

Owner

minrk commented Apr 11, 2013

Great - then I say go ahead and merge. I would do it, but I bet you have a better sense of what issues this closes, since I think there are quite a few.

Owner

takluyver commented Apr 11, 2013

I probably don't remember them all myself, but I'll close what I can find.

Here we go!

takluyver added a commit that referenced this pull request Apr 11, 2013

@takluyver takluyver merged commit 331b90a into ipython:master Apr 11, 2013

1 check passed: the Travis build passed

mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
