IPEP 2: Input transformations #2293
Comments
Can we use github issues for IPEPs like Fernando did for IPEP 1? I think On Sun, Aug 12, 2012 at 10:25 AM, Thomas Kluyver
Brian E. Granger |
This is a Github issue ;-). Fernando also used a Gist for the actual document, but copied it into the issue. My preference is to have the document in a single location; with two locations you can't be sure that what you're reading is up to date. |
I should have clarified what I mean. There are currently two pages I have to visit:
And there are three places people can post comments:
That is confusing. I agree that Fernando's approach of putting the doc in a gist and pasting it into the issue isn't quite right, but I think your approach has too many things going on. Why not just put the doc only on the issue and encourage people to leave comments only there (rather than also on the list)? If we want a separate place other than the issue to host the document, I recommend putting it on the github wiki. The benefit of this is that any of us can easily edit the IPEP and we will have a centralized record of all the IPEPs. I think that is what I would prefer, along with all comments going to the issue. |
IPEP 3 : How to write an IPEP. |
OK, I've copied it in above. The downside of this approach is that we don't have a history of the IPEP, but I guess that's not so important (I've never looked up the versioning of a PEP). Now perhaps we can discuss the content. ;-) |
Will this be able to handle things like the physics extension that attempt to extend the Python syntax? |
Why do you want to give the ast of the code instead of the string of the code? Maybe related to some detail of how IPython works that I'm not familiar with? |
Too late for meaningful comment, I just wanted to say thank you for getting this going!! That machinery did improve a lot after the big refactor that led to inputsplitter in 0.11, but I knew at the time we'd still need one more pass to completely consolidate and rationalize things. Now is the time to do it, and hopefully we'll end up with just one, comprehensible module to handle all input transformations, as well as a clear policy of what it is that we do, where users can plug in their own customizations and what they can and can't expect to be able to do there. |
Thanks, Fernando. Aaron: I would leave the prefilter machinery working as is, but with the same limitation of only acting on a single line, so the physics extension would continue to work as well as it does now. If Georg wants to update it after this refactoring, I would point him towards the new InputSplitter machinery, which should be capable of handling it. I want to have the option of using an AST because, if your intention is to transform all integer literals in a block of code, the only way to do that reliably is to parse the code (consider an input like
In case it wasn't clear, the AST transformation I'm talking about is in addition to the code-string transformations done by inputsplitter & prefilter. But I would encourage people to use it where possible, and I think it is possible in your case. |
Is #2164 (Request: Line breaks in line magic command) related to this IPEP? |
I'm increasingly enthusiastic about the AST-transforming idea. With access to the parse tree and the interactive namespace, all sorts of magic becomes possible, like intercepting references to undefined variables. Of course, there's a lot of stuff that you really shouldn't do, but some of the best bits of IPython are those that skirt the edge of 'never do this'. |
@tkf: It's certainly related. I'm having ideas about how to redesign InputSplitter, so I'll try to work in a way to handle that case. Thanks for pointing it out. |
Great! |
OK, that makes sense. Currently we use tokenize to do the transformation, but I guess ast would work just as well. This would also solve the problem of what to do with SyntaxErrors for the user, as with your model, code with incorrect syntax would never even reach the AST transformer stage. Though I disagree with leaving the line-based machinery. As I argued elsewhere, 99% of the time, you really want to act on the whole cell, not a line (#2164 is a great example of this). In the cases when you really do want lines, you can always do Also, since it's clear that you need a string-based whole-cell transformer, wouldn't a line transformer be redundant anyway? |
By the way, I don't remember if I mentioned this elsewhere (maybe that's why you mentioned it), but we would indeed use the AST to catch undefined names for SymPy. Our current method adds an exception handler for NameError, but this leads to subtle bugs, because the code is run until the NameError is caught, and then run again, so things like
a = 0
for i in range(10):
    a += 1
print b
would result in My only complaint with AST is that it's a little easier to do the things I want with tokenize (although I've never actually used |
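A rough sketch of how undefined names could be found from the AST before anything is executed, which avoids the run-twice problem described above. This is an illustration only: `undefined_names` is a hypothetical helper, it uses Python 3 syntax, and it ignores scoping rules and ordering (a name assigned anywhere in the cell counts as defined).

```python
import ast
import builtins


def undefined_names(source, namespace):
    """Return names that are read in `source` but defined nowhere we can see.

    Deliberately rough: a name assigned anywhere in the cell is treated as
    defined, regardless of scope or ordering.
    """
    tree = ast.parse(source)
    assigned, loaded = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:  # Store or Del context
                assigned.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            assigned.add(node.name)
        elif isinstance(node, ast.arg):
            assigned.add(node.arg)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                assigned.add((alias.asname or alias.name).split(".")[0])
    known = assigned | set(namespace) | set(dir(builtins))
    return sorted(loaded - known)


cell = "a = 0\nfor i in range(10):\n    a += 1\nprint(b)\n"
missing = undefined_names(cell, namespace={})
```

Because the check happens on the parse tree, nothing in the cell has run yet when `b` is reported as missing, so `a` is not incremented twice.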
By the way, for additional reference, the SymPy hack code that monkey-patches Also, issue #1491 is related. |
…orms Remove code from prefilter that duplicates functionality in inputsplitter This is the first step towards implementing IPEP 2 (#2293). Removed all the static transformations from prefilter, because we're relying on the equivalent functionality in inputsplitter. Note that this is a backwards-incompatible change for anyone who might have relied on the low-level details of the prefiltering machinery. Regular users of the IPython applications themselves will not see any changes in behavior.
I think it might actually be easier to do the things you want with AST - i.e. simpler code - but it's less obvious. E.g. to transform integers, I think this should be sufficient:
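A minimal sketch of such a transformer, offered as an illustration rather than as the code originally posted. It assumes Python 3.8+ (where integer literals are `ast.Constant` nodes) and wraps each literal in a call to a hypothetical `Integer` name that must exist in the namespace where the code runs.

```python
import ast


class IntegerWrapper(ast.NodeTransformer):
    """Wrap every integer literal in a call to Integer() (hypothetical name)."""

    def visit_Constant(self, node):
        # bool is a subclass of int, so True/False are excluded explicitly.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            call = ast.Call(
                func=ast.Name(id="Integer", ctx=ast.Load()),
                args=[node],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node


def wrap_integers(source):
    """Parse `source` and return a module AST with integer literals wrapped."""
    tree = IntegerWrapper().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return tree
```

The returned tree can be passed to `compile(tree, "<input>", "exec")`. Later IPython releases expose an `ast_transformers` list on the shell object, so an instance of a class like this can be appended there to act on all input.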
You would register that class with IPython to have it act on all input. Transforming a whole cell at a time might seem obvious from the outside, but:
To reiterate, where the transformation expects syntactically valid Python as input, I would encourage people to use the AST hook. InputSplitter is only required when you're defining some syntax that's not valid Python itself. |
We must be misunderstanding each other, because this is exactly my argument for using cell-based transformations. You can have
or even
You have to parse the whole cell to know what is inside a string and what is not.
OK, I see your argument. So I guess the question is, should IPython mask from the user (of the API) whether or not the front-end is line-based or not? If yes, then the line-by-line parsing into a single cell should happen at a different level than the rest of this. If no, then there are some tricky issues, especially since the line-by-line-ness of the console front-end is shaky, as it depends on the state of readline, and has things like semi-working multiline editing enabled. I don't know the answer here (I also know much less about IPython internals than you, so I hope my input is meaningful). Perhaps there should be a way for transformers to "register" with InputSplitter to let it extend the idea of whether input should continue after a line or not.
I'm not arguing there, but there is still the issue of extending the Python syntax. Most of this will happen with IPython itself, but others may want to do it too. And the public API should be good enough that IPython internals just use it. |
@asmeurer hope you don't mind, I edited your comment; you were missing a backtick, which was making the whole thing hard to read. |
Consider if you enter:
You can't just hand it to Python's parser, because it will choke on the syntax. We need to implement some degree of parsing ourselves. Going line-by-line, that's a relatively simple regex, and inputsplitter takes care of not applying it to It does give us a limitation: if you're extending Python syntax, your new syntax needs to be line based (like our magic commands: you can do Yes, my thoughts for InputSplitter include a way for transformers to indicate when a block is complete. This would be useful for cell magics, for instance. But that still puts transformers in the same position of being fed code line-by-line; though there may be a way for them to accumulate it and return it in one go, to allow things like line breaks in a magic command (#2164). |
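To illustrate the "relatively simple regex" applied line by line, here is a minimal sketch for the `!system` escape. The function name and the exact form of the generated call are assumptions for illustration; the real inputsplitter also tracks state such as open multi-line strings, which this sketch ignores.

```python
import re

# Optional leading whitespace, a literal "!", then the shell command.
_SYSTEM_RE = re.compile(r"^(\s*)!(.*)$")


def transform_system_line(line):
    """Rewrite a single `!cmd` line into a plain Python call.

    Lines that do not start with `!` are returned unchanged.
    """
    match = _SYSTEM_RE.match(line)
    if match is None:
        return line
    indent, command = match.groups()
    return "{}get_ipython().system({!r})".format(indent, command)
```

The output is valid Python, so it can be handed to the normal parser afterwards; that is the property the line-based step has to guarantee.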
@Carreau that's fine. |
Hmm. Well, I understand why you don't want to implement your own parser to truly extend the syntax, though I still don't see why you can't make it possible for others to do that if they really wanted to, and just split the code line-by-line yourself. But I guess there is more logic for line-by-line than By the way, a third potential transformation we might do in SymPy is the automatic replacement of Actually, changing precedence of operators might be an interesting thing for us to try too. I know, I know, at some point, we might as well just invent our own language and stop using Python (and actually, one of SymPy's design goals is to embrace Python's designs and not try to change them), but some modules in SymPy use certain operators, in particular the logical operators |
For extensions to Python syntax, I think a line-based syntax is useful and Third parties could split by line and reimplement what we do in
The ^ -> ** case is interesting. Part of me thinks that if you're going to |
The tricky one is changing the precedence order of operations. To do that, you'd have to add parentheses, which is not doable without the whole string, at least in the case of line continuations. |
On 16 August 2012 01:21, Aaron Meurer notifications@github.com wrote:
I would probably use tokenize to do ^ -> ** properly.
|
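A small sketch of the tokenize approach for `^` -> `**`, assuming Python 3 and a hypothetical helper name `caret_to_power`. Working on tokens rather than raw text means a `^` inside a string literal or comment is left alone.

```python
import io
import tokenize


def caret_to_power(source):
    """Replace the `^` operator with `**`, leaving strings and comments intact."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP and tok.string == "^":
            # Only the token text changes; positions are kept so that
            # untokenize can reproduce the original spacing.
            tok = tok._replace(string="**")
        tokens.append(tok)
    return tokenize.untokenize(tokens)
```

Note that this only swaps the operator. `**` binds more tightly than `^` and is right-associative, so an expression such as `a ^ b + c` changes meaning unless parentheses are added as well, which is the precedence problem raised above.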
The way I would do it is to just add parentheses (for example, I would change |
Currently you can't use "variable injection"
Is it in the scope of this IPEP? |
I don't think that's within the scope of this work. If we want that, it should be implemented in the relevant cell magics - because it doesn't make sense for all of them, e.g. |
My understanding is that |
It is currently applied for any line magic, but it's a separate step after transformation. Transformation produces valid Python code, and the functions called to run magic or system commands expand variables. E.g. |
I see. Thanks for the explanation. |
The reworking of input transformation is essentially complete, so I'm going to close this. |
The state of our input transformation machinery has come up a couple of times recently, and I'd promised to look into it.
Requirements
A line-by-line input filter is needed for two main reasons:
We also need to do some transformations which are only possible with access to the interactive namespace, i.e. they must be done in the kernel. Examples include the autocall system (which lets you type `exit` to exit), macros, automagics (using magics without the % prefix) and aliases for shell commands (like `ls`). We refer to these as 'dynamic transformations'.
Finally, we need an extensible system that third parties can hook into without having to monkeypatch lots of our code.
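As a rough illustration of why something like autocall is a dynamic transformation, the sketch below only rewrites `name args` into `name(args)` when `name` is callable in the namespace it is given. The helper name and the guard conditions are assumptions for illustration and are far simpler than the checks IPython actually performs; the bare-name case (typing just `exit`) is not handled.

```python
import keyword
import re

# An identifier, some whitespace, then the rest of the line.
_AUTOCALL_RE = re.compile(r"^([A-Za-z_]\w*)\s+(\S.*)$")


def autocall(line, namespace):
    """Rewrite `f x` as `f(x)` if `f` is callable in `namespace`."""
    match = _AUTOCALL_RE.match(line)
    if match is None:
        return line
    name, rest = match.groups()
    first_word = rest.split()[0]
    # Leave ordinary statements such as `x = 1` or `f in y` untouched.
    if rest[0] in "=+-*/%<>!&|^,.)]}" or keyword.iskeyword(first_word):
        return line
    if keyword.iskeyword(name) or not callable(namespace.get(name)):
        return line
    return "{}({})".format(name, rest)
```

The same input line gives different results depending on what the user has defined, which is why this kind of rewrite cannot be done by a frontend that has no access to the kernel's namespace.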
Current situation
InputSplitter does line-by-line transformation (the name's a little confusing, as its primary role is no longer splitting input). It also handles cell magics, but the implementation feels somewhat awkward to me. For line-based frontends, inputsplitter is run twice: once by the frontend, and again by `run_cell()`, which is called with the raw, untransformed code.
Prefilter does dynamic transformations using a mixture of Transformer subclasses and Checker/Handler subclass pairs. We've struck the compromise that dynamic transforms only happen on single line cells, because the frontend can't make them valid syntax on its own. This is the primary extension point for third parties, but it's somewhat awkward to use (subclassing from Transformer isn't simple), and doesn't work as extension authors might expect (only transforms single lines).
Several bits of functionality are duplicated in inputsplitter and prefilter: the transformations for `%magic`, `!system`, assigning versions of both (`foo = %magic`), `help?` (and `?help`, `morehelp??`), escapes for various kinds of call (`/callme arg`, `,quoteseparate a b c`, `;quotetogether a b c`), and stripping Python/IPython input prompts. As far as I know, we only use the inputsplitter versions of these functions, since Fernando fixed `%paste` to use inputsplitter.
Suggestions
`ast` module already has a NodeTransformer class to support this kind of thing. This approach is limited to code that is already valid Python syntax before the transformation, but it should be powerful and reliable in those situations.