Ast transfomers #2301

Merged
merged 10 commits into from Nov 30, 2012

Conversation

Projects
None yet
6 participants
Owner

takluyver commented Aug 14, 2012

This is the next step towards IPEP 2 (#2293). It provides a mechanism that third parties can register an instance of a subclass of ast.NodeTransformer - or anything that quacks the same way, although the docstring doesn't currently mention that. It will be applied to the REPL input before executing it.

For now, transformers are expected to catch their own errors. If an error leaks out of a transformer, it is disabled, and a warning printed. We could relax this restriction in future.

In normal use, with no transformers defined, it should have negligible impact on performance, as we were already using the AST, so there's just one extra no-op function call. Edit: there's now also a call to ast.fix_missing_locations() - we could skip this with an if when there are no transformations active.

Updated: I've added tests for %timeit, %time and macros with a transformation. Macros are transformed when they are run, not when they are defined, in keeping with the idea that they're like copying the code in place. Adapting %timeit required dropping the use of timeit.template - instead, it constructs an AST equivalent of the template and adds the transformed code into that.

This pull request passes (merged a910db8 into 06a7a57).

Owner

takluyver commented Aug 14, 2012

Apologies, this has acquired a few unrelated changes - I made a small change to IPython.utils.warn, then realised I should update half a dozen places where it is called.

This pull request passes (merged ea09ed9 into 06a7a57).

Owner

takluyver commented Sep 26, 2012

I've added a test for the Integer() wrapper that both SAGE and SymPy want to use this for. This is what the transformer looks like:

class IntegerWrapper(ast.NodeTransformer):
    """Wraps all integers in a call to Integer()"""
    def visit_Num(self, node):
        if isinstance(node.n, int):
            return ast.Call(func=ast.Name(id='Integer', ctx=ast.Load()),
                            args=[node], keywords=[])

When a transformer threw an error, wouldn't be be better to ignore it in the current transformation only and not for all subsequent transformations?

Owner

takluyver replied Sep 29, 2012

It's similar to what we do with hooks elsewhere - something that leaks errors is considered misbehaving, and dropped. It's deliberately conservative for now, but it could be relaxed later if there's a good reason to.

I could just imagine, that a transformer fails in one transformation, but works again later on, so it'd be better to leave it in place.
But misbehaving once and nuking it out, makes sense too.

Contributor

asmeurer commented Sep 29, 2012

I'm playing around with this, to see how it would look like for isympy. So far, I haven't gotten it to work. I just get

WARNING: AST transformer <sympy.interactive.session.SymbolWrapper object at 0x112449cd0> threw an error. It will be unregistered.

This isn't very helpful. Is there a way to see the full traceback?

Contributor

asmeurer commented Sep 29, 2012

By the way, do you plan to update the documentation here? A specific example of, for example, how to use the Integer wrapper transformer (or another transformer if you think it would be more instructive) would be quite helpful.

Contributor

asmeurer commented Sep 29, 2012

I think your Integer example is wrong. From the ast docs: "If the return value of the visitor method is None, the node will be removed from its location, otherwise it is replaced with the return value. The return value may be the original node in which case no replacement takes place." So I think you need an additional line at the bottom, "return node". Otherwise, something like 1.2 will fail.

Contributor

asmeurer commented Sep 29, 2012

How efficient will this be for multiple transformers? Is it worth the time, either for me as a user or possibly for IPython, to attempt to combine disjoint transformers, so that the nodes are only walked once?

Contributor

asmeurer commented Sep 29, 2012

On the topic of ast transformers, does anyone know of any good documentation/tutorials for this? The official Python docs are very sparse, and don't explain what many things are (for example, what does Param mean?). I'm trying to figure out how to write a transformer to wrap undefined names with Symbol(), but visit_Name is not enough (I think) because it needs to not transform something like the a in a = 1.

Contributor

asmeurer commented Sep 29, 2012

If you use your IntegerWrapper, without the return node, you get something like

In [1]: 1.2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/sw/lib/python2.7/codeop.pyc in __call__(self, source, filename, symbol)
    131 
    132     def __call__(self, source, filename, symbol):
--> 133         codeob = compile(source, filename, symbol, self.flags, 1)
    134         for feature in _features:
    135             if codeob.co_flags & feature.compiler_flag:

TypeError: required field "value" missing from Expr

I wonder if IPython should somehow catch this as well, and unload the transformer in that case too. It happens at a very specific location in the code (namely, when it tries to compile the empty ast), so I think it should be possible.

Owner

takluyver commented Sep 29, 2012

Thanks, good catch with the Integer wrapper, I've fixed it.

  • Documentation: yes, I'll put this example in as part of the IPEP2 work. It might be in a separate PR.
  • Efficiency: IPython certainly shouldn't combine transformers, because one transformation might interfere with anoter. If you're sure it's safe, you can combine your own, but it sounds like premature optimisation. I doubt the performance gain will be noticeable.
  • I don't know of any good documentation, but it looks like the .ctx attribute of a Name shows whether it's being used to store or load a variable:
In [5]: ast.dump(ast.parse("a = b"))
Out[5]: "Module(body=[Assign(targets=[Name(id='a', ctx=Store())], value=Name(id='b', ctx=Load()))])"
Owner

takluyver commented Sep 29, 2012

Hmm, that's a good point with the error from producing an invalid AST. It's not trivial to fix, though - we don't know which transformer introduced the error if there's more than one loaded. Maybe we can check the AST after each transformation has been applied.

Contributor

asmeurer commented Sep 29, 2012

Efficiency: IPython certainly shouldn't combine transformers, because one transformation might interfere with anoter. If you're sure it's safe, you can combine your own, but it sounds like premature optimisation. I doubt the performance gain will be noticeable.

Ah, you're right. I was thinking it would be safe to combine transformers with disjoint visit_* methods, but clearly that is not the case, because each method could replace the node with an arbitrary other node. In fact, the tricky problem might be to make sure your transformers are injected in the right place, if you want to combine two different apps that use them. But that is not IPython's problem.

For SymPy, if we want to use both the integer and undefined name wrappers, it will be trivial to make a subclass of both transformers, and use that as the transformer instead.

And you're right that it probably wouldn't be noticeable, unless you have hundreds of transformers for some reason (or an insanely large input).

Contributor

asmeurer commented Sep 29, 2012

I think I figured out the basic idea for the Symbol transformer. It still has some issues, but the basic stuff works.

Owner

takluyver commented Sep 29, 2012

Even with a very large input, my guess would be that the string processing
steps would be the bottleneck, not the AST transformations. If there are
hundreds of transformations, it might get slow, but that's probably a cue
that you need to write your own language. ;-)

Yes, if you combine transformers from different sources, all kinds of
oddities could occur. I envisage this as mainly something for use by
downstream projects like Sympy or Sage, who can arrange a set of
transformers that play nicely together. Users are welcome to define
transformers themselves, but we'll make the docs clear that there's rope to
hang yourself there.

Contributor

asmeurer commented Sep 30, 2012

Yeah, I completely agree with you.

Contributor

asmeurer commented Nov 23, 2012

In case anyone is curious, my work so far at rewriting isympy -I to use this is at https://github.com/asmeurer/sympy/tree/ast. The Symbol transformer is broken, because it's not actually as simple as wrapping all undefined names in Symbol, because that does things like Symbol('a') = 1, f(Symbol('a')=1), and x.Symbol('a') (instead of a = 1, f(a=1), and x.a). Do you think wrapping undefined names would be a common use-case? If so, maybe it would be nice to add a helper to IPython itself to do this, since it's clearly not easy to get right.

Owner

takluyver commented Nov 23, 2012

In cases like a=1, the Name node should have ctx=Store() - and I see your transformer already checks that ctx.__class__ == Load, so that shouldn't be an issue. The other cases (f(a=1) and x.a) don't generate a Name node for a:

In [2]: %%dump_ast
   ...: b.a
   ...: 
Module(body=[
    Expr(value=Attribute(value=Name(id='b', ctx=Load()), attr='a', ctx=Load())),
  ])

In [3]: %%dump_ast
   ...: f(a=1)
   ...: 
Module(body=[
    Expr(value=Call(func=Name(id='f', ctx=Load()), args=[], keywords=[
        keyword(arg='a', value=Num(n=1)),
      ], starargs=None, kwargs=None)),
  ])

So I'm not sure why it would break in any of those cases.

If it's useful to you, I've started a guide to working with ASTs: http://greentreesnakes.readthedocs.org/en/latest/

Contributor

asmeurer commented Nov 23, 2012

So I'm not sure why it would break in any of those cases.

Maybe I am thinking more of issues with tokenize than with ast. Even so, there are bugs still. I guess they are more related to not overriding already defined names.

If it's useful to you, I've started a guide to working with ASTs: http://greentreesnakes.readthedocs.org/en/latest/

Great. I'll keep that in mind the next time I do some ast hacking.

Owner

takluyver commented Nov 24, 2012

I can imagine there would be problems if you define and use a variable in the same cell, because it won't be in the namespace when you get the AST. I think that's hard to deal with precisely, because of scoping rules and so on, but it should be possible to do a solution that works in enough cases, by scanning for variable definitions.

Contributor

asmeurer commented Nov 25, 2012

I didn't even think of that. I personally only use the terminal, with one-line commands 99% of the time. I'll keep that in mind.

Maybe I could just create a magic function that returns the object if it is already defined and a Symbol otherwise and wrap everything as obj_or_symbol('name').

Owner

takluyver commented Nov 25, 2012

Ye-es, although getting the scopes right could still be tricky. And doing
it to every name could hit performance.

I think it's reasonable to say that inside functions/classes, the user
should declare their own symbols. So auto-symbols are only needed in the
main namespace, which should make the problem more tractable. Then we just
need to scan for assignments (names with a Store() context), imports, and
function/class definitions.

Contributor

asmeurer commented Nov 25, 2012

Yes, it already doesn't work inside functions, and I didn't plan to make that work. It's designed for easy interactive use, which usually doesn't involve defining many functions/classes. The main issue is multiline cells, like for loops. Or with the notebook, literally anything can be grouped together in a cell.

I'd also add a search for var(), which is SymPy's hacky function that injects symbols into the namespace. It's hackish, but still less so than the current solution (catching NameError).

Owner

takluyver commented Nov 25, 2012

As currently written, I think it will act inside function definitions, although I haven't tested. A NodeTransformer does walk into function and class definitions by default. If you want to prevent that, you can override visit_FunctionDef and visit_ClassDef:

def visit_FunctionDef(self, node):
    return node

(The key is that it's not calling self.generic_visit, which implements visiting the child nodes)

Owner

takluyver commented Nov 27, 2012

I had an idea about your use case. Instead of turning undefined variables into Symbol('a'), you could scan for variables that aren't yet defined, then insert an Assign node at the top of the cell to create them. So 2**y would become:

y = Symbol('y')
2**y

Then if it's defined by any means in that cell, the new definition will replace the automatically created symbol. The only case I can see it would break is if the code is looking for a NameError, as in:

try:
    unicode
except NameError:
    unicode = str

I might be inclined to only create automatic symbols for single-character variables, to reduce potential collisions like that. But that's up to you.

Contributor

asmeurer commented Nov 27, 2012

We still have to check if the name has been imported or not.

Owner

takluyver commented Nov 28, 2012

Within the cell? I think imports should override names defined above. We
still have to check if the name is defined before the cell, but that's the
easy bit.

Owner

jasongrout commented Nov 29, 2012

I'm curious what the status of this pull request is. Do you see a lot more work that needs to be done on it? I'd like to start working on making Sage transforms use it if it's relatively stable.

Owner

takluyver commented Nov 29, 2012

I'm quite happy with the shape of this. Ideally, I'd like @fperez to have a brief look at it before I merge, but I fully intend to get it into 0.14, and I don't think anyone has objected yet.

Contributor

bfroehle commented Nov 29, 2012

@takluyver Nothing here seems objectionable to me, but I'm not familiar enough (yet) with these mechanisms to form a solid opinion.

Contributor

asmeurer commented Nov 29, 2012

From what I've seen and used of this so far, everything seems fine from me.

Owner

takluyver commented Nov 30, 2012

Thanks guys. I'm going to merge this now. It's possible that the API will still change before release.

@fperez - if you want to discuss any aspect of this post-merge, let me know. The same goes for anyone else, of course.

@takluyver takluyver added a commit that referenced this pull request Nov 30, 2012

@takluyver takluyver Merge pull request #2301 from takluyver/ast-transfomers
Ast transfomers
a7cc583

@takluyver takluyver merged commit a7cc583 into ipython:master Nov 30, 2012

1 check passed

default The Travis build passed
Details
Owner

jasongrout commented Nov 30, 2012

One thing that is common for us to do is to not only transform integers to Integer() calls, but we also pull these calls out of subexpressions as an optimization step. For example:

sage: print sage.misc.preparser.preparse_file("""for i in range(20):
    print i+1
""")
....: 
_sage_const_20 = Integer(20); _sage_const_1 = Integer(1)
for i in range(_sage_const_20 ):
    print i+_sage_const_1 

I'm curious if you can see an easy way to do this with your AST transformers. It seems that since we are really just visiting a node at a time, it's hard to do massive reorganizations of the AST, like adding some new nodes at the top of the code.

Owner

takluyver commented Nov 30, 2012

It's certainly possible. The top-level node is an ast.Module, so you should be able to override visit_Module something like this:

def visit_Module(self, node):
    self.integer_consts = []
    self.generic_visit(node)   # This walks through the descendants, calling methods such as visit_Num, which you also override
    for const in self.integer_consts:
        const_assign = assign_new_const(const)
        node.body.insert(0, const_assign)
    return node
Owner

jasongrout commented Dec 1, 2012

Ah, great! Thanks!

@mattvonrocketstein mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014

@takluyver takluyver Merge pull request #2301 from takluyver/ast-transfomers
Ast transfomers
1cb281c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment