Annotated AST #166

erezsh · 2021-05-04T14:25:29Z

Is there a way to access the parsed AST, with the inferred types? I want to write a code translation tool, and knowing the types would be very useful.
I mean, having a regular AST (similar to ast.AST) but with added attributes, such as: declared_type, and inferred_types.

I tried to look through the code, but couldn't figure it out.

The text was updated successfully, but these errors were encountered:

JelleZijlstra · 2021-05-04T14:30:48Z

There is, yes: the annotate=True argument to the NameCheckVisitor constructor. We've used it internally but it's not documented yet. It puts instances of pyanalyze.value.Value in the inferred_value attribute of AST nodes. I can write up some more documentation later this week.

erezsh · 2021-05-04T15:43:05Z

Can you help me understand what I'm doing wrong here?

import ast
from pyanalyze import name_check_visitor

code = """
a = 1
"""

tree = ast.parse(code)

ncv = name_check_visitor.NameCheckVisitor("bla", code, tree, annotate=True)
ncv.visit(tree)

assign = tree.body[0]
assert assign.value.inferred_value.get_type() is int	# OK

a = assign.targets[0].inferred_value
print(repr(a))			# UnresolvedValue()  :(

It would expect a to have type int.

JelleZijlstra · 2021-05-04T15:48:07Z

I guess we don't infer a value for names in write context because it doesn't matter for pyanalyze's own type inference. I submitted a fix in #167.

(Also note that the version of pyanalyze on PyPI is pretty out of date, if you're using it. I'll hopefully make a new PyPI release soon.)

erezsh · 2021-05-04T15:54:59Z

Nice, it's working, thanks! I'll try it on more complicated stuff.

JelleZijlstra · 2021-05-04T15:56:20Z

Great to hear! Also for more complicated cases pyanalyze will likely want a real imported module. You may need a trick like in

pyanalyze/pyanalyze/test_name_check_visitor.py

Line 86 in febd99c

mod = _make_module(code_str)

.

I'm interested in making this a more directly usable API if it's useful for you, something like a documented pyanalyze.annotate_code function.

erezsh · 2021-05-04T16:04:44Z

Yes, that would be very useful for me.

And I imagine that for others as well.

JelleZijlstra · 2021-05-04T16:07:03Z

Great! Let me know what your experience is with this functionality.

erezsh · 2021-05-04T16:13:50Z

Well, it's definitely a rocky road :)

import ast
from pyanalyze import name_check_visitor

code = """
class A:
	a = 1
"""

tree = ast.parse(code)

ncv = name_check_visitor.NameCheckVisitor("bla", code, tree, annotate=True)
ncv.visit(tree)

print( tree.body[0].body[0].targets[0].inferred_value )		# AttributeError :(

JelleZijlstra · 2021-05-04T16:23:07Z

OK, that's because class bodies are only visited during the second "checking" pass, but your code makes it only do the "collecting" pass. Calling ncv.check() should do both passes, but that returns early if the module can't be imported. Maybe removing the early return in NameCheckVisitor.check will fix the immediate issue for you, or the _make_module trick I mentioned above.

erezsh · 2021-05-04T16:31:33Z

Okay, it's working now. It crashes if any imports are wrong, which ideally it shouldn't, but for now it's not a problem.

JelleZijlstra · 2021-05-04T16:32:44Z

It crashes if any imports are wrong

Thanks, could you share more details on that? I'd like to make it more robust to this sort of thing.

erezsh · 2021-05-04T16:40:47Z

Well, if I analyze a file within a package, it might have all sorts of relative imports that aren't available. Ideally they should be, of course, and that probably warrants a warning, but ideally I'd like it to just accept the import as unknown, like imported_module = UnknownObject().

Anyway, new problem:

import ast
from pyanalyze import name_check_visitor
from pyanalyze.test_name_check_visitor import _make_module

code = """
class A:
	def __init__(self):
		self.a = 1
"""
mod = _make_module(code)

tree = ast.parse(code)
ncv = name_check_visitor.NameCheckVisitor("bla", code, tree, annotate=True, module=mod)
ncv.visit(tree)
ncv.check()

a = tree.body[0].body[0].body[0].targets[0].inferred_value
print(repr(a)) # UnresolvedValue()

JelleZijlstra · 2021-05-04T16:46:00Z

For modules, this should be fixable by giving the right module name and filename to the constructor, so that pyanalyze knows how to do the relative imports right. It still shouldn't crash though, but just log an error and turn the modules into Any/UNRESOLVED_VALUE. I'l look into doing that.

The attribute thing is the same issue as for Name, but needs to be fixed in the def composite_from_attribute function. The same thing is of course going to happen for subscripts. There it looks like we'd currently set the inferred_value to the return value of __setitem__, which doesn't really make sense. Writing a PR now to fix both.

erezsh · 2021-05-04T17:09:44Z

I wonder if perhaps this sort of API would require a lot of changes.

JelleZijlstra · 2021-05-04T17:14:13Z

I fixed the issue for attribute and subscript in #168. There aren't any more kinds of assignment targets, so hopefully at least this particular kind of issue won't require more changes. I think when I previously used this functionality I just looked at the type of expressions, not of variables.

erezsh · 2021-05-04T17:22:06Z

Nice.

Well, that may be beyond the scope of what pyanalyze can do, but I would expect that to work:

import ast
from pyanalyze import name_check_visitor
from pyanalyze.test_name_check_visitor import _make_module

code = """
class A:
	def __init__(self):
		self.a = 1

	def bla(self):
		return self.a

a = A()
b = a.bla()
"""
mod = _make_module(code)

tree = ast.parse(code)
ncv = name_check_visitor.NameCheckVisitor("bla", code, tree, annotate=True, module=mod)
ncv.visit(tree)
ncv.check()

print( tree.body[0].body[1].body[0].inferred_value) 	# UnresolvedValue(), expected int
print( tree.body[1].value.inferred_value ) 	# UnresolvedValue(), expected A
print( tree.body[2].value.inferred_value ) 	# UnresolvedValue(), expected int

JelleZijlstra · 2021-05-04T17:32:29Z

I'd expect that to work, it might require some different hack around the module object. It may work better if you're looking at values inside a function. At the module level pyanalyze tends to just look at module globals, so if those aren't available things might not work.

I'll spend some time tonight trying to expose a usable API. Your feedback is very helpful!

erezsh · 2021-05-04T17:43:36Z

Happy to help!

I'm surprised at how difficult it is to find such API, given how many different static analyzers exist for Python. But they each re-implement everything from scratch, and provide no access to it. Maybe this can become a popular building block.

JelleZijlstra · 2021-05-05T02:10:54Z

How does #169 work for you?

erezsh · 2021-05-05T02:57:58Z

The build failed. I work with py38, so I'll try when it's fixed.

JelleZijlstra · 2021-05-05T03:46:47Z

It's green now! Happily the tests uncovered a few more places where we weren't adding annotations to the AST.

erezsh · 2021-05-05T04:40:55Z

The example I gave here ( #166 (comment) ) is still giving me 3 UnresolvedValues on the inferassign branch. Am I doing somethign wrong? What results do you get when you run it?

P.S. I'm also getting the following error whenever I run it (on windows 8):

OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'c:\\python38\\DLLs\\<test input>-stubs'

Internal error: OSError(22, 'The filename, directory name, or volume label syntax is incorrect') (code: internal_error)

But it doesn't interrupt the run.

JelleZijlstra · 2021-05-05T05:03:33Z

Calling pyanalyze.ast_annotator.annotate_code() on the sample, I get this:

In [5]: print( tree.body[0].body[1].body[0].inferred_value) ^I# UnresolvedValue(), expected int
   ...: print( tree.body[1].value.inferred_value ) ^I# UnresolvedValue(), expected A
   ...: print( tree.body[2].value.inferred_value ) ^I# UnresolvedValue(), expected int
Any
<test input>.A
Any

The first Any is because you're printing the value of the Return node, but we only set the value on the expression, not on the Return node itself. print( tree.body[0].body[1].body[0].value.inferred_value) does print Literal[1]. I'm not entirely sure why the last one prints Any; pyanalyze should have inferred the return value. It correctly prints int if you add an annotation to bla, though.

I'm afraid I haven't tested on Windows before, I guess I need to be more careful what paths I use there. Maybe we should try using test-input instead of <test input> as the fake module name.

erezsh · 2021-05-05T05:19:09Z

Doing tree.body[0].body[1].body[0].value.inferred_value still returns UnrolvedValue.

I'm not entirely sure why the last one prints Any

I don't know. But I would guess that you're not adding an annotation on return, or that it's not connected in the same way that an explicit annotation is.

JelleZijlstra · 2021-05-05T05:32:50Z

The UnresolvedValues you're seeing are likely related to the internal errors around Windows paths, at least some of them. Could you try locally changing the module_name = "<test input>" line in pyanalyze.analysis_lib.make_module to something like module_name = "test-input" or some other legal Windows path? I can't easily test on Windows unfortunately.

I don't know. But I would guess that you're not adding an annotation on return, or that it's not connected in the same way that an explicit annotation is.

It should work through the _argspec_to_retval dict inside NameCheckVisitor: when the method gets visited it stores its return value there, and when the call is checked it should look in that dict to find the return value corresponding to the callable. Most likely that mechanism doesn't work if the callee is a method on a KnownValue.

JelleZijlstra · 2021-05-06T00:55:01Z

Just pushed a change to the PR that may fix Windows (it will skip looking for stubs for module names that aren't valid identifiers).

erezsh · 2021-05-06T02:35:30Z

Yeah, there's no error anymore. But the results are still unresolved.

JelleZijlstra · 2021-05-06T03:11:12Z

Not sure how to explain that. To be clear, you're running annotate_code with show_errors=True?

erezsh · 2021-05-06T03:20:17Z

It says there is no annotate_code, and no show_errors. But there is annotate.

I am literally running the exact code I pasted above.

JelleZijlstra · 2021-05-06T03:21:51Z

I see, I was referring to the new functions from the PR (in the ast_annotator.py file). I'll probably merge it to master soon.

Helps with #166 Plus some related changes: - Make the importer fall back to importing outside sys.path (which seems generally useful) - Make name_check_visitor.py work a bit better if self.module is None - Ensure all expressions have an annotation set

erezsh · 2021-05-21T21:52:38Z

Hi, are you making any progress to fix this issue?

I don't mean to pressure, but if not, it seems I'll have to find a different solution.

JelleZijlstra · 2021-05-21T21:57:41Z

From my perspective, it's basically fixed, since pyanalyze has public functions that do what you're asking for (released in 0.2.0 last week and as of yesterday documentd at https://pyanalyze.readthedocs.io/en/latest/reference/ast_annotator.html).

But it sounded like it's not working for you due to some issue that I couldn't reproduce. And it might be due to Windows-specific issues, which I don't really have a great way to debug. Could you tell me exactly what code you're running now that doesn't do what you expect? I'm also open to doing a call or some sort of pair programming to help figure out the problems if you're interested.

erezsh · 2021-05-21T22:08:11Z

Here's an updated version of the code:

from pyanalyze.ast_annotator import annotate_code

code = """
class A:
	def __init__(self):
		self.a = 1

	def bla(self):
		return self.a

a = A()
b = a.bla()
"""

tree = annotate_code(code)

print( repr(tree.body[0].body[1].body[0].value.inferred_value) 	)  # KnownValue(1)      -- Okay
print( repr(tree.body[2].targets[0].inferred_value))               # UnresolvedValue()  -- Not Okay!

I get the same problem when I run it in both Windows and Ubuntu (WSL).

Can you please run this and show me what you get as a result?

Sure, we can do a voice call if you think it will help. I like discord, but I'm open to other apps.

JelleZijlstra · 2021-05-21T22:18:26Z

Thanks! I get the same as you, first KnownValue(1) and then UnresolvedValue(). Good to know that at least this isn't OS-dependent. It's a bit of an edge case, as it relies on some ad-hoc inference logic instead of type annotations, but I'd expect it to work, so I'll look into why it doesn't work with annotate_code().

Interestingly, it does manage to infer the type as KnownValue(1) here if I put the code in a file, add dump_value(b) and run pyanalyze on the file. I'll have to look into how that's happening.

JelleZijlstra · 2021-05-21T22:22:50Z

In general I wouldn't be surprised if you run into more places where pyanalyze doesn't infer things as well as it could. That's basically #64. I'll fix such issues as I encounter them, but well, there's a lot of Python :)

erezsh · 2021-05-21T22:39:28Z

From my perspective, this is a very very simple use-case.

I don't mind the occasional bug or missing feature. But if this is considered an edge case, then maybe it's not ready for this kind of use yet.

JelleZijlstra · 2021-05-21T22:45:50Z

That's fair. I say it's an edge case because pyanalyze is now generally intended to be a PEP 484-compatible type checker, which means it largely works on the basis of annotations when working across functions. Mypy also thinks b is Any, for example. But it should be able to infer the type in this specific case, so I'll make it work.

JelleZijlstra · 2021-05-22T01:54:23Z

Fixed this case in #201.

erezsh · 2021-05-22T22:04:56Z

How can I access the call graph?

For example, if I have a _ast.Call object, is there an API that would give me the target function?

JelleZijlstra · 2021-05-22T22:07:10Z

The inferred value of the .func attribute of the Call should be the called function as a KnownValue.

erezsh · 2021-05-22T23:07:51Z

Thanks, seems to be working!

If you don't mind, I'd like to suggest adding an API for that, just to save the users the trouble of figuring it out each time for themselves. Something like

class CollectFunctions(ast.NodeVisitor):
	def __init__(self):
		self.functions = {}

	def visit_FunctionDef(self, func_def):
		self.functions[func_def.inferred_value.val] = func_def


class Analyze:
	def __init__(self, code):
		self.code = code
		self.tree = annotate_code(code)

		c = CollectFunctions()
		c.visit(self.tree)
		self._functions = c.functions

	def get_callee(self, call):
		assert isinstance(call, ast.Call)
		return self._functions[call.func.inferred_value.val]

Or maybe instead, to write the references to inferred_value.ref and inferred_value.xrefs

JelleZijlstra · 2021-05-22T23:09:45Z

There is something like that, the CallSiteCollector class in name_check_visitor.py. I think in our internal code we have a function get_call_map(); I can look into moving that to the public part of pyanalyze.

erezsh · 2021-05-23T15:58:37Z

I think that would be a good idea. Especially since it turns out get_callee() is at least a little more complicated:

	def get_callee(self, call):
		assert isinstance(call, ast.Call), call
		inferred_value = call.func.inferred_value
		if isinstance(inferred_value, pyanalyze.value.MultiValuedValue):
			for v in inferred_value.vals:
				with suppress(KeyError):
					return self._functions[v.val]
			raise ValueError(f"Cannot find callee for call: {call}")

		elif isinstance(inferred_value, pyanalyze.value.UnresolvedValue):
			raise ValueError(f"Cannot find callee for call: {call}")

		elif isinstance(inferred_value, pyanalyze.value.UnboundMethodValue):
			f = getattr(inferred_value.typ.typ, inferred_value.attr_name)
			return self._functions[f]

		return self._functions[inferred_value.val]

JelleZijlstra · 2021-05-23T18:38:15Z

Makes sense. A few quick notes on your code:

https://pyanalyze.readthedocs.io/en/latest/reference/value.html has fairly thorough documentation on the different Value classes (much of the docs is still a work in progress though)
For UnboundMethodValue you can use the .get_method() method
There's a variety of other Value classes you could encounter: AnnotatedValue (just recurse and get the callee out of .value), TypedValue (use the type's __call__ method), SubclassValue (instantiation of a type).
I'm not sure what your use case is but in general for MultiValuedValue (=Union) you'd want to consider all members of the union, not just the first one. And of course, the union members could all be arbitrary values, not just KnownValues with a .val attribute.

erezsh · 2021-05-23T19:06:22Z

Thanks, good to know.

you'd want to consider all members of the union

Yes, you're right. For now I'm just trying to get it to work. Tbh I'm still in the dark regarding whether pyanalyze will fit my purposes. I'm trying to bring the interface up to speed, so I can explore it more easily on different files.

JelleZijlstra · 2021-11-11T05:37:30Z

Closing as the pyanalyze.ast_annotator provides what was requested here.

JelleZijlstra mentioned this issue May 5, 2021

add documented code for AST annotation #169

Merged

JelleZijlstra closed this as completed in #169 May 6, 2021

JelleZijlstra reopened this May 6, 2021

JelleZijlstra closed this as completed Nov 11, 2021

Annotated AST #166

Annotated AST #166

Comments

erezsh commented May 4, 2021 • edited

JelleZijlstra commented May 4, 2021

erezsh commented May 4, 2021 • edited

JelleZijlstra commented May 4, 2021

erezsh commented May 4, 2021

JelleZijlstra commented May 4, 2021

erezsh commented May 4, 2021

JelleZijlstra commented May 4, 2021

erezsh commented May 4, 2021

JelleZijlstra commented May 4, 2021

erezsh commented May 4, 2021

JelleZijlstra commented May 4, 2021

erezsh commented May 4, 2021

JelleZijlstra commented May 4, 2021

erezsh commented May 4, 2021

JelleZijlstra commented May 4, 2021

erezsh commented May 4, 2021 • edited

JelleZijlstra commented May 4, 2021

erezsh commented May 4, 2021

JelleZijlstra commented May 5, 2021

erezsh commented May 5, 2021

JelleZijlstra commented May 5, 2021

erezsh commented May 5, 2021

JelleZijlstra commented May 5, 2021

erezsh commented May 5, 2021

JelleZijlstra commented May 5, 2021

JelleZijlstra commented May 6, 2021

erezsh commented May 6, 2021

JelleZijlstra commented May 6, 2021

erezsh commented May 6, 2021

JelleZijlstra commented May 6, 2021

erezsh commented May 21, 2021

JelleZijlstra commented May 21, 2021

erezsh commented May 21, 2021

JelleZijlstra commented May 21, 2021

JelleZijlstra commented May 21, 2021

erezsh commented May 21, 2021

JelleZijlstra commented May 21, 2021

JelleZijlstra commented May 22, 2021

erezsh commented May 22, 2021

JelleZijlstra commented May 22, 2021

erezsh commented May 22, 2021 • edited

JelleZijlstra commented May 22, 2021

erezsh commented May 23, 2021

JelleZijlstra commented May 23, 2021

erezsh commented May 23, 2021

JelleZijlstra commented Nov 11, 2021

erezsh commented May 4, 2021 •

edited

erezsh commented May 4, 2021 •

edited

erezsh commented May 4, 2021 •

edited

erezsh commented May 22, 2021 •

edited