Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotated AST #166

Closed
erezsh opened this issue May 4, 2021 · 46 comments · Fixed by #169
Closed

Annotated AST #166

erezsh opened this issue May 4, 2021 · 46 comments · Fixed by #169

Comments

@erezsh
Copy link

erezsh commented May 4, 2021

Is there a way to access the parsed AST, with the inferred types? I want to write a code translation tool, and knowing the types would be very useful.
I mean, having a regular AST (similar to ast.AST) but with added attributes, such as: declared_type, and inferred_types.

I tried to look through the code, but couldn't figure it out.

@JelleZijlstra
Copy link
Contributor

There is, yes: the annotate=True argument to the NameCheckVisitor constructor. We've used it internally but it's not documented yet. It puts instances of pyanalyze.value.Value in the inferred_value attribute of AST nodes. I can write up some more documentation later this week.

@erezsh
Copy link
Author

erezsh commented May 4, 2021

Can you help me understand what I'm doing wrong here?

import ast
from pyanalyze import name_check_visitor

code = """
a = 1
"""

tree = ast.parse(code)

ncv = name_check_visitor.NameCheckVisitor("bla", code, tree, annotate=True)
ncv.visit(tree)

assign = tree.body[0]
assert assign.value.inferred_value.get_type() is int	# OK

a = assign.targets[0].inferred_value
print(repr(a))			# UnresolvedValue()  :(

It would expect a to have type int.

@JelleZijlstra
Copy link
Contributor

I guess we don't infer a value for names in write context because it doesn't matter for pyanalyze's own type inference. I submitted a fix in #167.

(Also note that the version of pyanalyze on PyPI is pretty out of date, if you're using it. I'll hopefully make a new PyPI release soon.)

@erezsh
Copy link
Author

erezsh commented May 4, 2021

Nice, it's working, thanks! I'll try it on more complicated stuff.

@JelleZijlstra
Copy link
Contributor

Great to hear! Also for more complicated cases pyanalyze will likely want a real imported module. You may need a trick like in

mod = _make_module(code_str)
.

I'm interested in making this a more directly usable API if it's useful for you, something like a documented pyanalyze.annotate_code function.

@erezsh
Copy link
Author

erezsh commented May 4, 2021

Yes, that would be very useful for me.

And I imagine that for others as well.

@JelleZijlstra
Copy link
Contributor

Great! Let me know what your experience is with this functionality.

@erezsh
Copy link
Author

erezsh commented May 4, 2021

Well, it's definitely a rocky road :)

import ast
from pyanalyze import name_check_visitor

code = """
class A:
	a = 1
"""

tree = ast.parse(code)

ncv = name_check_visitor.NameCheckVisitor("bla", code, tree, annotate=True)
ncv.visit(tree)

print( tree.body[0].body[0].targets[0].inferred_value )		# AttributeError :(

@JelleZijlstra
Copy link
Contributor

OK, that's because class bodies are only visited during the second "checking" pass, but your code makes it only do the "collecting" pass. Calling ncv.check() should do both passes, but that returns early if the module can't be imported. Maybe removing the early return in NameCheckVisitor.check will fix the immediate issue for you, or the _make_module trick I mentioned above.

@erezsh
Copy link
Author

erezsh commented May 4, 2021

Okay, it's working now. It crashes if any imports are wrong, which ideally it shouldn't, but for now it's not a problem.

@JelleZijlstra
Copy link
Contributor

It crashes if any imports are wrong

Thanks, could you share more details on that? I'd like to make it more robust to this sort of thing.

@erezsh
Copy link
Author

erezsh commented May 4, 2021

Well, if I analyze a file within a package, it might have all sorts of relative imports that aren't available. Ideally they should be, of course, and that probably warrants a warning, but ideally I'd like it to just accept the import as unknown, like imported_module = UnknownObject().

Anyway, new problem:

import ast
from pyanalyze import name_check_visitor
from pyanalyze.test_name_check_visitor import _make_module

code = """
class A:
	def __init__(self):
		self.a = 1
"""
mod = _make_module(code)

tree = ast.parse(code)
ncv = name_check_visitor.NameCheckVisitor("bla", code, tree, annotate=True, module=mod)
ncv.visit(tree)
ncv.check()

a = tree.body[0].body[0].body[0].targets[0].inferred_value
print(repr(a)) # UnresolvedValue()

@JelleZijlstra
Copy link
Contributor

For modules, this should be fixable by giving the right module name and filename to the constructor, so that pyanalyze knows how to do the relative imports right. It still shouldn't crash though, but just log an error and turn the modules into Any/UNRESOLVED_VALUE. I'l look into doing that.

The attribute thing is the same issue as for Name, but needs to be fixed in the def composite_from_attribute function. The same thing is of course going to happen for subscripts. There it looks like we'd currently set the inferred_value to the return value of __setitem__, which doesn't really make sense. Writing a PR now to fix both.

@erezsh
Copy link
Author

erezsh commented May 4, 2021

I wonder if perhaps this sort of API would require a lot of changes.

@JelleZijlstra
Copy link
Contributor

I fixed the issue for attribute and subscript in #168. There aren't any more kinds of assignment targets, so hopefully at least this particular kind of issue won't require more changes. I think when I previously used this functionality I just looked at the type of expressions, not of variables.

@erezsh
Copy link
Author

erezsh commented May 4, 2021

Nice.

Well, that may be beyond the scope of what pyanalyze can do, but I would expect that to work:

import ast
from pyanalyze import name_check_visitor
from pyanalyze.test_name_check_visitor import _make_module

code = """
class A:
	def __init__(self):
		self.a = 1

	def bla(self):
		return self.a

a = A()
b = a.bla()
"""
mod = _make_module(code)

tree = ast.parse(code)
ncv = name_check_visitor.NameCheckVisitor("bla", code, tree, annotate=True, module=mod)
ncv.visit(tree)
ncv.check()

print( tree.body[0].body[1].body[0].inferred_value) 	# UnresolvedValue(), expected int
print( tree.body[1].value.inferred_value ) 	# UnresolvedValue(), expected A
print( tree.body[2].value.inferred_value ) 	# UnresolvedValue(), expected int

@JelleZijlstra
Copy link
Contributor

I'd expect that to work, it might require some different hack around the module object. It may work better if you're looking at values inside a function. At the module level pyanalyze tends to just look at module globals, so if those aren't available things might not work.

I'll spend some time tonight trying to expose a usable API. Your feedback is very helpful!

@erezsh
Copy link
Author

erezsh commented May 4, 2021

Happy to help!

I'm surprised at how difficult it is to find such API, given how many different static analyzers exist for Python. But they each re-implement everything from scratch, and provide no access to it. Maybe this can become a popular building block.

@JelleZijlstra
Copy link
Contributor

How does #169 work for you?

@erezsh
Copy link
Author

erezsh commented May 5, 2021

The build failed. I work with py38, so I'll try when it's fixed.

@JelleZijlstra
Copy link
Contributor

It's green now! Happily the tests uncovered a few more places where we weren't adding annotations to the AST.

@erezsh
Copy link
Author

erezsh commented May 5, 2021

The example I gave here ( #166 (comment) ) is still giving me 3 UnresolvedValues on the inferassign branch. Am I doing somethign wrong? What results do you get when you run it?

P.S. I'm also getting the following error whenever I run it (on windows 8):

OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'c:\\python38\\DLLs\\<test input>-stubs'

Internal error: OSError(22, 'The filename, directory name, or volume label syntax is incorrect') (code: internal_error)

But it doesn't interrupt the run.

@JelleZijlstra
Copy link
Contributor

Calling pyanalyze.ast_annotator.annotate_code() on the sample, I get this:

In [5]: print( tree.body[0].body[1].body[0].inferred_value) ^I# UnresolvedValue(), expected int
   ...: print( tree.body[1].value.inferred_value ) ^I# UnresolvedValue(), expected A
   ...: print( tree.body[2].value.inferred_value ) ^I# UnresolvedValue(), expected int
Any
<test input>.A
Any

The first Any is because you're printing the value of the Return node, but we only set the value on the expression, not on the Return node itself. print( tree.body[0].body[1].body[0].value.inferred_value) does print Literal[1]. I'm not entirely sure why the last one prints Any; pyanalyze should have inferred the return value. It correctly prints int if you add an annotation to bla, though.

I'm afraid I haven't tested on Windows before, I guess I need to be more careful what paths I use there. Maybe we should try using test-input instead of <test input> as the fake module name.

@erezsh
Copy link
Author

erezsh commented May 5, 2021

Doing tree.body[0].body[1].body[0].value.inferred_value still returns UnrolvedValue.

I'm not entirely sure why the last one prints Any

I don't know. But I would guess that you're not adding an annotation on return, or that it's not connected in the same way that an explicit annotation is.

@JelleZijlstra
Copy link
Contributor

The UnresolvedValues you're seeing are likely related to the internal errors around Windows paths, at least some of them. Could you try locally changing the module_name = "<test input>" line in pyanalyze.analysis_lib.make_module to something like module_name = "test-input" or some other legal Windows path? I can't easily test on Windows unfortunately.

I don't know. But I would guess that you're not adding an annotation on return, or that it's not connected in the same way that an explicit annotation is.

It should work through the _argspec_to_retval dict inside NameCheckVisitor: when the method gets visited it stores its return value there, and when the call is checked it should look in that dict to find the return value corresponding to the callable. Most likely that mechanism doesn't work if the callee is a method on a KnownValue.

@JelleZijlstra
Copy link
Contributor

Just pushed a change to the PR that may fix Windows (it will skip looking for stubs for module names that aren't valid identifiers).

@erezsh
Copy link
Author

erezsh commented May 6, 2021

Yeah, there's no error anymore. But the results are still unresolved.

@JelleZijlstra
Copy link
Contributor

Not sure how to explain that. To be clear, you're running annotate_code with show_errors=True?

@erezsh
Copy link
Author

erezsh commented May 6, 2021

It says there is no annotate_code, and no show_errors. But there is annotate.

I am literally running the exact code I pasted above.

@JelleZijlstra
Copy link
Contributor

I see, I was referring to the new functions from the PR (in the ast_annotator.py file). I'll probably merge it to master soon.

JelleZijlstra added a commit that referenced this issue May 6, 2021
Helps with #166 

Plus some related changes:
- Make the importer fall back to importing outside sys.path (which seems generally useful)
- Make name_check_visitor.py work a bit better if self.module is None
- Ensure all expressions have an annotation set
@JelleZijlstra JelleZijlstra reopened this May 6, 2021
@erezsh
Copy link
Author

erezsh commented May 21, 2021

Hi, are you making any progress to fix this issue?

I don't mean to pressure, but if not, it seems I'll have to find a different solution.

@JelleZijlstra
Copy link
Contributor

From my perspective, it's basically fixed, since pyanalyze has public functions that do what you're asking for (released in 0.2.0 last week and as of yesterday documentd at https://pyanalyze.readthedocs.io/en/latest/reference/ast_annotator.html).

But it sounded like it's not working for you due to some issue that I couldn't reproduce. And it might be due to Windows-specific issues, which I don't really have a great way to debug. Could you tell me exactly what code you're running now that doesn't do what you expect? I'm also open to doing a call or some sort of pair programming to help figure out the problems if you're interested.

@erezsh
Copy link
Author

erezsh commented May 21, 2021

Here's an updated version of the code:

from pyanalyze.ast_annotator import annotate_code

code = """
class A:
	def __init__(self):
		self.a = 1

	def bla(self):
		return self.a

a = A()
b = a.bla()
"""

tree = annotate_code(code)

print( repr(tree.body[0].body[1].body[0].value.inferred_value) 	)  # KnownValue(1)      -- Okay
print( repr(tree.body[2].targets[0].inferred_value))               # UnresolvedValue()  -- Not Okay!

I get the same problem when I run it in both Windows and Ubuntu (WSL).

Can you please run this and show me what you get as a result?

Sure, we can do a voice call if you think it will help. I like discord, but I'm open to other apps.

@JelleZijlstra
Copy link
Contributor

Thanks! I get the same as you, first KnownValue(1) and then UnresolvedValue(). Good to know that at least this isn't OS-dependent. It's a bit of an edge case, as it relies on some ad-hoc inference logic instead of type annotations, but I'd expect it to work, so I'll look into why it doesn't work with annotate_code().

Interestingly, it does manage to infer the type as KnownValue(1) here if I put the code in a file, add dump_value(b) and run pyanalyze on the file. I'll have to look into how that's happening.

@JelleZijlstra
Copy link
Contributor

In general I wouldn't be surprised if you run into more places where pyanalyze doesn't infer things as well as it could. That's basically #64. I'll fix such issues as I encounter them, but well, there's a lot of Python :)

@erezsh
Copy link
Author

erezsh commented May 21, 2021

From my perspective, this is a very very simple use-case.

I don't mind the occasional bug or missing feature. But if this is considered an edge case, then maybe it's not ready for this kind of use yet.

@JelleZijlstra
Copy link
Contributor

That's fair. I say it's an edge case because pyanalyze is now generally intended to be a PEP 484-compatible type checker, which means it largely works on the basis of annotations when working across functions. Mypy also thinks b is Any, for example. But it should be able to infer the type in this specific case, so I'll make it work.

@JelleZijlstra
Copy link
Contributor

Fixed this case in #201.

@erezsh
Copy link
Author

erezsh commented May 22, 2021

How can I access the call graph?

For example, if I have a _ast.Call object, is there an API that would give me the target function?

@JelleZijlstra
Copy link
Contributor

The inferred value of the .func attribute of the Call should be the called function as a KnownValue.

@erezsh
Copy link
Author

erezsh commented May 22, 2021

Thanks, seems to be working!

If you don't mind, I'd like to suggest adding an API for that, just to save the users the trouble of figuring it out each time for themselves. Something like

class CollectFunctions(ast.NodeVisitor):
	def __init__(self):
		self.functions = {}

	def visit_FunctionDef(self, func_def):
		self.functions[func_def.inferred_value.val] = func_def


class Analyze:
	def __init__(self, code):
		self.code = code
		self.tree = annotate_code(code)

		c = CollectFunctions()
		c.visit(self.tree)
		self._functions = c.functions

	def get_callee(self, call):
		assert isinstance(call, ast.Call)
		return self._functions[call.func.inferred_value.val]

Or maybe instead, to write the references to inferred_value.ref and inferred_value.xrefs

@JelleZijlstra
Copy link
Contributor

There is something like that, the CallSiteCollector class in name_check_visitor.py. I think in our internal code we have a function get_call_map(); I can look into moving that to the public part of pyanalyze.

@erezsh
Copy link
Author

erezsh commented May 23, 2021

I think that would be a good idea. Especially since it turns out get_callee() is at least a little more complicated:

	def get_callee(self, call):
		assert isinstance(call, ast.Call), call
		inferred_value = call.func.inferred_value
		if isinstance(inferred_value, pyanalyze.value.MultiValuedValue):
			for v in inferred_value.vals:
				with suppress(KeyError):
					return self._functions[v.val]
			raise ValueError(f"Cannot find callee for call: {call}")

		elif isinstance(inferred_value, pyanalyze.value.UnresolvedValue):
			raise ValueError(f"Cannot find callee for call: {call}")

		elif isinstance(inferred_value, pyanalyze.value.UnboundMethodValue):
			f = getattr(inferred_value.typ.typ, inferred_value.attr_name)
			return self._functions[f]

		return self._functions[inferred_value.val]

@JelleZijlstra
Copy link
Contributor

Makes sense. A few quick notes on your code:

  • https://pyanalyze.readthedocs.io/en/latest/reference/value.html has fairly thorough documentation on the different Value classes (much of the docs is still a work in progress though)
  • For UnboundMethodValue you can use the .get_method() method
  • There's a variety of other Value classes you could encounter: AnnotatedValue (just recurse and get the callee out of .value), TypedValue (use the type's __call__ method), SubclassValue (instantiation of a type).
  • I'm not sure what your use case is but in general for MultiValuedValue (=Union) you'd want to consider all members of the union, not just the first one. And of course, the union members could all be arbitrary values, not just KnownValues with a .val attribute.

@erezsh
Copy link
Author

erezsh commented May 23, 2021

Thanks, good to know.

you'd want to consider all members of the union

Yes, you're right. For now I'm just trying to get it to work. Tbh I'm still in the dark regarding whether pyanalyze will fit my purposes. I'm trying to bring the interface up to speed, so I can explore it more easily on different files.

@JelleZijlstra
Copy link
Contributor

Closing as the pyanalyze.ast_annotator provides what was requested here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants