Skip to content

Python: implement printAst for Python#4461

Merged
erik-krogh merged 11 commits intogithub:mainfrom
erik-krogh:pyPrint
Oct 22, 2020
Merged

Python: implement printAst for Python#4461
erik-krogh merged 11 commits intogithub:mainfrom
erik-krogh:pyPrint

Conversation

@erik-krogh
Copy link
Contributor

(Creating a Python PR feels weird).

I recommend trying out the printAst feature (see next section) before reviewing the code.

This PR adds a printAst.ql query that can be used for viewing the AST of Python files.


Recommended usage instructions:

  • Enable AST viewer by adding "codeQL.experimentalAstViewer": true to your settings in VSCode
    (Ctrl + Shift + P: Open Settings (JSON)).
  • Open some Python file from some database (this step is intentionally vague).
  • Open the CodeQL view in VSCode and click the new View AST button in the bottom.
  • Navigate the AST a bit to see what it looks like.

The printAst.ql query produces a tree, and there are 3 basic kinds of nodes in this tree (most of the other languages have more kinds of nodes, JS has 15):

  • AstElementNode: This node represents an AstNode. The children of this node can be any kind of node.
  • FunctionParamsNode: This node represents the parameters of a function. The children are always AstElementNode.
  • StmtListNode: This node represents a StmtList, and is e.g. used as a child of an if-statement. The children of a StmtListNode are always AstElementNode.
    The StmtList that this node represents must be a result from AstElementNode::getStmtList(...). This is used to figure out where the StmtList belongs and printing a label.

I could have made a CallArgumentsNode for aggregating all the arguments in a call, but I decided against this, as the result already looks pretty good as it is. (I've done that differently in JavaScript. But JavaScript have both arguments and type-arguments, so not splitting them up would look bad).

The AstElementNode implements some default behavior that works for AstNodes, and for some it actually looks OK (it determines children based on the AstNode::getAChildNode() predicate).
This default behavior is overwritten in subclasses to get more specific/prettier behavior, for example:

  • Functions, where all the parameters are aggregated into a FunctionParamsNode.
  • If-statements, where each StmtList becomes a child, instead of each Stmt inside the StmtLists.
  • Parameters, where I've "moved" the type-annotation and default value to be children of the parameter instead of the function.

The pretty-printing is quick and dirty and far from complete.
But it looks pretty good to me.

Maybe the PrettyPrinting module should be moved somewhere else?

@erik-krogh erik-krogh requested a review from a team as a code owner October 12, 2020 18:39
@erik-krogh
Copy link
Contributor Author

erik-krogh commented Oct 12, 2020

The tests failed in CI, but not on my machine.

My best guess was that every Str is also a Bytes in Python 2.7.17 (used in CI), but not in Python 2.7.18 (on my machine).
Edit: Seems like I was on the right track, the tests are passing now.

@yoff
Copy link
Contributor

yoff commented Oct 13, 2020

Thank you very much! I hope one of us finds time to look at it soon :-)

@aeisenberg
Copy link
Contributor

Thanks for looking into print AST support in Python. Trying it out now. I am using https://lgtm.com/projects/g/AtsushiSakai/PythonRobotics/. I mostly looked at the /opt/src/ArmNavigation/arm_obstacle_navigation/arm_obstacle_navigation_2.py file.

  • FunctionDef source locations do not include the body of the function. Maybe this is intentional, but typically, a parent node would subsume the source locations of all its children.
  • Name of a FunctionDef is the third child after the parameters and body.
  • I see that (StmtList) is in parens, not square brackets. Real AST Nodes should be in square brackets and synthetic nodes in parens.
  • Source location for the (parameters) synthetic node seems to cover the entire declaration expression eg def main():. Either, it should just cover what's inside the parens or it should not have a source location at all.
  • Label for integer literals look like this: [IntegerLiteral] IntegerLiteral. Instead of having the second IntegerLiteral, maybe just print the number itself (same for other kinds of literals).
  • Calls have no distinction between the function name and its arguments. I wonder if you could add a synthetic node for the arguments list, similar to what you do for FunctionDefs (See screenshot)

_Extension_Development_Host__-_arm_obstacle_navigation_2_py__read-only___read-only__—_vscode-codeql-starter__Workspace_

@erik-krogh
Copy link
Contributor Author

FunctionDef source locations do not include the body of the function. Maybe this is intentional, but typically, a parent node would subsume the source locations of all its children.

That is the location of FunctionDef from the extractor, so that seems intentional.
Changing it would require an extractor change.

Name of a FunctionDef is the third child after the parameters and body.

I changed the order 👍

I see that (StmtList) is in parens, not square brackets. Real AST Nodes should be in square brackets and synthetic nodes in parens.

StmtList is a weird middle child. It is a synthetic node made by the extractor, and it doesn't extend AstNode like all the other nodes. That is why I chose to put it in parens.
I think I'll keep that unless you prefer square brackets.

There is a bunch of those synthetic nodes made by the extractor. For example all the positional arguments of a call are aggregated into an ExprList (But we cannot use that as an (arguments) node, as it doesn't include the keyword arguments).

StmtList was the only such synthetic node that was useful while printing the Ast.

Source location for the (parameters) synthetic node seems to cover the entire declaration expression eg def main():. Either, it should just cover what's inside the parens or it should not have a source location at all.

I removed the location from (parameters) and (StmtList), as I don't have their exact locations in the datebase.

Label for integer literals look like this: [IntegerLiteral] IntegerLiteral. Instead of having the second IntegerLiteral, maybe just print the number itself (same for other kinds of literals).

That was a bug, I had accidentally used the wrong method.
Fixed now.

Calls have no distinction between the function name and its arguments. I wonder if you could add a synthetic node for the arguments list, similar to what you do for FunctionDefs (See screenshot)

👍

@aeisenberg
Copy link
Contributor

Thanks for addressing everything. Ah...I see how StmtList isn't like other "real" AST Nodes. It's fine to keep it as it is. And everything else you mention makes sense. I hope to get a chance to try this out again later today.

@aeisenberg
Copy link
Contributor

This looks great. Only issue is that I am seeing errors when I click on synthetic nodes like (parameters).

Similar to what I was seeing in JavaScript.

cannot open codeql-zip-archive://0-185/Users/andrew.eisenberg/Library/Application%20Support/Code/User/workspaceStorage/600c0c6f9b1cb74bf3beb8a937563b05/GitHub.vscode-codeql/python-4/AtsushiSakai_PythonRobotics_fa636b2/src.zip/. Detail: Unable to read file 'codeql-zip-archive://0-185/Users/andrew.eisenberg/Library/Application Support/Code/User/workspaceStorage/600c0c6f9b1cb74bf3beb8a937563b05/GitHub.vscode-codeql/python-4/AtsushiSakai_PythonRobotics_fa636b2/src.zip/' (EntryIsADirectory (FileSystemError): codeql-zip-archive://0-185/Users/andrew.eisenberg/Library/Application Support/Code/User/workspaceStorage/600c0c6f9b1cb74bf3beb8a937563b05/GitHub.vscode-codeql/python-4/AtsushiSakai_PythonRobotics_fa636b2/src.zip/)

@erik-krogh
Copy link
Contributor Author

erik-krogh commented Oct 16, 2020

Similar to what I was seeing in JavaScript.

And similar to JavaScript I don't see what to do about it - none of those synthetic nodes have a result for .getLocation().
Also, I can't replicate the error.
When I click a synthetic node nothing happens.

@aeisenberg
Copy link
Contributor

Will be fixed here: github/vscode-codeql#620

Thanks for looking into this. I'm happy with the implementation, but someone on the python team should approve.

@erik-krogh
Copy link
Contributor Author

Will be fixed here: github/vscode-codeql#620

Thanks for looking into this. I'm happy with the implementation, but someone on the python team should approve.

/cc @github/codeql-python

yoff
yoff previously approved these changes Oct 20, 2020
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for doing this.
There is some extractor-induced funkiness (like how decorators are represented, although it makes great semantic sense), but the feature is certainly useful.

Co-authored-by: yoff <lerchedahl@gmail.com>
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff! Thank you for stepping out of your comfort zone and writing some Python. 😉
I’ve made a couple of suggestions, all related to the documentation.


/**
* Controls whether the `Element` should be considered for AST printing.
* By default it checks whether the `Element` `e` belongs to `Location` `l`.
Copy link
Contributor

@tausbn tausbn Oct 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says “Element” where it should perhaps say “AstNode”?

* is `key`.
*/
string getProperty(string key) {
key = "semmle.label" and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m just going to assume that this is some sort of deep magic that is meaningful to someone.

/**
* An `AstNode` printed in the print-viewer.
*
* This class can be overwriten to define more specific behavior for some `AstNode`s.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* This class can be overwriten to define more specific behavior for some `AstNode`s.
* This class can be overwritten to define more specific behavior for some `AstNode`s.

... unless you actually meant to use “overridden“.

*
* This class can be overwriten to define more specific behavior for some `AstNode`s.
* The `getChildNode` and `getStmtList` methods can be overwritten to easily set up a child-parent relation between different `AstElementNode`s.
* Be very carefull about overriding `getChild`, as `getChildNode` and `getStmtList` depends on the default beavior of `getChild`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Be very carefull about overriding `getChild`, as `getChildNode` and `getStmtList` depends on the default beavior of `getChild`.
* Be very careful about overriding `getChild`, as `getChildNode` and `getStmtList` depend on the default behavior of `getChild`.


/**
* A print node for `StmtList`.
* A `StmtListNode` is always a child of a `AstElementNode`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* A `StmtListNode` is always a child of a `AstElementNode`,
* A `StmtListNode` is always a child of an `AstElementNode`,

/**
* A print node for `StmtList`.
* A `StmtListNode` is always a child of a `AstElementNode`,
* and the child-parent relation is decided by the `getStmtList` predicate in `AstElementNode.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* and the child-parent relation is decided by the `getStmtList` predicate in `AstElementNode.
* and the child-parent relation is defined by the `getStmtList` predicate in `AstElementNode`.

*/
private module PrettyPrinting {
/**
* Gets the QL class for `a`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Gets the QL class for `a`.
* Gets the QL class for the `AstNode` `a`.

}

/**
* Gets the QL class for `a` for the `AstNode`s where the toString method does not print the QL class.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Gets the QL class for `a` for the `AstNode`s where the toString method does not print the QL class.
* Gets the QL class for `AstNode`s where the `toString` method does not print the QL class.

/**
* Gets a human-readable representation of the `AstNode` `a`, or the empty string.
*
* Has one and only result for every AstNode.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Has one and only result for every AstNode.
* Has exactly one result for every `AstNode`.


/**
* Gets a human-readable representation of the given `AstNode`.
* Is only defined for the an `AstNode` where a human-readable representation can be created without using recursion.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Is only defined for the an `AstNode` where a human-readable representation can be created without using recursion.
* Is only defined for `AstNode`s for which a human-readable representation can be created without using recursion.

@tausbn
Copy link
Contributor

tausbn commented Oct 21, 2020

I'm somewhat surprised that you didn't just batch my suggestions and commit them directly, but who am I to argue. 🙂
I think this looks ready to merge.

@erik-krogh erik-krogh merged commit e89e99d into github:main Oct 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants