-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for assignments and for binary operation expressions #86
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* AST_ASSIGN -> ast.expressions.AssignmentExpression A few notes: (1) I renamed AssignmentExpr into AssignmentExpression and fixed C world accordingly. This is mostly for consistency. All other node types ending with "Expr.*" ended in "Expression", except for AssignmentExpr and SizeofExpr. (In the same spirit I would suggest renaming SizeofExpr too, but since there is no sizeof operator in PHP, I did not touch this node.) (2) I refactored BinaryExpression to not override getChildCount() and getChild(int). I think that when maintaining an AST node's children cleanly, such overrides should never be necessary in any ASTNode, and the super-methods in ASTNode should always do the right thing. Whenever such an override *is* present and necessary in an ASTNode, it is a sign that something else was done uncleanly before. In this particular case however, removing these methods did not break anything, as the super-methods return the same values as the overriding methods anyway. Thus I suppose these overrides were some kind of legacy restover from earlier versions of Joern, and are not needed any longer. (3) I also enriched the class AssignmentExpression with methods getVariable() and getAssignExpression() (and corresponding setters) which only forward to getLeft() and getRight() (accordingly for the setters), for comfort and consistency.
* AST_ASSIGN_REF -> ast.php.expressions.PHPAssignmentByRefExpression As opposed to C, PHP supports references only in a very limited way (mostly in assignments and as function parameters); there is an explicit "assignment by reference" operator =& (often written as "$a = &$b" in the C spirit, but actually, the form "$a =& $b" would be technically more accurate; both forms are acceptable in PHP) and thus I introduced a PHPAssignmentByRefExpression which extends ast.expressions.AssignmentExpression.
* AST_ASSIGN_OP -> ast.expressions.AssignmentWithOpExpression In this case, since assignment operations such as +=, *= etc. exist in other high-level languages as well (say, Java), I decided to put the new node AssignmentWithOpExpression into the ast.expressions package.
* AST_BINARY_OP -> ast.expressions.BinaryOperationExpression I introduced a new class BinaryOperationExpression which extends BinaryExpression, and made this new class a common ancestor for the following nodes: - AdditiveExpression - AndExpression - BitAndExpression - EqualityExpression - ExclusiveOrExpression - InclusiveOrExpression - MultiplicativeExpression - OrExpression - RelationalExpression - ShiftExpression Thereby, the class BinaryExpression now has only two *direct* children: AssignmentExpression and BinaryOperationExpression, thus giving us a more fine-grained understanding of the various kinds of nodes that exist by making a clear distinction between assignment expressions and binary operation expressions, both of which are binary expressions. This change was also necessary because PHP ASTs do *not* use a distinguished kind of node for each kind of binary expression, as Joern does in C world; rather, (almost) all binary expressions are represented as an AST_BINARY_OP node. The concrete operation that the node represents is specified via a flag on this node. Also see https://github.com/nikic/php-ast#flags. Generally I think that's a good thing, because there are even more kinds of binary operation expressions in PHP than there are in C, and having a distinguished node for each of them would generate a lot of overhead. This "flag" approach is, by the way, also used for AST_ASSIGN_OP nodes (added in the previous commit). The following is a list of PHP's binary operators that will map to an AST_BINARY_OP node. This list makes every claim to completeness: // bit operators $or1 | $or2; $and1 & $and2; $msg ^ $otp; $x << $y; $x >> $y; // string operators $str1 . $str2; // arithmetic operators $x + $y; $x - $y; $x * $y; $x / $y; $x % $y; $x ** $y; // boolean operators $x xor $y; // comparison operators $x === $y; $x !== $y; $x == $y; $x != $y; $x < $y; $x <= $y; $x <=> $y;
* AST_GREATER -> ast.expressions.GreaterExpression * AST_GREATER_EQUAL -> ast.expressions.GreaterOrEqualExpression * AST_AND -> ast.expressions.AndExpression * AST_OR -> ast.expressions.OrExpression I had to add GreaterExpression and GreaterOrEqualExpression as new classes inheriting from RelationalExpression. These four nodes are extremely similar to the BinaryOperationExpression's added in the last commit. In fact, in a future version of the php-ast extension by Niki (version 20), these nodes will not be necessary any longer, as the corresponding expressions will be mapped to BinaryOperationExpression too. The fact that they currently aren't is most probably due to historic reasons, and illustrates that PHP 7 is still under development.
When we rename nodes, we will need to make sure none of the steps in python-joern still depend on the old name. I have created an issue for this: fabsx00/python-joern#13 to make sure that I don't forget about this before merging the phpsupport branch at some point. |
fabsx00
pushed a commit
that referenced
this pull request
Dec 2, 2015
Support for assignments and for binary operation expressions
This was referenced Dec 4, 2015
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Support for assignments
AST_ASSIGN
->ast.expressions.AssignmentExpression
AST_ASSIGN_REF
->ast.php.expressions.PHPAssignmentByRefExpression
AST_ASSIGN_OP
->ast.expressions.AssignmentWithOpExpression
A few comments on this:
AssignmentExpr
intoAssignmentExpression
and fixed C world accordingly. The reason is consistency: All other node classes whose name matches^.*Expr.*$
have a name that matches^.*Expression$
, except forAssignmentExpr
andSizeofExpr
. (In the same spirit I would suggest renamingSizeofExpr
too, but since there is nosizeof
operator in PHP, I did not touch this node. I will add a comment to issue Some refactorings to abstract from C and PHP #80, though.)BinaryExpression
(which is the parent ofAssignmentExpression
) to not overridegetChildCount()
andgetChild(int)
any longer. I think that when maintaining an AST node's children cleanly, overriding either of these two methods inherited fromASTNode
should never be necessary, as the super-methods inASTNode
should always do the right thing. Whenever such an override is present and necessary in a node class, it is a sign that something else was done uncleanly before. In this particular case however, removing these methods does not break anything, as the super-methods return the same values as the overriding methods did anyway. Thus I suppose these overrides were some kind of legacy restover from earlier versions of Joern, and are not needed any longer.AssignmentExpression
with public methodsgetVariable()
andgetAssignExpression()
(and corresponding setters) which only relay to the protected methodsgetLeft()
andgetRight()
(accordingly for the setters) from theBinaryExpression
parent class, for added comfort and consistency.=&
(often written as$a = &$b
in the C spirit, but actually, the form$a =& $b
is technically more accurate; both forms are acceptable in PHP and generate identical ASTs). Since this operator is quite php-specific, I introduced anast.php.expressions.PHPAssignmentByRefExpression
which extendsast.expressions.AssignmentExpression
.+=
,*=
,.=
, etc.), I declared a new base classast.expressions.AssignmentWithOpExpression
, since such assignment operations exist in other high-level languages as well (say, Java).Support for binary operation expressions
AST_BINARY_OP
->ast.expressions.BinaryOperationExpression
I introduced a new class
BinaryOperationExpression
which extendsBinaryExpression
, and made this new class a common parent for the following nodes in theast.expressions
package (these nodes previously extendedBinaryExpression
directly):AdditiveExpression
AndExpression
BitAndExpression
EqualityExpression
ExclusiveOrExpression
InclusiveOrExpression
MultiplicativeExpression
OrExpression
RelationalExpression
ShiftExpression
Thereby, the class
BinaryExpression
now only has two direct children:AssignmentExpression
andBinaryOperationExpression
. This gives us a more fine-grained understanding of the various kinds of nodes that exist by making a clear distinction between assignment expressions and binary operation expressions, both of which are binary expressions.This change was also necessary because PHP ASTs do not use a distinguished kind of node for each kind of binary expression, as Joern does in C world; rather, (almost) all binary expressions are consistently represented as an
AST_BINARY_OP
node. The concrete operation that the node represents is specified via a flag on this node. Also see https://github.com/nikic/php-ast#flags.Generally I think that's a good thing, because there are even more kinds of binary operation expressions in PHP than there are in C, and having a distinguished node for each of them would generate a lot of overhead. This "flag" approach is, by the way, also used for
AST_ASSIGN_OP
nodes (see bullet point 5 of the previous Support for assignments section.)The following is a list of PHP's binary operators that will map to an
AST_BINARY_OP
node. This list makes every claim to completeness:Support for "greater", "greater or equal", "and" and "or" expressions
AST_AND
->ast.expressions.AndExpression
AST_OR
->ast.expressions.OrExpression
AST_GREATER
->ast.expressions.GreaterExpression
AST_GREATER_EQUAL
->ast.expressions.GreaterOrEqualExpression
Here I added two new
GreaterExpression
andGreaterOrEqualExpression
classes inheriting from the already existingRelationalExpression
.Thereby we also added support for the last remaining binary operators:
Looking at this, you might wonder: "What? Why are there distinguished node kinds for these four binary operation expressions, when all other binary operation expressions are mapped to the same node kind with an appropriate flag? Why should greater and greater or equal have distinguished nodes, when less and less or equal don't? Or and and or when xor doesn't? That's completely inconsistent!"
You would be absolutely right. We are looking here at an inconsistent design of PHP ASTs. Remember that PHP ASTs are new, under development and their structure non-final. This inconsistency is quite probably due to some historic reason, but that's the way it is. However, this structure shall change in the future, and these four nodes will also be mapped to
AST_BINARY_OP
nodes with appropriate flags. In fact, Niki's php-ast extension already introduced this change in version 20 of the PHP ASTs: See https://github.com/nikic/php-ast#version-changelog (we are currently still using version 10). However, as version 20 is still marked as unstable and subject to change, I do not want to use it yet (I ought to upgrade to version 15 at some point, though, which will make next to no difference.) Hence, in the future, these four nodes will not be necessary any longer, as the corresponding expressions will be mapped toBinaryOperationExpression
just as the other binary operation expressions are, too.