Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for assignments and for binary operation expressions #86

Merged
merged 5 commits into from
Dec 2, 2015
Merged

Support for assignments and for binary operation expressions #86

merged 5 commits into from
Dec 2, 2015

Conversation

malteskoruppa
Copy link
Collaborator

Support for assignments

  • AST_ASSIGN -> ast.expressions.AssignmentExpression
  • AST_ASSIGN_REF -> ast.php.expressions.PHPAssignmentByRefExpression
  • AST_ASSIGN_OP -> ast.expressions.AssignmentWithOpExpression

A few comments on this:

  1. I renamed the already existing class AssignmentExpr into AssignmentExpression and fixed C world accordingly. The reason is consistency: All other node classes whose name matches ^.*Expr.*$ have a name that matches ^.*Expression$, except for AssignmentExpr and SizeofExpr. (In the same spirit I would suggest renaming SizeofExpr too, but since there is no sizeof operator in PHP, I did not touch this node. I will add a comment to issue Some refactorings to abstract from C and PHP #80, though.)
  2. I refactored the class BinaryExpression (which is the parent of AssignmentExpression) to not override getChildCount() and getChild(int) any longer. I think that when maintaining an AST node's children cleanly, overriding either of these two methods inherited from ASTNode should never be necessary, as the super-methods in ASTNode should always do the right thing. Whenever such an override is present and necessary in a node class, it is a sign that something else was done uncleanly before. In this particular case however, removing these methods does not break anything, as the super-methods return the same values as the overriding methods did anyway. Thus I suppose these overrides were some kind of legacy restover from earlier versions of Joern, and are not needed any longer.
  3. I also enriched the class AssignmentExpression with public methods getVariable() and getAssignExpression() (and corresponding setters) which only relay to the protected methods getLeft() and getRight() (accordingly for the setters) from the BinaryExpression parent class, for added comfort and consistency.
  4. As opposed to C, PHP supports references only in a limited way (mostly, in assignments and as function parameters): For assignments, there is an explicit "assignment by reference" operator =& (often written as $a = &$b in the C spirit, but actually, the form $a =& $b is technically more accurate; both forms are acceptable in PHP and generate identical ASTs). Since this operator is quite php-specific, I introduced an ast.php.expressions.PHPAssignmentByRefExpression which extends ast.expressions.AssignmentExpression.
  5. In the case of the "combined assignment and binary operation" operators (e.g., +=, *=, .=, etc.), I declared a new base class ast.expressions.AssignmentWithOpExpression, since such assignment operations exist in other high-level languages as well (say, Java).

Support for binary operation expressions

  • AST_BINARY_OP -> ast.expressions.BinaryOperationExpression

I introduced a new class BinaryOperationExpression which extends BinaryExpression, and made this new class a common parent for the following nodes in the ast.expressions package (these nodes previously extended BinaryExpression directly):

  • AdditiveExpression
  • AndExpression
  • BitAndExpression
  • EqualityExpression
  • ExclusiveOrExpression
  • InclusiveOrExpression
  • MultiplicativeExpression
  • OrExpression
  • RelationalExpression
  • ShiftExpression

Thereby, the class BinaryExpression now only has two direct children: AssignmentExpression and BinaryOperationExpression. This gives us a more fine-grained understanding of the various kinds of nodes that exist by making a clear distinction between assignment expressions and binary operation expressions, both of which are binary expressions.

This change was also necessary because PHP ASTs do not use a distinguished kind of node for each kind of binary expression, as Joern does in C world; rather, (almost) all binary expressions are consistently represented as an AST_BINARY_OP node. The concrete operation that the node represents is specified via a flag on this node. Also see https://github.com/nikic/php-ast#flags.
Generally I think that's a good thing, because there are even more kinds of binary operation expressions in PHP than there are in C, and having a distinguished node for each of them would generate a lot of overhead. This "flag" approach is, by the way, also used for AST_ASSIGN_OP nodes (see bullet point 5 of the previous Support for assignments section.)

The following is a list of PHP's binary operators that will map to an AST_BINARY_OP node. This list makes every claim to completeness:

// bit operators
$or1 | $or2;
$and1 & $and2;
$msg ^ $otp;
$x << $y;
$x >> $y;
// string operators
$str1 . $str2;
// arithmetic operators
$x + $y;
$x - $y;
$x * $y;
$x / $y;
$x % $y;
$x ** $y;
// boolean operators
$x xor $y;
// comparison operators
$x === $y;
$x !== $y;
$x == $y;
$x != $y;
$x < $y;
$x <= $y;
$x <=> $y;

Support for "greater", "greater or equal", "and" and "or" expressions

  • AST_AND -> ast.expressions.AndExpression
  • AST_OR -> ast.expressions.OrExpression
  • AST_GREATER -> ast.expressions.GreaterExpression
  • AST_GREATER_EQUAL -> ast.expressions.GreaterOrEqualExpression

Here I added two new GreaterExpression and GreaterOrEqualExpression classes inheriting from the already existing RelationalExpression.

Thereby we also added support for the last remaining binary operators:

// boolean operators
$x && $y;
$x || $y;
// comparison operators
$x > $y;
$x >= $y;

Looking at this, you might wonder: "What? Why are there distinguished node kinds for these four binary operation expressions, when all other binary operation expressions are mapped to the same node kind with an appropriate flag? Why should greater and greater or equal have distinguished nodes, when less and less or equal don't? Or and and or when xor doesn't? That's completely inconsistent!"

You would be absolutely right. We are looking here at an inconsistent design of PHP ASTs. Remember that PHP ASTs are new, under development and their structure non-final. This inconsistency is quite probably due to some historic reason, but that's the way it is. However, this structure shall change in the future, and these four nodes will also be mapped to AST_BINARY_OP nodes with appropriate flags. In fact, Niki's php-ast extension already introduced this change in version 20 of the PHP ASTs: See https://github.com/nikic/php-ast#version-changelog (we are currently still using version 10). However, as version 20 is still marked as unstable and subject to change, I do not want to use it yet (I ought to upgrade to version 15 at some point, though, which will make next to no difference.) Hence, in the future, these four nodes will not be necessary any longer, as the corresponding expressions will be mapped to BinaryOperationExpression just as the other binary operation expressions are, too.

* AST_ASSIGN -> ast.expressions.AssignmentExpression

A few notes:
(1) I renamed AssignmentExpr into AssignmentExpression and fixed C world accordingly.
    This is mostly for consistency. All other node types ending with "Expr.*" ended in "Expression", except for AssignmentExpr and SizeofExpr.
    (In the same spirit I would suggest renaming SizeofExpr too, but since there is no sizeof operator in PHP, I did not touch this node.)
(2) I refactored BinaryExpression to not override getChildCount() and getChild(int). I think that when maintaining an AST node's children cleanly,
    such overrides should never be necessary in any ASTNode, and the super-methods in ASTNode should always do the right thing. Whenever such an override *is*
    present and necessary in an ASTNode, it is a sign that something else was done uncleanly before. In this particular case however, removing these methods
    did not break anything, as the super-methods return the same values as the overriding methods anyway. Thus I suppose these overrides were some kind
    of legacy restover from earlier versions of Joern, and are not needed any longer.
(3) I also enriched the class AssignmentExpression with methods getVariable() and getAssignExpression() (and corresponding setters) which only forward
    to getLeft() and getRight() (accordingly for the setters), for comfort and consistency.
* AST_ASSIGN_REF -> ast.php.expressions.PHPAssignmentByRefExpression

As opposed to C, PHP supports references only in a very limited way (mostly in assignments and as function parameters);
there is an explicit "assignment by reference" operator =& (often written as "$a = &$b" in the C spirit, but actually, the form "$a =& $b" would be technically more accurate; both forms are acceptable in PHP) and thus I introduced a PHPAssignmentByRefExpression which extends ast.expressions.AssignmentExpression.
* AST_ASSIGN_OP -> ast.expressions.AssignmentWithOpExpression

In this case, since assignment operations such as +=, *= etc. exist in other high-level languages as well (say, Java), I decided to put the new node AssignmentWithOpExpression into the ast.expressions package.
* AST_BINARY_OP -> ast.expressions.BinaryOperationExpression

I introduced a new class BinaryOperationExpression which extends BinaryExpression, and made this new class a common ancestor for the following nodes:
- AdditiveExpression
- AndExpression
- BitAndExpression
- EqualityExpression
- ExclusiveOrExpression
- InclusiveOrExpression
- MultiplicativeExpression
- OrExpression
- RelationalExpression
- ShiftExpression

Thereby, the class BinaryExpression now has only two *direct* children: AssignmentExpression and BinaryOperationExpression, thus giving us a more fine-grained understanding of the various kinds of nodes that exist by making a clear distinction between assignment expressions and binary operation expressions, both of which are binary expressions.

This change was also necessary because PHP ASTs do *not* use a distinguished kind of node for each kind of binary expression, as Joern does in C world; rather, (almost) all binary expressions are represented as an AST_BINARY_OP node. The concrete operation that the node represents is specified via a flag on this node. Also see https://github.com/nikic/php-ast#flags.
Generally I think that's a good thing, because there are even more kinds of binary operation expressions in PHP than there are in C, and having a distinguished node for each of them would generate a lot of overhead. This "flag" approach is, by the way, also used for AST_ASSIGN_OP nodes (added in the previous commit).

The following is a list of PHP's binary operators that will map to an AST_BINARY_OP node. This list makes every claim to completeness:

// bit operators
$or1 | $or2;
$and1 & $and2;
$msg ^ $otp;
$x << $y;
$x >> $y;
// string operators
$str1 . $str2;
// arithmetic operators
$x + $y;
$x - $y;
$x * $y;
$x / $y;
$x % $y;
$x ** $y;
// boolean operators
$x xor $y;
// comparison operators
$x === $y;
$x !== $y;
$x == $y;
$x != $y;
$x < $y;
$x <= $y;
$x <=> $y;
* AST_GREATER -> ast.expressions.GreaterExpression
* AST_GREATER_EQUAL -> ast.expressions.GreaterOrEqualExpression
* AST_AND -> ast.expressions.AndExpression
* AST_OR -> ast.expressions.OrExpression

I had to add GreaterExpression and GreaterOrEqualExpression as new classes inheriting from RelationalExpression.

These four nodes are extremely similar to the BinaryOperationExpression's added in the last commit.
In fact, in a future version of the php-ast extension by Niki (version 20), these nodes will not be necessary any longer, as the corresponding expressions will be mapped to BinaryOperationExpression too. The fact that they currently aren't is most probably due to historic reasons, and illustrates that PHP 7 is still under development.
@fabsx00
Copy link
Member

fabsx00 commented Dec 2, 2015

When we rename nodes, we will need to make sure none of the steps in python-joern still depend on the old name. I have created an issue for this: fabsx00/python-joern#13 to make sure that I don't forget about this before merging the phpsupport branch at some point.

fabsx00 pushed a commit that referenced this pull request Dec 2, 2015
Support for assignments and for binary operation expressions
@fabsx00 fabsx00 merged commit 463e7c7 into octopus-platform:phpsupport Dec 2, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants