Skip to content

Operator Precedence

rocky edited this page Jan 29, 2022 · 5 revisions

What is Operator Precedence?

In Python you can write expressions, sometimes in various ways. Consider:

a*x**2 + b*x + c

This is a concise way of writing:

(a*(x**2)) + (b*x) + c

In the more concise expression, parentheses do not appear. It is understood that when an expression with exponentiation (**) occurs combined with an expression with multiplication (*), do the exponentiation first and then combine with the multiplication. Similarly, in the absence of parentheses dictating otherwise, perform multiplication (*) before addition (+).

Decompilation outputs expressions with parenthesis omitted when they are not necessary. The process by which the semantic actions, or tree phase of the decompiler figures this out is by using a precedence table which specifies which operator to perform when it is appears next to another operator.

Again, operator precedence is used to decide whether or not to add parenthesis around an expression. Note that an operator like multiplication is involved here and in the context of another expression. In the expression:

sin(2 * theta) + pi

the multiplication appears in the expression 2 * theta, but that expression doesn't butt up with another expression. Instead the expression is used as the argument in a function call. The call is used as the first operand of an addition operation, but the important thing is that two expressions with operators don't appear next to each other as was the case in a*x**2 + b*x + c.

How does operator precedence work?

To recap, in a*x**2 the expression with a multiplication operator appears next to an expression with an exponentiation operator. How can we specify whether this means (a*x)**2 or a*(x**2) to someone who does not know this?

In Python's the reference manual operators are separated into a hierarchy. Here is an excerpt of a much larger table:

operator Notes
.. . ...
** Exponentiation
+x, -x, ~x Positive, negative, bitwise NOT
*, @, /, //, % Multiplication, matrix multiplication, division, floor division, remainder
... ...

In this table, we see that ** appears in a table entry line before *. And that means to put the expression with ** in a group before grouping an expression with * when the two expressions are adjacent. Using the table above, grouping behavior would be identical if we had written a@x**2, a/x**2, etc.

The way this is specified in the decompiler though is to give for each operator a numeric value. Here is the corresponding code in consts.py that captures the above:

PRECEDENCE = {
    # ...
    "BINARY_POWER":           4,  # Exponentiation: **
    "unary_op":               6,  # Positive, negative, bitwise NOT: +x, -x, ~x
    "BINARY_MULTIPLY":        8,  # *
    "BINARY_MATRIX_MULTIPLY": 8,  # @
    "BINARY_DIVIDE":          8,  # /
    "BINARY_FLOOR_DIVIDE":    8,  # //
    "BINARY_MODULO":          8,  # Remainder, %
	# ...
	}

In the actual dictionary in the Python code, we list things from high number (8) to low (4) rather than as shown above. I ordered things this way just to make the correspondences with the Python reference manual more clear.

The thing to note is that lower numbers, like 4 group first, and higher numbers like 8, group last.

Examples

Above, we described how we see things from the human side, and how to specify precedence in a way the decompiler can act on. But from the standpoint of the decompiler, how does this all work?

We will describe this in terms of some examples before summarizing with the general process. Using the example a*x**2, The decompiler in the semantic or tree phase sees:

 1  expr
 2  bin_op (BINARY_MULTIPLY, precedence 8)
 3       0. expr
 4            L.   1        0  LOAD_NAME                a
 5       1. expr
 6          bin_op (BINARY_POWER, precedence 4)
 7               0. expr
 8                          2  LOAD_NAME                x
 9               1. expr
10                   constant
11                          4  LOAD_CONST               2
12               2. binary_operator
13                          6  BINARY_POWER
14       2. binary_operator
15                          8  BINARY_MULTIPLY
...

As we traverse from the top-level tree for expr at line 1 down, four more exprs are encountered below it. The expr at line 3 we don't have to worry about since that covers LOAD_NAME at line 4 and LOAD_NAME is not an operator. However, the tree node for the expr at line 5 we do have to take into account operator precedence.

At the expr tree node at line 5, the precedence value set from line 2 in BINARY_MULTIPLY is passed down. The expr first child in line 6 is bin_op, an operator which has a precedence of 4. Since parent precedence value 8 is greater than child precedence value 4, the expression with the parent should group first. When the parent groups first, we don't have to add parenthesis.

The other exprs under the second bin_op we don't have to worry about either since, again, those involve LOAD_NAME or LOAD_CONST which are not operators.

Now let's compare that with (a*x)**2:

The decompiler in the semantic or tree phase sees:

 1 expr
 2  bin_op (BINARY_POWER, precedence 4)
 3       0. expr
 4          bin_op (BINARY_MULTIPLY, precedence 8)
 5               0. expr
 6                   L.   1         0  LOAD_NAME                a
 7               1. expr
 8                                  2  LOAD_NAME                x
 9               2. binary_operator
10                                  4  BINARY_MULTIPLY
11       1. expr
12           constant
13                                  6  LOAD_CONST               2
14       2. binary_operator
15                                  8  BINARY_POWER

Again, as we descend the top-level tree for expr at line 1, our more exprs are seen: the first expr at line 3 involving another bin_op at line 4. Passed down to the expr at line 3 is the precedence BINARY_POWER with value 4. But in contrast to its bin_op parent at line 2, the bin_op child at line 4 has a precedence value 8. Since the parent value 4 is less than the child value 8, the child expression needs to group first and that is done by adding a parentheses in the handling of the expr at line 3.

In general, in the function that handles expressions, n_expr, a precedence value is passed to it via self.prec, That is compared to the expression's single child, if that happens to be an operator with a precedence value. If the precedence value of the child is less than the precedence value of the parent, then a set parenthesis needs to be added.

%p and %P specifiers

There are situations where operator composition appears in the tree in a more implicit way. This occurs when the template engine expands strings that have operators mentioned in a template string.

Going through an example will make this more clear.

Consider this Python code:

x = "system" if not "site-packages" in file else "local"

We have a grammar rule if_exp_not that corresponds roughly to IfExp(test=[UnaryOp(...)]) in a Python AST.

The output template an if_exp_not node of a parse tree is:

"if_exp_not":
  ( "%p if not %p else %p",
   ...
  (0, "expr", PRECEDENCE["unary_not"]),
  ...)

The way to read this is that if the nonterminal type of the parse tree is a if_exp_not node, then the first child of the parse tree is an expression (or "expr") that will get expanded in between the words "not" and "else".

The thing to note though is that in contrast to the Python AST for this, where there is an explicit UnaryOp, the UnaryOp operator has been folded into the nonterminal if_exp_not. But we still need a way to know whether we need parenthesis around the expression. In the example given above, ... if not "site-packages" in file else ..., we don't need additional parenthesis.

However if the expression is a or b, then, yes, do need parenthesis around this expression. Otherwise Python will interpret ... if not a or b else ... as ... if (not a) or b else ... rather than ... if not (a or b) else ....

The way indicate the correct meaning we intend is to set the precedence of the parent precedence for that particular "expr" to be the precedence of "unary_not": logically that is the surrounding context that this expression is getting used. Other "expr"s that might appear as placeholders in other parts of the template can have different precedences or no precedence.

In particular, there probably will never be a parenthesis required by the expression before "if not" or the expression after "else". So that precedence can be set to a high value.

If we change first and last "%p" to "%c", the precedence value is whatever was passed down to us from above. Rather than rely on that we set that to be something equal to what matches the precedence of an if ... else expression.

%P description to be continued...