Question about parser behavior with right shifts #2156

fruffy · 2020-01-27T15:11:10Z

Hello
I have a question regarding operator precedence. I have the following p4 representation (p4_shift_order) with this expression tree:

control ingress(...) {
    apply {
        bit<4> tmp = 4w0;
        /*
      <AssignmentStatement>(20029)
        <Member>(19035)a
          <Member>(19036)h
            <PathExpression>(19037)
              h
        <BAnd>(20034)
          <Constant>(1155) 1
            <Type_Bits>(1105)
          <Shr>(20044)
            <Add>(19042)
              <Constant>(1156) 2
                <Type_Bits>(1105)
              <PathExpression>(19044)
                tmp
            <Constant>(20043) 1
              <Type_InfInt>(20042) */
        h.h.a = 4w1 & (4w2 + tmp) >> 1;
    }
}

However, when I parse this exact program again using ./p4c/build/p4c-bm2-ss --top4 ParseAnnotationBodies_0_ParseAnnotations --dump dmp -v bugs/validation_candidate.p4 I get this expression tree:

control ingress(...) {
    apply {
        bit<4> tmp = 4w0;
        /* 
      <AssignmentStatement>(1164)
        <Member>(1154)a
          <Member>(1153)h
            <PathExpression>(1151)
              h
        <Shr>(1163)
          <BAnd>(1160)
            <Constant>(1155) 1
              <Type_Bits>(1105)
            <Add>(1159)
              <Constant>(1156) 2
                <Type_Bits>(1105)
              <PathExpression>(1157)
                tmp
          <Constant>(1162) 1
            <Type_InfInt>(1161) */
        h.h.a = (4w1 & 4w2 + tmp) >> 1;
    }
}

Which one is the right order? Judging from the spec I would say the first program is correct. + has higher precedence than >> and &. And >> has higher precedence than &.

Also if I change >> 1 to / 8w2 the order is preserved.

source_program.txt
p4_shift_order.txt

The text was updated successfully, but these errors were encountered:

mihaibudiu · 2020-01-27T22:12:06Z

I think that this is a problem with our Bison-based parser. We are specifying the precedence for various operators, but the way we parse SHR is as follows:

expression: ...
    | expression "<<" expression         { $$ = new IR::Shl(@1 + @3, $1, $3); }
    | expression ">" ">" expression
        { driver.checkShift(@2, @3); $$ = new IR::Shr(@1 + @4, $1, $4); }

We are parsing SHR as two greater-than signs that should have no spaces between them. We do this to accommodate the parsing of templates. @ChrisDodd : does the precedence rule for R_ANGLE apply in this case? Do you see how we can fix this?

%left COMMA
%nonassoc QUESTION
%nonassoc COLON
%left OR
%left AND
%left EQ NE
%left L_ANGLE R_ANGLE LE GE
%left BIT_OR
%left BIT_XOR
%left BIT_AND
%left SHL
%left PP PLUS MINUS PLUS_SAT MINUS_SAT

mihaibudiu · 2020-01-27T22:13:38Z

I will assign this to @ChrisDodd because he is a Bison expert.

ChrisDodd · 2020-01-27T22:17:21Z

Ugh, an ugly problem. Yes the precedence here will be that of >, not that of >>. We can half-fix this by setting the precedence on the rule, but that would only help the reverse case, not this one. I'll see if I can figure out something.

jafingerhut · 2020-01-27T22:26:47Z

Would fully parenthesizing all arithmetic expressions in the P4 code output by intermediate passes of the compiler be a solution here?

Or is the bigger issue that developer-written code can also be parsed into an AST that doesn't match what the P4 spec says it should?

(Lisp! :-)

mihaibudiu · 2020-01-27T22:29:13Z

The parsing is wrong according to the spec.
I am wondering whether we should try a better parser generator, like ANTLR.

ChrisDodd · 2020-01-27T22:35:28Z

ANTLR is a much inferior parser generator IMO. Its much harder to get the parser correct.

ChrisDodd · 2020-01-27T22:37:40Z

I think we need to go back to the original formulation of having two different > tokens (depending on whether the next character is another >) to get the precedence right. It makes the grammar a bit more complex, but should fix things. I'll put together a change.

mihaibudiu · 2020-01-27T22:38:35Z

Our grammar has become much more complex in the meantime due to other features, such as free-form annotations, so this will be in comparison a small increment.

mihaibudiu · 2020-01-27T22:39:29Z

@jnfoster : I think you were making some experiments using ANTLR.

hesingh · 2020-01-27T23:05:32Z

ANTLR also completes new software with its Java implementation first. Later, a C++ implementation is worked on. I would think p4c would use a C++ implementation of ANTLR and remain behind latest code. It's a minor issue.

However, ANTLR uses a grammar file. I like Bison parser better because it allows greater flexibility to specify parsing behavior.

jnfoster · 2020-01-28T04:37:02Z

Yes, and I'm afraid I agree with @ChrisDodd's assessment.

jnfoster · 2020-01-28T04:37:27Z

(I'm not afraid of agreeing with @ChrisDodd, but sad that it didn't solve all my problems :-)

hesingh · 2020-01-28T17:01:34Z

Incidentally, ANTLR works with a grammar file. An example grammar file is included below.

https://github.com/performance-queries/marple/blob/master/src/main/antlr4/edu/mit/needlstk/PerfQuery.g4

ANTLR uses LL parsing while Bison uses LALR. p4c Bison code would have to change to use ANTLR due to such parsing differences. It is not clear how does yacc integrate with ANTLR grammar file. p4c Bison is tightly integrated with yacc and p4c IR.

Grad students or asic companies with limited compiler resources use ANTLR.

mihaibudiu · 2020-01-28T17:33:29Z

The first parser I wrote for P4 was in ANTLR (in 2015), and I read the ANTLR4 book, so I have some experience. ANTLR integrates the lexer and the parser, you don't need a lexer anymore.

Bison is an evolution of Yacc, what you mean is "flex" not "yacc". Bison is not integrated with the p4c IR in any way.

ANTLR has a lot of commercial uses, more than I know for Bison; here is a list of contributed grammars https://github.com/antlr/grammars-v4, but I don't know how complete each of them is. And it generates ALL(*) parsers, which can do unbounded lookahead. I think that Bison has also an option to do GLR parsing, which could potentially simplify the grammar, but we are not using it, and it is not clear that it works well.

From what I can tell, the main problem with ANTLR is that you will not know statically whether your grammar is deterministic; only when you attempt to parse a legal program you may realize that the grammar finds a different parse tree than the one you expected.

jnfoster · 2020-01-28T17:45:45Z

From what I can tell, the main problem with ANTLR is that you will not know statically whether your grammar is deterministic; only when you attempt to parse a legal program you may realize that the grammar finds a different parse tree than the one you expected.

This was my (negative) experience.

ChrisDodd · 2020-01-28T19:25:58Z

Antlr is not really LL(*) -- its PEG (parser expression grammar) based, which is related, but not quite the same. The problem with PEG is that its not quite the same thing as a context free grammar, so its easy to write a PEG grammar that is subtly differrent from the CFG you think you're using. Then too, many problems don't show up until runtine with particular inputs that trigger them, rather than at build time. With LALR (bison/yacc) you know that if the grammar is accepted with no conflicts, the parser will accept exactly that grammar in linear time. No subtle ambiguities or exponential blowups that only manifest when you run an input that triggers them.

hesingh · 2020-01-28T19:34:06Z

I saw this paper about ANTLR and LL(*).

https://www.antlr.org/papers/LL-star-PLDI11.pdf

ChrisDodd · 2020-01-28T19:45:47Z

Yes -- that's the paper where they define LL() as "what antlr accepts", by which definition it is LL(), but its a definition that has little to do with LL.

fruffy · 2020-01-28T22:45:45Z

Thanks for fixing this so quickly. I have a minor follow-up. I reran all the p4c tests and stumbled across an issue in one of the test files (strength3.p4). After the lowering pass (PassManager_7_LowerExpressions) I get the following output:

/* 
<AssignmentStatement>(120252)
  <Member>(84583)c
    <Member>(84584)h
      <PathExpression>(84585)
        h
  <Cast>(120273)
    <Cast>(120272)
      <Shr>(120271)
        <Shr>(84588)
          <Member>(84589)a
            <Member>(84590)h
              <PathExpression>(84591)
                h
          <Constant>(1166) 3
            <Type_InfInt>(1165)
        <Constant>(120270) 8
          <Type_InfInt>(120269)
      <Type_Bits>(1111)
    <Type_Bits>(1111) */
h.h.c = (bit<8>)(bit<8>)(h.h.a >> 3 >> 8);

Parsing this again I get:

/* 
<AssignmentStatement>(1218)
  <Member>(1201)c
    <Member>(1200)h
      <PathExpression>(1198)
        h
  <Cast>(1217)
    <Cast>(1216)
      <Shr>(1215)
        <Member>(1209)a
          <Member>(1208)h
            <PathExpression>(1206)
              h
        <Shr>(1214)
          <Constant>(1211) 3
            <Type_InfInt>(1210)
          <Constant>(1213) 8
            <Type_InfInt>(1212)
      <Type_Bits>(1111)
    <Type_Bits>(1111) */
h.h.c = (bit<8>)(bit<8>)(h.h.a >> (3 >> 8));

Functionally, this is the same since shifts are associative, right? But why is the parser bracketing from right-to-left instead of left-to-right?
Full programs:
shift_associativity.p4.txt
shift_associativity_parsed.p4.txt

mihaibudiu · 2020-01-28T22:58:06Z

Is this before or after the fix?
I think you should file a new issue with the p4 program source attached (without the many comments).

fruffy · 2020-01-28T23:00:14Z

After. I thought this was not worth another issue but I can create one.

mihaibudiu added the bug This behavior is unintended and should be fixed. label Jan 27, 2020

mihaibudiu assigned ChrisDodd Jan 27, 2020

mihaibudiu mentioned this issue Jan 27, 2020

Fix right-shift operator precedence #2158

Merged

mihaibudiu added the fixed This topic is considered to be fixed. label Jan 27, 2020

fruffy mentioned this issue Jan 28, 2020

Associative order of shift operators #2161

Closed

mihaibudiu closed this as completed Jan 29, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question about parser behavior with right shifts #2156

Question about parser behavior with right shifts #2156

fruffy commented Jan 27, 2020 •

edited

Loading

mihaibudiu commented Jan 27, 2020

mihaibudiu commented Jan 27, 2020

ChrisDodd commented Jan 27, 2020

jafingerhut commented Jan 27, 2020

mihaibudiu commented Jan 27, 2020

ChrisDodd commented Jan 27, 2020

ChrisDodd commented Jan 27, 2020

mihaibudiu commented Jan 27, 2020

mihaibudiu commented Jan 27, 2020

hesingh commented Jan 27, 2020

jnfoster commented Jan 28, 2020

jnfoster commented Jan 28, 2020

hesingh commented Jan 28, 2020

mihaibudiu commented Jan 28, 2020

jnfoster commented Jan 28, 2020

ChrisDodd commented Jan 28, 2020

hesingh commented Jan 28, 2020

ChrisDodd commented Jan 28, 2020

fruffy commented Jan 28, 2020

mihaibudiu commented Jan 28, 2020

fruffy commented Jan 28, 2020

Question about parser behavior with right shifts #2156

Question about parser behavior with right shifts #2156

Comments

fruffy commented Jan 27, 2020 • edited Loading

mihaibudiu commented Jan 27, 2020

mihaibudiu commented Jan 27, 2020

ChrisDodd commented Jan 27, 2020

jafingerhut commented Jan 27, 2020

mihaibudiu commented Jan 27, 2020

ChrisDodd commented Jan 27, 2020

ChrisDodd commented Jan 27, 2020

mihaibudiu commented Jan 27, 2020

mihaibudiu commented Jan 27, 2020

hesingh commented Jan 27, 2020

jnfoster commented Jan 28, 2020

jnfoster commented Jan 28, 2020

hesingh commented Jan 28, 2020

mihaibudiu commented Jan 28, 2020

jnfoster commented Jan 28, 2020

ChrisDodd commented Jan 28, 2020

hesingh commented Jan 28, 2020

ChrisDodd commented Jan 28, 2020

fruffy commented Jan 28, 2020

mihaibudiu commented Jan 28, 2020

fruffy commented Jan 28, 2020

fruffy commented Jan 27, 2020 •

edited

Loading