Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about parser behavior with right shifts #2156

Closed
fruffy opened this issue Jan 27, 2020 · 21 comments
Closed

Question about parser behavior with right shifts #2156

fruffy opened this issue Jan 27, 2020 · 21 comments
Assignees
Labels
bug This behavior is unintended and should be fixed. fixed This topic is considered to be fixed.

Comments

@fruffy
Copy link
Collaborator

fruffy commented Jan 27, 2020

Hello
I have a question regarding operator precedence. I have the following p4 representation (p4_shift_order) with this expression tree:

control ingress(...) {
    apply {
        bit<4> tmp = 4w0;
        /*
      <AssignmentStatement>(20029)
        <Member>(19035)a
          <Member>(19036)h
            <PathExpression>(19037)
              h
        <BAnd>(20034)
          <Constant>(1155) 1
            <Type_Bits>(1105)
          <Shr>(20044)
            <Add>(19042)
              <Constant>(1156) 2
                <Type_Bits>(1105)
              <PathExpression>(19044)
                tmp
            <Constant>(20043) 1
              <Type_InfInt>(20042) */
        h.h.a = 4w1 & (4w2 + tmp) >> 1;
    }
}

However, when I parse this exact program again using ./p4c/build/p4c-bm2-ss --top4 ParseAnnotationBodies_0_ParseAnnotations --dump dmp -v bugs/validation_candidate.p4 I get this expression tree:

control ingress(...) {
    apply {
        bit<4> tmp = 4w0;
        /* 
      <AssignmentStatement>(1164)
        <Member>(1154)a
          <Member>(1153)h
            <PathExpression>(1151)
              h
        <Shr>(1163)
          <BAnd>(1160)
            <Constant>(1155) 1
              <Type_Bits>(1105)
            <Add>(1159)
              <Constant>(1156) 2
                <Type_Bits>(1105)
              <PathExpression>(1157)
                tmp
          <Constant>(1162) 1
            <Type_InfInt>(1161) */
        h.h.a = (4w1 & 4w2 + tmp) >> 1;
    }
}

Which one is the right order? Judging from the spec I would say the first program is correct. + has higher precedence than >> and &. And >> has higher precedence than &.

Also if I change >> 1 to / 8w2 the order is preserved.

source_program.txt
p4_shift_order.txt

@mihaibudiu
Copy link
Contributor

I think that this is a problem with our Bison-based parser. We are specifying the precedence for various operators, but the way we parse SHR is as follows:

expression: ...
    | expression "<<" expression         { $$ = new IR::Shl(@1 + @3, $1, $3); }
    | expression ">" ">" expression
        { driver.checkShift(@2, @3); $$ = new IR::Shr(@1 + @4, $1, $4); }

We are parsing SHR as two greater-than signs that should have no spaces between them. We do this to accommodate the parsing of templates. @ChrisDodd : does the precedence rule for R_ANGLE apply in this case? Do you see how we can fix this?

%left COMMA
%nonassoc QUESTION
%nonassoc COLON
%left OR
%left AND
%left EQ NE
%left L_ANGLE R_ANGLE LE GE
%left BIT_OR
%left BIT_XOR
%left BIT_AND
%left SHL
%left PP PLUS MINUS PLUS_SAT MINUS_SAT

@mihaibudiu mihaibudiu added the bug This behavior is unintended and should be fixed. label Jan 27, 2020
@mihaibudiu
Copy link
Contributor

I will assign this to @ChrisDodd because he is a Bison expert.

@ChrisDodd
Copy link
Contributor

Ugh, an ugly problem. Yes the precedence here will be that of >, not that of >>. We can half-fix this by setting the precedence on the rule, but that would only help the reverse case, not this one. I'll see if I can figure out something.

@jafingerhut
Copy link
Contributor

Would fully parenthesizing all arithmetic expressions in the P4 code output by intermediate passes of the compiler be a solution here?

Or is the bigger issue that developer-written code can also be parsed into an AST that doesn't match what the P4 spec says it should?

(Lisp! :-)

@mihaibudiu
Copy link
Contributor

The parsing is wrong according to the spec.
I am wondering whether we should try a better parser generator, like ANTLR.

@ChrisDodd
Copy link
Contributor

ANTLR is a much inferior parser generator IMO. Its much harder to get the parser correct.

@ChrisDodd
Copy link
Contributor

I think we need to go back to the original formulation of having two different > tokens (depending on whether the next character is another >) to get the precedence right. It makes the grammar a bit more complex, but should fix things. I'll put together a change.

@mihaibudiu
Copy link
Contributor

Our grammar has become much more complex in the meantime due to other features, such as free-form annotations, so this will be in comparison a small increment.

@mihaibudiu
Copy link
Contributor

@jnfoster : I think you were making some experiments using ANTLR.

@hesingh
Copy link
Contributor

hesingh commented Jan 27, 2020

ANTLR also completes new software with its Java implementation first. Later, a C++ implementation is worked on. I would think p4c would use a C++ implementation of ANTLR and remain behind latest code. It's a minor issue.

However, ANTLR uses a grammar file. I like Bison parser better because it allows greater flexibility to specify parsing behavior.

@mihaibudiu mihaibudiu added the fixed This topic is considered to be fixed. label Jan 27, 2020
@jnfoster
Copy link
Contributor

Yes, and I'm afraid I agree with @ChrisDodd's assessment.

@jnfoster
Copy link
Contributor

(I'm not afraid of agreeing with @ChrisDodd, but sad that it didn't solve all my problems :-)

@hesingh
Copy link
Contributor

hesingh commented Jan 28, 2020

Incidentally, ANTLR works with a grammar file. An example grammar file is included below.

https://github.com/performance-queries/marple/blob/master/src/main/antlr4/edu/mit/needlstk/PerfQuery.g4

ANTLR uses LL parsing while Bison uses LALR. p4c Bison code would have to change to use ANTLR due to such parsing differences. It is not clear how does yacc integrate with ANTLR grammar file. p4c Bison is tightly integrated with yacc and p4c IR.

Grad students or asic companies with limited compiler resources use ANTLR.

@mihaibudiu
Copy link
Contributor

The first parser I wrote for P4 was in ANTLR (in 2015), and I read the ANTLR4 book, so I have some experience. ANTLR integrates the lexer and the parser, you don't need a lexer anymore.

Bison is an evolution of Yacc, what you mean is "flex" not "yacc". Bison is not integrated with the p4c IR in any way.

ANTLR has a lot of commercial uses, more than I know for Bison; here is a list of contributed grammars https://github.com/antlr/grammars-v4, but I don't know how complete each of them is. And it generates ALL(*) parsers, which can do unbounded lookahead. I think that Bison has also an option to do GLR parsing, which could potentially simplify the grammar, but we are not using it, and it is not clear that it works well.

From what I can tell, the main problem with ANTLR is that you will not know statically whether your grammar is deterministic; only when you attempt to parse a legal program you may realize that the grammar finds a different parse tree than the one you expected.

@jnfoster
Copy link
Contributor

From what I can tell, the main problem with ANTLR is that you will not know statically whether your grammar is deterministic; only when you attempt to parse a legal program you may realize that the grammar finds a different parse tree than the one you expected.

This was my (negative) experience.

@ChrisDodd
Copy link
Contributor

Antlr is not really LL(*) -- its PEG (parser expression grammar) based, which is related, but not quite the same. The problem with PEG is that its not quite the same thing as a context free grammar, so its easy to write a PEG grammar that is subtly differrent from the CFG you think you're using. Then too, many problems don't show up until runtine with particular inputs that trigger them, rather than at build time. With LALR (bison/yacc) you know that if the grammar is accepted with no conflicts, the parser will accept exactly that grammar in linear time. No subtle ambiguities or exponential blowups that only manifest when you run an input that triggers them.

@hesingh
Copy link
Contributor

hesingh commented Jan 28, 2020

I saw this paper about ANTLR and LL(*).

https://www.antlr.org/papers/LL-star-PLDI11.pdf

@ChrisDodd
Copy link
Contributor

Yes -- that's the paper where they define LL() as "what antlr accepts", by which definition it is LL(), but its a definition that has little to do with LL.

@fruffy
Copy link
Collaborator Author

fruffy commented Jan 28, 2020

Thanks for fixing this so quickly. I have a minor follow-up. I reran all the p4c tests and stumbled across an issue in one of the test files (strength3.p4). After the lowering pass (PassManager_7_LowerExpressions) I get the following output:

/* 
<AssignmentStatement>(120252)
  <Member>(84583)c
    <Member>(84584)h
      <PathExpression>(84585)
        h
  <Cast>(120273)
    <Cast>(120272)
      <Shr>(120271)
        <Shr>(84588)
          <Member>(84589)a
            <Member>(84590)h
              <PathExpression>(84591)
                h
          <Constant>(1166) 3
            <Type_InfInt>(1165)
        <Constant>(120270) 8
          <Type_InfInt>(120269)
      <Type_Bits>(1111)
    <Type_Bits>(1111) */
h.h.c = (bit<8>)(bit<8>)(h.h.a >> 3 >> 8);

Parsing this again I get:

/* 
<AssignmentStatement>(1218)
  <Member>(1201)c
    <Member>(1200)h
      <PathExpression>(1198)
        h
  <Cast>(1217)
    <Cast>(1216)
      <Shr>(1215)
        <Member>(1209)a
          <Member>(1208)h
            <PathExpression>(1206)
              h
        <Shr>(1214)
          <Constant>(1211) 3
            <Type_InfInt>(1210)
          <Constant>(1213) 8
            <Type_InfInt>(1212)
      <Type_Bits>(1111)
    <Type_Bits>(1111) */
h.h.c = (bit<8>)(bit<8>)(h.h.a >> (3 >> 8));

Functionally, this is the same since shifts are associative, right? But why is the parser bracketing from right-to-left instead of left-to-right?
Full programs:
shift_associativity.p4.txt
shift_associativity_parsed.p4.txt

@mihaibudiu
Copy link
Contributor

Is this before or after the fix?
I think you should file a new issue with the p4 program source attached (without the many comments).

@fruffy
Copy link
Collaborator Author

fruffy commented Jan 28, 2020

After. I thought this was not worth another issue but I can create one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This behavior is unintended and should be fixed. fixed This topic is considered to be fixed.
Projects
None yet
Development

No branches or pull requests

6 participants