Question about parser behavior with right shifts #2156
I think that this is a problem with our Bison-based parser. We are specifying the precedence for various operators, but the way we parse SHR is as follows:
We are parsing SHR as two greater-than signs that should have no spaces between them. We do this to accommodate the parsing of templates. @ChrisDodd: does the precedence rule for R_ANGLE apply in this case? Do you see how we can fix this?
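To make the lexing side concrete, here is a minimal Python sketch. It is not p4c's actual lexer: the token kinds other than R_ANGLE, the `glued` flag and the helper `two_glued_r_angles` are all hypothetical. The lexer never emits a single `>>` token. It emits one R_ANGLE per `>` and records whether the next token follows with no whitespace. The parser then decides from context whether two glued R_ANGLE tokens are a right shift (in an expression) or two closing brackets (in a type such as `tuple<bit<8>>`).

```python
from dataclasses import dataclass

@dataclass
class Token:
    kind: str    # "ID", "INT", "L_ANGLE", "R_ANGLE" or "OP" (names are illustrative)
    text: str
    glued: bool  # True if the next token starts right after this one, with no whitespace

def tokenize(src: str) -> list:
    """Split src into tokens; '>' is always its own R_ANGLE token, never '>>'."""
    tokens = []
    i = 0
    while i < len(src):
        if src[i].isspace():
            i += 1
            continue
        start = i
        if src[i].isalpha() or src[i] == "_":
            while i < len(src) and (src[i].isalnum() or src[i] == "_"):
                i += 1
            kind = "ID"
        elif src[i].isdigit():
            while i < len(src) and src[i].isalnum():  # also covers literals like 8w2
                i += 1
            kind = "INT"
        else:
            kind = {"<": "L_ANGLE", ">": "R_ANGLE"}.get(src[i], "OP")
            i += 1
        glued = i < len(src) and not src[i].isspace()
        tokens.append(Token(kind, src[start:i], glued))
    return tokens

def two_glued_r_angles(tokens: list, i: int) -> bool:
    """True if tokens[i] and tokens[i + 1] are '>' '>' with nothing between them.
    In an expression the parser may read this pair as SHR; in a type such as
    tuple<bit<8>> the same pair closes two angle brackets."""
    return (i + 1 < len(tokens)
            and tokens[i].kind == "R_ANGLE"
            and tokens[i + 1].kind == "R_ANGLE"
            and tokens[i].glued)
```

Because the shift is assembled from two terminals, a Bison rule of the (hypothetical) form `expression R_ANGLE R_ANGLE expression` by default takes the precedence of its last terminal, here R_ANGLE, unless a `%prec` annotation overrides it.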
I will assign this to @ChrisDodd because he is a Bison expert.
Ugh, an ugly problem. Yes, the precedence here will be that of
Would fully parenthesizing all arithmetic expressions in the P4 code output by intermediate passes of the compiler be a solution here? Or is the bigger issue that developer-written code can also be parsed into an AST that doesn't match what the P4 spec says it should? (Lisp! :-)
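As a rough illustration of the parenthesizing idea, here is a Python sketch of a printer that wraps every binary operation in parentheses, so re-parsing the emitted text cannot depend on precedence or associativity. The `BinOp` class and `to_source` function are hypothetical stand-ins, not p4c's IR or its P4 printer. Note that this only protects compiler-emitted code; programs written by developers are still parsed with the grammar's precedence rules.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[BinOp, str]  # leaves are plain strings such as "h.h.a" or "3"

def to_source(e: Expr) -> str:
    """Emit e with every binary operation fully parenthesized."""
    if isinstance(e, str):
        return e
    return f"({to_source(e.left)} {e.op} {to_source(e.right)})"
```

For example, the left-nested shift prints as `((h.h.a >> 3) >> 8)`, which parses back to the same tree regardless of how `>>` is declared in the grammar.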
The parsing is wrong according to the spec.
ANTLR is a much inferior parser generator IMO. It's much harder to get the parser correct.
I think we need to go back to the original formulation of having two different
Our grammar has become much more complex in the meantime due to other features, such as free-form annotations, so by comparison this will be a small increment.
@jnfoster: I think you were experimenting with ANTLR.
ANTLR also releases new features in its Java implementation first; the C++ implementation follows later. I would expect p4c to use the C++ implementation of ANTLR and therefore lag behind the latest code. It's a minor issue. However, ANTLR uses a grammar file. I like the Bison parser better because it allows greater flexibility in specifying parsing behavior.
Yes, and I'm afraid I agree with @ChrisDodd's assessment.
(I'm not afraid of agreeing with @ChrisDodd, but sad that it didn't solve all my problems :-)
Incidentally, ANTLR works with a grammar file. An example grammar file is included below. ANTLR uses LL parsing while Bison uses LALR. The p4c Bison code would have to change to use ANTLR because of these parsing differences. It is not clear how yacc would integrate with an ANTLR grammar file. The p4c Bison parser is tightly integrated with yacc and the p4c IR. Grad students or ASIC companies with limited compiler resources use ANTLR.
The first parser I wrote for P4 was in ANTLR (in 2015), and I read the ANTLR4 book, so I have some experience. ANTLR integrates the lexer and the parser; you don't need a separate lexer anymore. Bison is an evolution of Yacc; what you mean is "flex", not "yacc". Bison is not integrated with the p4c IR in any way. ANTLR has a lot of commercial uses, more than I know of for Bison; here is a list of contributed grammars https://github.com/antlr/grammars-v4, but I don't know how complete each of them is. And it generates ALL(*) parsers, which can do unbounded lookahead. I think Bison also has an option to do GLR parsing, which could potentially simplify the grammar, but we are not using it, and it is not clear that it works well. From what I can tell, the main problem with ANTLR is that you will not know statically whether your grammar is deterministic; only when you attempt to parse a legal program may you realize that the grammar finds a different parse tree than the one you expected.
This was my (negative) experience.
ANTLR is not really LL(*) -- it's PEG (parsing expression grammar) based, which is related, but not quite the same. The problem with PEG is that it's not quite the same thing as a context-free grammar, so it's easy to write a PEG grammar that is subtly different from the CFG you think you're using. Then too, many problems don't show up until runtime with particular inputs that trigger them, rather than at build time. With LALR (bison/yacc) you know that if the grammar is accepted with no conflicts, the parser will accept exactly that grammar in linear time. No subtle ambiguities or exponential blowups that only manifest when you run an input that triggers them.
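A tiny Python illustration of the PEG pitfall mentioned above: PEG ordered choice commits to the first alternative that succeeds and never revisits it, so `A <- "a" / "ab"` followed by end-of-input rejects "ab", whereas the CFG `A -> a | ab` accepts both "a" and "ab". This is a generic PEG combinator sketch with hypothetical helper names; it does not model ANTLR's actual ALL(*) algorithm.

```python
def lit(s):
    """Match the literal s at position i; return the new position or None."""
    def parse(text, i):
        return i + len(s) if text.startswith(s, i) else None
    return parse

def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(text, i):
        for p in parsers:
            i = p(text, i)
            if i is None:
                return None
        return i
    return parse

def ordered_choice(*parsers):
    """PEG '/' operator: commit to the first alternative that succeeds."""
    def parse(text, i):
        for p in parsers:
            j = p(text, i)
            if j is not None:
                return j
        return None
    return parse

def end(text, i):
    """Succeed only at end of input."""
    return i if i == len(text) else None

def accepts(parser, text):
    return parser(text, 0) is not None

peg_bad = seq(ordered_choice(lit("a"), lit("ab")), end)   # looks like A -> a | ab, but is not
peg_good = seq(ordered_choice(lit("ab"), lit("a")), end)  # same alternatives, longest first
```

With `peg_bad`, the choice matches "a", then `end` fails at position 1, and the PEG does not go back to try "ab". Reordering the alternatives fixes this case, but nothing in the grammar itself warns you that the order mattered.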
I saw this paper about ANTLR and
Yes -- that's the paper where they define LL(*) as "what ANTLR accepts", by which definition it is LL(*), but it's a definition that has little to do with LL.
Thanks for fixing this so quickly. I have a minor follow-up. I reran all the p4c tests and stumbled across an issue in one of the test files:

```
/*
<AssignmentStatement>(120252)
  <Member>(84583)c
    <Member>(84584)h
      <PathExpression>(84585)
        h
  <Cast>(120273)
    <Cast>(120272)
      <Shr>(120271)
        <Shr>(84588)
          <Member>(84589)a
            <Member>(84590)h
              <PathExpression>(84591)
                h
          <Constant>(1166) 3
            <Type_InfInt>(1165)
        <Constant>(120270) 8
          <Type_InfInt>(120269)
      <Type_Bits>(1111)
    <Type_Bits>(1111)
*/
h.h.c = (bit<8>)(bit<8>)(h.h.a >> 3 >> 8);
```

Parsing this again, I get:

```
/*
<AssignmentStatement>(1218)
  <Member>(1201)c
    <Member>(1200)h
      <PathExpression>(1198)
        h
  <Cast>(1217)
    <Cast>(1216)
      <Shr>(1215)
        <Member>(1209)a
          <Member>(1208)h
            <PathExpression>(1206)
              h
        <Shr>(1214)
          <Constant>(1211) 3
            <Type_InfInt>(1210)
          <Constant>(1213) 8
            <Type_InfInt>(1212)
      <Type_Bits>(1111)
    <Type_Bits>(1111)
*/
h.h.c = (bit<8>)(bit<8>)(h.h.a >> (3 >> 8));
```

Functionally, this is the same since shifts are associative, right? But why is the parser bracketing from right to left instead of left to right?
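Right shifts are not associative, so the two parses are not equivalent: `(a >> 3) >> 8` equals `a >> 11`, while `a >> (3 >> 8)` equals `a >> 0`, i.e. `a` itself. A quick Python check, assuming for illustration that `h.h.a` is an unsigned `bit<8>` value equal to 0xFF (the helper `shr8` is hypothetical):

```python
MASK8 = 0xFF  # bit<8> values live in 0..255

def shr8(value: int, amount: int) -> int:
    """Logical right shift of an unsigned 8-bit value."""
    return (value & MASK8) >> amount

a = 0xFF
left_assoc = shr8(shr8(a, 3), 8)  # (a >> 3) >> 8: 255 -> 31 -> 0
right_assoc = shr8(a, 3 >> 8)     # a >> (3 >> 8): 3 >> 8 is 0, so the result is a
```

Here `left_assoc` is 0 and `right_assoc` is 255, so the re-parsed program computes a different value.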
Is this before or after the fix? |
After. I thought this was not worth another issue but I can create one. |
Hello
I have a question regarding operator precedence. I have the following P4 representation (p4_shift_order) with this expression tree:
However, when I parse this exact program again using

```
./p4c/build/p4c-bm2-ss --top4 ParseAnnotationBodies_0_ParseAnnotations --dump dmp -v bugs/validation_candidate.p4
```

I get this expression tree:

Which one is the right order? Judging from the spec, I would say the first program is correct: `+` has higher precedence than `>>` and `&`, and `>>` has higher precedence than `&`. Also, if I change `>> 1` to `/ 8w2`, the order is preserved.

source_program.txt
p4_shift_order.txt
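For reference, here is a minimal precedence-climbing sketch in Python (not p4c's Bison grammar) that encodes only the ordering quoted above, `+` over `>>` over `&`, with all three left-associative as in C. The expression `a + b >> 1 & c` is a hypothetical stand-in, since the original program is only in the attachment. The function returns a fully parenthesized string showing the tree that ordering implies.

```python
import re

# Binding power: a higher number binds tighter. Assumption: only the ordering
# quoted above (+ over >> over &) is modeled, all left-associative as in C.
PRECEDENCE = {"+": 3, ">>": 2, "&": 1}

def parse(src: str) -> str:
    """Parse src by precedence climbing and return a fully parenthesized string."""
    tokens = re.findall(r">>|[+&()]|[\w.]+", src)  # whitespace is skipped
    pos = 0

    def primary() -> str:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = expr(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return inner
        return tok

    def expr(min_prec: int) -> str:
        nonlocal pos
        left = primary()
        while (pos < len(tokens) and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            # Parsing the right operand at a strictly higher level makes op left-associative.
            right = expr(PRECEDENCE[op] + 1)
            left = f"({left} {op} {right})"
        return left

    return expr(1)
```

Under these rules `a + b >> 1 & c` groups as `(((a + b) >> 1) & c)`, and `h.h.a >> 3 >> 8` groups as `((h.h.a >> 3) >> 8)`, the left-to-right bracketing expected in the earlier comment.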