PEG syntax considerations #17
Conversation
For reference, Bryan Ford's paper has a DOI: https://doi.org/10.1145/982962.964011
I agree that in the absence of a standard, it is probably better to stay close to the original paper rather than an implementation-specific variant. Given that we chose PEG because it is machine-readable, would it be difficult to write a conversion tool that took a Bryan Ford-compatible grammar and converted it into a form that the other tools could read?
Yes... but it can get sort of tricky, as it depends on the tool. Take into account that we are talking about a small subset of the tools mentioned on Bryan Ford's page. The more idiomatic ones rely on the language itself and have got functions like …

As for the character groups, as said above, Mouse seems to favour the precedence operator when multiple sets are involved (when the opposite is actually recommended). For arpeggio, the same character sets are defined as Python regexes. Luckily, the grammar included in the pull requests works out-of-the-box with …

In short, given a rule …
By the way, I actually forgot to add one of the dealbreakers (so to speak) for Mouse: it uses quotation marks and apostrophes in the same way Java does, 's' for a character, "String" for a string.
bec6712 to 486116d
Thanks for tackling these. Some minor nits, and perhaps more seriously: how can this grammar work without numeric_primary?
@@ -482,10 +477,6 @@ user_defined_function <-
     (_ value_expression (_ ',' _ value_expression)* _)?
     ')'
 
-numeric_primary <-
-    value_expression_primary
-    / numeric_value_function
Huh? How can this disappear without breaking the grammar?
peg_rules = re.sub("\\[", "r'[", peg_rules)
peg_rules = re.sub("\\!r'\\[", "r'[^", peg_rules)
peg_rules = re.sub("\\]", "]'", peg_rules)
peg_rules = re.sub("EOF <-[^;]*;", "", peg_rules)
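For illustration, here is roughly what the first three substitutions do to a couple of made-up rules (the rule names below are hypothetical, not taken from the actual grammar):

```python
import re

# Two hypothetical Ford-style rules with character classes.
peg_rules = "name_char <- [a-zA-Z0-9_]\nnot_comma <- ![,] ."

peg_rules = re.sub("\\[", "r'[", peg_rules)        # open a Python regex literal at each '['
peg_rules = re.sub("\\!r'\\[", "r'[^", peg_rules)  # rewrite the not-predicate '![...]' as a negated class
peg_rules = re.sub("\\]", "]'", peg_rules)         # close the regex literal at each ']'

print(peg_rules)
# name_char <- r'[a-zA-Z0-9_]'
# not_comma <- r'[^,]' .
```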
I'm not sure I'm too enthused by hacking the grammar syntax with regular expressions. For now, that's probably all right, but we should have a plan to parse our grammar into a tree and then serialise it in the various syntaxes, I guess. I just wonder if someone else might not already have written something like that...
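As a hedged sketch of that idea (not an existing tool): assuming each rule is a `name <- expression` block separated by blank lines, a converter could load the grammar into a rule table and re-emit it per target syntax. The helpers `parse_rules` and `emit_mouse_like` below are made up for illustration; real grammars would also need proper handling of literals and character classes.

```python
import re

def parse_rules(peg_text):
    """Load a Ford-style grammar into a list of (name, expression) pairs.
    Assumes one 'name <- expression' definition per blank-line-separated block."""
    rules = []
    for block in re.split(r"\n\s*\n", peg_text.strip()):
        name, expr = block.split("<-", 1)
        rules.append((name.strip(), " ".join(expr.split())))
    return rules

def emit_mouse_like(rules):
    """Serialise the rule table in a Mouse-like flavour:
    camelCase rule names, '=' instead of '<-', ';' terminators."""
    def camel(name):
        head, *tail = name.split("_")
        return head + "".join(part.capitalize() for part in tail)
    out = []
    for name, expr in rules:
        for other, _ in rules:  # rename references to rules we know about
            expr = re.sub(rf"\b{other}\b", camel(other), expr)
        out.append(f"{camel(name)} = {expr} ;")
    return "\n".join(out)

grammar = "numeric_primary <-\n    value_expression_primary\n    / numeric_value_function"
print(emit_mouse_like(parse_rules(grammar)))
# numericPrimary = value_expression_primary / numeric_value_function ;
```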
That's indeed a very interesting idea and much more stable/reliable than regular expressions, though apparently the regex approach is enough for now and allowed us to quickly test this grammar with different parsers.
The hacking is to fit arpeggio, which asks for regular expressions (see its documentation on grammars): "Regex matches are given as strings with prefix `r` (e.g. `r'\d*\.\d*|\d+'`)."

We'd need to think about an alternate toolset which doesn't rely on them to test the ADQL examples.
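For context, here is a minimal sketch of what such a regex terminal looks like on the arpeggio side. This assumes arpeggio's textual PEG syntax and its ParserPEG class as described in its documentation; the two-rule grammar below is made up and not part of the ADQL grammar.

```python
# Minimal sketch: a grammar whose terminal is a Python regex match (r'...').
from arpeggio.peg import ParserPEG

grammar = """
word    <- letters EOF;
letters <- r'[a-zA-Z]+';
"""

parser = ParserPEG(grammar, "word")   # second argument is the root rule name
print(parser.parse("ADQL"))           # prints the resulting parse tree
```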
As discussed privately, there is another parser generator called Canopy that can generate a parser in 4 languages: Java, JavaScript, Python and Ruby.
To add to this, the major differences from the formal PEG grammar are:

I was able to successfully generate parsers for all 4 languages with these changes. EDIT: after the generated Python parser complained about reserved keywords and the PEG grammar was fixed accordingly (locally), it now returns a …
Let's merge
Merge done. Although I think this should be discussed further elsewhere: GitHub issue, mailing list...
(I added @gmantele and @msdemlei as reviewers for the time being, feel free to add whomever you feel adequate.)
Hi all,
as PEG adoption for the ADQL grammar definition draws near(er), I have been testing different parser generators recently, namely Mouse, arpeggio, and peg(1), the latter written in C.

I discovered by trial and error that their definitions of PEG and its syntax differ from the one defined by Bryan Ford in the original paper (cf. Figure 1 and Table 1). In other words, the original syntax looks like a formal definition rather than a standard one -- there doesn't seem to be any standard, actually. Other, much more idiomatic tools also exist, i.e. tools fitting their respective programming language.
These deviations range from aesthetical changes (separators and delimiters) to extensions and, in some cases, syntactic sugar -- you can see them on the links above. To give a couple of examples:

- rule names must be written in camelCase, due to the underscore character `_` being an extra operator;
- `rule <- [a-zA-Z]` becoming `rule = [a-z] / [A-Z]`;
- `peg(1)` allows character classes to be negated with `[^s]`, the Mouse counterpart being `^[s]`, in both cases instead of the more formal `![s]`.
In addition, both also define the first rule as the root of the parse tree... unlike arpeggio used by lyonetia.
I think that, for completeness purposes, following the formal definition would be the sensible decision, possibly adding some clarifications for deployers regarding the disparities between different generators and optionally offering some conversion tools.
In this case, I have updated the `adql2.1.peg` file to reflect the syntax as defined by Bryan Ford, and made the required changes to the `testadql.py` file so the conversion to the arpeggio-compliant PEG syntax does not fail. Sadly, tests fail, which we would need to check.

Finally, other errors encountered by the parsers are:
- The rule `numeric_primary` is redefined -- fixed: see 486116d
- `ANY_CHAR`, `unsigned_hexadecimal`, `numeric_expression`, `string_expression`, `string_function`, `bitwise_expression`, `geometry_function` are defined but not used (see the sketch below for a quick way to spot these)
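As an aside, defined-but-unused rules can be spotted with a rough check like the one below. This is only a sketch: it assumes one `name <-` definition per rule and counts any other occurrence of the name as a use (so names inside string literals would also count).

```python
import re

def unused_rules(peg_text):
    """Report rule names that are defined (appear before '<-') but never referenced elsewhere."""
    defined = re.findall(r"^\s*(\w+)\s*<-", peg_text, flags=re.MULTILINE)
    unused = []
    for name in defined:
        uses = len(re.findall(rf"\b{name}\b", peg_text)) - 1  # subtract the definition itself
        if uses == 0:
            unused.append(name)
    return unused

with open("adql2.1.peg") as fh:  # the grammar file from this pull request
    print(unused_rules(fh.read()))
```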