Can this theoretically be parsed by PEG? How? #489
Comments
Consider using lookahead, something like:
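One way such a lookahead could work, sketched in plain JavaScript rather than PEG.js syntax. The helpers `cls`, `lit`, `seq`, `star`, `opt`, `and` and `end` are hypothetical (they are not PEG.js API); they mimic PEG semantics, where a successful repetition is never given back. The repetition only takes `domainlabel "."` when a positive lookahead sees another alphanumeric coming, which leaves the last label for `toplabel`. In PEG.js terms this corresponds roughly to `(DomainLabel "." &AlphaNum)* TopLabel "."?`. The labels are written as `first ("-"* alphanum)*`, an assumed right-factored form that always ends on an alphanumeric without backtracking. The lookahead also assumes a hostname is never followed by `.` plus an alphanumeric, which holds for one hostname per line.

```javascript
// Hypothetical PEG-style helpers (not PEG.js API). A parser is a function
// (input, pos) => new pos, or -1 on failure.
const cls = (re) => (s, i) => (i < s.length && re.test(s[i]) ? i + 1 : -1);
const lit = (text) => (s, i) => (s.startsWith(text, i) ? i + text.length : -1);
const seq = (...ps) => (s, i) => {
  for (const p of ps) {
    i = p(s, i);
    if (i < 0) return -1;
  }
  return i;
};
// Greedy repetition: every successful iteration is kept, never given back.
const star = (p) => (s, i) => {
  for (let j = p(s, i); j > i; j = p(s, i)) i = j;
  return i;
};
const opt = (p) => (s, i) => {
  const j = p(s, i);
  return j < 0 ? i : j;
};
// Positive lookahead &p: succeeds if p matches here, but consumes nothing.
const and = (p) => (s, i) => (p(s, i) < 0 ? -1 : i);
const end = (s, i) => (i === s.length ? i : -1);

const alpha = cls(/[A-Za-z]/);
const alphaNum = cls(/[A-Za-z0-9]/);
const dash = lit("-");
const dot = lit(".");

// Labels as  first ("-"* alphanum)*  so they always end on an alphanumeric.
const domainLabel = seq(alphaNum, star(seq(star(dash), alphaNum)));
const topLabel = seq(alpha, star(seq(star(dash), alphaNum)));

// hostname = (domainlabel "." &alphanum)* toplabel "."?
// The lookahead stops the loop before the last label.
const hostname = seq(star(seq(domainLabel, dot, and(alphaNum))), topLabel, opt(dot));

const isHostname = (s) => seq(hostname, end)(s, 0) >= 0;

console.log(["x", "x.", "1.2.x", "1.2.x.", "1.2.3"].map(isHostname));
// → [ true, true, true, true, false ]
```

Here `1` and `2` play the role of labels that can only be domain labels, so `1.2.3` is rejected because its last label cannot be a toplabel.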
This library (or any conformant PEG library) has a greedy `*` repetition operator. Looking at …, and modifying your simplified self-contained illustration:
which is equivalent to:
For reference, I built a demo grammar you can play around with that implements what @flaviojs posted. While we can very closely mimic the EBNF from the RFC, I have chosen to slightly alter the format to hint that the domain part must be followed by a top-level domain part. For the most part, however, it's practically identical to the RFC, and that's some of the fun of PEGs. For a glance…
@dmsnell Your grammar http://peg.arcanis.fr/2cx6Sx/2/ seems to lack standalone …

Recently I've been working on an rST parser with PEG.js. I implemented standalone hyperlinks according to the absolute-URI ABNF definition in RFC 3986 (RFC 3986 supersedes RFC 2396). Though the rST spec restricts URI schemes to "known schemes", I don't put that in the grammar; it is better handled during semantic validation. FYI, here's my implementation in the parser, named … The URI is already parsed into several meaningful parts by the grammar; however, I just take the whole URI string for processing rST. And about the …

This approach is far less elegant than @dmsnell's assertion approach; I'm just sharing my thoughts. And I have to refine my past grammars... (/ω\)

Notice how the Hostname rule is constructed so that a Hostname always ends with a TopLabel, and with a little …:

```
/*
PEG.js non-greedy match
==========================
BNF as follows:
hostname = *( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
*/

Hostnames
  = a:Hostname b:("\n" Hostname)*
    { return [a].concat(b.map(z => z[1])) }

Hostname
  = d:DomainLabel "." h:Hostname
    {
      return Object.assign(h, {
        domainlabel: [d].concat(h.domainlabel || [])
      })
    }
  / t:($(TopLabel "."?))
    { return { toplabel: t } }

// be careful about the order of parsing expressions:
// specific ones go first
DomainLabel
  // AlphaNum (AlphaNum / "-")* c:AlphaNum
  // the above is greedy, so let's do a similar conversion as for Hostname
  = $(a:AlphaNum b:DomainLabelNonFirst)
  / AlphaNum

DomainLabelNonFirst
  = $((Dash / AlphaNum) DomainLabelNonFirst)
  / AlphaNum

TopLabel
  // same method as DomainLabel
  = $(Alpha TopLabelNonFirst)
  / Alpha

TopLabelNonFirst
  = $((AlphaNum / Dash) TopLabelNonFirst)
  / AlphaNum

Dash = "-"
AlphaNum = Alpha / Num
Alpha = [a-zA-Z]
Num = [0-9]
```
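To see why the right-recursive form avoids the greedy problem, here is a hand-written JavaScript sketch that mirrors the grammar above rule by rule. The function names (`labelRest`, `label`, `hostname`, `parseHostnames`) are hypothetical; this is not code generated by PEG.js. Alternatives are tried in order and the first success wins: `hostname` first tries `DomainLabel "." Hostname` and falls back to `TopLabel "."?` only when the recursive call fails, so the last label is always left for `TopLabel`.

```javascript
// Hand-written sketch mirroring the PEG.js grammar above (not PEG.js output).
const isAlpha = (c) => c !== undefined && /[A-Za-z]/.test(c);
const isAlphaNum = (c) => c !== undefined && /[A-Za-z0-9]/.test(c);

// DomainLabelNonFirst / TopLabelNonFirst:
//   (Dash / AlphaNum) Rest / AlphaNum   -- returns the end position or null
function labelRest(s, i) {
  if (s[i] === "-" || isAlphaNum(s[i])) {
    const end = labelRest(s, i + 1);
    if (end !== null) return end; // first alternative succeeded
  }
  return isAlphaNum(s[i]) ? i + 1 : null; // second alternative
}

// DomainLabel / TopLabel:  $(First Rest) / First
function label(s, i, isFirst) {
  if (!isFirst(s[i])) return null;
  const end = labelRest(s, i + 1);
  return end !== null ? end : i + 1;
}

// Hostname = DomainLabel "." Hostname / $(TopLabel "."?)
function hostname(s, i) {
  const d = label(s, i, isAlphaNum);
  if (d !== null && s[d] === ".") {
    const h = hostname(s, d + 1);
    if (h !== null) {
      h.value.domainlabel = [s.slice(i, d)].concat(h.value.domainlabel || []);
      return h;
    }
  }
  const t = label(s, i, isAlpha);
  if (t === null) return null;
  const end = s[t] === "." ? t + 1 : t;
  return { value: { toplabel: s.slice(i, end) }, pos: end };
}

// Hostnames = Hostname ("\n" Hostname)*, which must consume the whole input
function parseHostnames(s) {
  const result = [];
  let i = 0;
  for (;;) {
    const h = hostname(s, i);
    if (h === null) return null;
    result.push(h.value);
    i = h.pos;
    if (s[i] !== "\n") break;
    i += 1;
  }
  return i === s.length ? result : null;
}

console.log(JSON.stringify(parseHostnames("d1.d2.t.\nt")));
// → [{"toplabel":"t.","domainlabel":["d1","d2"]},{"toplabel":"t"}]
console.log(parseHostnames("1.2.3"));
// → null
```

For `t.` the recursive call after `t` "." finds nothing left, fails, and control falls back to the `TopLabel "."?` alternative, which is exactly the case the greedy `*( domainlabel "." )` could not handle.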
I am about to break my head trying to come up with a PEG grammar that would parse according to the `hostname` BNF of RFC 2396: `hostname = *( domainlabel "." ) toplabel [ "." ]`.

I got some serious help with `domainlabel` and `toplabel` in #487, so those are not a problem (@gguerreiro, many thanks for that!). However, `hostname`, it seems, cannot be expressed in PEG because, just like in #487, the whole input is consumed by `*(domainlabel ".")`, which doesn't know when to stop, since `toplabel [ "." ]` is indistinguishable from it.

My simplified self-contained illustration would parse `t` and `d.d.t` and fail on `d.d.d`, which is totally expected, but it fails to parse `t.` and `d.d.t.`, which are both valid cases.