
Unclear when to merge adjacent Junks #296

Open
alerque opened this issue Sep 26, 2019 · 3 comments

alerque commented Sep 26, 2019

The EBNF defines Junk as:

/* Junk represents unparsed content.
 *
 * Junk is parsed line-by-line until a line is found which looks like it might
 * be a beginning of a new message, term, or a comment. Any whitespace
 * following a broken Entry is also considered part of Junk.
 */
Junk                ::= junk_line (junk_line - "#" - "-" - [a-zA-Z])*
junk_line           ::= /[^\n]*/ ("\u000A" | EOF)
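
As I read it, that boils down to the following line-grouping rule. Here is a rough TypeScript sketch of my interpretation (not any implementation's actual code; it also ignores the fact that a line starting with "#", "-", or a letter is first re-tried as a real Entry before falling back to new Junk):

// Sketch of the Junk grouping rule as I read the EBNF: a new Junk starts
// whenever a line begins with "#", "-", or an ASCII letter; every other
// line (commas, blank lines, etc.) is appended to the current Junk.
function splitJunk(text: string): string[] {
  const junks: string[] = [];
  for (const line of text.split(/(?<=\n)/)) {
    if (junks.length === 0 || /^[#\-a-zA-Z]/.test(line)) {
      junks.push(line); // this line could begin a new Entry, so start a new Junk
    } else {
      junks[junks.length - 1] += line; // continuation of the previous Junk
    }
  }
  return junks;
}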

This makes sense to me in the context of fixtures/comments.ftl:

# Errors
#error
##error
###error

This produces one Comment node and three Junk nodes, because each of the sequential Junk lines starts with a `#`, which indicates the start of a new Junk.
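
In other words, I'd expect a result roughly like this (node contents approximated):

Comment  "Errors"
Junk     "#error\n"
Junk     "##error\n"
Junk     "###error\n"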

What I can't make sense of, or manage to match, is what the tooling parser does with sequential Junk, as in structure/unclosed.ftl:

err03 = {
FUNC(
arg
,
namedArg: "Value"
,
key04 = Value 04

If I'm reading the spec right, this should produce 4 Junks, because the lines starting with "F", "a", and "n" could each begin a new Entry and so break the grammar definition of a Junk. The two lines starting with "," would naturally be part of the previous Junks.
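
Concretely, splitting the input by that reading would give something like:

Junk  "err03 = {\n"
Junk  "FUNC(\n"
Junk  "arg\n,\n"
Junk  "namedArg: \"Value\"\n,\n"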

That's not what the JavaScript implementation actually does, though; it combines all the Junk up until the start of key04:

"content": "err03 = {\nFUNC(\narg\n,\nnamedArg: \"Value\"\n,\n"

Is the tooling parser off-spec here? Or am I missing something?

There doesn't seem to be a reference fixture that would trigger this. I mocked up a quick one and tested with the reference parser and the result matches my understanding of the spec.

I'm trying to figure out how to implement this in fluent-lua and am unsure whether to follow the letter of the law here and stick to the spec, or take a cue from the other tooling implementations and follow suit.

If the latter, what is the actual rule? Merge all Junks excluding ones that start with #?

@SirNickolas

> If the latter, what is the actual rule? Merge all Junks excluding ones that start with #?

The actual rule is “Try to parse a placeable, and if you find an error inside, then everything you managed to consume up until this point is a single Junk”. It’s surely not conformant. Nice catch.

I have another question. Assuming the spec stays as is, we should split such a Junk into multiple ones, but should we attach an additional Error to each of them for the sake of uniformity? The translator presumably made a mistake somewhere in their Placeable; will one Error be enough? On the other hand, the only way to reproduce this is to leave lines unindented, and it's probably very rare.

Pike added the syntax label Sep 26, 2019

Pike commented Sep 26, 2019

Yeah, this is a bug in the js (and probably python) impls. Also a bug here for not having exhaustive tests on that behavior.

We had an earlier version of this bug in projectfluent/fluent.js#248, but probably closed that too eagerly.

A bit of additional information that's not clear from the docs: the EBNF is a human-readable document generated from the spec. It's not really the specification, nor is it exhaustive. The normative parts are the js files in syntax, notably grammar.js and abstract.js: grammar.js defines what's well-formed, and grammar.js together with abstract.js defines what's valid Fluent syntax.


alerque commented Sep 26, 2019

Thanks for the feedback @Pike. I've opened matching issues on the two implementations I confirmed did not work correctly. It shouldn't be too hard to add something to the reference fixtures to test this (especially since the reference parser works as expected). Would you like me to send a PR for that?

I did eventually figure out that the EBNF is generated and secondary to the actual implementation, but you're right that it's not at all clear from the available docs. As something to follow along with when writing a parser in another language, I find it much more useful than the very idiomatic JavaScript. Perhaps that's because I was writing a PEG grammar, which is so close to the EBNF that it could almost (but not quite) have been auto-generated.
