
Unclear when to merge adjacent Junks #296

Open
alerque opened this issue Sep 26, 2019 · 3 comments

alerque commented Sep 26, 2019

The EBNF defines Junk as:

/* Junk represents unparsed content.
 *
 * Junk is parsed line-by-line until a line is found which looks like it might
 * be a beginning of a new message, term, or a comment. Any whitespace
 * following a broken Entry is also considered part of Junk.
 */
Junk                ::= junk_line (junk_line - "#" - "-" - [a-zA-Z])*
junk_line           ::= /[^\n]*/ ("\u000A" | EOF)
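
As I read it, that boils down to the following line-grouping rule. Here is a rough TypeScript sketch of my interpretation (not any implementation's actual code; it also ignores the fact that a line starting with "#", "-", or a letter is first re-tried as a real Entry before falling back to new Junk):

// Sketch of the Junk grouping rule as I read the EBNF: a new Junk starts
// whenever a line begins with "#", "-", or an ASCII letter; every other
// line (commas, blank lines, etc.) is appended to the current Junk.
function splitJunk(text: string): string[] {
  const junks: string[] = [];
  for (const line of text.split(/(?<=\n)/)) {
    if (junks.length === 0 || /^[#\-a-zA-Z]/.test(line)) {
      junks.push(line); // this line could begin a new Entry, so start a new Junk
    } else {
      junks[junks.length - 1] += line; // continuation of the previous Junk
    }
  }
  return junks;
}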

This makes sense to me in the context of fixtures/comments.ftl:

# Errors
#error
##error
###error

This produces one Comment node and three Junk nodes, because each of the sequential Junk lines starts with a `#`, which indicates the start of a new Junk.
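
In other words, I'd expect a result roughly like this (node contents approximated):

Comment  "Errors"
Junk     "#error\n"
Junk     "##error\n"
Junk     "###error\n"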

What I can't make sense of, or manage to match, is what the tooling parser does with sequential Junk, as in structure/unclosed.ftl:

err03 = {
FUNC(
arg
,
namedArg: "Value"
,
key04 = Value 04

If I'm reading the spec right, this should produce 4 Junks, because the lines starting with "F", "a", and "n" could each begin a new Entry and so break the grammar definition of a Junk. The two lines starting with "," would naturally be part of the previous Junks.
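
Concretely, splitting the input by that reading would give something like:

Junk  "err03 = {\n"
Junk  "FUNC(\n"
Junk  "arg\n,\n"
Junk  "namedArg: \"Value\"\n,\n"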

That's not what the JavaScript implementation actually does, though; it combines all the Junk up until the start of key04:

"content": "err03 = {\nFUNC(\narg\n,\nnamedArg: \"Value\"\n,\n"

Is the tooling parser off-spec here? Or am I missing something?

There doesn't seem to be a reference fixture that would trigger this. I mocked up a quick one and tested with the reference parser and the result matches my understanding of the spec.

I'm trying to figure out how to implement this in fluent-lua and am unsure whether to follow the letter of the law here and stick to the spec, or take a cue from the other tooling implementations and follow suit.

If the latter, what is the actual rule? Merge all Junks excluding ones that start with #?

@SirNickolas

> If the latter, what is the actual rule? Merge all Junks excluding ones that start with #?

The actual rule is “Try to parse a placeable, and if you find an error inside, then everything you managed to consume up until this point is a single Junk”. It’s surely not conformant. Nice catch.

I have another question. Assuming the spec stays as is, we should split such a Junk into multiple ones, but should we attach an additional Error to each of them for the sake of uniformity? The translator presumably made a mistake somewhere in their Placeable; will one Error be enough? On the other hand, the only way to reproduce this is to leave lines unindented, and it's probably very rare.

Pike added the syntax label Sep 26, 2019

Pike commented Sep 26, 2019

Yeah, this is a bug in the js (and probably python) impls. Also a bug here for not having exhaustive tests on that behavior.

We had an earlier version of this bug in projectfluent/fluent.js#248, but probably closed that too eagerly.

A bit of additional information that's not clear from the docs: the EBNF is a human-readable document generated from the spec. It's not really the specification, nor is it exhaustive. The normative parts are the js files in syntax, notably grammar.js and abstract.js: grammar.js defines what's well-formed, and grammar.js together with abstract.js defines what's valid Fluent syntax.


alerque commented Sep 26, 2019

Thanks for the feedback @Pike. I've opened matching issues on the two implementations I confirmed did not work correctly. It shouldn't be too hard to add something to the reference fixtures to test this (especially since the reference parser works as expected). Would you like me to send a PR for that?

I did eventually figure out that the EBNF is generated and secondary to the actual implementation, but you're right that it's not at all clear from the available docs. As something to follow along with when writing a parser in another language, I find it much more useful than the very idiomatic JavaScript. Perhaps that's because I was writing a PEG grammar, which is so close to the EBNF that it could almost (but not quite) have been auto-generated.
