Implement Syntax 0.7 #287

Merged
merged 17 commits into from
Oct 23, 2018

@stasm stasm commented Sep 26, 2018

No description provided.

@stasm stasm changed the title Implement Syntax 0.7 in fluent-syntax Implement Syntax 0.7 Oct 12, 2018
unclenachoduh and others added 3 commits October 12, 2018 15:02
* Indentation/Whitespace in 0.7

* Apply feedback

* Apply 2nd round of feedback
This is a re-write of the runtime parser. It supports Fluent Syntax 0.7, runs
against the reference fixtures, has half the lines of code, and is as fast in
SpiderMonkey as the old one (and slightly faster in V8).


        Goals

  1. Support 100% of Fluent Syntax 0.7. This includes the indentation
     relaxation, dropping tabs and CR as syntax whitespace, normalizing new
     lines to LF, and only allowing numbers and identifiers as variant keys.

  2. Maintain good performance. The parser is used in performance-critical code
     paths. Back in the days of Firefox OS it had to be both fast _and_ produce
     tightly packed results so that translations don't take up too much space
     on the device. I think the storage requirements can be relaxed these days.

  3. Write code which will be easy to maintain in the future. The parser was
     first written even before Fluent branched off from L20n. It's seen many
     changes and additions over the last two years. As new features accrued, it
     became hard to maintain and to keep track of all known bugs. My
     goal for the re-write was not only to clean it up but also to define the
     conformance story for the future and to improve the testing
     infrastructure.


        Design

The parser focuses on minimizing the number of false negatives at the expense
of increasing the risk of false positives. In other words, it aims at parsing
_valid_ Fluent messages with a success rate of 100%, but it may also parse some
invalid messages which the reference parser would reject. The parser doesn't
perform any validation and may produce entries which wouldn't make sense in
the real world. For best results users are advised to validate translations
with the fluent-syntax parser pre-runtime.

The main parser loop iterates over the beginnings of messages and terms. This
lets it efficiently skip over comments (which have no use at runtime) and
recover from errors. When a fatal error is encountered, the parser instantly
bails out of the currently-parsed message and moves on to the next one. Errors
are discarded and are not visible to the users of `FluentResource`. They do
carry a minimal description of what went wrong, though, which may be useful
when reading the code and for debugging.
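
As a rough illustration of the loop described above (a simplified sketch, not the code from this PR), the parser can search for the next line that looks like the start of a message or term, try to parse it, and on failure simply continue with the next match. The pattern `RE_ENTRY_START`, the function `parseResource`, and the toy `parseInlineValue` helper are hypothetical names made up for this example; real values can span multiple lines, which this sketch ignores.

```javascript
// Hypothetical pattern: an entry starts at the beginning of a line with an
// identifier (optionally prefixed with "-" for terms) followed by "=".
// Comment lines start with "#", so they never match and are skipped for free.
const RE_ENTRY_START = /^(-?[a-zA-Z][a-zA-Z0-9_-]*) *= */gm;

// Toy value parser (an assumption for this sketch): reads to the end of the
// line and throws if there is nothing to read.
function parseInlineValue(source, cursor) {
  const newline = source.indexOf("\n", cursor);
  const end = newline === -1 ? source.length : newline;
  if (end === cursor) {
    throw new SyntaxError("Expected a value");
  }
  return source.slice(cursor, end);
}

function parseResource(source) {
  const entries = new Map();
  RE_ENTRY_START.lastIndex = 0;
  while (true) {
    // Jump straight to the next thing that looks like a message or a term.
    const next = RE_ENTRY_START.exec(source);
    if (next === null) {
      break;
    }
    try {
      entries.set(next[1], parseInlineValue(source, RE_ENTRY_START.lastIndex));
    } catch (err) {
      // Bail out of the broken entry; the error is discarded and the loop
      // resumes the search from where the regex left off.
      continue;
    }
  }
  return entries;
}

const resource = [
  "# A comment",
  "hello = Hello",
  "broken =",
  "-brand = Firefox",
].join("\n");

const entries = parseResource(resource);
// entries has two keys: "hello" and "-brand"; "broken" was dropped silently.
```

Because the search always restarts from a line that looks like an entry, a failure inside one message cannot affect the messages that follow it.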

The parser makes extensive use of sticky regexes which can be anchored to
any offset of the source string without slicing it. In some places, it's easier
to just check the character currently at the cursor, so it does a fair share of
that, too.
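
For readers unfamiliar with the `y` flag, here is a minimal sketch of the technique (not taken from the PR). Setting `lastIndex` on a sticky regex makes `exec` attempt the match at exactly that offset, so the source string never needs to be sliced. The names `RE_IDENTIFIER` and `matchAt` are assumptions made for this example.

```javascript
// Hypothetical token pattern. With the sticky flag, the match must begin
// exactly at `lastIndex`; the engine does not scan forward.
const RE_IDENTIFIER = /[a-zA-Z][a-zA-Z0-9_-]*/y;

// Try to match `re` at `offset` in `source`. Returns the matched text and
// the offset right after it, or null if nothing matches at that position.
function matchAt(re, source, offset) {
  re.lastIndex = offset;
  const result = re.exec(source);
  if (result === null) {
    return null;
  }
  return { value: result[0], end: re.lastIndex };
}

const source = "hello = World";

const id = matchAt(RE_IDENTIFIER, source, 0);    // { value: "hello", end: 5 }
const none = matchAt(RE_IDENTIFIER, source, 5);  // null, offset 5 is a space
const word = matchAt(RE_IDENTIFIER, source, 8);  // { value: "World", end: 13 }

// For single-character tokens a plain comparison at the cursor is simpler.
const isEquals = source[6] === "=";              // true
```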


        Conformance

My original plan was to base the parser on the EBNF and only parse well-formed
syntax. In this PR, I went for something a bit wider than that: a superset of
well-formed syntax. The main deviation from the EBNF is related to parsing
`VariantExpressions` and `CallExpressions`. The EBNF verifies that they are
called on `Terms` and `Functions` respectively. The optimistic parser doesn't
differentiate between `Messages`, `Terms` and `Functions`. I decided to
implement it this way because this code might soon change anyway (see
projectfluent/fluent#176).

Another deviation is that the parser treats commas in argument lists as
whitespace, similar to how Clojure treats them in sequence lists. I might
suggest we upstream this in the spec, too, because it makes the implementation
of args lists _much_ simpler.
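
To show why this simplifies things, here is a sketch under stated assumptions (not the PR's actual code). If commas are consumed together with blank space, the argument loop only has two states: skip separators, then read an argument or the closing parenthesis. `RE_SEPARATOR`, `RE_ARG`, and `parseArguments` are hypothetical names, and the sketch handles only positional variable, identifier, and number arguments.

```javascript
// Hypothetical patterns. Commas are lumped together with spaces and line
// feeds, so no separate "expect a comma" state is needed.
const RE_SEPARATOR = /[ \n,]*/y;
const RE_ARG = /[$]?[a-zA-Z][a-zA-Z0-9_-]*|-?[0-9]+(?:\.[0-9]+)?/y;

// Parse "(arg arg ...)" starting at `offset`, which must point at "(".
// Returns the raw argument strings and the offset after the closing ")".
function parseArguments(source, offset) {
  if (source[offset] !== "(") {
    throw new SyntaxError("Expected (");
  }
  let cursor = offset + 1;
  const args = [];
  while (true) {
    // Skip any run of spaces, line feeds and commas (possibly empty).
    RE_SEPARATOR.lastIndex = cursor;
    RE_SEPARATOR.exec(source);
    cursor = RE_SEPARATOR.lastIndex;

    if (source[cursor] === ")") {
      return { args, end: cursor + 1 };
    }

    RE_ARG.lastIndex = cursor;
    const match = RE_ARG.exec(source);
    if (match === null) {
      throw new SyntaxError("Expected an argument");
    }
    args.push(match[0]);
    cursor = RE_ARG.lastIndex;
  }
}

const withComma = parseArguments("NUMBER($count, 2)", 6);
// { args: ["$count", "2"], end: 17 }
const withoutComma = parseArguments("NUMBER($count 2)", 6);
// { args: ["$count", "2"], end: 16 }
```

The trade-off is the one described in the Design section: input such as `($a,,$b)` is accepted here even though a strict parser would reject it.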

I based this PR on top of the `zeroseven` branch. The `fluent-syntax` parser
already supports Syntax 0.7 and passes the [reference
fixtures](https://github.com/projectfluent/fluent/tree/master/test/fixtures).
This made it possible to turn on reference testing in the runtime parser as
well. `make fixtures` creates the parsed results for all reference
fixtures; for now they must be verified manually before they're committed.
`make test` can be used in development to assert that the output of the runtime
parser still matches the committed one.