Refactor regular_char and text_char to avoid exceptions #107

stasm · 2018-05-16T14:09:18Z

I wonder if we should refactor regular_char and text_char, along with word and text_cont, to replace the exceptions with an either. My guess is that many parser architectures will struggle to implement that as is, and we might be better off giving an example of how to implement it right.

Maybe with a full ordinary_char as regular_char without any jazz, and then just add our special chars explicitly where allowed?

Also, it might be easier to read if we added/removed character groups rather than single chars.


stasm · 2018-05-23T11:09:08Z

@Pike Would you like to submit a PR for this?

Pike · 2018-05-23T11:24:08Z

I didn't plan to.

stasm · 2018-05-23T14:36:27Z

OK, thanks for letting me know. I'm trying to understand this request better. Can you explain why you think the following?

My guess is that many parser architectures will struggle to implement that as is.

Pike · 2018-05-23T17:36:15Z

Most parser architectures don't have access to and(not(exception), grammar_production), and need to resolve the lookahead.

So when trying to implement

let quoted_text_char =
    either(
        and(
            not(quote),
            text_char),
        sequence(
            backslash,
            quote).map(join));

you need to find out which parts of the either in

let text_char = defer(() =>
    either(
        inline_space,
        regex(/\\u[0-9a-fA-F]{4}/),
        sequence(
            backslash,
            backslash).map(join),
        sequence(
            backslash,
            char("{")).map(join),
        and(
            not(backslash),
            not(char("{")),
            regular_char)));

you need to apply the not(quote) to. Each downstream parser would have to work out that the result is equivalent to

let text_char = defer(() =>
    either(
        inline_space,
        regex(/\\u[0-9a-fA-F]{4}/),
        sequence(
            backslash,
            backslash).map(join),
        sequence(
            backslash,
            char("{")).map(join),
        and(
            not(backslash),
            not(char("{")),
            not(quote),
            regular_char)));

If regular_char were just the mainstream characters, and we only added to it, I expect that many toolchains would have an easier life, as that's closer to the lexer, or needs one lookahead less.
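To make the additive idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the tiny either, literal and regex helpers are toy stand-ins rather than the combinators used by the reference parser, ordinaryChar and SPECIAL_CHARS are hypothetical names, and the character sets are simplified (inline space is treated as an ordinary character, for instance). The point is only that each context lists what it allows, so no not(...) is needed.

```javascript
// Characters that are special in at least one context (assumed set).
const SPECIAL_CHARS = new Set(["\\", "{", "\"", "\n", "\r"]);

// Each matcher takes (source, pos) and returns the matched text or null.
const ordinaryChar = (s, i) =>
    i < s.length && !SPECIAL_CHARS.has(s[i]) ? s[i] : null;

const literal = (text) => (s, i) => (s.startsWith(text, i) ? text : null);

const regex = (re) => {
    // Sticky copy so the match is anchored at the requested position.
    const sticky = new RegExp(re.source, "y");
    return (s, i) => {
        sticky.lastIndex = i;
        const m = sticky.exec(s);
        return m ? m[0] : null;
    };
};

// First alternative that matches wins.
const either = (...matchers) => (s, i) => {
    for (const matcher of matchers) {
        const result = matcher(s, i);
        if (result !== null) return result;
    }
    return null;
};

// Unquoted text: ordinary chars, a bare quote, and the escapes.
const textChar = either(
    ordinaryChar,
    literal("\""),
    regex(/\\u[0-9a-fA-F]{4}/),
    literal("\\\\"),
    literal("\\{"));

// Quoted text: same base, but a quote is only allowed when escaped.
const quotedTextChar = either(
    ordinaryChar,
    regex(/\\u[0-9a-fA-F]{4}/),
    literal("\\\\"),
    literal("\\{"),
    literal("\\\""));
```

With this shape, quotedTextChar never has to reach into textChar's alternatives; the two productions only share the ordinaryChar base.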

stasm · 2018-05-23T19:28:38Z

Do you have any concrete architectures in mind?

I guess I don't really follow why the parser would need to go into each of the either's alternates. I'd imagine it would rather do something like if is_text_char(c) && c !== "\"". In fact, that's what fluent-syntax currently does.
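For illustration, a rough sketch of that kind of check in a hand-written parser. This is not the fluent-syntax code; isTextChar and readQuotedText are hypothetical names, the excluded character set is an assumption, and escape sequences are left out to keep it short.

```javascript
// Hypothetical single-character test for text_char (escapes not handled).
function isTextChar(c) {
    return c !== undefined && c !== "\\" && c !== "{" && c !== "\n";
}

// Read quoted text starting at pos, stopping before the closing quote
// or before any character that is not a text char.
function readQuotedText(source, pos) {
    let end = pos;
    while (isTextChar(source[end]) && source[end] !== "\"") {
        end++;
    }
    return source.slice(pos, end);
}
```

The extra exclusion sits next to the call site, so the parser does not need to look inside the alternatives of text_char.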

stasm · 2018-05-23T19:29:09Z

Since this affects neither the syntax nor the AST, would you be OK moving this out of Syntax 0.6 into FUTURE?

Pike · 2018-09-21T20:04:57Z

I've taken an additional look at this from the perspective of regex-based parsers, and for them, positive and negative look-aheads work great.
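A small sketch of what that can look like, under assumptions: TEXT_CHAR_RE is a simplified stand-in for text_char (not the real production), and readQuotedRun is a hypothetical helper. The not(quote) exception becomes a negative lookahead wrapped around the unchanged text_char pattern.

```javascript
// Simplified text_char: a unicode escape, an escaped backslash,
// an escaped open brace, or any char except backslash, "{" and line breaks.
const TEXT_CHAR_RE = /\\u[0-9a-fA-F]{4}|\\\\|\\\{|[^\\{\r\n]/;

// quoted_text_char: (not(quote) text_char) or (backslash quote).
// The sticky flag anchors each match at lastIndex.
const QUOTED_TEXT_CHAR_RE = new RegExp(
    `(?!")(?:${TEXT_CHAR_RE.source})|\\\\"`, "y");

// Collect consecutive quoted_text_char matches starting at pos.
function readQuotedRun(source, pos) {
    let out = "";
    let match;
    QUOTED_TEXT_CHAR_RE.lastIndex = pos;
    // On success exec() advances lastIndex; on failure it resets it to 0.
    while ((match = QUOTED_TEXT_CHAR_RE.exec(source)) !== null) {
        out += match[0];
    }
    return out;
}
```

The text_char pattern is reused as is, and the regex engine resolves the lookahead.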

I think this is WONTFIX, and stas suggested the same in our Friday EOW Fluent call. Resolving as such.

stasm mentioned this issue May 16, 2018

Implement the reference Fluent Syntax parser #103

Merged

stasm added this to the Syntax 0.6 milestone May 16, 2018

stasm added the syntax label May 16, 2018

stasm added this to To do in Syntax 0.6 via automation May 23, 2018

stasm removed this from the Syntax 0.6 milestone May 23, 2018

stasm added this to To do in Syntax 0.8 via automation May 25, 2018

stasm removed this from To do in Syntax 0.6 May 25, 2018

Pike closed this as completed Sep 21, 2018

Syntax 0.8 automation moved this from To do to Done Sep 21, 2018

Pike removed this from Done in Syntax 0.8 Sep 21, 2018
