Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor regular_char and text_char to avoid exceptions #107

Closed
stasm opened this issue May 16, 2018 · 7 comments
Closed

Refactor regular_char and text_char to avoid exceptions #107

stasm opened this issue May 16, 2018 · 7 comments
Labels

Comments

@stasm
Copy link
Contributor

stasm commented May 16, 2018

In #103 (comment) @Pike writes:

I wonder if we should refactor regular_char and text_char alongside with word and text_cont to avoid the exceptions into an either. My guess is that many parser architectures will struggle to implement that as is, and we might be better off giving an example of how to implement it right?

Maybe with a full ordenary_char as regular_char without any jazz, and then just add our special chars explicitly where allowed?

Also, it might be easier to read if we added/removed character groups rather than single chars.

@stasm stasm added this to the Syntax 0.6 milestone May 16, 2018
@stasm stasm added the syntax label May 16, 2018
@stasm stasm added this to To do in Syntax 0.6 via automation May 23, 2018
@stasm
Copy link
Contributor Author

stasm commented May 23, 2018

@Pike Would you like to submit a PR for this?

@Pike
Copy link
Contributor

Pike commented May 23, 2018

I didn't plan to.

@stasm
Copy link
Contributor Author

stasm commented May 23, 2018

OK, thanks for letting me know. I'm trying to understand this request better. Can you explain why you think the following?

My guess is that many parser architectures will struggle to implement that as is.

@stasm stasm removed this from the Syntax 0.6 milestone May 23, 2018
@Pike
Copy link
Contributor

Pike commented May 23, 2018

Most parser architectures don't have access to and(not(exception), grammar_production), and need to resolve the lookahead.

So when trying to implement

let quoted_text_char =
    either(
        and(
            not(quote),
            text_char),
        sequence(
            backslash,
            quote).map(join));

you need to find out which parts of the either in

let text_char = defer(() =>
    either(
        inline_space,
        regex(/\\u[0-9a-fA-F]{4}/),
        sequence(
            backslash,
            backslash).map(join),
        sequence(
            backslash,
            char("{")).map(join),
        and(
            not(backslash),
            not(char("{")),
            regular_char)));

you need to take the not(quote) to. Each downstream parser would need to find out that that's

let text_char = defer(() =>
    either(
        inline_space,
        regex(/\\u[0-9a-fA-F]{4}/),
        sequence(
            backslash,
            backslash).map(join),
        sequence(
            backslash,
            char("{")).map(join),
        and(
            not(backslash),
            not(char("{")),
            not(quote),
            regular_char)));

If regular_char was just the mainstream characters, and we only added to it, I expect that many toolchains would have an easier life. As that's closer to the lexer, or a lookahead less.

@stasm
Copy link
Contributor Author

stasm commented May 23, 2018

Do you have any concrete architectures in mind?

I guess I don't really follow why the parser would need to go into each of the either's alternates. I'd imagine it would rather do something like if is_text_char(c) && c !== "\"". In fact, that's what fluent-syntax currently does.

@stasm
Copy link
Contributor Author

stasm commented May 23, 2018

Since this doesn't affect the syntax nor the AST, would you be OK moving this out of Syntax 0.6 into FUTURE?

@stasm stasm added this to To do in Syntax 0.8 via automation May 25, 2018
@stasm stasm removed this from To do in Syntax 0.6 May 25, 2018
@Pike
Copy link
Contributor

Pike commented Sep 21, 2018

I've taken an additional look at this from the perspective of regex-based parsers, and for them, positive and negative look-aheads work great.

I think this is WONTFIX, and stas suggested the same in our Friday EOW Fluent call. Resolving as such.

@Pike Pike closed this as completed Sep 21, 2018
Syntax 0.8 automation moved this from To do to Done Sep 21, 2018
@Pike Pike removed this from Done in Syntax 0.8 Sep 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants