Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Produce typed output from the parser #11

Closed
StoneCypher opened this issue Jan 22, 2018 · 27 comments
Closed

Produce typed output from the parser #11

StoneCypher opened this issue Jan 22, 2018 · 27 comments

Comments

@StoneCypher
Copy link
Contributor

StoneCypher commented Jan 22, 2018

It would be nice if my typescript parser had typescript annotations on what it produced.

Consider the following toy parser example.

Dumb bad code

Doc
  = str
  / num;
  
str
  = 'bob';
  
num
  = '2' { return 2; }
  / '3' { return 3; };

This allows three legal documents. It would be useful to be able to tell the plugin "announce that the result of this parse will be number | string."

Obviously in this case this is silly, but for something that returns highly typed content, that would actually be pretty powerful.

@pjmolina
Copy link
Contributor

The TS compiler already does it by default.
It is requires to statically analyze the functions to anticipate the possible types of output.
As pegjs was created with JS support in mind, it provides no way to decorate output types.
From ts-pegjs I do not see an easy way to incorporate such type info on top of pegjs specs in a way we can use it for constraining the output for the parsing functions.

@StoneCypher
Copy link
Contributor Author

This is actually really easy to do. Please don't close this

It's literally just a string injection

@pjmolina
Copy link
Contributor

Can you provide a sample to illustrate your idea?

@StoneCypher
Copy link
Contributor Author

a minimal garbage implementation would accept ' : number | string ' and then put that after the close parenthesis in the output for parse(foo) /* insert me here */ { ... }

@pjmolina
Copy link
Contributor

I see your point and I, like you would love to have a typed grammar parser generator.
On the other hand, pegjs targets JavaScript and not TypeScript (so types makes no sense, so far).
ts-pegjs is just a plugin over pegjs and cannot change the base DSL as defined on pegjs, it only can use the base API for plugins to consume the productions rules. As far as I known, we cannot read the comments in case you are looking for encoding the type info in comments.

Moreover, I am not big fan to encode semantics in comments (did it several times before and paid the penality of technical debt).

I think it makes more sense to create something like pegts than forcing the current pegjs design to accomodate TypeScript types or escalate this feature request for consideration in pegjs (could be useful also for a plugin targeting flow, for example).

@StoneCypher
Copy link
Contributor Author

it's really unfortunate that you're declining an easy immediate fix in hope of figuring out whether to create a whole new tool ☹️

@StoneCypher
Copy link
Contributor Author

i'm confused why a plugin for peg "to generate typescript parsers" answers to the request "please give this basic typescript functionality" is "this targets javascript, not typescript"

it's an easy one-off regular expression targeting the first closed parenthesis

@StoneCypher
Copy link
Contributor Author

if i produce a patch that does the job without causing a bad software situation, will it be considered?

@pjmolina
Copy link
Contributor

Maybe I was not clear enough on my comments:

  1. AFAIK there is no API for plugins to read comments from pegjs productions.
  2. If you see a way to fix it and implement it in a clear way, feel free to contribute with a Pull Request. Sure, I will consider it.

@pjmolina pjmolina reopened this May 25, 2018
@StoneCypher
Copy link
Contributor Author

i don't see any need for that type to come from the markup (in fact i see a lot of reasons that it shouldn't)

i think it should come from an argument to the compile call

that in turn suggests it can just be dropped into the tspegjs options object as a voluntary

@StoneCypher
Copy link
Contributor Author

would you consider an interface like this acceptable?

var parser = pegjs.generate("pos_integer = sint:[0-9]+ { return Number(sint); }", {
    plugins: [tspegjs]
    "tspegjs": {
        "noTslint": false,
        "tsReturnType": "number",
        "tslintIgnores": "rule1,rule2",
        "customHeader": "// import lib\nimport { Lib } from 'mylib';"
    }
});

@StoneCypher
Copy link
Contributor Author

the germane bit being the new "tsReturnType": "number" line as the second member of that options argument

@pjmolina
Copy link
Contributor

pjmolina commented May 25, 2018

Yes, options would be a suitable workaround solution if we cannot augment the base DSL.
But consider, we will need to pass it as a table/object with the production rule as key. Something like:

var parser = pegjs.generate("pos_integer = sint:[0-9]+ { return Number(sint); } ... ", {
    plugins: [tspegjs]
    "tspegjs": {
        "noTslint": false,
        "returnTypes": {
             "pos_integer": "number",
             "rule2": "string",
             "rule3": ... 
        },
       "tslintIgnores": "rule1,rule2",
        "customHeader": "// import lib\nimport { Lib } from 'mylib';"
    }
});

@StoneCypher
Copy link
Contributor Author

individual rules don't need to be typed, as the only thing that's ever exposed to the outside is parse/1

there's no circumstance under which they would ever be checked

@pjmolina
Copy link
Contributor

Sorry to disagree. Type checking if introduced, is useful for all productions: to find errors as soon as possible and not only in the top level rule.

Having a unique entry point is a valid use case. But consider, there are more use cases: several entrypoint rules for parsing.
I use This second case a lot.

If you only want to type the top level rule, consider wrapping the parse function inside another and you are done.

@StoneCypher
Copy link
Contributor Author

If you only want to type the top level rule, consider wrapping the parse function inside another and you are done.

So, wrapping it in a typecasting function is what I recommended in my other ticket about this, and what everyone does to cope, right now. It's very obvious.

It's not the same thing, though. And it's not as good. It's not an adequate replacement. You can't use the module production; you have to wrap or edit the module. You can't call parse; you have to either wrap every parser, or make sure everything has the same wrapper even if unnecessary, or you have to monkey-patch the module.

None of those are acceptable under definitely typed.

At this time, I can't register ts-pegjs stuff in definitelytyped. This change would allow me to.

A wrapper typecast isn't the same thing, and isn't as good.

.

Having a unique entry point is a valid use case. But consider, there are more use cases: several entrypoint rules for parsing.

I didn't know this was possible.

How does one do this?

I can easily support both the single and multiple entrypoints simultaneously. I just didn't realize this was a practical use case.

@StoneCypher
Copy link
Contributor Author

But consider, there are more use cases: several entrypoint rules for parsing.

I've wanted this for a very long time, and thought PEG didn't do this. This has the potential to radically change how I use the library

@pjmolina
Copy link
Contributor

You can enable multiple entry point rules with the option --allowed-start-rules see at doc.

@StoneCypher
Copy link
Contributor Author

has that been there a long time?

at any rate, my new opinion is that the object member should be either a string or an object

if a string, apply it to parse/1; if an object, apply them to the key matched signature

would that fit your preferences?

@StoneCypher
Copy link
Contributor Author

wait, this just means there are more than one start rule

i thought you meant this exposed more signatures than parse/1

if it's still just parse/1, isn't the end signature just the logical union of all the output types of your root rule collection?

@pjmolina
Copy link
Contributor

has that been there a long time?

Yes: that option is there for a long time.

When you use that option, you use it with parse/2 parse(input: string, options: any): any
in the following form:

 const res = <number> parse(inputString, { startRule: 'Integer' });

You are right, the output type of parse/2 will be the union of all types of the allowed rule start collection. And the TS compiler infers the type when possible. If we go, one step further and type intermediate production rules we are helping the type checker and finding errors faster.

If we add the type info for all the needed rules, we can still generate typed wrappers per entry rule like:

parseInteger(inputString: string, options: any): number {
    options = options || {}; 
    options.startRule = 'Integer';
    return <number> parse(inputString, options);
}
parseBool(inputString: string, options: any): boolean {
    options = options || {}; 
    options.startRule = 'Bool';
    return <boolean> parse(inputString, options);
}

I usually do the following: I use to generate Abstract Syntact Trees (AST), and everything returned by rules inherits from a base class: NodeAST making much more easy to predict and cast output type.

@StoneCypher
Copy link
Contributor Author

oh wow, so you can explicitly specify which rule it is, and it's just not by the signature

lol holy crap i've been re-hacking that on from the outside in a far less good way

i'm so glad you told me about that 😅

@StoneCypher
Copy link
Contributor Author

StoneCypher commented May 26, 2018

this would be more powerful if it actually exposed new methods rather than just decorating the call inbound, as i don't think typescript knows about conflicting return types

but also it's equally powerful to the thing i was asking for, so i'm not complaining

fundamentally, if given { foo: 'number', bar: 'string' }, i'm still always going to emit "number | string", right? because even though we're telling it what to call for, the argument is always a single string, so inbound is not enforcement worthy, and outbound doesn't differentiate in any way TS understands?

it's still worth writing that way to skip human maintenance of the union; just making sure we're on the same page about this

also i'm going to sleep after posting this; it's 2:30am here

@pjmolina
Copy link
Contributor

OK. Take a rest. Tomorrow morning take look to PR #16
I have a first minimal implementation to optionally type the production rules.

@pjmolina
Copy link
Contributor

Yes, your conclusions seems ok for me too:

  • If we use the generic parse/2 the return type will be a union of all allowed entrypoints. And no need to generate it, the TS compiler should discover it if it is not much complex.
  • If we generate specific wrappers per entrypoint rule, we can constraint the type to a concrete one and avoid the union.

@StoneCypher
Copy link
Contributor Author

nice

this is better than the implementation i had in mind

the only thing i would change - and it's fine without the change too - would be to accept either a string or an object, and if it's a string, to use the string as the return for parse

but that's no big deal and i can definitely just use {parse: 'Number'} instead, or whatever 😁

tyvm for sticking with this and implementing this :)

@pjmolina
Copy link
Contributor

the only thing i would change - and it's fine without the change too - would be to accept either a string or an object, and if it's a string, to use the string as the return for parse

Sorry: cannot change that. Options object is already an extension object in pegjs and ts-pegjs.
I will merge the changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants