Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Version 2 discussion #230

Closed
wavebeem opened this issue Mar 10, 2018 · 19 comments
Closed

Version 2 discussion #230

wavebeem opened this issue Mar 10, 2018 · 19 comments

Comments

@wavebeem
Copy link
Collaborator

wavebeem commented Mar 10, 2018

The issue is for brainstorming what we can do in a 2.x version of Parsimmon to make it easy to port code from 1.x but clean up any rough edges that have evolved since the library's inception.

Ideas

Change our official supported environments

I'll be honest. I haven't tested any releases in IE7, and I don't plan on doing that ever either. Setting up browser testing is expensive and probably not something I'll end up doing for the project.

I think it would be best to state that we only routinely supported Node.js LTS releases, but that it should work in ES5 browsers.

Switch to TypeScript

TypeScript is really good at this point and I'm confident we could statically type probably all of Parsimmon, or at least enough of it that it's really helpful and maybe redesign some things that don't statically type well.

The current static types contributed by the DefinitelyTyped project are a bit out of date, and are going to be even more inaccurate once version 2 comes out.

Also this means we can use some modern features like classes and arrow functions but still compile down to ES5 for maximum support.

Rethink the documentation

The main problem with the current documentation is that it's referenced straight from the master branch in the repo, so people can see documentation for features that have not been released yet.

The other problem is that it's not as well organized as it could be since it's a simple markdown file. If we turned it into a web page we could make it much easier to read.

I think the best way to do it would be to use TypeDoc to generate a JSON file from code comments and write a small website to use that. This is the strategy I used with Ramda (except that I used JSDoc there not TypeDoc).

Breaking API changes

Generally just think about all the breaking change issues here and if they still make sense and if there's any other little tweaks like this.

Debugging

Parsimmon parsers are hard to debug. The callstack is filled with just Parsimmon._ functions the whole way usually, with no good way to tell where parser construction failed.

Tracking the flow of parsing is also difficult because no tools are provided.

See #193 and #143 for debugging ideas.

@bd82
Copy link
Contributor

bd82 commented Mar 10, 2018

I think it would be best to state that we only routinely supported Node.js LTS releases, but that it should work in ES5 browsers.

👍 TypeScript will help with that as you define a compilation target and your code won't compile
if you use APIs not available in ES5 (e.g. String.startsWith)

This is the strategy I used with Ramda (except that I used JSDoc there not TypeDoc).

Interesting, I use the TypeDoc generated webpage directly for API docs
But I would definitively prefer something like the Ramda webpage for APIs

For websites created from markdown files have a look at

To avoid the docs being "too new" you can only re-generate the website
only when performing an official release.

Setting up browser testing is expensive and probably not something I'll end up doing for the project.

SauceLabs has a limited free offering for open source projects.
and you get a pretty badge too:
Sauce Test Status

It is not trivial to setup but if you can get your tests bundled for running inside a browesr
and run them with karma it is do-able, particularly as you can base it on a working example:
https://github.com/SAP/chevrotain/blob/master/karma_sauce.conf.js

@MaxGabriel
Copy link

My only comment on typescript improvements is that current createLanguage does not generate types for its members. Standalone functions do have types though

@wavebeem
Copy link
Collaborator Author

Yeah, I'm not 100% sure if that's possible or not, but maybe since TS has a lot more features now. Currently the only type definitions are from the DefinitelyTyped project and have no direct link to this library and are very outdated :(

@leonardfactory
Copy link
Contributor

TypeScript is really good at this point and I'm confident we could statically type probably all of Parsimmon, or at least enough of it that it's really helpful and maybe redesign some things that don't statically type well.

Surely typescript is pretty well suited these days. Variadic generics & inferrables should adapt well to this library, even if I should dig deeper to check this. Having the parser in TS would be awesome.

The main problem with the current documentation is that it's referenced straight from the master branch in the repo, so people can see documentation for features that have not been released yet.

I've been playing with docusaurus lately, and it seems cool to me. Multi-version support is available, generate pages by markdown, is used everywhere at fb and configuration effort should be minimal.

Parsimmon parsers are hard to debug. The callstack is filled with just Parsimmon._ functions the whole way usually, with no good way to tell where parser construction failed.
Tracking the flow of parsing is also difficult because no tools are provided.

This is absolutely an important point. Not sure about how to implement it anyway, but we should think about giving nice stack info, showing the parsing tree (correct indentation, etc.), deducing useful names, etc. I'll think about it, hoping to give back something

@minamorl
Copy link
Contributor

Nice idea. It will be breaking change. Are you thinking to provide Parsimmon2 as different repository? I think it'd be better than developping in this repository because @types is already existing so it may cause confusion for the users.

@wavebeem
Copy link
Collaborator Author

wavebeem commented Mar 18, 2019 via email

@sveyret
Copy link

sveyret commented May 11, 2020

Oh, will that cause a conflict? I hadn't thought of that.

No, it won't. The DefinitlyTyped definition is for v1, and if v2 doesn't need it, then there will be no need to add @types. If v1 becomes deprecated, the DefinitlyTyped definition will simply be removed. This is how it is done on other projects either fully converting to TypeScript or adding their own TypeScript definition files.

The confusion may be if users are currently using v1 with @types and want to migrate to v2 and may forget to remove @types. But the semver process says that when updating major version, we should check for breaking changes…

@wavebeem
Copy link
Collaborator Author

I've spent some time over the last month working on a new library called bread-n-butter (bnb), a spiritual successor to parsimmon, that addresses some of these concerns:

👉 https://wavebeem-bread-n-butter.netlify.app

I've written a point-by-point summary of how Parsimmon v2 thoughts influenced the design of bnb below.

My question to anyone in this thread with the time to look at stuff is, would something like bnb be of interest to you? Parsimmon has a lot of little functions that don't seem that useful in it. Sort of like how Lodash has tripled in size since it was forked from Underscore... but most people are still using the same few functions from it.

A quick note on the life of Parsimmon

This is not me announcing that Parsimmon is dead or on life support, but I have felt a little trapped by the current state of Parsimmon. bnb has allowed me to work on parser combinators with a fresh perspective.

I want to test the waters with what people want to see out of v2. Parsimmon seems to be the only major player in the parser combinators space in JS, so I don't want to just release a version 2 that makes everyone mad.

Change our official supported environments

bnb requires ES2015 suppport.

Switch to TypeScript

No more out of date types, no overhead on library users to contribute back to DefinitelyTyped. This was also really helpful while developing the library as well.

Rethink the documentation

I tried using TypeDoc, but the resulting website was just not as well organized as I would like. So while I'm still providing brief TSDoc comments so IDE users can get descriptions populated in their intellisense, most documentation lives in markdown files and is rendered by 11ty.

There's still the problem that the master branch is what's shown on the website, so it's possible to see docs for a feature before it exists in npm.

Breaking API changes

I shrunk the API drastically. parsimmon supports a lot of niche features (fantasy land, binary parsing, built-in parsers that people normally replace, character based parsers like oneOf or noneOf or test instead of just using regexp).

I could probably stand to add a few more convenience methods in, but overall it feels fairly comfortable to work in already, and is a much smaller library.

Custom parsers are even easier to write now, in the rare occasion you should need to.

bnb has 8 value exports, 9 type exports, and 16 methods or properties on the Parser class.

parsimmon has over 50 value exports, 36 parser methods (and several odd aliases for fantasy land compat)

Debugging

I still haven't dug too deeply into this. One thing holding parsimmon back was IE7 support meant we couldn't even rely on console.log existing (though I guess we could've done feature tests for this).

For now, I think .map((x) => { console.log("...", x); return x; }) is acceptable for debugging, but I might add in .tap to simplify that.

I'll be honest that I still haven't spent the time to look into the Scala parser combinators debugging support to figure out what kind of stuff we could add.

@hillin
Copy link

hillin commented Dec 5, 2021

Some thoughts based on my experience:

Non-discarding API

When using Parsimmon to implement a tokenizer, we often want to keep all the tokens, even for the trivial ones (i.e. whitespaces and separators), in order to create a high-fidelity AST which can be mapped back to the original source. This renders some of the functions less useful, e.g. P.sepBy which discards the separators. If these APIs are to be kept in v2, I 'd suggest to redesign them as non-discarding, or provide a non-discarding version of them.

Improved indexing and range marking

This is already extensively discussed in #331.
Furthermore, in my experience we either want to mark the range of everything, or none of them. So it would be great if we can have a, say, tokenization mode which always yields the range of parse results automatically.

API trimming

Some APIs are just auxiliary wrappers of Parsimmon.regexp, I feel they can be removed (or provided in a separate utility package). The core API should only provide things that are externally irreplaceable or expensive-to-implement. For example, Parsimmon has digit (/\d/) and digits (/\d*/), but none of them are really useful to us because we find it much more often to parse /\d+/ instead. If we are going to improve this by adding yet another auxiliary API (maybe rename the original digits to optDigits and call this one digits)`, I'd prefer to remove all of them from the lib.
Edit: I see these APIs are already removed in bread-n-butter 👍.

Debugging

A parser combinator is a state machine, so I think the most important part of debugging is to track the state transitions. This can be done by using immutable states and storing all of them (much like the flux/redux way) for further analysis, by reading as log entries or via additional tooling.

@hillin
Copy link

hillin commented Dec 16, 2021

More on debugging:

We are working on a complex language with a heavily modified bnb, and explored debugging with ReduxDevTools as suggested above. Here is the outcome:

const dateParser = bnb.all(
  bnb
    .choice(
      bnb.text('Jan'),
      bnb.text('Feb'),
      bnb.text('Mar'),
      bnb.text('Apr'),
      bnb.text('May'),
      bnb.text('Jun'),
      bnb.text('Jul'),
      bnb.text('Aug'),
      bnb.text('Sep'),
      bnb.text('Oct'),
      bnb.text('Nov'),
      bnb.text('Dec')
    )
    .debug('month'),
  bnb.match(/\s+/),
  bnb.match(/\d{1,2}/).debug('day'),
  bnb.text(','),
  bnb.match(/\s*/),
  bnb.match(/\d{4}/).debug('year')
);

dateParser.parse('Aug 27, 1983', true);

The debug() method was added as a custom parser, which calls ReduxDevTools to log states (detailed in this blog post).
Here is a screenshot of the ReduxDevTools panel in chrome:

image

For us this is more than sufficient to debug our parser.
I'm sorry as the heavily modified bnb is currently deeply coupled with our project so we can not share it for now, but I hope the concept is simple enough to grab.

@anko
Copy link
Contributor

anko commented Dec 17, 2021

More related experimental thoughts on debugging, which I'm not sure if I wrote down last time it came up:

I wrote a p.debug combinator for partser which prints ANSI-coloured terminal output, with what I guess is a similar structure to @hillin's RDT thing, but more compact?

const parser = p.times(
  p.alt([p.string('ba'), p.string('na')]),
  0, 3)
const parserWithDebug = p.debug(parser)
const result = parserWithDebug('banana')
console.log(result)

terminal output

Can wrap it around any parser, and it prints everything that happens in that "sub-tree". The ? lines are printed when a parser is entered, and others when a parser is exited. The short window on the very left shows the current index in grey when entering a parser, and the consumed or rejected characters in green or red when a parser exits, with a configurable number of input stream context on either side.

Of course this only works in a console, and it's extremely noisy if used at top-level on a deeply nested real-world parser, but this is the best I could come up with for use in non-interactive console context. I think it's clearer than JSON logs, but I'm not sure by how much. 😅 It's effectively a continuous call stack visualisation, but filtered to just parser calls, and with corresponding context on parse progress.

The main commit for the implementation was anko/partser@f2cdec4. In short, I added an optional debug handler function to parser calls, which is called before and after parse logic; https://github.com/anko/partser/blob/870216a31c7f0943db8806b79c99b13d89e15c48/index.js#L76-L81. Parsers pass that handler function around when they call each other. All the fancy coloured indented terminal stuff is just the default debug handler function, which users can override with their own function (e.g. https://github.com/anko/partser/blob/870216a31c7f0943db8806b79c99b13d89e15c48/test.js#L856-L873) to handle the debug information in a different format.

It's nifty, but I'm not sure if it's actually a good idea.

  • Do big pretty-printing routines really belong in a parser library, or should it be restricted to just the un-opinionated "pass a debug handler function" API?
  • Could we instead be pre-empting further "pass data around between parsers" needs by just adding an optional user-data object to .parse calls (which e.g. P.map and P.chain parsers could access), and implement debugging through that?

@fresheneesz
Copy link

fresheneesz commented Feb 10, 2022

So I caught the parser bug and wrote my own library as well in the past couple weeks I called parsinator.js. It supports state, which was the main thing I was missing from parsimmon. I also built in similar debug trace console output ability to what you did @anko .

fail-debug

In parsinator.js, you can set it into debug mode, which causes the parsers to build a debug record which can then be displayed by other code that's not in the core library. Anko, it seems like it should be pretty easy to split out the debug display stuff from your main library so people can decide whether or not to include it. I do like how you have the handle debug method called before and after parsing. That seems like a better approach than I used in my library, which is slightly complicated even if it isn't that much code.

I'm curious about your right hand display of the text. What happens when the text is longer than a single word? Like an entire file? I guess you'd just sweep through with a constant sized window? Its a pretty cool way to display that. I also like that you have opening and closing lines for your combinators. It makes it more intuitive to read than my library's output.

@halfzebra I used your idea from #143 for the result display. And @rstacruz might be interested in the trace output from my l library and Anko's library.

One open question I have is how to handle state. Currently the state is stored with the rest of the context and so propagates in the same way as the current index - ie successful parsers pass any mutated state to all future parsers (while failed parsers don't). But I'm wondering if it should only propagate state to sub parsers and siblings, and not be propagated up the tree. Wondering if you have any thoughts on that @wavebeem or anyone.

@wavebeem
Copy link
Collaborator Author

I haven't really spent much of any time thinking about this, to be honest (this issue has been open since 2018 lol). On the few occasions I needed state, I used parameterized parsers instead.

@anko
Copy link
Contributor

anko commented Feb 12, 2022

@fresheneesz

I'm curious about your right hand display of the text. What happens when the text is longer than a single word? Like an entire file? I guess you'd just sweep through with a constant sized window?

Yep, that's correct. I wrote slice-with-context for that. The context window's size and left-right bias are configurable. The readme has examples.


And then lots of thoughts on state:

One open question I have is how to handle state. Currently the state is stored with the rest of the context and so propagates in the same way as the current index - ie successful parsers pass any mutated state to all future parsers (while failed parsers don't).

I've implemented exactly this in partser: The parse function takes both a string and an optional environment value, which is recursively passed to all sub-parsers, and to user functions for P.map, P.chain, P.from, etc.

But I'm wondering if it should only propagate state to sub parsers and siblings, and not be propagated up the tree. Wondering if you have any thoughts on that

You've clearly run into the same problems that I have, which is encouraging! 😄

I've found that in both cases (sub-, and sibling parsers) both options (propagate, and don't-propagate) need to be supported, because there are use-cases for each. Sometimes, a sub-parser represents a separate inner scope that needs a sub-environment, but sometimes it's just structurally necessary because you need to parse multiple pieces of a single thing. And sometimes, a sequence of parsers represents multiple things sharing the same environment which changes need to chain through the sequence of them, but sometimes again it's just structurally necessary because you need to parse a list of things that have no relation to each other.

In partser, I solved this such that by default the environment (or "state", terminology, same thing) object is passed as-is into sub-parsers, so every change to it appears everywhere, and sibling parsers are independent. But I added mechanisms to change those when necessary:

  • If you want to create an inner scope so modifications don't propagate up the tree, you can use P.subEnv, which lets a user function decide how the new sub-environment should be constructed from the current one.
  • If you want "sibling" parsers inside sequence combinators (P.seq and P.times) to pass their modified sub-environment from one to the next as those parsers succeed, you can pass an optional chainEnv function with which you can choose whether and how that previous environment is passed to the next.

This has worked well for me.

@fresheneesz
Copy link

I've found that in both cases (sub-, and sibling parsers) both options (propagate, and don't-propagate) need to be supported

That's an interesting point that the behavior might need to really depend on the specific situation. Do you have examples of cases where you've needed to use one or the other pattern? It would be interesting to see some places where you've used partser.

Wavebeem is right that anything you might use an environment or state for can be done by passing arguments, tho it would be cumbersome to pass the state to most every parser in your system. But I was thinking that perhaps the return value of parsers can be used in place of using state that can propagate "down" your parser call tree.

I'll have to think it over more. Thanks for your thoughts!

@ghost
Copy link

ghost commented Jul 11, 2022

Edited for disambiguation :)

Hello everyone. I have an update on the typescript docs, and a few drawbacks with using typescript.

I know that typescript has been praised for its type safety, but I (along with other typescript devs) notice that typescript falls apart in two ways: typescript requires devs to perform declaration merging when extending existing APIs; declaration merging can potentially cause type/module declarations to clash, similar to conflicting variables (e.g. which type declaration takes priority?). Another (potential) issue is P.createLanguage(); the issue is that you are trying to have two or more generic types with circular dependencies. I have created types that use circular dependencies, but not generic types [that use circular dependencies].

Now on to the updates. Typescript has new ways to build types; the docs can be found here. Below is a small example:

type Parsers = {
    [parser: string]: (r: Lang) => Parser<Any>
}

type Lang = {
    [ParserName in keyof Parsers]: ReturnType<Parser[ParserName]>
}

export function createLanguage(parsers: Parsers): Lang {
    //...
}

I hope this helps! Thank you all for making an incredible parser library!

@ghost
Copy link

ghost commented Jul 11, 2022

API trimming

Some APIs are just auxiliary wrappers of Parsimmon.regexp, I feel they can be removed (or provided in a separate utility package). The core API should only provide things that are externally irreplaceable or expensive-to-implement. For example, Parsimmon has digit (/\d/) and digits (/\d*/), but none of them are really useful to us because we find it much more often to parse /\d+/ instead. If we are going to improve this by adding yet another auxiliary API (maybe rename the original digits to optDigits and call this one digits)`, I'd prefer to remove all of them from the lib. Edit: I see these APIs are already removed in bread-n-butter 👍.

@hillin,

The only thing with your idea on separating APIs is that it may lead to the extensibility issue with typescript.

@ghost
Copy link

ghost commented Sep 13, 2022

Excuse me for barging in with my last two comments. I have a few suggestions. First, I would suggest switching to ESM or TS because CommonJS is, for the lack of a better word, slowly dying. My second suggestion answers @wavebeem's question:

My question to anyone in this thread with the time to look at stuff is, would something like bnb be of interest to you? Parsimmon has a lot of little functions that don't seem that useful in it. Sort of like how Lodash has tripled in size since it was forked from Underscore... but most people are still using the same few functions from it.

After running into cycle.js and, more importantly, xstream, I saw how xstream's streams have two different APIs: the first one is just like Parsimmon (stream.map(x => x + 5)), and the second one is a bit like RxJS' observable.pipe(), but in xstream, that method is called compose(), and it is only used for accessory operations. The key takeaway is that Parsimmon could use something similar to parser methods for core functionality and use something like parser.pipe() method for non-essential operators. That is how the library can be easy to use while improving extensibility.

Finally, I would like to thank the team for designing such an amazing library. I've been using Parsimmon on my biggest project, ExtVM. Part of the reason I mentioned phasing out CJS is because I've been thinking about switching ExtVM to ESM before CJS dies, but that's not an option right now because I don't know if Parsimmon v1 is compatible with ESM.

@wavebeem
Copy link
Collaborator Author

Closing this as it's obviously not happening 5 years later and without a maintainer. Whoever is in charge of this next can figure out what they want to do with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants