Indentation aware parsers #45

Marwes · 2015-08-21T09:04:35Z

It would be nice if there was an easy way to construct indentation aware parsers as in http://hackage.haskell.org/package/IndentParser-0.2.1 (Disclaimer: I haven't used this library seriously so I don't know how easy/well it works).

This would likely be in a separate crate such as https://github.com/Marwes/combine-language.

tailhook · 2015-08-21T12:08:10Z

I'm curious. For my needs combine works just fine with indentation-aware syntax, by providing a tokenizer which emits indent and unindent tokens. As far as I know, Python handles indentation at the tokenizer level, as do most implementations of YAML.
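For readers unfamiliar with the approach, here is a minimal sketch of such a tokenizer in plain Rust. It is not taken from marafet, Python, or combine; the names `Token` and `tokenize` are made up for illustration, and it assumes space-only indentation and treats each non-blank line as a single token.

```rust
/// Tokens produced by this hypothetical layout-aware tokenizer.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Indent,
    Dedent,
    Line(String),
}

/// Emits `Indent`/`Dedent` tokens whenever the number of leading spaces
/// changes. Blank lines are skipped; tabs are not treated as indentation.
pub fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    // Stack of indentation widths that are currently open; 0 is always at the bottom.
    let mut stack = vec![0usize];
    let mut tokens = Vec::new();

    for (idx, line) in src.lines().enumerate() {
        let text = line.trim_start_matches(' ');
        if text.trim().is_empty() {
            continue;
        }
        let width = line.len() - text.len();

        if width > *stack.last().unwrap() {
            // Deeper than the current block: open a new one.
            stack.push(width);
            tokens.push(Token::Indent);
        } else {
            // Shallower: close blocks until we are back at a known level.
            while width < *stack.last().unwrap() {
                stack.pop();
                tokens.push(Token::Dedent);
            }
            if width != *stack.last().unwrap() {
                return Err(format!("line {}: inconsistent dedent", idx + 1));
            }
        }
        tokens.push(Token::Line(text.trim_end().to_string()));
    }

    // Close every block that is still open at end of input.
    for _ in 1..stack.len() {
        tokens.push(Token::Dedent);
    }
    Ok(tokens)
}
```

A real tokenizer would split each line into finer tokens, but the indentation stack would work the same way. The parser then never has to look at whitespace.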

So the question is how often you actually work on raw bytes without any tokenizer? I have always thought a tokenizer is required for any serious work (i.e. for anything more complex than parsing 1+2/3). It's especially important because combine is too slow (to compile), and trying to use it on raw chars will probably make compilation times even slower.

Marwes · 2015-08-21T12:36:27Z

I guess this may just be me throwing the idea out there, as I am thinking about moving to an indentation-based syntax for https://github.com/Marwes/embed_lang. I am aware you can handle indentation in the lexer, as I have done it that way for Haskell https://github.com/Marwes/haskell-compiler/blob/master/src/lexer.rs#L384-L477. It is not really trivial though, so it would be nice if there was a ready-made way to do it through a library.

I have only done parsers which work directly with char, and while it's not as efficient as using a separate lexer, it does work and I find it makes the parser + lexer simple and easy to modify. If or when I actually need a separate lexer I think it should be easy to move over to that as well. (https://github.com/Marwes/embed_lang/blob/master/parser/src/lib.rs)

It's nice to see someone who has a dedicated lexer though, got any link to that? I am hoping that #37 will make it a bit easier to add a lexer; it would be nice to see an example of a working one.

hawkw · 2015-08-21T14:03:00Z

This would definitely be a nice feature to have. I keep meaning to add support for I-expressions to my Scheme parser, and built-in support for indentation-sensitive syntax would make that a lot less work.

tailhook · 2015-08-21T16:22:59Z

> It's nice to see someone who has a dedicated lexer though, got any link to that?

https://github.com/tailhook/marafet/tree/master/marafet_parser/src

It's a little bit shitty, because I've tried to quickly port it to new features (in particular Positioner and Range) without getting a real understanding of how they are supposed to work.

Marwes · 2015-08-22T07:39:45Z

@tailhook Nice, just open an issue if you have trouble understanding Positioner and Range; I should probably add some better docs for those. Anyway, for Range you don't need to invent a dummy type, just use the same type you have for Item. Range is only meant for RangeStream to have a way of storing errors.

Marwes · 2018-11-20T15:49:00Z

Out of scope.

rtfeldman · 2019-05-04T21:17:22Z

I came across this issue because I've been writing a parser with combine (and really enjoying it!) and was wondering about the best approach for making it indentation-sensitive.

I totally get that first-class support for this is out of scope, but I'm wondering if there's a recommended approach?

Thanks for a lovely library!

Marwes · 2019-05-04T21:34:25Z

@rtfeldman gluon is indentation sensitive, but its layout code is quite a mess, really: https://github.com/gluon-lang/gluon/blob/master/parser/src/layout.rs

The basic idea is that as you scan the input text you emit block open/block close tokens in between the normal, visible tokens. Then the parser is written to match on those tokens.
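As a rough illustration of the parser half of that idea (not gluon's actual code), the sketch below consumes a token stream that already contains block open/block close markers and builds a tree from it. `Token`, `Node`, `parse_items` and `parse` are hypothetical names, and the grammar (an item optionally followed by one nested block) is an assumption chosen to keep the example small.

```rust
/// Hypothetical token stream: visible items plus layout markers.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    BlockOpen,
    BlockClose,
    Item(String),
}

/// One item together with the items nested in the block below it.
#[derive(Debug, PartialEq)]
pub struct Node {
    pub name: String,
    pub children: Vec<Node>,
}

/// Parses a run of items, each optionally followed by a nested block.
/// Stops at the first token that is not an item and returns the rest.
fn parse_items(mut tokens: &[Token]) -> Result<(Vec<Node>, &[Token]), String> {
    let mut nodes = Vec::new();
    while let Some((Token::Item(name), rest)) = tokens.split_first() {
        tokens = rest;
        let mut children = Vec::new();
        // A block open right after an item starts that item's children.
        if let Some((Token::BlockOpen, rest)) = tokens.split_first() {
            let (inner, rest) = parse_items(rest)?;
            match rest.split_first() {
                Some((Token::BlockClose, rest)) => {
                    children = inner;
                    tokens = rest;
                }
                _ => return Err("expected block close".to_string()),
            }
        }
        nodes.push(Node { name: name.clone(), children });
    }
    Ok((nodes, tokens))
}

/// Parses a whole token stream; any leftover token is an error.
pub fn parse(tokens: &[Token]) -> Result<Vec<Node>, String> {
    let (nodes, rest) = parse_items(tokens)?;
    match rest.first() {
        None => Ok(nodes),
        Some(tok) => Err(format!("unexpected token {:?}", tok)),
    }
}
```

Because the layout markers behave like ordinary brackets, the parser stays context-free; all the indentation bookkeeping lives in the scanning step.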

Other than that, just google around, I think; I don't have any good resources for it, unfortunately.

Marwes added the enhancement label Aug 21, 2015

Marwes closed this as completed Nov 20, 2018
