Indentation aware parsers #45

Closed
Marwes opened this issue Aug 21, 2015 · 8 comments

@Marwes
Owner

Marwes commented Aug 21, 2015

It would be nice if there was an easy way to construct indentation-aware parsers, as in http://hackage.haskell.org/package/IndentParser-0.2.1 (Disclaimer: I haven't used this library seriously, so I don't know how easy it is to use or how well it works).

This would likely be in a separate crate such as https://github.com/Marwes/combine-language.

@tailhook

I'm curious. For my needs, combine works just fine with indentation-aware syntax: I provide a tokenizer that emits indent and unindent tokens. As far as I know, Python handles indentation at the tokenizer level, as do most implementations of YAML.

So the question is: how often do you actually work on raw bytes without any tokenizer? I have always thought a tokenizer is required for any serious work (i.e. for anything more complex than parsing 1+2/3). It's especially important because combine is too slow (to compile), and trying to use it on raw chars will probably make compilation times even slower.
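
A minimal sketch of the tokenizer approach described in this comment, written in plain Rust without combine. The names `Token`, `Indent`, `Dedent`, `Line`, and `tokenize` are hypothetical and are not taken from combine or from any project linked in this thread. The sketch assumes space-only indentation and treats each non-blank line as a single token; a real tokenizer would split lines further.

```rust
/// Tokens produced by the layout-aware tokenizer (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    /// The current line is indented deeper than the previous one.
    Indent,
    /// One previously opened indentation level has ended.
    Dedent,
    /// The text of a non-blank line, without surrounding whitespace.
    Line(String),
}

/// Turn indented source text into a flat token stream.
/// Assumption: indentation is measured in leading spaces only.
/// Returns an error if a line dedents to a column that was never opened.
pub fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    // Stack of currently open indentation widths; column 0 is always open.
    let mut stack: Vec<usize> = vec![0];

    for (idx, raw) in src.lines().enumerate() {
        let content = raw.trim_start_matches(' ');
        // Blank lines do not affect the layout.
        if content.trim().is_empty() {
            continue;
        }
        let width = raw.len() - content.len();
        let current = *stack.last().unwrap();
        if width > current {
            // Deeper than the enclosing block: open a new level.
            stack.push(width);
            tokens.push(Token::Indent);
        } else {
            // Shallower: close levels until we are back at a known column.
            while width < *stack.last().unwrap() {
                stack.pop();
                tokens.push(Token::Dedent);
            }
            if width != *stack.last().unwrap() {
                return Err(format!("line {}: inconsistent dedent", idx + 1));
            }
        }
        tokens.push(Token::Line(content.trim_end().to_string()));
    }
    // Close every block still open at end of input.
    while stack.len() > 1 {
        stack.pop();
        tokens.push(Token::Dedent);
    }
    Ok(tokens)
}
```

With this in place, the parser only ever sees `Indent` and `Dedent` as ordinary tokens, so the grammar itself does not need to track columns.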

@Marwes
Owner Author

Marwes commented Aug 21, 2015

I guess this may just be me throwing the idea out there, as I am thinking about moving to an indentation-based syntax for https://github.com/Marwes/embed_lang. I am aware you can handle indentation in the lexer, as I have done it that way for Haskell: https://github.com/Marwes/haskell-compiler/blob/master/src/lexer.rs#L384-L477. It is not really trivial, though, so it would be nice if there was a ready-made way to do it through a library.

I have only written parsers that work directly with char, and while it's not as efficient as using a separate lexer, it does work, and I find it keeps the parser + lexer simple and easy to modify. If or when I actually need a separate lexer, I think it should be easy to move over to that as well. (https://github.com/Marwes/embed_lang/blob/master/parser/src/lib.rs)

It's nice to see someone who has a dedicated lexer though, got any link to that? I am hoping that #37 will make it a bit easier to add a lexer; it would be nice to see an example of a working one.

@hawkw
Contributor

hawkw commented Aug 21, 2015

This would definitely be a nice feature to have. I keep meaning to add support for I-expressions to my Scheme parser, and built-in support for indentation-sensitive syntax would make that a lot less work.

@tailhook

> It's nice to see someone who has a dedicated lexer though, got any link to that?

https://github.com/tailhook/marafet/tree/master/marafet_parser/src

It's a little bit shitty, because I've tried to quickly port it to the new features (in particular Positioner and Range) without getting a real understanding of how they are supposed to work.

@Marwes
Owner Author

Marwes commented Aug 22, 2015

@tailhook Nice, just open an issue if you have trouble understanding Positioner and Range. I should probably add some better docs for those. Anyway, for Range you don't need to invent a dummy type; just use the same type you have for Item. Range is only meant to give RangeStream a way of storing errors.

Marwes closed this as completed on Nov 20, 2018
@Marwes
Owner Author

Marwes commented Nov 20, 2018

Out of scope.

@rtfeldman

I came across this issue because I've been writing a parser with combine (and really enjoying it!) and was wondering about the best approach for making it indentation-sensitive.

I totally get that first-class support for this is out of scope, but I'm wondering if there's a recommended approach?

Thanks for a lovely library!

@Marwes
Owner Author

Marwes commented May 4, 2019

@rtfeldman gluon is indentation sensitive, but it is quite a mess, really: https://github.com/gluon-lang/gluon/blob/master/parser/src/layout.rs

The basic idea is that as you scan the input text you emit block open/block close tokens in between the normal, visible tokens. Then the parser is written to match on those tokens.

Other than that, just google around, I think. I don't have any good resources for it, unfortunately.
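
The parser half of the idea above (matching on block open/block close tokens that were emitted between the visible tokens) could look roughly like the sketch below. It is plain Rust and does not reproduce gluon's `layout.rs` or use combine; `Tok`, `Node`, `parse_items`, and `parse` are hypothetical names chosen for illustration, and the token stream is assumed to come from an earlier layout pass.

```rust
/// Token stream after the layout pass (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    BlockOpen,
    BlockClose,
    Word(String),
}

/// Tree built by the parser: a leaf word or a nested block of items.
#[derive(Debug, PartialEq)]
pub enum Node {
    Leaf(String),
    Block(Vec<Node>),
}

/// Parse items until the enclosing block closes or the input ends.
/// `pos` is advanced past everything that was consumed.
fn parse_items(toks: &[Tok], pos: &mut usize) -> Result<Vec<Node>, String> {
    let mut items = Vec::new();
    while let Some(tok) = toks.get(*pos) {
        match tok {
            Tok::Word(w) => {
                items.push(Node::Leaf(w.clone()));
                *pos += 1;
            }
            Tok::BlockOpen => {
                *pos += 1;
                // Recurse for the nested block, then require its close token.
                let children = parse_items(toks, pos)?;
                match toks.get(*pos) {
                    Some(Tok::BlockClose) => *pos += 1,
                    _ => return Err("unclosed block".to_string()),
                }
                items.push(Node::Block(children));
            }
            // Leave the close token for the caller that opened the block.
            Tok::BlockClose => break,
        }
    }
    Ok(items)
}

/// Parse a whole token stream; a stray close token at top level is an error.
pub fn parse(toks: &[Tok]) -> Result<Vec<Node>, String> {
    let mut pos = 0;
    let items = parse_items(toks, &mut pos)?;
    if pos != toks.len() {
        return Err(format!("unexpected block close at token {}", pos));
    }
    Ok(items)
}
```

Because the layout decisions are already encoded as tokens, the parser treats `BlockOpen` and `BlockClose` exactly like explicit braces, which is why this split keeps the grammar itself indentation-agnostic.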
