support off-side rule languages #5

CircleCode · 2013-06-25T13:43:19Z

It seems Hoa\Compiler cannot parse Off-side rule languages.

It might be enough for the compiler to automatically add INDENT (respectively UNINDENT) tokens each time the indentation increases (respectively decreases) by one level.

The tricky part seems to be the mapping between spaces, tabs, and indent length…


CircleCode · 2013-06-25T13:45:56Z

Since lookahead is supported in tokens, maybe this can be done with some magic tokens… I'll investigate.

By the way, even if that is possible, it would mean one cannot skip \s, which makes parsing a little more tedious. So even if some clever tokens can do this, I think it would be great if the compiler itself could handle it.

CircleCode · 2013-06-25T13:48:36Z

After thinking about it: since INDENT (or UNINDENT) is relative to the previous line, it would require look-behind assertions, which I suppose are not supported (because tokens trim the text from the left).

CircleCode · 2013-11-25T08:51:18Z

Maybe there is something we can use from this paper: http://michaeldadams.org/papers/layout_parsing/LayoutParsing.pdf

CircleCode · 2013-12-02T17:39:16Z

Side note: here are the rules used by Python's lexer to add INDENT and DEDENT tokens (from http://docs.python.org/2/reference/lexical_analysis.html#indentation):

First, tabs are replaced (from left to right) by one to eight spaces such that the total number of characters up to and including the replacement is a multiple of eight (this is intended to be the same rule as used by Unix). The total number of spaces preceding the first non-blank character then determines the line’s indentation. Indentation cannot be split over multiple physical lines using backslashes; the whitespace up to the first backslash determines the indentation.

The indentation levels of consecutive lines are used to generate INDENT and DEDENT tokens, using a stack, as follows.

Before the first line of the file is read, a single zero is pushed on the stack; this will never be popped off again. The numbers pushed on the stack will always be strictly increasing from bottom to top. At the beginning of each logical line, the line’s indentation level is compared to the top of the stack. If it is equal, nothing happens. If it is larger, it is pushed on the stack, and one INDENT token is generated. If it is smaller, it must be one of the numbers occurring on the stack; all numbers on the stack that are larger are popped off, and for each number popped off a DEDENT token is generated. At the end of the file, a DEDENT token is generated for each number remaining on the stack that is larger than zero.
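To make the quoted rules concrete, here is a minimal Python sketch of them. It is not Hoa\Compiler code; the names `indent_width` and `layout_tokens`, the `("LINE", text)` token shape, and the choice to skip blank lines and raise `ValueError` on an inconsistent dedent are all assumptions made for illustration.

```python
def indent_width(line, tabsize=8):
    """Return the column reached by the leading whitespace of `line`."""
    width = 0
    for ch in line:
        if ch == " ":
            width += 1
        elif ch == "\t":
            # A tab advances to the next multiple of `tabsize`.
            width += tabsize - (width % tabsize)
        else:
            break
    return width


def layout_tokens(lines):
    """Yield INDENT, DEDENT and LINE tokens for an iterable of lines."""
    stack = [0]  # the initial zero is never popped
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines do not affect indentation
        width = indent_width(line)
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", None)
        else:
            # Pop every level deeper than the current line.
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", None)
            if width != stack[-1]:
                raise ValueError("inconsistent dedent: %r" % line)
        yield ("LINE", line.strip())
    # At end of input, close every block that is still open.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", None)
```

Note that the stack stores column widths rather than nesting depths, so a jump from column 0 to column 8 produces a single INDENT, while a drop from column 8 back to 0 can produce several DEDENT tokens at once.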

It seems not too hard to implement, but the difficulty is that this has to be mixed with the user-defined grammar.

If I find some time, I'll try to play with this

Note: since we are parsing the stream as a single string (and not line by line), we have to include newlines in our analysis, and these layout tokens have to take precedence over user-defined tokens.
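One way to picture this constraint is a pre-pass over the whole string that finds line starts itself and hands only the rest of each line to the user-defined tokens. The Python sketch below is an illustration under assumptions, not how Hoa\Compiler works: `split_layout`, the `CHUNK` and `NEWLINE` token names, and the `LINE_START` pattern are hypothetical, and the sketch ignores line continuations, multi-line strings, and brackets.

```python
import re

# One match per non-blank line: leading spaces/tabs, then the rest of the line.
LINE_START = re.compile(r"^([ \t]*)(\S[^\n]*)", re.MULTILINE)


def split_layout(source, tabsize=8):
    """Split one source string into layout tokens and raw chunks.

    CHUNK payloads are what a user-defined lexer would tokenize afterwards.
    """
    stack = [0]
    out = []
    for match in LINE_START.finditer(source):
        # str.expandtabs applies the "next multiple of tabsize" rule.
        width = len(match.group(1).expandtabs(tabsize))
        if width > stack[-1]:
            stack.append(width)
            out.append(("INDENT", ""))
        while width < stack[-1]:
            stack.pop()
            out.append(("DEDENT", ""))
        if width != stack[-1]:
            raise ValueError("inconsistent dedent at offset %d" % match.start())
        out.append(("CHUNK", match.group(2)))
        out.append(("NEWLINE", ""))
    # Close the blocks still open at the end of the string.
    out.extend(("DEDENT", "") for _ in stack[1:])
    return out
```

Because the layout pass runs first, INDENT, DEDENT and NEWLINE take precedence over user tokens by construction, and the user grammar can still skip ordinary whitespace inside each chunk.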

Hywan · 2017-08-22T14:42:51Z

Closing because it's old :-).

CircleCode mentioned this issue Jun 25, 2013

attempt to allow look-behind assertions in tokens #6

Closed

Hywan added the difficulty: hard label Aug 6, 2015

Hywan closed this as completed Aug 22, 2017
