py/lexer: don't throw an IndentationError on input that starts with CR #3063

tomlogic · 2017-05-04T23:36:24Z

I ran into this odd behavior when pasted code starts with a completely blank line.

MicroPython v1.8.7-609-ge31fbd9 on 2017-04-10; PYBv1.1 with STM32F405RG
Type "help()" for more information.
>>>
paste mode; Ctrl-C to cancel, Ctrl-D to finish
===
=== print('foo')
===
Traceback (most recent call last):
  File "<stdin>", line 1
IndentationError: unexpected indent
>>>

MicroPython v1.8.2 on 2016-07-13; PYBv1.1 with STM32F405RG
Type "help()" for more information.
>>> 
paste mode; Ctrl-C to cancel, Ctrl-D to finish
=== 
=== print('hello world')
Traceback (most recent call last):
  File "<stdin>", line 1
IndentationError: unexpected indent
>>>

After investigation, I found it was an issue with how lexer.c starts processing. This pull request seems to address the issue in manual testing I've done.

tomlogic · 2017-05-05T00:16:45Z

CI failure because core code size increased by 20 bytes.

tomlogic · 2017-05-05T00:18:29Z

py/lexer.c

    // if input stream is 0, 1 or 2 characters long and doesn't end in a newline, then insert a newline at the end
    if (lex->chr0 == MP_LEXER_EOF) {
        lex->chr0 = '\n';
    } else if (lex->chr1 == MP_LEXER_EOF) {
-        if (lex->chr0 == '\r') {


Note that we can remove this condition because convert_crlf() ensures chr0 is not '\r'.

dpgeorge · 2017-05-05T00:19:19Z

Thanks for the report. This is indeed a bug wrt to CPython behaviour, which the following simple test shows:

exec('\rprint(1)')

This fails on uPy.

I think there's a simpler (smaller) way to fix it, by calling next_char() in mp_lexer_new().

tomlogic · 2017-05-05T00:36:51Z

Ah, maybe put a dummy character in chr0 and initialize lex->column to 0 so it increments to 1 in the call to next_char()? That could simplify the EOF handling in mp_lexer_new() because next_char() has already tested lex->chr2. I'll explore that in the morning.

…ndling.

tomlogic · 2017-05-05T16:33:45Z

@dpgeorge, much happier with this updated version. Cleaner to rely on next_char() for all EOL/EOF handling, even at the start of the lexed code.

tomlogic · 2017-05-05T17:18:55Z

Not so great if I break blank lines in the REPL. Sorry for not testing better before pushing.

…ew().

tomlogic · 2017-05-05T17:43:05Z

I had used this for my testing, but should have run some test cases through the raw REPL or paste mode.

exec('\nprint(1)')
exec('\rprint(2)')
exec('\r\nprint(3)')
exec('\tprint(4)')
exec('\n5')
exec('\r6')
exec('\r\n7')
exec('\n')
exec('\r')
exec('\r\n')
exec('\t')
exec('')

Definitely squash and merge for this one. I really need to dig into the automated tests and start adding tests for what I'm working on.

dpgeorge · 2017-05-06T00:21:39Z

py/lexer.c

-    // preload characters
-    lex->chr0 = reader.readbyte(reader.data);
+    // load lexer with start of file, advancing lex->column to 1
+    lex->chr0 = 0;    // dummy character burned off by next_char()


You could make this a '\n', and initialise lex->line to 0, and then next_char() would handle correctly the case of an input stream of 0 chars. Then below you just need to handle the case of an input stream of 1 char.

dpgeorge · 2017-05-06T00:23:00Z

Regarding the tests you have above, please add them (in a separate commit as part of this PR) to the file tests/basics/lexer.py (there's a section called "short input" there).

Adds coverage for issues fixed in pull request 3063.

Now consistently uses the EOL processing ("/r" and "/r/n" convert to "/n") and EOF processing (ensure "/n" before EOF) provided in next_char().

tomlogic · 2017-05-09T00:38:54Z

@dpgeorge, I've added unit tests, and reverted py/lexer.c to the various versions with bugs (original and mine that broke for 0-byte input) and confirmed that the unit test failed.

I reviewed execution paths, and don't think changing chr0 to '\n' would have helped. The cleanest solution seems be priming the lexer with three nulls (can't be newline, tab or EOF marker) and then making three calls to next_char() to flush the dummy bytes and provide proper EOF handling (since next_char() will look at each character as it moves through the chr2 position).

dpgeorge · 2017-05-09T04:50:28Z

That's great, thanks @tomlogic! It shaves off 48 bytes from bare-arm, which is not easy to do these days. Squashed and merged in 2998647

py/lexer: don't throw an IndentationError on input that starts with CR

8a4e591

tomlogic commented May 5, 2017

View reviewed changes

py/lexer: Use next_char() in mp_lexer_new() for consistent EOL/EOF ha…

cc58077

…ndling.

py/lexer: Special handling of 0-byte and 1-byte streams in mp_lexer_n…

a31f6b1

…ew().

dpgeorge reviewed May 6, 2017

View reviewed changes

tomlogic added 2 commits May 8, 2017 17:31

tests/basics/lexer: Add lexer tests for input starting with newlines.

dcfb15d

Adds coverage for issues fixed in pull request 3063.

py/lexer: Simplify lexer startup by using dummy bytes and next_char().

da4dab7

Now consistently uses the EOL processing ("/r" and "/r/n" convert to "/n") and EOF processing (ensure "/n" before EOF) provided in next_char().

dpgeorge closed this May 9, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

py/lexer: don't throw an IndentationError on input that starts with CR #3063

py/lexer: don't throw an IndentationError on input that starts with CR #3063

tomlogic commented May 4, 2017

tomlogic commented May 5, 2017

tomlogic May 5, 2017

dpgeorge commented May 5, 2017

tomlogic commented May 5, 2017

tomlogic commented May 5, 2017

tomlogic commented May 5, 2017

tomlogic commented May 5, 2017

dpgeorge May 6, 2017

dpgeorge commented May 6, 2017

tomlogic commented May 9, 2017

dpgeorge commented May 9, 2017

py/lexer: don't throw an IndentationError on input that starts with CR #3063

py/lexer: don't throw an IndentationError on input that starts with CR #3063

Conversation

tomlogic commented May 4, 2017

tomlogic commented May 5, 2017

tomlogic May 5, 2017

Choose a reason for hiding this comment

dpgeorge commented May 5, 2017

tomlogic commented May 5, 2017

tomlogic commented May 5, 2017

tomlogic commented May 5, 2017

tomlogic commented May 5, 2017

dpgeorge May 6, 2017

Choose a reason for hiding this comment

dpgeorge commented May 6, 2017

tomlogic commented May 9, 2017

dpgeorge commented May 9, 2017