Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

py/lexer: don't throw an IndentationError on input that starts with CR #3063

Closed

Conversation

tomlogic
Copy link
Contributor

@tomlogic tomlogic commented May 4, 2017

I ran into this odd behavior when pasted code starts with a completely blank line.

MicroPython v1.8.7-609-ge31fbd9 on 2017-04-10; PYBv1.1 with STM32F405RG
Type "help()" for more information.
>>>
paste mode; Ctrl-C to cancel, Ctrl-D to finish
===
=== print('foo')
===
Traceback (most recent call last):
  File "<stdin>", line 1
IndentationError: unexpected indent
>>>
MicroPython v1.8.2 on 2016-07-13; PYBv1.1 with STM32F405RG
Type "help()" for more information.
>>> 
paste mode; Ctrl-C to cancel, Ctrl-D to finish
=== 
=== print('hello world')
Traceback (most recent call last):
  File "<stdin>", line 1
IndentationError: unexpected indent
>>> 

After investigation, I found it was an issue with how lexer.c starts processing. This pull request seems to address the issue in manual testing I've done.

@tomlogic
Copy link
Contributor Author

tomlogic commented May 5, 2017

CI failure because core code size increased by 20 bytes.

py/lexer.c Outdated
// if input stream is 0, 1 or 2 characters long and doesn't end in a newline, then insert a newline at the end
if (lex->chr0 == MP_LEXER_EOF) {
lex->chr0 = '\n';
} else if (lex->chr1 == MP_LEXER_EOF) {
if (lex->chr0 == '\r') {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we can remove this condition because convert_crlf() ensures chr0 is not '\r'.

@dpgeorge
Copy link
Member

dpgeorge commented May 5, 2017

Thanks for the report. This is indeed a bug wrt to CPython behaviour, which the following simple test shows:

exec('\rprint(1)')

This fails on uPy.

I think there's a simpler (smaller) way to fix it, by calling next_char() in mp_lexer_new().

@tomlogic
Copy link
Contributor Author

tomlogic commented May 5, 2017

Ah, maybe put a dummy character in chr0 and initialize lex->column to 0 so it increments to 1 in the call to next_char()? That could simplify the EOF handling in mp_lexer_new() because next_char() has already tested lex->chr2. I'll explore that in the morning.

@tomlogic
Copy link
Contributor Author

tomlogic commented May 5, 2017

@dpgeorge, much happier with this updated version. Cleaner to rely on next_char() for all EOL/EOF handling, even at the start of the lexed code.

@tomlogic
Copy link
Contributor Author

tomlogic commented May 5, 2017

Not so great if I break blank lines in the REPL. Sorry for not testing better before pushing.

@tomlogic
Copy link
Contributor Author

tomlogic commented May 5, 2017

I had used this for my testing, but should have run some test cases through the raw REPL or paste mode.

exec('\nprint(1)')
exec('\rprint(2)')
exec('\r\nprint(3)')
exec('\tprint(4)')
exec('\n5')
exec('\r6')
exec('\r\n7')
exec('\n')
exec('\r')
exec('\r\n')
exec('\t')
exec('')

Definitely squash and merge for this one. I really need to dig into the automated tests and start adding tests for what I'm working on.

py/lexer.c Outdated
// preload characters
lex->chr0 = reader.readbyte(reader.data);
// load lexer with start of file, advancing lex->column to 1
lex->chr0 = 0; // dummy character burned off by next_char()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make this a '\n', and initialise lex->line to 0, and then next_char() would handle correctly the case of an input stream of 0 chars. Then below you just need to handle the case of an input stream of 1 char.

@dpgeorge
Copy link
Member

dpgeorge commented May 6, 2017

Regarding the tests you have above, please add them (in a separate commit as part of this PR) to the file tests/basics/lexer.py (there's a section called "short input" there).

Adds coverage for issues fixed in pull request 3063.
Now consistently uses the EOL processing ("/r" and "/r/n" convert to "/n") and EOF processing (ensure "/n" before EOF) provided in next_char().
@tomlogic
Copy link
Contributor Author

tomlogic commented May 9, 2017

@dpgeorge, I've added unit tests, and reverted py/lexer.c to the various versions with bugs (original and mine that broke for 0-byte input) and confirmed that the unit test failed.

I reviewed execution paths, and don't think changing chr0 to '\n' would have helped. The cleanest solution seems be priming the lexer with three nulls (can't be newline, tab or EOF marker) and then making three calls to next_char() to flush the dummy bytes and provide proper EOF handling (since next_char() will look at each character as it moves through the chr2 position).

@dpgeorge
Copy link
Member

dpgeorge commented May 9, 2017

That's great, thanks @tomlogic! It shaves off 48 bytes from bare-arm, which is not easy to do these days. Squashed and merged in 2998647

@dpgeorge dpgeorge closed this May 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants