py/lexer: don't throw an IndentationError on input that starts with CR #3063
CI failure because core code size increased by 20 bytes.
py/lexer.c
// if input stream is 0, 1 or 2 characters long and doesn't end in a newline, then insert a newline at the end
if (lex->chr0 == MP_LEXER_EOF) {
    lex->chr0 = '\n';
} else if (lex->chr1 == MP_LEXER_EOF) {
    if (lex->chr0 == '\r') {
Note that we can remove this condition because convert_crlf() ensures chr0 is not '\r'.
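To make that guarantee concrete, here is a minimal stand-alone sketch of a CR/LF normalization pass over a buffer. It is not MicroPython's convert_crlf(); the name normalize_eol() and its in-place buffer interface are assumptions made up for illustration. After it runs, no '\r' is left for the lexer to see.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the real convert_crlf()): normalize line
   endings in place. "\r\n" collapses to "\n" and a lone "\r" becomes "\n".
   The write index never passes the read index, so in-place is safe.
   Returns the new length. */
size_t normalize_eol(char *buf, size_t len) {
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            buf[out++] = '\n';
            if (i + 1 < len && buf[i + 1] == '\n') {
                i++; /* drop the LF of a CR LF pair */
            }
        } else {
            buf[out++] = buf[i];
        }
    }
    return out;
}
```

With this in place, an input starting with a bare CR such as "\rx = 1\r\n" becomes "\nx = 1\n", so the first character is an ordinary newline.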
Thanks for the report. This is indeed a bug wrt CPython behaviour, which the following simple test shows:
This fails on uPy. I think there's a simpler (smaller) way to fix it, by calling next_char() in mp_lexer_new().
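A rough sketch of the idea of loading the lexer through next_char() in mp_lexer_new(), using a toy lexer rather than MicroPython's actual mp_lexer_t. Everything prefixed toy_ (and TOY_EOF, standing in for MP_LEXER_EOF) is hypothetical. Burning off three dummy bytes with the column starting at (size_t)-2 is one assumed way to end up at column 1, not necessarily what this PR does.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EOF (-1) /* hypothetical stand-in for MP_LEXER_EOF */

/* Hypothetical, heavily simplified lexer state: a 3-character window
   (chr0..chr2) over an in-memory buffer plus position tracking. */
typedef struct {
    const char *src;
    size_t len, pos;
    int chr0, chr1, chr2;
    size_t line, column;
} toy_lexer_t;

/* Return the next raw byte, or TOY_EOF (repeatedly) once input is exhausted. */
static int toy_read(toy_lexer_t *lex) {
    return lex->pos < lex->len ? (unsigned char)lex->src[lex->pos++] : TOY_EOF;
}

/* Shift the window by one. CR and CR LF are converted to LF as a character
   enters chr1, and a '\n' is inserted before EOF if one is missing. */
static void toy_next_char(toy_lexer_t *lex) {
    if (lex->chr0 == '\n') {
        lex->line++;
        lex->column = 1;
    } else {
        lex->column++;
    }
    lex->chr0 = lex->chr1;
    lex->chr1 = lex->chr2;
    lex->chr2 = toy_read(lex);
    if (lex->chr1 == '\r') {
        lex->chr1 = '\n';
        if (lex->chr2 == '\n') {
            lex->chr2 = toy_read(lex); /* CR LF counts as one newline */
        }
    }
    if (lex->chr2 == TOY_EOF && lex->chr1 != TOY_EOF && lex->chr1 != '\n') {
        lex->chr2 = '\n';
    }
}

/* Load the window through toy_next_char() instead of reading raw bytes into
   chr0..chr2, so the first characters get the same CR/EOF treatment as the
   rest. Three dummy 0 bytes are burned off; column starts at (size_t)-2 and
   unsigned wraparound brings it to 1 after the three calls. */
void toy_lexer_init(toy_lexer_t *lex, const char *src, size_t len) {
    lex->src = src;
    lex->len = len;
    lex->pos = 0;
    lex->line = 1;
    lex->column = (size_t)-2;
    lex->chr0 = lex->chr1 = lex->chr2 = 0;
    toy_next_char(lex);
    toy_next_char(lex);
    toy_next_char(lex);
}
```

With this arrangement an input of just "\r" leaves chr0 == '\n' at line 1, column 1, so a leading CR arrives already converted, exactly like a CR anywhere else in the input.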
Ah, maybe put a dummy character in
@dpgeorge, much happier with this updated version. Cleaner to rely on
Not so great if I break blank lines in the REPL. Sorry for not testing better before pushing.
I had used this for my testing, but should have run some test cases through the raw REPL or paste mode.
Definitely squash and merge for this one. I really need to dig into the automated tests and start adding tests for what I'm working on.
py/lexer.c
// preload characters
lex->chr0 = reader.readbyte(reader.data);
// load lexer with start of file, advancing lex->column to 1
lex->chr0 = 0; // dummy character burned off by next_char()
You could make this a '\n', and initialise lex->line to 0, and then next_char() would handle correctly the case of an input stream of 0 chars. Then below you just need to handle the case of an input stream of 1 char.
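The line-numbering part of this suggestion can be illustrated with a cut-down, hypothetical lexer that tracks only position. Seeding chr0 with '\n' and line with 0 means the first advance goes through the ordinary newline branch and lands on line 1, column 1 with no special start-of-input code. The toy_pos_* and toy_advance names are invented for this sketch, and the real next_char() does more (CR conversion, EOF newline insertion).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EOF (-1) /* hypothetical stand-in for MP_LEXER_EOF */

/* Hypothetical one-character-lookahead lexer, reduced to position tracking. */
typedef struct {
    const char *src;
    size_t len, pos;
    int chr0;
    size_t line, column;
} toy_pos_lexer_t;

/* Consume chr0: update line/column based on it, then load the next byte. */
static void toy_advance(toy_pos_lexer_t *lex) {
    if (lex->chr0 == '\n') {
        lex->line++;
        lex->column = 1;
    } else {
        lex->column++;
    }
    lex->chr0 = lex->pos < lex->len ? (unsigned char)lex->src[lex->pos++] : TOY_EOF;
}

/* Dummy '\n' in chr0 plus line = 0: the first toy_advance() takes the
   newline branch and produces line 1, column 1. */
void toy_pos_init(toy_pos_lexer_t *lex, const char *src, size_t len) {
    lex->src = src;
    lex->len = len;
    lex->pos = 0;
    lex->chr0 = '\n';
    lex->line = 0;
    lex->column = 0; /* overwritten by the newline branch */
    toy_advance(lex);
}
```

After initialising with "ab\ncd", chr0 is 'a' at line 1, column 1; consuming 'a', 'b' and the newline then puts 'c' at line 2, column 1.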
Regarding the tests you have above, please add them (in a separate commit as part of this PR) to the file tests/basics/lexer.py (there's a section called "short input" there).
Adds coverage for issues fixed in pull request 3063.
Now consistently uses the EOL processing ("\r" and "\r\n" convert to "\n") and EOF processing (ensure "\n" before EOF) provided in next_char().
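A hedged sketch of those two rules, using a hypothetical three-character window: toy_window_t, toy_read(), toy_next_char() and toy_drain() are invented names, and TOY_EOF stands in for MP_LEXER_EOF. toy_drain() feeds a buffer through the window and records each character that reaches chr0, so the EOL and EOF handling can be checked directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_EOF (-1) /* hypothetical stand-in for MP_LEXER_EOF */

/* Hypothetical three-character window over an in-memory buffer. */
typedef struct {
    const char *src;
    size_t len, pos;
    int chr0, chr1, chr2;
} toy_window_t;

static int toy_read(toy_window_t *w) {
    return w->pos < w->len ? (unsigned char)w->src[w->pos++] : TOY_EOF;
}

/* Shift the window by one and apply both rules:
   EOL: "\r" and "\r\n" become "\n" as a character enters chr1;
   EOF: a "\n" is inserted before EOF if the last character is not one. */
static void toy_next_char(toy_window_t *w) {
    w->chr0 = w->chr1;
    w->chr1 = w->chr2;
    w->chr2 = toy_read(w);
    if (w->chr1 == '\r') {
        w->chr1 = '\n';
        if (w->chr2 == '\n') {
            w->chr2 = toy_read(w); /* drop the LF of a CR LF pair */
        }
    }
    if (w->chr2 == TOY_EOF && w->chr1 != TOY_EOF && w->chr1 != '\n') {
        w->chr2 = '\n';
    }
}

/* Run every input character through toy_next_char() and copy out what the
   lexer would see in chr0. out needs room for len + 1 bytes. */
size_t toy_drain(const char *src, size_t len, char *out) {
    toy_window_t w = {src, len, 0, 0, 0, 0};
    size_t n = 0;
    /* burn off the three dummy 0 characters */
    toy_next_char(&w);
    toy_next_char(&w);
    toy_next_char(&w);
    while (w.chr0 != TOY_EOF) {
        out[n++] = (char)w.chr0;
        toy_next_char(&w);
    }
    return n;
}
```

For example, "a\r\nb" drains to "a\nb\n", "\r\r" drains to "\n\n", and an empty input drains to a single "\n".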
@dpgeorge, I've added unit tests, and reverted I reviewed execution paths, and don't think changing
I ran into this odd behavior when pasted code starts with a completely blank line.
After investigation, I found it was an issue with how lexer.c starts processing. This pull request seems to address the issue in manual testing I've done.