Tokenize module does not mirror "end-of-input" is newline behavior #78080
As was pointed out in https://bugs.python.org/issue33766, there is an edge case in the tokenizer whereby it implicitly treats the end of input as a newline. The tokenize module in the stdlib does not mirror the C code's behavior in this case.
tokenizer.c:
~/cpython $ echo -n 'x' | ./python
tokenize module:
~/cpython $ echo -n 'x' | ./python -m tokenize
The instrumentation that makes the C tokenizer dump its tokens is mine; I can provide a diff to produce that output if needed.
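The mismatch described above can be checked from Python itself. The following is a minimal sketch, not the reporter's instrumentation: `token_names` is a hypothetical helper, and it assumes an interpreter that already includes the fix discussed in this issue, where input without a trailing newline gets an implicit NEWLINE token.

```python
import io
import tokenize


def token_names(source):
    """Return the token type names that the tokenize module yields for source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


# With the fix applied, both inputs should produce the same sequence of
# token types, because end of input is treated as an implicit newline.
print(token_names("x"))
print(token_names("x\n"))
```

On an interpreter without the fix, the first call would be expected to lack the NEWLINE token, which is the discrepancy this issue reports.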
Thanks for all of your work on this, Ammar!
This change in behaviour is breaking pycodestyle: PyCQA/pycodestyle#786. Perhaps it shouldn't have been backported (especially all the way to Python 2.7)?
This was backported since it was considered a bug, but you are right that it broke backwards compatibility, and perhaps it shouldn't have been backported. Still, with 3.6.7 and 3.7.1 now released, that ship has sailed. We could perhaps revert this on the 2.7 branch, but I feel that reverting this change only on 2.7 would just cause even more confusion.
I'm surprised this was classified as a bug! Though that's subjective, so I get that it's difficult to decide what is and what isn't ¯\_(ツ)_/¯
Apparently this change also affected IPython. Perhaps we should add an entry to the What's New documents for 3.7.1 and 3.6.7:
https://docs.python.org/3/whatsnew/3.7.html#notable-changes-in-python-3-7-1
https://docs.python.org/3.6/whatsnew/3.6.html#notable-changes-in-python-3-6-7
I'm sorry to have caused this mess; it was bad judgement on my part. Adding a mention in What's New is a good idea, Ned. I'll do that.
Ned, should this also be added to the 2.7 What's New? Or perhaps reverted on the 2.7 branch?
I don't have a strong opinion about 2.7 here. Ultimately, it's Benjamin's call. But it might make sense to revert for 2.7 since it hasn't been released yet.
Please revert in 2.7.
See PR #54281 for reverting in 2.7.
FYI, an example of other fallout from this change - patsy broke and needed this fix:
See PR #54282 adding mention in "What's New".
Some pylint fallout appears to be addressed in pylint-dev/pylint@2698cbe
Thanks for helping with the fallout from this, Gregory.
bpo-33766 was about documenting the C tokenizer change, made some years ago, that has end-of-file (EOF) and end-of-string (EOS) generate the NEWLINE token required to properly terminate statements. "The end of input also serves
Although the tokenize module intentionally does not exactly mirror the C tokenizer (it adds COMMENT tokens), it seems plausible to call it a bug that it was not changed along with the C tokenizer, since it has been tokenizing valid code as grammatically invalid ever since. But I agree that this fix is too disruptive for 2.7.
https://bugs.python.org/issue35107 filed to track further fallout from this API change.
Is it expected behavior that comments produce NEWLINE if they don't have a newline and don't produce NEWLINE if they do (that is, '# comment' produces NEWLINE but '# comment\n' does not)?
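To see what a given interpreter does with the two comment-only inputs in question, one can dump the tokens side by side. This is only a sketch with a hypothetical helper, `dump_tokens`; the output is deliberately not stated here, since the handling of comment-only lines may differ between Python versions.

```python
import io
import tokenize


def dump_tokens(source):
    """Return (type name, token string) pairs that tokenize yields for source."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# Compare a comment with and without a trailing newline on this interpreter.
for source in ("# comment", "# comment\n"):
    print(repr(source))
    for name, string in dump_tokens(source):
        print("   ", name, repr(string))
```

Comparing whether NEWLINE or NL follows the COMMENT token in each case shows how the running version answers the question.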
In order to adapt code to this change, can we assume that a NEWLINE token with an empty string only occurs right before the ENDMARKER?
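One way to probe that assumption is to check it against sample inputs. The sketch below uses a hypothetical helper, `implicit_newline_is_trailing`, and relaxes the assumption slightly: it allows DEDENT tokens between the empty NEWLINE and the ENDMARKER, because input that ends inside an indented block is expected to close its open blocks after the final NEWLINE. It is an experiment on a few inputs, not a guarantee about the tokenize API.

```python
import io
import tokenize


def implicit_newline_is_trailing(source):
    """Check that every empty-string NEWLINE is followed only by DEDENT/ENDMARKER."""
    toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
    trailing = {tokenize.DEDENT, tokenize.ENDMARKER}
    for i, tok in enumerate(toks):
        if tok.type == tokenize.NEWLINE and tok.string == "":
            # Anything other than DEDENT or ENDMARKER after this point
            # would contradict the assumption being tested.
            if any(later.type not in trailing for later in toks[i + 1:]):
                return False
    return True


for sample in ("x", "x\n", "if x:\n    y"):
    print(repr(sample), implicit_newline_is_trailing(sample))
```

Code that adapts to the change may therefore want to skip trailing DEDENT tokens rather than look only at the token immediately before ENDMARKER.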