Python 3 gives misleading errors when validating unicode identifiers #67452
Comments
PEP 3131 changed the definition of valid identifiers to match the pattern <XID_Start> <XID_Continue>*. Currently, if an identifier contains an invalid character, as in ☺ = 4, you get the error "SyntaxError: invalid character in identifier".

This is fine in most cases. But sometimes the problem is not that the character is invalid, but that it may not be used to START an identifier. One example is the combining grave accent (U+0300), which is an XID_Continue character but not an XID_Start character. So ̀e is an invalid identifier, while è is a valid one; the ̀ character is not invalid in all cases.

The attached patch attempts to clarify this by raising a different error when the start character is invalid:

>>> ̀e = 4
  File "<stdin>", line 1
    ̀e = 4
    ^
SyntaxError: invalid start character in identifier
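The start/continue distinction can be probed from Python itself. This is a sketch under stated assumptions: it relies on str.isidentifier() of a single character being True exactly when that character may begin an identifier (XID_Start, plus the underscore that Python special-cases), and on a letter-prefixed check reflecting XID_Continue. The names can_start and can_continue are hypothetical helpers, not part of any library.

```python
import unicodedata


def can_start(ch: str) -> bool:
    # A one-character string is a valid identifier only if that character
    # may begin one (XID_Start, or "_").
    return ch.isidentifier()


def can_continue(ch: str) -> bool:
    # "a" is a valid start, so "a" + ch is an identifier only if ch may
    # follow it (XID_Continue).
    return ("a" + ch).isidentifier()


for ch in ["\u0300", "\u00e8", "\u263a"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"start={can_start(ch)} continue={can_continue(ch)}")

# Expected output:
# U+0300 COMBINING GRAVE ACCENT: start=False continue=True
# U+00E8 LATIN SMALL LETTER E WITH GRAVE: start=True continue=True
# U+263A WHITE SMILING FACE: start=False continue=False
```

The combining accent is the interesting row: it fails only the start test, which is the case the proposed message targets, whereas ☺ fails both tests.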
However, if the character is not allowed anywhere (because it is neither an XID_Start nor an XID_Continue character), the original error is used:
>>> ☺smile = 4
  File "<stdin>", line 1
    ☺smile = 4
    ^
SyntaxError: invalid character in identifier
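The two-message behavior proposed above can be approximated at the string level. This is a minimal sketch, not the attached patch (which changed the interpreter's own identifier checking): identifier_error is a hypothetical helper, and it assumes, as before, that single-character isidentifier() reflects XID_Start and that a letter-prefixed check reflects XID_Continue. Like the examples in the report, it chooses the message based on the first character only.

```python
from typing import Optional


def identifier_error(name: str) -> Optional[str]:
    """Return None for a valid identifier, otherwise the message the
    proposal would report (hypothetical helper, string-level only)."""
    if not name:
        raise ValueError("name must be non-empty")
    if name.isidentifier():
        return None
    first = name[0]
    # The first character could appear later in an identifier
    # (XID_Continue) but may not begin one (not XID_Start).
    if not first.isidentifier() and ("a" + first).isidentifier():
        return "invalid start character in identifier"
    # Either the first character is allowed nowhere, or the offending
    # character comes later in the name.
    return "invalid character in identifier"


print(identifier_error("\u0300e"))      # invalid start character in identifier
print(identifier_error("\u263asmile"))  # invalid character in identifier
print(identifier_error("e\u0300"))      # None
```

This helper also gives a leading digit (as in "1abc") the "invalid start character" message, since digits are XID_Continue but not XID_Start. The real tokenizer reads such input as a number first, so this is only an approximation.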
While the request is reasonable, the patch seems to touch quite a lot of code.
I agree with Ezio. Adding 7 new public names just to improve one rare error message seems too high a cost. I am inclined to leave everything as is. The original message is not so bad.
Alrighty. I'll investigate and see if I can cut down the code somewhat. If I can't reduce it significantly, I'll let the issue die quietly. I agree that it's a pretty nitpicky ticket. I noticed it while researching Unicode, and made the patch after seeing how languages like Swift handle this case. Thanks for looking at it, though!
I dislike the patch. The error message "invalid character in identifier" is correct. I don't want to modify so much code for a slightly better error message. If you start using non-ASCII identifiers, you are probably already aware that you may run into some issues. I'm closing the issue.