"_ if 1else _" does not compile #65841
By the docs, "_ if 1else _" should compile equivalently to "_ if 1 else _". The tokenize module does this correctly:

import io
import tokenize

def print_tokens(string):
    tokens = tokenize.tokenize(io.BytesIO(string.encode("utf8")).readline)
    for token in tokens:
        print("{:12}{}".format(tokenize.tok_name[token.type], token.string))

print_tokens("_ if 1else _")
#>>> ENCODING utf-8
#>>> NAME _
#>>> NAME if
#>>> NUMBER 1
#>>> NAME else
#>>> NAME _
#>>> ENDMARKER

But it fails when compiled with, say, "compile", "eval" or "ast.parse":

import ast
compile("_ if 1else _", "", "eval")
#>>> Traceback (most recent call last):
#>>> File "", line 32, in <module>
#>>> File "<string>", line 1
#>>> _ if 1else _
#>>> ^
#>>> SyntaxError: invalid token
eval("_ if 1else _")
#>>> Traceback (most recent call last):
#>>> File "", line 40, in <module>
#>>> File "<string>", line 1
#>>> _ if 1else _
#>>> ^
#>>> SyntaxError: invalid token
ast.parse("_ if 1else _")
#>>> Traceback (most recent call last):
#>>> File "", line 48, in <module>
#>>> File "/usr/lib/python3.4/ast.py", line 35, in parse
#>>> return compile(source, filename, mode, PyCF_ONLY_AST)
#>>> File "<unknown>", line 1
#>>> _ if 1else _
#>>> ^
#>>> SyntaxError: invalid token

Further, some other forms work:

1 if 0b1else 0
#>>> 1
1 if 1jelse 0
#>>> 1

See the linked Stack Overflow question for details.
For those who want to skip reading the entire SO question: "1else" tokenizes as "1e" "lse", i.e. "1e" is considered the beginning of a floating point literal. By the description in the docs, that should not happen, since it is not a valid literal on its own, so no space should be needed after the 1 to tokenize it as an integer literal.
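To make the point concrete, here is a minimal sketch of a decimal-literal scanner that only commits to an exponent when a digit actually follows the "e" (optionally after a sign). The name scan_decimal is hypothetical and this is not CPython's tokenizer; it ignores underscores, hex/octal/binary prefixes and the "j" suffix.

```python
import string

DIGITS = set(string.digits)

def scan_decimal(source, pos=0):
    """Return the end index of the decimal literal starting at source[pos].

    Hypothetical helper for illustration only, not CPython's tokenizer.
    """
    n = len(source)
    i = pos
    # Integer part.
    while i < n and source[i] in DIGITS:
        i += 1
    # Optional fraction part.
    if i < n and source[i] == ".":
        i += 1
        while i < n and source[i] in DIGITS:
            i += 1
    # Commit to an exponent only if a digit follows 'e', 'e+' or 'e-'.
    # This is where the extra lookahead is needed.
    if i < n and source[i] in "eE":
        j = i + 1
        if j < n and source[j] in "+-":
            j += 1
        if j < n and source[j] in DIGITS:
            while j < n and source[j] in DIGITS:
                j += 1
            i = j
    return i

src = "1else"
end = scan_decimal(src)
print(repr(src[:end]), repr(src[end:]))  # '1' 'else'
```

With this rule "1else" splits into the integer 1 followed by "else", while "1e5" is still consumed whole as a float.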
Here's a minimal example of the difference:

1e
#>>> ... etc ...
#>>> SyntaxError: invalid token

1t
#>>> ... etc ...
#>>> SyntaxError: invalid syntax
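If you want to reproduce this on your own interpreter, something like the following should do. It is only a sketch: compile_error is a made-up helper name, and the exact error messages differ between Python versions, so no expected output is shown.

```python
def compile_error(source):
    """Return None if source compiles as an expression, else the error message."""
    try:
        compile(source, "<probe>", "eval")
    except SyntaxError as exc:
        return exc.msg
    return None

# Print each probe next to the message the compiler gives for it (or None).
for source in ["1e", "1t", "1 if 1 else 0"]:
    print("{!r:18}{}".format(source, compile_error(source)))
```

Adding "1 if 1else 0" to the list shows how a given interpreter treats the case from this issue; the result depends on the version.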
New changeset 4ad33d82193d by Benjamin Peterson in branch '3.4'
New changeset 29d34f4f8900 by Benjamin Peterson in branch '2.7'
New changeset d5998cca01a8 by Benjamin Peterson in branch 'default'
FTR, I think this was a bad fix and we should have just changed the spec to require a space between numeric literals and identifiers. Closing as by design would have been fine in my opinion as well, since the spec says spaces are required when it's ambiguous, and this case looks fairly ambiguous.

There's also a bit of a slippery slope here where we now have to fix "0x1and 3" or be very explicit about why it is different.

I haven't even mentioned changing the parser in a dot release. That seems somewhat ridiculous.

Everyone else who writes a Python parser (all the IDEs and type checkers, other implementations, etc.) would prefer it if we didn't need our tokenisers to look ahead two characters.
My impression is that it was fixed the way it was because it makes the internal tokenizer match what the tokenize module does. See also bpo-3353.

As for changing it in a point release, it turns something that was an error into something that isn't, so it was unlikely to break existing working code. Going the other way in the tokenize module *would* have been a backward compatibility issue.

If we wanted to change this, it would require a deprecation process, and it hardly seems worth it. I hear you about other tokenizers, though, and that is indeed unfortunate.