New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-110309: prune empty constant in format specs #110320
Conversation
🤖 New build scheduled with the buildbot fleet by @sunmy2019 for commit 19d1301 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
Thanks a lot for the PR @sunmy2019 ! |
🤖 New build scheduled with the buildbot fleet by @sunmy2019 for commit 60b4666 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
🤖 New build scheduled with the buildbot fleet by @sunmy2019 for commit 19c68e2 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
I'm wondering why we cannot do this in the tokenizer directly. Wouldn't something like this work? cpython on main [!] via C v15.0.0-clang via 🐍 pyenv 3.11.3 (venv) took 8s
❯ git diff
diff --git a/Parser/tokenizer.c b/Parser/tokenizer.c
index 41d0d16a47..504bc9bed9 100644
--- a/Parser/tokenizer.c
+++ b/Parser/tokenizer.c
@@ -2639,6 +2639,12 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct
tok->first_lineno = tok->lineno;
tok->starting_col_offset = tok->col_offset;
+ int in_format_spec = (
+ current_tok->last_expr_end != -1
+ &&
+ INSIDE_FSTRING_EXPR(current_tok)
+ );
+
// If we start with a bracket, we defer to the normal mode as there is nothing for us to tokenize
// before it.
int start_char = tok_nextc(tok);
@@ -2655,6 +2661,10 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct
return tok_get_normal_mode(tok, current_tok, token);
}
}
+ else if (start_char == '}' && in_format_spec) {
+ tok_backup(tok, start_char);
+ return tok_get_normal_mode(tok, current_tok, token);
+ }
else {
tok_backup(tok, start_char);
}
@@ -2726,11 +2736,6 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct
end_quote_size = 0;
}
- int in_format_spec = (
- current_tok->last_expr_end != -1
- &&
- INSIDE_FSTRING_EXPR(current_tok)
- );
if (c == '{') {
int peek = tok_nextc(tok);
if (peek != '{' || in_format_spec) { Didn't thoroughly test it, but one pass through the tests only shows one failure that's got to do with the error message when we have a lambda in the expression part. |
I have no objection to this. One thing to note: the same special handling of empty str constants is at the other parts of the code. If your idea is adopted, we'd better clean up those special handling. |
I'd propose we land this first and backport to 3.12. Then maybe we can optimize further on the 3.13 branch. |
Agreed. Let's land this and then iterate on it in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
I think we need to regenerate some files (run |
close #110309