bpo-42827: Fix crash on SyntaxError in multiline expressions #24140
When trying to extract the error line for the error message, there are two distinct cases:

1. The input comes from a file, which means that we can extract the error line by using `PyErr_ProgramTextObject`, which we already do.
2. The input does not come from a file, at which point we need to get the source code from the tokenizer:
   * If the tokenizer's current line number is the same as the line of the error, we get the line from `tok->buf` and we're done.
   * Otherwise, we can extract the error line from the source code in one of the following two ways:
     * If the input comes from a string, we have all the input in `tok->str` and we can extract the error line from it.
     * If the input comes from stdin, i.e. the interactive prompt, we do not have access to the previous line. That's why a new field `tok->stdin_content` is added, which holds the whole input for the current (multiline) statement or expression. We can then extract the error line from `tok->stdin_content` as we do in the string case above.
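The non-file branch of the case analysis above can be modelled in a few lines of Python. This is only an illustrative sketch, not the C code from the PR: the function names `nth_line` and `error_line` and their parameters are hypothetical, `tok_buf` is assumed to hold just the tokenizer's current line, and `tok_str` / `stdin_content` stand in for the `tok->str` and `tok->stdin_content` buffers.

```python
def nth_line(source, lineno):
    """Return the 1-based `lineno`-th line of `source`, without its newline.

    Hypothetical helper: walks the buffer newline by newline, the way one
    would scan a C string for the start of a given line.
    """
    start = 0
    for _ in range(lineno - 1):
        newline = source.find("\n", start)
        if newline == -1:
            return ""  # lineno lies past the end of the buffer
        start = newline + 1
    end = source.find("\n", start)
    if end == -1:
        end = len(source)
    return source[start:end]


def error_line(error_lineno, tok_lineno, tok_buf, tok_str=None, stdin_content=None):
    """Pick the error line for input that does not come from a file.

    Assumption: exactly one of `tok_str` (string input) or `stdin_content`
    (accumulated interactive input) is provided when the lines differ.
    """
    if tok_lineno == error_lineno:
        # The tokenizer is still on the offending line.
        return tok_buf.rstrip("\n")
    whole_input = tok_str if tok_str is not None else stdin_content
    if whole_input is None:
        return ""
    return nth_line(whole_input, error_lineno)


if __name__ == "__main__":
    source = "(1 +\n  2 3)\n"
    # Tokenizer has advanced to line 2, but the error is reported on line 1.
    print(repr(error_line(1, 2, "  2 3)\n", tok_str=source)))
```

The point of the sketch is that both the string case and the stdin case reduce to the same line lookup once the whole input of the current statement is kept around.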
Thanks for the PR @lysnikolaou! I will review this soon, but here is an initial question. My understanding is that … Another thought: another possibility is to not report the location of the error for stdin, as the old parser does. This would simplify this particular case a bit and speed up parsing.
Yup, I checked this a bit more closely and it seems to be working fine. We can add some more tests, in order to be even more confident.
I like it a lot that the new parser reports the location in stdin as well, and my feeling is that performance isn't a big consideration here, since all the new code only runs when the input comes from the REPL. If you feel otherwise, we can of course explore other options as well.
🤖 New build scheduled with the buildbot fleet by @pablogsal for commit bdfc2a4 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
🤖 New build scheduled with the buildbot fleet by @pablogsal for commit 78a0f9c 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.
Bad news. The assertion
Even more oddly, this wasn't a problem without the assertion, because … Any thoughts on what we should do here? Fixing the tokenizer to point to the correct line upon an …
Hummm, is this a problem that happens after the change in this PR, or does it already happen on master? What does the old parser do with this?
Isn't the tokenizer pointing to the last line upon E_EOF? Maybe I am misunderstanding the case where this happens.
Co-authored-by: Irit Katriel <iritkatriel@yahoo.com>
It was maybe a little too late last night, so I got a bit confused after a while and didn't explain the situation very well. Let me try to explain again after some more digging into the whole thing.

After some more consideration, I no longer think this is a tokenizer bug. It's an edge case that we only handled correctly in the new parser code by accident, or at least that is how it seems to me. Let me explain.

When an … If we have the file

a = 0
\

then the error looks like this: …

Note that it says … So if this makes sense, why is it a problem? My thought was that for this case the error line would be found by the call to … (Line 415 in 68e1f25) … lineno points to a non-existent line. And so the error line is actually found by looking at `p->tok->buf` (Line 420 in 68e1f25).

How do we fix this? Removing the assertion works. Changing the assertion to … Hopefully everything is clear now. If not, let me know and I'll give it another go.
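For readers who want to poke at the trailing-backslash edge case themselves, here is a minimal sketch that feeds the same two-line source to the built-in `compile()` and prints what the resulting `SyntaxError` carries. The helper name `syntax_error_for` is hypothetical. Note the assumption that differs from the comment above: the source is passed as a string rather than read from a file, so a different line-extraction path is exercised, and the message, line number and text reported may vary between Python versions.

```python
def syntax_error_for(source):
    """Compile `source` and return the SyntaxError raised, or None if it compiles."""
    try:
        compile(source, "<string>", "exec")
    except SyntaxError as exc:
        return exc
    return None


if __name__ == "__main__":
    # An assignment followed by a line holding only a line-continuation backslash.
    exc = syntax_error_for("a = 0\n\\\n")
    if exc is not None:
        # Inspect which line the parser blames and which text it attached.
        print("msg:   ", exc.msg)
        print("lineno:", exc.lineno)
        print("text:  ", repr(exc.text))
```

Comparing `exc.lineno` with the number of lines actually present in the source is a quick way to see whether the reported line exists.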
Just FYI, the old parser used to have a specialized struct …
Also, thanks a lot @iritkatriel for the review!
Thanks for the thorough explanation @lysnikolaou.
LGTM! Let's go with this. I assume that if we go with something like #24161, which exercises this path more, we will quickly detect whether we missed anything.
https://bugs.python.org/issue42827