pegen _PyParser_ASTFromFile(): Use-After-Free in syntaxerror() #88562
Use After Free in python3.11 (commit 2ab27c4)
I attach some of the inputs that lead to the undefined behavior. For the complete description, here is the ASan report:

    ==1082579==ERROR: AddressSanitizer: heap-use-after-free on address 0x626000045a40 at pc 0x000000735155 bp 0x7fffffffbed0 sp 0x7fffffffbec8
    0x626000045a40 is located 2368 bytes inside of 10560-byte region [0x626000045100,0x626000047a40)
    previously allocated by thread T0 here:
    SUMMARY: AddressSanitizer: heap-use-after-free /home/elmanto/ddg/other_targets/cpython/Objects/unicodeobject.c:5091:28 in ascii_decode
|
Lysandros and Pablo, this *only* occurs when the lexer is reading directly from a file, not when it's reading the same source code from a (bytes) string. All examples are syntax errors (some raise ValueError in the parser). |
Here is a smaller reproducer:

    x = "ijosdfsd\
    def blech():
      pass

This seems to be an error introduced by commit a698d52.
Batuhan, can you take a look? |
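A minimal sketch for checking the file-versus-string difference described above, assuming Python 3.7 or later. The helper names `check_from_string` and `check_from_file` are illustrative only and are not part of CPython. On a build that contains the fix, both paths should end in an ordinary SyntaxError; on an affected ASan build, the file path is the one where a report would be expected.

```python
import os
import subprocess
import sys
import tempfile

# The reproducer from this thread: a string literal whose first line ends in a
# backslash continuation and is never closed.
SOURCE = b'x = "ijosdfsd\\\ndef blech():\n  pass\n'


def check_from_string(source: bytes) -> str:
    """Compile from an in-memory bytes object (the path reported as unaffected)."""
    try:
        compile(source, "<string>", "exec")
    except SyntaxError as exc:
        return type(exc).__name__
    return "no error"


def check_from_file(source: bytes) -> subprocess.CompletedProcess:
    """Run a fresh interpreter on a real file so the source is read from disk."""
    with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as fh:
        fh.write(source)
        path = fh.name
    try:
        return subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=30,
        )
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print("from string:", check_from_string(SOURCE))
    result = check_from_file(SOURCE)
    print("from file: exit code", result.returncode)
    print(result.stderr)
```

Running the file case in a subprocess keeps a possible interpreter crash from taking down the checking script itself.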
I think this should fix the issue, but someone should validate this:

    diff --git a/Parser/tokenizer.c b/Parser/tokenizer.c
    -    errtext = PyUnicode_DecodeUTF8(tok->line_start, tok->cur - tok->line_start,
    +    errtext = PyUnicode_DecodeUTF8(tok->buf, tok->inp - tok->buf,
                                        "replace");
         if (!errtext) {
             goto error;
         }
         int offset = (int)PyUnicode_GET_LENGTH(errtext);
    -    Py_ssize_t line_len = strcspn(tok->line_start, "\n");
    -    if (line_len != tok->cur - tok->line_start) {
    +    Py_ssize_t line_len = strcspn(tok->buf, "\n");
    +    if (line_len != tok->buf - tok->inp) {
             Py_DECREF(errtext);
    -        errtext = PyUnicode_DecodeUTF8(tok->line_start, line_len,
    -                                       "replace");
    +        errtext = PyUnicode_DecodeUTF8(tok->buf, line_len, "replace");
         }
         if (!errtext) {
             goto error;
|
This affects 3.10 as well |
OK, found the problem: we are not resetting the multi-line-start pointer when we reallocate the tokenizer buffers. |
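The effect of a missed pointer update can be illustrated with a toy model. This is not CPython's tokenizer code: `ToyTokenizer`, the simulated integer addresses, and the `rebase_all` flag are hypothetical, and the attribute name `multi_line_start` is only assumed from the comment above. The sketch treats each "pointer" as an absolute address and shows that every saved address has to be shifted by the same amount when the buffer moves.

```python
class ToyTokenizer:
    """Toy model: 'pointers' are absolute addresses into a simulated heap block."""

    def __init__(self, data: bytes, base: int = 1000) -> None:
        self.base = base                    # simulated address of the buffer start
        self.data = bytearray(data)
        self.cur = base + len(data)         # current read position
        self.multi_line_start = base        # start of a multi-line token (assumed name)

    def grow(self, extra: bytes, new_base: int, rebase_all: bool) -> None:
        """Simulate a reallocation that moves the buffer to new_base."""
        delta = new_base - self.base
        self.base = new_base
        self.data += extra
        self.cur += delta                   # this pointer is always rebased
        if rebase_all:
            self.multi_line_start += delta  # the update described as missing

    def is_valid(self, ptr: int) -> bool:
        """A pointer is valid only if it lies inside the current buffer."""
        return self.base <= ptr <= self.base + len(self.data)


if __name__ == "__main__":
    for rebase_all in (False, True):
        tok = ToyTokenizer(b'x = "ijosdfsd\\\n')
        tok.grow(b"def blech():\n", new_base=5000, rebase_all=rebase_all)
        print(rebase_all, tok.is_valid(tok.cur), tok.is_valid(tok.multi_line_start))
```

In the model, a stale `multi_line_start` still holds an address inside the old block, which corresponds to the freed region that ASan flags in the report.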
Alessandro Mantovani, one question: how did you generate the crash scripts? |
Experimental fuzzing techniques, but then I observed that the same behavior also happens with vanilla AFL++. As a starting queue I used the *.py files that I found in the repo under 'test' or so. Best, Alessandro Mantovani |