Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failure in get_error_line_from_tokenizer_buffers in pegen_errors.c #112387

Closed
bradlarsen opened this issue Nov 25, 2023 · 6 comments
Closed
Labels
type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@bradlarsen
Copy link
Contributor

bradlarsen commented Nov 25, 2023

Crash report

What happened?

I found some crashes when adding an additional fuzz target in #111721.

# This script fails in two different ways.
#
# First, it fails with a C assertion failure when assertions are enabled.
#
# Second, it *nondeterministically* gives the C assertion failure or an
# uncaught Python `SyntaxError` when assertions are enabled and pymalloc is
# disabled.
#
# I ran these on ARM64 macOS:
#
#     Darwin dialectic.local 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct  9 21:27:24 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6000 arm64
#
# For the first case, I ran this through Python built from source with
# assertions enabled:
#
#     ./configure --with-assertions --prefix "$PWD/debugbuild"
#     make -j12 altinstall
#     ./debugbuild/bin/python/crash1.py
#
# For the second case, I ran this through Python built from source with
# assertions enabled and pymalloc disabled:
#
#     ./configure --with-assertions --without-pymalloc --prefix "$PWD/debugbuild"
#     make -j12 altinstall
#     ./debugbuild/bin/python/crash1.py


# Input found via the fuzz target added in https://github.com/python/cpython/pull/111721, then manually minimized
s = b'# coding=latin\r(aaaaaaaaaaaaaaaaa\raaaaaaaaaaa\xb5'

# This line fails nondeterministically with either a C assertion failure or an
# uncaught Python `SyntaxError`, depending on the build configuration:
#
# Outcome 1:
#
#     Assertion failed: (new_line != NULL && new_line + 1 < buf_end), function get_error_line_from_tokenizer_buffers, file pegen_errors.c, line 286.
#
# Outcome 2:
#
#     Traceback (most recent call last):
#       File "crash1.py", line 17, in <module>
#         compile(s, 's', 'exec')
#       File "s", line 2
#         (aaaaaaaaaaaaaaaaa
#         ^
#     SyntaxError: '(' was never closed
compile(s, 's', 'exec')

CPython versions tested on:

CPython main branch

Operating systems tested on:

macOS

Output from running 'python -VV' on the command line:

Python 3.13.0a1+ (heads/main:3701f3bc10, Nov 24 2023, 23:05:42) [Clang 15.0.0 (clang-1500.0.40.1)]

Linked PRs

@bradlarsen bradlarsen added the type-crash A hard crash of the interpreter, possibly with a core dump label Nov 25, 2023
@bradlarsen
Copy link
Contributor Author

bradlarsen commented Nov 25, 2023

See also #112388

@bradlarsen
Copy link
Contributor Author

git bisect indicates that it was cedec19 where this crash was introduced.

cc @pablogsal

That commit introduced this code:

    Py_ssize_t relative_lineno = p->starting_lineno ? lineno - p->starting_lineno + 1 : lineno;

     for (int i = 0; i < relative_lineno - 1; i++) {
         char *new_line = strchr(cur_line, '\n') + 1;
         assert(new_line != NULL && new_line < p->tok->inp);
         if (new_line == NULL || new_line >= p->tok->inp) {
             break;
         }
         cur_line = new_line;

Note that the assertion and the following conditional are incorrect. new_line is assigned as strchr(cur_line, '\n') + 1, which will always be non-NULL.

@pablogsal
Copy link
Member

git bisect indicates that it was cedec19 where this crash was introduced.

cc @pablogsal

That commit introduced this code:

    Py_ssize_t relative_lineno = p->starting_lineno ? lineno - p->starting_lineno + 1 : lineno;

     for (int i = 0; i < relative_lineno - 1; i++) {
         char *new_line = strchr(cur_line, '\n') + 1;
         assert(new_line != NULL && new_line < p->tok->inp);
         if (new_line == NULL || new_line >= p->tok->inp) {
             break;
         }
         cur_line = new_line;

Note that the assertion and the following conditional are incorrect. new_line is assigned as strchr(cur_line, '\n') + 1, which will always be non-NULL.

Thanks for the report! I am taking a look but notice the code you are reporting for that commit has already been fixed, so this is something else:

for (int i = 0; i < relative_lineno - 1; i++) {
char *new_line = strchr(cur_line, '\n');
// The assert is here for debug builds but the conditional that
// follows is there so in release builds we do not crash at the cost
// to report a potentially wrong line.
assert(new_line != NULL && new_line + 1 < buf_end);
if (new_line == NULL || new_line + 1 > buf_end) {
break;
}
cur_line = new_line + 1;
}

pablogsal added a commit to pablogsal/cpython that referenced this issue Nov 25, 2023
…ds tokenize errors

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
@bradlarsen
Copy link
Contributor Author

Oh, interesting! There must be another cause of that assertion failure in newer commits. I'll have to bisect again and look at stack traces, not just the assertion failure.

@pablogsal
Copy link
Member

Don't worry I think I know what that is I have a PR up and will discuss and we can back port it once @lysnikolaou takes a look

pablogsal added a commit that referenced this issue Nov 27, 2023
…enize errors (#112409)

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 27, 2023
…ds tokenize errors (pythonGH-112409)

(cherry picked from commit 45d6485)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
@pablogsal
Copy link
Member

Thanks for the report!

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 27, 2023
…ds tokenize errors (pythonGH-112409)

(cherry picked from commit 45d6485)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
pablogsal added a commit that referenced this issue Nov 27, 2023
…rds tokenize errors (GH-112409) (#112468)

gh-112387: Fix error positions for decoded strings with backwards tokenize errors (GH-112409)
(cherry picked from commit 45d6485)

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
pablogsal added a commit that referenced this issue Nov 27, 2023
…rds tokenize errors (GH-112409) (#112469)

gh-112387: Fix error positions for decoded strings with backwards tokenize errors (GH-112409)
(cherry picked from commit 45d6485)

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…ds tokenize errors (python#112409)

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

2 participants