
Allocation failures generated in _PyTokenizer_FromUTF8 lead to SystemError/abort()s #150207

@stestagg


Crash report

What happened?

The following aborts on a debug build and raises SystemError otherwise.
(The reproducer runs only on Linux because RLIMIT_AS is not portable, but any hard memory limit should do.)

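import resource

# Cap the address space at 512 MiB so that large allocations fail.
MB_512 = 512 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (MB_512, MB_512))
# Evaluate a ~420 MB identifier; the tokenizer's copy of the source should fail to allocate.
eval("A" * 420000000)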
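Because a debug build aborts the whole process, it can be convenient to run the reproducer in a child interpreter and inspect how it ended. The following is a minimal sketch of that idea, not part of the original report: `build_child_source` and `run_reproducer` are hypothetical helper names, and the defaults simply mirror the limit and string length used above.

```python
import subprocess
import sys


def build_child_source(limit_bytes, length):
    """Return the reproducer as source text for a child interpreter."""
    lines = [
        "import resource",
        f"resource.setrlimit(resource.RLIMIT_AS, ({limit_bytes}, {limit_bytes}))",
        "try:",
        f"    eval('A' * {length})",
        "except BaseException as exc:",
        "    # Report only the exception type name.",
        "    print(type(exc).__name__)",
    ]
    return "\n".join(lines) + "\n"


def run_reproducer(limit_bytes=512 * 1024 * 1024, length=420_000_000):
    """Run the reproducer in a child process and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", build_child_source(limit_bytes, length)],
        capture_output=True,
        text=True,
    )
    if proc.returncode < 0:
        # On POSIX a negative return code means the child was killed by a
        # signal, e.g. SIGABRT raised via Py_FatalError() on a debug build.
        return f"killed by signal {-proc.returncode}"
    return proc.stdout.strip() or f"exit status {proc.returncode}"
```

Calling `run_reproducer()` on an affected build would be expected to report either `SystemError` or a signal, while a build with the error set correctly should presumably report `MemoryError`.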

What seems to be happening is that _PyTokenizer_FromUTF8 calls _PyTokenizer_translate_newlines:

tok->input = translated = _PyTokenizer_translate_newlines(str, exec_input, preserve_crlf, tok);

That function performs some allocations. If any of them fails, the tok->done field is set to E_NOMEM:

buf = PyMem_Malloc(needed_length);
if (buf == NULL) {
    tok->done = E_NOMEM;
    return NULL;
}

However, _PyTokenizer_FromUTF8 only checks whether a translated string was returned. If not, it frees tok and returns NULL without calling PyErr_NoMemory() to set the exception state:

if (translated == NULL) {
    _PyTokenizer_Free(tok);
    return NULL;
}

So the NULL propagates up the call chain and is returned from the eval() call with no error set, which triggers the abort in _Py_CheckFunctionResult:

cpython/Objects/call.c

Lines 30 to 44 in 3c298e2

if (result == NULL) {
    if (!_PyErr_Occurred(tstate)) {
        if (callable)
            _PyErr_Format(tstate, PyExc_SystemError,
                          "%R returned NULL without setting an exception",
                          callable);
        else
            _PyErr_Format(tstate, PyExc_SystemError,
                          "%s returned NULL without setting an exception",
                          where);
#ifdef Py_DEBUG
        /* Ensure that the bug is caught in debug mode.
           Py_FatalError() logs the SystemError exception raised above. */
        Py_FatalError("a function returned NULL without setting an exception");
#endif
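
To make the contract concrete, here is a toy Python model of the NULL branch quoted above. It is only an illustration, not CPython's code: `check_function_result` and its parameters are hypothetical names, `None` stands in for NULL, and the debug-only Py_FatalError() call is not modelled.

```python
def check_function_result(result, pending_exception, where):
    """Toy model: a NULL result must come with a pending exception."""
    if result is None:
        if pending_exception is None:
            # The callee broke the contract: NULL with no error set.
            raise SystemError(
                f"{where} returned NULL without setting an exception"
            )
        # Normal failure path: propagate the exception the callee set.
        raise pending_exception
    return result


# The situation described in this report: NULL and no error set.
try:
    check_function_result(None, None, "eval")
except SystemError as exc:
    contract_violation = str(exc)

# What a correctly reported allocation failure would look like.
try:
    check_function_result(None, MemoryError(), "eval")
except MemoryError:
    reported_as_memory_error = True
```

In this model the first call ends in SystemError, which corresponds to the behaviour seen on a non-debug build, while the second surfaces as MemoryError.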

It seems to me that the fix is to call PyErr_NoMemory() in _PyTokenizer_FromUTF8 when tok->done ends up as E_NOMEM, but I'm no expert here and am not sure whether there are ramifications, given that utf8_tokenizer.c doesn't do much with Python objects.
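
The proposed change can be sketched as a small Python model of the control flow. This is an assumption-laden illustration rather than a patch: every name below (`ToyTok`, `ToyErrorState`, `toy_translate_newlines`, `toy_from_utf8`, the `apply_fix` flag) is hypothetical, and setting `err.pending` merely stands in for what PyErr_NoMemory() would do in C.

```python
from enum import Enum, auto


class Done(Enum):
    """Stand-in for the tokenizer's done codes (values are arbitrary)."""
    OK = auto()
    NOMEM = auto()


class ToyTok:
    def __init__(self):
        self.done = Done.OK


class ToyErrorState:
    """Stand-in for the interpreter's pending-exception slot."""
    def __init__(self):
        self.pending = None


def toy_translate_newlines(tok, allocation_fails):
    if allocation_fails:
        # Mirrors the C code: record E_NOMEM on the tokenizer, return NULL.
        tok.done = Done.NOMEM
        return None
    return "translated"


def toy_from_utf8(err, allocation_fails, apply_fix):
    tok = ToyTok()
    translated = toy_translate_newlines(tok, allocation_fails)
    if translated is None:
        if apply_fix and tok.done is Done.NOMEM:
            # Proposed change: turn E_NOMEM into a pending MemoryError.
            err.pending = MemoryError()
        return None
    return tok


before = ToyErrorState()
toy_from_utf8(before, allocation_fails=True, apply_fix=False)

after = ToyErrorState()
toy_from_utf8(after, allocation_fails=True, apply_fix=True)
```

Without the fix, `before.pending` stays unset even though the function returned nothing, which is the state that trips the SystemError check. With the fix, `after.pending` holds a MemoryError for the caller to propagate.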

Note: there's another allocation just below in _PyTokenizer_FromUTF8 that is much less likely to fail:

tok->encoding = _PyTokenizer_new_string("utf-8", 5, tok);

This could in theory trigger the same problem, but the allocation is tiny, so it is much less likely.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.16.0a0 (heads/main:441af3a, May 21 2026, 15:18:12) [Clang 22.1.5 ]

Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), topic-parser, type-crash (A hard crash of the interpreter, possibly with a core dump)
