Segfault: 12 types dereference NULL on Type.__new__(Type) without __init__() #291

@devdanzin

Summary

PyType_GenericNew creates zero-initialized instances, but zstandard's method implementations assume the C-level pointer fields (cctx, dctx, params, ...) are non-NULL. As a result, T.__new__(T).method(...) segfaults for 12 of the 13 extension types, with nothing more than import zstandard.

Impact

  • Severity: Segfault (some paths hit an assertion abort on debug builds).
  • Reachability: A single line of pure Python on a public type. No ZSTD-library involvement, no unusual input.
  • Version: 0.25.0 (commit 7a77a75); pattern very likely present in prior releases as well.
  • Platform: Confirmed Linux x86_64 / CPython 3.14 debug; the bug is platform-independent.

Reproducers

Each one-liner crashes the interpreter (a segfault, or an assertion abort on debug builds for some paths):

import zstandard as zstd

zstd.ZstdCompressor.__new__(zstd.ZstdCompressor).compress(b'x')
zstd.ZstdDecompressor.__new__(zstd.ZstdDecompressor).decompress(b'x')
zstd.ZstdCompressionParameters.__new__(zstd.ZstdCompressionParameters).estimated_compression_context_size()
zstd.ZstdCompressionWriter.__new__(zstd.ZstdCompressionWriter).write(b'x')
zstd.ZstdDecompressionWriter.__new__(zstd.ZstdDecompressionWriter).write(b'x')
zstd.ZstdCompressionReader.__new__(zstd.ZstdCompressionReader).read(10)
zstd.BufferWithSegmentsCollection.__new__(zstd.BufferWithSegmentsCollection)[0]

Five more types aren't in the top-level namespace but are reachable via type() introspection and crash the same way:

c = zstd.ZstdCompressor()
T = type(c.compressobj())        # ZstdCompressionObj
T.__new__(T).compress(b'x')      # segfault
# likewise: ZstdDecompressionObj, ZstdCompressorIterator,
#           ZstdDecompressorIterator, ZstdCompressionChunker

ZstdDecompressionReader (the 13th affected type) does not crash — read() returns b'' because input.pos == input.size == 0 takes an early-return branch. The instance is still in an invalid state; any future method that doesn't short-circuit on this state will crash.
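
Because every reproducer takes the interpreter down with it, one convenient way to check them all in a single run is to execute each snippet in a fresh child process and inspect how the child ended. The following is a minimal sketch under stated assumptions: the helper names (run_isolated, died_from_signal, report) are made up for illustration, and the "negative return code means killed by a signal" convention holds on POSIX only.

```python
import subprocess
import sys

PREFIX = "import zstandard as zstd\n"

# The direct-namespace one-liners from this issue, keyed by type name.
REPRODUCERS = {
    "ZstdCompressor":
        "zstd.ZstdCompressor.__new__(zstd.ZstdCompressor).compress(b'x')",
    "ZstdDecompressor":
        "zstd.ZstdDecompressor.__new__(zstd.ZstdDecompressor).decompress(b'x')",
    "ZstdCompressionParameters":
        "zstd.ZstdCompressionParameters.__new__(zstd.ZstdCompressionParameters)"
        ".estimated_compression_context_size()",
    "ZstdCompressionWriter":
        "zstd.ZstdCompressionWriter.__new__(zstd.ZstdCompressionWriter).write(b'x')",
    "ZstdDecompressionWriter":
        "zstd.ZstdDecompressionWriter.__new__(zstd.ZstdDecompressionWriter).write(b'x')",
    "ZstdCompressionReader":
        "zstd.ZstdCompressionReader.__new__(zstd.ZstdCompressionReader).read(10)",
    "BufferWithSegmentsCollection":
        "zstd.BufferWithSegmentsCollection.__new__(zstd.BufferWithSegmentsCollection)[0]",
}


def run_isolated(snippet, timeout=60.0):
    """Run `snippet` in a fresh interpreter and return the child's return code."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return proc.returncode


def died_from_signal(returncode):
    """POSIX only: subprocess reports death by signal N as return code -N."""
    return returncode < 0


def report():
    """Print one line per reproducer describing how the child process ended."""
    for name, body in REPRODUCERS.items():
        rc = run_isolated(PREFIX + body)
        if died_from_signal(rc):
            print(f"{name}: killed by signal {-rc}")
        else:
            print(f"{name}: exit status {rc}")
```

Calling report() on an affected build should list a signal for each crashing type. An ordinary Python exception (including an ImportError when zstandard is not installed) shows up as a positive exit status instead, so the two cases are easy to tell apart.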

Root cause

All 13 type specs install {Py_tp_new, PyType_GenericNew}. PyType_GenericNew zero-initializes the instance. tp_init is where real allocation (ZSTD_createCCtx, ZSTD_createDCtx, PyMem_Malloc, ...) happens — skip __init__ and the pointers stay NULL. Methods then do things like ZSTD_CCtx_reset(self->cctx, ...) on NULL.
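
The skipped-__init__ mechanics are not specific to C. A pure-Python analogue shows the same shape, with the missing state surfacing as an AttributeError instead of a NULL dereference. The class below is a hypothetical stand-in written for illustration; it is not zstandard code.

```python
class FakeCompressor:
    """Hypothetical stand-in for a C extension type; not zstandard code."""

    def __init__(self):
        # Plays the role of tp_init: the only place the "context" is created.
        self.cctx = bytearray()

    def compress(self, data):
        # Plays the role of a method that assumes tp_init has already run.
        self.cctx += data
        return bytes(self.cctx)


ok = FakeCompressor()                          # __new__ followed by __init__
bare = FakeCompressor.__new__(FakeCompressor)  # __new__ only; __init__ never runs

try:
    bare.compress(b"x")
    outcome = "no error"
except AttributeError:
    # Python notices the missing attribute; the C methods have no such check.
    outcome = "AttributeError"
```

In the C extension there is no equivalent safety net: the field exists in the struct and simply holds NULL, so the method proceeds to use it.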

Affected types

| Type | File | NULL field / observable | Trigger |
|---|---|---|---|
| ZstdCompressor | c-ext/compressor.c | cctx | .compress(b'x') |
| ZstdDecompressor | c-ext/decompressor.c | dctx (via ensure_dctx) | .decompress(b'x') |
| ZstdCompressionParameters | c-ext/compressionparams.c | params | any accessor |
| ZstdCompressionWriter | c-ext/compressionwriter.c | compressor | .write(b'x') |
| ZstdDecompressionWriter | c-ext/decompressionwriter.c | decompressor | .write(b'x') |
| ZstdCompressionReader | c-ext/compressionreader.c | assertion abort | .read(10) |
| BufferWithSegmentsCollection | c-ext/bufferutil.c | firstElements | [0] |
| ZstdCompressionObj | c-ext/compressobj.c | compressor | via type() |
| ZstdDecompressionObj | c-ext/decompressobj.c | decompressor | via type() |
| ZstdCompressorIterator | c-ext/compressoriterator.c | compressor/reader | via type() |
| ZstdDecompressorIterator | c-ext/decompressoriterator.c | decompressor/reader | via type() |
| ZstdCompressionChunker | c-ext/compressionchunker.c | compressor | via type() |
| ZstdDecompressionReader (silent) | c-ext/decompressionreader.c | zero-state early return | .read(10) returns b'' |

Suggested fix

Two options; Option A is recommended. It yields an object whose methods are safe to call immediately after __new__, and it composes cleanly with the re-init leak finding noted below.

Option A — allocate in tp_new

Replace {Py_tp_new, PyType_GenericNew} with a type-specific tp_new that installs the minimum non-NULL state the methods need:

static PyObject *
ZstdCompressor_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    ZstdCompressor *self = (ZstdCompressor *)type->tp_alloc(type, 0);
    if (!self) {
        return NULL;
    }
    self->cctx = ZSTD_createCCtx();
    if (!self->cctx) {
        Py_DECREF(self);
        PyErr_NoMemory();
        return NULL;
    }
    /* other fields stay zero-initialized; tp_init will configure them. */
    return (PyObject *)self;
}

static PyType_Slot ZstdCompressor_slots[] = {
    {Py_tp_dealloc, ZstdCompressor_dealloc},
    {Py_tp_methods, ZstdCompressor_methods},
    {Py_tp_init,    (initproc)ZstdCompressor_init},
    {Py_tp_new,     ZstdCompressor_new},      /* was: PyType_GenericNew */
    {0, 0},
};

tp_init then only configures the already-allocated context. For types whose tp_init also allocates (ZstdCompressionParameters, BufferWithSegmentsCollection, ...), the allocation moves into tp_new and tp_init is reduced to argument parsing + configuration.
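
The split between "create state" and "configure state" can be modelled in pure Python to make the intended behaviour concrete. This is only a sketch of the idea: the class, its level attribute, and the default value 3 are hypothetical placeholders for illustration, not zstandard's API.

```python
class PreallocatedCompressor:
    """Hypothetical pure-Python model of Option A; not zstandard code."""

    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        # Plays the role of the type-specific tp_new: install the minimum
        # state that every method relies on.
        self.cctx = bytearray()
        self.level = 3  # placeholder default, chosen only for illustration
        return self

    def __init__(self, level=3):
        # Plays the role of tp_init: configuration only, no allocation.
        self.level = level

    def compress(self, data):
        self.cctx += data
        return bytes(self.cctx)


# Skipping __init__ no longer leaves the object without its context.
bare = PreallocatedCompressor.__new__(PreallocatedCompressor)
result = bare.compress(b"x")

# Normal construction still configures the instance through __init__.
configured = PreallocatedCompressor(level=5)
```

The design point is that every code path that produces an instance goes through __new__, so state installed there cannot be bypassed the way __init__ can.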

Option B — guard every method

Add a NULL check at the top of every method that dereferences an initializable field:

static PyObject *
ZstdCompressor_compress(ZstdCompressor *self, PyObject *args)
{
    if (!self->cctx) {
        PyErr_SetString(PyExc_ValueError,
                        "ZstdCompressor not initialized - call __init__");
        return NULL;
    }
    /* ... */
}

This is less invasive, but it requires auditing every method on every affected type, and every new method has to remember the guard.
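
The maintenance cost of per-method guards can be illustrated with a pure-Python model. The decorator name requires_init and the class below are hypothetical and exist only for this sketch; in the C extension the guard would be an explicit NULL check as in the snippet above.

```python
import functools


def requires_init(field):
    """Hypothetical guard: refuse to run a method if `field` was never set."""
    def decorate(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            if getattr(self, field, None) is None:
                raise ValueError(
                    f"{type(self).__name__} not initialized - call __init__")
            return method(self, *args, **kwargs)
        return wrapper
    return decorate


class GuardedCompressor:
    """Hypothetical pure-Python model of Option B; not zstandard code."""

    def __init__(self):
        self.cctx = bytearray()

    @requires_init("cctx")
    def compress(self, data):
        self.cctx += data
        return bytes(self.cctx)

    def flush(self):
        # A later addition whose author forgot the guard.
        return bytes(self.cctx)


bare = GuardedCompressor.__new__(GuardedCompressor)

try:
    bare.compress(b"x")
    message = None
except ValueError as exc:
    message = str(exc)  # the guarded method fails cleanly

try:
    bare.flush()
    unguarded = "no error"
except AttributeError:
    unguarded = "AttributeError"  # the unguarded method still hits missing state
```

The guarded method reports a clean error, while the method added without the guard still reaches the missing state, which is the gap Option A avoids by construction.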

Related / follow-ups

  • This composes with a re-init leak finding (comp.__init__(...) on an already-initialized instance allocates new contexts without freeing the previous ones — ~5.5 KB per call) covered in the full report linked below. Option A cleanly separates "create fresh state" (tp_new) from "configure from kwargs" (tp_init), making it easy to either reject re-init or free-then-recreate.
  • The ZstdCompressionObj / ZstdDecompressionObj / iterator / chunker types are returned from factories and not directly constructable, but their __new__ remains reachable via type(). Same fix applies.

Methodology

Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). All 7 direct-namespace reproducers and 5 type()-introspection reproducers were verified live on CPython 3.14.3 debug build. Happy to open a PR — the Option A change is mechanical across the 13 specs. I'd propose a single PR with the atomic set of changes, but can split into a 2-commit PR (simple-allocation types first, more-complex-init types second) if you prefer.

Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.

Full report

Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605
