Segfault: 4+ unchecked PyBytes_AsStringAndSize on user read() return uses uninitialized memory #292

@devdanzin

Summary

Several streaming paths call PyBytes_AsStringAndSize(result, &readBuffer, &readSize) on the return value of a caller-supplied source.read() and do not check the return value. When read() returns a non-bytes object (e.g., str, None, bytearray), readBuffer and readSize are left with their prior/uninitialized contents; the next ZSTD_compressStream2 call reads from those addresses. Observed: SEGV on release builds, _Py_CheckFunctionResult abort on debug builds.

Impact

  • Severity: SEGV on release builds; assertion abort on debug builds.
  • Reachability: Any caller-supplied source.read() whose return is not bytes. A trivial wrapper around a text file or a mistakenly-returned bytearray triggers it.
  • Version: 0.25.0 (commit 7a77a75).
  • Platform: Confirmed Linux x86_64 / CPython 3.14 debug; bug is platform-independent.

Reproducers

SEGV on release — via copy_stream:

import zstandard, io

class BadSource:
    def read(self, size):
        return 'not bytes'   # str, not bytes

comp = zstandard.ZstdCompressor()
comp.copy_stream(BadSource(), io.BytesIO())
# Segmentation fault

Assertion abort on debug — via iterator:

import zstandard

class BadSource:
    def read(self, size):
        return 'not bytes'

comp = zstandard.ZstdCompressor()
it = comp.read_to_iter(BadSource())
next(it)
# Fatal Python error: _Py_CheckFunctionResult: a function returned a result with an exception set
# TypeError: expected bytes, str found
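
Because both reproducers terminate the interpreter, it can help to run them in a child process when turning them into regression tests. The following is a minimal sketch, not taken from the zstandard test suite: `run_isolated` is a hypothetical helper name, and the snippet passed to it is a stand-in that raises an ordinary TypeError, which is the behavior one would expect once the return code is checked.

```python
import subprocess
import sys
import textwrap


def run_isolated(source: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run `source` in a fresh interpreter so a crash cannot kill the caller."""
    return subprocess.run(
        [sys.executable, "-c", textwrap.dedent(source)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Stand-in for a fixed build: the child fails with an ordinary exception.
# Replace the body with one of the reproducers above to exercise a real build.
EXPECTED_BEHAVIOR = """
    raise TypeError("expected bytes, str found")
"""

if __name__ == "__main__":
    proc = run_isolated(EXPECTED_BEHAVIOR)
    # An uncaught exception exits with status 1; on POSIX, death by signal
    # (for example SIGSEGV) shows up as a negative return code instead.
    print(proc.returncode, proc.stderr.strip().splitlines()[-1])
```

A regression test could then assert that the return code is 1 and that stderr names TypeError, rather than observing a signal or a fatal-error abort.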

Root cause

PyBytes_AsStringAndSize returns -1 and sets a TypeError when its argument is not a bytes object. On failure, the output parameters readBuffer and readSize are not written. zstandard ignores the return code and passes the stale pointer and length on to ZSTD_compressStream2, which then reads from whatever address happened to be left on the stack or in a register.
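
The "outputs are not written on failure" behavior can be observed from Python without building a C extension. This is a sketch that assumes CPython, where `ctypes.pythonapi` exposes the C API; `probe` and `SENTINEL` are illustrative names, not part of zstandard. The size slot is pre-filled with a sentinel so that any write by the C function would be visible.

```python
import ctypes

# Assumption: CPython, where ctypes.pythonapi exposes the C API.
_as_string_and_size = ctypes.pythonapi.PyBytes_AsStringAndSize
_as_string_and_size.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_char_p),
    ctypes.POINTER(ctypes.c_ssize_t),
]
_as_string_and_size.restype = ctypes.c_int

SENTINEL = -12345  # stands in for "whatever was there before the call"


def probe(obj):
    """Return (raised_type_error, size_after_call) for `obj`."""
    buf = ctypes.c_char_p()
    size = ctypes.c_ssize_t(SENTINEL)
    try:
        _as_string_and_size(obj, ctypes.byref(buf), ctypes.byref(size))
    except TypeError:
        # ctypes.pythonapi surfaces the pending exception for us; C callers
        # have to check the -1 return value themselves.
        return True, size.value
    return False, size.value


if __name__ == "__main__":
    for value in (b"abc", "not bytes", bytearray(b"abc")):
        print(type(value).__name__, probe(value))
```

For the non-bytes inputs the size slot keeps the sentinel. In the C extension the equivalent slot is an uninitialized local, which is why the subsequent read is undefined.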

Affected sites

File                          Line  Function
c-ext/compressor.c            349   copy_stream
c-ext/decompressor.c          202   decompressor_copy_stream
c-ext/compressoriterator.c    98    ZstdCompressorIterator_iternext
c-ext/decompressoriterator.c  134   ZstdDecompressorIterator_iternext

Plus 2 additional sites reported in the full analysis (in read_compressor_input and the decompressor equivalent).

Suggested fix

Mechanical — add the standard error check after every call:

if (PyBytes_AsStringAndSize(result, &readBuffer, &readSize) < 0) {
    Py_DECREF(result);
    goto finally;
}

Optionally: tighten the documented API contract on source.read() to specify that the return must be a bytes object (the C code already expects this). Enforcing it at the boundary would be a small additional cleanup.
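
Until the check lands in the C code, the same contract can be enforced on the caller's side. The wrapper below is a hypothetical helper (`BytesOnlySource` is not part of zstandard) sketching what "enforcing it at the boundary" might look like in pure Python. It uses `isinstance(chunk, bytes)`, which matches what PyBytes_Check accepts: bytes and its subclasses.

```python
import io


class BytesOnlySource:
    """Wrap a file-like object and reject non-bytes read() results.

    Hypothetical helper for illustration; not part of zstandard.
    """

    def __init__(self, source):
        self._source = source

    def read(self, size=-1):
        chunk = self._source.read(size)
        # Mirrors the C-level PyBytes_Check: bytes and its subclasses pass,
        # while str, None, bytearray and memoryview do not.
        if not isinstance(chunk, bytes):
            raise TypeError(
                f"read() must return bytes, not {type(chunk).__name__}"
            )
        return chunk


if __name__ == "__main__":
    ok = BytesOnlySource(io.BytesIO(b"payload"))
    print(ok.read(4))

    bad = BytesOnlySource(io.StringIO("text-mode data"))
    try:
        bad.read(4)
    except TypeError as exc:
        print(exc)
```

Passing `BytesOnlySource(obj)` in place of `obj` to copy_stream or read_to_iter should turn the bad return value into a TypeError raised from read() itself. Whether every affected site propagates that exception cleanly has not been verified here.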

Methodology

Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). SEGV on release verified at the copy_stream site; assertion abort on debug verified at the iterator site. Four sites confirmed via direct reproducer; two more confirmed via static review. Happy to open a PR — the fix is a ~8-line diff.

Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.

Full report

Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605
