[BUG]: heap-buffer-overflow in pythonbuf when handling UTF-8 data #5886

@hgarrereyn

Required prerequisites

What version (or hash if on master) of pybind11 are you using?

e6984c8

Problem description

Hi, there is a potential bug in pythonbuf reachable by providing truncated UTF-8 data to an undersized buffer.

This bug was reproduced on e6984c8.

Description

The pythonbuf class streams output data through an internal buffer whose size (buffer_size) is configurable. It implements some logic to avoid sending incomplete UTF-8 sequences between flushes. However, if the buffer is undersized, this logic can write beyond the bounds of the buffer.
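
To illustrate what "incomplete UTF-8 data" means here, the following is a minimal sketch of how a trailing, unfinished sequence could be detected. The helper name `incomplete_utf8_tail` is hypothetical and this is not pybind11's actual implementation; it only shows the general idea of counting how many bytes at the end of a buffer would have to be held back until the rest of the character arrives.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not pybind11's code): returns the number of trailing
// bytes in [data, data + size) that form an incomplete UTF-8 sequence.
std::size_t incomplete_utf8_tail(const char *data, std::size_t size) {
    assert(data != nullptr || size == 0);
    // A UTF-8 sequence is at most 4 bytes long, so only the last 4 bytes matter.
    const std::size_t max_back = size < 4 ? size : 4;
    for (std::size_t back = 1; back <= max_back; ++back) {
        const unsigned char b = static_cast<unsigned char>(data[size - back]);
        if ((b & 0xC0) == 0x80) {
            continue; // continuation byte: keep looking for the lead byte
        }
        std::size_t expected = 1; // ASCII (or an invalid lead byte)
        if ((b & 0xE0) == 0xC0) {
            expected = 2;
        } else if ((b & 0xF0) == 0xE0) {
            expected = 3;
        } else if ((b & 0xF8) == 0xF0) {
            expected = 4;
        }
        // Incomplete only if the lead byte promises more bytes than are present.
        return expected > back ? back : 0;
    }
    return 0; // only continuation bytes seen: nothing sensible to hold back
}
```

With this sketch, a lone `'\xE2'` (the lead byte of a 3-byte sequence) yields a tail of 1, which is the situation the testcase in this report creates with a 1-byte buffer.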

Specifically, I believe the core issue is in the overflow logic:

int overflow(int c) override {
    if (!traits_type::eq_int_type(c, traits_type::eof())) {
        *pptr() = traits_type::to_char_type(c);
        pbump(1);
    }
    return sync() == 0 ? traits_type::not_eof(c) : traits_type::eof();
}

In particular, this function may be invoked when the streambuf is full, so unconditionally writing to *pptr() is a dangerous pattern that could write out of bounds.
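
As a rough illustration of a safer pattern, here is a minimal, self-contained sketch of a streambuf whose overflow() makes room before storing the character. The class `guarded_buf` and the helper `write_bytes` are hypothetical stand-ins (the sink is a std::string rather than a Python object), and the sketch deliberately omits the UTF-8 remainder handling, so it is not a proposed patch. A real fix would also have to decide what happens when a held-back remainder fills the whole buffer, for example by enforcing a minimum buffer size.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical stand-in for pythonbuf: flushed bytes are appended to a
// std::string instead of being passed to a Python write() method.
class guarded_buf : public std::streambuf {
public:
    explicit guarded_buf(std::size_t buffer_size)
        : buffer_(buffer_size == 0 ? 1 : buffer_size) {
        // The whole buffer is the put area; overflow() runs only when it is full.
        setp(buffer_.data(), buffer_.data() + buffer_.size());
    }

    const std::string &flushed() const { return sink_; }

protected:
    int_type overflow(int_type c) override {
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            // Make room first: never store through pptr() when the put area is full.
            if (pptr() == epptr() && sync() != 0) {
                return traits_type::eof();
            }
            assert(pptr() < epptr());
            *pptr() = traits_type::to_char_type(c);
            pbump(1);
            return c;
        }
        return sync() == 0 ? traits_type::not_eof(c) : traits_type::eof();
    }

    int sync() override {
        sink_.append(pbase(), pptr());
        // Reset the put area so that pptr() == pbase() again.
        setp(buffer_.data(), buffer_.data() + buffer_.size());
        return 0;
    }

private:
    std::vector<char> buffer_;
    std::string sink_;
};

// Hypothetical driver: writes `bytes` through a guarded_buf of the given size.
std::string write_bytes(std::size_t buffer_size, const std::string &bytes) {
    guarded_buf buf(buffer_size);
    std::ostream os(&buf);
    os.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    os.flush();
    return buf.flushed();
}
```

Calling `write_bytes(1, "abc")` forces overflow() to run for every byte after the first, and each call flushes before it stores, so the write never lands past the end of the 1-byte buffer.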

POC

The following testcase demonstrates the bug:

testcase.cpp

#include <pybind11/pybind11.h>
#include <pybind11/iostream.h>

int main() {
    if (!Py_IsInitialized()) Py_Initialize();
    PyGILState_STATE g = PyGILState_Ensure();
    pybind11::object pyostream = pybind11::module_::import("sys").attr("stdout");
    // buffer_size=1 is accepted by the constructor but triggers an overflow later
    pybind11::detail::pythonbuf pb(pyostream, 1);
    PyGILState_Release(g);

    std::ostream os(&pb);
    // Emit an incomplete UTF-8 sequence split across writes to exercise utf8 remainder logic

    // This sequence crashes
    if (1) {
        os.put('\xE2');
        os.flush();
        os.put('\x80'); // ASan: heap-buffer-overflow in pythonbuf::overflow
        os.flush();
    }

    // This sequence does not crash
    if (0) {
        os.put('a');
        os.flush();
        os.put('b');
        os.flush();
    }

    return 0;
}

stdout

=================================================================
==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5020000003b1 at pc 0x555dd5c7e9b5 bp 0x7ffce660f1b0 sp 0x7ffce660f1a8
WRITE of size 1 at 0x5020000003b1 thread T0
    #0 0x555dd5c7e9b4 in pybind11::detail::pythonbuf::overflow(int) /fuzz/install/include/pybind11/iostream.h:49:21
    #1 0x7f7d89ba9269 in std::ostream::put(char) (/lib/x86_64-linux-gnu/libstdc++.so.6+0x13c269) (BuildId: e72c155b714bc42a767ec9c0dd94589110e5b42f)
    #2 0x555dd5c031a6 in main /fuzz/testcase.cpp:19:12
    #3 0x7f7d89750d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #4 0x7f7d89750e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #5 0x555dd5b27de4 in _start (/fuzz/test+0x35de4) (BuildId: e9cb09f22c5440d5b232e234a3bc1b5edf7930bb)

0x5020000003b1 is located 0 bytes after 1-byte region [0x5020000003b0,0x5020000003b1)
allocated by thread T0 here:
    #0 0x555dd5c00d0d in operator new[](unsigned long) (/fuzz/test+0x10ed0d) (BuildId: e9cb09f22c5440d5b232e234a3bc1b5edf7930bb)
    #1 0x555dd5c055be in pybind11::detail::pythonbuf::pythonbuf(pybind11::object const&, unsigned long) /fuzz/install/include/pybind11/iostream.h:121:43
    #2 0x555dd5c03126 in main /fuzz/testcase.cpp:9:33
    #3 0x7f7d89750d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

SUMMARY: AddressSanitizer: heap-buffer-overflow /fuzz/install/include/pybind11/iostream.h:49:21 in pybind11::detail::pythonbuf::overflow(int)
Shadow bytes around the buggy address:
  0x502000000100: fa fa 01 fa fa fa 01 fa fa fa fd fa fa fa fd fd
  0x502000000180: fa fa fd fd fa fa fd fa fa fa fd fd fa fa 01 fa
  0x502000000200: fa fa 01 fa fa fa 00 07 fa fa fd fd fa fa fd fd
  0x502000000280: fa fa fd fd fa fa fd fd fa fa 01 fa fa fa 06 fa
  0x502000000300: fa fa 00 00 fa fa 06 fa fa fa 00 00 fa fa fd fd
=>0x502000000380: fa fa 01 fa fa fa[01]fa fa fa 00 fa fa fa 00 00
  0x502000000400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000000480: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000000500: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000000580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000000600: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1==ABORTING
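
For readers less familiar with ASan output, the shadow dump above can be checked by hand. The sketch below is only a worked calculation using the numbers from this report, with hypothetical helper names: each row label is an application address, each shadow byte covers 8 application bytes, and a shadow value of 01 means only the first byte of that 8-byte granule is addressable.

```cpp
#include <cstdint>

// Each ASan shadow byte describes one 8-byte granule of application memory.
constexpr std::uint64_t kGranule = 8;

// Index of the shadow byte for `addr` within a dump row starting at `row_start`.
constexpr std::uint64_t shadow_index(std::uint64_t row_start, std::uint64_t addr) {
    return (addr - row_start) / kGranule;
}

// Shadow value 0 means all 8 bytes are addressable; a value k in 1..7 means
// only the first k bytes are. Other values (fa, fd, ...) mark poisoned memory.
constexpr bool addressable(std::uint8_t shadow, std::uint64_t addr) {
    return shadow == 0 || (shadow < kGranule && (addr % kGranule) < shadow);
}

// The faulting address falls on shadow byte index 6 of the row marked "=>",
// which is the bracketed [01] entry.
static_assert(shadow_index(0x502000000380, 0x5020000003b1) == 6, "row index");
// The first byte of the 1-byte allocation is valid ...
static_assert(addressable(0x01, 0x5020000003b0), "buffer byte 0");
// ... and the byte right after it, where overflow() wrote, is not.
static_assert(!addressable(0x01, 0x5020000003b1), "one past the end");
```

This agrees with the report line "0x5020000003b1 is located 0 bytes after 1-byte region": the write landed exactly one byte past the start of the 1-byte buffer.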

stderr


Steps to Reproduce

The crash was triaged with the following Dockerfile:

Dockerfile

# Ubuntu 22.04 with some packages pre-installed
FROM hgarrereyn/stitch_repro_base@sha256:3ae94cdb7bf2660f4941dc523fe48cd2555049f6fb7d17577f5efd32a40fdd2c

RUN git clone https://github.com/pybind/pybind11.git /fuzz/src && \
    cd /fuzz/src && \
    git checkout e6984c805ec09c0e5f826e3081a32f322a6bfe63 && \
    git submodule update --init --remote --recursive

ENV LD_LIBRARY_PATH=/fuzz/install/lib
ENV ASAN_OPTIONS=hard_rss_limit_mb=1024:detect_leaks=0

RUN echo '#!/bin/bash\nexec clang-17 -fsanitize=address -O0 "$@"' > /usr/local/bin/clang_wrapper && \
    chmod +x /usr/local/bin/clang_wrapper && \
    echo '#!/bin/bash\nexec clang++-17 -fsanitize=address -O0 "$@"' > /usr/local/bin/clang_wrapper++ && \
    chmod +x /usr/local/bin/clang_wrapper++

RUN apt-get update && apt-get install -y --no-install-recommends \
    python3-dev python3-minimal cmake ninja-build libeigen3-dev \
    && rm -rf /var/lib/apt/lists/*

ENV CC=clang_wrapper \
    CXX=clang_wrapper++

WORKDIR /fuzz/src

RUN cmake -S . -B build -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/fuzz/install \
    -DPYBIND11_TEST=OFF

RUN cmake --build build --target install

Build Command

clang++-17 -fsanitize=address -g -O0 -o /fuzz/test /fuzz/testcase.cpp -I/fuzz/install/include -I/usr/include/python3.10 -I/usr/include/eigen3 -L/usr/lib/x86_64-linux-gnu -lpython3.10 && /fuzz/test

Reproduce

  1. Copy Dockerfile and testcase.cpp into a local folder.
  2. Build the repro image:
docker build . -t repro --platform=linux/amd64
  3. Compile and run the testcase in the image:
docker run \
    -it --rm \
    --platform linux/amd64 \
    --mount type=bind,source="$(pwd)/testcase.cpp",target=/fuzz/testcase.cpp \
    repro \
    bash -c "clang++-17 -fsanitize=address -g -O0 -o /fuzz/test /fuzz/testcase.cpp -I/fuzz/install/include -I/usr/include/python3.10 -I/usr/include/eigen3 -L/usr/lib/x86_64-linux-gnu -lpython3.10 && /fuzz/test"


Additional Info

This testcase was discovered by STITCH, an autonomous fuzzing system. All reports are reviewed manually (by a human) before submission.

Reproducible example code


Is this a regression? Put the last known working version here if it is.

Not a regression
