Segfault during garbage collection with GzipFile + failed urllib3 request on 3.12.0 #111049

Closed
basepi opened this issue Oct 19, 2023 · 6 comments
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes topic-IO type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@basepi

basepi commented Oct 19, 2023

Crash report

What happened?

import urllib3
import gzip
import io
import json
import faulthandler

faulthandler.enable()

def test():
    buffer = gzip.GzipFile(fileobj=io.BytesIO(), mode="w", compresslevel=5)
    fileobj = buffer.fileobj  # get a reference to the fileobj before closing the gzip file
    buffer.close()
    data = fileobj.getbuffer()
    headers = {}
    try:
        urllib3.request("POST", "http://127.0.0.1:8200/intake/v2/events")
    except Exception as e:
        print(e)

test()

The above example requires urllib3, so you'll need to install that first.

When the above example is run on Python 3.12.0, it results in a segfault:

HTTPConnectionPool(host='127.0.0.1', port=8200): Max retries exceeded with url: /intake/v2/events (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1053c5280>: Failed to establish a new connection: [Errno 61] Connection refused'))
Fatal Python error: Segmentation fault

Current thread 0x00000001e1499300 (most recent call first):
  Garbage-collecting
  <no Python frame>
[1]    38085 segmentation fault  python test.py

Some weird things:

  • If I remove the urllib3.request, it doesn't segfault.
  • If the urllib3.request succeeds (in this case, if I run the Elastic APM Server locally), it doesn't segfault.
  • If I pull the code out of the function and run it flat, it doesn't segfault 🤯:
import urllib3
import gzip
import io
import json
import faulthandler

faulthandler.enable()

buffer = gzip.GzipFile(fileobj=io.BytesIO(), mode="w", compresslevel=5)
fileobj = buffer.fileobj  # get a reference to the fileobj before closing the gzip file
buffer.close()
data = fileobj.getbuffer()
headers = {}
try:
    urllib3.request("POST", "http://127.0.0.1:8200/intake/v2/events")
except Exception as e:
    print(e)

This is an extremely simplified version of the code where I first saw the segfault. Note that you don't have to pass the data to urllib3.request to cause the issue, and you don't have to write anything to the buffer either (writing to the buffer does not prevent the segfault).

I can reproduce this issue on CPython 3.12.0 (built via pyenv) locally on macOS and on the python:3.12.0 Docker image.

The segfault does not happen on 3.11 and below.

CPython versions tested on:

3.12, 3.11 (only happens on 3.12)

Operating systems tested on:

Linux, macOS

Output from running 'python -VV' on the command line:

Python 3.12.0 (main, Oct 2 2023, 17:34:07) [Clang 15.0.0 (clang-1500.0.40.1)]

@basepi basepi added the type-crash A hard crash of the interpreter, possibly with a core dump label Oct 19, 2023
@JelleZijlstra
Member

Confirmed on macOS. Backtrace in lldb:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x38)
  * frame #0: 0x00000001008dccd4 libpython3.12.dylib`bytesiobuf_releasebuffer + 4
    frame #1: 0x0000000100711a20 libpython3.12.dylib`PyBuffer_Release + 56
    frame #2: 0x0000000100779670 libpython3.12.dylib`mbuf_clear + 68
    frame #3: 0x00000001008b544c libpython3.12.dylib`gc_collect_main + 2156
    frame #4: 0x00000001008b4b48 libpython3.12.dylib`PyGC_Collect + 160
    frame #5: 0x0000000100887ecc libpython3.12.dylib`Py_FinalizeEx + 144
    frame #6: 0x00000001008b3994 libpython3.12.dylib`Py_RunMain + 256
    frame #7: 0x00000001008b43c4 libpython3.12.dylib`pymain_main + 328
    frame #8: 0x00000001008b4464 libpython3.12.dylib`Py_BytesMain + 40
    frame #9: 0x0000000187737f28 dyld`start + 2236

@JelleZijlstra JelleZijlstra added the 3.12 bugs and security fixes label Oct 19, 2023
@JelleZijlstra
Member

On main I instead get:

HTTPConnectionPool(host='127.0.0.1', port=8200): Max retries exceeded with url: /intake/v2/events (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x105df47e0>: Failed to establish a new connection: [Errno 61] Connection refused'))
Exception ignored in: <_io.BytesIO object at 0x10559e350>
BufferError: Existing exports of data: object cannot be re-sized

@basepi
Author

basepi commented Oct 19, 2023

BufferError: Existing exports of data: object cannot be re-sized

I get this same error on 3.12.0 if I grab the fileobj buffer before closing the GzipFile:

def test():
    buffer = gzip.GzipFile(fileobj=io.BytesIO(), mode="w", compresslevel=5)
    data = buffer.fileobj.getbuffer()
    buffer.close()
    headers = {}
    try:
        urllib3.request("POST", "http://127.0.0.1:8200/intake/v2/events")
    except Exception as e:
        print(e)

Traceback (most recent call last):
  File "/Users/basepi/src/cpython_gc/test2.py", line 19, in <module>
    test()
  File "/Users/basepi/src/cpython_gc/test2.py", line 12, in test
    buffer.close()
  File "/Users/basepi/.pyenv/versions/3.12.0/lib/python3.12/gzip.py", line 357, in close
    fileobj.write(self.compress.flush())
BufferError: Existing exports of data: object cannot be re-sized
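
The BufferError above looks like the ordinary guard on BytesIO rather than the crash itself: while a view returned by getbuffer() is alive, the object refuses writes. A minimal sketch of that behaviour, independent of gzip and urllib3 (the variable names are illustrative only, not taken from the report):

```python
import io

f = io.BytesIO(b"abc")
view = f.getbuffer()          # creates an export on the BytesIO object

try:
    f.write(b"more data")     # refused while the export exists
except BufferError as exc:
    error = str(exc)

view.release()                # drop the export explicitly
f.write(b"more data")         # now allowed; position is still 0
print(error)
print(f.getvalue())
```

Once the view is released, the write goes through, which is why ordering getbuffer() after close() avoids this particular error.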

@chgnrdv
Contributor

chgnrdv commented Oct 21, 2023

Minimized repro:

import io

def test():
    f = io.BytesIO()
    buf = f.getbuffer()
    try:
        raise Exception()
    except Exception as e:
        x = e

test()

At finalization, the internal buffer object of f is cleared by a bytesiobuf_clear() call, which sets the source field of that buffer to NULL.

(gdb) r repro.py 
Starting program: /home/radislav/projects/cpython/python repro.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Exception ignored in: <_io.BytesIO object at 0x7ffff77a5940>
BufferError: Existing exports of data: object cannot be re-sized

Breakpoint 1, bytesiobuf_clear (self=0x7ffff777e750) at ./Modules/_io/bytesio.c:1097
1097	{
(gdb) s
1098	    Py_CLEAR(self->source);
(gdb) 

Python then tries to clear the buf object, which holds the same buffer as f. It releases this buffer by calling bytesiobuf_releasebuffer, which dereferences the now-NULL source pointer.

Breakpoint 2, mbuf_clear (self=0x7ffff78d4f50) at Objects/memoryobject.c:139
...
(gdb) n
141	    mbuf_release(self);
(gdb) s
mbuf_release (self=0x7ffff78d4f50) at Objects/memoryobject.c:108
...
117	    PyBuffer_Release(&self->master);
(gdb) s
PyBuffer_Release (view=0x7ffff78d4f70) at Objects/abstract.c:748
...
754	        pb->bf_releasebuffer(obj, view);
(gdb) s
bytesiobuf_releasebuffer (obj=0x7ffff777e750, view=0x7ffff78d4f70) at ./Modules/_io/bytesio.c:1091
1091	    bytesio *b = (bytesio *) obj->source;
(gdb) s
1092	    b->exports--;
(gdb) s

Program received signal SIGSEGV, Segmentation fault.

I'm still not sure how the try...except is involved.
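
One plausible explanation for the try...except part, offered as an assumption and not something confirmed in this thread: storing the exception in a local creates a reference cycle (exception -> traceback -> frame -> locals -> exception), so every local of test() is left to the cyclic garbage collector instead of being freed by reference counting. The sketch below shows only that effect, using a hypothetical Payload class as a stand-in for the BytesIO and memoryview pair so that it does not trigger the crash:

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for the BytesIO/memoryview locals."""


def test():
    payload = Payload()
    ref = weakref.ref(payload)
    try:
        raise Exception()
    except Exception as e:
        x = e  # x -> exception -> traceback -> frame -> locals (x, payload)
    return ref


gc.disable()                      # keep automatic collections out of the picture
ref = test()
alive_before = ref() is not None  # frame is still alive through the cycle
gc.collect()                      # cyclic GC breaks the cycle and frees the locals
alive_after = ref() is not None
gc.enable()
print(alive_before, alive_after)
```

If this reading is right, it would also fit the observation that the flat, module-level version of the script behaves differently, since there the names are module globals and not frame locals caught in such a cycle.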

@chgnrdv
Contributor

chgnrdv commented Oct 22, 2023

I just checked, and I can also reproduce this issue on the main branch (8c689c9) with the minimal repro.
The backtrace is the same:

Exception ignored in: <_io.BytesIO object at 0x7ffff77e2190>
BufferError: Existing exports of data: object cannot be re-sized

Program received signal SIGSEGV, Segmentation fault.
bytesiobuf_releasebuffer (obj=0x7ffff77c6110, view=0x7ffff78d5d30) at ./Modules/_io/bytesio.c:1094
1094	    b->exports--;
(gdb) bt
#0  bytesiobuf_releasebuffer (obj=0x7ffff77c6110, view=0x7ffff78d5d30) at ./Modules/_io/bytesio.c:1094
#1  0x000055555564fe31 in PyBuffer_Release (view=0x7ffff78d5d30) at Objects/abstract.c:804
#2  0x00005555556e179b in mbuf_release (self=<optimized out>) at Objects/memoryobject.c:118
#3  0x00005555556e17d4 in mbuf_clear (self=<optimized out>) at Objects/memoryobject.c:142
#4  0x000055555589fce9 in delete_garbage (tstate=tstate@entry=0x555555c03938 <_PyRuntime+509240>, gcstate=gcstate@entry=0x555555b9e2a8 <_PyRuntime+93864>, 
    collectable=collectable@entry=0x7fffffffdfd0, old=old@entry=0x555555b9e2f0 <_PyRuntime+93936>) at Modules/gcmodule.c:1033
#5  0x00005555558a0244 in gc_collect_main (tstate=tstate@entry=0x555555c03938 <_PyRuntime+509240>, generation=generation@entry=2, 
    n_collected=n_collected@entry=0x7fffffffe048, n_uncollectable=n_uncollectable@entry=0x7fffffffe040, nofail=nofail@entry=0) at Modules/gcmodule.c:1313
#6  0x00005555558a1097 in gc_collect_with_callback (tstate=tstate@entry=0x555555c03938 <_PyRuntime+509240>, generation=generation@entry=2) at Modules/gcmodule.c:1445
#7  0x00005555558a1817 in PyGC_Collect () at Modules/gcmodule.c:2130
#8  0x000055555586ab23 in Py_FinalizeEx () at Python/pylifecycle.c:1920
#9  0x000055555589e6ca in Py_RunMain () at Modules/main.c:709
#10 0x000055555589e719 in pymain_main (args=args@entry=0x7fffffffe130) at Modules/main.c:737
#11 0x000055555589e78e in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:761
#12 0x00005555555d077e in main (argc=<optimized out>, argv=<optimized out>) at ./Programs/python.c:15

@serhiy-storchaka
Member

Another example:

import gc, io
memio = io.BytesIO(b"1234567890")
buf = memio.getbuffer()
a = [buf]
a.append(a)
del memio, buf
del a
gc.collect()
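
For code that has to run on an affected release, one possible way to sidestep the problem is to release the view explicitly before it can end up in a reference cycle. This is a sketch under that assumption, not a fix proposed in this thread; the compress() helper and its parameter are hypothetical names:

```python
import gzip
import io


def compress(payload: bytes) -> bytes:
    """Gzip payload in memory without leaving a live export behind."""
    raw = io.BytesIO()
    with gzip.GzipFile(fileobj=raw, mode="w", compresslevel=5) as gz:
        gz.write(payload)
    # memoryview is a context manager; leaving the block calls release(),
    # so no export on raw survives this function.
    with raw.getbuffer() as view:
        data = bytes(view)
    return data


result = compress(b"hello")
print(len(result))
```

Copying with bytes(view) costs one extra allocation; raw.getvalue() would do the same copy without ever creating a view.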

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 23, 2023
@serhiy-storchaka serhiy-storchaka added topic-IO 3.13 bugs and security fixes labels Oct 23, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Dec 14, 2023
…uffer object (pythonGH-111221)

(cherry picked from commit bb36f72)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue Dec 14, 2023
…buffer object (GH-111221) (GH-113096)

(cherry picked from commit bb36f72)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
corona10 pushed a commit to corona10/cpython that referenced this issue Dec 15, 2023
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024