Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shim frames can be accessed during frame teardown (via tstate->cframe->current_frame) #100126

Closed
jaraco opened this issue Dec 9, 2022 · 14 comments
Assignees
Labels
3.12 bugs and security fixes release-blocker type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@jaraco
Copy link
Member

jaraco commented Dec 9, 2022

Crash report

In jaraco/jaraco.net#5, I've captured a failure that's emerged as Github updated Python 3.12 from a2 to a3 on Linux. The calling code is using ctypes to call libc functions.

I don't yet have a minimal reproducer. I'm unable to replicate the issue locally as I don't yet have Linux with a3.

  • Operating system and architecture: Ubuntu Linux

Linked PRs

@jaraco jaraco added the type-crash A hard crash of the interpreter, possibly with a core dump label Dec 9, 2022
@illia-v
Copy link
Contributor

illia-v commented Dec 17, 2022

Tests of urllib3 detected a segmentation fault on 3.12.0a3 too.
https://github.com/urllib3/urllib3/actions/runs/3720502510/jobs/6310016874

Fatal Python error: Segmentation fault on Linux and Windows fatal exception: access violation on Windows happened in one test function.

The function succeeded on macOS, but Fatal Python error: Bus error happened in a different one.

@illia-v
Copy link
Contributor

illia-v commented Dec 22, 2022

b72014c is the first commit where the segfault in urllib3's tests happens.

@AlexWaygood
Copy link
Member

b72014c is the first commit where the segfault in urllib3's tests happens.

Cc. @brandtbucher

@brandtbucher
Copy link
Member

I'll look into this too. No promises that I'll have it figured out before the holiday weekend, but it'll be my main priority starting next Tuesday.

Not sure if #99110 is related... hard to tell without a reproducer.

@illia-v
Copy link
Contributor

illia-v commented Dec 26, 2022

I'll look into this too. No promises that I'll have it figured out before the holiday weekend, but it'll be my main priority starting next Tuesday.

Thanks!

Not sure if #99110 is related... hard to tell without a reproducer.

It still happens after PR of #99110 is merged.

@brandtbucher
Copy link
Member

brandtbucher commented Dec 28, 2022

Here is a reproducer with only pytest:

# pytest crasher.py

import socket

def test_foo():
    _ = socket.socket()

Indeed, this looks eerily similar to #99729.

@brandtbucher
Copy link
Member

Found the issue. PyEval_GetGlobals returns tstate->cframe->current_frame->f_globals. When tearing a frame down, current_frame may be a shim frame. Shim frames have no globals, so the code crashes.

@brandtbucher
Copy link
Member

Looks like in general, it is not safe for current_frame to be a shim frame. Too many things can break.

@brandtbucher
Copy link
Member

brandtbucher commented Dec 28, 2022

Minimal reproducer, no pytest:

# python crasher.py

import operator

class DelRaises:
    def __del__(self):
        assert False

def f():
    local = DelRaises()

# Call from C, so there is a shim frame above f:
operator.call(f)

@brandtbucher brandtbucher added the 3.12 bugs and security fixes label Dec 29, 2022
@brandtbucher brandtbucher changed the title Regression in 3.12a3 - segfault in interpreter trampoline Shim frames can be accessed during frame teardown (via tstate->cframe->current_frame) Dec 29, 2022
@brandtbucher
Copy link
Member

I'm honestly not sure what to do here.

Perhaps, before tearing down a frame, if our previous frame is a shim frame, we should set tstate->cframe to be cframe.previous. But that's adding code to a perf-sensitive path, and it also makes the interaction with INTERPRETER_EXIT awkward, since it tries to do the same thing.

The best option is probably just to add more while (frame && _PyFrame_IsIncomplete(frame)) {frame = frame->previous;} loops to all of the affected areas. It probably makes sense to just add a helper function for this (_PyThreadState_GetInterpreterFrame?), rather than continuing to sprinkle them everywhere.

Thoughts, @markshannon? (I don't think you're out this week... feel free to ignore if so.)

@brandtbucher
Copy link
Member

Also going to tag as release-blocker (for now).

@Yhg1s
Copy link
Member

Yhg1s commented Jan 9, 2023

@brandtbucher what's the state of this issue (and the attached fix for it)? Should this hold up 3.12.0a4?

@brandtbucher
Copy link
Member

Sorry, just got back from vacation and thought it was merged while I was gone. I'm in a meeting right now, but I have one small review comment to apply, then I'll merge.

If it's not too much trouble, I'd like this to make it into the next release. It's not too hard to trigger using frameworks like pytest, so it would be nice to have non-crashy test coverage of the next alpha out in the wild.

@brandtbucher
Copy link
Member

@Yhg1s, this is done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes release-blocker type-crash A hard crash of the interpreter, possibly with a core dump
Projects
Development

No branches or pull requests

5 participants