Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyThreadState_Delete: invalid tstate #33586

Closed
gvanrossum opened this issue Dec 13, 2000 · 6 comments
Closed

PyThreadState_Delete: invalid tstate #33586

gvanrossum opened this issue Dec 13, 2000 · 6 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@gvanrossum
Copy link
Member

BPO 225673
Nosy @gvanrossum, @tim-one

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = 'https://github.com/gvanrossum'
closed_at = <Date 2001-01-23.01:48:56.000>
created_at = <Date 2000-12-13.16:25:14.000>
labels = ['interpreter-core']
title = 'PyThreadState_Delete: invalid tstate'
updated_at = <Date 2001-01-23.01:48:56.000>
user = 'https://github.com/gvanrossum'

bugs.python.org fields:

activity = <Date 2001-01-23.01:48:56.000>
actor = 'gvanrossum'
assignee = 'gvanrossum'
closed = True
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2000-12-13.16:25:14.000>
creator = 'gvanrossum'
dependencies = []
files = []
hgrepos = []
issue_num = 225673
keywords = []
message_count = 6.0
messages = ['2651', '2652', '2653', '2654', '2655', '2656']
nosy_count = 2.0
nosy_names = ['gvanrossum', 'tim.peters']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue225673'
versions = []

@gvanrossum
Copy link
Member Author

I am working on a simple couroutine/generator package using threads,
to prototype the API. It seems to be working fine, except it is
exposing a hard-to-find bug in the threadstate code. The following
script[] contains the API implementation and a simple example based on
Tim's "fringe()" code. When I run the example, I *sometimes
get:

Segmentation fault

but *sometimes* I get:

Fatal Python error: PyThreadState_Delete: invalid tstate
Aborted

and *sometimes* it succeeds. If I uncomment the raw_input("Exit?")
line at the end I never get an error. The error behavior seems very
fickle: making almost arbitrary changes to the code can trigger it or
make it go away. When I run it under gdb, I cannot reproduce the
problen, ever. (Haven't I heard this before?)

The only clue is the fatal error message: it seems to be a race
condition at thread termination. But how to debug this?


[*] I'm not including the script here.
I can mail it to interested parties though. For my own reference:
Subject: [Pycabal] Mysterious thread bug
To: <cabal>
Date: Thu, 16 Nov 2000 16:21:12 -0500

@gvanrossum gvanrossum self-assigned this Dec 13, 2000
@gvanrossum gvanrossum added interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Dec 13, 2000
@tim-one
Copy link
Member

tim-one commented Dec 13, 2000

I was never able to provoke a problem on Windows using Guido's script, so changed Group to Platform-specific and added "(Linux only?)" to Summary. Here's the script; assigned to Greg under the hope he can provoke a problem:

import thread

class EarlyExit(Exception):
    pass

class main_coroutine:

    def __init__(self):
        self.id = 0
        self.caller = None
        self.value = None
        self.lock = thread.allocate_lock()
        self.lock.acquire()
        self.done = 0

    def __call__(self, value=None):
        cur = current()
        assert cur is not self
        self.caller = cur
        self.value = value
        self.lock.release()
        cur.lock.acquire()
        if self.done:
            raise EarlyExit
        return cur.value

all_coroutines = {thread.get_ident(): main_coroutine()}

def current():
    return all_coroutines[thread.get_ident()]

def suspend(value=None):
    cur = current()
    caller = cur.caller
    assert caller and caller is not cur
    caller.value = value
    caller.lock.release()
    cur.lock.acquire()
    return cur.value

nextid = 1

class coroutine(main_coroutine):

    def __init__(self, func, *args):
        global nextid
        self.id = nextid
        nextid = nextid + 1
        self.caller = current()
        boot = thread.allocate_lock()
        boot.acquire()
        thread.start_new_thread(self.run, (boot, func, args))
        boot.acquire()

    def run(self, boot, func, args):
        me = thread.get_ident()
        all_coroutines[me] = self
        self.lock = thread.allocate_lock()
        self.lock.acquire()
        self.done = 0
        boot.release()
        self.lock.acquire()
        if self.value:
            print "Warning: initial value %s ignored" % `value`
        try:
            apply(func, args)
        finally:
            del all_coroutines[me]
            self.done = 1
            self.caller.lock.release()

def fringe(list):
    tl = type(list)
    for item in list:
        if type(item) is tl:
            fringe(item)
        else:
            suspend(item)

def printinorder(list):
    c = coroutine(fringe, list)
    try:
        while 1:
            print c(),
    except EarlyExit:
        pass
    print

if __name__ == '__main__':
    printinorder([1,2,3])
    l = [1,2,[3,4,[5],6]]
    printinorder(l)
    #raw_input("Exit?")

@tim-one
Copy link
Member

tim-one commented Jan 21, 2001

Guido, since we just fixed *a* thread termination problem (premature clearing of "initialized" in pythonrun.c), and I've never seen this fail on Windows, reassigning to you to see whether-- however unlikely it may seem --this problem has gone away by magic now.

@tim-one
Copy link
Member

tim-one commented Jan 21, 2001

Let's pretend:

threadmodule.c/t_bootstrap finishes running the user's spawned-thread code, PyEval_ReleaseThread(tstate) releases the global lock, then PyThreadState_Delete(tstate) is called to unlink tstate from the tstate->interp->tstate_head chain.

But the thread swaps out at that point, and the main thread resumes executing. It's got nothing left to do, so it gets into pythonrun.c/Py_Finalize() quickly, and soon enough calls PyInterpreterState_Delete(interp) from there. That calls zapthreads(). zapthreads calls PyThreadState_Delete(ts) on *every* threadstate ts in the interp->tstate_head chain.

If you're with me so far, the other thread still hasn't called PyThreadState_Delete on *its* threadstate, and the comment in zapthreads is semi-prophetic <wink>:

/* No need to lock the mutex here because this should
only happen when the threads are all really dead
(XXX famous last words). */

But the problem is not that the mutex isn't locked, it's that the other thread is still going to try deleting its tstate again *later* (the precise cause of an "invalid tstate" error msg): other threads aren't really dead, and AFAICT there's actually no reason to believe they *should* be dead at this point (other than luck).

Anyway, if this is right, we have two threads battling over who's going to delete a single tstate, and if the main thread gets in first, the other thread is certain to raise an error (except that, at least on Windows, if the main thread manages to exit before the other thread gets that far, the other thread will be killed off quietly in mid-stream by the OS; since Linux threads seem to be indistinguishable from Linux processes, I bet they run some pthreads emulation layer in user space that *may* take a fair amount of time to kill off child threads when the parent goes away).

Waddya think? Explains everything and solves nothing <wink>.

@gvanrossum
Copy link
Member Author

A likely story indeed! With the current CVS, the test script fails every time. If I comment out the call to PyThreadState_Delete() from t_bootstrap, it runs fine every time.

Suggestion: change PyThreadState_Delete() so that it can be called with the interpreter lock *held*, and in that case it should "atomically" delete the tstate object and release the lock.

I'll see if I can come up with a patch.

@gvanrossum
Copy link
Member Author

Fixed by introducing a new API: PyThreadState_DeleteCurrent().

AFAICT this is not Unix-only, but whether the behavior triggers depends on details of the thread implementation, so I cleared the platform-specific group and removed "(unix only?)" from subject.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)
Projects
None yet
Development

No branches or pull requests

2 participants