PyThreadState_Delete: invalid tstate #33586

gvanrossum · 2000-12-13T16:25:14Z

BPO	225673
Nosy	@gvanrossum, @tim-one

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/gvanrossum'
closed_at = <Date 2001-01-23.01:48:56.000>
created_at = <Date 2000-12-13.16:25:14.000>
labels = ['interpreter-core']
title = 'PyThreadState_Delete: invalid tstate'
updated_at = <Date 2001-01-23.01:48:56.000>
user = 'https://github.com/gvanrossum'

bugs.python.org fields:

activity = <Date 2001-01-23.01:48:56.000>
actor = 'gvanrossum'
assignee = 'gvanrossum'
closed = True
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2000-12-13.16:25:14.000>
creator = 'gvanrossum'
dependencies = []
files = []
hgrepos = []
issue_num = 225673
keywords = []
message_count = 6.0
messages = ['2651', '2652', '2653', '2654', '2655', '2656']
nosy_count = 2.0
nosy_names = ['gvanrossum', 'tim.peters']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue225673'
versions = []

gvanrossum · 2000-12-13T16:25:14Z

I am working on a simple couroutine/generator package using threads,
to prototype the API. It seems to be working fine, except it is
exposing a hard-to-find bug in the threadstate code. The following
script[] contains the API implementation and a simple example based on
Tim's "fringe()" code. When I run the example, I *sometimes get:

Segmentation fault

but *sometimes* I get:

Fatal Python error: PyThreadState_Delete: invalid tstate
Aborted

and *sometimes* it succeeds. If I uncomment the raw_input("Exit?")
line at the end I never get an error. The error behavior seems very
fickle: making almost arbitrary changes to the code can trigger it or
make it go away. When I run it under gdb, I cannot reproduce the
problen, ever. (Haven't I heard this before?)

The only clue is the fatal error message: it seems to be a race
condition at thread termination. But how to debug this?

[*] I'm not including the script here.
I can mail it to interested parties though. For my own reference:
Subject: [Pycabal] Mysterious thread bug
To: <cabal>
Date: Thu, 16 Nov 2000 16:21:12 -0500

tim-one · 2000-12-13T16:51:07Z

I was never able to provoke a problem on Windows using Guido's script, so changed Group to Platform-specific and added "(Linux only?)" to Summary. Here's the script; assigned to Greg under the hope he can provoke a problem:

import thread

class EarlyExit(Exception):
    pass

class main_coroutine:

    def __init__(self):
        self.id = 0
        self.caller = None
        self.value = None
        self.lock = thread.allocate_lock()
        self.lock.acquire()
        self.done = 0

    def __call__(self, value=None):
        cur = current()
        assert cur is not self
        self.caller = cur
        self.value = value
        self.lock.release()
        cur.lock.acquire()
        if self.done:
            raise EarlyExit
        return cur.value

all_coroutines = {thread.get_ident(): main_coroutine()}

def current():
    return all_coroutines[thread.get_ident()]

def suspend(value=None):
    cur = current()
    caller = cur.caller
    assert caller and caller is not cur
    caller.value = value
    caller.lock.release()
    cur.lock.acquire()
    return cur.value

nextid = 1

class coroutine(main_coroutine):

    def __init__(self, func, *args):
        global nextid
        self.id = nextid
        nextid = nextid + 1
        self.caller = current()
        boot = thread.allocate_lock()
        boot.acquire()
        thread.start_new_thread(self.run, (boot, func, args))
        boot.acquire()

    def run(self, boot, func, args):
        me = thread.get_ident()
        all_coroutines[me] = self
        self.lock = thread.allocate_lock()
        self.lock.acquire()
        self.done = 0
        boot.release()
        self.lock.acquire()
        if self.value:
            print "Warning: initial value %s ignored" % `value`
        try:
            apply(func, args)
        finally:
            del all_coroutines[me]
            self.done = 1
            self.caller.lock.release()

def fringe(list):
    tl = type(list)
    for item in list:
        if type(item) is tl:
            fringe(item)
        else:
            suspend(item)

def printinorder(list):
    c = coroutine(fringe, list)
    try:
        while 1:
            print c(),
    except EarlyExit:
        pass
    print

if __name__ == '__main__':
    printinorder([1,2,3])
    l = [1,2,[3,4,[5],6]]
    printinorder(l)
    #raw_input("Exit?")

tim-one · 2001-01-21T07:20:11Z

Guido, since we just fixed *a* thread termination problem (premature clearing of "initialized" in pythonrun.c), and I've never seen this fail on Windows, reassigning to you to see whether-- however unlikely it may seem --this problem has gone away by magic now.

tim-one · 2001-01-21T09:26:30Z

Let's pretend:

threadmodule.c/t_bootstrap finishes running the user's spawned-thread code, PyEval_ReleaseThread(tstate) releases the global lock, then PyThreadState_Delete(tstate) is called to unlink tstate from the tstate->interp->tstate_head chain.

But the thread swaps out at that point, and the main thread resumes executing. It's got nothing left to do, so it gets into pythonrun.c/Py_Finalize() quickly, and soon enough calls PyInterpreterState_Delete(interp) from there. That calls zapthreads(). zapthreads calls PyThreadState_Delete(ts) on *every* threadstate ts in the interp->tstate_head chain.

If you're with me so far, the other thread still hasn't called PyThreadState_Delete on *its* threadstate, and the comment in zapthreads is semi-prophetic <wink>:

/* No need to lock the mutex here because this should
only happen when the threads are all really dead
(XXX famous last words). */

But the problem is not that the mutex isn't locked, it's that the other thread is still going to try deleting its tstate again *later* (the precise cause of an "invalid tstate" error msg): other threads aren't really dead, and AFAICT there's actually no reason to believe they *should* be dead at this point (other than luck).

Anyway, if this is right, we have two threads battling over who's going to delete a single tstate, and if the main thread gets in first, the other thread is certain to raise an error (except that, at least on Windows, if the main thread manages to exit before the other thread gets that far, the other thread will be killed off quietly in mid-stream by the OS; since Linux threads seem to be indistinguishable from Linux processes, I bet they run some pthreads emulation layer in user space that *may* take a fair amount of time to kill off child threads when the parent goes away).

Waddya think? Explains everything and solves nothing <wink>.

gvanrossum · 2001-01-23T00:39:06Z

A likely story indeed! With the current CVS, the test script fails every time. If I comment out the call to PyThreadState_Delete() from t_bootstrap, it runs fine every time.

Suggestion: change PyThreadState_Delete() so that it can be called with the interpreter lock *held*, and in that case it should "atomically" delete the tstate object and release the lock.

I'll see if I can come up with a patch.

gvanrossum · 2001-01-23T01:48:56Z

Fixed by introducing a new API: PyThreadState_DeleteCurrent().

AFAICT this is not Unix-only, but whether the behavior triggers depends on details of the thread implementation, so I cleared the platform-specific group and removed "(unix only?)" from subject.

gvanrossum closed this as completed Dec 13, 2000

gvanrossum self-assigned this Dec 13, 2000

gvanrossum added interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Dec 13, 2000

ezio-melotti transferred this issue from another repository Apr 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PyThreadState_Delete: invalid tstate #33586

PyThreadState_Delete: invalid tstate #33586

gvanrossum commented Dec 13, 2000

gvanrossum commented Dec 13, 2000

tim-one commented Dec 13, 2000

tim-one commented Jan 21, 2001

tim-one commented Jan 21, 2001

gvanrossum commented Jan 23, 2001

gvanrossum commented Jan 23, 2001

PyThreadState_Delete: invalid tstate #33586

PyThreadState_Delete: invalid tstate #33586

Comments

gvanrossum commented Dec 13, 2000

gvanrossum commented Dec 13, 2000

tim-one commented Dec 13, 2000

tim-one commented Jan 21, 2001

tim-one commented Jan 21, 2001

gvanrossum commented Jan 23, 2001

gvanrossum commented Jan 23, 2001