-
-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Maint: Fix sporadic QtDims garbage collection failures by converting some stray references to weakrefs #5471
Conversation
I'm not totally sure this is correct, like does the thread always stop once the worker stops ? |
Codecov Report
@@ Coverage Diff @@
## main #5471 +/- ##
==========================================
- Coverage 89.31% 89.30% -0.02%
==========================================
Files 600 600
Lines 50964 51031 +67
==========================================
+ Hits 45518 45571 +53
- Misses 5446 5460 +14
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Oh, btw, before merge, could we rename the PR to something more concrete, e.g., "Fix sporadic QtDims garbage collection failures by converting some stray references to weakrefs"? And try to capture some of the broader narrative in the commit message — @andy-sweet pointed out that longer commit messages (e.g. copying the whole description into the commit message) make it much nicer to investigate issues later, sometimes saving a few round trips between the terminal and the browser. |
See superqt's So the answer to your question is yes, but it might take a little time depending on how busy the main event loop is at the time. |
def __init__(self, dims, axis, reverse=False, fps=10, mode=LoopMode.LOOP): | ||
def __init__( | ||
self, qt_dims, axis, reverse=False, fps=10, mode=LoopMode.LOOP | ||
): | ||
super().__init__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this doesn't have qt_dims
as a parent. And I wonder if this might be an alternative fix to holding a weak reference to QtDims
.
If I understand the QObject lifetime model correctly, I think that if the child object is still alive, then its parent must also be (i.e. the parent effectively owns the child and is controls its lifetime/destruction). So it would not be possible for _on_click
to be executed with an invalid reference to QtDims
.
More generally, I wonder how many of our object lifetime issues are caused by relying on Python references to Qt objects, rather than relying on the QObject hierarchy. Though that discussion is far beyond the scope of this PR :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I guess when we add the QtPlayButton
to the layout, then reparenting will occur. So this doesn't seem like a viable alternative fix. Though I still can't explain why we need the weak reference (i.e. why would self.qt_dims
be something we can't dereference in self._on_click
, since it should be the grandparent of this widget).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's magic for me as well, but usually I found that weakreferences, even if not strictly necessary seem to help to have at least consistant crashes, or make it clearer when doing refactor who is the owner, and simplify reference graph when using objgraph.
Something I haven't done there is also use wekref's callbacks to disconnect things but I don't think it's strictly necessary here.
I tend to prefer to know reliably I'm lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't read your PR description properly, but after reading this message and thinking a bit more, the weak reference is justifiable (regardless of any bugs it does or doesn't cause) because it breaks a cyclic reference between Python objects (specifically, QtDims
and QtPlayButton
with QtDimSliderWidget
in the middle of the cycle). Might be good to add that as an explanatory implementation comment to the code to help the next lost soul.
In general, my guess is there's only so much pyqt/pyside can do when it comes to keeping Python and QObject lifetimes consistent/safe. Would be great if anyone knows of a resource that explains how to use those more safely (other than reading through the code itself).
Anyway, I'm pretty close to approving this. I just need to think through the dropped reference to QtDimSliderWidget.thread
.
I'm not actually sure it fixes things, and I'm not totally sure I understand what is happening. But Yes.
I've started to skip doing long messages for napari because of the squash/rebase, and relied on the PR description. So it's up to you, but the "squash" and "investigate issues later" IMHO contradict each others.
Yes, that was my understanding (and how I came to this fix), but me thread two makeing are and condition race understand very well not. |
…some stray references to weakrefs As far as I can tell this is due to issue with garbage collection issue between the worker and thread. Mostly between the garbage collection and the worker cleanup if the last reference to the thread is set to None and collected while the thread is still active it causes a crash. This has two side: 1) changes to make this crash more reliably, by removing unused reference to the thread, and using weakrefs to make sure we free up things faster. Freeing things faster mean we crash more reliably. 2) actually fix the cleanup to be done after thread termination and not worker termination. Due to 1) we are more confident this fixes things. In particular, the animation thread, has a worker that refers to the dims via a callback/slot. We also remove the local reference to thread, to make sure there is not a standing, unused reference to thread, thus self._animation_thread will be the last standing reference. That is to say we currently have: - worker stop - delete last thread reference (if GC, crash) - Thread stops. Now we have - worker stop - Thread stops. - delete last thread (and worker) reference
I remove the unrelated typing changes and rephrased the commit. |
That's what I mean: the PR title is what by default gets put into the (final, squashed) commit message, so it should be properly descriptive. The description does not get copied by default into the squashed commit but we should copy it manually on merging the PR. Sorry that I wasn't more explicit here: when I said, "capture some of the broader narrative into the commit message", I meant "the PR squash-rebase commit message", and by "we" I meant, whoever pushes the merge button. Ironically @Carreau it was one of your commits that gave rise to this discussion — it had a wonderful long description, and it could be seen in the final squashed history because it was the only commit in that PR. In that case, GitHub copies the full commit message into the squashed commit message. |
Yeah, imho this is a GH ui issue, I think that each GH PR should have a shared editable area that will reflect the final merge/squash message. Maybe in the diff view ? So that this information is also part of the Review Process. |
Looks like this might be configurable, so that the default squashed merge commit message is the PR title + description? https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/configuring-commit-squashing-for-pull-requests That way the shared UI in GitHub is the PR title and description, and we don't have to rely on the merger to remember to copy the text from there (which I've been doing occasionally). They may still have to remind the author to keep the title/description up-to-date though, as here. Unfortunately, I don't have access to napari/napari settings - I'm assuming you do @jni? |
Life changing discovery @andy-sweet!!! ❤️🔥 I have changed it now to default to copying title and description. 🚀 Not quite as good as @Carreau's suggestion of having the commit message also being subject to review, but much much better than what we've had until now. @Carreau title here is still lackluster... ;) |
I've update the title and first message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll merge after 24 hours unless there are any other objections. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small style remarks. I think that I understand how it works, but I'm not 100% sure
Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
We are in the same boat.
It's one problem with flaky test you can never be sure. One thing I can suggest is to set the
|
My observation is that sometimes, because of network/pypi problems 10 minutes buffer is required, but we still could use 40 minutes cap instead of six hours. |
Opened task to do so as #5488 |
Sure. But this PR offers one explanation of the aborting test (by attempting to dereference a C++ object that has been deleted) and proposes a fix for it. By contrast, I don't see how it could explain a hanging test - would be happy if it does though, especially with an explanation! |
…some stray references to weakrefs (#5471) As far as I can tell this is due to issue with garbage collection issue between the worker and thread. Mostly between the garbage collection and the worker cleanup if the last reference to the thread is set to None and collected while the thread is still active it causes a crash. This has two side: 1) changes to make this crash more reliably, by removing unused reference to the thread, and using weakrefs to make sure we free up things faster. Freeing things faster mean we crash more reliably. 2) actually fix the cleanup to be done after thread termination and not worker termination. Due to 1) we are more confident this fixes things. In particular, the animation thread, has a worker that refers to the dims via a callback/slot. We also remove the local reference to thread, to make sure there is not a standing, unused reference to thread, thus self._animation_thread will be the last standing reference. That is to say we currently have: - worker stop - delete last thread reference (if GC, crash) - Thread stops. Now we have - worker stop - Thread stops. - delete last thread (and worker) reference This should fix some issues with thread/garbage collection/cycle (#5443) Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
…some stray references to weakrefs (#5471) As far as I can tell this is due to issue with garbage collection issue between the worker and thread. Mostly between the garbage collection and the worker cleanup if the last reference to the thread is set to None and collected while the thread is still active it causes a crash. This has two side: 1) changes to make this crash more reliably, by removing unused reference to the thread, and using weakrefs to make sure we free up things faster. Freeing things faster mean we crash more reliably. 2) actually fix the cleanup to be done after thread termination and not worker termination. Due to 1) we are more confident this fixes things. In particular, the animation thread, has a worker that refers to the dims via a callback/slot. We also remove the local reference to thread, to make sure there is not a standing, unused reference to thread, thus self._animation_thread will be the last standing reference. That is to say we currently have: - worker stop - delete last thread reference (if GC, crash) - Thread stops. Now we have - worker stop - Thread stops. - delete last thread (and worker) reference This should fix some issues with thread/garbage collection/cycle (#5443) Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
…some stray references to weakrefs (#5471) As far as I can tell this is due to issue with garbage collection issue between the worker and thread. Mostly between the garbage collection and the worker cleanup if the last reference to the thread is set to None and collected while the thread is still active it causes a crash. This has two side: 1) changes to make this crash more reliably, by removing unused reference to the thread, and using weakrefs to make sure we free up things faster. Freeing things faster mean we crash more reliably. 2) actually fix the cleanup to be done after thread termination and not worker termination. Due to 1) we are more confident this fixes things. In particular, the animation thread, has a worker that refers to the dims via a callback/slot. We also remove the local reference to thread, to make sure there is not a standing, unused reference to thread, thus self._animation_thread will be the last standing reference. That is to say we currently have: - worker stop - delete last thread reference (if GC, crash) - Thread stops. Now we have - worker stop - Thread stops. - delete last thread (and worker) reference This should fix some issues with thread/garbage collection/cycle (#5443) Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
…some stray references to weakrefs (#5471) As far as I can tell this is due to issue with garbage collection issue between the worker and thread. Mostly between the garbage collection and the worker cleanup if the last reference to the thread is set to None and collected while the thread is still active it causes a crash. This has two side: 1) changes to make this crash more reliably, by removing unused reference to the thread, and using weakrefs to make sure we free up things faster. Freeing things faster mean we crash more reliably. 2) actually fix the cleanup to be done after thread termination and not worker termination. Due to 1) we are more confident this fixes things. In particular, the animation thread, has a worker that refers to the dims via a callback/slot. We also remove the local reference to thread, to make sure there is not a standing, unused reference to thread, thus self._animation_thread will be the last standing reference. That is to say we currently have: - worker stop - delete last thread reference (if GC, crash) - Thread stops. Now we have - worker stop - Thread stops. - delete last thread (and worker) reference This should fix some issues with thread/garbage collection/cycle (#5443) Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
…some stray references to weakrefs (#5471) As far as I can tell this is due to issue with garbage collection issue between the worker and thread. Mostly between the garbage collection and the worker cleanup if the last reference to the thread is set to None and collected while the thread is still active it causes a crash. This has two side: 1) changes to make this crash more reliably, by removing unused reference to the thread, and using weakrefs to make sure we free up things faster. Freeing things faster mean we crash more reliably. 2) actually fix the cleanup to be done after thread termination and not worker termination. Due to 1) we are more confident this fixes things. In particular, the animation thread, has a worker that refers to the dims via a callback/slot. We also remove the local reference to thread, to make sure there is not a standing, unused reference to thread, thus self._animation_thread will be the last standing reference. That is to say we currently have: - worker stop - delete last thread reference (if GC, crash) - Thread stops. Now we have - worker stop - Thread stops. - delete last thread (and worker) reference This should fix some issues with thread/garbage collection/cycle (#5443) Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
…some stray references to weakrefs (#5471) As far as I can tell this is due to issue with garbage collection issue between the worker and thread. Mostly between the garbage collection and the worker cleanup if the last reference to the thread is set to None and collected while the thread is still active it causes a crash. This has two side: 1) changes to make this crash more reliably, by removing unused reference to the thread, and using weakrefs to make sure we free up things faster. Freeing things faster mean we crash more reliably. 2) actually fix the cleanup to be done after thread termination and not worker termination. Due to 1) we are more confident this fixes things. In particular, the animation thread, has a worker that refers to the dims via a callback/slot. We also remove the local reference to thread, to make sure there is not a standing, unused reference to thread, thus self._animation_thread will be the last standing reference. That is to say we currently have: - worker stop - delete last thread reference (if GC, crash) - Thread stops. Now we have - worker stop - Thread stops. - delete last thread (and worker) reference This should fix some issues with thread/garbage collection/cycle (#5443) Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
…some stray references to weakrefs (#5471) As far as I can tell this is due to issue with garbage collection issue between the worker and thread. Mostly between the garbage collection and the worker cleanup if the last reference to the thread is set to None and collected while the thread is still active it causes a crash. This has two side: 1) changes to make this crash more reliably, by removing unused reference to the thread, and using weakrefs to make sure we free up things faster. Freeing things faster mean we crash more reliably. 2) actually fix the cleanup to be done after thread termination and not worker termination. Due to 1) we are more confident this fixes things. In particular, the animation thread, has a worker that refers to the dims via a callback/slot. We also remove the local reference to thread, to make sure there is not a standing, unused reference to thread, thus self._animation_thread will be the last standing reference. That is to say we currently have: - worker stop - delete last thread reference (if GC, crash) - Thread stops. Now we have - worker stop - Thread stops. - delete last thread (and worker) reference This should fix some issues with thread/garbage collection/cycle (#5443) Co-authored-by: Grzegorz Bokota <bokota+github@gmail.com>
As far as I can tell this is due to issue with garbage collection issue
between the worker and thread. Mostly between the garbage collection and
the worker cleanup if the last reference to the thread is set to None
and collected while the thread is still active it causes a crash.
This has two side:
reference to the thread, and using weakrefs to make sure we free up
things faster. Freeing things faster mean we crash more reliably.
worker termination. Due to 1) we are more confident this fixes
things.
In particular, the animation thread, has a worker that refers to the
dims via a callback/slot.
We also remove the local reference to thread, to make sure there is not
a standing, unused reference to thread, thus self._animation_thread will
be the last standing reference.
That is to say we currently have:
(if GC, crash)
Now we have
This should fix some issues with thread/garbage collection/cycle (#5443)