New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix-forward "Change synchronization of Preferences.reset() to addr…" #1889
base: main
Are you sure you want to change the base?
Conversation
I suspect there are more deadlocks lurking.
As discussed in-thread: I could also see an issue with thread A waiting on thread B, which is waiting on Preferences.lock held by thread A. I don't see a mechanism by which a thread holding Preferences.lock could spawn a sub-thread, but it's a possibility. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1889 +/- ##
============================================
- Coverage 37.73% 37.72% -0.01%
+ Complexity 20008 19972 -36
============================================
Files 1112 1110 -2
Lines 169930 169809 -121
Branches 35870 35867 -3
============================================
- Hits 64124 64067 -57
+ Misses 95672 95583 -89
- Partials 10134 10159 +25
... and 46 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
A more explicit check that we're not deadlocking any threads:
although, really, you'll notice that the test actually completes... |
Thinking out loud - a ConcurrentMap should manage consistency and it looks like we are already doing that although it's not 100% clear to me. We might want to rethink saveSettingsOnSet. I am not certain it provides a useful service any more. If KoLmafia crashes then you want it to recover as much as you can but how much is KoLmafia crashing these days? I wonder if saveSettingsOnSet should be ignored any time there is preference file I/O going on? I also wonder about putting save on its own thread with a timer and it writes if there have been changes? |
Lint suggested synchronizing on a non local object so I figured I'd try that. There are some file cleanups I am looking at while I try and figure out how to break this or change something that feels like a fix. |
This generates a debug log when we detect a deadlock. Once it's generated a debug log, there's generally no value in generating another one, so we cancel the periodic task.
Various listener updates can cause Swing changes in rendering etc, and since the Swing thread has a dependency on Preferences.lock, we want to avoid a deadlock. WARNING: This may resolve in a noticeable slowdown / visual hiccups / stalls in the GUI, as we're moving noticeably more processing into the Swing thread as a result. This also updates DeadlockDetector to print out the rest of the lines of the stack trace, since 8 lines is often not enough for debugging these deep call stacks.
Hm. I broke something in my latest commit such that it now deadlocks on login. Looking into it... |
I initially used invokeLater in ListenerRegistry, but that caused a bunch of test failures (even if that was fine when running the jar. Pondering how to resolve that conflict... |
I couldn't get this to pass tests without using SwingUtilities.invokeAndWait, except that caused login to hang (albeit not deadlocked in the traditional sense that the thread bean can detect). This change is merely a workaround so I have something committed even if it doesn't fix the Mickey Mantle card bug (which I'm pretty certain now is a deadlock between Swing primitives held by some thread and Preferences.lock held by the Swing dispatch thread).
Windows, GitHub had as failure. It is probably a leakage issue since the test basically resets everything and confirms nothing is there. Tests passed on my Windows machine. Going to build from the PR and see if I can burn turns. |
That's not a leakage so much as a change of behavior. I had shifted MoodManager.updateFromPreferences() into SwingUtilities.invokeLater, which meant we were no longer synchronizing on that method call completing. I doubt I'll have the brainpower to rework this logic before the weekend, but I promise I haven't forgotten about it! |
https://github.com/kolmafia/kolmafia/actions/runs/5895179238/job/15995387554?pr=1889 Concurrent Modification exception on MacOS. Might be interesting. |
I need to review this. I had my first ever pref wipe after a timein a week ago. Fortunately, I write prefs after every set, and have automatic system backups every 15 minutes, so I manually restored everything. Interesting how many obsolete/removed prefs I had. 😀 I thought heeheehee and you were going to work this out and release it. |
I never got to the point where I had a possible fix and I stopped trying when @heeheehee-kolmafia said they had an idea. I then went down the maximizer not equipping weapon rabbit hole. Since I could pretty reliably get this code to freeze when VMF used the Deck I figured it is not ready. Yours is the first report I have heard of wiped preferences in several months which also kept it on my back burner. |
P.S. - I am convinced the test failures are due to timing or sequencing but I would love to understand why the new code causes a failure that other code does not. Probably should add that to the list. |
A side issue with this PR has been the intermittent test failures. The question has been asked whether something in the deadlock test is causing problems. The failures that have been investigated in detail can be attributed to timing. One example is test A writes a file into a directory so the test can run and test B cleans up after itself by deleting a directory and contents. When A starts before B finishes there is a problem. #2281 was pushed to this PR and and #2197 and the tests for both PRs failed with the same custom script test. That reinforces my belief that the test failures are not the direct result of new code in the current PR. Which, in turn, means I would merge this once the deadlock issue was demonstrably solved. |
…ess concurrency issue between timein() and Preferences.saveToFile() (#1811)" (#1886)"
This reverts commit dfcf477.
First step for a fix-forward.