Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix-forward "Change synchronization of Preferences.reset() to addr…" #1889

Open
wants to merge 250 commits into
base: main
Choose a base branch
from

Conversation

heeheehee-kolmafia
Copy link
Contributor

…ess concurrency issue between timein() and Preferences.saveToFile() (#1811)" (#1886)"

This reverts commit dfcf477.

First step for a fix-forward.

…ess concurrency issue between timein() and Preferences.saveToFile() (#1811)" (#1886)"

This reverts commit dfcf477.

First step for a fix-forward.
I suspect there are more deadlocks lurking.
@heeheehee-kolmafia
Copy link
Contributor Author

As discussed in-thread: I could also see an issue with thread A waiting on thread B, which is waiting on Preferences.lock held by thread A.

I don't see a mechanism by which a thread holding Preferences.lock could spawn a sub-thread, but it's a possibility.

@codecov
Copy link

codecov bot commented Aug 8, 2023

Codecov Report

Attention: Patch coverage is 86.81319% with 12 lines in your changes are missing coverage. Please review.

Project coverage is 37.72%. Comparing base (6897526) to head (aabeb4d).

❗ Current head aabeb4d differs from pull request most recent head 06344ef. Consider uploading reports for the commit 06344ef to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##               main    #1889      +/-   ##
============================================
- Coverage     37.73%   37.72%   -0.01%     
+ Complexity    20008    19972      -36     
============================================
  Files          1112     1110       -2     
  Lines        169930   169809     -121     
  Branches      35870    35867       -3     
============================================
- Hits          64124    64067      -57     
+ Misses        95672    95583      -89     
- Partials      10134    10159      +25     
Files Coverage Δ
...net/sourceforge/kolmafia/session/LoginManager.java 29.09% <100.00%> (+27.23%) ⬆️
.../sourceforge/kolmafia/preferences/Preferences.java 83.19% <86.20%> (-0.56%) ⬇️

... and 46 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6897526...06344ef. Read the comment docs.

@heeheehee-kolmafia
Copy link
Contributor Author

A more explicit check that we're not deadlocking any threads:

import java.lang.management.ManagementFactory;
...
      var threads = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
      assertEquals(threads, null);

although, really, you'll notice that the test actually completes...

@jaadams5
Copy link
Contributor

jaadams5 commented Aug 8, 2023

Thinking out loud - a ConcurrentMap should manage consistency and it looks like we are already doing that although it's not 100% clear to me. We might want to rethink saveSettingsOnSet. I am not certain it provides a useful service any more. If KoLmafia crashes then you want it to recover as much as you can but how much is KoLmafia crashing these days?

I wonder if saveSettingsOnSet should be ignored any time there is preference file I/O going on? I also wonder about putting save on its own thread with a timer and it writes if there have been changes?

@jaadams5
Copy link
Contributor

jaadams5 commented Aug 8, 2023

Lint suggested synchronizing on a non local object so I figured I'd try that. There are some file cleanups I am looking at while I try and figure out how to break this or change something that feels like a fix.

jaadams5 and others added 3 commits August 8, 2023 15:49
This generates a debug log when we detect a deadlock.

Once it's generated a debug log, there's generally no value in generating
another one, so we cancel the periodic task.
Various listener updates can cause Swing changes in rendering etc, and
since the Swing thread has a dependency on Preferences.lock, we want to
avoid a deadlock.

WARNING: This may resolve in a noticeable slowdown / visual hiccups /
stalls in the GUI, as we're moving noticeably more processing into the
Swing thread as a result.

This also updates DeadlockDetector to print out the rest of the lines of
the stack trace, since 8 lines is often not enough for debugging these deep
call stacks.
@heeheehee-kolmafia
Copy link
Contributor Author

Hm. I broke something in my latest commit such that it now deadlocks on login. Looking into it...

@heeheehee-kolmafia
Copy link
Contributor Author

I initially used invokeLater in ListenerRegistry, but that caused a bunch of test failures (even if that was fine when running the jar. Pondering how to resolve that conflict...

I couldn't get this to pass tests without using
SwingUtilities.invokeAndWait, except that caused login to hang (albeit not
deadlocked in the traditional sense that the thread bean can detect).

This change is merely a workaround so I have something committed even if
it doesn't fix the Mickey Mantle card bug (which I'm pretty certain now is
a deadlock between Swing primitives held by some thread and
Preferences.lock held by the Swing dispatch thread).
@jaadams5
Copy link
Contributor

jaadams5 commented Aug 9, 2023

Windows, GitHub had
MoodManagerTest > itShouldHaveNoTriggersWhenThereAreNotAnyDefined() FAILED
org.opentest4j.AssertionFailedError at MoodManagerTest.java:139

as failure. It is probably a leakage issue since the test basically resets everything and confirms nothing is there.

Tests passed on my Windows machine. Going to build from the PR and see if I can burn turns.

@heeheehee-kolmafia
Copy link
Contributor Author

That's not a leakage so much as a change of behavior. I had shifted MoodManager.updateFromPreferences() into SwingUtilities.invokeLater, which meant we were no longer synchronizing on that method call completing.

I doubt I'll have the brainpower to rework this logic before the weekend, but I promise I haven't forgotten about it!

@jaadams5
Copy link
Contributor

https://github.com/kolmafia/kolmafia/actions/runs/5895179238/job/15995387554?pr=1889

Concurrent Modification exception on MacOS. Might be interesting.

@jaadams5 jaadams5 marked this pull request as ready for review February 26, 2024 21:20
@Veracity0
Copy link
Contributor

Veracity0 commented Mar 28, 2024

I need to review this. I had my first ever pref wipe after a timein a week ago. Fortunately, I write prefs after every set, and have automatic system backups every 15 minutes, so I manually restored everything.

Interesting how many obsolete/removed prefs I had. 😀

I thought heeheehee and you were going to work this out and release it.

@jaadams5
Copy link
Contributor

I need to review this. I had my first ever pref wipe after a timein a week ago. Fortunately, I write prefs after every set, and have automatic system backups every 15 minutes, so I manually restored everything.

Interesting how many obsolete/removed prefs I had. 😀

I thought heeheehee and you were going to work this out and release it.

I never got to the point where I had a possible fix and I stopped trying when @heeheehee-kolmafia said they had an idea. I then went down the maximizer not equipping weapon rabbit hole. Since I could pretty reliably get this code to freeze when VMF used the Deck I figured it is not ready. Yours is the first report I have heard of wiped preferences in several months which also kept it on my back burner.

@jaadams5
Copy link
Contributor

P.S. - I am convinced the test failures are due to timing or sequencing but I would love to understand why the new code causes a failure that other code does not. Probably should add that to the list.

@jaadams5
Copy link
Contributor

A side issue with this PR has been the intermittent test failures. The question has been asked whether something in the deadlock test is causing problems. The failures that have been investigated in detail can be attributed to timing. One example is test A writes a file into a directory so the test can run and test B cleans up after itself by deleting a directory and contents. When A starts before B finishes there is a problem. #2281 was pushed to this PR and and #2197 and the tests for both PRs failed with the same custom script test. That reinforces my belief that the test failures are not the direct result of new code in the current PR. Which, in turn, means I would merge this once the deadlock issue was demonstrably solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants