ISPN-6869 Avoid deadlock when stopping DefaultCacheManager #4459
Conversation
```diff
@@ -134,7 +136,8 @@
    private final CacheContainerStats stats;
    private final ConfigurationManager configurationManager;

    @GuardedBy("this")
    private final ReadWriteLock stoppingLock = new ReentrantReadWriteLock();
    @GuardedBy("stoppingLock")
    private boolean stopping;
```
Just thinking - do we need both - a lock and a flag?
yes, you have to synchronize on something to change the flag (before, it was the DefaultCacheManager instance, which led to a deadlock), and without the flag, just blocking on the lock would result in even more deadlocks
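For illustration, here is a minimal, self-contained sketch of the pattern in the diff above: a `ReentrantReadWriteLock` guards a `stopping` flag so that stopping no longer synchronizes on the manager instance itself. The class and method bodies are hypothetical stand-ins, not Infinispan's actual code:

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StoppableManager {
    private final ReadWriteLock stoppingLock = new ReentrantReadWriteLock();
    private boolean stopping; // guarded by stoppingLock

    // Many threads may look up caches concurrently under the read lock,
    // but only while the manager is not stopping.
    public String getCache(String name) {
        stoppingLock.readLock().lock();
        try {
            if (stopping)
                throw new IllegalStateException("Cache manager is stopping");
            return "cache:" + name; // stand-in for the real lookup
        } finally {
            stoppingLock.readLock().unlock();
        }
    }

    // stop() takes the write lock only long enough to flip the flag,
    // so it never blocks other operations while holding the manager's monitor.
    public boolean stop() {
        stoppingLock.writeLock().lock();
        try {
            if (stopping)
                return false; // another thread is already stopping the manager
            stopping = true;
        } finally {
            stoppingLock.writeLock().unlock();
        }
        // the actual teardown would happen here, outside the lock
        return true;
    }
}
```

The flag is needed in addition to the lock because the teardown itself runs outside the write lock; readers that arrive later still see `stopping == true` and fail fast instead of blocking.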
How about using the [Lock#tryLock(long, TimeUnit)](https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Lock.html#tryLock(long,%20java.util.concurrent.TimeUnit)) method? If this method returns false, we know we have to throw an exception here. WDYT?
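As a sketch of that suggestion (the class and method names are made up, not the PR's code): `tryLock(long, TimeUnit)` gives up after a timeout instead of blocking forever, so the caller can fail fast with an exception rather than deadlock:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockExample {
    private static final Lock lock = new ReentrantLock();

    // Returns true if the critical section ran; false if the lock could
    // not be acquired within the timeout (the caller may then throw).
    public static boolean doWork(long timeoutMs) {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!acquired)
            return false; // timed out: report failure instead of hanging
        try {
            // critical section would go here
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

Note this only converts a deadlock into a timeout error; it does not remove the underlying lock-ordering problem.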
I'm not sure I understand how that would help here. The issue is not about synchronizing on one particular lock (see the JIRA description for more details).
Frankly, I think the synchronization policy in … When starting a new cache (inside …) … As you can see, we use different objects for synchronizing starting caches (…)
I don't think you need a read lock for getting the cache; accessing the CHM is fine. And I would refrain … Nevertheless, I agree that all the synchronization should be executed on a single lock (be it an intrinsic lock or an explicit one), and I would rather use …
@slaskawi @rvansa can you elaborate on why you think one lock can solve the problem? This was the first solution I tried, and it resulted in several other deadlocks in different places. I didn't investigate much further, but it seems to me that it would require changes in several other places (maybe substantial changes), and the risk of introducing new deadlock(s) would IMHO be quite high. Also, synchronizing everything on one single lock (even if it's deadlock free) would very likely result in much worse performance. @slaskawi as for syncing on … @rvansa …
@vjuranek - Could you please tell us more about how you implemented this (ideally share a branch, PR or a diff)? I believe you cannot deadlock when using a single re-entrant lock, so there must be something else going on there... As I suggested - IMO one lock should be OK: a read lock for obtaining a cache and a write lock for wiring it up and killing it (@rvansa prefers component status, whereas I prefer a StampedLock with an upgrade, but frankly - it doesn't matter, it's only a detail). BTW - I believe we should also have @danberindei's opinion on that. I wouldn't like to move this forward without his comment...
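For reference, the `StampedLock` upgrade mentioned above could look roughly like this (a hypothetical example, not code from this PR): `tryConvertToWriteLock` attempts the read-to-write upgrade in place, and on failure the code falls back to releasing the read lock and acquiring the write lock outright:

```java
import java.util.concurrent.locks.StampedLock;

public class UpgradeExample {
    private final StampedLock sl = new StampedLock();
    private int value = 0;

    // Reads under a read stamp; if a write turns out to be needed,
    // tries to upgrade to a write stamp before falling back.
    public int incrementIfZero() {
        long stamp = sl.readLock();
        try {
            while (value == 0) {
                long ws = sl.tryConvertToWriteLock(stamp);
                if (ws != 0L) {           // upgrade succeeded
                    stamp = ws;
                    value = 1;
                    break;
                }
                sl.unlockRead(stamp);     // upgrade failed: retry as a writer
                stamp = sl.writeLock();
            }
            return value;
        } finally {
            sl.unlock(stamp);             // releases whichever stamp we hold
        }
    }
}
```

One caveat with `StampedLock`: unlike `ReentrantReadWriteLock`, it is not reentrant, which matters if the same thread can re-enter the manager while holding a stamp.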
@slaskawi sorry for the late reply, I was on PTO on Thu and Fri. Unfortunately, I don't have my old implementation anymore, as it didn't work (probably not very helpful, but I had some issues with starting a cache from a cache-starting listener - i.e. with StartCacheFromListenerTest - and the core testsuite execution was considerably slower). I was thinking about it a little bit and agree it would probably be doable to have only one lock (adjusting other parts to avoid e.g. …). Anyway, I completely agree we should wait for insight from @danberindei
rebased |
The failure after rebase is fixed now |
Unfortunately I was looking at the wrong place... Of course LGTM! |
@vjuranek Maybe if you added a test case, the alternative solutions would immediately prove wrong.
@rvansa IIRC I tried, but wasn't able to figure out any reliable reproducer... will look into it again
@vjuranek @rvansa use … [1]

[1] http://www.jboss.org/dms/judcon/2013unitedstates/presentations/judcon2013_day3track1session1.pdf
Sorry it took so long to look at this @vjuranek! The deadlock seems to come from the fact that …
I thought this would be a good time to change … The branch also includes a reproducer. It doesn't rely on … WDYT?
Thanks for the review, @danberindei!
@danberindei which PR is replacing this one? AFAICT, the problem is still happening on master |
https://issues.jboss.org/browse/ISPN-6869