Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(fix) incorrect reachability assumption in ResourceLeakDetector #8410

Merged
merged 1 commit into from Oct 24, 2018

Conversation

@almson
Copy link
Contributor

commented Oct 19, 2018

As mentioned here, trackedObject != null gives no guarantee that trackedObject remains reachable. It is an expression that can obviously be moved around by the compiler. The recommended solution is to use synchronized blocks to ensure reachability, because they're never optimized out and releasing the monitor requires the reference to remain reachable. This is discussed in https://docs.oracle.com/javase/9/docs/api/java/lang/ref/Reference.html#reachabilityFence-java.lang.Object- However, I am not an expert with intimate knowledge of the JDK. (If you have one, ask them!)

@netty-bot

This comment has been minimized.

Copy link

commented Oct 19, 2018

Can one of the admins verify this patch?

@almson almson force-pushed the almson:fix/resource-leak-reachability branch from a4a08e0 to 79f5ea4 Oct 20, 2018

// it is unreasonable that anyone else, anywhere, is hold a lock on the trackedObject.
// Therefore, it seems using a synchronization block is possible here, so we will use one!
synchronized (trackedObject)
{

This comment has been minimized.

Copy link
@normanmaurer

normanmaurer Oct 22, 2018

Member

can you move the { up one line to match our coding style ?

This comment has been minimized.

Copy link
@ejona86

ejona86 Oct 22, 2018

Member

Would it be more clear if written this way?

boolean b = close();
synchronized (trackedObject) {}
return b;

I know that looks weird, but it makes it clear we aren't relying on the synchronization when running the close() method.

(Alternatively, maybe with a try-finally, like discussed in the reachabilityFence documentation)

This comment has been minimized.

Copy link
@normanmaurer

normanmaurer Oct 22, 2018

Member

@ejona86 I wonder if it could just optimise it away when it detects that the block is empty ?

This comment has been minimized.

Copy link
@ejona86

ejona86 Oct 23, 2018

Member

Nope, because it still has side-effects. See my comment in https://github.com/grpc/grpc-java/pull/1526/files#r55582622

@normanmaurer
Copy link
Member

left a comment

One nit...

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 22, 2018

@netty-bot test this please

normanmaurer added a commit that referenced this pull request Oct 22, 2018

Fix incorrect reachability assumptions which could lead to false-posi…
…tives during leak detection.

Motivation:

The compiler may re-order the != check and so may garbage collect the tracked object before we call close().

See also #8410

Modifications:

Ensure the compiler can not re-order and so produce a false-positive.

Result:

More correct and robust leak detector with no false-positives.
@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 22, 2018

@carl-mastrangelo @ejona86 @almson this may be an alternative fix for it: #8422

@carl-mastrangelo
Copy link
Member

left a comment

I'm sorry, but I don't understand the bug. It is not clear to me that the JVM's behavior is wrong.

@almson

This comment has been minimized.

Copy link
Contributor Author

commented Oct 23, 2018

@ejona86 I agree that try { close() } finally { synchronized (trackedObject) { } } should work. Empty synchronized can't be optimized away because it sets up a happens-before relationship for the surrounding code (people use them: https://stackoverflow.com/a/31933260/1151521). I suppose we can even stick synchronized (trackedObject) { } into PlatformDependent.

I got as far as I could with the new commit. However, I do not know what I should do in PlatformDependent to reference the Java 9 method.

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 23, 2018

Use reflection to call it 👍

@almson

This comment has been minimized.

Copy link
Contributor Author

commented Oct 23, 2018

@carl-mastrangelo I'm not sure what you're asking. The bug is that the expression trackedObject != null is not a substitute for Reference.reachabilityFence. (it can be moved all the way to the top of the method.) The documentation for reachabilityFence says that synchronized is a substitute. A synchronized block cannot be moved because it creates happens-before semantics. A synchronized block keeps the object reachable because it uses its reference to call releaseMonitor. Optimizing out a synchronize is nearly impossible because the compiler would have to exclude any possibility that anyone else might synchronize on the object. (Actually, the GC can tell that nobody holds a reference to the object and therefore can't synchronize on it, but not the compiler.) In any case, this seems to be the recommended solution even in Java 9.

See: https://docs.oracle.com/javase/9/docs/api/java/lang/ref/Reference.html#reachabilityFence-java.lang.Object-

@almson

This comment has been minimized.

Copy link
Contributor Author

commented Oct 23, 2018

@normanmaurer You know what, trying to call Reference.reachabilityFence is a bad idea. The biggest problem is creating different behavior between JDK versions. There is no real reason to prefer reachabilityFence and its documentation implies that it's not meant to be preferred over synchronize. We don't even know if it will have the intended effect when called through reflection. Let's keep it simple.

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 23, 2018

@almson good point... then lets just put the synchronised stuff in in the leak detector. no need to add something to PlatformDependent.

@almson

This comment has been minimized.

Copy link
Contributor Author

commented Oct 23, 2018

Well, it might be useful in other parts. Does any other code anywhere use finalizer or a weak/phantom reference? Does netty only do leak detection, or is there code somewhere that tries to clean up leaks?

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 23, 2018

@almson I think we only need it here.

@almson

This comment has been minimized.

Copy link
Contributor Author

commented Oct 23, 2018

I see WeakOrderQueue, HashedWheelTimer, PoolArena, and PoolThreadCache have finalizers. I also see other places where WeakReference and SoftReference are used. All of those places might require reachabilityFence. I haven't actually looked at them yet.

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 23, 2018

@almson from my understanding they need not... Let us just focus on the leakdetector in this pr and we can do another one if we really need more changes.

@almson

This comment has been minimized.

Copy link
Contributor Author

commented Oct 23, 2018

Ok, I'm going to save reachabilityFence together with its javadoc, but put it in ResourceLeakDetector.

@almson

This comment has been minimized.

Copy link
Contributor Author

commented Oct 23, 2018

From what I see so far:

  • PoolArena.finalize is useless. Its only purpose is to hurry up the GC. But because it's a finalizer, all it does is delay it! For correctness, PoolArena would need a reachabilityFence in every method. I'm guessing it hasn't yet caused problems because it's so long lived it only becomes unreachable at shutdown.
  • Recycler.WeakOrderQueue.Head.finalize also seems useless. It's supposed to be a GC optimization--but, again, a finalizer delays GC. If clearing the references really is a good optimization, then it will be even better if invoked directly not through finalization. It's not clear how dangerous this finalizer is.
  • HashedWheelTimer.finalize seems to only be part of a logging feature, so is harmless.
  • PoolThreadCache.finalize ok if free is called (but then, it doesn't do anything). Will suffer from premature finalization if free is not called. It should be hooked up to the leak detector instead.
  • There is an ObjectCleaner, but I don't see it being used anywhere. If it is, it's going to cause problems or require reachabilityFence.

The other uses of weak references don't use ReferenceQueues, other than the leak detector.

@normanmaurer
Copy link
Member

left a comment

Two nits

// It should not, because somewhere up the callstack should be a (successful) `trackedObject.release`,
// therefore it is unreasonable that anyone else, anywhere, is holding a lock on the trackedObject.
// (Unreasonable but possible, unfortunately.)
reachabilityFence (trackedObject);

This comment has been minimized.

Copy link
@normanmaurer

normanmaurer Oct 23, 2018

Member

nit: remove space before (

* @see java.lang.ref.Reference#reachabilityFence
*/
private static void reachabilityFence (Object ref) {
if (ref != null)

This comment has been minimized.

Copy link
@normanmaurer

normanmaurer Oct 23, 2018

Member

Add { } to match our coding style

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 23, 2018

@almson let’s focus on the one case here and then we can open a new issue and discuss about the others there. I want to keep the focus on this change

@ejona86
Copy link
Member

left a comment

(Assuming Norman's requested changes are made)

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 23, 2018

@almson for now not... if we need it at the end we can easily do later on.

@almson

This comment has been minimized.

Copy link
Contributor Author

commented Oct 23, 2018

@normanmaurer I don't understand. Now not what?

Should I squash the commits?

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 23, 2018

@almson ignore my comment... I may have read an old comment and tried to answer 🤣

Just squash and thanks again for your help!

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 23, 2018

@netty-bot test this please

@almson almson force-pushed the almson:fix/resource-leak-reachability branch from ba79476 to 76c82bf Oct 23, 2018

@almson

This comment has been minimized.

Copy link
Contributor Author

commented Oct 23, 2018

Np! This was my first github pull request, btw.

try {
return close();
}
finally {

This comment has been minimized.

Copy link
@normanmaurer

normanmaurer Oct 23, 2018

Member

sorry missed this... please merge with previous line..

}
}


This comment has been minimized.

Copy link
@normanmaurer

normanmaurer Oct 23, 2018

Member

please remove one empty line

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 23, 2018

@almson please fix all the check style errors:

11:57:35 [INFO] Starting audit...
11:57:35 /code/common/src/main/java/io/netty/util/ResourceLeakDetector.java:466:13: '}' should be on the same line.
11:57:35 /code/common/src/main/java/io/netty/util/ResourceLeakDetector.java:468: Line is longer than 120 characters (found 121).
11:57:35 /code/common/src/main/java/io/netty/util/ResourceLeakDetector.java:475: trailing whitespace
11:57:35 /code/common/src/main/java/io/netty/util/ResourceLeakDetector.java:476: trailing whitespace
11:57:35 /code/common/src/main/java/io/netty/util/ResourceLeakDetector.java:484: trailing whitespace
11:57:35 /code/common/src/main/java/io/netty/util/ResourceLeakDetector.java:488: trailing whitespace
11:57:35 /code/common/src/main/java/io/netty/util/ResourceLeakDetector.java:489: Line is longer than 120 characters (found 128).
11:57:35 /code/common/src/main/java/io/netty/util/ResourceLeakDetector.java:492: trailing whitespace
11:57:35 /code/common/src/main/java/io/netty/util/ResourceLeakDetector.java:496:47: '(' is preceded with whitespace.
11:57:35 /code/common/src/main/java/io/netty/util/ResourceLeakDetector.java:499:37: '{' is not followed by whitespace.
11:57:35 /code/common/src/main/java/io/netty/util/ResourceLeakDetector.java:499:37: '}' is not preceded with whitespace.
11:57:35 Audit done.

Just run ./mvnw clean package to see if there are any left.

@almson almson force-pushed the almson:fix/resource-leak-reachability branch from 76c82bf to 5deefd5 Oct 23, 2018

@almson

This comment has been minimized.

Copy link
Contributor Author

commented Oct 23, 2018

Updated. Thanks for your patience.

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 23, 2018

@netty-bot test this please

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 23, 2018

@almson you will need to change the name of the method to be able to compile on java9 and later. See the error message on the ci

https://ci.netty.io/job/netty-centos6-java11-prb/152/console

@almson

This comment has been minimized.

Copy link
Contributor Author

commented Oct 23, 2018

Wow, ironic. Should I call it reachabilityFence0? That seems to be popular.

@almson almson force-pushed the almson:fix/resource-leak-reachability branch from 5deefd5 to 357aef6 Oct 23, 2018

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 24, 2018

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 24, 2018

@netty-bot test this please

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 24, 2018

@almson this looks good now... One last thing can you please amend your commit message to match our commit template and force push ?
After this is done I will pull in:

https://netty.io/wiki/writing-a-commit-message.html

@normanmaurer normanmaurer added the defect label Oct 24, 2018

@normanmaurer normanmaurer added this to the 4.1.31.Final milestone Oct 24, 2018

Fix incorrect reachability assumption in ResourceLeakDetector
Motivation:

trackedObject != null gives no guarantee that trackedObject remains reachable. This may cause problems related to premature finalization: false leak detector warnings.
 
Modifications:

Add private method reachabilityFence0 that works on JDK 8 and can be factored out into PlatformDependent. Later, it can be swapped for the real Reference.reachabilityFence.
 
Result:

No false leak detector warnings in future versions of JDK.
@almson

This comment has been minimized.

Copy link
Contributor Author

commented Oct 24, 2018

done

@almson almson force-pushed the almson:fix/resource-leak-reachability branch from 357aef6 to 218563f Oct 24, 2018

@normanmaurer normanmaurer merged commit 1cc692d into netty:4.1 Oct 24, 2018

@normanmaurer

This comment has been minimized.

Copy link
Member

commented Oct 24, 2018

@almson thanks a lot for all your work here!

@normanmaurer normanmaurer self-assigned this Oct 24, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.