-
Notifications
You must be signed in to change notification settings - Fork 5.8k
8198540: Dynalink leaks memory when generating type converters #1918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back attila! A progress list of the required criteria for merging this PR into |
@szegedi To determine the appropriate audience for reviewing this pull request, one or more labels corresponding to different subsystems will normally be applied automatically. However, no automatic labelling rule matches the changes in this pull request. In order to have an "RFR" email sent to the correct mailing list, you will need to add one or more applicable labels manually using the /label pull request command. Applicable Labels
|
/label core-libs |
@szegedi |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small suggestion using MethodHandles.empty
instead of MethodHandles.constant().asType().dropArguments()
.
Co-authored-by: Johannes Kuhn <DasBrain@users.noreply.github.com>
Hi Attila, This looks good. If I understand correctly, your
...with that you could get rid of the intermediary BiClassValues object. It would be replaced with a ConcurrentHashMap subclass that would contain a field to hold the cached ClassLoader of the c1/c2. One de-reference less... As for the main logic, it would be similar to what you have now. Or it could be different. I wonder what is the performance of canReferenceDirectly(). If you used SharedSecrets to obtain a ClassLoader of a Class without security checks (and thus overhead), you could perhaps evaluate the canReferenceDirectly() before lookups so you would not needlessly trigger initialization of ClassValue(s) that don't need to get initialized. WDYT? |
… relationships in a doPrivileged block
Hey Peter, thank you for the most thoughtful review! You understood everything right. Your approach looks mostly equivalent to mine, with bit of different tradeoffs. I'm indeed using I'm even thinking I could afford to read the class loader on each use of main logic when a cached value is not available and not even use a CHM subclass to hold it. canReferenceDirectly is basically equivalent to "isDescendantOf" and can be evaluated fairly quickly, unless there's a really long class loader chain. (FWIW, it is also equivalent to In any case, your suggestion nudged me towards a different rework: I also realized that FWIW, your suggestion to use |
final var entries = new ArrayList<>(map.entrySet()); | ||
entries.add(Map.entry(c, value)); | ||
@SuppressWarnings("rawtypes") | ||
final var newEntries = entries.toArray(new Map.Entry[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since map is immutable now, instead of:
var entries = new ArrayList<>(map.entrySet());
entries.add(Map.entry(c, value));
var newEntries = entries.toArray(new Map.Entry[0]);
you could write:
var entries = map.entrySet().toArray(new Map.Entry[map.size() + 1]);
entries[map.size()] = Map.entry(c, value);
private volatile Map<Class<?>, T> forward; | ||
private volatile Map<Class<?>, T> reverse; | ||
private volatile Map<Class<?>, T> forward = Map.of(); | ||
private volatile Map<Class<?>, T> reverse = Map.of(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since maps are immutable now and allow safe publication via data race (i.e. their internal state is composed of final fields), you could relax the forward/referse fields to be plain fields. You would just have to leave them uninitialized then and check for null at some places if BiClassValues could be published via data race since I don't know whether ClassValue guarantees that the associated values are published safely or not. If it does, then you could pre-initialize the field with empty maps as now, if it does not, then even volatile modifiers don't guarantee that unsafely published BiClassValues object can only be seen with the fields initialized. i.e. you could read null from them.
So what do you think of this variant: |
I like it. I originally kept the fields volatile so that we don't observe stale values on IIUC, your changes mostly all flow from the decision to declare the fields as non-volatile; if they were still declared as volatile then it'd be impossible to observe null in them, I think (correct me if I'm wrong; it seems like you thought through this quite thoroughly) as then I don't see how could a volatile read happen before the initial volatile writes as the writes are part of the ClassValues constructor invocation and the reference to the ClassValues object is unavailable externally before the constructor completes. In any case, your approach definitely avoids any of these concerns so I'm inclined to go with it. |
It depends entirely on the guarantees of ClassValue and not on whether the fields are volatile or not. If ClassValue publishes the BiClassValues object via data race then even if the fields in BiClassValues are volatile and initialized in constructor, the publishing write in ClassValue could get reordered before the volatile writes of the fields, so you could observe the fields uninitialized. |
Map.Entry<Class<?>, T>[] entries = map.entrySet().toArray(new Map.Entry[map.size() + 1]); | ||
entries[map.size()] = Map.entry(c, value); | ||
entries[map.size()] = Map.entry(c, newValue); | ||
newMap = Map.ofEntries(entries); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pardon me! A result of introducing new local late in the process.
Mailing list message from Peter Levart on core-libs-dev: On 1/8/21 2:03 PM, Peter Levart wrote:
correction: past -> before To explain: Normal writes that appear in program order before a volatile Regards, Peter |
I checked the code of ClassValue and it can be assumed that it publishes associated values safely. The proof is that it keeps values that it publishes assigned to the final field |
So, are you saying the solution where I kept the fields volatile and initialized them with |
Yes, the pre-initialized fields to Map.of() are safe regardless of whether they are volatile or not (so I would keep them non-volatile to optimize fast-path). Because the BiClassValues instance is published safely to other threads via ClassValue and because you never assign null to the fields later on. |
…ocumenting this way.
Alright, I made a new hybrid of non-volatile fields and never null fields. Hopefully we're getting to the ideal. Again, I really appreciate all the advice and direction you provided here. |
Hello Attila, |
I think they'd be extremely unlikely. The only user of these right now is Dynalink's type converter factory. I can imagine a situation where there's a conversion from a dynamic language runtime's internal "object" type to an application-specific Java interface, or from its internal "function object" type to an app-specific Java SAM type, and for some reason the app-specific types aren't in the same or descendant class loader of the language runtime's loader. |
Well, I was just thinking if it might be more frequent and would benefit from caching the result too. But if it is not, then what you have now is OK. |
@plevart would you be then be okay with approving this PR? Also, @hns or @sundararajana can I maybe get a review from either of you? |
@plevart Unknown command |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good Attila.
@szegedi This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 571 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
@szegedi I've looked at the BiClassValue code and it looks good to me. If that's ok for you I'll review the rest of the PR tomorrow morning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work as ever, Attila. Looks good to me.
Thanks for the review, Hannes! I must credit Peter too with how the final version of the code ended up, he really helped a lot with insightful comments and advice. |
/integrate |
@szegedi Since your change was applied there have been 578 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 8f4c15f. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
(NB: For this leak to occur, an application needs to be either creating and discarding linkers frequently, or creating and discarding class loaders frequently while creating Dynalink type converters for classes in them. Neither is typical usage, although they can occur in dynamic code loading environments such as OSGi.)
I'll explain this one in a bit more detail. Dynalink creates and caches method handles for language-specific conversions between types. Each conversion is thus characterized by a
Class
object representing the type converted from, and aClass
object representing the type converted to, thus they're keyed by a pair of(Class, Class)
. Java API provides the excellentClassValue
class for associating values with a single class, but it lacks the – admittedly more niche – facility for associating a value with a pair of classes. I originally solved this problem using something that looks like aClassValue<Map<Class<?>, T>>
. I say "looks like" because Dynalink has a specializedClassMap
class that works likeMap<Class<?>, T>
but it also goes to some pains to not establish strong references from a class loader to either its children or to unrelated class loaders, for obvious reasons.Surprisingly, the problem didn't lie in there, though. The problem was in the fact that
TypeConverterFactory
usedClassValue
objects that were its non-static anonymous inner classes, and further the values they computed were also of non-static anonymous inner classes. Due to the wayClassValue
internals work, this led to theTypeConverterFactory
objects becoming anchored in a GC root ofClass.classValueMap
through the syntheticthis$0
reference chains in said inner classes… talk about non-obvious memory leaks. (I guess there's a lesson in there about not usingClassValue
as an instance field, but there's a genuine need for them here, so…)… so the first thing I did was I deconstructed all those anonymous inner classes into named static inner classes, and ended up with something that did fix the memory leak (yay!) but at the same time there was a bunch of copying of constructor parameters from outer classes into the inner classes, the whole thing was very clunky with scary amounts of boilerplate. I felt there must exist a better solution.
And it exists! I ended up replacing the
ClassValue<ClassMap<T>>
construct with a ground-up implementation ofBiClassValue<T>
, representation of lazily computed values associated with a pair of types. This was the right abstraction theTypeConverterFactory
code needed all along.BiClassValue<T>
is also not implemented as an abstract class but rather it is a final class and takes aBiFunction<Class<?>, Class<?>, T>
for computation of its values, thus there's never a risk of making it an instance-specific inner class (well, you still need to be careful with the bifunction lambda…) It also has an even better solution for avoiding strong references to child class loaders.And that's why I'm not submitting here the smallest possible point fix to the memory leak, but rather a slightly larger one that architecturally fixes it the right way.
There are two test classes, they test subtly different scenarios.
TypeConverterFactoryMemoryLeakTest
tests that when aTypeConverterFactory
instance becomes unreachable, all method handles it created also become eligible for collection.TypeConverterFactoryRetentionTests
on the other hand test that the factory itself won't prevent class loaders from being collected by virtue of creating converters for them.Progress
Issue
Reviewers
Download
$ git fetch https://git.openjdk.java.net/jdk pull/1918/head:pull/1918
$ git checkout pull/1918