-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8336492: Regression in lambda serialization #20349
Conversation
👋 Welcome back mcimadamore! A progress list of the required criteria for merging this PR into |
@mcimadamore This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 324 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
@mcimadamore The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
@@ -2337,7 +2243,7 @@ private boolean shouldEmitOuterThis(ClassSymbol sym) { | |||
// Enclosing instance field is used | |||
return true; | |||
} | |||
if (rs.isSerializable(sym.type)) { | |||
if (sym.owner.kind != MTH && rs.isSerializable(sym.type)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, this change serialized form for local/anon classes, by omitting capture of enclosing this
where possible. (see discussion on serialization compatibility in the PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: consistency with the lambda case is kind of nice, but if this turns out to be problematic, we can omit this change from this PR, and always capture this$0
in case of a serializable local/anon class.
/csr needed |
@mcimadamore has indicated that a compatibility and specification (CSR) request is needed for this pull request. @mcimadamore please create a CSR request for issue JDK-8336492 with the correct fix version. This pull request cannot be integrated until the CSR request is approved. |
Webrevs
|
Refactoring Awesomeness Ratio (not counting unit tests) = 1.58. 👍 |
…tured parameter matches the name of the captured local (skipping any intermediate synthetic local capture in local classes)
This looks exciting!
I'd be happy to. I have started a round of testing at 76a972b, it'll take about a day to have results. One caveat is that Google's use of serialization generally doesn't rely on persisting the serialized form, and we had disabled the check for serialization in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great simplification, some comments for your consideration
* and to compute the name of the synthetic lambda method (which depends | ||
* on _where_ the lambda capture occurs). | ||
*/ | ||
record CaptureSiteInfo(Symbol owner, boolean inStaticInit, VarSymbol pendingVar) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor suggestion, add another constructor with only the first two arguments as there are two constructor invocations where the last argument is null
} | ||
|
||
/** | ||
* @return Name of the enclosing method to be folded into synthetic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably the comment needs to also mention that it can return new
or static
no only the method name
public JCTree translateTopLevelClass(Env<AttrContext> env, JCTree cdef, TreeMaker make) { | ||
this.make = make; | ||
this.attrEnv = env; | ||
this.context = null; | ||
this.contextMap = new HashMap<>(); | ||
cdef = analyzer.analyzeAndPreprocessClass((JCClassDecl) cdef); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
@@ -338,22 +401,15 @@ public void visitLambda(JCLambda tree) { | |||
owner::setTypeAttributes, | |||
sym::setTypeAttributes); | |||
|
|||
apportionTypeAnnotations(tree, | |||
owner::getInitTypeAttributes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure that this is semantically the same as before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure that this is semantically the same as before
Yeah, this is a bit tricky. Basically, the issue with type annotations is that they are encoded in different places depending on the case. E.g. a local variable inside a static/instance initializer has the type annotation recorded in the class symbol (!!). There's also different methods to fetch type annotations, depending on whether it was a normal type annotation, or one in a static/instance initializer.
The general idea is that we try to figure out the symbol that "owns" the type annotation while visiting, and then move all lambda annotations from that owner onto the new lambda method symbol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be better after this push
// Lambda methods are private synthetic. | ||
// Inherit ACC_STRICT from the enclosing method, or, for clinit, | ||
// from the class. | ||
long flags = SYNTHETIC | LAMBDA_METHOD | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side: I wonder if we could use the same bits for flags: LAMBDA_METHOD
and RECORD
and safe a bit position for other flags
@@ -1731,17 +1339,15 @@ final class ReferenceTranslationContext extends TranslationContext<JCMemberRefer | |||
enum LambdaSymbolKind { | |||
PARAM, // original to translated lambda parameters | |||
LOCAL_VAR, // original to translated lambda locals | |||
CAPTURED_VAR, // variables in enclosing scope to translated synthetic parameters | |||
CAPTURED_THIS; // class symbols to translated synthetic parameters (for captured member access) | |||
CAPTURED_VAR; // variables in enclosing scope to translated synthetic parameters | |||
|
|||
boolean propagateAnnotations() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: there is only one use of this method and the method can be rewritten with a ternary operator and probably inlined in its only client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed here
/** | ||
* Simple subclass modelling the translation context of a method reference. | ||
*/ | ||
final class ReferenceTranslationContext extends TranslationContext<JCMemberReference> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor, we can probably remove this class it is not adding any behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed here
ret = new VarSymbol((sym.flags() & FINAL) | PARAMETER, sym.name, types.erasure(sym.type), translatedSym); | ||
ret.pos = sym.pos; | ||
// Set ret.data. Same as case LOCAL_VAR above. | ||
if (sym.isExceptionParameter()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a change from your patch but, do we really need to set this exception parameter thing for a method param? Also this could be a TypeMetadata. Also, there is no test affected if this if
with its block is removed so either unnecessary, me thinks, or we need more tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure, but eyeballing the code, I see that sym.isExceptionParameter
is used in the backend (e.g. Code
) so I'd suspect that a failure in doing this would lead to issues when generating bytecode, which I suspect is why this code was added in the first place. But, while that seems useful for a local variable declared inside a lambda, I don't see how a lambda parameter can itself be an exception parameter...
@@ -289,9 +336,18 @@ public JCTree translateTopLevelClass(Env<AttrContext> env, JCTree cdef, TreeMake | |||
@Override | |||
public void visitClassDef(JCClassDecl tree) { | |||
KlassInfo prevKlassInfo = kInfo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
future improvement? we have class info, lambda context, capture site info, should we probably gather them all in an environment like structure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes and no. First, note that capture site info is gone.
As for lambda context and klass info, we use them in different ways: the lambda context is set when we're in a lambda. The klass info is set when we are inside a klass. We create many lambda contexts for each klass info. We could of course move the lambda context field from the visitor to klass info, but I'm not sure how much that buys.
Some initial testing shows a regression in type annotation handling with this change. Given an example like the following, the type annotation in the lambda is now being incorrectly propagated to the constructor. A import java.lang.annotation.ElementType;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;
class T {
@Target(ElementType.TYPE_USE)
@interface A {}
Runnable r =
() -> {
List<@A Object> xs = new ArrayList<>();
xs.add(1);
xs.add(2);
xs.add(3);
xs.add(5);
xs.add(6);
xs.add(7);
xs.add(8);
};
T() {}
} javap reports an error because the type annotation's length is longer than the constructor:
Using the JDK 11 javap shows
|
Thanks, I'll take a look |
Ugh. This is quite ugly. The type annotation is seen, and correctly moved by |
After trying different approaches, I realized that the cleanest (by far) solution is to simply capture the lambda owner symbol during
We even had a test which enforced the wrong behavior (!!). Last, the static/instance initializer symbol used to have a As a result, |
Symbol fakeOwner = | ||
new MethodSymbol(tree.flags | BLOCK | | ||
env.info.scope.owner.flags() & STRICTFP, names.empty, null, | ||
env.info.scope.owner.flags() & STRICTFP, names.empty, fakeOwnerType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the type was left null here, which caused some crashes in LambdaToMethod
. There was really no reason for this to be null, except that ClassWriter
looks at type being null as a test to see if owner is a static/instance init block. But we can use BLOCK
for that (as TypeAnnotations
does)
@@ -3572,7 +3573,7 @@ public Env<AttrContext> lambdaEnv(JCLambda that, Env<AttrContext> env) { | |||
if (clinit == null) { | |||
Type clinitType = new MethodType(List.nil(), | |||
syms.voidType, List.nil(), syms.methodClass); | |||
clinit = new MethodSymbol(STATIC | SYNTHETIC | PRIVATE, | |||
clinit = new MethodSymbol(BLOCK | STATIC | SYNTHETIC | PRIVATE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added missing BLOCK
, so that we're now consistent with what the rest of the code (not only LambdaToMethod
) expects.
@@ -329,7 +329,7 @@ protected int writeEnclosingMethodAttribute(Name attributeName, ClassSymbol c) { | |||
int alenIdx = writeAttr(attributeName); | |||
ClassSymbol enclClass = c.owner.enclClass(); | |||
MethodSymbol enclMethod = | |||
(c.owner.type == null // local to init block | |||
((c.owner.flags() & BLOCK) != 0 // local to init block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new test, which looks at the BLOCK
flag
src/jdk.compiler/share/classes/com/sun/tools/javac/comp/CaptureScanner.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
latest changes look good
I did another round of testing with the latest version of the PR, everything still looks good to me.
|
Thanks for confirming. I think this is in good shape, and we can now proceed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using SequencedSet
would allow to skip the call to com.sun.tools.javac.util.List::reverse()
on the result of CaptureScanner::analyzeCaptures()
.
*/ | ||
private final ListBuffer<VarSymbol> fvs = new ListBuffer<>(); | ||
private final Set<VarSymbol> fvs = new LinkedHashSet<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private final Set<VarSymbol> fvs = new LinkedHashSet<>(); | |
private final SequencedSet<VarSymbol> fvs = new LinkedHashSet<>(); |
@@ -96,6 +93,6 @@ public void visitVarDef(JCTree.JCVariableDecl tree) { | |||
*/ | |||
List<Symbol.VarSymbol> analyzeCaptures() { | |||
scan(tree); | |||
return fvs.toList(); | |||
return List.from(fvs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return List.from(fvs); | |
return List.from(fvs.reversed()); |
@@ -314,7 +314,7 @@ List<VarSymbol> freevars(ClassSymbol c) { | |||
return fvs; | |||
} | |||
FreeVarCollector collector = new FreeVarCollector(classDef(c)); | |||
fvs = collector.analyzeCaptures(); | |||
fvs = collector.analyzeCaptures().reverse(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fvs = collector.analyzeCaptures().reverse(); | |
fvs = collector.analyzeCaptures(); |
|
||
import java.util.HashSet; | ||
import java.util.LinkedHashSet; | ||
import java.util.Set; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import java.util.Set; | |
import java.util.Set; | |
import java.util.SequencedSet; |
Not sure about that. Note that |
Well, at least typing the field as a |
/integrate |
@mcimadamore This pull request has not yet been marked as ready for integration. |
@mcimadamore This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
/integrate |
@mcimadamore This pull request has not yet been marked as ready for integration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
/integrate |
Going to push as commit 8a4ea09.
Your commit was automatically rebased without conflicts. |
@mcimadamore Pushed as commit 8a4ea09. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This PR consolidates the code for dealing with local captures in both
Lower
andLambdaToMethod
. It does so by adding a new tree scanner, namelyCaptureScanner
, which is then subclassed byLower.FreeVarCollector
and also by the newLambdaToMethod.LambdaCaptureScanner
.The main idea behind the new visitor is that we can compute the set of (most) captured locals w/o relying on complex state from
Lower
, and make the logic general enough to work on any tree. This can be done by keeping track of all local variable declarations occurring in a given subtreeT
. Then, ifT
contains a reference to a local variable that has not been seen, we can assume that this variable is a captured variable. This simple logic works rather well, and doesn't rely on checking e.g. that the accessed variable has the same owner of the method that owns a local class (which then caused other artifacts such asLower::ownerToCopyFreeVarsFrom
, now removed).The bigger rewrite is the one in
LambdaToMethod
. That class featured a custom visitor calledLambdaAnalyzerPreprocessor
(now removed) which maintains a stack of frames encountered during a visit, and tries to establish which references to local variables needs to be captured by the lambda, as well as whetherthis
needs to be referenced by the lambda. Thanks to the newCaptureScanner
visitor, all this logic can be achieved in a very small subclass ofCaptureScanner
, namelyLambdaCaptureScanner
.This removes the need to keep track of frames, and other ancillary data structures.
LambdaTranslationContext
is now significantly simpler, and its most important field is aMap<VarSymbol, VarSymbol>
which maps logical lambda symbols to translated ones (in the generated lambda method). We no longer need to maintain different maps for different kinds of captured symbols.The code for patching identifiers in a lambda expression also became a lot simpler: we just check if we have pending lambda translation context and, if so, we ask if the identifier symbol is contained in the translation map. Otherwise, we leave the identifier as is.
Fixing the issue referenced by this PR is now very simple: inside
LambdaCaptureScanner
, we just need to make sure that any identifier referring to a captured local field generated byLower
is re-captured, and the rest just works. To make the code more robust I've added a new variable flag to denote synthetic captured fields generated byLower
.Compatibility
This PR changes the set of fields that are added to local/anonymous classes. Consider the following case:
This used to generate the following bytecode for
Local
:With this patch, we only generate this:
That is, the field for
this$0
is omitted (as that's not used). This affects the serialized form of this local class, as there's now one less field in the serialized form. While this change is desirable (it brings behavior in sync with lambda expressions, and avoiding capture of enclosing instance might be beneficial for GC reachability), we need to make sure this is ok.Also, it is possible that the order in which local capture fields appear in classes/constructor parameters of local/anon classes changes slightly, due the fact we're using a different visitor.
Another, minor issue in this rewrite is that now generated methods for non-serializable lambdas can have different names from before. In the old code, a shared counter for all lambdas in a class was used. As the new code shares the same logic for checking name clashing as the one used in serializable lambdas, we now increment the suffix after
lambda$foo$
if needed. So if two lambdas are defined in two methodsfoo
andbar
, we now getlambda$foo$0
andlambda$foo$bar$0
(before, one of them would have had the index set to1
). While this affects some tests, which name we generate for synthetic lambda methods is a private contract, so I didn't think that preserving 100% compatibility was relevant here.@cushon perhaps you could help assessing the compatibility impact of this change (as you seem to have a codebase which rely on serialization of local classes/lambdas) ?
Progress
Issues
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20349/head:pull/20349
$ git checkout pull/20349
Update a local copy of the PR:
$ git checkout pull/20349
$ git pull https://git.openjdk.org/jdk.git pull/20349/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 20349
View PR using the GUI difftool:
$ git pr show -t 20349
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20349.diff
Webrev
Link to Webrev Comment