

@cushon
Contributor

@cushon cushon commented Dec 2, 2025

This change adds a field to JCTree to store end positions, instead of using a separate EndPosTable map. See also this compiler-dev@ thread.
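A minimal sketch of the difference, using simplified stand-in classes. `Node`, `SideTableEndPositions`, and `FieldEndPositions` are hypothetical and are not javac's JCTree or EndPosTable. The old approach keeps end positions in a side map keyed by node. The new one keeps them in a field on the node itself.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for a tree node (not javac's JCTree).
class Node {
    static final int NOPOS = -1;   // "no position", as used in this sketch

    int pos;                // position of the node, like JCTree.pos
    int endpos = NOPOS;     // new style: end position stored on the node

    Node(int pos) { this.pos = pos; }
}

// Old style: a separate table mapping each node to its end position.
class SideTableEndPositions {
    private final Map<Node, Integer> table = new IdentityHashMap<>();

    void storeEnd(Node tree, int endpos) { table.put(tree, endpos); }

    int getEndPos(Node tree) { return table.getOrDefault(tree, Node.NOPOS); }
}

// New style: storing and reading the end position is just field access.
class FieldEndPositions {
    void storeEnd(Node tree, int endpos) { tree.endpos = endpos; }

    int getEndPos(Node tree) { return tree.endpos; }
}
```

The trade-off, discussed later in the thread, is an extra int on every node, whether or not that node needs it.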

I performed the refactoring in stages, preserving existing semantics at each step.

There are two known places where this changes existing behaviour, both reflected in test changes:

  • test/langtools/tools/javac/api/TestJavacTask_Lock.java - this test asserts that calling JavacTask#parse and then calling #call or #parse a second time fails. The exception the test currently expects is thrown when the EndPosTable is set a second time; with this change that no longer happens. If desired, JavacTask#parse could be updated to explicitly check whether it has been called twice and fail, instead of relying indirectly on the EndPosTable for that.

  • test/langtools/tools/javac/diags/DiagnosticGetEndPosition.java - there's a comment that 'ideally would be "0", but the positions are not fully set yet'. With the new approach the end position is available to the test, which resolves that comment.
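If JavacTask#parse were changed to fail explicitly on reuse, the check could be as simple as a one-shot flag. This is a sketch with a hypothetical `OneShotTask` class. It assumes IllegalStateException is the desired failure and is not the actual JavacTaskImpl code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical task that may be parsed or called at most once.
class OneShotTask {
    private final AtomicBoolean used = new AtomicBoolean(false);

    // Fails explicitly, rather than relying on a side effect elsewhere
    // (such as an end position table being set twice).
    private void checkNotUsed(String method) {
        if (!used.compareAndSet(false, true)) {
            throw new IllegalStateException(method + " called after the task was already used");
        }
    }

    String parse() {
        checkNotUsed("parse");
        return "parsed";
    }

    // Returns Boolean to mirror JavaCompiler.CompilationTask#call.
    Boolean call() {
        checkNotUsed("call");
        return Boolean.TRUE;
    }
}
```

Here both `parse` and `call` count as a use, matching what TestJavacTask_Lock expects. A real change would still need to decide which follow-up calls stay legal after `parse`.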


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8372948: Store end positions directly in JCTree (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28610/head:pull/28610
$ git checkout pull/28610

Update a local copy of the PR:
$ git checkout pull/28610
$ git pull https://git.openjdk.org/jdk.git pull/28610/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28610

View PR using the GUI difftool:
$ git pr show -t 28610

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28610.diff

Using Webrev

Link to Webrev Comment

@cushon cushon marked this pull request as ready for review December 2, 2025 16:05
@bridgekeeper

bridgekeeper bot commented Dec 2, 2025

👋 Welcome back cushon! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Dec 2, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added javadoc javadoc-dev@openjdk.org compiler compiler-dev@openjdk.org labels Dec 2, 2025
@openjdk

openjdk bot commented Dec 2, 2025

@cushon The following labels will be automatically applied to this pull request:

  • compiler
  • javadoc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 2, 2025
@mlbridge

mlbridge bot commented Dec 2, 2025

Webrevs

@mlbridge

mlbridge bot commented Dec 2, 2025

Mailing list message from Jonathan Gibbons on javadoc-dev:

Without looking in detail at this specific proposal, I wonder if you considered the alternative of only storing end positions in the subtypes of JCTree that actually "need" them. In other words, you only need to store end positions in tree nodes that "end" in a lexical token and not in a child tree node. Effectively, you only need to store the end position in tree nodes that would otherwise have entries in the EndPosTable.

-- Jon

On Tue, Dec 2, 2025, at 8:12 AM, Liam Miller-Cushon wrote:
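A minimal sketch of the alternative Jon describes, using hypothetical node classes (`Tree`, `Literal`, `Parens`, `Lambda` are illustrative, not javac's). Only nodes that end in their own lexical token store an end position. A node that ends in a child, such as a lambda ending in its body, derives the end position from that child.

```java
// Hypothetical, simplified node hierarchy (not javac's JCTree).
abstract class Tree {
    abstract int getEndPos();
}

// Ends in its own token, so the end position is recorded.
class Literal extends Tree {
    final int endpos;
    Literal(int endpos) { this.endpos = endpos; }
    @Override int getEndPos() { return endpos; }
}

// Ends in ')', so the end position is recorded.
class Parens extends Tree {
    final Tree expr;
    final int endpos;   // position just after the closing ')'
    Parens(Tree expr, int endpos) { this.expr = expr; this.endpos = endpos; }
    @Override int getEndPos() { return endpos; }
}

// Ends in a child tree, so no field is needed: ask the child.
class Lambda extends Tree {
    final Tree body;
    Lambda(Tree body) { this.body = body; }
    @Override int getEndPos() { return body.getEndPos(); }
}
```

For `x -> (1)`, the literal ends at offset 7, the parenthesized expression at 8, and the lambda reports 8 without storing anything itself.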

@cushon
Contributor Author

cushon commented Dec 2, 2025

Without looking in detail at this specific proposal, I wonder if you considered the alternative of only storing end positions in the subtypes of JCTree that actually "need" them. In other words, you only need to store end positions in tree nodes that "end" in a lexical token and not in a child tree node. Effectively, you only need to store the end position in tree nodes that would otherwise have entries in the EndPosTable.

Good question--I hadn't investigated that option. It seems doable, perhaps with a shared interface for subtypes that need end positions to simplify handling them.

What tradeoffs do you see here? Would declaring the field only on trees that need it be mostly about saving memory?

Also, is that unique to end positions, or could javac potentially avoid storing start positions for nodes that don't start with a lexical token as well?

@mlbridge
Copy link

mlbridge bot commented Dec 2, 2025

Mailing list message from Jonathan Gibbons on javadoc-dev:

What tradeoffs do you see here? Would declaring the field only on trees that need it be mostly about saving memory?

Probably, yes, as well as just being a closer equivalent to the existing code.

Also, is that unique to end positions, or could javac potentially avoid storing start positions for nodes that don't start with a lexical token as well?

Every JCTree node already has a 'pos' field, representing the position of the first character that is unique to the tree node. That means it is the 'start' position for those nodes that begin with a lexical token, and so no additional field is necessary.

This asymmetry between start and end positions is why there is only an EndPosTable, with no need for a StartPosTable.

-- Jon

On Tue, Dec 2, 2025, at 8:30 AM, Liam Miller-Cushon wrote:
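To make the asymmetry concrete: for a binary expression such as `a + b`, javac's `pos` is the operator position, the first character unique to that node, while the start position is recovered by descending into the left operand (javac does this in TreeInfo.getStartPos). The sketch below models that with hypothetical classes `Expr`, `Ident`, and `Binary`, not the real trees.

```java
// Hypothetical, simplified expression nodes (not javac's JCTree).
abstract class Expr {
    final int pos;   // first character unique to this node
    Expr(int pos) { this.pos = pos; }

    // Start of the node's full source range.
    abstract int startPos();
}

class Ident extends Expr {
    Ident(int pos) { super(pos); }
    // An identifier begins with its own token, so pos is the start.
    @Override int startPos() { return pos; }
}

class Binary extends Expr {
    final Expr lhs, rhs;
    // pos is the operator position.
    Binary(int operatorPos, Expr lhs, Expr rhs) {
        super(operatorPos);
        this.lhs = lhs;
        this.rhs = rhs;
    }
    // Does not begin with its own token: the start comes from the lhs.
    @Override int startPos() { return lhs.startPos(); }
}
```

The start can always be recomputed from `pos` and the children. The end has no such fallback for nodes that finish with their own token, such as `)` or `;`.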

@mlbridge

mlbridge bot commented Dec 2, 2025

Mailing list message from Jonathan Gibbons on javadoc-dev:

I'm not sure a shared interface gets you anything significant, since you cannot inherit a shared field that way.

Instead, you could have a `setEndPos` on `JCTree` that is a no-op on subtypes that do not need it, and which sets a locally declared field on subtypes that do need it.

-- Jon

On Tue, Dec 2, 2025, at 8:30 AM, Liam Miller-Cushon wrote:

Good question--I hadn't investigated that option. It seems doable, perhaps with a shared interface for subtypes that need end positions to simplify handling them.

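A sketch of that shape, with hypothetical class names (`BaseTree`, `Block`, `LambdaExpr`). The base class declares `setEndPos` as a no-op. Only subclasses that end in a token declare the field and override it. One consequence is that a `setEndPos` call on the wrong subtype is silently ignored.

```java
// Hypothetical base class standing in for JCTree.
abstract class BaseTree {
    static final int NOPOS = -1;

    // Default: this node does not keep its own end position.
    void setEndPos(int endpos) {
        // intentionally a no-op
    }

    int getEndPos() {
        return NOPOS;
    }
}

// Ends in a lexical token ('}'), so it declares its own field.
class Block extends BaseTree {
    private int endpos = NOPOS;

    @Override void setEndPos(int endpos) { this.endpos = endpos; }

    @Override int getEndPos() { return endpos; }
}

// Ends in a child tree: setEndPos stays the inherited no-op,
// and the end position comes from the body.
class LambdaExpr extends BaseTree {
    private final BaseTree body;

    LambdaExpr(BaseTree body) { this.body = body; }

    @Override int getEndPos() { return body.getEndPos(); }
}
```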

@cushon
Contributor Author

cushon commented Dec 3, 2025

I'm not sure a shared interface gets you anything significant, since you cannot inherit a shared field that way.

Partly I was thinking the interface could make helpers that set end positions type-safe, e.g. the storeEnd method in JavacParser, if we could write something like:

    protected <T extends JCTree & JCTree.HasEndPos> T storeEnd(T tree, int endpos) { ... }

But there are other places that store end positions on a JCTree instead of a specific subtype, so that approach only goes so far.
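The intersection bound in that signature is valid Java. It restricts the helper to subtypes that implement the marker interface while still returning the caller's precise subtype. A minimal runnable sketch, where `HasEndPos`, `Node`, `Parens`, `Binary`, and `Parser` are all hypothetical stand-ins, not javac classes:

```java
// Hypothetical marker interface: only implementors carry an end position.
interface HasEndPos {
    void setEndPos(int endpos);
    int getEndPos();
}

// Hypothetical stand-in for JCTree.
abstract class Node { }

// Ends in ')', so it implements HasEndPos.
class Parens extends Node implements HasEndPos {
    private int endpos = -1;
    @Override public void setEndPos(int endpos) { this.endpos = endpos; }
    @Override public int getEndPos() { return endpos; }
}

// Ends in a child tree, so it does not implement HasEndPos.
class Binary extends Node { }

class Parser {
    // The intersection bound rejects Binary at compile time and
    // returns the same static type that was passed in.
    <T extends Node & HasEndPos> T storeEnd(T tree, int endpos) {
        tree.setEndPos(endpos);
        return tree;
    }
}
```

`new Parser().storeEnd(new Binary(), 12)` would not compile. Call sites that only hold a plain `Node` would need a cast or a runtime check first, which is the limitation mentioned above.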

Instead, you could have a setEndPos on JCTree that is a no-op on subtypes that do not need it, and which sets a locally declared field on subtypes that do need it.

I think the set of trees that end positions are stored for is: JCAnnotation, JCArrayAccess, JCArrayTypeTree, JCAssert, JCAssign, JCBinary, JCBindingPattern, JCBlock, JCBreak, JCCase, JCClassDecl, JCConstantCaseLabel, JCContinue, JCDefaultCaseLabel, JCDoWhileLoop, JCExports, JCExpressionStatement, JCFieldAccess, JCIdent, JCIf, JCImport, JCLambda, JCLiteral, JCMemberReference, JCMethodDecl, JCMethodInvocation, JCModifiers, JCModuleDecl, JCNewArray, JCNewClass, JCOpens, JCPackageDecl, JCParens, JCPatternCaseLabel, JCPrimitiveTypeTree, JCProvides, JCRequires, JCReturn, JCSkip, JCSwitch, JCThrow, JCTypeApply, JCTypeParameter, JCTypeUnion, JCUnary, JCUses, JCVariableDecl, JCWildcard, TypeBoundKind

I got that by instrumenting SimpleEndPosTable and building the JDK, so it's possible it missed a few.
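The instrumentation can be approximated by recording the runtime class of every node whose end position is stored, then printing the distinct names after the build. This is a sketch with a hypothetical `RecordingEndPosTable`, since the actual change to SimpleEndPosTable was not shown in the thread.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical end position table that also records which node classes it sees.
class RecordingEndPosTable {
    private final Map<Object, Integer> endPositions = new IdentityHashMap<>();
    // Sorted, de-duplicated simple class names of nodes that received an end position.
    private final Set<String> seenKinds = new TreeSet<>();

    void storeEnd(Object tree, int endpos) {
        seenKinds.add(tree.getClass().getSimpleName());
        endPositions.put(tree, endpos);
    }

    int getEndPos(Object tree) {
        return endPositions.getOrDefault(tree, -1);
    }

    Set<String> seenKinds() {
        return seenKinds;
    }
}
```

Like any dynamic instrumentation, this only reports node kinds that actually occur in the compiled sources. That is why a kind such as record patterns could be missing from the list above.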

And then we could add the following snippet to all of those classes:

    private int endpos;

    public int getEndPos() {
        return endpos;
    }

    public void setEndPos(int endpos) {
        this.endpos = endpos;
    }

My feeling is that perhaps it's worth the extra memory to not have to duplicate that code for all of those JCTrees, and also to avoid the risk of trying to store end positions on trees that don't support it. But I am open to making those changes if there's a preference for it.

@mcimadamore
Contributor

My feeling is that perhaps it's worth the extra memory to not have to duplicate that code for all of those JCTrees, and also to avoid the risk of trying to store end positions on trees that don't support it. But I am open to making those changes if there's a preference for it.

I tend to agree with your assessment. Having code duplication is kind of bad, unless we can somehow "common" the code -- and that is probably possible, but not straightforward due to the JCStatement vs. JCExpression split (and also JCFunctionalExpression).

Also, it's hard to estimate which trees might need this... for instance, the trees you show in your analysis don't include some patterns (e.g. record patterns), but that's probably just because there's no record pattern in the JDK, not because the end pos is not useful there.

If we exclude JCSkip and maybe LetExpr (as that's only used by the backend), is there much that actually doesn't require an end position? Perhaps for some keywords like break, continue, ... we might be able to infer the end pos (given it's just the start pos plus the number of characters in the keyword). But I'm not sure how much we are willing to bend the code for special cases like these.

One more interesting experiment could be to enable end positions on all trees, then run the JDK build and compare with mainline, to see what the memory usage looks like (maybe enabling -verbose:gc and looking where it peaks).

@mlbridge

mlbridge bot commented Dec 3, 2025

Mailing list message from Jonathan Gibbons on javadoc-dev:

On Wed, Dec 3, 2025, at 1:47 AM, Liam Miller-Cushon wrote:

My feeling is that perhaps it's worth the extra memory to not have to duplicate that code for all of those `JCTree`s, and also to avoid the risk of trying to store end positions on trees that don't support it. But I am open to making those changes if there's a preference for it.

Yeah, you've done the analysis I think I would have done. And it does seem like there has been a fundamental shift in the desire/need to keep end positions around since times past.

Simplicity says to go with a field in every JCTree these days. A more informed opinion would probably require more detailed analysis. I note that you can offset the space of the endPos fields that are not strictly required against the savings of the EndPosTable itself.

I also wonder, just for curiosity's sake, whether this would help with some of the (uncommon) problem cases in the existing situation, where the endPos of a non-terminal end element might not be the endPos of the entire tree -- IIRC, the issue was with C-style array declarations.

The one test I would suggest to add (if necessary) in the "tree position" set of tests would be to verify that the new field is set in all applicable cases, with maybe some thought being given to synthetic nodes.

-- Jon

@mlbridge

mlbridge bot commented Dec 3, 2025

Mailing list message from Jonathan Gibbons on javadoc-dev:

On Wed, Dec 3, 2025, at 11:13 AM, Maurizio Cimadamore wrote:

Perhaps with some keywords like `break`, `continue`, ... we might be able to infer the end pos (given it's just start pos + number of chars in the keyword). But not sure how much we are willing to bend the code for special cases like these?

Doesn't the endPos record the position of the semicolon, not the end of the keyword?

-- Jon
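Jon's point can be checked with the public Compiler Tree API: parse a snippet with JavacTask and ask SourcePositions for the range of a `break` statement. A minimal sketch, assuming it runs on a full JDK so that ToolProvider.getSystemJavaCompiler() returns a compiler. `BreakPositions` and `breakRange` are hypothetical names.

```java
import com.sun.source.tree.BreakTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class BreakPositions {
    // Returns {start, end} of the first break statement, or {-1, -1} if none is found.
    static long[] breakRange(String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // In-memory source file so nothing touches the file system.
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///A.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(file));
        CompilationUnitTree unit = parse(task);
        SourcePositions positions = Trees.instance(task).getSourcePositions();

        long[] range = {-1, -1};
        new TreeScanner<Void, Void>() {
            @Override
            public Void visitBreak(BreakTree tree, Void unused) {
                if (range[0] == -1) {
                    range[0] = positions.getStartPosition(unit, tree);
                    range[1] = positions.getEndPosition(unit, tree);
                }
                return null;
            }
        }.scan(unit, null);
        return range;
    }

    // Parses the single compilation unit, wrapping the checked exception.
    private static CompilationUnitTree parse(JavacTask task) {
        try {
            return task.parse().iterator().next();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For `break ;` the reported end position should fall just past the semicolon. With intervening whitespace or a label, it therefore cannot be derived from the keyword length alone.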

@mlbridge

mlbridge bot commented Dec 3, 2025

Mailing list message from Jonathan Gibbons on javadoc-dev:

On Wed, Dec 3, 2025, at 11:56 AM, Jonathan Gibbons wrote:

On Wed, Dec 3, 2025, at 11:13 AM, Maurizio Cimadamore wrote:

Perhaps with some keywords like `break`, `continue`, ... we might be able to infer the end pos (given it's just start pos + number of chars in the keyword). But not sure how much we are willing to bend the code for special cases like these?

Doesn't the endPos record the position of the semicolon, not the end of the keyword?

-- Jon

A useful heuristic is to check the `visit...` methods in `Pretty`. If there is a call to `print(String)` or `print(char)` before the `} catch (IOException` then the node should have an endPos. In other words, an endPos is required for all nodes that end in a specific lexical token.

For example, compare this from `visitLambda`:

        printExpr(tree.body);
    } catch (IOException e) {

and this from `visitParens`:

        print(')');
    } catch (IOException e) {

@cushon
Contributor Author

cushon commented Dec 4, 2025

One more interesting experiment could be to try to enable end position in all trees, then run the JDK build and compare with mainline, to see what the memory usage looks like (maybe enabling -verbose:gc and looking where it peaks).

I attempted this experiment. I ran configure with --disable-javac-server and added -verbose:gc to the javac args with the change below. Then I made trivial edits to src/java.base/share/classes/java/lang/String.java and rebuilt.

With this PR the output was something like:

Compiling up to 3385 files for java.base
[0.003s][info][gc] Using G1
[0.239s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 34M->4M(64M) 2.704ms
[0.415s][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 31M->10M(64M) 4.419ms
[0.531s][info][gc] GC(2) Pause Young (Normal) (G1 Evacuation Pause) 37M->16M(64M) 4.672ms
[0.568s][info][gc] GC(3) Pause Young (Concurrent Start) (Metadata GC Threshold) 24M->18M(64M) 2.825ms
[0.568s][info][gc] GC(4) Concurrent Mark Cycle
[0.573s][info][gc] GC(4) Pause Remark 19M->19M(64M) 1.349ms
[0.575s][info][gc] GC(4) Pause Cleanup 19M->19M(64M) 0.007ms
[0.575s][info][gc] GC(4) Concurrent Mark Cycle 6.227ms

And without these changes, it looks like it peaked at 36M instead of 37M:

[info][gc] GC(2) Pause Young (Normal) (G1 Evacuation Pause) 36M->16M(64M)

diff --git a/make/common/JavaCompilation.gmk b/make/common/JavaCompilation.gmk
index 33f5d10535a..e9a800bce5a 100644
--- a/make/common/JavaCompilation.gmk
+++ b/make/common/JavaCompilation.gmk
@@ -254,7 +254,7 @@ define SetupJavaCompilationBody
           javacserver.Main --conf=$$($1_JAVAC_SERVER_CONFIG)
     else
       # No javac server
-      $1_JAVAC := $$(INTERIM_LANGTOOLS_ARGS) -m jdk.compiler.interim/com.sun.tools.javac.Main
+      $1_JAVAC := -verbose:gc $$(INTERIM_LANGTOOLS_ARGS) -m jdk.compiler.interim/com.sun.tools.javac.Main

       ifeq ($$($1_SMALL_JAVA), true)
         $1_JAVAC_CMD := $$(JAVA_SMALL) $$($1_JAVA_FLAGS) $$($1_JAVAC)
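Comparing peaks by eye gets tedious for longer builds. Here is a minimal sketch of a helper that extracts the largest pre-pause heap size from `-verbose:gc` output like the lines above. It assumes the `<before>M-><after>M(<capacity>M)` format shown there and only handles sizes reported in megabytes. `GcPeak` and `peakBeforeMb` are hypothetical names.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcPeak {
    // Matches the "<before>M-><after>M(<capacity>M)" part of a pause line.
    private static final Pattern HEAP_CHANGE =
            Pattern.compile("(\\d+)M->(\\d+)M\\((\\d+)M\\)");

    // Returns the largest heap occupancy seen before any pause, in MB, or -1 if none.
    static int peakBeforeMb(List<String> lines) {
        int peak = -1;
        for (String line : lines) {
            Matcher m = HEAP_CHANGE.matcher(line);
            if (m.find()) {
                peak = Math.max(peak, Integer.parseInt(m.group(1)));
            }
        }
        return peak;
    }
}
```

Pre-pause occupancy only samples the heap at collection points, so it is a coarse proxy for true peak usage.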
