8372948: Store end positions directly in JCTree #28610

cushon · 2025-12-02T16:05:27Z

This change adds a field to JCTree to store end positions, instead of using a separate EndPosTable map. See also this compiler-dev@ thread.

I performed the refactoring in stages, preserving existing semantics at each step.

There are two known places where this changes existing behaviour that are reflected in changes to tests:

test/langtools/tools/javac/api/TestJavacTask_Lock.java - this test asserts that calling JavacTask#parse first and then calling #call or #parse second will fail. The failure the test currently expects is thrown when the EndPosTable gets set a second time, and with this change that path no longer throws. If desired, JavacTask#parse could be updated to explicitly check whether it is called twice and fail, instead of relying indirectly on the EndPosTable for that.
test/langtools/tools/javac/diags/DiagnosticGetEndPosition.java - there's a comment that 'ideally would be "0", but the positions are not fully set yet'. With the new approach the end position is available to the test, which resolves that comment.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8372948: Store end positions directly in JCTree (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28610/head:pull/28610
$ git checkout pull/28610

Update a local copy of the PR:
$ git checkout pull/28610
$ git pull https://git.openjdk.org/jdk.git pull/28610/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28610

View PR using the GUI difftool:
$ git pr show -t 28610

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28610.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-12-02T16:06:34Z

👋 Welcome back cushon! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-12-02T16:08:34Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-12-02T16:09:57Z

@cushon The following labels will be automatically applied to this pull request:

compiler
javadoc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-12-02T16:12:57Z

Webrevs

00: Full (cd10d5a0)

mlbridge · 2025-12-02T16:21:21Z

Mailing list message from Jonathan Gibbons on javadoc-dev:

Without looking in detail at this specific proposal, I wonder if you considered the alternative to only store end positions in the subtypes of JCTree that actually "need" them. In other words, you only need to store end positions in tree nodes that "end" in a lexical token and not in a child tree node. Effectively, you only need to store the end position in tree nodes that would otherwise have entries in the EndPosTable.

-- Jon

On Tue, Dec 2, 2025, at 8:12 AM, Liam Miller-Cushon wrote:
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/javadoc-dev/attachments/20251202/41630644/attachment.htm>

cushon · 2025-12-02T16:28:39Z

> Without looking in detail at this specific proposal, I wonder if you considered the alternative to only store end positions in the subtypes of JCTree that actually "need" them. In other words, you only need to store end positions in tree nodes that "end" in a lexical token and not in a child tree node. Effectively, you only need to store the end position in tree nodes that would otherwise have entries in the EndPosTable.

Good question. I hadn't investigated that option. It seems doable, perhaps with a shared interface for subtypes that need end positions to simplify the handling of them.

What tradeoffs do you see here? Would only declaring the field on trees that need it be mostly about saving memory?

Also, is that unique to end positions? Or could javac potentially avoid storing start positions for nodes that don't start with a lexical token as well?

mlbridge · 2025-12-02T16:38:44Z

Mailing list message from Jonathan Gibbons on javadoc-dev:

> What tradeoffs do you see here? Would only declaring the field on trees that need it be mostly about saving memory?

Probably, yes, as well as just being a closer equivalent to the existing code.

> Also, is that unique to end positions? Or could javac potentially avoid storing start positions for nodes that don't start with a lexical token as well?

Every JCTree node already has a 'pos' field, representing the position of the first character that is unique to the tree node. That means it is the 'start' position for those nodes that begin with a lexical token, and so no additional field is necessary.

The asymmetry between start and end positions explains why there is only an EndPosTable, without any need for a StartPosTable.

-- Jon
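A minimal sketch of that asymmetry, over a simplified, hypothetical tree model (the record names are illustrative, not javac's classes; end positions are modeled as one past the last character). A binary node's `pos` is its operator, yet its start can be recovered by descending into the leftmost child, similar in spirit to javac's `TreeInfo.getStartPos`. A parenthesized expression's end falls on its `)`, which no child can supply, so it has to be stored.

```java
// Simplified, hypothetical tree model; not javac's JCTree classes.
sealed interface Tree permits Ident, Binary, Parens {
    int pos(); // first character unique to this node
}

// An identifier begins and ends with its own token.
record Ident(int pos, int endPos) implements Tree {}

// For "a + b", pos is the operator position, not the start of the expression.
record Binary(int pos, Tree lhs, Tree rhs) implements Tree {}

// Ends in ')', which no child records, so the end position is stored.
record Parens(int pos, Tree expr, int endPos) implements Tree {}

final class Positions {
    // Start positions can be derived: descend into the leftmost child.
    static int startPos(Tree t) {
        if (t instanceof Binary b) return startPos(b.lhs());
        return t.pos(); // Ident and Parens begin with their own token
    }

    // End positions are derived only for nodes that end in a child tree.
    static int endPos(Tree t) {
        if (t instanceof Binary b) return endPos(b.rhs());
        if (t instanceof Parens p) return p.endPos();
        if (t instanceof Ident i) return i.endPos();
        throw new IllegalArgumentException("unknown tree: " + t);
    }
}
```

For the source `(a + b)`, the `Binary` node has `pos` 3 (the `+`), a derived start of 1 and a derived end of 6, while the `Parens` end of 7 is the value that must be stored.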

On Tue, Dec 2, 2025, at 8:30 AM, Liam Miller-Cushon wrote:

mlbridge · 2025-12-02T16:43:43Z

Mailing list message from Jonathan Gibbons on javadoc-dev:

I'm not sure a shared interface gets you anything significant, since you cannot inherit a shared field that way.

Instead, you could have a `setEndPos` on `JCTree` that is a no-op on subtypes that do not need it, and which sets a locally declared field on subtypes that do need it.

-- Jon
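A minimal sketch of this suggestion, using illustrative class names rather than javac's (the `NOPOS` value of -1 mirrors javac's `Position.NOPOS`): the base class's `setEndPos` does nothing, and only a token-terminated subtype declares a field.

```java
// Hypothetical sketch of the suggestion; class names are illustrative, not javac's.
abstract class Tree {
    static final int NOPOS = -1; // mirrors javac's Position.NOPOS

    // Default: trees that end in a child tree do not keep an end position.
    void setEndPos(int endPos) {
        // intentionally a no-op
    }

    int getEndPos() {
        return NOPOS;
    }
}

// Ends in a lexical token (')'), so it declares its own field.
final class Parens extends Tree {
    final Tree expr;
    private int endPos = NOPOS;

    Parens(Tree expr) {
        this.expr = expr;
    }

    @Override
    void setEndPos(int endPos) {
        this.endPos = endPos;
    }

    @Override
    int getEndPos() {
        return endPos;
    }
}

// Ends in its body (a child tree), so it inherits the no-op.
final class Lambda extends Tree {
    final Tree body;

    Lambda(Tree body) {
        this.body = body;
    }
}
```

The trade-off is visible in use: `setEndPos` on a `Lambda` is silently ignored, and the field and accessors have to be repeated in each subtype that needs them.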

On Tue, Dec 2, 2025, at 8:30 AM, Liam Miller-Cushon wrote:

> Good question. I hadn't investigated that option. It seems doable, perhaps with a shared interface for subtypes that need end positions to simplify the handling of them.


cushon · 2025-12-03T09:44:44Z

> I'm not sure a shared interface gets you anything significant, since you cannot inherit a shared field that way.

Partly I was thinking the interface could make helpers that set end positions type-safe, e.g. the storeEnd method in JavacParser, if we could write something like:

    protected <T extends JCTree & JCTree.HasEndPos> T storeEnd(T tree, int endpos) { ... }

But there are other places that store end positions on a JCTree instead of a specific subtype, so that approach only goes so far.
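A sketch of the type-safe variant, assuming a hypothetical top-level `HasEndPos` interface and stand-in `Tree`, `Parens`, and `Lambda` classes (not javac's). The intersection bound `T extends Tree & HasEndPos` lets the compiler reject a store on a tree that cannot hold an end position, and returns the argument with its precise static type.

```java
// Hypothetical stand-ins; not javac's JCTree or JavacParser.
abstract class Tree {
    final int pos;

    Tree(int pos) {
        this.pos = pos;
    }
}

// Marker for trees that can hold an end position.
interface HasEndPos {
    void setEndPos(int endPos);

    int getEndPos();
}

final class Parens extends Tree implements HasEndPos {
    private int endPos = -1;

    Parens(int pos) {
        super(pos);
    }

    @Override
    public void setEndPos(int endPos) {
        this.endPos = endPos;
    }

    @Override
    public int getEndPos() {
        return endPos;
    }
}

// Does not implement HasEndPos, so storeEnd rejects it at compile time.
final class Lambda extends Tree {
    Lambda(int pos) {
        super(pos);
    }
}

final class Parser {
    // T must be both a Tree and a HasEndPos; the same static type is returned.
    static <T extends Tree & HasEndPos> T storeEnd(T tree, int endPos) {
        tree.setEndPos(endPos);
        return tree;
    }
}
```

`Parser.storeEnd(new Parens(0), 7)` compiles and returns a `Parens`; `Parser.storeEnd(new Lambda(0), 9)` is a compile-time error. As noted above, call sites that only have a `JCTree`-typed reference would need a cast, which is where this approach stops helping.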

> Instead, you could have a setEndPos on JCTree that is a no-op on subtypes that do not need it, and which sets a locally declared field on subtypes that do need it.

I think the set of trees that end positions are stored for is: JCAnnotation, JCArrayAccess, JCArrayTypeTree, JCAssert, JCAssign, JCBinary, JCBindingPattern, JCBlock, JCBreak, JCCase, JCClassDecl, JCConstantCaseLabel, JCContinue, JCDefaultCaseLabel, JCDoWhileLoop, JCExports, JCExpressionStatement, JCFieldAccess, JCIdent, JCIf, JCImport, JCLambda, JCLiteral, JCMemberReference, JCMethodDecl, JCMethodInvocation, JCModifiers, JCModuleDecl, JCNewArray, JCNewClass, JCOpens, JCPackageDecl, JCParens, JCPatternCaseLabel, JCPrimitiveTypeTree, JCProvides, JCRequires, JCReturn, JCSkip, JCSwitch, JCThrow, JCTypeApply, JCTypeParameter, JCTypeUnion, JCUnary, JCUses, JCVariableDecl, JCWildcard, TypeBoundKind

I got that by instrumenting SimpleEndPosTable and building the JDK; it's possible it missed a few.

And then we could add the following snippet to all of those classes:

    private int endpos;

    public int getEndPos() {
        return endpos;
    }

    public void setEndPos(int endpos) {
        this.endpos = endpos;
    }

My feeling is that perhaps it's worth the extra memory to not have to duplicate that code for all of those JCTrees, and also to avoid the risk of trying to store end positions on trees that don't support it. But I am open to making those changes if there's a preference for it.
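The instrumentation used to produce the list above is not shown in the thread. One way it could look is sketched below with a hypothetical stand-in class (its methods are not `SimpleEndPosTable`'s actual API): every store also records the simple class name of the tree.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for an end-position table that also records
// which tree classes ever had an end position stored.
final class RecordingEndPosTable {
    // Identity map: each tree node is a distinct key.
    private final Map<Object, Integer> endPositions = new IdentityHashMap<>();
    // Sorted so the collected list is stable across runs.
    private final Set<String> storedKinds = new TreeSet<>();

    void storeEnd(Object tree, int endPos) {
        endPositions.put(tree, endPos);
        storedKinds.add(tree.getClass().getSimpleName());
    }

    int getEndPos(Object tree) {
        Integer endPos = endPositions.get(tree);
        return endPos == null ? -1 : endPos;
    }

    Set<String> storedKinds() {
        return storedKinds;
    }
}
```

A list gathered this way only covers tree kinds that actually occur in the compiled sources, which is the limitation raised in the next comment about record patterns.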

mcimadamore · 2025-12-03T19:11:14Z

> My feeling is that perhaps it's worth the extra memory to not have to duplicate that code for all of those JCTrees, and also to avoid the risk of trying to store end positions on trees that don't support it. But I am open to making those changes if there's a preference for it.

I tend to agree with your assessment. Having code duplication is kind of bad, unless we can somehow "common" the code; that's probably possible, but not straightforward due to the JCStatement vs. JCExpression split (and also JCFunctionalExpression).

Also, it's hard to estimate which trees might need this. For instance, the trees you show in your analysis don't include some patterns (e.g. record patterns), but that's probably just because there are no record patterns in the JDK, not because the end pos is not useful there.

If we exclude JCSkip and maybe LetExpr (as that's only used by the backend), I'm not sure there's much stuff that actually doesn't require an end position? Perhaps with some keywords like break, continue, ... we might be able to infer the end pos (given it's just start pos + number of chars in the keyword). But not sure how much we are willing to bend the code for special cases like these?

One more interesting experiment could be to try to enable end position in all trees, then run the JDK build and compare with mainline, to see what the memory usage looks like (maybe enabling -verbose:gc and looking where it peaks).

mlbridge · 2025-12-03T19:56:55Z

Mailing list message from Jonathan Gibbons on javadoc-dev:

On Wed, Dec 3, 2025, at 1:47 AM, Liam Miller-Cushon wrote:

> My feeling is that perhaps it's worth the extra memory to not have to duplicate that code for all of those `JCTree`s, and also to avoid the risk of trying to store end positions on trees that don't support it. But I am open to making those changes if there's a preference for it.

Yeah, you've done the analysis I think I would have done. And it does seem like there has been a fundamental shift in the desire/need to keep end positions around since times past.

Simplicity says to go with a field in every JCTree these days. A more informed opinion would probably require more detailed analysis. I note that you can offset the space of the endPos fields that are not strictly required against the savings of the EndPosTable itself.

I also wonder, just for curiosity's sake, whether this would help with some of the (uncommon) problem cases in the existing situation, where the endPos of a non-terminal end element might not be the endPos of the entire tree. IIRC, the issue was with C-style array declarations.

The one test I would suggest to add (if necessary) in the "tree position" set of tests would be to verify that the new field is set in all applicable cases, with maybe some thought being given to synthetic nodes.

-- Jon
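A hypothetical sketch of the suggested check, over a simplified node model rather than javac's trees: walk every node and report those whose kind is expected to carry an end position but still holds `NOPOS`, skipping synthetic nodes. Which kinds belong in the required set, and how synthetic nodes are recognized, are assumptions a real test would have to settle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical node model; kinds are plain strings for brevity.
final class Node {
    static final int NOPOS = -1;

    final String kind;
    final boolean synthetic; // compiler-generated, no source text of its own
    final List<Node> children = new ArrayList<>();
    int endPos = NOPOS;

    Node(String kind, boolean synthetic) {
        this.kind = kind;
        this.synthetic = synthetic;
    }

    Node withEndPos(int endPos) {
        this.endPos = endPos;
        return this;
    }

    Node add(Node child) {
        children.add(child);
        return this;
    }
}

final class EndPosChecker {
    private final Set<String> kindsNeedingEnd;

    EndPosChecker(Set<String> kindsNeedingEnd) {
        this.kindsNeedingEnd = kindsNeedingEnd;
    }

    // Every non-synthetic node whose kind needs an end position but has none.
    List<Node> missing(Node root) {
        List<Node> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private void collect(Node node, List<Node> result) {
        if (!node.synthetic && kindsNeedingEnd.contains(node.kind) && node.endPos == Node.NOPOS) {
            result.add(node);
        }
        for (Node child : node.children) {
            collect(child, result);
        }
    }
}
```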

mlbridge · 2025-12-03T19:58:49Z

Mailing list message from Jonathan Gibbons on javadoc-dev:

On Wed, Dec 3, 2025, at 11:13 AM, Maurizio Cimadamore wrote:

> Perhaps with some keywords like `break`, `continue`, ... we might be able to infer the end pos (given it's just start pos + number of chars in the keyword). But not sure how much we are willing to bend the code for special cases like these?

Doesn't the endPos record the position of the semicolon, not the end of the keyword?

-- Jon
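A small illustration of that point, assuming (as the question suggests) that the recorded end of a `break` statement covers its `;`. The scanner is a deliberately naive, hypothetical stand-in for the parser, not how javac computes positions.

```java
// Hypothetical illustration; not javac code.
final class BreakEnd {
    // The shortcut proposed in the thread: start position + keyword length.
    static int inferredEnd(int start, String keyword) {
        return start + keyword.length();
    }

    // Position just past the terminating ';'. Naive: assumes the first ';'
    // after start ends the statement (e.g. no ';' inside a comment in between).
    static int endIncludingSemicolon(String source, int start) {
        int semi = source.indexOf(';', start);
        if (semi < 0) {
            throw new IllegalArgumentException("no ';' after position " + start);
        }
        return semi + 1;
    }
}
```

The keyword-length shortcut gives 5 in both cases, while the end including `;` is 6 for `break;` and 12 for `break outer;`.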

mlbridge · 2025-12-03T20:05:11Z

Mailing list message from Jonathan Gibbons on javadoc-dev:

On Wed, Dec 3, 2025, at 11:56 AM, Jonathan Gibbons wrote:

> On Wed, Dec 3, 2025, at 11:13 AM, Maurizio Cimadamore wrote:
>
>> Perhaps with some keywords like `break`, `continue`, ... we might be able to infer the end pos (given it's just start pos + number of chars in the keyword). But not sure how much we are willing to bend the code for special cases like these?
>
> Doesn't the endPos record the position of the semicolon, not the end of the keyword?
>
> -- Jon

A useful heuristic is to check the `visit...` methods in `JCPretty`. If there is a call to `print(String)` or `print(char)` before the `} catch (IOException` then the node should have an endPos. In other words, an endPos is required for all nodes that end in a specific lexical token.

For example, compare this from `visitLambda`:

        printExpr(tree.body);
    } catch (IOException e) {

and this from `visitParens`:

        print(')');
    } catch (IOException e) {
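The heuristic can be modeled with a hypothetical mini pretty-printer over illustrative records (not javac's classes): a node whose printing finishes with a token records its own end, as with the `)` in `visitParens`, while a node that finishes with a child, as `visitLambda` does with its body, takes its end from that child.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical expression model; not javac's trees.
sealed interface Expr permits Ident, Parens, Lambda {}
record Ident(String name) implements Expr {}
record Parens(Expr expr) implements Expr {}
record Lambda(String param, Expr body) implements Expr {}

final class MiniPretty {
    final StringBuilder out = new StringBuilder();
    // Only token-terminated nodes get an entry (identity keys: equal records stay distinct).
    final Map<Expr, Integer> storedEnd = new IdentityHashMap<>();

    void print(Expr tree) {
        if (tree instanceof Ident id) {
            out.append(id.name());
            storedEnd.put(tree, out.length()); // ends in its own token
        } else if (tree instanceof Parens p) {
            out.append('(');
            print(p.expr());
            out.append(')');
            storedEnd.put(tree, out.length()); // ends in ')', like visitParens
        } else if (tree instanceof Lambda l) {
            out.append(l.param()).append(" -> ");
            print(l.body()); // ends in a child, like visitLambda
        }
    }

    // Stored end if present, otherwise derived from the last child.
    int endPos(Expr tree) {
        Integer stored = storedEnd.get(tree);
        if (stored != null) return stored;
        if (tree instanceof Lambda l) return endPos(l.body());
        throw new IllegalStateException("no end position for " + tree);
    }
}
```

Printing `(x -> x)` stores ends for the `Ident` and the `Parens` only; the lambda's end (7) is derived from its body.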

cushon · 2025-12-04T14:06:42Z

> One more interesting experiment could be to try to enable end position in all trees, then run the JDK build and compare with mainline, to see what the memory usage looks like (maybe enabling -verbose:gc and looking where it peaks).

I attempted this experiment. I ran configure with --disable-javac-server and added -verbose:gc to the javac args with the change below. Then I made trivial edits to src/java.base/share/classes/java/lang/String.java and rebuilt.

With this PR the output was something like:

    Compiling up to 3385 files for java.base
    [0.003s][info][gc] Using G1
    [0.239s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 34M->4M(64M) 2.704ms
    [0.415s][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 31M->10M(64M) 4.419ms
    [0.531s][info][gc] GC(2) Pause Young (Normal) (G1 Evacuation Pause) 37M->16M(64M) 4.672ms
    [0.568s][info][gc] GC(3) Pause Young (Concurrent Start) (Metadata GC Threshold) 24M->18M(64M) 2.825ms
    [0.568s][info][gc] GC(4) Concurrent Mark Cycle
    [0.573s][info][gc] GC(4) Pause Remark 19M->19M(64M) 1.349ms
    [0.575s][info][gc] GC(4) Pause Cleanup 19M->19M(64M) 0.007ms
    [0.575s][info][gc] GC(4) Concurrent Mark Cycle 6.227ms

And without these changes, it looks like it peaked at 36M instead of 37M:

    [info][gc] GC(2) Pause Young (Normal) (G1 Evacuation Pause) 36M->16M(64M)

diff --git a/make/common/JavaCompilation.gmk b/make/common/JavaCompilation.gmk
index 33f5d10535a..e9a800bce5a 100644
--- a/make/common/JavaCompilation.gmk
+++ b/make/common/JavaCompilation.gmk
@@ -254,7 +254,7 @@ define SetupJavaCompilationBody
           javacserver.Main --conf=$$($1_JAVAC_SERVER_CONFIG)
     else
       # No javac server
-      $1_JAVAC := $$(INTERIM_LANGTOOLS_ARGS) -m jdk.compiler.interim/com.sun.tools.javac.Main
+      $1_JAVAC := -verbose:gc $$(INTERIM_LANGTOOLS_ARGS) -m jdk.compiler.interim/com.sun.tools.javac.Main

       ifeq ($$($1_SMALL_JAVA), true)
         $1_JAVAC_CMD := $$(JAVA_SMALL) $$($1_JAVA_FLAGS) $$($1_JAVAC)
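To compare runs like the two above, a small hypothetical helper (not part of the JDK build) can extract the pre-collection occupancy from each pause line and report the peak. It assumes the `<before>M-><after>M(<heap>M)` format, in megabytes, shown in that output.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for comparing -verbose:gc runs.
final class GcPeak {
    // Matches "<before>M-><after>M(<heap>M)", e.g. "37M->16M(64M)".
    private static final Pattern OCCUPANCY = Pattern.compile("(\\d+)M->(\\d+)M\\((\\d+)M\\)");

    // Largest pre-GC occupancy in MB; lines without an occupancy are skipped.
    static int peakBeforeMb(List<String> lines) {
        int peak = 0;
        for (String line : lines) {
            Matcher m = OCCUPANCY.matcher(line);
            if (m.find()) {
                peak = Math.max(peak, Integer.parseInt(m.group(1)));
            }
        }
        return peak;
    }
}
```

On the lines above it reports 37 with this PR and 36 without it: a 1M difference on a 36M peak, roughly 3%, at the whole-megabyte resolution of this log output.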

8372948: Store end positions directly in JCTree

cd10d5a

cushon marked this pull request as ready for review December 2, 2025 16:05

openjdk bot added javadoc javadoc-dev@openjdk.org compiler compiler-dev@openjdk.org labels Dec 2, 2025

openjdk bot added the rfr Pull request is ready for review label Dec 2, 2025

cushon mentioned this pull request Dec 2, 2025

Store end positions directly in JCTree #28506

Closed

