@hjohn
Collaborator

@hjohn hjohn commented Oct 23, 2025

This new check detects much more accurately whether a parent is currently laying out its children. The previous code almost never worked, resulting in unnecessary additional layouts.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (3 reviews required, with at least 1 Reviewer, 2 Authors)

Issue

  • JDK-8370498: Improve how Node detects whether a layout property change requires a new layout pass (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jfx.git pull/1945/head:pull/1945
$ git checkout pull/1945

Update a local copy of the PR:
$ git checkout pull/1945
$ git pull https://git.openjdk.org/jfx.git pull/1945/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1945

View PR using the GUI difftool:
$ git pr show -t 1945

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jfx/pull/1945.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Oct 23, 2025

👋 Welcome back jhendrikx! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Oct 23, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@hjohn hjohn changed the title Improve how Node detects whether a relayout is required 8370498 Improve how Node detects whether a layout property change requires a new layout pass Oct 23, 2025
@openjdk openjdk bot changed the title 8370498 Improve how Node detects whether a layout property change requires a new layout pass 8370498: Improve how Node detects whether a layout property change requires a new layout pass Oct 23, 2025
@openjdk openjdk bot added the rfr Ready for review label Oct 23, 2025
@hjohn
Collaborator Author

hjohn commented Oct 23, 2025

It looks like there is one failing test for ToolBarSkin. Likely there is something relying on the superfluous 2nd layout. I'm looking into it.

@mlbridge

mlbridge bot commented Oct 23, 2025

Webrevs

@kevinrushforth
Member

Reviewers: @johanvos @kevinrushforth @arapte

We will need a unit test for this. What is the risk of regression? Are there additional tests that we could add to help detect any regressions?

/reviewers 3

@openjdk

openjdk bot commented Oct 23, 2025

@kevinrushforth
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 3 (with at least 1 Reviewer, 2 Authors).

@kevinrushforth kevinrushforth self-requested a review October 23, 2025 16:00
Reusing a toolbar as part of several scenes, in combination with the StubToolkit that doesn't handle pulses makes this test fail with the relayout detection fix.
@hjohn
Collaborator Author

hjohn commented Oct 23, 2025

ToolBarSkinTest failed due to a combination of using the StubToolkit (which doesn't run pulses) and the reuse of the ToolBar node in several different stages (which somehow is allowed without warning). I've modified the test so that it still detects the original problem that it intended to fix (I reverted the fix for https://bugs.openjdk.org/browse/JDK-8364049 and checked that the rewritten test still fails without the fix).

The rewritten test works without problem with this fix, and also without this fix.

The test failed because the size cache in Parent was reused between tests. Before this PR, the test would do this:

  • Creates a Stage with a reusable ToolBar
  • Shows the stage (this triggers a layout; it's not because of a pulse, as StubToolkit doesn't run those)
  • Lots of layout occurs, including an attempt to start a 2nd layout pass (which StubToolkit won't run)
  • The triggering of the 2nd layout pass however would clear the size cache in preparation for a next layout pass
  • The stage is hidden, and the 2nd layout pass never runs
  • A new Stage is created with the reused ToolBar, and luckily its size cache was cleared so it had to redo the calculations with the new render scale.

In the version with the fix in this PR applied, no 2nd layout pass is triggered, and thus the size cache was not cleared, and the reuse of the ToolBar node would then happily use size values belonging to the old render scale.
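The sequence above can be sketched as a toy model in plain Java. `CachedSizeNode` and `SizeCacheDemo` are hypothetical names, and the `100 * renderScale` width is invented for the illustration; the real size cache in Parent is more involved. The sketch only shows why skipping the 2nd layout request can leave a stale cached size when a node is reused at a different render scale:

```java
// Toy model (not JavaFX code): a node that caches a size computed for a render scale.
final class CachedSizeNode {
    private double cachedWidth = -1;   // -1 means "nothing cached"
    private double renderScale = 1.0;

    void setRenderScale(double scale) {
        this.renderScale = scale;      // note: does not invalidate the cache by itself
    }

    // Stands in for a request for a new layout pass; it clears the size cache.
    void requestLayout() {
        cachedWidth = -1;
    }

    // Returns the cached width, or recomputes it for the current render scale.
    double prefWidth() {
        if (cachedWidth < 0) {
            cachedWidth = 100 * renderScale;   // invented formula, for illustration only
        }
        return cachedWidth;
    }
}

final class SizeCacheDemo {
    // Width seen by the second "stage", depending on whether a 2nd pass was requested.
    static double widthAfterReuse(boolean secondPassRequested) {
        CachedSizeNode toolBar = new CachedSizeNode();
        toolBar.setRenderScale(1.0);
        toolBar.prefWidth();               // first stage: width cached at scale 1.0
        if (secondPassRequested) {
            toolBar.requestLayout();       // old behaviour: cache cleared as a side effect
        }
        toolBar.setRenderScale(2.0);       // second stage uses a different render scale
        return toolBar.prefWidth();
    }
}
```

In this model, `widthAfterReuse(true)` recomputes the width for the new scale, while `widthAfterReuse(false)` returns the value cached for the old one.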

I also tested ToolBar on my own system (with a real program), and dragged the window from a monitor with 150% to 125% scale, and the tool bar looked okay.

@hjohn
Collaborator Author

hjohn commented Oct 23, 2025

Reviewers: @johanvos @kevinrushforth @arapte

We will need a unit test for this.

I think I can add one that confirms that no 2nd layout pass occurs when the detected changes are part of the current pass and a 2nd pass is therefore not needed.

What is the risk of regression?

I think it is certainly possible that some poorly constructed layoutChildren methods may inadvertently be relying on a 2nd layout pass occurring whenever there is a major UI change (when a control changes sufficiently that its parent must also change size). If that 2nd pass however did anything more than "confirming" the first layout pass, then this likely would result in an infinite layout loop (whereas now it would stop after a single pass). Assuming that controls that create infinite layout loops are fixed before they reach a wide audience (as you'd likely notice the drain on resources) it seems to me that it is unlikely such controls are in active use.

Note that it is really easy to get the "old" behavior; just return false in Parent::inLayoutChildren -- I've been using this to test differences. If we're really worried this may cause problems, we could turn this into a system property so people may opt out of this fix.
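A minimal sketch of what such an opt-out could look like, in plain Java. The property name `example.layout.legacyRelayoutDetection` and the class `RelayoutDetection` are hypothetical and not part of this PR; the only real API used is `Boolean.getBoolean`, which reads a system property as a boolean:

```java
// Sketch only: an opt-out switch that restores the "old" behaviour by making the
// in-layout check always report false. Names are invented for this example.
final class RelayoutDetection {
    // Hypothetical property name; nothing in this PR defines it.
    static final String LEGACY_PROPERTY = "example.layout.legacyRelayoutDetection";

    // What a check like Parent::inLayoutChildren would report once the switch is applied.
    static boolean effectiveInLayoutChildren(boolean actuallyInLayoutChildren) {
        boolean legacy = Boolean.getBoolean(LEGACY_PROPERTY);  // true only if set to "true"
        return !legacy && actuallyInLayoutChildren;
    }
}
```

Running with `-Dexample.layout.legacyRelayoutDetection=true` would then behave as if the check always returned false. A real implementation would more likely read the property once at start-up rather than on every call.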

The benefits of this fix could be a reduction in layout passes by half when doing things like resizing windows. This may be quite noticeable on heavy controls like TableViews.

Are there additional tests that we could add to help detect any regressions?

Code that may not work correctly with this fix is code where a container is unable to size and position its direct children in a stable fashion in one pass. If the container is tracking state between layout passes, is not clearing caches at the right time, etc, then a 2nd layout pass (with all else being equal) may result in its direct children being positioned differently. It's possible some code out there has such problems; they would currently require a 2nd (and perhaps a 3rd or 4th pass) before reaching a stable set of positions and sizes for its direct children. Note that if they never stabilize, the code before this fix would just keep running layout passes indefinitely. The fixed code will only do so once regardless.

As soon as anything other than the container and its direct children is modified (i.e. a sibling container, a grandchild, or a child of a sibling container), this code won't block another layout pass, so any regressions will be limited to containers that are unable to decide how to position their children in a single pass. If the layoutChildren code modifies some grandchild, or some other node it happens to have a reference to, then this will still result in a new layout pass.

I could write a test that will work with the old code, and fail with the new code (simply counting layout passes, or having some state that requires multiple passes would work to "detect" this fix) but am unsure what purpose that would serve. It's not a supported or documented use of the layout system. Layouts that need time to settle should still have a strict opinion on the positioning of a single container's direct children, but are likely being jostled by other influences (sibling containers, grandchildren). Such situations should still trigger further layout passes until everything converges (hopefully).

Note that "failure" here would just mean that it shows you the result of the first pass; FX would not crash or go into infinite loops.
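The difference in pass counts described above can be illustrated with a toy model in plain Java. `ToyContainer` is hypothetical and models a single container with a single child; it is not how Parent or Node are implemented. The `positioner` function stands in for a `layoutChildren` implementation deciding the child's position:

```java
import java.util.function.DoubleUnaryOperator;

// Toy model (not JavaFX code): one container positioning one child.
final class ToyContainer {
    private final boolean suppressDuringLayout;    // true models the behaviour with this fix
    private final DoubleUnaryOperator positioner;  // next child position from the current one
    private boolean inLayout;
    private boolean needsLayout = true;
    private double childX;

    ToyContainer(boolean suppressDuringLayout, DoubleUnaryOperator positioner) {
        this.suppressDuringLayout = suppressDuringLayout;
        this.positioner = positioner;
    }

    // Models a child's layout position changing; may request another pass.
    private void setChildX(double x) {
        if (x != childX) {
            childX = x;
            if (!(suppressDuringLayout && inLayout)) {
                needsLayout = true;
            }
        }
    }

    private void layout() {
        needsLayout = false;
        inLayout = true;
        try {
            setChildX(positioner.applyAsDouble(childX));
        } finally {
            inLayout = false;
        }
    }

    // Runs passes until nothing is pending or the cap is hit; returns the pass count.
    int runPasses(int maxPasses) {
        int passes = 0;
        while (needsLayout && passes < maxPasses) {
            layout();
            passes++;
        }
        return passes;
    }

    double childX() {
        return childX;
    }
}
```

With a stable positioner such as `x -> 10.0`, the old-style model needs a second "confirming" pass while the new-style model stops after one. With an unstable positioner such as `x -> x + 1`, the old-style model runs until the cap, and the new-style model shows the result of the first pass.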

@mlbridge

mlbridge bot commented Oct 23, 2025

Mailing list message from Andy Goryachev on openjfx-dev:

We will need a unit test for this.

Perhaps we could even get it a bit further than that. What would you say could be a sufficiently comprehensive set of tests/scenarios to reduce the probability of regression?

Let's say, we limit the depth of the hierarchy to 3 (node-parent-parent), then the combinatorial complexity should still be manageable.

What do you think?

-andy

From: openjfx-dev <openjfx-dev-retn at openjdk.org> on behalf of John Hendrikx <jhendrikx at openjdk.org>
Date: Thursday, October 23, 2025 at 10:47
To: openjfx-dev at openjdk.org <openjfx-dev at openjdk.org>
Subject: Re: RFR: 8370498: Improve how Node detects whether a layout property change requires a new layout pass

On Thu, 23 Oct 2025 15:52:55 GMT, Kevin Rushforth <kcr at openjdk.org> wrote:

-------------

PR Comment: https://git.openjdk.org/jfx/pull/1945#issuecomment-3438319381

@hjohn hjohn mentioned this pull request Oct 24, 2025
4 tasks
@hjohn
Collaborator Author

hjohn commented Oct 24, 2025

Mailing list message from Andy Goryachev on openjfx-dev:

We will need a unit test for this.

Perhaps we could even get it a bit further than that. What would you say could be a sufficiently comprehensive set of tests/scenarios to reduce the probability of regression?

Let's say, we limit the depth of the hierarchy to 3 (node-parent-parent), then the combinatorial complexity should still be manageable.

What do you think?

As much as I'd like to help out more, after analyzing this problem already for almost two full days, I don't have time to write a comprehensive test suite for one of the more complex systems in JavaFX. I can of course extend existing tests to get behavioral coverage on this new fix.

So, I think we should then choose what we want. We can revert https://bugs.openjdk.org/browse/JDK-8360940 and go back to a situation where the layout system can easily get into a bad state while using documented APIs, or we can take a critical look at this new fix, which addresses a long-standing problem that was hidden by the bug that was fixed, and could be a major performance improvement for complex UIs.

I think manual testing with some large UIs and/or Monkey Tester should be more than sufficient to discover if this is likely to cause regressions, and I've already been doing so before I submitted this fix (I tend to just apply these fixes to my own code directly, to see how they work out).

If we're still unsure, a system property can turn this on/off easily, at the cost of getting some redundant layout passes in the "off" mode (as it was before).

I already knew that the detection that Node was doing for triggering a 2nd layout pass was bogus, because isCurrentLayoutChild is simply not updated at the correct moments for this to work. It works if you ask if a Node's parent is the current layout child of the grand parent, but it does not work for one level deeper: asking if a node is the current layout child of its parent.

That the logic looked correct at first glance may have convinced the original authors that it works, but it doesn't. More importantly, it can't be fixed, as a fix would require each modification of layout X/Y to be wrapped by first setting the current layout child -- that would require updating all layout containers, all 3rd party layout containers, and all 3rd party code that calls relocate or sets layout X/Y directly -- an impossibility IMHO. That's also the reason I didn't fix that immediately, as at the time I didn't see a workable solution.
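The timing problem can be sketched with a toy model in plain Java. `ToyNode` is hypothetical and greatly simplified; it assumes, as described above, that the "current layout child" marker is only set while a parent recurses into a child's own layout, and not while a container positions that child:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not JavaFX code). Assumption: the marker is set only for the recursion
// into a child's layout, never around the positioning of that child.
final class ToyNode {
    private final List<ToyNode> children = new ArrayList<>();
    private ToyNode parent;
    private ToyNode currentLayoutChild;

    // Recorded at the moment this node is positioned by its parent:
    boolean wasCurrentLayoutChildOfParent;
    boolean parentWasCurrentLayoutChildOfGrandParent;

    ToyNode add(ToyNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    void layout() {
        layoutChildren();                   // positions the direct children
        for (ToyNode child : children) {
            currentLayoutChild = child;     // marker set only for the recursion
            child.layout();
        }
        currentLayoutChild = null;
    }

    private void layoutChildren() {
        for (ToyNode child : children) {
            child.positioned();             // like a container relocating a child
        }
    }

    // Only ever called on a child, so parent is non-null here.
    private void positioned() {
        wasCurrentLayoutChildOfParent = parent.currentLayoutChild == this;
        ToyNode grandParent = parent.parent;
        parentWasCurrentLayoutChildOfGrandParent =
            grandParent != null && grandParent.currentLayoutChild == parent;
    }
}
```

In a three-level tree (grandparent, parent, leaf), the leaf is never its parent's current layout child at the moment it is positioned, while its parent is the grandparent's current layout child at that same moment.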

@kevinrushforth
Member

@hjohn Thank you for providing the additional analysis. This will help those of us who will review this proposed fix evaluate it to see if we can poke any holes in it.

As you say, we have two choices before us: fix this bug or revert the fix for JDK-8360940. If we can convince ourselves that this is the right fix, taking this fix would be preferable to re-introducing JDK-8360940.

As for testing, I think manual testing using various apps (e.g., Ensemble, MonkeyTester, SceneBuilder) is needed. And while we also need a new test for this fix -- that is, a test that fails before and passes after the fix -- I would agree that adding a whole battery of functional tests for layout is out of scope and better done as a follow-on.

Finally, as you point out, we could provide a system property to go back to the way we did it before. If we do that (which I am not advocating at this point in the review), we might consider having that flag also revert the behavior of JDK-8360940.

@mlbridge

mlbridge bot commented Oct 24, 2025

Mailing list message from Andy Goryachev on openjfx-dev:

Right. Apart from a unit test for this specific issue, I would like us at least to think about enumerating the typical scenarios and the scenarios where the outcome might be different.

-andy

From: openjfx-dev <openjfx-dev-retn at openjdk.org> on behalf of Kevin Rushforth <kcr at openjdk.org>
Date: Friday, October 24, 2025 at 07:40
To: openjfx-dev at openjdk.org <openjfx-dev at openjdk.org>
Subject: Re: RFR: 8370498: Improve how Node detects whether a layout property change requires a new layout pass [v2]

On Fri, 24 Oct 2025 05:33:45 GMT, John Hendrikx <jhendrikx at openjdk.org> wrote:

John Hendrikx has updated the pull request incrementally with one additional commit since the last revision:

Fix ToolBarSkinTest

Reusing a toolbar as part of several scenes, in combination with the StubToolkit that doesn't handle pulses makes this test fail with the relayout detection fix.

-------------

PR Comment: https://git.openjdk.org/jfx/pull/1945#issuecomment-3443507993

@johanvos
Collaborator

would agree that adding a whole battery of functional tests for layout is out of scope and better done as a follow-on.

I'm not sure I agree with this. Personally, I consider the battery of functional tests to be more important than the fix, so without the tests, I would be very conservative.
However, we're still relatively early in the 26 release cycle, and we won't do a 26 LTS, so the risk of real-world dramas is limited. We can still revert this PR (if it got merged) and the previous one later in the 26 development phase.

I'll look into a more deterministic test scenario -- the suggestion from Andy with 3 levels makes sense (although reality is much more complex, with Platform.runLater() running in between 2 pulses, and that is non-deterministic as it depends on processor speed etc.).

@hjohn
Collaborator Author

hjohn commented Oct 26, 2025

would agree that adding a whole battery of functional tests for layout is out of scope and better done as a follow-on.

I'm not sure I agree with this. Personally, I consider the battery of functional tests to be more important than the fix, so without the tests, I would be very conservative. However, we're still relatively early in the 26 release cycle, and we won't do a 26 LTS, so the risk of real-world dramas is limited. We can still revert this PR (if it got merged) and the previous one later in the 26 development phase.

I'll look into a more deterministic test scenario -- the suggestion from Andy with 3 levels makes sense (although reality is much more complex, with Platform.runLater() running in between 2 pulses, and that is non-deterministic as it depends on processor speed etc.).

It would actually be much easier to create deterministic tests after this change (unless we're dealing with a converging layout, which can sometimes happen with biased containers). Before this change, a test has to check whether another pulse is scheduled after the layout pass, and execute it as well, to be sure that the layout is stable. After this change, that will almost never be the case, and a test can assert that no pulse has been requested.

In any case, I'd still recommend either reverting https://bugs.openjdk.org/browse/JDK-8364049 or combining it with this fix. The intermediate situation is still fine for most cases, but there can be nasty surprises if left in by itself (albeit only in somewhat self-contradicting layout code).

I think having both fixes guarded behind a system property averts most risks. I'd recommend having it on by default though, to get real-world feedback, as I think we do want to incorporate this fix permanently if it works out. I think it definitely matches the intent of the original implementation much more closely.

@johanvos
Collaborator

I could write a test that will work with the old code, and fail with the new code (simple counting layout passes, or having some state that requires multiple passes would work to "detect" this fix) but am unsure what purpose that would serve.

The main purpose is testing for regressions. We want to make sure a future PR does not reintroduce the old situation.

@arapte
Member

arapte commented Oct 27, 2025

The method isCurrentLayoutChild() was introduced with the fix for JDK-8137252.
With this PR change, the issue JDK-8137252 recurs.

@hjohn
Collaborator Author

hjohn commented Oct 27, 2025

The method isCurrentLayoutChild() was introduced with the fix for JDK-8137252. With this PR change, the issue JDK-8137252 recurs.

Thanks, I will have a look. If I can resolve the issue, I'll add a test case for this problem as well so we can detect a regression more easily.

@hjohn
Collaborator Author

hjohn commented Oct 27, 2025

The fix in https://bugs.openjdk.org/browse/JDK-8137252 is only "solving" the problem because now a relayout always occurs (until nobody modifies layout X/Y to different values). The isCurrentLayoutChild check never returns true for a StackPane or any other layout container, because they don't update the current layout child before positioning said child. The code in Parent::layoutChildren is a red herring and will not apply this logic magically to all other layout containers.

So, yes, https://bugs.openjdk.org/browse/JDK-8137252 fixes the problem, but it does so at the cost of an extra layout pass in all cases. In the sample application, even if there was only a Label in that stack pane and no Ellipse, then the first time it is laid out, it will do a 2nd pass because Label's layout X/Y was modified...

I'm now looking further into how this problem relates to Shape and its subtypes, as this fix is not a problem for node types based on Parent.

@hjohn
Collaborator Author

hjohn commented Oct 27, 2025

So I've been playing with the program in https://bugs.openjdk.org/browse/JDK-8137252 and here are some observations:

  • It is binding width/height of a sibling to another (size altering) property of another sibling. This is questionable practice, especially when both siblings are managed and part of a container that uses managed children (StackPane in this case). You get away with this because Shapes are not resizable (by the standard width/height mechanism), and so the container can't force its preferred size onto a Shape. If it was anything else (like a Region or Button) then after changing the size, relayout would run, and your change would be overridden, at least, until you set that control to unmanaged.

  • Because containers do not resize non-resizable controls, but do set their positions, this is a very special half managed/half unmanaged territory.

  • The whole thing works by doing a 2nd layout pass. When you modify the width/height of the source sibling, this triggers a first layout pass. During that layout pass, you modify the values of the destination sibling. This then requires another layout pass. Two layout passes aside, this will show up on your screen as the layout "jumping" (i.e. it shows the destination sibling first in the wrong position, then in the 2nd pass it corrects this). I don't see why anyone would want to do it this way, as this will make the layout jump. In fact, if you go further, and bind more properties in these ways, you can get a 3rd layout pass, and a 4th etc. This is not a good solution.

Currently, on each change (showing the stage, changing the source sibling's size) we see the layoutX of the destination sibling jumping:

      lx = 0.0
      Showing scene
      lx = 9.333332697550457
      lx = 17.99999936421712
      Triggering 2nd change
      lx = 17.333334604899086
      lx = 34.00000127156576

What's the correct way then to draw a circle around a label that doesn't make your layout jump?

The wanted solution does not make it easy. Because the label's width is used as radius, the desired circle actually is twice as wide as the label. The stack pane however is unaware of this requirement as the circle is not resizable and does not have min/pref/max properties. It has some similarities with solutions that must know the size of text (in the correct font after CSS has been applied) because other elements depend on it.

The solution is to add a listener to the label's needsLayoutProperty. Every time something of consequence changes on the Label, one can now calculate how large the circle should be. It is a bit more involved, but you get a single layout pass, and there is no UI jumping anymore:

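import javafx.application.Application;
import javafx.application.Platform;
import javafx.scene.Scene;
import javafx.scene.control.Label;
import javafx.scene.layout.Pane;
import javafx.scene.layout.Region;
import javafx.scene.layout.StackPane;
import javafx.scene.shape.Ellipse;
import javafx.stage.Stage;

public class Test1 extends Application {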

    public static void main(String[] args) {
        launch(args);
    }

    @Override
    public void start(Stage stage) {

        Label l1 = new Label("1 2");
        StackPane sp = new StackPane(l1, new EllipseWrapper(l1));

        sp.relocate(200, 100);

        Pane topPane = new Pane(sp);
        Scene scene = new Scene(topPane, 600, 400);

        sp.setStyle("-fx-border-color: RED;");

        stage.setScene(scene);
        System.out.println("Showing scene");

        stage.show();

        Thread.ofVirtual().start(() -> {
            for (;;) {
                try {
                    Thread.sleep(1500);
                }
                catch (InterruptedException e) {
                    break;
                }

                Platform.runLater(() -> {
                    l1.setText("" + (int)(2000 * Math.random()));
                });
            }
        });
    }

    // wrapper because Ellipse has no layout properties for StackPane to use:
    class EllipseWrapper extends Region {
        private final Ellipse ellipse = new Ellipse();
        private final Label label;

        public EllipseWrapper(Label label) {
            this.label = label;
            ellipse.setOpacity(0.5);
            getChildren().add(ellipse);

            // using a change listener to ensure property is revalidated:
            label.needsLayoutProperty().subscribe(v -> { if (v) requestLayout(); });
        }

        @Override
        protected double computePrefWidth(double height) {
            return 2 * label.prefWidth(-1);  // proper calculation for layout!
        }

        @Override
        protected double computePrefHeight(double width) {
            return 2 * label.prefHeight(-1);
        }

        @Override
        protected void layoutChildren() {
            double w = getWidth();
            double h = getHeight();
            ellipse.setRadiusX(w / 2);
            ellipse.setRadiusY(h / 2);
            ellipse.setCenterX(w / 2);
            ellipse.setCenterY(h / 2);
        }
    }
}

@hjohn
Collaborator Author

hjohn commented Oct 27, 2025

IMHO there are now two ways forward:

  1. We wish to support this rather odd program in https://bugs.openjdk.org/browse/JDK-8137252 despite it requiring two layout passes (with a visible UI jump). This is possible by allowing only non-resizable containers to unconditionally trigger the relayout logic (instead of having everything trigger it because of the current dysfunctional isCurrentLayoutChild logic):

         if (!isResizable() || (p != null && !p.inLayoutChildren())) {

  2. We don't want to support this, as IMHO, it is bad practice to bind to layout properties as this will inevitably lead to multiple layout passes and jumps (which makes JavaFX UIs appear flaky).
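To make the condition from option 1 easier to reason about, here is the same boolean expression written as a pure function in plain Java. `RelayoutCondition` and its parameter names are hypothetical; `hasParent == false` stands in for `p == null`, and the sketch says nothing about what the guarded block itself does:

```java
// Sketch only: the option 1 condition as a pure function, without any JavaFX types.
final class RelayoutCondition {
    static boolean passes(boolean resizable, boolean hasParent, boolean parentInLayoutChildren) {
        return !resizable || (hasParent && !parentInLayoutChildren);
    }
}
```

A non-resizable node (such as a Shape) always passes the check, even while its parent is inside layoutChildren. A resizable node only passes it when it has a parent that is not currently laying out its children.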

I've already written a couple of tests that nicely capture the advantage of this fix. The tests clearly show that only one layout pass now occurs with this fix.

Contributor

@andy-goryachev-oracle andy-goryachev-oracle left a comment

First of all, I would like to thank you John for looking into the layout problems. We've got these long-standing issues that are very difficult to debug and fix.

I think this is valuable work as it definitely improves the platform, so Danke schön.

The reason I asked about tests and test scenarios is the possibility of regression. Case in point - with this PR, on macOS with an external monitor at scale=1:

Image

I would second @johanvos in suggesting that the regression is what we should be guarding against, and perhaps expanding the tests.

try {
layoutChildren();
}
finally {
Contributor

minor suggestion:
} finally {
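
For readers following the try/finally discussion, here is a minimal sketch in plain Java (no JavaFX dependency) of the general idea: a flag that is true only while the parent lays out its children, reset in a `finally` block, and consulted by the child to decide whether a position change needs a new layout pass. `SketchParent`, `SketchChild` and all their members are hypothetical names for illustration; this is not the code in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parent node; NOT the actual javafx.scene.Parent code.
class SketchParent {
    private boolean inLayoutChildren;   // true only while layoutChildren() runs
    private boolean needsLayout = true;
    int layoutPasses;                   // counts how many passes actually ran
    final List<SketchChild> children = new ArrayList<>();

    boolean inLayoutChildren() { return inLayoutChildren; }
    boolean needsLayout() { return needsLayout; }
    void requestLayout() { needsLayout = true; }

    void layout() {
        if (!needsLayout) {
            return;
        }
        needsLayout = false;
        layoutPasses++;
        inLayoutChildren = true;
        try {
            layoutChildren();
        } finally {
            // Reset even if layoutChildren() throws, so the flag never sticks.
            inLayoutChildren = false;
        }
    }

    void layoutChildren() {
        double x = 0;
        for (SketchChild c : children) {
            c.setLayoutX(x);   // parent-driven change: should not re-trigger layout
            x += 10;
        }
    }
}

// Hypothetical stand-in for a child node.
class SketchChild {
    private final SketchParent parent;
    private double layoutX;

    SketchChild(SketchParent parent) {
        this.parent = parent;
        parent.children.add(this);
    }

    void setLayoutX(double value) {
        if (value == layoutX) {
            return;
        }
        layoutX = value;
        // Only changes made outside the parent's own layout ask for a new pass.
        if (!parent.inLayoutChildren()) {
            parent.requestLayout();
        }
    }

    double getLayoutX() { return layoutX; }
}

class LayoutFlagDemo {
    public static void main(String[] args) {
        SketchParent parent = new SketchParent();
        new SketchChild(parent);
        new SketchChild(parent);
        parent.layout();
        System.out.println("passes=" + parent.layoutPasses
            + " needsLayout=" + parent.needsLayout());
    }
}
```

In this model a single `layout()` call leaves the parent clean, while a position change made from outside the layout pass marks it dirty again.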

@hjohn
Collaborator Author

hjohn commented Oct 28, 2025

First of all, I would like to thank you John for looking into the layout problems. We've got these long-standing issues that are very difficult to debug and fix.

I think this is valuable work as it definitely improves the platform, so Danke schön.

The reason I asked about tests and test scenarios is the possibility of regression. Case in point - with this PR, on macOS with an external monitor at scale=1:

It is starting to look like there may be more code relying on double layouts than I thought, even though most controls work absolutely fine. I think that it is still worth pursuing eliminating the need for these (as you can temporarily see the "wrong" positions), but it may take longer as each of these will need an investigation.

I remain convinced though that the original fix in https://bugs.openjdk.org/browse/JDK-8137252 was applied too hastily, and only worked because it, unintentionally, simply always allowed double layouts.

I'll see if I can reproduce the menu issue and what the culprit is there. I'm not on Mac though, so hoping that the problem is also present on other platforms.

@johanvos
Collaborator

It is starting to look like there may be more code relying on double layouts than I thought, even though most controls work absolutely fine. I think that it is still worth pursuing eliminating the need for these (as you can temporarily see the "wrong" positions), but it may take longer as each of these will need an investigation.

I remain convinced though that the original fix in https://bugs.openjdk.org/browse/JDK-8137252 was applied too hastily, and only worked because it, unintentionally, simply always allowed double layouts.

That is very well possible, but it is what it is. I believe the main goal of the layout phase in a pulse is to make sure that "ultimately", all direct/indirect requests are handled. The secondary goal is that it should be done as efficiently as possible, e.g. do not require 2 (or more) passes unless absolutely needed.
The issue is, though, that there is a huge number of possible scenarios, leading to situations that cannot be dealt with by a single, optimized flow. I started documenting and analyzing scenarios, and even a very basic 2 node case poses issues that can go wrong.
Looking at it with our openjfx glasses, some of these scenarios are really bad. However, developers without internal knowledge often have no idea how and why it can go wrong. Adding bindings/listeners between siblings is a very common pattern, and when looking at my old code, I made tons of "mistakes" by having too many bindings that asked the layout phase to solve an almost impossible job.
At least the layout system in JavaFX gives developers lots of freedom, and it promises to handle all edge cases. That fulfills the main goal (correct rendering, perhaps after a number of pulses, leading to flickering), but it makes the second goal (top-efficiency) really hard.

To make it harder (but very understandable), the layout phase spans a number of classes, where different concepts/choices are made (e.g. both Parent and Node have internal state that is used to determine whether to initiate a new pulse). That makes it really hard to come up with a system that would typically be used in these situations: use an algorithm that always works (although maybe less performant), and use optimizations in specific cases (e.g. no bindings in properties of children in a container -> use this branch).

Something I've been thinking about every now and then is to introduce runtime warnings (ideally compiler errors, but that would require lots of upfront analysis), where the layout subsystem can warn that "a pretty complex situation occurred", e.g. when it is running into cyclic conditions that are not trivial to resolve without performance degradation. But that would be a major effort and conceptual change.

TLDR: I feel your pain and it can be very frustrating having to deal with non-optimal but valid code, deployed in the wild.

@andy-goryachev-oracle
Contributor

I'll see if I can reproduce the menu issue and what the culprit is there.

Thanks! For what it's worth, the issue also appears when I move the window from the main retina screen (scale=2) to an external monitor (scale=1). I would imagine it should be easy to reproduce on Windows, especially with a fractional scale.

@andy-goryachev-oracle
Contributor

That fulfills the main goal (correct rendering, perhaps after a number of pulses, leading to flickering), but it makes the second goal (top-efficiency) really hard.

I might be way off, but I wanted to ask you this:

How many pulses are needed to finish the layout? If we ignore for a second some pathological cases when the layout process never ends causing continuous flicker, is there a safe upper limit?

What I am getting at is - what if we run more than one layout pass (Scene::doLayoutPass) per pulse? In other words, if the layout is still dirty, we keep doing the layout until it's settled, without the associated re-rendering and flicker, and if it's still dirty after N cycles we print a warning (if said warning is enabled)?

What do you think?
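
A minimal sketch of the idea being asked about, in plain Java with no JavaFX types: repeat the layout pass while the tree is still dirty, cap the number of passes at N, and optionally warn when the cap is hit. `SettlingLayout`, `layoutUntilSettled` and the `warn` switch are hypothetical names for illustration; nothing like this exists in `Scene` as far as this thread shows.

```java
import java.util.function.BooleanSupplier;

class SettlingLayout {
    /**
     * Runs layoutPass until it reports a clean tree or maxPasses is reached.
     * layoutPass must return true if the tree is still dirty after the pass.
     * Returns the number of passes that were executed.
     */
    static int layoutUntilSettled(BooleanSupplier layoutPass, int maxPasses, boolean warn) {
        int passes = 0;
        boolean dirty = true;
        while (dirty && passes < maxPasses) {
            dirty = layoutPass.getAsBoolean();
            passes++;
        }
        if (dirty && warn) {
            // The hypothetical "warning if enabled" from the proposal.
            System.err.println("layout still dirty after " + passes + " passes");
        }
        return passes;
    }

    public static void main(String[] args) {
        int[] remaining = {3};   // pretend the layout needs three passes to settle
        int passes = layoutUntilSettled(() -> --remaining[0] > 0, 5, true);
        System.out.println("passes run: " + passes);
    }
}
```

The cap matters for the pathological case mentioned above: a layout that never converges simply stops after `maxPasses` instead of blocking the thread.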

@hjohn
Collaborator Author

hjohn commented Oct 28, 2025

That fulfills the main goal (correct rendering, perhaps after a number of pulses, leading to flickering), but it makes the second goal (top-efficiency) really hard.

I might be way off, but I wanted to ask you this:

How many pulses are needed to finish the layout? If we ignore for a second some pathological cases when the layout process never ends causing continuous flicker, is there a safe upper limit?

What I am getting at is - what if we run more than one layout pass (Scene::doLayoutPass) per pulse? In other words, if the layout is still dirty, we keep doing the layout until it's settled, without the associated re-rendering and flicker, and if it's still dirty after N cycles we print a warning (if said warning is enabled)?

What do you think?

It's not a bad idea to just run layout again if after running layout the root is still dirty. That would kill the flicker, but would just reinforce bad patterns. Also, the CPU cost is still there (i.e. if you resize a window with, say, a large complex TableView, then it will still have to do a whole lot of calculations twice, while the 2nd run basically changes nothing).

@andy-goryachev-oracle
Contributor

Also, the CPU cost is still there

that's true, but the cost will be there anyway - but now we are skipping the rendering and removing the flicker. So it's a win-win, as long as the layout converges.

One example is when the layout must further change based on the current layout pass, such as when the scroll bar appears or disappears.

Also, doing these burst micro-layouts might be independent of any other work we are doing in terms of removing "bad patterns" (the scroll bar scenario above is not really a bad pattern in itself, just a fact of life, I think).

@johanvos
Collaborator

How many pulses are needed to finish the layout? If we ignore for a second some pathological cases when the layout process never ends causing continuous flicker, is there a safe upper limit?

Realistically, I'd say 10 iterations is pretty common, and I wouldn't be surprised if it goes much higher in some common applications. With 10 iterations, I mean the number of "layout phases" that are chained by a requestNextPulse during the previous layout phase. Once there is a layout phase without a requestNextPulse, I consider the rendering "stable".

This will have a few other major impacts that are hard to predict:

  1. what about CSS passes? repeat those too? (probably yes -> really expensive)
  2. the flow of apps will be completely different. As long as a pulse is running, no Runnables scheduled via Platform.runLater() can be executed. If you do 10 layout passes inside a pulse, the time between the Runnables being executed becomes an order of magnitude more than before, and that will have a major impact for some applications.

I fear that approach is going to be even more disruptive for existing applications.

@johanvos
Collaborator

johanvos commented Oct 28, 2025

It's not a bad idea to just run layout again if after running layout the root is still dirty.

There must be historical reasons why this is not the case. @kevinrushforth might have more background info. I would guess that the original design goal was to keep a pulse as short as possible, and the requestNextPulse inside a pulse would be useless with this approach. I think the requestNextPulse was created for this reason: don't stop the world for too long, render what we have, and try to render again as soon as possible (in the next pulse). This prevents the pulse runnable from locking the JavaFX Application Thread for too long -- as said above, there are other users for this Thread.

@andy-goryachev-oracle
Contributor

Interesting, thanks! I was thinking more of 2-3 cycles, actually.

@kevinrushforth
Member

It's not a bad idea to just run layout again if after running layout the root is still dirty.

There must be historical reasons why this is not the case. @kevinrushforth might have more background info. I would guess that the original design goal was to keep a pulse as short as possible, and the requestNextPulse inside a pulse would be useless with this approach. I think the requestNextPulse was created for this reason: don't stop the world for too long, render what we have, and try to render again as soon as possible (in the next pulse). This prevents the pulse runnable from locking the JavaFX Application Thread for too long -- as said above, there are other users for this Thread.

From what I can remember, that was indeed the main reason. Most of the layout stabilizes pretty quickly (1-3 passes), but as Johan points out, there can be cases where ~ 10 are needed. It is better for responsiveness to run event handlers and app Runnables (via runLater), and to allow animation to proceed, after each layout + CSS pass than to iterate until stable.
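
The responsiveness argument can be illustrated with a toy single-threaded event queue in plain Java. This is only a model under stated assumptions: `PulseSimulation` and `passesBeforeRunnable` are hypothetical names, the queue stands in for the JavaFX Application Thread, re-queuing the pulse stands in for requestNextPulse, and the second queued task stands in for a Runnable posted via Platform.runLater.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PulseSimulation {
    /**
     * Returns how many layout passes have already run by the time a Runnable,
     * queued right behind the first pulse, finally gets its turn.
     */
    static int passesBeforeRunnable(int passesNeeded, boolean iterateInsidePulse) {
        Deque<Runnable> queue = new ArrayDeque<>();
        int[] passesDone = {0};
        int[] seenByRunnable = {-1};

        Runnable pulse = new Runnable() {
            @Override
            public void run() {
                if (iterateInsidePulse) {
                    // Keep laying out until stable; nothing else can run meanwhile.
                    while (passesDone[0] < passesNeeded) {
                        passesDone[0]++;
                    }
                } else {
                    // One pass per pulse; schedule another pulse if still dirty.
                    passesDone[0]++;
                    if (passesDone[0] < passesNeeded) {
                        queue.addLast(this);
                    }
                }
            }
        };

        queue.addLast(pulse);
        queue.addLast(() -> seenByRunnable[0] = passesDone[0]); // the queued app task
        while (!queue.isEmpty()) {
            queue.removeFirst().run();
        }
        return seenByRunnable[0];
    }

    public static void main(String[] args) {
        System.out.println("iterate inside pulse: " + passesBeforeRunnable(10, true));
        System.out.println("one pass per pulse:   " + passesBeforeRunnable(10, false));
    }
}
```

With ten passes needed, the queued task waits for all ten when the pulse iterates until stable, but only for one when each pass is its own pulse.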

@hjohn
Collaborator Author

hjohn commented Oct 28, 2025

How many pulses are needed to finish the layout? If we ignore for a second some pathological cases when the layout process never ends causing continuous flicker, is there a safe upper limit?

Realistically, I'd say 10 iterations is pretty common, and I wouldn't be surprised if it goes much higher in some common applications. With 10 iterations, I mean the number of "layout phases" that are chained by a requestNextPulse during the previous layout phase. Once there is a layout phase without a requestNextPulse, I consider the rendering "stable".

In what software do you get 10 iterations as "common"? That's close to 200 ms @ 60 fps. The software would look and perform worse than decades-old software. I'm running quite complex software with FX, and a layout never takes more than 1 pass (not counting the superfluous pass made by FX currently), as that would be absolutely unacceptable for my work, which must look absolutely smooth and polished.

This will have a few other major impacts that are hard to predict:

  1. what about CSS passes? repeat those too? (probably yes -> really expensive)

Ehr, no definitely not. Just like you shouldn't be modifying layout positions during layout, you should not be doing things (during layout) that modify CSS styles because the CSS pass has already completed. Modifying CSS during layout (which changes anything size or position related) is a guaranteed jump of your UI as a next pass is required. This looks flaky and unprofessional.

For example, you also should not be modifying the scene graph during layout, because if you say add a new child (like a list cell or something) that cell will be rendered without styles, resulting in things like white flashes on your black background because that's the default background. This kind of stuff must be done in a Scene::addPreLayoutPulseListener if you want to ensure your application looks smooth.
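
A toy model of why the ordering matters, in plain Java with no JavaFX dependency. It assumes, as described above, that a pulse applies CSS before layout, so a child added during layout misses the CSS pass for that frame while a child added from a pre-layout hook does not. `SketchScene`, `SketchCell`, `addPreLayoutHook` and `setLayoutAction` are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a cell created on demand.
class SketchCell {
    boolean styled;   // set by the CSS pass
}

// Hypothetical stand-in for a scene; models only the order of the phases.
class SketchScene {
    final List<SketchCell> cells = new ArrayList<>();
    private final List<Runnable> preLayoutHooks = new ArrayList<>();
    private Runnable layoutAction = () -> {};

    void addPreLayoutHook(Runnable hook) { preLayoutHooks.add(hook); }
    void setLayoutAction(Runnable action) { layoutAction = action; }

    /** Runs one pulse and returns how many cells would be rendered unstyled. */
    int pulse() {
        preLayoutHooks.forEach(Runnable::run);   // 1. pre-layout hooks
        for (SketchCell c : cells) {             // 2. CSS pass
            c.styled = true;
        }
        layoutAction.run();                      // 3. layout pass
        int unstyled = 0;                        // 4. "render"
        for (SketchCell c : cells) {
            if (!c.styled) {
                unstyled++;
            }
        }
        return unstyled;
    }
}

class PreLayoutDemo {
    public static void main(String[] args) {
        SketchScene lateAdd = new SketchScene();
        lateAdd.setLayoutAction(() -> lateAdd.cells.add(new SketchCell()));
        System.out.println("added during layout, unstyled: " + lateAdd.pulse());

        SketchScene earlyAdd = new SketchScene();
        earlyAdd.addPreLayoutHook(() -> earlyAdd.cells.add(new SketchCell()));
        System.out.println("added before layout, unstyled: " + earlyAdd.pulse());
    }
}
```

The unstyled cell in the first case corresponds to the white flash described above; in the second case the cell exists before the CSS pass and is styled in the same frame.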

  2. the flow of apps will be completely different. As long as a pulse is running, no Runnables scheduled via Platform.runLater() can be executed. If you do 10 layout passes inside a pulse, the time between the Runnables being executed becomes an order of magnitude more than before, and that will have a major impact for some applications.

Nobody is suggesting running layout passes until the UI settles.

Anyway, the only practical use would be to cover up self created layout problems, and encourage more bad behavior, so perhaps it is better to not even consider this.

You mentioned that FX "promises to handle all edge cases". Do you care to show me where it does so? Because FX would be the first system with complex layouts that would be doing so.

@hjohn
Collaborator Author

hjohn commented Oct 28, 2025

Also, the CPU cost is still there

that's true, but the cost will be there anyway - but now we are skipping the rendering and removing the flicker. So it's a win-win, as long as the layout converges.

If nothing else happens in the scene, then it would also save a rendering pass yes.

One example is when the layout must further change based on the current layout pass, such as when the scroll bar appears or disappears.

That should not be the case because of how the layout pass is split into the compute calls and the layoutChildren calls.

The scene or any layout root needs to know what size things will be, and so will use the prefWidth (etc) calls (that delegate to the compute calls) before doing any layoutChildren call. These calls cascade down as deep as necessary throughout most of the scene graph (even parts not needing layout), and this would be expensive if they weren't also aggressively cached.
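
As a rough illustration of the caching mentioned here, a minimal plain-Java sketch: the expensive compute call runs once, later size queries hit the cache, and a layout request invalidates it. `CachedPrefSize` and its members are hypothetical names; the real caching in JavaFX is more involved (it also has to deal with the height/width argument, for instance) and is not reproduced here.

```java
import java.util.function.DoubleSupplier;

class CachedPrefSize {
    private final DoubleSupplier compute;   // stands in for computePrefWidth(-1)
    private double cached = Double.NaN;     // NaN marks "no cached value"
    int computeCalls;                       // how often the expensive call ran

    CachedPrefSize(DoubleSupplier compute) {
        this.compute = compute;
    }

    double prefWidth() {
        if (Double.isNaN(cached)) {
            computeCalls++;
            cached = compute.getAsDouble();
        }
        return cached;
    }

    /** A layout request means sizes may have changed, so drop the cache. */
    void requestLayout() {
        cached = Double.NaN;
    }

    public static void main(String[] args) {
        CachedPrefSize node = new CachedPrefSize(() -> 2 * 40.0);
        node.prefWidth();
        node.prefWidth();
        System.out.println("compute calls: " + node.computeCalls);
    }
}
```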

The compute calls don't need to be a one call = one result deal. Often additional calls are done for more complex layouts (like for biased controls where the opposite compute call is done first so the related size can be passed to the wanted compute axis). And this can also happen for ScrollPane which can look at the result of a compute call then decide it needs a scroll bar, redo the computation of how much space is available for the content (which may cause the other scroll bar to also be needed). It can then position its children correctly in one pass.

See ScrollPaneSkin#layoutChildren code where you can see it do several checks in a row to determine the correct size when scrollbars must appear, all in one pass :)

Also of note is that it always creates and adds the scroll bars, and just keeps them invisible until needed, as creating them on demand would be too late and cause flicker (unless you do this in a prelayout listener).
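
The "several checks in a row" idea can be sketched in plain Java. This is a simplified illustration for content with a fixed size and is not the actual ScrollPaneSkin code; `ScrollBarDecision`, `decide` and the `bar` thickness parameter are hypothetical names.

```java
class ScrollBarDecision {
    final boolean vertical;        // vertical scroll bar needed
    final boolean horizontal;      // horizontal scroll bar needed
    final double viewportWidth;    // space left for the content
    final double viewportHeight;

    ScrollBarDecision(boolean vertical, boolean horizontal,
                      double viewportWidth, double viewportHeight) {
        this.vertical = vertical;
        this.horizontal = horizontal;
        this.viewportWidth = viewportWidth;
        this.viewportHeight = viewportHeight;
    }

    /** Decides both bars in one go, without needing a second layout pass. */
    static ScrollBarDecision decide(double contentW, double contentH,
                                    double width, double height, double bar) {
        // First check: does the content overflow vertically?
        boolean v = contentH > height;
        // Second check: a vertical bar takes width away from the content.
        boolean h = contentW > (v ? width - bar : width);
        // Third check: a horizontal bar takes height away, which may make
        // the vertical bar necessary after all.
        if (h && !v) {
            v = contentH > height - bar;
        }
        return new ScrollBarDecision(v, h,
            v ? width - bar : width,
            h ? height - bar : height);
    }

    public static void main(String[] args) {
        ScrollBarDecision d = decide(300, 195, 200, 200, 10);
        System.out.println("vertical=" + d.vertical + " horizontal=" + d.horizontal);
    }
}
```

In the `main` example the content only overflows horizontally at first, but the horizontal bar then reduces the available height enough that the vertical bar is needed too, which is the cascading case described above.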

@johanvos
Collaborator

johanvos commented Oct 30, 2025

You mentioned that FX "promises to handle all edge cases". Do you care to show me where it does so? Because FX would be the first system with complex layouts that would be doing so.

That is a bit taken out of its context. My text said:

At least the layout system in JavaFX gives developers lots of freedom, and it promises to handle all edge cases. That fulfills the main goal (correct rendering, perhaps after a number of pulses, leading to flickering), but it makes the second goal (top-efficiency) really hard.

What I wanted to say with this is that the current handling gives priority to making sure things got rendered (e.g. adding a second pulse in cases where it's not needed), and not to performance. If you want to change that, all good. I simply tried to give my reading of historical context, while trying to understand why things are the way they are.
Again, I'm not saying this should not be modified (rather the contrary). I had the impression you wondered about why it was like it was, and I tried to answer that. Sorry if that message came across as a bit hyperbolic.

@hjohn
Collaborator Author

hjohn commented Oct 30, 2025

What I wanted to say with this is that the current handling gives priority to making sure things got rendered (e.g. adding a second pulse in cases where it's not needed), and not to performance. If you want to change that, all good. I simply tried to give my reading of historical context, while trying to understand why things are the way they are. Again, I'm not saying this should not be modified (rather the contrary). I had the impression you wondered about why it was like it was, and I tried to answer that. Sorry if that message came across as a bit hyperbolic.

Sorry, I definitely misinterpreted that and got a bit wound up about it.

I'm however unsure how to proceed. I don't think I can solve all the problems we've been seeing:

Without any fixes, 2 + 3 + 4 works.

With just the layout flags fix 1 + 3 + 4 works.

With this PR 1 + 2 will work, and probably can get 4 fixed as well.

With this PR and perhaps always triggering another pass if a non-resizable control was modified, we may be able to get all of them working. However, I can't really see the logic of why non-resizable controls should be an exception here, or whether we could run into the same problems with a resizable control if I modified the example in JDK-8137252.

:-)

@Maran23
Member

Maran23 commented Oct 30, 2025

With this PR 1 + 2 will work, and probably can get 4 fixed as well.

I would prefer this option.
In my opinion, it is worth pursuing this option because it improves consistency, performance and simply makes sense to me.
The previous behavior felt like an oversight that just worked accidentally.

I wonder if we need to combine what Johan said: Having warnings (if easily possible) for things you should rather not do.
For example, reading what you wrote above:

Modifying CSS during layout (which changes anything size or position related) is a guaranteed jump of your UI as a next pass is required
For example, you also should not be modifying the scene graph during layout, because if you say add a new child (like a list cell or something) that cell will be rendered without styles

This is good information I even did not know in full detail (although it looks like I made this correct for my own components, because I did what JavaFX is doing).
Having warnings or information for that would be nice. Since we can now correctly detect when we are laying out children, we may be able to detect if something triggered another CSS pass when it should not, and suggest what to do instead. This way, even if something now breaks, we could get information on why this unusual combination does not work as expected.
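
One possible shape for such a warning, sketched in plain Java under loud assumptions: `LayoutPhaseGuard`, `Phase`, `runPhase` and `noteStyleChange` are all hypothetical, and nothing like this exists in JavaFX today as far as this thread shows. The guard tracks which phase of the pulse is running and records a warning when a style-affecting change is reported during layout.

```java
import java.util.ArrayList;
import java.util.List;

class LayoutPhaseGuard {
    enum Phase { IDLE, CSS, LAYOUT }

    private Phase phase = Phase.IDLE;
    final List<String> warnings = new ArrayList<>();

    /** Runs work while marked as being in the given phase. */
    void runPhase(Phase p, Runnable work) {
        Phase previous = phase;
        phase = p;
        try {
            work.run();
        } finally {
            phase = previous;   // restore even if work throws
        }
    }

    /** To be called by operations that would force another CSS pass. */
    void noteStyleChange(String description) {
        if (phase == Phase.LAYOUT) {
            warnings.add("style change during layout: " + description
                + " (consider doing this before layout instead)");
        }
    }

    public static void main(String[] args) {
        LayoutPhaseGuard guard = new LayoutPhaseGuard();
        guard.runPhase(Phase.LAYOUT, () -> guard.noteStyleChange("added list cell"));
        guard.warnings.forEach(System.out::println);
    }
}
```

The hard part raised in the next reply, telling good changes from bad ones, is not addressed by this sketch; it only shows where such a check could hook in once the phase is known reliably.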

@hjohn
Collaborator Author

hjohn commented Oct 30, 2025

I wonder if we need to combine what Johan said: Having warnings (if easily possible) for things you should rather not do.

I'll give this some thought as well. The trouble is that a lot of things are allowed during layout, and it is hard to distinguish the good from the bad. I still haven't looked at the menu problem that Andy mentioned, but if I can think of a good way to do some warnings, it might help to quickly figure out what it might be doing wrong.

This is good information I even did not know in full detail (although it looks like I made this correct for my own components, because I did what JavaFX is doing).
Having warnings or information for that would be nice. Since we can now correctly detect when we are laying out children, we may be able to detect if something triggered another CSS pass when it should not, and suggest what to do instead. This way, even if something now breaks, we could get information on why this unusual combination does not work as expected.

Yeah, I discovered this one in a dark-mode application, where I was adding new cells (on demand) during layout, and I noticed a brief white flash. Solved it with Scene::addPreLayoutPulseListener. No idea what the virtual flows do :)

@Maran23
Member

Maran23 commented Nov 2, 2025

I did tests on 3 different applications and could not spot any problems!

What I did not test:

  • ToolBar
  • BubbleChart
  • AreaChart
  • Pagination
  • Most of the 3D Shape classes

Labels

rfr Ready for review
