-
Notifications
You must be signed in to change notification settings - Fork 542
8370498: Improve how Node detects whether a layout property change requires a new layout pass #1945
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
8370498: Improve how Node detects whether a layout property change requires a new layout pass #1945
Conversation
This new check is much more accurate to detect whether a parent is currently laying out its children. The previous code almost never worked, resulting in additional unnecessary layouts.
|
👋 Welcome back jhendrikx! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
|
It looks like there is one failing test for ToolBarSkin. Likely there is something relying on the superfluous 2nd layout. I'm looking into it. |
Webrevs
|
|
Reviewers: @johanvos @kevinrushforth @arapte We will need a unit test for this. What is the risk of regression? Are there additional tests that we could add to help detect any regressions? /reviewers 3 |
|
@kevinrushforth |
Reusing a toolbar as part of several scenes, in combination with the StubToolkit that doesn't handle pulses makes this test fail with the relayout detection fix.
|
The rewritten test works without problem with this fix, and also without this fix. The reason why the test failed is because the size cache in
In the version with the fix in this PR applied, no 2nd layout pass is triggered, and thus the size cache was not cleared, and the reuse of the ToolBar node would then happily use size values belonging to the old render scale. I also tested |
I think I can add one that confirms that no 2nd layout pass occurs when it is not needed when detected changes are part of the current pass.
I think it is certainly possible that some poorly constructed Note that it is really easy to get the "old" behavior; just return The benefits of this fix could be a reduction in layout passes by half when doing things like resizing windows. This may be quite noticeable on heavy controls like TableViews.
Code that may not work correctly with this fix is code where a container is unable to size and position its direct children in a stable fashion in one pass. If the container is tracking state between layout passes, is not clearing caches at the right time, etc, then a 2nd layout pass (with all else being equal) may result in its direct children being positioned differently. It's possible some code out there has such problems; they would currently require a 2nd (and perhaps a 3rd or 4th pass) before reaching a stable set of positions and sizes for its direct children. Note that if they never stabilize, the code before this fix would just keep running layout passes indefinitely. The fixed code will only do so once regardless. As soon as anything other than the container and its direct children is modified (ie. a sibling container, a grand child, or a child of a sibling container), then this code won't block another layout pass, so any regressions will be limited to containers being unable to decide how to position its children in a single pass. If the layout children code modifies some grand child, or some other node it happens to have a reference to, then this will still result in a new layout pass. I could write a test that will work with the old code, and fail with the new code (simple counting layout passes, or having some state that requires multiple passes would work to "detect" this fix) but am unsure what purpose that would serve. It's not a supported or documented use of the layout system. Layouts that need time to settle should still have a strict opinion on the positioning of a single container's direct children, but are likely being jostled by other influences (sibling containers, grand children). Such situations should still trigger further layout passes until everything converges (hopefully). Note that "failure" here would just mean that it shows you the result of the first pass; FX would not crash or go into infinite loops. |
|
Mailing list message from Andy Goryachev on openjfx-dev:
Perhaps we could even get it a bit further than that. What would you say could be a sufficiently comprehensive set of tests/scenarios to reduce the probability of regression? Let's say, we limit the depth of the hierarchy to 3 (node-parent-parent), then the combinatorial complexity should still be manageable. What do you think? -andy From: openjfx-dev <openjfx-dev-retn at openjdk.org> on behalf of John Hendrikx <jhendrikx at openjdk.org> On Thu, 23 Oct 2025 15:52:55 GMT, Kevin Rushforth <kcr at openjdk.org> wrote:
I think I can add one that confirms that no 2nd layout pass occurs when it is not needed when detected changes are part of the current pass.
I think it is certainly possible that some poorly constructed `layoutChildren` methods may inadvertently be relying on a 2nd layout pass occurring whenever there is a major UI change (when a control changes sufficiently that its parent must also change size). If that 2nd pass however did anything more than "confirming" the first layout pass, then this likely would result in an infinite layout loop (whereas now it would stop after a single pass). Assuming that controls that create infinite layout loops are fixed before they reach a wide audience (as you'd likely notice the drain on resources) it seems to me that it is unlikely such controls are in active use. Note that it is really easy to get the "old" behavior; just return `false` in `Parent::inLayoutChildren` -- I've been using this to test differences. If we're really worried this may cause problems, we could turn this into a system property so people may opt out of this fix. The benefits of this fix could be a reduction in layout passes by half when doing things like resizing windows. This may be quite noticeable on heavy controls like TableViews.
Code that may not work correctly with this fix is code where a container is unable to size and position its direct children in a stable fashion in one pass. If the container is tracking state between layout passes, is not clearing caches at the right time, etc, then a 2nd layout pass (with all else being equal) may result in its direct children being positioned differently. It's possible some code out there has such problems; they would currently require a 2nd (and perhaps a 3rd or 4th pass) before reaching a stable set of positions and sizes for its direct children. Note that if they never stabilize, the code before this fix would just keep running layout passes indefinitely. The fixed code will only do so once regardless. As soon as anything other than the container and its direct children is modified (ie. a sibling container, a grand child, or a child of a sibling container), then this code won't block another layout pass, so any regressions will be limited to containers being unable to decide how to position its children in a single pass. If the layout children code modifies some grand child, or some other node it happens to have a reference to, then this will still result in a new layout pass. I could write a test that will work with the old code, and fail with the new code (simple counting layout passes, or having some state that requires multiple passes would work to "detect" this fix) but am unsure what purpose that would serve. It's not a supported or documented use of the layout system. Layouts that need time to settle should still have a strict opinion on the positioning of a single container's direct children, but are likely being jostled by other influences (sibling containers, grand children). Such situations should still trigger further layout passes until everything converges (hopefully). Note that "failure" here would just mean that it shows you the result of the first pass; FX would not crash or go into infinite loops. ------------- PR Comment: https://git.openjdk.org/jfx/pull/1945#issuecomment-3438319381 |
As much as I'd like to help out more, after analyzing this problem already for almost two full days, I don't have time to write a comprehensive test suite for one of the more complex systems in JavaFX. I can of course extend existing tests to get behavioral coverage on this new fix. So, I think we then should choose what we want. We can revert https://bugs.openjdk.org/browse/JDK-8360940 and go back to a situation where the layout system can easily get into a bad state while using documented API's, or we can take a critical look at this new fix, which is fixing a long standing problem that was hidden by the bug that was fixed, and could be a major performance improvement for complex UI's. I think manual testing with some large UI's and/or Monkey Tester should be more than sufficient to discover if this is likely to cause regressions, and I've already been doing so before I submitted this fix (I tend to just apply these fixes to my own code directly, to see how they work out). If we're still unsure, a system property can turn this on/off easily, at the cost of getting some redundant layout passes in the "off" mode (as it was before). I already knew that the detection that That the logic looked correct at first glance may have have convinced the original authors that it does work, but it doesn't. More importantly, it can't be fixed as a fix would require each modification of layout X/Y to wrap this change by setting the current layout child first -- that would require updating all layout containers, all 3rd party layout containers, and all 3rd party code that calls |
|
@hjohn Thank you for providing the additional analysis. This will help those of us who will review this proposed fix evaluate it to see if we can poke any holes in it. As you say, we have two choices before us: fix this bug or revert the fix for JDK-8360940. If we can convince ourselves that this is the right fix, taking this fix would be preferable to re-introducing JDK-8360940. As for testing, I think manual testing using various apps (e.g., Ensemble, MonkeyTester, SceneBuilder) is needed. And while we also need a new test for this fix -- that is, a test that fails before and passes after the fix -- I would agree that adding a whole battery of functional tests for layout is out of scope and better done as a follow-on. Finally, as you point out, we could provide a system property to go back to the way we did it before. If we do that (which I am not advocating at this point in the review), we might consider having that flag also revert the behavior of JDK-8360940. |
|
Mailing list message from Andy Goryachev on openjfx-dev: Right. Apart from a unit test for this specific issue, I would like us at least to think? about enumerating the typical scenarios and scenarios where the outcome might be different. -andy From: openjfx-dev <openjfx-dev-retn at openjdk.org> on behalf of Kevin Rushforth <kcr at openjdk.org> On Fri, 24 Oct 2025 05:33:45 GMT, John Hendrikx <jhendrikx at openjdk.org> wrote:
@hjohn Thank you for providing the additional analysis. This will help those of us who will review this proposed fix evaluate it to see if we can poke any holes in it. As you say, we have two choices before us: fix this bug or revert the fix for [JDK-8360940](https://bugs.openjdk.org/browse/JDK-8360940). If we can convince ourselves that this is the right fix, taking this fix would be preferable to re-introducing JDK-8360940. As for testing, I think manual testing using various apps (e.g., Ensemble, MonkeyTester, SceneBuilder) is needed. And while we also need a new test for _this_ fix -- that is, a test that fails before and passes after the fix -- I would agree that adding a whole battery of functional tests for layout is out of scope and better done as a follow-on. Finally, as you point out, we could provide a system property to go back to the way we did it before. If we do that (which I am not advocating at this point in the review), we might consider having that flag also revert the behavior of [JDK-8360940](https://bugs.openjdk.org/browse/JDK-8360940). ------------- PR Comment: https://git.openjdk.org/jfx/pull/1945#issuecomment-3443507993 |
I'm not sure I agree with this. Personally, I consider the battery of functional tests to be more important than the fix, so without the tests, I would be very conservative. I'll look into a more deterministic test scenario -- the suggestion from Andy with 3 levels make sense (although reality is much more complex, with Platform.runLater() running in between 2 pulses, and that is non-deterministic as it depends on processor speed etc. |
It would actually be much easier to create deterministic tests after this change (unless we're dealing with a converging layout that can sometimes happen with biased containers). In the before situation, you should check if another pulse is scheduled after the layout pass, and execute it as well to be sure that the layout is stable. In the after situation, this will almost never be the case and the presence of a pulse request can be asserted to be false. In any case, I'd still recommend either reverting https://bugs.openjdk.org/browse/JDK-8364049 or combining it with this fix. The intermediate situation is still fine for most cases, but there can be nasty surprises if left in by itself (albeit only in somewhat self-contradicting layout code). I think having both the fixes guarded behind a system property averts most risks. I'd recommend setting it on by default though, to get real world feedback, as I think we do want to incorporate this fix permanently if it works out. I think it definitely much closer matches the intent of the original implementation. |
The main purpose is testing for regression. We want to make sure a future PR does not re-introduces the old situation. |
|
The method |
Thanks, I will have a look. If I can resolve the issue, I'll add a test case for this problem as well so we can detect a regression more easily. |
|
The fix in https://bugs.openjdk.org/browse/JDK-8137252 is only "solving" the problem because now a relayout always occurs (until nobody modifies layout X/Y to different values). The So, yes, https://bugs.openjdk.org/browse/JDK-8137252 fixes the problem, but it does so at the cost of an extra layout pass in all cases. In the sample application, even if there was only a Label in that stack pane and no Ellipse, then the first time it is laid out, it will do a 2nd pass because Label's layout X/Y was modified... I'm now looking further how this problem relates to |
|
So I've been playing with the program in https://bugs.openjdk.org/browse/JDK-8137252 and here are some observations:
Currently, on each change (showing the stage, changing the source sibling's size) we see the layoutX of the destination sibling jumping:
The wanted solution does not make it easy. Because the label's width is used as radius, the desired circle actually is twice as wide as the label. The stack pane however is unaware of this requirement as the circle is not resizable and does not have min/pref/max properties. It has some similarities with solutions that must know the size of text (in the correct font after CSS has been applied) because other elements depend on it. The solution is to add a listener to the label's public class Test1 extends Application {
public static void main(String[] args) {
launch(args);
}
@Override
public void start(Stage stage) {
Label l1 = new Label("1 2");
StackPane sp = new StackPane(l1, new EllipseWrapper(l1));
sp.relocate(200, 100);
Pane topPane = new Pane(sp);
Scene scene = new Scene(topPane, 600, 400);
sp.setStyle("-fx-border-color: RED;");
stage.setScene(scene);
System.out.println("Showing scene");
stage.show();
Thread.ofVirtual().start(() -> {
for(;;) {
try {
Thread.sleep(1500);
}
catch(InterruptedException e) {
break;
}
Platform.runLater(() -> {
l1.setText("" + (int)(2000 * Math.random()));
});
}
});
}
// wrapper because Ellipse has no layout properties for StackPane to use:
class EllipseWrapper extends Region {
private final Ellipse ellipse = new Ellipse();
private final Label label;
public EllipseWrapper(Label label) {
this.label = label;
ellipse.setOpacity(0.5);
getChildren().add(ellipse);
// using a change listener to ensure property is revalidated:
label.needsLayoutProperty().subscribe(v -> { if(v) requestLayout(); });
}
@Override
protected double computePrefWidth(double height) {
return 2 * label.prefWidth(-1); // proper calculation for layout!
}
@Override
protected double computePrefHeight(double width) {
return 2 * label.prefHeight(-1);
}
@Override
protected void layoutChildren() {
double w = getWidth();
double h = getHeight();
ellipse.setRadiusX(w / 2);
ellipse.setRadiusY(h / 2);
ellipse.setCenterX(w / 2);
ellipse.setCenterY(h / 2);
}
}
} |
|
IMHO there are now two ways forward:
I've already written a couple of tests that nicely capture the advantage of this fix. The tests clearly show that only one layout pass now occurs with this fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, I would like to thank you John for looking into the layout problems. We've got these long standing issues that are very difficult to debug and fix.
I think this is valuable work as it definitely improves the platform, so Danke schön.
The reason I asked about tests and test scenarios is the possibility of regression. Case in point - with this PR, on macOS with an external monitor at scale=1:
I would second @johanvos in suggesting that the regression is what we should be guarding against, and perhaps expanding the tests.
| try { | ||
| layoutChildren(); | ||
| } | ||
| finally { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor suggestion:
} finally {
It is starting to look like there may be more code relying on double layouts than I thought, even though most controls work absolutely fine. I think that it is still worth pursuing eliminating the need for these (as you can temporarily see the "wrong" positions), but it may take longer as each of these will need an investigation. I remain convinced though that the original fix in https://bugs.openjdk.org/browse/JDK-8137252 was applied too hastily, and only worked because it, unintentionally, simply always allowed double layouts. I'll see if I can reproduce the menu issue and what is the culprit there. I'm not on Mac though, so hoping that the problem is also present on other platforms. |
That is very well possible, but it is what it is. I believe the main goal of the layout phase in a pulse is to make sure that "ultimately", all direct/indirect requests are handled. The secondary goal is that it should be done as efficient as possible, e.g. do not require 2 (or more) passes unless absolutely needed. To make it harder (but very understandable), the layout phase spans a number of classes, where different concepts/choices are made (e.g. both Parent and Node have internal state that is used to determine whether to initiate a new pulse). That makes it really hard to come up with a system that would typically be used in these situations: use an algorithm that always works (although maybe less performant), and use optimizations in specific cases (e.g. no bindings in properties of children in a chontainer -> use this branch). Something I've been thinking about every now and then is to introduce runtime warnings (ideally compiler errors, but that would require lots of upfront analysis), where the layout subsystem can warn that "a pretty complex situation occurred", e.g. when it is running into cyclic conditions that are not trivial to resolve without performance degradation. But that would be a major effort and conceptual change. TLDR: I feel your pain and it can be very frustrating having to deal with non-optimal but valid code, deployed in the wild. |
Thanks! For what it's worth, the issue also appears when I move the window from the main retina screen (scale=2) to an external monitor (scale=1). I would imagine it should be easy to reproduce on Windows, especially with a fractional scale. |
I might be way off, but I wanted to ask you this: How many pulses are needed to finish the layout? If we ignore for a second some pathological cases when the layout process never ends causing continuous flicker, is there a safe upper limit? What I am getting at is - what if we run more than one layout pass ( What do you think? |
It's not a bad idea to just run layout again if after running layout the root is still dirty. That would kill the flicker, but would just reinforce bad patterns. Also, the CPU cost is still there (ie. if you resize a Window with say a large complex tableview, then it will still have to do a whole lot of calculations twice, while the 2nd run basically changes nothing). |
that's true, but the cost will be there anyway - but now we are skipping the rendering and removing the flicker. So it's a win-win, as long as the layout converges. One example is when the layout must further change based on the current layout pass, such as when the scroll bar appears or disappears. Also, doing these burst micro-layouts might be independent of any other work we are doing in terms of removing "bad patters" (the scroll bar scenario above is not really a bad pattern on itself, just a fact of life, I think). |
Realistically, I'd say 10 iterations is pretty common, and I wouldn't be surprised if it goes much higher in some common applications. With 10 iterations, I mean the amount of "layout phases" that are chained by a This will have a few other major impacts that are hard to predict:
I fear that approach is going to be even more disruptive for existing applications. |
There must be historical reasons why this is not the case. @kevinrushforth might have more background info. I would guess that the original design goal was to keep a pulse as short as possible, and the |
|
Interesting, thanks! I was thinking more of 2-3 cycles, actually. |
From what I can remember, that was indeed the main reason. Most of the layout stabilizes pretty quickly (1-3 passes), but as Johan points out, there can be cases where ~ 10 are needed. It is better for responsiveness to do run event handlers, app Runnables (via runLater), and allow animation to proceed after each layout + CSS pass than iterate until stable. |
In what software do you get 10 iterations as "common"? That's close to 200 ms @ 60 fps. The software would look and perform worse than decades old software. I'm running quite complex software with FX, and never does a layout take more than 1 pass (not counting the superfluous pass made by FX currently) as that would be absolutely unacceptable for my work that must look absolutely smooth and polished.
Ehr, no definitely not. Just like you shouldn't be modifying layout positions during layout, you should not be doing things (during layout) that modify CSS styles because the CSS pass has already completed. Modifying CSS during layout (which changes anything size or position related) is a guaranteed jump of your UI as a next pass is required. This looks flakey and unprofessional. For example, you also should not be modifying the scene graph during layout, because if you say add a new child (like a list cell or something) that cell will be rendered without styles, resulting in things like white flashes on your black background because that's the default background. This kind of stuff must be done in a
Nobody is suggesting running layout passes until the UI settles. Anyway, the only practical use would be to cover up self created layout problems, and encourage more bad behavior, so perhaps it is better to not even consider this. You mentioned that FX "promises to handle all edge cases". Do you care to show me where it does so? Because FX would be the first system with complex layouts that would be doing so. |
If nothing else happens in the scene, then it would also save a rendering pass yes.
That should not be the case because of how the layout pass is split into the The scene or any layout root needs to know what size things will be, and so will use the The See Also of note is that it always creates and adds the scroll bars, and just keeps them invisible until needed, as creating them on demand would be too late and cause flicker (unless you do this in a prelayout listener). |
That is a bit taken out of its context. My text said: What I wanted to say with this is that the current handling gives priority to making sure things got rendered (e.g. adding a second pulse in cases where it's not needed), and not to performance. If you want to change that, all good. I simply tried to give my reading of historical context, while trying to understand why things are the way they are. |
Sorry, I definitely misinterpreted that and got a bit wound up about it. I'm however unsure how to proceed. I don't think I can solve all the problems we've been seeing:
Without any fixes, 2 + 3 + 4 works. With just the layout flags fix 1 + 3 + 4 works. With this PR 1 + 2 will work, and probably can get 4 fixed as well. With this PR and perhaps always triggering another pass if a non-resizable control was modified, we may be able to get all of them working. I however can't really see the logic why non-resizable controls should be an exception here, and if we can't also run into the same problems with a resizable control if I modified the example in JDK-8137252. :-) |
I would prefer this option. I wonder if we need to combine what Johan said: Having warnings (if easily possible) for things you should rather not do.
This is good information I even did not know in full detail (although it looks like I made this correct for my own components, because I did what JavaFX is doing). |
I'll give this some thought as well. The trouble is that a lot of things are allowed during layout, and how to distinguish the good from the bad. I still haven't looked at the menu problem that Andy mentioned, but if I can think of a good way to do some warnings, it might help to figure out quickly that it might be doing wrong.
Yeah, I discovered this one in a dark-mode application, where I was adding new cells (on demand) during layout, and I noticed a brief white flash. Solved it with |
|
I did tests on 3 different applications and could not spot any problems! What I did not test:
|
This new check is much more accurate to detect whether a parent is currently laying out its children. The previous code almost never worked, resulting in additional unnecessary layouts.
Progress
Issue
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jfx.git pull/1945/head:pull/1945$ git checkout pull/1945Update a local copy of the PR:
$ git checkout pull/1945$ git pull https://git.openjdk.org/jfx.git pull/1945/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 1945View PR using the GUI difftool:
$ git pr show -t 1945Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jfx/pull/1945.diff
Using Webrev
Link to Webrev Comment