Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK-8269921 TextFlow: listeners on bounds can throw NPE while computing text bounds #564

Closed

Conversation

FlorianKirmaier
Copy link
Member

@FlorianKirmaier FlorianKirmaier commented Jul 6, 2021

It's "a bit" complicated.
In some situations, getRuns get's called because listeners on bounds are set.
This causes TextFlow to layout to compute the runs.
Afterward, the bounds of the parents get updated.
This triggers a call to compute bounds - which cascades up to the children.
When the geometry of the previous Text gets computed in this big stack - it throws an nullpointer.
The Text doesn't have its runs, and calling TextFlow.layout is now a noop (it detects repeated calls in the same stack)

In the case it happened - it didn't repair and the application kinda crashed.
This bug most likely can also be triggered by ScenicView or similar tools, which sets listeners to the bounds.
It also can cause unpredictable performance issues.

Unit test and example stacktrace are in the ticket.

The suggested fix makes sure that recomputing the geometry of the Text, doesn't trigger the layout of the TextFlow.
The Textflow should be layouting by the Parent.
This might change the behavior in some cases, but as far as I've tested it works without issues in TextFlow Heavy applications.

Benefits:

  • Better Tooling Support For ScenicView etc.
  • Fixes complicated but reproducible crashes
  • Might fix some rare crashes, which are hard to reproduce
  • Likely improves performance - might fix some edge cases with unpredictable bad performance

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8269921: TextFlow: listeners on bounds can throw NPE while computing text bounds (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jfx.git pull/564/head:pull/564
$ git checkout pull/564

Update a local copy of the PR:
$ git checkout pull/564
$ git pull https://git.openjdk.org/jfx.git pull/564/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 564

View PR using the GUI difftool:
$ git pr show -t 564

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jfx/pull/564.diff

Webrev

Link to Webrev Comment

Fixing a crash related to Text and TextFlow with bounds listeners
@bridgekeeper
Copy link

bridgekeeper bot commented Jul 6, 2021

👋 Welcome back fkirmaier! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Ready for review label Jul 6, 2021
@kevinrushforth
Copy link
Member

/reviewers 2

@openjdk
Copy link

openjdk bot commented Jul 6, 2021

@kevinrushforth
The number of required reviews for this PR is now set to 2 (with at least 1 of role reviewers).

Comment on lines 387 to 386
/* List of run is initialized when the TextFlow layout the children */
getParent().layout();
} else {
if (!isSpan()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the call to getParent().layout() here seems very likely to cause a regression in behavior. This will need some further analysis and lots of testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the change is quite dangerous.
But the current behavior is in my opinion "broken by design".
updating bounds should never trigger layout. Group also has similar behavior, which causes issues from time to time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the feeling, that this change changes much less behavior than expected.

@mlbridge
Copy link

mlbridge bot commented Jul 6, 2021

Webrevs

Added a copyright header
@kevinrushforth kevinrushforth self-requested a review July 7, 2021 18:46
@christianheilmann
Copy link
Contributor

How to proceed to get this PR reviewed?

@FlorianKirmaier
Copy link
Member Author

I did just run into this issue again, in a totally different application.
I guess we have now make a fork also for our desktop applications. (Currently we only for for JPro)

So it would be great to get this into JavaFX, because I'm quite sure it fixes an important issue.

@mstr2
Copy link
Collaborator

mstr2 commented Jul 27, 2022

For what it's worth, I agree that updating the bounds or synchronizing peers shouldn't trigger a new layout cycle.
Unfortunately, test coverage is not good for the javafx.scene.text classes, so we won't catch regressions that way.
Do you think that you can come up with a minimal test that shows a different outcome with or without the call to getParent().layout()? If we had that, we could decide with more certainty which behavior we think is more correct.

@Maran23
Copy link
Member

Maran23 commented Aug 5, 2022

I got this problem as well today. NPE as runs is null in line 359. Does it make sense to first 'resolve' this by adding a simple null check and later we may try your other change which removes the call to getParent().layout()?

Edit: I also got another interesting NPE in PrismTextLayout:

java.lang.NullPointerException: Cannot read the array length because "this.runs" is null
    at com.sun.javafx.text.PrismTextLayout.addTextRun(PrismTextLayout.java:770)
    at com.sun.javafx.text.GlyphLayout.addTextRun(GlyphLayout.java:140)
    at com.sun.javafx.text.GlyphLayout.breakRuns(GlyphLayout.java:210)
    at com.sun.javafx.text.PrismTextLayout.buildRuns(PrismTextLayout.java:785)
    at com.sun.javafx.text.PrismTextLayout.layout(PrismTextLayout.java:1036)
    at com.sun.javafx.text.PrismTextLayout.ensureLayout(PrismTextLayout.java:223)
    at com.sun.javafx.text.PrismTextLayout.getBounds(PrismTextLayout.java:246)
    at javafx.scene.text.Text.getLogicalBounds(Text.java:432)
    at javafx.scene.text.Text.doComputeGeomBounds(Text.java:1187)
    at javafx.scene.text.Text$1.doComputeGeomBounds(Text.java:149)
    at com.sun.javafx.scene.shape.TextHelper.computeGeomBoundsImpl(TextHelper.java:90)
    at com.sun.javafx.scene.NodeHelper.computeGeomBounds(NodeHelper.java:116)
    at javafx.scene.Node.updateGeomBounds(Node.java:3818)
    at javafx.scene.Node.getGeomBounds(Node.java:3780)
    at javafx.scene.Node.updateBounds(Node.java:776)
    at javafx.scene.Parent.updateBounds(Parent.java:1833)
    ...
   <<pulse>>

@FlorianKirmaier
Copy link
Member Author

I've just verified that the test is also green with the call to parent.layout()
So it might be an option, to change this PR to only contain the null check.

When I find some time, I will write a test to show how the parent.layout() call can cause issues.
Similar issues exist with Group - which sometimes has the effect, that applications stop working when scenic view is used. (or other tooling)

The exception in PrismTextLayout tends to happen when also the bug of this PR happens - but I don't really know how they relate.

@kevinrushforth
Copy link
Member

So it might be an option, to change this PR to only contain the null check.

If that is the route you want to go, then filing a new issue / PR is better. A fix with a null check might be a good option as long as we understand why the null is happening (so we can know whether the null check is sufficient).

When I find some time, I will write a test to show how the parent.layout() call can cause issues.
Similar issues exist with Group - which sometimes has the effect, that applications stop working when scenic view is used. (or other tooling)

That would be helpful. As it is, this PR is not really ready for review for the reasons I mentioned (much) earlier.

Reverted the change to the layout, so we can fix the main-bug without further discussions.
@FlorianKirmaier
Copy link
Member Author

I've just removed the change in the layouting.
So I guess the PR should be good now.

@danielpeintner
Copy link

FYI: in the near past we have also many of those "crashes" when the UI gets heavily restructured. Crashes in our case means that on Windows for example you get the loading circle but the UI freezes endlessly. I assume it might be related.

Question: does the simple null check help in such a case?

@FlorianKirmaier
Copy link
Member Author

The simple null check helps in this case.
But sometimes I see the exception, posted by @Maran23 .
So the root cause is probably not fixed - but the symptoms are.
I guess, this PR is then a good progress.

Imo the main issue is, that having listeners on the bounds causes a change in the behavior of TextFlow and Group - causing unpredictable behavior.

@mstr2
Copy link
Collaborator

mstr2 commented Sep 21, 2022

The simple null check helps in this case. But sometimes I see the exception, posted by @Maran23 . So the root cause is probably not fixed - but the symptoms are. I guess, this PR is then a good progress.

Maybe it's not good progress. You've investigated the issue, and have identified a likely root cause. Not fixing the root cause just means that it most likely won't be fixed for years to come, and the insights you've accumulated will fade.

I think we should just fix the root cause. If some application breaks as a result of this, not much harm is done: we'd gain valuable insight into a real-world situation that depends on the current behavior, and can then decide whether to revert the fix, or provide a viable alternative solution (for users) that accomplishes the same goal.

@danielpeintner
Copy link

danielpeintner commented Sep 22, 2022

@mstr2 I see you arguments and your concerns. On the other hand, as a developer affected by this problem, it is different. Costumers are complaining and now it seems to occur even more frequently.

Note: I do not have any voting rights and I do not want to make any claims. Anyhow, I think we all agree that the simple null check solves the issue by hiding it and definitely does not create any regression. Hence, affected applications may get a quick fix and applications are running stable. The actual fix can come later, taking more time, with more sophisticated testing et cetera. I am happy to help there with our application as well. Thanks!

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 31, 2023

@FlorianKirmaier This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@kevinrushforth
Copy link
Member

@FlorianKirmaier this one fell off my radar. The updated fix seems fine as long the check for null runs isn't making some other problem. Btw, the title of the bug doesn't match the problem being fixed, so I recommend changing it to something like "TextFlow: listeners on bounds can throw NPE while computing text bounds".

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 29, 2023

@FlorianKirmaier This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented May 27, 2023

@FlorianKirmaier This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this May 27, 2023
@FlorianKirmaier FlorianKirmaier changed the title JDK-8269921 Text in Textflow and listeners on bounds can cause endless loop/crash and other performance issues JDK-8269921 TextFlow: listeners on bounds can throw NPE while computing text bounds Sep 26, 2023
@FlorianKirmaier
Copy link
Member Author

@kevinrushforth
I guess it also fell under my radar too.

I've merged the branch with master!
I also changed the title as suggested.

Testing:
I've extensively tested the Change since the PR was created.
All JPro versions use this bugfix, and a very big Desktop application uses these fixes.
So I'm sure this doesn't cause any instability.

Copy link
Member

@kevinrushforth kevinrushforth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks fine. I can confirm that the newly added test fails without the fix and passes with the fix.

@kevinrushforth
Copy link
Member

@prrace do you want to be the second reviewer?

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 24, 2023

@FlorianKirmaier This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@kevinrushforth kevinrushforth removed the request for review from prrace October 25, 2023 17:39
@kevinrushforth
Copy link
Member

@andy-goryachev-oracle Can you be the second reviewer?

@@ -356,7 +356,7 @@ void layoutSpan(GlyphList[] runs) {
BaseBounds getSpanBounds() {
if (spanBoundsInvalid) {
GlyphList[] runs = getRuns();
if (runs.length != 0) {
if (runs != null && runs.length != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a surface, this check looks ok.

however, in getRuns(), Text:388 we have

            /* List of run is initialized when the TextFlow layout the children */
            getParent().layout();

meaning that textRuns did not initialize even though the getParent().layout(); was invoked - why?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, are we missing another step?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we are definitely missing something.
I looked at it for quite some time, without a better solution as this check.
If you'd like to investigate further - I would recommend you look into the stack trace of the ticket.
It's related to updating the bounds of the Node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest investigating this a bit further.

The semantics of this method is to return valid Bounds, and it looks like we do not honor this contract. It is possible that we get the original exception because the listener is being called during layout (just guessing) - will this method be called again and return the valid value second time? Or do we need to actually call the layout() to ensure that we do, in fact, have a valid layout every time this method is called?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally speaking - the behavior relating to the bounds is broken.

Some components behave differently, after setting a listener to the bounds.
1.) Groups tend to call layout in the children.
2.) And Text seems to call layout on it's parent TextFlow
Both cases are serious problems in my opinion - and this should be designed somehow differently.

But for now, i would be happy if the affected applications don't crash by this bug here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as we have some understanding of how runs can become null at this point (which I think we do), then the current proposed fix seems fine to me.

Let's also file a follow-up bug to investigate whether we can make the computation of bounds during layout more robust so that we don't have transiently wrong results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a similar problem - solved by explicitly recomputing the layout if it was "dirty".
In this case, it might be ok if the call to getSpanBounds with null runs is followed by another where runs are not null so the client code gets the valid bounds eventually.

But if it's not, the caller ends up with incorrect bounds.

*/
package test.javafx.scene.text;

import org.junit.BeforeClass;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update this class to use JUnit5?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to junit5!

return scrollPane;
}

public void onEveryNode(Node node) {
Copy link
Collaborator

@mstr2 mstr2 Oct 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this method doesn't seem to describe what it does. It seems to do very little.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've improved the name of the method!

improved name for method onEveryNode -> addBoundsListener
Copy link
Contributor

@andy-goryachev-oracle andy-goryachev-oracle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as subsequent layout cycles fix the bounds (and apps like ScenicView show them correctly), I am ok with this fix.

@openjdk
Copy link

openjdk bot commented Nov 14, 2023

@FlorianKirmaier This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8269921: TextFlow: listeners on bounds can throw NPE while computing text bounds

Reviewed-by: kcr, angorya

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 47 new commits pushed to the master branch:

  • 0d33417: 8320267: WebView crashes on macOS 11 with WebKit 616.1
  • d65d8a6: 8316518: javafx.print.Paper getWidth / getHeight rounds values, causing errors.
  • c46c172: 8320773: [macOS] All IME input blocked
  • b80ec39: 8313648: JavaFX application continues to show a black screen after graphic card driver crash
  • 3548cdd: 8320444: Column drag header is positioned wrong for nested columns
  • 606878a: 8318387: Update GStreamer to 1.22.6
  • a1036b2: 8303826: Add FX test for JDK-8252255
  • cda623d: 8087700: [KeyCombination, Mac] KeyCharacterCombinations behave erratically
  • 2e73304: 8319996: Update to GCC 13.2.0 on Linux
  • 59c86bb: 8303478: DatePicker throws uncatchable exception on tab out from garbled text
  • ... and 37 more: https://git.openjdk.org/jfx/compare/5d1254fb1a126ac3d6baff5c5a303ed750d083fa...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@kevinrushforth, @andy-goryachev-oracle) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Ready to be integrated label Nov 14, 2023
@kevinrushforth
Copy link
Member

@FlorianKirmaier This PR is ready for you to integrate.

@kevinrushforth
Copy link
Member

@FlorianKirmaier As a reminder, this PR is ready for you to /integrate

@FlorianKirmaier
Copy link
Member Author

/integrate

@openjdk openjdk bot added the sponsor Ready to sponsor label Nov 30, 2023
@openjdk
Copy link

openjdk bot commented Nov 30, 2023

@FlorianKirmaier
Your change (at version dda1a0c) is now ready to be sponsored by a Committer.

@kevinrushforth
Copy link
Member

/sponsor

@openjdk
Copy link

openjdk bot commented Nov 30, 2023

Going to push as commit 036d81b.
Since your change was applied there have been 47 commits pushed to the master branch:

  • 0d33417: 8320267: WebView crashes on macOS 11 with WebKit 616.1
  • d65d8a6: 8316518: javafx.print.Paper getWidth / getHeight rounds values, causing errors.
  • c46c172: 8320773: [macOS] All IME input blocked
  • b80ec39: 8313648: JavaFX application continues to show a black screen after graphic card driver crash
  • 3548cdd: 8320444: Column drag header is positioned wrong for nested columns
  • 606878a: 8318387: Update GStreamer to 1.22.6
  • a1036b2: 8303826: Add FX test for JDK-8252255
  • cda623d: 8087700: [KeyCombination, Mac] KeyCharacterCombinations behave erratically
  • 2e73304: 8319996: Update to GCC 13.2.0 on Linux
  • 59c86bb: 8303478: DatePicker throws uncatchable exception on tab out from garbled text
  • ... and 37 more: https://git.openjdk.org/jfx/compare/5d1254fb1a126ac3d6baff5c5a303ed750d083fa...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 30, 2023
@openjdk openjdk bot closed this Nov 30, 2023
@openjdk openjdk bot removed ready Ready to be integrated rfr Ready for review sponsor Ready to sponsor labels Nov 30, 2023
@openjdk
Copy link

openjdk bot commented Nov 30, 2023

@kevinrushforth @FlorianKirmaier Pushed as commit 036d81b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated
7 participants