Skip to content

JDK-8298104: NPE on synchronizeSceneNodes() #1123

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

buedi
Copy link

@buedi buedi commented May 3, 2023

A null pointer exception occurs on the ScenePulseListener when iterating through the dirty node list.
A null check is needed on the node before calling node.getScene().

The error occurs occasionally and causes the application to crash.
/signed

Issue: JDK-8298104: NPE on synchronizeSceneNodes()


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8298104: NPE on synchronizeSceneNodes() (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jfx.git pull/1123/head:pull/1123
$ git checkout pull/1123

Update a local copy of the PR:
$ git checkout pull/1123
$ git pull https://git.openjdk.org/jfx.git pull/1123/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1123

View PR using the GUI difftool:
$ git pr show -t 1123

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jfx/pull/1123.diff

Webrev

Link to Webrev Comment

@bridgekeeper bridgekeeper bot added the oca Needs verification of OCA signatory status label May 3, 2023
@bridgekeeper
Copy link

bridgekeeper bot commented May 3, 2023

Hi @buedi, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user buedi" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

@buedi
Copy link
Author

buedi commented May 3, 2023

@bridgekeeper bridgekeeper bot added the oca-verify Needs verification of OCA signatory status label May 3, 2023
@bridgekeeper
Copy link

bridgekeeper bot commented May 3, 2023

Thank you! Please allow for up to two weeks to process your OCA, although it is usually done within one to two business days. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

@bridgekeeper bridgekeeper bot removed oca Needs verification of OCA signatory status oca-verify Needs verification of OCA signatory status labels May 3, 2023
@openjdk openjdk bot added the rfr Ready for review label May 3, 2023
@mlbridge
Copy link

mlbridge bot commented May 3, 2023

Webrevs

@hjohn
Copy link
Collaborator

hjohn commented May 4, 2023

Hi @buedi, I've taken a look at this, but would like to know the root cause. I get the impression that this happens a lot for you, could you share when this happens so I can reproduce this as well?

You could also try this code and report back if you get any of the exceptions I added:

    private void synchronizeSceneNodes() {
        Toolkit.getToolkit().checkFxUserThread();

        Scene.inSynchronizer = true;

        // if dirtyNodes is null then that means this Scene has not yet been
        // synchronized, and so we will simply synchronize every node in the
        // scene and then create the dirty nodes array list
        if (Scene.this.dirtyNodes == null) {
            // must do this recursively
            syncAll(getRoot());
            dirtyNodes = new Node[MIN_DIRTY_CAPACITY];

        } else {
            if (peer == null) {
                throw new IllegalStateException("peer shouldn't be null here: " + dirtyNodesSize);
            }

            // This is not the first time this scene has been synchronized,
            // so we will only synchronize those nodes that need it
            for (int i = 0 ; i < dirtyNodesSize; ++i) {
                try {
                    Node node = dirtyNodes[i];
                    dirtyNodes[i] = null;
                    if (node.getScene() == Scene.this) {
                        node.syncPeer();
                    }
                }
                catch (Throwable e) {
                    throw new IllegalStateException("exception will break dirtyNodes!", e);
                }
            }
            dirtyNodesSize = 0;
        }

        if (!Scene.inSynchronizer) {
            throw new IllegalStateException("Synchronizer flag was reset, reentrant call?");
        }

        Scene.inSynchronizer = false;
    }

@buedi
Copy link
Author

buedi commented May 4, 2023

Hello John, this issue occurs very rarely, approximately 1-2 times per month, on systems that run 24/7. However, when it does occur, the JavaFx application gets stuck in an endless loop (--> logError-JavaFxScene.txt) and cannot recover from it. Currently, I am unable to replicate this phenomenon. I fully agree with you that it's essential to address the problem at its root cause. However, adding an additional check for a null value would at least solve a rare but blocking issue.

sceneError_

@hjohn
Copy link
Collaborator

hjohn commented May 4, 2023

Hello John, this issue occurs very rarely, approximately 1-2 times per month, on systems that run 24/7. However, when it does occur, the JavaFx application gets stuck in an endless loop (--> logError-JavaFxScene.txt) and cannot recover from it. Currently, I am unable to replicate this phenomenon. I fully agree with you that it's essential to address the problem at its root cause. However, adding an additional check for a null value would at least solve a rare but blocking issue.

sceneError_

The problem is, this kind of fix is unlikely to be accepted as it hides the root cause of perhaps a much more serious issue. Skipping nulls for example will mean that some nodes probably didn't get synced in the last run due to an exception or some other kind of bug in the code (I was looking at Node reuse as a possible cause for this problem).

The NPE you show is probably not the first exception that is logged, but the one that is most visible because it keeps being repeated once the dirty nodes list gets into a bad state.

You said you already fixed this yourself, would you be willing to add diagnostics to your build that you could share so the root cause can be tracked down? A few log lines when a bad state is detected that could give us insights into what went wrong could go a long way. I will take another look tonight to see if I can find how this list is getting corrupted; most likely it is an exception thrown by syncPeer; if that could be caught and logged it will probably tell us the rest of the story.

Edit: it could be an exception that only happens occasionally, like ConcurrentModificationException because some internal structure is not synced properly...

@hjohn
Copy link
Collaborator

hjohn commented May 5, 2023

I've looked a bit through the JavaFX bugs, and I noticed that many problems involving synchronizeSceneNodes are caused by modifying the SceneGraph outside the FX thread which would sometimes throw ConcurrentModificationException. As you see this problem only occasionally, it's a pretty good fit. It might be worthwhile to check your logs for such an exception (or similar, like IndexOutOfBoundsException).

@buedi
Copy link
Author

buedi commented May 5, 2023

Hi John, thank you for this information. I couldn't find any instances of ConcurrentModificationException, but about an hour ago before the NPE, we did have an IllegalStateException. Could this be the cause of the issue?
image
Two weeks ago, we fixed the issue by implementing a check to see if the dialog header was being called from a JavaFX thread.
image

@hjohn
Copy link
Collaborator

hjohn commented May 5, 2023

Hi John, thank you for this information. I couldn't find any instances of ConcurrentModificationException, but about an hour ago before the NPE, we did have an IllegalStateException.

No, I don't think that is it. The check there prevented things from going wrong, and only alerted you that you were about to manipulate the SceneGraph on the wrong thread. These checks are not everywhere though, so there are many places where you can still change the SceneGraph without being alerted.

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 2, 2023

@buedi This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 30, 2023

@buedi This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Jun 30, 2023
@buedi
Copy link
Author

buedi commented Dec 22, 2023

Hi John,

I've been working on moving our project from Java 8 to Java 17 and JavaFX 21 for the past month. But I can't seem to fix a bug that keeps showing up, even after running the applications for over 10 days on JavaFX version 21.0.1.
Now, I'm having trouble compiling javafx-graphics for Windows. Any ideas on how to figure out what's causing the problem?

If anyone has the ability to assist us with this issue, please get in touch with me. I would be willing to compensate for your help.

best regards,
Daniel

image

2023-12-15 11:00:37,026 ERROR [JavaFX Application Thread] c.m.a.e.AdviseUncaughtExceptionHandler [?:?] AdviseUncaughtExceptionHandler uncaughtException detects NullPointerException
2023-12-15 11:00:37,026 ERROR [JavaFX Application Thread] c.m.a.e.AdviseUncaughtExceptionHandler [?:?] NullPointerException =>
java.lang.NullPointerException: Cannot invoke "javafx.scene.Node.getScene()" because "" is null
at javafx.scene.Scene$ScenePulseListener.synchronizeSceneNodes(Scene.java:2483)
at javafx.scene.Scene$ScenePulseListener.pulse(Scene.java:2630)
at com.sun.javafx.tk.Toolkit.lambda$runPulse$2(Toolkit.java:401)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:399)
at com.sun.javafx.tk.Toolkit.runPulse(Toolkit.java:400)
at com.sun.javafx.tk.Toolkit.firePulse(Toolkit.java:430)
at com.sun.javafx.tk.quantum.QuantumToolkit.pulse(QuantumToolkit.java:592)
at com.sun.javafx.tk.quantum.QuantumToolkit.pulse(QuantumToolkit.java:572)
at com.sun.javafx.tk.quantum.QuantumToolkit.pulseFromQueue(QuantumToolkit.java:565)
at com.sun.javafx.tk.quantum.QuantumToolkit.lambda$runToolkit$11(QuantumToolkit.java:352)
at com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java:95)
at com.sun.glass.ui.win.WinApplication._runLoop(Native Method)
at com.sun.glass.ui.win.WinApplication.lambda$runLoop$3(WinApplication.java:185)
at java.base/java.lang.Thread.run(Thread.java:840)
2023-12-15 11:00:37,027 ERROR [JavaFX Application Thread] c.m.a.e.AdviseUncaughtExceptionHandler [?:?] Exception
java.lang.NullPointerException: Cannot invoke "javafx.scene.Node.getScene()" because "" is null
at javafx.scene.Scene$ScenePulseListener.synchronizeSceneNodes(Scene.java:2483)
at javafx.scene.Scene$ScenePulseListener.pulse(Scene.java:2630)
at com.sun.javafx.tk.Toolkit.lambda$runPulse$2(Toolkit.java:401)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:399)
at com.sun.javafx.tk.Toolkit.runPulse(Toolkit.java:400)
at com.sun.javafx.tk.Toolkit.firePulse(Toolkit.java:430)
at com.sun.javafx.tk.quantum.QuantumToolkit.pulse(QuantumToolkit.java:592)
at com.sun.javafx.tk.quantum.QuantumToolkit.pulse(QuantumToolkit.java:572)
at com.sun.javafx.tk.quantum.QuantumToolkit.pulseFromQueue(QuantumToolkit.java:565)
at com.sun.javafx.tk.quantum.QuantumToolkit.lambda$runToolkit$11(QuantumToolkit.java:352)
at com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java:95)
at com.sun.glass.ui.win.WinApplication._runLoop(Native Method)
at com.sun.glass.ui.win.WinApplication.lambda$runLoop$3(WinApplication.java:185)
at java.base/java.lang.Thread.run(Thread.java:840)
2023-12-15 11:00:37,029 ERROR [JavaFX Application Thread] c.m.a.e.AdviseUncaughtExceptionHandler [?:?] BÄÄHHMM, get more informations => java.lang.NullPointerException: Cannot invoke "javafx.scene.Node.getScene()" because "" is null
at javafx.scene.Scene$ScenePulseListener.synchronizeSceneNodes(Scene.java:2483)

@hjohn
Copy link
Collaborator

hjohn commented Dec 22, 2023

If you are not using modules, you don't need to compile javafx.graphics to attempt to analyze the problem. You can include a copy of Scene with your sources (making sure that it is in the original package javafx.scene) to debug this further.

If you can reproduce this problem in a small program that you can share, we can try and find out why there is a NPE there -- if only a single thread is using the dirtyNodes array, I don't think there can be a null in there, but perhaps there is a mistake in the code.

As noted before, a possible cause of this problem is accessing FX components that are currently part of the Scene graph on the wrong thread. This is not allowed, and such accesses must be wrapped in Platform::runLater. I realize that in a large application it might be hard to find what is doing this, but if you have multiple threads, then inspect if any thread is accessing FX components (like Nodes, Timelines, etc) directly. Not all such calls will inform you of problems, so assuming that a call is allowed because you didn't get an exception is incorrect.

edit: feel free to contact me more directly (gmail john.hendrikx)

@elfoxus
Copy link

elfoxus commented Jul 3, 2024

I'm interested if you @buedi handled this issue, I mean whether you found reason behind it. In my current project I am facing the same problem. It happens hardly ever and only on one machine in my client's environment, so debugging is kinda hard.

Couldn't we add at least some logging, so that a programmer could find what fault has been made?

I wonder how this can occur in placing null in this array? As I can see addToDirtyList is called only from inside of the Node class:

private void addToSceneDirtyList() {
        Scene s = getScene();
        if (s != null) {
            s.addToDirtyList(this);
            if (getSubScene() != null) {
                getSubScene().setDirty(this);
            }
        }
    }

How this can be null?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfr Ready for review
Development

Successfully merging this pull request may close these issues.

3 participants