Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8263402: MemoryLeak: Node hardreferences it's previous Parent after csslayout and getting removed from the scene #424

Closed

Conversation

@FlorianKirmaier
Copy link
Member

@FlorianKirmaier FlorianKirmaier commented Mar 10, 2021

Fixing a memory leak.
A node hard references its old parent after CSS layout and getting removed.
This shouldn't be the case, this is very counterintuitive.

The fix uses a WeakReference in CSSStyleHelper for firstStyleableAncestor.
This should be fine because the CSS should only depend on it if it's still the real parent.
In that case, it doesn't get collected.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8263402: MemoryLeak: Node hardreferences it's previous Parent after csslayout and getting removed from the scene

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jfx pull/424/head:pull/424
$ git checkout pull/424

Update a local copy of the PR:
$ git checkout pull/424
$ git pull https://git.openjdk.java.net/jfx pull/424/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 424

View PR using the GUI difftool:
$ git pr show -t 424

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jfx/pull/424.diff

Fixing memory leak, node hardreferences it's old parent after csslayout / getting removed.
@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Mar 10, 2021

👋 Welcome back fkirmaier! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Mar 10, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Mar 10, 2021

@kevinrushforth
Copy link
Member

@kevinrushforth kevinrushforth commented Mar 10, 2021

Since this touches CSS, it needs a second reviewer.

/reviewers 2

@openjdk
Copy link

@openjdk openjdk bot commented Mar 10, 2021

@kevinrushforth
The number of required reviews for this PR is now set to 2 (with at least 1 of role reviewers).

@kevinrushforth kevinrushforth requested review from aghaisas and arapte Mar 10, 2021
@kevinrushforth
Copy link
Member

@kevinrushforth kevinrushforth commented Mar 11, 2021

I think others can review this. I do have a couple questions:

  1. In general, I don't like the idea of just making everything a weak reference, since it often masks a design issue. Two exceptions are for caches and for back references to a "parent" or controlling object that has a strong reference to "this" object (most of our weak listeners fall into this latter pattern). It sounds like latter case also applies here. Can you confirm that?
  2. Have you verified that all the places that use the fields that are now WeakReferences are prepared to deal with get() returning a null object?

Platform.exit();
});
}
}
Copy link
Member

@arapte arapte Mar 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to make this test similar to existing system tests, I made some relevant changes. Modified test is added below.
The modified test fails with this fix, but I expected it to pass. Can you please check this.

Changes are

  1. Thread.sleep() is removed.
  2. root and toBeRemoved button are now class members.
  3. Scenegraph is constructed and shown in TestApp.start() method.
public class StyleMemoryLeakTest {

    static CountDownLatch startupLatch;
    static Stage stage;
    static Button toBeRemoved;
    static Group root;

    public static class TestApp extends Application {

        @Override
        public void start(Stage primaryStage) throws Exception {
            stage = primaryStage;
            toBeRemoved = new Button();
            root = new Group();
            root.getChildren().add(toBeRemoved);
            stage.setScene(new Scene(root));
            stage.setOnShown(l -> {
                Platform.runLater(() -> startupLatch.countDown());
            });
            stage.show();
        }
    }

    @BeforeClass
    public static void initFX() throws Exception {
        startupLatch = new CountDownLatch(1);
        new Thread(() -> Application.launch(StyleMemoryLeakTest.TestApp.class, (String[])null)).start();
        assertTrue("Timeout waiting for FX runtime to start", startupLatch.await(15, TimeUnit.SECONDS));
    }

    @Test
    public void testRootNodeMemoryLeak() throws Exception {
        Util.runAndWait(() -> {
            root.getChildren().clear();
            stage.hide();
        });
        JMemoryBuddy.memoryTest((checker) -> {
            checker.assertCollectable(stage);
            checker.setAsReferenced(toBeRemoved);
            stage = null;
        });
    }

    @AfterClass
    public static void teardownOnce() {
        Platform.runLater(() -> {
            Platform.exit();
        });
    }
}

Copy link
Member Author

@FlorianKirmaier FlorianKirmaier Mar 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added your verison of the unit-test. You forgot root = null; which was why the test failed.

If I would rewrite the test, I would put everything into the TestMethod. Because this way it's not necessary to set all the class-variables to null. It also wouldn't reuse the Stage of the Application. The scope of the test would be much smaller, because the actual test and the initialization of JavaFX would be separated.

Should I change it that way?

Copy link
Member

@arapte arapte Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with that too. In that case my only suggestion would be, use method Stage.setOnShown() like in the current version to make sure that Stage is shown and not to rely on Thread.sleep().

It also wouldn't reuse the Stage of the Application.

In that case, test will not require the TestApp class. You can use Platform.startup() method to start JavaFX runtime and also may need a call to Platform.setImplicitExit​(false) to make sure JavaFX runtime does not exit when Stage is hidden. This way we can add multiple tests that create and hide stage in a same test file.

Added new verison of the unit-test
@FlorianKirmaier
Copy link
Member Author

@FlorianKirmaier FlorianKirmaier commented Mar 12, 2021

About whether we should use WeakReference here or not.

It definitely falls into the exception for referring to the Parrent of a Node. (Or to the Parent in a more abstract sense, I think it can also be the parent of the parent, or even from another SceneGraph.)

I don't know the CSS code very well, so I currently don't have the knowledge how to change it.
But if we would change this variable, whenever the node is removed from the SceneGraph, my concern would be that it would have an unfavorable complexity. Currently (I hope) the complexity of removing a Node from the SceneGraph is O(1). If we would remove the stylehelper from the node, then the complexity would be O().

The current change assumes that we don't process the CSS, when a node is removed from the SG. This is why it isn't checked for null. I would argue, if this causes an Error, then it just reveals another issue, which would be preferable over a more complicated fix, and also changing the complexity of removing nodes from the SG.

@arapte
Copy link
Member

@arapte arapte commented Mar 15, 2021

Below is a fix that I tried, It fixes the issue. The system test passed with this change.
I was suggesting a solution like this where we can know exactly when to null the reference. The change is not extensively tested though. (For example, test what happens if we remove a node and add it back, OR remove a node and add it to a different scenegraph)
However, in this case using WeakReference approach seems harmless. Using WeakReference for Listeners is not clean and may cause issues, but a WeakReference to Node should not cause any harm.

I would recommend earlier way to explicitly release references/resources when it is possible. That way we get to have the control instead of gc.


diff --git a/modules/javafx.graphics/src/main/java/javafx/scene/CssStyleHelper.java b/modules/javafx.graphics/src/main/java/javafx/scene/CssStyleHelper.java
index fd02c0c1ad..471d0071b5 100644
--- a/modules/javafx.graphics/src/main/java/javafx/scene/CssStyleHelper.java
+++ b/modules/javafx.graphics/src/main/java/javafx/scene/CssStyleHelper.java
@@ -36,6 +36,7 @@ import java.util.Set;
 
 import javafx.beans.property.ObjectProperty;
 import javafx.beans.property.SimpleObjectProperty;
+import javafx.beans.value.ChangeListener;
 import javafx.beans.value.WritableValue;
 import com.sun.javafx.css.CascadingStyle;
 import javafx.css.CssMetaData;
@@ -82,6 +83,14 @@ final class CssStyleHelper {
         this.triggerStates = new PseudoClassState();
     }
 
+    ChangeListener<Scene> sceneChangeListener;
+
+    static void dispose(CssStyleHelper styleHelper, Node node) {
+        styleHelper.resetToInitialValues(node);
+        styleHelper.firstStyleableAncestor = null;
+        node.sceneProperty().removeListener(styleHelper.sceneChangeListener);
+    }
+
     /**
      * Creates a new StyleHelper.
      */
@@ -158,7 +167,7 @@ final class CssStyleHelper {
                 // If this node had a style helper, then reset properties to their initial value
                 // since the node won't have a style helper after this call
                 if (node.styleHelper != null) {
-                    node.styleHelper.resetToInitialValues(node);
+                    dispose(node.styleHelper, node);
                 }
 
                 //
@@ -181,8 +190,14 @@ final class CssStyleHelper {
         // If this node had a style helper, then reset properties to their initial value
         // since the style map might now be different
         if (node.styleHelper != null) {
-            node.styleHelper.resetToInitialValues(node);
+            dispose(node.styleHelper, node);
         }
+        helper.sceneChangeListener = (ov, oldScene, newScene) -> {
+            if (newScene == null) {
+                helper.firstStyleableAncestor = null;
+            }
+        };
+        node.sceneProperty().addListener(helper.sceneChangeListener);
         return helper;
     }
 

Rewrote the style memoryleak test
@FlorianKirmaier
Copy link
Member Author

@FlorianKirmaier FlorianKirmaier commented Mar 22, 2021

I rewrote now the test. The initialization and teardown of JavaFX are now separated from the actual test. Now also none of the variables which is used in the test, are accessible from outside the test, and vise versa.

Should I switch the fix to your solution, or should I keep mine which is based on WeakReferences? As you've mentioned, WeakReference should be fine here too.
As I've mentioned, doing something for every node, whose scene is set to null, might change the complexity of removing a node from O(1) to O({size-of-tree}). I think it's also quite important to support fast removing/adding subtrees.

@kevinrushforth
Copy link
Member

@kevinrushforth kevinrushforth commented Mar 22, 2021

I think the fix based on WeakReference is fine in this case, for the reasons discussed (it is a back link to an ancestor node) and given the performance concerns. I'd like Ambarish to comment as well.

@arapte
Copy link
Member

@arapte arapte commented Mar 23, 2021

I think it's also quite important to support fast removing/adding subtrees.

Yes, Adding listener seems over kill here. Lets go with WeakReference approach. I will take a look at the test update.

Copy link
Member

@arapte arapte left a comment

Provided few comments on test.

Minor cleanup based on codereview
@FlorianKirmaier
Copy link
Member Author

@FlorianKirmaier FlorianKirmaier commented Mar 26, 2021

I've added the requested changes from the codereview

arapte
arapte approved these changes Apr 7, 2021
Copy link
Member

@arapte arapte left a comment

looks good to me.

Copy link
Member

@kevinrushforth kevinrushforth left a comment

The fix looks OK, although I left a question below about ensuring that there cannot be an NPE.

The test doesn't catch the bug on my system (Windows 10). Even when I run it without your fix, it passes. Is it platform-specific?

@@ -351,7 +352,7 @@ private static boolean canReuseStyleHelper(final Node node, final StyleMap style

/* This is the first Styleable parent (of Node this StyleHelper belongs to)
* having a valid StyleHelper */
private Node firstStyleableAncestor;
private WeakReference<Node> firstStyleableAncestor = null;
Copy link
Member

@kevinrushforth kevinrushforth Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since firstStyleableAncestor is initialized to null, rather than new WeakReference(null), it might be possible for a call to firstStyleableAncestor.get() to throw an NPE. Can you comment on what analysis and testing you've done to ensure that this won't happen in some situations? A quick look shows that in the two places a CssStyleHelper object is created, firstStyleableAncestor is set shortly after, but it wouldn't hurt to take a closer look.

Copy link
Member

@kevinrushforth kevinrushforth Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One solution would be to create an EMPTY_NODE singleton and initialize it to that:

    private static final WeakReference<Node> EMPTY_NODE = new WeakReference<>(null);

    private WeakReference<Node> firstStyleableAncestor = EMPTY_NODE; 

Minor simplification of the code, based on code review
@FlorianKirmaier
Copy link
Member Author

@FlorianKirmaier FlorianKirmaier commented Apr 11, 2021

It shouldn't be platform-specific,
but maybe there are some subtle differences
in the ordering between CSS/layout/rendering, which can cause OS-specific differences.

As suggested, I've now changed the initial value also to be a WeakReference!
This way, it's definitely easier to reason that the code still behaves the same.

arapte
arapte approved these changes Apr 21, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Apr 21, 2021

@FlorianKirmaier This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8263402: MemoryLeak: Node hardreferences it's previous Parent after csslayout and getting removed from the scene

Reviewed-by: arapte, kcr

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 45 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@arapte, @kevinrushforth) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready label Apr 21, 2021
@kevinrushforth
Copy link
Member

@kevinrushforth kevinrushforth commented Apr 27, 2021

@FlorianKirmaier this is ready to integrate whenever you are ready.

@FlorianKirmaier
Copy link
Member Author

@FlorianKirmaier FlorianKirmaier commented Apr 27, 2021

/integrate

@FlorianKirmaier
Copy link
Member Author

@FlorianKirmaier FlorianKirmaier commented Apr 27, 2021

Seem's like i've missed it, started integrate now.

@openjdk openjdk bot added the sponsor label Apr 27, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Apr 27, 2021

@FlorianKirmaier
Your change (at version db7e208) is now ready to be sponsored by a Committer.

@nlisker
Copy link
Collaborator

@nlisker nlisker commented Apr 27, 2021

/sponsor

@openjdk
Copy link

@openjdk openjdk bot commented Apr 27, 2021

@nlisker @FlorianKirmaier Since your change was applied there have been 45 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

Pushed as commit 33bbf3f.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants