8255248: NullPointerException in JFXPanel due to race condition in HostContainer#1178
prsadhuk wants to merge 9 commits into openjdk:master
This avoids the NPE. Overall, I think that we should choose between the options below:
I am not well versed with this area of the code and hence cannot comment on the ramifications of Option 2. Also, can we add a test (either automated or manual)? I see that there is a sample in JBS that demonstrates this issue.
The proposed patch does not appear to fix the issue. Here's what the code currently looks like (after the patch is applied):

    786 @Override
    787 protected void paintComponent(Graphics g) {
    788     if (scenePeer == null) {
    789         return;
    790     }
    791     if (pixelsIm == null) {
    792         createResizePixelBuffer(scaleFactorX, scaleFactorY);
    793         if (pixelsIm == null) {
    794             return;
    795         }
    796     }
    797     DataBufferInt dataBuf = (DataBufferInt)pixelsIm.getRaster().getDataBuffer();
    798     int[] pixelsData = dataBuf.getData();
    799     IntBuffer buf = IntBuffer.wrap(pixelsData);
    800     if (scenePeer != null && !scenePeer.getPixels(buf, pWidth, pHeight)) {
    801         // In this case we just render what we have so far in the buffer.
    802     }

Note that in L788, we return early if scenePeer is null. [...]

Similar issues can be observed in other places, for example later in the same file:

    private class HostContainer implements HostInterface {
        @Override
        public void setEmbeddedScene(EmbeddedSceneInterface embeddedScene) {
            ...
            scenePeer = embeddedScene;
            ...
            invokeOnClientEDT(() -> {
                ...
                if (scenePeer != null) {
                    scenePeer.setDragStartListener(dnd.getDragStartListener());
                }
            });
        }
    }

Note that [...] Since it appears that [...]

In addition to that, we need to account for concurrent code execution. What happens if thread A runs code on [...]

If it is indeed safe to concurrently run code on different instances of a scene peer, a possible approach could be to read the value of the shared field into a local variable, and perform the null checking on the local variable:

    EmbeddedSceneInterface scenePeer = null;
    synchronized (this) {
        scenePeer = this.scenePeer;
    }
    if (scenePeer != null) {
        ...
    }
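For illustration, the read-into-local pattern can be reduced to a small standalone sketch. The `Peer` interface and `PeerHolder` class are hypothetical stand-ins and are not part of JFXPanel; the sketch only shows that, once the shared field has been copied under a lock, the null check and the later use see the same reference.

```java
/** Hypothetical stand-in for a scene peer; not the real EmbeddedSceneInterface. */
interface Peer {
    String describe();
}

/** Hypothetical holder showing the read-into-local pattern. */
final class PeerHolder {
    private final Object lock = new Object();
    private Peer peer; // shared field, guarded by lock

    void setPeer(Peer newPeer) {
        synchronized (lock) {
            peer = newPeer;
        }
    }

    /** Returns a description of the current peer, or "no peer" if none is set. */
    String describePeer() {
        Peer local;
        synchronized (lock) {
            local = peer; // single read of the shared field
        }
        // Only the local is used from here on, so a concurrent setPeer(null)
        // cannot invalidate the null check below.
        if (local == null) {
            return "no peer";
        }
        return local.describe();
    }
}
```

This only protects the reference itself. Whether it is safe to keep calling methods on a peer that another thread has just replaced is a separate question that the pattern does not answer.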
After looking at the class for a bit longer and seeing how its state is read and mutated on different threads, it might be a better approach to enforce serialized code execution in all cases. This might be done as follows:
If done correctly, no method of the class will ever run concurrently with another method of the class, and the memory effects of [...]
I think it makes sense, especially since there is at least one place where the whole method might need to be synchronized (JFXPanel:1090).
The key words are "if done correctly". One thing to look for is the possibility of introducing a deadlock or a regression, so we may need to analyze the callers and perform more extensive testing.
        lStagePeer = stagePeer;
    }
    lStagePeer = embeddedStage;
    if (lStagePeer == null) {

There is something wrong with this code:
why do we need a locked assignment on line 1044 that immediately gets overwritten on line 1046?
    }
    lScenePeer.setPixelScaleFactors((float) scaleFactorX, (float) scaleFactorY);
    synchronized(LOCK) {
        scenePeer = lScenePeer;

I wonder if the whole method should be synchronized?
The logic around scenePeer is asking to be atomic, isn't it?

The change as-is doesn't work correctly; if embeddedScene is null, scenePeer is now unchanged.
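A minimal sketch of the behavior this comment asks for, assuming the intent is that passing null clears the field. `ScenePeer` and `ScenePeerSlot` are hypothetical names used only for illustration, and the scale factors are placeholder values.

```java
/** Hypothetical stand-in for the peer type; only the one method used here. */
interface ScenePeer {
    void setPixelScaleFactors(float scaleX, float scaleY);
}

/** Hypothetical holder; names are illustrative and not taken from JFXPanel. */
final class ScenePeerSlot {
    private final Object lock = new Object();
    private ScenePeer scenePeer; // guarded by lock

    void setEmbeddedScene(ScenePeer embeddedScene) {
        if (embeddedScene != null) {
            // Configure the new peer before other threads can see it.
            embeddedScene.setPixelScaleFactors(1.0f, 1.0f);
        }
        synchronized (lock) {
            // Assign unconditionally so that passing null clears the field.
            scenePeer = embeddedScene;
        }
    }

    ScenePeer getScenePeer() {
        synchronized (lock) {
            return scenePeer;
        }
    }
}
```

The null check guards only the configuration step; the assignment itself sits outside it, so a null argument is still published.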
        }
    });
    SwingUtilities.invokeAndWait(JFXPanelNPETest::createUI);
    for (int i = 0; i < 300; i++) {

Is 300 sufficient?
    for (int i = 0; i < 300; i++) {
        SwingUtilities.invokeLater(contentPane::repaint);
        Platform.runLater(() -> contentPane.setScene(null));
        Thread.sleep(100);

I would recommend replacing the fixed number with a random value.
(Typically, when randomness is added to a test, it makes sense to print its seed to stdout so that a possible failure can be reproduced later. In this case it might be unnecessary, because we are dealing with multiple threads, which adds its own degree of randomness and completely eliminates reproducibility. Or perhaps we still need to print the seed to maximize the probability of reproducing the failed scenario.)
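A sketch of what seeded randomness with a printed seed could look like. The `RandomDelays` class and the 10 to 100 ms range are assumptions made for illustration and are not part of the test under review.

```java
import java.util.Random;

/** Hypothetical helper that derives sleep times from a seed that is printed. */
final class RandomDelays {
    /** Returns count delays in [minMillis, maxMillis], fully determined by seed. */
    static long[] delays(long seed, int count, int minMillis, int maxMillis) {
        Random random = new Random(seed);
        long[] result = new long[count];
        for (int i = 0; i < count; i++) {
            result[i] = minMillis + random.nextInt(maxMillis - minMillis + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Accept a seed on the command line so that a logged seed can be replayed.
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
        System.out.println("seed = " + seed);
        for (long delay : delays(seed, 5, 10, 100)) {
            System.out.println("sleep " + delay + " ms");
        }
    }
}
```

Replaying a seed reproduces only the sequence of delays. Thread scheduling still varies between runs, which is the caveat raised in the comment.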
Despite the downvote, I still think randomness in the unit test is a good thing.
For more background, please take a look at the second answer in
https://stackoverflow.com/questions/32458/random-data-in-unit-tests
That is exactly my concern. It would need a lot more very careful analysis before making this synchronous.
It seems to me that this whole class is asking for more careful analysis, considering that there seem to be several concurrency bugs in there. I don't quite like the idea of a system test "proving" that there's no race condition. That should be done through rigorous analysis instead. The first (and incorrect) proposed fix shows that the system test would have failed in demonstrating that the defects are no longer there.
hjohn left a comment
It's not entirely clear to me which thread is calling which methods in this class, and it would help to clearly mark which part of the class can be called from the FX thread and which parts from the AWT thread.
I also question how some of the other fields are used. Many are marked volatile which is insufficient if you don't want to observe partial changes of x/y coordinates for example.
I agree with @mstr2 that a more careful analysis couldn't hurt. There seem to be only 2 threads involved, the AWT and FX Thread. The fields they are sharing should be grouped and marked with a comment that they're shared. Any access to these fields should then be only while holding a lock.
I guess if we have to make all public methods in JFXPanel synchronous, then the same might be called for in SwingNode too.
andy-goryachev-oracle left a comment
overall looks good - the new LOCK'ing logic appears [...] I have some minor comments
        synchronized(LOCK) {
            return pHeight;
        }
    }

Perhaps we could add setCompSize(int, int) to set both pHeight and pWidth at the same time, in order to avoid multiple LOCK'ed blocks down below (L638, L1068)?
        synchronized(LOCK) {
            return scaleFactorX;
        }
    }

Same idea, possibly add setScaleFactor(double, double)?
    int lHeight = getCompHeight();
    double lScaleFactorX = getScaleFactorX();
    double lScaleFactorY = getScaleFactorY();
    if (lScenePeer == null) {

Minor: move the check on L885 right after L880; there is no need to lock 4 times if lScenePeer is null.
        scenePeer.setDragStartListener(dnd.getDragStartListener());
    }
    SwingDnD lDnD = getDnD();
    lDnD = new SwingDnD(JFXPanel.this, embeddedScene);

Is there a need for calling getDnD() on L1146 and immediately overwriting the result on L1147?
Should it be

    SwingDnD lDnD = new SwingDnD(JFXPanel.this, embeddedScene);

?
    import org.junit.AfterClass;
    import org.junit.Assert;
    import org.junit.BeforeClass;
    import org.junit.Test;

Should we be using JUnit 5?
andy-goryachev-oracle left a comment
overall looks good - the new LOCK'ing logic seems well designed.
added some minor suggestions - I hope my comments are still there after I accidentally clicked on 'Cancel Review'...
    private final Object LOCK = new Object();

    // Accessed on EDT only

Is this comment still valid?
Perhaps we should mention that access to these fields should be LOCK'ed instead?
The changes as they are now are IMHO entirely inadequate. Simply surrounding all accesses to shared fields with a narrowly scoped synchronized block is not a magic solution. You will want the fields to be coherent for a longer period than that of a single field access. For example:
int lWidth = getCompWidth();
int lHeight = getCompHeight();
The new methods getCompWidth and getCompHeight are synchronized, but the overall code here is not. This means that if the control resizes from 1000 x 1 to 1 x 1000, you may end up with an lWidth and lHeight of 1000 x 1000 or 1 x 1.
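To make the point concrete, here is a minimal sketch in which both dimensions are written and read under a single lock, so a reader can never combine the width of one size with the height of another. The class `CompSize`, the method `getCompSize` and the nested `Size` type are hypothetical; `setCompSize(int, int)` follows the name suggested earlier in this review.

```java
/** Hypothetical holder that keeps width and height coherent with each other. */
final class CompSize {
    /** Immutable snapshot of both dimensions, taken under one lock. */
    static final class Size {
        final int width;
        final int height;

        Size(int width, int height) {
            this.width = width;
            this.height = height;
        }
    }

    private final Object lock = new Object();
    private int width = 1000; // guarded by lock
    private int height = 1;   // guarded by lock

    void setCompSize(int newWidth, int newHeight) {
        synchronized (lock) {
            width = newWidth;
            height = newHeight;
        }
    }

    Size getCompSize() {
        synchronized (lock) {
            return new Size(width, height);
        }
    }

    /** Flips between 1000 x 1 and 1 x 1000 on one thread while reading on another. */
    static boolean snapshotsStayConsistent(int iterations) {
        CompSize size = new CompSize();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                if (i % 2 == 0) {
                    size.setCompSize(1, 1000);
                } else {
                    size.setCompSize(1000, 1);
                }
            }
        });
        writer.start();
        boolean consistent = true;
        while (writer.isAlive()) {
            Size s = size.getCompSize();
            // A torn read would give 1000 x 1000 or 1 x 1; both fail this check.
            if (s.width * s.height != 1000) {
                consistent = false;
            }
        }
        return consistent;
    }
}
```

With two separately synchronized getters, the same reader loop could observe a torn pair, because the writer can run between the two calls.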
IMHO the steps to turn this into properly synchronized code are:
- Find all external entry points (all non-private methods, and any private methods used by a listener callback)
- For the entry points found above, mark which are called by FX and which by AWT
- For the AWT methods note down all fields it accesses, including any in any methods it is calling
- For the FX methods note down all fields it accesses, including any in any methods it is calling
- Find the fields that are accessed in both the AWT and FX lists
- Mark any methods that access those fields as synchronized. This may be too coarse-grained if these methods are large, do I/O, or call back into other systems; in that case you may need to move some code around so that all the code that needs synchronization runs first or last, and use a synchronized block.
edit: further clarifications
You do have a valid point, @hjohn. [...] This way no locking/synchronization is required, we have no shared fields, and everything happens in the right thread (at the expense of some memory allocation). What do you guys think?
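A minimal sketch of one way to get that property, assuming the idea is to hand immutable snapshots to the thread that owns the state instead of sharing mutable fields. The single-thread executor stands in for an event thread, and `ConfinedState` and `Bounds` are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example of thread confinement with immutable messages. */
final class ConfinedState {
    /** Immutable message; each update allocates a new instance. */
    static final class Bounds {
        final int width;
        final int height;

        Bounds(int width, int height) {
            this.width = width;
            this.height = height;
        }
    }

    // Stand-in for the thread that owns the state, for example an event thread.
    private final ExecutorService owner = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "owner");
        t.setDaemon(true);
        return t;
    });

    // Read and written only on the owner thread, so no lock is needed.
    private Bounds current = new Bounds(0, 0);

    /** Called from any thread; the update itself runs on the owner thread. */
    Future<?> post(Bounds bounds) {
        return owner.submit(() -> { current = bounds; });
    }

    /** Called from any thread; the read itself runs on the owner thread. */
    Bounds read() {
        try {
            return owner.submit(() -> current).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The cost is one allocation per update plus a hop to the owning thread. In return, width and height always travel together and are never visible half-updated.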
I agree with @hjohn and will add one more comment: even if all local effects are taken into account, the code in this class might be racing against entirely unrelated code in the JavaFX framework. These races can't be reasonably addressed with local synchronization. Coordinating multiple threads with potentially unknown side effects requires a bigger analysis effort, and the changes here do not inspire confidence that this has happened. I see no reason to replace the existing defective implementation with another potentially defective implementation.
@prsadhuk This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@prsadhuk This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the [...]
Because scenePeer is a transient field, it can become null, which can result in an NPE in scenarios where the scene is continuously being reset and set. This warrants a null check, as is done in other places for the same variable.