Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8201567: QuantumRenderer modifies buffer in use by JavaFX Application Thread #255

Closed
wants to merge 2 commits into from

Conversation

@jgneff
Copy link
Contributor

jgneff commented Jun 26, 2020

Fixes JDK-8201567.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8201567: QuantumRenderer modifies buffer in use by JavaFX Application Thread

Reviewers

  • Kevin Rushforth (kcr - Reviewer)
  • Ambarish Rapte (arapte - Reviewer)

Download

$ git fetch https://git.openjdk.java.net/jfx pull/255/head:pull/255
$ git checkout pull/255

… Thread
@bridgekeeper
Copy link

bridgekeeper bot commented Jun 26, 2020

👋 Welcome back jgneff! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request.

@openjdk openjdk bot added the rfr label Jun 26, 2020
@mlbridge
Copy link

mlbridge bot commented Jun 26, 2020

Webrevs

@jgneff
Copy link
Contributor Author

jgneff commented Jun 26, 2020

The method QueuedPixelSource.usesSameBuffer calls Pixels.getPixels on the QuantumRenderer thread while trying to find a buffer that's not in use, yet in doing so it rewinds buffers in use on the JavaFX Application Thread.

This pull request modifies QueuedPixelSource.usesSameBuffer to call a new method, Pixels.getBuffer, that returns the buffer without rewinding it.

Because the issue only affects the final rendered pixels, I added assertions to catch the error in the Monocle classes HeadlessScreen and EPDScreen instead of creating a unit test case. I tried to stay within the guidelines of Programming With Assertions, which states:

As a rule, the expressions contained in assertions should be free of side effects: evaluating the expression should not affect any state that is visible after the evaluation is complete. One exception to this rule is that assertions can modify state that is used only from within other assertions.

Below are animated images showing a few frames from a JavaFX animation on the Monocle VNC platform before and after the fix.

Before the fix

monocle-vnc-sw-before

After the fix

monocle-vnc-sw-after

Background

The error occurs with software rendering when a subclass of com.sun.glass.ui.View uses the buffer position to upload a pixel buffer to the screen. That can occur in at least two cases:

  1. when the pixels are uploaded to a composition byte buffer on the heap, and

  2. when the width of the JavaFX scene is less than the width of the screen.

In the first case, the non-direct byte buffer method IntBuffer.put(IntBuffer) relies on the source buffer position when copying the pixels, while the corresponding direct byte buffer method DirectIntBufferU.put(IntBuffer) does not.

In the second case, Framebuffer.composePixels, for example, uses the buffer position to loop through the source pixel buffer when its width does not match the width of the destination pixel buffer.

Most subclasses of View upload pixels with native methods that do not use the buffer position, so their final rendered pixels are not corrupted.

Because of implementation choices, only the Monocle platforms end up having visible errors in their rendered pixels. While there are ways to work around the issue just for Monocle, this pull request is an attempt to correct the error at its source.

Testing

To reproduce the problem, I used the JavaFX application epd-javafx with the following command:

$ java --add-modules=javafx.graphics \
  --module-path=$HOME/lib/armv6hf-sdk/lib \
  -Dglass.platform=Monocle -Dmonocle.platform=VNC -Dprism.order=sw \
  -jar dist/epd-javafx.jar --pattern=3 --loops=0

You can run the command on an actual ARM device or on a QEMU armhf virtual machine running on an amd64 Linux host. I connected to the Monocle VNC server with the Remmina VNC client on port 5901 with 24-bit color and encryption disabled.

Even without access to an ARM device or virtual machine, you can capture the error on any Linux desktop by adding similar assert statements to the _uploadPixels method of GtkView, along with calls to Thread.sleep to change the timing of the two threads.

Let me know if you would like information on installing a QEMU armhf virtual machine, or details on the assertions and sleep calls that allowed me to captured the error directly on my Dell Linux workstation.

@kevinrushforth
Copy link
Member

kevinrushforth commented Jun 26, 2020

/reviewers 2

@openjdk
Copy link

openjdk bot commented Jun 26, 2020

@kevinrushforth
The number of required reviews for this PR is now set to 2 (with at least 1 of role reviewers).

Copy link
Member

kevinrushforth left a comment

At first glance the fix looks fine (aside from the assert statements that need to be removed). It will also need to be tested using the SW pipeline on all platforms.

@jgneff
Copy link
Contributor Author

jgneff commented Jun 30, 2020

It will also need to be tested using the SW pipeline on all platforms.

Thanks for the reminder. I managed to build JavaFX on Windows and macOS today, so I'll test this pull request on those platforms in addition to Linux desktop and embedded.

@jgneff
Copy link
Contributor Author

jgneff commented Jul 1, 2020

I tested this pull request on all of the following platforms:

  • JavaFX desktop platforms (amd64 architecture)
    • Windows SDK on Windows 10 Pro Version 2004
    • Mac OS X SDK on macOS 10.15.5 (Catalina)
    • Linux SDK on Ubuntu 16.04.6 LTS (Xenial Xerus)
  • JavaFX embedded platforms (armhf architecture)
    • armv6hf SDK (Monocle EPD) on Ubuntu 14.04.6 LTS (Trusty Tahr)
    • armv6hf SDK (Monocle VNC) on Ubuntu 20.04 LTS (Focal Fossa)

This is a tricky timing problem on most platforms, so I used rather intrusive techniques to catch the error and verify it was fixed. The bug is easily visible only on the Monocle VNC platform.

Desktop

I tested the desktop platforms as follows:

  1. I added assert statements and calls to Thread.sleep as shown by this commit to catch the error. It looks like a lot of changes, but they simply call sleep to change the timing of the threads so that the JavaFX Application Thread is unable to keep up with the QuantumRenderer. The assert statements catch the error when the rendering thread starts looking for an unused buffer.

    All platforms printed many InvalidMarkException errors as the buffer position was modified while in use on the JavaFX application thread. The Linux and Windows platforms printed, "Exception in thread 'JavaFX Application Thread' java.nio.InvalidMarkException," while the macOS platform printed, "Exception in thread 'AppKit Thread' java.nio.InvalidMarkException."

  2. I applied the fix to call getBuffer instead of getPixels in QueuedPixelSource.

  3. After the fix, no errors were detected by the assertions on any of the platforms.

Embedded

I tested the embedded platforms as follows:

  1. I added only assert statements to the HeadlessScreen and EPDScreen classes, as shown below. Both platforms printed many InvalidMarkException errors as the buffer position was modified while in use on the JavaFX application thread.

  2. I applied the fix to call getBuffer instead of getPixels in QueuedPixelSource.

  3. After the fix, no errors were detected by the assertions on either platform.

HeadlessScreen and EPDScreen

     @Override
     public synchronized void uploadPixels(Buffer b, int x, int y,
            int width, int height, float alpha) {
+        assert b.mark() == b;
         pixels.composePixels(b, x, y, width, height, alpha);
+        assert b.reset() == b;
     }

For the Monocle VNC platform, you can also simply connect and watch the bug corrupt the frames sent to the VNC client, as shown in my prior comment on this pull request.

Other results

I found some unexpected results, listed below.

  • JavaFX on Linux does not limit its frame rate to 60 Hz when using the hardware pipeline, reaching over 350 frames per second.

  • JavaFX on macOS ignores the system property -Djavafx.animation.pulse=8 and runs at 60 Hz, even though it prints the message "Setting PULSE_DURATION to 8 hz."

  • JavaFX on macOS prints the error shown below when Platform.exit is called to end the application. The error also occurs on JavaFX 14.0.1 and 15-ea+6. The error does not occur when the window is closed manually using the mouse.

Java has been detached already, but someone is still trying to use it at
-[GlassViewDelegate dealloc]:
/Users/john/src/jfx/modules/javafx.graphics/src/main/native-glass/mac/GlassViewDelegate.m:204
0   libglass.dylib   0x000000010eb6c05d -[GlassViewDelegate dealloc] + 221
1   libglass.dylib   0x000000010eb71af6 -[GlassView3D dealloc] + 246
2   libobjc.A.dylib  0x00007fff66937054 _ZN19AutoreleasePoolPage12releaseUntilEPP11objc_object + 134
3   libobjc.A.dylib  0x00007fff6691bdba objc_autoreleasePoolPop + 175
4   CoreFoundation   0x00007fff2dad23c5 __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 23
5   CoreFoundation   0x00007fff2dad22f7 __CFRunLoopDoObservers + 457
6   CoreFoundation   0x00007fff2dad1895 __CFRunLoopRun + 874
7   CoreFoundation   0x00007fff2dad0ece CFRunLoopRunSpecific + 462
8   libjli.dylib     0x000000010be945c1 CreateExecutionEnvironment + 399
9   libjli.dylib     0x000000010be90752 JLI_Launch + 1354
10  java             0x000000010be83ca1 main + 375
11  libdyld.dylib    0x00007fff67acdcc9 start + 1
12  ???              0x000000000000000b 0x0 + 11

Test scripts

Below are the scripts I used to run the tests. The scripts include the system property -Djavafx.animation.pulse=8, but I removed it when trying to recreate the bug with the added assert and sleep statements.

linux.sh

#!/bin/bash
# Linux desktop
$HOME/opt/jdk-14.0.1/bin/java -ea --show-version \
    --module-path=$HOME/lib/javafx-sdk-dev/lib \
    --add-modules=javafx.graphics \
    -Dprism.order=sw -Djavafx.animation.pulse=8 \
    -jar dist/epd-javafx.jar \
    --width=800 --height=600 --pattern=3 --loops=0

macos.sh

#!/bin/bash
# macOS desktop
$HOME/opt/jdk-14.0.1.jdk/Contents/Home/bin/java -ea --show-version \
    --module-path=$HOME/lib/javafx-sdk-dev/lib \
    --add-modules=javafx.graphics \
    -Dprism.order=sw -Djavafx.animation.pulse=8 \
    -jar dist/epd-javafx.jar \
    --width=800 --height=600 --pattern=3 --loops=0

windows.sh

#!/bin/bash
# Windows desktop under Cygwin
$HOME/opt/jdk-14.0.1/bin/java -ea --show-version \
    --module-path=$(cygpath -m $HOME/lib/javafx-sdk-dev/lib) \
    --add-modules=javafx.graphics \
    -Dprism.order=sw -Djavafx.animation.pulse=8 \
    -jar dist/epd-javafx.jar \
    --width=800 --height=600 --pattern=3 --loops=0

windows.bat

REM Windows desktop under Command Prompt
C:\cygwin64\home\john\opt\jdk-14.0.1\bin\java -ea --show-version^
 --module-path=C:/cygwin64/home/john/lib/javafx-sdk-dev/lib^
 --add-modules=javafx.graphics^
 -Dprism.order=sw -Djavafx.animation.pulse=8^
 -jar dist\epd-javafx.jar^
 --width=800 --height=600 --pattern=3 --loops=0

monocle-epd.sh

Run with sudo.

#!/bin/bash
# Monocle EPD platform on Linux armhf
$HOME/opt/jdk-14.0.1+7/bin/java -ea --show-version \
    --module-path=$HOME/lib/armv6hf-sdk/lib \
    --add-modules=javafx.graphics \
    -Dglass.platform=Monocle -Dmonocle.platform=EPD -Dprism.order=sw \
    -Dmonocle.input.18/0/0/0.maxX=800 -Dmonocle.input.18/0/0/0.maxY=600 \
    -Dmonocle.epd.waveformMode=4 -jar dist/epd-javafx.jar \
    --width=800 --height=600 --pattern=3 --loops=0

monocle-vnc.sh

#!/bin/bash
# Monocle VNC platform on Linux armhf
/usr/lib/jvm/java-14-openjdk-armhf/bin/java -ea --show-version \
    --module-path=$HOME/lib/armv6hf-sdk/lib \
    --add-modules=javafx.graphics \
    -Dprism.order=sw -Djavafx.animation.pulse=8 \
    -Dglass.platform=Monocle -Dmonocle.platform=VNC \
    -jar dist/epd-javafx.jar \
    --width=1024 --height=600 --pattern=3 --loops=0
@mlbridge
Copy link

mlbridge bot commented Jul 2, 2020

Mailing list message from Ty Young on openjfx-dev:

On 6/25/20 10:56 PM, John Neffenger wrote:

Does this fix the years old Linux JavaFX buffer reset bug?

@kevinrushforth
Copy link
Member

kevinrushforth commented Jul 2, 2020

Does this fix the years old Linux JavaFX buffer reset bug?

Possibly. This is a race condition that can affect the use of UploadingPainter, which is used by the SW pipeline.

@kevinrushforth
Copy link
Member

kevinrushforth commented Jul 2, 2020

This fix might be a candidate for JavaFX 15, so I recommend to not merge the master branch.

If we don't spot anything of concern during the review, then we might ask you to retarget your PR to the jfx15 branch.

@arapte
Copy link

arapte commented Jul 8, 2020

This seems to be a safe fix to me. As mentioned in Pixels.java, the class is not thread safe but most of the other methods have the check Application.checkEventThread(), so those methods seem safe. But the the method getPixels() does not have this check, as it is at least accessed by two threads. The new method getBuffer() is a true get method and more correct for comparing the buffers.

@arapte
arapte approved these changes Jul 8, 2020
@kevinrushforth
Copy link
Member

kevinrushforth commented Jul 8, 2020

@jgneff this looks safe enough to get into 15. Go ahead an retarget it to the jfx15 branch.

@arapte this will invalidate your review, so you will need to re-review it once retargeted.

@kevinrushforth
Copy link
Member

kevinrushforth commented Jul 8, 2020

Oh, and regarding the following:

JavaFX on macOS prints the error shown below when Platform.exit is called to end the application.

That's a known issue on macOS 10.15:

JDK-8231558: [macos 10.15] Platform.exit causes assertion error on macOS Catalina

@kevinrushforth
Copy link
Member

kevinrushforth commented Jul 9, 2020

I am done with my testing. All looks good. Once this PR is retargeted to jfx15 I will approve it.

@jgneff jgneff changed the base branch from master to jfx15 Jul 9, 2020
@arapte
arapte approved these changes Jul 10, 2020
@openjdk
Copy link

openjdk bot commented Jul 10, 2020

@jgneff This change now passes all automated pre-integration checks. When the change also fulfills all project specific requirements, type /integrate in a new comment to proceed. After integration, the commit message will be:

8201567: QuantumRenderer modifies buffer in use by JavaFX Application Thread

Reviewed-by: kcr, arapte
  • If you would like to add a summary, use the /summary command.
  • To credit additional contributors, use the /contributor command.
  • To add additional solved issues, use the /issue command.

Since the source branch of this PR was last updated there have been 10 commits pushed to the jfx15 branch:

  • 126637f: 8201570: Get two bytes for the Linux input event type, not four
  • f3a0446: 8247963: Update SQLite to version 3.32.3
  • 32584db: 8238954: Improve performance of tiled snapshot rendering
  • 869ea40: 8244212: Optionally download media and webkit libraries from latest openjfx EA build
  • 62f8cee: 8247947: Build DirectShow Samples (Base Classes) from source checked into repo
  • 527cc2b: 8248551: [TestBug] Ignore two failing FXML unit tests which use Nashorn script engine
  • 45c9854: 8238080: FXMLLoader: if script engines implement javax.script.Compilable compile scripts
  • 15f97ed: 8240264: iOS: Unnecessary logging on every pulse when GL context changes
  • 2ca509a: 8193800: TreeTableView selection changes on sorting
  • a5878e0: 8214699: Node.getPseudoClassStates must return the same instance on every call

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid automatic rebasing, please merge jfx15 into your branch, and then specify the current head hash when integrating, like this: /integrate 126637f5889fb9338606f6d69b3045d28c2094aa.

As you are not a known OpenJDK Author, an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@kevinrushforth, @arapte) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready label Jul 10, 2020
@jgneff
Copy link
Contributor Author

jgneff commented Jul 10, 2020

/integrate

@openjdk openjdk bot added the sponsor label Jul 10, 2020
@openjdk
Copy link

openjdk bot commented Jul 10, 2020

@jgneff
Your change (at version d408506) is now ready to be sponsored by a Committer.

@kevinrushforth
Copy link
Member

kevinrushforth commented Jul 10, 2020

/sponsor

@openjdk openjdk bot closed this Jul 10, 2020
@openjdk openjdk bot added the integrated label Jul 10, 2020
@openjdk openjdk bot removed sponsor ready rfr labels Jul 10, 2020
@openjdk
Copy link

openjdk bot commented Jul 10, 2020

@kevinrushforth @jgneff The following commits have been pushed to jfx15 since your change was applied:

  • 126637f: 8201570: Get two bytes for the Linux input event type, not four
  • f3a0446: 8247963: Update SQLite to version 3.32.3
  • 32584db: 8238954: Improve performance of tiled snapshot rendering
  • 869ea40: 8244212: Optionally download media and webkit libraries from latest openjfx EA build
  • 62f8cee: 8247947: Build DirectShow Samples (Base Classes) from source checked into repo
  • 527cc2b: 8248551: [TestBug] Ignore two failing FXML unit tests which use Nashorn script engine
  • 45c9854: 8238080: FXMLLoader: if script engines implement javax.script.Compilable compile scripts
  • 15f97ed: 8240264: iOS: Unnecessary logging on every pulse when GL context changes
  • 2ca509a: 8193800: TreeTableView selection changes on sorting
  • a5878e0: 8214699: Node.getPseudoClassStates must return the same instance on every call

Your commit was automatically rebased without conflicts.

Pushed as commit d67c47f.

@ebresie
Copy link

ebresie commented Jul 23, 2020

Just curious and this may be OBE by now but... I see the check thread but when accessing the buffer itself, should it have some sort of synchronization of some type to make it more thread safe?

@kevinrushforth
Copy link
Member

kevinrushforth commented Jul 23, 2020

There is no need to synchronize the getBuffer method, since all it does is return the immutable buffer field.

In any case, this PR has been integrated, so if an additional issue is discovered, a new bug would need to be filed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.