-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8321176: [Screencast] make a second attempt on screencast failure #16978
Conversation
…prove pipewire error handling
👋 Welcome back antbob! A progress list of the required criteria for merging this PR into |
@azvegint please have a look and let me know what you think. i dont have access to the issue tracker so lets have a discussion here and adopt this patch accordingly if possible. |
Webrevs
|
src/java.desktop/unix/classes/sun/awt/screencast/ScreencastHelper.java
Outdated
Show resolved
Hide resolved
src/java.desktop/unix/native/libawt_xawt/awt/screencast_pipewire.c
Outdated
Show resolved
Hide resolved
src/java.desktop/unix/native/libawt_xawt/awt/screencast_pipewire.c
Outdated
Show resolved
Hide resolved
src/java.desktop/unix/classes/sun/awt/screencast/ScreencastHelper.java
Outdated
Show resolved
Hide resolved
@azvegint thanks for review! yeah the CSR looks like a long story so lets skip it in this patch however i feel that the problem has to be addressed somehow going forward as i dont think the current behavior is acceptable. there could be cases where it can cause real problems eg imagine some sort of medical imaging app where it could fail intermittently and report wrong pixel colors etc. can you please file a bug based on my description above ? and also something i thought i should mention wrt doCleanup() function. it might be a good idea to protect it with a mutex or something. it does look safe ATM but i haven't looked at related libpipewire internals to assert that with full certainty. there perhaps could be a situation where doCleanup() is called internally but also from java cleanup thread (the one that sleeps for 2 seconds) and if they interleave/race it might cause unexpected behavior or crash in libpipewire as its rather sensible in terms of teardown sequence (i've seen its crashing due to that in my testing and thats why i have reshuffled some related calls in this patch). anyways, this is something to look into and check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
can you please file a bug based on my description above ?
Filed the JDK-8321475 for this
and also something i thought i should mention wrt doCleanup() function. it might be a good idea to protect it with a mutex or something. it does look safe ATM but i haven't looked at related libpipewire internals to assert that with full certainty. there perhaps could be a situation where doCleanup() is called internally but also from java cleanup thread (the one that sleeps for 2 seconds) and if they interleave/race it might cause unexpected behavior or crash in libpipewire as its rather sensible in terms of teardown sequence (i've seen its crashing due to that in my testing and thats why i have reshuffled some related calls in this patch). anyways, this is something to look into and check.
Sure, we should revisit it later.
@antbob This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 107 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@azvegint, @prrace) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
@azvegint super, thanks again! |
/integrate |
fp_pw_thread_loop_lock(pw.loop); | ||
fp_pw_stream_disconnect(screenProps->data->stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had this disconnect inside the lock in an earlier PR
https://github.com/openjdk/jdk/pull/14428/files?diff=unified&w=0#diff-3ba5df79cdd3da36b21bf29b4e8de462dd61e6a21dfe0816e4d84c18bbfb76b2R92
but then it was moved out of the lock in another PR to fix a bug https://bugs.openjdk.org/browse/JDK-8310334
https://github.com/openjdk/jdk/pull/15250/files#diff-3ba5df79cdd3da36b21bf29b4e8de462dd61e6a21dfe0816e4d84c18bbfb76b2
So how can it be right to put it back inside the lock ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prrace the related previous fixes didnt really fix anything and just re-shuffled things around to dodge specific problems encountered. the libpipewire API locking and signaling has been fundamentally broken before my patch for JDK-8320655. this is understandable given libpipewire lack of proper documentation in that area and no clear specs on when and how locking and signaling should be done.
this particular case you refer to you need to lock around disconnect bc if you don't there is a chance the disconnect API can fail here
https://github.com/PipeWire/pipewire/blob/master/src/pipewire/thread-loop.c#L100
since this is recursive lock, taking it explicitly before calling disconnect ensures this does not happen. if you test the old implementation with libpipewire debug enabled you would see it constantly failing in related API and complaining functions being called in the "wrong context", including this code in question.
In the client, we have a rule that PR requires two people approval for integration (despite the hard rule of 1 mentioned in the header). |
/sponsor |
Going to push as commit 92fd490.
Your commit was automatically rebased without conflicts. |
This patch adds re-try logic to libpipewire screencast error handling as discussed in PR #16794 and also brings some additional error handling and thread safety improvements. Specifically around cleanup order where incorrect ordering lead to native memory corruption issues, and lock/unlock accounting that while mostly harmless (with the current libpipewire implementation) did pollute the stderr on jtreg tests, making some tests (expecting clean stderr) to fail.
The real major change here however is the throw of the RuntimeException which can propagate to public
java.awt.Robot. createMultiResolutionScreenCapture, createScreenCapture, getPixelColor methods. I'm not sure the plain RuntimeException is the way to go here so this is just a placeholder of sorts. A separate/specific runtime exception can be created for this BUT something needs to be done here as the current implementation fails to convey libpipewire failures to those public API callers and since they have no way of detecting such errors otherwise they have no way of knowing that the data returned by those API is in fact invalid (eg black screens etc). The reason for using an unchecked exception here is driven mainly by the following factors:
Since those are public API, their contracts can potentially make it difficult to introduce specific additional checked exceptions or return values (as appropriate) as those could potentially break existing API use.
The libpipewire errors of that kind are rare and usually indicate there is something wrong with the desktop stack eg some fatal configuration or run time error that is normally not supposed to happen and given this patch now goes extra step re-trying on such failures it stands to reason runtime unchecked exception makes sense when that fails as well.
Creating checked exceptions for such specific native implementation dependent errors and propagating such exceptions thru the call tree does not make much sense as most public API users won't even know how to handle them without knowing native implementation specifics.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16978/head:pull/16978
$ git checkout pull/16978
Update a local copy of the PR:
$ git checkout pull/16978
$ git pull https://git.openjdk.org/jdk.git pull/16978/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16978
View PR using the GUI difftool:
$ git pr show -t 16978
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16978.diff
Webrev
Link to Webrev Comment