8321176: [Screencast] make a second attempt on screencast failure #16978

Closed
wants to merge 2 commits into from

Conversation

antbob
Contributor

@antbob antbob commented Dec 5, 2023

This patch adds retry logic to the libpipewire screencast error handling, as discussed in PR #16794, and also brings some additional error handling and thread safety improvements. Specifically, it fixes the cleanup order, where incorrect ordering led to native memory corruption, and the lock/unlock accounting, which, while mostly harmless with the current libpipewire implementation, polluted stderr in jtreg tests and caused tests that expect a clean stderr to fail.

The real major change here, however, is the throwing of a RuntimeException, which can propagate to the public java.awt.Robot methods createMultiResolutionScreenCapture, createScreenCapture and getPixelColor. I'm not sure a plain RuntimeException is the way to go here, so this is just a placeholder of sorts. A separate, specific runtime exception can be created for this, BUT something needs to be done here: the current implementation fails to convey libpipewire failures to callers of those public APIs, and since they have no other way of detecting such errors, they cannot know that the data returned by those APIs is in fact invalid (e.g. black screens). The choice of an unchecked exception is driven mainly by the following factors:

  1. Since those are public APIs, their contracts can make it difficult to introduce additional checked exceptions or return values, as those could break existing uses of the API.

  2. libpipewire errors of that kind are rare and usually indicate that something is wrong with the desktop stack, e.g. a fatal configuration or runtime error that is normally not supposed to happen. Given that this patch now goes the extra step of retrying on such failures, it stands to reason that an unchecked runtime exception makes sense when the retry fails as well.

  3. Creating checked exceptions for such native-implementation-dependent errors and propagating them through the call tree does not make much sense, as most public API users won't know how to handle them without knowing the native implementation specifics.
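
To make the proposal above concrete, here is a minimal sketch of the "retry once, then throw an unchecked exception" pattern. It is not the code from this patch: the class name, the `captureWithRetry` helper and the `IntSupplier` standing in for the native capture call are all hypothetical, and the convention that a negative return code means failure is an assumption made for illustration.

```java
import java.util.function.IntSupplier;

public class ScreencastRetrySketch {
    // First try plus one retry, mirroring the "second attempt" idea.
    static final int MAX_ATTEMPTS = 2;

    /**
     * Runs the capture up to MAX_ATTEMPTS times.
     * nativeCapture is a hypothetical stand-in for the native call:
     * it returns 0 on success and a negative code on failure.
     * Returns the number of attempts that were needed.
     */
    static int captureWithRetry(IntSupplier nativeCapture) {
        int rc = -1;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            rc = nativeCapture.getAsInt();
            if (rc >= 0) {
                return attempt;
            }
        }
        // Unchecked, so existing method signatures stay unchanged.
        throw new RuntimeException("Screen capture failed, error code " + rc);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated capture that fails on the first call and succeeds on the second.
        int used = captureWithRetry(() -> calls[0]++ == 0 ? -1 : 0);
        System.out.println("succeeded after " + used + " attempt(s)");

        try {
            captureWithRetry(() -> -1); // simulated capture that always fails
        } catch (RuntimeException e) {
            System.out.println("caller sees: " + e.getMessage());
        }
    }
}
```

Because the exception is unchecked, callers that ignore it compile exactly as before, while callers that care can catch it instead of silently receiving invalid pixel data.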


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8321176: [Screencast] make a second attempt on screencast failure (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16978/head:pull/16978
$ git checkout pull/16978

Update a local copy of the PR:
$ git checkout pull/16978
$ git pull https://git.openjdk.org/jdk.git pull/16978/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16978

View PR using the GUI difftool:
$ git pr show -t 16978

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16978.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Dec 5, 2023

👋 Welcome back antbob! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 5, 2023
@openjdk

openjdk bot commented Dec 5, 2023

@antbob The following label will be automatically applied to this pull request:

  • client

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the client client-libs-dev@openjdk.org label Dec 5, 2023
@antbob
Contributor Author

antbob commented Dec 5, 2023

@azvegint please have a look and let me know what you think. I don't have access to the issue tracker, so let's have the discussion here and adapt this patch accordingly if possible.

@mlbridge

mlbridge bot commented Dec 5, 2023

Webrevs

@antbob
Contributor Author

antbob commented Dec 6, 2023

@azvegint thanks for the review! Yeah, the CSR looks like a long story, so let's skip it in this patch. However, I feel that the problem has to be addressed somehow going forward, as I don't think the current behavior is acceptable. There could be cases where it causes real problems, e.g. imagine some sort of medical imaging app where it could fail intermittently and report wrong pixel colors. Can you please file a bug based on my description above?

Also, something I thought I should mention regarding the doCleanup() function: it might be a good idea to protect it with a mutex or something similar. It does look safe at the moment, but I haven't looked at the related libpipewire internals to assert that with full certainty. There could perhaps be a situation where doCleanup() is called internally but also from the Java cleanup thread (the one that sleeps for 2 seconds), and if they interleave/race it might cause unexpected behavior or a crash in libpipewire, as it is rather sensitive to the teardown sequence (I've seen it crash because of that in my testing, which is why I reshuffled some related calls in this patch). Anyway, this is something to look into and check.
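
A rough illustration of the kind of protection meant here, written in Java rather than in the native code: a cleanup routine guarded by a lock and a "done" flag, so that two racing callers cannot both run the teardown. The class, the field names and the counter that stands in for the real teardown steps are hypothetical; the actual doCleanup() is a native function and would need a native mutex instead.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class GuardedCleanupSketch {
    private final Object cleanupLock = new Object();
    private boolean cleanedUp = false;

    // Counts how often the teardown body actually ran (for demonstration only).
    final AtomicInteger teardownRuns = new AtomicInteger();

    void doCleanup() {
        synchronized (cleanupLock) {
            if (cleanedUp) {
                return; // a racing caller already tore everything down
            }
            cleanedUp = true;
            // Stand-in for the ordered teardown sequence.
            teardownRuns.incrementAndGet();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedCleanupSketch session = new GuardedCleanupSketch();
        // One caller models the internal cleanup path, the other the delayed cleanup thread.
        Thread internal = new Thread(session::doCleanup);
        Thread delayed = new Thread(session::doCleanup);
        internal.start();
        delayed.start();
        internal.join();
        delayed.join();
        System.out.println("teardown ran " + session.teardownRuns.get() + " time(s)");
    }
}
```

Whichever thread arrives second finds the flag already set and returns without touching the torn-down state, so the teardown sequence runs exactly once regardless of interleaving.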

Member

@azvegint azvegint left a comment

Looks good to me.

Can you please file a bug based on my description above?

Filed JDK-8321475 for this.

Also, something I thought I should mention regarding the doCleanup() function: it might be a good idea to protect it with a mutex or something similar. It does look safe at the moment, but I haven't looked at the related libpipewire internals to assert that with full certainty. There could perhaps be a situation where doCleanup() is called internally but also from the Java cleanup thread (the one that sleeps for 2 seconds), and if they interleave/race it might cause unexpected behavior or a crash in libpipewire, as it is rather sensitive to the teardown sequence (I've seen it crash because of that in my testing, which is why I reshuffled some related calls in this patch). Anyway, this is something to look into and check.

Sure, we should revisit it later.

@antbob antbob changed the title from "8321176: [Screencast] make a second attempt on screencast failure, improve pipewire error handling" to "8321176: [Screencast] make a second attempt on screencast failure" Dec 6, 2023
@openjdk

openjdk bot commented Dec 6, 2023

@antbob This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8321176: [Screencast] make a second attempt on screencast failure

Reviewed-by: azvegint, prr

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 107 new commits pushed to the master branch:

  • 5c12a18: 8320790: Update --release 22 symbol information for JDK 22 build 27
  • 7180088: 8321429: (fc) FileChannel.lock creates a FileKey containing two long index values, they could be stored as int values
  • 0c178be: 8321206: Make Locale related system properties StaticProperty
  • 6c13a30: 8312307: Obsoleted code in hb-jdk-font.cc
  • 5e6bfc5: 8321539: Minimal build is broken by JDK-8320935
  • 2c2d4d2: 8321485: remove serviceability/attach/ConcAttachTest.java from problemlist on macosx
  • 0eb299a: 8316141: Improve CEN header validation checking
  • b893a2b: 8321597: Use .template consistently for files treated as templates by the build
  • 05f9509: 8321374: Add a configure option to explicitly set CompanyName property in VersionInfo resource for Windows exe/dll
  • 701bc3b: 8295166: IGV: dump graph at more locations
  • ... and 97 more: https://git.openjdk.org/jdk/compare/c17b8cfafe5a2bbe29d38cfc6793c72b0430f6ca...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@azvegint, @prrace) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 6, 2023
@antbob
Contributor Author

antbob commented Dec 6, 2023

@azvegint super, thanks again!

@antbob
Contributor Author

antbob commented Dec 6, 2023

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Dec 6, 2023
@openjdk

openjdk bot commented Dec 6, 2023

@antbob
Your change (at version a473b9d) is now ready to be sponsored by a Committer.

fp_pw_thread_loop_lock(pw.loop);
fp_pw_stream_disconnect(screenProps->data->stream);
Contributor

Contributor Author

@prrace the related previous fixes didn't really fix anything and just reshuffled things around to dodge the specific problems encountered. The libpipewire API locking and signaling was fundamentally broken before my patch for JDK-8320655. This is understandable given libpipewire's lack of proper documentation in that area and the absence of clear specs on when and how locking and signaling should be done.

In the particular case you refer to, you need to lock around the disconnect because, if you don't, there is a chance the disconnect API can fail here:

https://github.com/PipeWire/pipewire/blob/master/src/pipewire/thread-loop.c#L100

Since this is a recursive lock, taking it explicitly before calling disconnect ensures this does not happen. If you test the old implementation with libpipewire debug enabled, you will see it constantly failing in the related APIs and complaining about functions being called in the "wrong context", including the code in question.
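
The reasoning can be sketched with a Java analogy, using `java.util.concurrent.locks.ReentrantLock` as the recursive lock. This is not libpipewire code: `disconnect()` here is a hypothetical stand-in that refuses to run unless the caller already holds the loop lock, and it is only assumed that the native check behaves in a comparable way.

```java
import java.util.concurrent.locks.ReentrantLock;

public class RecursiveLockSketch {
    // Stand-in for the thread-loop lock; ReentrantLock is recursive.
    static final ReentrantLock loopLock = new ReentrantLock();

    /** Hypothetical API that expects to be called with the loop lock held. */
    static boolean disconnect() {
        if (!loopLock.isHeldByCurrentThread()) {
            return false; // called in the "wrong context"
        }
        // Taking the lock again is safe because it is recursive.
        loopLock.lock();
        try {
            return true;
        } finally {
            loopLock.unlock();
        }
    }

    /** Takes the lock explicitly around the call, as the patch does around disconnect. */
    static boolean lockedDisconnect() {
        loopLock.lock();
        try {
            return disconnect();
        } finally {
            loopLock.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("without lock: " + disconnect());
        System.out.println("with lock:    " + lockedDisconnect());
        System.out.println("hold count afterwards: " + loopLock.getHoldCount());
    }
}
```

The point of the analogy is that nested acquisition by the same thread costs nothing with a recursive lock, so locking explicitly at the call site is a safe way to guarantee the expected context.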

@antbob
Contributor Author

antbob commented Dec 7, 2023

@azvegint @prrace unless there are any outstanding issues with this patch that need addressing/correcting can you please sponsor it for me?

@azvegint
Member

azvegint commented Dec 7, 2023

@azvegint @prrace unless there are any outstanding issues with this patch that need addressing/correcting can you please sponsor it for me?

In the client area, we have a rule that a PR requires approval from two people before integration (despite the hard minimum of 1 mentioned in the header).
So I am waiting for another approval.

@azvegint
Member

/sponsor

@openjdk

openjdk bot commented Dec 11, 2023

Going to push as commit 92fd490.
Since your change was applied there have been 109 commits pushed to the master branch:

  • d13302f: 8321387: SegmentAllocator:allocateFrom(AddressLayout, MemorySegment) does not throw stated UnsupportedOperationException
  • ce10844: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays)
  • 5c12a18: 8320790: Update --release 22 symbol information for JDK 22 build 27
  • 7180088: 8321429: (fc) FileChannel.lock creates a FileKey containing two long index values, they could be stored as int values
  • 0c178be: 8321206: Make Locale related system properties StaticProperty
  • 6c13a30: 8312307: Obsoleted code in hb-jdk-font.cc
  • 5e6bfc5: 8321539: Minimal build is broken by JDK-8320935
  • 2c2d4d2: 8321485: remove serviceability/attach/ConcAttachTest.java from problemlist on macosx
  • 0eb299a: 8316141: Improve CEN header validation checking
  • b893a2b: 8321597: Use .template consistently for files treated as templates by the build
  • ... and 99 more: https://git.openjdk.org/jdk/compare/c17b8cfafe5a2bbe29d38cfc6793c72b0430f6ca...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 11, 2023
@openjdk openjdk bot closed this Dec 11, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Dec 11, 2023
@openjdk

openjdk bot commented Dec 11, 2023

@azvegint @antbob Pushed as commit 92fd490.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@antbob antbob deleted the bad_robot branch December 11, 2023 10:21
Labels
client client-libs-dev@openjdk.org integrated Pull request has been integrated
3 participants