4200096: OffScreenImageSource.removeConsumer NullPointerException #13408

Closed
wants to merge 17 commits into from

Conversation

mickleness
Contributor

@mickleness mickleness commented Apr 10, 2023

This resolves a 25-year-old P4 ticket: a NullPointerException is needlessly printed to System.err.

This resolution involves confirming that an ImageConsumer is still registered before every notification.

I'll understand if this is rejected as unimportant, but I stumbled across this in the real world the other day and thought this was a simple enough bug to practice on.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-4200096: OffScreenImageSource.removeConsumer NullPointerException

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13408/head:pull/13408
$ git checkout pull/13408

Update a local copy of the PR:
$ git checkout pull/13408
$ git pull https://git.openjdk.org/jdk.git pull/13408/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 13408

View PR using the GUI difftool:
$ git pr show -t 13408

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13408.diff

Webrev

Link to Webrev Comment

mickleness and others added 7 commits May 22, 2022 04:50
Merge openjdk/jdk into mickleness/jdk
Updating mickleness/jdk from openjdk/jdk
Removing a consumer (which can happen implicitly if an ImageObserver returns false) could cause an NPE in OffScreenImageSource. This NPE was caught, but it was printed to System.err.

This came to my attention when a particular set of steps flooded our log with NPEs that were actually harmless/meaningless.

The new attached test case includes two related tests: one mimics the NPE I observed, and the other is more aligned with the original wording in ticket JDK-4200096.
Code cleanup: fixing error condition, remove redundant 'this'.
Rewriting this resolution. I think the previous approach would be too big a change for such a low priority bug.
@bridgekeeper

bridgekeeper bot commented Apr 10, 2023

👋 Welcome back mickleness! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 10, 2023
@openjdk

openjdk bot commented Apr 10, 2023

@mickleness The following label will be automatically applied to this pull request:

  • client

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the client client-libs-dev@openjdk.org label Apr 10, 2023
@mlbridge

mlbridge bot commented Apr 10, 2023

Webrevs

@prrace
Contributor

prrace commented Apr 10, 2023

So the repeated checks for null make it look like you are dealing with some other thread setting it to null

If that were so, I don't think this approach is correct because the repeated null checks aren't a guarantee.
It can still go null between you checking it and then de-referencing it.

But I see that add/removeConsumer are synchronized methods
And the private method sendPixels() is only called from the private method produce()
which is only called from addConsumer()

So it seems impossible for some other thread to do this.
Therefore you must be doing it to yourself on the same thread.

I tracked down that the catch of null and printStackTrace were added under this bug
https://bugs.openjdk.org/browse/JDK-4905411

There it is implied that the callbacks such as
theConsumer.imageComplete(ImageConsumer.SINGLEFRAMEDONE);
are where the null is happening

The person who suggested the fix (not the person who implemented the fix)
was the person who architected the whole image producer/consumer model and near as I can tell
the idea was that you have a programming bug removing the consumer too soon.
I'd expect that you'd see this problem consistently since there's only one thread involved
(unless the callback kicks off another thread to do it)

It also points out that you're unlikely to need the repeated checks since it can only be nulled out
during the call backs (again unless the callback starts another thread to do it - unlikely I think).

If you think what you are doing is correct (I think the architect of the code supposed otherwise)
[update - but I think JDK code could also do this to you, so not necessarily your code]
and you want to be sure, I'd capture "theConsumer" into a local var in each of the methods
sendPixels() and produce() and check and use the local var. Then no one can ever null that
out while you are using it. Since that's the intention of the design, I think that's a safe thing
to do.

Thinking about this a bit more, I think I understand why you are doing what you are doing.
(explanations can save a lot of time).
You are trying to deal with all those callbacks (setPixels, imageComplete) potentially
removing the consumer, and since it's not a threading problem, the check will be valid.
So the question is do we want to

  1. check and return as soon as it is removed (as now), or
  2. capture to a local

I can see both sides of it.
I'd like to know what stack trace you see.

However, I suggest that if the null check finds it is null on entry, there should be something like

    ImageConsumer localConsumer = theConsumer;
    if (localConsumer == null) {
        // In the JDK this property read would be wrapped in doPrivileged.
        boolean debugging = Boolean.getBoolean("awt.consumer.debug");
        if (debugging) {
            System.err.println("theConsumer is null - did you remove the consumer before production is complete?");
            Thread.dumpStack();
        }
        return;
    }

at the beginning of both methods.
PS: it occurred to me that on entry to produce() it can never be null, so a check there may never show anything.
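
The two options listed above behave differently when a consumer removes itself during a callback. The following is a minimal sketch of that difference, not the JDK code: `Listener`, `MiniProducer` and `CaptureVersusCheckDemo` are hypothetical names made up for illustration, and the producer is reduced to a loop of numbered notifications.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an ImageConsumer: receives numbered notifications. */
interface Listener {
    void onStep(int step);
}

/** Hypothetical single-consumer producer, loosely modelled on the discussion above. */
class MiniProducer {
    private Listener theListener;

    synchronized void remove(Listener l) {
        if (theListener == l) {
            theListener = null;
        }
    }

    /** Option 1: re-read the field before every callback and stop once it is null. */
    synchronized void produceWithFieldCheck(Listener l, int steps) {
        theListener = l;
        for (int i = 0; i < steps; i++) {
            if (theListener == null) {
                return; // the listener detached itself during an earlier callback
            }
            theListener.onStep(i);
        }
    }

    /** Option 2: capture the field once; later removals do not affect this run. */
    synchronized void produceWithLocalCapture(Listener l, int steps) {
        theListener = l;
        Listener local = theListener;
        for (int i = 0; i < steps; i++) {
            local.onStep(i);
        }
    }
}

class CaptureVersusCheckDemo {
    /** Runs one production in which the listener removes itself on its first callback. */
    static List<Integer> run(boolean captureLocal) {
        MiniProducer producer = new MiniProducer();
        List<Integer> received = new ArrayList<>();
        Listener selfRemoving = new Listener() {
            @Override
            public void onStep(int step) {
                received.add(step);
                producer.remove(this); // re-entrant call on the same thread
            }
        };
        if (captureLocal) {
            producer.produceWithLocalCapture(selfRemoving, 3);
        } else {
            producer.produceWithFieldCheck(selfRemoving, 3);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println("field check:   " + run(false));
        System.out.println("local capture: " + run(true));
    }
}
```

In this sketch the field check stops after the first notification, while the captured local keeps notifying a listener that has already detached. That continued delivery is the behavioural change at stake when choosing between the two.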

    public void println(Object x) {
        super.println(x);
        if (x instanceof Throwable)
            System.exit(1);
Member

The tests should not use the System.exit() as it might affect the execution of other tests. It is better to set some flag and check it at the end.

Contributor Author

Thanks; this is fixed.
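
The "set a flag and check it at the end" suggestion could look roughly like the sketch below. It is not the code from this PR: `ThrowableRecordingStream` and `FlagInsteadOfExitDemo` are hypothetical names. It relies on `Throwable.printStackTrace()` passing the throwable itself to `println(Object)`, which the stack trace later in this conversation also shows.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

/** Hypothetical System.err replacement that remembers whether a Throwable was printed. */
class ThrowableRecordingStream extends PrintStream {
    private volatile boolean sawThrowable;

    ThrowableRecordingStream() {
        super(new ByteArrayOutputStream(), true);
    }

    @Override
    public void println(Object x) {
        super.println(x);
        if (x instanceof Throwable) {
            sawThrowable = true; // record the failure instead of calling System.exit(1)
        }
    }

    boolean sawThrowable() {
        return sawThrowable;
    }
}

class FlagInsteadOfExitDemo {
    /** Runs the body with System.err redirected and reports whether a Throwable was printed. */
    static boolean printsThrowable(Runnable body) {
        PrintStream original = System.err;
        ThrowableRecordingStream recorder = new ThrowableRecordingStream();
        System.setErr(recorder);
        try {
            body.run();
        } finally {
            System.setErr(original); // always restore the real stream
        }
        return recorder.sawThrowable();
    }

    public static void main(String[] args) {
        boolean withTrace = printsThrowable(
                () -> new IllegalStateException("demo").printStackTrace());
        boolean plainText = printsThrowable(() -> System.err.println("just text"));
        if (!withTrace || plainText) {
            throw new RuntimeException("unexpected System.err activity");
        }
    }
}
```

A test written this way can throw at the end of `main` when the flag is set, so a failure is reported through the normal test result rather than by terminating the JVM.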

In code review mrserb pointed out we shouldn't use System.exit(1) as an error condition.

openjdk#13408 (comment)
In code review prrace pointed out:
> I'd capture "theConsumer" into a local var in each of the methods sendPixels() and produce() and check and use the local var

openjdk#13408 (comment)

This is actually my previous/original draft for JDK-4200096 ( 49a49ee ), but I shied away from this approach because I was afraid code reviewers would point out its changes are excessively invasive for the original complaint. (For example: now OSIS supports multithreading, which nobody asked for. Should I re-add the `synchronized` method modifiers? (although they are theoretically no longer necessary, keeping them reduces the risk of unintended consequences...) Or if multithreaded support is worth keeping: should I at least add a unit test for it? These questions seemed like they're straying off-topic from the original bug.)
Re-adding the 'synchronized' keyword.

We can probably do without it now, but I'm just trying to minimize possible unintended side-effects of this PR. Nobody asked for improved multithreaded support. And this class is used to iterate over BufferedImages (which are already in memory), so (as long as the ImageConsumer isn't taking too long -- or blocking) this should be very fast.
Adding createAbstractImage() with a little documentation to clarify why we're creating an image the way we are.
Changing `runImageConsumerTest` to make sure we're explicitly testing a `OffScreenImageSource`.

By contrast: the `runImageDimensionTest` is more of an integration-style test that *happens* to test the `OffScreenImageSource`.
Make sure OffScreenImageSource#addConsumer doesn't throw a NPE if the argument is null.

The rationale here is:
The preexisting implementation wouldn't throw a NPE, so we shouldn't change that now.

(Or more specifically: the preexisting implementation *would* throw a NPE, but it would also catch it and print it to System.err. The caller wouldn't need to anticipate a NPE.)
@mickleness
Contributor Author

Thanks for the thorough review.

To recap:

Yes, the original problem (as I understand it) has to do with listeners that detach mid-production.

I apologize if I failed to explain this up-front. I tried to describe the problem I set out to "solve" in the comments preceding the first unit test:

https://github.com/mickleness/jdk/blob/73e9f010b3c356c2cf405855f5a33f387e7bb7ee/test/jdk/sun/awt/image/OffScreenImageSource/bug4200096.java#L57-L91

> I'd like to know what stack trace you see.

In the master openjdk branch this test fails as follows:

----------System.err:(29/2496)----------
java.lang.NullPointerException: Cannot invoke "java.awt.image.ImageConsumer.setProperties(java.util.Hashtable)" because "this.theConsumer" is null
java.lang.RuntimeException: java.lang.NullPointerException: Cannot invoke "java.awt.image.ImageConsumer.setProperties(java.util.Hashtable)" because "this.theConsumer" is null
	at bug4200096$1.println(bug4200096.java:49)
	at java.base/java.lang.Throwable$WrappedPrintStream.println(Throwable.java:785)
	at java.base/java.lang.Throwable.lockedPrintStackTrace(Throwable.java:684)
	at java.base/java.lang.Throwable.printStackTrace(Throwable.java:673)
	at java.base/java.lang.Throwable.printStackTrace(Throwable.java:660)
	at java.base/java.lang.Throwable.printStackTrace(Throwable.java:651)
	at java.desktop/sun.awt.image.OffScreenImageSource.produce(OffScreenImageSource.java:204)
	at java.desktop/sun.awt.image.OffScreenImageSource.addConsumer(OffScreenImageSource.java:66)
	at java.desktop/sun.awt.image.OffScreenImageSource.startProduction(OffScreenImageSource.java:80)
	at java.desktop/java.awt.image.FilteredImageSource.startProduction(FilteredImageSource.java:184)
	at java.desktop/sun.awt.image.ImageRepresentation.startProduction(ImageRepresentation.java:732)
	at java.desktop/sun.awt.image.ToolkitImage.addWatcher(ToolkitImage.java:221)
	at java.desktop/sun.awt.image.ToolkitImage.getWidth(ToolkitImage.java:110)
	at bug4200096.runImageDimensionTest(bug4200096.java:90)
	at bug4200096.main(bug4200096.java:53)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
	at java.base/java.lang.Thread.run(Thread.java:1623)
Caused by: java.lang.NullPointerException: Cannot invoke "java.awt.image.ImageConsumer.setProperties(java.util.Hashtable)" because "this.theConsumer" is null
	at java.desktop/sun.awt.image.OffScreenImageSource.produce(OffScreenImageSource.java:186)
	... 12 more

So in this test: I'm not explicitly adding or removing an ImageConsumer or ImageObserver. My implementation of ImageObserver#imageUpdate is returning false once it received the dimensions. Returning false prompts something else to remove an ImageConsumer on my behalf, which resulted in the NPE in System.err when the OffScreenImageSource proceeded to keep producing the image.
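
A more direct way to provoke the same situation is a consumer that detaches itself as soon as it learns the dimensions. The sketch below is not the test from this PR: `SelfRemovingConsumerDemo` is a hypothetical name, and it assumes that `BufferedImage.getSource()` returns the single-consumer producer discussed here. Whether anything is printed to System.err depends on the JDK version in use.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ImageConsumer;
import java.awt.image.ImageProducer;
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

class SelfRemovingConsumerDemo {
    /** Starts production and returns the names of the callbacks the consumer received. */
    static List<String> run() {
        BufferedImage image = new BufferedImage(2, 2, BufferedImage.TYPE_INT_ARGB);
        ImageProducer producer = image.getSource();
        List<String> calls = new ArrayList<>();

        ImageConsumer consumer = new ImageConsumer() {
            @Override
            public void setDimensions(int width, int height) {
                calls.add("setDimensions");
                producer.removeConsumer(this); // detach mid-production
            }

            @Override
            public void setProperties(Hashtable<?, ?> props) {
                calls.add("setProperties");
            }

            @Override
            public void setColorModel(ColorModel model) {
                calls.add("setColorModel");
            }

            @Override
            public void setHints(int hintflags) {
                calls.add("setHints");
            }

            @Override
            public void setPixels(int x, int y, int w, int h, ColorModel model,
                                  byte[] pixels, int off, int scansize) {
                calls.add("setPixels");
            }

            @Override
            public void setPixels(int x, int y, int w, int h, ColorModel model,
                                  int[] pixels, int off, int scansize) {
                calls.add("setPixels");
            }

            @Override
            public void imageComplete(int status) {
                calls.add("imageComplete");
            }
        };

        producer.startProduction(consumer);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("callbacks received: " + run());
    }
}
```

The point of the sketch is that the consumer never calls anything unusual: it only uses the public `removeConsumer` method from inside a callback, which is the pattern this ticket is about.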

I just pushed a few revisions that, among other things, pass in the ImageConsumer as an argument as you suggested. (But they also still constantly check isConsumer(ImageConsumer).)

mickleness added a commit to mickleness/jdk that referenced this pull request Apr 11, 2023
mrserb recommended against this in a separate PR

openjdk#13408 (comment)
@prrace
Contributor

prrace commented Apr 11, 2023

I think I am starting to get the overall picture here.

The fix I cited (4905411) was essentially a fix for 4200096, since it prevented NPEs breaking the app
and was how they chose to avoid the constant null checking. Just catch the NPE and return.
The reasons for the printStackTrace() are not entirely clear. You'd know that something
removed the consumer, but not where. FYI I added calls to dumpStack and ran your app
and the test program from 4905411 : java/awt/image/OffScreenImageTest/CropFilterTest.java
and the removal of the consumer can happen in multiple places for multiple reasons.
I can see why "handling" the null was more straightforward than hunting those down
and doing serious re-working of the code.

So

  • Perhaps all you really needed to do here was remove the printStackTrace. Catching the NPE
    is a valid way to handle this if you want to bail as soon as the consumer is removed.

  • The catch block means that anyone adding new callbacks in here doesn't have to also
    add yet another check. The difference maker would be if ending up with an NPE is really
    common here, and I don't know if it is. If you keep all the null checks then I think
    several in-line comments are warranted.

  • I'm backpedaling on continuing with the captured variable. It's a behavioural change.

  • I don't understand the reason you are now allowing multiple consumers, nor what
    the consequences would be.

  • I note that you are now not allowing someone to add a null consumer.
    I'd wondered why there was no upfront check
    I think this is OK since it looks completely compatible - the null would have been
    immediately de-referenced and the exception thrown.

  • I would definitely keep the synchronized modifiers

Also I'm breaking out the unit tests (now there are 4 of them) into their own directory.

The new LegitimateNullPointerTest confirms that we DO want to see NullPointerExceptions printed to System.err IF they come from the ImageConsumer itself.

In this ticket most of our focus has been on the NPE's that stem from removing an ImageConsumer from OSIS mid-production (so the NPE was when OSIS tried to interact with its `theConsumer` field). This test tries to add other possible NPE's to our consideration.

This new test passed in the master branch before this branch. I want to preserve this existing behavior if this proposal is accepted.
@mickleness
Contributor Author

I don't like the idea of removing e.printStackTrace(). For every case we've discussed so far: I agree that sounds good/harmless. But if the ImageConsumer is buggy and ends up throwing its own NullPointerException: then I'd argue developers need that NPE printed to the console so they can see what's going on.

I'll add a new (4th) unit test for this condition: I DO expect a NPE from the ImageConsumer to be printed to System.err.

I think what I'm hearing (please correct me if I'm wrong) is: constantly confirming the ImageConsumer is still attached to the OffScreenImageSource hurts readability. So I'll propose a new approach for your consideration shortly.

@prrace
Contributor

prrace commented Apr 12, 2023

> I don't like the idea of removing e.printStackTrace(). For every case we've discussed so far: I agree that sounds good/harmless. But if the ImageConsumer is buggy and ends up throwing its own NullPointerException: then I'd argue developers need that NPE printed to the console so they can see what's going on.
>
> I'll add a new (4th) unit test for this condition: I DO expect a NPE from the ImageConsumer to be printed to System.err.

Yes, that's a good point. Just check if the consumer is null; if it isn't, then the NPE can't be from dereferencing
it and should be reported.

Also note my suggestion from yesterday that the NPE could still be reported if a system property is set.

Whether you prefer to discuss further and reach agreement before coding something up (only to find we revise it again), or prefer to code it up and ask "what do you think about this?" so there's clarity on what you mean, either is fine.
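
The triage described above (stay quiet when the consumer has detached, report when it is still attached) can be sketched as follows. This is an illustration under assumptions, not the OffScreenImageSource code: `Sink`, `TriageProducer` and `NpeTriageDemo` are hypothetical names, and production is reduced to three callbacks.

```java
/** Hypothetical stand-in for ImageConsumer: one callback per production step. */
interface Sink {
    void accept(TriageProducer source, int step);
}

/** Hypothetical single-consumer producer that triages NullPointerExceptions. */
class TriageProducer {
    private Sink theSink;

    synchronized void remove(Sink s) {
        if (theSink == s) {
            theSink = null;
        }
    }

    /** Returns true if an NPE was reported to System.err, false otherwise. */
    synchronized boolean produce(Sink s) {
        theSink = s;
        try {
            for (int step = 0; step < 3; step++) {
                theSink.accept(this, step); // throws an NPE once the sink has detached
            }
        } catch (NullPointerException e) {
            if (theSink != null) {
                // Still attached, so the NPE came from the sink's own code: report it.
                e.printStackTrace();
                return true;
            }
            // Detached mid-production: an expected case, so stay quiet.
        }
        return false;
    }
}

class NpeTriageDemo {
    /** A sink that removes itself on its first callback. */
    static boolean detachingSinkReported() {
        Sink detaching = new Sink() {
            @Override
            public void accept(TriageProducer source, int step) {
                source.remove(this);
            }
        };
        return new TriageProducer().produce(detaching);
    }

    /** A sink whose own code throws a NullPointerException. */
    static boolean buggySinkReported() {
        Sink buggy = (source, step) -> {
            throw new NullPointerException("bug inside the sink");
        };
        return new TriageProducer().produce(buggy);
    }

    public static void main(String[] args) {
        System.out.println("detaching sink reported: " + detachingSinkReported());
        System.out.println("buggy sink reported:     " + buggySinkReported());
    }
}
```

One limitation of this triage, in the sketch at least: a sink that both detaches itself and then throws its own NPE in the same callback would go unreported.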

I don't like that we're *expecting* a NullPointerException to get thrown in the course of an allowed use case, but that's probably just a preference on my part. More importantly: this branch passes all new unit tests. Plus it's readable and easy to code review.

This is related to the latest round of PR feedback here:
openjdk#13408 (comment)
@mickleness
Contributor Author

I just pushed an update that uses the null-check for theConsumer, but it otherwise still expects the NPE.

Regarding the System property / debugger message:

If we established that removing the listener before production is complete is an OK thing to do, then do you still want this debugger output? (Or to put that another way: do you think that will really get used?) I don't have a lot of experience with debugger output from sun classes, so if you think it's helpful I'll take your word for it.

@@ -175,7 +175,7 @@ else if (cm instanceof DirectColorModel) {
scanline[x] = image.getRGB(x, y);
}
theConsumer.setPixels(0, y, width, 1, newcm, scanline, 0,
width);
width);
Contributor

can we revert the white space changes ?

Contributor Author

OK, done

Contributor

@prrace prrace left a comment

I mean before pushing.

@openjdk

openjdk bot commented Apr 13, 2023

⚠️ @mickleness the full name on your profile does not match the author name in this pull requests' HEAD commit. If this pull request gets integrated then the author name from this pull requests' HEAD commit will be used for the resulting commit. If you wish to push a new commit with a different author name, then please run the following commands in a local repository of your personal fork:

$ git checkout JDK-4200096
$ git commit --author='Preferred Full Name <you@example.com>' --allow-empty -m 'Update full name'
$ git push

@openjdk

openjdk bot commented Apr 13, 2023

@mickleness This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

4200096: OffScreenImageSource.removeConsumer NullPointerException

Reviewed-by: prr, serb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 510 new commits pushed to the master branch:

  • 8d696ae: 8306575: Clean up and open source four Dialog related tests
  • 9ed456f: 8306634: Open source AWT Event related tests
  • b2240bf: 8304696: Duplicate class names in dynamicArchive tests can lead to test failure
  • cb158ff: 8296153: Bump minimum boot jdk to JDK 20
  • 117c5b1: 8279216: Investigate implementation of premultiplied alpha in the Little-CMS 2.13
  • 723037a: 8298048: Combine CDS archive heap into a single block
  • d518dbf: 8306440: Rename PSS:_num_optional_regions to _max_num_optional_regions
  • 9cd5741: 8306436: Rename PSS*:_n_workers to PSS*:_num_workers
  • 6e77e14: 8306456: Don't leak _worklist's memory in PhaseLive::compute
  • be6031b: 8303703: Add support of execution tests using virtual thread factory jtreg plugin
  • ... and 500 more: https://git.openjdk.org/jdk/compare/98a7a60fcb7d1efdba60438df3c468f5320fb64c...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@prrace, @mrserb) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 13, 2023
prrace pointed out this branch introduced some pointless whitespace changes.
openjdk#13408 (comment)
@mickleness
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Apr 20, 2023
@openjdk

openjdk bot commented Apr 20, 2023

@mickleness
Your change (at version fd751f5) is now ready to be sponsored by a Committer.

e.printStackTrace();
// If theConsumer is null and we throw a NPE when interacting with it:
// That's OK. That is an expected use case that can happen when an
// ImageConsumer detaches itself from this ImageProducer mid-production.

if (theConsumer != null) {
Member

Do we need, here and a few lines above, to save theConsumer to a local, check it for null, and then call imageComplete?

Contributor Author

My understanding is:

This ticket focuses on the use case where an ImageConsumer detaches itself from this OffScreenImageSource mid-production by calling removeConsumer(ImageConsumer). So in this situation: I'd argue no, the ImageConsumer should not get an imageComplete(int) notification. Because it opted to remove itself.

Does this answer your question, or are you considering a different use case?

(Note: addConsumer is synchronized, so it's usually reasonable to assume the field OffScreenImageSource#theConsumer acts mostly like a local variable. I could contrive a new failing test case if one ImageConsumer called addConsumer(newDifferentConsumer) while receiving a notification, but that's a separate discussion. It's beyond the scope of this ticket, and (to my knowledge) it's a contrived edge case that nobody has ever actually complained about in the real world. And it would fail with or without the changes in this PR...)

Member

> Does this answer your question, or are you considering a different use case?

Yes, if it is safe to assume that theConsumer cannot be changed by a different thread.
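
The assumption rests on how `synchronized` methods behave: another thread cannot enter `removeConsumer` while production holds the monitor, but a callback running on the producing thread can re-enter it. The sketch below illustrates that with hypothetical names (`MonitorDemo`, `SyncProducer`, `Callback`). It is not the JDK code, and the 200 ms wait is an arbitrary value chosen for the demonstration.

```java
class MonitorDemo {
    /** Hypothetical stand-in for an ImageConsumer callback. */
    interface Callback {
        void onStep(SyncProducer producer) throws InterruptedException;
    }

    /** Hypothetical producer whose methods are synchronized, as in the discussion. */
    static class SyncProducer {
        private Callback theCallback;

        synchronized void remove(Callback c) {
            if (theCallback == c) {
                theCallback = null;
            }
        }

        synchronized boolean isAttached(Callback c) {
            return theCallback == c;
        }

        synchronized void start(Callback c) throws InterruptedException {
            theCallback = c;
            theCallback.onStep(this); // the monitor is held for the whole callback
        }
    }

    /** Returns {otherThreadBlockedDuringCallback, attachedDuringCallback, attachedAfterwards}. */
    static boolean[] run() {
        SyncProducer producer = new SyncProducer();
        boolean[] result = new boolean[3];
        Thread[] remover = new Thread[1];
        Callback callback = new Callback() {
            @Override
            public void onStep(SyncProducer p) throws InterruptedException {
                Callback self = this;
                remover[0] = new Thread(() -> p.remove(self));
                remover[0].start();
                remover[0].join(200); // the remover cannot finish: we hold the monitor
                result[0] = remover[0].isAlive();
                result[1] = p.isAttached(this); // re-entrant call from the same thread
            }
        };
        try {
            producer.start(callback);
            remover[0].join(); // once start() returns, the remover can proceed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        result[2] = producer.isAttached(callback);
        return result;
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("blocked=" + r[0] + " attachedDuring=" + r[1] + " attachedAfter=" + r[2]);
    }
}
```

In this sketch the second thread's removal only takes effect after production has finished, which is why the field can only change mid-production through a same-thread callback.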

@prrace
Contributor

prrace commented May 3, 2023

/sponsor

@openjdk

openjdk bot commented May 3, 2023

Going to push as commit 63cd0a3.
Since your change was applied there have been 683 commits pushed to the master branch:

  • db8b3cd: 8305963: Typo in java.security.Security.getProperty
  • dcb2f3f: 8306320: BufferedImage spec needs clarification w.r.t its implementation of the WritableRenderedImage interface
  • 1487477: 8305815: Update Libpng to 1.6.39
  • 705ad7d: 8306014: Update javax.net.ssl TLS tests to use SSLContextTemplate or SSLEngineTemplate
  • 3930709: 8068925: Add @Override in javax.tools classes
  • fc76687: 8306836: Remove pinned tag for G1 heap regions
  • ccf91f8: 8306933: C2: "assert(false) failed: infinite loop" failure
  • e9807a4: 8306042: C2: failed: Missed optimization opportunity in PhaseCCP (adding LShift->Cast->Add notification)
  • fcb280a: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity
  • 891530f: 8307005: Make CardTableBarrierSet::initialize non-virtual
  • ... and 673 more: https://git.openjdk.org/jdk/compare/98a7a60fcb7d1efdba60438df3c468f5320fb64c...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 3, 2023
@openjdk openjdk bot closed this May 3, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels May 3, 2023
@openjdk

openjdk bot commented May 3, 2023

@prrace @mickleness Pushed as commit 63cd0a3.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
client client-libs-dev@openjdk.org integrated Pull request has been integrated
3 participants