Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api-minor] Add a jpx decoder based on OpenJPEG 2.5.2 #17946

Merged
merged 1 commit into from Apr 16, 2024

Conversation

calixteman
Copy link
Contributor

@calixteman calixteman commented Apr 15, 2024

The decoder is compiled in WASM:
https://github.com/mozilla/pdf.js.openjpeg

Fixes #17289, #17061, #16485, #13051, #6365, #4648, #12213.

@calixteman
Copy link
Contributor Author

/botio test

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/518a08770381d35/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/9c917df0ca2f57d/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/9c917df0ca2f57d/output.txt

Total script time: 1.15 mins

  • Unit tests: FAILED
  • Integration Tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.241.84.105:8877/9c917df0ca2f57d/reftest-analyzer.html#web=eq.log

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/518a08770381d35/output.txt

Total script time: 2.48 mins

  • Unit tests: FAILED
  • Integration Tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.193.163.58:8877/518a08770381d35/reftest-analyzer.html#web=eq.log

Copy link
Contributor

@timvandermeij timvandermeij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm positive about replacing our own decoder with OpenJPEG: it fixes a whole lot of issues, it reduces maintenance and the upstream project seems well-tested, widely used and more performing because it's written in C. I have looked at most of the code, but not everything in full detail yet because the code might change when these first comments are addressed. Overall this is really nice work so far!

src/core/openjpeg_decoder.js Outdated Show resolved Hide resolved
src/core/openjpeg_decoder.js Outdated Show resolved Hide resolved
src/core/openjpeg_decoder.js Outdated Show resolved Hide resolved
src/pdf.image_decoders.js Outdated Show resolved Hide resolved
test/unit/pdf.image_decoders_spec.js Outdated Show resolved Hide resolved
src/core/openjpeg_decoder.js Outdated Show resolved Hide resolved
src/core/openjpeg_decoder.js Outdated Show resolved Hide resolved
src/core/openjpeg_decoder.js Outdated Show resolved Hide resolved
src/core/openjpeg_decoder.js Outdated Show resolved Hide resolved
test/unit/pdf.image_decoders_spec.js Outdated Show resolved Hide resolved
@calixteman
Copy link
Contributor Author

/botio test

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/fcd0be7e6af1c55/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/a61426cf4d09fd4/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/fcd0be7e6af1c55/output.txt

Total script time: 0.96 mins

  • Unit tests: FAILED
  • Integration Tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.241.84.105:8877/fcd0be7e6af1c55/reftest-analyzer.html#web=eq.log

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/a61426cf4d09fd4/output.txt

Total script time: 3.25 mins

  • Unit tests: FAILED
  • Integration Tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.193.163.58:8877/a61426cf4d09fd4/reftest-analyzer.html#web=eq.log

test/unit/pdf.image_decoders_spec.js Outdated Show resolved Hide resolved
src/core/openjpeg_decoder.js Outdated Show resolved Hide resolved
@calixteman
Copy link
Contributor Author

/botio test

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/1edb483aecab386/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/3a9c099d983add0/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/1edb483aecab386/output.txt

Total script time: 0.86 mins

  • Unit tests: FAILED
  • Integration Tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.241.84.105:8877/1edb483aecab386/reftest-analyzer.html#web=eq.log

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/3a9c099d983add0/output.txt

Total script time: 2.59 mins

  • Unit tests: FAILED
  • Integration Tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.193.163.58:8877/3a9c099d983add0/reftest-analyzer.html#web=eq.log

@calixteman
Copy link
Contributor Author

/botio test

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/6b9cd46fa294c49/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/49552a9bba056f7/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/6b9cd46fa294c49/output.txt

Total script time: 26.82 mins

  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 54
  different first/second rendering: 2

Image differences available at: http://54.241.84.105:8877/6b9cd46fa294c49/reftest-analyzer.html#web=eq.log

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/49552a9bba056f7/output.txt

Total script time: 39.25 mins

  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: FAILED
  different ref/snapshot: 20

Image differences available at: http://54.193.163.58:8877/49552a9bba056f7/reftest-analyzer.html#web=eq.log

@calixteman
Copy link
Contributor Author

/botio test

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/a5e13aa5dc488e0/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/cc8aadccf39d4bd/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/a5e13aa5dc488e0/output.txt

Total script time: 27.54 mins

  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 378
  different first/second rendering: 2

Image differences available at: http://54.241.84.105:8877/a5e13aa5dc488e0/reftest-analyzer.html#web=eq.log

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/cc8aadccf39d4bd/output.txt

Total script time: 39.29 mins

  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 345

Image differences available at: http://54.193.163.58:8877/cc8aadccf39d4bd/reftest-analyzer.html#web=eq.log

@calixteman calixteman merged commit 12c4119 into mozilla:master Apr 16, 2024
9 checks passed
@calixteman
Copy link
Contributor Author

/botio makeref

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Received

Command cmd_makeref from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/e499df6758716c2/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/b3d83cbb8825120/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/b3d83cbb8825120/output.txt

Total script time: 19.35 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Success

Full output at http://54.193.163.58:8877/e499df6758716c2/output.txt

Total script time: 24.39 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@timvandermeij
Copy link
Contributor

timvandermeij commented Apr 16, 2024

I also think this version of the patch is really nice; thank you for doing this!

I only have one question, which may or may not need a follow-up issue/PR depending on which guarantees we give about the image decoder package's API stability. Previously, from what I can gather, for all image decoders we basically guaranteed one public method was always available and served as the only entrypoint to the module, namely parse(data). In this PR we no longer do that for jpx.js only since that has become decode(data), and this might be considered an API-breaking change.

It could be argued that the previous APIs also weren't really well-defined because for example jpx.js exposed more public methods that look like they were intended to be private instead and that also weren't available in the other decoders. We don't really know if anyone uses them, but unlike for parse(data) a point can be made that those were never intended to be public anyway.

I think we could quite easily avoid any breakage of the "official" parse(data) API by renaming decode(data) to parse(data), to reduce the risk of issues being opened for this or it being regarded as an API-breaking change.

@calixteman @Snuffleupagus What do you think? Would it be a good idea to rename decode(data) to parse(data) for better API compatibility of the image decoders package?

@timvandermeij
Copy link
Contributor

timvandermeij commented Apr 16, 2024

Coming to think of this again, I actually think we should change this, not only to make the API equal to before but also because e.g. our JPX fuzzer relies on this API; see https://github.com/mozilla/pdf.js/blob/master/test/fuzz/jpx_image.fuzz.js#L21. Sorry for not noticing this earlier; I'll prepare a patch for that today and also try to extend the tests (since that will also help for e.g. changing the implementation of the other custom decoders we have).

@calixteman
Copy link
Contributor Author

I tend to overlook the non-Firefox use of pdf.js, sorry about that.
That said I'm fine with whatever name change.
About the fuzzing stuff: we didn't have any news from the people who added it... and I have no idea if it's because they didn't find anything or if it's because kind of dead...

@Snuffleupagus
Copy link
Collaborator

I think we could quite easily avoid any breakage of the "official" parse(data) API by renaming decode(data) to parse(data), to reduce the risk of issues being opened for this or it being regarded as an API-breaking change.

A simple renaming would unfortunately not be sufficient, since all the methods are now static, and we'd need to make additional changes in the code as well.

@timvandermeij
Copy link
Contributor

timvandermeij commented Apr 16, 2024

Yes, that is indeed a problem. I have written a test to assert that the expected public API is present, and that one now correctly fails on JpxImage, but now I'm wondering what we should do here because I don't see a way to get the API fully equal to before and the current version with static methods also makes more sense to me at least.

For example, previously parse(data) would expose the parsed data as public members on the JpxImage object, but now parse(data) directly returns the parsed data (which in my opinion is really better/easier). Moreover, making it a static method avoids the need to create an object, which previously also only really seemed necessary because we stored the parsed data on it.

Even if we were to revert to an object without static methods, it still wouldn't be a fully backwards-compatible change because we don't have e.g. the tiles member from

const tileCount = jpxImage.tiles.length;
anymore.

I'm not sure what's best here. I see a couple of options:

  • Decide that the current API is better and promote this change to [api-major]. In this case I do think we should update the other decoders to have a similar API, so with static methods, so that they are all in sync again (because having different APIs will be confusing). I'll also provide the extended unit test then so that from that point on the public API is documented and can't accidentally regress anymore.
  • Revert to non-static methods for JpxImage and re-expose the parsed data public members. However, technically this still breaks the API because I don't think OpenJPEG returns the tiles data like before and if people use the decoder they must rely on this in some way because tiles contained the actual image data...

I don't really see a point in doing the second option in combination with letting parse(data) return the data because then there is really no need to construct an object and the absence of the members would be equally API-breaking.

In short, all options are API-breaking, but some to a lesser extent. I'm wondering if for all of them [api-minor] is a good label here, and if we go to [api-major] we might as well bite the bullet and align the other decoders too while we're at it.

What do you think? I'm tempted to go for option one here myself, simply because I think the API from this patch is better than what we had before.

@Snuffleupagus
Copy link
Collaborator

Or maybe just do "nothing" here, and accept this as-is since our image-decoders never had "identical" shapes, given that we've made worse changes in api-minor patches before?
It's not really clear to me if we'll be able to fully make all the image-decoders identical without a bunch of annoying changes.

The question is really how much the stand-alone image_decoders are even used!?

@timvandermeij
Copy link
Contributor

timvandermeij commented Apr 16, 2024

That's a fair point. I also don't think there are many users, and besides I'm not sure if we ever really made any API guarantees for this code anyway. I'm probably overthinking the problem here, and we can always make further adjustments later if the need arises.

In that case I'll only make a patch to fix the fuzzer code because that is currently broken after this PR, and after that all code in this repository should work again. I do think the new JpxImage API is better in general, and even the decode name is probably an improvement over the previous parse name because we in fact decode the full image data but we only parse some basic image properties from the data blob (so https://github.com/mozilla/pdf.js/blob/master/src/core/jpx.js#L41 is named correctly in my opinion).

emilio added a commit to emilio/dwv that referenced this pull request Apr 19, 2024
I was trying to see an xray that I got recently (happy to email it
privately if it helps), which fails to decode on the demo with:

```
JPX Error: Unsupported COD options (selectiveArithmeticCodingBypass) jpx.js:339:23
```

So this tries to update to openjpeg via
mozilla/pdf.js#17946, which should support it.
However:

 * It's very WIP (the image decodes correctly on the worker, but I only
   see black on the canvas, however I see correct metadata).
 * I haven't updated all the references to the other decoders.
 * It loads the non-jpeg decoders as a module which I can't test, but
   probably breaks them. I also need to update the sync decoders, which
   likely doesn't work.

I don't think I'll have time to finish this, but posting it here in case
it saves someone time or someone has the time to finish it up.

Thanks.
emilio added a commit to emilio/dwv that referenced this pull request Apr 21, 2024
I was trying to see an xray that I got recently (happy to email it
privately if it helps), which fails to decode on the demo with:

```
JPX Error: Unsupported COD options (selectiveArithmeticCodingBypass) jpx.js:339:23
```

So this tries to update to openjpeg via
mozilla/pdf.js#17946, which should support it.
However:

 * It's very WIP (the image decodes correctly on the worker, but I only
   see black on the canvas, however I see correct metadata).
 * I haven't updated all the references to the other decoders.
 * It loads the non-jpeg decoders as a module which I can't test, but
   probably breaks them. I also need to update the sync decoders, which
   likely doesn't work.

I don't think I'll have time to finish this, but posting it here in case
it saves someone time or someone has the time to finish it up.

Thanks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PDF.js doesn't render large images in PDF that other renderers do
4 participants