Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDF Filters #25

Merged
merged 19 commits into from
Apr 24, 2021
Merged

PDF Filters #25

merged 19 commits into from
Apr 24, 2021

Conversation

marioortizmanero
Copy link

@marioortizmanero marioortizmanero commented Apr 16, 2021

This adds:

  • Page::filter
  • Annotation::filter

And PdfFilterOptions in crate::pdf::filter to configure them.

What do you think about the names? Do you prefer filter or filter_contents?


Closes #24

@marioortizmanero marioortizmanero marked this pull request as draft April 16, 2021 13:52
@marioortizmanero
Copy link
Author

@messense can I get some help on how I should be wrapping the callbacks? The image_filter_callback function and PdfFilterOptions::[set_]image_filter in filter.rs obviously don't work and I'm not sure what's the best way to approach that.

@messense
Copy link
Owner

I'll take a look later.

@messense
Copy link
Owner

This article could be helpful: https://adventures.michaelfbryan.com/posts/rust-closures-in-ffi/

src/pdf/filter.rs Outdated Show resolved Hide resolved
@marioortizmanero
Copy link
Author

marioortizmanero commented Apr 17, 2021

I had read the article you linked before, but that only seems to be for functions with a void* parameter. In these cases can use that to pass your state around, but the mupdf functions I'm trying to implement don't. In that case I would have to use a global, but there's got to be a better solution. Although it doesn't have to be a global; the static variable could be contained just inside set_image_filter and could be initialized with once_cell or similars.

For reference, here are some usages of the functions I'm trying to implement in C. But none of these use external variables (I think).

Page filter:

  • source/pdf/pdf-clean.c: [1], which calls this or this
  • source/pdf/pdf-write.c: [2]

Annotations filter:

  • source/pdf/pdf-write.c: [1]

The docs don't help much either, and neither does the source code.

@marioortizmanero
Copy link
Author

marioortizmanero commented Apr 18, 2021

I've implemented an example of how it could work with a "global". Maybe we could use static mut instead of once_cell, as it's not accessed anywhere else, and I couldn't get Lazy::new() or WRAPPER.set(wrapper).unwrap() to work. But knowing that static mut and globals altogether are code smells and wildly unsafe, not sure if it's the way to go.

src/pdf/filter.rs Outdated Show resolved Hide resolved
src/pdf/filter.rs Outdated Show resolved Hide resolved

unsafe extern "C" fn image_filter_callback(
_ctx: *mut fz_context,
_opaque: *mut c_void,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the opaque argument should be passed into the callback? Otherwise it means we miss some context information in the callback.

In upstream it is used in some filters, for example https://github.com/ArtifexSoftware/mupdf/blob/a230d9c4255d4fefc6b3e71ba41447f5b0620e82/source/pdf/pdf-clean.c#L547

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where exactly should I pass that? I don't really understand why it's assigning its value to a pdf_page. And with the mupdf-rs bindings it's not like you can pass a pointer to void anywhere as a user.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to think that opaque is the void* we can use to pass our own parameters... But that would be bad naming... I've asked in their Discord and I'll update this once I know more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, opaque is meant to be used to pass the data around. Now that I've fixed that I've got one last doubt: perhaps we should be using the _ctx value instead of ignoring it? Not sure how that works.

@marioortizmanero
Copy link
Author

marioortizmanero commented Apr 19, 2021

Ok so the new implementation is closer to what we wanted and I've made sure it works, now using the void* parameter. It's definitely unsound, though, as I'm getting sporadic segmentation faults. This may be due to multiple things, i think:

  • Wrong implementation of the closure sharing part in set_image_filter: it's inspired on that blog post, but I haven't had time to make sure that's how it's supposed to work 100%
  • Maybe we need fz_keep_image when the image is left as it was (which would be just a Clone)? See this, for example I've implemented this but it doesn't seem to fix all the issues.
  • Maybe we need to use the context variable provided in the callback? Otherwise it's left as _ctx. I think this one isn't a problem because in the implementation it only seems to pass the ctx parameter around so that it's available when the callback is invoked.

Edit: the issue seems to be that the closure address is different in the callback compared to when it's set. Maybe I'm missing a std::pin::Pin?

@messense
Copy link
Owner

Sorry for lack of response recently, I'll take a look at the CI failure this weekend.

@messense
Copy link
Owner

Edit: the issue seems to be that the closure address is different in the callback compared to when it's set. Maybe I'm missing a std::pin::Pin?

Have you tried Box the callback?

@marioortizmanero
Copy link
Author

Have you tried Box the callback?

Yeah I tried to use a Pin + Box but that didn't work either. I'll try again this weekend with a different approach, maybe I did that incorrectly.

@messense
Copy link
Owner

running 1 test
test_filters-fe20f4266b4d7a04(89551,0x16d16f000) malloc: Incorrect checksum for freed object 0x149610fc8: probably modified after being freed.
Corrupt value: 0x0
test_filters-fe20f4266b4d7a04(89551,0x16d16f000) malloc: *** set a breakpoint in malloc_error_break to debug
error: test failed, to rerun pass '--test test_filters'

Caused by:
  process didn't exit successfully: `/Users/messense/Projects/mupdf-rs/target/debug/deps/test_filters-fe20f4266b4d7a04` (signal: 6, SIGABRT: process abort signal)

Looks like a double-free bug.

@messense
Copy link
Owner

Problem resolved, CI is passing now.

@marioortizmanero
Copy link
Author

Oh wow great job! I completely missed that. On my machine there's a test that does fail randomly, but I don't think that's related to this PR:

failures:

---- display_list::test::test_multi_threaded_display_list_search stdout ----
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
thread 'display_list::test::test_multi_threaded_display_list_search' panicked at 'called `Result::unwrap()` on an `Err` value: Any', src/display_list.rs:200:10


failures:
    display_list::test::test_multi_threaded_display_list_search

@marioortizmanero marioortizmanero marked this pull request as ready for review April 24, 2021 09:57
@messense messense changed the title [WIP] PDF Filters PDF Filters Apr 24, 2021
Copy link
Owner

@messense messense left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

@marioortizmanero
Copy link
Author

marioortizmanero commented Apr 24, 2021

The only remaining question is this:

            // TODO: `context()` inside this function should probably use the
            // parameter's value instead of the global value, right?

But it doesn't seem to be a problem because the ctx used is the same as the original one. The callback just passes that to have it easily accessible, but we already have context() for that. Should I remove that then?

@messense messense merged commit 516dffc into messense:master Apr 24, 2021
@messense
Copy link
Owner

Oh wow great job! I completely missed that. On my machine there's a test that does fail randomly, but I don't think that's related to this PR:

failures:

---- display_list::test::test_multi_threaded_display_list_search stdout ----
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
thread 'display_list::test::test_multi_threaded_display_list_search' panicked at 'called `Result::unwrap()` on an `Err` value: Any', src/display_list.rs:200:10


failures:
    display_list::test::test_multi_threaded_display_list_search

Please open a new issue for this, thanks!

@messense
Copy link
Owner

The only remaining question is this:

            // TODO: `context()` inside this function should probably use the
            // parameter's value instead of the global value, right?

But it doesn't seem to be a problem because the ctx used is the same as the original one. The callback just passes that to have it easily accessible, but we already have context() for that. Should I remove that then?

I guess it's not a big deal, we can deal with it when problem does occur. 😅

@marioortizmanero
Copy link
Author

Ok! Thanks for the help and for maintaining this great library :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrap the PDF filters
2 participants