Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support wkhtmltopdf v0.12.x #6489

Closed
Tracked by #6603
gbp opened this issue Aug 20, 2021 · 6 comments · Fixed by #6680
Closed
Tracked by #6603

Support wkhtmltopdf v0.12.x #6489

gbp opened this issue Aug 20, 2021 · 6 comments · Fixed by #6680

Comments

@gbp
Copy link
Member

gbp commented Aug 20, 2021

Some changes will be required to support the latest version of wkhtmltopdf it errors out and the alaveteli fallbacks back to the txt version.

See: #6435

@gbp
Copy link
Member Author

gbp commented Sep 14, 2021

MaDada have patched the app/controllers/request_controller.rb to add additional arguments to AlaveteliExternalCommand call

https://gitlab.com/madada-team/dada-core/-/issues/49
https://gitlab.com/madada-team/dada-core/-/merge_requests/49/diffs

@gbp gbp added this to the Framework Modernisation milestone Nov 1, 2021
@garethrees garethrees added the bug Breaks expected functionality label Nov 10, 2021
@garethrees
Copy link
Member

Just to move this conversation closer to the specific issue at hand…

Can we make France's patch in core? Might have to add a conditional that checks wkhtmltopdf --version so that we retain compatibility, but that's a boatload easier than replacing it right now.

Yeah can do that. I'm abit wary about using the enable-local-file-access option although I guess that is no difference that what v0.11 is currently doing.

via #6615 (comment)

I certainly think we should make conscious decisions about this kind of thing.

How could this be used as an attack vector though? How would someone inject some JS into contents that get converted to PDF via this command on Alaveteli? I don't have the answers to these without giving it more thought, but that's what we should understand to make a confident decision here.

Here's the upstream issue that introduced the change wkhtmltopdf/wkhtmltopdf#4536. It looks like we could minimise the access by using the --path flag.

@gbp
Copy link
Member Author

gbp commented Nov 10, 2021

Have finally got wkhtmltopdf installed working correctly on simple HTML files.

After applying the patch, when downloading requests as Zip the conversion hangs, when I ^C I get:

External Command: "/nix/store/sv879013g0gmc89d0vm37rxxrp5apn3d-wkhtmltopdf/bin/wkhtmltopdf --enable-local-file-access --load-media-error-handling ignore --load-error-handling skip --no-images /var/folders/xl/ck8q7h510mn65gn5dkwqnbqh0000gn/T/foihtml2pdf-input20211110-25768-yc1wpj.html /var/folders/xl/ck8q7h510mn65gn5dkwqnbqh0000gn/T/foihtml2pdf-output20211110-25768-1ck2i8j" exited abnormally
Loading page (1/2)
Could not convert info request 104 to PDF with command '/nix/store/sv879013g0gmc89d0vm37rxxrp5apn3d-wkhtmltopdf/bin/wkhtmltopdf /var/folders/xl/ck8q7h510mn65gn5dkwqnbqh0000gn/T/foihtml2pdf-input20211110-25768-yc1wpj.html /var/folders/xl/ck8q7h510mn65gn5dkwqnbqh0000gn/T/foihtml2pdf-output20211110-25768-1ck2i8j'

Running the command outside of Rails I see:

> /nix/store/sv879013g0gmc89d0vm37rxxrp5apn3d-wkhtmltopdf/bin/wkhtmltopdf --enable-local-file-access --load-media-error-handling ignore --load-error-handling skip --no-images /var/folders/xl/ck8q7h510mn65gn5dkwqnbqh0000gn/T/foihtml2pdf-input20211110-25768-yc1wpj.html out.pdf
Loading page (1/2)
[======>                                                     ] 11%

I hangs on 11% without timing out

@gbp
Copy link
Member Author

gbp commented Nov 10, 2021

Personally I don't see this as a blocker to #6615 or #6347 as this is currently broken in 0.39 and we successfully fallback to outputting to correspondence.txt instead.

@gbp
Copy link
Member Author

gbp commented Nov 10, 2021

Another issue, the patch assumes we're using wkhtmltopdf whereas using the HTML_TO_PDF_COMMAND configuration variable this can be set to any command. This other command won't necessary response to the same arguments.

In fact in the specs we stub the command to be /bin/cp and these specs will fail with the patch applied.

Nor can we currently set HTML_TO_PDF_COMMAND: .../wkhtmltopdf --enable-local-file-access ... as this would mean the command isn't found and require some refactoring to support this - see #6432.

Now, does anyone use a different html to pdf convertor? I have no idea.

I think we could announce in #6347 we are depreciating HTML_TO_PDF_COMMAND configuration variable, and say we are going to assume wkhtmltopdf is installed from #6603 onwards.

Once removed, UTILITY_SEARCH_PATH would be used to search for the wkhtmltopdf instead of the full path being hardcoded in the config/general.yml.

@garethrees
Copy link
Member

I hangs on 11% without timing out

Applying the French patch successfully generates the correspondence PDF on the Debian Bullseye VM.

I think we could announce in #6347 we are depreciating HTML_TO_PDF_COMMAND configuration variable, and say we are going to assume wkhtmltopdf is installed from #6603 onwards. Once removed, UTILITY_SEARCH_PATH would be used to search for the wkhtmltopdf instead of the full path being hardcoded in the config/general.yml.

Yeah, let's definitely do this! One minor change I'd make to this suggestion is to introduce the deprecation notice in #6347 and then remove it in 0.40.1.0 so that we can get this resolved sooner.

The gotcha here is how we support the different arguments required as we migrate OSs'. Might need to make this configurable at theme level to ease migration

# THEME

# Old versions
InfoRequest::PdfConverter.command = %w(/usr/local/bin/wkhtmltopdf)

# New versions
InfoRequest::PdfConverter.command = %w(/usr/local/bin/wkhtmltopdf --enable-local-file-access --load-media-error-handling etc)

# IN USE

# I know there's a better way of doing this but YSWIM…
command = InfoRequest::PdfConverter.command.first.dup
cmd = command.first
command.delete_at(0)
args = command

AlaveteliExternalCommand.run(cmd, args, tmp_input.path, tmp_output.path)

Personally I don't see this as a blocker to #6615 or #6347 as this is currently broken in 0.39

On WhatDoTheyKnow we still generate a PDF, but that appears to be because we use a custom binary. Not sure what the deal is here, but it's likely that this is borked in all other supported OSs.

This is a significant regression. These are incredibly useful for sending a human-readable form of the request to regulators, especially for Pro users where the request is private, and for exporting requests to share with others (though, perhaps slightly less necessary with #5608?).

In any case, we've decided to press ahead with getting #6615 merged, even with this issue. With the deprecation, we'll at least have a bit more freedom around fixing this.

gbp added a commit that referenced this issue Nov 11, 2021
As with other external commands we need are only going to support one
option (`wkhtmltopdf`). Doing so will make it easier to pass command
specific arguments to the tool in order to support later versions.

See: #6489
gbp added a commit that referenced this issue Nov 19, 2021
gbp added a commit that referenced this issue Nov 23, 2021
@gbp gbp closed this as completed in a747a19 Nov 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants