Unable open RC4 encrypted pdf #520

Closed
bilabar opened this issue Sep 16, 2023 · 17 comments

@bilabar

bilabar commented Sep 16, 2023

File "/Users/xxx/dev/get_pdf.py", line 151, in main
  with Pdf.open(pdf, password=password) as pdf:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/xxx/.pyenv/versions/3.11.0/lib/python3.11/site-packages/pikepdf/_methods.py", line 887, in open
  pdf = Pdf._open(
        ^^^^^^^^^^
RuntimeError: unable to load openssl legacy provider

OS: MacOS Catalina
OpenSSL: LibreSSL 2.8.3
Python: 3.11.0
PikePDF: 8.4.1
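
For callers who hit this traceback, the opaque error can at least be turned into something more actionable. The following is a minimal sketch, assuming the message text stays exactly as shown in the traceback above; `open_pdf`, `LEGACY_MARKER`, and the `opener` parameter are hypothetical names for illustration and are not part of pikepdf.

```python
LEGACY_MARKER = "unable to load openssl legacy provider"


def open_pdf(path, password="", opener=None):
    """Open a PDF, re-raising the legacy-provider failure with more context.

    `opener` defaults to pikepdf.open; it is injectable so the wrapper
    can be exercised on a machine that does not show the problem.
    """
    if opener is None:
        import pikepdf  # imported lazily so the sketch loads without pikepdf

        opener = pikepdf.open
    try:
        return opener(path, password=password)
    except RuntimeError as exc:
        # Only rewrite the specific failure reported in this issue
        if LEGACY_MARKER in str(exc):
            raise RuntimeError(
                f"{path}: the crypto library in use cannot load legacy "
                "ciphers such as RC4, so this file cannot be decrypted here"
            ) from exc
        raise
```

Any other `RuntimeError` is passed through unchanged, so the wrapper does not hide unrelated failures.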

@jbarlow83
Member

How was pikepdf installed? PyPI or some other source?

@jbarlow83
Member

Best bet is probably to change macOS build config to

-DUSE_IMPLICIT_CRYPTO=0 -DREQUIRE_CRYPTO_NATIVE=1

since presumably the macOS OpenSSL-LibreSSL implementation doesn't have/want obsolete crypto like RC4, and the easiest alternative is to compile QPDF's native crypto to ensure legacy support.

@bilabar
Author

bilabar commented Sep 16, 2023

How was pikepdf installed? PyPI or some other source?

I installed it with pip install pikepdf

@bilabar
Author

bilabar commented Sep 16, 2023

I tried building from source, but it is too slow. So I checked whether an older version still works, since I had installed pikepdf in the past and my code ran without error. The latest version that still works is 7.2.0. From version 8 onward, it raises the error above. Maybe something changed in the build system.

@jbarlow83
Member

The older binary wheels will have different binaries included.

I will probably need to change the build script for macOS.

@jbarlow83
Member

I could not reproduce this on macOS Ventura with a build against Homebrew's qpdf or with the most recent wheel.

Note that Catalina is 10.15. The wheel only claims support for 11.0 and newer, so it may not link old libraries correctly.
pikepdf-8.4.1-cp311-cp311-macosx_11_0_arm64.whl
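
The minimum macOS version a wheel claims is encoded in its filename, so the mismatch with Catalina (10.15) can be checked mechanically. This is a rough sketch that reads only the platform tag; `wheel_min_macos` and `host_satisfies` are hypothetical helper names, and the architecture is parsed but deliberately not compared.

```python
import platform
import re

_MACOS_TAG = re.compile(r"macosx_(\d+)_(\d+)_(\w+)\.whl$")


def wheel_min_macos(filename):
    """Return ((major, minor), arch) parsed from a macOS wheel filename."""
    match = _MACOS_TAG.search(filename)
    if match is None:
        raise ValueError(f"not a macOS wheel filename: {filename}")
    major, minor, arch = match.groups()
    return (int(major), int(minor)), arch


def host_satisfies(filename, host_version=None):
    """True if the host macOS version is at least the wheel's minimum."""
    if host_version is None:
        # platform.mac_ver()[0] is a string like '10.15.7' (empty off macOS)
        release = platform.mac_ver()[0] or "0.0"
        parts = (release.split(".") + ["0"])[:2]
        host_version = tuple(int(part) for part in parts)
    minimum, _arch = wheel_min_macos(filename)
    return tuple(host_version) >= minimum


name = "pikepdf-8.4.1-cp311-cp311-macosx_11_0_arm64.whl"
print(wheel_min_macos(name))
print(host_satisfies(name, host_version=(10, 15)))
```

Passing `host_version` explicitly, as in the last line, makes it possible to check a wheel against a machine other than the one running the script.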

Since Catalina is not supported by Homebrew or Apple anymore I'm going to have to close this as something I can't support. If it shows up in newer OS versions I will definitely check.

@sihil

sihil commented Dec 19, 2023

Hi @jbarlow83 - I'm seeing this on an Intel Mac with macOS 13.4.1.

Same error. Also started with 8.0.0 and later of pikepdf, 7.2.0 works for me; both installed using the provided binary wheels.

I also came across this blog post from @cfcurtis https://www.pdfstitcher.org/intel-mac-issue/ which indicated the problem might be limited to Intel Macs on Ventura (although 14/Sonoma has subsequently been released).

I'm going to stick with 7.2.0 for now, but please let me know if I can assist with debugging this or testing any fixes.

@jbarlow83
Member

The Intel macOS wheels pass the test suite
https://github.com/pikepdf/pikepdf/actions/runs/7237641719/job/19717537597
which checks for this issue in
tests/test_encrypt.py::test_encrypt_basic, among others.
Those tests save and load every type of encryption, creating encrypted files on the fly and using them for the test.

I have not seen pikepdf's test suite fail for this issue. cibuildwheel does create Python wheels, and then creates a new virtual environment and runs the tests there, so it should be capable. The ARM wheels are built on Cirrus CI.

Now, I have seen this issue show up... on the ocrmypdf build, also Intel Mac, although lately it's passing again.
https://github.com/ocrmypdf/OCRmyPDF/actions/runs/7139114746/job/19496180061

I suspect it may have to do with the Homebrew versions of libqpdf and openssl, the latter being the encryption provider. Try upgrading all of your Homebrew packages and pikepdf to 8.10.1. I'd also be curious whether you're able to install the pikepdf test suite and get it to fail.

@sihil

sihil commented Dec 19, 2023

Thanks for taking the time to reply.

My understanding of how wheels work has been somewhat broken by further experimentation. I updated my brew (which was more out of date than I expected, perhaps a couple of months), log below.

I then set about running the pike tests (apologies if this wasn't what you intended):

  • cloning pikepdf
  • creating/activating a virtualenv (3.10.12)
  • running pip install .
  • running pip install -e .[test]
  • running pytest -n auto (all pass)

I then changed the dependency in the project I was having problems with earlier back to the latest version of pikepdf. Now the tests that were failing with this error earlier all pass.

That implies to me that the wheel has some external dependency on some tool, but I expected the wheel to include all of the binary dependencies (essentially acting as statically linked).

Unfortunately it would be quite painful to back track from here. I tried downgrading qpdf again but the project's tests still pass.

==> Upgrading 64 outdated packages:
mesa 22.3.6_1 -> 23.3.1
pyenv 2.3.30 -> 2.3.35
libheif 1.16.2 -> 1.17.5
austin 3.5.0 -> 3.6.0
ffmpeg 6.0_1 -> 6.0_2
libksba 1.6.4 -> 1.6.5
gpgme 1.22.0 -> 1.23.2
gh 2.34.0 -> 2.40.1
pygments 2.16.1_1 -> 2.17.2
python-setuptools 68.2.2 -> 69.0.2
frei0r 2.3.1 -> 2.3.2
postgresql@12 12.16 -> 12.17_1
gnu-getopt 2.39.2 -> 2.39.3
mpfr 4.2.0-p12 -> 4.2.1
assimp 5.2.5 -> 5.3.1
libavif 1.0.1 -> 1.0.3
shared-mime-info 2.2 -> 2.4
python@3.12 3.12.0 -> 3.12.1
glib 2.78.1 -> 2.78.3
readline 8.2.1 -> 8.2.7
netpbm 11.02.03 -> 11.02.06
aom 3.7.0 -> 3.8.0
qt 6.5.1_3 -> 6.6.1
img2pdf 0.5.0 -> 0.5.1_1
libde265 1.0.12 -> 1.0.14
awscli 2.13.26 -> 2.15.2
nss 3.93 -> 3.96.1
node-build 4.9.121 -> 4.9.133
mujs 1.3.3 -> 1.3.4
sqlite 3.44.0 -> 3.44.2
svt-av1 1.7.0 -> 1.8.0
libgcrypt 1.10.2 -> 1.10.3
moreutils 0.67 -> 0.68
openjdk 20.0.2 -> 21.0.1
curl 8.4.0 -> 8.5.0
librist 0.2.7_4 -> 0.2.10
pngquant 2.18.0 -> 3.0.3
openssl@3 3.1.4 -> 3.2.0_1
ca-certificates 2023-08-22 -> 2023-12-12
gettext 0.22.3 -> 0.22.4
librsvg 2.56.3 -> 2.57.1
graphviz 8.1.0 -> 9.0.0
dav1d 1.2.1 -> 1.3.0
mupdf 1.22.2_1 -> 1.23.7
oniguruma 6.9.8 -> 6.9.9
ffmpeg@4 4.4.4 -> 4.4.4_1
openssl@1.1 1.1.1v -> 1.1.1w
python-cryptography 41.0.5 -> 41.0.7
python@3.10 3.10.13 -> 3.10.13_1
git-crypt 0.7.0 -> 0.7.0_1
ocrmypdf 15.4.3 -> 16.0.0
six 1.16.0_3 -> 1.16.0_4
libomp 17.0.2 -> 17.0.6
coursier/formulas/coursier 2.1.6 -> 2.1.8
jq 1.7 -> 1.7.1
poppler 23.10.0 -> 23.12.0
libxrandr 1.5.3 -> 1.5.4
python@3.11 3.11.6 -> 3.11.6_1
homebrew install  -> 11.6.4
imagemagick 7.1.1-19 -> 7.1.1-23
jasper 4.0.0 -> 4.1.1
git 2.42.0 -> 2.43.0
python@3.9 3.9.18 -> 3.9.18_1
gnutls 3.8.1 -> 3.8.2

@jbarlow83
Member

jbarlow83 commented Dec 19, 2023

What you tested seems fine.

Wheels include the binary dependencies necessary for execution, apart from binaries that are always available on the target system. So nothing in /usr/lib, /Library, /System gets bundled, but homebrew-derived libraries do. That way you don't have to install Homebrew to use Python wheels on a Mac.

You can use wheel unpack to see which .dylib files are included in the wheel.

pikepdf 8.8.0 on PyPI has: (bad version)

libcrypto.3.dylib libjpeg.8.3.2.dylib libqpdf.29.6.3.dylib libssl.3.dylib 

pikepdf 8.10.1 has: (good version)

libcrypto.3.dylib libjpeg.8.3.2.dylib libqpdf.29.6.4.dylib
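
A wheel is a zip archive, so the same listing can also be produced without unpacking anything. This is a small sketch using only the standard library; `bundled_dylibs` is a hypothetical helper name and the wheel path in the usage comment is a placeholder.

```python
import zipfile


def bundled_dylibs(wheel_path):
    """Return the sorted base names of .dylib files inside a wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(
            name.rsplit("/", 1)[-1]
            for name in wheel.namelist()
            if name.endswith(".dylib")
        )


# Usage (placeholder path to a wheel downloaded with `pip download`):
# print(bundled_dylibs("path/to/pikepdf.whl"))
```

Comparing the lists from two wheel versions shows at a glance whether a library such as libssl.3.dylib was added or dropped between releases.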

It looks like the inclusion of libssl.3 is the problem. For whatever reason (no relevant build changes on my end), pikepdf 8.8.0 bundled a broken libssl, presumably with legacy providers switched off. pikepdf 8.10.1 does not bundle this library, so some other binary takes over. You have both openssl@1.1 and openssl@3 installed; I don't know how priority between them is settled. It could also be that the system openssl-libressl does the job instead.

If you're able to explore further https://github.com/matthew-brett/delocate is the program that gets used to decide what binaries go in a wheel. (I don't have an Intel Mac.) That would answer where a given pikepdf is getting its dylibs from.

How this happened, I still don't know. I checked qpdf, pikepdf, delocate, and Homebrew's openssl package, and didn't see any relevant changes. I've been tracking this issue for about two weeks, since it broke my ocrmypdf build.

@bilabar
Author

bilabar commented Dec 20, 2023

I just tested which versions work on my machine:
8.3.0 - 8.5.2 work
8.0 - 8.2 and 8.6 - 8.8 raise RuntimeError: unable to load openssl legacy provider
8.9 - 8.10 are built for macOS 12, so they do not work

OS Catalina
Python 3.10
OpenSSL from Homebrew: 1.1, 3.1.2
OpenSSL from MacPorts: 3.2.0

For now, my openssl points to the MacPorts version: OpenSSL 3.2.0 23 Nov 2023 (Library: OpenSSL 3.2.0 23 Nov 2023)
If I remember correctly, I had already installed openssl from Homebrew when I opened this issue, but I am not sure why it was still pointing to LibreSSL.

@sihil

sihil commented Dec 20, 2023

To clarify, I'm fairly confident I originally had the issue with 8.10.1 (I've been migrating an app from PyPDF2 to PikePDF and started with the latest). I then worked back through the versions and landed on 7.2.0 being the latest version that worked, with 8.0.0+ not working (I didn't try all versions, just a binary style search).

My IntelliJ IDEA local history backs up my recollection of the original version I'd tested against.

A quick test shows that

  • 8.8.0 is indeed reproducible as broken
  • 8.10.1 now seems to work

I've reinstalled older versions of openssl@3 and openssl@1.1 under Homebrew and that hasn't recreated the problem.

I'm out of my depth now, but having unpacked the 8.10.1 wheel and looked at what the libraries load, there really shouldn't be any external dependencies that my Homebrew versions would affect. I'm going to ask colleagues to see if they can reproduce.

❯ otool -l libqpdf.29.6.4.dylib | grep '/'
         name /DLC/pikepdf/.dylibs/libqpdf.29.6.4.dylib (offset 24)
         name /usr/lib/libz.1.dylib (offset 24)
         name @loader_path/libjpeg.8.3.2.dylib (offset 24)
         name @loader_path/libcrypto.3.dylib (offset 24)
         name /usr/lib/libc++.1.dylib (offset 24)
         name /usr/lib/libSystem.B.dylib (offset 24)
         path @loader_path/../lib (offset 12)
❯ otool -l libcrypto.3.dylib | grep '/'
         name /DLC/pikepdf/.dylibs/libcrypto.3.dylib (offset 24)
         name /usr/lib/libSystem.B.dylib (offset 24)
❯ otool -l libjpeg.8.3.2.dylib | grep '/'
         name /DLC/pikepdf/.dylibs/libjpeg.8.3.2.dylib (offset 24)
         name /usr/lib/libSystem.B.dylib (offset 24)
         path @loader_path/. (offset 12)
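
The listings above can be sorted into system, bundled, and external dependencies mechanically. This is a rough sketch that parses the `name` lines of `otool -l` text output; `classify_dependencies` and `SYSTEM_PREFIXES` are hypothetical names, and it assumes the first `name` entry is the library's own install name, as in the output shown here.

```python
SYSTEM_PREFIXES = ("/usr/lib/", "/System/", "/Library/")


def classify_dependencies(otool_output):
    """Group the `name` entries of `otool -l` output by where they resolve."""
    names = []
    for line in otool_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "name":
            names.append(fields[1])

    groups = {"system": [], "bundled": [], "external": []}
    # Assumption: the first entry is the library's own install name, so skip it
    for path in names[1:]:
        if path.startswith(SYSTEM_PREFIXES):
            groups["system"].append(path)
        elif path.startswith("@"):
            # @loader_path / @rpath entries resolve relative to the wheel
            groups["bundled"].append(path)
        else:
            # Anything else would be loaded from outside the wheel
            groups["external"].append(path)
    return groups
```

An empty `external` list for every bundled library would support the reading that the wheel has no load-time dependency on Homebrew paths.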

@sihil

sihil commented Dec 20, 2023

A colleague has managed to reproduce the error on 8.10.1 on:

ProductName:	macOS
ProductVersion:	12.6.6
BuildVersion:	21G646

After he ran brew upgrade openssl the issue went away.

I then managed to reproduce the issue, on my machine, against 8.10.1 by downgrading my brew openssl to 3.1.4 (downloading and installing the formula direct) and ensuring I delete the new version (3.2.0_1). brew upgrade openssl makes the issue go away again.

I've run the test suite with the older version of openssl installed and it works just fine!

I suspect that packaging a newer libssl in the wheel might resolve the issue for everyone and that the short term workaround is to upgrade their brewed openssl.

For reference, to install openssl version 3.1.4 I:

  • downloaded the 3.1.4 formula
  • brew unlink openssl@3
  • brew install openssl@3.rb (the file from above)
  • rm -rf /usr/local/Cellar/openssl@3/3.2.0_1

@cfcurtis

Thanks for the mention, I've completely ignored this problem and I have to admit I thought it was an issue with PyInstaller. I'm glad to hear you're making some headway! I don't have an Intel mac myself so it's been very challenging to debug, and when I tried to reproduce it on the GitHub action runners I couldn't get it consistently. I wonder if running brew upgrade openssl prior to PyInstaller will fix things as well...

@jbarlow83
Member

I won't try to change how delocate decides what to include in the wheel -- "not my circus, not my monkeys". Especially deciding whether to include libssl which has security implications.

The solution seems to be to upgrade openssl to 3.2.0_1 or newer. That's easy enough to ask of users.

If it keeps coming up I'll add a note to the error message that "This means your openssl is out of date". Another option in the bag of tricks is to switch to gnutls - I'm avoiding that because it will make the macOS build more complex.
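
For anyone scripting that check, Homebrew version strings such as 3.1.4 and 3.2.0_1 can be compared once the revision suffix is split off. This is a minimal sketch that handles only the dotted-number form with an optional `_N` revision seen in this thread; both function names are hypothetical, and the installed version string is assumed to come from something like `brew list --versions openssl@3`.

```python
def parse_brew_version(text):
    """Parse 'X.Y.Z' or 'X.Y.Z_R' into ((X, Y, Z), R)."""
    version, _, revision = text.partition("_")
    numbers = tuple(int(part) for part in version.split("."))
    return numbers, int(revision or 0)


def openssl_is_new_enough(installed, minimum="3.2.0_1"):
    """True if the installed Homebrew openssl is at least `minimum`."""
    return parse_brew_version(installed) >= parse_brew_version(minimum)


print(openssl_is_new_enough("3.1.4"))
print(openssl_is_new_enough("3.2.0_1"))
```

Versions with letter suffixes, such as 1.1.1w, are outside what this sketch parses and would raise a `ValueError`.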

@cfcurtis

Unfortunately upgrading openssl did not work when bundling the app with PyInstaller. However, thanks for the tip on v7.2.0 - I will try rolling back the pikepdf version and see if that helps.

jbarlow83 added a commit that referenced this issue Dec 30, 2023
Users reported trouble with opening legacy encrypted files on macOS
specifically, e.g. #520

It appears this is because we were using Homebrew's qpdf which is
currently linked against Homebrew's openssl, which is phasing out
legacy crypto and needs to be compiled in a special mode to enable it.
Understandable but inconvenient for us.

So now we build and link against our own libqpdf, which in turn is
linked against gnutls, which does not seem to have the same issues.
All of our Linux and macOS builds are doing the same thing now,
rather than being split on crypto provider.
@jbarlow83
Member

@cfcurtis 8.11.1 replaces openssl with gnutls which should...hopefully.. fix this.
