This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

ImportError: pikepdf's extension library failed to import #8

Closed
l4rm4nd opened this issue Aug 27, 2021 · 55 comments · Fixed by #23

@l4rm4nd

l4rm4nd commented Aug 27, 2021


Expected Behavior

Uploading a document into the paperless-ng web application should trigger an OCR process and the file should be available afterwards.

Current Behavior

The upload process stops at 'Upload complete, waiting...' and nothing happens. The web application itself is fully functional, also the Django admin backend. Just the upload process seems not to work at all.

Steps to Reproduce

  1. Log into the paperless-ng web application
  2. Upload a sample pdf file using the "drop documents here" or "browse file" button
  3. Observe that the file is not successfully uploaded or processed

Environment

OS:
Debian GNU/Linux 9 (stretch) - Raspberry Pi 4

CPU architecture:
arm32

How docker service was installed:
Via portainer, official repo from DockerHub

Docker logs

19:28:16 [Q] INFO recycled worker Process-1:1
19:28:16 [Q] INFO Process-1:5 ready for work at 381
19:28:19 [Q] INFO Enqueued 1
19:28:19 [Q] INFO Process-1:2 processing [example.pdf]
[pid: 347|app: 0|req: 7/7] 192.168.178.7 () {42 vars in 1669 bytes} [Fri Aug 27 19:28:19 2021] POST /api/documents/post_document/ => generated 4 bytes in 33 msecs (HTTP/1.1 200) 10 headers in 292 bytes (2 switches on core 0)
[2021-08-27 19:28:19,256] [INFO] [paperless.consumer] Consuming example.pdf
19:28:19 [Q] INFO Process-1:2 stopped doing work
19:28:19 [Q] ERROR Failed [example.pdf] - pikepdf's extension library failed to import : Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 13, in <module>
    from . import _qpdf
ImportError: /usr/local/lib/python3.8/dist-packages/pikepdf/_qpdf.cpython-38-arm-linux-gnueabihf.so: undefined symbol: _ZN20QPDFPageObjectHelper16placeFormXObjectE16QPDFObjectHandleRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_9RectangleEbbb
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
    document = Consumer().try_consume_file(
  File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
    import ocrmypdf
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
    import pikepdf
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 16, in <module>
    raise ImportError(_msg) from _e
ImportError: pikepdf's extension library failed to import
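
The useful detail in a traceback like the one above is the chained cause (the `undefined symbol` line), which typically suggests that the compiled extension and the shared library it loads do not match. A minimal diagnostic sketch, assuming it is run with the container's Python interpreter; `diagnose_import` is a hypothetical helper name, not part of pikepdf or paperless-ng:

```python
import importlib


def diagnose_import(module_name):
    """Try to import a module.

    Returns (ok, messages); messages lists the ImportError and every
    chained cause (__cause__), outermost first.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        messages = []
        current = exc
        while current is not None:
            messages.append(f"{type(current).__name__}: {current}")
            current = current.__cause__
        return False, messages
    return True, []


if __name__ == "__main__":
    ok, messages = diagnose_import("pikepdf")
    print("pikepdf imports cleanly" if ok else "\n".join(messages))
```

If the import fails, the last message printed is the innermost cause, which is the line worth quoting in a report.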
@github-actions

Thanks for opening your first issue here! Be sure to follow the bug or feature issue templates!

@project-bot project-bot bot added this to To do in Issue & PR Tracker Aug 27, 2021
@henrygd

henrygd commented Aug 29, 2021

I'm experiencing the same issue on Ubuntu 20.04.3 (aarch64)

@garret

garret commented Sep 1, 2021

I am also having the same issue. Using this image on a raspberry pi 4B (4GB) with default raspberry pi os (32bit).

@zahnp4sta

same on raspberry 4b 8GB with Ubuntu 21.04 64bit.

@Darlekesh

Darlekesh commented Sep 14, 2021

I had the same issue on Ubuntu 20.04.1 (aarch64) and fixed it by connecting to the container and running this command:
apt update && apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall
It also works with newer versions of pikepdf, but I'm trying to stay on the same version as the one provided.
Edit: the paperless-ng version is version-ng-1.5.0
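
Since the workaround pins a specific pikepdf version, it can help to confirm which versions actually ended up installed. A small sketch, assuming Python 3.8 or newer (the interpreter version shown in the tracebacks in this thread); `installed_versions` is a hypothetical helper name:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from the traceback in this issue.
    for name, version in installed_versions(["pikepdf", "ocrmypdf"]).items():
        print(f"{name}: {version or 'not installed'}")
```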

@zahnp4sta

@Darlekesh yup, that worked. thank you!

@l4rm4nd l4rm4nd closed this as completed Sep 16, 2021
Issue & PR Tracker automation moved this from To do to Done Sep 16, 2021
@henrygd

henrygd commented Sep 16, 2021

Might want to leave this issue open until it's fixed in the actual image

@l4rm4nd l4rm4nd reopened this Sep 16, 2021
Issue & PR Tracker automation moved this from Done to PRs & in progress issues Sep 16, 2021
@ficusdragoneta

@Darlekesh Unfortunately that did not work for me on Raspberry Pi 4B (4GB) with default Raspberry Pi OS (32bit).

@garret

garret commented Sep 28, 2021

Since it has been taking so much time to resolve this issue, I reverted to the paperless-ng docker image offered by the official project. They luckily have an image for the rpi.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@vemek

vemek commented Nov 8, 2021

This is still an issue. Building the pikepdf wheel does not work on Docker for Mac (M1, aarch64):

Collecting pikepdf==2.16.1
  Downloading pikepdf-2.16.1.tar.gz (2.3 MB)
     |████████████████████████████████| 2.3 MB 1.7 MB/s
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting Pillow>=6.0
  Downloading Pillow-8.4.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.0 MB)
     |████████████████████████████████| 3.0 MB 37.0 MB/s
Collecting lxml>=4.0
  Downloading lxml-4.6.4-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.manylinux_2_24_aarch64.whl (6.5 MB)
     |████████████████████████████████| 6.5 MB 6.7 MB/s
Building wheels for collected packages: pikepdf
  Building wheel for pikepdf (pyproject.toml) ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 /usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpfayft38f
       cwd: /tmp/pip-install-0zsq3vrr/pikepdf_1d6b213d99e149d1a2c51e6a7d22ff7a
  Complete output (46 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-aarch64-3.8
  creating build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/_cpphelpers.py -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/_version.py -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/jbig2.py -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/objects.py -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/codec.py -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/_methods.py -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/_xml.py -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/__init__.py -> build/lib.linux-aarch64-3.8/pikepdf
  creating build/lib.linux-aarch64-3.8/pikepdf/models
  copying src/pikepdf/models/encryption.py -> build/lib.linux-aarch64-3.8/pikepdf/models
  copying src/pikepdf/models/outlines.py -> build/lib.linux-aarch64-3.8/pikepdf/models
  copying src/pikepdf/models/metadata.py -> build/lib.linux-aarch64-3.8/pikepdf/models
  copying src/pikepdf/models/image.py -> build/lib.linux-aarch64-3.8/pikepdf/models
  copying src/pikepdf/models/matrix.py -> build/lib.linux-aarch64-3.8/pikepdf/models
  copying src/pikepdf/models/__init__.py -> build/lib.linux-aarch64-3.8/pikepdf/models
  running egg_info
  writing src/pikepdf.egg-info/PKG-INFO
  writing dependency_links to src/pikepdf.egg-info/dependency_links.txt
  writing requirements to src/pikepdf.egg-info/requires.txt
  writing top-level names to src/pikepdf.egg-info/top_level.txt
  reading manifest file 'src/pikepdf.egg-info/SOURCES.txt'
  adding license file 'LICENSE.txt'
  adding license file 'licenses/license.wheel.txt'
  writing manifest file 'src/pikepdf.egg-info/SOURCES.txt'
  copying src/pikepdf/_qpdf.pyi -> build/lib.linux-aarch64-3.8/pikepdf
  copying src/pikepdf/py.typed -> build/lib.linux-aarch64-3.8/pikepdf
  running build_ext
  building 'pikepdf._qpdf' extension
  creating build/temp.linux-aarch64-3.8
  creating build/temp.linux-aarch64-3.8/src
  creating build/temp.linux-aarch64-3.8/src/qpdf
  aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/object.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/object.o -fvisibility=hidden -g0 -std=c++14
  aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/annotation.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/annotation.o -fvisibility=hidden -g0 -std=c++14
  aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/object_convert.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/object_convert.o -fvisibility=hidden -g0 -std=c++14
  aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/object_repr.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/object_repr.o -fvisibility=hidden -g0 -std=c++14
  aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/page.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/page.o -fvisibility=hidden -g0 -std=c++14
  aarch64-linux-gnu-gcc: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  aarch64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/tmp/pip-build-env-1ierem4r/overlay/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c src/qpdf/parsers.cpp -o build/temp.linux-aarch64-3.8/src/qpdf/parsers.o -fvisibility=hidden -g0 -std=c++14
  error: command 'aarch64-linux-gnu-gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for pikepdf
Failed to build pikepdf
ERROR: Could not build wheels for pikepdf, which is required to install pyproject.toml-based projects
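
The `Killed signal terminated program cc1plus` line means the compiler process was killed from outside. A common cause is the kernel's out-of-memory killer, but that is an assumption here; the thread does not confirm it. A minimal sketch for checking how much memory is available before retrying the build, assuming a Linux `/proc/meminfo`; the helper names are hypothetical:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value}.

    Values are the leading integer of each field (kB for most fields).
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_mib(text):
    """Return available RAM plus free swap, in MiB."""
    info = parse_meminfo(text)
    kib = info.get("MemAvailable", 0) + info.get("SwapFree", 0)
    return kib / 1024


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        print(f"{available_mib(handle.read()):.0f} MiB available (RAM + swap)")
```

Note that inside a container this file usually reports the memory of the host or VM rather than any limit applied to the container itself.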

@pblgomez

Same error here still

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@RefineryX

This is still an issue; I tried installing last night on a Raspberry Pi 4 (4 GB).

@rmiddle

rmiddle commented Dec 25, 2021

Also seeing the same issue. It looks like a task fails due to a memory issue on first boot. I just relaunched fresh (after removing all storage), and it looks like it didn't crash on its first boot.

07:21:01 [Q] INFO Process-1:8 ready for work at 428
[2021-12-25 07:22:27 +0000] [364] [CRITICAL] WORKER TIMEOUT (pid:394)
[2021-12-25 07:22:27 +0000] [364] [WARNING] Worker with pid 394 was terminated due to signal 6
07:23:53 [Q] INFO Enqueued 1

Attempting to import a .pdf and getting errors.

07:24:01 [Q] INFO Process-1:7 stopped doing work
07:24:01 [Q] ERROR Failed [2020_TaxReturn.pdf] - pikepdf's extension library failed to import : Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 13, in <module>
    from . import _qpdf
ImportError: /usr/local/lib/python3.8/dist-packages/pikepdf/_qpdf.cpython-38-aarch64-linux-gnu.so: undefined symbol: _ZN20QPDFPageObjectHelper16placeFormXObjectE16QPDFObjectHandleRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_9RectangleEbbb
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
    document = Consumer().try_consume_file(
  File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
    import ocrmypdf
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
    import pikepdf
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 16, in <module>
    raise ImportError(_msg) from _e
ImportError: pikepdf's extension library failed to import
07:24:01 [Q] INFO recycled worker Process-1:7
07:24:01 [Q] INFO Process-1:9 ready for work at 457
07:25:07 [Q] INFO Process-1:8 processing [bulldog-blossom-chicken-may]
07:25:07 [Q] INFO Enqueued 1
[2021-12-25 07:25:07,541] [INFO] [paperless.consumer] Consuming 2020_TaxReturn.pdf
07:25:15 [Q] INFO Process-1:8 stopped doing work
07:25:15 [Q] ERROR Failed [bulldog-blossom-chicken-may] - pikepdf's extension library failed to import : Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 13, in <module>
    from . import _qpdf
ImportError: /usr/local/lib/python3.8/dist-packages/pikepdf/_qpdf.cpython-38-aarch64-linux-gnu.so: undefined symbol: _ZN20QPDFPageObjectHelper16placeFormXObjectE16QPDFObjectHandleRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_9RectangleEbbb
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
    document = Consumer().try_consume_file(
  File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
    import ocrmypdf
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
    import pikepdf
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 16, in <module>
    raise ImportError(_msg) from _e
ImportError: pikepdf's extension library failed to import

@akashkj

akashkj commented Dec 26, 2021

Getting this issue on arm64 as of today:

2021-12-26T17:50:20.023669087Z 17:50:20 [Q] INFO Process-1:12 stopped doing work
2021-12-26T17:50:20.036110554Z Traceback (most recent call last):
2021-12-26T17:50:20.036112954Z   File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
2021-12-26T17:50:20.036115514Z     res = f(*task["args"], **task["kwargs"])
2021-12-26T17:50:20.036117994Z   File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
2021-12-26T17:50:20.036120474Z     document = Consumer().try_consume_file(
2021-12-26T17:50:20.036122834Z   File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
2021-12-26T17:50:20.036125394Z     document_parser.parse(self.path, mime_type, self.filename)
2021-12-26T17:50:20.036127834Z   File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
2021-12-26T17:50:20.036130314Z     import ocrmypdf
2021-12-26T17:50:20.036132634Z   File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
2021-12-26T17:50:20.036135234Z     from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
2021-12-26T17:50:20.036137714Z   File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
2021-12-26T17:50:20.036140474Z     import pikepdf
2021-12-26T17:50:20.036142674Z   File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 16, in <module>
2021-12-26T17:50:20.036145274Z     raise ImportError(_msg) from _e
2021-12-26T17:50:20.036147514Z ImportError: pikepdf's extension library failed to import
2021-12-26T17:50:20.036149874Z
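
Timestamped container logs are sometimes displayed out of chronological order, which makes tracebacks hard to follow. A minimal sketch for restoring the order, assuming every line starts with a fixed-width UTC timestamp like the ones above (they look like the output of `docker logs --timestamps`, but that is an assumption); `sort_by_timestamp` is a hypothetical helper name:

```python
import sys


def sort_by_timestamp(lines):
    """Sort log lines by their leading timestamp token.

    Fixed-width UTC timestamps compare correctly as plain strings, and
    sorted() is stable, so lines with equal timestamps keep their order.
    """
    def key(line):
        return line.split(" ", 1)[0]

    return sorted(lines, key=key)


if __name__ == "__main__":
    # Reads a pasted log from stdin and prints it in chronological order.
    for line in sort_by_timestamp(sys.stdin.read().splitlines()):
        print(line)
```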

@HydrelioxGitHub

Same issue here:

18:05:05 [Q] INFO recycled worker Process-1:1
18:05:05 [Q] INFO Process-1:5 ready for work at 453
[2021-12-26 18:06:56,329] [INFO] [paperless.management.consumer] Adding /data/consume/201905.pdf to the task queue.
18:06:56 [Q] INFO Enqueued 1
18:06:56 [Q] INFO Process-1:2 processing [201905.pdf]
[2021-12-26 18:06:56,630] [INFO] [paperless.consumer] Consuming 201905.pdf
18:06:57 [Q] INFO Process-1:2 stopped doing work
18:06:57 [Q] ERROR Failed [201905.pdf] - pikepdf's extension library failed to import : Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 13, in <module>
    from . import _qpdf
ImportError: /usr/local/lib/python3.8/dist-packages/pikepdf/_qpdf.cpython-38-arm-linux-gnueabihf.so: undefined symbol: _ZN20QPDFPageObjectHelper16placeFormXObjectE16QPDFObjectHandleRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_9RectangleEbbb
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
    document = Consumer().try_consume_file(
  File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
    import ocrmypdf
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
    import pikepdf
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 16, in <module>
    raise ImportError(_msg) from _e
ImportError: pikepdf's extension library failed to import

@jemelo

jemelo commented Jan 16, 2022

I had this same issue, and after some testing and with the help of previous comments here, the following commands worked for me on rpi 4GB
sudo apt-get install libxml2-dev libxslt-dev python-dev
sudo apt-get install libjpeg-dev zlib1g-dev
pip install wheel
apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall

@darkmattercoder

As @vemek stated, it is not building on Apple Silicon (M1).

@kurosch

kurosch commented Jan 21, 2022

@vemek @darkmattercoder

Could you try the following:

apt-get install libxml2-dev libxslt-dev python3-dev libjpeg-dev zlib1g-dev build-essential cython3 -y
python3 -m pip install wheel
python3 -m pip install lxml --force-reinstall
python3 -m pip install Pillow --force-reinstall
python3 -m pip install pikepdf --force-reinstall

I was getting similar issues on my raspberry pi 3. The above steps fixed it for me.

@AnomalieXB-6783746

Hey, I'm facing an issue that I suspect to be at least related to this one if not the same.

My Setup:

  • Raspberry-Pi 3B+
  • Paperless-ng in Docker over docker-compose with the following configuration:
      image: lscr.io/linuxserver/paperless-ng
      container_name: paperless
      environment:
        - DOCKER_MODS=linuxserver/mods:papermerge-multilangocr
        - PUID=1000
        - PGID=1000
        - PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}
        - OCRLANG=deu
        - PAPERLESS_OCR_LANGUAGE=deu
        - PAPERLESS_TASK_WORKERS=2
        - PAPERLESS_THREADS_PER_WORKER=1
      volumes:
        - /etc/localtime:/etc/localtime:ro
        - ./paperless/config:/config
        - ./paperless/data:/data
      ports:
        - 8000:8000

When I upload a file, processing stops with an error.

The corresponding log shows:

[2022-04-03 12:05:39,603] [INFO] [paperless.consumer] Consuming some.pdf
paperless     | 12:05:40 [Q] INFO Process-1:7 stopped doing work
paperless     | 12:05:40 [Q] ERROR Failed [some.pdf] - libxslt.so.1: cannot open shared object file: No such file or directory : Traceback (most recent call last):
paperless     |   File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
paperless     |     res = f(*task["args"], **task["kwargs"])
paperless     |   File "/app/paperless/src/documents/tasks.py", line 70, in consume_file
paperless     |     document = Consumer().try_consume_file(
paperless     |   File "/app/paperless/src/documents/consumer.py", line 245, in try_consume_file
paperless     |     document_parser.parse(self.path, mime_type, self.filename)
paperless     |   File "/app/paperless/src/paperless_tesseract/parsers.py", line 237, in parse
paperless     |     import ocrmypdf
paperless     |   File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
paperless     |     from ocrmypdf import helpers, hocrtransform, pdfa, pdfinfo
paperless     |   File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 23, in <module>
paperless     |     import pikepdf
paperless     |   File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 55, in <module>
paperless     |     from .models import (
paperless     |   File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/__init__.py", line 20, in <module>
paperless     |     from .metadata import PdfMetadata
paperless     |   File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/metadata.py", line 29, in <module>
paperless     |     from lxml import etree
paperless     | ImportError: libxslt.so.1: cannot open shared object file: No such file or directory
paperless     | 
paperless     | 12:05:40 [Q] INFO recycled worker Process-1:7
paperless     | 12:05:40 [Q] INFO Process-1:9 ready for work at 732

I suspect this is related because the pikepdf import is part of the failing path, as the log shows. I could not find out whether a fix for the bug described in this issue has already shipped in a newer version, which might in turn have caused this different but similar error.
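
The `cannot open shared object file` message comes from the dynamic loader, so it can be checked without involving pikepdf or lxml at all. A minimal sketch, assuming it is run inside the container; `can_load` is a hypothetical helper name and `libxslt.so.1` is the library name taken from the log:

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    soname = "libxslt.so.1"
    print(soname, "loads" if can_load(soname) else "is MISSING")
```

If the library is reported as missing, installing the system package that provides it is the likely next step.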

I tried resolving the problem the way @kurosch suggested but without any success.

I also tried to build the image locally, but that failed too because of other errors (not sure whether they are relevant to this issue).

I have been trying for over a week now and can't find a solution. Does anybody have an idea what else I could try to get the whole thing working?

@Roxedus
Member

Roxedus commented Apr 3, 2022

Why does this issue keep getting comments while no one tests the PR?

@AnomalieXB-6783746

AnomalieXB-6783746 commented Apr 3, 2022

Sorry that I failed to address you, @Roxedus.
At least in my fault scenario, your approach does not seem to solve my problem; I keep getting the following error:

paperless     | 13:20:27 [Q] INFO Process-1:7 stopped doing work
paperless     | 13:20:27 [Q] ERROR Failed [some.pdf] - libxslt.so.1: cannot open shared object file: No such file or directory : Traceback (most recent call last):
paperless     |   File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
paperless     |     res = f(*task["args"], **task["kwargs"])
paperless     |   File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
paperless     |     document = Consumer().try_consume_file(
paperless     |   File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
paperless     |     document_parser.parse(self.path, mime_type, self.filename)
paperless     |   File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
paperless     |     import ocrmypdf
paperless     |   File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
paperless     |     from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
paperless     |   File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
paperless     |     import pikepdf
paperless     |   File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 50, in <module>
paperless     |     from .models import (
paperless     |   File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/__init__.py", line 14, in <module>
paperless     |     from .metadata import PdfMetadata
paperless     |   File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/metadata.py", line 28, in <module>
paperless     |     from lxml import etree
paperless     | ImportError: libxslt.so.1: cannot open shared object file: No such file or directory
paperless     | 
paperless     | 13:20:28 [Q] INFO recycled worker Process-1:7
paperless     | 13:20:28 [Q] INFO Process-1:9 ready for work at 756

However, I am not 100% sure that my error is the same as the others', so your approach might still solve their problem.

@tobbenb
Member

tobbenb commented Apr 3, 2022

Can you try to exec into the container and do an apk add libxslt and try again?

@AnomalieXB-6783746

AnomalieXB-6783746 commented Apr 3, 2022

@tobbenb apk does not seem to exist in the container, and I failed to install it.
Instead I used apt and tried apt-get install libxslt-dev (libxslt alone did not seem to exist).
Yet the process failed with:

Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
/usr/sbin/ldconfig: 16: exec: /sbin/ldconfig.real: not found
/usr/sbin/ldconfig: 16: exec: /sbin/ldconfig.real: not found
dpkg: error processing package libc-bin (--configure):
 installed libc-bin package post-installation script subprocess returned error exit status 127
Errors were encountered while processing:
 libc-bin
E: Sub-process /usr/bin/dpkg returned an error code (1)

@tobbenb
Member

tobbenb commented Apr 3, 2022

Sorry, I don't know why I thought it was using Alpine as the base OS.
Try apt update first, then apt install libxslt1.1

@AnomalieXB-6783746

Quick and happy update!
Thanks to @tobbenb it now seems to work seamlessly. One note, in case it helps anyone in the future: after the apt update, installing libxslt1.1 did not succeed until I ran an apt upgrade.

I'm going to try to figure out whether it's just the re-installation of libxslt1.1 that fixes the problem or whether it's the combination with the image provided by @Roxedus.

I managed to reproduce this. Can you test lsiodev/paperless-ng:1.5.0-pikepdf to confirm?

When I get any new information on this topic, I'll post it here!
In the meantime, thanks for the support, guys :)

@AnomalieXB-6783746

So after a few more tests, it seems to work only on some files and not on others. I tried to find a common property among the working and the failing files, but I didn't find any correlation.

The error now occurring on some files is:

[2022-04-03 16:00:34,313] [INFO] [paperless.consumer] Consuming some.pdf
[2022-04-03 16:00:37,179] [ERROR] [paperless.consumer] Error while consuming document some.pdf: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)
Traceback (most recent call last):
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 241, in parse
    ocrmypdf.ocr(**args)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py", line 340, in ocr
    return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 359, in run_pipeline
    pdfinfo = get_pdfinfo(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py", line 157, in get_pdfinfo
    return PdfInfo(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 860, in __init__
    self._pages = _pdf_pageinfo_concurrent(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 644, in _pdf_pageinfo_concurrent
    executor(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_concurrent.py", line 82, in __call__
    self._execute(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 132, in _execute
    for result in results:
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 601, in _pdf_pageinfo_sync
    page = PageInfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 675, in __init__
    self._gather_pageinfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 721, in _gather_pageinfo
    for ci in _process_content_streams(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 528, in _process_content_streams
    yield from _find_regular_images(container, contentsinfo)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 446, in _find_regular_images
    yield ImageInfo(name=draw.name, pdfimage=pdfimage, shorthand=draw.shorthand)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 319, in __init__
    pim_icc = pim.icc
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/image.py", line 394, in icc
    self._icc = ImageCms.ImageCmsProfile(iccbytesio)
  File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 172, in __init__
    self._set(core.profile_frombytes(profile.read()))
  File "/usr/local/lib/python3.8/dist-packages/PIL/_util.py", line 19, in __getattr__
    raise self.ex
  File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 23, in <module>
    from PIL import _imagingcms
ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 290, in parse
    raise ParseError(f"{e.__class__.__name__}: {str(e)}")
documents.parsers.ParseError: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)
16:00:37 [Q] INFO Process-1:28 stopped doing work
16:00:37 [Q] ERROR Failed [some.pdf] - some.pdf: Error while consuming document some.pdf: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py) : Traceback (most recent call last):
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 241, in parse
    ocrmypdf.ocr(**args)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py", line 340, in ocr
    return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 359, in run_pipeline
    pdfinfo = get_pdfinfo(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py", line 157, in get_pdfinfo
    return PdfInfo(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 860, in __init__
    self._pages = _pdf_pageinfo_concurrent(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 644, in _pdf_pageinfo_concurrent
    executor(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_concurrent.py", line 82, in __call__
    self._execute(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 132, in _execute
    for result in results:
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 601, in _pdf_pageinfo_sync
    page = PageInfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 675, in __init__
    self._gather_pageinfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 721, in _gather_pageinfo
    for ci in _process_content_streams(
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 528, in _process_content_streams
    yield from _find_regular_images(container, contentsinfo)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 446, in _find_regular_images
    yield ImageInfo(name=draw.name, pdfimage=pdfimage, shorthand=draw.shorthand)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/pdfinfo/info.py", line 319, in __init__
    pim_icc = pim.icc
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/models/image.py", line 394, in icc
    self._icc = ImageCms.ImageCmsProfile(iccbytesio)
  File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 172, in __init__
    self._set(core.profile_frombytes(profile.read()))
  File "/usr/local/lib/python3.8/dist-packages/PIL/_util.py", line 19, in __getattr__
    raise self.ex
  File "/usr/local/lib/python3.8/dist-packages/PIL/ImageCms.py", line 23, in <module>
    from PIL import _imagingcms
ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/asgiref/sync.py", line 288, in main_wrap
    raise exc_info[1]
  File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
    document_parser.parse(self.path, mime_type, self.filename)
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 290, in parse
    raise ParseError(f"{e.__class__.__name__}: {str(e)}")
documents.parsers.ParseError: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
    document = Consumer().try_consume_file(
  File "/app/paperless/src/documents/consumer.py", line 266, in try_consume_file
    self._fail(
  File "/app/paperless/src/documents/consumer.py", line 70, in _fail
    raise ConsumerError(f"{self.filename}: {log_message or message}")
documents.consumer.ConsumerError: some.pdf: Error while consuming document some.pdf: ImportError: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.8/dist-packages/PIL/__init__.py)

16:00:37 [Q] INFO recycled worker Process-1:28
16:00:37 [Q] INFO Process-1:30 ready for work at 3007

Besides this probably being a different issue, I find it somewhat interesting that other exceptions occur during exception handling. I guess that should not happen.
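On the "During handling of the above exception, another exception occurred" lines: this is how Python reports a new exception raised inside an `except` block without `raise ... from`, so here it most likely reflects deliberate re-wrapping (ImportError to ParseError to ConsumerError) rather than a second failure. A minimal sketch of that behaviour follows; the `ParseError` class and the two `parse_*` functions are hypothetical stand-ins, not the paperless-ng code.

```python
import traceback


class ParseError(Exception):
    """Hypothetical stand-in for documents.parsers.ParseError."""


def parse_implicit():
    try:
        raise ImportError("cannot import name '_imagingcms' from 'PIL'")
    except ImportError as e:
        # Raising inside an except block records the original error
        # in __context__ automatically (implicit chaining).
        raise ParseError(f"{e.__class__.__name__}: {e}")


def parse_explicit():
    try:
        raise ImportError("cannot import name '_imagingcms' from 'PIL'")
    except ImportError as e:
        # "from e" sets __cause__, marking the wrapping as intentional.
        raise ParseError(f"{e.__class__.__name__}: {e}") from e


def render(func):
    """Run func and return the formatted traceback of the ParseError it raises."""
    try:
        func()
    except ParseError as exc:
        return "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return ""


implicit_text = render(parse_implicit)
explicit_text = render(parse_explicit)
```

`implicit_text` contains the "During handling of the above exception, another exception occurred" separator seen in the logs above, while `explicit_text` uses "The above exception was the direct cause of the following exception" instead. In both cases the root cause is the first traceback in the chain, here the failed `_imagingcms` import.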

@kurosch

kurosch commented Apr 3, 2022

@Roxedus I tested it and it still doesn't work. The error is identical to the one @AnomalieXB-6783746 posted.

@AnomalieXB-6783746

After some more research, it seems like this could be a bug in paperless-ng itself
Issue.

@nothing2obvi

Still having this issue. I opened a dupe issue, which was closed.

@darkmattercoder

Consider GitHub.com/Linux-Server/docker-paperless-ngx as an alternative. I think the maintenance effort for ng is unnecessary now that ngx has been established.

@github-actions

github-actions bot commented Jun 3, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@badermaiss

Same issue here with the latest image on docker running with portainer.
Here's the command docker pull linuxserver/paperless-ng to confirm.

Using default tag: latest
latest: Pulling from linuxserver/paperless-ng
Digest: sha256:c5d2a1006be929edb9098532e0fee68f9588ff265e651fb7d1ab78c9ac350426
Status: Image is up to date for linuxserver/paperless-ng:latest
docker.io/linuxserver/paperless-ng:latest

@jemelo's set of commands worked for me after accessing the container with docker exec -it paperless su

sudo apt-get install libxml2-dev libxslt-dev python-dev
sudo apt-get install libjpeg-dev zlib1g-dev
pip install wheel
apt install python3-dev build-essential -y && pip install pikepdf==2.16.1 --force-reinstall

Running on a Raspberry Pi 4B 4GB RAM with Raspberry Pi OS (64-bit).
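One way to check whether a workaround like this took effect is to try importing the compiled extension modules named in the tracebacks in this thread. The following is a minimal sketch, assuming it is run with the same Python interpreter the container uses; `check_imports` is a hypothetical helper, not part of paperless-ng, pikepdf, or Pillow.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; map name -> None on success, else the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # ModuleNotFoundError is a subclass of ImportError, so both
            # missing modules and broken extension libraries land here.
            results[name] = f"{exc.__class__.__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Module names taken from the tracebacks posted in this thread.
    report = check_imports(["pikepdf._qpdf", "PIL._imagingcms"])
    for name, error in report.items():
        print(f"{name}: {'OK' if error is None else error}")
```

If a module still reports an ImportError after reinstalling, the extension was probably not built for this architecture, or one of its shared libraries is missing.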

@smignon612

works for me, thx

@francescofum

Still having this issue on a Raspberry Pi 4GB running Ubuntu 22.04.

@ssabdb

ssabdb commented Aug 30, 2022

I'm still having the issue, rpi 4gb running ubuntu 20.04.

@thespad thespad mentioned this issue Sep 4, 2022
1 task
Issue & PR Tracker automation moved this from PRs & in progress issues to Done Sep 5, 2022
@jemelo

jemelo commented Oct 11, 2022 via email
