[BUG] Unsupported mime type application/octet-stream #776
Comments
Have you tried re-creating that PDF? I'm not sure. See jonaswinkler/paperless-ng#906, also jonaswinkler/paperless-ng#291 --> jonaswinkler/paperless-ng#201 |
Normally, a PDF would be detected as I'd be curious to see what |
I have a similar problem:
It's a (what I think is a) valid PDF created by FineReader with OCR recognition. If I examine the raw file (via vi), I can't find any "inode" or "x-empty" string. Where does this come from? |
Revisiting this, it seems likely there are 2 things happening here. To make a mime type of For the |
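To tell the two cases apart without going through libmagic, the file can be inspected directly. This is only a minimal sketch: it assumes that `inode/x-empty` corresponds to a zero-byte file (which is what libmagic reports for empty files) and that a well-formed PDF begins with the `%PDF-` header. The function name `classify_upload` and the 1024-byte window are hypothetical and not part of paperless-ngx.

```python
PDF_MAGIC = b"%PDF-"  # standard PDF header marker


def classify_upload(path, window=1024):
    """Roughly classify a file by looking at its first `window` bytes.

    Returns one of: "empty", "pdf", "pdf-with-leading-data", "unknown".
    This is a diagnostic aid, not a replacement for real mime detection.
    """
    with open(path, "rb") as fh:
        head = fh.read(window)

    if not head:
        # Zero-byte file: the case libmagic labels inode/x-empty.
        return "empty"

    offset = head.find(PDF_MAGIC)
    if offset == 0:
        return "pdf"
    if offset > 0:
        # Header exists but is not at the very start of the file.
        return "pdf-with-leading-data"
    return "unknown"
```

Running this against a file that fails to consume should show whether it arrived empty (for example, a partially written upload) or whether the PDF header is simply not at offset 0.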
I'm encountering this error with all account statement PDFs my bank produces. After tinkering with it for a while, I found out that for some reason they have extra data at the beginning of the file, before
Paperless will error out when trying to consume this file. However, if I simply erase everything before
I'm not sure it's the same problem exactly, but I thought this might help. A possible solution could be to look forward a few hundred characters when hitting this error to solve the case where a small amount of extra data is added at the beginning of the file? Although as I understand, this might be entirely dependent on an external library ( |
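The look-ahead idea above could be sketched roughly like this. It is an illustration under stated assumptions, not how paperless-ngx or its mime-detection library behave: it assumes the marker in question is the standard `%PDF-` header, only searches the first 1024 bytes, and writes a cleaned copy rather than touching the original. The name `strip_leading_data` is hypothetical.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # assumed marker; standard PDF header


def strip_leading_data(src, dst, window=1024):
    """Copy `src` to `dst`, dropping any bytes before the first %PDF- marker.

    Only the first `window` bytes are searched. Returns the number of bytes
    dropped (0 if the file already starts with the marker). Raises ValueError
    if no marker is found inside the window.
    """
    data = Path(src).read_bytes()
    offset = data[:window].find(PDF_MAGIC)
    if offset < 0:
        raise ValueError(f"no PDF header in the first {window} bytes of {src}")
    Path(dst).write_bytes(data[offset:])
    return offset
```

Usage would be something like `strip_leading_data("statement.pdf", "statement.clean.pdf")`, with both file names made up for the example. Depending on how the producer computed its cross-reference offsets, the stripped file may still need to be re-saved with a tool such as qpdf before every reader accepts it.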
Yes, I suspect |
Following @stumpylog's recommendation, I tried to set up a preconsume script to handle this. However, paperless does the mime-type check (and fails) before the preconsume script is run. See lines 272-291 of consumer.py. Is this the expected behavior? Or should the preconsume script be allowed to run before mime-type checking? |
|
@auberginepop
|
You need to have qpdf installed. I assume you are running Linux in which case use whatever method you normally use to install things. For example, |
Thanks for that. Do I need to do this if my paperless-ngx is running in a Docker container? I figured out that I have to run this in the container, but I have over 50 files in total, all invoices from Amazon, and 16 of them have the issue "Unsupported mime type inode/x-empty". |
Description
Getting "Unsupported mime type application/octet-stream : Traceback (most recent call last)" when ingesting a normal PDF file.
  File "/usr/local/lib/python3.9/site-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/usr/src/paperless/src/documents/tasks.py", line 70, in consume_file
    document = Consumer().try_consume_file(
  File "/usr/src/paperless/src/documents/consumer.py", line 211, in try_consume_file
    self._fail(MESSAGE_UNSUPPORTED_TYPE, f"Unsupported mime type {mime_type}")
  File "/usr/src/paperless/src/documents/consumer.py", line 69, in _fail
    raise ConsumerError(f"{self.filename}: {log_message or message}")
documents.consumer.ConsumerError: 110Tapestry2021TaxBill.pdf: Unsupported mime type application/octet-stream
Expected behavior
Other PDFs work fine.
Steps to reproduce
When uploading certain PDFs
Webserver logs
No response
Screenshots
No response
Paperless-ngx version
1.6
Host OS
Unraid
Installation method
Docker
Browser
Chrome
Configuration changes
No response
Other
No response