[BUG] Sanity check returns "Document xxx has no content." #1041
Comments
Well, I stumbled upon this a few weeks ago and looked into it. The message was completely right: the mentioned documents had no text in their content tab. The number is the unique database ID of the document; you can go to your running instance, append it after "/documents/", and you should be able to view the document.
The URL would look something like this:
As to why there is no text, there are plenty of possibilities: there is no text in the document or its images, OCR failed (PDFs are surprisingly not very standard), etc. Currently, there isn't a way to redo OCR on a document. I do agree that outputting the primary key isn't very user-friendly. When I can, I'll look at including the title and/or path.
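To make the ID-to-URL step concrete, here is a minimal Python sketch that pulls the document IDs out of the sanity checker output and builds a link for each one. The base URL `http://localhost:8000`, the sample IDs, and the helper name `document_urls` are assumptions for illustration only; the "/documents/" path is taken from the comment above, and the exact link format on your instance may differ.

```python
import re

# Matches sanity-checker lines such as "Document 123 has no content."
NO_CONTENT = re.compile(r"Document (\d+) has no content\.")


def document_urls(checker_output: str, base_url: str) -> list[str]:
    """Return one web UI link per document ID reported as having no content."""
    base = base_url.rstrip("/")
    return [
        f"{base}/documents/{doc_id}"
        for doc_id in NO_CONTENT.findall(checker_output)
    ]


if __name__ == "__main__":
    # Hypothetical checker output; replace with the real log text.
    sample = "Document 12 has no content.\nDocument 87 has no content.\n"
    for url in document_urls(sample, "http://localhost:8000"):
        print(url)
```

Pasting the real output of `document_sanity_checker` into `sample` should give a list of links to open one by one.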
Oh I see, thanks for the help. I figured it out, and it is as you described: for example, a picture ended up in my system, which obviously does not have any content. Thanks again for the kind help!
So "no content" actually means "no OCR data"? To a newer user without context, this is not obvious. No content = empty file.
I'm wondering how one would catch OCR errors or other import issues. I was not at all aware that documents are directly accessible via a URL! That's awesome.
Unless the OCR problem is an error which ocrmypdf can't work around, it will only output the issues to the log. Actual show-stopping issues would be reported to the web UI if uploading; otherwise, they still go to the log if using the consume folder. As for the URL, it's just a URL; I don't see anything to document there. Usually, if someone cares about a document via its primary key, they'd be in the API, which is documented.
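For anyone going the API route mentioned above, the following is a rough Python sketch of looking up a document by primary key and checking whether its text is empty. It assumes the REST endpoint is `/api/documents/<id>/`, that token authentication is sent as an `Authorization: Token ...` header, and that the response JSON carries a `content` field; verify all three against the API documentation for your version. The function names are hypothetical.

```python
import json
import urllib.request


def api_document_url(base_url: str, doc_id: int) -> str:
    """Build the (assumed) API endpoint for a single document."""
    return f"{base_url.rstrip('/')}/api/documents/{doc_id}/"


def has_no_content(document: dict) -> bool:
    """True if the (assumed) 'content' field is missing, empty, or whitespace."""
    return not (document.get("content") or "").strip()


def fetch_document(base_url: str, doc_id: int, token: str) -> dict:
    """Fetch one document's metadata as a dict; needs a reachable instance."""
    request = urllib.request.Request(
        api_document_url(base_url, doc_id),
        headers={
            "Authorization": f"Token {token}",  # assumed auth scheme
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `has_no_content(fetch_document(base, doc_id, token))` for each ID the sanity checker reports would confirm which documents really have no extracted text.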
Closed in branch by 04db521
It can be quite a useful feature for some, but many people just don't know that it can be used like that. Somebody in the discussions wanted to link documents to a task in their to-do app, for example. The direct link feature is really nice there, instead of having to download and attach a file. It also works when appending Keep in mind that not all Paperless users are tech guys.
Description
My sanity checker returned several lines of "Document xxx has no content."
Is there any way to find the Document via this number and how can I fix?
Do I need to be worried?
Steps to reproduce
go to docker-compose folder
run command
docker-compose exec webserver document_sanity_checker
Webserver logs
Paperless-ngx version
1.7.1
Host OS
Ubuntu 20.04.3 LTS x86_64
Installation method
Docker
Browser
No response
Configuration changes
No response
Other
No response