[BUG] Barcode splitting not detecting all barcodes #2385
Comments
Relevant environment:
Input and output files from above.
Probably some image format.
Shouldn't the PR you mentioned above fix this?
Just once I'd like to open a bug and be told that I just made a stupid mistake ;) So… it was working in the past with that specific page. I don't think it's the scanner, because I do see barcodes parsed correctly with the same settings (mentioned above). So is it something about this specific page? Too many barcodes? But why is there no message at all?
Fun… I found two bug reports for pyzbar: NaturalHistoryMuseum/pyzbar#75 and NaturalHistoryMuseum/pyzbar#63. Neither has been worked on, but the second one gave me the impression that it might have to do with the dimensions of the pictures. I grabbed the PNG from above and quickly resized it from 600dpi to 300 and 150dpi. Result below. Interestingly, it found even more barcodes in the smallest one. So it seems to stop after a certain number of… pixels?
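The resize experiment can be scripted roughly as follows. This is a minimal sketch, assuming Pillow and pyzbar (plus the zbar shared library) are installed; `scaled_size` and `count_per_scale` are hypothetical helper names, and the default factors simply halve the 600dpi original step by step.

```python
# Sketch: count how many barcodes pyzbar finds at several downscale factors.
# Assumes Pillow and pyzbar (plus the zbar shared library) are installed; both
# are imported inside count_per_scale so scaled_size works without them.
from typing import Dict, Iterable, Tuple


def scaled_size(size: Tuple[int, int], factor: float) -> Tuple[int, int]:
    """Return (width, height) multiplied by factor, never below 1 px."""
    width, height = size
    return max(1, round(width * factor)), max(1, round(height * factor))


def count_per_scale(
    path: str, factors: Iterable[float] = (1.0, 0.5, 0.25, 0.125)
) -> Dict[float, int]:
    """Decode the image at each scale factor; map factor -> barcodes found."""
    from PIL import Image
    from pyzbar.pyzbar import decode

    counts: Dict[float, int] = {}
    with Image.open(path) as original:
        for factor in factors:
            # LANCZOS is Pillow's high-quality downscaling filter
            # (Pillow falls back to NEAREST for mode "1" and "P" images).
            resized = original.resize(scaled_size(original.size, factor), Image.LANCZOS)
            counts[factor] = len(decode(resized))
    return counts
```

Calling `count_per_scale("page_600dpi.png")` (hypothetical file name) returns a dict mapping each factor to the number of decoded barcodes, which makes the "more codes at smaller sizes" effect easy to tabulate.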
Sigh. That's a unique finding. I checked the extracted image with … I'm not sure yet what to do, but I'll have to have a think about it.
I did a bit more testing, based on yesterday's script. I used 3 different images (the original as above, one with only one QR code in the upper left, and one with only one QR code in the lower right), each in 4 different resolutions (600, 300, 150 and 72dpi). Then I counted how many barcodes were detected:
Please note that the dpi values are not necessarily accurate: I simply scaled the original (600dpi) by factors of 0.5, 0.25 and 0.125 to get the downsized versions. Interestingly, with the original image (7 barcodes) it never gets it fully right. At 150dpi it detects all but one, but at 72dpi it detects only 4, even though the 72dpi version is still quite readable and my phone has no problem scanning the codes. Of the other two examples, it gets the lower-right one correct, but not the upper-left one in every case. I have no idea what it is doing there. To me, the following factors seem relevant:
The issue trackers of pyzbar and libzbar mention issues that seem related, but I found nothing that was being worked on and no actionable information. All test files are attached (e.g. upper_left_300dpi). Edit: Disabled image previews.
All right. I did some more tests and added yet another file to the test pool. I modified the above test script to see how to get the best results:
With this combination of resize and blur I got most of the cases right:
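A resize-and-blur step of this kind could look like the minimal sketch below, assuming Pillow and pyzbar. The factor (0.5) and blur radius (1.0) are placeholders, not the values from the script in this thread, and `preprocess`, `merge_results` and `decode_with_fallback` are hypothetical names. Decoding both the untouched and the preprocessed page and merging the results means the filter can only add detections, never lose ones the plain decode already found.

```python
# Sketch: decode a page as-is and after a resize + Gaussian blur, then merge
# the results. Assumes Pillow and pyzbar; factor=0.5 and radius=1.0 are
# placeholder values, not the ones from the script in this thread.
from typing import Iterable, List, Tuple

Barcode = Tuple[str, bytes]  # (symbology, payload), e.g. ("QRCODE", b"...")


def merge_results(runs: Iterable[Iterable[Barcode]]) -> List[Barcode]:
    """Union of several decode runs in first-seen order, duplicates removed.

    Note: two identical codes on the same page collapse into one entry.
    """
    seen = set()
    merged: List[Barcode] = []
    for run in runs:
        for item in run:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


def preprocess(image, factor: float = 0.5, radius: float = 1.0):
    """Convert a PIL image to grayscale, downscale it, then blur it lightly."""
    from PIL import Image, ImageFilter

    # Filters cannot be applied to palette images, so work in grayscale ("L").
    gray = image.convert("L")
    width, height = gray.size
    size = (max(1, round(width * factor)), max(1, round(height * factor)))
    return gray.resize(size, Image.LANCZOS).filter(ImageFilter.GaussianBlur(radius=radius))


def decode_with_fallback(image) -> List[Barcode]:
    """Decode the original and the preprocessed image, returning the union."""
    from pyzbar.pyzbar import decode

    runs = (decode(image), decode(preprocess(image)))
    return merge_results([(d.type, d.data) for d in run] for run in runs)
```

The union is keyed on (type, data) only, so it suits separator detection, where the question is whether a given code appears on the page at all, rather than counting repeated identical codes.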
The new 'small' variant is an image with one small QR code on it. The results look okay, but with the original image it still detects only 6 out of 7 barcodes. The execution time of the blur and resize is a factor here, but it may not matter much compared to the overall time spent parsing a document. With filter:
Without filter:
Newly created small test files:
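The with/without filter comparison can be reproduced with a small timing harness like the one below. It is a minimal sketch using only the standard library; `median_runtime` is a hypothetical name, and taking the median of several runs is a design choice so that one slow run (for example a cold cache) doesn't skew the comparison.

```python
# Sketch: compare the cost of two decode paths (e.g. with and without the
# resize/blur filter) using the median of several wall-clock runs.
import statistics
import time
from typing import Callable, List


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Call fn() `repeats` times and return the median duration in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

For example, compare `median_runtime(lambda: decode(page))` with `median_runtime(lambda: decode(blurred))`, where `decode` is `pyzbar.pyzbar.decode` and `page` and `blurred` are hypothetical PIL images prepared beforehand, so only the decode itself is timed. To include the filter's own cost, move the resize and blur inside the lambda.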
I'm not sure whether the following change has side effects, but it could help a bit with the problem, even though it is NOT a fix.
I think this is essentially NaturalHistoryMuseum/pyzbar/issues/63, given that scaling and image size seem to have the largest effect. There's also an interesting blog post on the subject: https://kdmurray.id.au/post/2022-03-21_decode-qrcodes/ I'm looking into what can be done on our side, ideally without turning into an image processor and taking up lots of time and memory...
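One low-cost option along those lines is to cap the longest side of a page before decoding, so very large scans cost a single bounded resize rather than a series of experiments. This is a minimal sketch under stated assumptions: `capped_size` and `capped_image` are hypothetical helpers, and `max_side=2048` is an arbitrary placeholder. The measurements above show that shrinking too far (72dpi) also loses codes, so the cap would need tuning against real samples.

```python
# Sketch: cap the longest side of a page before decoding. max_side=2048 is an
# arbitrary placeholder; the thread shows that shrinking too far (72dpi) also
# loses codes, so the value would need tuning against real samples.
from typing import Tuple


def capped_size(size: Tuple[int, int], max_side: int = 2048) -> Tuple[int, int]:
    """Scale (width, height) down so the longest side is at most max_side."""
    width, height = size
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough: never upscale
    return (
        max(1, round(width * max_side / longest)),
        max(1, round(height * max_side / longest)),
    )


def capped_image(image, max_side: int = 2048):
    """Return a PIL image whose longest side is at most max_side (assumes Pillow)."""
    from PIL import Image

    new_size = capped_size(image.size, max_side)
    if new_size == image.size:
        return image
    return image.resize(new_size, Image.LANCZOS)
```

The aspect ratio is preserved, and pages that are already below the cap are passed through untouched, so small scans pay no extra cost.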
With #2468, 6 of the 7 barcodes are now detected, and all existing barcodes still work. Unfortunately, unless libzbar and/or pyzbar get some bugfixes around image size, I think that's the best result we'll get.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.
Description
With Paperless 1.11.3, barcode splitting no longer works for me. It was working with my specific setup as of 1.6.0 or 1.7.0. I can't say for sure about the versions in between, since I mostly worked through old files that had already been split.
The split just doesn't happen, and I don't see any detection in the log. Interestingly, if I scan a document that contains (unrelated) barcodes, they are detected. I can't share those because they are payment codes with personal data in them.
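To illustrate why a missed detection shows up as "no split at all" rather than as an error, here is a minimal sketch of separator-based splitting. It assumes that a page whose decoded barcodes include the separator string is a split point and is itself dropped. `split_ranges` is a hypothetical helper, not Paperless-ngx's actual code; "PATCHT" is the documented default of `PAPERLESS_CONSUMER_BARCODE_STRING`.

```python
# Sketch of barcode-based splitting logic, assuming pages whose decoded
# barcodes include the separator string are split points and are dropped.
# "PATCHT" is the documented default of PAPERLESS_CONSUMER_BARCODE_STRING;
# split_ranges itself is a hypothetical helper, not Paperless-ngx's code.
from typing import List, Optional, Sequence, Tuple


def split_ranges(
    page_barcodes: Sequence[Sequence[str]], separator: str = "PATCHT"
) -> List[Tuple[int, int]]:
    """Return inclusive, 0-based (first, last) page ranges between separators."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for index, codes in enumerate(page_barcodes):
        if separator in codes:
            # Close the document that ended just before this separator page.
            if start is not None:
                ranges.append((start, index - 1))
            start = None
        elif start is None:
            start = index  # first page of a new document
    if start is not None:
        ranges.append((start, len(page_barcodes) - 1))
    return ranges
```

With six pages and separators detected on pages 1 and 4, this yields `[(0, 0), (2, 3), (5, 5)]`. If the decoder misses the second separator, it yields `[(0, 0), (2, 5)]`, and if it misses every separator, the whole file stays one document, `[(0, 5)]`, without any error being raised.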
I'm aware of the work that was done, for example in #1953 by @stumpylog, so I also tested with the most recent dev image on Docker Hub. No change.
The pdfminer.six error visible in the log does not occur if I re-feed the finished, downloaded document, but the problem stays the same.
Steps to reproduce
Webserver logs
Browser logs
No response
Paperless-ngx version
1.11.3
Host OS
Ubuntu 22.04.1 LTS / docker
Installation method
Docker - official image
Browser
Chrome
Configuration changes
No response
Other
No response