[BUG] PDFs Not Imported Properly Due to Missing Fonts #1135
Comments
This appears to be a duplicate of #277. With the LSIO image, you can probably add a custom user start script that installs the fonts-roboto package, which appears to be the missing font in this particular case.
The details on a custom startup script are here: https://www.linuxserver.io/blog/2019-09-14-customizing-our-containers
If that resolves the issue, let me know. GhostScript doesn't have the best documentation on where it looks for fonts.
For testing, I ran the commands below in the currently running Paperless-NGX container:

    apt-get -qq update
    apt-get install -y fonts-roboto

After that, I printed a random email from GMail to PDF and waited for it to import. Unfortunately, the imported PDF is still blank. Below is the log from the import.
Ok, it seems this is a font that Android includes; basically the only references I can find are in the AOSP. I believe a PDF should embed its fonts so that it is self-contained. That's kind of the point of PDF, so it looks the same for everyone. But this one isn't doing that. So, to test that theory, if you are able to:
(fc-cache generates the font caches after the change: https://linux.die.net/man/1/fc-cache)
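The "a PDF should embed its fonts" theory can be checked mechanically. In the PDF format, a simple font counts as embedded when its /FontDescriptor references a font program through /FontFile (Type 1), /FontFile2 (TrueType) or /FontFile3 (CFF-based); composite /Type0 fonts carry that descriptor on their descendant CIDFont, and /Type3 glyphs are drawn by the PDF itself. The sketch below is illustrative only: it assumes the font dictionaries have already been parsed into plain Python dicts by some PDF library (not shown), and `is_font_embedded` / `is_subset` are hypothetical helper names. It also flags subset fonts, whose names carry a six-letter tag such as ABCDEF+Roboto-Regular.

```python
from typing import Any, Mapping, Sequence

# Font-descriptor keys that point at an embedded font program:
# /FontFile (Type 1), /FontFile2 (TrueType), /FontFile3 (CFF-based).
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def is_font_embedded(font: Mapping[str, Any]) -> bool:
    """Return True if a parsed PDF font dictionary carries its own font program."""
    subtype = font.get("/Subtype")
    if subtype == "/Type3":
        # Type 3 glyphs are defined by content streams inside the PDF itself.
        return True
    if subtype == "/Type0":
        # Composite fonts keep the descriptor on their (single) descendant CIDFont.
        descendants: Sequence[Mapping[str, Any]] = font.get("/DescendantFonts") or []
        return len(descendants) > 0 and is_font_embedded(descendants[0])
    # Simple fonts: embedded only if the descriptor links a font program.
    descriptor = font.get("/FontDescriptor") or {}
    return any(key in descriptor for key in FONT_FILE_KEYS)


def is_subset(font: Mapping[str, Any]) -> bool:
    """Subset fonts carry a six-uppercase-letter tag, e.g. /ABCDEF+Roboto-Regular."""
    name = str(font.get("/BaseFont", "")).lstrip("/")
    tag, plus, _ = name.partition("+")
    return plus == "+" and len(tag) == 6 and all("A" <= c <= "Z" for c in tag)


# Hypothetical example: a TrueType font that is referenced but not embedded.
missing_roboto = {
    "/Type": "/Font",
    "/Subtype": "/TrueType",
    "/BaseFont": "/Roboto-Regular",
    "/FontDescriptor": {"/FontName": "/Roboto-Regular", "/Flags": 32},
}
print(is_font_embedded(missing_roboto), is_subset(missing_roboto))  # False False
```

A font referenced this way, with a descriptor but no FontFile* entry, leaves the renderer (Ghostscript here) to find the font on the system or substitute one.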
I totally agree that PDFs should contain any fonts they use so they are self-contained. The GMail app did this up until about June 3; I don't recall seeing errors like this before, and the PDFs imported just fine into Paperless. Kinda frustrating that this changed. :( I got a chance to troubleshoot this a bit this morning. I ran the above commands, as you suggested, in the same container we have been using. The commands worked, but Paperless still threw errors when attempting to process the PDF. I copied the TTF file into

    #!/bin/bash
    mkdir /usr/share/fonts/truetype/roboto
    cp /config/custom-cont-init.d/RobotoStatic-Regular.ttf /usr/local/share/fonts/
    fc-cache -v

I restarted the container and verified that it ran the script. I logged into the container and verified that the font cache saw the RobotoStatic font:
I tried uploading the document again, but Ghostscript still fails to find the font, and the uploaded PDF is still mostly blank. It appears someone created a similar issue for the LSIO container: linuxserver/docker-paperless-ngx#25. I tried installing the

    #!/bin/bash
    mkdir /usr/share/fonts/truetype/droid
    cp /config/custom-cont-init.d/DroidSansFallback.ttf /usr/share/fonts/truetype/droid/DroidSansFallbackFull.ttf
    fc-cache -v

This time, when I uploaded the PDF, Ghostscript did find the fallback font and used it. However, every part of the PDF that uses it comes out as gibberish characters. I verified that the TTF file itself is OK, so I am not sure what happened. Here is the log output for reference:
It appears that, although the container recognizes the font as installed, Ghostscript doesn't use
Yep, that's about what I get as well. Very strange that Ghostscript finds the font during … For the future, I think my idea would be a directory (…
I have learned more about fonts in the last few hours than I would care to admit. :) Using
If I download an older PDF that imported into Paperless just fine, I see similar output.
I am not sure what to make of this; just pointing it out.
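Comparing font listings like this by eye is tedious. Assuming the tool in question is poppler's `pdffonts` (the command name above is cut off, so this is a guess), its table can be parsed and the non-embedded fonts listed. The sketch below takes column boundaries from the dashed separator line, because the type column can contain spaces (e.g. `CID TrueType`); it assumes font names fit inside their column, and `parse_pdffonts` / `not_embedded` are hypothetical helper names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FontRow:
    name: str
    font_type: str
    embedded: bool
    subset: bool


def parse_pdffonts(output: str) -> list[FontRow]:
    """Parse the fixed-width table printed by poppler's `pdffonts`."""
    lines = output.splitlines()
    sep = next((i for i, line in enumerate(lines) if line.startswith("---")), None)
    if sep is None or sep == 0:
        raise ValueError("no pdffonts header/separator found")
    header = lines[sep - 1]
    # Each run of dashes in the separator line marks one column: record (start, end).
    spans: list[tuple[int, int]] = []
    start = None
    for i, ch in enumerate(lines[sep] + " "):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            spans.append((start, i))
            start = None
    columns = [header[s:e].strip() for s, e in spans]
    rows = []
    for line in lines[sep + 1:]:
        if not line.strip():
            continue
        # Slice each data line with the same spans and key the cells by header name.
        cells = dict(zip(columns, (line[s:e].strip() for s, e in spans)))
        rows.append(FontRow(
            name=cells["name"],
            font_type=cells["type"],
            embedded=cells["emb"] == "yes",
            subset=cells["sub"] == "yes",
        ))
    return rows


def not_embedded(rows: list[FontRow]) -> list[str]:
    """Names of fonts a renderer must find on the system or substitute."""
    return [row.name for row in rows if not row.embedded]
```

Feeding it the text output of `pdffonts` for an old "good" PDF and a new "bad" one would make any difference in the `emb` column explicit. Looking columns up by header name, rather than by position, also tolerates older `pdffonts` builds that omit the `encoding` and `uni` columns.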
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
We should probably set
Yes, agreed. The different options seem to behave in weird ways; I thought
Oh, I think I know what's going on... I took the liberty of applying a fix. :) #1237
I am circling back to this issue to report that it is no longer an issue (sort of). E-mails that I print to PDF from GMail on my Android phone now import properly and show text (with OCR). I verified this with several recent e-mails. I still had the "bad" PDF from earlier in this thread, and uploading it still produces a blank image. However, re-printing it to PDF (from Android GMail) and importing the result now succeeds. As a note, when I try to view the "bad" PDF in Xreader, it also shows up blank. When I open it in Brave Browser, the text is there but is definitely rendered in a substitute font. The new "good" version renders properly in both. Below is the output from
Either way, this was obviously a bug in the GMail app that is now resolved.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.
Description
I have been running Paperless NGX for a while and have developed a (still manual) workflow for when I get an e-mail receipt (where the e-mail itself is a receipt and does not have an attached PDF). In the GMail app on my Android phone, I print the email and select "Save as PDF" as the printer. I save the PDF into a Syncthing folder which syncs the paperless-consume directory to my docker server. This has been working fine for some time.
This morning, I realized that, starting on June 3, all of the PDFs generated this way have failed to import properly. Paperless "successfully" imports them, but all of the text and most of the images are missing. Based on the message below, this appears to be due to missing embedded fonts.
To be clear, the issue is primarily with the GMail app. Using the same "Save as PDF" printer method with other apps seems to work just fine.
Notes:
Steps to reproduce
Webserver logs
Paperless-ngx version
1.7.1
Host OS
Ubuntu 20.04 / Docker
Installation method
Docker
Browser
No response
Configuration changes
No response
Other
No response