[Bug]: Math fonts (Type 3) incorrectly embedded in PDF? #21797
Comments
Consider the file Test.pdf, which causes the aforementioned error message in Adobe Acrobat. |
Please try with 3.5.0 |
I tested on 3.5.0 and the error message still exists. OTOH I'm not clear what it means, if anything.. |
I tried to get more details by using the Preflight tool from Adobe Acrobat, but unfortunately I could not get any. However, since the error message only arises for certain types of fonts, there seems to be an actual reason. The error message is significant insofar as it also arises if such a figure is embedded in a larger PDF (e.g. via LaTeX) and therefore presumably affects a whole lot of documents. |
.. but does it have a practical effect, in that the document looks bad? Are the fonts actually missing? I wouldn't rule out an Adobe bug here... |
As far as I can tell, the figures look fine, but it is hard to tell since I usually use the font Stix 2 while Matplotlib uses the former version and they differ anyway. According to the document properties shown in Adobe Acrobat (prior to displaying the error message) the font is embedded. Similarly,
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
DejaVuSans-Oblique                   Type 3            Custom           yes no  no      15  0
It could be a bug in Adobe Acrobat, or e.g., maybe something remotely similar to pull request #1808? Does anybody know a way to truly check the PDF, e.g. via the Preflight tool? |
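One rough way to see which font types a PDF declares, without Acrobat, is to scan the raw bytes for font subtype names. The following is only a minimal sketch: the helper name `count_font_subtypes` and the sample bytes are made up for illustration, and it assumes the font dictionaries are stored uncompressed in the file. It does not check whether a font is embedded and is no substitute for Preflight.

```python
import re
from collections import Counter

# Font subtypes defined by the PDF specification; other /Subtype values
# (e.g. /Image or /Form) belong to non-font objects and are ignored.
FONT_SUBTYPES = {b"Type0", b"Type1", b"MMType1", b"Type3", b"TrueType",
                 b"CIDFontType0", b"CIDFontType2"}


def count_font_subtypes(pdf_bytes):
    """Count font subtype names found in uncompressed PDF dictionaries."""
    found = re.findall(rb"/Subtype\s*/([A-Za-z0-9]+)", pdf_bytes)
    return Counter(name.decode("ascii") for name in found
                   if name in FONT_SUBTYPES)


# Hypothetical, hand-written fragment standing in for a real file's bytes.
sample = (b"<< /Type /Font /Subtype /Type3 /Name /DejaVuSans-Oblique >>\n"
          b"<< /Type /XObject /Subtype /Form >>\n"
          b"<< /Type /Font /Subtype /TrueType /BaseFont /DejaVuSans >>\n")
print(count_font_subtypes(sample))
```

For a real file, the bytes could be read with `open('Test.pdf', 'rb').read()` and passed to the same function.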
3.5 would instead produce a font subsetted file (with 6 characters in the beginning of font name)..
^when Not sure about the actual bug here, but I'd first try to reproduce it with latest Matplotlib build.. also these PDFs do not really bug other native PDF viewers that I've tried them with, so the Adobe Acrobat bug seems possible. |
Could you please add the two PDFs for Matplotlib 3.5 with font type 3 and TrueType, so I could check whether the error arises? |
@theBruegge here you go: |
Thanks @aitikgupta! The issue is still the same: for font type 3 the aforementioned error message arises, which states that the embedded font cannot be extracted (the font now has the name I already tried to get more details via the Preflight tool from Adobe Acrobat and from Adobe support, which unfortunately did not work out so far, but I'll try again ... |
I think I got some more details via the Preflight tool from Adobe Acrobat Pro: according to Preflight the font is not embedded if Font Type 3 is used, while TrueType fonts are shown to be embedded. In the former case, Preflight therefore tries to embed the font. This procedure is shown to be successful, if the DejaVu Fonts were previously installed on the system. However, while this procedure truly works for non-math text in DejaVuSans (the font is denoted as Embedded Subset afterwards in the Document Properties), embedding apparently does not work for math text in DejaVuSans-Oblique and Preflight vectorizes the math text. Unfortunately, this does still not reveal any details. However, in my experience printing companies check PDFs in that respect for instance using Preflight – which renders this issue actually problematic, at least to some extent. |
@theBruegge It's not clear to me what action you are suggesting for Matplotlib. |
@jklymak |
Finally, I got some more details – I hope it helps @aitikgupta and @jklymak! I contacted Adobe Care on Twitter via Direct Message and after some discussion and mailing them the file, they came to the following conclusion:
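# Dump the PDF's internal structure to XML for inspection (requires the pdfquery package)
from pdfquery import PDFQuery
file = 'Test.pdf'
pdf = PDFQuery(file)
pdf.load()  # parse all pages
# Write the parsed tree next to the input file as Test.xml
pdf.tree.write(file.replace('.pdf', '.xml'), pretty_print=True, encoding='utf-8')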
|
Sorry for bothering you again @aitikgupta, @dstansby, and @jklymak, but consider the following crucial question for this issue: does Matplotlib intentionally create vector paths for Greek math symbols instead of actual text? |
There are various locations in the PDF backend that check (for Type 3) whether the character code is < 256. I think this may be a misinterpretation of the spec, or at least a simplification. AFAICT, there is no requirement that Character Encodings are restrained to codes < 256. There is however a statement that 'With a simple font, each byte of the string shall be treated as a separate character code.' Since Greek characters are outside this range in Unicode/UTF-8, they all would not work and are output as However, there is this statement:
And we do implement a Unicode |
We can definitely drop some of the <256 checks, though things are a bit under-optimized that way. However, I hesitate to do much more with the |
@QuLogic, thanks for your previous efforts! Are there any updates on this topic? I tried the current Matplotlib version and tried to deactivate the <256 checks (more precisely, in |
Another example is from https://stackoverflow.com/questions/76057034/python-matplotlib-produces-larger-and-blurrier-pdf-than-r?noredirect=1#comment134141564_76057034 Note that if you do plt.rcParams["pdf.fonttype"] = 42 the file looks fine, but actually gets larger than if you use Type-3 fonts. If you use |
Not emitting an empty font embedding makes sense, but I would push back on adobe and ask if that is actually out of spec or not. If the spec does not forbid it we have found a bug in their tool :) |
Knowing what I know now, I would formulate my issue differently: it's actually not just about the error message in Adobe Acrobat (regardless of whether out of spec or not as mentioned by @tacaswell). The issue rather addresses the fact that characters with a code ≥ 256 are not embedded as text in the PDF but as a rendered path. |
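To make the ≥ 256 distinction concrete, here is a minimal sketch that splits a string by Unicode code point. The helper name `split_by_char_code` is made up for illustration and this is not Matplotlib's actual backend code. It only mirrors the `< 256` check described in the comments above, which decides whether a character ends up as text or as a path with Type 3 fonts.

```python
def split_by_char_code(text, limit=256):
    """Partition characters by whether their code point is below `limit`."""
    below = [c for c in text if ord(c) < limit]
    at_or_above = [c for c in text if ord(c) >= limit]
    return below, at_or_above


# "\u03b1" is Greek alpha (945), "\u03b2" is Greek beta (946),
# "\u00b5" is the Latin-1 micro sign (181).
below, at_or_above = split_by_char_code("x\u03b1\u03b2\u00b5")
print([hex(ord(c)) for c in below])        # ['0x78', '0xb5']
print([hex(ord(c)) for c in at_or_above])  # ['0x3b1', '0x3b2']
```

Note that the micro sign stays below the limit while the Greek letters do not, which matches the observation that Greek math symbols are the ones affected.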
Bug summary
PDFs containing math fonts cause the following error message in Adobe Acrobat: Cannot extract the embedded font. Some characters may not display or print correctly.
Code for reproduction
Actual outcome
Expected outcome
The font seems to be displayed correctly, but apparently it is not correctly embedded.
Additional information
To reproduce this issue, open the PDF created by Matplotlib in Adobe Acrobat and navigate to Document Properties > Fonts > OK. The error message is only displayed if one changes the zoom, switches pages, etc. The error message arises in recent versions of Adobe Acrobat on different operating systems.
The error message arises only for PDFs containing mathematical expressions, while PDFs with regular text cause no error message. However, the issue seems to arise independently of the actual font type and arises for the default font dejavusans and others such as stix. The issue furthermore arises for the default value of pdf.fonttype being Type 3. When switching to TrueType (causing much larger file sizes), the error message no longer arises.
Operating system
No response
Matplotlib Version
3.4.3
Matplotlib Backend
No response
Python version
No response
Jupyter version
No response
Installation
conda