subset_fonts error exit without exception/warning #3470

Open
ragebear00 opened this issue May 13, 2024 · 6 comments
Labels
fix developed release schedule to be determined

Comments

@ragebear00

ragebear00 commented May 13, 2024

Description of the bug

In the new PyMuPDF 1.24.3, if any error occurs in doc.subset_fonts(), the process ends without any warning or error number. In PyMuPDF 1.23.26, doc.subset_fonts() raises the error.

How to reproduce the bug

In PyMuPDF 1.23.26:
Traceback (most recent call last):
  File "C:_a\PDF_Searchable_v1.py", line 346, in pdfSearhable4
    doc.subset_fonts()
  File "C:\Users\6\AppData\Local\Programs\Python\Python310\lib\site-packages\fitz\utils.py", line 5631, in subset_fonts
    width_table, def_width = get_old_widths(font_xref)
  File "C:\Users\6\AppData\Local\Programs\Python\Python310\lib\site-packages\fitz\utils.py", line 5350, in get_old_widths
    df_xref = int(df[1][1:-1].replace("0 R", ""))
ValueError: invalid literal for int() with base 10: '<</BaseFont/CIDFont+F1/CIDSystemInfo<</Ordering 97 /Registry 98 /Supplement 0>>/CIDToGIDMap/Identity/FontDescriptor<</Ascent 952/CapHeight 631/Descent -268/Flags 6/FontBBox 99 /FontFile2 100 /FontNam
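
The failing expression in the traceback can be reproduced in isolation. The sketch below assumes that `df[1]` is the string value of an array entry (most likely the font's descendant-font array, though the traceback does not say so) and that it normally holds a single indirect reference such as `[12 0 R]`. The helper name `parse_descendant_xref` and the sample strings are hypothetical. Only the `int(...)` expression is taken from the traceback.

```python
def parse_descendant_xref(df_value: str) -> int:
    """Mimic the expression from the traceback: strip the enclosing
    brackets, drop the '0 R' suffix, and convert the rest to int."""
    return int(df_value[1:-1].replace("0 R", ""))


# Assumed usual case: the array holds one indirect reference.
ok = parse_descendant_xref("[12 0 R]")

# Hypothetical, shortened stand-in for the value in the error message:
# the array holds an inline dictionary instead of a reference.
inline = "[<</BaseFont/CIDFont+F1/CIDToGIDMap/Identity>>]"
try:
    parse_descendant_xref(inline)
    failed = False
except ValueError:
    # int() cannot parse '<</BaseFont/...>>', matching the reported error
    failed = True
```

If this reading is right, the old Python code only expected an indirect reference at this position, which would explain why it fails on this file.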

PyMuPDF version

1.24.3

Operating system

Windows

Python version

3.10

@JorjMcKie
Collaborator

This post cannot be accepted without a reproducing file.
To circumvent an urgent situation, please use the argument fallback=True.
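
A minimal sketch of the suggested workaround, assuming `doc` is an already opened PyMuPDF document whose `subset_fonts` method accepts the `fallback` argument named above. The wrapper name `subset_fonts_with_fallback` is hypothetical, and catching only `ValueError` is an assumption based on the traceback in this issue.

```python
def subset_fonts_with_fallback(doc) -> bool:
    """Call doc.subset_fonts(fallback=True).

    Returns True if the call completed, False if it raised ValueError
    (the exception type seen in this issue's traceback).
    """
    try:
        doc.subset_fonts(fallback=True)
    except ValueError as exc:
        # Report the problem instead of letting the script stop here.
        print(f"subset_fonts(fallback=True) failed: {exc}")
        return False
    return True
```

With a real file, `doc` would come from something like `fitz.open(...)`, and the caller can decide whether to save the document without subsetting when the function returns False.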

@ragebear00
Author

ragebear00 commented May 13, 2024

Running doc.subset_fonts() on the attached file raises an error in an earlier version.
Attachment: 1 - Copy.pdf

With fallback=True, doc.subset_fonts() raises the same error.

In the new version (without fallback), no error is raised, but the file written by doc.save() after doc.subset_fonts() has scrambled words.

@cbm755
Contributor

cbm755 commented May 15, 2024

I can reproduce the previous comment:

In [2]: fitz.version
Out[2]: ('1.23.3', '1.23.2', '20230831000001')

In [3]: d = fitz.open("1.-.Copy.pdf")

In [4]: d.subset_fonts()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 d.subset_fonts()

File /usr/lib64/python3.12/site-packages/fitz/utils.py:5448, in subset_fonts(doc, verbose)
   5445 # walk through the original font xrefs and replace each by the subset def
   5446 for font_xref in xref_set:
   5447     # we need the original '/W' and '/DW' width values
-> 5448     width_table, def_width = get_old_widths(font_xref)
   5449     # ... and replace original font definition at xref with it
   5450     doc.update_object(font_xref, font_str)

File /usr/lib64/python3.12/site-packages/fitz/utils.py:5175, in subset_fonts.<locals>.get_old_widths(xref)
   5173 if df[0] != "array":  # only handle xref specifications
   5174     return None, None
-> 5175 df_xref = int(df[1][1:-1].replace("0 R", ""))
   5176 widths = doc.xref_get_key(df_xref, "W")
   5177 if widths[0] != "array":  # no widths key found

ValueError: invalid literal for int() with base 10: '<</BaseFont/CIDFont+F1/CIDSystemInfo<</Ordering 13 /Registry 14 /Supplement 0>>/CIDToGIDMap/Identity/FontDescriptor<</Ascent 952/CapHeight 631/Descent -268/Flags 6/FontBBox 15 /FontFile2 16 /FontName

But with 1.24.3, I get no error and upon save I see scrambled words:
image

@JorjMcKie
Collaborator

The MuPDF team has developed a fix which I am currently testing.

@JorjMcKie
Collaborator

Update: fix developed.

JorjMcKie added fix developed release schedule to be determined and removed example required Waiting for information labels May 16, 2024
@cbm755
Contributor

cbm755 commented May 16, 2024

I have a possibly-related issue where 1.24.3 leaves some misc chars on the page, which go away if I stop using subset_fonts. Haven't narrowed it down to a MWE yet, but one difference is I DO NOT get an error with older pymupdf: so it might not be quite the same issue... More to follow.

Downstream issue: https://gitlab.com/plom/plom/-/issues/3374

cbm755 added a commit to plomgrading/plom that referenced this issue May 21, 2024
Fixes Issue #3374, by falling back on the deprecated in-python fonttools
based technique for doing subsetting.  To be removed once the new
MuPDF-based code is a little more mature, or at least once [1, 2] are
fixed.

[1] pymupdf/PyMuPDF#3470
[2] pymupdf/PyMuPDF#3494