validating supplement number of predefined CMaps #77

Open
bdoubrov opened this issue May 24, 2021 · 23 comments
Assignees
Labels
Parked Parked (eg. passed to another TWG, next ISO spec) PDF/A-4 PDF/A-4 (ISO 19005-4:202x) PDF/UA-2 PDF/UA-2 [ISO 14289-2:202x] PDF/X-6 PDF/X-6 [ISO 15930-9:2020]

Comments

@bdoubrov

All PDF/A specifications starting from PDF/A-2 contain the following requirement on the Registry, Ordering, and Supplement in CIDFonts:

For any given composite (Type 0) font within a conforming file, the CIDSystemInfo entry in its CIDFont dictionary and its Encoding dictionary shall have the following relationship:

  • If the Encoding key in the Type 0 font dictionary is Identity-H or Identity-V, any values of Registry, Ordering, and Supplement may be used in the CIDSystemInfo entry of the CIDFont.
  • Otherwise, the corresponding Registry and Ordering strings in both CIDSystemInfo dictionaries shall be identical, and the value of the Supplement key in the CIDSystemInfo dictionary of the CIDFont shall be greater than or equal to the Supplement key in the CIDSystemInfo dictionary of the CMap.
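As a rough illustration, the relationship quoted above could be checked along these lines. This is a minimal sketch in Python, not validator code: the `CIDSystemInfo` class and the `check_as_written` function are hypothetical names, and the CMap's CIDSystemInfo values are assumed to be already known to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CIDSystemInfo:
    """Hypothetical holder for the three CIDSystemInfo entries."""
    registry: str
    ordering: str
    supplement: int


IDENTITY_ENCODINGS = {"Identity-H", "Identity-V"}


def check_as_written(encoding_name: str,
                     cidfont: CIDSystemInfo,
                     cmap: CIDSystemInfo) -> bool:
    """Apply the requirement exactly as the PDF/A text words it."""
    # First bullet: Identity encodings place no constraint on the CIDFont.
    if encoding_name in IDENTITY_ENCODINGS:
        return True
    # Second bullet: same Registry and Ordering, and the CIDFont's
    # Supplement greater than or equal to the CMap's Supplement.
    return (cidfont.registry == cmap.registry
            and cidfont.ordering == cmap.ordering
            and cidfont.supplement >= cmap.supplement)
```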

The main interest here is the second (Otherwise, […]) list item and the case where one of the predefined CMaps is used. While Registry and Ordering are uniquely defined by the name of the predefined CMap, its Supplement is supposed to change (increase) with new versions of PDF. Table 117 in ISO 32000-1 lists these supplement numbers for PDF 1.2 through 1.5, but that table was removed from ISO 32000-2.

Would it be correct to say that a PDF/A-2 or PDF/A-3 validator shall assume that the Supplement of a predefined CMap is equal to the largest number listed for its character collection in ISO 32000-1, Table 117, while a PDF/A-4 validator shall assume that the Supplement is fixed based on the following statement from ISO 32000-2: “A PDF processor shall support Adobe-CNS1-7, Adobe-GB1-5, Adobe-Japan1-7 and Adobe-KR-9 character collections”?
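If the second reading were adopted, a PDF/A-4 validator could treat the supplement of a predefined CMap as a constant per character collection. The sketch below encodes only the four collections named in the ISO 32000-2 sentence; the function name `assumed_cmap_supplement` is hypothetical, and whether a validator should behave this way is exactly the open question.

```python
from typing import Optional

# Supplements taken from the ISO 32000-2 sentence quoted above:
# Adobe-CNS1-7, Adobe-GB1-5, Adobe-Japan1-7 and Adobe-KR-9.
FIXED_SUPPLEMENTS = {
    ("Adobe", "CNS1"): 7,
    ("Adobe", "GB1"): 5,
    ("Adobe", "Japan1"): 7,
    ("Adobe", "KR"): 9,
}


def assumed_cmap_supplement(registry: str, ordering: str) -> Optional[int]:
    """Return the assumed Supplement, or None for an unlisted collection."""
    return FIXED_SUPPLEMENTS.get((registry, ordering))
```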

Next, it seems that the text "shall be greater than or equal" is wrong and should be replaced by "shall be less than or equal". For example, some PDF creators today set the Supplement value in the CIDFont's CIDSystemInfo to the minimal supplement number that covers all characters in the embedded font. Such a number will normally be less than the Supplement in the CIDSystemInfo dictionary of the CMap.
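To make the difference concrete, the two wordings can be compared on the situation described above. This is an illustrative sketch only; the function names and the supplement values 2 and 6 are made up for the example.

```python
def supplement_ok_as_written(cidfont_supplement: int, cmap_supplement: int) -> bool:
    # Current text: CIDFont Supplement "shall be greater than or equal".
    return cidfont_supplement >= cmap_supplement


def supplement_ok_as_proposed(cidfont_supplement: int, cmap_supplement: int) -> bool:
    # Proposed text: CIDFont Supplement "shall be less than or equal".
    return cidfont_supplement <= cmap_supplement


# Hypothetical case: the embedded font only needs characters up to
# supplement 2, while the CMap is assumed to be at supplement 6.
font_supplement, cmap_supplement = 2, 6
print(supplement_ok_as_written(font_supplement, cmap_supplement))   # False
print(supplement_ok_as_proposed(font_supplement, cmap_supplement))  # True
```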

@bdoubrov bdoubrov added the bug Something isn't correct label May 24, 2021
@petervwyatt
Member

See also Issue #40 and adobe-type-tools/cmap-resources#11

@bdoubrov bdoubrov added PDF/A-4 PDF/A-4 (ISO 19005-4:202x) PDF/X-6 PDF/X-6 [ISO 15930-9:2020] question Further information is requested and removed bug Something isn't correct labels May 25, 2021
@MPBailey

MPBailey commented Jun 3, 2021

As a software vendor selling tools that consume PDF files, we would expect to keep pretty close to the latest published CMaps for those bundled with our products. So I agree that needing the font's Supplement value to be equal to or later than the CMap's Supplement value sounds incorrect.

@petervwyatt petervwyatt added this to the Font and text related milestone Jul 7, 2021
@DietrichSeggern

I think this is indeed wrong. A CMap is assumed to be updated with additional glyphs over time and each time this happens the supplement value should be increased. The CIDFont that references a CMap should either use an older version of that CMap (a lower supplement value) or the same version. So it should say:
"... and the value of the Supplement key in the CIDSystemInfo dictionary of the CIDFont shall be smaller than or equal to the Supplement key in the CIDSystemInfo dictionary of the CMap."

But is this not more an issue for the PDF/A TWG? It is not an issue in ISO 32000-2.

@petervwyatt
Member

Yes, it is for the PDF/A TWG to decide corrections for ISO 19005-4:2020 and report back here so that I can then publish the industry-recommended corrections.
@bdoubrov

@lrosenthol
Contributor

I agree with @DietrichSeggern on this - both in terms of the text and the issue in PDF/A (and PDF/X)

@petervwyatt
Member

PDF/A TWG agree that the text "shall be greater than or equal" is wrong and should be replaced by "shall be less than or equal".
This also impacts PDF/X so parallel errata will be done.

@petervwyatt petervwyatt added proposed solution Proposed solution is ready for review and removed question Further information is requested labels Feb 3, 2022
petervwyatt added a commit that referenced this issue Feb 4, 2022
@petervwyatt
Member

Fixed for PDF/A-4. Cannot find equivalent text in PDF/X-6...

@DietrichSeggern

Indeed, the whole chapter "6.2.10.3 Composite fonts" from PDF/A-4 does not have a corresponding text in PDF/X-6. I am not sure whether this is on purpose...

@bdoubrov bdoubrov removed PDF/A-4 PDF/A-4 (ISO 19005-4:202x) proposed solution Proposed solution is ready for review labels Feb 4, 2022
@bdoubrov
Author

bdoubrov commented Feb 4, 2022

@lrosenthol maybe you could check whether this was on purpose or by accident. I'm reopening this issue, leaving just the PDF/X-6 label.

@bdoubrov bdoubrov reopened this Feb 4, 2022
@petervwyatt petervwyatt added the question Further information is requested label Feb 5, 2022
@bdoubrov bdoubrov assigned lrosenthol and unassigned bdoubrov Mar 14, 2022
@bdoubrov bdoubrov added the PDF/A-4 PDF/A-4 (ISO 19005-4:202x) label Mar 14, 2022
@MPBailey

Just for reference, 32000-2:2020 9.7.3 says "In order for a CIDFont and a CMap to be compatible, their Registry and Ordering values shall be the same." (modulo Identity CMaps), and the Supplement row in Table 114 explicitly says "This value shall not be used in determining compatibility between character collections."

So the reported issue does not affect 32000-2 itself, although there might be a related problem in allowing a font with a high supplement number to be combined with a CMap with a low supplement number that may not include all of the character codes required for that document. If so, it's not a new problem!
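For comparison, the ISO 32000-2 rule quoted above reduces to a check on Registry and Ordering only. A minimal sketch, with a hypothetical function name and plain tuples standing in for the CIDSystemInfo dictionaries (Identity CMaps are left out):

```python
from typing import Tuple

# (Registry, Ordering, Supplement); a stand-in for a CIDSystemInfo dictionary.
SystemInfo = Tuple[str, str, int]


def compatible_per_32000_2(cidfont: SystemInfo, cmap: SystemInfo) -> bool:
    """Registry and Ordering must match; Supplement is deliberately ignored."""
    font_registry, font_ordering, _ = cidfont
    cmap_registry, cmap_ordering, _ = cmap
    return (font_registry, font_ordering) == (cmap_registry, cmap_ordering)
```

Under this rule a CIDFont with a high supplement and a CMap with a low supplement still count as compatible, which is the combination flagged above as a possible related problem.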

@petervwyatt
Member

@lrosenthol - could you please research this issue and the history with PDF/X?

@lrosenthol
Contributor

Well, in the final disposition of comments on the 3rd CD of 15930-9 (date 2017-03-22), there is a table comment from the PL that says:

Ensure that the content around fonts is 100% consistent with PDF/A-4 and PDF/E-2

which was accepted by the committee at the time.

In reviewing my notes on completion of application of comments, it is marked as completed.

HOWEVER, in checking CD4 (and all subsequent versions of the document), only minor changes were applied within the context of the existing font material. No effort was made - AFAICT - to actually bring over the full spectrum of PDF/A font material.

I do recall that there have been various debates over time in TC 130/WG 2/TF 2 about whether the font requirements from PDF/A really applied, in their entirety, to PDF/X - since rules about content extraction aren't relevant in the PDF/X world and only those requirements that impacted visual rendering/fidelity should apply.

@MPBailey

And that makes perfect sense. The goal was to ensure that it was possible to create a file that conformed to both PDF/X and PDF/A, without constraining it beyond what was necessary to conform to each individually.

That doesn't imply a need for adding all of the PDF/A constraints to the PDF/X standard or vice versa.

But I would definitely argue that, for both PDF/X and PDF/A, the value of supplement should be taken into account in determining compatibility between character collections (see my Jun 30 comment above). So fixing PDF/A and copying the result to X would make sense to me.

@bdoubrov
Author

A related discussion at adobe-type-tools repo: adobe-type-tools/cmap-resources#14

@petervwyatt
Member

More related information on the upcoming introduction of GB1-6 CMap:

@lrosenthol lrosenthol added Parked Parked (eg. passed to another TWG, next ISO spec) and removed question Further information is requested labels Jul 12, 2023
@petervwyatt
Member

And more recently: adobe-type-tools/cmap-resources#16

@bdoubrov
Author

To be discussed at the next PDF TWG (16/11/23)

@petervwyatt petervwyatt added help wanted Extra attention is needed proposed solution Proposed solution is ready for review and removed Parked Parked (eg. passed to another TWG, next ISO spec) labels Nov 13, 2023
@petervwyatt
Member

Labelled as proposed solution, although there isn't one yet, to ensure we cover this in the next PDF TWG (so any changes can make it into the PDF/A-4 and PDF/X-6 dated revisions).

@petervwyatt
Member

PDF TWG agree that fixing supplement numbers for PDF/A-4 and PDF/X-6 is best - the onus is then on PDF creators to "flatten"/embed the CMap should future supplements occur.
PDF/UA-2 has identical font section - needs to be aligned.

@petervwyatt petervwyatt added PDF/UA-2 PDF/UA-2 [ISO 14289-2:202x] and removed help wanted Extra attention is needed labels Nov 17, 2023
@bdoubrov
Author

The decision of PDF/A TWG is not to make this change in the dated revisions of PDF/A-4 and PDF/X-6

@petervwyatt
Member

To be discussed during the next PDF TWG meeting since the PDF and PDF/A TWGs seem to be at some disagreement.

The PDF Association should at least come up with appropriate communication for stakeholders (both implementers and end-users), even if this is not officially ratified by ISO.

@petervwyatt
Member

PDF/A TWG to write an article/whitepaper on this issue with recommendations.
No errata changes required to ISO 32000-2.

@petervwyatt petervwyatt added Parked Parked (eg. passed to another TWG, next ISO spec) and removed proposed solution Proposed solution is ready for review labels Feb 16, 2024
@petervwyatt petervwyatt assigned bdoubrov and unassigned lrosenthol Feb 16, 2024
@petervwyatt
Member

Parking this errata until PDF/A TWG complete their article/whitepaper. Assigned to Boris as PDF/A TWG chair.

5 participants