Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert uniXXXX glyph names to proper ones when building the charCodeToGlyphId map for TrueType fonts (bug 1132849, issue 6893, issue 6894) #7069

Merged
merged 1 commit into from
Mar 10, 2016

Conversation

Snuffleupagus
Copy link
Collaborator

Some PDF generators, e.g. Scribus PDF, use improper uniXXXX glyph names which breaks the glyph mapping. We can avoid this by converting them to "standard" glyph names instead.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1132849.
Fixes #6893.
Fixes #6894.

/cc @timvandermeij

@timvandermeij
Copy link
Contributor

Awesome work! I can confirm that this fixes both issues I filed. The text is also more crisp and more like how other PDF viewers render the files (what I see is best seen in the reference images on Linux from above). The original and the reduced test case from the Bugzilla issue already seem to be fixed for me with the current master (at least on Arch Linux when comparing PDF.js with Okular), but nevertheless it's good to include it as a reduced test case.

@brendandahl Could you please review this?

@Snuffleupagus
Copy link
Collaborator Author

The original and the reduced test case from the Bugzilla issue already seem to be fixed for me with the current master (on Arch Linux when comparing PDF.js with Okular), but nevertheless it's good to include it as a reduced test case.

For me there're slight differences with 1132849.pdf on Windows (with HWA on), which is why I included it.

render_diff


Although there's now a couple of somewhat similar test-cases, which might be unnecessary. The already existing issue6889.pdf could probably be replaced with issue6894.pdf in this patch, since those two appear to be using the exact same font (but the new one covers the font-weight issues as well).

However, I didn't want to remove any existing tests in this patch.

@Snuffleupagus
Copy link
Collaborator Author

I've update the patch slightly, to only attempt this conversion for the this.differences case. Since otherwise we just read from one of the standard encodings, which we know don't suffer from this issue.

/botio test

@brendandahl
Copy link
Contributor

This is looking good. I was curious how other PDF viewers handle this and found https://pdfium.googlesource.com/pdfium/+/master/core/src/fxge/freetype/fx_freetype.cpp#56 . Seems like it would be a good idea to make a general function for getting the unicode from glyph name. There function handles names like 'u123456' and a few other cases. Now I wonder if we have any broken pdfs with names like the above.

@Snuffleupagus
Copy link
Collaborator Author

Seems like it would be a good idea to make a general function for getting the unicode from glyph name.

Sure, that sounds fine! The only one I can remember seeing in practice is uniXXXX, but I suppose it can't hurt to support the uXXXX/uXXXXXX case as well.

@brendandahl How closely do you want that function to mimic what the FreeType code does?
Should we enforce only upper-case hexadecimal chars, to prevent accidentally matching things that are not actually hex numbers?

@Snuffleupagus
Copy link
Collaborator Author

I've added a getUnicodeForGlyph helper function, similarly to the FreeType one, which is used in various places to do the glyphName -> Unicode conversion.
I kept the recoverGlyphName, which is using getUnicodeForGlyph internally, since we may need both the standard/non-standard glyph names when creating the charCodeToGlyphId map for TrueType fonts.

@brendandahl Is this what you had in mind?

/botio-linux preview

@pdfjsbot
Copy link

pdfjsbot commented Mar 9, 2016

From: Bot.io (Linux)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/b0e4ee870623fb9/output.txt

hexStr = name.substr(3);
} else if (nameLen >= 5 && nameLen <= 7) { // 'uXXXX{XX}'
hexStr = name.substr(1);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit easier to follow if we add an else and return -1 early here.

@brendandahl
Copy link
Contributor

@brendandahl Is this what you had in mind?

Yes, looks very good. Just the minor nits above.

…odeToGlyphId` map for TrueType fonts (bug 1132849, issue 6893, issue 6894)

This patch adds a `getUnicodeForGlyph` helper function, which is used to recover Unicode values for non-standard glyph names.

Some PDF generators, e.g. Scribus PDF, use improper `uniXXXX` glyph names which breaks the glyph mapping. We can avoid this by converting them to "standard" glyph names instead.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1132849.
Fixes 6893.
Fixes 6894.
@Snuffleupagus
Copy link
Collaborator Author

Yes, looks very good. Just the minor nits above.

Thanks for the review, I've addressed the comments.

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Mar 9, 2016

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/635be36d4835bb1/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Mar 9, 2016

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/29f3c11931563c4/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Mar 9, 2016

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/635be36d4835bb1/output.txt

Total script time: 20.43 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Mar 9, 2016

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/29f3c11931563c4/output.txt

Total script time: 22.14 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/29f3c11931563c4/reftest-analyzer.html#web=eq.log

@brendandahl
Copy link
Contributor

r+
/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @brendandahl received. Current queue size: 0

Live output at: http://107.22.172.223:8877/c58317f2e65a163/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_makeref from @brendandahl received. Current queue size: 0

Live output at: http://107.21.233.14:8877/258818beaeb1a03/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/c58317f2e65a163/output.txt

Total script time: 20.21 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/258818beaeb1a03/output.txt

Total script time: 23.52 mins

  • Lint: Passed
  • Make references: FAILED

@brendandahl
Copy link
Contributor

/botio-linux makeref

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_makeref from @brendandahl received. Current queue size: 0

Live output at: http://107.21.233.14:8877/a88dee7cbd7205b/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/a88dee7cbd7205b/output.txt

Total script time: 23.29 mins

  • Lint: Passed
  • Make references: FAILED

@timvandermeij
Copy link
Contributor

/botio-linux makeref

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/56900e861a1f844/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/56900e861a1f844/output.txt

Total script time: 21.79 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

timvandermeij added a commit that referenced this pull request Mar 10, 2016
Convert `uniXXXX` glyph names to proper ones when building the `charCodeToGlyphId` map for TrueType fonts (bug 1132849, issue 6893, issue 6894)
@timvandermeij timvandermeij merged commit 4784863 into mozilla:master Mar 10, 2016
@timvandermeij
Copy link
Contributor

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants