Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move char codes from high surrogate pair range into private use. #9062

Merged
merged 1 commit into from
Dec 8, 2017

Conversation

brendandahl
Copy link
Contributor

Fixes #2884

In the test PDF, the glyph that draws '4' has a unicode value of 0xD860 (which is not a valid unicode value). This value seems to have issues with Firefox and Chrome rendering it in that position.

I was hoping to stop playing whack-a-mole with the problematic unicode values and instead come up with a white list, but this is going to need more investigation. Stepping through Firefox's fillText canvas command I couldn't find an obvious spot where Firefox decided if it will draw a glyph at a certain char code. I'm thinking it's partially Firefox and the various font backends, but I'm not sure.

As another way to try and come up with a white list, I wrote a test script that uses a font that maps all char codes to one glyph and then attempted to draw all char codes from 0 to 0xFFFF. Unfortunately, Firefox and Chrome have different behavior. Firefox seems to have issues with lots of ranges, while Chrome doesn't have any issues. For Chrome, what's really strange is 0xD860 seems to work fine with the test script, but it also fails to render the attached PDF with pdf.js, so what Chrome draws may be dependent on font type.

@brendandahl
Copy link
Contributor Author

Looks like I need to fix some unit tests, but in the mean time...

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @brendandahl received. Current queue size: 0

Live output at: http://54.215.176.217:8877/ad1fb46a42805ed/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @brendandahl received. Current queue size: 0

Live output at: http://54.67.70.0:8877/7f7d10abb9ece3c/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/7f7d10abb9ece3c/output.txt

Total script time: 16.88 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/ad1fb46a42805ed/output.txt

Total script time: 22.92 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Regression tests: Passed

@Snuffleupagus
Copy link
Collaborator

I was hoping to stop playing whack-a-mole with the problematic unicode values and instead come up with a white list, but this is going to need more investigation.

@brendandahl Do you want to fix the test failures, such that we can land this PR as a temporary solution?
Or should we close this for now, and wait for a more complete solution instead (one based on the findings in http://brendandahl.github.io/sandbox/find_matching_glyphs/)?

@brendandahl
Copy link
Contributor Author

@Snuffleupagus I'll fix failures as a temporary solution. I started working on the full solution, but when I start moving more glyphs to the private use area it causes more problems. If a glyph has an id of 0 (usually notdef, but in some cases a valid glyph) and it gets moved the private use area it no longer will be drawn by the browser. What we really need to do is shift all the glyphs and insert a fake notdef, but that's a non-trivial amount of work.

@Snuffleupagus
Copy link
Collaborator

@brendandahl Ping :-)

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, with one question; thanks for the patch!

@@ -83,7 +83,7 @@ var checkProblematicCharRanges = function checkProblematicCharRanges() {

describe('Fonts', function() {
it('checkProblematicCharRanges', function() {
var EXPECTED_PERCENTAGE = 45;
var EXPECTED_PERCENTAGE = 100;
Copy link
Collaborator

@Snuffleupagus Snuffleupagus Dec 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it strictly necessary to make this a full 100 percent, for the tests to pass with this patch?

The reason that I'm asking is that with this change, there's nothing stopping us from adding enough entries to ProblematicCharRanges that there'd be no room over for re-mapping glyphs that would otherwise end up at a previously mapped location. Basically, wouldn't we always hit

pdf.js/src/core/fonts.js

Lines 787 to 790 in 518da6c

if (nextAvailableFontCharCode > PRIVATE_USE_OFFSET_END) {
warn('Ran out of space in font private use area.');
break;
}
in that case, where as if we keep the value lower than 100 this test-case would ensure that there's atleast some space left in the PUA?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's now at 70%. I could lower it, but it seems odd to have a test that we have to change every time the ranges change. We would only run out of space in the PUA if the font had tons of values without unicode values and lots of values with problematic unicodes (or a very large symbolic font).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let's just do this for now!
Hopefully, we'd be able to (eventually) replace the current black-list approach with a white-list instead.

@Snuffleupagus
Copy link
Collaborator

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Dec 8, 2017

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/9e08853f71b5e53/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Dec 8, 2017

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/8a4ec69060f5efc/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Dec 8, 2017

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/8a4ec69060f5efc/output.txt

Total script time: 17.00 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Dec 8, 2017

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/9e08853f71b5e53/output.txt

Total script time: 23.30 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@Snuffleupagus
Copy link
Collaborator

/botio makeref

@pdfjsbot
Copy link

pdfjsbot commented Dec 8, 2017

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/6dcbfc3f0bf3ea6/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Dec 8, 2017

From: Bot.io (Windows)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/c11218fefcc69f2/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Dec 8, 2017

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/6dcbfc3f0bf3ea6/output.txt

Total script time: 15.74 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

pdfjsbot commented Dec 8, 2017

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/c11218fefcc69f2/output.txt

Total script time: 21.06 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus Snuffleupagus merged commit a5e3261 into master Dec 8, 2017
@Snuffleupagus Snuffleupagus deleted the no_high branch December 8, 2017 11:44
movsb pushed a commit to movsb/pdf.js that referenced this pull request Jul 14, 2018
Move char codes from high surrogate pair range into private use.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants