Issue with Unicode code points above 16bit max 65536 #314

Closed
halmos opened this issue Nov 29, 2017 · 14 comments
Labels: bug, Needs Investigation, writing support

@halmos

halmos commented Nov 29, 2017

Expected Behavior

When the Microsoft Stix font is loaded and written back out with the download method, the resulting file fails validation by OTS-Sanitize and produces errors in the browser. The resulting font should load normally.

Current Behavior

Both OTS-Sanitize and Chrome report: OTS parsing error: cmap: Out of order end range (54272 <= 65533).

It seems that the cmap table creation function may have a problem with code points that do not fit in 16 bits (65536 and above). I'm not certain, but glyphs with Unicode values of 65536 or more seem to lose everything above their low 16 bits. For example, in the error above, code point 54272 is listed as glyph id 2297 in the opentype.js web font inspector. However, the correct code point is 119808 (0x1D400). If you remove the leading bit of 119808, it becomes 54272 (0xD400). I haven't been able to parse the table creation functions completely enough to understand how they work, but I suspect there may be an issue with bit-shifting 32-bit numbers, possibly something to do with the sign.
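
The arithmetic in the paragraph above can be checked with a short sketch. This is only an illustration of what happens when a value is squeezed into an unsigned 16-bit field; the helper name `truncateToUint16` is made up for this example and is not part of opentype.js.

```javascript
// Illustration only: keep the low 16 bits of a code point, which is what
// effectively happens when it is written into a USHORT field.
// truncateToUint16 is a hypothetical name, not an opentype.js function.
function truncateToUint16(codePoint) {
    return codePoint & 0xffff;
}

const original = 119808; // 0x1D400, the code point the font really contains
const truncated = truncateToUint16(original);

// Print both values in hex, then the truncated value in decimal.
console.log(original.toString(16), truncated.toString(16), truncated);
```

The truncated value matches the 54272 reported by OTS, which is consistent with the high bits being dropped somewhere in the writer.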

Steps to Reproduce (for bugs)

  1. Get a copy of the Stix font from http://www.stixfonts.org/
  2. Load the font with opentype.js
  3. Use the download method to write the font back out.
  4. Try to load the font in a browser or test it with OTS-sanitize.

Your Environment

  • Version used: 0.7.3
  • Font used: Stix
  • Browser Name and version: Chrome 62.0.3202.94
  • Operating System and version (desktop or mobile): OS X 10.11.6
@brawer
Collaborator

brawer commented Nov 29, 2017

By the way, rendering is also broken in OpenType.js for code points outside the Basic Multilingual Plane. This is one reason why OpenType.js fails quite a few tests in the test suite; compare the test report for OpenType.js to that of fontkit.

@halmos
Author

halmos commented Nov 30, 2017

I'm only just learning the details of OpenType cmap table format 4 as used in opentype.js, but I think this function in cmap.js could be the source of the 16-bit Unicode code point limitation:

function addTerminatorSegment(t) {
    t.segments.push({
        end: 0xffff,
        start: 0xffff,
        delta: 1,
        offset: 0
    });
}

offset is used to set the idRangeOffsets value, which is hard-coded to 0 above. My understanding is that this value should offset the Unicode code points, allowing them to exceed the 16-bit USHORT upper limit.

However, the spec is not very clear to me on this point (see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html or https://www.microsoft.com/typography/otspec/cmap.htm#customPlatform_OTF)

Can anyone confirm the correct calculation for the offset?
My understanding is that for Unicode values over 65535 the offset should be 65535 (or more, depending on the range). Alternatively, the spec states that the idDelta arithmetic is modulo 65536, so perhaps the idRangeOffsets should be calculated modulo 65536 as well?
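
For reference, here is a minimal sketch of the modulo-65536 idDelta rule mentioned above, applied to a segment shaped like the one pushed by addTerminatorSegment. It only covers the case where the range offset is 0, and `lookupFormat4Segment` is a hypothetical helper written for this example, not a function from opentype.js.

```javascript
// Sketch of a format 4 lookup for a single segment whose range offset is 0.
// In that case the glyph index is (charCode + idDelta) modulo 65536.
// lookupFormat4Segment is a hypothetical helper, not opentype.js API.
function lookupFormat4Segment(segment, charCode) {
    if (charCode < segment.start || charCode > segment.end) {
        return undefined; // the character is not covered by this segment
    }
    if (segment.offset !== 0) {
        throw new Error('non-zero range offsets are not covered by this sketch');
    }
    // "& 0xffff" implements the modulo and also handles negative deltas.
    return (charCode + segment.delta) & 0xffff;
}

// Same shape as the terminator segment in the snippet above.
const terminator = { start: 0xffff, end: 0xffff, delta: 1, offset: 0 };
console.log(lookupFormat4Segment(terminator, 0xffff));
```

With delta set to 1, the terminator maps 0xFFFF to glyph 0, the missing glyph. Every field involved is 16 bits wide, so nothing in this arithmetic can represent a character code above 0xFFFF.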

@Jolg42
Member

Jolg42 commented Nov 30, 2017

I think format 4 is limited to 16-bit values, and the only way to store 32-bit code points is to use another format.
Format 12 is normally the best one for that. It's basically the same as format 4 but with a 32-bit limit.
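
As a rough sketch of what format 12 stores: the subtable is a list of sequential map groups, each with a 32-bit startCharCode, endCharCode and startGlyphID. The helper below, `buildFormat12Groups`, is hypothetical and is not the code from opentype.js or any pull request; it only shows how consecutive code points with consecutive glyph indices collapse into one group.

```javascript
// Hypothetical helper: turn { codePoint, glyphIndex } pairs into format 12
// sequential map groups. A pair extends the previous group only when both
// its code point and its glyph index continue that group's run.
function buildFormat12Groups(mappings) {
    const sorted = [...mappings].sort((a, b) => a.codePoint - b.codePoint);
    const groups = [];
    for (const { codePoint, glyphIndex } of sorted) {
        const last = groups[groups.length - 1];
        const continuesRun = last !== undefined &&
            codePoint === last.endCharCode + 1 &&
            glyphIndex === last.startGlyphID + (codePoint - last.startCharCode);
        if (continuesRun) {
            last.endCharCode = codePoint;
        } else {
            groups.push({
                startCharCode: codePoint,
                endCharCode: codePoint,
                startGlyphID: glyphIndex
            });
        }
    }
    return groups;
}

// Example input: one BMP character and two adjacent supplementary ones.
const groups = buildFormat12Groups([
    { codePoint: 0x1d400, glyphIndex: 2297 },
    { codePoint: 0x1d401, glyphIndex: 2298 },
    { codePoint: 0x41, glyphIndex: 5 }
]);
console.log(groups);
```

Because the group fields are 32 bits wide, a code point such as 0x1D400 is stored as is, with no truncation.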

@halmos
Author

halmos commented Dec 1, 2017

I'm not clear on the spec, but if that is the case, then I think that with the current cmap format 4 implementation it would be better for opentype.js not to try to encode Unicode code points outside the 16-bit range, since doing so produces a broken font file.

A format 12 implementation would be very nice to have, though.
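
The suggestion to keep out-of-range code points away from the format 4 writer could look roughly like the sketch below. `splitForFormat4` is a made-up name for this example and does not exist in opentype.js; it only separates the code points that fit in 16 bits from those that need a 32-bit format.

```javascript
// Hypothetical helper: separate code points that fit in a format 4 subtable
// (at most 0xFFFF) from those that need a 32-bit format such as format 12.
function splitForFormat4(codePoints) {
    const bmp = [];
    const supplementary = [];
    for (const codePoint of codePoints) {
        if (codePoint <= 0xffff) {
            bmp.push(codePoint);
        } else {
            supplementary.push(codePoint);
        }
    }
    return { bmp, supplementary };
}

const split = splitForFormat4([0x41, 0x1d400, 0x3b1, 0x10000]);
console.log(split.bmp, split.supplementary);
```

A writer could then skip, or warn about, the supplementary list instead of emitting truncated values.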

@fdb
Contributor

fdb commented Dec 1, 2017

Yeah we should definitely support format 12.

However, I'm completely swamped at the moment...

@Jolg42
Member

Jolg42 commented Dec 2, 2017

@halmos See my PR #315 for cmap format 12. Let me know if this works for you.

@halmos
Author

halmos commented Dec 3, 2017

@Jolg42 thank you! That looks great. It looks like it will certainly solve my problem. If the pull request is merged I can test it immediately; otherwise let me know if you would like me to run a test on your branch and I'll go about reconfiguring my project to do that.

@Jolg42
Member

Jolg42 commented Dec 3, 2017

@halmos If you could run a test on my branch, that would be great 😉 (There are no unit tests for the cmap tables, and I didn't have time to run many tests.)

@halmos
Author

halmos commented Dec 4, 2017

@Jolg42 - I've tested your format 12 implementation with the Stix font. It passes OTS-sanitize and loads in the browser with no problem. From my POV the pull request looks great. Great work! and thank you.

@Jolg42
Member

Jolg42 commented Dec 22, 2017

Fixed by #315

@Jolg42 Jolg42 closed this as completed Dec 22, 2017
@NemoStein

opentype.js v1.3.4

Font creation still suffers from this issue.

Glyphs with unicode 0xfffe or lower work as intended.
With unicode 0xffff the font is generated, but the glyph's actual unicode becomes -1 (as seen in FontForge).
Unicodes 0x10000 and higher fail to generate a proper font file.

Steps to reproduce:

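// Assumes the opentype.js package is installed; Font and Glyph are its exports.
import { Font, Glyph } from 'opentype.js'

const font = new Font({
  ascender: 1000,
  descender: 0,
  familyName: 'test',
  styleName: 'test',
  unitsPerEm: 1000,
  glyphs: [
    new Glyph({ name: '.notdef', advanceWidth: 0, unicode: -1 }),
    // 0x10000 is the first code point outside the Basic Multilingual Plane
    new Glyph({ name: '?', advanceWidth: 1000, unicode: 0x10000 })
  ]
})

// Write the font back out; the resulting file is the one that fails
font.download('./test.otf')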

@NemoStein

Just out of curiosity I tried cloning the repo and building myself...
Unicode 0xffff still behaves the same (reverting to -1), but higher values now work as intended.

@Connum
Contributor

Connum commented Apr 23, 2024

Just out of curiosity I tried cloning the repo and building myself... Unicode 0xffff still behaves the same (reverting to -1), but higher values now works as intended.

Thank you for the report, I'll look into it!

@Connum Connum reopened this Apr 23, 2024
@Connum Connum added bug Needs Investigation writing support Anything related to writing support as opposed to parsing or rendering labels Apr 23, 2024
@Connum Connum added this to the Release 2.0.0 milestone Apr 23, 2024
@Connum Connum self-assigned this Apr 23, 2024
@Connum
Contributor

Connum commented Apr 24, 2024

After spending quite some time investigating this, I just found out that U+FFFF (like U+10FFFF) is a noncharacter: it is the highest possible 16-bit value, just as U+10FFFF is the highest Unicode code point. So it should never be assigned to a glyph. By the way, when a font with it assigned is loaded into the glyph inspector, the code point is actually displayed, so FontForge showing -1 instead is probably special treatment for those noncharacters. As the code points above it work as intended, I'll close this issue again.
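
For anyone who wants to check their glyph list before generating a font, a small sketch of a noncharacter test follows. It relies on the Unicode definition of noncharacters (U+FDD0 to U+FDEF, plus the last two code points of every plane); `isNoncharacter` is a hypothetical helper and not part of opentype.js.

```javascript
// Hypothetical helper: true when codePoint is a Unicode noncharacter.
// Noncharacters are U+FDD0..U+FDEF and every code point whose low 16 bits
// are 0xFFFE or 0xFFFF (the last two code points of each plane).
function isNoncharacter(codePoint) {
    if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
        return false; // not a Unicode code point at all
    }
    if (codePoint >= 0xfdd0 && codePoint <= 0xfdef) {
        return true;
    }
    return (codePoint & 0xfffe) === 0xfffe;
}

console.log(isNoncharacter(0xffff), isNoncharacter(0x10000), isNoncharacter(0x10ffff));
```

Under this definition 0xFFFE is a noncharacter as well, even though the report above notes that it round-trips fine.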

@Connum Connum closed this as completed Apr 24, 2024