Issue with Unicode code points above 16bit max 65536 #314

halmos · 2017-11-29T16:27:42Z

Expected Behavior

When testing download of Microsoft Stix font, the resulting file is not validated by OTS-Sanitize and produces errors in the browser. The resulting font should load normally.

Current Behavior

Both OTS-Sanitize and Chrome report: OTS parsing error: cmap: Out of order end range (54272 <= 65533).

It seems that the cmap table creation function may have a problem with code points above 16 bits (greater than 65535). I'm not certain, but glyphs with Unicode values above 65535 seem to be truncated to their low 16 bits. For example, in the error above, code point 54272 is listed as glyph id 2297 in the opentype.js web font inspector. However, the correct code point is 119808. If you remove the top bit of 119808 (0x1D400), it becomes 54272 (0xD400). I haven't been able to parse the table creation functions completely enough to understand how they work, but I suspect there may be an issue with bit-shifting 32-bit numbers, possibly something to do with the sign.
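
The arithmetic in this report can be checked directly. This is a minimal sketch, assuming the value is simply stored in an unsigned 16-bit field; that is a guess about the cause, not something confirmed from the opentype.js source. `truncateToUint16` is a hypothetical helper, not an opentype.js function.

```javascript
// Hypothetical helper: keep only the low 16 bits of a value, which is what
// happens when a larger number is written into a uint16 field.
function truncateToUint16(codePoint) {
    return codePoint & 0xffff;
}

const original = 119808; // U+1D400, the code point expected for glyph id 2297
const truncated = truncateToUint16(original);

console.log(original.toString(16));  // 1d400
console.log(truncated);              // 54272
console.log(truncated.toString(16)); // d400
```

The truncated value matches the 54272 in the OTS error message, which is consistent with the cmap entry being cut down to 16 bits.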

Steps to Reproduce (for bugs)

Get a copy of the Stix font from http://www.stixfonts.org/
Load the font with opentype.js
Use the download method to write the font back out.
Try to load the font in a browser or test it with OTS-Sanitize.

Your Environment

Version used: 0.7.3
Font used: Stix
Browser Name and version: chrome 62.0.3202.94
Operating System and version (desktop or mobile): OS X 10.11.6


brawer · 2017-11-29T16:46:57Z

By the way, rendering is also broken in OpenType.js for code points outside the Basic Multilingual Plane. This is one reason why OpenType.js fails quite a few tests in the test suite; compare the test report for OpenType.js to that of fontkit.

halmos · 2017-11-30T15:48:12Z

I'm only just learning about the details of the OpenType cmap table format 4 as used in opentype.js, but I think this function found in cmap.js could be the source of the 16-bit Unicode code point limitation:

function addTerminatorSegment(t) {
    t.segments.push({
        end: 0xffff,
        start: 0xffff,
        delta: 1,
        offset: 0
    });
}

offset is used to set the idRangeOffsets value, which is hard-coded to 0 above. My understanding is that this value should offset the Unicode code points, allowing them to exceed the 16-bit USHORT upper limit.

However, the spec is not super clear to me on this point (see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html or https://www.microsoft.com/typography/otspec/cmap.htm#customPlatform_OTF).

Can anyone confirm the correct calculation for the offset?
My understanding is that for Unicode values over 65535 the offset should be 65535 (or more, depending on the range). Alternatively, the spec states that the idDelta arithmetic is modulo 65536, so perhaps the idRangeOffsets should be calculated modulo 65536 as well?
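
For reference, the modulo 65536 rule in the spec applies to the resulting glyph index, and the segment start and end codes are themselves 16-bit fields. Below is a minimal sketch of a format 4 segment lookup for the case where the range offset is 0, assuming segment objects shaped like the one in addTerminatorSegment above (start, end, delta, offset). `lookupSegment` is a hypothetical helper for illustration, not opentype.js code.

```javascript
// Hypothetical helper: resolve a character code against one format 4 segment.
// Only the idRangeOffset === 0 case is handled in this sketch.
function lookupSegment(segment, charCode) {
    if (charCode < segment.start || charCode > segment.end) {
        return 0; // not covered by this segment: missing glyph
    }
    if (segment.offset !== 0) {
        throw new Error('glyphIdArray lookup is not covered by this sketch');
    }
    // The glyph index is (charCode + idDelta) modulo 65536.
    return (charCode + segment.delta) & 0xffff;
}

// A segment mapping U+0041..U+005A to glyph ids 1..26.
const latinCaps = { start: 0x41, end: 0x5a, delta: (1 - 0x41) & 0xffff, offset: 0 };
console.log(lookupSegment(latinCaps, 0x41)); // 1
console.log(lookupSegment(latinCaps, 0x5a)); // 26

// The terminator segment maps 0xFFFF to glyph 0, since 0xFFFF + 1 wraps to 0.
const terminator = { start: 0xffff, end: 0xffff, delta: 1, offset: 0 };
console.log(lookupSegment(terminator, 0xffff)); // 0
```

Because start and end are stored as 16-bit values, a segment of this shape cannot describe a code point such as 0x1D400, whatever delta or offset is used.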

Jolg42 · 2017-11-30T15:58:32Z

I think format 4 is bound by the 16-bit limit, and the only way to store 32-bit code points is to use another format.
Format 12 is normally the best one for that. It's basically the same format as 4 but with a 32-bit limit.
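
To illustrate the difference: format 12 stores runs of consecutive code points as groups with 32-bit startCharCode, endCharCode, and startGlyphID fields. Below is a minimal sketch of building such groups, assuming the mappings are already sorted by code point. `buildFormat12Groups` and its input shape are hypothetical, and this is not the code from any opentype.js pull request.

```javascript
// Hypothetical helper: merge sorted { codePoint, glyphId } pairs into
// format 12 style groups. A pair extends the previous group only when both
// the code point and the glyph id continue the run.
function buildFormat12Groups(mappings) {
    const groups = [];
    for (const { codePoint, glyphId } of mappings) {
        const last = groups[groups.length - 1];
        const extendsLast = last !== undefined &&
            codePoint === last.endCharCode + 1 &&
            glyphId === last.startGlyphID + (codePoint - last.startCharCode);
        if (extendsLast) {
            last.endCharCode = codePoint;
        } else {
            groups.push({ startCharCode: codePoint, endCharCode: codePoint, startGlyphID: glyphId });
        }
    }
    return groups;
}

const groups = buildFormat12Groups([
    { codePoint: 0x41, glyphId: 1 },
    { codePoint: 0x42, glyphId: 2 },
    { codePoint: 0x1d400, glyphId: 2297 },
    { codePoint: 0x1d401, glyphId: 2298 },
]);
console.log(groups.length); // 2
console.log(groups[1]);     // startCharCode 119808, endCharCode 119809, startGlyphID 2297
```

Since every field is 32 bits wide, code points above 0xFFFF fit without truncation.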

halmos · 2017-12-01T16:29:17Z

I'm not clear on the spec, but if that is the case, then I think that for the current implementation with cmap format 4, it would be better for opentype.js not to try to encode Unicode code points outside the 16-bit range, since doing so produces a broken font file.

A format 12 implementation would be very nice to have, though.

fdb · 2017-12-01T18:07:26Z

Yeah we should definitely support format 12.

However, I'm completely swamped at the moment...

Jolg42 · 2017-12-02T00:43:39Z

@halmos See my PR #315 for cmap format 12. Let me know if this works for you.

halmos · 2017-12-03T19:12:33Z

@Jolg42 thank you! That looks great. It looks like it will certainly solve my problem. If the pull request is merged I can test it immediately; otherwise, let me know if you would like me to run a test on your branch and I'll go about reconfiguring my project to do that.

Jolg42 · 2017-12-03T20:30:25Z

@halmos If you could run a test on my branch, it would be great 😉 (There are no unit tests for the cmap tables, and I didn't have time to run many tests.)

halmos · 2017-12-04T14:34:10Z

@Jolg42 - I've tested your format 12 implementation with the Stix font. It passes OTS-sanitize and loads in the browser with no problem. From my POV the pull request looks great. Great work! and thank you.

Jolg42 · 2017-12-22T12:10:28Z

Fixed by #315

NemoStein · 2024-04-17T21:35:44Z

opentype.js v1.3.4

Font creation still suffers from this issue.

Glyphs with unicode 0xfffe or lower work as intended.
With unicode 0xffff the font is generated, but the glyph's actual unicode becomes -1 (as seen in FontForge).
Unicodes 0x10000 and higher fail to generate a proper font file.

Steps to reproduce:

import { Font, Glyph } from 'opentype.js'

const font = new Font({
  ascender: 1000,
  descender: 0,
  familyName: 'test',
  styleName: 'test',
  unitsPerEm: 1000,
  glyphs: [
    new Glyph({ name: '.notdef', advanceWidth: 0, unicode: -1 }),
    new Glyph({ name: '?', advanceWidth: 1000, unicode: 0x10000 })
  ]
})

font.download('./test.otf')

NemoStein · 2024-04-17T21:51:33Z

Just out of curiosity I tried cloning the repo and building myself...
Unicode 0xffff still behaves the same (reverting to -1), but higher values now work as intended.

Connum · 2024-04-23T13:03:39Z

> Just out of curiosity I tried cloning the repo and building myself...
> Unicode 0xffff still behaves the same (reverting to -1), but higher values now work as intended.

Thank you for the report, I'll look into it!

Connum · 2024-04-24T22:04:57Z

After spending quite some time investigating this, I just found out that 0xFFFF, the highest possible 16-bit value (as well as U+10FFFF, the highest Unicode code point), is a non-character. So it should never be assigned to a glyph. By the way, when a font with it assigned is loaded into the glyph inspector, the inspector actually displays the code point, so FontForge showing -1 instead is probably special treatment for those non-characters. As the code points above it work as intended, I'll close this issue again.
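
The non-character rule can be sketched as a small check. Unicode defines 66 noncharacters: U+FDD0 to U+FDEF, plus the last two code points of every plane (U+FFFE, U+FFFF, U+1FFFE, and so on up to U+10FFFF). `isNoncharacter` below is a hypothetical helper for illustration, not part of opentype.js.

```javascript
// Hypothetical helper: report whether a code point is a Unicode noncharacter.
function isNoncharacter(codePoint) {
    // Contiguous block of 32 noncharacters in the Arabic Presentation Forms-A area.
    if (codePoint >= 0xfdd0 && codePoint <= 0xfdef) {
        return true;
    }
    // Last two code points of each of the 17 planes: low 16 bits are FFFE or FFFF.
    return codePoint >= 0 && codePoint <= 0x10ffff && (codePoint & 0xfffe) === 0xfffe;
}

console.log(isNoncharacter(0xffff));   // true
console.log(isNoncharacter(0x10ffff)); // true
console.log(isNoncharacter(0x10000));  // false
console.log(isNoncharacter(0x1d400));  // false
```

A font generator could use a check like this to warn before a noncharacter is assigned to a glyph, though whether to warn or reject is a design choice left open here.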

Jolg42 mentioned this issue Dec 2, 2017

Make cmap format 12 if needed #315

Merged

Jolg42 closed this as completed Dec 22, 2017

Connum reopened this Apr 23, 2024

Connum added the bug, Needs Investigation, and writing support labels Apr 23, 2024

Connum added this to the Release 2.0.0 milestone Apr 23, 2024

Connum self-assigned this Apr 23, 2024

Connum closed this as completed Apr 24, 2024
