
Creating a uni-glyph font fails at high UTF-8 codepoint above 0x7F00 #3322

Open
tombh opened this issue Jul 27, 2018 · 8 comments

@tombh

tombh commented Jul 27, 2018

Using libfontforge 20170805 from Arch AUR.

Here's a snippet of the code I'm using:

for i in range(0x0000, 0x7F00):
    if i == codepoint: continue
    glyph = blocks.createChar(i)
    glyph.width = 600
    glyph.addReference(block)

Above roughly 0x7F00 I get this error: Internal Error: Attempt to output 81854 into a 16-bit field. It will be truncated and the file may not be useful.
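To illustrate what that error is saying: 81854 (0x13FBE) needs 17 bits, so a 16-bit field can only keep the low half. A minimal sketch of the truncation, not FontForge's actual code:

```python
# Sketch: a value wider than 16 bits written into a 16-bit field.
value = 81854                 # 0x13FBE, from the error message
assert value > 0xFFFF         # does not fit in 16 bits
truncated = value & 0xFFFF    # what survives the truncation
print(hex(truncated))         # 0x3fbe (16318) -- a different codepoint entirely
```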

If I stop the loop as in the example, the final font is actually usable, but there are a lot of missing glyphs, for example for Czech and much of Chinese.

Here's a link to the full file (60 lines).

@ctrlcctrlv
Member

Probably there is a type issue somewhere. The arguments to createChar are 16-bit while the underlying data is larger. Issue in python.c more than likely.

@tombh
Author

tombh commented Jul 27, 2018

Thanks. So not an implementation error on my part? If I want it fixed, should I look into it as a bug, potentially in python.c? Or is there another, more idiomatic way to create a font made entirely of the same glyph?

@JoesCat
Contributor

JoesCat commented Jul 29, 2018

I did not attempt to run the code, but there appear to be at least four issues here.

  1. def generate() calls blocks.generate() - this seems to suggest a nested function call.
    You might want to try def gen_block() to differentiate between your function and fontforge.generate()
  2. I'm puzzled where def generate() gets the value codepoint from. It might be worth printing codepoint before starting the for loop to verify.
  3. If you plan to use a for loop with a range of {0...0x7eff}, you cannot expect to have a value 0x9fcf, as it is outside the range {0..0x7eff}.
  4. The decimal value 81854 (0x13fbe) is outside the hex range {0..0x7eff}, which indicates there is a problem with the code.

@ctrlcctrlv
Member

@JoesCat Sorry, you're definitely wrong about (1). Python is not going to confuse fontforge.generate() with his local generate(), because he did not pull fontforge.generate() into his local scope (as `from fontforge import *` or `from fontforge import generate`).

And about (3) and (4): obviously OP @tombh knows that; it seems like he purposely made the range end at 0x7EFF to avoid the error.

@JoesCat
Contributor

JoesCat commented Jul 29, 2018

@ctrlcctrlv - thanks for checking. Verified.
Tried running the code... At first, I thought it was a case of characters out of range, since you are limited to a 16-bit offset, but I see this is a problem with undefined characters.

    for i in range(0x20, 0x10000):

the above will run from start to finish (I ran the latest HEAD code as 32-bit), but I expect there may be some problem characters within this set that aren't tested; for example, U+FFFE and U+FFFF are not characters (they are Unicode noncharacters).
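Python's stdlib `unicodedata` module can confirm this (a rough check: it reflects the running Python's Unicode database, which may not match FontForge's internal table):

```python
import unicodedata

# The noncharacters U+FFFE/U+FFFF report category 'Cn' ("other, not assigned"),
# while a real letter reports a letter category.
print(unicodedata.category('\ufffe'))  # Cn
print(unicodedata.category('\uffff'))  # Cn
print(unicodedata.category('A'))       # Lu
```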

This is a quick visual showing what exists and what does not (Unicode/utype.c):

const uint32 ff_unicode_codepointassigned[] = {
  /* 32 unicode.org characters represented for each data value in array */
  0x00000000, 0xffffffff, 0xffffffff, 0x7fffffff, 0x00000000, 0xffffffff, 0xffffffff, 0xffffffff,	/* 0x0000 */
  0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
  0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
  0xffffffff, 0xffffffff, 0xffffffff, 0xfcffffff, 0xffffd7f0, 0xfffffffb, 0xffffffff, 0xffffffff,
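As a sketch of how such a packed bitmask is consulted: word index = codepoint >> 5, bit index = codepoint & 31. The low-bit-first ordering here is an assumption, not necessarily FontForge's convention; the first four words above serve as sample data:

```python
# Hypothetical lookup into a packed "assigned" bitmask (32 codepoints per word).
table = [0x00000000, 0xffffffff, 0xffffffff, 0x7fffffff]  # covers U+0000..U+007F

def codepoint_assigned(cp):
    # Assumption: the lowest bit of each word maps to the lowest codepoint.
    return bool(table[cp >> 5] & (1 << (cp & 31)))

print(codepoint_assigned(0x00))  # False: C0 controls are flagged unassigned here
print(codepoint_assigned(0x41))  # True: 'A'
```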

@JoesCat
Contributor

JoesCat commented Jul 29, 2018

The script highlights one or more bugs: (1) generating info for non-existent chars, and (2) attempting to build a TTF file with chars undefined in 'unicodefull' or 'iso10646'.

@tombh
Author

tombh commented Aug 4, 2018

You got it to run!

I just tried on the latest HEAD too, but I still get the same error and the fonts are unusable. I didn't set any 32-bit flags for the compile, though. How do I do that?

@JoesCat
Contributor

JoesCat commented Aug 4, 2018

Seems like we need to add an additional note here about 'int' assumptions as another problem.
At this time I'm going through the Unicode/* code, but this problem may be in fontforge/*.
I believe this code base started when 16-bit was popular and 32-bit was becoming mainstream. There's plenty of code showing differentiation between char/short/int in the older base, but more recent code seems to take int and 32-bit as interchangeable (which is not a safe assumption). Now that 64-bit computers are mainstream, we run into more 'int' issues between 32-bit and 64-bit. This sort of problem could be resolved by scrubbing data; for example, instead of y = x >> 8, we do y = (x >> 8) & 0xff.
This problem could be in fontforge or even in python, but since python has a bigger audience, it's more likely whatever problems were there got caught already, so the problem is more likely in fontforge.
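The scrubbing suggestion can be checked from Python via `ctypes`, which wraps C's fixed-width types. A sketch of the behavior, not FontForge code:

```python
import ctypes

x = 0x13FBE                          # 81854, the value from the error message
print(ctypes.c_uint16(x).value)      # 16318: silently truncated to 16 bits
print((x >> 8) & 0xFF)               # 63: explicit scrubbing down to one byte
print(ctypes.c_uint8(x >> 8).value)  # 63: same result via a C-width type
```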

To test the loop, you can run your code using a 32-bit version of Linux. I'm currently using Mageia 6, 32-bit, but to avoid an install you could probably try a 32-bit live distro. Mageia 5 should work but is just in the process of being phased out: https://www.mageia.org/en/5/

Before you start struggling with installing 32-bit, you first need to review your python code and make it so 'your' python for loop 'continue's past non-existent characters, avoiding adding glyphs for them. In your loop, add an additional if statement that tests whether the character exists; if it does not, skip adding character data.

import unicodedata

for i in range(0x0000, 0x7F00):
    if i == codepoint: continue
    # skip unassigned codepoints; categories 'Cn' (not assigned) and 'Cc'
    # (controls) only approximate fontforge's internal "assigned" table
    if unicodedata.category(chr(i)) in ('Cn', 'Cc'): continue
    glyph = blocks.createChar(i)
    glyph.width = 600
    glyph.addReference(block)

To test out your code, you could loop for i in range(1, 0x10000) and verify that you do not add character data (createChar() etc.) for chars 0...0x1f, as well as for the other gaps that follow.

At the moment, I haven't had a chance to review the fontforge python instructions in detail (fontforge/python.c), but there should be an instruction to let you check whether a unicode character exists. Note: let fontforge check its internal table, to avoid conflicts between what other tables may say and what fontforge understands internally.
