
Creating a uni-glyph font fails at high UTF-8 codepoint above 0x7F00 #3322

Open
tombh opened this issue Jul 27, 2018 · 8 comments

@tombh

tombh commented Jul 27, 2018

Using libfontforge 20170805 from Arch AUR.

Here's a snippet of the code I'm using:

for i in range(0x0000, 0x7F00):
    if i == codepoint: continue
    glyph = blocks.createChar(i)
    glyph.width = 600
    glyph.addReference(block)

Above roughly 0x7F00 I get this error: Internal Error: Attempt to output 81854 into a 16-bit field. It will be truncated and the file may not be useful.
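To illustrate what that error is saying: 81854 (0x13FBE) needs 17 bits, so a 16-bit field can only keep the low half. A minimal sketch of the truncation, not FontForge's actual code:

```python
# Sketch: a value wider than 16 bits written into a 16-bit field.
value = 81854                 # 0x13FBE, from the error message
assert value > 0xFFFF         # does not fit in 16 bits
truncated = value & 0xFFFF    # what survives the truncation
print(hex(truncated))         # 0x3fbe (16318) -- a different codepoint entirely
```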

If I stop the loop as in the example, the final font is actually usable, but there are a lot of missing glyphs, for example for Czech and much of Chinese.

Here's a link to the full file (60 lines).

@ctrlcctrlv
Member

Probably there is a type issue somewhere. The arguments to createChar are 16-bit while the underlying data is larger. Issue in python.c more than likely.

@tombh
Author

tombh commented Jul 27, 2018

Thanks. So not an implementation error on my part? If I want it fixed, should I look into it as a bug, potentially in python.c? Or is there another, more idiomatic way to create a font made entirely of the same glyph?

@JoesCat
Contributor

JoesCat commented Jul 29, 2018

I did not attempt to run the code, but there appear to be at least four issues here.

  1. def generate() calls blocks.generate() - this seems to suggest a nested function call.
    You might want to try def gen_block() to differentiate between your function and fontforge.generate()
  2. I'm puzzled where def generate() gets the value codepoint from. It might be worth printing codepoint before starting the for loop to verify.
  3. If you plan to use a for loop with a range of {0...0x7eff}, you cannot expect to have a value 0x9fcf, as it is outside the range {0..0x7eff}.
  4. The decimal value 81854 (0x13fbe) is outside the hex range {0..0x7eff}, which indicates there is a problem with the code.

@ctrlcctrlv
Member

@JoesCat Sorry, you're definitely wrong about (1). Python is not going to confuse fontforge.generate() with his local generate(), because he did not pull fontforge.generate() into his local scope (as `from fontforge import *` or `from fontforge import generate`).

And about (3) and (4): obviously OP @tombh knows that; it seems like he purposely made the range end at 0x7EFF to avoid the error.

@JoesCat
Contributor

JoesCat commented Jul 29, 2018

@ctrlcctrlv - thanks for checking. Verified.
Tried running the code... At first, I thought it was a case of characters out of range, since you are limited to a 16-bit offset, but I see this is a problem with undefined characters.

    for i in range(0x20, 0x10000):

the above will run from start to finish (I ran the latest HEAD code as 32-bit), but I expect there may be some problem characters within this set that aren't tested; for example, U+FFFE and U+FFFF are not characters (they are Unicode noncharacters).
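Python's stdlib `unicodedata` module can confirm this (a rough check: it reflects the running Python's Unicode database, which may not match FontForge's internal table):

```python
import unicodedata

# The noncharacters U+FFFE/U+FFFF report category 'Cn' ("other, not assigned"),
# while a real letter reports a letter category.
print(unicodedata.category('\ufffe'))  # Cn
print(unicodedata.category('\uffff'))  # Cn
print(unicodedata.category('A'))       # Lu
```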

This is a quick visual showing what exists and what does not (Unicode/utype.c):

const uint32 ff_unicode_codepointassigned[] = {
  /* 32 unicode.org characters represented for each data value in array */
  0x00000000, 0xffffffff, 0xffffffff, 0x7fffffff, 0x00000000, 0xffffffff, 0xffffffff, 0xffffffff,	/* 0x0000 */
  0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
  0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
  0xffffffff, 0xffffffff, 0xffffffff, 0xfcffffff, 0xffffd7f0, 0xfffffffb, 0xffffffff, 0xffffffff,
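As a sketch of how such a packed bitmask is consulted: word index = codepoint >> 5, bit index = codepoint & 31. The low-bit-first ordering here is an assumption, not necessarily FontForge's convention; the first four words above serve as sample data:

```python
# Hypothetical lookup into a packed "assigned" bitmask (32 codepoints per word).
table = [0x00000000, 0xffffffff, 0xffffffff, 0x7fffffff]  # covers U+0000..U+007F

def codepoint_assigned(cp):
    # Assumption: the lowest bit of each word maps to the lowest codepoint.
    return bool(table[cp >> 5] & (1 << (cp & 31)))

print(codepoint_assigned(0x00))  # False: C0 controls are flagged unassigned here
print(codepoint_assigned(0x41))  # True: 'A'
```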

@JoesCat
Contributor

JoesCat commented Jul 29, 2018

The script highlights one or more bugs: (1) generating info for non-existent chars, and (2) attempting to build a TTF file with chars undefined in 'unicodefull' or 'iso10646'.

@tombh
Author

tombh commented Aug 4, 2018

You got it to run!

I just tried on the latest HEAD too, but I still get the same error and the fonts are unusable. I didn't set any 32-bit flags for the compile, though. How do I do that?

@JoesCat
Contributor

JoesCat commented Aug 4, 2018

Seems like we need to add an additional note here about 'int' assumptions as another problem.
At this time I'm going through the Unicode/* code, but this problem may be in fontforge/*.
I believe this code base started when 16-bit was popular and 32-bit was becoming mainstream. There's plenty of code showing differentiation between char/short/int in the older base, but more recent code seems to take int and 32-bit as interchangeable (which is not a safe assumption). Now that 64-bit computers are mainstream, we run into more 'int' issues between 32-bit and 64-bit. This sort of problem could be resolved by scrubbing data; for example, instead of y = x >> 8, we do y = (x >> 8) & 0xff.
This problem could be in fontforge or even in python, but since python has a bigger audience, it's more likely whatever problems were there got caught already, so the problem is more likely in fontforge.
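The scrubbing suggestion can be checked from Python via `ctypes`, which wraps C's fixed-width types. A sketch of the behavior, not FontForge code:

```python
import ctypes

x = 0x13FBE                          # 81854, the value from the error message
print(ctypes.c_uint16(x).value)      # 16318: silently truncated to 16 bits
print((x >> 8) & 0xFF)               # 63: explicit scrubbing down to one byte
print(ctypes.c_uint8(x >> 8).value)  # 63: same result via a C-width type
```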

To test the loop, you can run your code using a 32-bit version of Linux. I'm currently using Mageia 6, 32-bit, but to avoid an install you could probably try a 32-bit live distro. Mageia 5 should work but is just in the process of being phased out: https://www.mageia.org/en/5/

Before you start struggling with installing 32-bit, you first need to review your python code and make it so 'your' python for loop 'continue's past non-existent characters, avoiding adding glyphs for them. In your loop, add an additional if statement that tests whether the character exists; if it does not, skip adding character data.

import unicodedata

for i in range(0x0000, 0x7F00):
    if i == codepoint: continue
    # skip unassigned codepoints; categories 'Cn' (not assigned) and 'Cc'
    # (controls) only approximate fontforge's internal "assigned" table
    if unicodedata.category(chr(i)) in ('Cn', 'Cc'): continue
    glyph = blocks.createChar(i)
    glyph.width = 600
    glyph.addReference(block)

To test out your code, you could loop for i in range(1, 0x10000) and verify that you do not add character data (createChar() etc.) for chars 0...0x1f, as well as for the other gaps that follow.

At the moment, I haven't had a chance to review the fontforge python instructions in detail (fontforge/python.c), but there should be an instruction to let you check whether a unicode character exists. Note: let fontforge check its internal table, to avoid conflicts between what other tables may say and what fontforge understands internally.
