use utf32 #271
@@ -20,6 +20,7 @@ def tounicode(s, encoding='utf-8'):

fontdata = open (sys.argv[1], 'rb').read ()
text = tounicode(sys.argv[2])
codepoints = list(map(ord, text))
On narrow Python 2 builds, ord() will fail if the input text contains Unicode characters beyond U+FFFF.
fontTools.misc.py23 has a workaround for that.
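
For illustration, a narrow build stores a character beyond U+FFFF as two UTF-16 surrogate code units, so real code points have to be reassembled from the pairs. The following is a minimal sketch of that idea, not the fontTools.misc.py23 workaround itself; the helper name `to_codepoints` is made up for this example.

```python
def to_codepoints(text):
    """Return Unicode code points, joining UTF-16 surrogate pairs if present."""
    result = []
    i = 0
    while i < len(text):
        unit = ord(text[i])
        # A high surrogate followed by a low surrogate encodes one code point
        # above U+FFFF; this is how narrow builds store such characters.
        if 0xD800 <= unit <= 0xDBFF and i + 1 < len(text):
            low = ord(text[i + 1])
            if 0xDC00 <= low <= 0xDFFF:
                result.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
                i += 2
                continue
        result.append(unit)
        i += 1
    return result


# U+1F600 written as an explicit surrogate pair, as a narrow build would hold it.
print(to_codepoints(u'a\ud83d\ude00'))  # [97, 128512]
```

On a wide build (and on Python 3.3 or later) the loop simply falls through to the plain `ord()` branch for every character.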

Actually I'm not even sure this is a good idea. The encode('utf-8') approach is A LOT more efficient than creating a Pythonic list of integers and then passing that to HarfBuzz. I don't see the benefit of this. Plus, what @anthrotype said.

If I remember right, the reason I said to change this to ord() is because ...

Then let's debug that. Tell us your original problem, not your proposed solution. The UTF-8 approach definitely works just fine for non-ASCII for me. What was your problem?
As was pointed out by @anthrotype, that's wrong on some builds of Python and causes hard-to-debug bugs.

Everything about this is on this mailing list thread: https://lists.freedesktop.org/archives/harfbuzz/2016-June/005633.html
UTF-8 doesn’t work because the cluster values don’t line up with the character indexes any more. ā causes the clusters to jump by 2 when it should only count for 1.
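
The mismatch can be reproduced without HarfBuzz. The sketch below assumes that cluster values for UTF-8 input are byte offsets into the encoded text; the helper name `utf8_offsets` is made up for this example.

```python
def utf8_offsets(text):
    """Return the UTF-8 byte offset at which each character of text starts."""
    offsets = []
    pos = 0
    for ch in text:
        offsets.append(pos)
        pos += len(ch.encode('utf-8'))
    return offsets


text = u'a\u0101b'  # "aāb"
print(list(range(len(text))))  # character indices: [0, 1, 2]
print(utf8_offsets(text))      # byte offsets:      [0, 1, 3]
```

Because ā (U+0101) takes two bytes in UTF-8, every offset after it is shifted by one relative to the character index.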

By the way, sys.argv is a list of Unicode strings on Python 3, but on Python 2 it's a list of byte strings encoded using the console's encoding, sys.stdin.encoding, which may not be UTF-8, especially on non-Unix systems.
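
A rough sketch of how a command-line argument could be decoded with that in mind follows. Only the signature `tounicode(s, encoding='utf-8')` appears in the diff; the body here is an assumption, and `argv_encoding` is a hypothetical helper.

```python
import sys


def tounicode(s, encoding='utf-8'):
    """Decode byte strings (Python 2 argv); pass text strings through."""
    if isinstance(s, bytes):
        return s.decode(encoding)
    return s


def argv_encoding():
    """Guess the console encoding, falling back to UTF-8 when it is unknown."""
    # sys.stdin.encoding can be None, for example when input is piped.
    return getattr(sys.stdin, 'encoding', None) or 'utf-8'


print(tounicode(b'\xc4\x81') == u'\u0101')  # True
```

A caller would then write something like `tounicode(sys.argv[2], argv_encoding())` instead of relying on the UTF-8 default.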
OK, I see. In that case the correct way to do this is to use UTF-16 or UTF-32, depending on whether the Python build is narrow or wide. @anthrotype or @khaledhosny, can you put that together? I don't have a narrow build to test. We should still have the UTF-8 code in the sample as well.

UTF-16 just makes the bug harder to catch; it has the same problem because ...

No. If the Python build is narrow, then you want the index to jump.

And if the build is not narrow?

Then you want the UTF-32 option.

This way the clusters will match Python string indices.
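
The rule discussed above (UTF-16 on narrow builds, UTF-32 on wide ones) could be sketched as follows. This is only an illustration of picking the encoding so that one code unit corresponds to one Python string index; `to_code_units` is a hypothetical helper, and how the units are handed to HarfBuzz is left out.

```python
import struct
import sys


def to_code_units(text):
    """Encode text so that one code unit corresponds to one string index."""
    if sys.maxunicode > 0xFFFF:
        # Wide build (and every Python 3.3+): indices count code points.
        data = text.encode('utf-32-le')
        return 'utf-32', list(struct.unpack('<%dI' % (len(data) // 4), data))
    # Narrow build: indices count UTF-16 code units, surrogates included.
    data = text.encode('utf-16-le')
    return 'utf-16', list(struct.unpack('<%dH' % (len(data) // 2), data))


text = u'a\u0101b'
name, units = to_code_units(text)
print(len(units) == len(text))  # True
```

The explicit `-le` codecs are used so that no byte order mark is added in front of the text.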

Except UTF-32 isn’t working.

You need to tell us a bit more than "isn't working".

It just isn’t working, I don't know what to tell you. It makes HarfBuzz spit out a bunch of “0” glyphs. It looks like the output pattern in the comment before yours, with 3 zeros for every actual glyph.
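
A pattern of three zeros per glyph is at least consistent with UTF-32 data being read one byte at a time: for ASCII text, each little-endian UTF-32 code unit is one meaningful byte followed by three zero bytes. The thread does not confirm that this was the cause; the snippet below only shows the byte layout.

```python
data = u'ab'.encode('utf-32-le')

# Each character occupies four bytes; for ASCII, three of them are zero.
byte_values = list(bytearray(data))
print(byte_values)  # [97, 0, 0, 0, 98, 0, 0, 0]
```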

Fixed properly.

Has anybody actually looked at the result before and after this change? Somehow gid3 (space) is added at the cluster zero.
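
One thing worth checking when an unexpected extra glyph appears at cluster zero is whether the text was encoded with Python's plain 'utf-32' codec, which prepends a byte order mark (U+FEFF). Whether that is what happened here is not stated in the thread; the snippet only demonstrates the codec behaviour.

```python
import sys

with_bom = u'a'.encode('utf-32')        # native byte order, BOM prepended
without_bom = u'a'.encode('utf-32-le')  # explicit byte order, no BOM
print(len(with_bom), len(without_bom))  # 8 4

# Decoding with an explicit byte order keeps the BOM as a real character.
native = 'utf-32-le' if sys.byteorder == 'little' else 'utf-32-be'
decoded = with_bom.decode(native)
print(decoded == u'\ufeffa')  # True
```

If the BOM does reach the shaper, it is one extra code point ahead of the real text, so using an explicit `-le` or `-be` codec avoids it.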