Lepcha font should NOT use script tag 'lepc' #3
/cc @kmansourMT |
Applies to Kharoshthi as well, and perhaps a dozen or so other scripts, maybe more. |
@waksmonskiMT, @kmansourMT, what is the full list of Noto fonts that should use 'DFLT' but do not yet? |
Xiangye, Kamal |
Hi Kamal, Unfortunately this is problematic. As you said, at font design time there was no script-specific shaper for this script, so anything put under the real script tag is speculative and untested. If you had left it out, the font would keep working consistently with the DFLT script into the future; with the dual approach, fonts break now that we do have a script shaper for those scripts. At the very least, the fonts are completely untested under that shaper and as such unreliable now. I believe we need versions of these files either without the script tag, or tested and fixed under the Universal Shaping Engine. Personally I think you would want to do the former for phase 2, and the latter for phase 3. |
adding Priority-Critical to make it consistent with notofonts/noto-fonts#543 |
@kamal, BTW, can you share test strings you used while developing/testing Noto Sans Lepcha? |
My limited tests with a few strings run identically with or without 'lepc' in GPOS (as expected). |
@kmansourMT: where are the text files? All I got are two png files in your comment. If you change the file extension to '.txt' before attaching, GitHub will make a link to the text file automatically. Anyway, I ran a somewhat more extensive (but still limited) test based on what the TUS says about Lepcha: test2.txt. Below is the result from harfbuzz trunk [1] with lepc removed from GPOS (keeping only DFLT in GPOS). The result is identical to the one generated with the current version (both lepc and DFLT in GPOS). This does not mean that the shaping is correct. In particular, the position of U+1C36 seems to be off (or maybe not). In addition, a sample text (which is NOT per the Unicode encoding model) from #2 is also shaped identically with or without lepc in GPOS. [1] It uses USE if lepc is present and the default shaping engine if lepc is absent. |
It seems GitHub munged the text files. I’ve attached them again here in zip format. |
@kmansourMT , please reply directly in github instead of replying by email. I didn't get a zip file either. At github.com bug tracker, text attachment works well as shown in my previous comment. |
The syllable structure is C(·)(R)(Y)(V)(^)(F) according to http://www.unicode.org/L2/L2005/05158-n2947-lepcha.pdf (from #2). BTW, this syllable structure differs from the one in Unicode 8.0/9.0, according to which U+1C36 (RAN; syllable modifier) comes at the end instead of right after 'V'. I can build a comprehensive set of syllables based on that document. Actually, I'll make both (one per the above document and one per the current Unicode). |
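A minimal sketch of how such a syllable list could be enumerated from the C(·)(R)(Y)(V)(^)(F) pattern, assuming that · is NUKTA (U+1C37), R and Y are the subjoined RA and YA (U+1C25, U+1C24), and ^ is RAN (U+1C36). The function name and the tiny demo inventories are hypothetical; this is not the generator actually used in this thread, and it applies no co-occurrence constraints.

```python
from itertools import product


def generate_syllables(consonants, nukta, ra, ya, vowels, ran, finals):
    """Yield every string matching C(nukta)(R)(Y)(V)(ran)(F).

    Each optional slot is modelled as "empty string or one character".
    """
    slots = [
        list(consonants),     # C: exactly one initial consonant
        ["", nukta],          # (·)
        ["", ra],             # (R)
        ["", ya],             # (Y)
        [""] + list(vowels),  # (V): at most one vowel sign
        ["", ran],            # (^)
        [""] + list(finals),  # (F): at most one final sign
    ]
    for parts in product(*slots):
        yield "".join(parts)


# Deliberately tiny demo inventories (assumption: a real run would pass
# the full consonant, vowel-sign and final-sign sets instead).
demo = list(generate_syllables(
    consonants=["\u1C00", "\u1C03"],
    nukta="\u1C37",
    ra="\u1C25",
    ya="\u1C24",
    vowels=["\u1C26", "\u1C27"],
    ran="\u1C36",
    finals=["\u1C2D", "\u1C35"],
))
print(len(demo))  # 2 * 2 * 2 * 2 * 3 * 2 * 3 = 288
```

For the Unicode 8.0/9.0 ordering, the RAN slot would move to the end of the `slots` list.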
I generated all the possible "syllables" and compared their shaping with and without lepc in GPOS. There's no difference in any of the cases. In addition to #2, other issues were found. For instance, U+1C29 collides with final consonants and ran. See the screenshot below. Anyway, dropping lepc from GPOS is shaping-neutral and I'll go ahead with that. |
U+1C34/LEPCHA CONSONANT SIGN NYIN-DO only occurs with inherent a (no vowel sign)
U+1C35/LEPCHA CONSONANT SIGN KANG only occurs with vowel signs
U+1C36/LEPCHA SIGN RAN only occurs with inherent a (no vowel sign) or U+1C27/LEPCHA VOWEL SIGN I
U+1C37/LEPCHA SIGN NUKTA only occurs with U+1C00 U+1C25, U+1C03 U+1C25, U+1C1D U+1C25, U+1C00 U+1C25 U+1C24, U+1C03 U+1C25 U+1C24 or U+1C1D U+1C25 U+1C24 |
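The co-occurrence rules in the comment above can be sketched as a filter over candidate syllables. This is only an illustration: the function and constant names are hypothetical, the vowel-sign range U+1C26..U+1C2C is an assumption taken from the Unicode Lepcha block, and the NUKTA check only tests which characters co-occur, not where NUKTA sits in the sequence.

```python
# Assumption: the Lepcha vowel signs are U+1C26..U+1C2C.
VOWEL_SIGNS = frozenset(range(0x1C26, 0x1C2D))
VOWEL_SIGN_I = 0x1C27
SUBJOINED_RA = 0x1C25
NYIN_DO, KANG, RAN, NUKTA = 0x1C34, 0x1C35, 0x1C36, 0x1C37
# Bases that NUKTA may combine with, per the rules above (ka, ga, ha).
NUKTA_BASES = frozenset({0x1C00, 0x1C03, 0x1C1D})


def violates_rules(syllable):
    """Return True if the syllable breaks any of the four rules."""
    cps = [ord(ch) for ch in syllable]
    vowels = [cp for cp in cps if cp in VOWEL_SIGNS]
    # NYIN-DO: only with the inherent vowel (no vowel sign).
    if NYIN_DO in cps and vowels:
        return True
    # KANG: only with a vowel sign.
    if KANG in cps and not vowels:
        return True
    # RAN: only with the inherent vowel or VOWEL SIGN I.
    if RAN in cps and any(v != VOWEL_SIGN_I for v in vowels):
        return True
    # NUKTA: only on ka/ga/ha followed (somewhere) by SUBJOINED RA.
    if NUKTA in cps and (cps[0] not in NUKTA_BASES or SUBJOINED_RA not in cps):
        return True
    return False


print(violates_rules("\u1C00\u1C27\u1C36"))  # RAN with vowel sign I
print(violates_rules("\u1C00\u1C26\u1C36"))  # RAN with vowel sign AA
```

Note that the RAN rule here follows the comment above, not TUS 8.0, which also lists U+1C26 (vowel sign AA).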
I used the above two rules. If my list of 'syllables' has instances violating the above rules, it's a bug in my generator.
The current version of Unicode (8.0) has the following:
That is, in addition to the inherent vowel (no vowel sign) and U+1C27 (vowel sign I), the TUS 8.0 says that it can also occur with U+1C26 (vowel sign AA). I did notice that TUS 8.0 differs from the original proposal you cited in #2 (http://www.unicode.org/L2/L2005/05158-n2947-lepcha.pdf which agrees with you), but went ahead with the TUS 8.0. So, which is correct? Can RAN be used with U+1C26 (vowel sign AA) or not?
Thanks for the reminder. I read the section on retroflex consonants and U+1C37 (Nukta) being only used with ka, ga, and ha, but forgot to add that constraint. I'll add it. |
Can U+1C37 (NUKTA) be used between U+1C0[03] / U+1C1D and U+1C24 (without U+1C25)? Thank you, |
@serkhang |
Assuming the answers to 1 and 2 are "No" (invalid) and "No" (cannot), I regenerated the syllable list (I also fixed a serious bug in my code; I was comparing unichr with integer). The list got much shorter. |
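The kind of bug mentioned above (comparing a character with an integer) is easy to reproduce. The snippet below is only an illustration, not the actual generator code, and it uses Python 3's `chr` as a stand-in for Python 2's `unichr`.

```python
NUKTA = 0x1C37   # code point as an integer
ch = "\u1C37"    # the same code point as a one-character string

# A string never compares equal to an integer, so this test silently
# fails and a filter built on it never matches anything.
print(ch == NUKTA)        # False

# Either side can be converted so that like is compared with like.
print(ord(ch) == NUKTA)   # True
print(ch == chr(NUKTA))   # True
```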
GPOS for DFLT is identical to that for lepc and there's no need to keep them both. GSUB has only 'DFLT' and is left alone. All the possible syllables were generated and shaped with and without 'lepc' in GPOS. No difference was observed for any of them. (see #451 and #395) Will fix #451
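A rough sketch of the with/without comparison described above. The two shaper arguments are hypothetical callables supplied by the caller (for example, thin wrappers around a shaping tool run against each font build); the fake shapers below exist only to make the example runnable and do not reflect real HarfBuzz output.

```python
def find_shaping_differences(syllables, shape_with_lepc, shape_dflt_only):
    """Return (syllable, result_a, result_b) for every differing result."""
    differences = []
    for syllable in syllables:
        result_a = shape_with_lepc(syllable)
        result_b = shape_dflt_only(syllable)
        if result_a != result_b:
            differences.append((syllable, result_a, result_b))
    return differences


# Hypothetical stand-in shapers: each returns (glyph name, x offset,
# y offset) tuples.
def fake_shaper_a(text):
    return [("uni%04X" % ord(ch), 0, 0) for ch in text]


def fake_shaper_b(text):
    glyphs = fake_shaper_a(text)
    # Pretend this shaper changes glyph order when U+1C27 is present.
    return glyphs[::-1] if "\u1C27" in text else glyphs


samples = ["\u1C00", "\u1C00\u1C27"]
print(find_shaping_differences(samples, fake_shaper_a, fake_shaper_a))
print(len(find_shaping_differences(samples, fake_shaper_a, fake_shaper_b)))
```

An empty result for the full syllable list is what "shaping-neutral" means in this thread; it says nothing about whether the shared output is itself correct.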
@jungshik: U+1C37/NUKTA only appears in the mentioned combinations. U+1C001C371C24 is NOT valid. U+1C4D, U+1C4E and U+1C4F are never combined with SUBJOINED YA or SUBJOINED RA. The attached file may be of help to you (though it's rather bulky). |
Re: RAN (U+1C36) and U+1C26 (LEPCHA VOWEL SIGN AA) The proposal ( http://www.unicode.org/L2/L2005/05158-n2947-lepcha.pdf ) has this
TUS 8.0 has
It's not clear what the proposal wants to say. Is it listing three cases: 1. the inherent vowel ('a'), 2. '-a' (a typo for '-aa'), 3. '-i'? Or did it mean two cases: 1. the inherent vowel (i.e. '-a'), 2. '-i'? It appears that it meant the latter (the inherent vowel or '-i'). In that case, the TUS needs to be revised. Has this issue been reported to the UTC? |
BTW, thank you for answering my questions. What I assumed in https://github.com/googlei18n/noto-fonts/issues/451#issuecomment-155614260 was fortunately correct. As for U+1C36 (RAN) and U+1C26 (vowel 'aa'), I've just updated the list to exclude those combinations. It has 1,378 fewer entries. The new file has 13,848 entries. |
Thanks for the file. Does it list all the possible 'syllables' in Lepcha in all languages written in Lepcha across regions ? I assumed that two medials (U+1C25 (RA) and U+1C24 (YA)) and their combination (rya) can come after any consonants (when nukta is not present). If your PDF is the full list, it appears that only a subset of initial consonants can combine with them.
The above applies when NUKTA is absent. When NUKTA is present, the rules given earlier apply.
|
Thank you for the reply. This has to be reported to the UTC. Are you going to? |
The same set of syllables as in @serkhang's PDF. I applied the constraints outlined in https://github.com/googlei18n/noto-fonts/issues/451#issuecomment-155923554 syllables.list_with_cp.unicode.reduced.txt There are 7,560 of them in total (84 sets of 90). |
Lazong_Rong.pdf contains the full set of syllables according to the traditional Lazóng syllabaries of the Lepcha in use across the whole area of settlement of the community (Sikkim, West Bengal, Nepal, Bhutan). For more details, see http://www.openbookpublishers.com/htmlreader/978-1-78374-062-8/3.Plaisier.xhtml#_idTextAnchor032 http://aachulay.blogspot.ch/2010/09/secret-and-concealed-lazaong-book-of.html @jungshik: Launched an error report for the UTC, today. |
Thanks. My list should match those listed in the file.
Thanks. I'll ping @roozbehp once more about them. |
moved from https://github.com/googlei18n/noto-alpha/issues/8
Imported from Google Code issue notofonts/noto-fonts#8 created by behdad@google.com on 2013-11-12T20:47:13.000Z:
For Lepcha, our recommendation is to NOT use the 'lepc' script tag if the font does not expect reordering to happen at the shaper level. So, I suggest removing it. HarfBuzz reorders left vowel marks for the 'lepc' script but not for 'DFLT'. So right now we may get wrong results if the font doesn't expect reordering. I'm interested in seeing a lean version without the pre-composed set, designed to work with a reordering shaper.
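To illustrate what reordering at the shaper level means here, the toy sketch below moves a left-side vowel sign in front of the base consonant of its syllable. It is not HarfBuzz's algorithm: the function names are hypothetical, treating U+1C27 as the only left vowel is an assumption made for the demo, and the base-letter ranges U+1C00..U+1C23 and U+1C4D..U+1C4F are assumed from the Unicode Lepcha block.

```python
# Assumption for the demo: U+1C27 is a left-side (pre-base) vowel sign.
LEFT_VOWELS = frozenset({"\u1C27"})


def is_base(ch):
    """Assumed base-letter ranges in the Lepcha block."""
    cp = ord(ch)
    return 0x1C00 <= cp <= 0x1C23 or 0x1C4D <= cp <= 0x1C4F


def reorder_left_vowels(text, left_vowels=LEFT_VOWELS):
    """Move each left vowel in front of the most recent base letter."""
    out = []
    base_index = None  # position in `out` of the current syllable's base
    for ch in text:
        if is_base(ch):
            base_index = len(out)
            out.append(ch)
        elif ch in left_vowels and base_index is not None:
            out.insert(base_index, ch)
            base_index += 1  # the base has shifted one slot to the right
        else:
            out.append(ch)
    return "".join(out)


logical = "\u1C00\u1C27"              # base, then vowel sign (logical order)
visual = reorder_left_vowels(logical)
print([hex(ord(ch)) for ch in visual])  # ['0x1c27', '0x1c00']
```

A font whose lookups expect the logical order would see different input once a shaper performs this step, which is the mismatch the recommendation above is about.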