Lepcha font should NOT use script tag 'lepc' #3

Closed
marekjez86 opened this issue Aug 4, 2015 · 28 comments · Fixed by notofonts/noto-fonts#558

Comments

@marekjez86

moved from https://github.com/googlei18n/noto-alpha/issues/8

Imported from Google Code issue notofonts/noto-fonts#8 created by behdad@google.com on 2013-11-12T20:47:13.000Z:

For Lepcha, our recommendation is to NOT use the 'lepc' script tag if the font does not expect reordering to happen at the shaper level. So, I suggest removing it. HarfBuzz reorders left vowel marks for the 'lepc' script but not for 'DFLT'. So right now we may get wrong results if the font doesn't expect reordering. I'm interested in seeing a lean version without the pre-composed set, designed to work with a reordering shaper.

@jungshik

jungshik commented Aug 4, 2015

/cc @kmansourMT

@behdad

behdad commented Aug 7, 2015

Applies to Kharoshthi as well, and perhaps a dozen or so other scripts, maybe more.

@xiangyexiao

@waksmonskiMT, @kmansourMT, what is the complete list of Noto fonts that should use 'DFLT' but do not yet?

@kmansourMT

Xiangye,
We followed a certain logic in determining which script tag to use. If a script was not yet supported by existing OT interpreters, we then examined whether it required any preprocessing such as reordering. If it did, then we would provide this preprocessing under the “default” script tag, in addition to all other OT features deemed necessary. In such cases, we also usually provided the OT features code, except for the preprocessing, under the real script tag. This dual approach allows the OpenType interpreter to render the script properly through either the real script path, or through the default script.

Kamal

Sent by email on Saturday, 17 October 2015 at 09:34.

@behdad

behdad commented Oct 21, 2015

> We followed a certain logic in determining which script tag to use. If a script was not yet supported by existing OT interpreters, we then examined whether it required any preprocessing such as reordering. If it did, then we would provide this preprocessing under the “default” script tag, in addition to all other OT features deemed necessary. In such cases, we also usually provided the OT features code, except for the preprocessing, under the real script tag. This dual approach allows the OpenType interpreter to render the script properly through either the real script path, or through the default script.

Hi Kamal,

Unfortunately this is problematic. At font design time, as you said, there was no script shaper for this script, so anything put under the real script tag is speculative and untested. If you had left that out, the font would keep working consistently through the DFLT script into the future; with the dual approach, fonts break now that we do have a script shaper for those scripts. At the very least, the fonts are completely untested with it and as such unreliable now.

I believe we need either versions of these fonts without the script tag, or to test and fix them under the Universal Shaping Engine. Personally I think you would want to do the former for phase 2, and the latter for phase 3.

@marekjez86

adding Priority-Critical to make it consistent with notofonts/noto-fonts#543

@jungshik

jungshik commented Nov 4, 2015

@kamal,
As noted by behdad@ in notofonts/noto-fonts#543, NotoSansLepcha has only DFLT in GSUB. GPOS has DFLT and lepc but their feature lists are identical. So, just dropping 'lepc' from GPOS should work.

BTW, can you share test strings you used while developing/testing Noto Sans Lepcha?
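
The DFLT/lepc comparison described above can be sketched without any font library by modelling a table's script list as a mapping from script tag to feature tags. The function name and the sample feature tags are hypothetical; real values would have to be read from the font's GPOS and GSUB tables, and matching feature tags alone do not prove that the underlying lookups are the same.

```python
def redundant_script_tags(script_features, default_tag="DFLT"):
    """Return script tags whose feature list matches the default script's.

    script_features: dict mapping script tag -> list of feature tags,
    a hypothetical stand-in for a GPOS or GSUB script list.
    """
    default = script_features.get(default_tag)
    if default is None:
        return []
    return sorted(
        tag for tag, feats in script_features.items()
        if tag != default_tag and sorted(feats) == sorted(default)
    )


# Illustrative data only: the feature tags are made up for the example.
gpos = {"DFLT": ["mark", "mkmk", "kern"], "lepc": ["kern", "mark", "mkmk"]}
gsub = {"DFLT": ["ccmp", "liga"]}

print(redundant_script_tags(gpos))  # ['lepc']
print(redundant_script_tags(gsub))  # []
```

A tag reported here is only a candidate for removal; shaping the test strings with and without it is still the real check.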

@jungshik

jungshik commented Nov 4, 2015

My limited tests with a few strings give identical results with or without 'lepc' in GPOS (as expected).

@kmansourMT

Here are two test files, along with their respective output. These date back to June 2014.

sample text lepcha syllables -wtle

sample text lepcha-wtle

@jungshik

jungshik commented Nov 5, 2015

@kmansourMT: where are the text files? All I got are two PNG files in your comment. If you change the file extension to '.txt' before attaching, GitHub will make a link to the text file automatically.

Anyway, I ran a somewhat more extensive (still limited) test based on what the TUS says about Lepcha.

test2.txt
(Some of the syllables are not valid, I believe. I put them in on purpose.)

And below is the result from HarfBuzz trunk [1] with lepc removed from GPOS (keeping only DFLT in GPOS). The result is identical to the one generated with the current version (both lepc and DFLT in GPOS). This does not mean that the shaping is correct. In particular, the position of U+1C36 seems to be off (or maybe not).

test2 new

In addition, a sample text (which is NOT per the Unicode encoding model) from #2 is also shaped identically with or without lepc in GPOS.

bug395.txt
bug395 new

[1] HarfBuzz trunk uses USE if lepc is present and the default shaping engine if lepc is absent.

@kmansourMT

It seems Github munged the text files. I’ve attached them again here in zip format.

Sent by email on Thursday, 5 November 2015 at 14:06.

@jungshik

jungshik commented Nov 5, 2015

@kmansourMT, please reply directly on GitHub instead of by email. I didn't get a zip file either. In the github.com issue tracker, text attachments work well, as shown in my previous comment.

@jungshik

jungshik commented Nov 6, 2015

The syllable structure is C(·)(R)(Y)(V)(^)(F) according to http://www.unicode.org/L2/L2005/05158-n2947-lepcha.pdf (from #2)

BTW, the above syllable structure is different from what's in Unicode 8.0/9.0, according to which U+1C36 (RAN; syllable modifier) comes at the end instead of right after 'V'.

I can build a comprehensive set of syllables based on that document. Actually, I'll make both (the above doc and the current Unicode).
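
As a rough illustration of the two orders, both structures can be written as regular expressions over the Lepcha block. This is a sketch, not the generator used in this thread: the variable names are made up, the character classes are taken from the Unicode code chart, and the "TUS" pattern simply moves RAN to the end as described above. Neither pattern encodes any co-occurrence restrictions.

```python
import re

CONS = "\u1C00-\u1C23\u1C4D-\u1C4F"  # letters KA..A plus TTA, TTHA, DDA
NUKTA = "\u1C37"
SUB_RA = "\u1C25"
SUB_YA = "\u1C24"
VOWEL = "\u1C26-\u1C2C"  # vowel signs AA..E
FINAL = "\u1C2D-\u1C35"  # consonant signs K..KANG
RAN = "\u1C36"

ONSET = f"[{CONS}]{NUKTA}?{SUB_RA}?{SUB_YA}?"
# Proposal (N2947) order: C(·)(R)(Y)(V)(^)(F)
PROPOSAL = re.compile(f"{ONSET}[{VOWEL}]?{RAN}?[{FINAL}]?")
# Order with RAN at the end, after the final consonant sign
TUS = re.compile(f"{ONSET}[{VOWEL}]?[{FINAL}]?{RAN}?")


def fits(pattern, text):
    """True if the whole string is one syllable under the given pattern."""
    return pattern.fullmatch(text) is not None


ran_before_final = "\u1C00\u1C27\u1C36\u1C2D"  # KA, I, RAN, final K
ran_after_final = "\u1C00\u1C27\u1C2D\u1C36"   # KA, I, final K, RAN

print(fits(PROPOSAL, ran_before_final), fits(TUS, ran_before_final))  # True False
print(fits(PROPOSAL, ran_after_final), fits(TUS, ran_after_final))    # False True
```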

@jungshik

jungshik commented Nov 7, 2015

I generated all the possible "syllables" and compared their shaping with and without lepc in GPOS. There is no difference in any of the cases.

syllable.list.unicode.txt

In addition to #2, other issues were found. For instance, U+1C29 collides with final consonants and ran. See the screenshot below.

image

Anyway, dropping lepc from GPOS is shaping-neutral and I'll go ahead with that.

@serkhang

serkhang commented Nov 8, 2015

U+1C34/LEPCHA CONSONANT SIGN NYIN-DO only occurs with inherent a (no vowel sign)

U+1C35/LEPCHA CONSONANT SIGN KANG only occurs with vowel signs

U+1C36/LEPCHA SIGN RAN only occurs with inherent a (no vowel sign) or U+1C27/LEPCHA VOWEL SIGN I

U+1C37/LEPCHA SIGN NUKTA only occurs with U+1C001C25, U+1C031C25, U+1C1D1C25, U+1C001C251C24, U+1C031C251C24 or U+1C1D1C251C24
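
The four rules above lend themselves to a small checker. This is a minimal sketch with hypothetical names; it tests co-occurrence only, not the order of the marks, and for RAN it follows the rule as stated here (inherent a or vowel sign I) rather than the TUS 8.0 wording.

```python
VOWEL_SIGNS = {chr(cp) for cp in range(0x1C26, 0x1C2D)}  # AA..E
NYIN_DO, KANG, RAN, NUKTA = "\u1C34", "\u1C35", "\u1C36", "\u1C37"
SUBJOINED_RA, VOWEL_I = "\u1C25", "\u1C27"
NUKTA_BASES = {"\u1C00", "\u1C03", "\u1C1D"}  # KA, GA, HA


def violations(syllable):
    """Return the co-occurrence rules that `syllable` breaks (empty if none)."""
    vowels = [ch for ch in syllable if ch in VOWEL_SIGNS]
    problems = []
    if NYIN_DO in syllable and vowels:
        problems.append("NYIN-DO with a vowel sign")
    if KANG in syllable and not vowels:
        problems.append("KANG without a vowel sign")
    if RAN in syllable and vowels and vowels != [VOWEL_I]:
        problems.append("RAN with a vowel sign other than I")
    if NUKTA in syllable and not (
        syllable[0] in NUKTA_BASES and SUBJOINED_RA in syllable
    ):
        problems.append("NUKTA outside KA/GA/HA + subjoined RA")
    return problems


print(violations("\u1C00\u1C37\u1C25"))  # []
print(violations("\u1C00\u1C26\u1C36"))  # ['RAN with a vowel sign other than I']
```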

@jungshik

> U+1C34/LEPCHA CONSONANT SIGN NYIN-DO only occurs with inherent a (no vowel sign)
> U+1C35/LEPCHA CONSONANT SIGN KANG only occurs with vowel signs

I used the above two rules. If my list of 'syllables' has instances violating the above rules, it's a bug in my generator.

> U+1C36/LEPCHA SIGN RAN only occurs with inherent (no vowel sign) or
> U+1C27/LEPCHA VOWEL SIGN I

The current version of Unicode (8.0) has the following:

> The combining mark U+1C36 lepcha sign ran occurs only after the inherent vowel -a or
> the dependent vowels -aa and -i.

That is, in addition to the inherent vowel (no vowel sign) and U+1C27 (vowel sign I), the TUS 8.0 says that it can also occur with U+1C26 (vowel sign AA). I did notice that TUS 8.0 differs from the original proposal you cited in #2 (http://www.unicode.org/L2/L2005/05158-n2947-lepcha.pdf which agrees with you), but went ahead with the TUS 8.0.

So, which is correct? Can RAN be used with U+1C26 (vowel sign AA) or not?

> U+1C37/LEPCHA SIGN NUKTA only occurs with U+1C001C25, U+1C031C25, U+1C1D1C25,
> U+1C001C251C24, U+1C031C251C24 or U+1C1D1C251C24

Thanks for the reminder. I read the section on retroflex consonants and U+1C37 (Nukta) being only used with ka, ga, and ha, but forgot to add that constraint. I'll add it.

@jungshik

Can U+1C37 (NUKTA) be used between U+1C0[03] / U+1C1D and U+1C24 (without U+1C25)?
e.g., is U+1C00 U+1C37 U+1C24 valid? @serkhang

Thank you,

@jungshik

@serkhang
Another question. U+1C4[D-F] (TTA, TTHA, DDA) cannot be used with U+1C25 (subjoined RA) and U+1C24 (subjoined YA), can they?

@jungshik

  1. Can U+1C37 (NUKTA) be used between U+1C0[03] / U+1C1D and U+1C24 (without U+1C25)?
    e.g., is U+1C00 U+1C37 U+1C24 valid? @serkhang
  2. U+1C4[D-F] (TTA, TTHA, DDA) cannot be used with U+1C25 (subjoined RA) and U+1C24 (subjoined YA), can they?

Assuming the answers to 1 and 2 are "No" (invalid) and "No" (cannot), I regenerated the syllable list. I also fixed a serious bug in my code: I was comparing the result of unichr() with an integer. The list got much shorter.
syllables.list_with_cp.unicode.txt
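
For readers unfamiliar with this class of bug: comparing a one-character string with an integer is always false and raises no error, so a filter written that way never fires. The original generator is not shown in this thread; the sketch below uses hypothetical helper names and Python 3's `chr`/`ord`, which behave like Python 2's `unichr`/`ord` in this respect.

```python
NUKTA = 0x1C37  # code point kept as an integer


def has_nukta_buggy(syllable):
    # Compares one-character strings with an int: always False, no error.
    return any(ch == NUKTA for ch in syllable)


def has_nukta(syllable):
    # Compare code points with code points.
    return any(ord(ch) == NUKTA for ch in syllable)


s = "\u1C00\u1C37\u1C25"  # KA, NUKTA, subjoined RA
print(has_nukta_buggy(s), has_nukta(s))  # False True
```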

jungshik referenced this issue in jungshik/noto-fonts Nov 11, 2015
GPOS for DFLT is identical to that for lepc and there's no need to
keep them both.

GSUB has only 'DFLT' and is left alone.

All the possible syllables were generated and shaped with and
without 'lepc' in GPOS. No difference was observed for any of them.
(see #451 and #395)

Will fix #451
@serkhang

@jungshik:
Initials (base characters) may have U+1C36 OR U+1C26, but never in combination. That's another canard of the PUBLISHED specs (as opposed to the APPROVED specs).

U+1C37/NUKTA only appears in the mentioned combinations. U+1C001C371C24 is NOT valid.

U+1C4D, U+1C4E and U+1C4F are never combined with SUBJOINED YA or SUBJOINED RA.

The attached file may be of help to you (though it's rather bulky).
Lazong_Rong.pdf

@jungshik

Re: RAN (U+1C36) and U+1C26 (LEPCHA VOWEL SIGN AA)

The proposal ( http://www.unicode.org/L2/L2005/05158-n2947-lepcha.pdf ) has this

> the diacritical mark RAN (^)—only after the inherent vowel or -a or -i, never with any of the other vowels

TUS 8.0 has

> The combining mark U+1C36 lepcha sign ran occurs only after the inherent vowel -a or
> the dependent vowels -aa and -i.

It's not clear what the proposal means. Is it listing three cases: (1) the inherent vowel ('a'), (2) '-a' (a typo for '-aa'), and (3) '-i'? Or does it mean two cases: (1) the inherent vowel (i.e. '-a') and (2) '-i'?

It appears that it meant the latter ( the inherent vowel or '-i').

In that case, the TUS needs to be revised. Has this issue been reported to the UTC?

@jungshik

BTW, thank you for answering my questions. What I assumed in https://github.com/googlei18n/noto-fonts/issues/451#issuecomment-155614260 was fortunately correct.

As for U+1C36 (RAN) and U+1C26 (vowel 'aa'), I've just updated the list to exclude those combinations. It has 1,378 fewer entries. The new file has 13,848 entries.

syllables.list_with_cp.unicode.txt

@serkhang

Absence of the combination U+1C261C36 is maintained by

Mainwaring 1876, Diringer 1951, Haarh 1959, Chakraborty 1978, Kai 2003, Plaisier 2006, Tamsangmoo 2010 (see attachment).

Though Everson's wording is a bit unfortunate, I'd expect him to have said '-aa' had he meant U+1C26.
tamsangmoo_2010

@jungshik

> The attached file may be of help to you (though it's rather bulky).
> Lazong_Rong.pdf

Thanks for the file. Does it list all the possible 'syllables' in all languages written in Lepcha, across regions?

I assumed that the two medials (U+1C25 (RA) and U+1C24 (YA)) and their combination (rya) can come after any consonant (when nukta is not present). If your PDF is the full list, it appears that only a subset of initial consonants can combine with them.

  1. Combine with RA (U+1C25): KA, GA, NGA, PA, FA, BA, MA, HA
  2. Combine with YA (U+1C24): KA, KHA, GA, TA, THA, DA, PA, PHA, FA, VA, MA, RA (U+1C1B), LA, HA, BA, A (U+1C23), KLA (U+1C01), GLA, PLA, FLA, BLA, MLA, HLA
  3. Combine with "RA + YA" (U+1C25, U+1C24): KA, GA, NGA, PA, FA, BA, MA, HA

The above applies when NUKTA is absent.

When NUKTA is present, the rules were already given above.

  1. Ci + NUKTA + RA where Ci is KA, GA, or HA
  2. Ci + NUKTA + RA + YA where Ci is KA, GA or HA
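
The two lists above can be captured as lookup sets keyed by character name. This is a sketch with hypothetical names, not the code used to build the reduced list; it assumes a bare initial carries no restriction.

```python
WITH_RA = {"KA", "GA", "NGA", "PA", "FA", "BA", "MA", "HA"}
WITH_YA = {
    "KA", "KHA", "GA", "TA", "THA", "DA", "PA", "PHA", "FA", "VA", "MA",
    "RA", "LA", "HA", "BA", "A", "KLA", "GLA", "PLA", "FLA", "BLA", "MLA",
    "HLA",
}
WITH_RA_YA = set(WITH_RA)        # same initials as for RA alone
WITH_NUKTA = {"KA", "GA", "HA"}  # Ci + NUKTA + RA (+ YA)


def onset_allowed(initial, nukta=False, ra=False, ya=False):
    """Check an initial plus optional NUKTA and medials against the lists."""
    if nukta:
        return initial in WITH_NUKTA and ra
    if ra and ya:
        return initial in WITH_RA_YA
    if ra:
        return initial in WITH_RA
    if ya:
        return initial in WITH_YA
    return True  # assumption: a bare initial is always allowed


medial_onsets = (
    len(WITH_RA) + len(WITH_YA) + len(WITH_RA_YA) + 2 * len(WITH_NUKTA)
)
print(medial_onsets)  # 45
```

Adding the 39 bare consonant letters (U+1C00 to U+1C23 plus U+1C4D to U+1C4F) would give 84 onsets. Whether that is how the later figure of 84 sets was arrived at is an inference, not something stated in this thread.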

@jungshik

> Absence of the combination U+1C261C36 is maintained by
>
> Mainwaring 1876, Diringer 1951, Haarh 1959, Chakraborty 1978, Kai 2003, Plaisier 2006, Tamsangmoo 2010 (see attachment).

Thank you for the reply. This has to be reported to the UTC. Are you going to?

@jungshik

The same set of syllables as in @serkhang's PDF. I applied the constraints outlined in https://github.com/googlei18n/noto-fonts/issues/451#issuecomment-155923554

syllables.list_with_cp.unicode.reduced.txt

There are 7,560 of them in total (84 sets of 90).
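
The thread does not spell out how the 90 is composed. One reading that happens to reproduce it is to enumerate vowel, final and RAN combinations under the co-occurrence rules given earlier (NYIN-DO only with inherent a, KANG only with a vowel sign, RAN only with inherent a or -i). The sketch below is that reconstruction, with hypothetical names, and is not taken from the actual generator.

```python
from itertools import product

VOWELS = [None, "AA", "I", "O", "OO", "U", "UU", "E"]  # None = inherent a
FINALS = [None, "K", "M", "L", "N", "P", "R", "T", "NYIN-DO", "KANG"]


def rhyme_ok(vowel, final, ran):
    """Apply the co-occurrence rules to one vowel/final/RAN combination."""
    if final == "NYIN-DO" and vowel is not None:
        return False  # NYIN-DO only with the inherent vowel
    if final == "KANG" and vowel is None:
        return False  # KANG only with a vowel sign
    if ran and vowel not in (None, "I"):
        return False  # RAN only with inherent a or -i
    return True


rhymes = [r for r in product(VOWELS, FINALS, [False, True]) if rhyme_ok(*r)]
print(len(rhymes))       # 90
print(84 * len(rhymes))  # 7560
```

Without RAN each of the 8 vowel choices keeps 9 finals (72 rhymes); with RAN only the inherent vowel and -i remain, 9 finals each (18 more).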

@serkhang

Lazong_Rong.pdf contains the full set of syllables according to the traditional Lazóng syllabaries of the Lepcha in use across the whole area of settlement of the community (Sikkim, West Bengal, Nepal, Bhutan). For more details, see

http://www.openbookpublishers.com/htmlreader/978-1-78374-062-8/3.Plaisier.xhtml#_idTextAnchor032

http://aachulay.blogspot.ch/2010/09/secret-and-concealed-lazaong-book-of.html

@jungshik: Launched an error report for the UTC, today.

@jungshik

> Lazong_Rong.pdf contains the full set of syllables according to t

Thanks. My list should match those listed in the file.

> Launched an error report for the UTC, today.

Thanks. I'll ping @roozbehp once more about them.
