Kannada Ra-Virama-ZWJ gives different results from Windows 10 #435

Closed
devosb opened this issue Mar 4, 2017 · 4 comments

Comments

@devosb

commented Mar 4, 2017

The character sequence Ra-Virama-ZWJ in Kannada script renders differently in HarfBuzz than in Windows 10. Specifically, HarfBuzz renders the virama visibly, just as if ZWJ had been replaced by ZWNJ. In Windows 10, the virama causes a subjoined form to appear.
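For readers who want to reproduce the input, here is a minimal Python sketch that spells out the code points involved. The following consonant (KA) is an assumption chosen only for illustration; the report itself does not say which consonant follows, and the actual test text is in the attached renderdiff.txt.

```python
import unicodedata

RA = "\u0CB0"      # KANNADA LETTER RA
VIRAMA = "\u0CCD"  # KANNADA SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
KA = "\u0C95"      # KANNADA LETTER KA (assumed following consonant, illustration only)


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


reported = RA + VIRAMA + ZWJ + KA    # the ordering described in this report
with_zwnj = RA + VIRAMA + ZWNJ + KA  # the ordering HarfBuzz's output resembled

for label, seq in (("reported", reported), ("with ZWNJ", with_zwnj)):
    print(label)
    for line in describe(seq):
        print("  " + line)
```

Writing either string to a UTF-8 file gives input that can be passed to hb-view or pasted into Notepad for a side-by-side comparison.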

The file harfbuzz.png was generated from hb-view on Ubuntu Xenial, using the latest git sources. The file windows10.png was generated on Windows 10 using Notepad. The font in both cases was Noto Sans Kannada (Regular), version 1.04. The source text is the file renderdiff.txt. The word in the second line of the example comes from the Kannada Wikipedia wordlist that was used to test HarfBuzz.

I also tested with the fonts

  • Noto Sans Kannada
  • Noto Sans Kannada UI
  • Noto Serif Kannada
  • Nirmala UI (Windows 10 only)
  • Tunga (Windows 10 and 7 only)
  • Lohit Kannada

with various combinations of Notepad, LibreOffice 5.1, LibreOffice 5.2, Windows 10, Windows 7, and Ubuntu Xenial with HarfBuzz both as packaged by Ubuntu and compiled from source. The difference seemed to depend on whether the OpenType rendering was done by HarfBuzz or by Microsoft DirectWrite (I don't think Uniscribe or USE would have been involved).
Attachments: harfbuzz.png, windows10.png, renderdiff.txt

@devosb

Author

commented Mar 6, 2017

I apologize, I did not see #341 before posting. I did test in Word 2016, on Windows 10, and Word gives the same result as Notepad on Windows 10.

@behdad

Member

commented Jul 14, 2017

According to the discussion in #341, this sequence is undefined in Kannada, so I'm not marking this as a bug but as an enhancement request to match what Windows does.

@behdad added the Android label Jul 14, 2017

@behdad

Member

commented Oct 3, 2017

But it looks to us like Uniscribe is reordering the broken sequence (Ra,Virama,ZWJ) into the correct one (Ra,ZWJ,Virama) before processing lookups in the font. I've asked @PeterCon to confirm, before we implement.
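The reordering described here can be sketched at the text level as follows. This is only an illustration of the idea, not HarfBuzz's or Uniscribe's actual implementation: the function name `reorder_legacy_ra` is hypothetical, and a real shaper would do this on its glyph or character buffer rather than on a Python string.

```python
RA = "\u0CB0"      # KANNADA LETTER RA
VIRAMA = "\u0CCD"  # KANNADA SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER
KA = "\u0C95"      # KANNADA LETTER KA (arbitrary consonant for the example)


def reorder_legacy_ra(text):
    """Rewrite every Ra,Virama,ZWJ run as Ra,ZWJ,Virama.

    Hypothetical helper: it mimics the pre-lookup reordering that
    Uniscribe appears to apply, so both orderings shape the same way.
    """
    return text.replace(RA + VIRAMA + ZWJ, RA + ZWJ + VIRAMA)


legacy = RA + VIRAMA + ZWJ + KA
current = RA + ZWJ + VIRAMA + KA

# After reordering, the legacy input is identical to the current form.
print(reorder_legacy_ra(legacy) == current)
# Input already in the current order, or without ZWJ, is left untouched.
print(reorder_legacy_ra(current) == current)
print(reorder_legacy_ra(RA + VIRAMA + KA) == RA + VIRAMA + KA)
```

Because the rewrite only fires when Ra immediately precedes Virama,ZWJ, other consonants followed by Virama,ZWJ keep their usual meaning.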

@behdad

Member

commented Oct 10, 2017

Here's what Peter Constable wrote to me:

For Kannada character sequences < RA, VIRAMA, consonant >, there’s potential ambiguity as to whether the display should be [ gRA, gConsonant.subjoined ] or [ gConsonant, gReph ]. On pages 499-500 of Unicode 10.0 (section 12.8 — http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf), it specifies that a sequence of < RA, ZWJ, VIRAMA, consonant > be used to represent text that needs to be rendered [gRA, gConsonant.subjoined ].

However, things were not always specified that way. If you look back in Unicode 4.0, section 9.8 (http://www.unicode.org/versions/Unicode4.0.0/ch09.pdf), it was actually specified the other way around: the sequence < RA, VIRAMA, ZWJ, Consonant > was specified to represent [ gRA, gConsonant.subjoined ]. This was changed in Unicode 5 after it came to light that there were inconsistent specifications for different Indic scripts.

This all arose as a result of various things involving Indic scripts all happening in 2004, one of which was publication of a draft for a Sri Lanka standard (http://www.unicode.org/cgi-bin/GetMatchingDocs.pl?L2/04-131). This came up for discussion at the UTC meeting in June 2004 (see http://www.unicode.org/cgi-bin/GetL2Ref.pl?99-C37). At the time, I was working on updates to the Indic shaping engine in Uniscribe, and I was aware of some of the inconsistencies (e.g., the draft for the Sri Lanka standard specifying the opposite of what Unicode had specified for Kannada RA and for Bangla ya-phalaa). So, I was given a UTC action item to prepare a doc regarding the general issue for Indic scripts. That eventually led to issuing Public Review Issue #37 (http://www.unicode.org/review/pr-37.pdf), which proposed having a consistent specification of ZWJ sequences across Indic scripts. (See the last page for specific changes for Kannada and Bangla.) That proposal was adopted at the UTC meeting in August 2004 (http://www.unicode.org/cgi-bin/GetL2Ref.pl?100-C22).

So, Kannada sequences < RA, VIRAMA, ZWJ, Consonant > aren’t currently recommended, but earlier on they were.

To accommodate previously-existing docs, Uniscribe does have a special-case behaviour:

             // For compatibility with legacy useage in Kannada,
             // Ra+h+ZWJ must behave like Ra+ZWJ+h...
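The three orderings discussed in the quoted message can be told apart mechanically. The sketch below is only a reading aid under stated assumptions: the function name and the labels "current", "legacy", and "plain" are hypothetical, and it merely classifies the start of a string according to the history described above (Unicode 4.0 versus Unicode 5 and later). It does not perform any shaping.

```python
RA = "\u0CB0"      # KANNADA LETTER RA
VIRAMA = "\u0CCD"  # KANNADA SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER
KA = "\u0C95"      # KANNADA LETTER KA (arbitrary consonant for the example)


def classify_ra_sequence(text):
    """Label how a string beginning with Kannada RA encodes its conjunct.

    Hypothetical labels:
      "current" - <RA, ZWJ, VIRAMA, ...>, the ordering specified since Unicode 5
      "legacy"  - <RA, VIRAMA, ZWJ, ...>, the ordering specified in Unicode 4.0
      "plain"   - <RA, VIRAMA, ...> with no joiner, which the message calls
                  potentially ambiguous between subjoined and reph display
      "other"   - anything else
    """
    if text.startswith(RA + ZWJ + VIRAMA):
        return "current"
    if text.startswith(RA + VIRAMA + ZWJ):
        return "legacy"
    if text.startswith(RA + VIRAMA):
        return "plain"
    return "other"


for seq in (RA + ZWJ + VIRAMA + KA, RA + VIRAMA + ZWJ + KA, RA + VIRAMA + KA, KA):
    print([f"U+{ord(ch):04X}" for ch in seq], classify_ra_sequence(seq))
```

Under the compatibility behaviour quoted from Uniscribe, input labelled "legacy" here is meant to render the same way as input labelled "current".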

@behdad closed this in fa48ccb Oct 12, 2017

devosb added a commit to nlci/knda-font-badami that referenced this issue Oct 16, 2017
Test data for a HarfBuzz bug
HarfBuzz differed from Microsoft when displaying Ra,H,ZWJ sequences.
Details at harfbuzz/harfbuzz#435