Combining commas above are inconsistent with those below #186

GoogleCodeExporter · 2015-06-08T18:31:37Z

The issue applies to combining commas above:
• [o̒] U+0312 COMBINING TURNED COMMA ABOVE
• [o̓] U+0313 COMBINING COMMA ABOVE
• [o̔] U+0314 COMBINING REVERSED COMMA ABOVE
• [o̕] U+0315 COMBINING COMMA ABOVE RIGHT

combining comma below:
• [o̦] U+0326 COMBINING COMMA BELOW

modifier commas above:
• [ʻ] U+02BB MODIFIER LETTER TURNED COMMA
• [ʼ] U+02BC MODIFIER LETTER APOSTROPHE
• [ʽ] U+02BD MODIFIER LETTER REVERSED COMMA

and certain precomposed characters with cedilla or comma below:
• ĢĶĻŅŖȘȚ ģķļņŗșț

I believe that the comma in all these characters should look the same, even
though for historical reasons the Latvian letters ģķļņŗ are treated as if
they had a cedilla rather than a comma below or a turned comma above.
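
The "treated as if they had a cedilla" point can be checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `attached_mark` is made up for illustration, and the letters are written as escapes to avoid any normalization surprises in the source file.

```python
import unicodedata

def attached_mark(ch: str) -> str:
    """Name of the combining mark in the canonical decomposition of ch."""
    parts = unicodedata.decomposition(ch).split()  # e.g. ["0067", "0327"]
    return unicodedata.name(chr(int(parts[-1], 16)))

# U+0123, U+0137, U+013C, U+0146, U+0157 are the Latvian letters;
# U+0219 and U+021B are the Romanian letters with comma below.
for ch in "\u0123\u0137\u013c\u0146\u0157\u0219\u021b":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {attached_mark(ch)}")
```

The Latvian letters should report COMBINING CEDILLA and the Romanian ones COMBINING COMMA BELOW, regardless of how a font draws them.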

Unfortunately, in Noto Serif the combining commas above and the modifier commas
above have a bulb, while the combining comma below and the comma in all precomposed
characters have a simpler shape. In Noto Sans they also differ in size.

They might all have the simpler / smaller comma shape.

Why do I care? My conscript uses several letters with comma below and turned
comma above (b̦d̦f̦l̦m̦n̦p̒r̦șțv̦z̦). They must use the combining
characters (except for șț), and they look inconsistent in Noto fonts.
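
As a rough illustration of why only șț escape the combining characters, here is a small sketch that asks whether NFC normalization fuses a base letter and a mark into one precomposed code point. The helper name `composes` is hypothetical; it relies only on Python's standard `unicodedata.normalize`.

```python
import unicodedata

COMMA_BELOW = "\u0326"         # COMBINING COMMA BELOW
TURNED_COMMA_ABOVE = "\u0312"  # COMBINING TURNED COMMA ABOVE

def composes(base: str, mark: str) -> bool:
    """True if NFC fuses base + mark into a single precomposed code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1

# Letters from the conscript that take a comma below.
for base in "bdflmnrstvz":
    print(base, composes(base, COMMA_BELOW))

# The one letter that takes a turned comma above.
print("p", composes("p", TURNED_COMMA_ABOVE))
```

Only `s` and `t` should come back as composing; every other letter stays a base letter plus a separate combining mark, so its rendering depends on the font's combining glyph.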

Original issue reported on code.google.com by qrc...@google.com on 7 Aug 2014 at 2:23


nizarsq · 2020-07-23T04:25:09Z

verdy-p · 2022-02-20T16:07:28Z

Why use simple (wedge) shapes for the combining commas in Noto Serif? IMHO:

• the simpler wedge shape is well suited to Noto Sans (and should match the shapes of the comma, apostrophe, and quotation punctuation signs).
• the bulb shape is better suited to Noto Serif (and should likewise match the shapes of the comma, apostrophe, and quotation punctuation signs).

In all these cases, we should be able to use variation selectors to select the shape of punctuation signs, but for now Unicode has no mechanism for using variation selectors with combining characters. The same goes for precomposed characters, which must preserve their canonical equivalence: it is not obvious whether a variation selector placed after a precomposed character applies to the base character or to one of its embedded diacritics. In that case, a base variation selector after a precomposed character should apply only to its base, if such a variation is registered for that base, and never to its embedded diacritics. A combining variation selector with combining class 0 should be valid only after a combining character, or after a precomposed character, in which case it would alter only the embedded combining character with the highest non-zero combining class, in order to preserve canonical equivalence.

CGJ is already the first combining variation selector. It is limited, however, and does not indicate a variation of shape when it is used between two diacritics: it just allows fixing the logical ordering/stacking, overriding the default order implied by normalization, when it is placed before a combining character with a non-zero combining class and the previous combining character has a higher combining class. Beside that usage, CGJ has no defined meaning before other combining characters with non-zero combining classes, except for some South Asian scripts, where it does not really encode a variation but a simultaneous change of semantics and of the way the following combining character behaves in a cluster: for example, to encode distinctive REPHA-like forms of Indic combining letters forming conjuncts, by using <CGJ+VIRAMA/HALANT> before the next base consonant, or to disable this default behavior for complex clusters of such scripts. Is there a need for a few more combining variation selectors (CGJ2, CGJ3...) to be encoded by Unicode (in the few remaining unallocated code points in the BMP), and to ask Unicode to extend its registry of variation sequences to combining diacritics as well (this would also require some precision in the existing Unicode definition of combining sequences), unless we encode them as <CGJ+VSn>? Note also that some encoded scripts add their own variation selectors (not using the generic ones), also because they need distinctive semantics and do not allow optional or free change of the possible forms.
But if we want to explicitly select between two alternate forms (wedge-like, or with bulb) for combining comma-like diacritics or punctuation, we need at least 2 different combining variation selectors. This seems to have been neglected in the early Unicode discussions, when these alternate forms were unified under the same encoded character on the assumption that they share the same semantics (which makes it possible for different forms to be used by default, for example between Noto Sans and Noto Serif).
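
The ordering role of CGJ described above can be seen with two of the comma marks from this issue. This is only a sketch with Python's standard `unicodedata` module, and it shows normalization behavior, not rendering: U+0313 has combining class 230 and U+0326 has class 220, so canonical ordering moves the comma below in front of the comma above unless a CGJ (class 0) sits between them.

```python
import unicodedata

COMMA_ABOVE = "\u0313"  # COMBINING COMMA ABOVE, combining class 230
COMMA_BELOW = "\u0326"  # COMBINING COMMA BELOW, combining class 220
CGJ = "\u034f"          # COMBINING GRAPHEME JOINER, combining class 0

plain = "o" + COMMA_ABOVE + COMMA_BELOW
joined = "o" + COMMA_ABOVE + CGJ + COMMA_BELOW

# Canonical ordering sorts adjacent marks by combining class.
nfd_plain = unicodedata.normalize("NFD", plain)
# CGJ has class 0, so the marks on either side of it are not reordered.
nfd_joined = unicodedata.normalize("NFD", joined)

for label, text in (("plain", nfd_plain), ("joined", nfd_joined)):
    print(label, " ".join(f"U+{ord(c):04X}" for c in text))
```

Nothing here selects a glyph shape; it only shows that CGJ changes which sequences are canonically equivalent.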

Of course they must always be consistent between precomposed characters and combining diacritics, apart from their placement or rotation in the combinations used in Baltic languages. There should also be a way to cancel that default change of position or rotation for non-Baltic languages that use them at their normal position, for example by encoding a CGJ between the base letter and the diacritic, so that the sequence is no longer canonically equivalent to the Baltic combinations, which must remain consistent regardless of whether they were encoded as precomposed characters or as base letters plus a separate diacritic.
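
The canonical-equivalence part of this proposal can be sketched with Python's standard `unicodedata` module. The example only shows that a CGJ between base and diacritic stops NFC from composing the Latvian letter; whether any font or shaping engine would then draw the cedilla differently is the proposal itself, not existing behavior.

```python
import unicodedata

CEDILLA = "\u0327"  # COMBINING CEDILLA
CGJ = "\u034f"      # COMBINING GRAPHEME JOINER

# g + cedilla is canonically equivalent to the precomposed U+0123.
composed = unicodedata.normalize("NFC", "g" + CEDILLA)

# With a CGJ in between, the sequence no longer composes.
blocked = unicodedata.normalize("NFC", "g" + CGJ + CEDILLA)

print([f"U+{ord(c):04X}" for c in composed])
print([f"U+{ord(c):04X}" for c in blocked])
```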

This consideration should also apply to the Romanian-like usage of cedillas that may look like comma diacritics: the change of shape and attachment used by default for Romanian should likewise be disabled with a CGJ, to preserve the position and shape of the cedilla below without having to use language-specific features. Those features force documents to use correct language identification in rich-text formats for multilingual documents, which is not possible in plain text, where language identification applies only to the whole document. We should not depend on encoding language tags with special characters in plane 14 either; that is now discouraged and was never needed for rich-text formats like HTML. Language-specific OpenType "feature" tables should be avoided as much as possible, given that Unicode now also supports variation selectors when needed, and the encoding and usage of variants are defined by the Unicode Standard itself in the UCD.

simoncozens · 2023-01-19T12:46:50Z

To summarise where we are with this: precomposed forms with comma above use a different form of comma to the comma below and combining comma above. In Serif, it's a different form; in Sans, it's a different size. A good test string for the inconsistency is ģn̦p̒.

I think this is clearly a bug; unfortunately I think the fix is less clear. I personally feel like bulb-shaped commas everywhere for the serif would make sense, and maybe large comma accents everywhere in the sans would make sense, but I don't have enough design experience to know. I think for now the answer is "ask @moyogo". :-)

(I don't believe that the Unicode CGJ suggestion is a good way forward; we don't really want to be innovating our own Unicode conventions. Romanian cedilla shape is already selectable in the font through setting the language tag, and yes, not many applications correctly support that (browsers do!) but that's an application issue. We do the right thing, even if they don't.)

GoogleCodeExporter added the Type-Defect label Jun 8, 2015

marekjez86 assigned waksmonskiMT Jul 24, 2015

xiangyexiao added the FoundIn-1.x label Aug 5, 2015

roozbehp added the Android label Oct 14, 2015

roozbehp added the Priority-Medium label Feb 17, 2016

davelab6 unassigned waksmonskiMT Apr 16, 2019

marekjez86 added the Script-LatinGreekCyrillic label Jun 22, 2020

nizarsq added the in-evaluation label Jul 23, 2020

nizarsq assigned marekjez86 Jul 23, 2020

marekjez86 added the design-question label Aug 30, 2020

moyogo mentioned this issue Mar 10, 2021

Update tests LGC/123 and Telugu/346 notofonts/noto-source#349

Merged

simoncozens transferred this issue from notofonts/noto-fonts Jun 20, 2022

simoncozens removed the Script-LatinGreekCyrillic label Jan 11, 2023

simoncozens added Serif Sans Latin and removed in-evaluation Type-Defect Android FoundIn-1.x labels Jan 19, 2023

simoncozens added the Triaged label Jan 19, 2023

moyogo mentioned this issue Apr 8, 2023

[Question] Possible to define a standard tadpole-like (or curly) comma/quotation mark for Noto Sans series? #411

Open

simoncozens unassigned marekjez86 Jun 3, 2023
