Combining commas above are inconsistent with those below #186

Open
GoogleCodeExporter opened this issue Jun 8, 2015 · 3 comments

@GoogleCodeExporter

The issue applies to combining commas above:
• [o̒] U+0312 COMBINING TURNED COMMA ABOVE
• [o̓] U+0313 COMBINING COMMA ABOVE
• [o̔] U+0314 COMBINING REVERSED COMMA ABOVE
• [o̕] U+0315 COMBINING COMMA ABOVE RIGHT

combining comma below:
• [o̦] U+0326 COMBINING COMMA BELOW

modifier commas above:
• [ʻ] U+02BB MODIFIER LETTER TURNED COMMA
• [ʼ] U+02BC MODIFIER LETTER APOSTROPHE
• [ʽ] U+02BD MODIFIER LETTER REVERSED COMMA

and certain precomposed characters with cedilla or comma below:
• ĢĶĻŅŖȘȚ ģķļņŗșț

I believe that the comma in all these characters should look the same, even 
though for historic reasons the Latvian letters ģķļņŗ are treated as if 
they had a cedilla rather than a comma below or turned comma above.
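
The encoding side of this can be checked with Python's standard `unicodedata` module. This is a minimal sketch, and the helper name `mark_of` is only for illustration: it reports which combining mark each precomposed letter carries in its canonical decomposition.

```python
import unicodedata

def mark_of(ch: str) -> str:
    """Name of the combining mark in the canonical decomposition of ch."""
    # decomposition() returns hex code points such as "0067 0327":
    # the base letter first, then the combining mark.
    parts = unicodedata.decomposition(ch).split()
    return unicodedata.name(chr(int(parts[-1], 16)))

# Lowercase Latvian letters (U+0123, U+0137, U+013C, U+0146, U+0157)
# followed by the two Romanian letters (U+0219, U+021B).
LETTERS = "\u0123\u0137\u013C\u0146\u0157\u0219\u021B"

for ch in LETTERS:
    print(f"U+{ord(ch):04X} {ch} -> {mark_of(ch)}")
```

The Latvian letters should report COMBINING CEDILLA and the Romanian ones COMBINING COMMA BELOW, which is the historical split described above. It says nothing about how a font draws the mark.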

Unfortunately, in Noto Serif the combining commas above and modifier commas 
above have a bulb, while the combining comma below and the comma in all precomposed 
characters have a simpler shape. In Noto Sans they also differ in size.

They could all use the simpler / smaller comma shape.

Why do I care? My conscript uses several letters with comma below and turned 
comma above (b̦d̦f̦l̦m̦n̦p̒r̦șțv̦z̦). They must use the combining 
characters (except for șț), and they look inconsistent in Noto fonts.
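
Why only șț escape the combining characters can be seen from NFC normalization. A small sketch using the standard `unicodedata` module (the helper `nfc_length` is an illustrative name, not part of any library):

```python
import unicodedata

COMMA_BELOW = "\u0326"         # COMBINING COMMA BELOW
TURNED_COMMA_ABOVE = "\u0312"  # COMBINING TURNED COMMA ABOVE

def nfc_length(base: str, mark: str) -> int:
    """Number of code points left after NFC-normalizing base + mark."""
    return len(unicodedata.normalize("NFC", base + mark))

pairs = [
    ("s", COMMA_BELOW),
    ("t", COMMA_BELOW),
    ("b", COMMA_BELOW),
    ("n", COMMA_BELOW),
    ("p", TURNED_COMMA_ABOVE),
]

for base, mark in pairs:
    # 1 means a precomposed character exists, 2 means the font has to
    # position the combining mark itself.
    print(base + mark, nfc_length(base, mark))
```

Only `s` and `t` with comma below collapse to a single precomposed code point; the other letters stay as base plus combining mark, so the two comma designs end up side by side in the same word.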

Original issue reported on code.google.com by qrc...@google.com on 7 Aug 2014 at 2:23

@nizarsq

nizarsq commented Jul 23, 2020

[Image: Screen Shot 2020-07-22 at 9 24 36 PM]

@verdy-p

verdy-p commented Feb 20, 2022

Why use simple (wedge) shapes for combining commas in Noto Serif? IMHO:

  • the simpler wedge shape is well suited for Noto Sans (and should match with the shapes of comma/apostrophe/quotation punctuation signs).
  • the bulb shape is better suited for Noto Serif (and should match with the shapes of comma/apostrophe/quotation punctuation signs as well).

In all these cases, we should be able to use variation selectors to select the shape for punctuation signs, but for now Unicode has no mechanism for using variation selectors with combining characters. The same goes for precomposed characters, which must preserve their canonical equivalence: there it is not obvious whether a variation selector placed after a precomposed character applies to the base character or to one of its embedded diacritics. For such cases, a base variation selector after a precomposed character should apply only to its base (if such a variation is registered for this base), never to its embedded diacritics. A combining variation selector with combining class 0 should be valid only after a combining character, or after a precomposed character, in which case it would alter only the embedded combining character with the highest non-zero combining class, in order to preserve canonical equivalence.

CGJ is already the first combining variation selector. However, it is limited and does not indicate a variation of shape when it is used between two diacritics: it just fixes the logical ordering/stacking, overriding the default order implied by normalization, when it precedes a combining character with a non-zero combining class and the previous combining character has a higher combining class. Beside that usage, CGJ has no defined meaning before other combining characters with non-zero combining classes, except for some South Asian scripts, where it does not really encode a variation but a simultaneous change of semantics and of the way the following combining character behaves in a cluster: for example, to encode distinctive REPHA-like forms of Indic combining letters forming conjuncts, by using <CGJ+VIRAMA/HALANT> before the next base consonant, or to disable this default behavior for complex clusters of such scripts.

Is there a need for a few more combining variation selectors (CGJ2, CGJ3...) to be encoded by Unicode (in the few remaining unallocated codepoints in the BMP), and to ask Unicode to extend its registry of variation sequences to combining diacritics as well (this would also require some precision in the existing Unicode definition of combining sequences), unless we encode them as <CGJ+VSn>? Note also that some encoded scripts add their own variation selectors rather than the generic ones, also because they need distinctive semantics and do not allow an optional or free change of the possible forms.

But if we want to explicitly select between two alternate forms (wedge-like, or with a bulb) for combining comma-like diacritics or punctuation, we need at least two different combining variation selectors. For now this seems to have been neglected in the early Unicode discussions, when these alternate forms were unified under the same encoded character on the assumption that they share the same semantics (which is what makes it possible for different forms to be used by default, for example in Noto Sans and Noto Serif).
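
The reordering role of CGJ mentioned above can be observed directly. The sketch below uses Python's standard `unicodedata` module. The choice of `o` with a turned comma above and a comma below is only an illustration of the mechanism, since marks above and below do not visually interact.

```python
import unicodedata

CGJ = "\u034F"                 # COMBINING GRAPHEME JOINER, combining class 0
TURNED_COMMA_ABOVE = "\u0312"  # combining class 230
COMMA_BELOW = "\u0326"         # combining class 220

def codepoints(text: str) -> str:
    """Render a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)

plain = "o" + TURNED_COMMA_ABOVE + COMMA_BELOW
with_cgj = "o" + TURNED_COMMA_ABOVE + CGJ + COMMA_BELOW

for mark in (TURNED_COMMA_ABOVE, COMMA_BELOW, CGJ):
    print(f"U+{ord(mark):04X} ccc={unicodedata.combining(mark)}")

# Without CGJ, normalization sorts the marks by combining class.
print(codepoints(unicodedata.normalize("NFD", plain)))
# With CGJ in between, the original order survives normalization.
print(codepoints(unicodedata.normalize("NFD", with_cgj)))
```

This only shows the ordering effect at the encoding level. Whether a renderer draws anything differently for a sequence containing CGJ is a separate question.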

Of course they must always be consistent between precomposed characters and combining diacritics, apart from their placement or rotation in the combinations used in Baltic languages. There should also be a way to cancel such a default change of position or rotation for non-Baltic languages that use these diacritics at their normal position, for example by encoding a CGJ between the base letter and the diacritic, so that the sequence is no longer canonically equivalent to the Baltic combinations. Those must remain consistent regardless of whether they were encoded as precomposed characters or as base letters plus a separate diacritic.

This consideration should also apply to the Romanian-like usage of cedillas that may look like comma diacritics: such a change of shape and attachment, used by default for Romanian, should likewise be disabled by a CGJ to preserve the position and shape of the cedilla below, without having to use language-specific features. Those features force documents to use correct language identification in rich-text formats for multilingual documents, something that is not possible in plain text, where language identification applies only to the whole document. We should not depend on encoding language tags with special characters in plane 14 either, which is now discouraged and never needed for rich-text formats like HTML. Language-specific OpenType "feature" tables should be avoided as much as possible, given that Unicode now also supports variation selectors when needed, and the encoding and usage of variants are defined by the Unicode standard itself in the UCD.
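
At the encoding level, the effect of a CGJ between base and diacritic can be sketched with Python's standard `unicodedata` module. Note the assumption: this only shows that the sequence stops being canonically equivalent to the precomposed letter. Any change in glyph shape would be up to fonts and renderers and is not something Unicode defines for CGJ.

```python
import unicodedata

CGJ = "\u034F"      # COMBINING GRAPHEME JOINER
CEDILLA = "\u0327"  # COMBINING CEDILLA

# s + cedilla composes to U+015F LATIN SMALL LETTER S WITH CEDILLA.
composed = unicodedata.normalize("NFC", "s" + CEDILLA)

# With CGJ in between, NFC leaves the three code points untouched.
blocked = unicodedata.normalize("NFC", "s" + CGJ + CEDILLA)

def same_under_nfd(a: str, b: str) -> bool:
    """True if two strings are canonically equivalent."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print([f"U+{ord(c):04X}" for c in composed])
print([f"U+{ord(c):04X}" for c in blocked])
print(same_under_nfd(composed, blocked))
```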

@simoncozens
Contributor

To summarise where we are with this: precomposed forms with comma above use a different form of comma to the comma below and combining comma above. In Serif, it's a different form; in Sans, it's a different size. A good test string for the inconsistency is ģn̦p̒:

[Two images, alt text: shape]
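
For anyone reproducing this, the test string can be broken into code points with Python's standard `unicodedata` module. The sketch assumes the string is encoded as precomposed ģ (U+0123) followed by n and p with combining marks; the helper `describe` is an illustrative name.

```python
import unicodedata

# Assumed encoding of the test string: U+0123, n + U+0326, p + U+0312.
TEST = "\u0123n\u0326p\u0312"

def describe(text: str) -> list[str]:
    """List the code points and names of text after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in decomposed]

for line in describe(TEST):
    print(line)
```

The three marks come out as COMBINING CEDILLA, COMBINING COMMA BELOW and COMBINING TURNED COMMA ABOVE, so a single short string exercises all three code paths that need to agree in the font.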

I think this is clearly a bug; unfortunately I think the fix is less clear. I personally feel like bulb-shaped commas everywhere for the serif would make sense, and maybe large comma accents everywhere in the sans would make sense, but I don't have enough design experience to know. I think for now the answer is "ask @moyogo". :-)

(I don't believe that the Unicode CGJ suggestion is a good way forward; we don't really want to be innovating our own Unicode conventions. Romanian cedilla shape is already selectable in the font through setting the language tag, and yes, not many applications correctly support that (browsers do!) but that's an application issue. We do the right thing, even if they don't.)
