Parse "x-hbsc" language tags as script overrides #645

dscorbett · 2017-12-06T16:31:25Z

If a private use subtag begins with “hbsc” it overrides the script tag. This allows testing of 'deva' in a font which also supports 'dev2'. “hbot” and “hbsc” can be used with each other in either order. This also fixes the language tag parser so that it ignores “x-hbot” when the “x” is not its own subtag.
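The following minimal C sketch illustrates the parsing rules described above. It was written for illustration and is not the code from this pull request. Only subtags after a standalone “x” subtag are considered, “hbsc” and “hbot” may appear in either order, and one to four alphanumeric characters after the prefix become a space-padded tag. The names MAKE_TAG, parse_override, and parse_private_use are hypothetical. Copying the characters without case normalization is also an assumption.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: pack four characters into a 32-bit OpenType-style tag. */
#define MAKE_TAG(a, b, c, d)                                             \
  (((uint32_t) (unsigned char) (a) << 24) |                             \
   ((uint32_t) (unsigned char) (b) << 16) |                             \
   ((uint32_t) (unsigned char) (c) << 8) | (uint32_t) (unsigned char) (d))

/* Hypothetical helper: if the subtag [s, s + len) starts with `prefix`
 * (case-insensitively) and is followed by 1-4 alphanumeric characters,
 * store them as a space-padded tag in *out and return true. */
static bool
parse_override (const char *s, size_t len, const char *prefix, uint32_t *out)
{
  size_t plen = strlen (prefix);
  if (len <= plen || len > plen + 4)
    return false;
  for (size_t i = 0; i < plen; i++)
    if (tolower ((unsigned char) s[i]) != prefix[i])
      return false;
  char t[4] = {' ', ' ', ' ', ' '};
  for (size_t i = plen; i < len; i++)
  {
    if (!isalnum ((unsigned char) s[i]))
      return false;
    t[i - plen] = s[i];
  }
  *out = MAKE_TAG (t[0], t[1], t[2], t[3]);
  return true;
}

/* Scan a BCP 47 tag subtag by subtag. Overrides are recognized only
 * after a subtag that is exactly "x", so "fox-hbotabc" is ignored.
 * The first "hbsc" and the first "hbot" subtag win, in any order. */
static void
parse_private_use (const char *lang, bool *has_script, uint32_t *script,
                   bool *has_lang, uint32_t *lang_tag)
{
  *has_script = *has_lang = false;
  bool in_private_use = false;
  const char *p = lang;
  while (*p)
  {
    const char *end = strchr (p, '-');
    size_t len = end ? (size_t) (end - p) : strlen (p);
    if (!in_private_use)
      in_private_use = (len == 1 && tolower ((unsigned char) *p) == 'x');
    else if (!*has_script && parse_override (p, len, "hbsc", script))
      *has_script = true;
    else if (!*has_lang && parse_override (p, len, "hbot", lang_tag))
      *has_lang = true;
    if (!end)
      break;
    p = end + 1;
  }
}
```

With this sketch, `hi-x-hbscdeva` yields the script tag 'deva', `x-hbotabc-hbscdev2` yields both overrides, and `fox-hbotabc` yields neither because “x” is not its own subtag.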

To implement that, this pull request adds the function hb_ot_tags_from_language_and_script to the API. I didn’t document it: the files in docs/ seem to be generated automatically, but I couldn’t find the generator. I’ll leave that part to you, or you can tell me how to do it.

Closes #495.

khaledhosny · 2017-12-06T18:39:58Z

To build the documentation you need to configure with --enable-gtk-doc.

behdad · 2017-12-06T19:29:06Z

Thanks. Looks good in general, except that the name hb_ot_tags_from_language_and_script does not reveal that this is for finding the script tag, not the language tag. I suggest we combine both, i.e., this function would return both the OT language tag and the list of OT script tags. Also, Apple has introduced a new set of tags (dev3, etc.), so we might want to extend the API for that. Perhaps the caller can pass in an array and its size, and we will have a macro for the maximum number of entries per script.
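The combined call with a caller-supplied array might look roughly like the sketch below. Everything here is hypothetical and does not come from HarfBuzz’s API: the function name, EXAMPLE_MAX_TAGS_PER_SCRIPT, taking the script as an ISO 15924 string, the tiny script table, and the single language mapping. The in/out count follows a common C convention. The caller passes the array’s capacity and gets back the number of tags written.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: pack four characters into a 32-bit OpenType-style tag. */
#define MAKE_TAG(a, b, c, d)                                             \
  (((uint32_t) (unsigned char) (a) << 24) |                             \
   ((uint32_t) (unsigned char) (b) << 16) |                             \
   ((uint32_t) (unsigned char) (c) << 8) | (uint32_t) (unsigned char) (d))

/* Hypothetical upper bound a caller can use to size its array. */
#define EXAMPLE_MAX_TAGS_PER_SCRIPT 3

/* Hypothetical combined lookup. On input *script_count is the capacity
 * of script_tags; on output it is the number of tags written, most
 * preferred first. One OpenType language tag is returned as well. */
static void
example_tags_from_script_and_language (const char *iso15924, const char *bcp47,
                                       unsigned int *script_count,
                                       uint32_t *script_tags,
                                       uint32_t *language_tag)
{
  uint32_t found[EXAMPLE_MAX_TAGS_PER_SCRIPT];
  unsigned int n = 0;

  /* Illustrative entries only; a real table covers every script. */
  if (strcmp (iso15924, "Deva") == 0)
  {
    found[n++] = MAKE_TAG ('d', 'e', 'v', '2'); /* newer Indic tag first */
    found[n++] = MAKE_TAG ('d', 'e', 'v', 'a'); /* older tag as fallback */
  }
  else if (strcmp (iso15924, "Latn") == 0)
    found[n++] = MAKE_TAG ('l', 'a', 't', 'n');
  else
    found[n++] = MAKE_TAG ('D', 'F', 'L', 'T');

  /* Never write past the caller's capacity. */
  unsigned int written = n < *script_count ? n : *script_count;
  for (unsigned int i = 0; i < written; i++)
    script_tags[i] = found[i];
  *script_count = written;

  /* Match "hi" only as a whole primary language subtag. */
  if (strncmp (bcp47, "hi", 2) == 0 && (bcp47[2] == '\0' || bcp47[2] == '-'))
    *language_tag = MAKE_TAG ('H', 'I', 'N', ' ');
  else
    *language_tag = MAKE_TAG ('d', 'f', 'l', 't');
}
```

Because the array size comes from a macro, adding a further tag for a script, such as one from the dev3 set, changes only the macro's value. The function signature stays the same.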

behdad · 2017-12-06T19:35:49Z

For documentation, you just need to add a comment block before the function definition, and add it to harfbuzz-sections.txt.

dscorbett · 2017-12-06T20:22:23Z

For the function as I wrote it, without further refactoring, how about the name hb_ot_script_tags_from_language_and_script?

Your suggested refactoring makes sense. Relatedly, I have been thinking about improving the BCP 47 parsers, currently scattered throughout various functions; that would fix #362. I intended this pull request to just be for the script overrides, and I am wary of making a big API change to a project where I am not a core contributor, but if you think this is a good idea I can work on it.

behdad · 2017-12-06T20:42:08Z

Yeah I'd rather we add one new API that handles multiple enhancements than adding piecemeal improvements.

behdad · 2017-12-06T20:43:10Z

Another approach, maybe better, would be to add hb_ot_script_tag_from_language(). But that would rely on users calling it before the existing hb_ot_tags_from_script(), so I'd rather we just add one API that everyone can call to get the correct tags.

dscorbett · 2017-12-11T20:08:24Z

The refactoring is coming along nicely. I’ve got some questions about HarfBuzz’s language code policy.

What sort of BCP 47 normalization is the caller responsible for? HarfBuzz canonicalizes everything to lowercase (via canon_map) but doesn’t replace e.g. i-navajo with nv.
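The following hypothetical C sketch shows what case-only canonicalization does and does not cover. It is a stand-alone approximation and not HarfBuzz’s canon_map. Mapping “_” to “-” is an assumption. Because the function only changes case, i-navajo passes through unchanged and does not become nv.

```c
#include <ctype.h>

/* Hypothetical case canonicalization: lowercases ASCII letters and
 * turns '_' into '-' in place. It does not replace deprecated or
 * grandfathered tags, so "I-Navajo" becomes "i-navajo", not "nv". */
static void
example_canonicalize_case (char *tag)
{
  for (char *p = tag; *p; p++)
  {
    if (*p == '_')
      *p = '-';
    else
      *p = (char) tolower ((unsigned char) *p);
  }
}
```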

If HarfBuzz isn’t responsible for normalizing deprecated or invalid codes, should it support them? The answer could be:

no;
yes, all deprecated BCP 47 subtags;
yes, all deprecated BCP 47 subtags and all retired ISO 639 codes that aren’t even in BCP 47; or
yes, but only the codes explicitly mentioned in the OpenType tag list, like flm but not adp.

Some unrelated languages coincidentally have the same name. For example, mdc and mdy are both “Male”. The OpenType tag list only lists mdy, but hb-ot-tag.cc lists both. Should it?

brawer · 2017-12-20T14:12:22Z

Regarding language tag normalization (handling deprecated subtags such as iw for he, overlong subtags such as eng for en, etc.): Personally, I think it would be good to be permissive when accepting input, so I’d recommend handling deprecated BCP47 tags assuming that it’s OK to spend a couple kilobytes for a mapping table. Unicode CLDR maintains a BCP47 subtag alias table in supplementalMetadata.xml with languageAlias, scriptAlias, territoryAlias, and subdivisionAlias entries. (The last two probably don’t matter for HarfBuzz). CLDR is currently on a six-month release cycle, but that link points to the head of the master branch.
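The following is a minimal sketch of the alias lookup described above. It assumes the input has already been lowercased. The table holds only the examples mentioned in this thread (iw, eng, i-navajo). A real table would be generated from the languageAlias entries in CLDR’s supplementalMetadata.xml. All names are hypothetical. The sketch matches whole tags only, so handling a tag such as iw-il would also require splitting off the primary language subtag first.

```c
#include <stddef.h>
#include <string.h>

/* One alias entry: a deprecated, overlong, or grandfathered tag and
 * the tag that should be used instead. */
typedef struct
{
  const char *alias;
  const char *preferred;
} example_alias_t;

/* Tiny hand-written sample, not generated from CLDR. */
static const example_alias_t example_aliases[] = {
  {"eng", "en"},      /* overlong ISO 639-2 code */
  {"i-navajo", "nv"}, /* grandfathered tag */
  {"iw", "he"},       /* deprecated code */
};

/* Return the preferred form of a lowercase tag, or the tag itself
 * if it has no alias. */
static const char *
example_resolve_alias (const char *tag)
{
  for (size_t i = 0; i < sizeof example_aliases / sizeof example_aliases[0]; i++)
    if (strcmp (tag, example_aliases[i].alias) == 0)
      return example_aliases[i].preferred;
  return tag;
}
```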

behdad · 2017-12-24T22:40:35Z

As long as it's not too much code, we should do whatever BCP 47 requires.

behdad · 2018-02-18T22:06:05Z

Closing. Continued in #730

Parse "x-hbsc" language tags as script overrides

b3487a3

Closes #495

dscorbett mentioned this pull request Jan 28, 2018

Refactor script and language tags #730

Merged

behdad closed this Feb 18, 2018

behdad mentioned this pull request Feb 6, 2021

[hb-view] [utils] application of language #2844

Closed
