Parse "x-hbsc" language tags as script overrides #645

Closed
wants to merge 1 commit into from

dscorbett
Collaborator

If a private-use subtag begins with “hbsc”, it overrides the script tag. This allows testing 'deva' in a font that also supports 'dev2'. “hbot” and “hbsc” can be combined in either order. This also fixes the language tag parser so that it ignores “x-hbot” when the “x” is not its own subtag.
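A minimal sketch of the subtag rule described above, assuming the language tag arrives as a plain NUL-terminated string. A subtag only counts as private use once a standalone “x” singleton has been seen, so “fax-hbotABC” does not match. The helper names (find_private_use_subtag and friends) are hypothetical, not HarfBuzz functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the subtag starting at s, up to the next '-' or the end. */
static size_t subtag_len(const char *s)
{
  size_t n = 0;
  while (s[n] != '\0' && s[n] != '-')
    n++;
  return n;
}

/* ASCII case-insensitive test: do the first n chars of s start with prefix? */
static int subtag_has_prefix(const char *s, size_t n, const char *prefix)
{
  size_t plen = strlen(prefix);
  if (plen > n)
    return 0;
  for (size_t i = 0; i < plen; i++)
    if (tolower((unsigned char) s[i]) != tolower((unsigned char) prefix[i]))
      return 0;
  return 1;
}

/*
 * Hypothetical helper: find the first private-use subtag of lang that begins
 * with prefix (e.g. "hbsc") and return a pointer to the text after the prefix,
 * storing its length in *out_len. Returns NULL if there is no match.
 * Subtags only count as private use after a standalone "x" singleton.
 */
static const char *find_private_use_subtag(const char *lang, const char *prefix,
                                           size_t *out_len)
{
  assert(lang != NULL && prefix != NULL && out_len != NULL);
  int in_private_use = 0;
  const char *s = lang;
  for (;;) {
    size_t n = subtag_len(s);
    if (in_private_use) {
      if (subtag_has_prefix(s, n, prefix)) {
        size_t plen = strlen(prefix);
        *out_len = n - plen;
        return s + plen;
      }
    } else if (n == 1 && tolower((unsigned char) s[0]) == 'x') {
      in_private_use = 1;  /* every later subtag is private use */
    }
    if (s[n] == '\0')
      return NULL;  /* ran out of subtags */
    s += n + 1;     /* step over the '-' separator */
  }
}
```

With “en-x-hbscdeva” and the prefix “hbsc”, the sketch yields “deva”; with “x-hbotABC-hbscdeva” it finds either prefix regardless of their order.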

To implement that, this pull request adds the function hb_ot_tags_from_language_and_script to the API. I didn’t document it: the files in docs/ seem to be automatically generated, but I couldn’t find the generator, so I leave that part to you, or you can tell me how to do it.

Closes #495.

@khaledhosny
Collaborator

To build the documentation you need to configure with --enable-gtk-doc.

@behdad
Member

behdad commented Dec 6, 2017

Thanks. Looks good in general, except that the name hb_ot_tags_from_language_and_script does not reveal that this is for finding script tags, not language tags. I suggest we combine both, i.e. this function would return both the OT language tag and the list of OT script tags. Also, Apple has introduced a set of tags such as dev3, so we might want to extend the API for that. Perhaps the caller can pass in an array and its size, and we will have a macro for the maximum number of entries per script.
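The array-and-size idea can be sketched as below. Everything here is hypothetical: tag_t, MAKE_TAG, EXAMPLE_MAX_TAGS_PER_SCRIPT, the function name, and the two tiny tables are stand-ins chosen for illustration, with dev3 taken from the comment above. The caller passes the capacity in *script_count and gets back the number of tags actually written, preferred tag first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A 4-byte OpenType tag packed big-endian (hypothetical stand-in type). */
typedef uint32_t tag_t;
#define MAKE_TAG(a, b, c, d) \
  ((tag_t) (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) | \
            ((uint32_t) (c) << 8) | (uint32_t) (d)))

/* Hypothetical upper bound on OpenType script tags per Unicode script. */
#define EXAMPLE_MAX_TAGS_PER_SCRIPT 3

struct script_entry {
  const char *iso15924;                     /* e.g. "Deva" */
  tag_t tags[EXAMPLE_MAX_TAGS_PER_SCRIPT];  /* preferred tag first */
  unsigned int count;
};

/* Illustrative data only. */
static const struct script_entry script_table[] = {
  { "Deva", { MAKE_TAG('d','e','v','3'), MAKE_TAG('d','e','v','2'),
              MAKE_TAG('d','e','v','a') }, 3 },
  { "Latn", { MAKE_TAG('l','a','t','n') }, 1 },
};

struct language_entry { const char *bcp47; tag_t tag; };
static const struct language_entry language_table[] = {
  { "en", MAKE_TAG('E','N','G',' ') },
  { "hi", MAKE_TAG('H','I','N',' ') },
};

/* On entry *script_count is the capacity of script_tags; on return it is
 * the number of tags stored. *language_tag gets one OT language tag. */
static void
example_tags_from_script_and_language(const char *script, const char *language,
                                      unsigned int *script_count,
                                      tag_t *script_tags, tag_t *language_tag)
{
  assert(script_count != NULL && script_tags != NULL && language_tag != NULL);

  unsigned int written = 0;
  for (size_t i = 0; i < sizeof script_table / sizeof script_table[0]; i++)
    if (strcmp(script_table[i].iso15924, script) == 0) {
      for (unsigned int j = 0;
           j < script_table[i].count && written < *script_count; j++)
        script_tags[written++] = script_table[i].tags[j];
      break;
    }
  *script_count = written;

  *language_tag = MAKE_TAG('d','f','l','t');  /* placeholder default */
  for (size_t i = 0; i < sizeof language_table / sizeof language_table[0]; i++)
    if (strcmp(language_table[i].bcp47, language) == 0) {
      *language_tag = language_table[i].tag;
      break;
    }
}
```

Returning the count through the same in/out parameter lets a caller with a small buffer still get the most preferred tags, and a compile-time maximum lets most callers size the array on the stack.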

@behdad
Member

behdad commented Dec 6, 2017

For documentation, you just need to add a comment block before the function definition, and add it to harfbuzz-sections.txt.
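For reference, a gtk-doc comment block generally looks like the one below: the symbol name followed by a colon, one @name line per parameter, a free-form description, and a "Return value:" line. The function is a hypothetical placeholder, not part of HarfBuzz; a real new public symbol would also need its own line in harfbuzz-sections.txt, and, as noted above, the documentation is only built when configuring with --enable-gtk-doc.

```c
#include <assert.h>
#include <stdint.h>

/**
 * example_tag_from_chars:
 * @c1: first character of the tag
 * @c2: second character of the tag
 * @c3: third character of the tag
 * @c4: fourth character of the tag
 *
 * Packs four ASCII characters into a 32-bit OpenType tag,
 * with the first character in the most significant byte.
 *
 * Return value: the packed tag.
 **/
uint32_t
example_tag_from_chars(char c1, char c2, char c3, char c4)
{
  /* OpenType tags are ASCII; catch misuse in debug builds. */
  assert((unsigned char) c1 < 0x80 && (unsigned char) c2 < 0x80 &&
         (unsigned char) c3 < 0x80 && (unsigned char) c4 < 0x80);
  return ((uint32_t) (unsigned char) c1 << 24) |
         ((uint32_t) (unsigned char) c2 << 16) |
         ((uint32_t) (unsigned char) c3 << 8) |
         (uint32_t) (unsigned char) c4;
}
```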

@dscorbett
Collaborator Author

For the function as I wrote it, without further refactoring, how about the name hb_ot_script_tags_from_language_and_script?

Your suggested refactoring makes sense. Relatedly, I have been thinking about improving the BCP 47 parsers, currently scattered throughout various functions; that would fix #362. I intended this pull request to just be for the script overrides, and I am wary of making a big API change to a project where I am not a core contributor, but if you think this is a good idea I can work on it.

@behdad
Member

behdad commented Dec 6, 2017

Yeah I'd rather we add one new API that handles multiple enhancements than adding piecemeal improvements.

@behdad
Member

behdad commented Dec 6, 2017

Another approach, maybe better, would be to add hb_ot_script_tag_from_language(). But that would rely on users to call it before calling the existing hb_ot_tags_from_script(), so I'd rather we just add one API that everyone can call and get the correct tags.

@dscorbett
Collaborator Author

The refactoring is coming along nicely. I’ve got some questions about HarfBuzz’s language code policy.

What sort of BCP 47 normalization is the caller responsible for? HarfBuzz canonicalizes everything to lowercase (via canon_map) but doesn’t replace e.g. i-navajo with nv.
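To make the distinction concrete, here is a sketch of case-only canonicalization in the spirit of canon_map (not a copy of it): ASCII letters are lowercased and no alias replacement happens, so “i-Navajo” becomes “i-navajo” rather than “nv”. Mapping “_” to “-”, the function name, and the error convention (returning -1 and leaving the output unterminated) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical case canonicalization: lowercases ASCII letters, maps '_' to
 * '-', keeps digits and '-', and rejects anything else. Deprecated or
 * grandfathered tags are NOT replaced with their preferred values.
 * Returns 0 on success, -1 on a bad character or a too-small buffer.
 */
static int canonicalize_case(const char *in, char *out, size_t out_size)
{
  assert(in != NULL && out != NULL);
  size_t len = strlen(in);
  if (len + 1 > out_size)  /* no room for the result and its NUL */
    return -1;
  for (size_t i = 0; i < len; i++) {
    unsigned char c = (unsigned char) in[i];
    if (c >= 'A' && c <= 'Z')
      out[i] = (char) (c - 'A' + 'a');
    else if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '-')
      out[i] = (char) c;
    else if (c == '_')
      out[i] = '-';
    else
      return -1;  /* not a character that can appear in a BCP 47 tag */
  }
  out[len] = '\0';
  return 0;
}
```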

If HarfBuzz isn’t responsible for normalizing deprecated or invalid codes, should it support them? The answer could be:

  • no;
  • yes, all deprecated BCP 47 subtags;
  • yes, all deprecated BCP 47 subtags and all retired ISO 639 codes that aren’t even in BCP 47; or
  • yes, but only the codes explicitly mentioned in the OpenType tag list, like flm but not adp.

Some unrelated languages coincidentally have the same name. For example, mdc and mdy are both “Male”. The OpenType tag list only lists mdy, but hb-ot-tag.cc lists both. Should it?

@brawer
Copy link
Contributor

brawer commented Dec 20, 2017

Regarding language tag normalization (handling deprecated subtags such as iw for he, overlong subtags such as eng for en, etc.): personally, I think it would be good to be permissive when accepting input, so I’d recommend handling deprecated BCP 47 tags, assuming it’s OK to spend a couple of kilobytes on a mapping table. Unicode CLDR maintains a BCP 47 subtag alias table in supplementalMetadata.xml with languageAlias, scriptAlias, territoryAlias, and subdivisionAlias entries (the last two probably don’t matter for HarfBuzz). CLDR is currently on a six-month release cycle, but that link points to the head of the master branch.
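A mapping table of the kind suggested here can be kept small and sorted, so a lookup is a single bsearch call. This sketch assumes the input has already been lowercased and matches whole tags only; a fuller version would replace just the primary language subtag (so that “iw-il” became “he-il”) and would be generated from CLDR’s languageAlias data rather than written by hand. The three entries are the examples mentioned in this thread; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct alias { const char *from; const char *to; };

/* Must stay sorted by `from` (strcmp order) for bsearch to work. */
static const struct alias language_aliases[] = {
  { "eng",      "en" },  /* overlong ISO 639-2 code */
  { "i-navajo", "nv" },  /* grandfathered tag */
  { "iw",       "he" },  /* deprecated code */
};

/* bsearch passes the key first and the table element second. */
static int compare_alias(const void *key, const void *elem)
{
  return strcmp((const char *) key, ((const struct alias *) elem)->from);
}

/* Returns the preferred form of a lowercased tag, or the tag itself. */
static const char *resolve_language_alias(const char *tag)
{
  assert(tag != NULL);
  const struct alias *hit =
      bsearch(tag, language_aliases,
              sizeof language_aliases / sizeof language_aliases[0],
              sizeof language_aliases[0], compare_alias);
  return hit != NULL ? hit->to : tag;
}
```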

@behdad
Member

behdad commented Dec 24, 2017

As long as it's not too much code, we should do whatever BCP 47 requires.

@behdad
Member

behdad commented Feb 18, 2018

Closing. Continued in #730.

Linked issue: hb-shape and hb-view should accept all OpenType language systems and scripts