Add rough first draft of script matching doc #1
I'm very open to ideas regarding open questions and TODO items.
@jfkthame I'd love to get your input on this, even if just a few pointers of where to look in the Gecko codebase.
As a simple note, we can't use only the This is how Qt does this.
For Gecko, see gfxFontGroup::FindFontForChar, which in turn will call into WhichPrefFontSupportsChar and WhichSystemFontSupportsChar. Note that the structure of the Gecko font preferences is pretty ancient, with roots in the old world of multiple 8-bit and double-byte codepages for different "language groups", and could really use an extensive rewrite...
@jfkthame Thanks, that's useful, but I find myself still mystified by where, in particular, Han unification logic happens. It seems like it should be in WhichSystemFontSupportsChar (as that takes an
You probably want to look at WhichPrefFontSupportsChar, as that's where whichever CSS generic is applicable will be mapped to a font family from the (user-configurable) prefs. It'll look up a "unicode range" for the character, and then map this through gfxPlatformFontList::GetFontPrefLangFor and gfxPlatformFontList::GetLangPrefs to determine which set of prefs to use. So in most cases, if a CJK font hasn't been explicitly named, this is where it'll get selected. Only if the font specified via prefs doesn't cover the character in question will we end up in WhichSystemFontSupportsChar.
Ok, that's helpful, though I've got to say it's not easy to figure out what's going on from reading the code. However, having come across the bug "implement font cascading for system fonts under OSX", it seems like this might be the answer I'm looking for: CTFontCopyDefaultCascadeListForLanguages. That linked bug identifies a few problems with the approach, but I'm wondering whether I should be pursuing this or trying to replicate what Gecko does. And after a little more digging, I found the source of truth for that: lang-tags in the
More background and information, mostly from investigations into Blink, Gecko, and Qt.
I've added significant new content, based on investigations of Gecko and Qt. This certainly seems to be a complicated problem domain. Again, feedback is welcome!
### Fontconfig
The [Fontconfig] configuration file format specifies "langset" as an "RFC-3066-style" language. [RFC 3066] (dated 2001) is a predecessor to BCP-47, and basically specifies language and country, with no provision for explicit script or variant. For the purpose of Han unification, the convention is to infer script from country. For example, "zh-CN" could be translated to "zh-Hans", "zh-TW" to "zh-Hant". However, after a little investigation, it's not clear to me how useful it is to do sophisticated processing here, as the default fontconfig configuration on a clean Debian 9 install doesn't specify "langset" attributes, but just has a few informal descriptions in comments.
IIRC a lot of systems don't handle this so everyone just uses CN and TW. Not sure about font systems specifically.
I have had hans vs hant trigger differences in browsers
I think this is just the march of progress, and hopefully in a few decades the use of country to represent script will fade away.
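To make the country-to-script convention from the Fontconfig paragraph concrete, here is a minimal sketch in Rust. The function names `infer_han_script` and `to_script_tag` are hypothetical, not taken from Fontconfig or any existing crate, and the table covers only the two examples given in the text ("zh-CN" and "zh-TW"); a real implementation would presumably need a fuller mapping.

```rust
/// Infer a script subtag for Chinese from an RFC-3066-style
/// "language-country" tag. Hypothetical helper for illustration only;
/// it knows just the two countries mentioned in the text.
fn infer_han_script(tag: &str) -> Option<&'static str> {
    let mut parts = tag.split('-');

    // The convention only applies to Chinese in this sketch.
    let lang = parts.next()?;
    if !lang.eq_ignore_ascii_case("zh") {
        return None;
    }

    // Compare the country case-insensitively, since fontconfig-style
    // tags are often written in lowercase.
    let country = parts.next()?;
    match country.to_ascii_uppercase().as_str() {
        "CN" => Some("Hans"),
        "TW" => Some("Hant"),
        // Other countries are deliberately left out of this sketch.
        _ => None,
    }
}

/// Translate a tag such as "zh-CN" into a language-script tag such as
/// "zh-Hans". Returns None when no script can be inferred.
fn to_script_tag(tag: &str) -> Option<String> {
    infer_han_script(tag).map(|script| format!("zh-{}", script))
}
```

With this sketch, `to_script_tag("zh-CN")` yields `Some("zh-Hans")` and `to_script_tag("zh-TW")` yields `Some("zh-Hant")`; anything else returns `None`, leaving the caller to decide on a fallback.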
I strongly recommend the use of BCP-47 as the identifier for language, script, and other locale metadata. This is an easy decision for web use cases, as it is the standard for the [lang] attribute. The main challenge is that mechanisms for system font metadata in general predate BCP-47, so there will be some impedance matching.
TODO: investigate Rust ecosystem for common BCP-47 tag representation.
cc @zbraniecki
Pretty sure fluent-rs needs this too, worth knowing what y'all are using
Yeah, I'm curious what work is happening. I did just a bit of searching, didn't find any clear consensus on what people are doing, so would be very happy to hear from any efforts in this space.
I'm using a crate fluent-locale which provides the basic BCP47 locale management and negotiation.
I'm hoping for a `Locale` class to be added to `unic`, but that is going a bit slower and awaits open-i18n/rust-unic#195 (comment)
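For discussion purposes, here is a rough sketch of the smallest BCP-47 tag representation that would cover the font-matching case: language, optional script, optional region. The `LangTag` type and `parse_lang_tag` function are hypothetical and hand-rolled; they are not the API of fluent-locale or unic. The sketch assumes a 2- or 3-letter language subtag and ignores extlang, variants, extensions, and private use.

```rust
/// Minimal, hypothetical BCP-47 tag representation: language plus
/// optional script and region. Not the API of any existing crate.
#[derive(Debug, PartialEq, Eq)]
struct LangTag {
    language: String,
    script: Option<String>,
    region: Option<String>,
}

fn parse_lang_tag(tag: &str) -> Option<LangTag> {
    let mut subtags = tag.split('-').peekable();

    // Language: 2 or 3 ASCII letters (an assumption of this sketch).
    let language = subtags.next()?;
    if !(2..=3).contains(&language.len())
        || !language.bytes().all(|b| b.is_ascii_alphabetic())
    {
        return None;
    }

    // Script: exactly 4 ASCII letters, normalized to title case.
    let mut script = None;
    if let Some(&s) = subtags.peek() {
        if s.len() == 4 && s.bytes().all(|b| b.is_ascii_alphabetic()) {
            let lower = s.to_ascii_lowercase();
            script = Some(format!(
                "{}{}",
                lower[..1].to_ascii_uppercase(),
                &lower[1..]
            ));
            subtags.next();
        }
    }

    // Region: 2 ASCII letters or 3 digits, normalized to uppercase.
    let mut region = None;
    if let Some(&s) = subtags.peek() {
        let alpha2 = s.len() == 2 && s.bytes().all(|b| b.is_ascii_alphabetic());
        let digit3 = s.len() == 3 && s.bytes().all(|b| b.is_ascii_digit());
        if alpha2 || digit3 {
            region = Some(s.to_ascii_uppercase());
            subtags.next();
        }
    }

    // Any remaining subtags are ignored in this sketch.
    Some(LangTag {
        language: language.to_ascii_lowercase(),
        script,
        region,
    })
}
```

Keeping script as its own field is the point of the exercise: a font matcher can compare `script` directly instead of guessing it from `region`. A shared crate would be preferable to a hand-rolled parser like this one once the ecosystem settles.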