Reintroduce pseudo-localization #83

Open
zbraniecki opened this issue Oct 19, 2017 · 27 comments

Comments

@zbraniecki
Collaborator

Coming back from the Unicode Conference, there was a lot of chatter about pseudo-locales.

Fluent had pretty good support for pseudo-locales in the past, and thanks to our client-side mode we can offer an exciting approach to pseudo-locales: runtime pseudolocalization.

I'd like to bring this back: https://github.com/l20n/l20n.js/blob/v3.x/src/lib/pseudo.js to modern Fluent.

@stasm - do you have any thoughts on how you would like it to work?

@zbraniecki
Collaborator Author

Maybe just introducing some generic "post-processing" on messages in MessageContext, and then adding fluent-pseudolocale would work as the first step?

let ctx = new MessageContext(['ar'], {
  process: fluent_pseudolocales.transform.bind('ar-XB')
});
let msg = ctx.formatValue('l10n-id');

@stasm
Contributor

stasm commented Oct 19, 2017

> Maybe just introducing some generic "post-processing" on messages in MessageContext, and then adding fluent-pseudolocale would work as the first step?

That was my first thought as well. A few additional thoughts below. I'll try to have answers tomorrow.

We have to take into account how this will interact with the language negotiation. Would we expect the user to set their requested locale to a pseudolocale in order to enable it? Would we require that developers add pseudolocales to the list of available locales in their app?

Perhaps it would make sense to encode pseudolocales as Unicode extensions to BCP47? Something like ab-CD-u-pseudo-accent or ab-CD-u-pseudo-rtl. The language negotiation process would then still correctly pick the regular ab-CD for fetching translation resources. Some logic would then be responsible for transforming the fetched resource using the fluent-pseudo module.
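
A minimal sketch of how such a tag could be split before language negotiation, assuming the hypothetical `-u-pseudo-<strategy>` suffix from the paragraph above. `splitPseudoTag` is an illustrative name and not part of Fluent.

```javascript
// Hypothetical helper: separates the regular locale from a pseudo suffix.
function splitPseudoTag(tag) {
  const match = /^(.+?)-u-pseudo-([a-z0-9]+)$/i.exec(tag);
  if (match === null) {
    // No pseudo suffix: the tag is used as-is.
    return { locale: tag, pseudo: null };
  }
  return { locale: match[1], pseudo: match[2].toLowerCase() };
}

const requested = ["ab-CD-u-pseudo-accent", "de"].map(splitPseudoTag);
// The `locale` fields would go to language negotiation;
// the `pseudo` field would select a transform afterwards.
console.log(requested);
```

With this split, resources would still be fetched for the regular ab-CD, and the strategy name would only decide which transform to apply afterwards.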

What should be the outcome of formatting a date or a number in a pseudolocalized translation?

Also, the first step might be to only support build-time pseudolocalization.

@zbraniecki
Collaborator Author

Google just went for en-XA and ar-XB and added them to CLDR. So we can get internationalized date/time from CLDR 31 if we use those two.

Now, my problem with this is that because they used en-XA and not fr-XA, the numbers still look the same. I recommended fr-XA to them, but it may be too late.

Since we'd be doing runtime pseudo, maybe we don't need extensions (and they wouldn't be Unicode extensions but rather variants; Google originally used en-psaccent and ar-psaccentrtl or something like that).

Maybe all we need is:

let ctx = new MessageContext('pl', {
  process: pseudo
});

and it'll transform Polish strings? This way we could get a pseudo of the current locale, irrespective of what it is.

@stasm
Contributor

stasm commented Oct 19, 2017

How would you decide when to turn pseudolocalization on? A different logic independent of the language negotiation?

@zbraniecki
Collaborator Author

> How would you decide when to turn pseudolocalization on? A different logic independent of the language negotiation?

Yeah! This way the user can either detect pseudo from a langtag (oh, you're using XA region?) or by some checkbox (show me pseudolocale).
Since we're on client-side at runtime, that would mean no rebuilding, restarting or anything. Just take the exact locale we use, whatever it is, and recompute for pseudo.

@stasm
Contributor

stasm commented Oct 19, 2017 via email

@zbraniecki
Collaborator Author

Oh, you're right.

> I think it would be better to have it on a lower level though.

I agree.

> Does the language negotiation preserve extensions found on the requested locales?

Yes.

So, maybe a private-use extension? fr-FR-x-pseudo?
We just need to make sure that if we see a pseudo like this, we actually feed en-XA or ar-XB to the Intl API (so that CLDR picks it up).

@stasm
Contributor

stasm commented Oct 20, 2017

There's discussion in http://unicode.org/cldr/trac/ticket/3971 and http://unicode.org/cldr/trac/ticket/9819 on why CLDR didn't go for variant tags. It's mostly about compatibility with existing code. Also, since en-XA and ar-XB are now in CLDR we should stick to these codes. I wish we hadn't missed the discussion when it happened.

I'm reconsidering my stance on where this logic should live. Having it higher up, e.g. in fluent-web would allow multiple approaches:

  • build-time transformation,
  • AST transformation right after IO,
  • string transformation after ctx.format().

fluent-web can also transform translations in a way which preserves HTML for the overlay mechanic.

@Pike
Contributor

Pike commented Oct 20, 2017

I'd think that the most accurate way to implement the actual pseudo localization would be on the AST?

@stasm
Contributor

stasm commented Oct 20, 2017

At build time or at runtime?

@Pike
Contributor

Pike commented Oct 20, 2017

For both, I guess.

@stasm
Contributor

stasm commented Oct 20, 2017

Transforming the runtime AST means doing the transformation inside of MessageContext. That still might be a viable option given my earlier comments: fluent-web could supply a markup-aware transform function to the MessageContext constructor. Compared to transforming the result of ctx.format(), this would have the advantage of only transforming TextElements in the translation rather than the whole string.

We still need to solve the problem of fetching valid locale files. Given that it looks like fluent-web (or fluent-react) would need to handle the pseudolocalization anyways (if only to be HTML-aware), I think it makes sense to special-case en-XA and ar-XB in their IO.

For example, given the following result of language negotiation:

  • requested: en-XA, de
  • available: en-XA, en-US, de
  • default: en-US
  • negotiated: en-XA, de, en-US

…a developer using fluent-react will need to add a special case to the IO code which fetches en-US when en-XA is requested. This sounds okay to me since the same developer has already put en-XA among the available locales.

let ctx = new MessageContext(negotiated, { pseudo: makeAccent });
ctx.addMessages( /* en-US translations to be transformed into en-XA */);

Or, if the build pipeline is capable of building pseudolocales up front, the IO code would simply fetch the pre-made en-XA files.

let ctx = new MessageContext(negotiated);
ctx.addMessages( /* en-XA translations generated at build time */);
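
The IO special case described above could be sketched roughly as follows. `planFetch` and `PSEUDO_SOURCES` are hypothetical names, and the en-XA to en-US mapping is taken from the example; a real app would define its own mapping.

```javascript
// Hypothetical mapping from a pseudolocale to the real locale whose
// translation files are fetched on its behalf.
const PSEUDO_SOURCES = new Map([["en-XA", "en-US"]]);

// Decides which locale's files to load and whether a runtime transform
// is still needed. `prebuiltLocales` lists pseudolocales that the build
// pipeline has already generated files for.
function planFetch(locale, prebuiltLocales = []) {
  if (PSEUDO_SOURCES.has(locale) && !prebuiltLocales.includes(locale)) {
    return { fetch: PSEUDO_SOURCES.get(locale), needsTransform: true };
  }
  return { fetch: locale, needsTransform: false };
}

const negotiated = ["en-XA", "de", "en-US"];
console.log(negotiated.map((locale) => planFetch(locale)));
```

When `needsTransform` is true, the IO code would pass a transform to the MessageContext constructor; otherwise it would construct the context without one.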

@zbraniecki
Collaborator Author

I do not agree with Stas that we have to use en-XA and ar-XB here. I believe it's perfectly fine for us to use whatever mechanism we want to use to recognize pseudolocales, and then just make sure to collapse in Intl constructor onto en-XA and ar-XB for Intl API / CLDR.
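
A minimal sketch of that "collapsing" step, assuming the app tracks the pseudo strategy separately from the locale. The strategy names `accent` and `bidi` and the helper `intlLocaleFor` are hypothetical. Whether a given runtime actually ships data for en-XA or ar-XB is not guaranteed; Intl falls back to another locale when it does not.

```javascript
// Hypothetical mapping from a pseudo strategy to the CLDR pseudolocale.
const CLDR_PSEUDO = new Map([
  ["accent", "en-XA"],
  ["bidi", "ar-XB"],
]);

// Picks the locale handed to Intl constructors: the CLDR pseudolocale
// when a known strategy is active, the regular locale otherwise.
function intlLocaleFor(locale, pseudo) {
  return CLDR_PSEUDO.get(pseudo) ?? locale;
}

const formatter = new Intl.DateTimeFormat(intlLocaleFor("fr-FR", "accent"), {
  weekday: "long",
  timeZone: "UTC",
});
console.log(formatter.format(new Date(Date.UTC(2017, 9, 24))));
```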

@stasm
Contributor

stasm commented Oct 24, 2017

Note that the approach from my previous comment will work with any scheme of specifying pseudolocales. In my example I chose to put en-XA in requested but it could also be an app-specific pref which handles that. This is also how I understood your comment from 5 days ago.

There's value in using en-XA, ar-XB now that they were standardized in the CLDR. They will become recognizable names for pseudolocales and with time will gain support in various tools and platforms.

@zbraniecki
Collaborator Author

@stasm - would you have time to draft a plan to get this into a POC state? I'm happy to commit to work on that, but would prefer to follow your vision.

@zbraniecki
Collaborator Author

zbraniecki commented Nov 7, 2017

Some POC prototyping gave me this: https://youtu.be/E3t8-u8e5D0

It's actually quite simple to get to that point, and even to get Intl hooked in. More work will be needed to handle complex messages.

I'm wondering if it's better to import pseudo for side effects and let it hook itself into fluent:

import "fluent-pseudo";

let cx = new MessageContext(locales, {
  usePseudo: true
});

or make people hook it explicitly:

// strategy1 - 30% longer via duplication of vowels, Latin chars transformed, LTR
import { strategy1 } from "fluent-pseudo";

let cx = new MessageContext(locales, {
  transform: strategy1
});

Enough for now, will wait for stas :)
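
A rough sketch of what an explicit strategy function could look like, assuming it receives a string and returns the transformed string. The accent map is a small hypothetical sample, and this version doubles every vowel rather than targeting a specific length increase.

```javascript
// Hypothetical accent map; a real one would cover the whole Latin alphabet.
const ACCENTED = new Map([
  ["a", "á"], ["e", "é"], ["i", "í"], ["o", "ó"], ["u", "ú"],
  ["A", "Á"], ["E", "É"], ["I", "Í"], ["O", "Ó"], ["U", "Ú"],
  ["c", "ç"], ["n", "ñ"],
]);
const VOWELS = new Set("aeiouAEIOU");

// Replaces mapped characters with accented ones and doubles each vowel
// so that the result is longer than the source string.
function accentStrategy(text) {
  let out = "";
  for (const ch of text) {
    const mapped = ACCENTED.get(ch) ?? ch;
    out += VOWELS.has(ch) ? mapped + mapped : mapped;
  }
  return out;
}

console.log(accentStrategy("Open file"));
```

Being a plain function, it can be unit-tested without constructing a MessageContext.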

@stasm
Contributor

stasm commented Nov 9, 2017

I find the explicit version easier to understand. It will also be easier to test.

@Pike
Contributor

Pike commented Nov 9, 2017

From a developer point-of-view, I don't expect that any Firefox developer will be touching code at that abstraction level. We explicitly don't want these folks to know that MessageContext even exists.

@stasm
Contributor

stasm commented Nov 9, 2017

Agreed. IIUC this issue is about the low-level API which fluent-web will completely hide.

@stasm
Contributor

stasm commented Dec 7, 2017

@zbraniecki and I talked about this yesterday and today. We'd like to start simple with the approach from comment #83 (comment).

  • The MessageContext constructor will accept a process or transform option whose value is a function to be invoked on all TextElements.
    • The transformation would happen inside of the MessageContext.addMessages call in the runtime parser.
    • We'll publish a new package with the psaccent and psbidi transforms. We'll discuss the exact strategies and implementations later.
    • Users are free to write their own transform functions. We encourage experimentation.
  • For now, we'll use regular language tags: en-US, de, etc. The transform function should only be passed to the constructor if the user has expressed interest in using pseudolocales. This should be handled outside of Fluent.
    • As a consequence, formatted dates interpolated into pseudolocalized translations will be spelled normally (e.g. Tuesday if the current locale is en-US).
  • In the future, we'll have Intl.Locale (and fluent-locale) and it will be easy to recognize well-formed BCP47 variant tags, e.g. en-US-psaccent and en-US-psbidi.
    • CLDR's en-XA and ar-XB are called such mostly because of legacy code in Android which wouldn't handle language tags with variants.
    • @zbraniecki will start a discussion with CLDR about using language variants for pseudolocales.
    • In even farther future we might try to standardize the variants with IANA.
      • If variants get standardized, MessageContext could by default include transforms for known pseudolocales.
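
To illustrate the first point of the plan, here is a minimal sketch of applying a transform to text elements only. The array-based pattern is a simplified stand-in and not Fluent's actual runtime format; `transformPattern` and `formatPattern` are hypothetical names.

```javascript
// Simplified stand-in for a parsed pattern: strings are text elements,
// objects are placeables (here, references to external arguments).
const pattern = ["Welcome, ", { type: "var", name: "user" }, "!"];

// Applies `transform` to text elements only, leaving placeables untouched.
function transformPattern(pattern, transform) {
  return pattern.map((element) =>
    typeof element === "string" ? transform(element) : element
  );
}

// Resolves placeables from `args` and joins all parts into one string.
function formatPattern(pattern, args) {
  return pattern
    .map((element) =>
      typeof element === "string" ? element : String(args[element.name])
    )
    .join("");
}

const shout = (text) => text.toUpperCase(); // placeholder transform
console.log(formatPattern(transformPattern(pattern, shout), { user: "Anna" }));
// prints "WELCOME, Anna!"
```

The interpolated argument stays untouched, which is the difference from transforming the whole formatted string after the fact.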

@Pike
Contributor

Pike commented Dec 7, 2017

> Users are free to write their own transform functions. We encourage experimentation.

This issue uses the word user for a ton of things, I'm losing track.

Say, I'm a Firefox developer, and I want to run my local build with psaccent on. How would I do that, and which parts of our code stack are involved in doing so, and what would they need to do?

@stasm
Contributor

stasm commented Dec 7, 2017

> This issue uses the word user for a ton of things, I'm losing track.

Good point. I meant the users of the library here. Elsewhere I meant the user of the app.

> Say, I'm a Firefox developer, and I want to run my local build with psaccent on. How would I do that, and which parts of our code stack are involved in doing so, and what would they need to do?

You would start by flipping a pref somewhere in the UI. The values of the pref could be: psaccent, psbidi. fluent-gecko (which is fluent-dom packaged for Gecko privileged content) would observe this pref and use it in its generateMessages, which constructs MessageContext instances. fluent-react in Devtools would need to do the same.
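
A minimal sketch of the pref-to-options step, assuming the pref value is available as a string and that the constructor option is called `transform`. The two transforms are stand-ins only, and `optionsForPref` is a hypothetical name.

```javascript
// Stand-in transforms; the real psaccent and psbidi strategies would
// come from a separate package.
const TRANSFORMS = new Map([
  ["psaccent", (text) => text.toUpperCase()],
  ["psbidi", (text) => [...text].reverse().join("")],
]);

// Builds the options object for a new context from the pref value.
// An empty or unknown pref yields no transform at all.
function optionsForPref(prefValue) {
  const transform = TRANSFORMS.get(prefValue);
  return transform ? { transform } : {};
}

console.log(optionsForPref("psaccent").transform("Save"));
console.log(Object.keys(optionsForPref("")));
```

The code constructing the contexts would then pass the result as the second constructor argument, so regular users never pay for a transform they did not ask for.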

@zbraniecki
Collaborator Author

Can we close this issue? We've had the capability for pseudolocalization since fluent 0.7 and we use it in Gecko. Or should we wait until we extract fluent-pseudo as a package? (I have that in Rust: https://github.com/projectfluent/fluent-rs/tree/master/fluent-pseudo)

@julienw

julienw commented Feb 17, 2021

Hey @zbraniecki, by chance would you have some guidance or documentation about how to use pseudolanguages with fluent.js/fluent-react in a plain web page (as opposed to in Firefox)? Thanks!

@zbraniecki
Collaborator Author

hmm, I can tell you how to enable it in fluent.js, not react
You need to extract the pseudolocalization code from an old L10nRegistry.jsm: https://hg.mozilla.org/mozilla-central/file/a1f74e8c8fb72390d22054d6b00c28b1a32f6c43/intl/l10n/L10nRegistry.jsm#l425
and then when constructing FluentBundle you pass a method as transform - https://github.com/projectfluent/fluent.js/blob/master/fluent-bundle/src/bundle.ts#L61
I assume something similar happens for react, but I'm short on details
if you do spend time, I'd accept that resurrected block of code as fluent-pseudo in fluent.js repo to maintain it!
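
For illustration, a minimal sketch of a bidi-style transform of the kind that could be passed as `transform`. This is not the L10nRegistry code; `bidiTransform` is a hypothetical name, and skipping anything that looks like an HTML tag is a simplifying assumption.

```javascript
const RLO = "\u202e"; // RIGHT-TO-LEFT OVERRIDE
const PDF = "\u202c"; // POP DIRECTIONAL FORMATTING
const MARKUP = /(<[^>]+>)/; // capturing group keeps the tags in split()

// Wraps every run of text in RLO...PDF so that it renders right-to-left,
// while leaving markup-like segments untouched.
function bidiTransform(text) {
  return text
    .split(MARKUP)
    .map((part, index) => {
      // With one capturing group, odd indices hold the matched tags.
      if (index % 2 === 1 || part === "") {
        return part;
      }
      return RLO + part + PDF;
    })
    .join("");
}

console.log(JSON.stringify(bidiTransform("Click <b>here</b>")));
```

With fluent.js, such a function would then be supplied when constructing the bundle, along the lines of `new FluentBundle("en-US", { transform: bidiTransform })`.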

@julienw

julienw commented Feb 23, 2021

Thanks for the pointers! This is what I did to support pseudo locales in the profiler: firefox-devtools/profiler#3188

We enable a pseudo locale by calling a function in the devtools console.

Would the file https://github.com/firefox-devtools/profiler/pull/3188/files#diff-ca1e6802f7be91e16b4123f89f090a2c40053a53e52b73ed3d69469619179d24 be suitable as fluent-pseudo? I'm not sure how "bidi" would set "rtl" with "fluent-dom", do you know? Or maybe fluent-dom doesn't set it anyway, like fluent-react?

@zbraniecki
Collaborator Author

yeah, it looks good!

For a while we used a hardcoded list which is quite stable - https://github.com/mozilla-b2g/gaia/blob/master/shared/js/intl/l20n-client.js#L31-L35
