Reintroduce pseudo-localization #83

zbraniecki · 2017-10-19T00:47:34Z

Coming back from the Unicode Conference, there was a lot of chatter about pseudo-locales.

Fluent had pretty good support for pseudo-locales in the past, and thanks to our client-side mode we can offer an exciting approach to them: runtime pseudolocalization.

I'd like to bring this back to modern Fluent: https://github.com/l20n/l20n.js/blob/v3.x/src/lib/pseudo.js

@stasm - do you have any thoughts on how you would like it to work?

zbraniecki · 2017-10-19T07:34:44Z

Maybe just introducing some generic "post-processing" on messages in MessageContext, and then adding fluent-pseudolocale would work as the first step?

let ctx = new MessageContext(['ar'], {
  process: fluent_pseudolocales.transform.bind('ar-XB')
});
let msg = ctx.formatValue('l10n-id');
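
To make the idea concrete, here is a minimal sketch of such a post-processing hook. `TinyContext` is a hypothetical stand-in for MessageContext, not the real class, and `upperCase` is only a placeholder transform; the one thing taken from the snippet above is the shape of the `process` option.

```javascript
// Hypothetical stand-in for MessageContext: stores plain strings by id and
// runs an optional `process` function over every formatted value.
class TinyContext {
  constructor(locales, { process = (text) => text } = {}) {
    this.locales = Array.isArray(locales) ? locales : [locales];
    this.process = process;
    this.messages = new Map();
  }

  addMessage(id, value) {
    this.messages.set(id, value);
  }

  formatValue(id) {
    const value = this.messages.get(id);
    // Unknown ids are returned as-is so that missing messages stay visible.
    return value === undefined ? id : this.process(value);
  }
}

// Placeholder transform; a real pseudolocale would accent or mirror the text.
const upperCase = (text) => text.toUpperCase();

const demoCtx = new TinyContext(["ar"], { process: upperCase });
demoCtx.addMessage("l10n-id", "Hello, world");
console.log(demoCtx.formatValue("l10n-id")); // HELLO, WORLD
```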

stasm · 2017-10-19T15:51:47Z

> Maybe just introducing some generic "post-processing" on messages in MessageContext, and then adding fluent-pseudolocale would work as the first step?

That was my first thought as well. A few additional thoughts below. I'll try to have answers tomorrow.

We have to take into account how this will interact with the language negotiation. Would we expect the user to set their requested locale to a pseudolocale in order to enable it? Would we require that developers add pseudolocales to the list of available locales in their app?

Perhaps it would make sense to encode pseudolocales as Unicode extensions to BCP47? Something like ab-CD-u-pseudo-accent or ab-CD-u-pseudo-rtl. The language negotiation process would then still correctly pick the regular ab-CD for fetching translation resources. Some logic would then be responsible for transforming the fetched resource using the fluent-pseudo module.
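
A rough sketch of the "some logic" part, assuming the `-u-pseudo-<strategy>` syntax floated above. That syntax is hypothetical (it is not a registered Unicode extension key), and `splitPseudoTag` is a name made up for this example.

```javascript
// Splits a tag such as "ab-CD-u-pseudo-accent" into the plain locale used for
// fetching resources and the name of the pseudolocale strategy, if any.
// The "-u-pseudo-" marker is the hypothetical syntax from the comment above.
const PSEUDO_MARKER = "-u-pseudo-";

function splitPseudoTag(tag) {
  const index = tag.toLowerCase().indexOf(PSEUDO_MARKER);
  if (index === -1) {
    return { locale: tag, pseudo: null };
  }
  return {
    locale: tag.slice(0, index),
    pseudo: tag.slice(index + PSEUDO_MARKER.length).toLowerCase(),
  };
}

console.log(splitPseudoTag("ab-CD-u-pseudo-accent"));
console.log(splitPseudoTag("ab-CD"));
```

The plain `locale` would go through language negotiation and IO as usual, while `pseudo` would select the transform applied to the fetched resource.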

What should be the outcome of formatting a date or a number in a pseudolocalized translation?

Also, the first step might be to only support build-time pseudolocalization.

zbraniecki · 2017-10-19T15:59:03Z

Google just went for en-XA, and ar-XB and added them to CLDR. So we can get internationalized date/time from CLDR 31 if we use those two.

Now, my problem with this is that because they used en-XA and not fr-XA, the numbers still look the same. I recommended fr-XA to them, but it may be too late.

Since we'd be doing runtime pseudo, maybe we don't need extensions (and they wouldn't be Unicode extensions, but rather variants; Google originally used en-psaccent and ar-psaccentrtl or something like that).

Maybe all we need is:

let ctx = new MessageContext('pl', {
  process: pseudo
});

and it'll transform Polish strings? This way we could get a pseudo of the current locale, regardless of what it is.

stasm · 2017-10-19T16:00:54Z

How would you decide when to turn pseudolocalization on? A different logic independent of the language negotiation?

zbraniecki · 2017-10-19T16:59:07Z

> How would you decide when to turn pseudolocalization on? A different logic independent of the language negotiation?

Yeah! This way the user can either detect pseudo from a langtag (oh, you're using XA region?) or by some checkbox (show me pseudolocale).
Since we're on client-side at runtime, that would mean no rebuilding, restarting or anything. Just take the exact locale we use, whatever it is, and recompute for pseudo.
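
As a sketch of that decision logic, kept outside of language negotiation: the function below looks at the requested locales and at an explicit preference. `detectPseudo` and the strategy names "accent" and "bidi" are assumptions made for this example.

```javascript
// Region subtags CLDR uses for its pseudolocales (en-XA, ar-XB).
const PSEUDO_REGIONS = new Map([
  ["XA", "accent"],
  ["XB", "bidi"],
]);

// Returns "accent", "bidi" or null. An explicit preference (for example a
// checkbox in the UI) wins over whatever the requested locales say.
function detectPseudo(requestedLocales, pref = null) {
  if (pref) {
    return pref;
  }
  for (const tag of requestedLocales) {
    // A region subtag is exactly two letters and follows the language subtag.
    const region = tag
      .split("-")
      .slice(1)
      .find((part) => /^[A-Za-z]{2}$/.test(part));
    const strategy = region && PSEUDO_REGIONS.get(region.toUpperCase());
    if (strategy) {
      return strategy;
    }
  }
  return null;
}

console.log(detectPseudo(["en-XA", "de"])); // accent
console.log(detectPseudo(["pl"], "bidi")); // bidi
console.log(detectPseudo(["en-US"])); // null
```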

stasm · 2017-10-19T17:15:46Z

> Yeah! This way the user can either detect pseudo from a langtag (oh, you're using XA region?) or by some checkbox.

If the user sets their requested locale to en-XA and the available locales only include en-US, the result of the language negotiation will be `en-US`. At the moment when we’d create the MessageContext we wouldn’t know the region was XA. On the other hand, if we add en-XA to the list of available locales, and the files for it do not exist on disk, we will fail to fetch anything. We’d need to extend the IO logic to fetch them. This might mean moving the pseudolocalization to fluent-web. I think it would be better to have it on a lower level though. Does the language negotiation preserve extensions found on the requested locales?

zbraniecki · 2017-10-19T17:19:13Z

Oh, you're right.

> I think it would be better to have it on a lower level though.

I agree.

> Does the language negotiation preserve extensions found on the requested locales?

yes.

So, maybe a private-use extension? fr-FR-x-pseudo?
We just need to make sure that if we see a pseudo like this, we actually feed en-XA or ar-XB to the Intl API (so that CLDR picks it up).
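
A minimal sketch of that mapping, assuming the `-x-pseudo` private-use form suggested above. `resolveLocales` and the strategy names are hypothetical; only `Intl.DateTimeFormat` is a real API here.

```javascript
// Maps a pseudolocale strategy onto the tags CLDR defines for pseudolocales.
const CLDR_PSEUDO_LOCALES = { accent: "en-XA", bidi: "ar-XB" };

// Parses the hypothetical "-x-pseudo" private-use form and returns the locale
// to fetch resources for plus the locale to hand to the Intl API.
function resolveLocales(tag, strategy = "accent") {
  const match = /^(.+)-x-pseudo$/i.exec(tag);
  if (!match) {
    return { resourceLocale: tag, intlLocale: tag, pseudo: null };
  }
  return {
    resourceLocale: match[1],
    intlLocale: CLDR_PSEUDO_LOCALES[strategy],
    pseudo: strategy,
  };
}

const resolved = resolveLocales("fr-FR-x-pseudo");
// Whether the output actually looks pseudolocalized depends on the CLDR data
// shipped with the JavaScript engine; without it Intl falls back silently.
const weekdayFormat = new Intl.DateTimeFormat(resolved.intlLocale, {
  weekday: "long",
  timeZone: "UTC",
});
console.log(resolved, weekdayFormat.format(new Date(Date.UTC(2017, 9, 19))));
```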

stasm · 2017-10-20T09:55:16Z

There's discussion in http://unicode.org/cldr/trac/ticket/3971 and http://unicode.org/cldr/trac/ticket/9819 on why CLDR didn't go for variant tags. It's mostly about compatibility with existing code. Also, since en-XA and ar-XB are now in CLDR we should stick to these codes. I wish we hadn't missed the discussion when it happened.

I'm reconsidering my stance on where this logic should live. Having it higher up, e.g. in fluent-web would allow multiple approaches:

- build-time transformation,
- AST transformation right after IO,
- string transformation after ctx.format().

fluent-web can also transform translations in a way which preserves HTML for the overlay mechanic.

Pike · 2017-10-20T10:33:01Z

I'd think that the most accurate way to implement the actual pseudolocalization would be on the AST?

stasm · 2017-10-20T10:39:06Z

At build time or at runtime?

Pike · 2017-10-20T10:49:56Z

For both, I guess.

stasm · 2017-10-20T11:15:45Z

Transforming the runtime AST means doing the transformation inside of MessageContext. That still might be a viable option given my earlier comments: fluent-web could supply a markup-aware transform function to the MessageContext constructor. Compared to transforming the result of ctx.format(), this would have the advantage of only transforming TextElements in the translation rather than the whole string.
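
The difference between the two approaches can be illustrated with a simplified model of a pattern. This is not Fluent's actual runtime AST: here a pattern is just an array in which strings are text elements and objects are placeables, and `shout` is a placeholder transform.

```javascript
// Applies `transform` to text elements only; placeables are left untouched.
function transformPattern(pattern, transform) {
  return pattern.map((element) =>
    typeof element === "string" ? transform(element) : element
  );
}

// Resolves placeables from `args` and joins the pattern into a string.
function formatPattern(pattern, args) {
  return pattern
    .map((element) =>
      typeof element === "string" ? element : String(args[element.name])
    )
    .join("");
}

const shout = (text) => text.toUpperCase(); // placeholder transform
const greeting = ["Hello, ", { name: "userName" }, "!"];
const greetingArgs = { userName: "Anna" };

// Transforming the AST leaves the interpolated value alone...
console.log(formatPattern(transformPattern(greeting, shout), greetingArgs)); // HELLO, Anna!
// ...while transforming the formatted string changes it too.
console.log(shout(formatPattern(greeting, greetingArgs))); // HELLO, ANNA!
```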

We still need to solve the problem of fetching valid locale files. Given that it looks like fluent-web (or fluent-react) would need to handle the pseudolocalization anyways (if only to be HTML-aware), I think it makes sense to special-case en-XA and ar-XB in their IO.

For example, given the following result of language negotiation:

requested: en-XA, de
available: en-XA, en-US, de
default: en-US
negotiated: en-XA, de, en-US

…a developer using fluent-react will need to add a special case to the IO code which fetches en-US when en-XA is requested. This sounds okay to me since the same developer has already put en-XA among the available locales.

let ctx = new MessageContext(negotiated, { pseudo: makeAccent });
ctx.addMessages( /* en-US translations to be transformed into en-XA */);

Or, if the build pipeline is capable of building pseudolocales up front, the IO code would simply fetch the pre-made en-XA files.

let ctx = new MessageContext(negotiated);
ctx.addMessages( /* en-XA translations generated at build time */);
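
The IO special case for the runtime variant might look roughly like the sketch below. Everything in it is hypothetical: `FILES` stands in for the files on disk, `readResource` for the app's IO code, and the en-XA to en-US mapping is just the one from the example above.

```javascript
// In-memory stand-in for the translation files an app would have on disk.
const FILES = new Map([
  ["en-US", "hello = Hello"],
  ["de", "hello = Hallo"],
]);

// Pseudolocales with no files of their own, mapped to the locale whose
// resources get fetched and transformed instead (an assumed mapping).
const PSEUDO_SOURCES = new Map([["en-XA", "en-US"]]);

function readResource(locale) {
  if (!FILES.has(locale)) {
    throw new Error(`No resource for ${locale}`);
  }
  return FILES.get(locale);
}

// Returns the source text for `locale` and whether it still needs a runtime
// transform before it can pass for the pseudolocale.
function loadFor(locale) {
  const source = PSEUDO_SOURCES.get(locale);
  return source === undefined
    ? { locale, text: readResource(locale), needsTransform: false }
    : { locale, text: readResource(source), needsTransform: true };
}

for (const locale of ["en-XA", "de", "en-US"]) {
  console.log(loadFor(locale));
}
```

With pre-built pseudolocale files, en-XA would simply be present in `FILES` and absent from `PSEUDO_SOURCES`.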

zbraniecki · 2017-10-20T15:32:07Z

I do not agree with Stas that we have to use en-XA and ar-XB here. I believe it's perfectly fine for us to use whatever mechanism we want to recognize pseudolocales, and then just make sure to collapse them onto en-XA and ar-XB in the Intl constructors for the Intl API / CLDR.

stasm · 2017-10-24T06:35:31Z

Note that the approach from my previous comment will work with any scheme of specifying pseudolocales. In my example I chose to put en-XA in requested but it could also be an app-specific pref which handles that. This is also how I understood your comment from 5 days ago.

There's value in using en-XA, ar-XB now that they were standardized in the CLDR. They will become recognizable names for pseudolocales and with time will gain support in various tools and platforms.

zbraniecki · 2017-11-07T00:42:45Z

@stasm - would you have time to draft a plan to get this into a POC state? I'm happy to commit to working on that, but would prefer to follow your vision.

zbraniecki · 2017-11-07T03:24:25Z

Some POC prototyping gave me this: https://youtu.be/E3t8-u8e5D0

It's actually quite simple to get to that point, and even to get Intl hooked in. There's more work to be done to handle complex messages.

I'm wondering if it's better to import pseudo for side effects and let it hook itself into Fluent:

import "fluent-pseudo";

let cx = new MessageContext(locales, {
  usePseudo: true
});

or make people hook it explicitly:

// strategy1 - 30% longer via duplication of vowels, Latin chars transformed, LTR
import { strategy1 } from "fluent-pseudo";

let cx = new MessageContext(locales, {
  transform: strategy1
});
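
For illustration, a transform in the spirit of `strategy1` could be sketched as follows. The accent map is made up for this example and covers only some letters, and doubling every vowel does not aim for exactly 30% growth; neither is taken from an actual fluent-pseudo implementation.

```javascript
// Illustrative accent map; not the table used by any released package.
const ACCENTS = {
  a: "á", c: "ç", e: "é", i: "í", n: "ñ", o: "ó", s: "š", u: "ú", y: "ý", z: "ž",
  A: "Á", C: "Ç", E: "É", I: "Í", N: "Ñ", O: "Ó", S: "Š", U: "Ú", Y: "Ý", Z: "Ž",
};

function strategy1(text) {
  return text
    // Lengthen the string by doubling every vowel ("Save" -> "Saavee").
    .replace(/[aeiouAEIOU]/g, (vowel) => vowel + vowel.toLowerCase())
    // Swap Latin letters for accented look-alikes where the map has one.
    .replace(/[a-zA-Z]/g, (letter) => ACCENTS[letter] ?? letter);
}

console.log(strategy1("Save changes"));
```

The result stays readable to an English speaker while making hardcoded strings and truncated layouts easy to spot.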

Enough for now, will wait for stas :)

stasm · 2017-11-09T12:10:50Z

I find the explicit version easier to understand. It will also be easier to test.

Pike · 2017-11-09T12:32:35Z

From a developer point-of-view, I don't expect that any Firefox developer will be touching code at that abstraction level. We explicitly don't want these folks to know that MessageContext even exists.

stasm · 2017-11-09T12:36:57Z

Agreed. IIUC this issue is about the low-level API which fluent-web will completely hide.

stasm · 2017-12-07T00:40:33Z

@zbraniecki and I talked about this yesterday and today. We'd like to start simple with the approach from comment #83 (comment).

- The MessageContext constructor will accept a process or transform option whose value is a function to be invoked on all TextElements.
  - The transformation would happen inside of the MessageContext.addMessages call in the runtime parser.
  - We'll publish a new package with the psaccent and psbidi transforms. We'll discuss the exact strategies and implementations later.
  - Users are free to write their own transform functions. We encourage experimentation.
- For now, we'll use regular language tags: en-US, de, etc. The transform function should only be passed to the constructor if the user has expressed interest in using pseudolocales. This should be handled outside of Fluent.
  - As a consequence, formatted dates interpolated into pseudolocalized translations will be spelled normally (e.g. Tuesday if the current locale is en-US).
- In the future, we'll have Intl.Locale (and fluent-locale) and it will be easy to recognize well-formed BCP47 variant tags, e.g. en-US-psaccent and en-US-psbidi.
  - CLDR's en-XA and ar-XB are called such mostly because of legacy code in Android which wouldn't handle language tags with variants.
  - @zbraniecki will start a discussion with CLDR about using language variants for pseudolocales.
  - In the even more distant future we might try to standardize the variants with IANA.
    - If variants get standardized, MessageContext could by default include transforms for known pseudolocales.
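
Since the exact strategies were left for later, the following is only one possible shape for a psbidi-style transform: it wraps each word in Unicode directional override characters so that the text renders right-to-left. The name `psbidi` is reused from the plan above, but the implementation is an assumption.

```javascript
const RLO = "\u202e"; // RIGHT-TO-LEFT OVERRIDE
const PDF = "\u202c"; // POP DIRECTIONAL FORMATTING

// Wraps every run of non-whitespace characters in RLO...PDF. Whitespace is
// left outside the override so that line breaking keeps working.
function psbidi(text) {
  return text.replace(/\S+/g, (word) => RLO + word + PDF);
}

// JSON.stringify makes the otherwise invisible control characters visible.
console.log(JSON.stringify(psbidi("Save changes")));
```

Because the function takes a string and returns a string, it could be passed as the transform option described in the plan and would only ever see the text of TextElements.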

Pike · 2017-12-07T13:11:00Z

> Users are free to write their own transform functions. We encourage experimentation.

This issue uses the word user for a ton of things, I'm losing track.

Say, I'm a Firefox developer, and I want to run my local build with psaccent on. How would I do that, and which parts of our code stack are involved in doing so, and what would they need to do?

stasm · 2017-12-07T20:05:34Z

> This issue uses the word user for a ton of things, I'm losing track.

Good point. I meant the users of the library here. Elsewhere I meant the user of the app.

> Say, I'm a Firefox developer, and I want to run my local build with psaccent on. How would I do that, and which parts of our code stack are involved in doing so, and what would they need to do?

You would start by flipping a pref somewhere in the UI. The values of the pref could be: psaccent, psbidi. fluent-gecko (which is fluent-dom packaged for Gecko privileged content) would observe this pref and use it in its generateMessages, which constructs MessageContext instances. fluent-react in Devtools would need to do the same.
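
A sketch of how a generator could pick up the pref when building constructor options. `generateContextOptions`, `readPref` and the placeholder transforms are all hypothetical; the real generateMessages in fluent-gecko is not reproduced here.

```javascript
// Placeholder transforms keyed by the pref values mentioned above.
const PSEUDO_TRANSFORMS = new Map([
  ["psaccent", (text) => `[${text}]`],
  ["psbidi", (text) => `\u202e${text}\u202c`],
]);

// Lazily yields the constructor arguments for one context per locale.
// `readPref` is called for each locale, so a pref change takes effect for
// contexts that have not been requested yet.
function* generateContextOptions(locales, readPref) {
  for (const locale of locales) {
    const transform = PSEUDO_TRANSFORMS.get(readPref());
    yield { locale, options: transform ? { transform } : {} };
  }
}

let pseudoPref = "psaccent";
const contexts = generateContextOptions(["en-US", "de"], () => pseudoPref);

const first = contexts.next().value;
console.log(first.locale, first.options.transform("Save")); // en-US [Save]

pseudoPref = ""; // pref switched off before the next context is requested
const second = contexts.next().value;
console.log(second.locale, "transform" in second.options); // de false
```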

zbraniecki · 2020-02-13T18:09:50Z

Can we close this issue? We have had the capability for pseudolocalization since fluent 0.7 and we use it in Gecko. Or should we wait until we extract fluent-pseudo as a package? (I have that in Rust: https://github.com/projectfluent/fluent-rs/tree/master/fluent-pseudo)

julienw · 2021-02-17T16:07:03Z

Hey @zbraniecki, by chance would you have some guidance or documentation about how to use pseudolanguages with fluent.js/fluent-react in a plain web page (as opposed to in Firefox)? Thanks!

zbraniecki · 2021-02-23T17:32:30Z

hmm, I can tell you how to enable it in fluent.js, not react
You need to extract from an old L10nRegistry.jsm https://hg.mozilla.org/mozilla-central/file/a1f74e8c8fb72390d22054d6b00c28b1a32f6c43/intl/l10n/L10nRegistry.jsm#l425
and then when constructing FluentBundle you pass a method as transform - https://github.com/projectfluent/fluent.js/blob/master/fluent-bundle/src/bundle.ts#L61
I assume something similar happens for react, but I'm short on details
if you do spend time, I'd accept that resurrected block of code as fluent-pseudo in fluent.js repo to maintain it!
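
A hedged sketch of the FluentBundle part. It assumes that `FluentBundle` and `FluentResource` are the classes exported by @fluent/bundle and that the constructor takes a `transform` option as in the linked bundle.ts; they are passed in as arguments so the snippet itself has no dependencies. `formatWithPseudo` is a name invented for this example.

```javascript
// Formats a single message from `source` with `transform` applied to its
// text. Returns null when the message is missing or has no value.
function formatWithPseudo(fluent, locale, source, id, transform) {
  const { FluentBundle, FluentResource } = fluent;
  const bundle = new FluentBundle(locale, { transform });
  bundle.addResource(new FluentResource(source));
  const message = bundle.getMessage(id);
  return message && message.value ? bundle.formatPattern(message.value) : null;
}

// Possible usage in a project that has @fluent/bundle installed:
//
//   import * as fluent from "@fluent/bundle";
//   const upper = (text) => text.toUpperCase(); // placeholder transform
//   formatWithPseudo(fluent, "en-US", "hello = Hello", "hello", upper);
```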

julienw · 2021-02-23T21:13:31Z

Thanks for the pointers! This is what I did to support pseudo locales in the profiler: firefox-devtools/profiler#3188

We enable a pseudo locale by calling a function in the devtools console.

Would the file https://github.com/firefox-devtools/profiler/pull/3188/files#diff-ca1e6802f7be91e16b4123f89f090a2c40053a53e52b73ed3d69469619179d24 be suitable as fluent-pseudo? I'm not sure how "bidi" would set "rtl" with "fluent-dom", do you know? Or maybe fluent-dom doesn't set it anyway, like fluent-react?

zbraniecki · 2021-02-23T21:47:16Z

yeah, it looks good!

For a while we used a hardcoded list which is quite stable - https://github.com/mozilla-b2g/gaia/blob/master/shared/js/intl/l20n-client.js#L31-L35

stasm added the housekeeping label Jul 26, 2019

stasm mentioned this issue Jul 26, 2019

Implement pseudolocalizations #17

Closed

meandavejustice mentioned this issue Dec 24, 2019

use pseudolocalization to test translations on payments server mozilla/fxa#3756

Closed
