Dynamic message references #80

Open
stasm opened this issue Jan 15, 2018 · 22 comments

@stasm stasm commented Jan 15, 2018

It is sometimes desirable to parametrize message references in placeables. In this issue I'd like to propose a new argument type, extending FluentType, which could be used to programmatically pass message references as arguments to messages.

Problem Statement

Redundancy is considered good for localization. It allows localizers to tailor the wording and the grammar of the translation of each particular case. Also see Fluent Good Practices.

In general, the pattern of having one message per item is preferred over factoring the action out to its own message (Delete This { $item }) and passing the translated item in some way.

# Having two separate messages allows localizers
# to customize translations in each, if needed.
delete-picture = Delete This Picture
delete-video = Delete This Video

In some cases, however, this pattern doesn't scale well.

Consider this example from Firefox (source):

# %S is the website origin (e.g. www.mozilla.org)
getUserMedia.sharingMenuCamera = %S (camera)
getUserMedia.sharingMenuMicrophone = %S (microphone)
getUserMedia.sharingMenuAudioCapture = %S (tab audio)
getUserMedia.sharingMenuApplication = %S (application)
getUserMedia.sharingMenuScreen = %S (screen)
getUserMedia.sharingMenuWindow = %S (window)
getUserMedia.sharingMenuBrowser = %S (tab)
getUserMedia.sharingMenuCameraMicrophone = %S (camera and microphone)
getUserMedia.sharingMenuCameraMicrophoneApplication = %S (camera, microphone and application)
getUserMedia.sharingMenuCameraMicrophoneScreen = %S (camera, microphone and screen)
getUserMedia.sharingMenuCameraMicrophoneWindow = %S (camera, microphone and window)
getUserMedia.sharingMenuCameraMicrophoneBrowser = %S (camera, microphone and tab)
getUserMedia.sharingMenuCameraAudioCapture = %S (camera and tab audio)
getUserMedia.sharingMenuCameraAudioCaptureApplication = %S (camera, tab audio and application)
getUserMedia.sharingMenuCameraAudioCaptureScreen = %S (camera, tab audio and screen)
getUserMedia.sharingMenuCameraAudioCaptureWindow = %S (camera, tab audio and window)
getUserMedia.sharingMenuCameraAudioCaptureBrowser = %S (camera, tab audio and tab)
getUserMedia.sharingMenuCameraApplication = %S (camera and application)
getUserMedia.sharingMenuCameraScreen = %S (camera and screen)
getUserMedia.sharingMenuCameraWindow = %S (camera and window)
getUserMedia.sharingMenuCameraBrowser = %S (camera and tab)
getUserMedia.sharingMenuMicrophoneApplication = %S (microphone and application)
getUserMedia.sharingMenuMicrophoneScreen = %S (microphone and screen)
getUserMedia.sharingMenuMicrophoneWindow = %S (microphone and window)
getUserMedia.sharingMenuMicrophoneBrowser = %S (microphone and tab)
getUserMedia.sharingMenuAudioCaptureApplication = %S (tab audio and application)
getUserMedia.sharingMenuAudioCaptureScreen = %S (tab audio and screen)
getUserMedia.sharingMenuAudioCaptureWindow = %S (tab audio and window)
getUserMedia.sharingMenuAudioCaptureBrowser = %S (tab audio and tab)

Or the use-case @cruelbob gives in #79 (comment):

Collect meat from cows, pigs and sheep.

One of my favorite games, Heroes of Might and Magic III, pits armies consisting of over 140 different unit types in battles against each other. After every move, the battle log reads:

The Bone Dragon does 46 damage. 2 Griffins perish.

Or:

The Cyclops Kings do 233 damage. One Giant perishes.

If we wanted to avoid concatenation of sentences (two sentences per creature: one for do X damage and one for X creatures perish), we'd end up with 141² = 19,881 different permutations of creature pairs.

This doesn't scale well.
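
The arithmetic can be checked with a short sketch. The count of 141 creatures is taken from the "141²" figure above; the alternative tally (each creature name defined once, plus one shared battle-log message) is an assumption made only for comparison.

```javascript
// Hypothetical count, taken from the "141²" figure above.
const creatureCount = 141;

// One message per ordered (attacker, defender) pair.
const perPairMessages = creatureCount * creatureCount;

// Assumed alternative: one entry per creature plus a single shared message.
const parametrizedEntries = creatureCount + 1;

console.log(perPairMessages);     // 19881
console.log(parametrizedEntries); // 142
```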

Proposed Solution

Introducing some redundancy should still be preferred for small sets of items. For large sets leading to lots and lots of permutations, it should be possible to parametrize the translation of placeables.

I'll use the example of HoMM3 because the other two also require the List Formatting feature to make sense.

I'd like to make it possible to pass external arguments which resolve to message references. Given the following FTL:

-creature-bone-dragon =
    {
       *[singular] Bone Dragon
        [plural] Bone Dragons
    }
-creature-griffin =
    {
       *[singular] Griffin
        [plural] Griffins
    }

# … Hundreds more …

battle-log-attack-perish =
    { $attacker_count ->
        [one] The { $attacker_name[singular] } does
       *[other] The { $attacker_name[plural] } do
    } { $damage_points } damage. { $perish_count ->
        [one] One { $defender_name[singular] } perishes.
       *[other] { $perish_count } { $defender_name[plural] } perish.
    }

…both $attacker_name and $defender_name would be arguments of type FluentReference (extending FluentType; same as FluentNumber and FluentDateTime). The developer would pass them like so:

let msg = ctx.getMessage("battle-log-attack-perish");
log(ctx.format(msg, {
    attacker_name: new FluentReference("-creature-bone-dragon"),
    attacker_count: 1,
    defender_name: new FluentReference("-creature-griffin"),
    perish_count: 2,
    damage_points: 46
}));

This change mostly requires additions to the MessageContext resolution logic. Syntax-wise, the VariantExpression and the AttributeExpression should be changed to accept both message identifiers as well as external arguments as parent objects (like in the $attacker_name[singular] example above).
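
As a rough illustration of the resolution logic, here is a minimal, self-contained sketch in plain JavaScript. It does not use the real Fluent library: the `FluentReference` class, the `terms` map, `resolveVariant` and `battleLog` are all hypothetical stand-ins, and the message itself is hand-written for English rather than parsed from FTL.

```javascript
// Hypothetical argument type; it only carries the id of a term.
class FluentReference {
  constructor(id) {
    this.id = id;
  }
}

// Stand-in for parsed terms: each has variants and a default variant key.
const terms = new Map([
  ["-creature-bone-dragon", {
    variants: { singular: "Bone Dragon", plural: "Bone Dragons" },
    defaultKey: "singular",
  }],
  ["-creature-griffin", {
    variants: { singular: "Griffin", plural: "Griffins" },
    defaultKey: "singular",
  }],
]);

// Resolve `$arg[key]`: follow the reference, then pick the requested variant,
// falling back to the term's default variant when the key is unknown.
function resolveVariant(arg, key) {
  if (!(arg instanceof FluentReference)) {
    return String(arg); // plain strings and numbers have no variants
  }
  const term = terms.get(arg.id);
  if (term === undefined) {
    return arg.id; // assumption: show the id when the term is missing
  }
  return term.variants[key] ?? term.variants[term.defaultKey];
}

// Hand-written English equivalent of battle-log-attack-perish.
function battleLog(args) {
  const attack = args.attacker_count === 1
    ? `The ${resolveVariant(args.attacker_name, "singular")} does`
    : `The ${resolveVariant(args.attacker_name, "plural")} do`;
  const perish = args.perish_count === 1
    ? `One ${resolveVariant(args.defender_name, "singular")} perishes.`
    : `${args.perish_count} ${resolveVariant(args.defender_name, "plural")} perish.`;
  return `${attack} ${args.damage_points} damage. ${perish}`;
}

const line = battleLog({
  attacker_name: new FluentReference("-creature-bone-dragon"),
  attacker_count: 1,
  defender_name: new FluentReference("-creature-griffin"),
  perish_count: 2,
  damage_points: 46,
});
console.log(line); // The Bone Dragon does 46 damage. 2 Griffins perish.
```

The point of the sketch is that the calling code passes only ids; which variant is used stays a decision of the message.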

Open Questions

  1. Should we also allow public messages to be dynamically referenced like this?

Sign-offs

@Pike
  • I support this.
  • I don't care.
  • I object this.
@stasm
  • I support this.
  • I don't care.
  • I object this.
@zbraniecki
  • I support this.
  • I don't care.
  • I object this.

Also CC @flodolo.

@cruelbob cruelbob commented Jan 15, 2018

Some languages have grammatical cases (https://en.wikipedia.org/wiki/Grammatical_case). This feature can help with this problem.
Example in Russian:
Nominative case - У меня есть коровы, свиньи и овцы. (I have cows, pigs and sheep.)
Genitive case - У меня нет коров, свиней и овец. (I have no cows, pigs and sheep.)

@stasm stasm commented Jan 16, 2018

Grammatical cases are already well-supported by Fluent; see http://projectfluent.org/fluent/guide/variants.html. But you're right—these two features will synergize well :)

@Pike Pike commented Jan 23, 2018

I think we should do this, the use-cases look good enough. Localizers' life will be hard in these cases, but less hard than with the alternative.

@stasm stasm commented Jan 23, 2018

Thanks, @Pike. A few more examples which illustrate why it's useful to resolve the references on the localization side rather than in the code (and pass translated strings as arguments).

Let's assume a game UI which logs what the player sees:

You see a fairy.
You see an elf.

English Localization

-creature-fairy = fairy
-creature-elf = elf
    .StartsWith = vowel

you-see =
    You see { $object.StartsWith ->
        [vowel] an { $object }
       *[consonant] a { $object }
    }.

The you-see message can inspect the English-specific StartsWith attribute and choose between the correct indefinite article a or an. For conciseness, -creature-fairy doesn't define the StartsWith attribute at all; the default variant in you-see will be used.

German Localization

-creature-fairy = Fee
    .Genus = Femininum
-creature-elf =
    {
       *[Nominativ] Elf
        [Akkusativ] Elfen
    }
    .Genus = Maskulinum

you-see =
    Du siehst { $object.Genus ->
       *[Maskulinum] einen { $object[Akkusativ] }
        [Femininum] eine { $object[Akkusativ] }
        [Neutrum] ein { $object[Akkusativ] }
    }.

The you-see message can inspect the German-specific Genus attribute and choose between the correct indefinite article for the gender of $object. The object is also correctly accorded with the verb sehen which requires the Akkusativ.


PS. The examples above don't solve capitalization (fairy and elf are always lowercase in the English translation), but I'm leaving it out on purpose. It may be solved by nested variants or a function.
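
To make the selection logic concrete, the two localizations can be emulated in plain JavaScript. This is only a sketch: the `englishTerms` and `germanTerms` tables and the two functions are hypothetical stand-ins for what a resolver would do with the FTL above, and the fallback to the default variant for a term without an Akkusativ variant is an assumption.

```javascript
// Stand-in for the English terms; only -creature-elf defines StartsWith.
const englishTerms = {
  "-creature-fairy": { value: "fairy", attributes: {} },
  "-creature-elf": { value: "elf", attributes: { StartsWith: "vowel" } },
};

// Emulates the English you-see: select on $object.StartsWith,
// with "consonant" as the default variant.
function youSeeEnglish(termId) {
  const term = englishTerms[termId];
  const article = term.attributes.StartsWith === "vowel" ? "an" : "a";
  return `You see ${article} ${term.value}.`;
}

// Stand-in for the German terms, with case variants and a Genus attribute.
const germanTerms = {
  "-creature-fairy": {
    variants: { Nominativ: "Fee" },
    defaultKey: "Nominativ",
    attributes: { Genus: "Femininum" },
  },
  "-creature-elf": {
    variants: { Nominativ: "Elf", Akkusativ: "Elfen" },
    defaultKey: "Nominativ",
    attributes: { Genus: "Maskulinum" },
  },
};

// Emulates the German you-see: select the article on $object.Genus
// (Maskulinum is the default) and ask for the Akkusativ variant.
function youSeeGerman(termId) {
  const term = germanTerms[termId];
  const articles = { Maskulinum: "einen", Femininum: "eine", Neutrum: "ein" };
  const article = articles[term.attributes.Genus] ?? articles.Maskulinum;
  // Assumption: a missing variant falls back to the term's default variant.
  const noun = term.variants.Akkusativ ?? term.variants[term.defaultKey];
  return `Du siehst ${article} ${noun}.`;
}

console.log(youSeeEnglish("-creature-fairy")); // You see a fairy.
console.log(youSeeEnglish("-creature-elf"));   // You see an elf.
console.log(youSeeGerman("-creature-fairy"));  // Du siehst eine Fee.
console.log(youSeeGerman("-creature-elf"));    // Du siehst einen Elfen.
```

In both cases the caller supplies the same term id; everything language-specific lives with the localization.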

@zbraniecki zbraniecki commented Mar 8, 2018

While working on https://bugzilla.mozilla.org/show_bug.cgi?id=1435915 I found a use case for this feature.

There's an API there which constructs a description of the application handler.

It can be a localizable term, like "Portable Document Format (PDF)" or "Video Podcast", it can be a generic description like { $extension } file, or it can be a raw string.

I handle all three scenarios using a strategy from Gaia days - the API circulates an "l10n type" object:

// a string -> l10nId
// an object -> {id: l10nId, args: l10nArgs}
// an object -> {raw: string}

{id: "applications-type-video-podcast-feed"},
{id: "applications-file-ending", args: {extension: ".mp4"}},
{raw: "Windows Video File"}, // this one comes straight from the OS

Those strings are resolved in a loop and displayed in a table in Firefox Preferences in a column "type description".
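
A minimal sketch of that resolution loop might look as follows. The `messages` table and `formatMessage` helper are hypothetical stand-ins for the localization layer, and the English values are made up for illustration; only the three input shapes come from the description above.

```javascript
// Stand-in message table; real code would ask the localization layer instead.
const messages = {
  "applications-type-video-podcast-feed": () => "Video Podcast",
  "applications-file-ending": ({ extension }) => `${extension} file`,
};

// Hypothetical formatter: falls back to the id when the message is missing.
function formatMessage(id, args = {}) {
  const message = messages[id];
  return message ? message(args) : id;
}

// Accepts a string id, an {id, args} object, or a {raw} object.
function resolveDescription(desc) {
  if (typeof desc === "string") {
    return formatMessage(desc);
  }
  if (typeof desc.raw === "string") {
    return desc.raw; // already display-ready, e.g. provided by the OS
  }
  return formatMessage(desc.id, desc.args);
}

const descriptions = [
  { id: "applications-type-video-podcast-feed" },
  { id: "applications-file-ending", args: { extension: ".mp4" } },
  { raw: "Windows Video File" },
].map(resolveDescription);

console.log(descriptions);
// [ 'Video Podcast', '.mp4 file', 'Windows Video File' ]
```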

Now, the trick is that there's a place in the API which separates how this string is displayed in case there are two entries with the same description.

This can happen because for example, there are two file types for "Video Podcast" or "Windows Video File".

In that case, there's a special string in Fluent:

applications-type-description-with-type = { $description } ({ $type })

which is used to display Windows Video File (.mp4) separately from Windows Video File (.mpg).

With support for this feature, I could pass a FluentReference as $description instead of having to resolve the string with formatValue and pass it as a string.

I'm going to work around it for now, but I thought it might be useful to know that we have already encountered a use case in Firefox.

@zbraniecki zbraniecki commented Mar 8, 2018

This would also help with cases where a message value is used as an attribute in another element.

Example 1: https://searchfox.org/mozilla-central/rev/588d8120aa11738657da93e09a03378bcd1ba8ec/browser/locales/en-US/browser/preferences/preferences.ftl#35

could be:

pane-general-title = General
pane-search-title = Search

category =
    .tooltiptext = { $paneTitle }

Example 2:

The applications-type-description messages are used as values, but are then placed into the XUL as <item typeDescription="..."/>. It would be useful to make it:

item-type-description =
    .typeDescription = { $typeDescription }

Granted, I don't know how we will store it in data-l10n-args.

@stasm stasm added the syntax label Mar 26, 2018

@zbraniecki zbraniecki commented Apr 2, 2018

Aaand another use case:

language-pl = Polish
language-fr = French

region-pl = Poland
region-us = United States

locale-pattern = { $language } ({ $region })

@stasm stasm commented Apr 10, 2018

From https://bugzilla.mozilla.org/show_bug.cgi?id=1451450#c6:

We'll need to support VariantExpressions and AttributeExpressions on both MessageReferences and ExternalArguments:

-term[varname]
$object[varname]

-term.attr
$object.attr

Which I think is best solved by adding another level of nesting to the AST, unfortunately. Right now, -term[varname] parses as:

{
    "type": "VariantExpression",
    "id": {
        "type": "Identifier",
        "name": "-term"
    },
    "key": {
        "type": "VariantName",
        "name": "varname"
    }
}

In order to support both MessageReferences and ExternalArguments and to be able to serialize them, I think it should rather parse as:

{
    "type": "VariantExpression",
    "of": {
        "type": "MessageReference",
        "id": {
            "type": "Identifier",
            "name": "-term"
        }
    },
    "key": {
        "type": "VariantName",
        "name": "varname"
    }
}

This is best visualized with the spans of $object[varname]:

$object[varname]

 +----+          Identifier
+-----+          ExternalArgument
        +------+ VariantName
+--------------+ VariantExpression
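
To show why the extra level of nesting helps with serialization, here is a small sketch of a serializer over the proposed shape. The VariantExpression node follows the JSON above; the shape of the AttributeExpression node (`of` plus `id`) and the assumption that an ExternalArgument's Identifier omits the `$` are guesses made for illustration.

```javascript
// Serialize the proposed nested AST back to FTL expression syntax.
function serializeExpression(node) {
  switch (node.type) {
    case "MessageReference":
      return node.id.name; // e.g. "-term"
    case "ExternalArgument":
      return `$${node.id.name}`; // assumption: the Identifier omits the "$"
    case "VariantExpression":
      return `${serializeExpression(node.of)}[${node.key.name}]`;
    case "AttributeExpression":
      // Assumed shape: { of, id }, mirroring VariantExpression.
      return `${serializeExpression(node.of)}.${node.id.name}`;
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

const termVariant = {
  type: "VariantExpression",
  of: { type: "MessageReference", id: { type: "Identifier", name: "-term" } },
  key: { type: "VariantName", name: "varname" },
};

const argumentVariant = {
  type: "VariantExpression",
  of: { type: "ExternalArgument", id: { type: "Identifier", name: "object" } },
  key: { type: "VariantName", name: "varname" },
};

const argumentAttribute = {
  type: "AttributeExpression",
  of: { type: "ExternalArgument", id: { type: "Identifier", name: "object" } },
  id: { type: "Identifier", name: "attr" },
};

console.log(serializeExpression(termVariant));       // -term[varname]
console.log(serializeExpression(argumentVariant));   // $object[varname]
console.log(serializeExpression(argumentAttribute)); // $object.attr
```

With the parent expression stored as a typed node under `of`, the serializer needs no special cases to tell a term from an argument.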

@spookylukey spookylukey commented Apr 11, 2018

In Django we have a use case for this feature, not just as a matter of convenience - without it we wouldn't be able to generate correct translations at all. We have exactly the "Delete the selected %s items" case, but in our case, as a framework, %s is the name of a model, provided usually in English by a developer. It's therefore not known to the Django authors, but is known to the app developers. If we are translating into French, for example, the word "selected" becomes either "sélectionnés" or "sélectionnées" depending on the gender of the substituted model name.

We would also want some way for FluentReference to provide a fallback - what happens if e.g. the application code passes in FluentReference("-creature-a-new-creature") but -creature-a-new-creature doesn't exist at all in the FTL file? We'd want to pass in FluentReference("-creature-a-new-creature", "a new creature"), and "a new creature" would be used for all variants if -creature-a-new-creature is not defined at all.
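
The fallback behavior described here could be sketched like this. Everything in the block is hypothetical: `FluentReference` with a second constructor argument is the shape suggested above, not an existing API, and the `terms` map stands in for the contents of an FTL file.

```javascript
// Hypothetical reference type with an optional fallback string.
class FluentReference {
  constructor(id, fallback = null) {
    this.id = id;
    this.fallback = fallback;
  }
}

// Stand-in for the terms defined in the FTL file.
const terms = new Map([
  ["-creature-griffin", {
    variants: { singular: "Griffin", plural: "Griffins" },
    defaultKey: "singular",
  }],
]);

// Resolve `$ref[key]`; when the term does not exist, use the fallback
// for every variant (or the bare id if no fallback was given).
function resolveVariant(ref, key) {
  const term = terms.get(ref.id);
  if (term === undefined) {
    return ref.fallback ?? ref.id;
  }
  return term.variants[key] ?? term.variants[term.defaultKey];
}

const known = new FluentReference("-creature-griffin", "a griffin");
const unknown = new FluentReference("-creature-a-new-creature", "a new creature");

console.log(resolveVariant(known, "plural"));     // Griffins
console.log(resolveVariant(unknown, "singular")); // a new creature
console.log(resolveVariant(unknown, "plural"));   // a new creature
```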

(Whether Django, with its current investment in gettext, would be able to move to Fluent is another matter, but the point applies to other framework-like code, and the choices of frameworks can affect the choices of a lot of other things.)

@stasm stasm added syntax ast and removed syntax labels May 15, 2018
@stasm stasm added this to To do in Syntax 0.8 May 23, 2018

@zbraniecki zbraniecki commented May 24, 2018

I think I'm stuck with https://bugzilla.mozilla.org/show_bug.cgi?id=1435915#c15 until this lands.

@stasm stasm commented May 25, 2018

Some more explanation would help :) Do you mean something like the following?

applications-action-always-ask =
    .label = Always ask
applications-action-generic-label = {$menuitem.label}

And then in JS:

setAttributes(
    labelElement,
    "applications-action-generic-label",
    {
        menuitem: new FluentReference(menuitemElement.getAttribute("data-l10n-id")),
    }
);

It's still an open question for me whether we should allow dynamic references to messages. In fact, I'd prefer to start by allowing dynamic references to terms only. I have concerns about dynamic references being abused in scenarios where they're not about grammar. In bug 1435915 comment 16 I suggested a slightly more verbose alternative which will fix the problem outlined in the bug. IIUC, the real fix would be to encapsulate the variable shape of the translation with a WebComponent.

@stasm stasm added the FUTURE label May 25, 2018
@stasm stasm removed this from To do in Syntax 0.8 May 29, 2018
@stasm stasm added syntax and removed syntax labels Oct 16, 2018
@stasm stasm removed syntax: ast labels Oct 16, 2018

@sn0ooow sn0ooow commented Oct 23, 2018

https://pontoon.mozilla.org/ro/common-voice/cross-locale.ftl/?string=177874

In this case, no matter what {$lang} displays, I need the first letter to be lower-case, whatever the locale name is. This is how the Romanian language works.

For example it should display "Mulțumim pentru interesul de a contribui la română" ("Thank you for your interest in contributing to Romanian").

If there were a way to force this through syntax, or to define specific new rules for Romanian, that would be great!

And maybe an option for uppercasing too, not only lowercasing. It could be useful perhaps for other languages.

@aphillips aphillips commented Oct 23, 2018

@cristisilaghi Note that CLDR provides different contextual strings for "standalone" vs. "in context" display of strings like language, region, time zone name (or other values, such as month names, etc.).

@zbraniecki zbraniecki commented Jan 5, 2019

Another example where this would save the day: https://phabricator.services.mozilla.com/D15762

we have two long lists of language and region names:

region-name-us = United States
region-name-pl = Poland
region-name-de = Germany
language-name-pl = Polish
language-name-de = German
language-name-en = English

and a single connector:

locale-display-name = { $lang } ({ $region })

with dynamic refs I could do:

document.l10n.setAttributes(menuitem, "locale-display-name", {
  lang: FluentMessage(`language-name-${langCode}`),
  region: FluentMessage(`region-name-${regionCode}`)
});

and have it properly retranslate on locale change and such.

@alabamenhu alabamenhu commented Apr 1, 2019

I have been working on a P6 implementation of Fluent while porting over a text adventure game, and I definitely agree that dynamic term references are necessary and shouldn't be too hard to implement.

The initially proposed syntax of just using $foo[bar] would cause ambiguous entries for languages that don't need case (etc.) and don't have variants, as both a variable and a variable-powered term reference would look like $foo. (When translating from, say, an en.ftl file, a distinct syntax also lets the localizer know whether they need to word things creatively to avoid case/etc. issues, or whether it's going to be a term, so they can then add case information.)

To me the better syntax would be -$variable, as $ is not a valid identifier character, so $variable cannot be confused by parsers with a term. In effect, it would be a variable term reference that simultaneously performs the actions of a variable and a message reference (hence -$).

I haven't dug deep into the internals of other implementations, but for the way I wrote the P6 implementation, it took me about 5 minutes to add support for -$ and a VariableTermReference class.

Here's the FTL file I used and the results:

-dog =
  { $style ->
     *[normal          ] dog
      [diminutive      ] puppy
      [diminutive-redup] puppy dog
  }
-cat =
  { $style ->
     *[normal          ] cat
      [diminutive      ] kitten
      [diminutive-redup] kitty cat
  }
cute       = Wow, that { -$animal(style: "diminutive") } is so cute!
stupidcute = OMG, that { -$animal(style: "diminutive-redup") } is like so amazeballs cute!
handsome   = That's a handsome { -$animal(style: "normal") }.

To call it, nothing changes from a regular variable:

say localized("handsome",   :animal<dog>);
say localized("handsome",   :animal<cat>);
say localized("cute",       :animal<dog>);
say localized("cute",       :animal<cat>);
say localized("stupidcute", :animal<dog>);
say localized("stupidcute", :animal<cat>);

And the results are fairly intuitive:

That's a handsome dog.
That's a handsome cat.
Wow, that puppy is so cute!
Wow, that kitten is so cute!
OMG, that puppy dog is like so amazeballs cute!
OMG, that kitty cat is like so amazeballs cute!

It's currently posted to the repository (but not listed as a release) if anyone wants to play around with it.
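
For readers more familiar with JavaScript, the behavior of -$animal(style: ...) can be approximated as below. This is not the P6 implementation; `terms`, `variableTermReference` and `localized` are hypothetical names, and returning the raw variable value when no matching term exists is an assumption.

```javascript
// Stand-in for the -dog and -cat terms; each selects on a `style` parameter.
const terms = {
  dog: ({ style }) =>
    ({ diminutive: "puppy", "diminutive-redup": "puppy dog" })[style] ?? "dog",
  cat: ({ style }) =>
    ({ diminutive: "kitten", "diminutive-redup": "kitty cat" })[style] ?? "cat",
};

// Emulates `{ -$animal(style: "...") }`: the variable's value names the term,
// and the call arguments are passed on to that term.
function variableTermReference(args, variableName, termArgs) {
  const termName = args[variableName];
  const term = terms[termName];
  if (term === undefined) {
    return String(termName); // assumption: fall back to the raw value
  }
  return term(termArgs);
}

const messages = {
  cute: (args) =>
    `Wow, that ${variableTermReference(args, "animal", { style: "diminutive" })} is so cute!`,
  handsome: (args) =>
    `That's a handsome ${variableTermReference(args, "animal", { style: "normal" })}.`,
};

function localized(id, args) {
  return messages[id](args);
}

console.log(localized("handsome", { animal: "dog" })); // That's a handsome dog.
console.log(localized("cute", { animal: "cat" }));     // Wow, that kitten is so cute!
```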

@zbraniecki zbraniecki commented Oct 24, 2019

@Pike Pike commented Jan 23, 2020

I've come full circle on this one.

I still like the API and its intent, but the implications on the engineering and localization processes are tough.

I can see dynamic references being an interesting option for an ecosystem where the developers land all localization as part of their feature development, including appropriate testing and tests.

For other environments, I just can't come up with good answers to the questions below.

Background: Dynamic references basically re-establish string concatenation, and with terms, add bidirectional dependencies. Those dependencies go between all messages that include dynamic references, and all referenced terms. These dependencies are also strongly language dependent.

Starting with testing, you need to ensure that a linguist/translator has reviewed the results. Which means you need a test plan for each language, covering any grammatical combination of phrases and terms. So, you need a linguist to develop the test plan, and a translator to review. For each target language.

For creating localizations, we know that l10n tools are really not mastering string concatenations. Or including test plans in the UI. Assuming one has the test plans from above, you'd need an l10n/fluent engineer to adjust the implementations for each language. Even for the limited complexity of Terms in Firefox, most of them are done with flod's help.

Another problem arises with partial translations. Falling back to Terms in a foreign language is probably not what you want. At least, it might make the problem harder rather than easier, as you not only need to deal with attributes and variants of your own language, but also with those of others.

And then I wonder how to do change management for this. Say you have 5 player characters, and 3 monsters. And 15 strings using term references into each. Add a monster. I see tears.

I'm starting to think that the energy that needs to go into maintaining this would often be greater than the energy of just creating 200 strings with a script. Conceptually doing Term references in your source language, but not in the actual translation process.
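
A script of the kind alluded to here could be as small as the following sketch. The data, the message ids and the sentence template are all made up for illustration; the idea is only that the source-language strings are expanded ahead of time, so translators see flat, complete messages.

```javascript
// Hypothetical source data; the ids and English names are made up.
const attackers = [
  { id: "knight", name: "Knight" },
  { id: "mage", name: "Mage" },
];
const defenders = [
  { id: "griffin", name: "Griffin" },
  { id: "bone-dragon", name: "Bone Dragon" },
  { id: "cyclops", name: "Cyclops" },
];

// Emit one flat FTL message per (attacker, defender) pair.
function generateMessages(attackerList, defenderList) {
  const lines = [];
  for (const attacker of attackerList) {
    for (const defender of defenderList) {
      lines.push(
        `attack-${attacker.id}-${defender.id} = The ${attacker.name} attacks the ${defender.name}.`
      );
    }
  }
  return lines;
}

const generated = generateMessages(attackers, defenders);
console.log(generated.length); // 6
console.log(generated[0]); // attack-knight-griffin = The Knight attacks the Griffin.
```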

Again, I can see ecosystems where these challenges are easily met. Like, if you're a game dev shop, and you need to have linguistic and cultural experts for all your target languages in the same office as your devs anyway.

For Mozilla, though, I don't see us being able to prove this feature. Which makes me think that we shouldn't be the ones that drive this.

@zbraniecki zbraniecki commented Jan 23, 2020

How does this differ from:

let monster = await document.l10n.formatValue(selectedMonsterId);
document.l10n.setAttributes(msgElement, messageId, { monster });

?

It seems to me that this is what people will do in absence of dynamic references and it has all the cons you listed and none of the pros. Additional cons of this approach are:

  • The value will not get retranslated on language change/update
  • Requires async imperative call
  • Prevents any additional inter-locale logic (monster's gender impacting selection of the message variant?)
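
The first point can be illustrated with a toy model of two locales. This is a sketch in plain JavaScript, not the DOM localization API: `bundles`, `formatWithString` and `formatWithReference` are hypothetical, and the sample terms and sentences are made up.

```javascript
// Toy per-locale data: a term table and one message that embeds a monster name.
const bundles = {
  en: {
    terms: { "monster-troll": "troll" },
    message: (monster) => `You see a ${monster}.`,
  },
  de: {
    terms: { "monster-troll": "Troll" },
    message: (monster) => `Du siehst einen ${monster}.`,
  },
};

// Workaround: the name was resolved earlier and is passed as a plain string.
function formatWithString(locale, monsterString) {
  return bundles[locale].message(monsterString);
}

// Dynamic reference: pass the id and resolve it in the locale being formatted.
function formatWithReference(locale, monsterId) {
  const bundle = bundles[locale];
  return bundle.message(bundle.terms[monsterId]);
}

// The string was resolved while the UI was in English...
const resolvedInEnglish = bundles.en.terms["monster-troll"];

// ...and then the user switches to German.
console.log(formatWithString("de", resolvedInEnglish));    // Du siehst einen troll.
console.log(formatWithReference("de", "monster-troll"));   // Du siehst einen Troll.
```

The pre-resolved string stays in the locale it was formatted in, while the reference is resolved again on every format.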

@spookylukey spookylukey commented Jan 24, 2020

I'm starting to think that the energy that needs to go into maintaining this would often be greater than the energy of just creating 200 strings with a script. Conceptually doing Term references in your source language, but not in the actual translation process.

For some use cases - like the ones I mentioned for Django, which will apply to other frameworks - this solution is simply not an option, because we don't know the strings ahead of time; they are supplied by other developers. We'd be left with the kind of 'solution' that @zbraniecki has, which leaves you with broken translations for many use cases (inability to deal with case/gender agreement etc.).

@Pike Pike commented Jan 24, 2020

So, Zibi's example is actually interesting in two ways:

Firstly, it emulates message references. And message references are easy, and also kinda pointless as they're completely atomic.

Secondly, it adds fallback for missing message references. In his code example, messages are resolved on the Localization abstraction instead of the Bundle. Which solves a lot of problems we have with message references right now. Even just static ones. I'd love to discuss how message references work as part of the resolver standardization. But I'm also realistic about not getting a fully sync and fully async resolver implemented for all impls that want both. None of JS, Python, or Rust has generic sync/async programming, right?

To Luke's comment: Terms are effectively language-dependent APIs. Messages referencing terms need to know the API, and all terms for that use need to implement the same API. With static term references, that's already nasty. With dynamic term references, it's an order of magnitude worse.

And when you talk about different software packages...

Say, the German team of the Django localizers decides to change the Term API for contrib.admin. Now, all generic apps with models need to update their l10n, and all custom templates that use model names need to update. And at best, you get release notes to communicate that. I guess.

Also, to clarify, I'm just saying that Mozilla isn't the right org to drive this. That doesn't mean that we shouldn't build the Fluent ecosystem such that someone else can give this a shot. Their task is going to be to figure out these things, beyond writing down APIs and syntax.

@zbraniecki zbraniecki commented Jan 24, 2020

Are you saying that for a scenario like #80 (comment) we should generate lang x region combinations into an FTL file for Mozilla needs?

@Pike Pike commented Jan 24, 2020

I would just go for computed values and retranslations.
