[Schema] L10n proposal #114

teoli2003 · 2017-01-16T14:19:32Z

Hi!

Our current proposal for the schema has notes: these are textual comments.

E.g.

...
  "__compat": {
    ...,
      "Internet Explorer": {
        "support": "4.0",
        "notes": ["In Internet Explorer 8 and 9, there is a bug where a computed <code>background-color</code> of <code>transparent</code> causes <code>click</code> events to not get fired on overlaid elements."]

There may be several notes (hence the []).

How do we want to translate them? We would like something simple, that is, something that doesn't force us to build anything outside GitHub.

One way could be to have an object. Instead of:
["text1", "text2"]
we would have:
[{"en-US": "text1"}, {"en-US":"text2"}]

This would let us store translated strings from the start and let macros use them easily. It would not make maintenance easy, though: if the en-US text changes, a translator has no easy way to find out (besides watching the file), and there is no way of knowing whether a translation is up to date.
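As a rough illustration of how a macro might consume the per-locale objects proposed above, here is a minimal Python sketch. The function name `localized_notes` and the choice to fall back to en-US are assumptions made for the example, not part of the proposal.

```python
from typing import Dict, List

# Assumed shape: each note is an object keyed by locale, as proposed above.
Notes = List[Dict[str, str]]


def localized_notes(notes: Notes, locale: str, fallback: str = "en-US") -> List[str]:
    """Return every note in `locale`, using `fallback` where no translation exists.

    Assumes every note has an entry for `fallback`.
    """
    return [note.get(locale, note[fallback]) for note in notes]


notes = [{"en-US": "text1", "de": "Text eins"}, {"en-US": "text2"}]
print(localized_notes(notes, "de"))  # ['Text eins', 'text2']
```

Note that nothing in this structure tells the consumer whether "Text eins" still matches the current English text, which is the maintenance problem described above.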

This is a basic proposal. Does anybody have a better idea?


SebastianZ · 2017-01-24T11:52:27Z

[{"en-US": "text1"}, {"en-US":"text2"}] sounds reasonable, and we're already using this schema in other places like l10n/css.json, but you're right that it's hard for translators to know when they have to update their translation.

One way to address this is to add some kind of numeric versioning to the strings, e.g.

[
  {
    "en-US": {
      "version": 3,
      "text": "In Internet Explorer 8 and 9, there is a bug where a computed <code>background-color</code> of <code>transparent</code> causes <code>click</code> events to not get fired on overlaid elements."
    }
  }
]

Though this may be overkill. A simpler solution would be to note in the commit message which language(s) were changed, e.g. "en-US: Clarified compatibility note of 'background-color' CSS property for IE". Translators could then filter the commit messages by "en-US" to see what has changed since the last edit in their language.
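To make the versioning idea concrete, here is a minimal Python sketch of how staleness could be detected. It assumes something the example above does not show: that each translated entry uses the same `version`/`text` shape and records the en-US version it was translated from. The helper name `is_stale` is hypothetical.

```python
def is_stale(note: dict, locale: str, source: str = "en-US") -> bool:
    """Report whether `locale` is missing or lags behind the source version.

    Assumption: a translation's "version" is the source version it was based on.
    """
    translation = note.get(locale)
    if translation is None:
        return True  # untranslated notes are treated as stale here
    return translation["version"] < note[source]["version"]


note = {
    "en-US": {"version": 3, "text": "text1 (revised)"},
    "de": {"version": 2, "text": "Text eins"},
}
print(is_stale(note, "de"))  # True
```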

Sebastian

Elchi3 · 2017-01-24T12:09:43Z

I also think that [{"en-US": "text1", "de": "Text eins"}, {"en-US":"text2", "de": "Text zwei" }] makes sense.

To help localizers, I would build an external dashboard that does the checks (it might be like the doc status pages or completely on its own). I think there are two cases to check:
a) The language key is not present. So if you are a French localizer and an object is {"en-US": "text1", "de": "Text eins"} this will show up as untranslated for you.

b) There is an update to the English string. In this case I think the person who updates the English string should invalidate the localizations. So, for example {"en-US": "text42", "de": "#NEEDSUPDATE# Text eins"}. This would then show up in the German dashboard as "update needed". And when rendering the data, it could fall back to English, as the translation is invalid.
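The two checks (a) and (b) could be sketched in Python roughly as follows. The function names and the status labels are assumptions for illustration; only the `#NEEDSUPDATE#` marker and the fallback to English come from the comment above.

```python
NEEDS_UPDATE = "#NEEDSUPDATE#"  # marker from the comment above


def translation_status(note: dict, locale: str) -> str:
    """Classify a note for one locale, as a dashboard might."""
    text = note.get(locale)
    if text is None:
        return "untranslated"   # case (a): language key not present
    if text.startswith(NEEDS_UPDATE):
        return "update needed"  # case (b): English changed since translation
    return "ok"


def render(note: dict, locale: str, source: str = "en-US") -> str:
    """Return the translation if valid, otherwise fall back to the source text."""
    if translation_status(note, locale) == "ok":
        return note[locale]
    return note[source]


note = {"en-US": "text42", "de": "#NEEDSUPDATE# Text eins"}
print(translation_status(note, "de"))  # update needed
print(translation_status(note, "fr"))  # untranslated
print(render(note, "de"))              # text42
```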

teoli2003 · 2017-01-24T14:41:17Z

I'm concerned about the verbosity of adding a version number for each string. Also, having flags inside a string seems to make consumption of these complex, as they need to be parsed.

What about:

[{"en-US":"text1",
  "de": "Text eins"},
 {"en-US": "text2",
  "de": {"up-to-date":false,"string":"Text two"}}]

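A consumer of this mixed shape (plain string, or object with an `up-to-date` flag) would need one type check instead of string parsing. A minimal Python sketch, assuming the en-US entry is always a plain string and that outdated translations fall back to English (one possible policy, not something decided in this thread):

```python
def resolve(note: dict, locale: str, source: str = "en-US") -> str:
    """Pick the text to display for `locale` from a mixed-shape note."""
    entry = note.get(locale)
    if isinstance(entry, str):
        return entry  # plain string: treated as current
    if isinstance(entry, dict) and entry.get("up-to-date", True):
        return entry["string"]
    return note[source]  # missing or flagged as outdated


notes = [
    {"en-US": "text1", "de": "Text eins"},
    {"en-US": "text2", "de": {"up-to-date": False, "string": "Text two"}},
]
print([resolve(n, "de") for n in notes])  # ['Text eins', 'text2']
```
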
SebastianZ · 2017-01-24T22:35:15Z

> [{"en-US":"text1",
>   "de": "Text eins"},
>  {"en-US": "text2",
>   "de": {"up-to-date":false,"string":"Text two"}}]

I like that approach and the idea of a dashboard.

Though I wonder whether you both missed my second solution about adding 'en-US' to the commit messages instead of putting the info into the files, because there was no feedback to it.

Sebastian

Elchi3 · 2017-01-25T14:49:17Z

> Though I wonder whether you both missed my second solution about adding 'en-US' to the commit messages instead of putting the info into the files, because there was no feedback to it.

I saw it, but it doesn't sound compelling to me. It would require reviewing or validating commit messages. Forgetting to add "up-to-date":false is easy as well, but I assume it's a bit better, because it is in the code and could be caught more easily when reviewing.

jwhitlock · 2017-01-27T15:46:52Z

If I had time to work on this, I would:

1. Define all source strings as English in the spec, as well as which data elements are plain text, HTML, etc. Avoiding HTML is a good idea, but being clear about it is necessary if you can't avoid it.
2. Create a script to extract strings into the standard gettext format.
3. Manage translation using gettext conventions, like Kuma does, perhaps even using Pontoon to translate the strings. With gettext, you get fuzzy translations, notifications of changed strings, etc. for free.
4. Create a second script to export gettext-formatted files to a JSON data structure.
5. Implement a gettext-like translation in KumaScript (I'm pretty sure this is already done, and multiple times).

I think using existing gettext standards will be less painful than building dashboards, versioning strings, etc.
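The extraction step could look something like the following Python sketch. It assumes that translatable strings live in arrays under a `notes` key, as in the example at the top of this issue; the helper names are hypothetical. It emits only bare `msgid`/`msgstr` pairs and omits the header entry and source-location comments that a real .pot template would carry.

```python
from typing import Iterator, List


def iter_notes(node) -> Iterator[str]:
    """Yield every string found in a "notes" array anywhere in the data."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "notes" and isinstance(value, list):
                yield from (v for v in value if isinstance(v, str))
            else:
                yield from iter_notes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_notes(item)


def po_escape(text: str) -> str:
    """Escape backslashes, double quotes, and newlines for a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_pot(data) -> str:
    """Build template entries, one per distinct source string."""
    seen: List[str] = []
    for text in iter_notes(data):
        if text not in seen:
            seen.append(text)
    return "\n".join(f'msgid "{po_escape(t)}"\nmsgstr ""\n' for t in seen)


data = {"__compat": {"support": {"ie": {"support": "4.0", "notes": ["text1", "text2"]}}}}
print(to_pot(data))
```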

SebastianZ · 2017-02-03T08:53:41Z

> Define all source strings as English in the spec, as well as which data elements are plain text, HTML, etc. Avoiding HTML is a good idea, but being clear about it is necessary if you can't avoid it.

If HTML is avoided, it needs to be clarified whether and how the entries should be formatted. Allowing formatted strings has advantages as well as disadvantages.

> I think using existing gettext standards will be less painful than building dashboards, versioning strings, etc.

I'm not familiar with gettext. I assume you mean the GNU-related project gettext, right?

Sebastian

jwhitlock · 2017-02-03T13:46:30Z

> I think using existing gettext standards will be less painful than building dashboards, versioning strings, etc.
>
> I'm not familiar with gettext. I assume you mean the GNU related project gettext, right?

Yes. I think Wikipedia's gettext page is a better introduction than the GNU docs. Once you have text in the .po format, you can use existing tools to translate the strings, or the format is easy enough to translate by hand (see Kuma's German javascript.po).

We would have to write a tool to extract strings from JSON to the template .pot file, like javascript.pot. We could then script the gettext tools to update each locale's .po files and check them in. After translation, our second custom tool would extract translated strings from the .po files and put them back in JSON or KumaScript. For example, in Kuma, we use Django's tools to generate JavaScript javascript.js that implements the gettext functions with the translated strings, for use in client-side UI.
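The second tool, reading translated strings back out of a .po file, might be sketched in Python as below. This is deliberately simplified and rests on stated assumptions: it handles only single-line `msgid`/`msgstr` pairs, ignores plural forms and multi-line strings, and uses `json.loads` as an approximation for unescaping PO string literals. A real implementation would more likely rely on an existing gettext library. The function names are hypothetical.

```python
import json


def parse_po(text: str) -> dict:
    """Map source strings to translations, skipping fuzzy and empty entries."""
    catalog = {}
    msgid = None
    fuzzy = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#,") and "fuzzy" in line:
            fuzzy = True  # gettext flags uncertain translations as fuzzy
        elif line.startswith("msgid "):
            msgid = json.loads(line[len("msgid "):])
        elif line.startswith("msgstr ") and msgid is not None:
            msgstr = json.loads(line[len("msgstr "):])
            if msgid and msgstr and not fuzzy:
                catalog[msgid] = msgstr
            msgid, fuzzy = None, False
    return catalog


def translate(text: str, catalog: dict) -> str:
    """Return the translation, or the English source if none is usable."""
    return catalog.get(text, text)


po = (
    '#, fuzzy\nmsgid "old"\nmsgstr "alt"\n\n'
    'msgid "text1"\nmsgstr "Text eins"\n\n'
    'msgid "text2"\nmsgstr ""\n'
)
catalog = parse_po(po)
print(translate("text1", catalog))  # Text eins
print(translate("text2", catalog))  # text2
```

Skipping fuzzy entries gives the "fall back to English when the translation is outdated" behaviour discussed earlier without any custom flags in the JSON.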

Elchi3 · 2017-02-03T14:20:25Z

Sounds promising to me. Found these relevant resources:
https://www.npmjs.com/package/jsxgettext
http://stackoverflow.com/questions/39586651/json-and-translation

I think one general aspect to decide is whether l10n is offered by the data provider (this repo) or whether data consumers (in our case Kuma/KumaScript) have to deal with l10n themselves. It seems like we are aiming for the former, and thus translations would somehow live in this repository.

teoli2003 · 2017-02-07T13:33:10Z

I think that we should avoid multiple translations: translations should live in this repository.

Basically, if we go for @jwhitlock's idea, we will have the .json file containing only English, and a script to create a .po from it.

In other words, from the .json file's point of view, it means that we don't translate in the file, but we consider all translatable strings to be English (we need to define the format of these strings, though).

queengooborg · 2023-08-06T07:41:21Z

This issue has been sitting around for a long time and is one of the oldest issues we have open. Unfortunately, localizing the notes in BCD has not been discussed or even mentioned for quite a while, and I don't think it will be a priority for us any time soon. As such, I'm going to close this issue, but I am happy to revisit it in the future!

Alex13313b · 2024-02-15T20:14:08Z

> I'm concerned about the verbosity of adding a version number for each string. Also, having flags inside a string seems to make consumption of these complex, as they need to be parsed.
>
> What about:
> [{"en-US":"text1",
>   "de": "Text eins"},
>  {"en-US": "text2",
>   "de": {"up-to-date":false,"string":"Text two"}}]

teoli2003 changed the title ~~[Schema] L10n proposals~~ [Schema] L10n proposal Jan 16, 2017

jwhitlock mentioned this issue Jan 27, 2017

[Schema] ID proposal #113

Closed

SebastianZ mentioned this issue Feb 20, 2017

Support linking to bugs/issues #126

Closed

Elchi3 added schema ⚙️ Isses or pull requests regarding the JSON schema files used in this project. infra 🏗️ Infrastructure issues (npm, GitHub Actions, releases) of this project labels May 4, 2017

Elchi3 added this to Infra improvements in Non-data issue overview Jan 10, 2019

queengooborg mentioned this issue Nov 25, 2019

Fix MDN URLs in api folder #5203

Merged

5 tasks

ddbeck added this to Out of scope or needs personnel in Prioritization review Sep 4, 2020

queengooborg mentioned this issue May 13, 2022

Replace old-style compatibility/specification tables with new macro mdn/translated-content#5618

Closed

github-actions bot added the idle 🐌 Issues and pull requests with no recent activity label May 25, 2022

queengooborg closed this as completed Aug 6, 2023
