[Schema] L10n proposal #114

Closed
teoli2003 opened this issue Jan 16, 2017 · 14 comments
Labels
idle 🐌 Issues and pull requests with no recent activity infra 🏗️ Infrastructure issues (npm, GitHub Actions, releases) of this project schema ⚙️ Issues or pull requests regarding the JSON schema files used in this project.

Comments

@teoli2003
Member

teoli2003 commented Jan 16, 2017

Hi!

Our current proposal for the schema has notes: these are textual comments.

E.g.

...
  "__compat": {
    ...,
      "Internet Explorer": {
        "support": "4.0",
        "notes": ["In Internet Explorer 8 and 9, there is a bug where a computed <code>background-color</code> of <code>transparent</code> causes <code>click</code> events to not get fired on overlaid elements."]

There may be several notes (hence the []).

How do we want to translate them? We would like something simple, that is, something that doesn't force us to build anything outside GitHub.

One way could be to have an object. Instead of:
["text1", "text2"]
we would have:
[{"en-US": "text1"}, {"en-US":"text2"}]

This would let us store translated strings from the start and let macros use them easily. It would not make maintenance easy, though: if the en-US text changes, a translator has no easy way to notice (besides watching the file), and there is no way to tell whether a translation is up to date.

This is a basic proposal. Does anybody have a better idea?
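
For illustration, here is a minimal Python sketch of how a consumer might read notes stored in the proposed per-locale object form. The function name `localized_notes` and the rule of falling back to en-US are assumptions made for this example, not something the schema defines.

```python
from typing import Dict, List

Note = Dict[str, str]  # locale code -> note text


def localized_notes(notes: List[Note], locale: str, fallback: str = "en-US") -> List[str]:
    """Return each note in `locale`, using the `fallback` text when it is missing.

    Assumes every note carries the fallback (en-US) text.
    """
    return [note.get(locale, note[fallback]) for note in notes]


notes = [
    {"en-US": "text1", "de": "Text eins"},
    {"en-US": "text2"},  # not translated to German yet
]

print(localized_notes(notes, "de"))
```

A lookup like this is cheap for macros, but it cannot tell an outdated translation from a current one, which is the maintenance problem described above.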

@teoli2003 teoli2003 changed the title [Schema] L10n proposals [Schema] L10n proposal Jan 16, 2017
@SebastianZ
Contributor

[{"en-US": "text1"}, {"en-US":"text2"}] sounds reasonable, and we're already using this schema in other places like l10n/css.json, but you're right that it's hard for translators to know when they have to update their translation.

One idea to get rid of this issue is to add some kind of number versioning to the strings, e.g.

[
  {
    "en-US": {
      "version": 3,
      "text": "In Internet Explorer 8 and 9, there is a bug where a computed <code>background-color</code> of <code>transparent</code> causes <code>click</code> events to not get fired on overlaid elements."
    }
  }
]

Though this may be overkill. A simpler solution would be to note in the commit messages which language(s) were changed, e.g. "en-US: Clarified compatibility note of 'background-color' CSS property for IE". Then translators could filter the commit messages by "en-US" to see what has changed since the last edit in their language.
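
As a rough sketch of how the version-number idea could be checked automatically: the snippet above only shows the en-US entry, so the `source_version` field on translations (the en-US version a translation was made from) is a hypothetical addition for this example, as is the function name `stale_locales`.

```python
from typing import Any, Dict, List


def stale_locales(note: Dict[str, Dict[str, Any]], source: str = "en-US") -> List[str]:
    """List locales whose translation was made from an older source version."""
    current = note[source]["version"]
    return sorted(
        locale
        for locale, entry in note.items()
        # "source_version" is an assumed field, not part of the proposal.
        if locale != source and entry.get("source_version", 0) < current
    )


note = {
    "en-US": {"version": 3, "text": "text1 (third revision)"},
    "de": {"source_version": 3, "text": "Text eins"},
    "fr": {"source_version": 2, "text": "texte un"},
}

print(stale_locales(note))  # ['fr']
```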

Sebastian

@Elchi3
Member

Elchi3 commented Jan 24, 2017

I also think that [{"en-US": "text1", "de": "Text eins"}, {"en-US":"text2", "de": "Text zwei" }] makes sense.

To help localizers, I would build an external dashboard that does the checks (it might be like doc status pages or completely on its own). I think there are two things:
a) The language key is not present. So if you are a French localizer and an object is {"en-US": "text1", "de": "Text eins"} this will show up as untranslated for you.

b) There is an update to the English string. In this case I think the person who updates the English string should invalidate the localizations. So, for example {"en-US": "text42", "de": "#NEEDSUPDATE# Text eins"}. This would then show up in the German dashboard as "update needed". And when rendering the data, it could fall back to English, as the translation is invalid.
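
The two dashboard checks could look roughly like the following Python sketch. The status labels and the function names `translation_status` and `render` are assumptions for illustration; only the `#NEEDSUPDATE#` marker comes from the comment above.

```python
from typing import Dict

NEEDS_UPDATE = "#NEEDSUPDATE#"


def translation_status(note: Dict[str, str], locale: str) -> str:
    """Classify one note for one locale, as a dashboard might."""
    text = note.get(locale)
    if text is None:
        return "untranslated"      # case (a): language key not present
    if text.startswith(NEEDS_UPDATE):
        return "update needed"     # case (b): invalidated by an en-US change
    return "ok"


def render(note: Dict[str, str], locale: str, fallback: str = "en-US") -> str:
    """Use the translation only when it is valid, otherwise fall back to English."""
    if translation_status(note, locale) == "ok":
        return note[locale]
    return note[fallback]


note = {"en-US": "text42", "de": "#NEEDSUPDATE# Text eins"}
print(translation_status(note, "de"), "|", render(note, "de"))
print(translation_status(note, "fr"), "|", render(note, "fr"))
```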

@teoli2003
Member Author

teoli2003 commented Jan 24, 2017

I'm concerned about the verbosity of adding a version number for each string. Also, having flags inside a string seems to make consumption of these complex, as they need to be parsed.

What about:

[{"en-US":"text1",
  "de": "Text eins"},
 {"en-US": "text2",
  "de": {"up-to-date":false,"string":"Text two"}}]

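A consumer of this mixed form, where a translation is either a plain string or an object with an "up-to-date" flag, might resolve a note as in this Python sketch. The function name `resolve` and the choice to treat a missing flag as up to date are assumptions for illustration.

```python
from typing import Dict, Union

Translation = Union[str, Dict[str, object]]


def resolve(note: Dict[str, Translation], locale: str, fallback: str = "en-US") -> str:
    """Return the text for `locale`, or the fallback text if missing or outdated."""
    entry = note.get(locale)
    if isinstance(entry, str):
        return entry  # plain string: treated as up to date
    if isinstance(entry, dict) and entry.get("up-to-date", True):
        return str(entry["string"])
    return str(note[fallback])  # missing, or flagged "up-to-date": false


notes = [
    {"en-US": "text1", "de": "Text eins"},
    {"en-US": "text2", "de": {"up-to-date": False, "string": "Text two"}},
]

print([resolve(note, "de") for note in notes])
```

No string parsing is needed here; the cost is that consumers must handle two shapes for the same key.
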
@SebastianZ
Contributor

[{"en-US":"text1",
  "de": "Text eins"},
 {"en-US": "text2",
  "de": {"up-to-date":false,"string":"Text two"}}]

I like that approach and the idea of a dashboard.

Though I wonder whether you both missed my second solution about adding 'en-US' to the commit messages instead of putting the info into the files, because there was no feedback on it.

Sebastian

@Elchi3
Member

Elchi3 commented Jan 25, 2017

Though I wonder whether you both missed my second solution about adding 'en-US' to the commit messages instead of putting the info into the files, because there was no feedback to it.

I saw it, but it doesn't sound compelling to me. It would require reviewing or validating commit messages. Forgetting to add "up-to-date":false is easy as well, but I assume it's a bit better, because it is in the code and could be caught more easily when reviewing.

@jwhitlock
Contributor

If I had time to work on this, I would:

  1. Define all source strings as English in the spec, as well as which data elements are plain text, HTML, etc. Avoiding HTML is a good idea, but being clear about it is necessary if you can't avoid it.
  2. Create a script to extract strings into the standard gettext format
  3. Manage translation using gettext conventions, like Kuma, perhaps even using Pontoon to translate the strings. With gettext, you get fuzzy translations, notifications of changed strings, etc. etc. for free.
  4. Create a second script to export gettext-formatted files to a JSON data structure.
  5. Implement a gettext-like translation in KumaScript (I'm pretty sure this is already done, and multiple times).

I think using existing gettext standards will be less painful than building dashboards, versioning strings, etc.
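
Step 2 of the list above, extracting strings into gettext format, could start from something like this Python sketch. It assumes notes are plain strings stored under a "notes" key, as in the example at the top of this issue; the function names, the use of `#.` comments to record where a note came from, and the sample note text are all made up for illustration.

```python
from typing import Any, Dict, Iterator, List, Tuple


def po_quote(text: str) -> str:
    """Quote a string for a PO file using C-style escapes."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def iter_notes(node: Any, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str]]:
    """Yield (location, text) for every string found under a "notes" key."""
    if not isinstance(node, dict):
        return
    for key, value in node.items():
        if key == "notes" and isinstance(value, list):
            for text in value:
                yield "/".join(path), text
        else:
            yield from iter_notes(value, path + (key,))


def to_pot(data: Dict[str, Any]) -> str:
    """Build a .pot template with one entry per distinct note."""
    locations: Dict[str, List[str]] = {}
    for where, text in iter_notes(data):
        locations.setdefault(text, []).append(where)
    # Header entry: empty msgid, metadata in msgstr.
    lines = ['msgid ""', 'msgstr ""', '"Content-Type: text/plain; charset=UTF-8\\n"', ""]
    for text, wheres in locations.items():
        lines += [f"#. {where}" for where in wheres]
        lines += [f"msgid {po_quote(text)}", 'msgstr ""', ""]
    return "\n".join(lines)


data = {
    "__compat": {
        "Internet Explorer": {
            "support": "4.0",
            "notes": ['Fixed in "IE 10".'],  # made-up note text
        }
    }
}
pot = to_pot(data)
print(pot)
```

Identical notes collapse into a single entry, so a string used in several places would only need to be translated once.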

@SebastianZ
Contributor

  1. Define all source strings as English in the spec, as well as which data elements are plain text, HTML, etc. Avoiding HTML is a good idea, but being clear about it is necessary if you can't avoid it.

If HTML is avoided, it needs to be clarified if and how the entries should be formatted. Allowing formatted strings has some advantages as well as disadvantages.

I think using existing gettext standards will be less painful than building dashboards, versioning strings, etc.

I'm not familiar with gettext. I assume you mean the GNU related project gettext, right?

Sebastian

@jwhitlock
Contributor

I think using existing gettext standards will be less painful than building dashboards, versioning strings, etc.

I'm not familiar with gettext. I assume you mean the GNU related project gettext, right?

Yes. I think Wikipedia's gettext page is a better introduction than the GNU docs. Once you have text in the .po format, you can use existing tools to translate the strings, or the format is easy enough to translate by hand (see Kuma's German javascript.po).

We would have to write a tool to extract strings from JSON to the template .pot file, like javascript.pot. We could then script the gettext tools to update each locale's .po files and check them in. After translation, our second custom tool would extract translated strings from the .po files and put them back in JSON or KumaScript. For example, in Kuma, we use Django's tools to generate a JavaScript file, javascript.js, that implements the gettext functions with the translated strings, for use in client-side UI.
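
The second custom tool, reading translated strings back out of a .po file, might begin like the Python sketch below. It is a deliberately small parser written for this example: it assumes entries are separated by blank lines, ignores plural forms and contexts, and uses `ast.literal_eval` as an approximation of PO string unescaping. A real tool would more likely rely on an existing PO library.

```python
import ast
from typing import Dict


def parse_po(po_text: str) -> Dict[str, str]:
    """Map msgid -> msgstr, skipping empty and fuzzy (possibly outdated) entries."""
    catalog: Dict[str, str] = {}
    for block in po_text.strip().split("\n\n"):
        fields = {"msgid": "", "msgstr": ""}
        current = None
        fuzzy = False
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#,") and "fuzzy" in line:
                fuzzy = True
            elif not line or line.startswith("#"):
                continue
            elif line.startswith("msgid "):
                current = "msgid"
                fields[current] = ast.literal_eval(line[len("msgid "):])
            elif line.startswith("msgstr "):
                current = "msgstr"
                fields[current] = ast.literal_eval(line[len("msgstr "):])
            elif line.startswith('"') and current:
                # Continuation line of a multi-line string.
                fields[current] += ast.literal_eval(line)
        if fields["msgid"] and fields["msgstr"] and not fuzzy:
            catalog[fields["msgid"]] = fields["msgstr"]
    return catalog


de_po = '''
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

msgid "text1"
msgstr "Text eins"

#, fuzzy
msgid "text2"
msgstr "Text two"

msgid "text3"
msgstr ""
'''

catalog = parse_po(de_po)
# Fall back to the English source string when no valid translation exists.
print([catalog.get(source, source) for source in ["text1", "text2", "text3"]])
```

Skipping fuzzy entries gives the same fallback-to-English behaviour discussed earlier in the thread, without any custom flag in the JSON.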

@Elchi3
Member

Elchi3 commented Feb 3, 2017

Sounds promising to me. Found these relevant resources:
https://www.npmjs.com/package/jsxgettext
http://stackoverflow.com/questions/39586651/json-and-translation

I think one general aspect to decide is whether l10n is offered by the data provider (this repo) or whether data consumers (in our case Kuma/KumaScript) have to deal with l10n themselves. It seems like we are aiming for the former, and thus translations would somehow live in this repository.

@teoli2003
Member Author

I think that we should avoid multiple translations: translations should live in this repository.

Basically, if we go for @jwhitlock's idea, we will have the .json file containing only English, and a script to create a .po file from it.

In other words, from the .json file's point of view, it means that we don't translate in the file, but we consider all translatable strings to be English (we need to define the format of these strings, though).

@Elchi3 Elchi3 added schema ⚙️ Issues or pull requests regarding the JSON schema files used in this project. infra 🏗️ Infrastructure issues (npm, GitHub Actions, releases) of this project labels May 4, 2017
@Elchi3 Elchi3 added this to Infra improvements in Non-data issue overview Jan 10, 2019
@ddbeck ddbeck added this to Out of scope or needs personnel in Prioritization review Sep 4, 2020
@github-actions github-actions bot added the idle 🐌 Issues and pull requests with no recent activity label May 25, 2022
@queengooborg
Collaborator

This issue has been sitting around for a long time and is one of the oldest issues we have open. Unfortunately, localizing the notes in BCD has not been discussed or even mentioned for quite a while, and I don't think it will be a priority for us any time soon. As such, I'm going to close this issue, but I am happy to revisit it in the future!
