i18n RFC #111

Keats · 2017-08-21T11:59:01Z

No description provided.

batisteo · 2017-10-17T20:59:03Z

rfcs/i18n.md

+and a new global Tera function will be added to get the i18n values out of the config.toml.
+
+```jinja2
+{{ i18n(key="title", lang=lang) }}


Maybe trans to stick with Django?

{{ trans(key="title", lang=lang) }}

Keats (reply) · 2017-10-19

I'm fine with both.

enricostano · 2017-10-19T23:27:32Z

Really looking forward to this! Thanks 👍

batisteo · 2018-01-16T23:51:40Z

rfcs/i18n.md

+### Url
+
+The default language base url will be equal to `config.base_url`.
+Other languages will be available at `{config.base_url}/{language}`


I think it’s important to have the language in the URL when i18n is used.

One challenge is what should happen when a user accesses the root URL.
I would address this by:

- Copying the output folder to both the root and /en
- Making all the links in the page point to the path that includes the language
- Including a head tag <link rel="canonical" href="https://example.com/en/" >
- Adding an optional JavaScript redirect for the files in the root folder, like:

{% if config.language_redirect %}
<script>
  if (navigator.languages) {
    let paths = ['/en/', '/fr/', '/jp/'];
    let path = match_language_path(navigator.languages, paths, '/en/');
    window.location.replace(path);
  }
</script>
{% endif %}
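The redirect snippet calls match_language_path without defining it. A minimal sketch of what such a helper might look like follows; the matching rule (compare only the primary language subtag, first preference wins) is an assumption, not something specified in this thread.

```javascript
// Hypothetical helper: the thread names match_language_path but never defines it.
// Returns the first path whose language segment matches one of the preferred
// languages, or the fallback when nothing matches.
function match_language_path(languages, paths, fallback) {
  // Index paths by their language segment: '/fr/' -> 'fr'.
  const byLang = new Map();
  for (const path of paths) {
    const lang = path.replace(/^\/+|\/+$/g, '').toLowerCase();
    byLang.set(lang, path);
  }
  for (const tag of languages) {
    // 'fr-CH' -> 'fr'; only the primary subtag is compared in this sketch.
    const primary = tag.toLowerCase().split('-')[0];
    if (byLang.has(primary)) {
      return byLang.get(primary);
    }
  }
  return fallback;
}
```

Note that browsers report Japanese as ja, so with this matching rule a '/jp/' path would never be selected; the path segment would need to be the language code.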

Keats (reply) · 2018-01-17

I think it should be up to the users whether they want the lang code in the URL for their default language.

batisteo (reply) · 2018-01-17

I agree it should be a choice. Zola should make it easy either way. Do you mind adding a setting to opt in to this behavior? redirect_to_default_language?

For reference, the index.html of rust-lang.org:
https://github.com/rust-lang/rust-www/blob/master/_layouts/redirect.html

batisteo (reply) · 2018-01-23

It is an empty page with just the JS and HTML redirect. The setting could then be force_i18n.
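To make the URL rule from the RFC excerpt and the opt-in discussed here concrete, a rough sketch follows. The function name and the forceI18n parameter are hypothetical; forceI18n only mirrors the force_i18n setting floated in this thread and is not an existing option.

```javascript
// Hypothetical sketch of the URL rule under discussion; not Zola's implementation.
function languageBaseUrl(baseUrl, lang, defaultLang, forceI18n = false) {
  const root = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  if (lang === defaultLang && !forceI18n) {
    // Default language lives at config.base_url.
    return root;
  }
  // Other languages (or every language when opted in) live at {base_url}/{language}.
  return `${root}/${lang}`;
}
```

With these assumptions, languageBaseUrl('https://example.com', 'fr', 'en') gives 'https://example.com/fr', while the default language keeps the bare base URL unless forceI18n is true.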

batisteo · 2018-01-17T13:31:07Z

rfcs/i18n.md

+The content files have to have the same name for multiple languages.
+The language is defined in the extension prefix: `{name}.{language_code}.{extension}`
+
+The language code is omitted for the default language.


The language code can be (?) omitted […]
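As an illustration of the `{name}.{language_code}.{extension}` convention with an optional language code, here is a small sketch. The function name and the knownLanguages and defaultLang parameters are assumptions standing in for whatever the site configuration would provide.

```javascript
// Hypothetical parser for the naming scheme in the RFC excerpt.
function parseContentFilename(filename, knownLanguages, defaultLang) {
  const parts = filename.split('.');
  if (parts.length < 2) {
    throw new Error(`no extension in "${filename}"`);
  }
  const extension = parts.pop();
  // The segment before the extension counts as a language code only if it is
  // a configured language; otherwise it is treated as part of the name.
  let lang = defaultLang;
  if (parts.length > 1 && knownLanguages.includes(parts[parts.length - 1])) {
    lang = parts.pop();
  }
  return { name: parts.join('.'), lang, extension };
}
```

Checking the candidate segment against the configured languages is one way to keep names that contain dots (for example v1.2.md) from being misread as carrying a language code.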

Keats · 2018-08-05T12:37:33Z

See https://blog.mozilla.org/l10n/2018/08/03/intl_pluralrules-a-rust-crate-for-handling-plural-forms-with-cldr-plural-rules/ for pluralization

zbraniecki · 2018-08-06T08:47:00Z

Hi!

I work at Mozilla on Gecko Platform Intl/L10n technologies. I'm one of the core contributors to ECMA402 and one of the authors of Project Fluent.

One of the projects I work on is bringing the core modern intl/l10n API to Rust, since we'd like to soon start transitioning Gecko Intl/L10n to it, which makes me interested in the Rust intl/l10n story.

I'd be happy to help you with your project and advise on designing around modern i18n/l10n systems, but I have to warn you that Rust is very immature in this area and there are not many hashed-out libraries to choose from.

On the flip side, that means that we can focus on designing and implementing modern libraries and leapfrog the last 20 years of badly designed intl solutions :)

In that spirit, while I'm trying to stay unbiased and not necessarily recommend technologies my team is working on, I will, in an opinionated fashion, recommend three things:

1. Settle around Unicode CLDR and ECMA402. It's a great model for locale management, and a good source of API inspiration that is well hashed out, thought through and will be maintained for many years by all major browser vendors.
2. Do not use gettext. Gettext is a very old localization system with bad pluralization, and close to no support for any advanced intl formatting or bidirectionality. Project Fluent has an article on the differences, which apply to any modern system vs gettext. I would recommend either going for ICU MessageFormat, which is more mature, but a bit older and limiting, or Project Fluent, which is very modern, but yet to reach 1.0 (hopefully by Unicode Conference in September!). Here's an article on the differences between those two.
3. Do not rely on OS hooks for formatting intl data. It sounds like an easy win, but it's a trap that every big system I know of is painfully escaping from.

Everything else is up to you. It's going to be some work to bring full intl/l10n, but you can start with simple things like managing language negotiation between the user request, Accept-Language headers and articles in many locales (for that I strongly recommend using locale lists when dealing with user requests, rather than a single locale).
Then you can hook a localization system into your template language. If you decide to go with Fluent, we'd be interested in cooperating on making the Fluent API easy to hook into templating systems. If you go with MessageFormat, you should shape your API after one of the messageformat templating APIs.
Finally, you'll need intl formatters, which I hope will slowly start being designed and developed now that we have intl_pluralrules, but it'll take time to get them, so I wouldn't expect anything anytime soon.

Hope that helps!

Keats · 2018-08-06T12:29:25Z

Thanks!

> but I have to warn you that Rust is very immature in this area and there are not many hashed-out libraries to choose from.

That's one of the reasons I haven't started working on i18n before, hoping that it would get more tools by now.

> Everything else is up to you. It's going to be some work to bring full intl/l10n, but you can start with simple things like managing language negotiation between user request, accepted headers and articles in many locales. (for that I strongly recommend using locale lists when dealing with user requests, rather than a single locale).

Gutenberg is a static site engine so it's kind of easier on that end: the language lives in the URL and that's it. No need to deal with headers, JS or anything else: all the interpolation/resolution happens at "compile time".

> Then you can hook a localization system into your template language. If you'll decide to go with Fluent, we'd be interested in cooperating on making Fluent API easy to hook into templating systems

I maintain tera and the oldest open issue is still Keats/tera#134 :) I'm not too sure what that would look like though; it seems more of a concern at the user level (in this case Gutenberg, Rocket, etc.).

In Gutenberg's case, we only really need to translate the templates, so that's a very limited scope.

zbraniecki · 2018-08-06T15:44:30Z

> Gutenberg is a static site engine so it's kind of easier on that end: the language lives in the URL and that's it. No need to deal with headers, JS or anything else: all the interpolation/resolution happens at "compile time".

I'm no expert - my main experience in the area comes from writing several django sites, but I'm not sure if that's correct.

Imagine a scenario where the website author provides content in two locales - they provide an article in French and German (content localization). They also provide a translation of the website UI's using MessageFormat as it.json and fr.json.

A Swiss user visits the website, and their Accept-Language header looks like this: ["it-CH", "de-CH", "fr-CH", "rm-CH"].
Which locale is Gutenberg going to show the article in?

Unless I'm mistaken, the system will have to perform a language negotiation based on the selected locales. It'll have to parse the locale headers, collect the available locales for the content and for the UI, and negotiate the best possible fallback chain of locales for UI and content to show.
In this particular example, the content will probably be shown in ["fr"], as this is the locale handled by both pieces and it closely matches the third most requested locale from the user.

But you can imagine other negotiation strategies that will provide the content in ["de"] and the UI in ["fr"], and of course everything becomes more complex if your article is in "fr-CA" (Canadian French).
In extreme edge cases, the user may request "sr-RU" and your content is in "sr-SR" and unfortunately those two extend to "sr-Cyrl-RU" and "sr-Latn-SR", which means that the user requests content in Cyrillic script and you have content in Latin script, so that's not a match.

All of that is solved by a language negotiation API.

Am I wrong in assuming Gutenberg will need one?
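The Swiss-user scenario in this comment can be sketched as a tiny negotiation routine. This is illustrative only: the function names are hypothetical, and the rule (first requested language available for both content and UI, comparing primary subtags) is one of several possible strategies. It ignores script and region, so it does not handle the sr-Cyrl versus sr-Latn case mentioned in the comment.

```javascript
// 'fr-CH' -> 'fr'. Script and region subtags are dropped in this sketch.
function primarySubtag(tag) {
  return tag.toLowerCase().split('-')[0];
}

// Hypothetical negotiation: returns the first requested language that is
// available for both the content and the UI, or null when there is none.
function negotiateCommonLocale(requested, contentLocales, uiLocales) {
  const content = new Set(contentLocales.map(primarySubtag));
  const ui = new Set(uiLocales.map(primarySubtag));
  for (const tag of requested) {
    const lang = primarySubtag(tag);
    if (content.has(lang) && ui.has(lang)) {
      return lang;
    }
  }
  return null;
}
```

For the request list ["it-CH", "de-CH", "fr-CH", "rm-CH"], content in fr and de, and UI translations in it and fr, this returns "fr", the outcome the comment describes.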

Keats · 2018-08-06T15:54:12Z

Gutenberg outputs HTML files, you don't need to have a server in between them and the users so content negotiation is not necessarily possible: I use netlify, some people host them on S3, others in git repos. Even less serverless than serverless :)

The way it is usually done in static sites is that the creator selects a default language and the users can switch later if needed. After that, nothing prevents someone from writing some JavaScript that would redirect from the homepage to another language after checking navigator.languages.
Not as good as checking headers and directly returning the right page of course, but I would assume that running a web server for a static site is very rare.

zbraniecki · 2018-08-06T22:59:02Z

Ahh, I see. Interesting. Thank you for taking time to explain it to me :)

paulcmal · 2018-08-14T13:58:28Z

> A map is added to the Config struct to hold some i18n strings:

What do you think about separate files for each language? Having everyone edit the same file (config.toml) just brings merge conflicts to resolve.

After working on an internationalized site with Hugo, I now think that one file per language is the minimum requirement for maintainability. To be honest, I actually found that system limiting. Allowing multiple files per language would make it easier for people to maintain translations, by grouping strings by area of interest.

> How to only render a RSS feed for a given language if generate_rss is set to true. Do we even care about that feature?

If custom output formats (#365) are implemented as well, I think the issue of per-language RSS feeds would be de facto addressed. Each language's sections/taxonomies could have their own RSS.

On a side note, I've taken a look at Fluent and it's amazing. I didn't dig too much, but from a translator's perspective it seems to address all the issues I encountered with other internationalization systems. There's also a Rust implementation.

Keats · 2018-08-14T14:18:29Z

> What do you think about separate files for each language?

Definitely needed. The current trans global function and the languages map in config.toml are a stopgap until proper i18n is implemented.

> Allowing multiple files per language would make it easier for people to maintain translations, by grouping strings by area of interests.

Is it that big of a deal for a static site? The content translation is done by the user, so there isn't as much to translate as for an app, for example; only the templates need to be translated.

paulcmal · 2018-08-14T16:51:16Z

> only the templates need to be translated

Yes, but when making a theme with different components (widgets of sorts), that can become a lot of different strings to handle in different languages. From my (limited) perspective, I only see two ways to properly decouple content from templating:

- translation namespaces with support for different files per language (i.e. mytheme/i18n/agenda.{fr,en}.strings and mytheme/i18n/gallery.{fr,en}.strings)
- internal page bundles that don't get rendered but allow grouping content together while taking advantage of the architecture for translating content files (seems already possible with render = false, but only for sections)

Thinking about it, I don't think either approach contradicts the other. They even seem complementary to me.

Translation namespaces for small strings offer proper separation of concerns, make it easy to port components from one site/theme to another, and potentially even allow sharing translation packs like we share themes. I also find it easier, if you write scripts to keep track of missing translations, to let translators know that a small file hasn't been translated (or lacks some strings) instead of pointing them to line XYZ in a single file. Even without a script you can see directly from your file manager when a file is missing in a language.

Internal page bundles, on the other hand, would make it easier to organize actual content translation (potentially big files with separate paragraphs). To demonstrate the power of internal page bundles, imagine a multi-column widget that's content agnostic (as asked for in #289). For example, to allow for easy translation of a footer with multiple columns (separated by !!!!! in the Markdown) directly from the content folder:

<footer>
{% set page = get_page(path="_common/footer.md") %}
<h2>{{ page.title }}</h2>
{% set columns = page.content | split(pat="!!!!!") %}
{% for column in columns %}
    <div class="footer-column">
        {# safe only takes effect as the last filter in Tera, so it goes after markdown #}
        {{ column | markdown(inline=false) | safe }}
    </div>
{% endfor %}
</footer>

So in the end content translators only have to deal with Markdown files as they would for regular pages, and theme translators with a few translation strings in dedicated files that are easy to keep track of.

Keats · 2018-08-15T13:28:17Z

I guess I would go with one file per lang to start with, simply because it's easier to have something working.
I'm not sure how Gutenberg could know that gallery.fr.strings, for example, is meant to only be used in a specific context.
My thought process was:

- themes can define their i18n strings in a file
- users can override translations in their own i18n files, which get merged with the theme one
- the theme calls a global function, let's say trans(key="header_under_logo", lang=lang), that will look for that key in the merged translations

Adding namespaces to the mix seems to complicate the code and the UX for not much gain IMO.
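The three steps of that thought process (theme strings, user overrides, lookup in the merged table) can be sketched as below. The data shape { lang: { key: string } }, the function names, and the choice to throw on a missing key are all assumptions made for illustration; the thread does not specify them.

```javascript
// Hypothetical data shape: { lang: { key: translatedString } }.
// User strings are merged over theme strings, so the user's value wins.
function mergeTranslations(themeStrings, userStrings) {
  const merged = {};
  for (const source of [themeStrings, userStrings]) {
    for (const [lang, strings] of Object.entries(source)) {
      // Later sources (the user's files) override earlier ones (the theme's).
      merged[lang] = { ...(merged[lang] || {}), ...strings };
    }
  }
  return merged;
}

// Looks a key up in the merged table. Throwing on a missing key is a design
// choice of this sketch; falling back to another language is another option.
function trans(merged, key, lang) {
  const strings = merged[lang];
  if (!strings || !Object.prototype.hasOwnProperty.call(strings, key)) {
    throw new Error(`missing translation for "${key}" in "${lang}"`);
  }
  return strings[key];
}
```

A theme that ships header_under_logo for en and fr keeps its fr string when the user only overrides the en one, because the merge happens per language and per key.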

zbraniecki · 2018-08-15T16:56:39Z

If you decide to use Fluent, we have the concept of MessageContext - you can read more about it here: https://github.com/projectfluent/fluent/wiki/Get-Started

It allows you to select a number of resources and put them together into a single context that resolves messages together.

zoosky · 2018-08-17T04:19:33Z

For Hugo, see https://regisphilibert.com/blog/2018/08/hugo-multilingual-part-1-managing-content-translation/

matclab · 2018-11-18T18:25:58Z

With an i18n static blog, there are two scenarios for articles that are not translated: either display the article in the default language, or do not display the article at all. I think it's common to not have all articles translated, so it would be nice if Zola offered both alternatives.
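The two behaviours described here could be expressed as a single switch. The sketch below is hypothetical: the pages shape (page name mapped to the languages it exists in), the function name, and the fallbackToDefault flag are assumptions, not an existing Zola option.

```javascript
// Hypothetical shape: `pages` maps a page name to the languages it exists in,
// e.g. { about: ['en', 'fr'], news: ['en'] }.
function pagesForLanguage(pages, lang, defaultLang, fallbackToDefault) {
  const result = [];
  for (const [name, langs] of Object.entries(pages)) {
    if (langs.includes(lang)) {
      result.push({ name, lang });
    } else if (fallbackToDefault && langs.includes(defaultLang)) {
      // Untranslated: show the default-language version instead.
      result.push({ name, lang: defaultLang });
    }
    // Otherwise the page is left out of this language's listing.
  }
  return result;
}
```

With fallbackToDefault set to true, an untranslated page appears in the listing in the default language; with false, it is simply omitted.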

Keats · 2018-11-22T11:47:16Z

Yep that's definitely one of the required bits for i18n

Keats · 2018-11-22T18:03:07Z

I updated the RFC and moved it to the new Discourse for Zola: https://zola.discourse.group/t/rfc-i18n/13

Closing this in favour of that thread.

Keats · 2018-12-29T10:52:48Z

Everyone interested in that thread can have a look at #567 for the initial implementation (only content, no translations in templates yet). It is missing taxonomies (as I have no idea what to do with them yet), links to other translations of the same content, and per-language RSS.
It is still pretty rough, but it would be helpful to have users who need multilingual sites try it and report on what is missing/needed. The PR has a doc page in it explaining how to set it up. I don't want to add too many things to it until I know it is going in the right/best direction.

Feedback is probably better given on https://zola.discourse.group/t/rfc-i18n/13 to avoid notifications from commits on the PR, and it is a better discussion medium overall.

cc @batisteo @enricostano @matclab and anyone else wanting multilingual sites!

This was referenced Aug 21, 2017

[RFC] Translations and Internationalization #110

Closed

Translations #13

Closed

Keats changed the title ~~Start of i18n RFC~~ i18n RFC Aug 30, 2017

batisteo reviewed Oct 17, 2017

lodenrogue approved these changes Oct 20, 2017

Keats mentioned this pull request Nov 27, 2017

Can't import macros in index.html of themes #185

Closed

batisteo reviewed Jan 16, 2018

batisteo reviewed Jan 17, 2018

Start of i18n RFC

e46e2d1

Keats force-pushed the i18n-rfc branch from a86732e to 53b015c Compare February 6, 2018 12:53

Update RFC

f39052f

Keats force-pushed the i18n-rfc branch from 53b015c to f39052f Compare February 6, 2018 12:55

Keats mentioned this pull request Mar 8, 2018

RFC Multi-language site support cobalt-org/cobalt.rs#392

Open

paulcmal mentioned this pull request Aug 14, 2018

Pages in unrendered sections still get rendered #373

Closed

Keats closed this Nov 22, 2018

Keats deleted the i18n-rfc branch December 29, 2018 10:52
