i18n RFC #111

Closed. Keats wants to merge 2 commits into `master` from `i18n-rfc`.

8 participants
@Keats
Collaborator

Keats commented Aug 21, 2017

No description provided.

@Keats Keats changed the title from "Start of i18n RFC" to "i18n RFC" Aug 30, 2017

and a new global Tera function will be added to get the i18n values out of the config.toml.

```jinja2
{{ i18n(key="title", lang=lang) }}
```

@batisteo

batisteo Oct 17, 2017

Contributor

Maybe trans to stick with Django?

{{ trans(key="title", lang=lang) }}

@Keats

Keats Oct 19, 2017

Author Collaborator

I'm fine with both

@enricostano

enricostano commented Oct 19, 2017

Really looking forward to this! Thanks 👍

### Url

The default language base url will be equal to `config.base_url`.
Other languages will be available at `{config.base_url}/{language}`
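To make the URL rule above concrete, here is a minimal sketch in JavaScript. The function name and the `defaultLanguage` parameter are hypothetical; the RFC excerpt only states that the default language lives at `config.base_url` and the others at `{config.base_url}/{language}`.

```javascript
// Hypothetical helper: `baseUrl` stands in for config.base_url and
// `defaultLanguage` for whichever setting ends up naming the default language.
function languageBaseUrl(baseUrl, language, defaultLanguage) {
  // Strip trailing slashes so the result never contains "//".
  const root = baseUrl.replace(/\/+$/, "");
  // The default language keeps the bare base URL; others get a suffix.
  return language === defaultLanguage ? root : `${root}/${language}`;
}

console.log(languageBaseUrl("https://example.com", "en", "en"));
console.log(languageBaseUrl("https://example.com/", "fr", "en"));
```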

@batisteo

batisteo Jan 16, 2018

Contributor

I think it’s important to have the language in the URL when i18n is used.

One challenge is what happens when a user accesses the root URL.
I would address this issue by:

  1. Copying the output folder to both the root and /en
  2. Making all the links in the page point to the path that includes the language
  3. Including a head tag <link rel="canonical" href="https://example.com/en/" >
  4. Optionally redirecting in JavaScript for the files in the root folder, like:
{% if config.language_redirect %}
<script>
  if (navigator.languages) {
      let paths = ['/en/', '/fr/', '/jp/']
      let path = match_language_path(navigator.languages, paths, '/en/');
      window.location.replace(path);
  }
</script>
{% endif %}
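The snippet above calls `match_language_path` without defining it. One possible shape for that helper, as a sketch only: it assumes every path has the form `/<language>/` and compares only the primary subtag of each browser language, which is a simplification rather than anything specified in this thread.

```javascript
// Hypothetical implementation of the helper used in the snippet above.
// `languages` is a list such as navigator.languages, `paths` the language
// roots that exist on the site, `fallback` the path to use when nothing matches.
function match_language_path(languages, paths, fallback) {
  for (const tag of languages) {
    // Keep only the primary subtag: "fr-CH" becomes "fr".
    const primary = tag.toLowerCase().split("-")[0];
    const candidate = `/${primary}/`;
    if (paths.includes(candidate)) {
      return candidate;
    }
  }
  return fallback;
}

console.log(match_language_path(["fr-CH", "en"], ["/en/", "/fr/", "/jp/"], "/en/"));
```

Browsers report Japanese as `ja`, so with this matching rule a `/jp/` path would never be selected automatically; naming the folder after the language code avoids that.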

@Keats

Keats Jan 17, 2018

Author Collaborator

I think that should be up to the users whether they want the lang code in the url for their default language

@batisteo

batisteo Jan 17, 2018

Contributor

I agree it should be a choice. Zola should make it easy either way. Do you mind adding a setting to opt in to this behavior? redirect_to_default_language?

@batisteo

batisteo Jan 23, 2018

Contributor

For reference, the index.html of rust-lang.org:
https://github.com/rust-lang/rust-www/blob/master/_layouts/redirect.html

It is an empty page with just the JS and HTML redirect. The setting could then be force_i18n.

The content files have to have the same name for multiple languages.
The language is defined in the extension prefix: `{name}.{language_code}.{extension}`

The language code is omitted for the default language.
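The naming rule above can be illustrated with a small parser. This is a sketch under assumptions: the function name is made up, and the list of configured languages is passed in so that a dotted name such as `v1.2.md` is not mistaken for a translation.

```javascript
// Hypothetical parser for `{name}.{language_code}.{extension}`.
// `languages` lists the non-default language codes configured for the site.
function parseContentFilename(filename, languages, defaultLanguage) {
  const parts = filename.split(".");
  if (parts.length < 2) {
    throw new Error(`no extension in "${filename}"`);
  }
  const extension = parts.pop();
  // Only treat the segment before the extension as a language code when it
  // is a configured language; otherwise the file is in the default language.
  let language = defaultLanguage;
  if (parts.length > 1 && languages.includes(parts[parts.length - 1])) {
    language = parts.pop();
  }
  return { name: parts.join("."), language, extension };
}

console.log(parseContentFilename("about.fr.md", ["fr", "it"], "en"));
console.log(parseContentFilename("about.md", ["fr", "it"], "en"));
```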

@batisteo

batisteo Jan 17, 2018

Contributor

The language code can be (?) omitted […]

@Keats Keats force-pushed the i18n-rfc branch from a86732e to 53b015c Feb 6, 2018

@Keats Keats force-pushed the i18n-rfc branch from 53b015c to f39052f Feb 6, 2018

@zbraniecki

zbraniecki commented Aug 6, 2018

Hi!

I work at Mozilla on Gecko Platform Intl/L10n technologies. I'm one of the core contributors to ECMA402 and one of the authors of Project Fluent.

One of the projects I work on is bringing the core modern intl/l10n API to Rust, since we'd like to soon start transitioning Gecko Intl/L10n to it, which makes me interested in Rust intl/l10n story.

I'd be happy to help you with your project and advise on designing around modern i18n/l10n systems, but I have to warn you that Rust is very immature in this area and there are not many hashed out libraries to choose from.

On the flip side, that means that we can focus on designing and implementing modern libraries and leapfrog the last 20 years of badly designed intl solutions :)

In that spirit, while I'm trying to stay unbiased and not necessarily recommend technologies my team is working on, I will in an opinionated fashion recommend three things:

  • Settle around Unicode CLDR and ECMA402. It's a great model for locale management, and good source of API inspirations that are well hashed out, thought through and will be maintained for many years by all major browser vendors.
  • Do not use gettext. Gettext is a very old localization system with bad pluralization, and close to no support for any advanced intl formatting or bidirectionality. Project Fluent has an article on the differences, which apply to any modern system vs gettext. I would recommend either going for ICU MessageFormat, which is more mature, but a bit older and limiting, or Project Fluent, which is very modern, but yet to reach 1.0 (hopefully by Unicode Conference in September!). Here's an article on the differences between those two.
  • Do not rely on OS hooks for formatting intl data. It sounds like an easy win but it's a trap that every big system I know of is painfully escaping from.

Everything else is up to you. It's going to be some work to bring full intl/l10n, but you can start with simple things like managing language negotiation between user request, accepted headers and articles in many locales. (for that I strongly recommend using locale lists when dealing with user requests, rather than a single locale).
Then you can hook a localization system into your template language. If you decide to go with Fluent, we'd be interested in cooperating on making Fluent API easy to hook into templating systems. If you go with MessageFormat, you should shape your API after one of the messageformat templating APIs.
Finally, you'll need intl formatters, which I hope will slowly start being designed and developed now that we have intl_pluralrules, but it'll take time to get them, so I wouldn't expect anything anytime soon.

Hope that helps!

@Keats

Collaborator Author

Keats commented Aug 6, 2018

Thanks!

> but I have to warn you that Rust is very immature in this area and there are not many hashed out libraries to choose from.

That's one of the reasons I haven't started working on i18n before, hoping that it would get more tools by now.

> Everything else is up to you. It's going to be some work to bring full intl/l10n, but you can start with simple things like managing language negotiation between user request, accepted headers and articles in many locales. (for that I strongly recommend using locale lists when dealing with user requests, rather than a single locale).

Gutenberg is a static site engine so it's kind of easier on that end: the language lives in the URL and that's it. No need to deal with headers, JS or anything else: all the interpolation/resolution happens at "compile time".

> Then you can hook a localization system into your template language. If you decide to go with Fluent, we'd be interested in cooperating on making Fluent API easy to hook into templating systems

I maintain tera and the oldest issue open is still Keats/tera#134 :) I'm not too sure what that would look like though, it seems more of a concern at the user (in this case Gutenberg, Rocket etc) level.

In Gutenberg's case, we only really need to translate the templates so that's a very limited scope.

@zbraniecki

zbraniecki commented Aug 6, 2018

> Gutenberg is a static site engine so it's kind of easier on that end: the language lives in the URL and that's it. No need to deal with headers, JS or anything else: all the interpolation/resolution happens at "compile time".

I'm no expert - my main experience in the area comes from writing several django sites, but I'm not sure if that's correct.

Imagine a scenario where the website author provides content in two locales - they provide an article in French and German (content localization). They also provide a translation of the website UI's using MessageFormat as it.json and fr.json.

A Swiss user visits the website, and their Accept-Language header looks like this: ["it-CH", "de-CH", "fr-CH", "rm-CH"].
Which locale is Gutenberg going to show the article in?

Unless I'm mistaken, the system will have to perform a language negotiation based on the selected locales. It'll have to parse the locale headers, collect the available locales for the content and for the UI, and negotiate the best possible fallback chain of locales for UI and content to show.
In this particular example, the content will probably be shown in ["fr"] as this is the locale handled by both pieces, and it closely matches the third most requested locale from the user.

But you can imagine other negotiation strategies that will provide the content in ["de"] and the UI in ["fr"], and of course everything becomes more complex if your article is in "fr-CA" (Canadian French).
In extreme edge cases, the user may request "sr-RU" and your content is in "sr-SR" and unfortunately those two extend to "sr-Cyrl-RU" and "sr-Latn-SR" which means that the user requests content in Cyrillic script, and you have content in Latin script, so that's not a match.

All of that is solved by a language negotiation API.

Am I wrong assuming gutenberg will need one?
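For readers unfamiliar with the idea, the Swiss example above can be sketched as a very small negotiation function. This is an illustration only, not the Fluent or ECMA402 API: the function names are made up, and matching on the primary language subtag alone ignores script and region, so it would not handle the Cyrillic versus Latin case just described.

```javascript
// Primary language subtag of a locale tag: "fr-CH" becomes "fr".
function primarySubtag(tag) {
  return tag.toLowerCase().split("-")[0];
}

// Hypothetical negotiation: returns the available locales that match any
// requested locale, ordered by the user's preference.
function negotiate(requested, available) {
  const result = [];
  for (const tag of requested) {
    for (const candidate of available) {
      const sameLanguage = primarySubtag(candidate) === primarySubtag(tag);
      if (sameLanguage && !result.includes(candidate)) {
        result.push(candidate);
      }
    }
  }
  return result;
}

const requested = ["it-CH", "de-CH", "fr-CH", "rm-CH"];
const contentLocales = ["fr", "de"];
const uiLocales = ["it", "fr"];

// Strategy 1: only consider locales covered by both content and UI.
const shared = contentLocales.filter((l) => uiLocales.includes(l));
console.log(negotiate(requested, shared));

// Strategy 2: negotiate content and UI independently.
console.log(negotiate(requested, contentLocales));
console.log(negotiate(requested, uiLocales));
```

With the first strategy the only candidate is `fr`; with the second, content and UI each get their own fallback chain and may end up in different languages.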

@Keats

Collaborator Author

Keats commented Aug 6, 2018

Gutenberg outputs HTML files, you don't need to have a server in between them and the users so content negotiation is not necessarily possible: I use netlify, some people host them on S3, others in git repos. Even less serverless than serverless :)

The way it is usually done in static sites is that the creator selects a default lang and the users can switch later if needed. After that, nothing prevents someone from writing some JavaScript that would redirect from homepage to another language after checking navigator.languages.
Not as good as checking headers and directly returning the right page of course, but I would assume that running a webserver for a static site is very rare.

@zbraniecki

zbraniecki commented Aug 6, 2018

Ahh, I see. Interesting. Thank you for taking time to explain it to me :)

@paulcmal

Contributor

paulcmal commented Aug 14, 2018

> A map is added to the Config struct to hold some i18n strings:

What do you think about separate files for each language? Having everyone edit the same file (config.toml) just brings merge conflicts to resolve.

After working on an internationalized site with Hugo, i now think that one file per language is the minimum requirement for maintainability. To be honest, i actually found that system limiting. Allowing multiple files per language would make it easier for people to maintain translations, by grouping strings by area of interests.

> How to only render a RSS feed for a given language if generate_rss is set to true. Do we even care about that feature?

If custom output formats (#365) are implemented as well, i think the issue of per-language RSS feed would be de facto addressed. Each language's sections/taxonomies could have their own RSS.

On a sidenote, i've taken a look at Fluent and it's amazing. I didn't dig too much, but from a translator's perspective it seems to address all issues i encountered with other internationalization systems. There's also a rust implementation.

@Keats

Collaborator Author

Keats commented Aug 14, 2018

> What do you think about separate files for each language?

Definitely needed. The current trans global function and languages map of the config.toml are a stopgap while a proper i18n is implemented.

> Allowing multiple files per language would make it easier for people to maintain translations, by grouping strings by area of interests.

Is it that big of a deal for a static site? The content translation is done by the user so there isn't as much to translate as for an app for example, only the templates need to be translated.

@paulcmal

Contributor

paulcmal commented Aug 14, 2018

> only the templates need to be translated

Yes, but when making a theme with different components (widgets of sorts), that can become a lot of different strings to handle in different languages. From my (limited) perspective, i only see two ways to properly decouple content from templating:

  • translation namespaces with support for different files per language (i.e. mytheme/i18n/agenda.{fr,en}.strings and mytheme/i18n/gallery.{fr,en}.strings)
  • internal page bundles that don't get rendered but allow to group content together while taking advantage of the architecture for translating content files (seems already possible with render = false, but only for sections)

Thinking of it, i don't even think one of the approaches contradicts the other. They even seem complementary to me.

Translation namespaces for small strings offer proper separation of concerns, easily porting components from a site/theme to another, and potentially even sharing translation packs like we share themes. I also find it easier if you write scripts to keep updated on missing translations to let translators know that this small file hasn't been translated (or lacks some strings) instead of pointing them to line XYZ in a single file. Even without a script you can see directly from your file manager when a file is missing in a language.

Internal page bundles, on the other hand, would make it easier to organize actual content translation (potentially big files with separate paragraphs). To demonstrate the power of internal page bundles, imagine a multi-column widget that's content agnostic (like asked for in #289). For example, to allow for easy translation of a footer with multiple columns (separated by !!!!! in the Markdown) directly from the content folder :

<footer>
{% set page = get_page(path="_common/footer.md") %}
<h2>{{ page.title }}</h2>
{% set columns = page.content | split(pat="!!!!!") %}
{% for column in columns %}
    <div class="footer-column">
        {{ column | markdown(inline=false) | safe }}
    </div>
{% endfor %}
</footer>

So in the end content translators only have to deal with Markdown files as they would for regular pages, and theme translators with a few translation strings in dedicated files that are easy to keep track of.

@Keats

Collaborator Author

Keats commented Aug 15, 2018

I guess I would go with one file per lang to start with simply because it's easier to have something working.
I'm not sure how Gutenberg could know that gallery.fr.strings, for example, is meant to only be used in a specific context.
My thought process was:

  • themes can define their i18n strings in a file
  • users can override translations in their own i18n files, which get merged with the theme one
  • theme calls a global function, let's say trans(key="header_under_logo", lang=lang) that will look for that key in the merged translations.

Adding namespaces to the mix seems to complicate the code and the UX for not much gain IMO.
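The three-step thought process above (theme strings, user overrides, lookup through `trans`) can be sketched as follows. The data layout, the function signatures and the choice to throw on a missing key are all assumptions made for illustration; nothing here is the actual Gutenberg implementation.

```javascript
// Merge per-language string tables. Later sources win, so passing the
// theme first and the user second lets users override theme strings.
function mergeTranslations(theme, user) {
  const merged = {};
  for (const source of [theme, user]) {
    for (const [lang, strings] of Object.entries(source)) {
      merged[lang] = { ...(merged[lang] || {}), ...strings };
    }
  }
  return merged;
}

// Hypothetical counterpart of trans(key=..., lang=...) in a template.
function trans(translations, key, lang) {
  const strings = translations[lang];
  const found = strings && Object.prototype.hasOwnProperty.call(strings, key);
  if (!found) {
    // Assumption: a missing key is an error rather than a silent fallback.
    throw new Error(`missing translation for "${key}" in "${lang}"`);
  }
  return strings[key];
}

const themeStrings = {
  en: { header_under_logo: "A theme", read_more: "Read more" },
  fr: { read_more: "Lire la suite" },
};
const userStrings = { en: { header_under_logo: "My blog" } };

const merged = mergeTranslations(themeStrings, userStrings);
console.log(trans(merged, "header_under_logo", "en"));
console.log(trans(merged, "read_more", "fr"));
```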

@zbraniecki

zbraniecki commented Aug 15, 2018

If you decide to use Fluent, we have the concept of MessageContext - you can read more about it here: https://github.com/projectfluent/fluent/wiki/Get-Started

It allows you to select a number of resources and put them together into a single context that resolves messages together.

@matclab

matclab commented Nov 18, 2018

When using i18n for a static blog, there are two scenarios for articles that are not translated: either display the article in the default language, or do not display the article. I think it's common not to have all articles translated, and as such it would be nice if both alternatives were offered by Zola.
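The two behaviours described above could be selected with a single flag. The sketch below is hypothetical: the page shape (`name`, `language`) and the `fallbackToDefault` option are invented for illustration and do not correspond to an existing Zola setting.

```javascript
// Returns the pages to list for `language`. When `fallbackToDefault` is
// true, pages without a translation are shown in the default language;
// when false, they are left out.
function pagesForLanguage(pages, language, defaultLanguage, fallbackToDefault) {
  const translated = pages.filter((p) => p.language === language);
  if (!fallbackToDefault || language === defaultLanguage) {
    return translated;
  }
  const translatedNames = new Set(translated.map((p) => p.name));
  const fallbacks = pages.filter(
    (p) => p.language === defaultLanguage && !translatedNames.has(p.name)
  );
  return translated.concat(fallbacks);
}

const pages = [
  { name: "hello", language: "en" },
  { name: "hello", language: "fr" },
  { name: "changelog", language: "en" },
];

console.log(pagesForLanguage(pages, "fr", "en", false));
console.log(pagesForLanguage(pages, "fr", "en", true));
```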

@Keats

Collaborator Author

Keats commented Nov 22, 2018

Yep that's definitely one of the required bits for i18n

@Keats

Collaborator Author

Keats commented Nov 22, 2018

I updated the RFC and moved it to the new Discourse for Zola: https://zola.discourse.group/t/rfc-i18n/13

Closing this in favour of that thread.

@Keats Keats closed this Nov 22, 2018

@Keats

Collaborator Author

Keats commented Dec 29, 2018

Everyone interested in that thread can have a look at #567 for the initial implementation (only content, no translations in templates yet). It is missing taxonomies (as I have no idea what to do with them yet), links to other translations of the same content and per-language RSS.
It is still pretty rough but it would be helpful to have users needing multilingual sites to try and report on what is missing/needed. The PR has a doc page in it explaining how to set it up. I don't want to add too many things to it until I know it is going in the right/best direction.

Feedback is probably better given on https://zola.discourse.group/t/rfc-i18n/13 to avoid notifications from commit on the PR and is a better discussion medium overall.

cc @batisteo @enricostano @matclab and anyone else wanting multilingual sites!

@Keats Keats deleted the i18n-rfc branch Dec 29, 2018
