RFC: Docusaurus v2 i18n #3317

Closed
slorber opened this issue Aug 20, 2020 · 25 comments
Labels
proposal This issue is a proposal, usually non-trivial change

Comments

@slorber
Collaborator

slorber commented Aug 20, 2020

Docusaurus v2 i18n

Here is a brain dump of many things to consider for i18n support in v2.

I'll keep this issue updated over time, but feel free to comment if you have anything to say, particularly if you used v1 i18n support and can provide valuable feedback.

Supersedes this older issue (which still has interesting content): #2651


Existing translation systems

Links to get inspiration from.

Git fork based translations

Have an upstream repo (often in English), and one fork per language

A translation strategy first seen on Vue translation: each language creates a git fork.

We can build tooling on top of that, so that a translation change made in the upstream repo can trigger new PRs on forked repos, to automate the process and ensure translations stay in sync.

Pros:

  • works fine, as seen on the ReactJS and VueJS docs
  • stays in sync with the upstream repo
  • one repo per language permits per-language git permissions
  • developers don't like contributing through yet another SaaS tool that they have to learn
  • developers like their contributions to appear on their GitHub graph, etc.

Cons:

  • developer-centric workflow
  • does it work for small communities?
  • maintaining many forks can be overwhelming, can we find an owner for each fork?
  • need infrastructure to run the sync bots and open the PRs
  • no translation automation tools

ReactJS case

Links related to the work of Nat Alison.

Contains some interesting notes on why a SaaS like Crowdin was not a good fit, despite an attempt to use it.

https://reactjs.org/blog/2019/02/23/is-react-translated-yet.html
reactjs/react.dev#1605
https://github.com/reactjs/reactjs.org-translation
https://github.com/reactjs/reactjs.org-translation/blob/master/PROGRESS.template.md
facebook/react#8063
reactjs/react.dev#82
reactjs/react.dev#873

GatsbyJS case

Another translation RFC from Nat Alison, quite close to her work on ReactJS:
https://github.com/gatsbyjs/rfcs/blob/master/text/0010-gatsby-docs-localization.md

I don't think this work is in production.

Also some interesting bits on this thread where she explains her unfortunate situation working at Gatsby.

Git, single repo

You have a repo and you just have a folder per language.

Pros:

  • simple approach, easy to contribute
  • still git-based, which developers will like
  • single repo

Cons:

  • developer-centric
  • difficult to set per-language git permissions
  • how to stay in sync with the "upstream language"?
  • scalability: having many PRs can be overwhelming on large sites

Nuxt case

The nuxt doc is a simple repo with language folders.
It works fine, but the author told me it was hard to keep all languages in sync. Looks like a manual process.

TypeScript case

Quite similar: the TS website has one language folder per package (<packageName>/copy/<lang>), and the translations are handled in the same GitHub monorepo, but split by package.

microsoft/TypeScript-Website#100
microsoft/TypeScript-Website#181

Note: Orta found a way to solve the per-language permission problem, as he created a bot so that code owners can self merge through a github PR comment despite not having git permissions:

microsoft/TypeScript-Website#130 (comment)
https://github.com/orta/code-owner-self-merge

Additional notes

I think it's possible to handle the "sync with upstream" problem inside a mono repo by using git patch.

  • generate a patch on upstream docs: git diff origin/master HEAD~100 -- ./website/docs
  • apply that patch to each translated language (maybe open one PR per language?)

https://stackoverflow.com/questions/9939952/create-a-patch-including-specific-files-in-git

It's a way to emulate the upstream repo -> language forks pattern

SaaS

Using a SaaS like Crowdin / Transifex or others has benefits, like the ability to have advanced translation features (UI, editors supporting various formats (PO, Markdown, ICU key/values), translation memory, automatically pay for platform translators, track translation progress, sync with upstream language, version management...)

Pros:

  • Advanced translation features
  • Non-developers can use it
  • Docusaurus v1 already uses Crowdin

Cons:

  • Proprietary
  • Often paid services
  • Need custom scripts to interface with Docusaurus system
  • UI not always easy to use
  • Developers don't like it much
  • Developers contributions are not "visible", lack of incentives

Crowdin

Solution suggested by Docusaurus 1, free plan for open-source, used by Docusaurus site v1, Jest, Yarn, Electron...

We should rather try to make it easy to migrate from v1.

Not everybody likes this solution, however.

Some drawbacks mentioned here:
https://github.com/gatsbyjs/rfcs/blob/master/text/0010-gatsby-docs-localization.md#saas-platform-crowdin

Note: some questions I have asked Crowdin are here: https://gist.github.com/slorber/30643299196c7efa77084eec10c1c609

Other SaaS

???


Docusaurus 2 translation system

Unlike the use-cases presented above, we are a framework, not a site, and we don't serve a single community.

I think we want to be able to support both the developers and non-developers.

We can't expect all Docusaurus translators to be developers, nor git users, yet we know that developers don't necessarily always like the lock-in to a SaaS like Crowdin.

Translation management

I think the translation system should be file-system based, as it's probably the common abstraction between git-based workflows and saas-based workflows

Basically, if you build your site for the fr language, and if you have i18n/fr/docs/myDoc.md, then it should be used for the french page instead of the file at docs/myDoc.md.
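A minimal sketch of that lookup rule, assuming the i18n/<locale>/ layout described above. `resolveContentPath` and the `exists` callback are hypothetical names used for illustration, not Docusaurus APIs:

```typescript
// Hypothetical helper: pick the localized file when it exists,
// otherwise fall back to the untranslated source file.
type Exists = (path: string) => boolean;

function resolveContentPath(
  locale: string,
  defaultLocale: string,
  sourcePath: string, // e.g. "docs/myDoc.md"
  exists: Exists,
): { path: string; localized: boolean } {
  if (locale !== defaultLocale) {
    const candidate = `i18n/${locale}/${sourcePath}`;
    if (exists(candidate)) {
      return { path: candidate, localized: true };
    }
  }
  // Default locale, or no translation available yet.
  return { path: sourcePath, localized: false };
}

// Usage with an in-memory "file system" (file names are assumptions).
const files = new Set([
  "docs/myDoc.md",
  "docs/other.md",
  "i18n/fr/docs/myDoc.md",
]);
const exists: Exists = (p) => files.has(p);

console.log(resolveContentPath("fr", "en", "docs/myDoc.md", exists).path);
console.log(resolveContentPath("fr", "en", "docs/other.md", exists).path);
```

Passing `exists` as a parameter keeps the rule independent of where translations come from (git checkout, fork, or a SaaS download step).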

I think ./i18n is a good default path for the translated content, but the paths of such a system should be flexible enough that you can adopt the workflow of your choice:

  • git single repo: you can directly version control the content of ./i18n
  • git multiple forks: maybe you'll just fork the main website, but you could as well fork only the content and use git submodules or whatever...
  • saas: each saas will require integration scripts to upload/download the translations from/to the correct paths

So, the first step is to support the first case where you just put the translations in a folder of your site. I'm going to experiment with this on Docusaurus 2 website and try to see if I can provide a french translation.

It's unlikely we'll be able to provide integrations with all the existing translation SaaS, but a 2nd step would be to write integration scripts with Crowdin, so that v1 users can keep using it.

Translation runtime lib

It's likely we'll try to use FBT, a translation tool from Facebook.

I personally have good experience with react-intl as well, and prefer it over many React alternatives.

Translated URLs

Supposing en is the "main" language.

Does https://myDomain.com/en/myDoc exist?

What should be the behavior of the site if the URL does not contain a language, like https://myDomain.com/myDoc ? Is it the English language? Or do we add code to redirect to the most suitable language?
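If we did redirect to the most suitable language, the selection could look roughly like the sketch below. It is only an illustration: `pickLocale` and `primarySubtag` are hypothetical names, and the list of preferred tags is assumed to come from the browser (for example navigator.languages) or an Accept-Language header.

```typescript
// Hypothetical helper: primary language subtag, e.g. "fr-CA" -> "fr".
function primarySubtag(tag: string): string {
  const dash = tag.indexOf("-");
  return (dash === -1 ? tag : tag.slice(0, dash)).toLowerCase();
}

// Hypothetical helper: choose the best site locale for a visitor.
function pickLocale(
  preferred: readonly string[],
  available: readonly string[],
  defaultLocale: string,
): string {
  for (const tag of preferred) {
    // 1. Exact (case-insensitive) match, e.g. "pt-br" -> "pt-BR".
    const exact = available.find((l) => l.toLowerCase() === tag.toLowerCase());
    if (exact !== undefined) return exact;
    // 2. Same primary language, e.g. "fr-CA" -> "fr".
    const sameLanguage = available.find(
      (l) => primarySubtag(l) === primarySubtag(tag),
    );
    if (sameLanguage !== undefined) return sameLanguage;
  }
  // 3. Nothing matched: use the site's default locale.
  return defaultLocale;
}

console.log(pickLocale(["fr-CA", "en"], ["en", "fr"], "en"));
console.log(pickLocale(["de"], ["en", "fr"], "en"));
```

Step 2 is a simplification: it would also map zh-Hant to zh-Hans, which may not be what a visitor wants, so a real implementation would likely need script-aware matching.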

Is it ok for SEO to have a homepage that just redirects? Or is the homepage english? Then which page is the canonical one?

Note: v1 redirects docs, but not the homepage: https://docusaurus.io/ & https://docusaurus.io/docs/installation

Interesting comment (point 5): #2651 (comment)

Let's not forget to add the proper page meta tags such as:

<html lang="en">
<link rel="alternate" href="https://myDomain.com/fr/myDoc" hrefLang="fr-FR"/>

See also #2471

(I think if we have this header in pages, it's not needed to add it in sitemaps)

Translated URL schemes

There are multiple ways to handle the URLs of translated pages

https://fr.myDomain.com/myDoc

Using a custom subdomain seems not a very good fit, as it would require one separate deployment per lang (or you'd need to have some custom reverse proxy logic to handle that?).

I don't think this is the workflow we'll encourage, but we could still support this if people really want it. Maybe with an option like docusaurus build --fr, so that it builds a single language site.

Note: this can't be done on simple hosting solutions like Github Pages

https://myDomain.com/fr/myDoc

I think having a path language prefix is a simpler option, and can be easily done with a single deployment.

There's still a choice to be made here:

  • 1: Should we build all languages as a single SPA website?
  • 2: Should we build one SPA per language, and append the language prefix as baseUrl?

Both solutions have cons:

  • 1 means the PWA plugin will download all the content of all the languages for offline support (quite overkill)
  • 1 means the SPA routes file and other site globals will grow very large and can be a performance problem
  • 2 means we'll have to be careful about how the "main sitemap" will have to link to language specific sitemaps

For now I think 2 is a better solution
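A rough sketch of option 2, where each locale is built as its own SPA with the locale appended to the baseUrl. `localizedBaseUrl` is a hypothetical name, and serving the default locale at the root without a prefix is an assumption made here for illustration:

```typescript
// Assumption: the default locale is served at the site root and every
// other locale under a "/<locale>/" prefix.
function localizedBaseUrl(
  siteBaseUrl: string,
  locale: string,
  defaultLocale: string,
): string {
  // Normalize so the base always ends with a single slash.
  const base = siteBaseUrl.endsWith("/") ? siteBaseUrl : `${siteBaseUrl}/`;
  return locale === defaultLocale ? base : `${base}${locale}/`;
}

// One independent build per locale, each with its own baseUrl.
const locales = ["en", "fr", "zh-Hans"];
for (const locale of locales) {
  console.log(locale, "->", localizedBaseUrl("/", locale, "en"));
}
```

Each build then only knows its own routes, which keeps the routes file and offline cache per-language.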

Details and problems to consider

1 SPA per language, dev experience?

As we have seen above, it may be a good idea for performance to split the site into multiple smaller SPAs.

But this also means that we'll build the SPAs independently. What would the dev experience be when you run docusaurus start?

Do we code something completely different in dev so that the routes of all languages are accessible as a single SPA? Do we instead provide a docusaurus start --lang fr to only run the "french SPA"? I think it's an acceptable tradeoff and has some advantages, but it can also be annoying for some users.

Anchor links

Auto-generated ids are a problem for anchor links.
When a translator changes a heading in a translated markdown file, the id changes, and links from other files have to change as well. We should provide an easy way to make the anchors stable across translations.

reactjs/react.dev#1605 (comment)
reactjs/react.dev#1605 (comment)
ethereum/ethereum-org-website#272
https://github.com/reactjs/reactjs.org/pull/1636/files
mdx-js/mdx#810
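One possible way to keep anchors stable is an explicit id in the heading itself. The sketch below assumes a trailing {#custom-id} marker, which is a hypothetical syntax chosen for illustration; `slugify` and `headingId` are made-up helper names, not an existing API:

```typescript
// Simplified auto-id: lowercase, drop ASCII punctuation, spaces -> dashes.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .trim()
    .replace(/[!"#$%&'()*+,./:;<=>?@[\\\]^`{|}~]/g, "")
    .replace(/\s+/g, "-");
}

// Hypothetical syntax: a trailing "{#custom-id}" pins the anchor id,
// so translating the heading text does not change the link target.
function headingId(heading: string): { text: string; id: string } {
  const match = /^(.*?)\s*\{#([\w-]+)\}\s*$/.exec(heading);
  if (match) {
    return { text: match[1] ?? "", id: match[2] ?? "" };
  }
  // No explicit id: the id depends on the (translated) text.
  return { text: heading, id: slugify(heading) };
}

console.log(headingId("Quick start"));
console.log(headingId("Démarrage rapide"));
console.log(headingId("Démarrage rapide {#quick-start}"));
```

With the explicit marker, the French heading keeps the same id as the English one, so links such as myDoc#quick-start work in every locale.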

Right-to-Left support

Support RTL in themes?

Plugin integration

TODO

Doc edit button

If the user is browsing a French doc and presses "edit", the button should open the correct URL (git or Crowdin), so we should make this configurable.

Related:
#648

Default language

We should not assume english will be the default language, like in v1.

#3317

Scalability

The build time mostly depends on 3 factors:

  • number of docs
  • number of versions
  • number of languages

To decrease build time and make it sustainable, you can remove older versions from the SPA part, and make them available as a standalone, single version deployment.

We'll work on a cli feature to "archive" older versions more easily: #3286

Fallback

A missing page/translation should be allowed. In that case we'd fall back to the default language and could show a warning.

See 6: #2651 (comment)

Creating a language

We need a cli to init a language folder based on current language/versions

Creating a version

See proposal here: #2651 (comment)

We'll have to snapshot each localized folder too

Asset colocation

It's possible to colocate assets close to the docs, which makes it possible to use a different image per version. What's the story for i18n? This colocated image would likely end up being copied into the language folders too, so it might be duplicated along multiple axes (version/lang). Is that a good thing? At the same time, if an image contains text, that text could be translated differently, so it still makes sense...

Slugs

Should we allow custom slugs per language?

If we do that, to be able to switch from one language to another without losing context (the doc you are currently reading), one version would have to be aware of the slugs of all the other language versions, which might be quite a lot of data. How do we access such data in a performant way?

To me, being able to switch language while preserving context does not look so critical. If users want to browse docs in French, they can go through the French home and browse from there, and it's likely Google gives them the docs in the correct language in the first place.

We should try to find a solution though, but this can probably be done later, with some code that would, on language switch request, read some json file emitted by the other language, and then obtain a mapping from document id to slug of the other language.

Note: Yarn 1/classic (Jekyll based?) can switch language and preserve context when doing so, but the slugs are not localized: https://classic.yarnpkg.com/es-ES/docs/usage
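The "json file emitted by the other language" idea could be sketched as follows. The `SlugMap` shape (document id to slug) and the `switchLocaleUrl` helper are assumptions made for illustration; in practice the target map would be fetched lazily when the user requests a language switch:

```typescript
// Hypothetical shape of a per-locale JSON file: document id -> localized slug.
type SlugMap = Record<string, string>;

function switchLocaleUrl(
  currentSlug: string,
  currentLocaleSlugs: SlugMap,
  targetLocale: string,
  targetLocaleSlugs: SlugMap,
): string {
  // Reverse lookup: which document id owns the slug we are on?
  const docId = Object.keys(currentLocaleSlugs).find(
    (id) => currentLocaleSlugs[id] === currentSlug,
  );
  const targetSlug = docId !== undefined ? targetLocaleSlugs[docId] : undefined;
  // Fall back to the target locale's home when the doc has no counterpart.
  return targetSlug !== undefined
    ? `/${targetLocale}${targetSlug}`
    : `/${targetLocale}/`;
}

// Example data (assumed): /hello is localized to /bonjour in French.
const enSlugs: SlugMap = { intro: "/hello", install: "/installation" };
const frSlugs: SlugMap = { intro: "/bonjour" };

console.log(switchLocaleUrl("/hello", enSlugs, "fr", frSlugs));
console.log(switchLocaleUrl("/installation", enSlugs, "fr", frSlugs));
```

Only the target locale's map has to be loaded at switch time, so no build needs to embed the slugs of every other language.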

Translation mode

If you add the ?translate=true querystring, it could enhance the UI so that we add in-place translation features.
It could be possible to integrate with the translation API of a SaaS like crowdin.
This is mostly for key/value translations, as markdown docs will be translated as a whole and there's already the editUrl on the docs plugin.


TODO ...

Ongoing PR: #3325

@slorber
Collaborator Author

slorber commented Oct 14, 2020

Worth studying the NextJS i18n routing RFC: vercel/next.js#17078

@clairefro

Hi there, eavesdropping as I've also been grappling translation approaches for another project

Regarding translation management in a single repo scenario, are you aware of git localize? A system that incorporates or mimics this could make for happy devs

https://gitlocalize.com/

@slorber
Collaborator Author

slorber commented Nov 3, 2020

thanks @clairefro , didn't know about this one, will take a look :)

@slorber
Collaborator Author

slorber commented Dec 10, 2020

Here is some news about i18n support.

You'll find the i18n RFC here: #3317

The i18n core PR has already been merged but it is not officially released yet.
#3325

However, you can test it using the @canary npm dist tag (yarn add @docusaurus/core@canary etc.) and by following the instructions in that PR.

We are in the dogfooding phase to see if the i18n API and system works fine, and if we need some breaking changes.

We dogfood this on 2 sites:

When tests are ok for these 2 sites, we'll release i18n with proper documentation, hopefully before the end of the year.

The Jest v2 + i18n migration is in progress and can be tracked here: jest-website-migration/jest#2

@tomchen

tomchen commented Jan 5, 2021

Hello, good job, I'm currently using it. And here are my suggestions:

  1. I can add className to any other navbar items but {type: 'localeDropdown'}. Please support className so user can more easily write CSS for it.

  2. {type: 'localeDropdown'} should have an icon by default, just like v1 (screenshot: 2021-01-05_142752). You can imagine a visitor, who does not speak a language at all, lands on a page in that language, and can't find the lang dropdown menu because the title of the lang menu is also in that language...

btw, in case anyone wants it, I currently use something like this in Sass to insert the aforementioned lang icon:
// language bar
div.navbar__items.navbar__items--right > div:nth-child(1) > a {
  padding-left: 40px;

  &,
  &:hover {
    background: url("data:image/svg+xml,%3Csvg viewBox='0 0 24 24' xmlns='http://www.w3.org/2000/svg'%3E%3Cpath d='m0.35579 8.7757v-7.824h1.764v6.336h3.096v1.488z'/%3E%3Cpath d='m7.3323 4.8159-0.192 0.72h1.668l-0.18-0.72q-0.168-0.58802-0.324-1.248-0.156-0.65998-0.312-1.272h-0.048q-0.144 0.624-0.3 1.284-0.144 0.648-0.312 1.236zm-2.832 3.9602 2.448-7.824h2.124l2.448 7.824h-1.872l-0.48-1.86h-2.388l-0.48 1.86z'/%3E%3Cpath d='m11.364 8.7757v-7.824h1.812l2.04 3.888 0.768 1.728h0.048q-0.06-0.624-0.144-1.392-0.072-0.768-0.072-1.464v-2.76h1.68v7.824h-1.812l-2.04-3.9001-0.768-1.704h-0.048q0.06 0.648 0.132 1.392 0.084 0.744 0.084 1.44v2.772z'/%3E%3Cpath d='m21.139 8.9197q-0.80398 0-1.512-0.252-0.696-0.264-1.212-0.768t-0.816-1.248q-0.288-0.75598-0.288-1.74 0-0.97202 0.3-1.728 0.3-0.768 0.816-1.296 0.528-0.528 1.224-0.80398 0.696-0.27598 1.476-0.276 0.85198 0 1.464 0.312 0.61202 0.312 0.99598 0.70798l-0.92402 1.128q-0.3-0.264-0.63602-0.44402-0.33602-0.18002-0.84-0.18-0.456 0-0.84 0.18-0.372 0.168-0.648 0.49202t-0.432 0.792q-0.144 0.46798-0.144 1.056 0 1.212 0.54002 1.884 0.552 0.65998 1.656 0.65998 0.24 0 0.46798-0.06 0.22798-0.06 0.372-0.18v-1.344h-1.296v-1.44h2.856v3.6q-0.408 0.39602-1.08 0.672t-1.5 0.276z'/%3E%3Cpath d='m11.348 12.09h-2.82l1.095-0.615c-0.24-0.54002-0.705-1.335-1.11-1.95l-1.095 0.54002c0.40501 0.615 0.855 1.47 1.08 2.025h-2.715v1.11h5.565zm-0.52502 1.995h-4.4251v1.095h4.4251zm-4.4251 3.06h4.4251v-1.065h-4.4251zm3.315 2.07v2.145h-2.13v-2.145zm1.215-1.125h-4.5449v5.0699h1.2v-0.67498h3.345zm6.3602 1.02v2.43h-4.035v-2.43zm-5.325 4.2452h1.29v-0.6h4.035v0.54002h1.35v-5.4002h-6.675zm4.9051-9.5252v1.995h-2.46c0.12-0.585 0.255-1.275 0.39001-1.995zm1.29 1.995v-3.135h-3.135c0.06-0.44999 0.15-0.91499 0.225-1.335h3.9148v-1.17h-7.6199v1.17h2.325c-0.06 0.41998-0.135 0.88502-0.225 1.335h-1.74v1.14h1.545c-0.135 0.72-0.27 1.41-0.40501 1.995h-1.89v1.155h8.3399v-1.155z'/%3E%3C/svg%3E")
      no-repeat;
  }

  &:hover {
    color: var(--ifm-navbar-link-color);
    opacity: 0.6;
  }

  html[data-theme='dark'] & {
    background: url("data:image/svg+xml,%3Csvg viewBox='0 0 24 24' xmlns='http://www.w3.org/2000/svg'%3E%3Cg fill='%23fff'%3E%3Cpath d='m0.35579 8.7757v-7.824h1.764v6.336h3.096v1.488z'/%3E%3Cpath d='m7.3323 4.8159-0.192 0.72h1.668l-0.18-0.72q-0.168-0.58802-0.324-1.248-0.156-0.65998-0.312-1.272h-0.048q-0.144 0.624-0.3 1.284-0.144 0.648-0.312 1.236zm-2.832 3.9602 2.448-7.824h2.124l2.448 7.824h-1.872l-0.48-1.86h-2.388l-0.48 1.86z'/%3E%3Cpath d='m11.364 8.7757v-7.824h1.812l2.04 3.888 0.768 1.728h0.048q-0.06-0.624-0.144-1.392-0.072-0.768-0.072-1.464v-2.76h1.68v7.824h-1.812l-2.04-3.9001-0.768-1.704h-0.048q0.06 0.648 0.132 1.392 0.084 0.744 0.084 1.44v2.772z'/%3E%3Cpath d='m21.139 8.9197q-0.80398 0-1.512-0.252-0.696-0.264-1.212-0.768t-0.816-1.248q-0.288-0.75598-0.288-1.74 0-0.97202 0.3-1.728 0.3-0.768 0.816-1.296 0.528-0.528 1.224-0.80398 0.696-0.27598 1.476-0.276 0.85198 0 1.464 0.312 0.61202 0.312 0.99598 0.70798l-0.92402 1.128q-0.3-0.264-0.63602-0.44402-0.33602-0.18002-0.84-0.18-0.456 0-0.84 0.18-0.372 0.168-0.648 0.49202t-0.432 0.792q-0.144 0.46798-0.144 1.056 0 1.212 0.54002 1.884 0.552 0.65998 1.656 0.65998 0.24 0 0.46798-0.06 0.22798-0.06 0.372-0.18v-1.344h-1.296v-1.44h2.856v3.6q-0.408 0.39602-1.08 0.672t-1.5 0.276z'/%3E%3Cpath d='m11.348 12.09h-2.82l1.095-0.615c-0.24-0.54002-0.705-1.335-1.11-1.95l-1.095 0.54002c0.40501 0.615 0.855 1.47 1.08 2.025h-2.715v1.11h5.565zm-0.52502 1.995h-4.4251v1.095h4.4251zm-4.4251 3.06h4.4251v-1.065h-4.4251zm3.315 2.07v2.145h-2.13v-2.145zm1.215-1.125h-4.5449v5.0699h1.2v-0.67498h3.345zm6.3602 1.02v2.43h-4.035v-2.43zm-5.325 4.2452h1.29v-0.6h4.035v0.54002h1.35v-5.4002h-6.675zm4.9051-9.5252v1.995h-2.46c0.12-0.585 0.255-1.275 0.39001-1.995zm1.29 1.995v-3.135h-3.135c0.06-0.44999 0.15-0.91499 0.225-1.335h3.9148v-1.17h-7.6199v1.17h2.325c-0.06 0.41998-0.135 0.88502-0.225 1.335h-1.74v1.14h1.545c-0.135 0.72-0.27 1.41-0.40501 1.995h-1.89v1.155h8.3399v-1.155z'/%3E%3C/g%3E%3C/svg%3E")
      no-repeat;
  }
}
  3. Use this in <head> for better SEO:
<link rel="alternate" href="https://docusaurus.io/en/" hreflang="en" />
<link rel="alternate" href="https://docusaurus.io/fr/" hreflang="fr" />
  4. For the lang global attribute (<html lang="">), you currently trim or simplify zh-Hant / zh-Hans to zh, or pt-BR to pt, etc. You should not. They are valid lang tags per the IANA assignment (this one is the real official list according to the W3C).
    Changing zh-Hant / zh-Hans to zh actually causes font rendering problems (depending on your OS and your OS configurations, <html lang="zh-Hant"> pages could use fonts designed for zh-Hant, but <html lang="zh"> pages could default to zh-Hans fonts)

  5. You said in feat(v2): core v2 i18n support + Docusaurus site Crowdin integration #3325 the command is docusaurus write-translations --locales all, but:

First, it's actually --locale, without 's'. It's not only the text in PR #3325 that's wrong, I saw warnings in terminal like Available locales=, so you might need to check everything in the code, replacing locales by locale.

Second, when I run docusaurus write-translations --locale all, it shows:

Error: Can't write-translation for locale that is not in the locale configuration file.
Unknown locale=[all].
Available locales=[en,fr,zh]

Well, I don't quite understand. I have the locale list in my docusaurus.config.js.

docusaurus.config.js:
  i18n: {
    defaultLocale: 'en',
    locales: [
      'en',
...
    ],
    localeConfigs: {
      en: {
        label: 'English',
      },
...

I had to run docusaurus write-translations --locale=en and manually copy the files into the other lang folders. That works fine.


This one is just a question, not really an issue or a suggestion: how to check / get the language of the current page, or, how to modify (e.g. insert tags into the <head>) all pages from a specific language?

@slorber
Collaborator Author

slorber commented Jan 5, 2021

Thanks for the feedback @tomchen

  1. I can add className to any other navbar items but {type: 'localeDropdown'}. Please support className so user can more easily write CSS for it.

Agree

  2. {type: 'localeDropdown'} should have an icon by default, just like v1 (screenshot: 2021-01-05_142752). You can imagine a visitor, who does not speak a language at all, lands on a page in that language, and can't find the lang dropdown menu because the title of the lang menu is also in that language...

Agree

  3. Use this in <head> for better SEO:
<link rel="alternate" href="https://docusaurus.io/en/" hreflang="en" />
<link rel="alternate" href="https://docusaurus.io/fr/" hreflang="fr" />

About these meta tags, I didn't think they were 100% required for the initial i18n release (as v1 does not have them).

I'm still trying to figure some things out.
What I understand is that we can't simply put the "root" of the localized site here but need to link to the exact same page in the localized site.
And then we also want at the same time to localize the slugs (probably for SEO reasons too), but this complicates things as now /hello need to know about the french URL /bonjour, which makes things more complicated.

My idea was to create a /hello page (the original slug) on the localized sites, and redirects to /bonjour with JS, but not sure how happy google will be with that client-side redirect

  4. For the lang global attribute (<html lang="">), you currently trim or simplify zh-Hant / zh-Hans to zh, or pt-BR to pt, etc. You should not. They are valid lang tags per the IANA assignment (this one is the real official list according to the W3C). Changing zh-Hant / zh-Hans to zh actually causes font rendering problems (depending on your OS and your OS configurations, <html lang="zh-Hant"> pages could use fonts designed for zh-Hant, but <html lang="zh"> pages could default to zh-Hans fonts)

Thanks, didn't know, will see what I can do.

  5. You said in feat(v2): core v2 i18n support + Docusaurus site Crowdin integration #3325 the command is docusaurus write-translations --locales all, but:

First, it's actually --locale, without 's'. It's not only the text in PR #3325 that's wrong, I saw warnings in terminal like Available locales=, so you might need to check everything in the code, replacing locales by locale.

Second, when I run docusaurus write-translations --locale all, it shows:

Error: Can't write-translation for locale that is not in the locale configuration file.
Unknown locale=[all].
Available locales=[en,fr,zh]

Well, I don't quite understand. I have the locale list in my docusaurus.config.js.

I had to run docusaurus write-translations --locale=en and manually copy the files into the other lang folders. That works fine.

The PR might have been changed a bit and the PR doc is not up to date. The up to date doc is the cli --help flag.
Currently writing the i18n doc, will make sure the published doc is correct regarding this.

I removed the --locales all option as I thought it was a bit messy and overkill (it's possible to run the CLI command manually for each locale, one after the other).
Do you have a use case for it? Is your site repo public?

This one is just a question, not really an issue or a suggestion: how to check / get the language of the current page, or, how to modify (e.g. insert tags into the <head>) all pages from a specific language?

You can access the i18n config and current locale using the useDocusaurusContext() hook.

You can wrap your site by using a custom Root component:
https://v2.docusaurus.io/docs/next/using-themes/#wrapper-your-site-with-root

Or by using a custom Layout component:

https://v2.docusaurus.io/docs/next/using-themes/#for-site-owners

This gives you the opportunity to use a <Head> (react-helmet) to add additional metadata on a per-locale basis:

https://v2.docusaurus.io/docs/next/docusaurus-core#head

This is not the most convenient API to do that.
I'd like to make the current locale available in the configuration file, just not sure how to do this properly yet.
Also we currently build one SPA per locale instead of one SPA for all locales.
I'd like to avoid creating an API surface that would prevent us from building all localized sites as a single SPA in the future.


In general, the initial i18n release will not be exhaustive.

It is more the core system we will build on top of. I'd like to avoid rushing into a hasty API surface, and would rather gather some initial feedback and properly design solutions for the issues users have.

@thibaudcolas
Contributor

thibaudcolas commented Jan 5, 2021

It’s great to see the progress on this – just chiming in with a few things I think I can help with.

What I understand is that we can't simply put the "root" of the localized site here but need to link to the exact same page in the localized site.

Yes, each page should have bidirectional links to its equivalent in other languages there are translations for. In their official guidance, Google emphasizes it’s important for the links to be bidirectional at least between the "main" language of each page, and its translations (but links between different translations – not as much).

And then we also want at the same time to localize the slugs (probably for SEO reasons too), but this complicates things as now /hello need to know about the french URL /bonjour, which makes things more complicated.

For the validity of hreflang – there is no need to localize the slugs. From my experience I don’t see this done very often if at all, as it leads to the complexities you mention when querying pages by slug. So /fr/hello is perfectly fine. Looking around, I couldn’t find examples of localized slugs on the multilingual sites I’ve been involved with recently.

@slorber
Collaborator Author

slorber commented Jan 5, 2021

Thanks @thibaudcolas

Also something to consider is that a Docusaurus site may decide to exclude some parts (like the blog) in localized sites. This is also the reason nextjs does not include hreflang headers by default (vercel/next.js#17078)

In any case, it's possible for you to insert the hreflang by creating your own <Root> component, including the conditional logic of your choice.

So I guess we could simply assume that we could generate hreflang for all languages and all pages of the site, assuming slugs won't be translated, and site translations will be "complete".

But we will leave an option to disable this hreflang generation in case the user wants to translate slugs or create partially translated sites. In that case they'll have to figure out how to generate the appropriate headers themselves.
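Under those assumptions (untranslated slugs, fully translated sites), the generation could be sketched as below. `alternateLinks` and `renderLinkTags` are hypothetical names. Serving the default locale at the root, pointing x-default at it, and emitting nothing for a single-locale site are choices made for this illustration, not confirmed behavior:

```typescript
interface AlternateLink {
  hreflang: string;
  href: string;
}

// Assumptions: slugs are not translated, every page exists in every locale,
// the default locale lives at the root and other locales under "/<locale>/".
function alternateLinks(
  siteUrl: string,
  pagePath: string, // e.g. "/myDoc"
  locales: readonly string[],
  defaultLocale: string,
): AlternateLink[] {
  // With a single locale there is nothing to alternate with.
  if (locales.length < 2) return [];
  const origin = siteUrl.replace(/\/+$/, "");
  const links: AlternateLink[] = locales.map((locale) => ({
    hreflang: locale,
    href:
      locale === defaultLocale
        ? `${origin}${pagePath}`
        : `${origin}/${locale}${pagePath}`,
  }));
  // x-default points at the default-locale version of the same page.
  links.push({ hreflang: "x-default", href: `${origin}${pagePath}` });
  return links;
}

function renderLinkTags(links: readonly AlternateLink[]): string {
  return links
    .map((l) => `<link rel="alternate" href="${l.href}" hreflang="${l.hreflang}" />`)
    .join("\n");
}

console.log(
  renderLinkTags(alternateLinks("https://myDomain.com/", "/myDoc", ["en", "fr"], "en")),
);
```

Every page links to the same path in each locale, which is what makes the links bidirectional without any per-page lookup.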

@tomchen

tomchen commented Jan 5, 2021

@slorber No I don't have a usecase for --locale all. Not having --locale all is fine for me. I use docusaurus for some small personal py/js library documentation and hobby websites, my usecases are perhaps the simplest. I didn't even look at Crowdin and other i18n tools because I don't think I need them - I just generate i18n/<LANG>/ and edit these current.json files.

What makes my usecase a little special is that I have zh-Hans (Simplified Chinese) and zh-Hant (Traditional Chinese). I even wrote a simple script to convert everything in i18n/zh-Hans/ to i18n/zh-Hant/ before building. It's nice to write in one variant and automatically generate the other. But a small SEO concern is that Google might think zh-Hans and zh-Hant pages are duplicate content, so declaring <link rel="alternate" hreflang="" /> in <head>, the correct lang code in <html lang="zh-Hant">, and optionally inserting a <link rel="canonical" href="<zh-Hans_URL>" /> on zh-Hant pages might be important here.

(Ideally the zh-Hans->zh-Hant conversion, or even any page creation from user data, should be done programmatically when building, but it seems currently users can't do this very easily in Docusaurus like we do it in Gatsby. And for a small website, there's nothing wrong in not doing it programmatically when building, but auto-generating lots of .md files from the data)

Oh, there's another i18n issue I forgot to mention: dropdown menu items in the top nav can be exported into current.json files using docusaurus write-translations --locale <LANG>, but after you translate them in current.json files, you will find they are still in default language and not correctly localized on the website. Non-dropdown top nav items work fine.

@slorber
Collaborator Author

slorber commented Jan 15, 2021

Going to close this now but feel free to continue the discussions on i18n and providing feedback.

I'll handle that feedback very soon @tomchen.
Btw the translation of the theme is not yet 100% exhaustive. I mostly translated the low-hanging fruit first, but some small, less visible labels remain.

I've created an issue for any discussion related to Crowdin + Docusaurus here: #4052

@slorber
Collaborator Author

slorber commented Jan 27, 2021

FYI I've handled your feedback @tomchen, let me know if this looks good to you.

Also worth mentioning that some i18n doc is online here: https://v2.docusaurus.io/docs/next/i18n/introduction

@hyochan

hyochan commented Feb 7, 2021

Really thanks for this! Is there any goal to use FBT optionally?

@slorber
Collaborator Author

slorber commented Feb 8, 2021

Thanks @hyochan . We don't plan to have first-class support for FBT, but I clearly want to make the integration possible. That goal will be reached when we'd be able to integrate FBT demos directly in its documentation here: https://facebook.github.io/fbt/
We probably need new lifecycle hooks for that (that will also unlock some CSS-in-JS integrations)

@renatodex

renatodex commented Feb 16, 2021

Just heard that the Docusaurus team got some progress with the i18n feature!
That's awesome! I'm the owner of Fables & Goblins, an Open Source RPG System oriented in a Goblin world.
And we are using Docusaurus 2.0 to host the Online Book documentation:
https://www.fabulasegoblins.com.br

Currently, it's entirely written in Brazilian Portuguese, but I would like to try the latest non-stable version to see if I could start to add English as an option.
Do you guys think it's reasonably safe to try?

Or would you recommend waiting for a few weeks/months?

@slorber
Collaborator Author

slorber commented Feb 16, 2021

@renatodex we'd appreciate it if you tried it and gave us feedback. In my opinion, it's already working fine, so you should try it. Just use the @canary npm dist tag.

@renatodex

Thank you @slorber, I really appreciate all of this awesome work!

@mem212
Contributor

mem212 commented Nov 22, 2021

But if you have only one locale (en), Docusaurus will generate hreflang tags for both x-default and en with the same URL, and the canonical tag with that same URL as well. This can be considered a conflict in terms of SEO and may make search engines like Google ignore the docs.

@slorber
Collaborator Author

slorber commented Dec 8, 2021

@mxhdx I'm open to suggestions and I'm no i18n/SEO expert; however, it's important to me that you link to an authoritative source (like the Google SEO documentation) to back your claims.

I think it's reasonable to remove hreflang headers for sites using a single locale, but where have you read that not doing so leads to search engines ignoring the docs?
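
To make that idea concrete, here is a minimal sketch of what skipping hreflang tags for single-locale sites could look like. The helper name buildHreflangTags, its parameters, and the URL layout (default locale at the root, other locales under a /<locale> prefix) are assumptions for illustration, not the actual Docusaurus implementation:

```javascript
// Hypothetical helper, not Docusaurus code: builds the hreflang <link> tags
// for one page, and emits nothing when the site has a single locale.
function buildHreflangTags({baseUrl, path, locales, defaultLocale}) {
  // With only one locale there are no alternate versions to advertise.
  if (locales.length <= 1) {
    return [];
  }

  // Assumed URL layout: default locale at the root, others under /<locale>.
  const urlFor = (locale) =>
    baseUrl + (locale === defaultLocale ? '' : '/' + locale) + path;

  // One alternate link per locale...
  const tags = locales.map(
    (locale) =>
      `<link rel="alternate" href="${urlFor(locale)}" hreflang="${locale}" />`,
  );
  // ...plus the x-default fallback pointing at the default locale.
  tags.push(
    `<link rel="alternate" href="${urlFor(defaultLocale)}" hreflang="x-default" />`,
  );
  return tags;
}

const singleLocaleTags = buildHreflangTags({
  baseUrl: 'https://example.com',
  path: '/docs/intro',
  locales: ['en'],
  defaultLocale: 'en',
});

const multiLocaleTags = buildHreflangTags({
  baseUrl: 'https://example.com',
  path: '/docs/intro',
  locales: ['en', 'fr'],
  defaultLocale: 'en',
});
```

With this sketch, a site that only declares en gets no hreflang tags at all, while an en + fr site gets one tag per locale plus x-default.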

@thadguidry
Contributor

General targeting with x-default

If your page serves up content in a variety of languages or just asks a user to select a preferred page, you can use x-default to show that the page is not specifically targeted. That looks like this:

<link rel="alternate" href="http://example.com/" hreflang="x-default" />

Hreflang's effect on rankings

Hreflang attributes may not help you increase traffic; instead, the goal of using them is to serve the right content to the right users. They help search engines swap the correct version of the page into the SERP based on a user's location and language preferences.

The difference between hreflang and canonicalization

Canonicalization is a tool for showing search engines which version of a URL (each with the same content) is the dominant one to avoid duplicate content issues. Hreflang, on the other hand, is a tool to show which of the different (but often similar) pages (based on language or region) should show up in a search.

Google recommends not using rel="canonical" across country or language versions of your site. But you can use it within a country or language version.

https://moz.com/learn/seo/hreflang-tag
https://developers.google.com/search/docs/advanced/crawling/localized-versions

@slorber
Collaborator Author

slorber commented Dec 9, 2021

Thanks

@mxhdx @thadguidry I created a dedicated issue for these SEO problems: #6075

I read those pages when I implemented i18n, but they're not so easy for me to interpret.
Please help me figure out the bad behaviors of Docusaurus with more concrete examples.

I don't remember reading anywhere that using x-default + hreflang en on the same page can lead to Google not indexing a page 🤷‍♂️ But I do agree it may be unnecessarily pre-emptive for unlocalized sites.

@thadguidry
Contributor

thadguidry commented Dec 9, 2021

@slorber oh I was only giving that context and not making a judgement one way or another. If I were to make a judgement then I would say you are right that it is probably unnecessary but not harmful. That's what Google's docs are alluding to.

@jovezhong

Is it possible to load a different font for each locale?

For example, by default I'd like to use the Inter font, but when the user switches to the Chinese locale, I'd like to use a different font, say noto-sans-sc.

I cannot find a proper way to do so. I ran yarn add @fontsource/inter and set --ifm-font-family-base: 'Inter'; in custom.css. I also added import "@fontsource/inter"; in pages/index.js.

It would be great to have per-locale font settings.

@slorber
Collaborator Author

slorber commented Nov 9, 2022

@jovezhong it's not possible, but indeed it makes sense to have per-locale stylesheets in general (allowing you to load fonts and define per-locale CSS variables).

I think I'll provide an API so that you can customize your site config on a per locale basis: this solves many use-cases we have like translating site titles etc... The bad news is that such an API is hard to design correctly, so it will likely be an experimental API that will be used as a temporary workaround

@jovezhong

Thanks @slorber

Looking forward to such an experimental API. Yes, it'll be nice to customize the title/tagline per locale.

Sorry, I am not familiar with how Docusaurus works. Is there a JavaScript file that every page loads? Maybe I could import the custom font per locale in that JS and set the font via --ifm-font-family-base.

This is not a blocking issue for me.

@slorber
Collaborator Author

slorber commented Nov 10, 2022

The way I see it, the config would receive the locale currently being built, so you could register a per-locale stylesheet:

module.exports = function configCreator({currentLocale}) {
  return {
    // ...
    stylesheets: ["/locale-" + currentLocale + ".css"]
  };
};

This is a bit weird as an API unfortunately because we would have to run this function once initially with currentLocale: undefined just to be able to retrieve the list of locales your site supports.
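
To illustrate that two-pass flow, here is a rough sketch of how such a config creator could be consumed. The loadLocalizedConfigs function is purely hypothetical and the config creator API itself is only a proposal; only the i18n.locales and stylesheets config fields are existing Docusaurus options:

```javascript
// Proposed (not existing) config creator: receives the locale being built.
function configCreator({currentLocale}) {
  return {
    i18n: {defaultLocale: 'en', locales: ['en', 'fr']},
    // On the initial pass currentLocale is undefined, so register nothing.
    stylesheets: currentLocale ? ['/locale-' + currentLocale + '.css'] : [],
  };
}

// Hypothetical loader, for illustration only.
function loadLocalizedConfigs(creator) {
  // Pass 1: call without a locale just to discover the supported locales.
  const {locales} = creator({currentLocale: undefined}).i18n;

  // Pass 2: call once per locale to get the config used for that build.
  return Object.fromEntries(
    locales.map((locale) => [locale, creator({currentLocale: locale})]),
  );
}

const configs = loadLocalizedConfigs(configCreator);
console.log(configs.fr.stylesheets);
```

The awkward part is visible in pass 1: the creator has to tolerate an undefined currentLocale, which every site config would need to handle.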
