Multi-language (the auto-translate proposal) #617

Closed
ProfYaffle opened this issue Jun 8, 2015 · 21 comments

@ProfYaffle
Contributor

ProfYaffle commented Jun 8, 2015

<edit>

Note from project maintainers:

There are multiple aspects to supporting multiple languages in MkDocs, each of which is linked to from #211, which is the primary issue and ties all of the various aspects together. However, it is best to discuss each aspect in its related issue.

This issue is one of two proposals for adding support for pages in multiple languages. It proposes a system in which pages are auto-translated using po files or the like, whereas #774 proposes a system in which pages are manually translated.

It is likely that any native solution would come from a third party plugin or wrapper if/when one establishes itself as well developed and is a clear favorite among users. Whether that means the plugin would be moved in-house or it would become the "recommended" solution and remain a third-party plugin will depend upon the circumstances at the time that such a decision is made. In any event, the best way to move this forward is to volunteer your time to develop and/or test (and provide feedback to) any such third-party solutions.

</edit>

There was a long conversation today on IRC about this, so I thought I'd raise an issue to document some thoughts - and, hopefully, collect better ones :)

Problem Statement

  • How to support more than one language within mkdocs without over-complicating things
  • How to make these user-selectable
  • How to co-exist with multiple document versions (is it actually the same thing?)

Considerations

  • Default language/landing page

  • Language selector (vs version selector?)

  • Whether translations are stored alongside the default docs or pulled in from somewhere else, e.g.

    Site
        Language (default)
            Content
        Language (alt)
            Content

vs

Site (default)
    Language (default)
       Content
Site (alt)
    Language (alt)
       Content
  • How much of this is a mkdocs issue versus a publishing/hosting issue?
  • Ease of use - it can't get in the way of single-language simplicity
  • Theme support vs content

Constraints

  • mkdocs renders a single, static site
  • We don't necessarily want tags in the markdown, translation/.po files, or similar
  • We want to keep the source .md files clean

Initial Ramblings

From what I can see on a quick squint around, most multi-lingual sites are using dynamic content, or have multiple sites under different CNAMEs (even if they then link to a central host). So they're no use as inspiration...

The issue is thus one of jumping from one hierarchy of docs to another. Given how mkdocs renders (e.g. how ToCs are handled), you'd probably want to build each language site independently and then glue them together - otherwise, your ToC would contain headings from every language, and the prev/next links would cross between languages.
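
To make the build-then-glue idea a bit more concrete, here is a minimal sketch in Python. It assumes one config file per language (the `mkdocs.<lang>.yml` naming and the `site/<lang>` output layout are hypothetical, not something mkdocs prescribes) and relies only on the `--config-file` and `--site-dir` options of `mkdocs build`.

```python
import subprocess
from pathlib import PurePosixPath


def build_commands(languages, out_root="site"):
    """Return one `mkdocs build` command per language.

    Assumption: each language has its own config file named
    mkdocs.<lang>.yml (hypothetical naming) that points at that
    language's docs_dir.
    """
    commands = []
    for lang in languages:
        commands.append([
            "mkdocs", "build",
            "--config-file", f"mkdocs.{lang}.yml",
            "--site-dir", str(PurePosixPath(out_root) / lang),
        ])
    return commands


def run_builds(commands):
    """Run each build in turn; stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_builds(build_commands(["en", "it"]))` would leave two standalone sites under `site/en` and `site/it`, each with its own ToC and prev/next chain; the "glue" is then just whatever links between the two directories.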

So, does this actually turn into a debate about 'wrapping' independent sites in some way?

  • You could do it with a language selection page (pretty ugly, but useful if you think your users couldn't find the language selector otherwise), and then shift to the relevant mkdocs sub-site as necessary; each sub-site would need a path back up to the top if you wanted to change.
  • Or you could simply have a selection mechanism on your default page that allows you to switch into a different documentation path on the hosted site.
  • Cookies to store language preference? Direct URLs to each language?
  • Interplay with version? (my.project.docs/vv/en or my.project.docs/en/vv - probably the latter - which makes sense in building, as /en is a standalone mkdocs output with multiple versions and its own default).
  • You could do a lot with relative paths. Indeed, if each sub-site is standalone, they don't even have to have the same theme or content. Inefficient use of storage, though, because of repetition.
  • If mkdocs handled the ability to switch 'branches' cleanly, then it becomes a hosting issue: different language maintainers pushing different sub-sites in line with their design guidelines/headings/policies (is that in line with @d0ugal's initial approach to versions?).
  • You could stick to replaceable strings in the main themes, as these would need to be localised as well (home/prev/next et al). That suggests theme-sharing, which would cut down on the duplication of storage; alternatively, each sub-site keeps its own theme and there's just some way to force the used theme into language XYZ.
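
With the `/en/vv` layout, a language switcher is mostly a matter of rewriting the first path segment. A rough sketch, assuming URLs of the form `/<lang>/<version>/<page>`; the function name and the language set are made up for illustration, and on a static site this would run at build time to generate the links in each page:

```python
LANGUAGES = {"en", "it", "de"}  # hypothetical set of built sub-sites


def switch_language(path, target):
    """Return `path` with its leading language segment replaced by `target`.

    Paths without a recognised language prefix are treated as unprefixed,
    so the target language is simply prepended.
    """
    if target not in LANGUAGES:
        raise ValueError(f"unknown language: {target}")
    parts = [p for p in path.split("/") if p]
    if parts and parts[0] in LANGUAGES:
        parts = parts[1:]  # drop the current language segment
    trailing = "/" if path.endswith("/") else ""
    return "/" + "/".join([target, *parts]) + trailing
```

Because only the first segment changes, the version and page parts of the URL carry over, which is one argument for `/en/vv` over `/vv/en`.
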
@d0ugal
Member

d0ugal commented Jun 8, 2015

See also #211 for some old and brief discussion on this subject.

@ProfYaffle
Contributor Author

It would seem that readthedocs - or their home docs page, at least - decided to take a string-substitution approach in some capacity (versus the site's recommended translation method, which seems to be to bring in a completely separate repository as an alternative).

See here: https://github.com/rtfd/readthedocs.org/blob/master/docs/locale/it/LC_MESSAGES/index.po

It would appear that the strings approach is the preferred route, despite my misgivings:

As with most of the Django applications out there, Read the Docs’ i18n/l10n framework is based on GNU gettext. Crowd-sourced localization is optionally available at Transifex.

For more information about the general ideas, look at this document: http://www.gnu.org/software/gettext/manual/html_node/Concepts.html

Link to the RTD page here.

Not 100% sure of the relevance, as much of the python/Django vernacular is lost on me, but I mention it here in case someone else understands properly 😀

@d0ugal
Member

d0ugal commented Jun 10, 2015

I think they are talking about translating ReadTheDocs with Transifex rather than translating the documentation. So, in MkDocs terms that would be translating the theme (Next/Previous text, search tokenisation/stemming and so on) rather than translating the Markdown documentation.

These are two quite different but somewhat related problems.

@ProfYaffle
Contributor Author

Mmm, I understand the difference - but the example strings are related to the markdown itself (e.g. 'Benvenuto a Read The Docs' - see what they did there?).

I'm not convinced that this is a 100% complete model, though, as it does contradict the add-another-repository approach, and is far from a polished example.

@d0ugal
Member

d0ugal commented Jun 11, 2015

Hmm, true. I am not really sure then!

@waylan
Member

waylan commented Jun 11, 2015

I believe that Read The Docs is using Sphinx for their own documentation and it appears that they are using Sphinx's Internationalization feature, which "Sphinx uses ... to translate whole documents."

I haven't looked at the source code to confirm, but I suspect Sphinx taps right into Docutils by accessing the document object (before it is serialized to an HTML string), stepping through it, and swapping out each phrase with the translated phrase from the po file for the appropriate locale. Interestingly, it appears that this is done before inline markup is processed; presumably because the inline markup would need to be different to fit different grammars in different languages.

If MkDocs wanted to take a similar approach, I see two options:

  1. Create a Markdown extension which implements a Treeprocessor which does the same thing on the ElementTree object (before the inline treeprocessor is run). However, I see in the po files for Sphinx that a file and line number are assigned to each entry. That information is not available within the Markdown parser, so it may be a little trickier to get working.
  2. Simply parse the HTML returned by Markdown and use the po file against that. However, given that the inline markup will have been processed by then, it may be rather difficult.
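
As a rough illustration of option 1, the core of such a Treeprocessor would be a walk over the ElementTree that swaps each block's raw text for its translation. The sketch below uses only the standard library; the `CATALOG` dict stands in for a loaded po file, its entries are illustrative, and `translate_tree` is a hypothetical name. In a real extension this logic would presumably sit in the `run(root)` method of a `markdown.treeprocessors.Treeprocessor` subclass registered to run ahead of the inline processor.

```python
import xml.etree.ElementTree as etree

# Stand-in for a po catalog: source string -> translated string.
# Inline markup is still raw Markdown at this stage, so it is part
# of the string being looked up. Entries are illustrative only.
CATALOG = {
    "Welcome to *Read the Docs*": "Benvenuto a *Read the Docs*",
    "Getting started": "Per iniziare",
}

BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li"}


def translate_tree(root, catalog):
    """Replace the text of block-level elements in place.

    Strings missing from the catalog are left untranslated, which
    mirrors how gettext falls back to the msgid.
    """
    for element in root.iter():
        if element.tag in BLOCK_TAGS and element.text:
            source = element.text.strip()
            element.text = catalog.get(source, element.text)
    return root
```

Note that this keys entries on the source string alone; the file and line number references seen in the Sphinx po files are not reproduced here, for the reason given in option 1.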

@ProfYaffle
Contributor Author

FWIW, you may want to take a look at https://github.com/tvheadend/tvheadend-documentation - a multi-lingual mkdocs build system using gettext translation strings combined with a Python markdown parser to generate our web interface help files.

I don't pretend to understand it, as Python is a black art to me, but you guys can surely form an opinion!

@ProfYaffle
Contributor Author

Just need a version switcher for the different languages/versions now...

@waylan
Member

waylan commented Jun 11, 2015

Wow, that is quite the project. If I'm understanding everything correctly, they use the mistune Markdown parser, which returns a parse tree. They then walk that parse tree and build a pot file, which is used to build po files for each language. To build the docs, they use the po file and the original docs to generate translated docs which are then fed through MkDocs.
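
The pipeline described above can be approximated in a few lines. To be clear, this is not the tvheadend code: it splits on blank lines rather than walking a mistune parse tree, and all helper names are invented for illustration. It extracts one msgid per paragraph, emits a minimal pot template, and rebuilds a translated document from a msgid-to-msgstr mapping.

```python
import re


def _blocks(markdown_text):
    """Split on blank lines and collapse whitespace inside each block.

    Limitation: collapsing whitespace would mangle code blocks and
    nested lists; a real tool needs a proper parse tree here.
    """
    raw = re.split(r"\n\s*\n", markdown_text.strip())
    return [" ".join(block.split()) for block in raw if block.strip()]


def extract_msgids(markdown_text):
    """Return unique paragraphs in document order."""
    seen = []
    for msgid in _blocks(markdown_text):
        if msgid not in seen:
            seen.append(msgid)
    return seen


def po_escape(text):
    """Escape backslashes and double quotes for a po string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def make_pot(msgids):
    """Render a minimal pot template: every msgstr is left empty."""
    entries = [f'msgid "{po_escape(m)}"\nmsgstr ""' for m in msgids]
    return "\n\n".join(entries) + "\n"


def translate(markdown_text, translations):
    """Rebuild the document, falling back to the source paragraph
    when no (non-empty) translation exists."""
    out = [translations.get(m) or m for m in _blocks(markdown_text)]
    return "\n\n".join(out) + "\n"
```

The translated Markdown that comes out of `translate` is what would then be fed through MkDocs, one build per language.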

The potential problem I see is that mistune might not properly parse some syntax which is supported by some of Python-Markdown's extensions. Therefore you may get a few surprises. I think you need to stick to a single Markdown parser throughout.

@ProfYaffle
Contributor Author

Yup, you've got it. Not sure whether that's what anyone was envisaging, but it seems to work for us - one set of markdown reference docs in EN as a source, which ultimately generates in-app web help and (potential) multi-lingual help/mkdocs targets.

So far, it seems to be working - I've been checking the HTML output rather than having a fully-assembled translation to work through. We were using pandoc initially to do the markdown-to-html step, so we already had the inconsistency risk (I don't think mkdocs will produce ToC-less pages) - and at least we control the translation now to a greater extent, because we can tune everything.

Working for everyone? Probably not. But if there's some inspiration or an idea in there, clamber aboard - that's what FOSS is about, after all 😄

@lepture

lepture commented Jun 19, 2015

@ProfYaffle @waylan I am the author of the mistune Markdown parser. You can create extensions with renderers and lexers; here are some examples:

  1. TOC renderer: https://github.com/lepture/mistune-contrib/blob/master/mistune_contrib/toc.py
  2. Math and latex: https://github.com/lepture/mistune-contrib/blob/master/mistune_contrib/math.py

@heaviss

heaviss commented Oct 18, 2017

Hi all! (@ProfYaffle, @d0ugal )
Has anything changed here since 2015?
I'm only a junior, but I want to do something. How can I help to move this forward?

@Depado

Depado commented Nov 2, 2017

Hey there ☺
Same question as @heaviss ^^

@waylan
Member

waylan commented Nov 2, 2017

There has been no work on this. The discussion above is it. We discussed some options but no decisions have been made about which solution is better. If you would like to work on it, review the discussion, pick an approach, and get to work. I would suggest starting as a third-party plugin.

@Depado

Depado commented Nov 3, 2017

Thanks @waylan I'll have a look ☺

@cakrit

cakrit commented Feb 13, 2019

You can see a less than ideal approach that we followed using mkdocs and mkdocs-material to produce https://docs.netdata.cloud.
The static site generator is here and the translations are contributed in a separate GitHub project from the one containing the original markdown files.

Again, far from ideal, but it did let us move ahead.

waylan mentioned this issue Feb 22, 2019
waylan changed the title from "Multi-Language Support" to "Multi-language (the auto-translate proposal)" Feb 22, 2019
@waylan
Member

waylan commented Feb 22, 2019

It should be noted that the search documentation states:

While search does support using multiple languages together, it is best not to add additional languages unless you really need them. Each additional language adds significant bandwidth requirements and uses more browser resources. Generally it is best to keep each instance of MkDocs to a single language.

That being the case, I expect that any solution to this proposal will need to run a separate MkDocs build for each language. We already have a similar solution in jimporter/mike which runs a separate build for each version. Presumably this would follow a similar pattern and could be implemented as a third party tool.

Finally, #211 is the official issue covering multi-language support and this issue only covers one proposal for one aspect of the feature as outlined in this comment.

@weitzman

FYI, Material for mkdocs supports a language selector now https://squidfunk.github.io/mkdocs-material/changelog/#700-_-february-22-2021

@waylan
Member

waylan commented May 6, 2021

I am excited to announce that theme localization has just been added in #2299 and will be available in version 1.2.0 of MkDocs!

Of course, that does not address translation of page content, which this issue is addressing. But the solution there may affect how this issue proceeds.

@mondeja
Contributor

mondeja commented Jun 8, 2021

I've created a plugin to translate MkDocs projects using PO files. Check it out at https://github.com/mondeja/mkdocs-mdpo-plugin

@ultrabug
Member

i18n is now supported by mkdocs themes and there are plugins for content localization available that should be considered (mkdocs-static-i18n is most mature at the time of writing this comment - disclaimer: I'm the author of the mkdocs-static-i18n plugin).
