Add multilingual support #5
Comments
azerupi added the Enhancement label Jul 29, 2015

multiple languages for document?

Yes, I think Gitbook does support something like that. Instead of having the markdown files directly in the source folder you would have some sub folders like this:

And there would be an easy way to change the language in the rendered book. It's definitely something I would like to add, but it's not the highest priority at the moment.
azerupi added Status: Wishlist M-Discussion and removed Type: Enhancement labels Jan 9, 2016

Multiple designs possible:
mkpankov commented Jan 13, 2016

I don't think one SUMMARY.md for everything is a good idea. I consider consistency within a translated version more important than consistency with the original. Otherwise, we can easily end up with broken links because upstream renamed some chapter and the translation hasn't yet. I believe a book that has no broken links is the minimum standard.

Also, I don't support the idea of "pushing" translations to be up-to-date. AFAIK, translations (not only ours) are done by enthusiasts and it's not always possible to keep up at all times.

Moreover, 1 to 1 mapping of pages doesn't look straightforward to me, even when there's a single SUMMARY. Words have different lengths in different languages, and in the Russian translation we consistently have sentences that are noticeably longer than the original.

But I'd love to have it so that one click can show the same point in the text in the original language. I think this can be handled by tracking a 1-to-1 mapping of paragraphs; sections, aka markdown files, are too big. Paragraphs also seem a good candidate because sentences sometimes get paraphrased and reordered, but the paragraphs stay in the same order and have the same gist.

Thanks for the input! I really appreciate the feedback :)

When I am talking about 1 to 1 mapping I am talking about page to page mapping, not sentence to sentence (that would be insane).

Let's take a hypothetical situation with the Rust book. Let's say I am reading a blog post and it references some chapter in the Rust book, for example the chapter about ownership. But English is not my main language and it would be a lot easier to understand the chapter in my native language. If we have 1 to 1 mapping on page / chapter level, the user could then select his language (if it is supported) from a dropdown menu and he would land on the exact same page in his chosen language.

However, for this to work correctly we need a guarantee that every page in one language has an equivalent page in the other language. If you allow a different

Of course, I totally agree with you. But the If there is one

To be honest, once a book has its definitive structure the

I think both designs have advantages and drawbacks; we need to figure out which one we want / need the most.

Idea for Rust book workflow when translations are in tree

When / if translations are moved into the official repository we could create a more elaborate pull request process. This is only an idea, it may be flawed.

When a pull request is made that contains changes that need translation (e.g. not typos), we could wait to merge the pull request until translations have been made for all officially supported languages. The pull request could track what translations have been made using a check list like this:

Once all the translations are ready the pull request is merged in. This would add a little / lot of overhead for the English version but it would solve the two big issues with translations.

There may be organizational problems I haven't considered though. @steveklabnik

The biggest problem with blocking English changes to non-English changes is that I am paid for my work, but others are not. This places a big burden on them; I'm gonna want to land changes ASAP, and that's not fair to people who can't do this as a day job.

That's true, didn't think of that. Anyways, do you have a preference for any of the two design choices (one vs. multiple

I think I prefer a single for the reasons you've stated, but since I'm not doing the translations themselves, I don't think my opinions matter much :) And yeah, tracking might be different/better than actually blocking on them landing.
mkpankov commented Jan 14, 2016

Ok, I think what I was trying to say but couldn't get across is this: page-to-page mapping isn't enough for printed versions, as the same pages will have different content. And if by page you meant a web page, that is not enough either. Some sections (pages) are tens of screens long, and to provide a smooth transition from one version to another we should track smaller units than entire files (web pages).

I originally thought you were talking about printed pages and wrote the following, but I'm not sure now. For printed versions, depending on the length of the section and the sentence-length difference with the original, this can vary from "I see not the beginning of the paragraph that talks about Foo feature, but the end" to "I don't see the paragraph that talks about Foo feature on screen at all", when linked to "page 83 of PDF".

So let's clarify the terms before continuing, as apparently I misunderstood something.

Ok yes, I will try to do my best to explain what I envision:

So in this issue I am not at all talking about tracking any changes for translations, only about how to support multiple languages in the same folder / book.

Before I continue, let's explain what the When you render the book (

That is the "only" information we get from the

If we want to support multiple languages for one book, there are two possible designs (that I thought of):

Let's see both in more detail.

One SUMMARY.md for all languages

Consider this

    # Summary
    - [hello world](hello-world.md)
    - [second chapter](second-chapter.md)

and this directory structure:

As you can see here, every language has the same markdown files defined in the global

Advantages

Having a guarantee that every chapter in one language has a corresponding chapter in another language gives us the possibility to change the language from any chapter and land on that same chapter in the other language.

Example: I am reading the "borrowing" chapter of the Rust book. I want to see that same chapter in French. I just select "French" from the dropdown button in the menu-bar and I will land on the French version of the chapter.

Drawbacks

When the

Problems that could occur:

Content is not modified by the

Another drawback is that I am not sure yet how translations will give a translation for the chapter titles in the sidebar (

One SUMMARY.md for EVERY language

Let's consider this directory structure:

As you can see here, every language has its own

There is absolutely no more guarantee that the French version contains the same chapters as the English version. No 1 to 1 mapping. Essentially every language is its own separate book; they could have exactly the same structure or they could have totally different chapters. There is no way for the program to know that. It is thus impossible to change the language from a chapter. You would have to navigate to the French version manually and search for the chapter you were reading, if it exists in the French version at all!

Advantages

Translations have a lot more freedom, but this can also be seen as a drawback. Translations do not need to have the same structure, so when the

Drawbacks

There is no guarantee that a chapter in one language has an equivalent in another language (no 1 to 1 mapping). The program cannot know which chapters are equivalent in the different languages, and it would thus be impossible to change the language from a chapter to land on the same chapter in the other language.

I hope this made it more clear. If there is still something you don't understand I can elaborate more on some specific area.

EDIT: A little quote from a response I made on Rust's internals forum:

You can already group the multiple translations in one directory as different books each with its own
defuz commented Jan 14, 2016

Regarding the Rust Book translation process: it is not a disadvantage of some solution, but simply a fact. I think that other projects that use mdBook with multiple languages will have the same problem.

Can we make it simple and assume that files with the same name in different languages are the same chapter? Then we can give the opportunity to switch to another language. I think this approach will satisfy both cases:
defuz commented Jan 14, 2016

Also, I don't like the idea that when I read the book in Russian, I'll see the TOC in English. I think we should not assume that the reader is familiar enough with the language of the original to understand the chapter titles.

How would you handle that? On some pages you can change the language and on others not? That would be really confusing for users I think.

Of course that was not the plan, I just hadn't found a good solution for it yet so I didn't discuss it too much
defuz commented Jan 14, 2016

Why not? We can clearly indicate that the translation for this chapter is not available yet. Another possible situation is that translation for some languages is available, but for other languages it's not.
defuz commented Jan 14, 2016

Another example that I care about. Let's compare the structure of the section "Getting started" in the nightly and stable books. As you can see, Steve joined 4 chapters into one. Imagine that not all the language versions supported this change yet. If we have a common TOC, this means that there is no possibility to open the "Installing Rust", "Hello World" and "Hello Cargo" chapters in a non-English version of the book, because they do not exist in the original TOC anymore.

Yes I totally agree with you! This would be a big problem. However I am not sure I want to settle for the solution Gitbook proposes either. Maybe we can come up with something better that combines all the advantages and none of the drawbacks? (even if it's a little more complex)

Gitbook uses the "one

I think you could already achieve something very similar with mdBook with multiple books and configuring the source and output directories according to what you want. The only difference is that Gitbook makes it just a little bit easier to set up.
defuz commented Jan 14, 2016

My suggestion is to have "one SUMMARY.md per language", but support page-to-page cross-linking between the different languages. The easiest way to do this is to consider files with the same name to be the same chapter. In 99% of cases this should work. A more complex way to do this is to add some kind of identifier to each file (something like a UUID). If the identifiers of the files are identical, we can cross-link them.
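
To make the "same file name means same chapter" rule concrete, here is a minimal sketch in Rust. It is not mdBook code: the `Books` alias, the `cross_links` and `book` helpers, and the `lang/chapter` link layout are all hypothetical names chosen for illustration, and it assumes each language's chapter list has already been read from that language's own SUMMARY.md.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical model: for each language code, the set of chapter files
/// (relative paths) listed by that language's own SUMMARY.md.
type Books = BTreeMap<String, BTreeSet<String>>;

/// For one chapter, list every other language that has a file with the
/// same relative path, together with an assumed `lang/chapter` link target.
fn cross_links(books: &Books, current_lang: &str, chapter: &str) -> Vec<(String, String)> {
    books
        .iter()
        // Skip the language being read and languages lacking this file.
        .filter(|(lang, files)| lang.as_str() != current_lang && files.contains(chapter))
        .map(|(lang, _)| (lang.clone(), format!("{}/{}", lang, chapter)))
        .collect()
}

/// Small helper to build the hypothetical model in examples.
fn book(files: &[&str]) -> BTreeSet<String> {
    files.iter().map(|f| f.to_string()).collect()
}
```

With this rule a chapter that exists only in one language simply gets an empty list, so the language switcher could be hidden or greyed out on that page while everything else keeps working.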

Hmm yes that might be a good compromise. At least if the translations don't diverge too much from the original. I will try to think about this a little more and see if I can come up with other ideas. Thanks for the valuable input! :)
azerupi added this to the 0.1.0 milestone Jan 21, 2016
This was referenced Dec 25, 2016

FWIW, there are tools to handle translations which I didn't see mentioned here yet. For example, crowdin is used (or was when I was involved) over at FreeCAD for translating their wiki documentation. It was noteworthy that when an update was made to an English file, the plugin would notify you that the other translations need to be updated for that specific section or they would be out of date. The page linked above actually lists how complete each language translation is and maintains that information.

It is possible a tool like crowdin could just be added to the build process as a plugin which has been notified of which files require translating. Then it would maintain the database itself somewhere and you could tell mdbook where the translated files are located.

A solution like this seems worth the time exploring before spending effort creating a new ground-up approach to solve the same problem.

EDIT: Also note they offer free support to open source projects
azerupi added A-Internal-representation S-Wishlist T-Enhancement and removed Status: Wishlist labels May 16, 2017
azerupi removed this from the 0.1.0 milestone May 18, 2017
tyoc213 commented Jun 24, 2017

For your information, what about a single file for the source? Like

Well, just saying :) (I mean, for example, for making a book/tutorial with code examples it would be better to have only one source code but the explanation in different languages.) And sure, switching between languages could be possible, and if there is no paragraph in the selected language, show the default language of the document.
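
The per-paragraph fallback described here could look roughly like the following Rust sketch. Everything in it is an assumption for illustration (the `Paragraph` struct, `render`, and the `paragraph` helper are hypothetical and not part of mdBook); it only shows the "use the default language when a paragraph has no translation" rule.

```rust
use std::collections::HashMap;

/// Hypothetical model of one paragraph: its text keyed by language code.
struct Paragraph {
    texts: HashMap<String, String>,
}

/// Render a document in `lang`, falling back to `default_lang` for any
/// paragraph that has no text in the requested language. Paragraphs that
/// exist in neither language are skipped.
fn render(paragraphs: &[Paragraph], lang: &str, default_lang: &str) -> Vec<String> {
    paragraphs
        .iter()
        .filter_map(|p| p.texts.get(lang).or_else(|| p.texts.get(default_lang)))
        .cloned()
        .collect()
}

/// Helper for building a paragraph from (language, text) pairs.
fn paragraph(pairs: &[(&str, &str)]) -> Paragraph {
    Paragraph {
        texts: pairs
            .iter()
            .map(|(l, t)| (l.to_string(), t.to_string()))
            .collect(),
    }
}
```

Shared code examples would then be paragraphs that carry only the default-language entry, so every language renders the same code block.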
sebras commented Aug 6, 2017

How about a So the rule would be:

Consider e.g. the case you mentioned above where the original English book combined several chapters into one (or conversely split one into many). In this case the English translation would need to update In the example above before the English original text combined its chapters,

Do you think an approach like this is feasible and desirable? I'm eager to do a translation of the Rust book, so I'd like for mdbook to resolve this bug and support translations, hence I'm trying to help you make progress. :)

Thank you for your input!

Unfortunately, I don't think this will work well in practice because there is a lot of overhead for the author of the original text. Every time the texts diverge, the burden is on the author to copy over the old summary to the translations before making a change. If he forgets, things will break; this seems very error-prone.

I am more in favour of having one summary per language and cross-linking files with the same name. This approach is, in my opinion, simpler to understand and doesn't require any extra work when the original text and the translations diverge.

I hope to make progress on this issue in the "near" future; we are slowly reworking parts of the internals to make it possible.
sebras commented Aug 8, 2017

If there is one SUMMARY.md per language, what forces the files containing chapters to be named the same way in every language? I do agree about this design being less work for the original author of course. :)

Nothing, it would be a convention. A translation would keep the same file structure and just modify the content of the files. If the translations diverge, you lose cross-linking but everything still works. I am open to alternative ideas, but I think we should go with something that has minimal friction. :)
sebras commented Aug 8, 2017

That's a good point. Maybe mdBook can warn if this is the case?

Yes, absolutely. I was worried there was no progress because of a lack of design discussion, hence my suggestion to try to help you decide. I don't know the mdBook code base (or Rust) yet. :)
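
A warning of that kind could be as simple as a set difference between the file lists of the original and a translation. The sketch below is only an illustration under that assumption; `divergence_warnings` and the message wording are hypothetical, not an existing mdBook feature.

```rust
use std::collections::BTreeSet;

/// Compare the chapter files of a translation against the original and
/// return one warning line per file that exists on only one side.
fn divergence_warnings(
    original: &BTreeSet<String>,
    translation: &BTreeSet<String>,
    lang: &str,
) -> Vec<String> {
    let mut warnings = Vec::new();
    // Files the original has but the translation lacks: no cross-link target.
    for file in original.difference(translation) {
        warnings.push(format!("warning: `{}` has no `{}` translation", file, lang));
    }
    // Files only the translation has: cannot be reached from the original.
    for file in translation.difference(original) {
        warnings.push(format!("warning: `{}` exists only in `{}`", file, lang));
    }
    warnings
}
```

Because these are warnings rather than errors, a diverged translation would still build; only the cross-links for the listed files would be missing.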
sebasmagri commented Aug 19, 2017

Hi! I'm probably going to reiterate some already discussed topics, but I'd still like to describe this case hoping it's useful to define the best mechanism for book translations in mdbook.

So I've been trying to define a process we could recommend for a localisation team to tackle tasks such as The Rust Programming Language book translation.

One of the things is how to integrate translated contents with the build output. For this specific case, and after having asked the docs team for feedback, it should be easier to handle all of the book contents independently in its own directory, including

Another thing is how to link translated content in the output. It could be linked on a per-document basis by mapping translations using the exact file name, in which case we'd have folder structure enforcement, or it could be linked only on the front page, in which case translations would have complete freedom over the folder structure, and even the Tree/Table of Contents. In the latter case, the contents tree guidelines could be defined by maintainers but not enforced at all by the tooling.

These two features or mechanisms, however, might not work for people wanting to use tools such as crowdin, transifex or weblate to manage their translations, which are probably more adequate for software translation than for book translations. To support this case mdbook might need to generate a paragraph-level mapping of translations and probably support output to a standard internationalization format such as gettext's PO files or L20N.

I'm absolutely willing to dedicate some time to this feature since this could be one of the primary goals of the localisation team. So of course I'm completely open to any kind of feedback and collaboration so we can lay out a plan to implement this.

Regards,

Hi @sebasmagri

Thank you for the input! I would love to work together with the concerned parties to end up with a strong design that is useful for both simple and more complex requirements.

Currently, the design we are considering is the following:

To make a book multi-lingual, you would have to add some information to the configuration file:

    [languages]
    en = { name = "English", default = true }
    fr = { name = "Français" }
    # OR alternatively
    # [languages.en]
    # name = "English"
    # default = true
    #
    # [languages.fr]
    # name = "Français"

For the example above, we would expect to have sub-folders in the

We could imagine having an optional

We also think it is better to have a

For the HTML output, we consider cross-linking chapters from different languages based on the file structure. An English chapter called

This seems very complex? I am not very familiar with this issue but it seems to me that it would either require a lot of manual annotations for correct paragraph mapping or some heuristics. I would think this is (currently) out of scope for mdBook. Let's first focus on having basic but strong multi-lingual facilities and eventually expand from there. :)

Does that correspond to the requirements of the localisation team? If there is anything I missed or there are additional requirements that haven't been considered, please feel free to post

That would be wonderful, I am particularly interested in the perspective of the Rust project on this issue because I think they will be the ones using this feature the most.
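
As a rough illustration of how a `[languages]` table like the one above might be consumed once parsed, here is a small Rust sketch. The `Language` struct, `default_language`, and the rule that exactly one entry must be marked as default are assumptions made for this example; the thread does not specify how mdBook would validate the table, and TOML parsing itself is left out.

```rust
/// Hypothetical in-memory form of one `[languages]` entry after parsing.
#[derive(Debug, Clone, PartialEq)]
struct Language {
    code: String,
    name: String,
    default: bool,
}

/// Return the code of the default language, or an error message if the
/// table does not mark exactly one language as default (assumed rule).
fn default_language(languages: &[Language]) -> Result<&str, String> {
    let mut defaults = languages.iter().filter(|l| l.default);
    match (defaults.next(), defaults.next()) {
        (Some(only), None) => Ok(only.code.as_str()),
        (None, _) => Err("no language is marked `default = true`".to_string()),
        (Some(_), Some(_)) => {
            Err("more than one language is marked `default = true`".to_string())
        }
    }
}

/// Helper for building entries in examples.
fn language(code: &str, name: &str, default: bool) -> Language {
    Language {
        code: code.to_string(),
        name: name.to_string(),
        default,
    }
}
```

Rejecting an ambiguous table early would keep the fallback language well defined for pages that have no translation.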
steveklabnik referenced this issue Sep 14, 2017
Closed Suggestion for simple translate system for this repo? #642
cauebs commented May 6, 2018

Just to resurface what @mattico said at #687
azerupi commented Jul 29, 2015
Add support for multiple languages.