[for review] multi-language mdbook #200

gambhiro · 2017-01-13T17:36:19Z

PR 200

See a demo: Alice's Adventures in Wonderland, in three languages with mdbook. Chapter excerpts for testing. Markdown sources here.

Use the gambhiro/mdbook/multilang branch for compiling or reading the code.

This PR is for reviewing and discussing a large refactoring which involves:

Add multilingual support #5 (multi-lingual support)
[Discussion] Book representation #146 (book representation)
[Discussion] New renderers design #149 (separate renderer design)
[Discussion] Change configuration file to toml instead of json #96 (TOML format for config)
Add an empty, but included last, custom.css file #178 (custom.css)

This PR incorporates PR #147 (new book struct), as I started by merging that and working from there.

This also prepares the ground for #88 (ebooks), which after this can be implemented as another renderer.

Does everything still work?

Let's compose a list of user-level features that I can test and debug:

the Rust Book should build
CLI commands
single-language book
multi-language book
any more?

Rust book

Move src/img to assets/img for the images to be copied. Paths in the Markdown source don't have to change.

CLI commands

init

SUMMARY.md is parsed and missing chapter files are created
.gitignore is created on confirmation

build

the html builds
custom template is used if found in assets/_html-template
book assets are copied if found in assets

watch

making a change in a chapter rebuilds the book

serve

book is served on :3000
making a change in a chapter updates the page in the browser

test

chapter files are tested with rustdoc

Single language book

See src/tests/book-minimal

Works as expected as far as I can tell.

Multi-language book

See src/tests/book-wonderland-multilang

Works as expected as far as I can tell.

Features

Renderer

Renderer is a trait expecting two functions, .build() and .render():

pub trait Renderer {

    /// When the output format is determined (by a CLI argument for example),
    /// call `.build()` of the selected Renderer implementation.
    ///
    /// Constructs an `MDBook` struct given the path of the book project,
    /// preparing the project and calling `render()`, doing what is necessary
    /// for the particular output format.
    ///
    /// This involves parsing config options from `book.toml` and parsing the
    /// `SUMMARY.md` of each translation to a nested `Vec<TocItem>`.
    ///
    /// Finally it calls `render()` to process the chapters and static assets.
    fn build(&self, project_root: &PathBuf) -> Result<(), Box<Error>>;

    /// Responsible for rendering the chapters and copying static assets.
    fn render(&self, book_project: &MDBook) -> Result<(), Box<Error>>;

}

The general idea is that a data structure (MDBook) is constructed by parsing
the book's files, and this is given to the renderer's .render() function which
does whatever it needs to do with it to produce its output format.

.build() is responsible for calling the necessary functions to construct
MDBook, and .render() is responsible for writing the output based on that
MDBook.

So MDBook is not responsible for rendering or writing anything, but it should
represent all the information that the renderer might need to write all its
output files.

MDBook has a .render_intent attribute that is a descriptive enum which
internal functions can inspect if they need to make decisions based on what
output format has been selected.

pub struct MDBook {
    ...
    /// Informs other functions which renderer has been selected, either by
    /// default or CLI argument.
    render_intent: RenderIntent,
    ...
}

pub enum RenderIntent {
    HtmlHandlebars,
}

Static assets

The application's static assets are embedded in the binary from data/ using includedir.

The book's static assets are expected in assets/. Everything will be copied to the book's output folder, except for folders which start with underscore (i.e. user might have _sass or _html-template and so on).
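As a rough illustration of the underscore rule, the check could look something like the sketch below. The function name `is_copied_asset` is hypothetical and not taken from the branch; it assumes paths are given relative to assets/ and point at files.

```rust
use std::path::{Component, Path};

/// Returns true when a file (given relative to `assets/`) should be copied
/// to the output folder. Hypothetical helper, not the branch's actual code.
pub fn is_copied_asset(relative_file: &Path) -> bool {
    // Only the folders leading to the file are inspected, not the file name.
    let parent = match relative_file.parent() {
        Some(p) => p,
        None => return true,
    };
    // Skip the file if any enclosing folder starts with an underscore.
    !parent.components().any(|c| match c {
        Component::Normal(name) => name.to_string_lossy().starts_with('_'),
        _ => false,
    })
}
```

With this rule, `images/Queen.jpg` and a top-level `custom.css` are copied, while anything under `_sass/` or `_html-template/` is left out.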

chapter TOML headers

TOML headers can be added at the beginning of a chapter file, these will be parsed into the attributes of the Chapter struct. See babel.md.

+++
title = "The Library of Babel"
author = "Jorge Luis Borges"
translator = "James E. Irby"
+++

# Babel

The universe (which others call the Library) is composed of an indefinite and
perhaps infinite number of hexagonal galleries, with vast air shafts between,
surrounded by very low railings. From any of the hexagons one can see,
interminably, the upper and lower floors.
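To make the header handling concrete, here is a minimal sketch of separating the header from the chapter body. It is not the code from the branch: `parse_chapter` is a hypothetical name, it assumes the file begins with the `+++` line, and it only understands simple `key = "value"` lines, where a real implementation would hand the block to a TOML parser.

```rust
use std::collections::HashMap;

/// Splits a chapter source into its header attributes and its Markdown body.
/// Sketch only: handles `key = "value"` lines, not full TOML.
pub fn parse_chapter(source: &str) -> (HashMap<String, String>, String) {
    let mut attrs = HashMap::new();
    let mut lines = source.lines();

    // No opening `+++` on the first line: the whole file is the body.
    if lines.next().map(str::trim) != Some("+++") {
        return (attrs, source.to_string());
    }

    let mut closed = false;
    for line in lines.by_ref() {
        if line.trim() == "+++" {
            closed = true;
            break;
        }
        if let Some((key, value)) = line.split_once('=') {
            attrs.insert(
                key.trim().to_string(),
                value.trim().trim_matches('"').to_string(),
            );
        }
    }

    // An unterminated header is treated as ordinary content.
    if !closed {
        return (HashMap::new(), source.to_string());
    }

    let body = lines.collect::<Vec<_>>().join("\n");
    (attrs, body.trim_start().to_string())
}
```

For the Babel example above this would yield `title`, `author` and `translator` attributes, with the body starting at `# Babel`.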

Multi-language books

See src/tests/book-wonderland-multilang

translation cross-linking

Automatic chapter-to-chapter linking tries progressively harder to find a translation, but never refuses to build the book. That is, something should always happen, whatever the author does, and more should happen as they keep working on their content.

links to the top-level index pages of the translations are displayed above the TOC in the sidebar
chapter translations are displayed in the title bar when the application can find a translation, otherwise it displays a grayed-out text of the language code

Building on the ideas described in #5, finding a translation works step by step this way:

taking the manual links given in the TOML header (see alice/rabbit-hole)
finding a match by a specific chapter.translation_id string given in the TOML header (see alice/long-tale)
finding a match by chapter.src_path (see alice/tears)
finding a match by section number, if the TOC is structurally the same, checking by counting the number of sections

This covers the following scenarios:

If the translator copy-pastes the SUMMARY.md of the original, changing only the titles, translations will be identified by the .src_path.

If they rename the file names too, so that the URLs also are in the target language, but the TOC keeps the same sectioning structure, translations will be identified by section numbers.

If they change the sectioning structure too, they can insert a chapter.translation_id string in the original and the translation. Any string will do, not necessarily a UUID. This would maintain cross-links when the original changes file name.

If nothing else, they can provide the translation link directly in the TOML header. This breaks when the target file changes file name.

It has to be kept in mind that translations are projects on their own, and can even present the same material in a different structure than the original. The original and its translation are likely to be at different stages at various times. In addition, people have different workflows, they don't necessarily work in a one-two-three disciplined way either.
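The lookup order described above can be sketched as a chain of fallbacks. This is an illustration under assumptions, not the implementation in the branch: the `Chapter` fields `section` and `manual_link`, the function name `find_translation`, and the `same_toc_structure` flag (which the caller would compute, for example by comparing section counts) are all hypothetical.

```rust
/// Simplified stand-in for the chapter data relevant to cross-linking.
#[derive(Debug, Clone, PartialEq)]
pub struct Chapter {
    pub src_path: String,
    pub section: Vec<u32>,              // e.g. [1, 2] for section 1.2
    pub translation_id: Option<String>, // set in the TOML header
    pub manual_link: Option<String>,    // explicit link from the TOML header
}

/// Returns the link target for `chapter` in one translation, or None when
/// nothing matches (the UI would then show the grayed-out language code).
pub fn find_translation(
    chapter: &Chapter,
    translation: &[Chapter],
    same_toc_structure: bool,
) -> Option<String> {
    // 1. A manual link in the header wins.
    if let Some(link) = &chapter.manual_link {
        return Some(link.clone());
    }
    // 2. Match by translation_id.
    if let Some(id) = &chapter.translation_id {
        if let Some(found) = translation
            .iter()
            .find(|c| c.translation_id.as_ref() == Some(id))
        {
            return Some(found.src_path.clone());
        }
    }
    // 3. Match by identical source path.
    if let Some(found) = translation.iter().find(|c| c.src_path == chapter.src_path) {
        return Some(found.src_path.clone());
    }
    // 4. Match by section number, only if both TOCs have the same structure.
    if same_toc_structure {
        if let Some(found) = translation.iter().find(|c| c.section == chapter.section) {
            return Some(found.src_path.clone());
        }
    }
    None
}
```

Every step either returns a target or falls through, so the build never has to stop because a translation is missing.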

book.toml

By default, the first language listed in book.toml is taken as the main language. To choose a different one, mark it with is_main_book = true.

The language code is always the translation key; the language name can optionally be set with language_name.

[[translations.en]]
title = "Alice's Adventures in Wonderland"
author = "Lewis Carroll"

[[translations.fr]]
title = "Alice au pays des merveilles"
author = "Lewis Carroll"
language_name = "Français"

[[translations.hu]]
title = "Alice Csodaországban"
author = "Lewis Carroll"
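The main-language rule amounts to "an explicit flag wins, otherwise the first entry". A minimal sketch, assuming the translations are held in a slice that preserves their order in book.toml; the `Translation` struct and `main_language` function here are simplified stand-ins, not the structs from the branch.

```rust
/// Simplified stand-in for one translation entry from book.toml.
pub struct Translation {
    pub code: String,
    pub is_main_book: bool,
}

/// Picks the main language: an entry flagged `is_main_book` if there is one,
/// otherwise the first entry. Returns None for an empty list.
pub fn main_language(translations: &[Translation]) -> Option<&Translation> {
    translations
        .iter()
        .find(|t| t.is_main_book)
        .or_else(|| translations.first())
}
```

For the Wonderland example, where no entry is flagged, this would select `en`.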

folder layout

book-wonderland-multilang
├── book.toml
├── assets
│  └── images
│     ├── Queen.jpg
│     ├── Rabbit.png
│     ├── Tail.png
│     └── Tears.png
└── src
   ├── en
   │  ├── SUMMARY.md
   │  ├── long-tale.md
   │  ├── rabbit-hole.md
   │  ├── tears.md
   │  └── titlepage.md
   ├── fr
   │  ├── SUMMARY.md
   │  ├── cocasse.md
   │  ├── larmes.md
   │  ├── terrier.md
   │  └── titre.md
   └── hu
      ├── SUMMARY.md
      ├── cimoldal.md
      ├── konnyto.md
      ├── nyuszi.md
      └── tarka-farka.md

Documentation

update documentation in the code
update documentation in book-example

Structs

Catching up

The last common commit with master was 8a178e3, I will have to catch up with the updates since then.

catch up with updates
resolve conflicts with master


steveklabnik · 2017-01-13T17:51:03Z

Oh neat! I will dig into the technical details of this later.

One comment about the language switcher: it always goes back to the index, rather than switching the page that you're on. Is this intended?

gambhiro · 2017-01-13T17:55:27Z

@steveklabnik For now the translation just links to the index page of each translation,
except in this case where the linking is defined in a TOML header of the
markdown chapter.

gambhiro · 2017-01-21T21:17:01Z

Onward to glory. At this point it's over to you. No rush, take your time. I think this is now a decent update. I'm sure there will be bugs but I think I used the binary from this branch enough to catch the obvious ones.

Conflicts with master resolved, docs updated, tests are passing with the advanced method of ignoring the failing ones. I'd rather describe them in a separate issue than struggle with them here.

On linux i686 something is wrong with ci/script.sh, and on Windows it's related to the path separator. It might even be just the way the test is written.

azerupi · 2017-02-07T15:03:23Z

I am going to go through a couple of topics, but since this is a huuuge change, I am not going to go through every detail.

Renderer

pub trait Renderer {
    fn build(&self, project_root: &PathBuf) -> Result<(), Box<Error>>;

    fn render(&self, book_project: &MDBook) -> Result<(), Box<Error>>;

}

The renderer should not launch the build and construct MDBook, it should rather happen the other way around. MDBook should be the "command center" where you launch the build, it will then call the renderer(s) to render the book to a specific format.

The user may specify multiple rendering targets in the configuration file:

# "output" is a table where each entry is the identifier of a renderer
# containing the configuration options for that renderer
[output]
html = { path = "book/" }
pdf = { path = "pdf/mdBook.pdf" }
# OR alternatively
# [output.html]
# path = "book/"
#
# [output.pdf]
# path = "pdf/mdBook.pdf"

In that case, mdBook should render to all specified targets when running build. If the renderer initiates the build, then every renderer will individually create an MDBook struct, which results in redundant work.

Also, mdBook should be usable programmatically. This means that the MDBook struct could be created by means other than parsing the summary file. When rendering, you don't want that struct to be destroyed.

So, the way I see it, the renderer should take an (immutable?) reference to the MDBook struct and perform its task with that.
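A minimal sketch of this arrangement, to make the proposal concrete. It is not code from the PR or from mdBook: the simplified `MDBook` fields, the `HtmlRenderer` and `PdfRenderer` types, and the free function `build` are hypothetical, and `render` returns a description string instead of writing files so that the flow is easy to follow.

```rust
use std::error::Error;

/// Simplified stand-in for the book representation.
pub struct MDBook {
    pub title: String,
    pub chapters: Vec<String>,
}

/// A renderer only borrows the book; it never constructs or consumes it.
pub trait Renderer {
    fn render(&self, book: &MDBook) -> Result<String, Box<dyn Error>>;
}

pub struct HtmlRenderer;
pub struct PdfRenderer;

impl Renderer for HtmlRenderer {
    fn render(&self, book: &MDBook) -> Result<String, Box<dyn Error>> {
        Ok(format!("html: {} ({} chapters)", book.title, book.chapters.len()))
    }
}

impl Renderer for PdfRenderer {
    fn render(&self, book: &MDBook) -> Result<String, Box<dyn Error>> {
        Ok(format!("pdf: {} ({} chapters)", book.title, book.chapters.len()))
    }
}

/// The book is built once, then handed to every configured renderer in turn.
pub fn build(
    book: &MDBook,
    renderers: &[Box<dyn Renderer>],
) -> Result<Vec<String>, Box<dyn Error>> {
    let mut outputs = Vec::new();
    for renderer in renderers {
        outputs.push(renderer.render(book)?);
    }
    Ok(outputs)
}
```

Because each renderer receives `&MDBook`, the same struct survives the whole build and can be reused by a caller that created it programmatically.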

I had some vague idea for a hypothetical feature that I may want to implement in the future:
A web interface where the user can modify his book directly from the browser and get "instant feedback" on how it would be rendered. But for this to work, the renderer should only be a small link in a chain and not the overarching, all-powerful construct.

Multi-language

From your examples, it seems that there is no "primary language", but in my opinion it is better to make a distinction between the primary language and the translations.

The primary language will be used as the landing page and translated chapters can then be mapped to the ones of the primary language.

[languages]
en = { name = "English", default = true }
fr = { name = "Français" }
# OR alternatively
# [languages.en]
# name = "English"
# default = true
#
# [languages.fr]
# name = "Français"

Others

I really like the chapter TOML headers, a lot more than YAML headers. The only thing I would add is a way to escape it (if it's not already supported).

Way Forward

This is a huge change, it touches almost the whole codebase and unfortunately I can't review this as a whole. I have been trying to go through it for the last couple of days but there is just too much. 😕

I don't want to discourage you, because I really want to merge this, but the only way I see this moving forward is if it is split into multiple smaller PRs and merged incrementally.

There are a couple of design decisions that I would like to see changed, the two points above are the most important ones. There are also a couple of small changes I would avoid merging and others would require some modifications or improvements.

So considering all this, I think it is better to incrementally add the changes by making smaller PRs focused on one aspect that can be more easily reviewed and discussed. It will be more gratifying for both of us to see small parts of this PR get merged one at a time instead of the whole thing being blocked by a couple of issues.

I now have time to work on it too, so let's coordinate our efforts!

gambhiro · 2017-02-08T18:47:25Z

Re: Renderer and MDBook.

When the user calls mdbook build, a few general things have to happen in any
case, not necessarily in this order:

a) figure out the render format(s)
b) call the appropriate renderer(s)
c) construct general representation (chapters, etc.) of the book from the input sources
d) construct any further data specific to the renderer (ebook metadata, etc.)
e) write out the target format's files.

So you always have this separation between a representation you build from the
input files, and the process of creating files in the specific manner for the
target format.

I did notice that the renderer function was originally attached to the MDBook
struct, but this got confusing and difficult to work with. The borrow checker
can't inspect the renderer. You can't derive debug. You can't construct a simple
MDBook for testing.

So I gave up on that and treated the MDBook as purely the representation of the
book, which then you can pass around mutably for internal construction (such as
TOC building) or immutably for producing files from (rendering).

This made it much easier to think about how to implement features. The MDBook is
data, give it to a function that does some work.

The Renderer trait is separated into build() and render() because it makes clear
which sequence of function calls prepares the MDBook as data before we are ready
to write output files from it. This is easier to write tests for as well.

Re: multiple rendering targets.

There is no problem there, since the MDBook is just data, you don't have to
start figuring out how multiple kinds of format-specific behaviours should
happen.

The config example you gave above can work that way too. Keep in mind you don't
have to do everything with MDBook's functions. The behaviour logic gets very
complicated if you think that way.

The cli command function is the closest to the user's interaction, so determine
any and all rendering targets there and call build() and render(). The cli
command is in the best position to figure out the user's intention from cli
arguments and from parsing specific sections of the book.toml.
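A sketch of the first half of that idea, mapping format names from the CLI to rendering targets. RenderIntent comes from the PR; the function name `parse_targets` and the "html" format name are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RenderIntent {
    HtmlHandlebars,
}

/// Maps format names given on the command line to rendering targets.
/// With no explicit format, the HTML renderer is assumed as the default.
pub fn parse_targets(cli_formats: &[&str]) -> Result<Vec<RenderIntent>, String> {
    if cli_formats.is_empty() {
        return Ok(vec![RenderIntent::HtmlHandlebars]);
    }
    cli_formats
        .iter()
        .map(|format| match *format {
            "html" => Ok(RenderIntent::HtmlHandlebars),
            other => Err(format!("unknown output format: {}", other)),
        })
        .collect()
}
```

The command function would then loop over the returned targets and call build() on the matching renderer for each one.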

> every renderer will individually create an MDBook struct, which results in redundant work.

This is a benefit, as target formats have different construction procedures.

For every target format, it is very specific what data needs to be prepared
before you are ready to write files. There may be format-specific structs which
the renderer wants to construct as well, such as metadata for an ebook or
publisher data for PDF.

The renderer's build() encapsulates that. It is good that every renderer
should build its MDBook and other supporting structs.

> Also, mdBook should be usable programmatically.

This is fine. The tests are written like that.

> A web interface where the user can modify his book directly from the browser
> and get "instant feedback" on how it would be rendered. But for this to work,
> the renderer should only be a small link in a chain and not the overarching,
> all-powerful construct.

Imagine how the pieces connect. MDBook is just data. The web server has access
to an MDBook which its API calls can manipulate. You serialize the MDBook to
JSON and the frontend requests it with a GET call. Say if it's Clojurescript,
construct an atom from it. User types his book in the frontend, Clojurescript
updates the atom, the changes are rendered realtime in the window.

You don't have to talk to the server until you are saving files, and then you
serialize the atom to JSON, put it in a POST request to the server and it will
update the necessary files on the disk.

I can recommend these to whet your appetite:

Reagent: Minimalistic React for ClojureScript

reagent-template

Lambda Days 2015 - Norbert Wójtowicz - ClojureScript + React.js

ClojureScript Release - Rich Hickey

Interactive programming Flappy Bird in ClojureScript

Clojurescript had an enormous creative effect on me, really enjoyed building UI with it, finally I could stop thinking in Javascript.

Other people like Vue.js, I suppose it's a matter of temperament or something.

The only thing you should stay away from is Angular, don't believe the marketing.
It was good in its time but the world moved on.

Re: multi-language

The first language in the book.toml is taken as the main language, unless the
user specifically sets is_main_book = true on a different one. At the moment
it affects which index page is the root landing page for a multi-language book.

So the information is parsed, other features can do logic with it.

Re: escaping TOML

Do you mean a TOML block between +++ lines in the middle of the page?

If the block has anything other than whitespace before it (i.e. it is not the
first thing in a chapter file), it is not taken as chapter attributes and not parsed.
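The rule can be expressed as a one-line check. This is a sketch with a hypothetical function name, not the code from the branch:

```rust
/// True when the first non-whitespace content of a chapter is a `+++` line,
/// i.e. when a TOML header would be recognized as chapter attributes.
pub fn starts_with_toml_header(source: &str) -> bool {
    source.trim_start().starts_with("+++")
}
```

A `+++` block that follows a heading or a paragraph therefore stays in the chapter as ordinary content.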

Re: small PRs

I don't see how that would work or why it is necessary. You can't move this in a
piece-by-piece fashion, it wouldn't even compile.

It is backwards compatible as far as CLI users are concerned. They can update and
their book should build just the same.

Test whether the user-visible features work as intended, and if that works, then
I don't see the blocking factor.

gambhiro · 2017-02-17T07:17:32Z

Hm, what should we do about this?

It's accumulating conflicts, and I can resolve that, but beyond that?

steveklabnik · 2017-02-19T01:08:51Z

I would like to try and check it out this coming week.

I share @azerupi 's feeling in general, smaller, incremental PRs are better. I haven't actually looked at the diff here yet, so I can't say specifically, but this is just always true for any open source.

gambhiro · 2017-02-19T04:50:23Z

Yes, I realise that small PRs are easier to review, but this changes the main structs in ways that don't really offer a step-by-step progression. There is one big refactoring at the beginning and then the additions are fairly incremental.

It is perhaps easier to review by scrolling through the files in the source, rather than using the PR diff.

gambhiro · 2017-02-19T06:46:31Z

@azerupi @steveklabnik I added you both as collaborators to the PR's repo. If you see something you wish to change directly, by all means go ahead and commit it. I'm mostly interested in the features working, and maybe you see better ways of doing it. Remember the PR is in the multilang branch.

gambhiro · 2017-04-23T07:37:11Z

How do you feel about this?

Maybe this multilang business is something cool, but not necessarily a goal for mdbook as a tool.

I know that in the next few months I won't have time to contribute much to this PR.

So how about cancelling this? It is a loose end now with no apparent direction.

This little PR allowed me to learn Rust, it was much easier to get into the language with your encouragement. Thank you for the time you spent on feedback and discussion, it kept me motivated.

So I don't mind if you'd rather cancel this and approach the problem at another time. It has been an excellent experience for me, and the valuable results are often not the tangible ones.

steveklabnik · 2017-04-24T07:11:26Z

> How do you feel about this?

I mostly feel still terrible that I have not had the time to truly dig into a PR this big.

> Maybe this multilang business is something cool, but not necessarily a goal for mdbook as a tool.

multilang is something we need, generally. That is, at least for The Rust Programming Language, we want this feature. It's just hard to review something so big, and it'll probably be a few more weeks before I can really spend the time to do things other than fix small issues in mdbook.

> So I don't mind if you'd rather cancel this and approach the problem at another time. It has been an excellent experience for me, and the valuable results are often not the tangible ones.

Let's close it for now, then, as that sounds like the best thing to do. When I start to really dig into this problem, I may base it off of this work, we'll see.

Sorry that this has dragged on. I'm glad you got some stuff out of it. Open source is tough. ❤️

azerupi and others added 27 commits June 28, 2016 16:38

Add structs holding metadata for the books

ebd075a

Add a new Chapter struct for the new Book struct

e584858

Add the new book struct

c69161c

Derive Clone and Debug for Chapter and Book

d664618

Include the new Book struct in a hashmap alongside the old representation

3e0dca5

Add structs holding metadata for the books

7862dc0

Add a new Chapter struct for the new Book struct

9f99ba2

Add the new book struct

42909cf

Derive Clone and Debug for Chapter and Book

99e8082

Include the new Book struct in a hashmap alongside the old representation

efc2644

frontmatter, mainmatter, backmatter

9189053

preparing structs

bfdc70c

Merge branch 'book-struct' of github.com:gambhiro/mdBook into book-struct

5525cd4

docs for refactoring design

7edee42

plan for reorganizing structs

852fc1d

Merge remote-tracking branch 'origin/book-struct' into ebooks

44b03bd

Merge remote-tracking branch 'origin/parse-toml' into ebooks

2cae064

Merge remote-tracking branch 'origin/docs' into ebooks

204a878

multilang

e7ba6fa

parses toc and chapters renders html hbs helpers and asset embedding copy static assets by pattern review fix prev nav link copy local assets when found multilang renders is_multilang as property theme is template cli init and build bump version structs diagram

translation links

c021940

rename with _ translation links css for translation links

watch and serve are back

3aa8f7d

remove todo comment

0532295

look in different paths for page template

2a033de

render markdown in make_data()

5ca380f

write print.html

da9c57e

chapter.content, .src_path, .dest_path

0713d49

upd structs diagram

a1375b1

gambhiro mentioned this pull request Jan 13, 2017

Translate this book to Viet Nam version rust-lang/book#375

Open

Luthaf mentioned this pull request Jan 16, 2017

Support ebooks and pdf export #88

Open

Gambhiro added 7 commits January 18, 2017 07:23

cli test command is back

eb5a9b9

apply highlightjs style by class on body

27721c2

upd test assets

a8021eb

clean output folder without removing dotfiles

cb861a7

more simple language declaration in book.toml

8441bb9

update docs in the source

4fe0e8d

link in custom.css if the user has it

ca718b3

gambhiro mentioned this pull request Jan 18, 2017

Add an empty, but included last, custom.css file #178

Closed

Gambhiro added 8 commits January 19, 2017 17:08

catch up with changes in master

c2ff0c6

resolve conflicts

a7666e7

not testing on stable channel for now

05ec8ae

make tests windows compatible

0bebd9e

update documentation in book-example

0a5efcc

only whitespace before chapter toml header

6213943

upd

78505ff

ignore failing tests

b162d04

steveklabnik closed this Apr 24, 2017

steveklabnik mentioned this pull request Apr 24, 2017

[WIP] New book struct #147

Closed

funkill mentioned this pull request Jul 28, 2019

i18n question rust-lang/async-book#26

Open

lauraluebbert mentioned this pull request Aug 15, 2023

Multilingual mdbook structure #2167

Closed
