Support ebooks and pdf export #88

mdinger · 2015-12-30T16:59:50Z

Gitbook supports export to ebooks and pdfs via calibre. This might be easy to hook into.

See also rust-lang/rust-by-example#684 for problems this implementation creates for rustbyexample.

azerupi · 2015-12-30T17:54:30Z

I would like to support pdf and ebook format. I think this could already be developed out of tree if you use the Renderer trait from mdBook.

I am not sure I want to depend on a full-blown GUI tool, though. There must surely be a better alternative.

mdinger · 2015-12-30T17:58:08Z

Not familiar with many conversion tools like this. Pandoc also seems like a plausible option. Don't know of any others.

azerupi · 2015-12-30T18:12:47Z

Yeah pandoc seems a lot better!

asolove · 2016-01-11T14:07:58Z

Did some exploration on this and seems doable. Here's the default epub version of the Rust book. Note the chapters out of order and links not working.

To get good output, I think we would need to:

parse the ToC to get the list of md files, in the right order
concat and transform the markdown files, replacing file links with internal links
match the themes with epub versions of the styles
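A minimal sketch of the first step in the list above, assuming SUMMARY.md uses plain `[title](path.md)` links, one per line. The function name `chapter_paths` is hypothetical and is not part of mdBook's API; nesting depth, draft chapters and link titles are ignored.

```rust
/// Collect the `.md` link targets from SUMMARY.md, in document order.
/// Simplified: at most one link per line, no handling of nesting or titles.
fn chapter_paths(summary: &str) -> Vec<String> {
    let mut paths = Vec::new();
    for line in summary.lines() {
        // "](" separates the link text from the link target.
        if let Some(start) = line.find("](") {
            let rest = &line[start + 2..];
            if let Some(end) = rest.find(')') {
                let target = rest[..end].trim();
                // Skip empty targets (draft chapters) and non-Markdown links.
                if target.ends_with(".md") {
                    paths.push(target.to_string());
                }
            }
        }
    }
    paths
}
```

The returned order is the reading order, so the files can be concatenated in sequence before being handed to a converter.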

I'm interested in working on this but will be a bit slow.

Useful info here: Pandoc commands and styling options

killercup · 2016-01-11T14:18:33Z

parse the ToC to get the list of md files, in the right order

concat and transform the markdown files, replacing file links with internal links

@asolove, I have implemented this (among other transformations) in https://github.com/killercup/trpl-ebook, feel free to use my code.

asolove · 2016-01-11T14:20:22Z

@killercup great, thanks!

azerupi · 2016-01-11T14:35:51Z

Great! Thanks for doing this :)

parse the ToC to get the list of md files, in the right order

This is already done in the Rust code, the MDBook struct can be iterated on. If you make a new Renderer you have access to that.

concat and transform the markdown files, replacing file links with internal links

Concatenating the markdown files is also not that hard, I do it for the print page.

Replacing the links could be a little trickier, what should internal links look like for pandoc? I know that pulldown-cmark gives you the ability to transform the parsed markdown events before rendering, but it's not well documented. Maybe link replacing is within its capabilities.

Static files, like images, will probably also need some special treatment to be included correctly?

I'm interested in working on this but will be a bit slow.

That is absolutely no problem, there is no rush. ~~I will assign this issue to you so that others can see you are working on it.~~ (can't assign you). If you need any help, feel free to ask here :)

I am also planning on doing a big refactor (#90) to clean up and create a better API. For example, I am thinking about adding a way to poll the MDBook struct for specific chapters, etc. This would make it a lot more flexible for Renderers and if I end up doing something like #93. If you have suggestions or requests that might be relevant, post them in #90 so that I / we can brainstorm and come up with a good design :)

killercup · 2016-01-11T15:16:55Z

Replacing the links could be a little trickier, what should internal links look like for pandoc?

FYI, I'm doing some regex work to transform links relative to the doc.rust-lang.org domain and make reference link names unique for the combined markdown file.

azerupi · 2016-01-11T15:38:37Z

FYI, I'm doing some regex work to transform links relative to the doc.rust-lang.org domain

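// `output` is the concatenated Markdown as a `String`; needs `use regex::Regex;`.
// `\w` already matches `_`, so `[\w-]` covers the same characters as `[\w-_]`.

// Inline link to a whole chapter: ](file.html) -> ](#sec--file)
let cross_section_link = Regex::new(r"]\((?P<file>[\w-]+)\.html\)").unwrap();
output = cross_section_link.replace_all(&output, r"](#sec--$file)").into_owned();

// Reference definition for a whole chapter: [id]: file.html -> [id]: #sec--file
let cross_section_ref = Regex::new(r"(?m)^\[(?P<id>.+)\]:\s(?P<file>[^:^/]+)\.html$").unwrap();
output = cross_section_ref.replace_all(&output, r"[$id]: #sec--$file").into_owned();

// Inline link to a subsection: ](file.html#sub) -> ](#sub)
let cross_subsection_link = Regex::new(r"]\((?P<file>[\w-]+)\.html#(?P<subsection>[\w-]+)\)").unwrap();
output = cross_subsection_link.replace_all(&output, r"](#$subsection)").into_owned();

// Reference definition for a subsection: [id]: file.html#sub -> [id]: #sub
let cross_subsection_ref = Regex::new(r"(?m)^\[(?P<id>.+)\]:\s(?P<file>[^:^/]+)\.html#(?P<subsection>[\w-]+)$").unwrap();
output = cross_subsection_ref.replace_all(&output, r"[$id]: #$subsection").into_owned();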

Thanks! Does pandoc auto-generate the anchors from the markdown files in those formats? like #sec--$file? Or is that also handled by your code?

killercup · 2016-01-11T15:54:10Z

@azerupi I'm pretty sure pandoc generates those. I've had problems before because pandoc generates slugs in a different way than rustdoc.

It should be possible to add a specific id to each header, though. The syntax is # Header Name {#header-name} IIRC.
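A rough sketch of that idea: append an explicit `{#id}` attribute to every header so that the anchors no longer depend on how a given tool generates slugs. Both `slugify` and `with_explicit_id` are hypothetical helpers, and the slug scheme here is an assumption made for illustration; it is neither pandoc's nor rustdoc's algorithm.

```rust
/// Hypothetical slug scheme: lowercase ASCII letters and digits, with every
/// other run of characters collapsed into a single '-'.
fn slugify(title: &str) -> String {
    let mut slug = String::new();
    for c in title.chars() {
        if c.is_ascii_alphanumeric() {
            slug.push(c.to_ascii_lowercase());
        } else if !slug.is_empty() && !slug.ends_with('-') {
            slug.push('-');
        }
    }
    slug.trim_end_matches('-').to_string()
}

/// Append ` {#slug}` to an ATX header line; other lines are returned unchanged.
/// Headers that already end in an attribute block (`}`) are left alone.
fn with_explicit_id(line: &str) -> String {
    let title = line.trim_start_matches('#');
    let is_header = title.len() < line.len() && title.starts_with(' ');
    if is_header && !line.trim_end().ends_with('}') {
        format!("{} {{#{}}}", line.trim_end(), slugify(title))
    } else {
        line.to_string()
    }
}
```

This works line by line and does not know about fenced code blocks, so a `# comment` line inside a shell snippet would be treated as a header; a real implementation would have to skip those.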

You might also want to look at adjust_header_level.rs and adjust_reference_names.rs.

azerupi · 2016-01-11T16:08:15Z

Ok thanks for all the information, this will probably help @asolove a lot! :)

cetra3 · 2016-01-12T06:02:38Z

Not sure if this will help you guys, but I've created a simple Rust tool that collates multiple markdown files into one, resolving internal links and turning them into anchor links.

We can use this in a pipeline on the way to converting to PDF:

mdcollate book-example/src/SUMMARY.md | pulldown-cmark > test.html && wkhtmltopdf test.html test.pdf

Code can be found here:
https://github.com/cetra3/mdcollate

Happy to accept any PRs

azerupi · 2016-01-12T11:12:25Z

@cetra3 That is really cool!
The plan is to make a "renderer" that does everything so that it can be used with the mdbook build command. So using a command line tool adds some complications. Have you thought about exposing the functionality as a crate?

I am not sure I would add a dependency just for that functionality, because there is always the possibility that it will not be maintained actively. But it could be considered if it offers enough useful methods that we wouldn't have to reinvent here.

mkpankov · 2016-01-12T21:45:56Z

I'm also sceptical about Calibre. We use it for the Russian translation of TRPL and have come across several problems with EPUB (the links are to descriptions in Russian, for reference):

EPUB isn't displayed correctly and has broken links
EPUB has question marks instead of characters
EPUB has duplicated code blocks
I also vaguely remember we had to hack styles in order to get better PDF. Not sure if it's necessary or not with Pandoc

azerupi · 2016-01-12T21:59:54Z

Thanks for sharing your experience :)
We will see if pandoc has the same problems, but I think @killercup used it without too much / any problems?

I also vaguely remember we had to hack styles in order to get better PDF. Not sure if it's necessary or not with Pandoc

I am not sure how this is handled with Pandoc, but having a custom theme could be a good thing.

cetra3 · 2016-01-13T00:25:01Z

It's probably possible to wrap up those command line tools into a combined tool or expose it as a rust library. The last component (html to pdf) would need to use FFI as wkhtmltopdf is written in C. Not sure whether this adds too much dependency on externalities though.

The complication arises in that markdown is a superset of HTML which means that you need something that can present HTML in a printable fashion. In my experience with this problem, Pandoc and Calibre will do a subset, but you won't get full parity.

killercup · 2016-01-13T08:37:51Z

There are a few things to be aware of, but in general pandoc is really amazing at converting Markdown to LaTeX. Which is what you want, I think—it has some very nice features that you currently can't get with HTML-to-PDF converters. For example, my PDF versions of the Rust Book include cross-references like "This is a mutable variable binding (section 5, page 163)".

If you're no LaTeX wizard (I'm not), you might want to look at this template I threw together.

If you have any issues with this, just mention me.

azerupi · 2016-01-13T13:39:00Z

Thanks for all your help Pascal!
I will definitely look at what you have currently running and I am pretty sure we will end up stealing a lot of your code (if that is ok with you) 😉

gambhiro · 2016-08-08T14:57:57Z

+1 for the effort, I am looking forward to using mdbook to produce ebooks.

It seems to have stalled a bit, is anyone currently working on this?

azerupi · 2016-08-08T15:43:35Z

It seems to have stalled a bit, is anyone currently working on this?

Indeed, it has stalled a bit. In the last 6 months I have been overwhelmed with work at school 😕

I am (very) slowly working on the refactoring / clean-up that I wanted to do. And that work is probably going to change the way this specific feature is going to be implemented. Hopefully I will have some time in September to make significant progress on the internal rewrite so that I can work on new features again.

gambhiro · 2016-08-09T11:46:06Z

@azerupi How much space is there for discussing this feature? There are some specific things I would be looking for in a CLI ebook helper, but maybe you are already determined in which way to go.

Some time ago I wrote prophecy, a Ruby gem to automate the tasks I needed when producing ebooks. This is an example of the output. It has been very useful for me, but I believe I am the only user :)

I have been wanting to rewrite it with some of the hindsight since its early days, but when I saw this I thought maybe mdbook would be able to produce the same results.

There is an asciinema recording that shows the sort of things it does.

azerupi · 2016-08-09T12:05:28Z

I'm open to all ideas :)

d8aninja · 2019-02-26T15:31:21Z

One of the things that doesn't seem to be mentioned anywhere on this ticket is the ability to highlight the important bits. I have used a chrome extension called Hypothesis to do this until recently but a) chrome extension, ew b) it's pretty sloppy about whether the highlights are saved under your personal view or public view (i.e., in some cases you can see others' highlights) and c) I say recently because I'm pretty sure when the book gets updated, all my highlights and attached notes get deleted, too.

Anyway, just wanted to add my two cents and support for a PDF version to be released in parallel to the online book's update. I realize that must be a lot harder than many of us make it out to be, and you all are doing a wonderful job regardless of the format in which we are all consuming your work. Thanks!

XVilka · 2019-06-06T10:10:27Z

Seems that mdproof can be used for such a task

mkurnikov · 2019-10-12T15:57:56Z

Trying to open print.html page from the browser in Print format gives me out-of-memory, but this one works

google-chrome-stable --headless --print-to-pdf=rust_book.pdf https://doc.rust-lang.org/beta/book/print.html
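For anyone who wants to drive the same command from a Rust build step or CI helper, here is a small sketch using only the standard library. The function name `print_to_pdf_command` is hypothetical, and the browser binary name is an assumption that depends on what is installed; the two flags are the ones from the command above.

```rust
use std::process::Command;

/// Build the headless-browser invocation shown above without running it.
/// `browser` is the Chrome/Chromium executable name (an assumption about the host).
fn print_to_pdf_command(browser: &str, output: &str, url: &str) -> Command {
    let mut cmd = Command::new(browser);
    cmd.arg("--headless")
        .arg(format!("--print-to-pdf={}", output))
        .arg(url);
    cmd
}
```

Calling `.status()` on the returned `Command` would actually launch the browser and report its exit status, which is the part a CI job would check.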

Binlogo · 2020-12-12T04:52:17Z

Nice feature request.
Printing the PDF from a browser is a workaround, and the format already looks nice. 👍
Of course, if mdbook supported an export feature, it would be convenient for Continuous Integration.

heyakyra · 2021-01-22T15:15:02Z

if mdbook supported an export feature, it would be convenient for Continuous Integration.

Yes, I think this can be an important bit. Is this work useful? void-linux/void-docs#416

Huy-Ngo · 2021-04-21T14:03:56Z

I can't install mdbook-latex because it depends on harfbuzz_rs, which currently can't be compiled because of a minor bug (harfbuzz/harfbuzz_rs#30). It's fixed but not released.

I think the maintainers would rather encourage users to use external plugins like mdbook-latex or mdbook-epub than implement this internally.

ildar · 2021-05-07T11:48:42Z

As an example of how I'd like an ebook to look: the Rust by Example book, https://flibusta.is/b/619885/

Huy-Ngo · 2021-05-07T13:21:25Z

As an example of how I'd like an ebook to look.
Rust by example book: https://flibusta.is/b/619885/

I see an almost blank page in Russian (and the text seems to say "The page is not found"). Which part of it do you mean you want an ebook to look like?

ildar · 2021-05-10T16:01:05Z

yes, sorry. The right URL is https://flibusta.is/b/619928

XVilka · 2021-09-22T04:01:41Z

Since there is zero interest in supporting this in mdBook, I recommend Quarto, a relatively new framework for creating books that is more flexible than the commonly known Bookdown. It's pandoc-based, so it can export to basically anything. You can see their gallery for samples of how the different formats and exports look. It's quite actively developed as well.

aplatypus · 2021-11-07T13:45:37Z

@dustinmatlock ... I am afraid that I disagree:

Probably the easiest way: print at the top right and save as PDF. Of course, this doesn't solve the EPUB issue, but programming books are sometimes best viewed in PDF. The book's website looks great on mobile.

A PDF file is usually complete in that a link from the Table of Contents, when clicked, would go to page #202, say. With a PDF file generated from a browser, clicking a hyperlink in the resulting PDF provides this informative explanation ...

Firefox can’t establish a connection to the server at 127.0.0.1:3000.

    The site could be temporarily unavailable or too busy. Try again in a few moments. ...

An export function would at least support hyperlinks for the Table of Contents, an Index and Footnotes. Sometimes it is useful for an exported file to link externally. In such cases I believe the PDF format specifies "internal" and "external" links.

Yes, a saved PDF is a usable solution for an off-line document, but it is not a solution that I can save to a tablet and use when I'm out of touch with the internet, or late at night, etc.

Have a good one ...!

HollowMan6 · 2022-01-30T03:06:51Z

Hi all! I just created an mdBook backend named mdbook-pdf that generates PDFs using headless Chrome and the Chrome DevTools Protocol's Page.printToPDF. It depends on Google Chrome / Microsoft Edge / Chromium. The generated pages look much like what you get by manually printing print.html to PDF in your browser, or by using the command mentioned here: #88 (comment). In addition, it is automated and lets you customize the PDF paper orientation, the scale of the webpage rendering, paper width and height, page margins, the page ranges to generate, whether to display a header and footer and how they are formatted, and more. It supports every platform where Google Chrome / Microsoft Edge / Chromium works. You can check samples of the generated PDF files in the Artifacts here.

As for the issue aplatypus mentioned above with this method (#88 (comment)): I think the "internal" links inside the book should be handled on the mdbook side when print.html is generated (see here), so that every link pointing inside the book jumps within print.html. Since all the content is already on print.html, there shouldn't be any hyperlinks that jump to other HTML files in the book. With that fixed, links in the generated PDF would also jump internally instead of opening a browser that can't connect to anything.
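To illustrate the kind of rewrite meant here, the following sketch turns an href that points at another page of the book into a same-page fragment. The function name `to_fragment_link` is hypothetical. The rule for links without a fragment (deriving an id from the chapter path) is an assumption made for the example and is not how mdBook names its anchors.

```rust
/// Rewrite a book-internal href so that it stays inside a single page.
/// External links and existing fragments are returned unchanged.
fn to_fragment_link(href: &str) -> String {
    if href.contains("://") || href.starts_with('#') || href.starts_with("mailto:") {
        return href.to_string();
    }
    match href.split_once('#') {
        // "chapter.html#section" -> "#section"
        Some((_page, fragment)) => format!("#{}", fragment),
        // "guide/install.html" -> "#guide-install" (assumed id scheme)
        None => {
            let stem = href.trim_end_matches(".html").trim_start_matches("./");
            format!("#{}", stem.replace('/', "-"))
        }
    }
}
```

One caveat: two chapters can contain headers with the same id, so dropping the page part can make fragments ambiguous unless the ids are made unique first.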

jacobmellin · 2022-06-10T22:36:07Z

Hi, I wrote a quick bash script to generate a PDF from mdBook markdown using pandoc and the Eisvogel Pandoc/LaTeX template. Maybe it is of help to someone:

#!/usr/bin/env bash
# This script converts mdBook markdown output into a pdf using Pandoc/LaTeX and
# the eisvogel pandoc template (https://github.com/Wandmalfarbe/pandoc-latex-template).
# By default, it assumes that the script is put in to a direct subfolder of your
# mdBook project, next to the eisvogel.latex file and your mdBook project root
# contains the book.toml, your markdown sources at ./src and the preprocessed markdown
# will be created in ./book/markdown. Your book.toml file needs to contain the line 
# [output.markdown]
# The path of the resulting pdf file will be ./book/pdf/output.pdf

# Directory that this script is in (e. g. subfolder of PROJECT_DIR)
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

# Project directory (contains book.toml, src folder and book output folder)
# Change this if your script is not inside a subfolder (e.g. 'scripts') of the project directory.
PROJECT_DIR="$( dirname -- "$SCRIPT_DIR" )"

# Pandoc LaTeX template
# This script works with the eisvogel-Template
# (https://github.com/Wandmalfarbe/pandoc-latex-template).
# If you want to use this template, please put the file
# eisvogel.latex in the same directory as this script
# (e.g. $PROJECT_DIR/scripts/eisvogel.latex).
TPL="$SCRIPT_DIR/eisvogel.latex"

# Build markdown
# Ensure that your book.toml contains the line
# [output.markdown]
mdbook build

# Make output and temp folders
mkdir -p "$PROJECT_DIR/book/pdf"
mkdir -p "$PROJECT_DIR/book/markdown-temp/images"

# Copy all images to a single directory
find "$PROJECT_DIR/src" -name '*.png' -exec cp {} "$PROJECT_DIR/book/markdown-temp/images" \;

# Define output file path
OUTPUT_FILE="$PROJECT_DIR/book/markdown-temp/output.md"

# Read meta information from book.toml
CONFIG_FILE_CONTENTS=$( < "$PROJECT_DIR/book.toml" )

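# [^\"]* stops at the first closing quote; a greedy .* would run on to the
# last quote in the file, because the whole book.toml is matched as one string.
[[ $CONFIG_FILE_CONTENTS =~ title\ +=\ +\"([^\"]*)\" ]] \
    && DOCUMENT_TITLE=${BASH_REMATCH[1]}

[[ $CONFIG_FILE_CONTENTS =~ language\ *=\ *\"([a-z]*)\" ]] \
    && DOCUMENT_LANGUAGE=${BASH_REMATCH[1]}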

[[ $CONFIG_FILE_CONTENTS =~ authors\ +=\ +(\[[^\]]+])\ * ]] \
    && DOCUMENT_AUTHORS=${BASH_REMATCH[1]}

# Write the document title and configuration to output file
cat > "$OUTPUT_FILE" << EOF
---
title: ${DOCUMENT_TITLE}
author: ${DOCUMENT_AUTHORS}
date: "11.06.2022"
titlepage: true
fontsize: 10pt
logo: ""
logo-width: 110mm
toc: true
toc-own-page: true
keywords: [Markdown, Example]
...
EOF

# echo -e "# $DOCUMENT_TITLE\n" >> $OUTPUT_FILE

# Read SUMMARY.md, combine output titles and individual .md file contents
# into single output markdown file
while IFS= read -r line
do
    [[ $line =~ Summary ]] && continue
    # Write SUMMARY.md section titles to markdown file
    # [[ $line =~ ^\# ]] && echo -e "$line\n" >> $OUTPUT_FILE
    
    # Combine different markdown files, increasing the section level
    # for each headline
    # [[ $line =~ \((.*\.md)\) ]] \
        # && sed -e 's/^#/##/g' \
            # "$PROJECT_DIR/book/markdown/${BASH_REMATCH[1]}" \
                # >> $OUTPUT_FILE

    # Combine markdown files, leaving the section headings as they are
    if [[ $line =~ \((.*\.md)\) ]]; then
        cat "$PROJECT_DIR/book/markdown/${BASH_REMATCH[1]}" >> "$OUTPUT_FILE"
        # Blank lines keep the next file's heading separate from this file's last paragraph
        echo -e "\n" >> "$OUTPUT_FILE"
    fi
done < "$PROJECT_DIR/src/SUMMARY.md"

# Do pandoc conversion of markdown
cd "$PROJECT_DIR/book/markdown-temp"
pandoc -w latex --template "$TPL" -o ../pdf/output.pdf output.md --number-sections -V lang="$DOCUMENT_LANGUAGE"

hoijui · 2022-06-11T05:12:59Z

The main issue when creating a PDF from mdbook sources is that the Markdown sources are a tree, potentially/likely randomly interlinked (just like HTML, which makes the conversion trivial), while a PDF is a single, linear document.
The main thing to be done there is to get from a tree to a single, linear document. This can be seen in @jacobmellin's script, for example.
I solved this in MoVeDo (a set of scripts abstracting over multiple tools that take MD sources and produce HTML and/or PDF),
also with a Bash script, which considers a few more of the issues (probably not all of them either, though). The script doing this is called linearize. It uses Pandoc filters to shift header levels, removes the individual files' front matter, extracts the titles from there and adds them as headers, prepends header ids with their sanitized source-file path, and rewrites internal links (links between the MD source files) so they still work within the resulting single MD file, plus maybe one or two additional small things. It uses a file called doc.yml as the front matter for the resulting doc.md.
The script relies on other scripts and filters inside MoVeDo, but it is by far the most interesting/useful part of the whole piece of software (maybe the only useful part at all, for anyone but myself). It should probably be extracted/made stand-alone some day.
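As an illustration of one of those steps (shifting header levels when a chapter is nested under another), here is a standalone sketch. It is not the linearize script, which uses Pandoc filters; `shift_headers` is a hypothetical helper that works on raw Markdown lines and assumes ATX headers and fences that are opened and closed with the same marker.

```rust
/// Shift every ATX header in `markdown` down by `levels`, capped at level 6.
/// Lines inside fenced code blocks are left untouched, so shell comments are
/// not mistaken for headers.
fn shift_headers(markdown: &str, levels: usize) -> String {
    // Three backticks, built at runtime so this snippet can itself be fenced.
    let backtick_fence = "`".repeat(3);
    let mut in_fence = false;
    let mut out = Vec::new();
    for line in markdown.lines() {
        let trimmed = line.trim_start();
        if trimmed.starts_with(&backtick_fence) || trimmed.starts_with("~~~") {
            in_fence = !in_fence;
        }
        // '#' is one byte, so the char count is also a valid byte offset.
        let hashes = line.chars().take_while(|&c| c == '#').count();
        let is_header =
            !in_fence && (1..=6).contains(&hashes) && line[hashes..].starts_with(' ');
        if is_header {
            let new_level = (hashes + levels).min(6);
            out.push(format!("{}{}", "#".repeat(new_level), &line[hashes..]));
        } else {
            out.push(line.to_string());
        }
    }
    out.join("\n")
}
```

The shift for each file would typically come from its nesting depth in SUMMARY.md, so that a sub-chapter's top-level header ends up below its parent's.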

jacobmellin · 2022-06-11T20:02:46Z

@hoijui Very nice project, I'll definitely check it out.

aplatypus · 2022-09-07T00:06:10Z

@hoijui ... You could look to the open source Okular tool to see how they load a MD document and render it as PDF.

hoijui · 2022-12-08T18:25:29Z

@aplatypus As I wrote before, the issue is not how to render a single MD file as PDF, that is trivial and possible with many tools and libraries. The issue is, how to convert a tree of Markdown files/documents into a single Markdown file.

ourongxing · 2022-12-20T07:14:29Z

Try to use https://github.com/busiyiworld/web-printer/tree/main/packages/mdbook

HollowMan6 · 2022-12-27T20:30:45Z

mdbook-pdf now supports a Table of Contents, see: HollowMan6/mdbook-pdf#1 (comment)

LegNeato · 2023-11-29T21:07:46Z

If you are looking for pdf output, check out the project I just posted in #815 (comment)

max-heller · 2023-12-23T18:16:24Z

I built mdbook-pandoc, a backend powered by Pandoc. Pandoc is quite mature and supports many output formats, including PDF (I've mainly tested LaTeX) and EPUB. Sample rendered PDF books are here.

azerupi added Feature Request labels Dec 30, 2015

azerupi added Status: Wishlist and removed Status: Feature Request labels Jan 9, 2016

asolove mentioned this issue Jan 11, 2016

doc: provide epub download of The Book rust-lang/rust#20866

Closed

asolove mentioned this issue Jan 12, 2016

WIP on ePub #94

Closed

5 tasks

azerupi added Status: Claimed and removed Status: Wishlist labels Jan 12, 2016

adamgreen mentioned this issue Feb 15, 2019

epub/mobi/pdf version rust-embedded/book#132

Open

XVilka mentioned this issue Jun 6, 2019

Make a pdf version of the std api reference. rust-lang/rust#60577

Closed

heyakyra mentioned this issue Apr 29, 2020

Exporting to PDF and ePub #815

Open

ericonr mentioned this issue Jul 23, 2020

mdbook-latex: Add mdBook backend script. void-linux/void-docs#416

Merged

avivace mentioned this issue Apr 18, 2021

PDF export gbadev-org/gbadoc#10

Open

XVilka mentioned this issue Jun 1, 2021

Consider switching from mdBook to Bookdown rizinorg/book#60

Closed

XVilka mentioned this issue Oct 11, 2021

RFC: reorganizing docs leanprover/lean4#717

Open

HollowMan6 mentioned this issue Jan 30, 2022

Convert all the links in the generated print.html for linking inside the book into URL fragment form #1736

Open

atc0005 mentioned this issue Jun 5, 2023

EPUB export rust-lang/book#3669

Closed
