
Support ebooks and pdf export #88

Open
mdinger opened this Issue Dec 30, 2015 · 47 comments

@mdinger
Contributor

mdinger commented Dec 30, 2015

Gitbook supports export to ebooks and pdfs via calibre. This might be easy to hook into.

See also rust-lang/rust-by-example#684 for problems this implementation creates for rustbyexample.

@azerupi

Collaborator

azerupi commented Dec 30, 2015

I would like to support pdf and ebook format. I think this could already be developed out of tree if you use the Renderer trait from mdBook.

I am not sure I want to depend on a full-blown GUI tool though. There must surely be a better alternative for that.

@mdinger

Contributor

mdinger commented Dec 30, 2015

Not familiar with many conversion tools like this. Pandoc also seems like a plausible option. Don't know of any others.

@azerupi

Collaborator

azerupi commented Dec 30, 2015

Yeah pandoc seems a lot better!

@asolove

Contributor

asolove commented Jan 11, 2016

Did some exploration on this and seems doable. Here's the default epub version of the Rust book. Note the chapters out of order and links not working.

To get good output, I think we would need to:

  • parse the ToC to get the list of md files, in the right order
  • concat and transform the markdown files, replacing file links with internal links
  • match the themes with epub versions of the styles

I'm interested in working on this but will be a bit slow.

Useful info here: Pandoc commands and styling options
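
The first step in the list above (reading the ToC to get the Markdown files in the right order) could look roughly like the sketch below. It is a standalone illustration that uses only the standard library, not mdBook's own parser; the function name `chapter_paths` is hypothetical, and it assumes chapters appear as `- [Title](path.md)` list items.

```rust
/// Collect the chapter paths linked from a SUMMARY.md-style table of
/// contents, in the order they appear.
/// Assumption: every chapter is a list item of the form "- [Title](path.md)".
fn chapter_paths(summary: &str) -> Vec<String> {
    let mut paths = Vec::new();
    for line in summary.lines() {
        // Nested chapters are indented, so ignore leading whitespace.
        let item = line.trim_start();
        if !(item.starts_with("- ") || item.starts_with("* ")) {
            continue;
        }
        // Take the text between "](" and the next ")".
        if let Some(start) = item.find("](") {
            let rest = &item[start + 2..];
            if let Some(end) = rest.find(')') {
                let path = rest[..end].trim();
                // Skip draft entries with an empty target and non-Markdown links.
                if path.ends_with(".md") {
                    paths.push(path.to_string());
                }
            }
        }
    }
    paths
}
```

Links that are not list items (for example a preface placed above the list) are skipped by this sketch and would need extra handling.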

@killercup


killercup commented Jan 11, 2016

  • parse the ToC to get the list of md files, in the right order
  • concat and transform the markdown files, replacing file links with internal links

@asolove, I have implemented this (among other transformations) in https://github.com/killercup/trpl-ebook, feel free to use my code.

@asolove

Contributor

asolove commented Jan 11, 2016

@killercup great, thanks!

@azerupi

Collaborator

azerupi commented Jan 11, 2016

Great! Thanks for doing this :)

parse the ToC to get the list of md files, in the right order

This is already done in the Rust code, the MDBook struct can be iterated on. If you make a new Renderer you have access to that.

concat and transform the markdown files, replacing file links with internal links

Concatenating the markdown files is also not that hard, I do it for the print page.

Replacing the links could be a little trickier. What should internal links look like for pandoc? I know that pulldown-cmark gives you the ability to transform the parsed markdown events before rendering, but it's not well documented. Maybe link replacing is within its capabilities.

Static files, like images, will probably also need some special treatment to be included correctly?


I'm interested in working on this but will be a bit slow.

That is absolutely no problem, there is no rush. I will assign this issue to you so that others can see you are working on it. (can't assign you). If you need any help, feel free to ask here :)

I am also planning on doing a big refactor (#90) to clean up and create a better API. For example, I am thinking about adding a way to poll the MDBook struct for specific chapters, etc. This would make it a lot more flexible for Renderers and if I end up doing something like #93. If you have suggestions or requests that might be relevant, post them in #90 so that I / we can brainstorm and come up with a good design :)

@killercup


killercup commented Jan 11, 2016

Replacing the links could be a little trickier, what should internal links look like for pandoc?

FYI, I'm doing some regex work to transform links relative to the doc.rust-lang.org domain and make reference link names unique for the combined markdown file.

@azerupi

Collaborator

azerupi commented Jan 11, 2016

FYI, I'm doing some regex work to transform links relative to the doc.rust-lang.org domain

let cross_section_link = Regex::new(r"]\((?P<file>[\w-_]+)\.html\)").unwrap();
output = cross_section_link.replace_all(&output, r"](#sec--$file)");

let cross_section_ref = Regex::new(r"(?m)^\[(?P<id>.+)\]:\s(?P<file>[^:^/]+)\.html$").unwrap();
output = cross_section_ref.replace_all(&output, r"[$id]: #sec--$file");

let cross_subsection_link = Regex::new(r"]\((?P<file>[\w-_]+)\.html#(?P<subsection>[\w-_]+)\)").unwrap();
output = cross_subsection_link.replace_all(&output, r"](#$subsection)");

let cross_subsection_ref = Regex::new(r"(?m)^\[(?P<id>.+)\]:\s(?P<file>[^:^/]+)\.html#(?P<subsection>[\w-_]+)$").unwrap();
output = cross_subsection_ref.replace_all(&output, r"[$id]: #$subsection");

Thanks! Does pandoc auto-generate the anchors from the markdown files in those formats? like #sec--$file? Or is that also handled by your code?
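
As a dependency-free illustration of what the two inline-link patterns quoted above do (`file.html` becomes `#sec--file`, and `file.html#subsection` becomes `#subsection`), here is a minimal sketch using only the standard library. The function names are hypothetical and not taken from trpl-ebook, `is_name_char` only approximates the regex class `[\w-_]`, and reference-style definitions (`[id]: file.html`) are not handled.

```rust
/// Roughly the characters the quoted patterns allow in a file or anchor name.
fn is_name_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_' || c == '-'
}

/// Map "file.html" or "file.html#sub" to an in-document anchor.
/// Returns None when the target should be left untouched.
fn internal_target(target: &str) -> Option<String> {
    let (file, anchor) = match target.split_once('#') {
        Some((file, anchor)) => (file, Some(anchor)),
        None => (target, None),
    };
    let stem = file.strip_suffix(".html")?;
    if stem.is_empty() || !stem.chars().all(is_name_char) {
        return None;
    }
    match anchor {
        None => Some(format!("#sec--{}", stem)),
        Some(a) if !a.is_empty() && a.chars().all(is_name_char) => Some(format!("#{}", a)),
        Some(_) => None,
    }
}

/// Rewrite every inline link "](target)" that points at a sibling .html page.
fn rewrite_inline_links(markdown: &str) -> String {
    let mut out = String::with_capacity(markdown.len());
    let mut rest = markdown;
    while let Some(start) = rest.find("](") {
        // Copy everything up to and including "](" unchanged.
        out.push_str(&rest[..start + 2]);
        let after = &rest[start + 2..];
        match after.find(')') {
            Some(end) => {
                let target = &after[..end];
                match internal_target(target) {
                    Some(anchor) => out.push_str(&anchor),
                    None => out.push_str(target),
                }
                out.push(')');
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated link: keep the remainder as it is.
                rest = after;
                break;
            }
        }
    }
    out.push_str(rest);
    out
}
```

External URLs and paths containing a slash fall through unchanged, which mirrors the intent of the quoted patterns.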

@killercup


killercup commented Jan 11, 2016

@azerupi I'm pretty sure pandoc generates those. I've had problems before because pandoc generates slugs in a different way than rustdoc.

It should be possible to add a specific id to each header, though. The syntax is # Header Name {#header-name} IIRC.

You might also want to look at adjust_header_level.rs and adjust_reference_names.rs.
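
To illustrate the explicit-id idea, here is a minimal sketch that appends a `{#id}` attribute to each ATX heading that does not already have one. The names `slugify` and `add_header_ids` are hypothetical, and the slug rule is an assumption made for the example; it matches neither pandoc's nor rustdoc's algorithm exactly. Fenced code blocks are skipped because Rust snippets often contain lines starting with `#`.

```rust
/// Hypothetical slug rule: lowercase, keep alphanumerics, join words with '-'.
fn slugify(title: &str) -> String {
    let mut slug = String::new();
    for c in title.chars() {
        if c.is_alphanumeric() {
            slug.extend(c.to_lowercase());
        } else if !slug.is_empty() && !slug.ends_with('-') {
            slug.push('-');
        }
    }
    slug.trim_end_matches('-').to_string()
}

/// Append an explicit `{#id}` attribute to ATX headings that lack one.
/// Note: a trailing newline in the input is not preserved.
fn add_header_ids(markdown: &str) -> String {
    let mut in_fence = false;
    let mut out = Vec::new();
    for line in markdown.lines() {
        // Toggle on every fence line so '#' lines inside code are ignored.
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence;
        }
        let hashes = line.chars().take_while(|&c| c == '#').count();
        let is_heading = !in_fence
            && (1..=6).contains(&hashes)
            && line[hashes..].starts_with(' ');
        if is_heading && !line.trim_end().ends_with('}') {
            let title = line[hashes..].trim();
            out.push(format!("{} {{#{}}}", line.trim_end(), slugify(title)));
        } else {
            out.push(line.to_string());
        }
    }
    out.join("\n")
}
```

Generating the ids up front like this means link targets no longer depend on how the converter derives slugs.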

@azerupi

Collaborator

azerupi commented Jan 11, 2016

Ok thanks for all the information, this will probably help @asolove a lot! :)

@asolove referenced this issue Jan 12, 2016: WIP on ePub #94 (Closed, 0 of 5 tasks complete)

@cetra3

Contributor

cetra3 commented Jan 12, 2016

Not sure if this will help you guys, but I've created a simple Rust tool that collates multiple markdown files into one, resolving internal links and turning them into anchor links.

We can use this in a pipeline on the way to converting to PDF:

mdcollate book-example/src/SUMMARY.md | pulldown-cmark > test.html && wkhtmltopdf test.html test.pdf

Code can be found here:
https://github.com/cetra3/mdcollate

Happy to accept any PRs

@azerupi

Collaborator

azerupi commented Jan 12, 2016

@cetra3 That is really cool!
The plan is to make a "renderer" that does everything so that it can be used with the mdbook build command. So using a command line tool adds some complications. Have you thought about exposing the functionality as a crate?

I am not sure I would add a dependency just for that functionality, because there is always the possibility that it will not be maintained actively. But it could be considered if it offers enough useful methods that we wouldn't have to reinvent here.

@mkpankov


mkpankov commented Jan 12, 2016

I'm also sceptical about Calibre. We use it in the Russian translation of TRPL and we've come across several problems with EPUB (links are to descriptions in Russian, for reference):

@azerupi

Collaborator

azerupi commented Jan 12, 2016

Thanks for sharing your experience :)
We will see if pandoc has the same problems, but I think @killercup used it without too much / any problems?

I also vaguely remember we had to hack styles in order to get better PDF. Not sure if it's necessary or not with Pandoc

I am not sure how this is handled with Pandoc, but having a custom theme could be a good thing.

@cetra3

Contributor

cetra3 commented Jan 13, 2016

It's probably possible to wrap up those command line tools into a combined tool or expose it as a Rust library. The last component (HTML to PDF) would need to use FFI, as wkhtmltopdf is written in C++ (it exposes a C API). Not sure whether this adds too much dependency on externalities though.

The complication arises in that markdown is a superset of HTML which means that you need something that can present HTML in a printable fashion. In my experience with this problem, Pandoc and Calibre will do a subset, but you won't get full parity.

@killercup


killercup commented Jan 13, 2016

There are a few things to be aware of, but in general pandoc is really amazing at converting Markdown to LaTeX. Which is what you want, I think—it has some very nice features that you currently can't get with HTML-to-PDF converters. For example, my PDF versions of the Rust Book include cross-references like "This is a mutable variable binding (section 5, page 163)".

If you're no LaTeX wizard (I'm not), you might want to look at this template I threw together.

If you have any issues with this, just mention me.

@azerupi

Collaborator

azerupi commented Jan 13, 2016

Thanks for all your help Pascal!
I will definitely look at what you have currently running and I am pretty sure we will end up stealing a lot of your code (if that is ok with you) 😉

@gambhiro

Contributor

gambhiro commented Aug 8, 2016

+1 for the effort, I am looking forward to using mdbook to produce ebooks.

It seems to have stalled a bit, is anyone currently working on this?

@azerupi

Collaborator

azerupi commented Aug 8, 2016

It seems to have stalled a bit, is anyone currently working on this?

Indeed, it has stalled a bit. In the last 6 months I have been overwhelmed with work at school 😕

I am (very) slowly working on the refactoring / clean-up that I wanted to do. And that work is probably going to change the way this specific feature is going to be implemented. Hopefully I will have some time in September to make significant progress on the internal rewrite so that I can work on new features again.

@gambhiro

Contributor

gambhiro commented Aug 9, 2016

@azerupi How much space is there for discussing this feature? There are some specific things I would be looking for in a CLI ebook helper, but maybe you have already decided which way to go.

Some time ago I wrote prophecy, a Ruby gem to automate the tasks I needed when producing ebooks. This is an example of the output. It has been very useful for me, but I believe I am the only user :)

I have been wanting to rewrite it with some of the hindsight since its early days, but when I saw this I thought maybe mdbook would be able to produce the same results.

There is an asciinema recording to see the sort of things it does.

@azerupi

Collaborator

azerupi commented Aug 9, 2016

I'm open to all ideas :)

@Luthaf


Luthaf commented Jan 16, 2017

Would you be interested in a PDF renderer using LaTeX, once #200 is merged? I could try to contribute this. I am using mdbook to build the user manual for my project, and I'd like to have a PDF version of it.

@gambhiro

Contributor

gambhiro commented Jan 17, 2017

@Luthaf That would be mighty good. I am also interested in using mdbook for producing LaTeX files. Would you mind opening a separate issue for it and outline what you have in mind?

The options I see are either to depend on the pandoc crate (and hence on the user installing pandoc), or to take a token list as it comes from pulldown-cmark and write an exporter to LaTeX (crowbook does that in latex.rs).

Pandoc is very reliable and comprehensive. The user would have to install LaTeX anyway, and at that point it's not a big deal to also install pandoc.

The advantage of writing a token exporter is that it allows for custom typographic processing (recognising the first word of a paragraph, etc.), and I imagine we are not so concerned with that.

@azerupi

Collaborator

azerupi commented Jan 17, 2017

@Luthaf Yes, very interested!
I think the first step is to implement a renderer for pulldown-cmark to convert from markdown to latex instead of html. I am not sure how stable the pulldown-cmark API is. This can be done independently of #200 and it would be useful for more than just mdBook :)

@Luthaf


Luthaf commented Jan 17, 2017

I think the first step is to implement a renderer for pulldown-cmark to convert from markdown to latex instead of html.

That was basically my idea. pandoc looks nice (I've never used it), but why add another dependency if we can easily go without it? I'll try to get a poc working, and will open an issue here about this.

@gambhiro

Contributor

gambhiro commented Jan 17, 2017

if we can easily go without it

Tables with headings and rules, smart quoting, tex ligatures, ampersands and other specials, list within quotes... there are so many things to think about.
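
To make just one item on that list concrete ("ampersands and other specials"), here is a minimal sketch of escaping LaTeX's special characters in running text. The function name `escape_latex` is hypothetical; it covers only the ten characters LaTeX reserves and says nothing about tables, smart quotes, or ligatures.

```rust
/// Escape the characters that LaTeX treats specially in running text.
fn escape_latex(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            // These seven only need a leading backslash.
            '&' | '%' | '$' | '#' | '_' | '{' | '}' => {
                out.push('\\');
                out.push(c);
            }
            // These three need a named command instead.
            '\\' => out.push_str("\\textbackslash{}"),
            '~' => out.push_str("\\textasciitilde{}"),
            '^' => out.push_str("\\textasciicircum{}"),
            _ => out.push(c),
        }
    }
    out
}
```

Text inside verbatim or code environments must not be escaped this way, which is one more case a hand-written exporter has to track.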

@Luthaf


Luthaf commented Jan 17, 2017

Tables with headings and rules, smart quoting, tex ligatures, ampersands and other specials, list within quotes... there are so many things to think about.

Yep, that's true =/. I'll give it a try to grasp the internals of the code, and I'll check how hard this is. If I cannot make it work, I'll check how I can make this work on Windows/Linux/OS X by calling a user-provided pandoc.

Do you know of any test suite for markdown files? Maybe pandoc's would be a good listing of all the weird cases.

@gambhiro

Contributor

gambhiro commented Jan 17, 2017

I'd say just adopt pandoc. Even if you did without it, the PDF feature is still not stand-alone, because there is always the assumption that the user has gone through the trouble of installing a LaTeX distribution.

There exists a spec for CommonMark and there is Multimarkdown's test suite. 'Simple' is not the right word for it...

@gambhiro

Contributor

gambhiro commented Jan 17, 2017

@Luthaf for getting familiar with the mdbook code, I recommend starting with the multilang branch. It uses a different representation of the book's data than the current master. It will already build books, and if you discover bugs, that would help with #200.

@vandenoever


vandenoever commented Sep 22, 2017

A very naive conversion that works pretty nicely for the Rust book, second edition:

cd second-edition/src
pandoc -o rust.epub [Sc]*.md a*.md
@dkotrada


dkotrada commented Sep 22, 2017

@budziq

Collaborator

budziq commented Sep 23, 2017

@vandenoever thanks, this seems similar to what @killercup had provided in https://github.com/killercup/trpl-ebook

@dkotrada tectonic looks interesting! I understand that it does require third-party tools to convert TeX/LaTeX to PDF. Am I right?

@dkotrada


dkotrada commented Sep 23, 2017

@budziq yes you are right.

@Michael-F-Bryan

Collaborator

Michael-F-Bryan commented Dec 13, 2017

As a bit of an update on this front, I've started working on adding the ability to use alternative backends (#507). It's still very much in the experimental phase and nothing has been merged yet, but I've already got an EPUB renderer and it generates fairly reasonable outputs.

Under the hood we use @lise-henry's epub-builder crate (awesome crate by the way) to generate an EPUB document, with mdbook calling the mdbook-epub renderer and passing in a copy of the entire book's contents, as well as configuration, via stdin.

This means backends aren't limited to just Rust. For example, you could use a Python script that passes all Rust snippets to rustc or rustdoc and makes sure they work (like the builtin mdbook test command), with the "rendered output" just being a pass or fail.
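
A rough sketch of the backend side of that arrangement: a program that reads whatever the caller sends on stdin and reports a pass or fail instead of rendering files. The function name `run_backend` is hypothetical, and because the serialization format is not spelled out in this comment, the payload is treated as opaque text with a placeholder check.

```rust
use std::io::{self, Read, Write};

/// Hypothetical backend entry point: read the whole payload sent by the
/// caller, then write a pass/fail verdict to `report`.
fn run_backend<R: Read, W: Write>(mut input: R, mut report: W) -> io::Result<bool> {
    let mut payload = String::new();
    input.read_to_string(&mut payload)?;
    // Placeholder check (an assumption for this sketch): a real backend
    // would deserialize the book and configuration here and act on them.
    let ok = !payload.trim().is_empty();
    writeln!(report, "{}", if ok { "pass" } else { "fail" })?;
    Ok(ok)
}
```

A `main` function would typically call `run_backend(io::stdin().lock(), io::stderr())` and turn the result into the process exit code.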

@dustinmatlock


dustinmatlock commented Dec 16, 2017

Probably the easiest way: print at the top right and save as PDF. Of course, this doesn't solve the EPUB issue, but programming books are sometimes best viewed in PDF. The book's website looks great on mobile.

Print Rust PDF
