Export notebooks to docx, and epub #229

choldgraf · 2019-07-11T13:07:35Z

It would be useful if Jupyter Book could easily create the following things in addition to HTML-based versions of notebooks/md files:

~~PDF documents: see Export pages to PDF #267~~
Word documents
epub documents
latex documents (and maybe PDF from latex)

Users could then have a "download" button on each page that lets them download that page's content as each of these formats.

Let's keep this issue open to track which of these, if any, have been implemented.

nozebacle · 2019-07-29T02:00:32Z

This sounds like a very interesting addition that I'd love to see.
In my particular case, even though I like the online version, I can see some of my readers preferring a pdf version, and even printing it so they can write all over it.

choldgraf · 2019-09-17T04:09:39Z

PDF printing is now in the master branch of Jupyter Book! Updating this issue accordingly.

NatalieZelenka · 2019-11-09T16:18:04Z

Hey, are there any plans/existing functionality for downloading the whole book as a PDF (as opposed to individual pages)? I feel like it would make it possible for students to write their Masters/PhD theses in Jupyter Books, which would be amazing.

choldgraf · 2019-11-09T23:52:30Z

@NatalieThurlby definitely a goal for this project, just a matter of hours in the day :-)

lu-kas · 2019-11-20T20:12:55Z

Being able to download the whole book as a single (PDF) document would be really great! Is there maybe another way to create a PDF for a whole project while building the HTML pages? I.e., not necessarily having a download button, but a precompiled PDF file that can be distributed some other way.

pgierz · 2019-12-16T10:23:24Z

Can I add LaTeX export to the list? That'd be really cool for writing publications!

choldgraf · 2019-12-17T00:08:14Z

done!

pgierz · 2019-12-17T10:22:43Z

@choldgraf if you point me in the direction where the download button calls its actual commands, I can have a crack at this over the holidays as some hobby-programming. If it's just nbconvert calls I think I might be able to get the word export working. I already know how to do that from the command line...

choldgraf · 2019-12-17T17:49:33Z

The trickiest thing is that right now the "download" button will only do one of the following two things:

Download the raw content file (ipynb, md, etc) used to generate a page
Use PrintJS to convert the current page's content to PDF

Supporting latex would require supporting the idea of a static file that is generated along with each page. I don't think it could be done on the client side (as the PrintJS library does). That might be a little bit tricky, but doable. It may just take a bit of extra work to figure out the right pattern to follow there.

LinkHS · 2020-02-15T07:43:33Z

> @NatalieThurlby definitely a goal for this project, just a matter of hours in the day :-)

Hi @choldgraf , has this feature, exporting all pages to a pdf, been done? If not, is there any plan or update?

choldgraf · 2020-02-15T22:57:31Z

Nope, not yet - it is partially on-hold pending some broader refactoring of the project (with the goal of making things like PDF output easier)...but it is still on the radar, just unsure how long it'll take to be implemented.

pgierz · 2020-03-17T09:14:34Z

@choldgraf I have some time to look into this now. You mentioned:

> Supporting latex would require supporting the idea of a static file that is generated along with each page

I think this shouldn't be too tricky. We could use either pandoc (for non-notebook files) or nbconvert (for notebooks), both of which have Python interfaces. See here for pandoc. Nbconvert is already used anyway...
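A minimal sketch of that idea, not Jupyter Book's actual build code: pick the converter from the file extension and return the command that would write a static LaTeX file next to the page source. The helper name `latex_export_command` and the output naming are assumptions made for illustration; the command lines themselves only use the standard `jupyter nbconvert --to latex` and `pandoc -s ... -o ...` forms.

```python
from pathlib import Path
from typing import List


def latex_export_command(source: str) -> List[str]:
    """Return the command that would convert one page source to LaTeX.

    Hypothetical helper: notebooks go through nbconvert, every other
    file type goes through pandoc. Nothing is executed here.
    """
    path = Path(source)
    if path.suffix == ".ipynb":
        # nbconvert writes <stem>.tex next to the notebook by default.
        return ["jupyter", "nbconvert", "--to", "latex", str(path)]
    # pandoc infers the output format from the .tex extension;
    # -s asks for a standalone document rather than a fragment.
    return ["pandoc", "-s", str(path), "-o", str(path.with_suffix(".tex"))]
```

A build step could pass each returned list to `subprocess.run(cmd, check=True)` while it renders the HTML for the same page, so the download button has a real file to point at.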

I just forked the code and am looking through it. I think this needs to happen somewhere in the build.py, but I'm not entirely sure. Is there any other location where static files could be made?

choldgraf · 2020-03-17T14:54:19Z

@pgierz thanks for offering to look into this! We are actually in the process of working on a backend-switch for jupyter book, and this might make it easier to output different kinds of formats. We are building off of a tool called Sphinx, which is kind of like Pandoc but written in Python. If you're interested, the docs for the new version are here: https://beta.jupyterbook.org/intro.html and more general information about the project is here: https://ebp.jupyterbook.org/en/latest/

I'm happy to step you through the new codebase and explain stuff if you'd like! I think that enhancements to Jupyter Book will be more impactful if they are on the new tool-chain.

pgierz · 2020-03-18T06:55:49Z

Hi @choldgraf, I'm familiar with Sphinx already (at least a little bit). I'll download the new code and poke around a little bit. Can I let you know if I get stuck or have any questions?

pgierz · 2020-03-19T11:26:17Z

So, I was able to make some progress!

I added download buttons for latex and docx to the HTML template. Currently, they don't point to anything, but that's OK for now.
I extended the jupyter-book build command to automatically also generate LaTeX (this works already, since it's built into Sphinx) and found a docx extension for sphinx (here: https://docxbuilder.readthedocs.io/en/latest/index.html). I also added docx to the MyST-NB transform.py render priority. So far, it's like this:

WIDGET_VIEW_MIMETYPE = "application/vnd.jupyter.widget-view+json"
RENDER_PRIORITY = {
    "html": [
        WIDGET_VIEW_MIMETYPE,
        "application/javascript",
        "text/html",
        "image/svg+xml",
        "image/png",
        "image/jpeg",
        "text/latex",
        "text/plain",
    ],
    "latex": ["text/latex", "text/plain"],
    # PG: Not sure about this...
    "docx": ["text/plain"]
}
RENDER_PRIORITY["readthedocs"] = RENDER_PRIORITY["html"]
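For readers unfamiliar with this kind of table, here is a rough sketch of how a render-priority mapping is typically consumed: for a given builder, the first mimetype in its list that the cell output actually provides is the one that gets rendered. This is an illustration only and not MyST-NB's implementation; `pick_mimetype` is a hypothetical name and the table is shortened.

```python
from typing import Dict, List, Optional

# Shortened copy of the table above, for illustration only.
RENDER_PRIORITY: Dict[str, List[str]] = {
    "html": ["text/html", "image/png", "text/latex", "text/plain"],
    "latex": ["text/latex", "text/plain"],
    "docx": ["text/plain"],
}


def pick_mimetype(output_data: Dict[str, object], builder: str) -> Optional[str]:
    """Return the first mimetype in the builder's priority list that the output provides."""
    for mimetype in RENDER_PRIORITY.get(builder, []):
        if mimetype in output_data:
            return mimetype
    # No renderable representation for this builder.
    return None
```

Under this reading, a `docx` list containing only `text/plain` would leave image-only outputs (for example a plot stored as `image/png`) with nothing to render, which may be one reason to extend that entry later.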

However, now I'm running into errors:

Running Sphinx v2.4.4
Adding copy buttons to code blocks...
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [docx]: pass
updating environment: [config changed ('author')] 30 added, 0 changed, 0 removed
checking for /Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/references.bib in bibtex cache... up to date
checking for /Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/mdrefs.bib in bibtex cache... up to date
reading sources... [100%] test_pages/test
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
processing JupyterBook.docx... 
resolving references.../Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/guide/03_build.md:11: WARNING: None:any reference target not found: 02_create
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/guide/04_publish.md:12: WARNING: None:any reference target not found: 03_build

writing... WARNING: Missing refuri :guide/old_docs/features/titles
WARNING: Missing refuri :guide/features/hiding
WARNING: Missing refuri :guide/old_docs/features/interactive_cells
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/markdown.md:: WARNING: Not support remote image files yet
WARNING: Missing refuri :features/features/myst#project-jupyter-proc-scipy-2018
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
WARNING: Missing refuri :features/old_docs/features/layout
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/hiding.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/hiding.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/hiding.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/hiding.ipynb:: WARNING: Ignore unknown node CellNode
WARNING: Missing refuri :features/features/citations#holdgraf-rapid-2016
WARNING: Missing refuri :features/features/citations#holdgraf-evidence-2014
WARNING: Missing refuri :features/features/citations#holdgraf-portable-2017
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Not support remote image files yet
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/code.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/limits.md:209: WARNING: Not support remote image files yet
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/limits.md:212: WARNING: Not support remote image files yet

Exception occurred:
  File "/Users/pgierz/opt/miniconda3/envs/jbook_dev/lib/python3.7/site-packages/docxbuilder/writer.py", line 1196, in visit_Text
    self._doc_stack[-1].add_text(node.astext())
AttributeError: 'Table' object has no attribute 'add_text'
The full traceback has been saved in /var/folders/nn/sdjny2nn5v338x7w6q999yt00000gp/T/sphinx-err-71o5gf09.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!

I guess this comes out of the docx-converter but maybe my changes for MyST-NB aren't complete yet...?

I've forked everything and will check all the changes into branches labeled docx. @choldgraf If you take a look, I'd be grateful as I'm a bit stuck at the moment...

pgierz · 2020-03-19T11:26:46Z

Also, maybe this issue should move somewhere to the ExecutableBookProject repos so the discussion is a bit clearer...

choldgraf · 2020-03-19T16:11:18Z

Wow, thanks very much for the quick push on this! I agree we should add this to the MyST-NB repository for now so we can track progress (or maybe jupyter-sphinx, but let's start w/ myst-nb). There is also another team of folks who will work on a first-class latex -> pdf builder as well. As you mention, myst-nb is definitely still in a formative stage, there is much to be done!

Wanna open an issue about DOCX in myst-nb and paste in your comments above?

pgierz · 2020-03-20T07:28:38Z

OK, done! :-)

drscotthawley · 2020-05-16T01:13:33Z

+1 for EPUB support!
New user here, very excited upon hearing about "jupyter book" as I'm writing an interactive book using EPUB3's support for JavaScript, plus I love writing in Markdown and using Jupyter notebooks. I was expecting EPUB and MOBI to be the intended output formats.

Now am trying to figure out what is meant by "book":
So "book" = "series of web pages"?
...And then you can print a PDF of an individual web page (which our browsers already do)?
...And then a publisher is supposed to use this how?

So, how is jupyter book different from a Jekyll-based blogging platform, such as fastpages?

I understand that you're in beta, so I'm not trying to take anything away from your hard work; I'm very honestly just not "getting" it and I want to. Thanks.

choldgraf · 2020-05-16T01:28:01Z

in fact Jupyter Book used to use Jekyll :-)

mostly we're defining book as "a way of packaging multiple [notebooks/md files/etc] together in a bundle" with the goal of outputting many kinds of bundles.

Fastpages is also great - though I think of it more as a quick way to have a blog, as opposed to a book. Major differences with Jupyter Book (other than obvious things like design), would be more "book-like" features such as cross-references, citations, figures, injecting notebook outputs into pages, etc. And yes, one day we want to support more outputs like EPUB and docx

drscotthawley · 2020-05-16T01:34:03Z

Thanks for the clarification @choldgraf !

drscotthawley · 2020-05-16T19:38:11Z

Last night I spent some time looking into pandoc's EPUB support and softcover.io's build pipeline.

Probably you already know all this, but what I found was:

Trying to use Pandoc to directly convert one of the example jupyter book .ipynb files (python_by_example.ipynb) resulted in so much memory overhead that my machine (with 64 GB of RAM!) locked up and the kernel auto-killed it. Repeatedly.

         $ pandoc -s python_by_example.ipynb -o test.epub
         (wait a really long time)
         Killed
         $

Using nbconvert to generate LaTeX and then use Pandoc to convert LaTeX to EPUB resulted in conflicts between the tcolorbox environment and the Verbatim environment:

      $ jupyter nbconvert --to latex python_by_example.ipynb 
      $ pandoc -s python_by_example.tex -o test.epub
        Error at "source" (line 421, column 15):
        expecting \end{tcolorbox}
        \end{Verbatim}
                     ^

^My attempts to resolve that were unsuccessful, despite spending time perusing related issues on TeX StackExchange.

The Pandoc "progit" example of generating EPUB from a series of Markdown files worked fine. But these weren't .ipynb files, and didn't have any figures or 'advanced' executable/interactive features. So, perhaps that's not very relevant.

So...I'm sure if these things had worked you would have done them already. Just noting what a few of the barriers might be to 'easy' fulfillment of this feature request.

choldgraf · 2020-05-17T01:26:40Z

Thanks for looking into all that, and for reporting back! I believe that Sphinx also has its own EPUB builder that we might be able to piggy-back off of (since Jupyter Book now uses Sphinx under the hood). I'm not familiar with it but I know that readthedocs has some of this functionality which uses Sphinx too. E.g.: https://www.sphinx-doc.org/en/master/faq.html?highlight=epub#epub-info

ScriptAutomate · 2020-08-20T06:55:05Z

Potential Quick Win Paths

StackOverflow seems to have potential directions for getting what you need:

How to create a PDF-out-of-Sphinx-documentation-tool
- An updated gist that may be more relevant: how-to.txt: pdf output from sphinx with rst2pdf

The top solution uses the rst2pdf python package with Sphinx projects. There are other listed answers that may be of help.

I haven't tested rst2pdf, but it may be the lowest barrier to entry, and I've seen some people using rst2pdf in the Write The Docs community. If you all aren't already there, you should definitely join their Slack workspace.

The official Sphinx docs also point to rinohtype as a potential alternative to the latex-backend for generating PDFs from Sphinx projects.

A More Complicated Path

I have used Sphinx to build PDFs of the SaltStack documentation. Sphinx does also have the ability to export EPUB, but I do not have experience with it.

It does need to be generated from source, much like one generates the HTML (make html vs something like make pdflatex or make pdf). For an example of the power of this, here is a Sphinx generated site, along with a download of the PDF that gets autogenerated whenever the site content is updated:

https://enterprise.saltstack.com
Download the 6.3 PDF version of enterprise.saltstack.com (as the older versions were Jekyll md sites converted to PDFs in not-ideal fashion)

Much of the source is private, other than what you can see with Show Source buttons on the site, which shows raw rst it had converted. But, SaltStack also has a variety of open-sourced projects. I'm digging into our open-sourced tooling for the Salt docs (one of the biggest open source projects on GitHub, and made it onto this interesting list of GitHub repos), which I am less versed in, but anyone here can take a look at it if they'd like. I need to collaborate with some people to document how the build pipeline and tooling works when it comes to the docs, and/or try to create a GitHub template repo that is bootstrapped for PDF/EPUB gen. Here are the links:

Salt docs HTML site build result from Sphinx and autodoc: https://docs.saltstack.com/en/latest/topics/index.html
- Source code repo: https://github.com/saltstack/salt/tree/master/doc
- PDF download (over 5,000 pages because of how massive the Salt project is, and all the autodoc results pulled in from docstrings!)
- ePUB download
Separate source repository that houses all the docs build tools, container(s), and build pipeline: https://gitlab.com/saltstack/open/docs/builddocs
- CI pipeline config, including jobs specific to PDF and EPUB generation (the build jobs for a PDF or EPUB take 9 - 14 minutes, granted, this is over 5000 pages of content from straight rst and also autodoc extracted docstrings from Python code): https://gitlab.com/saltstack/open/docs/builddocs/-/blob/master/.gitlab-ci.yml#L53-87

Though, this did mean needing to get containers involved that brought in all the LaTeX requirements beforehand, and for the CI jobs to pull the containers down. They are part of the build pipelines and make life easy, though it then complicates the build process for newcomers and for new projects working to mimic the functionality. sphinx-build supports converting to LaTeX, and then to PDF: https://www.sphinx-doc.org/en/master/usage/builders/index.html#sphinx.builders.latex.LaTeXBuilder

Not sure if starting with something like https://hub.docker.com/r/tk0miya/sphinx-pdf could make things easier or not?

Source code of docker images: https://github.com/tk0miya/sphinx-docker

Since EPB / Jupyter Book is going the Sphinx route, it does also mean that if anyone is using readthedocs.io for hosting their Sphinx output, they include pipeline jobs to help autocreate PDFs and ePUBs of the docs, too, as part of their free hosting: https://docs.readthedocs.io/en/stable/features.html#downloadable-documentation -- which could help make life easier for people who aren't wanting to implement their own pipelines.
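As a rough sketch of what "generated from source" means here, the same Sphinx project can be built once per output format by changing only the builder name passed to `sphinx-build -b`. The builder names `html`, `latex`, and `epub` are Sphinx's built-in ones; the helper name and the `docs` / `_build` directories are assumptions for illustration.

```python
from typing import List


def sphinx_build_command(builder: str, source_dir: str, build_dir: str) -> List[str]:
    """Return a sphinx-build invocation for the given builder name (hypothetical helper)."""
    # Each builder gets its own output subdirectory, e.g. _build/latex.
    return ["sphinx-build", "-b", builder, source_dir, f"{build_dir}/{builder}"]


# One command per output format; nothing is executed here.
commands = [
    sphinx_build_command(name, "docs", "_build")
    for name in ("html", "latex", "epub")
]
```

Note that the `latex` builder only writes `.tex` sources; turning them into a PDF still needs a LaTeX toolchain, which is why the pipelines described above rely on containers with LaTeX preinstalled.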

jtbayly · 2020-11-05T13:48:25Z

Just wanted to say that without epub, mobi and PDF support, I don't really see myself using this. Hoping these things can be added.

jtbayly · 2021-03-31T02:52:52Z

So now that PDF is supported (in two ways), how difficult is it going to be to add ePub, for example? Is it just a matter of copying what was done for PDF but using the sphinx.builders.epub3.Epub3Builder class?

choldgraf mentioned this issue Jul 11, 2019

Is it possible to export Word, PDF and other formats? #228

Closed

choldgraf added the enhancement New feature or request label Jul 11, 2019

choldgraf changed the title ~~Export notebooks to PDF, docx, and epub~~ Export notebooks to docx, and epub Sep 17, 2019

pgierz mentioned this issue Mar 20, 2020

Support for DOCX executablebooks/MyST-NB#90

Open

patrickmineault mentioned this issue Sep 4, 2021

Add documentation on how to use custom-builder to generate epub #1451

Merged

mmcky closed this as completed in #1451 Oct 14, 2021
