Export notebooks to docx, and epub #229

Closed
choldgraf opened this issue Jul 11, 2019 · 26 comments · Fixed by #1451
Labels
enhancement New feature or request

Comments

@choldgraf
Member

choldgraf commented Jul 11, 2019

It would be useful if Jupyter Book could easily create the following things in addition to HTML-based versions of notebooks/md files:

Users could then have a "download" button on each page that lets them download that page's content as each of these formats.

Let's keep this issue open to track which if any of these have been implemented.

@nozebacle

This sounds like a very interesting addition that I'd love to see.
In my particular case, even though I like the online version, I can see some of my readers preferring a pdf version, and even printing it so they can write all over it.

@choldgraf choldgraf changed the title Export notebooks to PDF, docx, and epub Export notebooks to docx, and epub Sep 17, 2019
@choldgraf
Member Author

PDF printing is now in the master branch of jupyter book! updating this issue accordingly

@NatalieZelenka

Hey, are there any plans/existing functionality for downloading the whole book as a PDF (as opposed to individual pages)? I feel like it would make it possible for students to write their Masters/PhD theses in Jupyter Books, which would be amazing.

@choldgraf
Member Author

@NatalieThurlby definitely a goal for this project, just a matter of hours in the day :-)

@lu-kas

lu-kas commented Nov 20, 2019

Being able to download the whole book as a single (PDF) document would be really great! Is there maybe another way to create a PDF for a whole project while building the HTML pages? I.e., not necessarily a download button, but a precompiled PDF file that can be distributed by other means.

@pgierz

pgierz commented Dec 16, 2019

Can I add LaTeX export to the list? That'd be really cool for writing publications!

@choldgraf
Member Author

done!

@pgierz

pgierz commented Dec 17, 2019

@choldgraf if you point me in the direction where the download button calls its actual commands, I can have a crack at this over the holidays as some hobby-programming. If it's just nbconvert calls I think I might be able to get the word export working. I already know how to do that from the command line...

@choldgraf
Member Author

The trickiest thing is that right now the "download" button will only do one of the following two things:

  1. Download the raw content file (ipynb, md, etc) used to generate a page
  2. Use PrintJS to convert the current page's content to PDF

Supporting latex would require supporting the idea of a static file that is generated along with each page; I don't think it could be done on the client side (as the PrintJS library does). That might be a little bit tricky, but doable. It may just take a bit of extra work to figure out the right pattern to follow there.

@LinkHS

LinkHS commented Feb 15, 2020

@NatalieThurlby definitely a goal for this project, just a matter of hours in the day :-)

Hi @choldgraf , has this feature, exporting all pages to a pdf, been done? If not, is there any plan or update?

@choldgraf
Member Author

Nope, not yet - it is partially on-hold pending some broader refactoring of the project (with the goal of making things like PDF output easier)...but it is still on the radar, just unsure how long it'll take to be implemented.

@pgierz

pgierz commented Mar 17, 2020

@choldgraf I have some time to look into this now. You mentioned:

Supporting latex would require supporting the idea of a static file that is generated along with each page

I think this shouldn't be too tricky. We could use either pandoc (for non-notebook files) or nbconvert (for notebooks), both of which have Python interfaces. See here for pandoc. Nbconvert is already used anyway...

I just forked the code and am looking through it. I think this needs to happen somewhere in the build.py, but I'm not entirely sure. Is there any other location where static files could be made?
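A minimal sketch of the "static file per page" idea, assuming the converter is chosen by file extension as described above. The helper name `export_command` and the output directory are hypothetical and not part of Jupyter Book; the sketch only builds the command line and does not run it.

```python
from __future__ import annotations

from pathlib import Path


def export_command(source: Path, out_dir: Path) -> list[str]:
    """Return the command that would convert one page to LaTeX.

    Hypothetical helper for illustration; not part of Jupyter Book.
    """
    if source.suffix == ".ipynb":
        # Notebooks go through nbconvert, which names the output after the notebook.
        return [
            "jupyter", "nbconvert", "--to", "latex",
            f"--output-dir={out_dir}", str(source),
        ]
    # Everything else (e.g. Markdown) goes through pandoc as a standalone document.
    target = out_dir / (source.stem + ".tex")
    return ["pandoc", "-s", str(source), "-o", str(target)]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for page in [Path("intro.md"), Path("features/notebooks.ipynb")]:
        print(" ".join(export_command(page, Path("_build/downloads"))))
```

A build step could pass each returned list to `subprocess.run` while generating the HTML, so that the download button only has to link to a file that already exists.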

@choldgraf
Member Author

@pgierz thanks for offering to look into this! We are actually in the process of working on a backend-switch for jupyter book, and this might make it easier to output different kinds of formats. We are building off of a tool called Sphinx, which is kind of like Pandoc but written in Python. If you're interested, the docs for the new version are here: https://beta.jupyterbook.org/intro.html and more general information about the project is here: https://ebp.jupyterbook.org/en/latest/

I'm happy to step you through the new codebase and explain stuff if you'd like! I think that enhancements to Jupyter Book will be more impactful if they are on the new tool-chain.

@pgierz

pgierz commented Mar 18, 2020

Hi @choldgraf, I'm familiar with Sphinx already (at least a little bit). I'll download the new code and poke around a little bit. Can I let you know if I get stuck or have any questions?

@pgierz

pgierz commented Mar 19, 2020

So, I was able to make some progress!

  1. I added download buttons for latex and docx to the HTML template. Currently, they don't point anywhere, but that's OK for now.

  2. I extended the jupyter-book build command to automatically also generate LaTeX (this works already, since it's built into Sphinx) and found a docx extension for sphinx (here: https://docxbuilder.readthedocs.io/en/latest/index.html). I also added docx to the MyST-NB transform.py render priority. So far, it's like this:

WIDGET_VIEW_MIMETYPE = "application/vnd.jupyter.widget-view+json"
RENDER_PRIORITY = {
    "html": [
        WIDGET_VIEW_MIMETYPE,
        "application/javascript",
        "text/html",
        "image/svg+xml",
        "image/png",
        "image/jpeg",
        "text/latex",
        "text/plain",
    ],
    "latex": ["text/latex", "text/plain"],
    # PG: Not sure about this...
    "docx": ["text/plain"]
}
RENDER_PRIORITY["readthedocs"] = RENDER_PRIORITY["html"]
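To make the effect of the new "docx" entry concrete, here is a small sketch of how a priority list like this is typically consumed: the first MIME type in the list that the cell output actually provides wins. The function `pick_mimetype` is a hypothetical stand-in, and MyST-NB's real lookup may differ.

```python
from __future__ import annotations

WIDGET_VIEW_MIMETYPE = "application/vnd.jupyter.widget-view+json"
RENDER_PRIORITY = {
    "html": [
        WIDGET_VIEW_MIMETYPE,
        "application/javascript",
        "text/html",
        "image/svg+xml",
        "image/png",
        "image/jpeg",
        "text/latex",
        "text/plain",
    ],
    "latex": ["text/latex", "text/plain"],
    "docx": ["text/plain"],
}


def pick_mimetype(output_data: dict, builder: str) -> str | None:
    """Return the highest-priority MIME type present in one cell output.

    Hypothetical helper for illustration; not MyST-NB's actual code.
    """
    for mimetype in RENDER_PRIORITY[builder]:
        if mimetype in output_data:
            return mimetype
    # Nothing in this output can be rendered by the given builder.
    return None


if __name__ == "__main__":
    table_output = {"text/html": "<table>...</table>", "text/plain": "a  b"}
    figure_output = {"image/png": "<base64 data>"}
    print(pick_mimetype(table_output, "html"))
    print(pick_mimetype(table_output, "docx"))
    print(pick_mimetype(figure_output, "docx"))
```

Under this reading, a docx priority of only "text/plain" would leave image-only outputs (such as the hypothetical `figure_output` above) with nothing to render, so image MIME types may need to be added once the docx writer can handle them.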

However, now I'm running into errors:

Running Sphinx v2.4.4
Adding copy buttons to code blocks...
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [docx]: pass
updating environment: [config changed ('author')] 30 added, 0 changed, 0 removed
checking for /Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/references.bib in bibtex cache... up to date
checking for /Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/mdrefs.bib in bibtex cache... up to date
reading sources... [100%] test_pages/test
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
processing JupyterBook.docx... 
resolving references.../Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/guide/03_build.md:11: WARNING: None:any reference target not found: 02_create
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/guide/04_publish.md:12: WARNING: None:any reference target not found: 03_build

writing... WARNING: Missing refuri :guide/old_docs/features/titles
WARNING: Missing refuri :guide/features/hiding
WARNING: Missing refuri :guide/old_docs/features/interactive_cells
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/markdown.md:: WARNING: Not support remote image files yet
WARNING: Missing refuri :features/features/myst#project-jupyter-proc-scipy-2018
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
WARNING: Missing refuri :features/old_docs/features/layout
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/hiding.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/hiding.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/hiding.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/hiding.ipynb:: WARNING: Ignore unknown node CellNode
WARNING: Missing refuri :features/features/citations#holdgraf-rapid-2016
WARNING: Missing refuri :features/features/citations#holdgraf-evidence-2014
WARNING: Missing refuri :features/features/citations#holdgraf-portable-2017
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Not support remote image files yet
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/code.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/limits.md:209: WARNING: Not support remote image files yet
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/limits.md:212: WARNING: Not support remote image files yet

Exception occurred:
  File "/Users/pgierz/opt/miniconda3/envs/jbook_dev/lib/python3.7/site-packages/docxbuilder/writer.py", line 1196, in visit_Text
    self._doc_stack[-1].add_text(node.astext())
AttributeError: 'Table' object has no attribute 'add_text'
The full traceback has been saved in /var/folders/nn/sdjny2nn5v338x7w6q999yt00000gp/T/sphinx-err-71o5gf09.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!

I guess this comes out of the docx-converter but maybe my changes for MyST-NB aren't complete yet...?
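For anyone trying to read the traceback, here is a toy model of the failure mode. It assumes the writer keeps a stack of open containers and sends text to whichever one is on top, as the `self._doc_stack[-1].add_text(...)` line suggests. The classes below are hypothetical and are not docxbuilder's real classes; the guarded variant is only one conceivable way to cope, not a confirmed fix.

```python
class Paragraph:
    """Toy container that can hold text."""

    def __init__(self):
        self.text = []

    def add_text(self, text):
        self.text.append(text)


class Table:
    """Toy container that holds cells but has no add_text method."""

    def __init__(self):
        self.cells = []

    def add_cell(self, paragraph):
        self.cells.append(paragraph)


def visit_text_naive(doc_stack, text):
    # Mirrors the line in the traceback: assumes the top of the stack accepts text.
    doc_stack[-1].add_text(text)


def visit_text_guarded(doc_stack, text):
    # Illustrative workaround: wrap stray text in a paragraph cell first.
    top = doc_stack[-1]
    if not hasattr(top, "add_text"):
        paragraph = Paragraph()
        top.add_cell(paragraph)
        top = paragraph
    top.add_text(text)


if __name__ == "__main__":
    stack = [Table()]
    try:
        visit_text_naive(stack, "cell output")
    except AttributeError as err:
        print("naive visitor failed:", err)
    visit_text_guarded(stack, "cell output")
    print(stack[-1].cells[0].text)
```

If the real cause is similar, the error would mean that a text node ended up directly inside a table, possibly because an intermediate node type was skipped on the way down.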

I've forked everything and will check all the changes into branches labeled docx. @choldgraf If you take a look, I'd be grateful as I'm a bit stuck at the moment...

@pgierz

pgierz commented Mar 19, 2020

Also, maybe this issue should move somewhere to the ExecutableBookProject repos so the discussion is a bit clearer...

@choldgraf
Member Author

Wow, thanks very much for the quick push on this! I agree we should add this to the MyST-NB repository for now so we can track progress (or maybe jupyter-sphinx, but let's start w/ myst-nb). There is also another team of folks who will work on a first-class latex -> pdf builder as well. As you mention, myst-nb is definitely still in a formative stage, there is much to be done!

Wanna open an issue about DOCX in myst-nb and paste in your comments above?

@pgierz

pgierz commented Mar 20, 2020

OK, done! :-)

@drscotthawley

+1 for EPUB support!
New user here, very excited upon hearing about "jupyter book" as I'm writing an interactive book using EPUB3's support for Javascript, plus I love writing in Markdown and using Jupyter notebooks. I was expecting EPUB and MOBI to be the intended output formats.

Now am trying to figure out what is meant by "book":
So "book" = "series of web pages"?
...And then you can print a PDF of an individual web page (which our browsers already do)?
...And then a publisher is supposed to use this how?

So, how is jupyter book different from a Jekyll-based blogging platform, such as fastpages?

I understand that you're in beta, so I'm not trying to diminish from your hard work at all; I'm very honestly just not "getting" it and I want to. Thanks.

@choldgraf
Member Author

in fact Jupyter Book used to use Jekyll :-)

mostly we're defining book as "a way of packaging multiple [notebooks/md files/etc] together in a bundle" with the goal of outputting many kinds of bundles.

Fastpages is also great - though I think of it more as a quick way to have a blog, as opposed to a book. Major differences with Jupyter Book (other than obvious things like design), would be more "book-like" features such as cross-references, citations, figures, injecting notebook outputs into pages, etc. And yes, one day we want to support more outputs like EPUB and docx

@drscotthawley

Thanks for the clarification @choldgraf !

@drscotthawley

drscotthawley commented May 16, 2020

Last night I spent some time looking into pandoc's EPUB support and softcover.io's build pipeline.

Probably you already know all this, but what I found was:

  • Trying to use Pandoc to directly convert one of the example jupyter book .ipynb files (python_by_example.ipynb) resulted in so much memory overhead that my machine (with 64 GB of RAM!) locked up and the kernel auto-killed it. Repeatedly.
         $ pandoc -s python_by_example.ipynb -o test.epub
         (wait a really long time)
         Killed
         $
  • Using nbconvert to generate LaTeX and then using Pandoc to convert the LaTeX to EPUB resulted in conflicts between the tcolorbox environment and the Verbatim environment:
      $ jupyter nbconvert --to latex python_by_example.ipynb 
      $ pandoc -s python_by_example.tex -o test.epub
        Error at "source" (line 421, column 15):
        expecting \end{tcolorbox}
        \end{Verbatim}
                     ^

^My attempts to resolve that were unsuccessful, despite spending time perusing related issues on TeX StackExchange.

  • The Pandoc "progit" example of generating EPUB from a series of Markdown files worked fine. But these weren't .ipynb files, and didn't have any figures or 'advanced' executable/interactive features. So, perhaps that's not very relevant.

So...I'm sure if these things had worked you would have done them already. Just noting what a few of the barriers might be to 'easy' fulfillment of this feature request.

@choldgraf
Member Author

Thanks for looking into all that, and for reporting back! I believe that Sphinx also has its own EPUB builder that we might be able to piggy-back off of (since Jupyter Book now uses Sphinx under the hood). I'm not familiar with it but I know that readthedocs has some of this functionality which uses Sphinx too. E.g.: https://www.sphinx-doc.org/en/master/faq.html?highlight=epub#epub-info

@ScriptAutomate

ScriptAutomate commented Aug 20, 2020

Potential Quick Win Paths

StackOverflow seems to have potential directions for getting what you need:

The top solution uses the rst2pdf python package with Sphinx projects. There are other listed answers that may be of help.

I haven't tested rst2pdf, but it may be the lowest barrier to entry, and I've seen some people using rst2pdf in the Write The Docs community. If you all aren't already there, you should definitely join their Slack workspace.

The official Sphinx docs also point to rinohtype as a potential alternative to the latex-backend for generating PDFs from Sphinx projects.

A More Complicated Path

I have used Sphinx to build PDFs of the SaltStack documentation. Sphinx does also have the ability to export EPUB, but I do not have experience with it.

It does need to be generated from source, much like one generates the HTML (make html vs. something like make latexpdf or make pdf). For an example of the power of this, here is a Sphinx-generated site, along with a download of the PDF that gets autogenerated whenever the site content is updated:

Much of the source is private, other than what you can see with Show Source buttons on the site, which shows raw rst it had converted. But, SaltStack also has a variety of open-sourced projects. I'm digging into our open-sourced tooling for the Salt docs (one of the biggest open source projects on GitHub, and made it onto this interesting list of GitHub repos), which I am less versed in, but anyone here can take a look at it if they'd like. I need to collaborate with some people to document how the build pipeline and tooling works when it comes to the docs, and/or try to create a GitHub template repo that is bootstrapped for PDF/EPUB gen. Here are the links:

Though, this did mean needing to get containers involved that brought in all the LaTeX requirements beforehand, and for the CI jobs to pull the containers down. They are part of the build pipelines and make life easy, though it then complicates the build process for newcomers and for new projects working to mimic the functionality. sphinx-build supports converting to LaTeX, and then to PDF: https://www.sphinx-doc.org/en/master/usage/builders/index.html#sphinx.builders.latex.LaTeXBuilder

Not sure if starting with something like https://hub.docker.com/r/tk0miya/sphinx-pdf could make things easier or not?

Since EBP / Jupyter Book is going the Sphinx route, it also means that anyone using readthedocs.io to host their Sphinx output gets pipeline jobs that automatically create PDFs and EPUBs of the docs as part of the free hosting: https://docs.readthedocs.io/en/stable/features.html#downloadable-documentation -- which could make life easier for people who don't want to implement their own pipelines.

@jtbayly

jtbayly commented Nov 5, 2020

Just wanted to say that without epub, mobi and PDF support, I don't really see myself using this. Hoping these things can be added.

@jtbayly

jtbayly commented Mar 31, 2021

So now that PDF is supported (in two ways), how difficult is it going to be to add ePub, for example? Is it just a matter of copying what was done for PDF but using the sphinx.builders.epub3.Epub3Builder class?
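For context, plain Sphinx selects its output format with the `-b` option of `sphinx-build`, and the EPUB 3 builder is registered under the name `epub`. A minimal sketch of what switching builders looks like at that level follows. The helper name and directory layout are assumptions, and how Jupyter Book wires builders into its own CLI is not shown in this thread.

```python
from __future__ import annotations

from pathlib import Path


def sphinx_build_command(builder: str, source_dir: Path, build_root: Path) -> list[str]:
    """Return a sphinx-build invocation for one builder.

    Hypothetical helper for illustration; each builder gets its own
    output directory under build_root.
    """
    return [
        "sphinx-build",
        "-b", builder,
        str(source_dir),
        str(build_root / builder),
    ]


if __name__ == "__main__":
    # Print the commands for the three formats discussed in this issue.
    for name in ["html", "latex", "epub"]:
        print(" ".join(sphinx_build_command(name, Path("docs"), Path("_build"))))
```

Whether notebook cell outputs survive the EPUB builder is a separate question, since the render priority for that builder would also need to be defined.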

9 participants