
Should repo2docker builds come with dependencies to export to PDF? #1089

Open
willingc opened this issue Dec 11, 2017 · 26 comments

Comments

@willingc
Contributor

When troubleshooting jupyterhub/jupyterhub#1572, I tried to download a notebook as a PDF. I received the following error:

[Screenshot of the error, 2017-12-11 11:30:46]

Since pandoc is used by nbconvert, shouldn't it be installed in the default image?

@yuvipanda
Collaborator

I think part of the problem is that installing pandoc increases the size of the base image significantly (by more than a gig or so I think?), also slowing down builds and launches by quite a bit. I think right now you need an 'apt.txt' with 'pandoc' in it to have it installed. I'd personally like to keep it that way for now, since IMO slowing down the experience for everyone is not worth the extra step for folks who want to use pandoc.

We could possibly grep the logs we have to see how many people have tried to download a notebook as PDF on Binder, to help quantify this decision?

@choldgraf
Member

Is this use-case common enough that we should document it in binder-examples?

@willingc
Contributor Author

Hmm... the difficulty is that anything that relies on nbconvert in the classic notebook UI (primarily downloads) will run into this error. I think documenting solves one problem, getting it to work, so +1 to that. The bigger problem is that the classic notebook's download UI will not work for PDF or other formats relying on nbconvert. At minimum, we should add this to some sort of "known issues" doc.

@yuvipanda
Copy link
Collaborator

+1 on adding it to binder-examples as a minimum start. Should we start a FAQ for a 'known issues' type document?

@willingc
Contributor Author

I think that for mybinder.org-deploy a 'known issues' type of thing (even if just an issue) would be helpful.

@choldgraf
Member

I also just noticed that pandoc is installed by default with a conda environment (I think, anyway)

@willingc
Contributor Author

Thanks for the detective work @choldgraf. I added an environment.yml to my test repo (willingc/ThinkDSP) that matches the contents of the requirements.txt. Interestingly, the error is a bit different since it's referencing xelatex. Thoughts?

[Screenshot of the xelatex error, 2017-12-12 20:48:15]

@willingc
Contributor Author

As a workaround, print preview does work within the notebook with conda and the preview can be saved as a PDF.

@betatim
Member

betatim commented Dec 13, 2017

Is it feasible for us to disable menu items? Then at least we prevent people from getting a 500 and maybe they come looking for docs as to why the "save as PDF" menu item is missing?

👍 on not making the image bigger if we are correct with our assumption that not very many people try to "export as ..."

@choldgraf
Member

Could we update this issue with our actionable next steps? To me they seem to be:

  1. Adding an example for how to get nbconvert working in binder-examples
  2. Document this behavior in the docs either way
  3. Look into disabling this button per @betatim's suggestion (this one feels more long-term)

@willingc
Contributor Author

willingc commented Dec 1, 2018

Bump. This has come up again at a GW workshop.

@betatim
Member

betatim commented Dec 2, 2018

I think the next steps are:

  • report this as a bug in Jupyter notebook and work towards the UI only showing the "convert to X" menu items when it can actually do that task
  • contribute to the notebook so that "download as notebook" does not use nbconvert/create a tab with a "500 internal server error"
  • create a binder-example that shows which dependencies need to be installed for nbconvert to successfully use pandoc which uses Latex to convert a notebook to PDF

I don't think we should add a full Latex distribution to our default image because it increases the size too much.
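As a rough illustration of the third point, a binder-example repository could opt in to the PDF toolchain by listing it in an `apt.txt`, which repo2docker installs with apt at build time (one package name per line). The sketch below is a hypothetical helper that renders and writes such a file. The TeX package names are an assumption based on the Debian/Ubuntu instructions in the nbconvert installation docs and may need adjusting for a given base image.

```python
from pathlib import Path
from typing import Iterable

# Assumed Debian/Ubuntu package names, based on the nbconvert install docs for
# PDF export. Verify them against the base image before relying on them.
PDF_APT_PACKAGES = (
    "pandoc",
    "texlive-xetex",
    "texlive-fonts-recommended",
    "texlive-plain-generic",
)


def apt_txt_contents(packages: Iterable[str] = PDF_APT_PACKAGES) -> str:
    """Render an apt.txt body: one Debian package name per line."""
    return "".join(f"{name}\n" for name in packages)


def write_apt_txt(repo_dir: str, packages: Iterable[str] = PDF_APT_PACKAGES) -> Path:
    """Write apt.txt into repo_dir (hypothetical helper, not part of repo2docker)."""
    path = Path(repo_dir) / "apt.txt"
    path.write_text(apt_txt_contents(packages))
    return path
```

Calling `write_apt_txt(".")` from a repository root would create the file. Keeping this opt-in per repository matches the concern above: only repos that want PDF export pay the extra image size for TeX.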

@minrk
Member

minrk commented Dec 4, 2018

> report this as a bug in Jupyter notebook

This would be an nbconvert issue, I think, since that's where it decides which outputs are available. However, removing a menu item might be more confusing than the current informative error message. I suspect we will instead get users asking "Why did the download as PDF button disappear?" with no information to go on, rather than a specific error message telling them exactly what's missing, which is what they get right now.

> contribute to the notebook so that "download as notebook"

I don't think "download as notebook" uses nbconvert.

> create a binder-example that shows which dependencies need to be installed for nbconvert to successfully use pandoc which uses Latex to convert a notebook to PDF

👍. Any conda-installed notebook should have pandoc as a dependency (which means all images now), and pandoc is used for all formats other than HTML. At this point, only PDF requires the extra layer of LaTeX that might not be present.

> I don't think we should add a full Latex distribution to our default image

👍
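To make the pandoc-versus-LaTeX distinction concrete, a quick check for the two external tools can tell a user which layer is missing before they try the export. This is a minimal sketch assuming nbconvert's default PDF route shells out to `pandoc` (to convert Markdown cells) and `xelatex` (to compile the generated `.tex` file); the function names are hypothetical and not part of nbconvert.

```python
import shutil
from typing import Iterable, List

# Assumption: nbconvert's default PDF export needs pandoc (Markdown -> LaTeX)
# and xelatex (to compile the generated .tex file into a PDF).
PDF_TOOLS = ("pandoc", "xelatex")


def missing_pdf_tools(tools: Iterable[str] = PDF_TOOLS) -> List[str]:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def pdf_export_hint(tools: Iterable[str] = PDF_TOOLS) -> str:
    """Summarize which PDF prerequisites are missing, if any."""
    missing = missing_pdf_tools(tools)
    if not missing:
        return "PDF export prerequisites found."
    return "Missing for PDF export: " + ", ".join(missing)


if __name__ == "__main__":
    print(pdf_export_hint())
```

Following the reports earlier in this thread, one would expect a requirements.txt-based image to report both tools missing and a conda-based image to report only `xelatex`.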

@betatim
Member

betatim commented Dec 4, 2018

>> contribute to the notebook so that "download as notebook"

> I don't think "download as notebook" uses nbconvert.

Then I don't understand why I get a new tab with an error message as well as the notebook when I click download as notebook :-/

Otherwise 👍 to your comments.

@minrk
Member

minrk commented Dec 4, 2018

> Then I don't understand why I get a new tab with an error message as well as the notebook when I click download as notebook :-/

Neither do I, but that's definitely a bug somewhere :). When do you get this error and what is the error that you see?

@choldgraf
Member

I agree that the end result of this should be "if a user attempts and fails at 'download as pdf', we catch it and give them a link to instructions for how to enable this"

@minrk
Member

minrk commented Dec 12, 2018

@choldgraf right now, the behavior is an error message with a URL pointing to instructions for installing tex from the nbconvert docs. Is that requirement not satisfied, then?

@betatim
Member

betatim commented Dec 22, 2018

After launching https://mybinder.org/v2/gh/binder-examples/requirements/master, opening index.ipynb, and choosing File -> Download as -> ipynb, I can't reproduce the error message anymore. I now get a download dialogue and two new empty tabs being opened :-/

[Screenshot, 2018-12-22 10:07:31]

This is with Firefox 65.0b4.

@choldgraf
Member

I think that repo might not be the greatest one to test this with, since it was last built in July. I just pushed a tiny commit to re-trigger a build, and I now get @minrk's error! @willingc @betatim is this now your experience on that repo?

@betatim
Member

betatim commented Dec 24, 2018

When I click "download as notebook" I still get the behaviour I described in https://github.com/jupyterhub/binderhub/issues/341#issuecomment-449556807. This is a different problem from what happens if you click "download as PDF".

@choldgraf
Member

choldgraf commented Feb 11, 2019

A quick question: @yuvipanda mentioned that pandoc adds about 1GB to the base image... I'm wondering where that's coming from. I was looking into the pandoc binaries, and they're somewhere around 10-50 MB, nowhere close to 1GB. Does it have extra dependencies somewhere?

Specifically, I wonder if pandoc downloads a distribution of Latex (which would certainly add some cruft to the base image). If that's the thing that's causing the big images, what if we tried using weasyprint instead of latex for the PDF creation? https://pandoc.org/MANUAL.html#creating-a-pdf

Maybe this would require a change in nbconvert, but it might be a bit simpler now that pandoc supports the .ipynb format.

@willingc
Contributor Author

I'm not sure which image you are using now.
Perhaps this Dockerfile would be smaller: https://github.com/pandoc/dockerfiles

Reference to recent issue activity on pandoc repo

@manics
Member

manics commented Sep 20, 2021

Is there something we still need to do here?

@choldgraf
Member

I think it depends on whether we think that PDF export via Latex / Pandoc should be in the default environment of Binder. Trying to export a notebook as PDF from a Binder session just now led to this error:

[Screenshot of the error]

But it also seems reasonable to tell users that if they want people to export via LaTeX, they need to install it explicitly in the environment. I think the problem here is that the "export as PDF" option is available in default Binder, even though it doesn't work.

@manics
Member

manics commented Sep 21, 2021

I'll move this to repo2docker

@manics transferred this issue from jupyterhub/binderhub Sep 21, 2021
@consideRatio changed the title from "500 error that pandoc is not installed when downloading a PDF" to "Should repo2docker builds come with dependencies to export to PDF?" Oct 13, 2022
@consideRatio
Member

JupyterLab doesn't provide an export to PDF menu item, but the classic notebook interface does, and it still errors as described above.

My take is that we shouldn't add support for this functionality by default.

I suggest that if we don't propose an action point in a month or two, we can close this issue on next issue triage round.

7 participants