JPG compression and lightweight html nbconvert template for blogs #4448
Hi, Thanks! I don't think the choice of JPEG should be made in the kernel before nbconvert processing. I think it would be much better to have a preprocessor in nbconvert that converts png->jpeg. I'll re-check, but highlighting in the browser is something that makes sense and that I was planning to do on nbviewer too, to reduce page size. But I'd like to avoid to hardcode Will do more comment later.
Hello, Indeed, I just stumbled upon stubs (in nbconvert's code) that convert pictures as a preprocessor. This is definitely the way to go to convert to a blog format (or anything else). However, the InlineBackend.figure_format option also allows for JPG compression during normal notebook (not nbconvert-targeted) use. Anyway, if arbitrary mime-types are planned then jpeg is included, so I'm fine with that ^^. Concerning hard-coded CSS classes, I wasn't aware so many languages were supported. It certainly makes sense not to hardcode the language. I will check whether this can be handled more gracefully.
The two previous commits enable in-browser syntax highlighting both in notebook and in nbconvert in markdown and input cells.
I currently use both prettify and highlight.js because my first choice was prettify and I suspected that highlighting in the notebook was being done by pygments on the server side. I then found out that highlight.js was already in the code base. I'm undecided about which is the better choice.
You have restored the old behavior for highlighting when no language is specified. We found this to be problematic, hence the current behavior. The auto language detection should be removed.
I was suspecting that. The code was screaming something like: "I've been disabled for some reason". This morning I found both the commit that disabled it and a justification (there is no other way to disable highlighting locally, or in other words, don't highlight by default). Will revert.
I reverted notebook highlighting to its old behaviour: for code blocks inside markdown cells, if no language is given we don't highlight. The nbconvert "lightweight_blog" template now follows the same rule. The only difference is the highlighter being used: the notebook uses highlight.js while the template uses prettify. I think I prefer prettify for one main reason: line numbers can be enabled. On a side note, I think I've been misusing git fetch/merge upstream to keep up-to-date with upstream. Maybe rebase would have been better. If this causes headaches, I'll just create a good old patch. But I really need to understand this git thingy more deeply!
Yep! This needs a rebase ;-)
…Saves some more bytes in the transfer.
…root element should be a div not body
…cludes some comments to guide the user in publishing the blog post
…ht if no language is specified.
Do not highlight code in markdown that doesn't have a language specified. There is a bug in pandoc < 1.12.1 where the --no-highlight option is not honored in some situations. The IPython.nbconvert.utils.pandoc module prints a warning if the minimal version is not satisfied.
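As an illustration of the kind of check that commit message describes, here is a minimal sketch of a numeric version comparison. It is not the code from IPython.nbconvert.utils.pandoc; the names `MINIMAL_VERSION`, `parse_version` and `meets_minimal_version` are hypothetical, and it assumes the pandoc version string has already been obtained and consists of dot-separated integers.

```python
import warnings

# Hypothetical constant: the commit message says --no-highlight
# misbehaves in pandoc releases before 1.12.1.
MINIMAL_VERSION = "1.12.1"


def parse_version(text):
    """Turn '1.12.1' into (1, 12, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimal_version(found, minimal=MINIMAL_VERSION):
    """Return True if `found` is at least `minimal`; emit a warning otherwise."""
    ok = parse_version(found) >= parse_version(minimal)
    if not ok:
        warnings.warn(
            "pandoc %s found, but %s or later is recommended" % (found, minimal),
            RuntimeWarning,
        )
    return ok
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.9.4" sorts after "1.12.1", which would hide exactly the old versions the warning is meant to catch.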
Hello, After many headaches I finally think I have rebased the branch on top of ipython/master. Tell me if there's anything. Daniel
Sorry about not having communicated this more clearly before, but in order to speed up the distribution of nbconvert templates and make it simpler to share such contributions, we encourage sharing those links here. Could you please put a link to your (I've now added some documentation about this in #4650)
To be a little more specific about what ivanov said: we discussed this PR yesterday on a Google Hangout, and the recording is available on YouTube if you want to hear exactly what was said. To recap, we believe that this PR mixes many things and that they should probably be separated.
The template itself will not be accepted, as we try to keep only bare-minimum templates in IPython itself. We agreed that highlighting should be fixed; we don't know how yet. I have no strong feeling about the pandoc version
why not keep I would also like to apologize for responding late; usually we respond to PRs faster. Thanks.
Hello guys, Thanks for the feedback. I'm all for splitting this into chunks and putting the template somewhere else. Carreau: Do you mean that the name What I'll do, if you agree, is redo this work cleanly :) It was my first GitHub collaboration and I think my workflow was not good (I should have worked in branches, of course!). So this PR can be closed and I'll submit new ones when specific points (from your bullet list) are ready.
No, the name was right, and in any case filters get named by the dictionary that maps them.

    import stuff as thing

    def myfun(arg):
        return thing(arg)

    # I could also have written
    myfun = thing

    # finally
    filter_dict['i_m_a_filter'] = myfun

It is less verbose and more readable to do

    import stuff as i_am_a_filter
    filter_dict['i_m_a_filter'] = i_am_a_filter

In your case the following was enough:

    try:
        # Python 3
        from html import escape as escape_for_html
    except ImportError:
        # Python 2 fallback
        from cgi import escape as escape_for_html
Also,
Well, for a first contribution it was quite good. Feel free to close this and submit new PRs as you like.
New PRs are coming so I'm closing this one!
These commits make publishing on blog sites (eg: blogger) easier.
Rationale
Using "ipython nbconvert" produces self-contained html (with data URIs for images), which is very handy for blog posts: one just copies the body and pastes it into the new blog post. However, hosts limit the size per post, so for this purpose a more compact html representation of the notebook is desired.
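To make the size concern concrete, here is a small sketch of how an image ends up embedded as a data URI and how much the base64 encoding inflates it. The helper name `to_data_uri` is made up for this illustration and the image bytes are a placeholder, not a real figure.

```python
import base64


def to_data_uri(raw_bytes, mime_type):
    """Embed binary data in HTML as a base64 data URI."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return "data:%s;base64,%s" % (mime_type, encoded)


# Placeholder bytes standing in for a real image file (3072 bytes).
fake_image = bytes(range(256)) * 12
uri = to_data_uri(fake_image, "image/png")

# Everything after the first comma is the base64 payload.
payload = uri.split(",", 1)[1]
print(len(fake_image), len(payload))  # 3072 4096
```

Base64 emits 4 characters for every 3 input bytes, so each embedded figure costs about a third more than its file size. That is why shrinking the image itself pays off directly in post size.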
Strategy
Two places where bytes can be easily saved have currently been identified:
The solution works in the scenario where the user starts ipython notebook, creates and saves notebooks, then uses nbconvert to convert the notebook to html. Only the body is useful.
JPG compression for images
By starting the notebook with "--InlineBackend.figure_format=jpg" the figures are transported in the JPG format. The user also has access to "--InlineBackend.quality=XX" to set the level of compression. Of course, JPG is lossy but many figures can do with some compression. This is only enabled if PIL is installed.
This can also be useful in the general use case, to save some bandwidth.
In browser code highlighting
Syntax highlighting in html introduces many tags, which eat many precious bytes. The strategy retained here is to convert code blocks to "html escaped ascii" code blocks (raw code) and assign them CSS classes that let Google's prettify run on the code in the browser. This is done by using a new template for nbconvert: "lightweight_blog".
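A rough sketch of that conversion step, under assumptions: the function name `to_prettify_block` is hypothetical and this is not the template's actual filter. It relies only on the class names that prettify documents ("prettyprint", "lang-*", "linenums") and leaves blocks without a language unmarked, so they are not highlighted.

```python
try:
    from html import escape as escape_for_html  # Python 3
except ImportError:
    from cgi import escape as escape_for_html   # Python 2 fallback


def to_prettify_block(source, language=None, line_numbers=False):
    """Wrap raw code in a <pre> that prettify can highlight in the browser."""
    escaped = escape_for_html(source)
    if language is None:
        # No language given: emit a plain block that prettify ignores.
        return "<pre>%s</pre>" % escaped
    classes = ["prettyprint", "lang-%s" % language]
    if line_numbers:
        classes.append("linenums")
    return '<pre class="%s">%s</pre>' % (" ".join(classes), escaped)
```

For example, `to_prettify_block("a < b", "py")` yields `<pre class="prettyprint lang-py">a &lt; b</pre>`: one wrapping tag plus the escaped source, instead of a span per token as with server-side highlighting.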
Doubts
The user must still edit his blog's template settings to add the required JS and CSS to make the blog post look as much as possible like the notebook. The generated html includes instructions on what should go where but those bits shouldn't be included in the blog post to save bytes. As they are common to many blog posts, it is better to put those resources into the template than into each post. Are there ways to remove the need for user intervention? This is unlikely with the constraints given. Are there better ways to tell the user to edit his blog's template (an nbconvert post-processor that prints an info message)?