JPG compression and lightweight html nbconvert template for blogs #4448

dbarbeau · 2013-10-28T13:07:53Z

These commits make publishing on blog sites (eg: blogger) easier.

Rationale

Using "ipython nbconvert" produces self-contained HTML (with data URIs for images), which is very handy for blog posts: one just copies the body and pastes it into the new blog post. However, hosts limit the size per post, so for this purpose a more compact HTML representation of the notebook is desired.

Strategy

Two places where bytes can be easily saved have currently been identified:

Data URIs for matplotlib figures are currently PNG by default, or SVG optionally. Both can quickly get bigger than a data-URI-encoded JPG.
Code cells are pre-highlighted, which introduces many tags even for small code blocks.
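
To make the size concern concrete, here is a minimal sketch of how a data URI grows with the image payload. The helper names (to_data_uri, data_uri_length) are hypothetical and not part of IPython; the only fact relied on is that base64 turns every 3 input bytes into 4 output characters.

```python
import base64
import math


def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Embed raw bytes in a base64 data URI (hypothetical helper)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return "data:{};base64,{}".format(mime_type, encoded)


def data_uri_length(n_bytes: int, mime_type: str) -> int:
    """Predict the URI length without encoding anything."""
    prefix = len("data:{};base64,".format(mime_type))
    # base64 emits 4 characters per (padded) group of 3 input bytes
    return prefix + 4 * math.ceil(n_bytes / 3)


# 1024 placeholder bytes standing in for an image file (not a real image)
fake_image = bytes(range(256)) * 4
uri = to_data_uri(fake_image, "image/png")
print(len(fake_image), "bytes on disk ->", len(uri), "characters in the post")
```

Whatever the image format, the embedded figure costs roughly a third more than the file itself, so any reduction of the file size carries over to the post size with the same factor.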

The solution works in the scenario where the user starts ipython notebook, creates and saves notebooks, then uses nbconvert to convert the notebook to html. Only the body is useful.

JPG compression for images

By starting the notebook with "--InlineBackend.figure_format=jpg" the figures are transported in the JPG format. The user also has access to "--InlineBackend.quality=XX" to set the level of compression. Of course, JPG is lossy but many figures can do with some compression. This is only enabled if PIL is installed.

This can also be useful in the general use case, to save some bandwidth.

In browser code highlighting

Syntax highlighting in HTML introduces many tags, which eat many precious bytes. The strategy retained here is to convert code blocks to "HTML-escaped ASCII" code blocks (raw code) and assign them CSS classes that enable Google's prettify to run on the code in the browser. This is done by using a new template for nbconvert: "lightweight_blog".
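
As an illustration of the idea (not the actual template code), a code cell could be emitted roughly as follows. The function name prettify_block is hypothetical; "prettyprint" and "lang-*" are the CSS classes google-code-prettify looks for.

```python
from html import escape


def prettify_block(source: str, language: str) -> str:
    """Wrap raw source in a <pre> that prettify can highlight in the browser.

    Hypothetical helper: the real template does this in Jinja, not in Python.
    """
    css_classes = "prettyprint lang-{}".format(language)
    # Only &, < and > need escaping inside element content
    return '<pre class="{}">{}</pre>'.format(css_classes, escape(source, quote=False))


snippet = 'if a < b and c > d:\n    print("x & y")'
block = prettify_block(snippet, "python")
print(block)
```

The output contains a single pair of tags per cell, whatever the length of the code, instead of one span per token.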

Doubts

The user must still edit his blog's template settings to add the required JS and CSS to make the blog post look as much as possible like the notebook. The generated html includes instructions on what should go where but those bits shouldn't be included in the blog post to save bytes. As they are common to many blog posts, it is better to put those resources into the template than into each post. Are there ways to remove the need for user intervention? This is unlikely with the constraints given. Are there better ways to tell the user to edit his blog's template (an nbconvert post-processor that prints an info message)?

Carreau · 2013-10-28T13:18:52Z

Hi,

Thanks !

I don't think the choice of JPEG should be made in the kernel, before nbconvert processing.
In any case, this is a mechanism we will update soon, so the part of the patch that allows inline-backend=jpeg will probably be refused (or at least it should be discussed in an orthogonal PR). We will move toward arbitrary MIME types in the new notebook format.

I think it would be much better to have a preprocessor in nbconvert that converts PNG to JPEG.
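
A rough sketch of what such a conversion step might look like, under loud assumptions: it does not use nbconvert's real preprocessor API, it walks a plain dict shaped like a v3 notebook (worksheets, cells, outputs with base64 "png"/"jpeg" keys), and the function names are hypothetical. The actual re-encoding relies on Pillow, imported lazily so that the rest works without it.

```python
import base64
from io import BytesIO


def png_to_jpeg(png_bytes, quality=75):
    """Re-encode PNG bytes as JPEG. Needs Pillow, imported only when called."""
    from PIL import Image  # optional dependency

    # JPEG has no alpha channel, so drop it before saving
    image = Image.open(BytesIO(png_bytes)).convert("RGB")
    buffer = BytesIO()
    image.save(buffer, format="JPEG", quality=quality)
    return buffer.getvalue()


def convert_png_outputs(notebook, convert=png_to_jpeg):
    """Replace base64 PNG outputs with JPEG ones, in place.

    `notebook` is assumed to be a dict laid out like a v3 .ipynb file.
    """
    for worksheet in notebook.get("worksheets", []):
        for cell in worksheet.get("cells", []):
            for output in cell.get("outputs", []):
                if "png" in output:
                    png_bytes = base64.b64decode(output.pop("png"))
                    jpeg_bytes = convert(png_bytes)
                    output["jpeg"] = base64.b64encode(jpeg_bytes).decode("ascii")
    return notebook
```

Passing the converter as an argument keeps the traversal independent of the imaging library, so the lossy step happens only at conversion time and the saved notebook keeps its original PNG data.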

I'll re-check, but highlighting in the browser is something that makes sense, and that I was planning to do on nbviewer too to reduce page size. But I'd like to avoid hardcoding lang-python, as ipynb files can also contain Ruby/Julia/Haskell...

Will comment more later.

dbarbeau · 2013-10-28T14:47:08Z

Hello,

Indeed, I just stumbled upon stubs (in nbconvert's code) that convert pictures as a preprocessor. This is definitely the way to go to convert to a blog format (or anything else).

However, the InlineBackend.figure_format option also allows for JPG compression during normal notebook (not nbconvert-targeted) use. Anyway, if arbitrary mime-types are planned then jpeg is included, so I'm fine with that ^^.

Concerning hard-coded css classes, I wasn't aware so many languages were supported. It certainly makes sense not to hardcode the language. I will check to see if this can be handled more gracefully.

dbarbeau · 2013-10-28T21:38:52Z

The two previous commits enable in-browser syntax highlighting both in notebook and in nbconvert in markdown and input cells.

In notebooks it will highlight markdown code blocks with highlight.js' language autodetection. If fenced blocks are used it might be able to use the specified language (untested). It doesn't touch input cells (handled by CodeMirror).
In nbconvert's output it will highlight markdown code blocks with google_code_prettify language autodetection. It will highlight input cells using the cell's language attribute.

The current situation, where I use both prettify and highlight.js, arose because my primary choice was prettify and I suspected the highlighting in the notebook was being done by pygments on the server side. I then found out there was already highlight.js in the code base. I'm undecided regarding what's the best choice.

minrk · 2013-10-28T21:41:10Z

You have restored old behavior for highlighting with no language specified, and we found this to be problematic, hence the current behavior. The auto language detection should be removed.

dbarbeau · 2013-10-29T10:56:01Z

I was suspecting that. The code was screaming something like: "I've been disabled for some reason". This morning I found both the commit that disabled it and a justification (there is no other way to disable highlighting locally, or in other words: don't highlight by default). Will revert.

dbarbeau · 2013-10-30T20:00:58Z

I reverted notebook highlighting to its old behaviour: for code blocks inside markdown cells, if no language is given we don't highlight.

The nbconvert "lightweight_blog" template now follows the same rule. The only difference is the highlighter being used. The notebook uses highlight.js while the template uses prettify. I think I prefer prettify for one main reason: line numbers can be enabled.
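
The rule can be sketched like this. It is an illustration only, with a hypothetical function name, not the template's actual implementation; "linenums" is the class prettify uses to turn on line numbers.

```python
from html import escape


def markdown_code_block(source, language=None, line_numbers=False):
    """Emit a <pre> block; request highlighting only if a language is given."""
    escaped = escape(source, quote=False)
    if language is None:
        # No language specified: leave the block alone, no autodetection
        return "<pre>{}</pre>".format(escaped)
    classes = ["prettyprint", "lang-{}".format(language)]
    if line_numbers:
        classes.append("linenums")
    return '<pre class="{}">{}</pre>'.format(" ".join(classes), escaped)


print(markdown_code_block("x = 1"))
print(markdown_code_block("puts 1", language="ruby", line_numbers=True))
```

Omitting the class entirely is what gives authors a way to opt out of highlighting for a given block.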

On a side note, I think I've been misusing git fetch/merge upstream to keep up-to-date with upstream. Maybe rebase would have been better. If this causes headaches, I'll just create a good old patch. But I really need to understand this git thingy more deeply!

damianavila · 2013-10-31T18:55:51Z

Yep! This needs a rebase ;-)

Do not highlight code in markdown that doesn't have a language specified. There is a bug in pandoc < 1.12.1 where the --no-highlight option is not honored in some situations. The IPython.nbconvert.utils.pandoc module prints a warning if the minimal version is not satisfied.
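
A minimal sketch of such a version check, assuming the version string comes from the first line of "pandoc --version" (for example "pandoc 1.12.1"). The function names are hypothetical and this is not the code of IPython.nbconvert.utils.pandoc.

```python
import re
import warnings

MINIMAL_VERSION = "1.12.1"


def parse_version(text):
    """Extract the first dotted version number from text like 'pandoc 1.11.1'."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError("no version number found in {!r}".format(text))
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version_text, minimum=MINIMAL_VERSION):
    """Compare numerically, component by component (so 1.9 < 1.12)."""
    return parse_version(version_text) >= parse_version(minimum)


def warn_if_too_old(version_text, minimum=MINIMAL_VERSION):
    """Warn, rather than fail, when the installed pandoc is older than needed."""
    if meets_minimum(version_text, minimum):
        return True
    warnings.warn(
        "pandoc {} or newer is recommended; --no-highlight may be ignored".format(minimum),
        RuntimeWarning,
    )
    return False
```

Comparing tuples of integers avoids the classic pitfall of string comparison, where "1.9" would sort after "1.12".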

dbarbeau · 2013-11-02T10:43:16Z

Hello,

After many headaches I finally think I have rebased the branch on top of ipython/master. Tell me if there's anything.
I reordered commits so that the JPG figure_format patch comes last, in one single commit, and can (hopefully) be easily skipped.

Daniel

ivanov · 2013-12-05T19:35:12Z

Sorry about not having communicated this more clearly before, but in order to speed up the distribution of nbconvert templates and make it simpler to share such contributions, we encourage sharing those links here.

Could you please put a link to your IPython/nbconvert/templates/lightweight_blog.tpl file on the wiki?

(I've now added some documentation about this in #4650)

Carreau · 2013-12-06T15:55:27Z

To be a little more specific about what ivanov said:

We discussed this PR yesterday on a Google Hangout; if you want to know exactly what was said, it is available on YouTube. To recap, we believe that this PR mixes many things, and that they should probably be separated:

JPEG formatter for inline figures
lightweight blog templates
minimal pandoc version check
some fixes for highlighting
filters for nbconvert

The template itself will not be accepted, as we try to keep only a bare minimum of templates in IPython itself.
If you cannot do something with the --template flag or config, then there is probably something we need to fix.

We agreed that highlighting should be fixed; we don't know how yet.

I have no strong feelings about the pandoc version check.

def escape_for_html(text):
    return html_escape(text)

Why not keep html_escape?

I would also like to apologize for responding late; usually we respond to PRs faster.
I hope that splitting this into smaller chunks will help us review each of them more quickly.

Thanks.

dbarbeau · 2013-12-06T16:39:16Z

Hello guys,

Thanks for the feedback. I'm all for splitting this into chunks and putting the template somewhere else.

Carreau: Do you mean that the name html_escape was better? I think you're right, I have no clue as to why I suddenly changed it! If you mean there's already an html_escape equivalent function somewhere, I probably missed it OR it had limitations that I have since forgotten about (and that I should have documented).

What I'll do, if you agree, is redo this work cleanly :) It was my first GitHub collaboration and I think my workflow was not good (should have worked in branches of course!). So, this PR can be closed and I'll submit new ones when specific points (from your bullet list) are ready.

Carreau · 2013-12-06T18:43:25Z

Do you mean that the name html_escape was better

No, the name was right, and in any case, filters get named by the dictionary that maps them.
So in the end you have multiple layers of indirection:

import stuff as thing

def myfun(arg):
    return thing(arg)

# I could also have written
myfun = thing

# finally
filter_dict['i_m_a_filter'] = myfun

It is less verbose and more readable to do:

import stuff as i_am_a_filter
filter_dict['i_m_a_filter'] = i_am_a_filter

In your case the following was enough:

try:
    # Python 3
    from html import escape as escape_for_html
except ImportError:
    # Python 2 fallback
    from cgi import escape as escape_for_html

Carreau · 2013-12-06T18:44:47Z

Also,

It was my first GitHub collaboration and I think my workflow was not good (should have worked in branches of course!). So, this PR can be closed and I'll submit new ones when specific points (from your bullet list) are ready.

Well, for a first contribution it was quite good; feel free to close this one and submit new PRs as you like.
And thanks for contributing!

dbarbeau · 2013-12-12T13:10:17Z

New PRs are coming so I'm closing this one!

dbarbeau added 11 commits November 2, 2013 10:35

Add a template that uses google prettyprinter for code highlighting. …

52dba9b

…Saves some more bytes in the transfer.

google_pretty_print template should use html_basic as base. Also the …

cca5240

…root element should be a div not body

rename google_prettyprint_code to lightweight

5c338b4

lightweight.tpl is better named lightweight_blog.tpl. The template in…

48afb68

…cludes some comments to guide the user in publishing the blog post

NbConvert

fd7cb4b

- Enable syntax highlighting inside markdown code. Uses google code prettify because it doesn't need <code> tags inside <pre> tags. Hum... is this a good reason?
- Fix Javascript tags and code.

Notebook

af162d8

- Enable in-browser syntax highlighting inside markdown. It solely relies on highlight.js' language autodetection. A better way would be to use fenced code blocks which can include info about the language.

Revert global markdown hljs.highlightAuto(). Default is don't highlig…

ecf1390

…ht if no language is specified.

fixing Python3 compatibility issue (thanks to that Travis dude)

f31fbba

sync with upstream

532c7e0

Add JPEG as an image format for inline backend if PIL/pillow is avail…

2cf59d7

…able.

dbarbeau closed this Dec 12, 2013

This was referenced Dec 12, 2013

JPG compression for inline pylab #4679

Merged

Minimal pandoc version warning #4680

Merged

Javascript hightlighting for NBConvert html output #4682

Merged
