Javascript hightlighting for NBConvert html output #4682

Merged
merged 1 commit into from Apr 23, 2014

Conversation

Projects
None yet
8 participants
Contributor

dbarbeau commented Dec 12, 2013

Follow-up of #4448

Add a HTMLExporter.javascript_highlight CLI Bool to decide wether we use inbrowser highlighting or prehighlighted html.

Add a switch to nbconvert's markdown2html filter that will make pandoc output fenced code blocks as html

 blocks (no highlight) and rewrite the css code-classes so that google prettyprinter can pick them up and do the highlighting in the browser.

Add a html_escape filter. It is useful in templates that output raw code in

 tags. The code needs to be properly html-escaped because for example >= and <= will cause html parsing errors.

Update html_*.tpl templates accordingly to enable javascript highlighting.

The filters could not be part of another PR because they are necessary side-products of the feature being added.

Owner

Carreau commented Dec 15, 2013

So, I agree with no-highlight. Not sure we want to make prettify from google the default, as most of the time I suppose this will be used in blog post or alike and then including the library will not be made through nbconvert.

I would be in favor of having a global flag (potentially in Highlight2Html) that when set to false make it not hightlight (but wrap in pre-tag with language class like you did). I think it is already possible to ask TeX exporter not to highlight (@jdfreder ?) . It would be nice to be consistent in how to define that for both modes.

This would be usefull especially in nbviewer where I would like to use CodeMirror to highlight as then we are sure to be consistent with the notebook app.

@Carreau Carreau commented on an outdated diff Dec 15, 2013

IPython/nbconvert/templates/html_basic.tpl
@@ -55,13 +83,13 @@ In&nbsp;[{{ cell.prompt_number }}]:
{% block markdowncell scoped %}
<div class="text_cell_render border-box-sizing rendered_html">
-{{ cell.source | markdown2html | strip_files_prefix }}
+{{ cell.source | markdown2html(resources.javascript_highlight) | strip_files_prefix }}
@Carreau

Carreau Dec 15, 2013

Owner

I would pass resources only, so that if we need more from resources later, we can access it with changing API/templates.

@Carreau Carreau commented on an outdated diff Dec 15, 2013

IPython/nbconvert/filters/markdown.py
@@ -50,9 +51,14 @@ def markdown2latex(source):
return pandoc(source, 'markdown', 'latex')
-def markdown2html(source):
+pandoc_code_markup_re = re.compile(r'<pre class="(\w*)"><code>', re.S)
+def markdown2html(source, browser_highlight=False):
"""Convert a markdown string to HTML via pandoc"""
@Carreau

Carreau Dec 15, 2013

Owner

See my other comment,browser_highlight-> smth like resource.

@Carreau Carreau commented on an outdated diff Dec 15, 2013

IPython/nbconvert/filters/strings.py
@@ -23,6 +23,10 @@
from urllib.parse import quote # Py 3
except ImportError:
from urllib2 import quote # Py 2
+try:
+ from html import escape as html_escape
+except:
@Carreau

Carreau Dec 15, 2013

Owner

Try to avoid bare except: and if it is for python 2/3 compatibility, either a comment to say which is for what (then when we drop Python2 we can search for those.) or use if pycompat.PY3.

Contributor

dbarbeau commented Dec 15, 2013

Thanks for the comments, I'm working on an update and come back as soon as I get something showable.

Contributor

damianavila commented Dec 16, 2013

Not sure we want to make prettify from google the default, as most of the time I suppose this will be used in blog post or alike and then including the library will not be made through nbconvert.

I agree here with @Carreau ... I would not like to have a default with google prettify...

@damianavila damianavila and 1 other commented on an outdated diff Dec 16, 2013

IPython/nbconvert/templates/html_basic.tpl
@@ -1,5 +1,27 @@
{%- extends 'display_priority.tpl' -%}
+{% if resources.javascript_highlight %}
@damianavila

damianavila Dec 16, 2013

Contributor

I don't think this belongs to html_basic... block header belongs to html_full template.

@dbarbeau

dbarbeau Dec 17, 2013

Contributor

I'm looking into reverting html_basic to its previous form and still have the 'js-highlight' switch.

@damianavila damianavila commented on an outdated diff Dec 16, 2013

IPython/nbconvert/templates/html_basic.tpl
@@ -1,5 +1,27 @@
{%- extends 'display_priority.tpl' -%}
+{% if resources.javascript_highlight %}
+ {% block header %}
+ <script type="application/javascript" src="https://google-code-prettify.googlecode.com/svn/loader/run_prettify.js"></script>
+ <script type="application/javascript">
+ window.highlightAll = function() {
+ if ( typeof(PR) != "undefined" ) {
+ PR.prettyPrint()
+ }
+ </script>
+ {% endblock header %}
+{% endif %}
+
+{% block body %}
@damianavila

damianavila Dec 16, 2013

Contributor

Same here for block body... I think this could be relocated to html_full template.

Contributor

dbarbeau commented Dec 17, 2013

I would be in favor of having a global flag (potentially in Highlight2Html) that when set to false make it not hightlight (but wrap in pre-tag with language class like you did). I think it is already possible to ask TeX exporter not to highlight (@jdfreder ?) . It would be nice to be consistent in how to define that for both modes.

The issue is that language classes differ from one library to another. I was playing with the idea of a Hightlight2Html.css_class_template parameter.

Example

nbconvert --Highlight2Html.css_class_template="prettyprint lang-{lang}"

where {lang} is replaced by the language for the block. It would result in (for python code) something like: <pre class="prettyprint lang-python">...</pre>

This way the user can "easily" customize the classes. And if this flag is empty, then use pygments ?

Contributor

dbarbeau commented Dec 17, 2013

By the way, shouldn't the --no-highlight option be in Highlight2Html ?

Owner

Carreau commented Dec 17, 2013

--something and --no-somthing are usually shortcut to longer config options that are handled at application level, not in each class.

One could make an abstract/empty class from which both hightlight inherit that get the common highlight_enabled option.

Contributor

dbarbeau commented Dec 17, 2013

Ugh sorry, I got confused! There is no --no-highlight option yet. In this PR there is --HTMLExporter.javascript_highlight, which restrains to HTML.

So I'll rephrase to see if I've got things straight this time : your proposal is to remove this flag and replace it with a --Highlight2(Html|Latex).highlight_enabled that would be inherited from a HighlightBase class. If true use pygment, else use <pre class="..."> in HTML and ... what in TeX? (I don't know enough TeX to have a clue here). I'm 90% OK with this (the remaining is the TeX bit). Is this what you meant?

Then I could add --Highlight2*Html*.css_class_template (Unicode, default="{lang}") flag to further customize what Highlight2*Html* puts in the pre class="..." as exposed above.

Now, I realize I haven't identified what you were referring to in your first comment when you mentioned --no-highlight...

Contributor

dbarbeau commented Dec 18, 2013

Well now to get the behaviour implemented in the first commit of this PR it is a bit more complicated but at least it is now flexible and not tied to google prettify.

  • To skip pygments on code cells and export
     code:
    
    • ipython nbconvert ... --Highlight2Html.highlight_enabled=False
  • To skip pygments on code cells and customize the css class:
    • ipython nbconvert ... --Highlight2Html.highlight_enabled=False --Highlight2Html.css_class_template="prettyprint lang_{lang}"
  • To skip pygments on code cells and customize the css class and to disable pandoc highlighting on markdown cells:
    • ipython nbconvert ... --Highlight2Html.highlight_enabled=False --Highlight2Html.css_class_template="prettyprint lang_{lang}" --HTMLExporter.preprocess_highlight=False
  • To skip pygments on code cells and customize the css class and to disable pandoc highlighting on markdown cells:
    • ipython nbconvert ... --Highlight2Html.highlight_enabled=False --Highlight2Html.css_class_template="prettyprint lang_{lang}" --HTMLExporter.preprocess_highlight=False --HTMLExporter.css_class_template="prettyprint lang_{lang}"

The command line gets kind of scary but it should be possible to implement a shortcut like --no-preprocessed-highlighting.

@jdfreder jdfreder and 2 others commented on an outdated diff Jan 13, 2014

IPython/nbconvert/exporters/html.py
@@ -54,3 +66,9 @@ def default_config(self):
})
c.merge(super(HTMLExporter,self).default_config)
return c
+
+ def from_notebook_node(self, nb, resources=None, **kw):
+ resources = resources or {}
+ resources["preprocess_highlight"] = self.preprocess_highlight
+ resources["css_class_template"] = self.css_class_template
@jdfreder

jdfreder Jan 13, 2014

Owner

Why do you have traits defined in html.py that just re-direct to the same traits in highlight.py?

@ellisonbg

ellisonbg Jan 13, 2014

Owner

Yes, there should only be one trait that controls this behavior.

@ellisonbg

ellisonbg Jan 14, 2014

Owner

I should clarify. There are two things that need to be configured: 1) should the highlight be done at nbconvert time or later and 2) the css class template. Both of these traitlets should be defined and configured in one place. Currently they are in both exporters/html.py and in filters/highlight.py. This should be fixed.

@dbarbeau

dbarbeau Jan 14, 2014

Contributor

Hello!

Sorry for the delay, I took the time to remember why I did it that way. I didn't want to alter the templates too much and the highlighting is handled by two code paths depending if the code is from code cells or markdown fenced code blocks. That's why I ended up putting similar traits in two places. The ultimate idea was to place two more traits (again!) to rule them all, at the application level.

To de-duplicate these traits, I would need to pass resources from the template on to the highlight2html filter or to alter the templates. I don't mind, its easy to do! I just don't want to mess things up (thus adding the traits in Highlight2Html circumvented my worries).

I would go with propagating resources to the concerned filters to preserve the templates - even though instead of asking the highlight function not to highlight, the template should simply not call the highlight function. I just get that itchy feeling that the resources object could actually be needed by potentially many filters and, you know, the urge to unify filter calls :)

For the moment: resources as a kw argument for just the filters who need them?

Owner

jdfreder commented Jan 14, 2014

This needs a rebase now.

Contributor

dbarbeau commented Jan 15, 2014

Hum... well, I think I rebased it. Having to use git push --force to publish changes after a rebase feels a bit criminal. Is this ok?

Anyway, I removed the extra traits as discussed above.

Contributor

damianavila commented Jan 15, 2014

Hum... well, I think I rebased it. Having to use git push --force to publish changes after a rebase feels a bit criminal. Is this ok?

It is one of the few cases where it is ok to do it without guilt... 😉
Here you have a good explanation: http://stackoverflow.com/questions/8939977/git-push-rejected-after-feature-branch-rebase

@takluyver takluyver and 1 other commented on an outdated diff Jan 24, 2014

IPython/nbconvert/exporters/html.py
+ highlight_css_class_template = Unicode(default_value="{lang}", config=True,
@takluyver

takluyver Jan 24, 2014

Owner

I feel like having this configurable and with substitution might be unnecessary complexity at this stage. Can we just say that the class is the language name for now, and add config options later on if they become necessary?

@dbarbeau

dbarbeau Jan 24, 2014

Contributor

It's not super sexy but it is quite helpful. If class==language, it means the user must do an extra text manipulation on the output of nbconvert to let javascript highlighters pick up the code to highlight. It's a leap into the future since it's already necessary for one guy around :)

BTW guys, thanks for the time you spend reviewing the PRs. It's not an easy task and it's done well so thanks!

@takluyver

takluyver Jan 24, 2014

Owner

OK, if it's already useful, I think that's OK.

Thanks for submitting PRs ;-)

@takluyver takluyver commented on an outdated diff Jan 24, 2014

IPython/nbconvert/exporters/html.py
output_mimetype = 'text/html'
+
+
+
+ def _raw_mimetype_default(self):
+ return 'text/html'
@takluyver

takluyver Jan 24, 2014

Owner

We no longer have a raw_mimetype traitlet. raw_mimetypes (plural) gets its default value from output_mimetype, so this should be unnecesary.

@takluyver takluyver and 1 other commented on an outdated diff Jan 24, 2014

IPython/nbconvert/filters/markdown.py
"""Convert a markdown string to HTML via pandoc"""
- return pandoc(source, 'markdown', 'html', extra_args=['--mathjax'])
+
+ if not resources or resources.get("preprocess_highlight"):
+ return pandoc(source, 'markdown', 'html', extra_args=['--mathjax'])
+ else:
+ pandoc_code_markup_re = re.compile(r'<pre class="(\w*)"><code>', re.S)
+ # convert the css template into a regexp
+ css_tpl = resources.get("highlight_css_class_template")
+ css_tpl = css_tpl.replace("{lang}", r"\1")
+ # build the subsitution and go!
+ substitute = "<pre class='%s'><code>"%css_tpl
+ # careful! only works with pandoc >= 1.12.1
@takluyver

takluyver Jan 24, 2014

Owner

Is this a reasonable version to depend on? Do we check pandoc version numbers anywhere? If we rely on a certain version, we should be able to produce a clear error message if the user has an older version.

@dbarbeau

dbarbeau Jan 25, 2014

Contributor

Pandoc version checking is another PR #4680.

@takluyver takluyver and 1 other commented on an outdated diff Jan 24, 2014

IPython/nbconvert/filters/markdown.py
"""Convert a markdown string to HTML via pandoc"""
- return pandoc(source, 'markdown', 'html', extra_args=['--mathjax'])
+
+ if not resources or resources.get("preprocess_highlight"):
+ return pandoc(source, 'markdown', 'html', extra_args=['--mathjax'])
+ else:
+ pandoc_code_markup_re = re.compile(r'<pre class="(\w*)"><code>', re.S)
@takluyver

takluyver Jan 24, 2014

Owner

This looks rather brittle - if pandoc even adds a newline between these two tags, the regex won't match.

@dbarbeau

dbarbeau Jan 25, 2014

Contributor

Doesn't re.S take care of the newline ? Will check though.

@dbarbeau

dbarbeau Jan 25, 2014

Contributor

But it is indeed not very robust as any minimal change (extra space or what) will cause bad behaviors. Will test more robust regexps.

@takluyver takluyver and 1 other commented on an outdated diff Jan 24, 2014

IPython/nbconvert/filters/markdown.py
"""Convert a markdown string to HTML via pandoc"""
- return pandoc(source, 'markdown', 'html', extra_args=['--mathjax'])
+
+ if not resources or resources.get("preprocess_highlight"):
@takluyver

takluyver Jan 24, 2014

Owner

resources.get("preprocess_highlight", True) to get the correct default if it's missing from the resources dict for some reason?

@dbarbeau

dbarbeau Jan 25, 2014

Contributor

Ah right! weird, what did I have in mind?

Owner

takluyver commented Jan 24, 2014

Oh, and a heads up that Min's PR #4655 will probably interfere with this.

Contributor

dbarbeau commented Jan 25, 2014

Exact. Thanks for the heads up. I'll rebase on top of it, it shouldn't be a technical problem since the pandoc trac is preserved. It might however need to be more specific with the CLI options, like adding "only for pandoc" or something like that to the help strings.

jdfreder was assigned Jan 26, 2014

@jdfreder jdfreder commented on an outdated diff Jan 28, 2014

IPython/nbconvert/utils/html.py
@@ -0,0 +1,18 @@
+# coding: utf-8
+""" Utility functions used in conjunction with HTML.
+"""
+#-----------------------------------------------------------------------------
+# Copyright (c) 2013, the IPython Development Team.
@jdfreder

jdfreder Jan 28, 2014

Owner

2014 now :)

@jdfreder jdfreder commented on an outdated diff Jan 28, 2014

IPython/nbconvert/filters/highlight.py
@@ -20,7 +20,8 @@
# Our own imports
from IPython.nbconvert.utils.base import NbConvertBase
-
+from IPython.utils.traitlets import Bool, Unicode
@jdfreder

jdfreder Jan 28, 2014

Owner

Unused now

@jdfreder jdfreder commented on an outdated diff Jan 28, 2014

IPython/nbconvert/exporters/html.py
import os
+from IPython.utils.traitlets import Unicode, List, Bool
@jdfreder

jdfreder Jan 28, 2014

Owner

List is unused

Owner

jdfreder commented Jan 28, 2014

@dbarbeau A couple of small comments inline. Also, a rebase is needed.

@damianavila damianavila commented on an outdated diff Jan 28, 2014

IPython/nbconvert/utils/base.py
@@ -22,7 +22,7 @@
class NbConvertBase(LoggingConfigurable):
"""Global configurable class for shared config
- Usefull for display data priority that might be use by many trasnformers
+ Usefull for display data priority that might be use by many transformers
@damianavila

damianavila Jan 28, 2014

Contributor

Useful 😉

Contributor

dbarbeau commented Jan 28, 2014

Notes taken, will do tomorrow!

Contributor

dbarbeau commented Jan 29, 2014

Hum, the latest commitx opens a small can of worms by trying to make the marked way similar to the pandoc way... There is a pattern in these two tracks. I'm just not sure if it is in the scope of this branch. Anyway, there's still some work to do!

Owner

jdfreder commented Jan 29, 2014

I haven't checked out this branch yet, but it looks like you managed to get the marked stuff working with these changes?

Owner

minrk commented Jan 29, 2014

Given how very much simpler this seems to be with marked, what do you think about only supporting it if marked is the renderer? Pandoc is really a second-class citizen when it comes to markdown to html.

Contributor

dbarbeau commented Jan 29, 2014

@jdfreder : I've tested a little bit and it seems to be working. The nbconvert.utils.marked module is just copy/paste/find/replace of it's pandoc sibling. I'll factorize these two module's code into one base class.

@minrk I think that the basic mechanism of toggling preprocessed highlight on/off should be supported by both tracks (because the selection of the renderer is somewhat opaque to the user). It already is in this branch and the implementation should become more maintainable with some factorization. I also think (or see in my crystal ball) that the marked track will naturally supersede the pandoc track (mainly because it is consistent with how it is done in the live notebooks). So if future killer features are too much of a headache to port to the pandoc track, well... too bad?

Owner

minrk commented Jan 29, 2014

Ok, if it's already done, might as well leave it. In the future, I think that we should never add any HTML-specific features to anything we pass to pandoc (no matter how simple or complex), and say "pandoc is a stripped-down fallback, when the desired converter is unavailable".

Contributor

dbarbeau commented Jan 29, 2014

OK, seams reasonable!

@minrk minrk commented on an outdated diff Jan 29, 2014

IPython/nbconvert/utils/marked.py
+ (unless `clean_cache()` is called).
+
+ Exceptions
+ ----------
+ MarkedMissing will be raised if marked is unavailable.
+ """
+ global __version
+
+ if __version is None:
+ if not is_cmd_found("node"):
+ raise MarkedMissing()
+ __version = "0.0.0"
+ return __version
+
+
+def check_marked_version():
@minrk

minrk Jan 29, 2014

Owner

Please remove all of the marked version checking. We ship marked, so the version cannot vary. The only thing to check for is the presence of node.

@minrk

minrk Jan 29, 2014

Owner

In fact, what would probably be best is removing this file entirely. marked is not handled like pandoc, so it doesn't make sense to copy all of the extra version-check logic from utils.pandoc to somewhere it won't be used.

@minrk minrk and 1 other commented on an outdated diff Jan 29, 2014

IPython/nbconvert/filters/markdown.py
"""Convert a markdown string to HTML via marked"""
- command = ['node', marked]
- try:
- p = subprocess.Popen(command,
- stdin=subprocess.PIPE, stdout=subprocess.PIPE
- )
- except OSError as e:
- raise NodeJSMissing(
- "The command '%s' returned an error: %s.\n" % (" ".join(command), e) +
- "Please check that Node.js is installed."
- )
- out, _ = p.communicate(cast_bytes(source, encoding))
- out = TextIOWrapper(BytesIO(out), encoding, 'replace').read()
+ nohl = resources and not resources.get("preprocess_highlight", False)
+ if not nohl:
+ out = pandoc(source, 'markdown', 'html', extra_args=['--mathjax'])
@minrk

minrk Jan 29, 2014

Owner

It looks you have pandoc inside markdown2html_marked, which I presume is an error.

@dbarbeau

dbarbeau Jan 31, 2014

Contributor

definitely, sorry I should have be clearer that this is not ready for review anymore ^^ I need to review the quick hacks I did. Will tell you when it is ready.

@minrk minrk and 1 other commented on an outdated diff Jan 30, 2014

IPython/nbconvert/exporters/html.py
@@ -39,7 +41,18 @@ def _default_template_path_default(self):
def _template_file_default(self):
return 'full'
+
+ preprocess_highlight = Bool(True, config=True,
@minrk

minrk Jan 30, 2014

Owner

Can this just be 'highlight' or 'enable_highlight'? If it's true, highlighting is done, otherwise no highlighting takes place (in nbconvert, that is). 'preprocess' is extra confusing, because there is a preprocess step, and highlighting never happens there.

@dbarbeau

dbarbeau Jan 31, 2014

Contributor

Ok, I'm fine with that :)

Owner

minrk commented Jan 30, 2014

Adding a flag to our marked.js that just disables highlighting would actually remove the need to do any regular expression substitutions because it will wrap the area in:

<pre><code class="language-foo">content</code></pre>

or we could even allow specifying the highlight language prefix letting the class be set to "my-class language-foo" if people actually need that. Still, this is without any substitutions performed. I'm not sure we can do the same with pandoc, but that doesn't seem like a problem to me.

Contributor

dbarbeau commented Jan 31, 2014

Hi, I already added a flag for marked.js to export just the wrapped non-highlighted code.
I really think the substitution to be useful as all js highlighting engine don't take the same CSS class format, I'm afraid a simple prefix can't fit all uses, or at least the use I'm making of it. Substitution regexps just need to be tuned. Anyway, this isn't ready at all. Will come back to you with more ideas next week. Kinda busy lately.

Contributor

dbarbeau commented Feb 18, 2014

Hi guys and sorry for the long gap.

Anyway I took the time to think about this proposal. I'm thinking of rewriting it like this:

  • a flag to make markdown engines export wrapped code with just the language name in the html class attribute.
  • a post-processor that treats the class attributes and modifies them to match a substitution string given by the user.

What do you think of this approach?

Daniel

@jdfreder jdfreder modified the milestone: 3.0, 2.0 Feb 20, 2014

Contributor

dbarbeau commented Feb 24, 2014

Hi guys, I'm back on this (and trying to converge)

So, I reverted that hackish nbconvert.utils.marked module I had created just to have symmetric code between pandoc and marked. I can live with some asymmetry.

I moved all regexp substitutions out of filters and its now all done in one place : a post processor. I think it looks a lot cleaner.

Cheers!
Daniel

@jdfreder jdfreder and 1 other commented on an outdated diff Feb 24, 2014

IPython/nbconvert/postprocessors/custom_css.py
+# Imports
+#-----------------------------------------------------------------------------
+import io
+import re
+
+from IPython.utils.traitlets import Unicode, Bool
+
+from .base import PostProcessorBase
+
+rec = lambda x: re.compile(x, re.S)
+
+pandoc_code_markup_re = rec(ur'<pre\s*class="(\w*?)">\s*<code>')
+marked_code_markup_re = rec(ur'<pre>\s*<code\s*class="language-(\w*)">')
+
+
+class HTMLCustomLanguageClassPostProcessor(PostProcessorBase):
@jdfreder

jdfreder Feb 24, 2014

Owner

HTMLCustomLanguageClassPostProcessor longest class name 2014! Maybe HTMLHighlightPostProcessor?

@dbarbeau

dbarbeau Feb 24, 2014

Contributor

Ahahahaa :) Yes, quite long specially on the command line. Will shrink it.

Owner

jdfreder commented Feb 24, 2014

One comment inline, but other than that it's looking good to me.

Contributor

dbarbeau commented Feb 24, 2014

Ok, if you guys are satisfied by this design, I'll tackle the test cases.

Owner

jdfreder commented Feb 24, 2014

@minrk you showed concern with the design earlier, is this design better?

Contributor

dbarbeau commented Feb 28, 2014

On a side note (out of this PR's scope), couldn't the highlighting of input cells (nbconvert.filters.highlight) be done with the same highlighter as for code inside markdown (be it pandoc or marked, whichever is available)? This way, all HTML highlighting would go through the same pipes.

Owner

ellisonbg commented Mar 1, 2014

Needs rebase.

Owner

ellisonbg commented Mar 1, 2014

I haven't looked at this in detail, but I have one general comment. There is a significant amount of complexity cost here. We have to maintain that complexity forever. I am not sure this feature is worth it.

Owner

ellisonbg commented Mar 1, 2014

Maybe a good question to ask: can we get 80% of the functionality with 20% of the complexity?

Contributor

dbarbeau commented Mar 1, 2014

Hi and thanks for the feedback.

I'll give it a try but if you can be more specific about where the complexity is located it could be helpful. In the latest iteration I tried to concentrate the biggest part of the business in a separate class, a postprocessor.

  • I can remove a few lines of code in the diff by removing the nbconvert.utils.html module which doesn't do much and is pretty much a leftover of a previous imp where html_escape was used in multiple places.
  • There are a few useless changes in nbconvertapp
  • Highlight2Latex doesn't need to respect the highlight option since it is only switchable with html (seems obvious this morning, don't know why I hacked that class in the first place).
  • Highlight2Html could use markdown2html with a little hack (wrapping in fenced md). It would unify highlighting paths for markdown code and input cells (instead of pandoc/marked vs pygments right now)
  • the JsHighlightPostProcessor can see its in_code_tag trait removed as I suspect highlight engines don't really care about this.

But the main complexity I see is having to pass the highlight trait down to the templates and then to the different filters. I don't know of another way of doing so.

Daniel

Contributor

dbarbeau commented Mar 1, 2014

Ok, its not so easy to just pipe Highlight2Html to markdown2html since markdown2html doesn't have an IPython lexer. Will forget about that for now.

Contributor

dbarbeau commented Mar 1, 2014

Just a note to let you know I still need to review a few things to make sure they are necessary before considering this ready for review.

Contributor

dbarbeau commented Mar 4, 2014

Hello guys and thanks for your patience!
I think this thing is ready for a new review. Here's a brief summary of functional changes:

  • Add a switch to nbconvert's html export to disable preprocessed highlighting and export <pre> wrapped code blocks instead. This is needed to minimize the size of blog posts (html markup for syntax highlighting is heavy).
  • Add a post processor to further customize the exported html by replacing code css classes with a custom version so that the user's preferred highlighter picks it up.

Usage for google prettyprint:

ipython nbconvert --HTMLExporter.highlight=False --post=js_highlight --JsHighlightPostProcessor.css_substitution="prettyprint lang-{lang}" MyNb.ipynb

or

ipython nbconvert --HTMLExporter.highlight=False MyNb.ipynb
python -m IPython.nbconvert.postprocessors.js_highlight MyNb.html "prettyprint lang-{lang}"

And a summary of structural changes:

  • HTMLExporter receives a highlight trait that is added to the resources dictionary used in jinja templates.
  • The html templates now pass that resources to the different places where highlighting is done : markdown2html and highlight2html
  • These functions check for the existence of the resources dict and the highlight key in it (else it defaults to True). If False they return html-escaped code wrapped in <pre> tags, else they do their original behaviour.
    • marked required passing an extra arg to the nodejs command + checking for that arg in marked.js.
    • pandoc required to pass an extra arg to pandoc which already knows about it.
    • highlight2html bypasses pygments and simply wraps the html-escaped source code
  • The new JsHighlightPostProcessor which checks for pandoc and marked type of output and substitutes them with whatever the user desires.

Daniel

Contributor

dbarbeau commented Mar 8, 2014

Do you mind if I squash all the commits and rebase? It's becoming difficult to rebase through all these commits.

Owner

minrk commented Mar 8, 2014

not at all

Contributor

dbarbeau commented Mar 9, 2014

Finally, I moved almost everything into the postprocessor. This is 100% of functionality with 5% code dispersion and well 50% complexity... or something like that.
JsHighlightPostProcessor uses a HtmlParser subclass to strip static highlighting and rewrite the html. This is where things can go wrong but for the moment tests show a correct output.

Owner

jdfreder commented Apr 9, 2014

I like the post processor approach a lot. It looks like the changes required to nbconvert are actually very small and all of the code is self contained in js_highlight.py. Which makes me wonder if we should instead merge just those changes and move the js_highlight.py file to the ipython contrib repo. Opinions? @minrk ?

Contributor

damianavila commented Apr 9, 2014

Probably is a good candidate for ipython-contrib...

Owner

jdfreder commented Apr 9, 2014

Probably is a good candidate for ipython-contrib...

Yeah, I put it on the hackpad for discussion tomorrow.

Contributor

dbarbeau commented Apr 10, 2014

I'm ok with it :)

@ivanov ivanov and 1 other commented on an outdated diff Apr 10, 2014

IPython/nbconvert/filters/highlight.py
@@ -3,7 +3,7 @@
from within Jinja templates.
"""
#-----------------------------------------------------------------------------
-# Copyright (c) 2013, the IPython Development Team.
+# Copyright (c) 2014, the IPython Development Team.
@ivanov

ivanov Apr 10, 2014

Owner

if you're going to update lines like this, please just remove the year. See our copyright policy which was recently updated to reflect this.

@jdfreder

jdfreder Apr 10, 2014

Owner

In his defense, he made those changes 2 months ago because I suggested he touch the header while working on the file 💩

Contributor

dbarbeau commented Apr 11, 2014

Will do a beauty pass soon! So if there are more little fixes like the (c) notice, it's the right time :)

Contributor

dbarbeau commented Apr 17, 2014

Ok, it still seems to work fine (this claim is based on the observation of this page which I updated today, granted it doesn't test in-markdown code, I have another file for that)

What is the process to move this to contribs?

Contributor

damianavila commented Apr 17, 2014

Just open a PR in ipython-contrib: https://github.com/ipython-contrib/IPython-notebook-extensions
We have some folders to arrange the extension type... probably styling is the best, but check yourself to see which one fits better...

Owner

jdfreder commented Apr 17, 2014

@dbarbeau That is a really cool b-spline notebook!

Contributor

dbarbeau commented Apr 17, 2014

@damianavila ok thanks. So the only things that need to be merged here are nbconvert/filters/highlight.py and nbconvert/templates/html/basic.tpl. I will prepare a new commit then.

@jdfreder Thanks! I'm currently looking at clarifying it (I just re-read it and got confused by my own wording!) and pursuing the exploration, so nbconvert and this postprocessor will be put to work!

Owner

jdfreder commented Apr 21, 2014

@dbarbeau did you want to prepare that new commit in this PR, or do you want me to close this and you can open another PR?

Contributor

dbarbeau commented Apr 22, 2014

@jdfreder : I'll do it in this one, although the changes are now minimal and do not exactly reflect the title.

dbarbeau referenced this pull request in ipython-contrib/jupyter_contrib_nbextensions Apr 22, 2014

Merged

Add a postprocessor to customize nbconvert's html output css classes for... #54

Contributor

dbarbeau commented Apr 22, 2014

And for the record, the PR against IPython-contrib is here: ipython-contrib/IPython-notebook-extensions#54

@minrk minrk and 1 other commented on an outdated diff Apr 22, 2014

IPython/nbconvert/templates/html/basic.tpl
@@ -42,7 +42,7 @@ In&nbsp;[{{ cell.prompt_number }}]:
{% block input %}
<div class="inner_cell">
<div class="input_area">
-{{ cell.input | highlight2html(language=resources.get('language'), metadata=cell.metadata) }}
+{{ cell.input | highlight2html(language=resources.get('language') or cell.language, metadata=cell.metadata) }}
@minrk

minrk Apr 22, 2014

Owner

Why do you have or cell.language here? I don't think it's necessary (cell.language is redundant, and will be removed in future nbformats).

@dbarbeau

dbarbeau Apr 23, 2014

Contributor

As I understood this, resources.get('language') is globally set for the whole notebook, while cell.language is per cell, allowing for input cells of different types in the future, but this goal may still be well out of sight :)

Owner

minrk commented Apr 22, 2014

One last comment, then I think this is ready for merge. Thanks for all your patience!

@dbarbeau dbarbeau Pygment now adds a language-class to the highlighted html so that
postprocessors can use it to identify the code block and the language
of the code inside.
adb6d04
Contributor

dbarbeau commented Apr 23, 2014

It actually doesn't feel like it's been 6 months since the initial PR! Thank you all for the review job and your commitment to the project!

@minrk minrk added a commit that referenced this pull request Apr 23, 2014

@minrk minrk Merge pull request #4682 from dbarbeau/syntax-highlight
Javascript hightlighting for NBConvert html output
2444fac

@minrk minrk merged commit 2444fac into ipython:master Apr 23, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details
Owner

minrk commented Apr 23, 2014

That was the idea when we added it, but there isn't currently a mechanism for the cell language to be changed, and that hasn't caused a problem yet, so we are going to remove it in nbformat 4.

minrk referenced this pull request Apr 29, 2014

Merged

keep .highlight css class #5753

@mattvonrocketstein mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014

@minrk minrk Merge pull request #4682 from dbarbeau/syntax-highlight
Javascript hightlighting for NBConvert html output
18f17e5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment