Support ipynb (Jupyter notebook) as input and output format.
[API change]

* Depend on ipynb library.

* Add `ipynb` as input and output format.

* Added Text.Pandoc.Readers.Ipynb (supports both nbformat v3 and v4).

* Added Text.Pandoc.Writers.Ipynb (supports nbformat v4).

* Added ipynb readers and writers to T.P.Readers,
  T.P.Writers, and T.P.Extensions.  Register the
  file extension .ipynb for this format.

* Add `PandocIpynbDecodingError` constructor to Text.Pandoc.Error.Error.

* Note: there is no template for ipynb.
jgm committed Jan 23, 2019
1 parent 5ddd7b1 commit 395ea03
Showing 13 changed files with 638 additions and 9 deletions.
126 changes: 125 additions & 1 deletion MANUAL.txt
@@ -248,6 +248,7 @@ General options {.options}
if you need extensions not supported in [`gfm`](#markdown-variants).
- `haddock` ([Haddock markup])
- `html` ([HTML])
- `ipynb` ([Jupyter notebook])
- `jats` ([JATS] XML)
- `json` (JSON version of native AST)
- `latex` ([LaTeX])
@@ -300,6 +301,7 @@ General options {.options}
- `html` or `html5` ([HTML], i.e. [HTML5]/XHTML [polyglot markup])
- `html4` ([XHTML] 1.0 Transitional)
- `icml` ([InDesign ICML])
- `ipynb` ([Jupyter notebook])
- `jats` ([JATS] XML)
- `json` (JSON version of native AST)
- `latex` ([LaTeX])
@@ -471,6 +473,7 @@ General options {.options}
[PDF]: https://www.adobe.com/pdf/
[reveal.js]: http://lab.hakim.se/reveal-js/
[FictionBook2]: http://www.fictionbook.org/index.php/Eng:XML_Schema_Fictionbook_2.1
[Jupyter notebook]: https://nbformat.readthedocs.io/en/latest/
[InDesign ICML]: http://wwwimages.adobe.com/www.adobe.com/content/dam/acom/en/devnet/indesign/sdk/cs6/idml/idml-cookbook.pdf
[TEI Simple]: https://github.com/TEIC/TEI-Simple
[Muse]: https://amusewiki.org/library/manual
@@ -730,6 +733,8 @@ General writer options {.options}
where there are nonsemantic newlines in the source, there
will be nonsemantic newlines in the output as well).
Automatic wrapping does not currently work in HTML output.
In `ipynb` output, this option affects wrapping of the
contents of markdown cells.

`--columns=`*NUMBER*

@@ -902,6 +907,7 @@ Options affecting specific writers {.options}
: Use ATX-style headers in Markdown output. The default is
to use setext-style headers for levels 1-2, and then ATX headers.
(Note: for `gfm` output, ATX headers are always used.)
This option also affects markdown cells in `ipynb` output.

`--top-level-division=[default|section|chapter|part]`

@@ -1806,6 +1812,10 @@ section [Pandoc's Markdown] below (See [Markdown variants] for
`commonmark` and `gfm`.) In the following, extensions that also work
for other formats are covered.

Note that markdown extensions added to the `ipynb` format
affect Markdown cells in Jupyter notebooks (as do command-line
options like `--atx-headers`).

Typography
----------

@@ -1955,11 +1965,19 @@ This extension can be enabled/disabled for the following formats

input formats
: `latex`, `org`, `textile`, `html` (environments, `\ref`, and
-`\eqref` only)
+`\eqref` only), `ipynb`

output formats
: `textile`, `commonmark`

Note: as applied to `ipynb`, `raw_html` and `raw_tex` affect not
only raw TeX in markdown cells, but data with mime type
`text/html` in output cells. Since the `ipynb` reader attempts
to preserve the richest possible outputs when several options
are given, you will get best results if you disable `raw_html`
and `raw_tex` when converting to formats like `docx` which don't
allow raw `html` or `tex`.

#### Extension: `native_divs` {#native_divs}

This extension is enabled by default for HTML input. This means that
@@ -4747,6 +4765,112 @@ with the `src` attribute. For example:
</source>
</audio>

Creating Jupyter notebooks with pandoc
======================================

When creating a [Jupyter notebook], pandoc will try to infer the
notebook structure. Code blocks with the class `code` will be
taken as code cells, and intervening content will be taken as
Markdown cells. Attachments will automatically be created for
images in Markdown cells. For example:

````
---
title: My notebook
nbformat: 4
nbformat_minor: 5
kernelspec:
  display_name: Python 2
  language: python
  name: python2
language_info:
  codemirror_mode:
    name: ipython
    version: 2
  file_extension: ".py"
  mimetype: "text/x-python"
  name: "python"
  nbconvert_exporter: "python"
  pygments_lexer: "ipython2"
  version: "2.7.15"
---

# Lorem ipsum

**Lorem ipsum** dolor sit amet, consectetur adipiscing elit. Nunc luctus
bibendum felis dictum sodales.

``` code
print("hello")
```

## Pyout

``` code
from IPython.display import HTML
HTML("""
<script>
console.log("hello");
</script>
<b>HTML</b>
""")
```

## Image

This image ![image](myimage.png) will be
included as a cell attachment.
````

If you want to add cell attributes, group cells differently, or
add output to code cells, then you need to include divs to
indicate the structure. You can use either [fenced
divs][Extension: `fenced_divs`] or [native divs][Extension:
`native_divs`] for this. Here is an example:

````
:::::: {.cell .markdown}
# Lorem

**Lorem ipsum** dolor sit amet, consectetur adipiscing elit. Nunc luctus
bibendum felis dictum sodales.
::::::

:::::: {.cell .code execution_count=1}
``` {.python}
print("hello")
```

::: {.output .stream .stdout}
```
hello
```
:::
::::::

:::::: {.cell .code execution_count=2}
``` {.python}
from IPython.display import HTML
HTML("""
<script>
console.log("hello");
</script>
<b>HTML</b>
""")
```

::: {.output .execute_result execution_count=2}
```{=html}
<script>
console.log("hello");
</script>
<b>HTML</b>
hello
```
:::
::::::
````
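
The cell-inference rule described in this section can be sketched in Haskell as follows. This is an illustrative toy model, not the code of `Text.Pandoc.Writers.Ipynb`: the `Block` and `Cell` types and the `toCells` function are hypothetical names, and the grouping of each run of non-code blocks into a single Markdown cell is an assumption based on the phrase "intervening content".

```haskell
-- | Toy document model (hypothetical; pandoc's real AST is richer).
data Block
  = CodeBlock [String] String  -- classes and source text
  | Para String                -- any other block, reduced to its text
  deriving (Eq, Show)

-- | Toy notebook cell (hypothetical).
data Cell
  = CodeCell String            -- source of a code cell
  | MarkdownCell [String]      -- text of the grouped Markdown blocks
  deriving (Eq, Show)

-- | Code blocks carrying the class "code" become code cells; each run of
-- other blocks between them is grouped into one Markdown cell.
toCells :: [Block] -> [Cell]
toCells [] = []
toCells (CodeBlock classes src : rest)
  | "code" `elem` classes = CodeCell src : toCells rest
toCells blocks =
  let (prose, rest) = break isCodeCell blocks
  in MarkdownCell (map blockText prose) : toCells rest
  where
    isCodeCell (CodeBlock classes _) = "code" `elem` classes
    isCodeCell _ = False
    blockText (Para t) = t
    blockText (CodeBlock _ t) = t
```

In this model a code block without the `code` class stays inside the surrounding Markdown cell, which matches the requirement stated above that only blocks with the class `code` are taken as code cells.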

Syntax highlighting
===================

5 changes: 5 additions & 0 deletions README.md
@@ -32,6 +32,7 @@ library. It can convert *from*
- `creole` ([Creole 1.0](http://www.wikicreole.org/wiki/Creole1.0))
- `docbook` ([DocBook](http://docbook.org))
- `docx` ([Word docx](https://en.wikipedia.org/wiki/Office_Open_XML))
- `dokuwiki` ([DokuWiki markup](https://www.dokuwiki.org/dokuwiki))
- `epub` ([EPUB](http://idpf.org/epub))
- `fb2`
([FictionBook2](http://www.fictionbook.org/index.php/Eng:XML_Schema_Fictionbook_2.1)
@@ -44,6 +45,8 @@ library. It can convert *from*
- `haddock` ([Haddock
markup](https://www.haskell.org/haddock/doc/html/ch03s08.html))
- `html` ([HTML](http://www.w3.org/html/))
- `ipynb` ([Jupyter
notebook](https://nbformat.readthedocs.io/en/latest/))
- `jats` ([JATS](https://jats.nlm.nih.gov) XML)
- `json` (JSON version of native AST)
- `latex` ([LaTeX](http://latex-project.org))
@@ -105,6 +108,8 @@ It can convert *to*
- `html4` ([XHTML](http://www.w3.org/TR/xhtml1/) 1.0 Transitional)
- `icml` ([InDesign
ICML](http://wwwimages.adobe.com/www.adobe.com/content/dam/acom/en/devnet/indesign/sdk/cs6/idml/idml-cookbook.pdf))
- `ipynb` ([Jupyter
notebook](https://nbformat.readthedocs.io/en/latest/))
- `jats` ([JATS](https://jats.nlm.nih.gov) XML)
- `json` (JSON version of native AST)
- `latex` ([LaTeX](http://latex-project.org))
1 change: 1 addition & 0 deletions cabal.project
@@ -4,3 +4,4 @@ source-repository-package
type: git
location: https://github.com/jgm/pandoc-citeproc
tag: 4b467c62af17ddfc739933891c5ea2291a6b9b76

19 changes: 11 additions & 8 deletions pandoc.cabal
@@ -20,14 +20,14 @@ description: Pandoc is a Haskell library for converting from one markup
(subsets of) HTML, reStructuredText, LaTeX, DocBook, JATS,
MediaWiki markup, DokuWiki markup, TWiki markup,
TikiWiki markup, Creole 1.0, Haddock markup, OPML,
-Emacs Org-Mode, Emacs Muse, txt2tags,
-Vimwiki, Word Docx, ODT, EPUB, FictionBook2, roff man,
-and Textile, and it can write Markdown, reStructuredText,
-XHTML, HTML 5, LaTeX, ConTeXt, DocBook, JATS, OPML, TEI,
-OpenDocument, ODT, Word docx, PowerPoint pptx,
-RTF, MediaWiki, DokuWiki, ZimWiki, Textile,
+Emacs Org-Mode, Emacs Muse, txt2tags, ipynb (Jupyter
+notebooks), Vimwiki, Word Docx, ODT, EPUB, FictionBook2,
+roff man, and Textile, and it can write Markdown,
+reStructuredText, XHTML, HTML 5, LaTeX, ConTeXt, DocBook,
+JATS, OPML, TEI, OpenDocument, ODT, Word docx,
+PowerPoint pptx, RTF, MediaWiki, DokuWiki, ZimWiki, Textile,
roff man, roff ms, plain text, Emacs Org-Mode,
-AsciiDoc, Haddock markup, EPUB (v2 and v3),
+AsciiDoc, Haddock markup, EPUB (v2 and v3), ipynb,
FictionBook2, InDesign ICML, Muse, LaTeX beamer slides,
and several kinds of HTML/JavaScript slide shows
(S5, Slidy, Slideous, DZSlides, reveal.js).
@@ -398,7 +398,8 @@ library
http-types >= 0.8 && < 0.13,
case-insensitive >= 1.2 && < 1.3,
unicode-transforms >= 0.3 && < 0.4,
-HsYAML >= 0.1.1.1 && < 0.2
+HsYAML >= 0.1.1.1 && < 0.2,
+ipynb >= 0.1 && < 0.2
if impl(ghc < 8.0)
build-depends: semigroups == 0.18.*,
-- basement 0.0.8 and foundation 0.0.21, transitive
@@ -470,12 +471,14 @@ library
Text.Pandoc.Readers.Man,
Text.Pandoc.Readers.FB2,
Text.Pandoc.Readers.DokuWiki,
Text.Pandoc.Readers.Ipynb,
Text.Pandoc.Writers,
Text.Pandoc.Writers.Native,
Text.Pandoc.Writers.Docbook,
Text.Pandoc.Writers.JATS,
Text.Pandoc.Writers.OPML,
Text.Pandoc.Writers.HTML,
Text.Pandoc.Writers.Ipynb,
Text.Pandoc.Writers.ICML,
Text.Pandoc.Writers.LaTeX,
Text.Pandoc.Writers.ConTeXt,
1 change: 1 addition & 0 deletions src/Text/Pandoc/App/FormatHeuristics.hs
@@ -90,5 +90,6 @@ formatFromFilePath x =
".txt" -> Just "markdown"
".wiki" -> Just "mediawiki"
".xhtml" -> Just "html"
".ipynb" -> Just "ipynb"
['.',y] | y `elem` ['1'..'9'] -> Just "man"
_ -> Nothing
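
The extension lookup in this hunk can be tried in isolation with a minimal sketch. `guessFormat` is a hypothetical, trimmed-down stand-in for `formatFromFilePath` that lists only the cases visible in the hunk; lowercasing the path before taking the extension is an assumption about the elided part of the function.

```haskell
import Data.Char (toLower)
import System.FilePath (takeExtension)

-- | Hypothetical stand-in for formatFromFilePath, restricted to the
-- extensions shown in the hunk above.
guessFormat :: FilePath -> Maybe String
guessFormat path =
  -- Lowercasing first is an assumption; it makes ".IPYNB" match too.
  case takeExtension (map toLower path) of
    ".txt"   -> Just "markdown"
    ".wiki"  -> Just "mediawiki"
    ".xhtml" -> Just "html"
    ".ipynb" -> Just "ipynb"
    -- A single digit 1-9 after the dot is treated as a man page section.
    ['.', y] | y `elem` ['1'..'9'] -> Just "man"
    _        -> Nothing
```

With this sketch, `guessFormat "analysis.ipynb"` evaluates to `Just "ipynb"`, and a path without an extension yields `Nothing`, in which case a format has to be given explicitly.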
3 changes: 3 additions & 0 deletions src/Text/Pandoc/Error.hs
@@ -70,6 +70,7 @@ data PandocError = PandocIOError String IOError
                 | PandocEpubSubdirectoryError String
                 | PandocMacroLoop String
                 | PandocUTF8DecodingError String Int Word8
                 | PandocIpynbDecodingError String
                 deriving (Show, Typeable, Generic)

instance Exception PandocError
@@ -124,6 +125,8 @@ handleError (Left e) =
"UTF-8 decoding error in " ++ f ++ " at byte offset " ++ show offset ++
" (" ++ printf "%2x" w ++ ").\n" ++
"The input must be a UTF-8 encoded text."
    PandocIpynbDecodingError w -> err 93 $
      "ipynb decoding error: " ++ w

err :: Int -> String -> IO a
err exitCode msg = do
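
The new error case pairs a constructor with exit code 93 and a prefixed message. A minimal, self-contained sketch of that mapping follows; `IpynbDecodingError` and `toExit` are hypothetical names used only for illustration, and the real `handleError` prints the message and exits rather than returning a pair.

```haskell
import System.Exit (ExitCode (..))

-- | Toy error type with a single constructor (hypothetical; the real
-- PandocError has many more).
newtype IpynbDecodingError = IpynbDecodingError String

-- | Pair the error with the exit code and message used in the hunk above.
toExit :: IpynbDecodingError -> (ExitCode, String)
toExit (IpynbDecodingError w) =
  (ExitFailure 93, "ipynb decoding error: " ++ w)
```

Returning the pair instead of exiting keeps the sketch pure, so the code and message can be checked without terminating the process.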
2 changes: 2 additions & 0 deletions src/Text/Pandoc/Extensions.hs
@@ -332,6 +332,8 @@ getDefaultExtensions "markdown_phpextra" = phpMarkdownExtraExtensions
getDefaultExtensions "markdown_mmd" = multimarkdownExtensions
getDefaultExtensions "markdown_github" = githubMarkdownExtensions
getDefaultExtensions "markdown" = pandocExtensions
getDefaultExtensions "ipynb" = enableExtension Ext_tex_math_dollars
                                 githubMarkdownExtensions
getDefaultExtensions "muse" = extensionsFromList
                                [Ext_amuse,
                                 Ext_auto_identifiers]
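
The `ipynb` default above is the GitHub Markdown extension set with `tex_math_dollars` switched on. The idea can be sketched with a plain `Data.Set`, which is an assumption made for readability and not a claim about how pandoc represents extension sets. The `Ext` constructors and the contents of `githubLike` are a hypothetical subset chosen for illustration.

```haskell
import qualified Data.Set as Set

-- | Hypothetical, tiny subset of extension names.
data Ext = TexMathDollars | PipeTables | FencedCodeBlocks | Strikeout
  deriving (Eq, Ord, Show)

type Extensions = Set.Set Ext

-- | Switch one extension on, leaving the rest of the set untouched.
enableExt :: Ext -> Extensions -> Extensions
enableExt = Set.insert

-- | Illustrative stand-in for the GitHub Markdown defaults.
githubLike :: Extensions
githubLike = Set.fromList [PipeTables, FencedCodeBlocks, Strikeout]

-- | Defaults per format name: "ipynb" is githubLike plus dollar-delimited
-- TeX math; every other format is left empty in this sketch.
defaultExtensionsFor :: String -> Extensions
defaultExtensionsFor "ipynb" = enableExt TexMathDollars githubLike
defaultExtensionsFor _       = Set.empty
```

Building the `ipynb` defaults from an existing set means any later change to the base set carries over to notebooks automatically.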
3 changes: 3 additions & 0 deletions src/Text/Pandoc/Readers.hs
@@ -67,6 +67,7 @@ module Text.Pandoc.Readers
, readEPUB
, readMuse
, readFB2
, readIpynb
-- * Miscellaneous
, getReader
, getDefaultExtensions
@@ -90,6 +91,7 @@ import Text.Pandoc.Readers.Docx
import Text.Pandoc.Readers.DokuWiki
import Text.Pandoc.Readers.EPUB
import Text.Pandoc.Readers.FB2
import Text.Pandoc.Readers.Ipynb
import Text.Pandoc.Readers.Haddock
import Text.Pandoc.Readers.HTML (readHtml)
import Text.Pandoc.Readers.JATS (readJATS)
@@ -147,6 +149,7 @@ readers = [ ("native" , TextReader readNative)
,("muse" , TextReader readMuse)
,("man" , TextReader readMan)
,("fb2" , TextReader readFB2)
,("ipynb" , TextReader readIpynb)
]

-- | Retrieve reader, extensions based on formatSpec (format+extensions).
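
A format spec such as `ipynb+raw_html-raw_tex` combines a format name with extension modifiers. The following simplified sketch shows one way to split such a string; `parseFormatSpec` is a hypothetical helper, not pandoc's parser, and it assumes that format names themselves contain neither `+` nor `-`.

```haskell
-- | Split a spec into the format name and a list of modifiers, where
-- True means "enable" (+) and False means "disable" (-).
parseFormatSpec :: String -> (String, [(Bool, String)])
parseFormatSpec spec = (name, go rest)
  where
    (name, rest) = break isSign spec
    isSign c = c == '+' || c == '-'
    -- Each remaining chunk starts with its sign character.
    go [] = []
    go (c : cs) =
      let (ext, more) = break isSign cs
      in (c == '+', ext) : go more
```

For example, `parseFormatSpec "ipynb+raw_html-raw_tex"` evaluates to `("ipynb", [(True, "raw_html"), (False, "raw_tex")])`; the name would then be looked up in the `readers` list and the modifiers applied to that format's default extensions.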