# Export Jupyter Notebooks

The following code takes a set of Jupyter notebooks, extracts selected code cells, and builds a library out of them. The goal is a tool that lets us write all Python code in the form of Jupyter notebooks, which better document the development, the underlying algorithms and even sample applications, and then build useful (and compact) libraries out of those self-documented files.

This notebook is itself an example of the intended use: it is written as a collection of text and code cells, which are selectively exported into a single Python file, the library `exportnb.py`.

The logic of the program is as follows. It takes a list of Jupyter notebook file names and reads those notebooks one after another. For each cell, it checks whether the first line has the form `# file: xxxxx`, where `xxxxx` can be any valid file name. If so, it adds the rest of the cell to the code that is to be exported into the file with that name.

Some remarks:

* File names may contain directory names. By default, those directories will be built by the exporting routine.

* Multiple cells can be written to the same Python file. They are written in the order in which they appear: notebooks in the order given, and cells from top to bottom within each notebook.

* Also, a single Jupyter notebook can contain code for multiple files.

* If the file name contains spaces, enclose it in double quotes, as in `# file: "my long file name.py"`. I am not sure this is useful, though, as `import` statements do not allow spaces in file names.

* By default, the code inserts a newline after each exported cell. This avoids frequent errors that arise when consecutive cells are joined directly, such as clashing indentation or the last line of one function running into the first line of the next.
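
The last remark can be checked with a small sketch. It assumes, as is typical for saved notebooks, that the last line of a cell's source carries no trailing newline; the two cells and the helper `compiles` are made up for illustration.

```python
# Two hypothetical cells, stored the way notebooks usually store them:
# every line ends in "\n" except the last one.
cell_a = ["def f(x):\n", "    return x"]
cell_b = ["def g(x):\n", "    return 2 * f(x)"]

# Joining the cells directly fuses "return x" with "def g(x):".
fused = "".join(cell_a + cell_b)

# Adding a newline after each cell keeps the statements apart.
separated = "".join(cell_a + ["\n"] + cell_b + ["\n"])


def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<cells>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(fused), compiles(separated))
```

Only the version with the separating newlines is valid Python, which is why the option is enabled by default.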

## The library

We begin the code by importing the libraries needed to read Jupyter notebooks, which are nothing but JSON documents, i.e. data in a JavaScript-derived format that Python parses with the `json` module.
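
As a rough illustration of what such a JSON document looks like, the sketch below builds a minimal notebook in the version 4 layout and reads it back. The cell contents and the file name `demo.py` are invented for the example, and real notebooks carry more metadata than shown here.

```python
import json

# A minimal, hand-written notebook in the nbformat 4 layout.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Title\n", "Some text"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["# file: demo.py\n", "x = 1"]},
    ],
})

# Reading it back gives ordinary dictionaries and lists.
nb = json.loads(notebook_text)
first_lines = [cell["source"][0] for cell in nb["cells"]]
code_cells = [cell for cell in nb["cells"] if cell["cell_type"] == "code"]
print(nb["nbformat"], first_lines, len(code_cells))
```

Each cell stores its text under `source` as a list of lines, which is the list that the functions below inspect.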

In [1]:
# file: exportnb.py
#
# exportnb.py
#
#  Library for exporting multiple Jupyter notebooks into a series of
#  Python files. Main interface provided by function `export_notebooks`
#  below.
#
#  Author: Juan José García Ripoll
#  License: See http://opensource.org/licenses/MIT
#  Version: 1.0 (15/07/2018)
#
import sys, json, re, pathlib

We then build the function that parses a cell and determines whether it is to be exported.

In [2]:
# file: exportnb.py
def file_cell(lines):
    #
    # Determine whether a cell is to be saved as code. This
    # is done by inspecting the lines of the cell and looking
    # for a line with a comment of the form # file: xxxxx
    # If so, it eliminates this line and collects the remaining
    # text as code.
    #
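    if len(lines):
        ok = re.search(r'^#[ ]+file:[ ]+("[^"]*"|[^ \n]*)[ \n]*$', lines[0])
        if ok:
            # Drop the optional double quotes so that they do not end up
            # in the name of the exported file
            return ok.group(1).strip('"'), lines[1:]
    return False, lines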
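
To see which first lines count as markers, here is a small stand-alone sketch built around the same pattern. The helper name `parse_marker` is hypothetical, and it strips the surrounding quotes from quoted names, which is an assumption about the intended result.

```python
import re

# Same shape as the pattern used above: "# file: name" or '# file: "name"'.
MARKER = re.compile(r'^#[ ]+file:[ ]+("[^"]*"|[^ \n]*)[ \n]*$')


def parse_marker(first_line):
    """Return the target file name, or None if the line is not a marker."""
    match = MARKER.search(first_line)
    if not match:
        return None
    # Assumption: quotes only delimit the name and are not part of it.
    return match.group(1).strip('"') or None


samples = [
    "# file: exportnb.py\n",
    '# file: "my long file name.py"\n',
    "# file: pkg/module.py",
    "#file: missing_space.py\n",
    "# file: two words.py\n",
    "x = 1\n",
]
for line in samples:
    print(repr(line), "->", parse_marker(line))
```

Note that at least one space is required after `#` and after `file:`, and that an unquoted name containing a space is rejected rather than truncated.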

The next function uses the previous one to decide whether to add the lines of a cell to a dictionary that associates file names with text content (the lines of the cell that we received above).

In [3]:
# file: exportnb.py

def register_cell(dictionary, cell_lines, add_newline=True):
    #
    # Input:
    #  - dictionary: a map from file names to lists of lines
    #    of code that will be written to the file
    #  - cell_lines: lines of a cell in a Jupyter notebook
    #  - add_newline: add empty line after each cell
    #
    # Output:
    #  - updated dictionary
    #
    file, lines = file_cell(cell_lines)
    if file:
        if file in dictionary:
            lines = dictionary[file] + lines
        if add_newline:
            lines += ['\n']
        dictionary[file] = lines
    return dictionary
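
The accumulation step can also be sketched on its own, assuming the file name has already been extracted. The helper `append_cell` and the file names are hypothetical; the sketch uses `dict.setdefault` instead of the explicit membership test above.

```python
def append_cell(files, name, lines, add_newline=True):
    """Append the lines of one cell to the entry for `name` (hypothetical helper)."""
    bucket = files.setdefault(name, [])  # creates the list on first use
    bucket.extend(lines)
    if add_newline:
        bucket.append("\n")
    return files


files = {}
append_cell(files, "pkg/a.py", ["import math\n", "x = math.pi"])
append_cell(files, "pkg/b.py", ["y = 2"])
append_cell(files, "pkg/a.py", ["z = x / 2"])

for name, lines in files.items():
    print(name, lines)
```

Cells that target the same file end up in one list, in the order in which they were registered, while other files get their own entries.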

We now create a function that parses a whole notebook, loading the content into a dictionary that associates files with cell content.

In [4]:
# file: exportnb.py

def read_notebook(dictionary, notebook, add_newline=True):
    with open(notebook, 'r', encoding='utf-8') as f:
        j = json.load(f)
    if j["nbformat"] >= 4:
        for cell in j["cells"]:
            dictionary = register_cell(dictionary, cell["source"], add_newline)
    else:
        # Older formats keep the cells inside worksheets; code cells store
        # their text under "input", other cells under "source"
        for cell in j["worksheets"][0]["cells"]:
            lines = cell.get("input", cell.get("source", []))
            dictionary = register_cell(dictionary, lines, add_newline)
    return dictionary

Finally, we save the content of a whole dictionary, overwriting files. We add some intelligence, ensuring that directories are properly built.

In [5]:
# file: exportnb.py

def write_notebooks(dictionary, root='', mkdirs=True):
    #
    # Input:
    #  - dictionary: a map from file names to list of lines of codes
    #    to be written
    #  - root: prefix to be added to all file names
    #  - mkdirs: create parent directories if they do not exist
    #
    for file, lines in dictionary.items():
        # Prepend the root prefix; an empty root leaves the name unchanged
        path = pathlib.Path(root) / file
        if mkdirs:
            path.parent.mkdir(parents=True, exist_ok=True)
        with path.open('w', encoding='utf-8') as f:
            f.writelines(lines)
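
The directory handling can be tried safely inside a temporary directory. This is a minimal sketch, not the library function: the helper `write_files` and the file name `pkg/sub/mod.py` are made up, and the target root is assumed to be a directory.

```python
import pathlib
import tempfile


def write_files(files, root):
    """Write each list of lines below `root`, creating parent directories."""
    written = []
    for name, lines in files.items():
        path = pathlib.Path(root) / name
        # parents=True builds intermediate directories; exist_ok=True makes
        # the call harmless when the directory is already there.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("".join(lines), encoding="utf-8")
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    written = write_files({"pkg/sub/mod.py": ["x = 1\n", "y = x + 1\n"]}, tmp)
    content = written[0].read_text(encoding="utf-8")
    parent_exists = written[0].parent.is_dir()

print(written[0].name, parent_exists)
print(content)
```

Everything is removed again when the `with` block ends, so the sketch leaves no files behind.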

All these functions are combined into a single interface, given by the next one.

In [8]:
# file: exportnb.py

def export_notebooks(notebooks, root='', add_newline=True, mkdirs=True):
    #
    # Input:
    #  - notebooks: list of notebooks as file names
    #  - root: prefix for exporting all notebooks
    #  - add_newline: add empty lines between cells
    #  - mkdirs: create parent directories if they do not exist
    #
    dictionary = {}
    for nb in notebooks:
        read_notebook(dictionary, nb, add_newline=add_newline)
    write_notebooks(dictionary, root=root, mkdirs=mkdirs)

## Test and build the library

We test the library with the following line, which packs the whole notebook into a single file `exportnb.py`, as specified above.

In [7]:
export_notebooks(['Export Jupyter notebooks to Python library.ipynb'])
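
One possible sanity check on the result, sketched here under the assumption that the exported file should be plain, importable Python, is to parse it with the standard `ast` module and list its top-level functions. The helper `top_level_functions` is hypothetical, and the string `sample` merely stands in for the content of a generated file.

```python
import ast


def top_level_functions(source):
    """Return the names of the functions defined at module level."""
    tree = ast.parse(source)  # raises SyntaxError for broken output
    return [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]


# Stand-in for the text of an exported file.
sample = (
    "import json\n"
    "\n"
    "def file_cell(lines):\n"
    "    return False, lines\n"
    "\n"
    "def register_cell(dictionary, cell_lines):\n"
    "    return dictionary\n"
)
names = top_level_functions(sample)
print(names)
```

For a real check, the text would be read from the generated file, for instance with `pathlib.Path('exportnb.py').read_text(encoding='utf-8')`.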