# jupyter2blog: Notebook basics with nbformat
> A post on how we can use `nbformat` to create our own notebook file. We will learn what a notebook file actually is and how we can work with `NotebookNode` objects.

- tags: jupyter, nbformat, notebook
- category: jupyter2blog

The goal of the jupyter2blog series is to use Jupyter notebooks as a source for a static website generator.
For all future work, the first step will _always_ be to parse a notebook.
Then we can apply transformations and export the notebook into a different format.

So before we try to apply any transformations to our source notebooks, we need to understand the notebook format itself a _little_ better.
The main tool for loading (and dynamically creating) notebooks, or `ipynb` files, is the [nbformat](https://nbformat.readthedocs.io/en/latest/) library.
The library also supports fast validation and migration between format versions.
<!-- END_TEASER -->

Let's start by simply creating a notebook _object_, or to use the Jupyter terminology, a `NotebookNode`.

> Warning: This post uses the notebook specification `>=v4.5`.

In [3]:
import nbformat
from pprint import pprint

nbformat.__version__

'5.1.3'

To create a notebook, we can call the `new_notebook()` function.
The result of the function is different for every notebook specification version.
The target specification is selected by calling the function from the matching submodule, for example `nbformat.v4`.
The following code lists all of the available specification versions.

In [4]:
notebook_spec_versions = [f"v{v_number}" for v_number in nbformat.versions.keys()]
pprint(notebook_spec_versions)

['v1', 'v2', 'v3', 'v4']


In [5]:
nb = nbformat.v4.new_notebook()
nb

{'nbformat': 4, 'nbformat_minor': 5, 'metadata': {}, 'cells': []}

In [6]:
# hide
assert "metadata" in nb
assert "cells" in nb

Now we have created our first valid notebook! 🎉
We can see that the empty notebook is rather boring.
But we learn quite a few properties of our notebook simply by looking at the string representation.

1. Under the hood, an `ipynb` file is nothing more than a [JSON](https://en.wikipedia.org/wiki/JSON) file.
1. Each notebook records which JSON schema (notebook specification version) was used to create the file.
    - This allows the `nbformat` library to stay backwards compatible and to convert the _original_ notebook on the fly to a newer (or older!) version.
1. Global metadata is stored in the `metadata` field.
1. The actual _cells_ that we see on our screens are defined in `cells`.
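
The points above can be checked with a few lines of code. The following is a minimal sketch, assuming an `nbformat` 5.x installation: it serializes an empty notebook with `nbformat.writes`, parses the resulting string with the standard `json` module, and runs `nbformat.validate` on the node.

```python
import json

import nbformat

nb = nbformat.v4.new_notebook()

# Serialize the notebook node to the string that would end up in the .ipynb file
text = nbformat.writes(nb)

# The serialized notebook is plain JSON, so the standard library can parse it
data = json.loads(text)
print(sorted(data))  # ['cells', 'metadata', 'nbformat', 'nbformat_minor']

# The schema version is stored inside the document itself
print(data["nbformat"], data["nbformat_minor"])

# validate() raises nbformat.ValidationError if the node does not match the schema
nbformat.validate(nb)
```

The printed minor version depends on the installed library version, so it may differ from the one shown in this post.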

> Tip: For me it was confusing that the current library version is >4 but the highest `vX` module (currently) is `v4`. The reason is that the _JSON schema_ that defines how notebooks are encoded is _independent_ of the library version! 

There are three types of cells:
- Markdown
- Code
- Raw

In [7]:
md_cell = nbformat.v4.new_markdown_cell("# This is a markdown h1 header")
md_cell

{'id': '51a32702',
 'cell_type': 'markdown',
 'source': '# This is a markdown h1 header',
 'metadata': {}}

The structure of a cell is straightforward:

- `id`: Unique cell id
    - Added in `nbformat=4.5`
    - Ensures that cells can be referenced across a notebook's lifetime
    - See [JEP62](https://github.com/jupyter/enhancement-proposals/blob/master/62-cell-id/cell-id.md) for more information
- `cell_type`: The type of the cell (`markdown`, `code`, or `raw`)
- `source`: The actual source/code/text of the cell
- `metadata`: Extra **cell-specific** metadata
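
To make these fields a bit more tangible, here is a small sketch, assuming `nbformat>=5.1` (older versions do not generate an `id`). It shows that a cell can be read both like a dict and through attributes, that two cells with the same text still get different ids, and that cell metadata can be passed at creation time. The `tags` key is only an illustrative choice for the metadata content.

```python
import nbformat

first = nbformat.v4.new_markdown_cell("same text")
second = nbformat.v4.new_markdown_cell("same text")

# A NotebookNode is a dict that also supports attribute access
print(first["cell_type"], first.cell_type)

# Ids are generated per cell, so identical content still yields distinct ids
print(first.id != second.id)

# Cell-specific metadata can be supplied as a keyword argument
# ("tags" is just an example key chosen for this sketch)
tagged = nbformat.v4.new_markdown_cell("draft text", metadata={"tags": ["draft"]})
print(tagged.metadata.tags)
```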

A _raw_ cell looks exactly like a markdown cell, with the only difference being the value of the `cell_type` option:

In [8]:
raw_cell = nbformat.v4.new_raw_cell("# This is a raw cell")
raw_cell

{'id': 'dba2c85e',
 'cell_type': 'raw',
 'source': '# This is a raw cell',
 'metadata': {}}

So why does it exist?

The reason why we need a _raw_ cell is to allow interfaces or exporters to know that the cell should be parsed _as is_ and should not be interpreted as a markdown cell.
Instead of raw, we could also think of it as _plain text_.

Try it yourself: in your preferred notebook interface (`jupyterlab`, for example), enter `# hello` into a markdown cell and into a raw cell and compare the results.
Since `#` marks an `<h1>`/title heading in markdown, the markdown cell is rendered as a heading.
The _raw_ cell keeps the text unchanged.
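
The difference can also be sketched in code. The `render_cell` function below is a hypothetical helper written only for this illustration, not part of `nbformat` or any exporter. It handles just a leading `# ` heading, whereas a real exporter would use a full Markdown renderer.

```python
import html

import nbformat


def render_cell(cell):
    """Hypothetical helper: turn a single cell into a crude HTML string."""
    if cell.cell_type == "markdown":
        # Toy Markdown handling: only a leading "# " heading is recognized
        if cell.source.startswith("# "):
            return f"<h1>{html.escape(cell.source[2:])}</h1>"
        return f"<p>{html.escape(cell.source)}</p>"
    if cell.cell_type == "raw":
        # Raw cells are passed through as is
        return cell.source
    # Everything else (code cells) is shown as preformatted text
    return f"<pre>{html.escape(cell.source)}</pre>"


md = nbformat.v4.new_markdown_cell("# hello")
raw = nbformat.v4.new_raw_cell("# hello")

print(render_cell(md))   # <h1>hello</h1>
print(render_cell(raw))  # # hello
```

Both cells carry the same `source`. Only the `cell_type` decides how the text is treated.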

The last cell type is the _code_ cell.
The code cell has a bit more information:

In [9]:
code_cell = nbformat.v4.new_code_cell("# This is Python comment")
code_cell

{'id': '73f40b4d',
 'cell_type': 'code',
 'metadata': {},
 'execution_count': None,
 'source': '# This is Python comment',
 'outputs': []}

The code cell additionally contains:
- `execution_count`: The execution number we see next to the code cell in a Jupyter interface
    - The number reflects the order in which the cell was last executed, not a timestamp
- `outputs`: The output we see under the code cell after its last execution

All of these keys can also be set through keyword arguments when the cell is created.
The execution count is a simple integer.
The `outputs` field, however, has a special format that is out of scope for this article.
If you are interested, you can read more about how the `outputs` field is defined and used under the hood in the [nbformat documentation](https://nbformat.readthedocs.io/en/latest/format_description.html#code-cell-outputs).

In [10]:
code_cell_with_count = nbformat.v4.new_code_cell("# Python comment", execution_count=42)

We can add all of these cells to our initial empty notebook node.

In [11]:
# collapse_output
cells = [md_cell, raw_cell, code_cell, code_cell_with_count]
# remember nb.cells is simply a normal list
nb.cells.extend(cells)
pprint(nb)

{'cells': [{'cell_type': 'markdown',
            'id': '51a32702',
            'metadata': {},
            'source': '# This is a markdown h1 header'},
           {'cell_type': 'raw',
            'id': 'dba2c85e',
            'metadata': {},
            'source': '# This is a raw cell'},
           {'cell_type': 'code',
            'execution_count': None,
            'id': '73f40b4d',
            'metadata': {},
            'outputs': [],
            'source': '# This is Python comment'},
           {'cell_type': 'code',
            'execution_count': 42,
            'id': 'd5c2e6d4',
            'metadata': {},
            'outputs': [],
            'source': '# Python comment'}],
 'metadata': {},
 'nbformat': 4,
 'nbformat_minor': 5}


Since we have a notebook with content, let's export it!

In [12]:
# hide
from pathlib import Path

# this is the location when run from the nikola blog exporter
tmp_location = Path.cwd() / "files"
# this is the location when run from jupyterlab
if not tmp_location.exists():
    tmp_location = Path.cwd() / ".." / "files"

tmp_location = tmp_location.resolve()
assert tmp_location.exists()


In [13]:
# change tmp_location to your desired path or simply delete until "_out.ipynb"
path = tmp_location / "_out.ipynb"
nbformat.write(nb, path)

You can then open the notebook with your favorite notebook interface and see if it looks like you expect!
If you would like to read the notebook back and change any of its contents, you simply need to use `nbformat.read`.
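
A minimal round-trip sketch of that workflow could look as follows. It writes to a temporary directory so that it does not depend on the `files` folder used in this post. The file name `roundtrip.ipynb` is made up for the example, and `as_version=4` simply asks for the notebook in the v4 specification.

```python
import tempfile
from pathlib import Path

import nbformat

nb = nbformat.v4.new_notebook()
nb.cells.append(nbformat.v4.new_markdown_cell("# Draft title"))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "roundtrip.ipynb"

    # Write the notebook, read it back, edit a cell, and write it again
    nbformat.write(nb, path)
    loaded = nbformat.read(path, as_version=4)
    loaded.cells[0].source = "# Final title"
    nbformat.write(loaded, path)

    # Read once more to confirm that the change was persisted
    reloaded = nbformat.read(path, as_version=4)

print(reloaded.cells[0].source)  # # Final title
```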


Here is what the generated output (`_out.ipynb`) {{% Footnote %}}This inline notebook rendering is possible due to the amazing work from the [jupyter nbviewer](https://nbviewer.org/) team! {{% /Footnote %}} looks like:

{{% NBViewer title="Generated NB" %}}_out.ipynb{{% /NBViewer %}}



To ensure that your code stays compatible with future updates to the notebook specification, you are _required_ to provide the desired **target** specification when reading a notebook.
The `nbformat` library will then read the notebook using its original specification and convert it to the desired version.

> Note: Remember that the notebook file itself contains an entry about what notebook schema is used!
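
As a small sketch of what this note means in practice, the stored schema version can be read straight from the JSON file. If no conversion is wanted at all, `nbformat.NO_CONVERT` can be passed as the target version. The temporary file name `schema_check.ipynb` is only an assumption for this example.

```python
import json
import tempfile
from pathlib import Path

import nbformat

nb = nbformat.v4.new_notebook()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "schema_check.ipynb"
    nbformat.write(nb, path)

    # The schema version is part of the file content
    raw = json.loads(path.read_text(encoding="utf-8"))

    # NO_CONVERT keeps the notebook in whatever version the file declares
    unconverted = nbformat.read(path, as_version=nbformat.NO_CONVERT)

print(raw["nbformat"], raw["nbformat_minor"])
print(unconverted.nbformat == raw["nbformat"])
```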

In [15]:
# collapse_output
# Note that it is automatically converted to the old v3 specification!
v3_nb = nbformat.read(path, as_version=3) 
# If you like you can take a look at the older v3 notebook schema:
pprint(v3_nb)

{'metadata': {'name': ''},
 'nbformat': 3,
 'nbformat_minor': 0,
 'orig_nbformat': 4,
 'orig_nbformat_minor': 5,
 'worksheets': [{'cells': [{'cell_type': 'heading',
                            'level': 1,
                            'metadata': {},
                            'source': 'This is a markdown h1 header'},
                           {'cell_type': 'raw',
                            'metadata': {},
                            'source': '# This is a raw cell'},
                           {'cell_type': 'code',
                            'collapsed': False,
                            'input': '# This is Python comment',
                            'language': 'python',
                            'metadata': {},
                            'outputs': [],
                            'prompt_number': None},
                           {'cell_type': 'code',
                            'collapsed': False,
                            'input': '# Python comment',
                    

Congratulations! 🎉

Now you know the Jupyter notebook basics!
Using the `nbformat` library you know how to:
- read notebooks
- dynamically create notebooks
- convert between notebook specification versions

In the next post, we will take a closer look at the next important Jupyter library: `nbconvert`.