## Introduction: `*.ipynb` and nbformat

###  `*.ipynb`

* JSON on-disk representation
* different versions 
    * current major version: 4
* each version defines a JSON schema

Straightforward questions: 

1. **Minimal structure** needed to meet the schema?
2. **Validate** a notebook against the schema?

### nbformat

Python library for simple programmatic notebook operations.

#### Minimal structure 

In [2]:
import nbformat
from nbformat.v4 import new_notebook
nb = new_notebook()
display(nb)

{'cells': [], 'metadata': {}, 'nbformat': 4, 'nbformat_minor': 2}

* `cells`: list
* `metadata`: dict
* `nbformat`, `nbformat_minor`: int, int
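
Because a notebook is plain JSON on disk, these four top-level keys can also be inspected without `nbformat`. The following is a minimal sketch using only the standard library; the string `raw_json` is a hand-written stand-in for the contents of a `*.ipynb` file, not something read from a real notebook.

```python
import json

# Hand-written stand-in for the contents of a minimal `*.ipynb` file.
raw_json = '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}'

# Parse the JSON text into ordinary Python dicts and lists.
notebook = json.loads(raw_json)

# The top-level keys and the (major, minor) format version.
top_level_keys = sorted(notebook)
version = (notebook["nbformat"], notebook["nbformat_minor"])

print(top_level_keys)
print(version)
```

Parsing with `json` alone says nothing about whether the notebook meets the schema; that is what validation is for.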

#### Validate

In [3]:
nbformat.validate(nb)

What happens if it's invalid?

In [4]:
nb.pizza = True
nbformat.validate(nb)

NotebookValidationError: Additional properties are not allowed ('pizza' was unexpected)

Failed validating 'additionalProperties' in notebook:

On instance:
{'cells': ['...0 cells...'],
 'metadata': {},
 'nbformat': 4,
 'nbformat_minor': 2,
 'pizza': True}
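
When a yes/no answer is more useful than a traceback, the exception can be caught. This is a sketch, assuming `nbformat.ValidationError` is the base class of the error shown above; `is_valid_notebook` is a hypothetical helper name, not part of `nbformat`.

```python
import nbformat
from nbformat.v4 import new_notebook


def is_valid_notebook(nb):
    """Hypothetical helper: return True if `nb` passes schema validation."""
    try:
        nbformat.validate(nb)
    except nbformat.ValidationError:
        return False
    return True


good = new_notebook()

bad = new_notebook()
bad.pizza = True  # extra top-level property, not allowed by the schema

ok_good = is_valid_notebook(good)
ok_bad = is_valid_notebook(bad)
print(ok_good, ok_bad)
```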

#### Cells and their `source`s

Before we can add cells, we need to create them.

* Three types of cells:
    * code_cell
    * markdown_cell
    * raw_cell

In [5]:
from nbformat.v4 import new_code_cell, new_markdown_cell, new_raw_cell

In [6]:
nb = new_notebook()
md = new_markdown_cell("First argument is the source.")
display(md)
nb.cells.append(md)
nbformat.validate(nb)

{'cell_type': 'markdown',
 'metadata': {},
 'source': 'First argument is the source.'}

#### Markdown cells
* `cell_type`: str, "markdown"
* `metadata`: dict
* `source`: str or list of strings

In [7]:
raw = new_raw_cell("Sources can be one (multi-line)\nstring.")
display(raw)

{'cell_type': 'raw',
 'metadata': {},
 'source': 'Sources can be one (multi-line)\nstring.'}

#### Raw cells
* `cell_type`: str, "raw"
* `metadata`: dict
* `source`: str or list of strings

In [8]:
code = new_code_cell(["#Sources can also be a list of strings.\n", "print('like this example')"])
display(code)

{'cell_type': 'code',
 'execution_count': None,
 'metadata': {},
 'outputs': [],
 'source': ['#Sources can also be a list of strings.\n',
  "print('like this example')"]}

#### Code cells
* `cell_type`: str, "code"
* `execution_count`: `None` or int
* `metadata`: dict
* `outputs`: list
* `source`: str or list of strings
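
Since `source` may arrive either as one string or as a list of strings, code that reads cells often normalizes it first. This is a minimal sketch using plain dicts in place of real cells; `source_as_text` is a hypothetical helper name, and it assumes the list form keeps each line's trailing newline, as in the example above.

```python
def source_as_text(cell):
    """Hypothetical helper: return a cell's source as a single string."""
    source = cell["source"]
    if isinstance(source, list):
        # List items already carry their own newlines, so join with "".
        return "".join(source)
    return source


# Plain dicts standing in for a code cell and a raw cell.
code_cell = {
    "cell_type": "code",
    "source": ["#Sources can also be a list of strings.\n",
               "print('like this example')"],
}
raw_cell = {"cell_type": "raw", "source": "One string.\nTwo lines."}

code_text = source_as_text(code_cell)
raw_text = source_as_text(raw_cell)
print(code_text)
print(raw_text)
```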

## Creating outputs

`new_output` takes the output type as its first argument:

In [None]:
from nbformat.v4 import new_output

In [21]:
output_stream = new_output("stream")
display(output_stream)

{'name': 'stdout', 'output_type': 'stream', 'text': ''}

In [22]:
output_disp = new_output("display_data")
display(output_disp)

{'data': {}, 'metadata': {}, 'output_type': 'display_data'}

In [23]:
output_ex = new_output("execute_result", execution_count=None)
display(output_ex)

{'data': {},
 'execution_count': None,
 'metadata': {},
 'output_type': 'execute_result'}

In [25]:
output_err = new_output("error", 
                        ename="ErrorName", 
                        evalue="Error message",
                        traceback=["Error traceback", "as an array of", "strings"]
                       )
display(output_err)

{'ename': 'ErrorName',
 'evalue': 'Error message',
 'output_type': 'error',
 'traceback': ['Error traceback', 'as an array of', 'strings']}
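
The outputs created so far are not attached to anything. Outputs live in the `outputs` list of a code cell, so a sketch of putting the pieces together might look like this; the cell source and the stream text are made-up illustration values.

```python
import nbformat
from nbformat.v4 import new_notebook, new_code_cell, new_output

# A stream output with illustrative text, overriding the empty default.
stream = new_output("stream", name="stdout", text="hello\n")

# Attach the output to a code cell through its `outputs` list.
cell = new_code_cell("print('hello')", execution_count=1, outputs=[stream])

nb = new_notebook()
nb.cells.append(cell)

# Raises if the assembled notebook does not meet the schema.
nbformat.validate(nb)

first_output = nb.cells[0].outputs[0]
print(first_output.output_type, repr(first_output.text))
```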

#### Note on display_data

Each `display_data` output can carry several mimetypes in its `data` dict. See the `IPython` display machinery for more info.
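
As a small sketch of what that looks like, `new_output` accepts a `data` dict keyed by mimetype. The two representations below are made-up illustration values.

```python
from nbformat.v4 import new_output

# One output, two representations of the same content.
output_multi = new_output(
    "display_data",
    data={
        "text/plain": "bold text",
        "text/html": "<b>bold text</b>",
    },
)

mimetypes = sorted(output_multi.data)
print(mimetypes)
```

A frontend can then pick whichever representation it is able to render.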