# Scripted document conversion with embedded alternative versions

This example shows how to inline alternative versions or formats of a document (e.g. a PDF version for printing) when producing an HTML report. [Here](report.ipynb) is the report core. A quick look shows that it is composed of three cells, two of which are Markdown cells. In addition, the top cell carries the tag `alts` and holds the *named artifacts* designating alternative versions of the report for distinct modes of consumption.

## Step 1: produce the alternative versions

These embedded documents should not contain the links to alternatives -- how would they be resolved? Artifact inlining is not [Ouroboros](https://en.wikipedia.org/wiki/Ouroboros) instantiation. So we will take out the cell with those artifact links when rendering the embedded documents.

In [1]:
from pathlib import Path

from nbconvert import PDFExporter, NotebookExporter, HTMLExporter
from traitlets.config import Config

In [2]:
c_embed = Config()
c_embed.Exporter.preprocessors = ["nbconvert.preprocessors.TagRemovePreprocessor"]
c_embed.TagRemovePreprocessor.remove_cell_tags = ["alts"]
c_embed

{'Exporter': {'preprocessors': ['nbconvert.preprocessors.TagRemovePreprocessor']},
 'TagRemovePreprocessor': {'remove_cell_tags': ['alts']}}
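
To make the effect of this configuration concrete, here is a minimal sketch of tag-based cell removal on a notebook represented as a plain dictionary. It is not the `TagRemovePreprocessor` implementation: the helper name `drop_tagged_cells` and the toy notebook are made up for illustration, and the only thing taken from the notebook format is that cell tags live under `metadata.tags` of each cell.

```python
def drop_tagged_cells(nb: dict, tags_to_remove: set) -> dict:
    """Return a shallow copy of a notebook dict without cells carrying any of the given tags."""
    kept = [
        cell
        for cell in nb.get("cells", [])
        # A cell is dropped as soon as it shares one tag with tags_to_remove.
        if not tags_to_remove & set(cell.get("metadata", {}).get("tags", []))
    ]
    return {**nb, "cells": kept}


# Toy notebook (hypothetical content) mimicking the three-cell layout of the report.
nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {"tags": ["alts"]}, "source": "links"},
        {"cell_type": "markdown", "metadata": {}, "source": "body"},
        {"cell_type": "code", "metadata": {"tags": []}, "source": "1 + 1"},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

stripped = drop_tagged_cells(nb, {"alts"})
print([cell["source"] for cell in stripped["cells"]])
```

The original dictionary is left untouched, which mirrors the fact that the export below never modifies `report.ipynb` on disk.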

For PDF export, the default title corresponds to the stem of the notebook's file name, and there is no author in the LaTeX `\author` sense. To set the title and authors, one must edit the notebook metadata, from the right toolbar, and add the following JSON fields to the metadata object:

```json
"title": "The title of the document",
"authors": [{"name": "Author 1"}, {"name": "Author 2"}...]
```

The [notebook](report.ipynb) we are working with defines these, if you would like to take inspiration.
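
If editing the metadata by hand is inconvenient, the same two fields can be set from a script. The sketch below works on the notebook's JSON text with the standard library only. The helper name `set_title_and_authors` is hypothetical, and the field names are simply the ones shown above.

```python
import json


def set_title_and_authors(nb_json: str, title: str, authors: list) -> str:
    """Return notebook JSON text with `title` and `authors` set in the top-level metadata."""
    nb = json.loads(nb_json)
    metadata = nb.setdefault("metadata", {})
    metadata["title"] = title
    # Same shape as the JSON snippet above: one object with a "name" key per author.
    metadata["authors"] = [{"name": name} for name in authors]
    return json.dumps(nb, indent=1)


# Minimal stand-in for the text of a notebook file (hypothetical content).
minimal = '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}'
updated = json.loads(set_title_and_authors(minimal, "My report", ["Author 1", "Author 2"]))
print(updated["metadata"])
```

In practice the input string would come from reading the `.ipynb` file as UTF-8 text, and the result would be written back the same way.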

In [3]:
%%time
exporter_pdf = PDFExporter(config=c_embed)
pdf, _ = exporter_pdf.from_filename("report.ipynb")
type(pdf), len(pdf), pdf[:16]

CPU times: total: 438 ms
Wall time: 15.7 s


(bytes, 15908, b'%PDF-1.5\n%\xe4\xf0\xed\xf8\n1')

In [4]:
%%time
exporter_notebook = NotebookExporter(config=c_embed)
notebook, _ = exporter_notebook.from_filename("report.ipynb")
type(notebook), len(notebook), notebook[:16]

CPU times: total: 719 ms
Wall time: 733 ms


(str, 1330, '{\n "cells": [\n  ')

Our notebook is a JSON string; let's turn it into a `bytes` string. According to the [JSON specification](https://datatracker.ietf.org/doc/html/rfc7159), UTF-8 is the default encoding for JSON text, so that is the encoding we use.

In [5]:
notebook_bytes = notebook.encode("utf-8")
notebook_bytes[:16]

b'{\n "cells": [\n  '
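
As a quick sanity check on the encoding step, the sketch below uses a small made-up JSON string (not the report itself) to show that the UTF-8 bytes parse back to the same object, and that the byte length can exceed the character count once non-ASCII characters appear.

```python
import json

# Hypothetical sample with two non-ASCII characters in a metadata field.
sample = '{"cells": [], "metadata": {"title": "Résumé"}}'
sample_bytes = sample.encode("utf-8")

# json.loads accepts UTF-8 encoded bytes directly, so the round trip is lossless.
roundtrip_ok = json.loads(sample_bytes) == json.loads(sample)

# Each "é" takes two bytes in UTF-8, hence two extra bytes overall.
extra_bytes = len(sample_bytes) - len(sample)
print(roundtrip_ok, extra_bytes)
```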

## Step 2: put together the *master* document

The Python environment is well suited to configuring a more complex nbconvert preprocessing pipeline.

In [6]:
c_master = Config()
c_master.Exporter.preprocessors = ["nbconvert_inline_artifacts.ArtifactInlinePreprocessor"]
c_master.ArtifactInlinePreprocessor.artifacts = {
    "pdf": {"mime_type": "application/pdf", "content": pdf},
    "notebook": {"mime_type": "application/vnd.jupyter", "content": notebook_bytes}
}
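
Each artifact is described by a MIME type and raw bytes, which is exactly the information needed to build a `data:` URI, a common way of embedding binary content in an HTML page. Whether `ArtifactInlinePreprocessor` relies on this mechanism is an assumption on our part. The helper `to_data_uri` below is hypothetical and only illustrates the idea.

```python
import base64


def to_data_uri(mime_type: str, content: bytes) -> str:
    """Encode raw bytes as a base64 data URI usable as a link target in HTML."""
    payload = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


# Stand-in bytes for illustration; a real artifact would be the full PDF content.
uri = to_data_uri("application/pdf", b"%PDF-1.5\n")
header, _, payload = uri.partition(",")
print(header)
```

Decoding the payload with `base64.b64decode` gives back the original bytes, so nothing is lost by embedding a document this way.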

In [7]:
path_master = Path("report.html")

In [8]:
%%time
exporter_master = HTMLExporter(config=c_master)
# Export first, so that a failed conversion does not leave an empty report.html behind.
master, _ = exporter_master.from_filename("report.ipynb")
assert isinstance(master, str)
path_master.write_bytes(master.encode("utf-8"))

CPU times: total: 766 ms
Wall time: 762 ms


In [9]:
assert path_master.is_file()
path_master.stat().st_size

600305

For such mostly-text files, most of the file size stems from the styling content that nbconvert itself generates when converting a Jupyter notebook to an HTML file. However, as soon as a notebook contains significant binary content such as images or videos, one may expect the inlining of alternative versions to multiply the size of the master document, since each inlined version carries its own copy of that content. The author should be conscious of this trade-off and evaluate whether network connectivity or storage capacity costs the most in their computing environment and for their content distribution context.
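
The trade-off can be estimated with a back-of-the-envelope computation. The sketch below assumes the artifacts are inlined as base64 text, which is an assumption about the inlining mechanism rather than something verified here. Under that assumption, every 3 bytes of input become 4 characters of output. The sizes are the ones reported by the cells above.

```python
import math


def base64_size(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes bytes of input."""
    return 4 * math.ceil(n_bytes / 3)


# Sizes in bytes, taken from the outputs shown earlier in this example.
artifact_sizes = {"pdf": 15908, "notebook": 1330}
master_size = 600305

inlined = sum(base64_size(n) for n in artifact_sizes.values())
share = inlined / master_size
print(inlined, f"{share:.1%}")
```

With these numbers, the inlined artifacts would account for a few percent of the master document, the rest being mostly nbconvert's own styling. A notebook with large embedded images would shift that balance quickly.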