## Reporter - a reporting tool for Data Science Projects
This notebook is part of a set of standardized data science tools I am creating. It is intended to automate data science projects and their reporting.
It handles the execution, parametrization, and export of multiple notebooks.

This code can easily be adapted to the needs of a data science project: querying data, raising errors, configuring how notebooks are merged, etc.

**Note**: because a notebook cannot reliably determine its own directory, notebooks executed by Reporter run from the current working directory of reporter.ipynb. This can cause problems when your notebooks read or write data.

This can be overcome in several ways:
- use hardcoded paths: this leaves you free to structure your project as you like (not recommended)
- pass paths as parameters to your notebooks (the approach used in this notebook)
- use relative paths and place Reporter at the same level in the tree structure as the notebooks you are executing (the temp folder can be located anywhere):

***Project folder***

    - Prediction model

        - model.ipynb

    - Dashboard

        - dashboard.ipynb

    - Reporter

        - reporter.ipynb 
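
The second option relies on papermill's parameter injection: papermill looks for a cell tagged `parameters` in the target notebook and inserts a new cell right after it that overrides the defaults. Below is a minimal sketch of how an executed notebook could consume such a parameter. The default value `"data"` and the file name `dataset.csv` are hypothetical and only serve the illustration.

```python
from pathlib import Path

# --- cell tagged "parameters" in the executed notebook ---
# Default used when the notebook is run by hand. When Reporter runs the
# notebook, papermill overrides this value with the path it was given.
DATA_FOLDER = "data"  # hypothetical default

# --- regular cell ---
# Build every input/output path from the parameter rather than from
# the current working directory.
data_dir = Path(DATA_FOLDER).resolve()
dataset_path = data_dir / "dataset.csv"  # hypothetical file name
print(dataset_path)
```

With this layout the notebook behaves the same whether it is opened directly or executed by Reporter, as long as the caller passes a valid folder.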

### I - Configuration

In [166]:
import os
import sys
from pathlib import Path
import shutil
import papermill as pm

# This project also uses nbmerge, called as a shell command (!nbmerge)

In [167]:
# Project directory: the project_example folder under the current working directory
PROJECT_DIR = Path.cwd().joinpath('project_example')
# Where the final report will be saved
REPORT_DIR = Path.cwd()
# Where the data is stored
DATA_FOLDER = PROJECT_DIR.joinpath('data')
# Setting TEMP_FOLDER to None will execute notebooks in place
TEMP_FOLDER = REPORT_DIR.joinpath('temp')

delete_temp = True

print('Project directory: {}'.format(PROJECT_DIR))

Project directory: /Users/mathisderenne/Mon Drive/GitHub/DS - Reporting/project_example


In [168]:
# Raise error if project_dir does not exist
if not PROJECT_DIR.exists():
    raise ValueError('Project directory does not exist: {}'.format(PROJECT_DIR))

# Create data folder and temp folder if they do not exist
if DATA_FOLDER and not DATA_FOLDER.exists():
    print('Creating data folder: {}'.format(DATA_FOLDER))
    DATA_FOLDER.mkdir()
if TEMP_FOLDER is not None and not TEMP_FOLDER.exists():
    print('Creating temp folder: {}'.format(TEMP_FOLDER))
    TEMP_FOLDER.mkdir()
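
The folder checks above can also be folded into one small helper. This is only a sketch: `ensure_dir` is a hypothetical name, and it uses `Path.mkdir` with `parents=True` and `exist_ok=True`, so nested folders are created and repeated runs do not fail.

```python
import tempfile
from pathlib import Path


def ensure_dir(path):
    """Create `path` (and any missing parents) if needed and return it as a Path."""
    path = Path(path)
    # parents=True creates intermediate folders; exist_ok=True makes the call idempotent
    path.mkdir(parents=True, exist_ok=True)
    return path


# Usage on a throwaway location
with tempfile.TemporaryDirectory() as root:
    nested = ensure_dir(Path(root) / "reports" / "temp")
    ensure_dir(nested)  # calling it again changes nothing
    print(nested.is_dir())
```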

In [169]:
# List of Notebooks to execute, parameterize, and report
# folder_name can be set to None or a path to a folder relative to PROJECT_DIR 
NOTEBOOKS = [
    {
        'notebook_name' : 'generating_data.ipynb',
        'folder_name' : PROJECT_DIR,
        'param' : {'DATA_FOLDER' : str(DATA_FOLDER)}
    },
    {
        'notebook_name' : 'EDA.ipynb',
        'folder_name' : PROJECT_DIR,
        'param' : {'DATA_FOLDER' : str(DATA_FOLDER)}
    }
]
    
for notebook in NOTEBOOKS:
    if notebook['folder_name'] is None:
        notebook['notebook_path'] = PROJECT_DIR.joinpath(notebook['notebook_name'])
    else:
        notebook['notebook_path'] = PROJECT_DIR.joinpath(notebook['folder_name'], notebook['notebook_name'])

for nb in NOTEBOOKS: 
    print(f"{nb['notebook_name']} at {nb['notebook_path']}")
    # Raise error if notebook does not exist
    if not nb['notebook_path'].exists():
        raise ValueError(f"Notebook {nb['notebook_name']} does not exist: {nb['notebook_path']}")

generating_data.ipynb at /Users/mathisderenne/Mon Drive/GitHub/DS - Reporting/project_example/generating_data.ipynb
EDA.ipynb at /Users/mathisderenne/Mon Drive/GitHub/DS - Reporting/project_example/EDA.ipynb
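
The configuration above sets `folder_name` to `PROJECT_DIR`, an absolute path, even though the comment describes it as relative. This works because `joinpath` discards the base when a later segment is absolute. The sketch below illustrates that behaviour with `PurePosixPath`, so it gives the same result on any operating system. The function name and the `/work/project_example` location are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_notebook_path(project_dir, notebook_name, folder_name=None):
    """Return the notebook path, treating folder_name as optional."""
    base = PurePosixPath(project_dir)
    if folder_name is None:
        return base / notebook_name
    # If folder_name is absolute, joinpath drops `base` and keeps folder_name
    return base.joinpath(folder_name, notebook_name)


project = "/work/project_example"  # hypothetical location
print(resolve_notebook_path(project, "EDA.ipynb"))
# → /work/project_example/EDA.ipynb
print(resolve_notebook_path(project, "EDA.ipynb", "notebooks"))
# → /work/project_example/notebooks/EDA.ipynb
print(resolve_notebook_path(project, "EDA.ipynb", project))
# → /work/project_example/EDA.ipynb
```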


### II - Execution

**Process :** 
1. Copy notebooks to the temp folder, if one is set
2. Execute notebook with parameters

In [170]:
if TEMP_FOLDER is not None:
    for nb in NOTEBOOKS:
        # Copy notebooks to temp folder
        shutil.copy(nb['notebook_path'], TEMP_FOLDER)
        # Update notebook path to temp folder
        nb['notebook_path'] = TEMP_FOLDER.joinpath(nb['notebook_name'])
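
The copy step can be written as a function that returns the new paths instead of updating the `NOTEBOOKS` entries in place. This is a sketch under that assumption, and `stage_notebooks` is a hypothetical name.

```python
import shutil
import tempfile
from pathlib import Path


def stage_notebooks(notebook_paths, temp_folder):
    """Copy each notebook into temp_folder and return the new paths.

    When temp_folder is None the original paths are returned unchanged,
    which corresponds to executing the notebooks in place.
    """
    if temp_folder is None:
        return [Path(p) for p in notebook_paths]
    temp_folder = Path(temp_folder)
    temp_folder.mkdir(parents=True, exist_ok=True)
    # shutil.copy returns the path of the newly created file
    return [Path(shutil.copy(p, temp_folder)) for p in notebook_paths]


# Usage with throwaway files
with tempfile.TemporaryDirectory() as root:
    source = Path(root) / "EDA.ipynb"
    source.write_text("{}")
    staged = stage_notebooks([source], Path(root) / "temp")
    print(staged[0].name, staged[0].exists(), source.exists())
```

Note that two notebooks with the same file name in different folders would overwrite each other in the temp folder. The same applies to the loop above.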

In [171]:
# Execute notebooks
for nb in NOTEBOOKS:
    print(f"Executing notebook: {nb['notebook_name']}")
    pm.execute_notebook(nb['notebook_path'], nb['notebook_path'], parameters=nb['param'], progress_bar=True)

Black is not installed, parameters wont be formatted


Executing notebook: generating_data.ipynb


Executing:   0%|          | 0/6 [00:00<?, ?cell/s]

Unable to load extension: pydevd_plugins.extensions.types.pydevd_plugin_pandas_types
Black is not installed, parameters wont be formatted


Executing notebook: EDA.ipynb


Executing:   0%|          | 0/10 [00:00<?, ?cell/s]

Unable to load extension: pydevd_plugins.extensions.types.pydevd_plugin_pandas_types
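
The loop above stops at the first notebook that fails. If you would rather run everything and review the failures afterwards, one option is to collect the exceptions. The sketch below takes the executor as an argument so it can be demonstrated without papermill. `run_notebooks` and `fake_execute` are hypothetical names.

```python
def run_notebooks(notebooks, execute):
    """Run every notebook, collecting failures instead of stopping at the first.

    `execute` is any callable with the signature
    execute(input_path, output_path, parameters=...).
    """
    failures = {}
    for nb in notebooks:
        try:
            execute(nb["notebook_path"], nb["notebook_path"], parameters=nb["param"])
        except Exception as exc:  # broad on purpose: record it, decide later
            failures[nb["notebook_name"]] = exc
    return failures


# Demonstration with a stand-in executor (no notebook is actually run)
def fake_execute(input_path, output_path, parameters=None):
    if "broken" in str(input_path):
        raise RuntimeError("cell failed")


demo = [
    {"notebook_name": "ok.ipynb", "notebook_path": "ok.ipynb", "param": {}},
    {"notebook_name": "broken.ipynb", "notebook_path": "broken.ipynb", "param": {}},
]
print(sorted(run_notebooks(demo, fake_execute)))
# → ['broken.ipynb']
```

In this notebook, `execute` would be `pm.execute_notebook`, and the caller could raise once the loop has finished if `failures` is not empty.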


### III - Reporting

**Process :**
1. Merge the executed versions of the notebooks
2. Export the merged notebook to HTML

In [172]:
import nbformat
import nbconvert as nbc
from nbconvert import HTMLExporter
from traitlets.config import Config
from nbconvert.preprocessors import TagRemovePreprocessor
# import functions.main
from functions.main import TagFilterPreprocessor

In [173]:
# Merge notebooks
notebook_paths = " ".join(['"{}"'.format(nb['notebook_path']) for nb in NOTEBOOKS])
merged_nb_path = TEMP_FOLDER.joinpath('merged_notebook.ipynb')
output_path = '"{}"'.format(str(merged_nb_path))

print("Merging notebooks as merged_notebook.ipynb in TEMP_FOLDER")

!nbmerge {notebook_paths} -o {output_path}

if not merged_nb_path.exists():
    raise ValueError('Merged notebook does not exist: {}'.format(merged_nb_path))

Merging notebooks as merged_notebook.ipynb in TEMP_FOLDER
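
To make the merge step concrete: a notebook file is a JSON document whose `cells` key holds the list of cells, so a basic merge amounts to concatenating those lists. The sketch below works on notebooks already loaded as dictionaries. It is an illustration of the idea, not a description of how nbmerge is implemented, and `merge_notebooks` is a hypothetical name.

```python
import copy


def merge_notebooks(notebooks):
    """Concatenate the cells of several nbformat-4 notebooks given as dicts."""
    if not notebooks:
        raise ValueError("No notebooks to merge")
    # Keep the first notebook's metadata and format version
    merged = copy.deepcopy(notebooks[0])
    for nb in notebooks[1:]:
        merged["cells"].extend(copy.deepcopy(nb["cells"]))
    return merged


first = {
    "cells": [{"cell_type": "markdown", "metadata": {}, "source": "# Data"}],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
}
second = {
    "cells": [{"cell_type": "markdown", "metadata": {}, "source": "# EDA"}],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
}
merged = merge_notebooks([first, second])
print([cell["source"] for cell in merged["cells"]])
# → ['# Data', '# EDA']
```

Real files would first be loaded with `json.load` or `nbformat.read`, and written back once merged.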


In [175]:

def export_nb(nb_path):
    # Read merged_notebook
    with open(nb_path) as f:
        print(f"Reading merged notebook: {nb_path}")
        nb = nbformat.read(f, as_version=4)
    
    # Notebook Preprocessing and Exporter Configuration
    c = Config()

    # remove input cells
    c.TemplateExporter.exclude_input = True

    # exclude cells that do not carry one of these tags
    c.TagFilterPreprocessor.allowed_tags = ['output']
    # you may want to create tags for logging purposes

    exporter = HTMLExporter(template_name='classic', config=c)
    exporter.register_preprocessor(TagFilterPreprocessor(allowed_tags=c.TagFilterPreprocessor.allowed_tags, config=c), True)

    (body, resources) = exporter.from_notebook_node(nb)

    # Notebook reporting
    with open("report.html", "w", encoding="utf-8") as f:
        print("Writing report.html")
        f.write(body)

    return

export_nb(merged_nb_path)

Reading merged notebook: /Users/mathisderenne/Mon Drive/GitHub/DS - Reporting/temp/merged_notebook.ipynb
Writing report.html
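
`TagFilterPreprocessor` comes from this project's `functions.main` module, whose source is not shown here. Assuming it keeps only the cells that carry at least one of the allowed tags, the filtering logic can be sketched on a notebook loaded as a plain dictionary. `filter_cells_by_tags` is a hypothetical name, and the real preprocessor may behave differently.

```python
def filter_cells_by_tags(notebook, allowed_tags):
    """Return a shallow copy of `notebook` keeping only cells with an allowed tag."""
    allowed = set(allowed_tags)
    kept = [
        cell for cell in notebook["cells"]
        # Cell tags are stored under metadata.tags; untagged cells have none
        if allowed & set(cell.get("metadata", {}).get("tags", []))
    ]
    return {**notebook, "cells": kept}


notebook = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["output"]}, "source": "plot()"},
        {"cell_type": "code", "metadata": {"tags": ["log"]}, "source": "print('debug')"},
        {"cell_type": "code", "metadata": {}, "source": "x = 1"},
    ]
}
report = filter_cells_by_tags(notebook, ["output"])
print([cell["source"] for cell in report["cells"]])
# → ['plot()']
```

In the executed notebooks, this means every cell that should appear in the report needs the `output` tag.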
