In [None]:
# !pip install papermill pandas hvplot

In [1]:
import papermill as pm
import glob

# Parameterization of Notebooks

Parameterizing Jupyter notebooks means making them flexible so you can easily change input values, like datasets, without having to edit the code each time. 
This is helpful when you need to run the same analysis or process with different data. 
It saves time and avoids mistakes because you don’t have to rewrite anything. 
Instead, you can just set the new inputs and run the notebook again for different tasks or data.

## Passing in Parameters To Notebooks With Papermill

Papermill helps with parameterizing Jupyter notebooks by allowing us to inject new inputs (parameters) into a notebook before running it. 
Parameters are declared with default values in a tagged cell of the template notebook, and when we run Papermill, it overrides those defaults with the values we provide.
Papermill then executes the entire notebook with the new inputs, saving the results in a new output notebook. 
This makes it easy to reuse the same notebook for different data or settings, automating tasks like batch processing, reporting, or experimentation with different variables.
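
To make the injection step concrete, here is a simplified sketch that uses a plain Python dictionary in place of a real notebook file. It only mimics the idea and is not Papermill's actual implementation: the helper name `inject_parameters` and the sample cell contents are made up for illustration. The tag names `parameters` and `injected-parameters` are the ones Papermill uses.

```python
import copy


def inject_parameters(notebook, parameters):
    """Return a copy of `notebook` with an extra cell that overrides the defaults.

    Hypothetical helper: it mimics the idea behind Papermill's injection,
    not its real code.
    """
    nb = copy.deepcopy(notebook)
    # Build the source of the new cell: one assignment per parameter.
    source = "\n".join(f"{name} = {value!r}" for name, value in parameters.items())
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
    }
    # Find the cell tagged `parameters` and insert the new cell right after it.
    for index, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            nb["cells"].insert(index + 1, injected)
            break
    else:
        # No tagged cell found: this sketch puts the overrides at the top.
        nb["cells"].insert(0, injected)
    return nb


# Made-up two-cell "notebook" with a default value for output_csv.
template = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": 'output_csv = "default.csv"',
        },
        {"cell_type": "code", "metadata": {}, "source": "print(output_csv)"},
    ]
}

result = inject_parameters(template, {"output_csv": "processed_run_1.csv"})
for cell in result["cells"]:
    print(cell["metadata"].get("tags", []), "|", cell["source"])
```

Because the injected cell runs after the tagged cell, its assignments override the defaults, and any parameter you do not pass keeps its default value.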

For this example we will use two notebooks: `notebook_1.ipynb`, which accesses data from a remote URL and prepares a processed CSV file, and `notebook_2.ipynb`, which needs the processed CSV file for analysis.

Run the code below to download the notebooks. Feel free to look through them.

In [2]:
import sys
sys.path.append('src')
import sciebo

sciebo.download_file('https://uni-bonn.sciebo.de/s/bHyVzeE8yGAzD5S', 'parameterization/notebook_1.ipynb')
sciebo.download_file('https://uni-bonn.sciebo.de/s/xS6Skp6IOO0DhDI', 'parameterization/notebook_2.ipynb')

Downloading parameterization/notebook_1.ipynb: 100%|██████████████████████████████████████| 4.92k/4.92k [00:00<?, ?B/s]
Downloading parameterization/notebook_2.ipynb: 100%|██████████████████████████████████████| 2.05k/2.05k [00:00<?, ?B/s]


**Example** In `notebook_1.ipynb`, tag the cell containing the variables below as `parameters`.

| **Parameter**      | **Description**                   | **Type**  |
|--------------------|-----------------------------------|-----------|
| `input_csv_url`    | URL of the input CSV file         | String    |
| `output_csv`       | Path to save the processed CSV    | String    |
| `num_rows_display` | Number of rows to display         | Integer   |

To tell Papermill which cell contains the parameters:

1. Put all parameters in a single cell, placed before any other cell that uses them
2. Click on the cell, then click the gear icon next to the notebook
3. Type `parameters` under Cell Tags

This has already been done for `notebook_1.ipynb`.
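
If you want to double-check the tagging without opening the notebook, the tag can be read straight from the `.ipynb` file, which is plain JSON. The sketch below is one way to do that with the standard library only. The function name `find_tagged_cells` and the throwaway demo notebook are assumptions for illustration and are not part of the downloaded notebooks.

```python
import json
import os
import tempfile


def find_tagged_cells(notebook_path, tag="parameters"):
    """Return the indices of the cells in a .ipynb file that carry `tag`."""
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    return [
        index
        for index, cell in enumerate(nb.get("cells", []))
        if tag in cell.get("metadata", {}).get("tags", [])
    ]


# Demo on a tiny throwaway notebook (made-up content, not notebook_1.ipynb).
demo_nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {"tags": ["parameters"]},
            "outputs": [],
            "source": ["num_rows_display = 5"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.ipynb")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump(demo_nb, f)
    tagged = find_tagged_cells(demo_path)

print(tagged)
```

Calling `find_tagged_cells('parameterization/notebook_1.ipynb')` on the downloaded file should return a non-empty list if the cell is tagged as described.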

In `notebook_2.ipynb`, tag the cell containing the variable below as `parameters`.

| **Parameter**      | **Description**                   | **Type**  |
|--------------------|-----------------------------------|-----------|
| `processed_csv`    | Path to the processed CSV file    | String    |


**Example** Run `notebook_1.ipynb`, specifying that the output should be called `processed_run_1.csv`.

In [4]:
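# Only output_csv is overridden; the other parameters keep their defaults.
params = dict(output_csv="processed_run_1.csv")
pm.execute_notebook(
    'parameterization/notebook_1.ipynb',  # input (template) notebook
    'output_notebook_run_1.ipynb',        # executed copy is saved here
    parameters=params
);  # trailing semicolon hides the returned notebook object in the cell output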

Executing: 100%|█████████████████████████████████████████████████████████████████████| 12/12 [00:03<00:00,  3.33cell/s]


When you open `output_notebook_run_1.ipynb`, you will see that it matches the input notebook, with one additional cell containing our injected parameters placed directly beneath the cell tagged `parameters`.

Run `notebook_1.ipynb` again, this time specifying that the output should be called `processed_run_2.csv`.

In [6]:
params = dict(output_csv="processed_run_2.csv")
pm.execute_notebook(
    'parameterization/notebook_1.ipynb',
    'output_notebook_run_2.ipynb',
    parameters=params
);

Executing: 100%|█████████████████████████████████████████████████████████████████████| 12/12 [00:03<00:00,  3.53cell/s]
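
The two runs above differ only in the run number, so they can be generated in a loop. The sketch below is one possible way to organise that. The helpers `build_runs` and `run_all` are hypothetical names, and the executor is passed in as an argument so the example can be dry-run with a stand-in function instead of actually executing notebooks.

```python
def build_runs(notebook, run_ids):
    """Create one (input, output notebook, parameters) triple per run id."""
    return [
        (
            notebook,
            f"output_notebook_run_{run_id}.ipynb",
            {"output_csv": f"processed_run_{run_id}.csv"},
        )
        for run_id in run_ids
    ]


def run_all(runs, execute):
    """Call `execute` once per run; `execute` is meant to be pm.execute_notebook."""
    for input_path, output_path, params in runs:
        execute(input_path, output_path, parameters=params)


runs = build_runs("parameterization/notebook_1.ipynb", [1, 2])

# Dry run with a stand-in executor that only records the calls.
calls = []
run_all(runs, lambda inp, out, parameters: calls.append((inp, out, parameters)))
for call in calls:
    print(call)
```

In this notebook, `run_all(runs, pm.execute_notebook)` would perform the real runs with the same arguments as the two cells above.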


Let's assume that we have a new dataset stored remotely at `https://uni-bonn.sciebo.de/s/W7BDPZefE53j7sy/download`.
We want to process that data in the same way with `notebook_1.ipynb`.

Run `notebook_1.ipynb`, specifying that the remote source is now `https://uni-bonn.sciebo.de/s/W7BDPZefE53j7sy/download`.

In [9]:
params = dict(input_csv_url="https://uni-bonn.sciebo.de/s/W7BDPZefE53j7sy/download")
pm.execute_notebook(
    'parameterization/notebook_1.ipynb',
    'parameterization/output_notebook_run_3.ipynb',
    parameters=params
);

Executing: 100%|█████████████████████████████████████████████████████████████████████| 12/12 [00:02<00:00,  4.03cell/s]
