In [None]:
# !pip install papermill

In [50]:
import papermill as pm
import glob

## Parameterization of Jupyter Notebooks with Papermill

Parameterizing Jupyter notebooks means making them flexible so you can easily change input values, like datasets, without having to edit the code each time. This is helpful when you need to run the same analysis or process with different data. It saves time and avoids mistakes because you don’t have to rewrite anything. Instead, you can just set the new inputs and run the notebook again for different tasks or data.

## Passing in Parameters To Jupyter Notebooks

Papermill helps with parameterizing Jupyter notebooks by allowing you to inject new inputs (parameters) into a notebook before running it. You define placeholders for these parameters in the notebook, and when you run Papermill, it fills those placeholders with the actual values you provide. Papermill then executes the entire notebook with the new inputs, saving the results in a new output notebook. This makes it easy to reuse the same notebook for different data or settings, automating tasks like batch processing, reporting, or experimentation with different variables.

For this example we will use two notebooks: `notebook_1.ipynb` (reads a CSV from a remote URL and writes a processed CSV) and `notebook_2.ipynb` (reads the processed CSV for analysis).

We will parameterize the following:

**Notebook 1**

| **Parameter**   | **Description**                   | **Type**  |
|-----------------|-----------------------------------|-----------|
| `input_csv_url`     | URL of the input CSV file    | String    |
| `output_csv`    | Path to save the processed CSV    | String    |
| `num_rows_display`    | Number of rows to display    | Integer    |

| **Output**      | **Description**                   | **Type**  |
|-----------------|-----------------------------------|-----------|
| `Processed CSV` | CSV file with processed data      | CSV File  |



**Notebook 2**

| **Parameter**      | **Description**                   | **Type**  |
|--------------------|-----------------------------------|-----------|
| `processed_csv`    | Path to the processed CSV file    | String    |


**Example** Tell Papermill which cell contains the parameters
1. Put all parameters in a single cell, before any cell that uses them
2. Click on the cell and then on the gear icon next to the notebook
3. Type `parameters` under Cell Tags

We have done that for `notebook_1`. Can you do the same for `notebook_2`?
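One way to check that a cell really carries the `parameters` tag is to look at the notebook file itself, since an `.ipynb` file is JSON and cell tags live under each cell's `metadata`. The sketch below uses a small hand-written notebook (its content is hypothetical, not the actual `notebook_2.ipynb`) and a helper named `find_tagged_cells`, which is an illustrative name and not part of Papermill.

```python
import json

# A tiny hand-written notebook in the nbformat v4 JSON layout (hypothetical content).
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis"]},
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": ["processed_csv = 'processed.csv'"],
            "outputs": [],
            "execution_count": None,
        },
    ],
})


def find_tagged_cells(nb_text, tag="parameters"):
    """Return the indices of cells whose metadata carries the given tag."""
    nb = json.loads(nb_text)
    return [
        idx
        for idx, cell in enumerate(nb["cells"])
        if tag in cell.get("metadata", {}).get("tags", [])
    ]


tagged = find_tagged_cells(notebook_json)
print(tagged)
```

For a real notebook, the same helper could be given the text read from the file, for example `open("papermill_workflow/notebook_2.ipynb", encoding="utf-8").read()`. An empty list would suggest the tag has not been set yet.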

**Example** Execute papermill injecting output_csv called "processed_1.csv" 

In [6]:
params = dict(output_csv = "processed_1.csv")
pm.execute_notebook(
    'papermill_workflow/notebook_1.ipynb',
    'papermill_workflow/output_notebook_1.ipynb',
    parameters = params
);

Executing: 100%|███████████████████████████████████████████████████████████████| 10/10 [00:02<00:00,  3.49cell/s]


When you open output_notebook_1.ipynb, you will see that it is the same as the input notebook, plus one additional cell that contains our injected parameters, placed right beneath the cell tagged `parameters`.
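The effect of that extra cell can be sketched in plain Python. The default values below are hypothetical stand-ins (the real defaults in `notebook_1.ipynb` are not shown here); the point is only that the injected assignment runs after the defaults and therefore wins, while parameters that were not injected keep their defaults.

```python
# --- cell tagged "parameters" (defaults; these values are hypothetical) ---
input_csv_url = "https://example.com/default.csv"
output_csv = "processed.csv"
num_rows_display = 5

# --- cell added by Papermill directly below, tagged "injected-parameters" ---
# Parameters
output_csv = "processed_1.csv"

# --- any later cell sees the injected value; untouched defaults remain ---
print(input_csv_url, output_csv, num_rows_display)
```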

Execute papermill injecting output_csv called "processed_1_new.csv"

In [8]:
params = dict(output_csv = "processed_1_new.csv")
pm.execute_notebook(
    'papermill_workflow/notebook_1.ipynb',
    'papermill_workflow/output_notebook_1.ipynb',
    parameters = params
);

Executing: 100%|███████████████████████████████████████████████████████████████| 10/10 [00:02<00:00,  3.43cell/s]


Execute papermill injecting "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download" as input_csv_url

In [9]:
params = dict(input_csv_url = "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download")
pm.execute_notebook(
    'papermill_workflow/notebook_1.ipynb',
    'papermill_workflow/output_notebook_1.ipynb',
    parameters = params
);

Executing: 100%|███████████████████████████████████████████████████████████████| 10/10 [00:02<00:00,  3.42cell/s]


**Example** Inject "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download" for input url and processed_mul_1.csv as output

In [10]:
params = dict(input_csv_url = "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download", output_csv = "processed_mul_1.csv")
pm.execute_notebook(
    'papermill_workflow/notebook_1.ipynb',
    'papermill_workflow/output_notebook_1.ipynb',
    parameters = params
);

Executing: 100%|███████████████████████████████████████████████████████████████| 10/10 [00:02<00:00,  3.58cell/s]


Inject "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download" for input url and processed_mul_1_new.csv as output

In [12]:
params = dict(input_csv_url = "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download", output_csv = "processed_mul_1_new.csv")
pm.execute_notebook(
    'papermill_workflow/notebook_1.ipynb',
    'papermill_workflow/output_notebook_1.ipynb',
    parameters = params
);

Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:02<00:00,  4.15cell/s]


Inject "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download" for input url, processed_mul_1_new.csv as output, and display 3 rows

In [None]:
params = dict(input_csv_url = "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download", output_csv = "processed_mul_1_new.csv", num_rows_display = 3)
pm.execute_notebook(
    'papermill_workflow/notebook_1.ipynb',
    'papermill_workflow/output_notebook_1.ipynb',
    parameters = params
);

## Looping Through Parameters

We saw how to parameterize one notebook. Now let's loop through parameters.

Let's start with a small example with loops

**Example** Use a for-loop to iterate over 

```python
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
```

In [18]:
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
for i, input_csv_url in enumerate(input_csv_urls):
    print(i, input_csv_url)

0 https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download
1 https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download


Here, the for-loop prints every input_csv_url along with its index in the list.

Use a for-loop to iterate over 

```python
num_rows = [5, 10]
```

In [20]:
num_rows = [5, 10]
for i, num_row in enumerate(num_rows):
    print(i, num_row)

0 5
1 10


Use a for-loop to iterate over 

```python
processed_files = ["processed_1.csv", "processed_2.csv"]
```

In [21]:
processed_files = ["processed_1.csv", "processed_2.csv"]
for i, processed_file in enumerate(processed_files):
    print(i, processed_file)

0 processed_1.csv
1 processed_2.csv


**Example** Use a for-loop to iterate over 

```python
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
num_rows = [5, 10]
```

In [22]:
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
num_rows = [5, 10]
for i, input_csv_url in enumerate(input_csv_urls):
    print(i, input_csv_url, num_rows[i])

0 https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download 5
1 https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download 10


Use a for-loop to iterate over 

```python
num_rows = [5, 10]
processed_files = ["processed_1.csv", "processed_2.csv"]
```

In [23]:
num_rows = [5, 10]
processed_files = ["processed_1.csv", "processed_2.csv"]
for i, num_row in enumerate(num_rows):
    print(i, num_row, processed_files[i])

0 5 processed_1.csv
1 10 processed_2.csv


Use a for-loop to iterate over 

```python
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
num_rows = [5, 10]
processed_files = ["processed_1.csv", "processed_2.csv"]
```

In [24]:
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
num_rows = [5, 10]
processed_files = ["processed_1.csv", "processed_2.csv"]

for i, input_csv_url in enumerate(input_csv_urls):
    print(i, input_csv_url, num_rows[i], processed_files[i])

0 https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download 5 processed_1.csv
1 https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download 10 processed_2.csv
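Indexing several lists with the same `i` works as long as the lists stay the same length. As an alternative sketch, Python's built-in `zip` pairs the items up directly, and `enumerate` can still supply the index. The `rows` list is only there to collect the tuples for inspection.

```python
input_csv_urls = [
    "https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download",
    "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download",
]
num_rows = [5, 10]
processed_files = ["processed_1.csv", "processed_2.csv"]

rows = []
# zip yields one (url, num_row, file) tuple per position in the lists.
for i, (input_csv_url, num_row, processed_file) in enumerate(
    zip(input_csv_urls, num_rows, processed_files)
):
    rows.append((i, input_csv_url, num_row, processed_file))
    print(i, input_csv_url, num_row, processed_file)
```

Note that `zip` stops silently at the shortest list, whereas `num_rows[i]` raises an `IndexError` when a list is too short. On Python 3.10 or newer, `zip(..., strict=True)` raises an error for mismatched lengths instead.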


**Example** Run papermill on notebook_1.ipynb with the below parameters

```python
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
num_rows = [5, 10]
```

In [26]:
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
num_rows = [5, 10]

for i, input_csv_url in enumerate(input_csv_urls):
    params = dict(input_csv_url=input_csv_url, num_rows_display=num_rows[i])
    pm.execute_notebook(
        'papermill_workflow/notebook_1.ipynb',
        'papermill_workflow/output_notebook1.ipynb',
        parameters = params
    )


Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:02<00:00,  4.27cell/s]
Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:02<00:00,  4.39cell/s]


Run papermill on notebook_1.ipynb with the below parameters

```python
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
processed_files = ["processed_1.csv", "processed_2.csv"]
```

In [27]:
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
processed_files = ["processed_1.csv", "processed_2.csv"]

for i, input_csv_url in enumerate(input_csv_urls):
    params = dict(input_csv_url=input_csv_url, output_csv=processed_files[i])
    pm.execute_notebook(
        'papermill_workflow/notebook_1.ipynb',
        'papermill_workflow/output_notebook1.ipynb',
        parameters = params
    )


Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:06<00:00,  1.96cell/s]
Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:02<00:00,  4.10cell/s]


Run papermill on notebook_1.ipynb with the below parameters

```python
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
num_rows = [5, 10]
processed_files = ["processed_1.csv", "processed_2.csv"]
```

In [28]:
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
num_rows = [5, 10]
processed_files = ["processed_1.csv", "processed_2.csv"]

for i, input_csv_url in enumerate(input_csv_urls):
    params = dict(input_csv_url=input_csv_url, output_csv=processed_files[i], num_rows_display=num_rows[i])
    pm.execute_notebook(
        'papermill_workflow/notebook_1.ipynb',
        'papermill_workflow/output_notebook1.ipynb',
        parameters = params
    )


Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:02<00:00,  4.23cell/s]
Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:02<00:00,  4.33cell/s]


Run papermill on notebook_1.ipynb with the below parameters

```python
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
num_rows = [5, 10]
processed_files = ["processed_1.csv", "processed_2.csv"]
output_notebooks = ["papermill_workflow/o1.ipynb", "papermill_workflow/o2.ipynb"]
```

In [32]:
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
num_rows = [5, 10]
processed_files = ["processed_1.csv", "processed_2.csv"]
output_notebooks = ["papermill_workflow/o1.ipynb", "papermill_workflow/o2.ipynb"]


for i, input_csv_url in enumerate(input_csv_urls):
    params = dict(input_csv_url=input_csv_url, output_csv=processed_files[i], num_rows_display=num_rows[i])
    pm.execute_notebook(
        'papermill_workflow/notebook_1.ipynb',
        output_notebooks[i],
        parameters = params
    )


Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:02<00:00,  4.80cell/s]
Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:02<00:00,  4.28cell/s]


## Naming Output Notebooks with f-strings

As you saw in the previous section, a for-loop can automate notebook execution.
In most of those loops, every iteration wrote to the same output notebook, so each run overwrote the previous one; only the last example kept the runs apart, using a hand-written list of output names.
Keeping a separate notebook for each iteration matters, because it lets you go back and inspect issues or results from any run.
Here, let's learn a way to create new output notebook names automatically, without having to maintain another list.

**Example** Display "Hello, John Doe" with f-string

In [33]:
name = "John Doe"
f"Hello, {name}"

'Hello, John Doe'

Display "Hello, Jane Doe" with f-string

In [34]:
name = "Jane Doe"
f"Hello, {name}"

'Hello, Jane Doe'

Display "Hello, John Doe and Jane Doe"

In [35]:
name_1 = "John Doe"
name_2 = "Jane Doe"
f"Hello, {name_1} and {name_2}"

'Hello, John Doe and Jane Doe'

**Example** Set output_name to "output_file_1", where 1 comes from a variable

In [36]:
num = 1
output_name = f"output_file_{num}"
output_name

'output_file_1'

Set output_name to "output_file_5", where 5 comes from a variable

In [37]:
num = 5
output_name = f"output_file_{num}"
output_name

'output_file_5'

Set output_name to "output_file_01", where "01" comes from a variable

In [38]:
num = "01"
output_name = f"output_file_{num}"
output_name

'output_file_01'
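Storing the number as the string "01" works, but the padding can also be produced from an integer with a format specifier inside the f-string. The sketch below uses `:02d`, which pads an integer with zeros to a width of two; this keeps file names sorting in numeric order once the counter passes 9.

```python
num = 1
# ":02d" formats the integer with zero padding to two digits.
output_name = f"output_file_{num:02d}"
print(output_name)

# The same specifier applied to several numbers.
padded_names = [f"output_file_{n:02d}" for n in (1, 5, 12)]
print(padded_names)
```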

**Example** Create an output file name for each value in the list below, in "output_{num}.ipynb" format, where num is the index of the URL

```python
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
```

In [40]:
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
for i, input_csv_url in enumerate(input_csv_urls):
    output_file_name = f"output_{i}.ipynb"
    print(input_csv_url, output_file_name)

https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download output_0.ipynb
https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download output_1.ipynb


Create an output file name for each value in the list below, in "output_{num}.ipynb" format, where num is the index of the value in num_rows

```python
num_rows = [5, 10]
```

In [42]:
num_rows = [5, 10]
for i, num_row in enumerate(num_rows):
    output_file_name = f"output_{i}.ipynb"
    print(num_row, output_file_name)

5 output_0.ipynb
10 output_1.ipynb


Create an output file name for each value in the list below, in "output_{num_rows}.ipynb" format, where num_rows is the value taken from num_rows

```python
num_rows = [5, 10]
```

In [43]:
num_rows = [5, 10]
for i, num_row in enumerate(num_rows):
    output_file_name = f"output_{num_row}.ipynb"
    print(num_row, output_file_name)

5 output_5.ipynb
10 output_10.ipynb


Create an output file name for each value in the lists below, in "output_{num_rows}_{num}.ipynb" format, where num_rows is the value taken from num_rows and num is the index

```python
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
num_rows = [5, 10]
```

In [45]:
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
num_rows = [5, 10]
for i, input_csv_url in enumerate(input_csv_urls):
    output_file_name = f"output_{num_rows[i]}_{i}.ipynb"
    print(input_csv_url, output_file_name)

https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download output_5_0.ipynb
https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download output_10_1.ipynb


**Example** Run papermill on notebook_1.ipynb with the parameters below, writing one output file per input named 'output_{num}.ipynb', where num is the index

```python
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
```

In [48]:
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]

for i, input_csv_url in enumerate(input_csv_urls):
    params = dict(input_csv_url=input_csv_url)
    output_file_name = f"output_{i}.ipynb"
    pm.execute_notebook(
        'papermill_workflow/notebook_1.ipynb',
        output_file_name,
        parameters = params
    )

Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:03<00:00,  3.30cell/s]
Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:03<00:00,  3.98cell/s]


Run papermill on notebook_1.ipynb with the parameters below, writing one output file per input named 'output_{num_row}_{num}.ipynb', where num_row is the value taken from num_rows and num is the index

```python
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
num_rows = [2, 5]
```

In [47]:
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
num_rows = [2, 5]

for i, input_csv_url in enumerate(input_csv_urls):
    params = dict(input_csv_url=input_csv_url, num_rows_display=num_rows[i])
    output_file_name = f"output_{num_rows[i]}_{i}.ipynb"
    pm.execute_notebook(
        'papermill_workflow/notebook_1.ipynb',
        output_file_name,
        parameters = params
    )

Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:02<00:00,  4.19cell/s]
Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:02<00:00,  4.23cell/s]


Run papermill on notebook_1.ipynb with the parameters below, writing one output file per input named 'output_{processed_csv_name}_{num}.ipynb', where processed_csv_name is the name of the processed CSV file and num is the index

```python
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
processed_files = ["processed_1.csv", "processed_2.csv"]
```

In [49]:
input_csv_urls = ["https://uni-bonn.sciebo.de/s/FYJPmdTyPo1qwRX/download", "https://uni-bonn.sciebo.de/s/O1ybfiVcqROP00W/download"]
processed_files = ["processed_1.csv", "processed_2.csv"]

for i, input_csv_url in enumerate(input_csv_urls):
    params = dict(input_csv_url=input_csv_url, output_csv=processed_files[i])
    output_file_name = f"output_{processed_files[i]}_{i}.ipynb"
    pm.execute_notebook(
        'papermill_workflow/notebook_1.ipynb',
        output_file_name,
        parameters = params
    )

Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:02<00:00,  4.28cell/s]
Executing: 100%|███████████████████████████████████████████████████████████████| 12/12 [00:02<00:00,  4.98cell/s]
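With this format the output names contain the CSV extension in the middle, for example `output_processed_1.csv_0.ipynb`. If that is not wanted, one possible variation is to strip the extension first with `pathlib.Path(...).stem` from the standard library. The sketch below only builds the names and does not run Papermill.

```python
from pathlib import Path

processed_files = ["processed_1.csv", "processed_2.csv"]

# Path("processed_1.csv").stem is the file name without its final extension.
output_file_names = [
    f"output_{Path(processed_file).stem}_{i}.ipynb"
    for i, processed_file in enumerate(processed_files)
]
print(output_file_names)
```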


## Accessing Files From Disk

When several data files must go through the same notebook, the workflow itself needs to be automated.
For instance, if you have several CSV files that must all be analysed or transformed by one Jupyter notebook, running the notebook by hand for each file is slow and error-prone.
What you need is a way to find all matching files in a directory and feed them, one by one, into a Papermill-powered notebook. Without that, every new file means another manual run, which does not scale and makes consistency hard to maintain.

Here we will look at a solution that uses glob.glob to find all CSV files and pass them to papermill.

**Example** List all files in the current directory

In [52]:
files = glob.glob('*')
files

['05_Workflow Integration in Jupyter.ipynb',
 '06_Managing Notebooks and Scripts using Command-Line Tools in Jupyter.ipynb',
 '07_.ipynb',
 '08_.ipynb',
 'analysis_workflow',
 'data',
 'data_analysis',
 'experiment_info.py',
 'experiment_info.txt',
 'output_0.ipynb',
 'output_1.ipynb',
 'output_10_1.ipynb',
 'output_2_0.ipynb',
 'output_5_0.ipynb',
 'output_5_1.ipynb',
 'output_processed_1.csv_0.ipynb',
 'output_processed_2.csv_1.ipynb',
 'papermill_workflow',
 'parameters.png',
 'processed.csv',
 'processed_1.csv',
 'processed_1_new.csv',
 'processed_2.csv',
 'processed_mul_1.csv',
 'processed_mul_1_new.csv',
 'run_section',
 'script.ipynb',
 'script.py']

List all files in the `papermill_workflow` directory

In [53]:
files = glob.glob('papermill_workflow/*')
files

['papermill_workflow\\mouse_counts_part1.csv',
 'papermill_workflow\\mouse_counts_part2.csv',
 'papermill_workflow\\notebook_1.ipynb',
 'papermill_workflow\\notebook_2.ipynb',
 'papermill_workflow\\o1.csv',
 'papermill_workflow\\o1.ipynb',
 'papermill_workflow\\o2.csv',
 'papermill_workflow\\o2.ipynb',
 'papermill_workflow\\output_notebook1.ipynb',
 'papermill_workflow\\output_notebook_1.ipynb',
 'papermill_workflow\\output_notebook_1_new.ipynb',
 'papermill_workflow\\output_notebook_2.ipynb',
 'papermill_workflow\\output_notebook_2_new.ipynb',
 'papermill_workflow\\processed.csv']

List all Jupyter notebook files in the current directory

In [57]:
files = glob.glob('*.ipynb')
files

['05_Workflow Integration in Jupyter.ipynb',
 '06_Managing Notebooks and Scripts using Command-Line Tools in Jupyter.ipynb',
 '07_.ipynb',
 '08_.ipynb',
 'output_0.ipynb',
 'output_1.ipynb',
 'output_10_1.ipynb',
 'output_2_0.ipynb',
 'output_5_0.ipynb',
 'output_5_1.ipynb',
 'output_processed_1.csv_0.ipynb',
 'output_processed_2.csv_1.ipynb',
 'script.ipynb']

List all Jupyter notebook files in the papermill_workflow directory

In [58]:
files = glob.glob('papermill_workflow/*.ipynb')
files

['papermill_workflow\\notebook_1.ipynb',
 'papermill_workflow\\notebook_2.ipynb',
 'papermill_workflow\\o1.ipynb',
 'papermill_workflow\\o2.ipynb',
 'papermill_workflow\\output_notebook1.ipynb',
 'papermill_workflow\\output_notebook_1.ipynb',
 'papermill_workflow\\output_notebook_1_new.ipynb',
 'papermill_workflow\\output_notebook_2.ipynb',
 'papermill_workflow\\output_notebook_2_new.ipynb']

We can use this approach to find all processed CSV files in our directory and then run papermill on notebook_2.ipynb for each of them.

Download the `processed` directory from the link below and place it in the current directory

[processed_files](https://uni-bonn.sciebo.de/s/o9HQzca7NZwRCns/download)

Find all csv files

In [59]:
files = glob.glob('*.csv')
files

['processed.csv',
 'processed_1.csv',
 'processed_1_new.csv',
 'processed_2.csv',
 'processed_mul_1.csv',
 'processed_mul_1_new.csv']

Find all files starting with processed

In [60]:
files = glob.glob('processed*')
files

['processed.csv',
 'processed.zip',
 'processed_1.csv',
 'processed_1_new.csv',
 'processed_2 (1).csv',
 'processed_2.csv',
 'processed_mul_1.csv',
 'processed_mul_1_new.csv']

Find all files starting with processed in processed directory

In [62]:
files = glob.glob('processed/processed*')
files

['processed\\processed_1.csv', 'processed\\processed_2.csv']
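The backslashes in these results come from running on Windows, and `glob.glob` does not promise any particular order for its matches. As a hedged alternative, `pathlib.Path.glob` combined with `sorted` gives a stable order, and `as_posix()` prints forward slashes on every platform. The sketch below creates its own temporary `processed` directory with placeholder files, so the file names are only stand-ins.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway "processed" directory with placeholder files.
    processed_dir = Path(tmp) / "processed"
    processed_dir.mkdir()
    for name in ["processed_2.csv", "processed_1.csv", "notes.txt"]:
        (processed_dir / name).write_text("placeholder\n")

    # Match only CSV files starting with "processed" and sort them.
    matches = sorted(processed_dir.glob("processed*.csv"))
    # Express each match relative to the temporary root, with forward slashes.
    relative_paths = [p.relative_to(tmp).as_posix() for p in matches]

print(relative_paths)
```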

**Example** glob syntax applied to automation with papermill

In [66]:
processed_files = glob.glob('processed/processed*')

for i, processed_file in enumerate(processed_files):
    params = dict(processed_csv=processed_file)
    output_file_name = f"output_nb2_processed_{i}.ipynb"
    pm.execute_notebook(
        'papermill_workflow/notebook_2.ipynb',
        output_file_name,
        parameters = params
    )


Executing: 100%|█████████████████████████████████████████████████████████████████| 8/8 [00:01<00:00,  4.10cell/s]
Executing: 100%|█████████████████████████████████████████████████████████████████| 8/8 [00:01<00:00,  4.49cell/s]
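Because the index from `enumerate` depends on the order in which the files happen to be listed, an output name such as `output_nb2_processed_0.ipynb` does not say which CSV it belongs to. One possible refinement, sketched below, is to sort the files and name each output after the input file's stem. The helper `plan_runs` is a hypothetical name introduced for illustration; it only builds the list of jobs, and the Papermill call is left as a comment so the sketch runs without the notebooks being present.

```python
from pathlib import Path


def plan_runs(processed_files, notebook="papermill_workflow/notebook_2.ipynb"):
    """Build one (input notebook, output notebook, parameters) job per CSV."""
    jobs = []
    for processed_file in sorted(processed_files):
        stem = Path(processed_file).stem  # e.g. "processed_1"
        output_file_name = f"output_nb2_{stem}.ipynb"
        jobs.append((notebook, output_file_name, dict(processed_csv=processed_file)))
    return jobs


# In the notebook, this list would come from glob.glob('processed/processed*').
jobs = plan_runs(["processed/processed_2.csv", "processed/processed_1.csv"])

for notebook, output_file_name, params in jobs:
    print(notebook, output_file_name, params)
    # pm.execute_notebook(notebook, output_file_name, parameters=params)
```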
