In [2]:
# !pip install papermill nbconvert jupytext seaborn

In [3]:
import papermill as pm

# Automate Notebook Execution and Parameterization

In this notebook we learn how to use command-line tools to automate the execution and management of Jupyter notebooks.
We start by learning how to run command-line commands, like managing files or installing software, directly from the notebook. 
Then, we explore how to run entire notebooks from the command line, which helps when we need to automate tasks. 
We then see how to pass in parameters to a template notebook to generate automated analysis reports.
We also see how to do batch processing of notebooks using Papermill.

## Running Command-Line Commands in Jupyter

A command line is a text-based interface that allows users to interact with their computer’s operating system by typing commands, rather than using graphical interfaces.
In this interface, users can navigate directories, manage files, run programs, and perform a wide range of tasks by typing specific commands.
Popular command-line environments include Bash (common in Linux and macOS) and the Windows Command Prompt or PowerShell.

As researchers, we may need the command line to manage files (move, rename, delete, or organize datasets), automate repetitive tasks that involve external tools, install software, and so on.

Incorporating command-line commands into our analysis notebooks allows us to integrate external tools, automate repetitive tasks, and manage data, all within the same environment.
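
Besides the `!` shortcut used in this notebook, a command can also be launched from plain Python. The sketch below shows one possible way to do this with the standard-library `subprocess` module. The helper name `run_command` is made up for illustration, and the command it runs (the current Python interpreter printing a line) is only a stand-in for a real tool.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of arguments and return (exit_code, stdout)."""
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, completed.stdout


# Stand-in command: ask the current Python interpreter to print a message.
exit_code, output = run_command(
    [sys.executable, "-c", "print('hello from the command line')"]
)
print(exit_code, output.strip())
```

An exit code of 0 conventionally means the command succeeded, which is useful when a later step should only run if an earlier one worked.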

**Run the code below to download the data for this section. You do not need to know the contents of the files, as we are only learning how to manage files.**

In [4]:
import sys
sys.path.append('src')
import sciebo

sciebo.download_file('https://uni-bonn.sciebo.de/s/yDiGZT44SXLvK5r', 'command_line/text_config.txt')
sciebo.download_file('https://uni-bonn.sciebo.de/s/apw9RMXjgfhQaK5', 'command_line/python_config.py')
sciebo.download_file('https://uni-bonn.sciebo.de/s/lwVMGbzKQXFuIax', 'command_line/notebook_config.ipynb')

Downloading command_line/text_config.txt: 100%|██████████| 79.0/79.0 [00:00<00:00, 39.5kB/s]
Downloading command_line/python_config.py: 100%|██████████| 80.0/80.0 [00:00<00:00, 39.9kB/s]
Downloading command_line/notebook_config.ipynb: 100%|██████████| 2.41k/2.41k [00:00<00:00, 2.39MB/s]


**Example** Install `pandas`

In [5]:
# !pip install pandas

Install `numpy`

In [6]:
# !pip install numpy

Install `seaborn`

In [7]:
# !pip install seaborn

You can use any option that the command-line command supports.

**Example** Upgrade matplotlib

In [8]:
# !pip install --upgrade matplotlib

Upgrade seaborn

In [9]:
# !pip install --upgrade seaborn

Upgrade nbformat

In [10]:
# !pip install --upgrade nbformat

Let's practice converting scripts to notebooks.

**Example** Convert `script.py` (run the code below to generate the file) to a notebook. What does the resulting notebook look like?

In [11]:
%%writefile script.py
num_mouse = 10
num_contrast_left = 4
num_contrast_right = 4

Overwriting script.py


In [12]:
!jupytext --to notebook script.py

[jupytext] Reading script.py in format py
[jupytext] Writing script.ipynb


Convert `script.py` (run the code below to generate the file) to a notebook. What does the resulting notebook look like?

In [13]:
%%writefile script.py
num_mouse = 10
num_contrast_left = 4
num_contrast_right = 4

print(num_mouse)

Overwriting script.py


In [14]:
!jupytext --to notebook script.py

[jupytext] Reading script.py in format py
[jupytext] Writing script.ipynb (destination file replaced [use --update to preserve cell outputs and ids])


Convert `script.py` (run the code below to generate the file) to a notebook. What does the resulting notebook look like?

In [15]:
%%writefile script.py
num_mouse = 10
num_contrast_left = 4
num_contrast_right = 4

num_mouse

Overwriting script.py


In [16]:
!jupytext --to notebook script.py

[jupytext] Reading script.py in format py
[jupytext] Writing script.ipynb (destination file replaced [use --update to preserve cell outputs and ids])


Convert `script.py` (run the code below to generate the file) to a notebook. What does the resulting notebook look like?

In [17]:
%%writefile script.py
# %% [markdown]
# Title

# %%
a = 10

In [18]:
!jupytext --to notebook script.py

[jupytext] Reading script.py in format py
[jupytext] Writing script.ipynb (destination file replaced [use --update to preserve cell outputs and ids])


Create `script.py` with the title "Data Analysis" and `a=10`, `b=100`. Convert it to a notebook. What does the resulting notebook look like?

In [19]:
%%writefile script.py
# %% [markdown]
# Data Analysis

# %%
a = 10
b = 100

In [20]:
!jupytext --to notebook script.py

[jupytext] Reading script.py in format py
[jupytext] Writing script.ipynb (destination file replaced [use --update to preserve cell outputs and ids])
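
To build intuition for what the `# %%` markers do, here is a toy splitter that turns a percent-format script into a list of cells. It is a simplified illustration only and not how jupytext is implemented. The function name `split_percent_script` is hypothetical, and the comment-stripping rule for markdown cells is a simplified version of the percent-format convention.

```python
def split_percent_script(text):
    """Split a percent-format script into (cell_type, source) pairs.

    Toy illustration: real converters handle many more cases.
    """
    cells = []
    cell_type, lines = "code", []

    for line in text.splitlines():
        if line.startswith("# %%"):
            # A marker closes the current cell and opens a new one.
            source = "\n".join(lines).strip()
            if source:
                cells.append((cell_type, source))
            cell_type = "markdown" if "[markdown]" in line else "code"
            lines = []
        elif cell_type == "markdown" and line.startswith("#"):
            # Markdown text is stored as comments, so drop the comment prefix.
            lines.append(line[2:] if line.startswith("# ") else line[1:])
        else:
            lines.append(line)

    # Close the last cell.
    source = "\n".join(lines).strip()
    if source:
        cells.append((cell_type, source))
    return cells


script = "# %% [markdown]\n# Data Analysis\n\n# %%\na = 10\nb = 100\n"
cells = split_percent_script(script)
for cell in cells:
    print(cell)
```

In this sketch the line `# Data Analysis` ends up as the plain text `Data Analysis` in a markdown cell, because the leading `# ` is treated as the comment prefix rather than as a heading marker.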


**Example** Create a new directory called `data_1`

In [21]:
!mkdir data_1

Create a new directory `data_2`

In [22]:
!mkdir data_2

Create a new directory `data_1/data_1_sub`

(use `data_1\data_1_sub` on Windows machines)

In [23]:
!mkdir data_1\data_1_sub

We can run Linux command-line commands within a cell by starting the cell with `%%bash`.

**Example** Copy `command_line/python_config.py` to the `data_1` directory

In [24]:
%%bash
cp command_line/python_config.py data_1/python_config.py

Copy `command_line/text_config.txt` to `data_1`

In [25]:
%%bash
cp command_line/text_config.txt data_1/text_config.txt

Copy `command_line/notebook_config.ipynb` to `data_1/data_1_sub` with the name `nb_config.ipynb`

In [26]:
%%bash
cp command_line/notebook_config.ipynb data_1/data_1_sub/nb_config.ipynb

**Example** Delete `data_1/text_config.txt`

In [27]:
%%bash
rm data_1/text_config.txt

Delete `data_1/python_config.py` (only the file)

In [28]:
%%bash
rm data_1/python_config.py

Delete `data_2` directory

In [29]:
%%bash
rm -r data_2

Delete `data_1` including sub-directories

In [30]:
%%bash
rm -r data_1
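
The same file operations can also be written in Python, which avoids differences between Bash and Windows shells. The sketch below uses only the standard library (`pathlib`, `shutil`, `tempfile`) and works inside a temporary directory so that it cannot touch real data. The file and directory names are stand-ins chosen for illustration.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway directory so the sketch cannot touch real data.
workspace = Path(tempfile.mkdtemp())

# Like `mkdir`: parents=True also creates data_1 on the way to data_1_sub.
sub_dir = workspace / "data_1" / "data_1_sub"
sub_dir.mkdir(parents=True)

# Create a small stand-in file, then copy it under a new name (like `cp`).
source_file = workspace / "text_config.txt"
source_file.write_text("example = 1\n")
copied_file = shutil.copy(source_file, sub_dir / "config_copy.txt")

# Like `rm`: delete a single file.
Path(copied_file).unlink()

# Like `rm -r`: delete a directory together with everything inside it.
shutil.rmtree(workspace / "data_1")

remaining = sorted(p.name for p in workspace.iterdir())
print(remaining)

# Remove the throwaway directory again.
shutil.rmtree(workspace)
```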

---

**Run the code below to download the data for this section.**

In [31]:
import sys
sys.path.append('src')
import sciebo

sciebo.download_file('https://uni-bonn.sciebo.de/s/nih6mIiDSLOlPHU', 'data/2016-12-14_Cori.csv')
sciebo.download_file('https://uni-bonn.sciebo.de/s/dkPOipzGNjkBiXQ', 'parameterization/01_notebook_brain_area.ipynb')
sciebo.download_file('https://uni-bonn.sciebo.de/s/WReS5HIxAK8cws4', 'parameterization/02_notebook_fixed_response.ipynb')
sciebo.download_file('https://uni-bonn.sciebo.de/s/QxcX90gL9B7paar', 'parameterization/03_notebook_fixed_feedback.ipynb')

Downloading data/2016-12-14_Cori.csv: 100%|██████████| 25.5M/25.5M [00:07<00:00, 3.47MB/s]
Downloading parameterization/01_notebook_brain_area.ipynb: 100%|██████████| 4.29k/4.29k [00:00<00:00, 8.41MB/s]
Downloading parameterization/02_notebook_fixed_response.ipynb: 100%|██████████| 2.36k/2.36k [00:00<00:00, 2.36MB/s]
Downloading parameterization/03_notebook_fixed_feedback.ipynb: 100%|██████████| 2.37k/2.37k [00:00<?, ?B/s]


## Executing Notebooks from Command Line

Running a notebook from the command line is useful for automating notebook execution as part of a workflow or pipeline.
It lets us integrate notebooks with task-scheduling tools so that routine tasks run without manually opening and running the notebook.
When dealing with multiple notebooks, it also allows batch processing: we can execute several notebooks sequentially without interacting with each one.

Here we look at `papermill`, a tool that can execute notebooks from the command line. For this, we use three notebooks:

1. `parameterization/01_notebook_brain_area.ipynb`: Filters `2016-12-14_Cori.csv` to a selected brain area and writes a processed CSV file. By default, the brain area is `VISp`.
2. `parameterization/02_notebook_fixed_response.ipynb`: Based on the selected response type, it examines how feedback affects LFP signals in the brain area using the processed csv.
3. `parameterization/03_notebook_fixed_feedback.ipynb`: Based on the selected feedback type, it examines how mice's response affects LFP signals in the brain area using the processed csv.


Notebooks 2 and 3 are not dependent on each other.
Both use the output from `notebook 1` for their analysis. 

**Example** Execute notebook 1 as `output.ipynb` and examine it. Was any other file generated by this run?

In [32]:
!papermill parameterization/01_notebook_brain_area.ipynb output.ipynb

Input Notebook:  parameterization/01_notebook_brain_area.ipynb
Output Notebook: output.ipynb

Executing:   0%|          | 0/18 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:   6%|▌         | 1/18 [00:02<00:40,  2.39s/cell]
Executing:  11%|█         | 2/18 [00:03<00:22,  1.40s/cell]
Executing:  33%|███▎      | 6/18 [00:03<00:04,  2.50cell/s]
Executing:  56%|█████▌    | 10/18 [00:03<00:01,  4.78cell/s]
Executing:  72%|███████▏  | 13/18 [00:03<00:00,  6.84cell/s]
Executing: 100%|██████████| 18/18 [00:04<00:00,  8.64cell/s]
Executing: 100%|██████████| 18/18 [00:04<00:00,  3.90cell/s]


Execute notebook 2 as `output.ipynb` and examine the output.

In [33]:
!papermill parameterization/02_notebook_fixed_response.ipynb output.ipynb

Input Notebook:  parameterization/02_notebook_fixed_response.ipynb
Output Notebook: output.ipynb

Executing:   0%|          | 0/8 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:  12%|█▎        | 1/8 [00:05<00:38,  5.52s/cell]
Executing:  62%|██████▎   | 5/8 [00:05<00:02,  1.18cell/s]
Executing: 100%|██████████| 8/8 [00:06<00:00,  1.68cell/s]
Executing: 100%|██████████| 8/8 [00:07<00:00,  1.14cell/s]


Execute notebook 3 as `output.ipynb` and examine the output.

In [34]:
!papermill parameterization/03_notebook_fixed_feedback.ipynb output.ipynb

Input Notebook:  parameterization/03_notebook_fixed_feedback.ipynb
Output Notebook: output.ipynb

Executing:   0%|          | 0/8 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:  12%|█▎        | 1/8 [00:04<00:34,  4.86s/cell]
Executing:  62%|██████▎   | 5/8 [00:04<00:02,  1.33cell/s]
Executing: 100%|██████████| 8/8 [00:06<00:00,  1.70cell/s]
Executing: 100%|██████████| 8/8 [00:06<00:00,  1.16cell/s]


Delete `output_data/processed_brain_area.csv` file.

Execute notebook 3 as `output.ipynb` and examine it. What do you see?

In [35]:
# !papermill parameterization/03_notebook_fixed_feedback.ipynb output.ipynb

The cell output shows an error.
In `output.ipynb`, you will see a large error message in red at the top of the notebook and another red message just before the cell where the error occurred.

Let's see how to execute notebooks sequentially.
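
Lines starting with `!` run one after another even if an earlier one fails, so a dependent notebook may still be started after its input was never produced. A minimal sketch of a stricter approach, using only the standard library, is shown below. The helper name `run_in_order` is hypothetical, and the commands are stand-ins (small Python one-liners) rather than real papermill calls.

```python
import subprocess
import sys


def run_in_order(commands):
    """Run commands one by one and stop at the first non-zero exit code.

    Returns the exit codes of the commands that were actually run.
    """
    exit_codes = []
    for command in commands:
        result = subprocess.run(command)
        exit_codes.append(result.returncode)
        if result.returncode != 0:
            break  # later steps depend on this one, so do not continue
    return exit_codes


# Stand-in commands: the second one fails on purpose, so the third never runs.
python = sys.executable
exit_codes = run_in_order([
    [python, "-c", "print('step 1 ok')"],
    [python, "-c", "import sys; sys.exit(1)"],
    [python, "-c", "print('step 3 should not run')"],
])
print(exit_codes)
```

Assuming papermill exits with a non-zero code when a notebook fails, the stand-ins could be replaced by lists such as `["papermill", "parameterization/01_notebook_brain_area.ipynb", "output_1.ipynb"]`.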

**Example** Execute notebooks 1 and 2 one after the other.

In [36]:
!papermill parameterization/01_notebook_brain_area.ipynb output_1.ipynb
!papermill parameterization/02_notebook_fixed_response.ipynb output_2.ipynb

Input Notebook:  parameterization/01_notebook_brain_area.ipynb
Output Notebook: output_1.ipynb

Executing:   0%|          | 0/18 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:   6%|▌         | 1/18 [00:02<00:46,  2.75s/cell]
Executing:  11%|█         | 2/18 [00:03<00:25,  1.56s/cell]
Executing:  33%|███▎      | 6/18 [00:03<00:05,  2.24cell/s]
Executing:  56%|█████▌    | 10/18 [00:04<00:01,  4.31cell/s]
Executing:  78%|███████▊  | 14/18 [00:04<00:00,  6.92cell/s]
Executing: 100%|██████████| 18/18 [00:04<00:00,  7.65cell/s]
Executing: 100%|██████████| 18/18 [00:05<00:00,  3.54cell/s]
Input Notebook:  parameterization/02_notebook_fixed_response.ipynb
Output Notebook: output_2.ipynb

Executing:   0%|          | 0/8 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:  12%|█▎        | 1/8 [00:05<00:35,  5.09s/cell]
Executing:  88%|████████▊ | 7/8 [00:05<00:00,  1.83cell/s]
Executing: 100%|██████████| 8/8 [00:06<00:00,  1.25cell/s]


Execute notebooks 1 and 3 one after the other.

In [37]:
!papermill parameterization/01_notebook_brain_area.ipynb output_1.ipynb
!papermill parameterization/03_notebook_fixed_feedback.ipynb output_3.ipynb

Input Notebook:  parameterization/01_notebook_brain_area.ipynb
Output Notebook: output_1.ipynb

Executing:   0%|          | 0/18 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:   6%|▌         | 1/18 [00:05<01:27,  5.17s/cell]
Executing:  11%|█         | 2/18 [00:05<00:41,  2.59s/cell]
Executing:  33%|███▎      | 6/18 [00:06<00:08,  1.45cell/s]
Executing:  50%|█████     | 9/18 [00:06<00:03,  2.53cell/s]
Executing:  61%|██████    | 11/18 [00:06<00:02,  3.38cell/s]
Executing: 100%|██████████| 18/18 [00:07<00:00,  5.94cell/s]
Executing: 100%|██████████| 18/18 [00:07<00:00,  2.32cell/s]
Input Notebook:  parameterization/03_notebook_fixed_feedback.ipynb
Output Notebook: output_3.ipynb

Executing:   0%|          | 0/8 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:  12%|█▎        | 1/8 [00:05<00:36,  5.24s/cell]
Executing:  62%|██████▎   | 5/8 [00:05<00:02,  1.25cell/s]
Executing: 100%|██████████| 8/8 [00:06<00:00,  1.55cell/s]
Executing: 100%|█████

Execute all three notebooks one after the other.

In [38]:
!papermill parameterization/01_notebook_brain_area.ipynb output_1.ipynb
!papermill parameterization/02_notebook_fixed_response.ipynb output_2.ipynb
!papermill parameterization/03_notebook_fixed_feedback.ipynb output_3.ipynb

Input Notebook:  parameterization/01_notebook_brain_area.ipynb
Output Notebook: output_1.ipynb

Executing:   0%|          | 0/18 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:   6%|▌         | 1/18 [00:05<01:35,  5.63s/cell]
Executing:  11%|█         | 2/18 [00:06<00:48,  3.03s/cell]
Executing:  33%|███▎      | 6/18 [00:07<00:09,  1.24cell/s]
Executing:  44%|████▍     | 8/18 [00:07<00:05,  1.86cell/s]
Executing:  61%|██████    | 11/18 [00:07<00:02,  3.05cell/s]
Executing:  83%|████████▎ | 15/18 [00:07<00:00,  5.19cell/s]
Executing: 100%|██████████| 18/18 [00:08<00:00,  5.20cell/s]
Executing: 100%|██████████| 18/18 [00:08<00:00,  2.02cell/s]
Input Notebook:  parameterization/02_notebook_fixed_response.ipynb
Output Notebook: output_2.ipynb

Executing:   0%|          | 0/8 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:  12%|█▎        | 1/8 [00:06<00:47,  6.76s/cell]
Executing:  62%|██████▎   | 5/8 [00:06<00:03,  1.05s/cell]
Executing: 100%|███

---

## Passing in Parameters To Notebooks With Papermill

Papermill helps with parameterizing Jupyter notebooks by allowing us to inject new inputs (parameters) into a notebook before running it. 
Parameters have placeholders in the template notebook, and when we run Papermill, it fills those placeholders with the actual values we provide. 
Papermill then executes the entire notebook with the new inputs, saving the results in a new output notebook. 
This makes it easy to reuse the same notebook as a template for different data or settings, essentially creating one analysis report per set of parameters.

For this example, we will use the same notebooks as in the previous section and get some practice passing parameters to template notebooks.

**Setting Parameters**

To let papermill know which cell contains the parameters:

1. Put all parameters in a single cell before any other cell that uses them
2. Click on the cell and then the gear icon next to the notebook
3. Type `parameters` within Cell Tags

Do this for all three notebooks.
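
For illustration, a cell tagged `parameters` in notebook 1 might look like the sketch below. The variable names `brain_area` and `output_csv` and their defaults are taken from the commands and default values mentioned in this notebook, but the actual cell in the template may be written differently.

```python
# Cell tagged "parameters": plain assignments that act as default values.
# Papermill can override these when the notebook is executed.
brain_area = 'VISp'                                  # default brain area
output_csv = 'output_data/processed_brain_area.csv'  # default output file

print(brain_area, output_csv)
```

Keeping only simple assignments in this cell makes it clear which values are meant to be changed from outside.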

With papermill, we can pass different values for any variable inside the cell tagged as `parameters` by adding a `-p` for each parameter.
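
When there are many parameters, typing the `-p` flags by hand becomes error-prone. As a sketch, the command line can be assembled from a dictionary. The helper `build_papermill_command` is a hypothetical name, not part of papermill, and it only builds the argument list without running anything.

```python
def build_papermill_command(input_nb, output_nb, parameters=None):
    """Return a papermill command line as a list of arguments."""
    command = ["papermill", input_nb, output_nb]
    for name, value in (parameters or {}).items():
        # One "-p name value" group per parameter.
        command += ["-p", name, str(value)]
    return command


command = build_papermill_command(
    "parameterization/01_notebook_brain_area.ipynb",
    "01_notebook_aca.ipynb",
    {"brain_area": "ACA", "output_csv": "output_data/processed_ACA.csv"},
)
print(" ".join(command))
```

The printed line can be pasted after a `!` in a notebook cell, or the list can be handed to `subprocess.run`.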

In this section, we use the three notebooks as templates and generate reports for different brain areas, responses, and feedback types to learn how papermill works. The same technique can be applied to more complex problems as well.

**Example** Run notebook 1 specifying that the output should be called `processed_VISp.csv`

In [39]:
!papermill parameterization/01_notebook_brain_area.ipynb -p output_csv output_data/processed_VISp.csv 01_notebook_visp.ipynb

Input Notebook:  parameterization/01_notebook_brain_area.ipynb
Output Notebook: 01_notebook_visp.ipynb

Executing:   0%|          | 0/19 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:   5%|▌         | 1/19 [00:04<01:18,  4.34s/cell]
Executing:  11%|█         | 2/19 [00:05<00:40,  2.35s/cell]
Executing:  37%|███▋      | 7/19 [00:05<00:06,  1.86cell/s]
Executing:  47%|████▋     | 9/19 [00:05<00:03,  2.61cell/s]
Executing:  63%|██████▎   | 12/19 [00:06<00:01,  4.04cell/s]
Executing:  89%|████████▉ | 17/19 [00:06<00:00,  7.27cell/s]
Executing: 100%|██████████| 19/19 [00:07<00:00,  2.64cell/s]


Run notebook 2 specifying that the input csv is now called `output_data/processed_VISp.csv`

In [40]:
!papermill parameterization/02_notebook_fixed_response.ipynb -p input_csv output_data/processed_VISp.csv 02_notebook_fixed_response_visp.ipynb

Input Notebook:  parameterization/02_notebook_fixed_response.ipynb
Output Notebook: 02_notebook_fixed_response_visp.ipynb

Executing:   0%|          | 0/9 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:  11%|█         | 1/9 [00:07<00:57,  7.16s/cell]
Executing:  44%|████▍     | 4/9 [00:07<00:06,  1.38s/cell]
Executing:  67%|██████▋   | 6/9 [00:07<00:02,  1.20cell/s]
Executing: 100%|██████████| 9/9 [00:10<00:00,  1.17cell/s]
Executing: 100%|██████████| 9/9 [00:11<00:00,  1.25s/cell]


Run notebook 3 specifying that the input csv is now called `output_data/processed_VISp.csv`

In [41]:
!papermill parameterization/03_notebook_fixed_feedback.ipynb -p input_csv output_data/processed_VISp.csv 03_notebook_fixed_feedback_visp.ipynb

Input Notebook:  parameterization/03_notebook_fixed_feedback.ipynb
Output Notebook: 03_notebook_fixed_feedback_visp.ipynb

Executing:   0%|          | 0/9 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:  11%|█         | 1/9 [00:09<01:15,  9.48s/cell]
Executing:  56%|█████▌    | 5/9 [00:09<00:05,  1.43s/cell]
Executing:  89%|████████▉ | 8/9 [00:09<00:00,  1.26cell/s]
Executing: 100%|██████████| 9/9 [00:14<00:00,  1.65s/cell]


Run notebook 3 specifying that the input csv is now called `output_data/processed_ACA.csv`. Examine the output notebook. What information do you get?

In [42]:
# !papermill parameterization/03_notebook_fixed_feedback.ipynb -p input_csv output_data/processed_ACA.csv 03_notebook_fixed_feedback_aca.ipynb

**Example** Run `notebook 1` specifying that the brain area is `ACA` and output should be called `processed_ACA.csv`

In [43]:
!papermill parameterization/01_notebook_brain_area.ipynb -p brain_area ACA -p output_csv output_data/processed_ACA.csv 01_notebook_aca.ipynb

Input Notebook:  parameterization/01_notebook_brain_area.ipynb
Output Notebook: 01_notebook_aca.ipynb

Executing:   0%|          | 0/19 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:   5%|▌         | 1/19 [00:06<01:49,  6.08s/cell]
Executing:  11%|█         | 2/19 [00:07<00:57,  3.36s/cell]
Executing:  32%|███▏      | 6/19 [00:07<00:10,  1.26cell/s]
Executing:  42%|████▏     | 8/19 [00:08<00:07,  1.55cell/s]
Executing:  53%|█████▎    | 10/19 [00:08<00:04,  2.21cell/s]
Executing:  63%|██████▎   | 12/19 [00:08<00:02,  2.87cell/s]
Executing:  79%|███████▉  | 15/19 [00:08<00:00,  4.47cell/s]
Executing:  95%|█████████▍| 18/19 [00:09<00:00,  6.50cell/s]
Executing: 100%|██████████| 19/19 [00:10<00:00,  1.84cell/s]


Run `notebook 2` specifying that the input file is `output_data/processed_ACA.csv` and response_type as 0

In [44]:
!papermill parameterization/02_notebook_fixed_response.ipynb -p input_csv output_data/processed_ACA.csv -p response_type 0 02_notebook_response_0_aca.ipynb

Input Notebook:  parameterization/02_notebook_fixed_response.ipynb
Output Notebook: 02_notebook_response_0_aca.ipynb

Executing:   0%|          | 0/9 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:  11%|█         | 1/9 [00:11<01:31, 11.45s/cell]
Executing:  67%|██████▋   | 6/9 [00:11<00:04,  1.46s/cell]
Executing: 100%|██████████| 9/9 [00:14<00:00,  1.27s/cell]
Executing: 100%|██████████| 9/9 [00:15<00:00,  1.76s/cell]


Run `notebook 2` specifying that the input file is `output_data/processed_ACA.csv` and response_type as -1. Compare with previous report (output notebook)

In [45]:
!papermill parameterization/02_notebook_fixed_response.ipynb -p input_csv output_data/processed_ACA.csv -p response_type -1 02_notebook_response_min_1_aca.ipynb

Input Notebook:  parameterization/02_notebook_fixed_response.ipynb
Output Notebook: 02_notebook_response_min_1_aca.ipynb

Executing:   0%|          | 0/9 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:  11%|█         | 1/9 [00:10<01:23, 10.46s/cell]
Executing:  67%|██████▋   | 6/9 [00:10<00:03,  1.32s/cell]
Executing: 100%|██████████| 9/9 [00:13<00:00,  1.22s/cell]
Executing: 100%|██████████| 9/9 [00:14<00:00,  1.63s/cell]


Run all three notebooks one after the other for brain area `SUB`, response type 0, and feedback type -1.

In [46]:
!papermill parameterization/01_notebook_brain_area.ipynb -p brain_area SUB -p output_csv output_data/processed_SUB.csv 01_notebook_sub.ipynb
!papermill parameterization/02_notebook_fixed_response.ipynb -p input_csv output_data/processed_SUB.csv -p response_type 0 02_notebook_response_0_sub.ipynb
!papermill parameterization/03_notebook_fixed_feedback.ipynb -p input_csv output_data/processed_SUB.csv -p feedback_type -1 03_notebook_fixed_feedback_sub.ipynb

Input Notebook:  parameterization/01_notebook_brain_area.ipynb
Output Notebook: 01_notebook_sub.ipynb

Executing:   0%|          | 0/19 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:   5%|▌         | 1/19 [00:04<01:14,  4.13s/cell]
Executing:  11%|█         | 2/19 [00:04<00:37,  2.18s/cell]
Executing:  37%|███▋      | 7/19 [00:05<00:05,  2.01cell/s]
Executing:  58%|█████▊    | 11/19 [00:05<00:02,  3.62cell/s]
Executing:  74%|███████▎  | 14/19 [00:05<00:00,  5.16cell/s]
Executing: 100%|██████████| 19/19 [00:06<00:00,  6.58cell/s]
Executing: 100%|██████████| 19/19 [00:06<00:00,  2.88cell/s]
Input Notebook:  parameterization/02_notebook_fixed_response.ipynb
Output Notebook: 02_notebook_response_0_sub.ipynb

Executing:   0%|          | 0/9 [00:00<?, ?cell/s]Executing notebook with kernel: python3

Executing:  11%|█         | 1/9 [00:08<01:09,  8.67s/cell]
Executing:  56%|█████▌    | 5/9 [00:08<00:05,  1.32s/cell]
Executing:  78%|███████▊  | 7/9 [00:08<00:01,  1.17cel

## Batch Processing Notebooks With Papermill Python API

In a Jupyter notebook, you can use a for loop to automate the execution of multiple notebooks with different input parameters using papermill. 
This approach allows for dynamic notebook execution by iterating over a list of notebooks and their corresponding parameter sets, enabling each notebook to be run with customized inputs. 
During each iteration of the loop, papermill executes the notebook with the specified parameters and generates a new output notebook, which can be saved with a unique filename.
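
Before running anything, it can help to build the list of jobs first and inspect it. The sketch below prepares one parameter set and one output filename per brain area. The helper name `plan_jobs` and the naming pattern are assumptions made for illustration, and the papermill call is left as a comment so the example runs without papermill installed.

```python
def plan_jobs(brain_areas, output_dir="output_data"):
    """Build one (output_notebook, parameters) pair per brain area."""
    jobs = []
    for area in brain_areas:
        parameters = {
            "brain_area": area,
            "output_csv": f"{output_dir}/01_notebook_{area}.csv",
        }
        jobs.append((f"01_notebook_{area}.ipynb", parameters))
    return jobs


jobs = plan_jobs(["LS", "CA3"])
for output_nb, parameters in jobs:
    # With papermill imported as pm, each job could then be run as:
    # pm.execute_notebook(template_notebook, output_nb, parameters=parameters)
    print(output_nb, parameters)
```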

Let's get some practice with batch processing in Python using for-loops.

In [47]:
import pandas as pd
df = pd.read_csv('data/2016-12-14_Cori.csv')
df.brain_area_lfp.unique()

array(['ACA', 'LS', 'MOs', 'CA3', 'DG', 'SUB', 'VISp'], dtype=object)

**Example** Run notebook 1 for brain area `LS` and `CA3`

In [48]:
template_notebook = 'parameterization/01_notebook_brain_area.ipynb'
params = [
    dict(output_csv='output_data/01_notebook_LS.csv', brain_area='LS'),
    dict(output_csv='output_data/01_notebook_CA3.csv', brain_area='CA3'),
]
output_nb_names = ['01_notebook_LS.ipynb', '01_notebook_CA3.ipynb']

# Run the template once per parameter set, saving each run under its own name.
for param, output_nb_name in zip(params, output_nb_names):
    pm.execute_notebook(
        template_notebook,
        output_nb_name,
        parameters=param
    )

Executing: 100%|██████████| 19/19 [00:05<00:00,  3.19cell/s]
Executing: 100%|██████████| 19/19 [00:05<00:00,  3.19cell/s]


Run notebook 2 for brain area `LS` and response types of 1, 0, and -1.

In [49]:
template_notebook = 'parameterization/02_notebook_fixed_response.ipynb'
params = [
    dict(input_csv='output_data/01_notebook_LS.csv', response_type=1),
    dict(input_csv='output_data/01_notebook_LS.csv', response_type=0),
    dict(input_csv='output_data/01_notebook_LS.csv', response_type=-1),
]
output_nb_names = ['02_notebook_LS_response_left.ipynb', '02_notebook_LS_response_zero.ipynb', '02_notebook_LS_response_right.ipynb']

for param, output_nb_name in zip(params, output_nb_names):
    pm.execute_notebook(
        template_notebook,
        output_nb_name,
        parameters=param
    )

Executing: 100%|██████████| 9/9 [00:09<00:00,  1.09s/cell]
Executing: 100%|██████████| 9/9 [00:12<00:00,  1.34s/cell]
Executing: 100%|██████████| 9/9 [00:09<00:00,  1.05s/cell]


Run notebook 3 for brain area `LS` and feedback types 1 and -1.

In [50]:
template_notebook = 'parameterization/03_notebook_fixed_feedback.ipynb'
params = [
    dict(input_csv='output_data/01_notebook_LS.csv', feedback_type=1),
    dict(input_csv='output_data/01_notebook_LS.csv', feedback_type=-1),
]
output_nb_names = ['03_notebook_reward.ipynb', '03_notebook_punish.ipynb']

for param, output_nb_name in zip(params, output_nb_names):
    pm.execute_notebook(
        template_notebook,
        output_nb_name,
        parameters=param
    )

Executing: 100%|██████████| 9/9 [00:08<00:00,  1.03cell/s]
Executing: 100%|██████████| 9/9 [00:08<00:00,  1.02cell/s]


We can automate the naming of outputs with Python f-strings, which helps us follow a structured file-naming scheme. The names are then set inside the for-loop instead of in the params dictionaries.

**Example** Run notebook 1 for brain area `LS`, `CA3`

In [51]:
template_notebook = 'parameterization/01_notebook_brain_area.ipynb'
params = [dict(brain_area='LS'), dict(brain_area='CA3')]

for param in params:
    # Double quotes outside, single quotes inside: works on Python versions before 3.12 too.
    param['output_csv'] = f"output_data/01_notebook_{param['brain_area']}.csv"
    output_nb_name = f"01_notebook_{param['brain_area']}.ipynb"

    pm.execute_notebook(
        template_notebook,
        output_nb_name,
        parameters=param
    )

Executing: 100%|██████████| 19/19 [00:06<00:00,  2.80cell/s]
Executing: 100%|██████████| 19/19 [00:06<00:00,  2.73cell/s]


Run notebook 1 for brain area `LS`, `CA3`, and `SUB`

In [52]:
template_notebook = 'parameterization/01_notebook_brain_area.ipynb'
params = [dict(brain_area='LS'), dict(brain_area='CA3'), dict(brain_area='SUB')]

for param in params:
    # Derive both file names from the brain area.
    param['output_csv'] = f"output_data/01_notebook_{param['brain_area']}.csv"
    output_nb_name = f"01_notebook_{param['brain_area']}.ipynb"

    pm.execute_notebook(
        template_notebook,
        output_nb_name,
        parameters=param
    )

Executing: 100%|██████████| 19/19 [00:07<00:00,  2.51cell/s]
Executing: 100%|██████████| 19/19 [00:11<00:00,  1.66cell/s]
Executing: 100%|██████████| 19/19 [00:07<00:00,  2.61cell/s]


This is especially helpful when we have to run the template notebook for a large number of values of a given parameter.

Run notebook 1 for brain area `LS`, `CA3`, `SUB`, `VISp`, `MOs`, `DG`, `ACA`

In [53]:
template_notebook = 'parameterization/01_notebook_brain_area.ipynb'
params = [dict(brain_area='LS'), dict(brain_area='CA3'), dict(brain_area='SUB'), dict(brain_area='VISp'), dict(brain_area='MOs'), dict(brain_area='DG'), dict(brain_area='ACA')]

for param in params:
    # Derive both file names from the brain area.
    param['output_csv'] = f"output_data/01_notebook_{param['brain_area']}.csv"
    output_nb_name = f"01_notebook_{param['brain_area']}.ipynb"

    pm.execute_notebook(
        template_notebook,
        output_nb_name,
        parameters=param
    )

Executing: 100%|██████████| 19/19 [00:07<00:00,  2.46cell/s]
Executing: 100%|██████████| 19/19 [00:06<00:00,  2.82cell/s]
Executing: 100%|██████████| 19/19 [00:06<00:00,  2.76cell/s]
Executing: 100%|██████████| 19/19 [00:06<00:00,  3.12cell/s]
Executing: 100%|██████████| 19/19 [00:05<00:00,  3.54cell/s]
Executing: 100%|██████████| 19/19 [00:05<00:00,  3.51cell/s]
Executing: 100%|██████████| 19/19 [00:05<00:00,  3.73cell/s]


**(DEMO)** We can nest for-loops to run all three notebooks for every parameter combination. This code can live in a separate notebook that we execute whenever we need to re-run the entire analysis workflow, without going back and changing parameters in the template notebooks.

In [54]:
template_notebook_1 = 'parameterization/01_notebook_brain_area.ipynb'
template_notebook_2 = 'parameterization/02_notebook_fixed_response.ipynb'
template_notebook_3 = 'parameterization/03_notebook_fixed_feedback.ipynb'

params_brain_area = [dict(brain_area='LS'), dict(brain_area='CA3'), dict(brain_area='SUB'), dict(brain_area='VISp'), dict(brain_area='MOs'), dict(brain_area='DG'), dict(brain_area='ACA')]
params_response = [dict(response_type=1), dict(response_type=0), dict(response_type=-1)]
params_feedback = [dict(feedback_type=1), dict(feedback_type=-1)]

for param_brain_area in params_brain_area:
    brain_area = param_brain_area['brain_area']
    # Notebook 1 writes this CSV; notebooks 2 and 3 read it.
    processed_csv = f'output_data/01_notebook_{brain_area}.csv'
    param_brain_area['output_csv'] = processed_csv
    output_nb_name = f'01_notebook_{brain_area}.ipynb'

    pm.execute_notebook(
        template_notebook_1,
        output_nb_name,
        parameters=param_brain_area
    )

    for param_response in params_response:
        output_nb_name = f"02_notebook_{brain_area}_response_{param_response['response_type']}.ipynb"
        param_response['input_csv'] = processed_csv

        pm.execute_notebook(
            template_notebook_2,
            output_nb_name,
            parameters=param_response
        )

    for param_feedback in params_feedback:
        output_nb_name = f"03_notebook_{brain_area}_feedback_{param_feedback['feedback_type']}.ipynb"
        param_feedback['input_csv'] = processed_csv

        pm.execute_notebook(
            template_notebook_3,
            output_nb_name,
            parameters=param_feedback
        )

Executing: 100%|██████████| 19/19 [00:05<00:00,  3.53cell/s]
Executing: 100%|██████████| 9/9 [00:09<00:00,  1.00s/cell]
Executing: 100%|██████████| 9/9 [00:09<00:00,  1.02s/cell]
Executing: 100%|██████████| 9/9 [00:09<00:00,  1.01s/cell]
Executing: 100%|██████████| 9/9 [00:08<00:00,  1.01cell/s]
Executing: 100%|██████████| 9/9 [00:11<00:00,  1.23s/cell]
Executing: 100%|██████████| 19/19 [00:05<00:00,  3.39cell/s]
Executing: 100%|██████████| 9/9 [00:07<00:00,  1.24cell/s]
Executing: 100%|██████████| 9/9 [00:07<00:00,  1.27cell/s]
Executing: 100%|██████████| 9/9 [00:05<00:00,  1.50cell/s]
Executing: 100%|██████████| 9/9 [00:06<00:00,  1.50cell/s]
Executing: 100%|██████████| 9/9 [00:05<00:00,  1.54cell/s]
Executing: 100%|██████████| 19/19 [00:04<00:00,  4.65cell/s]
Executing: 100%|██████████| 9/9 [00:05<00:00,  1.57cell/s]
Executing: 100%|██████████| 9/9 [00:05<00:00,  1.56cell/s]
Executing: 100%|██████████| 9/9 [00:05<00:00,  1.54cell/s]
Executing: 100%|██████████| 9/9 [00:06<00:00,  1.4
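
As an alternative to nested loops, the whole workflow can first be written down as a flat list of jobs, which makes it easy to check how many runs will happen before starting them. The sketch below uses `itertools.product` from the standard library. The tuple layout `(template, output_notebook, parameters)` and the reduced list of brain areas are choices made for this illustration, and the papermill call is left as a comment.

```python
from itertools import product

brain_areas = ["LS", "CA3"]
response_types = [1, 0, -1]
feedback_types = [1, -1]


def csv_for(area):
    """Path of the processed CSV that notebook 1 writes for a brain area."""
    return f"output_data/01_notebook_{area}.csv"


# Notebook 1 jobs come first because notebooks 2 and 3 read their output.
jobs = [
    ("parameterization/01_notebook_brain_area.ipynb",
     f"01_notebook_{area}.ipynb",
     {"brain_area": area, "output_csv": csv_for(area)})
    for area in brain_areas
]
jobs += [
    ("parameterization/02_notebook_fixed_response.ipynb",
     f"02_notebook_{area}_response_{response}.ipynb",
     {"input_csv": csv_for(area), "response_type": response})
    for area, response in product(brain_areas, response_types)
]
jobs += [
    ("parameterization/03_notebook_fixed_feedback.ipynb",
     f"03_notebook_{area}_feedback_{feedback}.ipynb",
     {"input_csv": csv_for(area), "feedback_type": feedback})
    for area, feedback in product(brain_areas, feedback_types)
]

print(len(jobs))
for template, output_nb, parameters in jobs:
    # With papermill imported as pm, each job could then be run as:
    # pm.execute_notebook(template, output_nb, parameters=parameters)
    pass
```

Each job gets its own fresh parameter dictionary here, so no dictionary is shared or modified between runs.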