In [1]:
# !pip install doit papermill pandas hvplot matplotlib

In [2]:
from doit import load_ipython_extension
load_ipython_extension()

In [3]:
import sys
sys.path.append('src')
import sciebo

sciebo.download_file('https://uni-bonn.sciebo.de/s/N8t6uo4mn6itdtG', 'data/steinmetz_all.csv')
sciebo.download_file('https://uni-bonn.sciebo.de/s/5ke7GSFfMErS20y', 'nb_active_trials.ipynb')
sciebo.download_file('https://uni-bonn.sciebo.de/s/UQKaks9opGYu211', 'nb_stats.ipynb')
sciebo.download_file('https://uni-bonn.sciebo.de/s/dFomib4RDkGL39A', 'nb_plots.ipynb')

Downloading data/steinmetz_all.csv: 100%|██████████| 1.52M/1.52M [00:00<00:00, 4.54MB/s]
Downloading nb_active_trials.ipynb: 100%|██████████| 1.78k/1.78k [00:00<00:00, 173kB/s]
Downloading nb_stats.ipynb: 100%|██████████| 10.2k/10.2k [00:00<00:00, 900kB/s]
Downloading nb_plots.ipynb: 100%|██████████| 18.8k/18.8k [00:00<00:00, 577kB/s]


# Dependency Management with Pydoit

In pydoit, dependency management is a core feature that ensures tasks are executed only when necessary. Dependencies can be files, task results, or parameters, and pydoit automatically skips a task if none of its dependencies have changed since its last successful run.
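As a minimal sketch, two of these dependency kinds can be written as plain task dictionaries. `file_dep` and `task_dep` are real doit keys; the task `task_report` and its `echo` action are hypothetical. Note that `task_dep` only controls ordering: because `report` has no `file_dep`, doit treats it as never up to date and runs it every time. Parameter-based checks go through doit's `uptodate` key (for example with `doit.tools.config_changed`) and are not shown here.

```python
# Hypothetical tasks illustrating two dependency kinds doit understands.

def task_process():
    """File dependency: skipped unless data/steinmetz_all.csv changed."""
    return {
        'actions': ['papermill nb_active_trials.ipynb process.ipynb'],
        'file_dep': ['data/steinmetz_all.csv'],
    }


def task_report():
    """Task dependency (hypothetical task): `process` is scheduled first."""
    return {
        'actions': ['echo "process finished"'],
        'task_dep': ['process'],
    }
```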

## File Dependency

In pydoit, file dependencies (`file_dep`) list the files a task needs in order to perform its actions. A task is executed only if one or more of these files have changed since the last time the task ran successfully; if none of them have been modified, pydoit skips the task. By default, changes are detected by comparing each file's MD5 checksum with the one recorded after the previous run.

```python
def task_name():
    return {
        'actions': ['command to execute'],
        'file_dep': ['file_name']
    }
```
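To make the skip rule concrete, here is a minimal sketch of the idea, assuming change detection by content checksum. doit's default check also compares MD5 checksums recorded in its dependency database, but the helpers `md5_of`, `is_up_to_date` and `record_run`, and the plain dict standing in for that database, are hypothetical and not doit's API. On the first run nothing has been recorded, so the task always runs.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 checksum of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def is_up_to_date(file_dep, saved):
    """Hypothetical check: True if every dependency matches its saved checksum.

    `saved` maps file name -> checksum recorded after the last successful run;
    an empty dict means the task has never run, so it is not up to date.
    """
    if not saved:
        return False
    return all(saved.get(f) == md5_of(f) for f in file_dep)


def record_run(file_dep):
    """Checksums to store once the task's actions have succeeded."""
    return {f: md5_of(f) for f in file_dep}
```

Editing the file and saving it again, even with only a one-character change, alters its checksum and makes `is_up_to_date` return `False`.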

**Example** Make `process` depend on the file `data/steinmetz_all.csv` and run `%doit`.

```python
def task_process():
    return {
        'actions': ['papermill nb_active_trials.ipynb process.ipynb']
    }
```

In [12]:
def task_process():
    return {
        'actions': ['papermill nb_active_trials.ipynb process.ipynb'],
        'file_dep': ['data/steinmetz_all.csv'],
        'targets': ['data/active_trials.csv'],
    }

In [8]:
%doit process

-- process


Run `process` again. What difference do you notice?

In [7]:
%doit process

-- process


Make `stats` depend on the file `data/active_trials.csv` and run `%doit`.

```python
def task_stats():
    return {
        'actions': ['papermill nb_stats.ipynb stats.ipynb']
    }
```

In [9]:
def task_stats():
    return {
        'actions': ['papermill nb_stats.ipynb stats.ipynb'],
        'file_dep': ['data/active_trials.csv']
    }

In [11]:
%doit stats

-- stats


Run `stats` again. What do you see?

Make `plot` depend on the file `data/active_trials.csv` and run `%doit`.

```python
def task_plot():
    return {
        'actions': ['papermill nb_plots.ipynb plots.ipynb']
    }
```

In [9]:
def task_plot():  
    return {
        'actions': ['papermill nb_plots.ipynb plots.ipynb'],
        'file_dep': ['data/active_trials.csv']
    }


Run `plot`, then run it again. What do you see?

In [None]:
%doit plot
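Outside the notebook, the three tasks from this section could live in a single `dodo.py` file and be run from a shell with `doit` (all tasks) or `doit stats` (one task). The sketch below collects them unchanged. Because `stats` and `plot` both list `data/active_trials.csv` in `file_dep`, a change to that one file makes both tasks out of date.

```python
# dodo.py: the three notebook tasks from this section in one file (sketch).

def task_process():
    """Run nb_active_trials.ipynb when the raw data changes."""
    return {
        'actions': ['papermill nb_active_trials.ipynb process.ipynb'],
        'file_dep': ['data/steinmetz_all.csv'],
        'targets': ['data/active_trials.csv'],
    }


def task_stats():
    """Run nb_stats.ipynb when the active-trials table changes."""
    return {
        'actions': ['papermill nb_stats.ipynb stats.ipynb'],
        'file_dep': ['data/active_trials.csv'],
    }


def task_plot():
    """Run nb_plots.ipynb when the active-trials table changes."""
    return {
        'actions': ['papermill nb_plots.ipynb plots.ipynb'],
        'file_dep': ['data/active_trials.csv'],
    }
```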

## Targets

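In pydoit, `targets` list the output files that a task is expected to create. A task with targets is run if any of its targets is missing, or if any of its `file_dep` files has changed since the last successful run. Unlike `make`, pydoit does not compare the timestamps of targets and dependencies; it only checks whether the targets exist. Here's how to define the targets of a task: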

```python
def task_name():
    return {
        'actions': ['command to execute'],
        'file_dep': ['file_name'],
        'targets': ['target_file_name']
    }
```
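The decision doit makes for a task with targets can be modelled roughly as follows. This is a simplified sketch, not doit's code: the function `needs_run` is hypothetical, its `deps_changed` flag stands in for the checksum comparison, and the `uptodate` key is ignored. One consequence worth noticing is that only the existence of targets is checked, so editing a target by hand does not make doit re-run the task.

```python
import os


def needs_run(targets, file_dep, deps_changed):
    """Hypothetical, simplified model of doit's up-to-date decision.

    targets:      output paths declared by the task
    file_dep:     input paths declared by the task
    deps_changed: True if any file_dep differs from what was recorded after
                  the last successful run (also True if nothing was recorded)
    """
    if not file_dep:
        return True  # no dependencies: never considered up to date
    if any(not os.path.exists(t) for t in targets):
        return True  # a declared output is missing
    return deps_changed  # otherwise run only if an input changed
```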

**Delete the `data` directory, then run the cell below to download only `steinmetz_all.csv`.**

In [None]:
import sys
sys.path.append('src')
import sciebo

sciebo.download_file('https://uni-bonn.sciebo.de/s/N8t6uo4mn6itdtG', 'data/steinmetz_all.csv')

**Example** Add `data/active_trials.csv`, the output of the `process` task, as its target.

In [15]:
def task_process():
    return {
        'actions': ['papermill nb_active_trials.ipynb process.ipynb'],
        'file_dep': ['data/steinmetz_all.csv'],  
        'targets': ['data/active_trials.csv']    
    }

In [None]:
%doit process

Run the `process` task again. Does it run?

Delete only `active_trials.csv`, the target of the `process` task, and run the task again. Does it run now?

Add `data/stats.csv`, the output of the `stats` task, as its target and run it. What happens?

Run the `stats` task again. What do you see?

Delete `stats.csv` and run the `stats` task again.

Delete `active_trials.csv` and run the `stats` task. What happens now?
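The last question hinges on a doit rule: when a file in one task's `file_dep` is declared in another task's `targets`, doit adds an implicit task dependency and schedules the producing task first (running it if it is not up to date). The sketch below models only that ordering rule with a hypothetical `execution_order` helper over plain task dicts, using Python's `graphlib.TopologicalSorter` (Python 3.9+); it illustrates the idea and is not how doit is implemented.

```python
from graphlib import TopologicalSorter


def execution_order(tasks, wanted):
    """Hypothetical helper: tasks needed for `wanted`, producers first.

    `tasks` maps task name -> task dict with optional 'file_dep' and 'targets'.
    """
    # Which task produces each file?
    producer = {t: name for name, d in tasks.items() for t in d.get('targets', [])}

    # Implicit edges: a task depends on the producer of each of its file_dep.
    graph = {}
    pending = [wanted]
    while pending:
        name = pending.pop()
        if name in graph:
            continue
        deps = {producer[f] for f in tasks[name].get('file_dep', []) if f in producer}
        graph[name] = deps
        pending.extend(deps)

    # static_order() yields every predecessor before the tasks that need it.
    return list(TopologicalSorter(graph).static_order())


tasks = {
    'process': {'file_dep': ['data/steinmetz_all.csv'],
                'targets': ['data/active_trials.csv']},
    'stats': {'file_dep': ['data/active_trials.csv'],
              'targets': ['data/stats.csv']},
}
```

With these dicts, `execution_order(tasks, 'stats')` returns `['process', 'stats']`.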

Add `response_time_histogram.png`, the output of the `plot` task, as its target.

Delete `response_time_histogram.png` and run the `plot` task again.

### Adding papermill output notebooks as targets

We can also add the notebooks written by papermill as targets, since the tasks generate them as well. Deleting an executed notebook such as `process.ipynb` then also causes its task to run again.

**Example** Add both outputs of `process`, `data/active_trials.csv` and `process.ipynb`, as targets.

In [28]:
def task_process():
    return {
        'actions': ['papermill nb_active_trials.ipynb process.ipynb'],
        'file_dep': ['data/steinmetz_all.csv'],  
        'targets': ['data/active_trials.csv', 'process.ipynb']    
    }
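Since every task in this notebook follows the same pattern (execute one notebook, write one executed copy), a small factory function could add the output notebook to `targets` automatically. This is a sketch; `papermill_task` is a hypothetical helper, and the paths are the ones used above.

```python
def papermill_task(input_nb, output_nb, file_dep=(), outputs=()):
    """Hypothetical factory: build a doit task dict for one papermill run.

    The executed notebook is always listed as a target, alongside any other
    files the notebook writes (`outputs`).
    """
    return {
        'actions': [f'papermill {input_nb} {output_nb}'],
        'file_dep': [input_nb, *file_dep],
        'targets': [*outputs, output_nb],
    }


def task_process():
    return papermill_task('nb_active_trials.ipynb', 'process.ipynb',
                          file_dep=['data/steinmetz_all.csv'],
                          outputs=['data/active_trials.csv'])
```

Listing the input notebook in `file_dep` is an extra design choice not made in the cells above: it makes edits to `nb_active_trials.ipynb` trigger a re-run as well.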

Add the outputs `data/stats.csv` and `stats.ipynb` as targets of `stats`.

Add the outputs `response_time_histogram.png` and `plots.ipynb` as targets of `plot`.