To run this locally, [install Ploomber](https://docs.ploomber.io/en/latest/get-started/quick-start.html) and execute: `ploomber examples -n guides/cron`

Found an issue? [Let us know.](https://github.com/ploomber/projects/issues/new?title=guides/cron%20issue)

Questions? [Ask us on Slack.](https://ploomber.io/community/)


# Scheduling with cron

This guide shows how to schedule Ploomber pipelines using `cron`.

**Note:** `cron` is only available on macOS and Linux.

## Pre-requisites

Ensure cron is installed:

```sh
crontab -l
```

If the output is not a "command not found" error, `cron` is installed and you can continue.
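
When no jobs are scheduled yet, `crontab -l` typically prints a message such as "no crontab for <user>", which is easy to mistake for a failure. As a minimal sketch, the check below only tests whether the `crontab` command exists on your `PATH`; the variable name `cron_status` is made up for this illustration.

```sh
# Check whether the crontab command is on the PATH (does not read any crontab)
if command -v crontab >/dev/null 2>&1; then
    cron_status="available"
else
    cron_status="missing"
fi
echo "crontab is $cron_status"
```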

## Get the example

```sh
pip install ploomber
ploomber examples -n guides/cron -o cron
cd cron
```

## Setup

Configure virtual environment:

```sh
ploomber install --create-env --use-venv
```

Activate environment:

```sh
source venv-cron/bin/activate
```

## The code

The pipeline has two stages, load and plot:

```yaml
# Content of pipeline.yaml
tasks:
  - source: scripts/load.py
    product:
      nb: products/{{timestamp}}/load.html
      data: products/{{timestamp}}/load.csv

  - source: scripts/plot.py
    product:
      nb: products/{{timestamp}}/plot.html
```

Notice that every product path is prefixed with `products/{{timestamp}}`; this lets us store the products of each run in a separate directory named after the runtime timestamp.
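
To make the idea concrete, here is a rough shell illustration of a per-run timestamp prefix. It is not how Ploomber resolves `{{timestamp}}`; it only emulates the effect with the `date` utility, and the variable names are hypothetical.

```sh
# Illustration only: build a directory name from the current time,
# similar in spirit to the products/{{timestamp}} prefix
timestamp=$(date +%Y-%m-%dT%H:%M:%S)
product_dir="products/$timestamp"
echo "this run would write to: $product_dir/load.csv"
```

Because each run gets a different prefix, a new run does not overwrite the products of earlier runs.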

## Development

When modifying the pipeline, you can call the following command:

```sh
ploomber build
```

```txt
Loading pipeline...
name    Ran?      Elapsed (s)    Percentage
------  ------  -------------  ------------
load    True          1.90359       39.4266
plot    True          2.9246        60.5734
```




Note that products will go to `products/dev/`.

For scheduling the workflow, we need to tell Ploomber to use a different configuration file:

```sh
# tell ploomber to use env.cron.yaml as config file
export PLOOMBER_ENV_FILENAME=env.cron.yaml
# build pipeline
ploomber build
# delete env variable
unset PLOOMBER_ENV_FILENAME
```

```txt
Loading pipeline...
name    Ran?      Elapsed (s)    Percentage
------  ------  -------------  ------------
load    True          1.99751       40.9529
plot    True          2.88008       59.0471
```


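
As an alternative to the `export`/`unset` pair, standard shell syntax lets you set a variable for a single command only. The sketch below demonstrates the mechanism with `sh -c` as a stand-in for the build command, so it runs without Ploomber installed.

```sh
# The assignment applies only to the command on the same line
PLOOMBER_ENV_FILENAME=env.cron.yaml sh -c 'echo "config: $PLOOMBER_ENV_FILENAME"'
# The calling shell is left untouched, so no unset is needed
echo "after: ${PLOOMBER_ENV_FILENAME:-<unset>}"
```

Applied to this guide, the same pattern would be `PLOOMBER_ENV_FILENAME=env.cron.yaml ploomber build`.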


Let's see the contents of the products directory:

```sh
ls products
```

```txt
2022-03-12T13:09:16.783678
dev
```


You should see two folders, `dev/` and another one with the runtime timestamp.

## Scheduling

Now let's schedule the pipeline with cron. First, open the cron configuration file for editing:

```sh
crontab -e
```

Note that this opens the configuration file in your default editor. If you don't know which editor that is, you can open the file with `nano`:

```sh
EDITOR=nano crontab -e
```

Once the editor opens, add a line like this:

```txt
* * * * *  PROJ=/path/to/cron && cd $PROJ && bash run.sh >> cron.log 2>&1
```

**Note:** If using macOS Big Sur (11.6) or newer, you may need to follow a few [extra steps](https://osxdaily.com/2020/04/27/fix-cron-permissions-macos-full-disk-access/) to enable cron.

Replace `/path/to/cron` with the absolute path to the `cron/` directory that contains the sample code. (Tip: to get the absolute path, run `pwd` in your terminal.)

If you opened the configuration file with `nano`, save your changes with `CTRL + O` and exit the editor with `CTRL + X`.
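
The cron entry delegates the actual work to `run.sh`. The sketch below shows what such a wrapper might do, based only on the commands used earlier in this guide; the `run.sh` shipped with the example may differ, and the function name `run_scheduled_build` is hypothetical.

```sh
# Hypothetical wrapper; the run.sh shipped with the example may differ
run_scheduled_build() {
    # cron starts jobs with a minimal environment, so activate the
    # virtual environment created during setup
    source venv-cron/bin/activate || return 1
    # use the scheduling configuration for this command only
    PLOOMBER_ENV_FILENAME=env.cron.yaml ploomber build
}
```

The function must be called from the project directory, which the cron entry already handles with `cd $PROJ`.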

After a minute, you'll start to see more directories in the products folder; this is what mine looks like:

```
2022-03-12T11:14:47.506532/ 
2022-03-12T11:25:12.707618/ 
dev/
```

If you see something like this, congratulations, you have a scheduled pipeline up and running!

To learn how to modify the scheduling interval, see the Overview section in cron's [Wikipedia article.](https://en.wikipedia.org/wiki/Cron)
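
As a quick orientation, the five leading fields of a cron entry are minute, hour, day of month, month, and day of week, so `* * * * *` means "every minute". The snippet below prints a few alternative entries; the schedules and variable names are illustrative choices, not part of the example project.

```sh
# The command part stays the same; only the five time fields change
job='PROJ=/path/to/cron && cd $PROJ && bash run.sh >> cron.log 2>&1'
every_hour="0 * * * *  $job"       # at minute 0 of every hour
daily_6am="0 6 * * *  $job"        # every day at 06:00
weekdays_930="30 9 * * 1-5  $job"  # Monday to Friday at 09:30
printf '%s\n' "$every_hour" "$daily_6am" "$weekdays_930"
```

Any of the printed lines can replace the entry added with `crontab -e`, after substituting the real project path.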

## Troubleshooting

If you don't see the new directories, check out the `cron.log` file, which will contain any error messages, and ping us [on Slack](https://ploomber.io/community) so we can help you.
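
A minimal first check, assuming you run it from the project directory, is to look at the end of the log and confirm that the entry was actually saved:

```sh
log_file=cron.log
# Show the most recent log lines, if cron has written any
if [ -f "$log_file" ]; then
    tail -n 20 "$log_file"
else
    echo "$log_file not found; cron may not have run the job yet"
fi
# Confirm the scheduled entry exists in your crontab
crontab -l 2>/dev/null | grep -F 'run.sh' || echo "no crontab entry mentions run.sh"
```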