In [2]:
# conda activate {env} doesn't work well here
# so we manually modify the path
PATH=$CONDA_PREFIX/envs/ploomber/bin:$PATH

# Running experiments in parallel

You can parametrize notebooks and run multiple copies in parallel, each with a different set of parameters. Let's see how!

## Notebook configuration

First, add a cell at the top of your notebook with the notebook parameters:

```python
# PARAMETERS
n_estimators = 1
```

**Important:** You must add the comment `# PARAMETERS` in the cell.

Next, make sure the notebook's body uses these parameters. Ploomber Cloud will change their values at runtime.
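
The tutorial doesn't spell out how the new values reach your notebook. A common approach (used, for example, by papermill) is to run an extra cell with the new values right after the parameters cell, so the values you wrote act as defaults. The sketch below simulates that idea with `exec`; we're assuming Ploomber Cloud behaves similarly, and `run_with_parameters` is a hypothetical helper, not part of its API.

```python
# Simulates runtime parameter injection (an assumed mechanism, for illustration):
# the "# PARAMETERS" cell sets defaults, then an injected cell overrides them.

PARAMETERS_CELL = "# PARAMETERS\nn_estimators = 1\n"
BODY_CELL = "result = f'training with n_estimators={n_estimators}'\n"


def run_with_parameters(cells, overrides):
    """Execute cells in order, injecting overrides right after the parameters cell."""
    namespace = {}
    for source in cells:
        exec(source, namespace)
        if "# PARAMETERS" in source:
            # Hypothetical injected cell: rebinds each parameter to its new value
            injected = "\n".join(f"{name} = {value!r}" for name, value in overrides.items())
            exec(injected, namespace)
    return namespace


ns = run_with_parameters([PARAMETERS_CELL, BODY_CELL], {"n_estimators": 10})
print(ns["result"])  # training with n_estimators=10
```

With no overrides, the body sees the default from the `# PARAMETERS` cell, which is why that cell should hold sensible values for running the notebook locally.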

Now, add a raw cell at the top of the notebook (TODO: add a link to a FAQ in our docs explaining what a raw cell is and how to add it, include a screenshot in the FAQ).

In the raw cell, put the parameter values you want to use under the `grid` section:

```yaml
grid:
    n_estimators: [1, 5, 10, 20]
```

Your notebook can have more than one parameter. In such a case, Ploomber Cloud runs the notebook once for every possible combination of values.
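
"All possible combinations" is the Cartesian product of the value lists, so the number of runs is the product of the list lengths. A sketch of that expansion with `itertools.product`; `criterion` is a hypothetical second parameter, and the ordering shown is Python's, not necessarily the order Ploomber Cloud uses.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict of parameter values per combination in the grid."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


# n_estimators comes from the raw cell above; criterion is hypothetical
grid = {
    "n_estimators": [1, 5, 10, 20],
    "criterion": ["squared_error", "absolute_error"],
}
runs = expand_grid(grid)
print(len(runs))  # 8
print(runs[0])  # {'n_estimators': 1, 'criterion': 'squared_error'}
```

Keep this growth in mind: adding a third parameter with three values would triple the number of runs again.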

*Note:* the raw cell must contain valid YAML (TODO: briefly explain what YAML is)

## Submit notebook

Let's submit a notebook that fits a regressor and uses 4 parameter values:

In [33]:
ploomber cloud nb notebooks/grid.ipynb

Uploading grid-92b7be78.ipynb...
Triggering execution of grid-92b7be78.ipynb...

Check that the task was submitted:

In [13]:
ploomber cloud list

created_at      runid                                 status
--------------  ------------------------------------  --------
27 seconds ago  55a91e67-0116-41c6-a12b-72d8a62598c8  created
7 minutes ago   f8b099be-024f-42c0-87c7-ade5e171af63  failed
25 minutes ago  89c5c4b5-aea8-4aaf-816d-a120b799b952  finished
43 minutes ago  41fe4e15-fc0f-40bc-b07a-0d45457ba84b  finished
52 minutes ago  ffe17c8c-a5a6-4561-a53a-5e7fd4045036  failed
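
If you want to check run statuses from a script, the table printed by `ploomber cloud list` can be parsed by using the dashed rule under the header to find the column boundaries. This is a sketch that assumes the layout shown above stays stable; `parse_table` is a hypothetical helper, not part of the Ploomber CLI.

```python
from collections import Counter

# Output copied from the `ploomber cloud list` call above
LIST_OUTPUT = """\
created_at      runid                                 status
--------------  ------------------------------------  --------
27 seconds ago  55a91e67-0116-41c6-a12b-72d8a62598c8  created
7 minutes ago   f8b099be-024f-42c0-87c7-ade5e171af63  failed
25 minutes ago  89c5c4b5-aea8-4aaf-816d-a120b799b952  finished
43 minutes ago  41fe4e15-fc0f-40bc-b07a-0d45457ba84b  finished
52 minutes ago  ffe17c8c-a5a6-4561-a53a-5e7fd4045036  failed
"""


def parse_table(text):
    """Parse a plain-text table whose header is underlined with dashes."""
    lines = [line for line in text.splitlines() if line.strip()]
    header, rule, *rows = lines
    # Each run of dashes in the rule line marks one column's span
    spans, start = [], None
    for i, ch in enumerate(rule + " "):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            spans.append((start, i))
            start = None
    # Let the last column extend to the end of the line
    spans[-1] = (spans[-1][0], None)
    names = [header[a:b].strip() for a, b in spans]
    return [{n: row[a:b].strip() for n, (a, b) in zip(names, spans)} for row in rows]


runs = parse_table(LIST_OUTPUT)
print(runs[0]["status"])  # created
print(Counter(run["status"] for run in runs))
```

Slicing by column spans (rather than splitting on whitespace) matters here because values such as `27 seconds ago` contain spaces.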

Wait 1-2 minutes for the Docker image to build. Once it's done, you'll see the following message:

In [16]:
ploomber cloud logs @latest --image | tail -n 10

[Container] 2022/10/21 13:26:33 Phase complete: BUILD State: SUCCEEDED

[Container] 2022/10/21 13:26:33 Phase context status code:  Message: 

[Container] 2022/10/21 13:26:33 Entering phase POST_BUILD

[Container] 2022/10/21 13:26:33 Phase complete: POST_BUILD State: SUCCEEDED

[Container] 2022/10/21 13:26:33 Phase context status code:  Message: 



Now you'll see that the notebook has `started`:

In [17]:
ploomber cloud list

created_at     runid                                 status
-------------  ------------------------------------  --------
3 minutes ago  f858f6b3-9b3f-4c86-b55d-de23fcba045f  started
an hour ago    2e0bae66-6c90-4ad5-84a9-b6cdb746a5af  finished
11 hours ago   0dd11c31-3200-41cc-81ca-ff0f59bc03e7  finished
11 hours ago   32e54117-4890-4001-93a3-beacc394e983  finished
11 hours ago   3e30c17f-f789-4771-8b6a-7bf34d89d63e  finished

Let's see the status of each task (one task per parameter value):

In [19]:
ploomber cloud status @latest

Geting latest ID...
Got ID: f858f6b3-9b3f-4c86-b55d-de23fcba045f
Unknown status: started
taskid                     name             runid                      status
-------------------------  ---------------  -------------------------  --------
59f28ad3-1db2-494d-8de9-d  grid-7bd167c1-2  f858f6b3-9b3f-4c86-b55d-d  created
33c9e6306f8                                 e23fcba045f
7a318b97-6de8-4431-9f7c-6  grid-7bd167c1-0  f858f6b3-9b3f-4c86-b55d-d  created
abd6c454e96                                 e23fcba045f
8d07cc57-3fd7-44eb-a761-5  grid-7bd167c1-3  f858f6b3-9b3f-4c86-b55d-d  created
50ab3bacd28                                 e23fcba045f
26098820-1487-4a47-af36-b  grid-7bd167c1-1  f858f6b3-9b3f-4c86-b55d-d  created
179f09055b6                                 e23fcba045f

After a few minutes, they are done:

In [35]:
ploomber cloud status @latest

Geting latest ID...
Got ID: 6f4c14dd-f8ca-4704-9e0f-f80b2d1301ff
Pipeline finished...
taskid                     name             runid                      status
-------------------------  ---------------  -------------------------  --------
9e091632-d57f-4cc3-81f9-4  grid-92b7be78-3  6f4c14dd-f8ca-4704-9e0f-f  finished
9147be31c8d                                 80b2d1301ff
4ef8a7f3-6afe-4ca9-9f59-e  grid-92b7be78-2  6f4c14dd-f8ca-4704-9e0f-f  finished
898730b297a                                 80b2d1301ff
fb744f5d-afb9-4c60-8e16-9  grid-92b7be78-0  6f4c14dd-f8ca-4704-9e0f-f  finished
7cb681f1579                                 80b2d1301ff
d6aa6e1e-f861-43d7-a55a-8  grid-92b7be78-1  6f4c14dd-f8ca-4704-9e0f-f  finished
96e2586921d                                 80b2d1301ff

Let's see what's in our outputs workspace:

In [36]:
ploomber cloud products

path
-----------------------------------------------------
grid-27f5c7e1/output/notebook-n_estimators=1-0.ipynb
grid-27f5c7e1/output/notebook-n_estimators=10-2.ipynb
grid-27f5c7e1/output/notebook-n_estimators=20-3.ipynb
grid-27f5c7e1/output/notebook-n_estimators=5-1.ipynb
grid-3a53522a/output/notebook-n_estimators=1-0.ipynb
grid-3a53522a/output/notebook-n_estimators=10-2.ipynb
grid-3a53522a/output/notebook-n_estimators=20-3.ipynb
grid-3a53522a/output/notebook-n_estimators=5-1.ipynb
grid-7bd167c1/output/notebook-n_estimators=1-0.ipynb
grid-7bd167c1/output/notebook-n_estimators=10-2.ipynb
grid-7bd167c1/output/notebook-n_estimators=20-3.ipynb
grid-7bd167c1/output/notebook-n_estimators=5-1.ipynb
grid-92b7be78/output/notebook-n_estimators=1-0.ipynb
grid-92b7be78/output/notebook-n_estimators=10-2.ipynb
grid-92b7be78/output/notebook-n_estimators=20-3.ipynb
grid-92b7be78/output/notebook-n_estimators=5-1.ipynb
output-1c12b73a/nb.ipynb
output-3e80e326/nb.ipynb
output-eaf1fbd7/nb.ipynb
penguins-c

Download all the executed notebooks:

In [None]:
ploomber cloud download 'grid-92b7be78/*'

Note that `grid-92b7be78` is the identifier printed when we submitted the notebook (`Uploading grid-92b7be78.ipynb...`).
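
To preview which products a pattern selects, you can mimic the matching locally. This sketch uses Python's `fnmatch`, where `*` also matches `/`; that is consistent with the nested `output/` files shown in this tutorial, but we are assuming, not confirming, that Ploomber Cloud uses the same glob rules. Quoting the pattern in the shell keeps your local shell from expanding `*` itself.

```python
from fnmatch import fnmatchcase

# A subset of the paths listed by `ploomber cloud products` above
products = [
    "grid-7bd167c1/output/notebook-n_estimators=1-0.ipynb",
    "grid-92b7be78/output/notebook-n_estimators=1-0.ipynb",
    "grid-92b7be78/output/notebook-n_estimators=10-2.ipynb",
    "grid-92b7be78/output/notebook-n_estimators=20-3.ipynb",
    "grid-92b7be78/output/notebook-n_estimators=5-1.ipynb",
    "output-1c12b73a/nb.ipynb",
]

# With fnmatch, "*" also matches "/", so the pattern covers nested files
selected = [path for path in products if fnmatchcase(path, "grid-92b7be78/*")]
print(len(selected))  # 4
```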

# Uploading input data

If your notebook requires input data, you can upload it.

Let's see what happens if we try to run a notebook with missing input data:

In [24]:
ploomber cloud nb notebooks/input-data.ipynb

Uploading input-data-c96b0748.ipynb...
Triggering execution of input-data-c96b0748.ipynb...
Error: Error validating inputs/outputs: {'missing': {'../data/penguins.csv'}} (status: 400)

Ploomber Cloud will parse your notebook and look for referenced files. If they're missing in your data workspace, it'll show an error like the one above.

In our notebook, we have the following line:

```python
df = pd.read_csv('../data/penguins.csv')
```
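
The tutorial doesn't describe how Ploomber Cloud finds these paths. As a rough illustration only (not Ploomber's implementation), here is a sketch that walks a cell's syntax tree with Python's `ast` module and collects string literals passed as the first argument to a hypothetical set of file-reading and file-writing functions:

```python
import ast

# Hypothetical set of function names treated as file readers/writers
FILE_FUNCTIONS = {"read_csv", "read_parquet", "to_csv", "open"}


def find_file_references(source):
    """Return string literals passed as the first argument to FILE_FUNCTIONS."""
    paths = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        # Handles both `pd.read_csv(...)` (Attribute) and `open(...)` (Name)
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        first = node.args[0]
        if name in FILE_FUNCTIONS and isinstance(first, ast.Constant) and isinstance(first.value, str):
            paths.append(first.value)
    return paths


cell = "import pandas as pd\ndf = pd.read_csv('../data/penguins.csv')\n"
print(find_file_references(cell))  # ['../data/penguins.csv']
```

A static scan like this misses many cases (for example, paths built from variables), so treat it as an illustration of the idea only.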

Ploomber realizes you're using a local file at `../data/penguins.csv`. Since files can be either inputs or outputs, you have to tell Ploomber which one each file is. To fix this, add a raw cell (TODO: add a link to a FAQ in our docs explaining what a raw cell is and how to add it) at the top:

```yaml
# this determines where to look for input data and where
# to store outputs
prefix: penguins-classification

# for each path in our notebook, indicate if it's an input or output
# the values must be the same as in your notebook
inputs:
    - ../data/penguins.csv

# no outputs, so no need to add an "outputs" section
```

Let's run a notebook that contains the raw cell:

In [25]:
ploomber cloud nb notebooks/input-data-with-raw-cell.ipynb

Uploading input-data-with-raw-cell-e6620d67.ipynb...
Triggering execution of input-data-with-raw-cell-e6620d67.ipynb...
Error: Cannot start execution. The following inputs are missing:
	- ../data/penguins.csv
Upload them to your data workspace or using the CLI: 
ploomber cloud data --upload ../data/penguins.csv --prefix penguins-classification/input --name data-penguins.csv
 (status: 400)

This time, Ploomber Cloud is telling us the file is not in our data workspace, so let's upload it.

First, let's get the data:

In [26]:
curl https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv -o penguins.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13478  100 13478    0     0  47835      0 --:--:-- --:--:-- --:--:-- 47964


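Then, upload it with the command printed in the error message (we changed the `--upload` path because `curl` saved the file in the current directory):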

In [27]:
# NOTE: you may need to change the path in the --upload
# argument if the file is somewhere else
ploomber cloud data \
    --upload penguins.csv \
    --prefix penguins-classification/input \
    --name data-penguins.csv

Uploading data-penguins.csv...

Let's submit the notebook:

In [37]:
ploomber cloud nb notebooks/input-data-with-raw-cell.ipynb

Uploading input-data-with-raw-cell-48d064fe.ipynb...
Triggering execution of input-data-with-raw-cell-48d064fe.ipynb...

Wait a couple of minutes for the run to finish (its `status` will appear as `finished`):

In [40]:
ploomber cloud list

created_at      runid                                 status
--------------  ------------------------------------  --------
6 minutes ago   4608ad54-82c0-425e-8af4-d556cdf03268  finished
12 minutes ago  6f4c14dd-f8ca-4704-9e0f-f80b2d1301ff  finished
20 minutes ago  2c4335bf-bf08-4b8c-a183-4c3a4aadf7f5  failed
30 minutes ago  55a91e67-0116-41c6-a12b-72d8a62598c8  failed
36 minutes ago  f8b099be-024f-42c0-87c7-ade5e171af63  failed

The `prefix` in the raw cell determines where the outputs are stored. Hence, to download all outputs:

In [42]:
ploomber cloud download 'penguins-classification/*'

Writing file into path penguins-classification/output/.notebook.ipynb.metadata
Writing file into path penguins-classification/output/notebook.ipynb

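# Notebook artifacts

If your notebook saves files (for example, a trained model), list them under `outputs` in the raw cell. As with inputs, the paths must be the same as in your notebook, and the `prefix` determines where the outputs are stored: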

```yaml
prefix: some-experiment

outputs:
    - path/to/model.pickle
```
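
In the notebook itself, write the artifact to exactly the path declared under `outputs`. A minimal sketch; the dictionary is a stand-in for a real trained model, and `path/to/model.pickle` is the placeholder path from the raw cell above.

```python
import os
import pickle

# Must match the path listed under `outputs` in the raw cell
MODEL_PATH = "path/to/model.pickle"

# Stand-in for a trained model; any picklable object works
model = {"n_estimators": 10, "weights": [0.1, 0.2, 0.7]}

# Create the parent directory if needed, then serialize the model
os.makedirs(os.path.dirname(MODEL_PATH), exist_ok=True)
with open(MODEL_PATH, "wb") as f:
    pickle.dump(model, f)
```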

# Requesting more memory, CPU and GPU

You can request more resources for your notebook execution by adding the following in the raw cell:

```yaml
task_resources:
    vcpus: 8 # number of CPUs
    memory: 16384 # memory in MiB
```
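
Because `memory` is given in MiB, multiply GiB by 1024: the `16384` above is 16 GiB. The sketch below does that conversion and checks a request against the community plan limits stated in the note below; `task_resources` and `fits_community_plan` are hypothetical planning helpers, not part of Ploomber Cloud.

```python
MIB_PER_GIB = 1024

# Community plan caps stated in this tutorial: 2 CPUs and 4 GiB of memory
COMMUNITY_MAX_VCPUS = 2
COMMUNITY_MAX_MEMORY_MIB = 4 * MIB_PER_GIB


def task_resources(vcpus, memory_gib):
    """Build a task_resources mapping, converting memory from GiB to MiB."""
    return {"vcpus": vcpus, "memory": int(memory_gib * MIB_PER_GIB)}


def fits_community_plan(resources):
    """True if the request stays within the community plan caps."""
    return (resources["vcpus"] <= COMMUNITY_MAX_VCPUS
            and resources["memory"] <= COMMUNITY_MAX_MEMORY_MIB)


request = task_resources(vcpus=8, memory_gib=16)
print(request)  # {'vcpus': 8, 'memory': 16384}
print(fits_community_plan(request))  # False
```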

See the `notebooks/resources.ipynb` notebook for an example.

**Note:** The free community plan is capped at 2 CPUs and 4 GiB of memory, with no GPUs. If you need more resources, you can subscribe to the Teams plan. If you're a student or researcher, join our [Slack](https://ploomber.io/community) and we'll lift the restrictions.

# Specify package versions

By default, Ploomber Cloud parses your `import` statements and installs the latest version of each package. If you want a specific version, add this to your raw cell:

```yaml
dependencies:
    - matplotlib==3.5.3
    - scikit-learn==1.1.0
```
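
To confirm inside the notebook that the pins were honored, you can compare them with the installed distributions via `importlib.metadata`. A sketch that only understands `name==version` pins; `check_pins` is a hypothetical helper, and the `get_version` argument exists so it can be tested without the real packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Same pins as in the raw cell's `dependencies` section
PINNED = ["matplotlib==3.5.3", "scikit-learn==1.1.0"]


def check_pins(pins, get_version=version):
    """Return {name: (expected, installed)} for pins that don't match.

    An empty dict means every pin was honored; installed is None when
    the distribution is missing.
    """
    mismatches = {}
    for pin in pins:
        name, expected = pin.split("==")
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


print(check_pins(PINNED))
```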


See the `notebooks/dependencies.ipynb` notebook for an example.

## Concurrent runs

The free community plan allows you to run parallel jobs via the `grid` feature. However, you cannot start a new execution until the current one is done. If you need concurrent runs, you can subscribe to the Teams plan. If you're a student or researcher, join our [Slack](https://ploomber.io/community) and we'll lift the restrictions.

To abort your latest run:

```sh
ploomber cloud abort @latest
```

# Debugging

To see the status of your runs:

```sh
ploomber cloud list
```

To see tasks within a given run:

```sh
ploomber cloud status {runid}

# or for the latest run
ploomber cloud status @latest
```

If your notebook fails, the failed notebook is still uploaded, so you can download it for debugging:

```sh
ploomber cloud download 'path/to/notebook.ipynb'
```
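
The downloaded `.ipynb` file is JSON in the Jupyter notebook format, where a failing cell records an output whose `output_type` is `"error"`, along with `ename` and `evalue`. A small sketch to locate it without opening Jupyter; `find_errors` is a hypothetical helper that uses the standard library instead of `nbformat`, and the path is the placeholder from the command above.

```python
import json
from pathlib import Path


def find_errors(nb):
    """Return (cell_index, ename, evalue) for each error output in a notebook dict."""
    errors = []
    for index, cell in enumerate(nb.get("cells", [])):
        # Markdown and raw cells have no "outputs" key
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename"), output.get("evalue")))
    return errors


# Placeholder path; point it at the notebook you downloaded
notebook = Path("path/to/notebook.ipynb")
if notebook.exists():
    nb = json.loads(notebook.read_text(encoding="utf-8"))
    for index, ename, evalue in find_errors(nb):
        print(f"cell {index}: {ename}: {evalue}")
```

The reported index is the 0-based position of the cell in the notebook, counting markdown and raw cells too.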

To list existing files in your products workspace:

```sh
ploomber cloud products
```

To get the logs for all tasks in the run:

```sh
ploomber cloud logs {runid}

# or for the latest run
ploomber cloud logs @latest
```

To get the logs for the Docker building process:

```sh
ploomber cloud logs {runid} --image

# or for the latest run
ploomber cloud logs @latest --image
```