# Introduction

## What is a Jupyter Notebook?

The code in this tutorial runs in a Jupyter Notebook environment.

Jupyter Notebook is an interactive computing environment that allows you to write and execute code, visualize data, and document your workflow in a single document. It is widely used in data science, machine learning, and scientific computing due to its flexibility and ease of use.  

A Jupyter Notebook consists of two main types of cells:  

- **Code Cells:** These contain executable code. When you run a code cell, the output (such as numerical results, plots, or tables) is displayed directly below it.  
- **Markdown Cells:** These contain formatted text, explanations, equations, and images. Markdown cells are useful for adding descriptions, instructions, and documentation to make the notebook more readable and informative.  

By combining code and markdown cells, Jupyter Notebook provides an efficient way to write, execute, and document code in one place.

In [1]:
# this is a code cell
x = 1
print(x)

1


This is a markdown cell.

To run a cell in a Jupyter Notebook, select it and press **Shift + Enter**, or click the **Run** button in the toolbar. You can execute cells in any order; you do not have to run them from top to bottom. However, the execution order determines the notebook's state: if a cell depends on variables or functions defined in another cell, make sure that cell has been run first. You can check the execution order from the number in brackets next to each code cell (for example, `In [1]:`).
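To make the point about execution order concrete, here is a simplified, hypothetical model of a notebook kernel (not how Jupyter is actually implemented): every "cell" is run with `exec` in one shared namespace, so the final value depends on which cells were run and in which order. The cell contents and the `run` helper are illustrative only.

```python
# Simplified model of a notebook kernel: every "cell" is executed in the
# same namespace dictionary, so variables persist between cells.
cells = {
    1: "x = 1",
    2: "x = x + 1",
    3: "result = x * 10",
}


def run(order):
    """Execute the cells in the given order and return the final `result`."""
    namespace = {}
    for cell_id in order:
        exec(cells[cell_id], namespace)
    return namespace.get("result")


print(run([1, 2, 3]))     # top to bottom: 20
print(run([1, 2, 2, 3]))  # cell 2 run twice: 30
print(run([1, 3]))        # cell 2 skipped: 10

try:
    run([3])              # cell 3 needs `x`, which only cell 1 defines
except NameError as err:
    print("NameError:", err)
```

Re-running or skipping a cell silently changes the result, which is why the bracketed execution numbers are worth checking.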

## Paidiverpy pipeline

### Minimal Pipeline Execution: Runs in just **3 lines of code**

```python
from paidiverpy.pipeline import Pipeline # import the package
pipeline = Pipeline(config_file_path="config/dsp/config_introduction.yml") # instantiate the class
pipeline.run() # run the pipeline
```

If you want to export the outputs, you need to run an extra line:

```python
pipeline.save_images() # export the images to the output path
```

### The configuration file


```yaml
general:
  input_path: "/groups/paidiver/images/benthic_clarion"
  output_path: "output"
  metadata_path: "metadata/metadata_benthic_csv.csv"
  metadata_type: "CSV_FILE"
  image_type: "PNG"
  n_jobs: 1
  sampling:
    - mode: "percent"
      name: "sampling"
      params:
        value: 0.1

steps:
  - colour:
      name: "colour_alteration"
      mode: "colour_alteration"
      params:
        method: "white_balance"
  - colour:
      name: "gaussian_blur"
      mode: "gaussian_blur"
      params:
        sigma: 1.0

  - colour:
      name: "sharpen"
      mode: "sharpen"
      params:
        alpha: 1.5
        beta: -0.5

  - colour:
      name: "contrast"
      mode: "contrast"
```
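When this YAML file is loaded (for example with PyYAML's `yaml.safe_load`), `steps` becomes a list in which each item is a one-key mapping from a layer type (here always `colour`) to that step's `name`, `mode` and optional `params`. The sketch below writes that structure out as plain Python and walks through it; `describe_steps` is a hypothetical helper for illustration, not part of paidiverpy. The numbering matches the log of the run later in this notebook, where step 0 is the initial image loading and the colour steps are 1 to 4.

```python
# The configuration above, written as the equivalent Python structure
# (what a YAML loader would return for this file).
config = {
    "general": {
        "input_path": "/groups/paidiver/images/benthic_clarion",
        "output_path": "output",
        "metadata_path": "metadata/metadata_benthic_csv.csv",
        "metadata_type": "CSV_FILE",
        "image_type": "PNG",
        "n_jobs": 1,
        "sampling": [
            {"mode": "percent", "name": "sampling", "params": {"value": 0.1}},
        ],
    },
    "steps": [
        {"colour": {"name": "colour_alteration", "mode": "colour_alteration",
                    "params": {"method": "white_balance"}}},
        {"colour": {"name": "gaussian_blur", "mode": "gaussian_blur",
                    "params": {"sigma": 1.0}}},
        {"colour": {"name": "sharpen", "mode": "sharpen",
                    "params": {"alpha": 1.5, "beta": -0.5}}},
        {"colour": {"name": "contrast", "mode": "contrast"}},  # no params given
    ],
}


def describe_steps(config):
    """Return (layer type, step name, params) for each step, in order."""
    summary = []
    for step in config["steps"]:
        # Each list item is a one-key mapping: layer type -> step settings.
        layer, settings = next(iter(step.items()))
        summary.append((layer, settings["name"], settings.get("params", {})))
    return summary


for index, (layer, name, params) in enumerate(describe_steps(config), start=1):
    print(f"step {index}: {name} ({layer}) {params}")
```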

### Sample data

If you do not have your own dataset, you can use one of our sample datasets for testing.
In this case, edit the `general` section of the configuration file:

```yaml
# Instead of these lines in the configuration file:
general:
  input_path: "images"
  output_path: "output"
  metadata_path: "metadata/metadata_benthic_csv.csv"
  metadata_type: CSV_FILE

# Please use these lines:
general:
  sample_data: "benthic_ifdo"
```

In this case, paidiverpy downloads the sample dataset `benthic_ifdo` automatically.

If you prefer, you can also download a sample dataset manually. The cell below downloads `benthic_csv`:

In [1]:
from paidiverpy.utils import data

data.load("benthic_csv")

{'input_path': '/home/tobfer/.paidiverpy_cache/benthic_csv/images',
 'metadata_path': '/home/tobfer/.paidiverpy_cache/benthic_csv/metadata/metadata_benthic_csv.csv',
 'metadata_type': 'CSV_FILE',
 'image_type': 'PNG',
 'append_data_to_metadata': '/home/tobfer/.paidiverpy_cache/benthic_csv/metadata/appended_metadata_benthic_csv.csv'}


You can then use the returned paths to update the configuration file.
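One way to apply the values returned by `data.load` is to merge them into the `general` section of the configuration. The sketch below assumes the configuration has already been loaded into a Python dictionary; `apply_sample_paths` is a hypothetical helper, and the values are copied from the output above (your cache path will differ). In the notebook you would use the dictionary returned by `data.load("benthic_csv")` directly. Because `output_path` is not among the returned values, your own setting is kept.

```python
# Values returned by data.load("benthic_csv") in the cell above.
# Your home directory, and therefore these paths, will differ.
sample_info = {
    "input_path": "/home/tobfer/.paidiverpy_cache/benthic_csv/images",
    "metadata_path": "/home/tobfer/.paidiverpy_cache/benthic_csv/metadata/metadata_benthic_csv.csv",
    "metadata_type": "CSV_FILE",
    "image_type": "PNG",
    "append_data_to_metadata": "/home/tobfer/.paidiverpy_cache/benthic_csv/metadata/appended_metadata_benthic_csv.csv",
}

# The `general` section of your own configuration (example values).
general = {
    "input_path": "images",
    "output_path": "output",
    "metadata_path": "metadata/metadata_benthic_csv.csv",
    "metadata_type": "CSV_FILE",
    "n_jobs": 1,
}


def apply_sample_paths(general, sample_info):
    """Return a copy of `general` with the sample-data values filled in."""
    updated = dict(general)      # shallow copy: the original section is left untouched
    updated.update(sample_info)  # sample-data values replace any existing keys
    return updated


updated_general = apply_sample_paths(general, sample_info)
for key, value in updated_general.items():
    print(f"{key}: {value}")
```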

The file `config/config_simple_sample_data.yml` contains an example configuration that uses the sample data.

## Running the code

Now we will run the code using the sample dataset.

**The code below only sets the path to the configuration file, so that this notebook can easily be adapted to your environment. Comment out any lines you do not need. This code is not required in production.**

In [None]:
import os
# Path to the configuration file, relative to your home directory
config_file_path = os.path.expanduser("~/paidiver-workshop/tutorials/config/dsp/config_introduction_sample_data.yml")

**IMPORTANT: YOU MAY NEED TO UPDATE THE OUTPUT PATH IN THE CONFIGURATION FILE.**

### Running the full pipeline and exporting images

In [None]:
from paidiverpy.pipeline import Pipeline
pipeline = Pipeline(config_file_path=config_file_path)
pipeline.run()
pipeline.save_images()

☁ paidiverpy ☁  |       INFO | 2025-02-27 19:13:24 | Processing images using 1 cores
☁ paidiverpy ☁  |       INFO | 2025-02-27 19:13:24 | Running step 0: raw - OpenLayer

Open Images: 100%|████████████████████████████████████████████████████████████████████| 44/44 [00:00<00:00, 391.66it/s]

☁ paidiverpy ☁  |       INFO | 2025-02-27 19:13:24 | Step 0 completed
☁ paidiverpy ☁  |       INFO | 2025-02-27 19:13:24 | Running step 1: colour_alteration - ColourLayer

Processing images: 100%|█████████████████████████████████████████████████████████████| 44/44 [00:00<00:00, 1303.90it/s]

☁ paidiverpy ☁  |       INFO | 2025-02-27 19:13:25 | Step 1 completed

☁ paidiverpy ☁  |       INFO | 2025-02-27 19:13:25 | Running step 2: gaussian_blur - ColourLayer

Processing images: 100%|█████████████████████████████████████████████████████████████| 44/44 [00:00<00:00, 3498.23it/s]

☁ paidiverpy ☁  |       INFO | 2025-02-27 19:13:25 | Step 2 completed
☁ paidiverpy ☁  |       INFO | 2025-02-27 19:13:25 | Running step 3: sharpen - ColourLayer

Processing images: 100%|██████████████████████████████████████████████████████████████| 44/44 [00:00<00:00, 344.57it/s]

☁ paidiverpy ☁  |       INFO | 2025-02-27 19:13:25 | Step 3 completed

☁ paidiverpy ☁  |       INFO | 2025-02-27 19:13:25 | Running step 4: contrast - ColourLayer

Processing images: 100%|███████████████████████████████████████████████████████████████| 44/44 [00:00<00:00, 71.20it/s]

☁ paidiverpy ☁  |       INFO | 2025-02-27 19:13:26 | Step 4 completed

☁ paidiverpy ☁  |       INFO | 2025-02-27 19:13:26 | Saving images from step: last
☁ paidiverpy ☁  |       INFO | 2025-02-27 19:13:26 | Saving images
☁ paidiverpy ☁  |       INFO | 2025-02-27 19:13:26 | Images are saved to: output
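The last log line reports where the images were written. As a quick sanity check you can list that folder from Python. The sketch below uses only `pathlib` and makes no assumption about how paidiverpy names the exported files; `list_exported_files` is a hypothetical helper, and `output` is the relative `output_path` used in this run, so adjust it if you changed the configuration. As a rough check, compare the count with the number of images in the progress bars above (44 here), keeping in mind that the exact contents of the output folder may differ.

```python
from pathlib import Path


def list_exported_files(output_path="output"):
    """Return all files under `output_path` (searched recursively), sorted by path."""
    root = Path(output_path).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(f"Output directory not found: {root.resolve()}")
    return sorted(path for path in root.rglob("*") if path.is_file())


# Only run the check if the output folder exists in the current directory.
if Path("output").is_dir():
    exported = list_exported_files("output")
    print(f"{len(exported)} files in output/")
    for path in exported[:5]:  # show a few examples
        print(" ", path)
```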


## Running the same code on the command line

**In real-life applications, Jupyter Notebook is mainly used for exploration and investigation. Once you are satisfied with your pipeline, it is usually more efficient to run it from the command line: you avoid the overhead of the notebook interface, and the same command can be scripted or submitted as a batch job.**

```bash
cd ~/paidiver-workshop/tutorials

# if you are using DSP
paidiverpy -c "config/dsp/config_introduction.yml"

# if you are using JASMIN
paidiverpy -c "config/jasmin/config_introduction.yml"
```
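If you later want to launch runs from a Python script (for example, to process several configuration files one after another), the same command can be wrapped with the standard `subprocess` module. This is a minimal sketch: only the `-c` option shown above is used, `build_command` and `run_pipeline` are hypothetical helpers, and the default working directory assumes the tutorial folder layout used in this section.

```python
import subprocess
from pathlib import Path

# Configuration files used in this tutorial, relative to the tutorials folder.
CONFIGS = {
    "dsp": "config/dsp/config_introduction.yml",
    "jasmin": "config/jasmin/config_introduction.yml",
}


def build_command(platform):
    """Return the paidiverpy command for the given platform ("dsp" or "jasmin")."""
    return ["paidiverpy", "-c", CONFIGS[platform]]


def run_pipeline(platform, workdir="~/paidiver-workshop/tutorials"):
    """Run paidiverpy in `workdir`; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_command(platform), cwd=Path(workdir).expanduser(), check=True)


print(" ".join(build_command("dsp")))
# run_pipeline("dsp")  # uncomment to launch the run (requires paidiverpy on your PATH)
```

Passing the command as a list rather than a single string avoids shell quoting issues, and `check=True` makes a failed run stop the script instead of continuing silently.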