## Run a pipeline with data on a remote S3 bucket

This example runs the entire pipeline described in a configuration file, with the input data stored in a remote S3 bucket.

## Import dependencies

In [1]:
%load_ext autoreload
%autoreload 2
from paidiverpy.pipeline import Pipeline

## Working with a public object store (HTTPS data)

When working with a public object store, you only need to set the input paths in the configuration file to the URLs of the data, as in the example below:

```yaml
general:
  input_path: "https://paidiver-o.s3-ext.jc.rl.ac.uk/paidiverpy/data/lazy_load_benthic/"
  output_path: "output"
  metadata_path: "https://paidiver-o.s3-ext.jc.rl.ac.uk/paidiverpy/data/lazy_load_benthic/metadata_ifdo_hf.json"
```

In this case, `input_path` is the location on the object store where all the images are stored. For `metadata_path`, you must give the exact URL of the metadata file.
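As a minimal sketch, the `general` section above could also be generated from Python, which makes the difference between the two entries explicit: `input_path` is a directory URL, while `metadata_path` is the full URL of one file. The helper name `general_section` is hypothetical and not part of paidiverpy, and the sketch assumes the metadata file sits in the same directory as the images, as it does in this example. A real configuration file also needs the pipeline steps, which are not shown here.

```python
def general_section(input_path: str, metadata_filename: str, output_path: str = "output") -> str:
    """Render the `general` block of a configuration file as YAML text.

    Hypothetical helper for illustration only; not part of paidiverpy.
    """
    # Normalise the directory URL so it ends with exactly one slash.
    base = input_path.rstrip("/") + "/"
    # The metadata entry must be the exact URL of the file, not its directory.
    # Assumption: the metadata file lives next to the images.
    metadata_path = base + metadata_filename
    return (
        "general:\n"
        f'  input_path: "{base}"\n'
        f'  output_path: "{output_path}"\n'
        f'  metadata_path: "{metadata_path}"\n'
    )


if __name__ == "__main__":
    print(
        general_section(
            "https://paidiver-o.s3-ext.jc.rl.ac.uk/paidiverpy/data/lazy_load_benthic",
            "metadata_ifdo_hf.json",
        )
    )
```

Plain string formatting is used here only to keep the sketch free of third-party dependencies; a YAML library would be the more robust choice for anything beyond these three keys.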

In [2]:
pipeline = Pipeline(config_file_path="../config_files/config_object_store.yml")

Checking files have unique names 100/100


In [3]:
# See the pipeline steps. Click on a step to see more information about it
pipeline

In [4]:
# Run the pipeline
pipeline.run()

☁ paidiverpy ☁  |       INFO | 2025-02-27 11:24:57 | Processing images using 8 cores
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:24:57 | Running step 0: raw - OpenLayer
[########################################] | 100% Completed | 1.42 s
[########################################] | 100% Completed | 309.09 ms
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:24:59 | Step 0 completed
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:24:59 | Running step 1: colour_correction - ColourLayer
[########################################] | 100% Completed | 102.44 ms
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:24:59 | Step 1 completed
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:24:59 | Running step 2: datetime - ResampleLayer
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:24:59 | Number of photos to be removed: 0
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:24:59 | Step 2 completed
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:2

In [5]:
# See the output images
# pipeline.images

## Working with a private object store (with credentials)

To get files from a private object store, you need to pass the object store credentials as environment variables:

```
OS_SECRET=
OS_TOKEN=
OS_ENDPOINT=
```
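A small check before building the pipeline can catch a missing variable early. The sketch below is only one way to do that: the function name `missing_credentials` is hypothetical and not part of paidiverpy, and only the three variable names come from the list above.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("OS_SECRET", "OS_TOKEN", "OS_ENDPOINT")


def missing_credentials(env=None):
    """Return the required variables that are unset or empty.

    Hypothetical helper for illustration only; not part of paidiverpy.
    """
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("All object store credentials are set")
```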

In this case, the paths normally start with `s3://`, as in the example below:

```yaml
general:
  input_path: "s3://paidiverpy/data/lazy_load_benthic/"
  output_path: "s3://paidiverpy/data/lazy_load_benthic/output2/"
  metadata_path: "s3://paidiverpy/data/lazy_load_benthic/metadata_ifdo_hf.json"
```
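To make the structure of these paths concrete: an `s3://` URL names a bucket, followed by an object key or key prefix. The sketch below splits such a URL with the Python standard library. `split_s3_url` is a hypothetical helper for illustration, not a paidiverpy function, and it says nothing about how paidiverpy itself resolves the paths.

```python
from urllib.parse import urlsplit


def split_s3_url(url: str) -> tuple[str, str]:
    """Split an s3:// URL into (bucket, key or prefix).

    Hypothetical helper for illustration only; not part of paidiverpy.
    """
    parts = urlsplit(url)
    # Reject anything that is not of the form s3://<bucket>/...
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    # The bucket is the network location; the key is the path without its leading slash.
    return parts.netloc, parts.path.lstrip("/")


if __name__ == "__main__":
    bucket, prefix = split_s3_url("s3://paidiverpy/data/lazy_load_benthic/")
    print(bucket)  # paidiverpy
    print(prefix)  # data/lazy_load_benthic/
```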


In [6]:
pipeline = Pipeline(config_file_path="../config_files/config_object_store2.yml")

Checking files have unique names 100/100


In [7]:
pipeline.run()

☁ paidiverpy ☁  |       INFO | 2025-02-27 11:25:02 | Processing images using 8 cores
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:25:02 | Running step 0: raw - OpenLayer
[########################################] | 100% Completed | 1.82 s
[########################################] | 100% Completed | 313.93 ms
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:25:05 | Step 0 completed
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:25:05 | Running step 1: colour_correction - ColourLayer
[########################################] | 100% Completed | 101.92 ms
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:25:05 | Step 1 completed
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:25:06 | Running step 2: datetime - ResampleLayer
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:25:06 | Number of photos to be removed: 0
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:25:06 | Step 2 completed
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:2

In [8]:
pipeline.images

## Export images to file

In this case, the images are saved to the object store, at the `output_path` given in the configuration file.

In [9]:
pipeline.save_images(image_format="png")

☁ paidiverpy ☁  |       INFO | 2025-02-27 11:25:10 | Saving images from step: last
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:25:10 | Uploading images to S3 using Dask
[########################################] | 100% Completed | 11.13 s
☁ paidiverpy ☁  |       INFO | 2025-02-27 11:25:21 | Images are saved to: s3://paidiverpy/data/lazy_load_benthic/output2/
