<img src='https://gitlab.eumetsat.int/eumetlab/oceans/ocean-training/tools/frameworks/-/raw/main/img/Standard_banner.png' align='right' width='100%'/>

<font color="#138D75">**NERO Winter School training**</font> <br>
**Copyright:** (c) 2025 EUMETSAT <br>
**License:** GPL-3.0-or-later <br>
**Authors:** Dominika Leskow-Czyżewska (EUMETSAT), based on <a href='https://github.com/wekeo/wekeo4data/blob/main/wekeo-eocanvas/03_EOCanvas_DataTailor.ipynb'>EOCanvas WEkEO training</a> by Anna-Lena Erdmann (EUMETSAT)

<div class="alert alert-block alert-success">
<h3> WEkEO EOCanvas - Processing in the Cloud for Copernicus Data</h3></div>

<div class="alert alert-block alert-warning">
    
<b>PREREQUISITES </b>
    
This notebook has the following prerequisites:
  - **<a href="https://my.wekeo.eu/user-registration" target="_blank">A WEkEO account</a>**

  

</div>
<hr>

# Using the Data Tailor EOCanvas Function

### Learning outcomes

At the end of this notebook you will know:

* how to use the <a href='https://user.eumetsat.int/resources/user-guides/data-tailor-standalone-guide' target='_blank'>EUMETSAT Data Tailor</a> Chains with the EOCanvas


### Outline

The EOCanvas is a WEkEO service for processing Copernicus data in the cloud. This notebook shows a simple example of how <a href='https://user.eumetsat.int/resources/user-guides/data-tailor-standalone-guide' target='_blank'>EUMETSAT Data Tailor</a> processing chains can be executed using the EOCanvas.

<div class="alert alert-info" role="alert">

### Contents <a id='totop'></a>

</div>
    
 1. [Setting Up](#section0)
 2. [Inputs to the EOCanvas Data Tailor Process](#section1)
 3. [Executing the Process](#section2)
 4. [Examine the Results](#section3)

<hr>

<div class="alert alert-info" role="alert">

## 1. <a id='section0'></a>Setting Up
[Back to top](#totop)
    
</div>

This example notebook shows you how to use the functions of the <a href='https://user.eumetsat.int/resources/user-guides/data-tailor-standalone-guide' target='_blank'>EUMETSAT Data Tailor</a> within the EOCanvas package. If you are already a Data Tailor user, you can simply reuse your processing chains in the EOCanvas.

Loading the necessary libraries

In [1]:
!pip install eocanvas

Collecting eocanvas
  Using cached eocanvas-0.2.4-py3-none-any.whl.metadata (1.6 kB)
Collecting cryptography<44.0.0,>=43.0.1 (from eocanvas)
  Using cached cryptography-43.0.3-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (5.4 kB)
Collecting lxml<6.0.0,>=5.2.2 (from eocanvas)
  Using cached lxml-5.3.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (3.8 kB)
Using cached eocanvas-0.2.4-py3-none-any.whl (102 kB)
Using cached cryptography-43.0.3-cp39-abi3-manylinux_2_28_x86_64.whl (4.0 MB)
Using cached lxml-5.3.0-cp311-cp311-manylinux_2_28_x86_64.whl (5.0 MB)
Installing collected packages: lxml, cryptography, eocanvas
  Attempting uninstall: cryptography
    Found existing installation: cryptography 43.0.0
    Uninstalling cryptography-43.0.0:
      Successfully uninstalled cryptography-43.0.0
Successfully installed cryptography-43.0.3 eocanvas-0.2.4 lxml-5.3.0


In [1]:
from eocanvas import API, Credentials
from eocanvas.api import Input, Config, ConfigOption
from eocanvas.datatailor.chain import Chain
from eocanvas.processes import DataTailorProcess

In [2]:
import credentials

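You must replace `<your_user_name>` and `<your_password>` with the details of your WEkEO account (if you don't have one yet, register <a href="https://www.wekeo.eu/" target="_blank">here</a>). In this notebook the two values are read from the local `credentials` module imported above, as `WEKEO_USERNAME` and `WEKEO_PASSWORD`.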

Save your credentials. They will be automatically loaded when required.

In [7]:
c = Credentials(username=credentials.WEKEO_USERNAME, password=credentials.WEKEO_PASSWORD)
c.save()

Credentials are written to file /home/jovyan/.hdarc
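
The `credentials` module imported above is a local file whose contents are not shown in this notebook. A minimal sketch of what such a `credentials.py` could look like, assuming it only needs to expose `WEKEO_USERNAME` and `WEKEO_PASSWORD`. Reading the values from environment variables of the same names is an assumption of this sketch (a way to keep secrets out of the notebook), and `credentials_are_set` is a hypothetical helper:

```python
import os

# Hypothetical credentials.py. The attribute names match those used in the
# notebook; the environment-variable lookup is an assumption of this sketch.
WEKEO_USERNAME = os.environ.get("WEKEO_USERNAME", "<your_user_name>")
WEKEO_PASSWORD = os.environ.get("WEKEO_PASSWORD", "<your_password>")


def credentials_are_set() -> bool:
    """Return True when neither value is still an unreplaced placeholder."""
    return not (WEKEO_USERNAME.startswith("<") or WEKEO_PASSWORD.startswith("<"))
```

Calling `credentials_are_set()` before `Credentials(...).save()` would catch the case where the placeholders were never replaced.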


<div class="alert alert-info" role="alert">

## 2. <a id='section1'></a>Inputs to the EOCanvas Data Tailor Process
[Back to top](#totop)
    
</div>



The Data Tailor function takes two required inputs:

- data: a URL from the WEkEO HDA and a key that acts as a placeholder
- epct chain: the Data Tailor processing workflow, defined in a `.yaml` file

### 2.1 Data 

The data is passed to the serverless function as a **URL**. The URL is the reference location of the data in the WEkEO HDA.

To get the URL, you need to make a **request for the data using the WEkEO HDA**.

In [5]:
W = 23.806
E = 23.994827
N = 38.267837
S = 38.017263

start_date = "2024-08-11"
end_date = "2024-08-13"
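
The query in the next cell passes the bounding box in the order `[W, S, E, N]`, so swapped corners are easy to miss. A small sanity check could look like the sketch below; `validate_bbox` is a hypothetical helper, not part of `hda` or `eocanvas`, and it assumes the box does not cross the antimeridian:

```python
def validate_bbox(west, south, east, north):
    """Return the box as [W, S, E, N], or raise ValueError if it is malformed.

    Assumes longitudes in [-180, 180], latitudes in [-90, 90] and a box that
    does not cross the antimeridian (west < east).
    """
    if not (-180 <= west < east <= 180):
        raise ValueError(f"Invalid longitudes: west={west}, east={east}")
    if not (-90 <= south < north <= 90):
        raise ValueError(f"Invalid latitudes: south={south}, north={north}")
    return [west, south, east, north]


# Same values as W, S, E, N in the cell above
bbox = validate_bbox(23.806, 38.017263, 23.994827, 38.267837)
print(bbox)
```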

In [8]:
from hda import Client

c = Client()

query = {
    "dataset_id": "EO:EUM:DAT:SENTINEL-3:SL_1_RBT___",
    "dtstart": start_date + "T00:00:00.000Z",
    "dtend": end_date + "T23:59:00.000Z",
    "bbox": [
        W,
        S,
        E,
        N
    ],
    "timeliness": "NT",
    "orbitdir": "DESCENDING",
}

r = c.search(query)
url_list = r.get_download_urls()

#inputs = Input(key="img1", url=url)

In [9]:
id_list = [result['id'] for result in r.results]
id_list

['S3A_SL_1_RBT____20240813T082011_20240813T082311_20240814T133155_0179_115_349_2340_MAR_O_NT_004.SEN3',
 'S3B_SL_1_RBT____20240812T080738_20240812T081038_20240813T134154_0179_096_192_2340_MAR_O_NT_004.SEN3',
 'S3A_SL_1_RBT____20240811T091234_20240811T091534_20240812T145031_0180_115_321_2340_MAR_O_NT_004.SEN3',
 'S3B_SL_1_RBT____20240811T083349_20240811T083649_20240812T144523_0179_096_178_2340_MAR_O_NT_004.SEN3']
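
The product identifiers above are not listed in chronological order. Sentinel-3 file names are fixed-width, with the platform in the first three characters and the sensing start time at characters 16 to 30, so they can be sorted by acquisition time. A minimal sketch, where `parse_s3_product_id` is a hypothetical helper name:

```python
from datetime import datetime


def parse_s3_product_id(product_id):
    """Return (platform, sensing_start) from a Sentinel-3 product identifier."""
    platform = product_id[:3]  # e.g. "S3A" or "S3B"
    sensing_start = datetime.strptime(product_id[16:31], "%Y%m%dT%H%M%S")
    return platform, sensing_start


ids = [
    "S3A_SL_1_RBT____20240813T082011_20240813T082311_20240814T133155_0179_115_349_2340_MAR_O_NT_004.SEN3",
    "S3B_SL_1_RBT____20240812T080738_20240812T081038_20240813T134154_0179_096_192_2340_MAR_O_NT_004.SEN3",
    "S3A_SL_1_RBT____20240811T091234_20240811T091534_20240812T145031_0180_115_321_2340_MAR_O_NT_004.SEN3",
    "S3B_SL_1_RBT____20240811T083349_20240811T083649_20240812T144523_0179_096_178_2340_MAR_O_NT_004.SEN3",
]

# Sort by sensing start time, earliest first
sorted_ids = sorted(ids, key=lambda pid: parse_s3_product_id(pid)[1])
for pid in sorted_ids:
    platform, start = parse_s3_product_id(pid)
    print(platform, start.isoformat())
```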

### 2.2 Processing Workflow "Chain"

The Data Tailor process works with so-called "Chains". Chains are processing workflows defined in a `.yaml` file. For more information on how chains are built and which operators they support, please refer to the official <a href='https://user.eumetsat.int/resources/user-guides/data-tailor-standalone-guide' target='_blank'>EUMETSAT Data Tailor Documentation</a>.

In this example, a chain is provided that **selects** four radiance bands (S1, S2, S3 and S6) of the Sentinel-3 SLSTR L1 RBT products found by the search above, **reprojects** them to a geographic projection and writes the result as GeoTIFF.

In [10]:
chain = Chain.from_file("input_graphs/slstr_solar_subset.yaml")

You can explore the input parameters of the Chain. Here they include the selected `bands`, the output `format` and the `projection`; parameters that the chain does not set, such as the resampling method and resolution, appear as `None`.

In [11]:
chain

Chain(id=None, product='SLL1RBT', format='geotiff', name=None, description=None, aggregation=None, projection='geographic', roi=None, filter=Filter(id=None, bands=['s1_radiance_an', 's2_radiance_an', 's3_radiance_an', 's6_radiance_an'], name=None, product=None), quicklook=None, resample_method=None, resample_resolution=None, compression=None, xrit_segments=None)
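
Most fields in the representation above are `None`, which makes the few configured ones hard to spot. The sketch below lists only the attributes that are set. It assumes the object stores its parameters as plain instance attributes, as the representation suggests; `set_fields` is a hypothetical helper and `DemoChain` is a stand-in used for illustration, not the `eocanvas` `Chain` class:

```python
from dataclasses import dataclass
from typing import Optional


def set_fields(obj):
    """Return the instance attributes of obj whose value is not None."""
    return {name: value for name, value in vars(obj).items() if value is not None}


@dataclass
class DemoChain:
    """Hypothetical stand-in with a few of the fields shown above."""
    product: Optional[str] = None
    format: Optional[str] = None
    projection: Optional[str] = None
    roi: Optional[dict] = None
    resample_resolution: Optional[float] = None


demo = DemoChain(product="SLL1RBT", format="geotiff", projection="geographic")
print(set_fields(demo))
```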

<div class="alert alert-info" role="alert">

## 3. <a id='section2'></a>Executing the Process
[Back to top](#totop)
    
</div>

In [12]:
from concurrent.futures import ThreadPoolExecutor

MAX_QUOTA = 3

In [13]:
from time import sleep

In [14]:
import os

run_name = "testrun"
output_dir = './testdir/'
# Create a download directory for our downloaded products
download_dir = os.path.join(output_dir, run_name, 'Satellite_ActiveFires', 'S3_FRP')


def exec_data_tailor_process(url):
    print(f"Starting Data Tailor for product at {url}")
    inputs = Input(key="img1", url=url)
    process = DataTailorProcess(epct_chain=chain, epct_input=inputs)
    process.run(download_dir=download_dir)
    sleep(1)  # Pause for one second before reporting completion
    return f"Finishing Data Tailor for product at {url}"

In [15]:
with ThreadPoolExecutor(max_workers=MAX_QUOTA) as executor:
    futures = [executor.submit(exec_data_tailor_process, url) for url in url_list]

    # Print the results in submission order; each call blocks until that job is done
    for future in futures:
        print(future.result())

Starting Data Tailor for product at https://gateway.prod.wekeo2.eu/hda-broker/api/v1/dataaccess/download/67adf3e8c0fbf72c11647f71
Starting Data Tailor for product at https://gateway.prod.wekeo2.eu/hda-broker/api/v1/dataaccess/download/67adf3f79639f6fb0283f1e6
Starting Data Tailor for product at https://gateway.prod.wekeo2.eu/hda-broker/api/v1/dataaccess/download/67adf41c9639f6fb0283f1ea
Job: c39e5706-23c2-5f09-9830-62f7012879dd - Status: accepted at 2025-02-13T13:35:48.278795
Job: b4a8c3b2-0f07-5fab-8305-ae36911e8831 - Status: accepted at 2025-02-13T13:35:48.526761
Job: 46eb1dd1-31c4-5a52-90d7-497ec7c12839 - Status: accepted at 2025-02-13T13:35:48.631518
Job: c39e5706-23c2-5f09-9830-62f7012879dd - Status: running at 2025-02-13T13:35:58.375166
Job: b4a8c3b2-0f07-5fab-8305-ae36911e8831 - Status: running at 2025-02-13T13:35:58.586754
Job: 46eb1dd1-31c4-5a52-90d7-497ec7c12839 - Status: running at 2025-02-13T13:35:58.678869
Job: c39e5706-23c2-5f09-9830-62f7012879dd - Status: running at 2025
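
The loop above walks through the futures in the order they were submitted. If you would rather handle each product as soon as its job finishes, and keep going when one of them fails, `concurrent.futures.as_completed` is the standard-library tool for that. A minimal sketch, in which `fake_process` is a hypothetical stand-in for `exec_data_tailor_process` and the URLs and durations are invented for the demonstration:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep

MAX_WORKERS = 3  # plays the role of MAX_QUOTA in the notebook


def fake_process(url, seconds):
    """Hypothetical stand-in for a long-running remote job."""
    sleep(seconds)
    return f"Finished {url}"


# Invented inputs: placeholder URL -> simulated duration in seconds
jobs = {"url-a": 0.3, "url-b": 0.1, "url-c": 0.2}

completed = []
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
    future_to_url = {
        executor.submit(fake_process, url, seconds): url
        for url, seconds in jobs.items()
    }
    # as_completed yields each future as soon as it finishes
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        try:
            completed.append(future.result())
        except Exception as exc:  # one failed job should not stop the others
            completed.append(f"{url} failed: {exc}")

print(completed)
```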

In [16]:
api = API()

In [17]:
api.get_jobs()

[Job(api=<eocanvas.api.API object at 0x7f09432f0850>, job_id='35237602-6d34-57d0-b3df-7af461d738dd', status='successful', started='2025-02-13 13:38:56', created='2025-02-13 13:38:56', updated='2025-02-13 13:42:09', finished='2025-02-13 13:42:08'),
 Job(api=<eocanvas.api.API object at 0x7f09432f0850>, job_id='46eb1dd1-31c4-5a52-90d7-497ec7c12839', status='successful', started='2025-02-13 13:35:48', created='2025-02-13 13:35:48', updated='2025-02-13 13:39:15', finished='2025-02-13 13:39:14'),
 Job(api=<eocanvas.api.API object at 0x7f09432f0850>, job_id='b4a8c3b2-0f07-5fab-8305-ae36911e8831', status='successful', started='2025-02-13 13:35:48', created='2025-02-13 13:35:48', updated='2025-02-13 13:38:57', finished='2025-02-13 13:38:56'),
 Job(api=<eocanvas.api.API object at 0x7f09432f0850>, job_id='c39e5706-23c2-5f09-9830-62f7012879dd', status='successful', started='2025-02-13 13:35:48', created='2025-02-13 13:35:48', updated='2025-02-13 13:38:53', finished='2025-02-13 13:38:52'),
 Job(api

In [18]:
j = api.get_jobs()[1]
j.job_id

'46eb1dd1-31c4-5a52-90d7-497ec7c12839'
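
Picking a job by its position in the list depends on the order in which the jobs are returned. A slightly more robust selection filters on the `status` and `started` attributes visible in the output above. The sketch below uses `SimpleNamespace` stand-ins rather than real `Job` objects; `latest_successful` is a hypothetical helper, and the failed entry is invented to show the filtering:

```python
from types import SimpleNamespace


def latest_successful(jobs, n):
    """Return the n most recently started jobs whose status is 'successful'."""
    ok = [job for job in jobs if job.status == "successful"]
    return sorted(ok, key=lambda job: job.started, reverse=True)[:n]


# Stand-ins carrying the job_id, status and started values printed above
jobs = [
    SimpleNamespace(job_id="35237602-6d34-57d0-b3df-7af461d738dd",
                    status="successful", started="2025-02-13 13:38:56"),
    SimpleNamespace(job_id="46eb1dd1-31c4-5a52-90d7-497ec7c12839",
                    status="successful", started="2025-02-13 13:35:48"),
    SimpleNamespace(job_id="b4a8c3b2-0f07-5fab-8305-ae36911e8831",
                    status="successful", started="2025-02-13 13:35:48"),
    SimpleNamespace(job_id="c39e5706-23c2-5f09-9830-62f7012879dd",
                    status="successful", started="2025-02-13 13:35:48"),
    # Hypothetical entry, not taken from the output above
    SimpleNamespace(job_id="hypothetical-failed-job",
                    status="failed", started="2025-02-13 13:40:00"),
]

for job in latest_successful(jobs, 4):
    print(job.job_id, job.started)
```

The `started` values sort correctly as strings because they use the `YYYY-MM-DD HH:MM:SS` layout.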

In [19]:
#job = '2cc08632-925b-58e7-9833-0d63ae1c5851'
job = 'aea8a5b4-90a1-511e-8e0f-efaacaf1bb9e'
log = api.get_job_logs(job=job)

In [20]:
from pathlib import Path

In [21]:
# Search the log messages for the line that reports the output product
filename = None  # stays None if no such line is found
for entry in log:
    if 'output-product' in entry.message:  # case-sensitive substring match
        filename = Path(entry.message.split(' ')[-1]).name

filename

'SLL1RBT_20240812T080744Z_20240812T081044Z_epct_66869edb_FP.tif'

Create the Data Tailor process using the two inputs

<div class="alert alert-info" role="alert">

## 4. <a id='section3'></a>Examine the Results
[Back to top](#totop)
    
</div>

In [22]:
def search_output_filename_in_eocanvas_log(log):
    filename = None
    for entry in log:
        if 'output-product' in entry.message:  # case-sensitive substring match
            filename = Path(entry.message.split(' ')[-1]).name
            break
    return filename


def get_eocanvas_log(api, job_index):
    job_id = api.get_jobs()[job_index].job_id
    log = api.get_job_logs(job=job_id)
    return log

In [23]:
result_paths = []
api = API()
for job_index in range(len(url_list)):
    log = get_eocanvas_log(api, job_index)
    file = search_output_filename_in_eocanvas_log(log)
    if file is not None:
        result_paths.append(Path(download_dir) / file)

result_paths

[PosixPath('result/S3_SLSTR_SOLAR/SLL1RBT_20240811T083355Z_20240811T083656Z_epct_bffe6b63_FP.tif'),
 PosixPath('result/S3_SLSTR_SOLAR/SLL1RBT_20240812T080744Z_20240812T081044Z_epct_07edafe9_FP.tif'),
 PosixPath('result/S3_SLSTR_SOLAR/SLL1RBT_20240813T082018Z_20240813T082318Z_epct_8f5f7ea7_FP.tif'),
 PosixPath('result/S3_SLSTR_SOLAR/SLL1RBT_20240811T091240Z_20240811T091540Z_epct_f5fc304d_FP.tif')]

In [30]:
result_paths[0].parent.absolute()

PosixPath('/home/jovyan/nero-winter-school-2025/result/S3_SLSTR_SOLAR')
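
The paths in `result_paths` are assembled from the download directory and the file names found in the job logs, so nothing guarantees that each file was actually downloaded. Before opening them it can help to separate the files that exist from those that do not. A minimal sketch using only `pathlib`; `split_existing` is a hypothetical helper and the demonstration files are created in a temporary directory:

```python
import tempfile
from pathlib import Path


def split_existing(paths):
    """Split paths into (existing_files, missing_files)."""
    existing = [p for p in paths if Path(p).is_file()]
    missing = [p for p in paths if not Path(p).is_file()]
    return existing, missing


# Demonstration with one file that exists and one that does not
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "present.tif"
    present.touch()
    absent = Path(tmp) / "absent.tif"
    existing, missing = split_existing([present, absent])

print([p.name for p in existing], [p.name for p in missing])
```

In the notebook, the same call on `result_paths` would give the list of files that are safe to pass to the cropping loop.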

In [24]:
import rasterio
from rasterio.mask import mask
from shapely.geometry import box
import os

In [26]:
# Define bounding box (xmin, ymin, xmax, ymax)
bounding_box = [W, S, E, N]

for input_tiff in result_paths:
    # Input GeoTIFF path
    #input_tiff = result_paths[1]
    temp_tiff = Path(download_dir) / "temp_subset.tif"

    # Create a geometry for the bounding box
    bbox_geom = [box(*bounding_box)]

    with rasterio.open(input_tiff) as src:
        # Crop the raster using the bounding box
        out_image, out_transform = mask(src, bbox_geom, crop=True)

        # Update metadata
        out_meta = src.meta.copy()
        out_meta.update({
            "driver": "GTiff",
            "height": out_image.shape[1],
            "width": out_image.shape[2],
            "transform": out_transform
        })

        # Write the subset to a temporary file
        with rasterio.open(temp_tiff, "w", **out_meta) as dest:
            dest.write(out_image)
            # Preserve band descriptions
            dest.descriptions = src.descriptions

    # Replace the original file
    os.replace(temp_tiff, input_tiff)

    print(f"Source GeoTIFF overwritten: {input_tiff}")

Source GeoTIFF overwritten: result/S3_SLSTR_SOLAR/SLL1RBT_20240811T083355Z_20240811T083656Z_epct_bffe6b63_FP.tif
Source GeoTIFF overwritten: result/S3_SLSTR_SOLAR/SLL1RBT_20240812T080744Z_20240812T081044Z_epct_07edafe9_FP.tif
Source GeoTIFF overwritten: result/S3_SLSTR_SOLAR/SLL1RBT_20240813T082018Z_20240813T082318Z_epct_8f5f7ea7_FP.tif
Source GeoTIFF overwritten: result/S3_SLSTR_SOLAR/SLL1RBT_20240811T091240Z_20240811T091540Z_epct_f5fc304d_FP.tif
