# Expose a STAC Data Downloader as a CWL Dockerized process through Examind

This tutorial shows how to expose, through CWL in Examind Community, a dockerized process that discovers and downloads satellite data from an external STAC (SpatioTemporal Asset Catalog) endpoint. The process automates data fetching from remote catalogs, so it can serve as an upstream step for further processing (WPS, openEO, or Process API).

This tutorial can be used as a template for managing external STAC endpoints in Examind.
Here, the external process is used by the openEO endpoint to download a collection or an item.

By Quentin BIALOTA (Geomatys)

*Based on the previous work of Guilhem LEGAL (Geomatys)*

Contact : quentin.bialota@geomatys.com

## Prerequisites :

- *Only if you are using a rootless environment* | Pre-configuration of Examind => [Podman Rootless Conf](./podman_rootless.md)
- **Configure Examind (*follow it in any case*) => [Examind Conf](./examind_conf_cwl.md)**
- External network access: ensure your Examind instance (and the Docker/Podman containers it spawns) has outgoing internet access to reach the STAC API and the data providers, or use a local STAC endpoint.

## Info :
- The Python code used in this process is available [here](./Code/ExternalStacProcess/stac_downloader.py)
- The STAC URL passed as argument can be a **collection** URL or an **item** URL.
  - If **collection**: the process takes the first item of the collection, then the first asset of that item
  - If **item**: the process takes the first asset
- This is a deliberately "dumb" process: it only illustrates how this kind of process can be managed with CWL and Examind.
- The goal is to reuse this template to create a more complex process per STAC endpoint (select other assets, merge them, ...)
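
The selection rule described above can be sketched as follows. This is not the code of `stac_downloader.py`, only an illustration of the rule. The function name `pick_first_asset` and its `collection_items` parameter are hypothetical, and the sketch assumes the items of a collection have already been fetched as a list of STAC Item dictionaries. The `type` and `assets` fields come from the STAC specification.

```python
from typing import Any, Dict, List, Optional


def pick_first_asset(
    doc: Dict[str, Any],
    collection_items: Optional[List[Dict[str, Any]]] = None,
) -> Optional[str]:
    """Return the href of the first asset, following the tutorial's rule.

    doc: a STAC Collection or a STAC Item, already parsed from JSON.
    collection_items: items of the collection (hypothetical parameter),
        only used when doc is a Collection.
    """
    if doc.get("type") == "Collection":
        # Collection: take the first item, if there is one.
        if not collection_items:
            return None
        doc = collection_items[0]

    # Item: take the first asset (dicts keep their insertion order).
    assets = doc.get("assets") or {}
    first = next(iter(assets.values()), None)
    return first.get("href") if first else None
```

A more complete process would replace the "first asset" choice with a selection by asset key or media type, which is the kind of extension this template is meant for.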

## Setup

Set up a Python environment to run the commands in this notebook. (You can skip this step if you prefer to send the HTTP requests with Postman, curl, or any other HTTP client.)

In [2]:
import rasterio
import matplotlib.pyplot as plt
import requests
import json
from requests.auth import HTTPBasicAuth

SERVER_IP = "http://localhost:8080"
STAC_ENDPOINT = "https://stac-pg-api.ifremer.fr/collections/catds_l3qd/items/SM_OPER_MIR_CSQ3A__20251021T000000_20251030T235959_343"
WPS_SERVICE_ID = "CWL_Tests"  # Name of the WPS service created in Examind
USERNAME = "admin"
PASSWORD = "admin"
auth = HTTPBasicAuth(USERNAME, PASSWORD)

## 1. Run & Configure Examind to use CWL processes

**Run examind-community with docker-compose**

Once the build is finished, you can run examind-community with docker-compose:
```bash
cd docker
docker-compose up -d
```

Once Examind is started, you can access the web interface at the URL: `http://localhost:8080/examind`

You can log in with the default administrator credentials: admin/admin.

Once logged in, create a WPS service:
- Go to **Web Services**
- Click on **Create a service**
- Select **Geoprocessing (WPS)**
- Fill in the form (Name, Identifier) and select both versions 1.0.0 and 2.0.0
- Click on **Save**, then **Save** again
- Start the newly created WPS service by clicking the green play button

## 2. Build Docker, and deploy STAC Downloader CWL process in Examind

**A. Build the STAC Downloader Docker image**

You can find the Dockerfile to build the STAC Downloader Docker image in this folder: [./Code/ExternalStacProcess/](./Code/ExternalStacProcess/)

To build the Docker image, run the following command in the folder containing the Dockerfile:
```bash
docker build -t images.geomatys.com/stac-downloader .
```

**B. Deploy the STAC Downloader CWL process in Examind**

To check that the WPS service works, run this HTTP request in your browser, with curl, or with Postman:
- Method : GET
- URL : http://localhost:8080/examind/WS/wps/CWL_Tests/processes (CWL_Tests here is the name of the WPS service created before)

If it works, you should see a list of processes already exposed by the service.

In [3]:
URL = SERVER_IP + "/examind/WS/wps/" + WPS_SERVICE_ID + "/processes"

r = requests.get(url = URL)

data = r.json()

print("Number of processes deployed in the WPS service '" + WPS_SERVICE_ID + "' : " + str(len(data['processes'])))

Number of processes deployed in the WPS service 'CWL_Tests' : 124


Now you can deploy the STAC Downloader CWL process in Examind by running this HTTP request:
- Method : POST
- URL : http://localhost:8080/examind/WS/wps/CWL_Tests/processes
- Body : Json format, content accessible here : [./deploy_requests_json/DeployRequest_stac_downloader.json](./deploy_requests_json/DeployRequest_stac_downloader.json)

*WARNING: Change the href to point to your own STACDownloader.cwl file (you can find it in this repo here: `./Code/ExternalStacProcess/`)*

`"href": "file:///home/qbialota/Téléchargements/STACDownloader.cwl"`
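
A minimal sketch of building this `file://` href from an absolute local path with the standard library. The helper name `cwl_href` and the path used are placeholders. The helper percent-encodes non-ASCII characters, whereas the example above uses the raw form, so whether Examind resolves the encoded form is an assumption to verify on your instance.

```python
from urllib.parse import quote


def cwl_href(absolute_path: str) -> str:
    """Build a file:// URI from an absolute POSIX path (hypothetical helper)."""
    if not absolute_path.startswith("/"):
        raise ValueError("an absolute path is required: " + absolute_path)
    # quote() keeps "/" as-is and percent-encodes spaces and non-ASCII characters.
    return "file://" + quote(absolute_path)


# Placeholder path: replace it with the location of your own CWL file.
print(cwl_href("/home/user/cwl/STACDownloader.cwl"))
```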

*TIP: Sometimes the server responds with a 500 error containing:*
```json
{
    "code": "NO_APPLICABLE_CODE",
    "description": "Cannot invoke \"org.constellation.dto.service.config.wps.ProcessContext.getProcesses()\" because \"context\" is null"
}
```
*But the process is actually deployed. To check if the process is deployed, run the GET request that lists the processes again, or request ...*

In [None]:
postData = {
    "processDescription": {
        "process": {
            "id": "STACDownloader",
            "title": "STAC Downloader",
            "owsContext": {
                "offering": {
                    "code": "http://www.opengis.net/eoc/applicationContext/cwl",
                    "content": {
                        "href": "file:///home/qbialota/Téléchargements/STACDownloader.cwl"
                    }
                }
            },
            "abstract": "Downloads the first asset from a given STAC Collection or Item URL.",
            "keywords": ["STAC", "Download", "ETL"],
            "inputs": [{
                "id": "stac_url",
                "title": "STAC Endpoint URL",
                "minOccurs": "1",
                "maxOccurs": "1",
                "input": {
                    "literalDataDomains": [
                        {
                            "dataType": {
                                "name": "string"
                            }
                        }
                    ]
                }
            }],
            "outputs": [{
                "id": "downloaded_asset",
                "title": "Downloaded Asset",
                "output": {
                    "formats": [{
                        "mimeType": "application/octet-stream",
                        "default": "true"
                    }]
                }
            }]
        },
        "processVersion": "1.0.0",
        "jobControlOptions": [
            "async-execute"
        ],
        "outputTransmission": [
            "reference"
        ]
    },
    "immediateDeployment": "true",
    "executionUnit": [{
        "href": "images.geomatys.com/stac-downloader:latest"
    }],
    "deploymentProfileName": "http://www.opengis.net/profiles/eoc/dockerizedApplication"
}

URL = SERVER_IP + "/examind/WS/wps/" + WPS_SERVICE_ID + "/processes"

r = requests.post(url = URL, data=json.dumps(postData), auth=auth)

data = r.json()

data
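
Because the deploy request may return a 500 error even though the process was deployed, it can help to confirm the deployment from the process list. The sketch below works on the JSON returned by the GET `/processes` request. The helper name `is_deployed` is hypothetical, and it assumes each entry of `processes` carries an `id` field shaped like the identifiers used in this tutorial.

```python
from typing import Any, Dict


def is_deployed(listing: Dict[str, Any], process_name: str) -> bool:
    """Return True if the process list contains the given process.

    listing: parsed JSON of the GET .../processes response.
    process_name: short name used at deployment, e.g. "STACDownloader".
    """
    for process in listing.get("processes", []):
        process_id = str(process.get("id", ""))
        # Deployed ids look like "urn:exa:wps:examind-dynamic::STACDownloader".
        if process_id == process_name or process_id.endswith("::" + process_name):
            return True
    return False
```

With the variables defined in this notebook, the listing would come from `requests.get(URL).json()`, with `URL` pointing at the `/processes` endpoint.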

**C. Test if the STAC Downloader CWL process is deployed correctly**

To check if the STAC Downloader CWL process is deployed correctly, run this HTTP request :
- Method : GET
- URL : http://localhost:8080/examind/WS/wps/CWL_Tests/processes/urn:exa:wps:examind-dynamic::STACDownloader

If it works, you should see the description of the STAC Downloader CWL process, as [here](./deploy_requests_json/ProcessResult_stac_downloader.json)

In [6]:
PROCESS_ID = "urn:exa:wps:examind-dynamic::STACDownloader"
URL = SERVER_IP + "/examind/WS/wps/" + WPS_SERVICE_ID + "/processes/" + PROCESS_ID

r = requests.get(url = URL)

data = r.json()

data

{'id': 'urn:exa:wps:examind-dynamic::STACDownloader',
 'title': 'STAC Downloader',
 'version': '1.0.0',
 'jobControlOptions': ['sync-execute', 'async-execute', 'dismiss'],
 'outputTransmission': ['reference', 'value'],
 'inputs': [{'id': 'urn:exa:wps:examind-dynamic::STACDownloader:input:stac_url',
   'title': 'STAC Endpoint URL',
   'description': 'No description available',
   'minOccurs': '1',
   'maxOccurs': '1',
   'input': {'literalDataDomains': [{'anyValue': True,
      'dataType': {'name': 'String',
       'reference': 'http://www.w3.org/TR/xmlschema-2/#string'}}]}}],
 'outputs': [{'id': 'urn:exa:wps:examind-dynamic::STACDownloader:output:downloaded_asset',
   'title': 'Downloaded Asset',
   'output': {'formats': [{'mimeType': 'application/octet-stream',
      'default': True}]},
   'abstract': 'No description available'}],
 'executeEndpoint': 'http://localhost:8080/examind/WS/wps/CWL_Tests/processes/urn:exa:wps:examind-dynamic::STACDownloader/jobs',
 'abstract': 'Downloads the

## 3. Execute the STAC Downloader CWL process in Examind

**A. Execution**

You can execute the STAC Downloader CWL process by running this HTTP request :
- Method : POST
- URL : http://localhost:8080/examind/WS/wps/CWL_Tests/processes/urn:exa:wps:examind-dynamic::STACDownloader/jobs
- Body : Json format, content accessible here : [./execute_requests_json/ExecuteRequest_stac_downloader.json](./execute_requests_json/ExecuteRequest_stac_downloader.json)
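
The execute request refers to the process, its inputs, and its outputs by their fully qualified identifiers, as returned in the process description. A small sketch of how these identifiers are composed, based only on the values visible in this tutorial. The helper names are hypothetical, and the prefix may differ for processes that were not deployed dynamically.

```python
# Prefix observed in this tutorial for dynamically deployed processes.
PROCESS_PREFIX = "urn:exa:wps:examind-dynamic::"


def process_urn(name: str) -> str:
    """Identifier of a deployed process, e.g. for the /processes/{id} URL."""
    return PROCESS_PREFIX + name


def io_urn(name: str, kind: str, io_id: str) -> str:
    """Identifier of an input or output of a deployed process."""
    if kind not in ("input", "output"):
        raise ValueError("kind must be 'input' or 'output'")
    return f"{process_urn(name)}:{kind}:{io_id}"


print(io_urn("STACDownloader", "input", "stac_url"))
print(io_urn("STACDownloader", "output", "downloaded_asset"))
```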

In [9]:
postData = {
    "inputs": [
        {
            "id": "urn:exa:wps:examind-dynamic::STACDownloader:input:stac_url",
            "input": {
                # Pass the URL as-is: f"${STAC_ENDPOINT}" would prepend a literal "$"
                "value": STAC_ENDPOINT
            }
        }
    ],
    "outputs": [
        {
            "id": "urn:exa:wps:examind-dynamic::STACDownloader:output:downloaded_asset",
            "transmissionMode": "reference"
        }
    ]
}


PROCESS_ID = "urn:exa:wps:examind-dynamic::STACDownloader"
URL = SERVER_IP + "/examind/WS/wps/" + WPS_SERVICE_ID + "/processes/" + PROCESS_ID + "/jobs"

r = requests.post(url = URL, data=json.dumps(postData), auth=auth)

data = r.json()

data

{'outputs': [{'id': 'urn:exa:wps:examind-dynamic::STACDownloader:output:downloaded_asset',
   'value': {'href': 'http://localhost:8080/examind/WS/wps/CWL_Tests/products/aa643b7a-a61c-4df1-b449-59ca99fd9c26-results/SM_OPER_MIR_CSQ3A__20251021T000000_20251030T235959_343_001_7.tgz'}}]}

The result contains a list of links to downloadable content (here, one asset downloaded from the STAC endpoint).
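
A minimal sketch of collecting these links from the execute response, assuming the response keeps the shape shown above (`outputs` entries with a `value.href`). The helper names `output_hrefs` and `filename_from_href` are hypothetical.

```python
import posixpath
from typing import Any, Dict, List
from urllib.parse import urlparse


def output_hrefs(result: Dict[str, Any]) -> List[str]:
    """Collect the download links found in an execute response."""
    hrefs = []
    for output in result.get("outputs", []):
        value = output.get("value")
        # Outputs transmitted by reference carry a dict with an "href" key.
        if isinstance(value, dict) and "href" in value:
            hrefs.append(value["href"])
    return hrefs


def filename_from_href(href: str) -> str:
    """Derive a local file name from the last path segment of the link."""
    return posixpath.basename(urlparse(href).path)
```

Each link can then be fetched with `requests.get(href, auth=auth)` and its content written under `filename_from_href(href)`. Whether authentication is required for the products URL depends on your Examind configuration.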