# Zarr Cloud native format workflow

Invoke a Common Workflow Language (CWL) workflow that produces a STAC catalog describing a Zarr store.

This notebook is linked to: https://eoap.github.io/zarr-cloud-native-format/cwl-workflow/zarr



## Setup

In [1]:
export WORKSPACE=/workspace/zarr-cloud-native-format
export RUNTIME=${WORKSPACE}/runs
mkdir -p ${RUNTIME}
cd ${RUNTIME}

# Download release 0.3.0 of the workflow; -f makes curl fail on HTTP errors instead of saving an error page
curl -q -fsSL https://github.com/eoap/zarr-cloud-native-format/releases/download/0.3.0/app-water-bodies.0.3.0.cwl -o ${WORKSPACE}/cwl-workflow/app-water-bodies.cwl
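
Before moving on, it can help to confirm that the download produced a CWL document rather than an empty file or an error page. The sketch below is one way to do that, not part of the original notebook: `looks_like_cwl` is a hypothetical helper, and it only checks that the file is non-empty and mentions `cwlVersion`, the field CWL documents declare at the top level.

```shell
# looks_like_cwl is a hypothetical helper (not part of cwltool): it succeeds
# when the file is non-empty and contains a cwlVersion declaration.
looks_like_cwl() {
    [ -s "$1" ] && grep -q 'cwlVersion' "$1"
}

# Demo on a throwaway file standing in for the downloaded workflow.
sample=$(mktemp)
printf 'cwlVersion: v1.2\n$graph: []\n' > "$sample"
if looks_like_cwl "$sample"; then
    echo "download looks like a CWL document"
else
    echo "download is empty or not CWL" >&2
fi
rm -f "$sample"
```

In the notebook, the same check would be called as `looks_like_cwl ${WORKSPACE}/cwl-workflow/app-water-bodies.cwl`.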

## Run the Zarr Cloud native format workflow

Inspect the workflow definition with `yq`, then run it with `cwltool`:


In [2]:
cat ${WORKSPACE}/cwl-workflow/app-water-bodies.cwl | yq e '.["$graph"][0]' -

class: Workflow
id: water-bodies
label: Water bodies detection based on NDWI and otsu threshold
doc: Water bodies detection based on NDWI and otsu threshold applied to Sentinel-2 COG STAC items
requirements:
  - class: ScatterFeatureRequirement
  - class: SubworkflowFeatureRequirement
  - class: SchemaDefRequirement
    types:
      - $import: https://raw.githubusercontent.com/eoap/schemas/main/string_format.yaml
      - $import: https://raw.githubusercontent.com/eoap/schemas/main/geojson.yaml
      - $import: |-
          https://raw.githubusercontent.com/eoap/schemas/main/experimental/api-endpoint.yaml
      - $import: https://raw.githubusercontent.com/eoap/schemas/main/experimental/disc

Before running the CWL description, build the job parameters file:

In [8]:
cat <<'EOF' > zarr-cloud-native-params.yaml
bands:
- green
- nir
search_request:
  bbox:
  - -121.399
  - 39.834
  - -120.74
  - 40.472
  collections:
  - sentinel-2-l2a
  datetime_interval:
    end:
      value: '2021-08-01T23:59:59'
    start:
      value: '2021-06-01T00:00:00'
  limit: 20
  max-items: 10
stac_api_endpoint:
  headers: []
  url:
    value: https://earth-search.aws.element84.com/v1/
EOF

cat zarr-cloud-native-params.yaml | yq .

bands:
  - green
  - nir
search_request:
  bbox:
    - -121.399
    - 39.834
    - -120.74
    - 40.472
  collections:
    - sentinel-2-l2a
  datetime_interval:
    end:
      value: '2021-08-01T23:59:59'
    start:
      value: '2021-06-01T00:00:00'
  limit: 20
  max-items: 10
stac_api_endpoint:
  headers: []
  url:
    value: https://earth-search.aws.element84.com/v1/
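
The `bbox` values follow the STAC API convention of west, south, east, north, and a swapped pair is an easy mistake to make when editing the file by hand. The sketch below checks that ordering for the values used here. `check_bbox` is a hypothetical helper, and it deliberately ignores boxes that cross the antimeridian, where west may legitimately exceed east.

```shell
# check_bbox is a hypothetical helper: it takes west south east north and
# succeeds only if the values are in range and describe a non-empty box.
# Assumption: boxes crossing the antimeridian are out of scope.
check_bbox() {
    awk -v w="$1" -v s="$2" -v e="$3" -v n="$4" 'BEGIN {
        ok = (w >= -180 && e <= 180 && s >= -90 && n <= 90 && w < e && s < n)
        exit (ok ? 0 : 1)
    }'
}

# The bounding box from zarr-cloud-native-params.yaml.
if check_bbox -121.399 39.834 -120.74 40.472; then
    echo "bbox ok"
else
    echo "bbox is out of range or mis-ordered" >&2
fi
```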


In [9]:
cwltool \
    --podman \
    --outdir ${WORKSPACE}/runs \
    ${WORKSPACE}/cwl-workflow/app-water-bodies.cwl#water-bodies \
    zarr-cloud-native-params.yaml > zarr-cloud-native-results.json 2> zarr-cloud-native.log
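
Because the invocation above redirects both stdout and stderr to files, a failed run prints nothing in the notebook. A minimal sketch of a wrapper that keeps the same redirections but reports a non-zero exit status is shown below. `run_logged` is a hypothetical helper, not a `cwltool` feature, and the demo uses a stand-in command so that it runs without a container engine.

```shell
# run_logged is a hypothetical helper, not part of cwltool.
# Usage: run_logged STDOUT_FILE STDERR_FILE COMMAND [ARGS...]
run_logged() {
    out_file=$1
    log_file=$2
    shift 2
    "$@" > "$out_file" 2> "$log_file"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "command exited with status ${status}; see ${log_file}" >&2
    fi
    return "$status"
}

# Demo with a stand-in command that writes to both streams.
demo_dir=$(mktemp -d)
run_logged "${demo_dir}/results.json" "${demo_dir}/run.log" \
    sh -c 'echo "{}"; echo "INFO done" >&2'
echo "exit status: $?"
rm -rf "${demo_dir}"
```

For the workflow itself, the call would take the form `run_logged zarr-cloud-native-results.json zarr-cloud-native.log cwltool --podman ...`, with the same arguments as in the cell above.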

Inspect the stderr log, filtering out the WARNING and JSHINT lines:

In [10]:
cat zarr-cloud-native.log | grep -Ev "WARNING|JSHINT"

INFO /home/fbrito/.local/bin/cwltool 3.1.20250110105449
INFO Resolved '/workspace/zarr-cloud-native-format/cwl-workflow/app-water-bodies.cwl#water-bodies' to 'file:///workspace/zarr-cloud-native-format/cwl-workflow/app-water-bodies.cwl#water-bodies'
INFO [workflow ] start
INFO [workflow ] starting step discovery
INFO [step discovery] start
INFO [job discovery] /tmp/l7xn_s05$ podman \
    run \
    -i \
    --userns=keep-id \
    --mount=type=bind,source=/tmp/l7xn_s05,target=/INoGBM \
    --mount=type=bind,source=/tmp/3at427zt,target=/tmp \
    --workdir=/INoGBM \
    --read-only=true \
    --user=1000:1000 \
    --rm \
    --cidfile=/tmp/cgnf0slh/20250915145624-309138.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/INoGBM \
    ghcr.io/eoap/schemas/stac-api-client@sha256:a7e346f704836d07f5dabc6b29ee3359e7253f4a294d74f3899973b8920da6f7 \
    stac-client \
    search \
    https://earth-search.aws.element84.com/v1/ \
    --c
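
The log is long, so a quick way to see how far the workflow got is to list the steps that were started. The sketch below does this by matching the `[step NAME] start` lines seen in the log. `list_started_steps` is a hypothetical helper, and it assumes the log format stays as it appears in this run.

```shell
# list_started_steps is a hypothetical helper: it reads a cwltool log on
# stdin and prints the NAME of every line ending in "[step NAME] start".
list_started_steps() {
    sed -n 's/.*\[step \([^]]*\)] start$/\1/p'
}

# Demo on three lines taken from the log above.
printf '%s\n' \
    'INFO [workflow ] start' \
    'INFO [workflow ] starting step discovery' \
    'INFO [step discovery] start' |
    list_started_steps
```

Against the real log, the call would be `list_started_steps < zarr-cloud-native.log`.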

Inspect the stdout, which was saved to `zarr-cloud-native-results.json`:

In [17]:
cat zarr-cloud-native-results.json | jq  '.zarr_stac_catalog.path' -

"/workspace/zarr-cloud-native-format/runs/gaaevbi7"


In [18]:
tree $( cat zarr-cloud-native-results.json | jq -r '.zarr_stac_catalog.path' - )

/workspace/zarr-cloud-native-format/runs/gaaevbi7
├── catalog.json
└── water-bodies
    ├── collection.json
    └── result.zarr
        ├── data
        │   ├── 0.0.0
        │   ├── 0.0.1
        │   ├── 0.0.10
        │   ├── 0.0.2
        │   ├── 0.0.3
        │   ├── 0.0.4
        │   ├── 0.0.5
        │   ├── 0.0.6
        │   ├── 0.0.7
        │   ├── 0.0.8
        │   ├── 0.0.9
        │   ├── 0.1.0
        │   ├── 0.10.0
        │   ├── 0.10.1
        │   ├── 0.10.10
        │   ├── 0.10.2
        │   ├── 0.10.3
        │   ├── 0.10.4
        │   ├── 0.10.5
        │   ├── 0.10.6
        │   ├── 0.10.7
        │   ├── 0.10.8
        │   ├── 0.10.9
        │   ├── 0.1.1
        │   ├── 0.1.10
        │   ├── 0.11.0
        │   ├── 0.11.1
        │   ├── 0.11.10
        │   ├── 0.11.2
        │   ├── 0.11.3
        │   ├── 0.11.4
        │   ├── 0.11.5
        │   ├── 0.11.6
        │   ├── 0.11.7
        │   ├── 0.11.8
        │   ├── 0.11.9
        │   ├── 0.1.2
        │   ├── 0.12.0
        │   ├── 0.12.1
        │   ├── 0.12.10
        │   ├── 0.12.2
        │   ├── 0.12.3
        │   ├── 0.12.4
        │   ├── 0.12.5
        │   ├── 0.12.6
        │   ├── 0.12.7
        │   ├── 0.12.8
        │   ├── 0.12.9
        │   ├── 0.1.3
        │   ├── 0.13.0
        │   ├── 0.13.1
        │   ├── 0.13.10
        │   ├── 0.13.2
        │   ├── 0.13.3
        │   ├── 0.13.4
        │   ├── 0.13.5
        │   ├── 0.13.6
        │   ├── 0.13.7
        │   ├── 0.13.8
        │   ├── 0.13.9
        │   ├── 0.1.4
        │   ├── 0.14.0
        │   ├── 0.14.1
        │   ├── 0.14.10
        │  
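
The entries under `data` are Zarr chunk files. Assuming the store uses the Zarr v2 default key layout, each name is the chunk's index along every array dimension, joined by dots. The listing above is in alphabetical order, so `0.0.10` appears before `0.0.2`. A minimal sketch for viewing the keys in numeric order follows. `sort_chunk_keys` is a hypothetical helper and assumes three-part keys like the ones shown.

```shell
# sort_chunk_keys is a hypothetical helper: it reads dot-separated chunk
# keys on stdin and sorts them numerically, field by field.
# Assumption: keys have three index fields, as in the listing above.
sort_chunk_keys() {
    sort -t. -k1,1n -k2,2n -k3,3n
}

# Demo on a few keys taken from the listing above.
printf '%s\n' 0.0.0 0.0.1 0.0.10 0.0.2 0.1.0 0.10.0 | sort_chunk_keys
```

On the real store, the keys could be fed in with `ls` on the `result.zarr/data` directory.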

In [19]:
stac describe $( cat zarr-cloud-native-results.json | jq -r '.zarr_stac_catalog.path' - )/catalog.json

* <Catalog id=water-bodies>
    * <Collection id=water-bodies>


In [23]:
jq . $( cat zarr-cloud-native-results.json | jq -r '.zarr_stac_catalog.path' - )/water-bodies/collection.json

{
  "type": "Collection",
  "id": "water-bodies",
  "stac_version": "1.1.0",
  "description": "Collection of detected water bodies",
  "links": [
    {
      "rel": "root",
      "href": "../catalog.json",
      "type": "application/json",
      "title": "Water bodies catalog"
    },
    {
      "rel": "parent",
      "href": "../catalog.json",
      "type": "application/json",
      "title": "Water bodies c