# Zarr Cloud native format workflow

Invoke a Common Workflow Language (CWL) workflow to generate a STAC catalog backed by a Zarr store.

This notebook is linked to: https://eoap.github.io/zarr-cloud-native-format/cwl-workflow/zarr



## Setup

In [17]:
export WORKSPACE=/workspace/zarr-cloud-native-format
export RUNTIME=${WORKSPACE}/runs
mkdir -p ${RUNTIME}
cd ${RUNTIME}
oras pull ghcr.io/eoap/zarr-cloud-native-format/app-water-bodies:latest --output ${WORKSPACE}

ls -l ${WORKSPACE}/app-water-bodies.cwl

✓ Pulled      app-water-bodies.cwl                   12.2/12.2 kB 100.00%    2ms
  └─ sha256:bee4cb87f4a5549003c10f4be0be4991f46f687cb0653ed562a089ee5f62014d
✓ Pulled      application/vnd.oci.image.manifest.v1+j. 559/559  B 100.00%     0s
  └─ sha256:6f257baaef69e6e5221425c91addcc5d2fb459b9e5b7f3f0075556de3c2a5605
Pulled [registry] ghcr.io/eoap/zarr-cloud-native-format/app-water-bodies:latest
Digest: sha256:6f257baaef69e6e5221425c91addcc5d2fb459b9e5b7f3f0075556de3c2a5605
-rw-rw-r-- 1 fbrito fbrito 12541 Jan 20 10:33 /workspace/zarr-cloud-native-format/app-water-bodies.cwl
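The cells in this notebook call several command-line tools (`oras`, `yq`, `jq`, `cwltool`, `podman`, `tree`, `stac`). As a minimal sketch, a pre-flight check along the following lines could report any that are missing before the workflow is started. The helper name `require_tools` is hypothetical and not part of the original notebook.

```shell
# Hypothetical helper: report every required tool that is not on PATH.
# Returns 0 when all tools are found, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Tools invoked by the cells in this notebook.
require_tools oras yq jq cwltool podman tree stac \
  || echo "install the missing tools before continuing" >&2
```

The check only confirms that each command resolves on `PATH`; it does not verify versions.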


## Run the Zarr Cloud native format workflow

Inspect the workflow definition, then run it with `cwltool`. The command below prints the first entry of the CWL `$graph`:


In [18]:
cat ${WORKSPACE}/cwl-workflow/app-water-bodies.cwl | yq e '.["$graph"][0]' -

class: Workflow
id: water-bodies
label: Water bodies detection based on NDWI and otsu threshold
doc: Water bodies detection based on NDWI and otsu threshold applied to Sentinel-2 COG STAC items
requirements:
  - class: ScatterFeatureRequirement
  - class: SubworkflowFeatureRequirement
  - class: SchemaDefRequirement
    types:
      - $import: https://raw.githubusercontent.com/eoap/schemas/main/string_format.yaml
      - $import: https://raw.githubusercontent.com/eoap/schemas/main/geojson.yaml
      - $import: |-
          https://raw.githubusercontent.com/eoap/schemas/main/experimental/api-endpoint.yaml
      - $import: https://raw.githubusercontent.com/eoap/schemas/main/experimental/disc

Before running the CWL description, build the job parameters file:

In [19]:
cat <<'EOF' > zarr-cloud-native-params.yaml
bands:
- green
- nir
search_request:
  bbox:
  - -121.399
  - 39.834
  - -120.74
  - 40.472
  collections:
  - sentinel-2-l2a
  datetime_interval:
    start:
      value: '2021-06-01T00:00:00'
    end:
      value: '2021-07-15T23:59:59'
  limit: 20
  max-items: 2
stac_api_endpoint:
  headers: []
  url:
    value: https://earth-search.aws.element84.com/v1/
EOF

cat zarr-cloud-native-params.yaml | yq .

bands:
  - green
  - nir
search_request:
  bbox:
    - -121.399
    - 39.834
    - -120.74
    - 40.472
  collections:
    - sentinel-2-l2a
  datetime_interval:
    start:
      value: '2021-06-01T00:00:00'
    end:
      value: '2021-07-15T23:59:59'
  limit: 20
  max-items: 2
stac_api_endpoint:
  headers: []
  url:
    value: https://earth-search.aws.element84.com/v1/
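The `bbox` values above follow the usual STAC order of west, south, east, north. As a hedged sketch, a small sanity check like the one below could catch swapped coordinates before the search request is sent. The function name `check_bbox` is hypothetical, and the check assumes the box does not cross the antimeridian (where west may legitimately exceed east).

```shell
# Hypothetical helper: succeed only when the bounding box is ordered
# west < east and south < north and lies within valid lon/lat ranges.
# Assumes the box does not cross the antimeridian.
check_bbox() {
  awk -v w="$1" -v s="$2" -v e="$3" -v n="$4" 'BEGIN {
    # Force numeric interpretation of the arguments.
    w += 0; s += 0; e += 0; n += 0
    ok = (w < e) && (s < n) && (w >= -180) && (e <= 180) && (s >= -90) && (n <= 90)
    exit (ok ? 0 : 1)
  }'
}

# Values taken from the parameters file above.
check_bbox -121.399 39.834 -120.74 40.472 && echo "bbox looks valid"
```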


In [20]:
cwltool \
    --podman \
    --parallel \
    --outdir ${WORKSPACE}/runs \
    ${WORKSPACE}/cwl-workflow/app-water-bodies.cwl#water-bodies \
    zarr-cloud-native-params.yaml > zarr-cloud-native-results.json 2> zarr-cloud-native.log
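Because stdout and stderr are both redirected to files, a failed run produces no visible message in the notebook. One possible way to surface the exit status is a small wrapper such as the following sketch. The name `run_logged` is hypothetical, and the commented usage simply repeats the command from the cell above.

```shell
# Hypothetical helper: run a command with stdout and stderr redirected to
# separate files and report the exit status, so that a failed run is not
# mistaken for an empty result.
run_logged() {
  out_file="$1"
  log_file="$2"
  shift 2
  if "$@" > "$out_file" 2> "$log_file"; then
    echo "ok: results in $out_file"
  else
    status=$?
    echo "failed with exit code $status: see $log_file" >&2
    return "$status"
  fi
}

# Usage with the command from the cell above:
# run_logged zarr-cloud-native-results.json zarr-cloud-native.log \
#   cwltool --podman --parallel --outdir "${WORKSPACE}/runs" \
#   "${WORKSPACE}/cwl-workflow/app-water-bodies.cwl#water-bodies" \
#   zarr-cloud-native-params.yaml
```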

Inspect the stderr log, filtering out the WARNING and JSHINT lines:

In [21]:
grep -Ev "WARNING|JSHINT" zarr-cloud-native.log

INFO /home/fbrito/.local/bin/cwltool 3.1.20250110105449
INFO Resolved '/workspace/zarr-cloud-native-format/cwl-workflow/app-water-bodies.cwl#water-bodies' to 'file:///workspace/zarr-cloud-native-format/cwl-workflow/app-water-bodies.cwl#water-bodies'
INFO [workflow ] starting step discovery
INFO [workflow ] start
INFO [step discovery] start
INFO [job discovery] /tmp/q23b3xiv$ podman \
    run \
    -i \
    --userns=keep-id \
    --mount=type=bind,source=/tmp/q23b3xiv,target=/kurFwI \
    --mount=type=bind,source=/tmp/_fwp5hnv,target=/tmp \
    --workdir=/kurFwI \
    --read-only=true \
    --user=1000:1000 \
    --rm \
    --cidfile=/tmp/0mfyf84v/20260120103357-475374.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/kurFwI \
    ghcr.io/eoap/schemas/stac-api-client@sha256:a7e346f704836d07f5dabc6b29ee3359e7253f4a294d74f3899973b8920da6f7 \
    stac-client \
    search \
    https://earth-search.aws.element84.com/v1/ \
    --c
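The log written by `cwltool` can contain ANSI color sequences, which clutter the text when the file is viewed outside a terminal. A minimal sketch for stripping them is shown below. The function name `strip_ansi` is hypothetical, and the pattern only targets color/style sequences of the form `ESC [ ... m`, not other terminal control codes.

```shell
# Hypothetical helper: remove ANSI color sequences (ESC [ ... m) from stdin.
strip_ansi() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*m//g"
}

# Self-contained demonstration on a colored log line.
printf '\033[1;30mINFO\033[0m [workflow ] start\n' | strip_ansi
```

The same filter could be appended to the log-inspection command, for example by piping its output through `strip_ansi`.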

Inspect the stdout, which `cwltool` writes as a JSON document describing the workflow outputs:

In [22]:
cat zarr-cloud-native-results.json | jq '.zarr_stac_catalog.path' -

"/workspace/zarr-cloud-native-format/runs/5pmu70kn"


In [23]:
tree $( cat zarr-cloud-native-results.json | jq -r '.zarr_stac_catalog.path' - )

/workspace/zarr-cloud-native-format/runs/5pmu70kn
├── catalog.json
└── water-bodies
    ├── collection.json
    └── water-bodies.zarr
        ├── measurements
        │   ├── ndwi
        │   │   ├── c
        │   │   │   └── 0
        │   │   │       ├── 0
        │   │   │       │   ├── 0
        │   │   │       │   ├── 1
        │   │   │       │   ├── 10
        │   │   │       │   ├── 2
        │   │   │       │   ├── 3
        │   │   │       │   ├── 4
        │   │   │       │   ├── 5
        │   │   │       │   ├── 6
        │   │   │       │   ├── 7
        │   │   │       │   ├── 8
        │   │   │       │   └── 9
        │   │   │       ├── 1
        │   │   │       │   ├── 0
        │   │   │       │   ├── 1
        │   │   │       │   ├── 10
        │   │   │       │   ├── 2
        │   │   │       │   ├── 3
        │   │   │       │   ├── 4
        │   │   │       
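The tree shows the chunk files of the `measurements/ndwi` array stored under a `c` directory, one file per chunk index. As a rough sketch, the number of chunks written for an array could be counted as below. The function name `count_chunks` is hypothetical, and it assumes that every regular file under `<array>/c` is a chunk, which matches the layout in the listing above.

```shell
# Hypothetical helper: count the chunk files stored under <store>/<array>/c.
# Assumes every regular file below the "c" directory is one chunk.
count_chunks() {
  store="$1"
  array="$2"
  find "$store/$array/c" -type f | wc -l | tr -d ' '
}

# Usage with the store produced by this run (path taken from the results file):
# count_chunks "$(jq -r '.zarr_stac_catalog.path' zarr-cloud-native-results.json)/water-bodies/water-bodies.zarr" measurements/ndwi
```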

In [24]:
stac describe $( cat zarr-cloud-native-results.json | jq -r '.zarr_stac_catalog.path' - )/catalog.json

* <Catalog id=water-bodies>
    * <Collection id=water-bodies>


In [9]:
jq . $( cat zarr-cloud-native-results.json | jq -r '.zarr_stac_catalog.path' - )/water-bodies/collection.json

{
  "type": "Collection",
  "id": "water-bodies",
  "stac_version": "1.1.0",
  "description": "Detected water bodies",
  "links": [
    {
      "rel": "root",
      "href": "../catalog.json",
      "type": "application/json",
      "title": "Water bodies"
    },
    {
      "rel": "store",
      "href": "water-bodies.zarr",
      "type": "application/vnd.zarr; version=3",
      "title": "Zarr store for Water