# Sentinel-2 Cloud native processing workflow

## Goal 

Prepare the CWL Workflow that orchestrates the CWL CommandLineTool document(s) wrapping the command-line tools available in the container(s).

This notebook is linked to: 
https://eoap.github.io/mastering-app-package/cwl-workflow/cloud-native/


The cloud-native Workflow chains the `crop`, `norm_diff`, `otsu` and `stac` steps. It processes a single STAC item and takes the following input parameters:

- a SpatioTemporal Asset Catalog (STAC) Item
- a bounding box area of interest (AOI)
- the EPSG code of the bounding box area of interest
- a list of common band names (`["green", "nir"]`)

## Setup

In [1]:
export WORKSPACE=/workspace/mastering-app-package
export RUNTIME=${WORKSPACE}/runs
mkdir -p ${RUNTIME}
cd ${RUNTIME}

## CWL Workflow

The CWL document now holds a `$graph` list with several CWL descriptions: one `Workflow` and four `CommandLineTool` documents:

In [8]:
cat ${WORKSPACE}/cwl-workflow/app-water-body-cloud-native.cwl | yq e .'$graph[].class' -

Workflow
CommandLineTool
CommandLineTool
CommandLineTool
CommandLineTool


The `CommandLineTool` ids match the CommandLineTool documents created in the previous step:

In [14]:
cat ${WORKSPACE}/cwl-workflow/app-water-body-cloud-native.cwl | yq e .'$graph[1].id' -
cat ${WORKSPACE}/cwl-workflow/app-water-body-cloud-native.cwl | yq e .'$graph[2].id' -
cat ${WORKSPACE}/cwl-workflow/app-water-body-cloud-native.cwl | yq e .'$graph[3].id' -
cat ${WORKSPACE}/cwl-workflow/app-water-body-cloud-native.cwl | yq e .'$graph[4].id' -

crop
norm_diff
otsu
stac
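The same inspection can be sketched in Python. This is a minimal illustration and not part of the original notebook: `packed` is a hypothetical in-memory stand-in that reproduces only the `class` and `id` values reported in this notebook (the Workflow id `main` comes from the Workflow listing), and `ids_by_class` is an illustrative helper name.

```python
# Hypothetical stand-in for the packed CWL document: only `class` and `id`
# are reproduced, every other key of the real file is omitted.
packed = {
    "$graph": [
        {"class": "Workflow", "id": "main"},
        {"class": "CommandLineTool", "id": "crop"},
        {"class": "CommandLineTool", "id": "norm_diff"},
        {"class": "CommandLineTool", "id": "otsu"},
        {"class": "CommandLineTool", "id": "stac"},
    ]
}


def ids_by_class(document, cwl_class):
    """Return the ids of the `$graph` entries of the given CWL class."""
    return [entry["id"] for entry in document["$graph"] if entry["class"] == cwl_class]


print(ids_by_class(packed, "Workflow"))
print(ids_by_class(packed, "CommandLineTool"))
```

With a real file, the dictionary would come from a YAML parser instead of a literal.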


Let's look at the `Workflow`:


In [16]:
cat ${WORKSPACE}/cwl-workflow/app-water-body-cloud-native.cwl | yq e .'$graph[0]' -

class: Workflow
id: main
label: Water bodies detection based on NDWI and the otsu threshold
doc: Water bodies detection based on NDWI and otsu threshold applied to a single Sentinel-2 COG STAC item
requirements:
  - class: ScatterFeatureRequirement
inputs:
  aoi:
    label: area of interest
    doc: area of interest as a bounding box
    type: string
  epsg:
    label: EPSG code
    doc: EPSG code
    type: string
    default: "EPSG:4326"
  bands:
    label: bands used for the NDWI
    doc: bands used for the NDWI
    type:

Let's look at the `inputs` element.

These are the Application Package inputs: 

- a SpatioTemporal Asset Catalog (STAC) Item: `item`
- a bounding box area of interest (AOI): `aoi`
- the EPSG code of the bounding box area of interest: `epsg`
- a list of common band names (`["green", "nir"]`): `bands`

In [18]:
cat ${WORKSPACE}/cwl-workflow/app-water-body-cloud-native.cwl | yq e .'$graph[0].inputs' -

aoi:
  label: area of interest
  doc: area of interest as a bounding box
  type: string
epsg:
  label: EPSG code
  doc: EPSG code
  type: string
  default: "EPSG:4326"
bands:
  label: bands used for the NDWI
  doc: bands used for the NDWI
  type: string[]
  default: ["green", "nir"]
item:
  doc: Reference to a STAC item
  label: STAC item reference
  type: string


Each input has an `id` (the mapping key), a `label` and a `doc` that describe it.
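Inputs that declare a `default` (`epsg` and `bands`) may be omitted when running the Workflow, while `item` and `aoi` must always be provided. The sketch below illustrates that rule only; `DECLARED` transcribes the `type` and `default` values listed above, `resolve_inputs` is a hypothetical helper, the item URL is a placeholder, and none of this reflects how a CWL runner is actually implemented.

```python
# Declared inputs, transcribed from the `inputs` listing
# (only `type` and `default` are kept).
DECLARED = {
    "aoi": {"type": "string"},
    "epsg": {"type": "string", "default": "EPSG:4326"},
    "bands": {"type": "string[]", "default": ["green", "nir"]},
    "item": {"type": "string"},
}


def resolve_inputs(declared, provided):
    """Fill in defaults and reject missing or unknown inputs (illustrative)."""
    unknown = set(provided) - set(declared)
    if unknown:
        raise ValueError(f"unknown inputs: {sorted(unknown)}")
    resolved = {}
    for name, spec in declared.items():
        if name in provided:
            resolved[name] = provided[name]
        elif "default" in spec:
            resolved[name] = spec["default"]
        else:
            raise ValueError(f"missing required input: {name}")
    return resolved


job = resolve_inputs(
    DECLARED,
    {
        "item": "https://example.org/stac/item.json",  # placeholder URL
        "aoi": "-121.399,39.834,-120.74,40.472",
    },
)
print(job["epsg"], job["bands"])
```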

Let's look at the `outputs` element.

The Workflow has a single output, with id `stac_catalog`: a STAC catalog (a `Directory`) whose source is the `stac_catalog` output of the `node_stac` step.

`node_stac` is the last step of the `Workflow`.


In [20]:
cat ${WORKSPACE}/cwl-workflow/app-water-body-cloud-native.cwl | yq e .'$graph[0].outputs' -

- id: stac_catalog
  outputSource:
    - node_stac/stac_catalog
  type: Directory


Let's look at the Workflow steps.

`steps` is a mapping: each step declares its inputs (`in`), its outputs (`out`) and the CWL description to execute (`run`), given as a `#<id>` reference to a `CommandLineTool` in the `$graph`:

In [22]:
cat ${WORKSPACE}/cwl-workflow/app-water-body-cloud-native.cwl | yq e .'$graph[0].steps' -

node_crop:
  run: "#crop"
  in:
    item: item
    aoi: aoi
    epsg: epsg
    band: bands
  out:
    - cropped
  scatter: band
  scatterMethod: dotproduct
node_normalized_difference:
  run: "#norm_diff"
  in:
    rasters:
      source: node_crop/cropped
  out:
    - ndwi
node_otsu:
  run: "#otsu"
  in:
    raster:
      source: node_normalized_difference/ndwi
  out:
    - binary_mask_item
node_stac:
  run: "#stac"
  in:

The first step, `node_crop`, runs the `crop` tool and applies the fan-out (scatter) pattern to the Workflow input `bands`, which is a list.

With `bands: [green, nir]`, the step is invoked once per band.

The `in` mapping maps the step inputs to the Workflow inputs.
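The effect of `scatter: band` with `scatterMethod: dotproduct` can be sketched as follows. This is an illustration of the semantics under stated assumptions, not the runner's code: `scatter_dotproduct` is a hypothetical helper and the `item` value is a placeholder. With a single scattered input, the step is expanded into one job per list element; with several scattered inputs, `dotproduct` pairs them element by element.

```python
def scatter_dotproduct(step_inputs, scatter):
    """Expand one step invocation into one job per scattered element."""
    names = [scatter] if isinstance(scatter, str) else list(scatter)
    lengths = {len(step_inputs[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("dotproduct needs scattered inputs of equal length")
    jobs = []
    for values in zip(*(step_inputs[name] for name in names)):
        job = dict(step_inputs)         # non-scattered inputs are shared
        job.update(zip(names, values))  # each job gets one element
        jobs.append(job)
    return jobs


jobs = scatter_dotproduct(
    {
        "item": "https://example.org/stac/item.json",  # placeholder URL
        "aoi": "-121.399,39.834,-120.74,40.472",
        "epsg": "EPSG:4326",
        "band": ["green", "nir"],
    },
    scatter="band",
)
for job in jobs:
    print(job["band"], job["epsg"])
```

Each job produces its own `cropped` output, and the outputs of a scattered step are gathered into a list, which is why the next step receives several rasters.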

In [24]:
cat ${WORKSPACE}/cwl-workflow/app-water-body-cloud-native.cwl | yq e .'$graph[0].steps["node_crop"]' -

run: "#crop"
in:
  item: item
  aoi: aoi
  epsg: epsg
  band: bands
out:
  - cropped
scatter: band
scatterMethod: dotproduct


The second step, `node_normalized_difference`, computes the normalized difference of the cropped bands.

The `in` mapping wires the `rasters` input to the outputs of the `node_crop` step with `source: node_crop/cropped`. This is how CWL defines the orchestration of the steps: a step can run once the outputs it references are available.



In [26]:
cat ${WORKSPACE}/cwl-workflow/app-water-body-cloud-native.cwl | yq e .'$graph[0].steps["node_normalized_difference"]' -

run: "#norm_diff"
in:
  rasters:
    source: node_crop/cropped
out:
  - ndwi
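For reference, the normalized difference of two bands is (a - b) / (a + b); applied to the `green` and `nir` bands it gives the NDWI. The pure-Python sketch below assumes the bands arrive in the order of the `bands` default (`green` first, `nir` second) and uses made-up pixel values. The actual `norm_diff` tool is not shown in this notebook and may differ, for example in how it handles nodata.

```python
def normalized_difference(a, b):
    """Element-wise (a - b) / (a + b); None where the sum is zero."""
    if len(a) != len(b):
        raise ValueError("both bands must have the same number of pixels")
    result = []
    for x, y in zip(a, b):
        total = x + y
        result.append((x - y) / total if total != 0 else None)
    return result


# Made-up pixel values, for illustration only.
green = [3000, 1000, 0]
nir = [1000, 3000, 0]

ndwi = normalized_difference(green, nir)
print(ndwi)  # [0.5, -0.5, None]
```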


The third step, `node_otsu`, follows the same pattern: its `raster` input is the `ndwi` output of `node_normalized_difference`:

In [28]:
cat ${WORKSPACE}/cwl-workflow/app-water-body-cloud-native.cwl | yq e .'$graph[0].steps["node_otsu"]' -

run: "#otsu"
in:
  raster:
    source: node_normalized_difference/ndwi
out:
  - binary_mask_item
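Otsu's method picks the threshold that best separates the values into two classes by maximising the between-class variance. The sketch below is a simplified pure-Python version that tests every split between sorted sample values instead of building a histogram; `otsu_threshold` is a hypothetical helper, the NDWI values are made up, and the actual `otsu` tool is not shown here and may be implemented differently. Treating values above the threshold as water is an assumption based on the usual NDWI convention.

```python
def otsu_threshold(values):
    """Return the split that maximises the between-class variance."""
    data = sorted(values)
    if not data:
        raise ValueError("at least one value is required")
    n = len(data)
    total = sum(data)
    best_threshold, best_variance = data[0], -1.0
    cumulative = 0.0
    for i in range(1, n):
        cumulative += data[i - 1]      # sum of the i smallest values
        if data[i] == data[i - 1]:
            continue                   # cannot split between equal values
        w0, w1 = i / n, (n - i) / n    # class weights
        m0 = cumulative / i            # mean of the lower class
        m1 = (total - cumulative) / (n - i)  # mean of the upper class
        variance = w0 * w1 * (m0 - m1) ** 2
        if variance > best_variance:
            best_variance = variance
            best_threshold = (data[i - 1] + data[i]) / 2
    return best_threshold


# Made-up NDWI values: three "land" pixels and three "water" pixels.
ndwi_values = [-0.5, -0.4, -0.45, 0.5, 0.6, 0.55]
threshold = otsu_threshold(ndwi_values)
water_mask = [value > threshold for value in ndwi_values]
print(water_mask)
```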


The final step, `node_stac`, takes the Workflow input `item` and the output of `node_otsu`.

In [29]:
cat ${WORKSPACE}/cwl-workflow/app-water-body-cloud-native.cwl | yq e .'$graph[0].steps["node_stac"]' -

run: "#stac"
in:
  item: item
  rasters:
    source: node_otsu/binary_mask_item
out:
  - stac_catalog
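The `source` references are enough to derive the order in which the steps can run. The sketch below assumes Python 3.9+ for the standard-library `graphlib`; `STEPS` transcribes the `in` mappings listed above, and `upstream_steps` and `execution_order` are hypothetical helper names. It illustrates the dependency idea and is not how `cwltool` schedules jobs.

```python
from graphlib import TopologicalSorter

# `in` mappings of the four steps, transcribed from the listings above.
STEPS = {
    "node_crop": {
        "in": {"item": "item", "aoi": "aoi", "epsg": "epsg", "band": "bands"}
    },
    "node_normalized_difference": {
        "in": {"rasters": {"source": "node_crop/cropped"}}
    },
    "node_otsu": {
        "in": {"raster": {"source": "node_normalized_difference/ndwi"}}
    },
    "node_stac": {
        "in": {"item": "item", "rasters": {"source": "node_otsu/binary_mask_item"}}
    },
}


def upstream_steps(step):
    """Return the names of the steps whose outputs this step consumes."""
    dependencies = set()
    for value in step["in"].values():
        source = value["source"] if isinstance(value, dict) else value
        if "/" in source:  # "<step>/<output>" points at another step
            dependencies.add(source.split("/", 1)[0])
    return dependencies


def execution_order(steps):
    """Order the steps so that each one runs after its dependencies."""
    graph = {name: upstream_steps(step) for name, step in steps.items()}
    return list(TopologicalSorter(graph).static_order())


print(execution_order(STEPS))
```

Sources without a `/`, such as `item` or `bands`, refer to Workflow inputs and therefore add no dependency between steps.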


## Running the Workflow

In [31]:
cwltool \
    --parallel \
    --outdir ${WORKSPACE}/runs \
    ${WORKSPACE}/cwl-workflow/app-water-body-cloud-native.cwl \
    --item "https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_10TFK_20210713_0_L2A" \
    --aoi="-121.399,39.834,-120.74,40.472" \
    --epsg "EPSG:4326" \
    --bands green \
    --bands nir

INFO /home/fbrito/.local/bin/cwltool 3.1.20240112164112
INFO Resolved '/workspace/mastering-app-package/cwl-workflow/app-water-body-cloud-native.cwl' to 'file:///workspace/mastering-app-package/cwl-workflow/app-water-body-cloud-native.cwl'
INFO [workflow ] starting step node_crop
INFO [step node_crop] start
INFO [workflow ] start
INFO [step node_crop] start
INFO [job node_crop_2] /tmp/2d7t1zi9$ podman \
    run \
    -i \
    --userns=keep-id \
    --mount=type=bind,source=/tmp/2d7t1zi9,target=/XlYjPs \
    --mount=type=bind,source=/tmp/op7to6_k,target=/tmp \
    --workdir=/XlYjPs \
    --read-only=true \
    --user=1000:1000 \
    --rm \
    --cidfile=/tmp/rwfu8ta7/20240414184459-989787.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/XlYjPs \
    --env=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    --env=PYTHONPATH=/app \
    localhost/crop:latest \
    python \
    -m \
    app \
    -