# Platform data management using CWL

This notebook uses the Python kernel.

This notebook reproduces the platform data management workflow using CWL. It:
- executes the stage-in with a reference to a catalog entry containing a Landsat-9 STAC Item
- uses the folder produced by the stage-in step as input for the Application Package execution
- executes the stage-out, taking the S3 parameters and the folder produced by the Application Package execution as input

In [35]:
import os
from os import listdir
import argparse
import yaml
import json
import pystac
from cwltool.main import main
from io import StringIO


## Stage-in the Landsat-9 scene using CWL

Create the job order for `cwltool`.

It contains the reference to a catalog entry containing a Landsat-9 STAC Item:

In [4]:
# create the YAML parameter file for cwltool
with open("stage-in-params.yaml", "w") as f:
    yaml.dump({"reference": "https://planetarycomputer.microsoft.com/api/stac/v1/collections/landsat-c2-l2/items/LC09_L2SP_042033_20231015_02_T1"}, f)
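
The `reference` value follows the STAC API path layout `/collections/{collection_id}/items/{item_id}`. As a quick sanity check before launching the container, a minimal sketch like the one below can split the URL into its collection and item identifiers. The helper name `split_stac_item_url` is hypothetical and is not part of the stage-in tooling.

```python
from urllib.parse import urlparse


def split_stac_item_url(url):
    """Return (collection_id, item_id) from a STAC API item URL.

    Assumes the path ends with .../collections/<collection>/items/<item>.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 4 or parts[-4] != "collections" or parts[-2] != "items":
        raise ValueError(f"not a STAC API item URL: {url}")
    return parts[-3], parts[-1]


reference = (
    "https://planetarycomputer.microsoft.com/api/stac/v1/"
    "collections/landsat-c2-l2/items/LC09_L2SP_042033_20231015_02_T1"
)
collection_id, item_id = split_stac_item_url(reference)
print(collection_id, item_id)
```

The item identifier recovered here is the same one that appears as the staged folder name later in the notebook.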

Invoke the `stage-in.cwl` with `cwltool`:

In [5]:
parsed_args = argparse.Namespace(
    podman=True,
    debug=False,
    outdir="./runs",
    workflow="cwl-cli/stage-in.cwl",
    job_order=["stage-in-params.yaml"],
)

stream_out = StringIO()
stream_err = StringIO()

res = main(
    args=parsed_args,
    stdout=stream_out,
)

assert res == 0

INFO /workspace/.venv/lib/python3.9/site-packages/ipykernel_launcher.py 3.1.20240909164951
INFO Resolved 'cwl-cli/stage-in.cwl' to 'file:///workspace/stac-eoap/notebooks/cwl-cli/stage-in.cwl'
INFO [job main] /tmp/6wllcrmb$ podman \
    run \
    -i \
    --userns=keep-id \
    --mount=type=bind,source=/tmp/6wllcrmb,target=/OKsJRa \
    --mount=type=bind,source=/tmp/01twnvfj,target=/tmp \
    --workdir=/OKsJRa \
    --read-only=true \
    --user=1001:100 \
    --rm \
    --cidfile=/tmp/odrdgjmb/20241007125131-693626.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/OKsJRa \
    ghcr.io/eoap/mastering-app-package/stage:1.0.0 \
    python \
    stage.py \
    https://planetarycomputer.microsoft.com/api/stac/v1/collections/landsat-c2-l2/items/LC09_L2SP_042033_20231015_02_T1
INFO [job main] Max memory used: 4621MiB
INFO [job main] completed success
INFO Final process status is success


Inspect the results and retrieve the folder where the Landsat-9 acquisition was staged:

In [15]:
stage_in_results = json.loads(stream_out.getvalue())

staged = stage_in_results["staged"]["location"].replace("file://", "")

staged

'/workspace/stac-eoap/notebooks/runs/6wllcrmb'
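
The `location` field is a `file://` URI, and `str.replace` is enough for the paths in this run. A slightly more defensive sketch, assuming POSIX paths, parses the URI and decodes percent-escapes, so that a run directory containing a space would still resolve. The helper name `file_uri_to_path` is hypothetical.

```python
from urllib.parse import unquote, urlparse


def file_uri_to_path(location):
    """Convert a file:// URI to a local POSIX path; pass plain paths through."""
    parsed = urlparse(location)
    if parsed.scheme == "file":
        return unquote(parsed.path)
    if parsed.scheme == "":
        return location
    raise ValueError(f"unsupported scheme: {parsed.scheme}")


print(file_uri_to_path("file:///workspace/stac-eoap/notebooks/runs/6wllcrmb"))
```

Raising on any other scheme makes it obvious when an output points at remote storage rather than a local folder.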

In [32]:
listdir(staged)

['catalog.json', 'LC09_L2SP_042033_20231015_02_T1', 'stage.py']

Optionally inspect the staged STAC Catalog:

In [37]:
staged_catalog = pystac.read_file(os.path.join(staged, "catalog.json"))

staged_catalog.describe()


* <Catalog id=catalog>
  * <Item id=LC09_L2SP_042033_20231015_02_T1>
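
The tree printed by `describe()` comes from the `links` array of `catalog.json`, where each child item is a link with `rel` set to `item`. The sketch below shows the same lookup with plain dictionaries. The example catalog is illustrative only: the item `href` is an assumption, since the notebook does not show the file name of the staged item.

```python
def item_links(catalog):
    """Return the hrefs of all links with rel == "item" in a STAC Catalog dict."""
    return [
        link["href"]
        for link in catalog.get("links", [])
        if link.get("rel") == "item"
    ]


# Illustrative catalog; the item href below is assumed, not copied from the run.
example_catalog = {
    "type": "Catalog",
    "id": "catalog",
    "stac_version": "1.0.0",
    "description": "example",
    "links": [
        {"rel": "root", "href": "./catalog.json"},
        {
            "rel": "item",
            "href": "./LC09_L2SP_042033_20231015_02_T1/LC09_L2SP_042033_20231015_02_T1.json",
        },
    ],
}
print(item_links(example_catalog))
```

To apply this to the staged catalog, load `catalog.json` with `json.load` and pass the resulting dictionary to the function.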


In [42]:
item = next(staged_catalog.get_items())

for key, asset in item.get_assets().items():
    print(key, asset)

qa <Asset href=./LC09_L2SP_042033_20231015_20231016_02_T1_ST_QA.TIF>
ang <Asset href=./LC09_L2SP_042033_20231015_20231016_02_T1_ANG.txt>
red <Asset href=./LC09_L2SP_042033_20231015_20231016_02_T1_SR_B4.TIF>
blue <Asset href=./LC09_L2SP_042033_20231015_20231016_02_T1_SR_B2.TIF>
drad <Asset href=./LC09_L2SP_042033_20231015_20231016_02_T1_ST_DRAD.TIF>
emis <Asset href=./LC09_L2SP_042033_20231015_20231016_02_T1_ST_EMIS.TIF>
emsd <Asset href=./LC09_L2SP_042033_20231015_20231016_02_T1_ST_EMSD.TIF>
trad <Asset href=./LC09_L2SP_042033_20231015_20231016_02_T1_ST_TRAD.TIF>
urad <Asset href=./LC09_L2SP_042033_20231015_20231016_02_T1_ST_URAD.TIF>
atran <Asset href=./LC09_L2SP_042033_20231015_20231016_02_T1_ST_ATRAN.TIF>
cdist <Asset href=./LC09_L2SP_042033_20231015_20231016_02_T1_ST_CDIST.TIF>
green <Asset href=./LC09_L2SP_042033_20231015_20231016_02_T1_SR_B3.TIF>
nir08 <Asset href=./LC09_L2SP_042033_20231015_20231016_02_T1_SR_B5.TIF>
lwir11 <Asset href=./LC09_L2SP_042033_20231015_20231016_02_T1_S
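
The Application Package run later in this notebook is given the `green` and `nir08` bands, so it can be worth confirming that the staged item exposes them before starting the processing step. A minimal sketch follows; the helper name `missing_assets` is hypothetical, and the key list is copied from the (truncated) listing above.

```python
def missing_assets(available_keys, required_bands):
    """Return the required band names that are absent from the item's assets."""
    available = set(available_keys)
    return [band for band in required_bands if band not in available]


# Asset keys copied from the listing above (the listing is truncated in the output).
staged_keys = ["qa", "ang", "red", "blue", "drad", "emis", "emsd", "trad",
               "urad", "atran", "cdist", "green", "nir08", "lwir11"]
print(missing_assets(staged_keys, ["green", "nir08"]))
```

With the pystac item loaded above, `item.get_assets().keys()` can be passed as `available_keys`.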

## Invoke the application package with the staged Landsat-9 scene

Create the job order for `cwltool`.

It contains the path to the staged Landsat-9 scene and the Application Package parameters:

In [19]:

# create the YAML parameter file
with open("params.yaml", "w") as f:
    yaml.dump({"item": {"class": "Directory", "path": staged},
               "aoi": "-118.985,38.432,-118.183,38.938",
               "epsg": "EPSG:4326",
               "bands": ["green", "nir08"]}, f)
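
The `aoi` parameter is passed as a comma-separated string. The sketch below validates it before it reaches the container, assuming the order is minimum x, minimum y, maximum x, maximum y. That order is consistent with the values and with `EPSG:4326`, but the notebook does not state it. The helper name `parse_aoi` is hypothetical.

```python
def parse_aoi(aoi):
    """Parse 'min_x,min_y,max_x,max_y' into a tuple of floats and check ordering."""
    values = tuple(float(v) for v in aoi.split(","))
    if len(values) != 4:
        raise ValueError(f"expected 4 comma-separated numbers, got {len(values)}")
    min_x, min_y, max_x, max_y = values
    if min_x >= max_x or min_y >= max_y:
        raise ValueError(f"degenerate or inverted bounding box: {aoi}")
    return values


print(parse_aoi("-118.985,38.432,-118.183,38.938"))
```

Catching a malformed bounding box here is cheaper than discovering it after the container has been pulled and started.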


Invoke the Application Package with `cwltool`:

In [23]:


parsed_args = argparse.Namespace(
    podman=True,
    debug=False,
    outdir="./runs",
    workflow="cwl-workflow/app-water-bodies.cwl",
    job_order=["params.yaml"],
)

stream_out = StringIO()
stream_err = StringIO()

res = main(
    args=parsed_args,
    stdout=stream_out,
)

assert res == 0

INFO /workspace/.venv/lib/python3.9/site-packages/ipykernel_launcher.py 3.1.20240909164951
INFO Resolved 'cwl-workflow/app-water-bodies.cwl' to 'file:///workspace/stac-eoap/notebooks/cwl-workflow/app-water-bodies.cwl'
INFO [workflow _4] start
INFO [workflow _4] starting step node_detect_4
INFO [step node_detect_4] start
INFO [job node_detect_4] /tmp/iluwlpiu$ podman \
    run \
    -i \
    --userns=keep-id \
    --mount=type=bind,source=/tmp/iluwlpiu,target=/OKsJRa \
    --mount=type=bind,source=/tmp/_evs452e,target=/tmp \
    --mount=type=bind,source=/workspace/stac-eoap/notebooks/runs/6wllcrmb,target=/var/lib/cwl/stgee28c509-f435-4404-8bf6-43ab8be9e5aa/6wllcrmb,readonly \
    --workdir=/OKsJRa \
    --read-only=true \
    --user=1001:100 \
    --rm \
    --cidfile=/tmp/f4rx6smn/20241007130050-584601.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/OKsJRa \
    ghcr.io/eoap/quickwin/detect-water-body@sha256:e7ae9cd60e197f
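
The same `argparse.Namespace` plus `main(...)` pattern is used for the stage-in, the Application Package and the stage-out. One possible way to factor it out is sketched below. The function names `build_cwltool_args` and `run_cwl` are hypothetical. The cells in this notebook create `stream_err` but never pass it to `main`. The sketch passes it as `stderr`, on the assumption that this captures the log text instead of printing it; that has not been verified against this cwltool version.

```python
import argparse
import json
from io import StringIO


def build_cwltool_args(workflow, job_order, outdir="./runs", podman=True):
    """Build the argparse.Namespace used for every cwltool call in this notebook."""
    return argparse.Namespace(
        podman=podman,
        debug=False,
        outdir=outdir,
        workflow=workflow,
        job_order=[job_order],
    )


def run_cwl(workflow, job_order, **kwargs):
    """Run a CWL document and return (exit_code, outputs_dict, log_text)."""
    # Imported lazily so the helper above can be used without cwltool installed.
    from cwltool.main import main

    stream_out, stream_err = StringIO(), StringIO()
    res = main(
        args=build_cwltool_args(workflow, job_order, **kwargs),
        stdout=stream_out,
        stderr=stream_err,  # assumption: captures the log instead of printing it
    )
    outputs = json.loads(stream_out.getvalue()) if res == 0 else {}
    return res, outputs, stream_err.getvalue()


args = build_cwltool_args("cwl-workflow/app-water-bodies.cwl", "params.yaml")
print(args.workflow, args.job_order)
```

A call such as `res, outputs, log = run_cwl("cwl-cli/stage-in.cwl", "stage-in-params.yaml")` would then replace each of the three invocation cells.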

Inspect the results and retrieve the folder where the Application Package results were produced:

In [29]:
app_results = json.loads(stream_out.getvalue())

processed = app_results["stac_catalog"]["location"].replace("file://", "")

processed

'/workspace/stac-eoap/notebooks/runs/iluwlpiu'

In [33]:
listdir(processed)

['catalog.json', 'LC09_L2SP_042033_20231015_02_T1', '__pycache__', 'app.py']

Optionally inspect the STAC Catalog generated by the Application Package:

In [43]:
processed_catalog = pystac.read_file(os.path.join(processed, "catalog.json"))

processed_catalog.describe()


* <Catalog id=catalog>
  * <Item id=LC09_L2SP_042033_20231015_02_T1>


In [44]:

item = next(processed_catalog.get_items())

for key, asset in item.get_assets().items():
    print(key, asset)

data <Asset href=./otsu.tif>


## Stage-out

Create the job order for `cwltool`.

It contains the path to the folder containing the Application Package results and the S3 object storage parameters:


In [47]:
# create the YAML parameter file
with open("stage-out-params.yaml", "w") as f:
    yaml.dump({"stac_catalog": {"class": "Directory", "path": processed},
               "aws_access_key_id": "test",
               "aws_secret_access_key": "test",
               "endpoint_url": "http://localstack:4566",
               "s3_bucket": "results",
               "sub_path": "run-004",
               "region_name": "us-east-1"}, f)

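The credentials above are the placeholder values for the local `localstack` endpoint. When targeting a real object store, one option is to read them from the environment instead of writing them into the notebook. The sketch below assumes the conventional AWS environment variable names and falls back to the values used in this notebook. The helper name `stage_out_params` is hypothetical.

```python
import os


def stage_out_params(stac_catalog_path, sub_path, env=None):
    """Build the stage-out job order, reading credentials from the environment.

    Falls back to the localstack test values used in this notebook.
    """
    env = os.environ if env is None else env
    return {
        "stac_catalog": {"class": "Directory", "path": stac_catalog_path},
        "aws_access_key_id": env.get("AWS_ACCESS_KEY_ID", "test"),
        "aws_secret_access_key": env.get("AWS_SECRET_ACCESS_KEY", "test"),
        "endpoint_url": env.get("AWS_ENDPOINT_URL", "http://localstack:4566"),
        "s3_bucket": "results",
        "sub_path": sub_path,
        "region_name": env.get("AWS_DEFAULT_REGION", "us-east-1"),
    }


params = stage_out_params(
    "/workspace/stac-eoap/notebooks/runs/iluwlpiu", "run-004", env={}
)
print(params["endpoint_url"], params["sub_path"])
```

The resulting dictionary can be written with `yaml.dump(params, f)` exactly as in the cell above. Note that the job order file then contains the secrets in plain text, so it should not be committed.
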
Invoke the `stage-out.cwl` CWL document with `cwltool`:

In [48]:
parsed_args = argparse.Namespace(
    podman=True,
    debug=False,
    outdir="./runs",
    workflow="cwl-cli/stage-out.cwl",
    job_order=["stage-out-params.yaml"],
)

stream_out = StringIO()
stream_err = StringIO()

res = main(
    args=parsed_args,
    stdout=stream_out,
)

assert res == 0

INFO /workspace/.venv/lib/python3.9/site-packages/ipykernel_launcher.py 3.1.20240909164951
INFO Resolved 'cwl-cli/stage-out.cwl' to 'file:///workspace/stac-eoap/notebooks/cwl-cli/stage-out.cwl'
INFO [job stage-out_2] /tmp/kgyxnpiy$ podman \
    run \
    -i \
    --userns=keep-id \
    --mount=type=bind,source=/tmp/kgyxnpiy,target=/OKsJRa \
    --mount=type=bind,source=/tmp/tk7yhj_f,target=/tmp \
    --mount=type=bind,source=/workspace/stac-eoap/notebooks/runs/iluwlpiu,target=/var/lib/cwl/stg680f45e3-3caf-414f-95bc-df8a6188820f/iluwlpiu,readonly \
    --workdir=/OKsJRa \
    --read-only=true \
    --user=1001:100 \
    --rm \
    --cidfile=/tmp/tblgjetb/20241007130945-454058.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/OKsJRa \
    --env=aws_access_key_id=test \
    --env=aws_endpoint_url=http://localstack:4566 \
    --env=aws_region_name=us-east-1 \
    --env=aws_secret_access_key=test \
    ghcr.io/eoap/mastering-app-package/stage:1.0.0 \
    python \
    stage.py \
    /var/lib/cwl/stg680f45e3-3caf-414f-95bc-df8a6188820f/iluwlpiu \
    results \
    run-004
INFO [job stage-out_2] Max memory used: 4316MiB
INFO [job stage-out_2] completed su

Inspect the results and print the S3 URL to the staged-out `catalog.json` file and associated STAC Item and assets:

In [51]:
stage_out_results = stream_out.getvalue()

json.loads(stage_out_results)["s3_catalog_output"]

's3://results/run-004/catalog.json'
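
The returned URL combines the `s3_bucket` and `sub_path` values from the job order. A small sketch for splitting it back into bucket and object key, for example to hand them to an S3 client, is shown below. The helper name `split_s3_url` is hypothetical.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_url("s3://results/run-004/catalog.json")
print(bucket, key)
```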