## Application Package reproducibility

### Personas

* **Alice** developed a Water Body detection Earth Observation application and packaged it as an EO Application Package
* **Bob** scripts the execution of the application

### Scenario

Alice added a Continuous Integration configuration to the software repository of the water bodies detection Application Package. It relies on GitHub Actions to:

* build the containers
* push the built containers to the GitHub container registry
* update the Application Package with these new container references
* push the updated Application Package to GitHub's artifact registry
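
The "update the Application Package" step above can be pictured with a minimal sketch. It assumes a packed CWL document (a `$graph` list of processes) already loaded as a Python dictionary; the function name `update_docker_pull`, the tool id `crop` and the image references are hypothetical and are not taken from Alice's repository.

```python
import copy


def update_docker_pull(cwl_doc, images):
    """Return a copy of cwl_doc where each listed tool points at a new image.

    `images` maps a CommandLineTool id to the container reference to use.
    """
    updated = copy.deepcopy(cwl_doc)
    for process in updated.get("$graph", []):
        if process.get("class") != "CommandLineTool":
            continue
        new_image = images.get(process.get("id"))
        if new_image is None:
            continue
        for section in ("requirements", "hints"):
            entries = process.get(section) or {}
            if isinstance(entries, dict):
                # Map form: {"DockerRequirement": {"dockerPull": ...}}
                if "DockerRequirement" in entries:
                    entries["DockerRequirement"]["dockerPull"] = new_image
            else:
                # List form: [{"class": "DockerRequirement", "dockerPull": ...}]
                for entry in entries:
                    if entry.get("class") == "DockerRequirement":
                        entry["dockerPull"] = new_image
    return updated


# Hypothetical packed document with one workflow and one tool
doc = {
    "cwlVersion": "v1.0",
    "$graph": [
        {"class": "Workflow", "id": "main"},
        {
            "class": "CommandLineTool",
            "id": "crop",
            "requirements": {
                "DockerRequirement": {"dockerPull": "localhost/crop:latest"}
            },
        },
    ],
}

new_doc = update_docker_pull(doc, {"crop": "ghcr.io/example/crop:1.0.0"})
print(new_doc["$graph"][1]["requirements"]["DockerRequirement"]["dockerPull"])
```

Working on a deep copy keeps the source document untouched, so the released Application Package can be written to a separate file.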


Alice sent an email to Bob:

<hr>
from: alice@acme.io

to: bob@acme.io

subject: Detecting water bodies with NDWI and the Otsu threshold


Hi Bob!

Check out my new application package for detecting water bodies using NDWI and the Otsu threshold.

I've run it over our test site bounding box and the preliminary results look promising.

The GitHub repo is https://github.com/eoap/quickwin and I've just released version 1.0.0.

Let me know!

Cheers

Alice
<hr>

With this information, Bob scripts the Application Execution in a Jupyter Notebook.

His environment has a container engine (e.g. podman or docker) and the cwltool CWL runner.

## Running the Scenario

In [1]:
import argparse
import asyncio
import json
import os
from datetime import datetime
from io import StringIO

import nest_asyncio
import pystac
import rasterio
from cwltool.main import main
from ipyleaflet import GeoJSON, Map
from pydantic_yaml import to_yaml_str
from pystac_client import Client
from rasterio.features import dataset_features, sieve

from helpers import Params, get_param_model_fields, get_release_assets, stage_in

from shutil import which

nest_asyncio.apply()


## Check the container engine

In [2]:
if which("podman"):
    podman = True
elif which("docker"):
    podman = False
else:
    raise ValueError("No container engine")

## Application Package releases

Bob uses the GitHub API to list the artifacts published by Alice in the release:

In [3]:
assets = get_release_assets(
    user="eoap",
    repo="quickwin",
    token=os.environ["GH_PAT"],
)

assets

{'1.0.0': [{'url': 'https://github.com/Terradue/app-package-training-bids23/releases/download/1.0.0/app-water-bodies-cloud-native.1.0.0.cwl',
   'cwl': <cwl_utils.parser.cwl_v1_0.Workflow at 0x7f839fd75a90>,
   'label': 'Water bodies detection based on NDWI and otsu threshold',
   'doc': 'Water bodies detection based on NDWI and otsu threshold applied to Sentinel-2 COG STAC items'},
  {'url': 'https://github.com/Terradue/app-package-training-bids23/releases/download/1.0.0/app-water-body-cloud-native.1.0.0.cwl',
   'cwl': <cwl_utils.parser.cwl_v1_0.Workflow at 0x7f839fd023d0>,
   'label': 'Water bodies detection based on NDWI and the otsu threshold',
   'doc': 'Water bodies detection based on NDWI and otsu threshold applied to a single Sentinel-2 COG STAC item'},
  {'url': 'https://github.com/Terradue/app-package-training-bids23/releases/download/1.0.0/app-water-body.1.0.0.cwl',
   'cwl': <cwl_utils.parser.cwl_v1_0.Workflow at 0x7f83a1b501f0>,
   'label': 'Water body detection based on 
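
The `get_release_assets` helper is not shown in this notebook. As a rough sketch of the listing step only, the code below calls the GitHub REST endpoint `GET /repos/{user}/{repo}/releases` and groups the `.cwl` asset URLs by release tag. The function names `fetch_releases` and `cwl_assets_by_tag` and the sample payload are assumptions; the real helper also parses each CWL document to expose its `label` and `doc`.

```python
import json
import urllib.request


def fetch_releases(user, repo, token=None):
    """Call GET /repos/{user}/{repo}/releases and return the parsed JSON list."""
    request = urllib.request.Request(
        f"https://api.github.com/repos/{user}/{repo}/releases",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def cwl_assets_by_tag(releases):
    """Map each release tag to the download URLs of its .cwl assets."""
    return {
        release["tag_name"]: [
            asset["browser_download_url"]
            for asset in release.get("assets", [])
            if asset["name"].endswith(".cwl")
        ]
        for release in releases
    }


# Offline example: a hand-written payload shaped like the API response
sample_releases = [
    {
        "tag_name": "1.0.0",
        "assets": [
            {
                "name": "app.1.0.0.cwl",
                "browser_download_url": "https://example.org/download/app.1.0.0.cwl",
            },
            {
                "name": "notes.txt",
                "browser_download_url": "https://example.org/download/notes.txt",
            },
        ],
    }
]

print(cwl_assets_by_tag(sample_releases))
```

With network access and a token, `cwl_assets_by_tag(fetch_releases("eoap", "quickwin", token))` would produce the same kind of mapping for Alice's repository.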

## Running the Application Package to detect water bodies on Sentinel-2 data

Alice published three Application Packages.

Bob selects the one that processes several Sentinel-2 acquisitions provided as STAC Items:


In [4]:
app_package = assets["1.0.0"][0]

print(app_package["doc"])

print(app_package["url"])

Water bodies detection based on NDWI and otsu threshold applied to Sentinel-2 COG STAC items
https://github.com/Terradue/app-package-training-bids23/releases/download/1.0.0/app-water-bodies-cloud-native.1.0.0.cwl


The Application Package parameters are discovered and a pydantic model is created:

In [5]:
Params.set_fields(**get_param_model_fields(cwl_obj=app_package["cwl"]))

Params.get_fields()

{'aoi': ModelField(name='aoi', type=str, required=True),
 'bands': ModelField(name='bands', type=List[str], required=False, default=['green', 'nir']),
 'epsg': ModelField(name='epsg', type=str, required=False, default='EPSG:4326'),
 'stac_items': ModelField(name='stac_items', type=List[str], required=True)}
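
How `get_param_model_fields` derives these fields is not shown. The sketch below illustrates one possible mapping from CWL shorthand input types to a Python type, a `required` flag and a default. The function name `cwl_type_to_field` and the input declarations are assumptions inferred from the output above, and only shorthand types such as `string`, `string[]` and `string?` are handled.

```python
from typing import Any, List, Tuple

# CWL scalar types covered by this sketch
_SCALARS = {"string": str, "int": int, "float": float, "boolean": bool}


def cwl_type_to_field(cwl_type: str, default: Any = None) -> Tuple[Any, bool, Any]:
    """Translate a CWL shorthand type into (python_type, required, default)."""
    optional = cwl_type.endswith("?")
    base = cwl_type[:-1] if optional else cwl_type
    if base.endswith("[]"):
        python_type = List[_SCALARS[base[:-2]]]
    else:
        python_type = _SCALARS[base]
    # An input is required when it has no default and is not marked optional
    required = default is None and not optional
    return python_type, required, default


# Assumed input declarations, consistent with the fields listed above
inputs = {
    "aoi": {"type": "string"},
    "bands": {"type": "string[]", "default": ["green", "nir"]},
    "epsg": {"type": "string", "default": "EPSG:4326"},
    "stac_items": {"type": "string[]"},
}

fields = {
    name: cwl_type_to_field(spec["type"], spec.get("default"))
    for name, spec in inputs.items()
}

for name, (python_type, required, default) in fields.items():
    print(name, python_type, required, default)
```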

The Application Package takes as inputs:
- one or more STAC Items
- a list of the bands for the normalized difference
- an area of interest
- the EPSG code used for the area of interest coordinates
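
For context on the `bands` input: the usual NDWI definition is the normalized difference `(green - nir) / (green + nir)`, which matches the default band order `green`, `nir`. The sketch below evaluates it for a few reflectance pairs. The formula used inside Alice's package is not shown here, so this is an assumption, as are the sample values and the choice to return `0.0` when both bands are zero.

```python
def ndwi(green, nir):
    """Normalized difference of two reflectances: (green - nir) / (green + nir)."""
    total = green + nir
    if total == 0:
        # Assumption of this sketch: treat an all-zero pixel as neutral
        return 0.0
    return (green - nir) / total


# Hypothetical (green, nir) reflectance pairs
pixels = {
    "open water": (0.12, 0.03),
    "vegetation": (0.08, 0.40),
    "no data": (0.0, 0.0),
}

for label, (green, nir) in pixels.items():
    print(f"{label}: {ndwi(green, nir):.2f}")
```

Water absorbs strongly in the near infrared, so water pixels tend to give positive values while vegetation gives negative ones; a threshold such as Otsu's then separates the two groups.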

Bob uses a STAC API endpoint to discover Sentinel-2 acquisitions over an area of interest and time of interest:

In [6]:
URL = "https://earth-search.aws.element84.com/v1/"

# No extra HTTP headers are required; pystac-client expects a dict here
headers = {}

cat = Client.open(URL, headers=headers)
cat

Bob defines the search parameters and gets the results:

In [7]:
# Collection
collections = ["sentinel-2-l2a"]

# Start and end dates
start_date = datetime.fromisoformat("2021-07-08T00:00:00")
stop_date = datetime.fromisoformat("2021-07-08T23:59:59")

bbox = [-121.399, 39.834, -120.74, 40.472]

# Other metadata
cloud_cover = 5

# Query by AOI, start and end date and other params
query = cat.search(
    collections=collections,
    datetime=(start_date, stop_date),
    bbox=bbox,
    query={"eo:cloud_cover": {"lt": cloud_cover}},
)
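
To make the search above more concrete, here is a sketch of the JSON body that a STAC API `POST /search` request with the same criteria could carry. The function name `build_search_body` is hypothetical, and the sketch assumes that naive datetimes are meant as UTC and that the server supports the Query extension for the `eo:cloud_cover` filter.

```python
import json
from datetime import datetime


def build_search_body(collections, bbox, start, stop, max_cloud_cover):
    """Build the JSON body of a STAC API POST /search request."""
    return {
        "collections": collections,
        "bbox": bbox,
        # Closed interval "start/end"; naive datetimes are assumed to be UTC
        "datetime": f"{start.isoformat()}Z/{stop.isoformat()}Z",
        # Property filter expressed with the Query extension
        "query": {"eo:cloud_cover": {"lt": max_cloud_cover}},
    }


body = build_search_body(
    collections=["sentinel-2-l2a"],
    bbox=[-121.399, 39.834, -120.74, 40.472],
    start=datetime.fromisoformat("2021-07-08T00:00:00"),
    stop=datetime.fromisoformat("2021-07-08T23:59:59"),
    max_cloud_cover=5,
)

print(json.dumps(body, indent=2))
```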

Bob plots the footprints of the discovered Sentinel-2 STAC Items:

In [8]:
center = ((bbox[1] + bbox[3]) / 2, (bbox[0] + bbox[2]) / 2)

m = Map(center=center, zoom=8)

for item in list(query.item_collection()):
    geo_json = GeoJSON(
        name=item.id,
        data=item.geometry,
        style={
            "opacity": 1,
            "dashArray": "9",
            "fillOpacity": 0.1,
            "weight": 1,
            "color": "blue",
        },
        hover_style={"color": "white", "dashArray": "0", "fillOpacity": 0.5},
    )
    m.add_layer(geo_json)

m

Map(center=[40.153000000000006, -121.0695], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_i…

Bob lists the self links of the STAC Items. These are the URLs of the Sentinel-2 STAC Items to process:

In [9]:
[item.get_self_href() for item in list(query.item_collection())]

['https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a/items/S2A_10TFK_20210708_0_L2A',
 'https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a/items/S2A_10TFK_20210708_1_L2A']

He then creates the parameters for running the Application Package (the `epsg` and `bands` input parameters have default values, set explicitly here):

In [10]:
params = Params(
    aoi=",".join([str(elem) for elem in bbox]),
    stac_items=[item.self_href for item in query.item_collection()],
    epsg="EPSG:4326",
    bands=["green", "nir"],
)

params.dict()

{'aoi': '-121.399,39.834,-120.74,40.472',
 'bands': ['green', 'nir'],
 'epsg': 'EPSG:4326',
 'stac_items': ['https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a/items/S2A_10TFK_20210708_0_L2A',
  'https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a/items/S2A_10TFK_20210708_1_L2A']}

Bob writes a YAML file with the parameters and their values:

In [11]:
with open("params-s2.yaml", "w") as file:
    print(to_yaml_str(params), file=file)

The file `params-s2.yaml` contains:

```yaml
aoi: -121.399,39.834,-120.74,40.472
bands:
- green
- nir
epsg: EPSG:4326
stac_items:
- https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a/items/S2A_10TFK_20210708_0_L2A
- https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a/items/S2A_10TFK_20210708_1_L2A
```

Bob uses the Python API of the CWL runner `cwltool` to script the Application Package execution:

In [12]:
parsed_args = argparse.Namespace(
    podman=podman,
    parallel=True,
    debug=False,
    outdir="./runs",
    workflow=app_package["url"],
    job_order=["params-s2.yaml"],
)

stream_out = StringIO()
stream_err = StringIO()

res = main(
    args=parsed_args,
    stdout=stream_out,
)

assert res == 0

INFO /data/work/open-reproducible-app-package/env_reproducible_app/lib/python3.9/site-packages/ipykernel_launcher.py 3.1.20231020140205
INFO [workflow ] starting step node_water_bodies
INFO [workflow ] start
INFO [step node_water_bodies] start
INFO [workflow node_water_bodies] start
INFO [workflow node_water_bodies] starting step node_crop
INFO [step node_crop] start
INFO [step node_crop] start
INFO [step node_water_bodies] start
INFO [workflow node_water_bodies_2] starting step node_crop_2
INFO [workflow node_water_bodies_2] start
INFO [step node_crop_2] start
INFO [step node_crop_2] start
INFO [job node_crop] /tmp/h0vl2wbo$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/h0vl2wbo,target=/EqECVR \
    --mount=type=bind,source=/tmp/6gt147fu,target=/tmp \
    --workdir=/EqECVR \
    --read-only=true \
    --user=1000:1000
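
For comparison, the same run can be expressed as a `cwltool` command line. The sketch below only assembles the argument list from the options used in the notebook (`--podman`, `--parallel`, `--outdir`); the helper name `cwltool_command` is hypothetical, and the workflow is referenced by its file name rather than the full release URL.

```python
import shlex


def cwltool_command(workflow, job_order, outdir="./runs", podman=False, parallel=True):
    """Assemble a cwltool command line as a list of arguments."""
    command = ["cwltool"]
    if podman:
        command.append("--podman")
    if parallel:
        command.append("--parallel")
    # Options come first, then the workflow and the job order file
    command += ["--outdir", outdir, workflow, job_order]
    return command


cmd = cwltool_command(
    "app-water-bodies-cloud-native.1.0.0.cwl",
    "params-s2.yaml",
    podman=True,
)

print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` when running outside a notebook.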

This execution prints a JSON document listing all files produced, captured here in `stream_out`.

The JSON is keyed by the outputs defined in the CWL workflow. The key of the first output can be obtained with:

```python
os.path.basename(app_package["cwl"].outputs[0].id)
```

In [13]:
results = json.loads(stream_out.getvalue())

results[os.path.basename(app_package["cwl"].outputs[0].id)]

{'location': 'file:///data/work/open-reproducible-app-package/runs/i6jddzuk',
 'basename': 'i6jddzuk',
 'class': 'Directory',
 'listing': [{'class': 'Directory',
   'location': 'file:///data/work/open-reproducible-app-package/runs/i6jddzuk/S2A_10TFK_20210708_1_L2A',
   'basename': 'S2A_10TFK_20210708_1_L2A',
   'listing': [{'class': 'File',
     'location': 'file:///data/work/open-reproducible-app-package/runs/i6jddzuk/S2A_10TFK_20210708_1_L2A/otsu.tif',
     'basename': 'otsu.tif',
     'checksum': 'sha1$eac01b59127e027758b6b4add8c4b7cb6475ba42',
     'size': 286925,
     'path': '/data/work/open-reproducible-app-package/runs/i6jddzuk/S2A_10TFK_20210708_1_L2A/otsu.tif'},
    {'class': 'File',
     'location': 'file:///data/work/open-reproducible-app-package/runs/i6jddzuk/S2A_10TFK_20210708_1_L2A/S2A_10TFK_20210708_1_L2A.json',
     'basename': 'S2A_10TFK_20210708_1_L2A.json',
     'checksum': 'sha1$34642536318a6ca807ab038cc55685082d22825c',
     'size': 4889,
     'path': '/data/work/
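
The output object nests `Directory` entries inside `listing` arrays, so files can sit several levels deep. A small recursive generator, sketched below, walks such a structure and yields every `File` entry. The function name `iter_files` and the sample paths are hypothetical.

```python
def iter_files(entry):
    """Yield every File object nested under a CWL File or Directory object."""
    if entry.get("class") == "File":
        yield entry
    # Directories carry their children in "listing"; Files have none
    for child in entry.get("listing", []):
        yield from iter_files(child)


# Hand-written sample shaped like the cwltool output above
sample_output = {
    "class": "Directory",
    "basename": "run",
    "listing": [
        {"class": "File", "basename": "catalog.json", "path": "/tmp/run/catalog.json"},
        {
            "class": "Directory",
            "basename": "item-1",
            "listing": [
                {"class": "File", "basename": "otsu.tif", "path": "/tmp/run/item-1/otsu.tif"},
            ],
        },
    ],
}

paths = [entry["path"] for entry in iter_files(sample_output)]
print(paths)
```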

Bob writes a short snippet to find the STAC Catalog path and then lists the contents of that STAC Catalog:

In [14]:
cat = pystac.read_file(
    [
        listing["path"]
        for listing in results[os.path.basename(app_package["cwl"].outputs[0].id)][
            "listing"
        ]
        if "catalog.json" in listing["path"]
    ][0]
)

cat.describe()

* <Catalog id=catalog>
  * <Item id=S2A_10TFK_20210708_0_L2A>
  * <Item id=S2A_10TFK_20210708_1_L2A>


Bob uses the `pystac` library to open the first STAC Item produced:

In [15]:
item = next(cat.get_items())
item

Bob gets the path of the asset produced by the Otsu step:

In [16]:
asset_href = item.get_assets()["data"].get_absolute_href()

asset_href

'/data/work/open-reproducible-app-package/runs/i6jddzuk/S2A_10TFK_20210708_0_L2A/otsu.tif'

Bob applies the sieve algorithm and then vectorizes the water bodies.

Finally, the water bodies are added to a map:

In [17]:
# Define the threshold size to remove small features (in pixels)
threshold = 100  # Adjust this threshold as needed
connectivity = 4  # Use 4-connected pixels for the sieve operation

center = ((bbox[1] + bbox[3]) / 2, (bbox[0] + bbox[2]) / 2)

m = Map(center=center, zoom=8)

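with rasterio.open(asset_href) as src:
    # Remove connected regions smaller than `threshold` pixels
    result = sieve(src, threshold, connectivity=connectivity)
    # Note: dataset_features reads from `src`, so the sieved `result` is not what gets vectorized here
    for geom in dataset_features(src, band=True, as_mask=True):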
        geo_json = GeoJSON(
            name="",
            data=geom,
            style={
                "opacity": 1,
                "fillOpacity": 0.1,
                "weight": 1,
                "color": "red",
            },
            hover_style={"color": "red", "dashArray": "0", "fillOpacity": 0.5},
        )
        m.add_layer(geo_json)
m

Map(center=[40.153000000000006, -121.0695], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_i…
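
To illustrate what a sieve does, here is a simplified pure-Python sketch on a binary mask: it finds 4-connected regions of water pixels and clears those smaller than a minimum size. This is not rasterio's implementation, which relies on GDAL and replaces small regions with the value of their largest neighbour; the function name `remove_small_regions` and the sample mask are hypothetical.

```python
from collections import deque


def remove_small_regions(mask, min_size):
    """Clear 4-connected regions of 1s smaller than min_size in a binary grid."""
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_size:
                for y, x in region:
                    out[y][x] = 0
    return out


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]

cleaned = remove_small_regions(mask, min_size=2)
for row in cleaned:
    print(row)
```

The four-pixel block survives while the isolated pixel in the second row is cleared, which is the effect the `threshold` parameter has on small water features.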

Bob sends an email to Alice:

<hr>
from: bob@acme.io

to: alice@acme.io

subject: RE: Detecting water bodies with NDWI and the Otsu threshold


Hi Alice!

The results look promising!

Cheers,

Bob
<hr>