# How to prepare data for download

To prepare geospatial data for download, the data must be logged in the project context.

## 1. Initialize the project

Create the working context: a data download project, if one has not been created already. A project is a container for the code, the data, and the management of the data operations.

In [None]:
import digitalhub as dh
PROJECT_NAME = "<YOUR_PROJECT_NAME>"
proj = dh.get_or_create_project(PROJECT_NAME)

## 2. Preparation guidelines and examples

This section describes required fields, recommended practices, and example payloads (stringified JSON) to prepare Sentinel data downloads. Use these payloads as the `string_dict_data` argument when calling the download function in this notebook.
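Hand-written JSON strings are easy to break with a stray comma or quote. As a minimal sketch, assuming the download function accepts any valid JSON string with the fields used in this notebook, the payload can be built as a Python dictionary and serialised with `json.dumps`. The helper name `make_payload` is hypothetical and not part of the project code.

```python
import json

def make_payload(satellite_params, start_date, end_date, geometry, artifact_name, **extra):
    """Serialise download parameters into a JSON string usable as `string_dict_data`."""
    payload = {
        "satelliteParams": satellite_params,
        "startDate": start_date,
        "endDate": end_date,
        "geometry": geometry,
        "artifact_name": artifact_name,
    }
    payload.update(extra)  # optional fields such as cloudCover or area_sampling
    return json.dumps(payload, indent=1)

# Values copied from the Sentinel-2 post-flood example in this notebook.
example_payload = make_payload(
    {"satelliteType": "Sentinel2", "processingLevel": "S2MSI2A", "bandmath": ["NDWI"]},
    "2020-10-02",
    "2020-10-22",
    "POLYGON ((10.644988646837982 45.85539621678084, 10.644988646837982 46.06780100571985, 10.991744628283294 46.06780100571985, 10.991744628283294 45.85539621678084, 10.644988646837982 45.85539621678084))",
    "sentinel2_post_flood",
    cloudCover="[0,20]",
    area_sampling="True",
    tmp_path_same_folder_dwl="True",
    preprocess_data_only="false",
)
example_args = ["main.py", example_payload]
```

The resulting `example_args` list has the same shape as the `list_args` variable used in the run cells of this notebook.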

As indicated in the README.MD file, register and store the Copernicus CDSETOOL credentials in the project context as shown below.

In [None]:
proj.set_secret("CDSETOOL_ESA_USER", "<YOUR_COPERNICUS_USERNAME>")
proj.set_secret("CDSETOOL_ESA_PASSWORD", "<YOUR_COPERNICUS_PASSWORD>")

The following sections give usage examples for different kinds of geospatial data.

### 1- Natural disaster events data

This section demonstrates the data-fetch process for natural disasters such as geological hazards, alluvial events, or floods. Starting from a given event date, pre- and post-event windows are computed for processing and analysis. The data for those dates (ISO format) is requested with separate payloads (one per sensor and window), each with a distinct `artifact_name`, and logged in the project context.

In [None]:
# Example payload (stringified JSON)
{
 "satelliteParams":{
    "satelliteType": "Sentinel2",
    "processingLevel": "S2MSI2A",
	"bandmath": ["NDWI"]
 },
 "startDate": "2020-10-02",
 "endDate": "2020-10-22",
 "geometry": "POLYGON ((10.644988646837982 45.85539621678084, 10.644988646837982 46.06780100571985, 10.991744628283294 46.06780100571985, 10.991744628283294 45.85539621678084, 10.644988646837982 45.85539621678084))",
 "cloudCover": "[0,20]",
 "area_sampling": "True",
 "tmp_path_same_folder_dwl": "True",
 "artifact_name": "sentinel2_post_event",
 "preprocess_data_only": "false"
 }
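The `geometry` field in the payload above is a WKT polygon describing a rectangular area of interest, with vertices in longitude-latitude order and the first vertex repeated at the end to close the ring. A small sketch of how such a string could be produced from a bounding box follows; the helper `bbox_to_wkt` is hypothetical and only illustrates the format.

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT polygon (lon lat order) for a bounding box."""
    corners = [
        (min_lon, min_lat),
        (min_lon, max_lat),
        (max_lon, max_lat),
        (max_lon, min_lat),
        (min_lon, min_lat),  # repeat the first vertex to close the ring
    ]
    return "POLYGON ((" + ", ".join(f"{lon} {lat}" for lon, lat in corners) + "))"

# Bounding box of the flood area used in this notebook.
flood_geometry = bbox_to_wkt(
    10.644988646837982, 45.85539621678084,
    10.991744628283294, 46.06780100571985,
)
```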

Example workflow: for flood analysis, four datasets are prepared (`sentinel2_pre_flood`, `sentinel2_post_flood`, `sentinel1_GRD_preflood`, `sentinel1_GRD_postflood`): Sentinel-2 imagery for the 20-day pre- and post-event windows, and Sentinel-1 SAR for the 7-day pre- and post-event windows. Accordingly, the section below demonstrates four separate runs of the "download-sentinel-data" function, one per sensor and time window.
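The pre- and post-event windows can be derived from the event date rather than typed by hand. The sketch below assumes an event date of 2020-10-02, which is inferred from the start and end dates of the payloads in this notebook; `event_windows` is a hypothetical helper.

```python
from datetime import date, timedelta

def event_windows(event_date, days):
    """Return ISO (start, end) pairs for the windows before and after an event."""
    event = date.fromisoformat(event_date)
    delta = timedelta(days=days)
    return {
        "pre": ((event - delta).isoformat(), event.isoformat()),
        "post": (event.isoformat(), (event + delta).isoformat()),
    }

s2_windows = event_windows("2020-10-02", 20)  # Sentinel-2: 20-day windows
s1_windows = event_windows("2020-10-02", 7)   # Sentinel-1: 7-day windows
```

Each pair maps directly to the `startDate` and `endDate` fields of one payload.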

Fetch the "download-sentinel-data" operation in the project. 

In [None]:
function_data = proj.get_function("download-sentinel-data")

#### Post Flood Sentinel2 Data +20days

The parameters passed for the Sentinel-2 download include the start and end dates of the 20-day window after the flood event. The output of this step is logged in the platform project context under the name given by the `artifact_name` parameter ("sentinel2_post_flood"). Other parameters, such as the geometry or the cloud cover percentage, can be configured as required.

In [None]:
# Example payload (stringified JSON)

string_dict_data = """{
 "satelliteParams":{
    "satelliteType": "Sentinel2",
    "processingLevel": "S2MSI2A",
	"bandmath": ["NDWI"]
 },
 "startDate": "2020-10-02",
 "endDate": "2020-10-22",
 "geometry": "POLYGON ((10.644988646837982 45.85539621678084, 10.644988646837982 46.06780100571985, 10.991744628283294 46.06780100571985, 10.991744628283294 45.85539621678084, 10.644988646837982 45.85539621678084))",
 "cloudCover": "[0,20]",
 "area_sampling": "True",
 "tmp_path_same_folder_dwl": "True",
 "artifact_name": "sentinel2_post_flood",
 "preprocess_data_only": "false"
 }"""
list_args =  ["main.py",string_dict_data]

Run the function. As a result, the post-flood Sentinel-2 data is logged as the project artifact "sentinel2_post_flood".

In [None]:
func_run = function_data.run(action="job",
secrets=["CDSETOOL_ESA_USER","CDSETOOL_ESA_PASSWORD"],
fs_group="8877",
args=list_args,
resources={"cpu": "6","mem": "32Gi"},
volumes=[{
    "volume_type": "persistent_volume_claim",
    "name": "volume-flood",
    "mount_path": "/app/files",
    "spec": {
        "size": "100Gi"
        }
    }]
)
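Assuming the job runs asynchronously on the platform, the notebook may need to wait for it to finish before the artifact is used. The following sketch polls a state-returning callable until a terminal state is reached. The state names and the `refresh().status.state` access shown in the usage comment are assumptions about the SDK and should be checked against its documentation.

```python
import time

def wait_for_state(get_state, terminal_states=("COMPLETED", "ERROR", "STOPPED"),
                   interval=30.0, timeout=3600.0):
    """Poll `get_state()` until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in terminal_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in state {state!r} after {timeout} s")
        time.sleep(interval)

# Assumed usage with the run submitted above (requires the platform):
# final_state = wait_for_state(lambda: func_run.refresh().status.state)
```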

#### Pre Flood Sentinel2 Data -20 days

In [None]:
string_dict_data = """{
     "satelliteParams":{
        "satelliteType": "Sentinel2",
        "processingLevel": "S2MSI2A",
    	"bandmath": ["NDWI"]
     },
     "startDate": "2020-09-12",
     "endDate": "2020-10-02",
     "geometry": "POLYGON ((10.644988646837982 45.85539621678084, 10.644988646837982 46.06780100571985, 10.991744628283294 46.06780100571985, 10.991744628283294 45.85539621678084, 10.644988646837982 45.85539621678084))",
     "cloudCover": "[0,20]",
     "area_sampling": "True",
     "tmp_path_same_folder_dwl": "True",
     "artifact_name": "sentinel2_pre_flood",
     "preprocess_data_only": "false"
     }"""
list_args =  ["main.py",string_dict_data]

Run the function again. As a result, the pre-flood Sentinel-2 data is logged as the project artifact "sentinel2_pre_flood".

In [None]:
run = function_data.run(action="job",
    secrets=["CDSETOOL_ESA_USER","CDSETOOL_ESA_PASSWORD"],
    fs_group="8877",
    args=list_args,
    resources={"cpu": "6","mem": "32Gi"},
    volumes=[{
        "volume_type": "persistent_volume_claim",
        "name": "volume-flood",
        "mount_path": "/app/files",
        "spec": {
            "size": "100Gi"
            }
    }]
    )

#### Post Flood Sentinel1 Data +7days

The parameters passed for the Sentinel-1 download include the start and end dates of the 7-day window after the flood event. The output of this step is logged in the platform project context under the name given by the `artifact_name` parameter ("sentinel1_GRD_postflood"). Other parameters, such as the geometry, sensor mode, or product type, can be configured as required.

In [None]:
string_dict_data = """{
"satelliteParams": {
  "satelliteType": "Sentinel1",
  "processingLevel": "LEVEL1",
  "sensorMode": "IW",
  "productType": "GRD"
},
"startDate": "2020-10-02",
"endDate": "2020-10-09",
"geometry": "POLYGON ((10.644988646837982 45.85539621678084, 10.644988646837982 46.06780100571985, 10.991744628283294 46.06780100571985, 10.991744628283294 45.85539621678084, 10.644988646837982 45.85539621678084))",
"area_sampling": "True",
"tmp_path_same_folder_dwl":"True",
"artifact_name": "sentinel1_GRD_postflood"
}"""
list_args =  ["main.py",string_dict_data]

Run the function. As a result, the post-flood Sentinel-1 data is logged as the project artifact "sentinel1_GRD_postflood".

In [None]:
run = function_data.run(action="job",
    secrets=["CDSETOOL_ESA_USER","CDSETOOL_ESA_PASSWORD"],
    fs_group="8877",
    args=list_args,
    resources={"cpu": "6","mem": "32Gi"},
    volumes=[{
        "volume_type": "persistent_volume_claim",
        "name": "volume-flood",
        "mount_path": "/app/files",
        "spec": {
            "size": "100Gi"
        }}])

#### Pre Flood Sentinel1 Data -7days

Similarly, download the Sentinel-1 data for the 7 days before the flood event.

In [None]:
string_dict_data = """{
  "satelliteParams": {
          "satelliteType": "Sentinel1",
          "processingLevel": "LEVEL1",
          "sensorMode": "IW",
          "productType": "GRD"
      },
      "startDate": "2020-09-25",
      "endDate": "2020-10-02",
      "geometry": "POLYGON ((10.644988646837982 45.85539621678084, 10.644988646837982 46.06780100571985, 10.991744628283294 46.06780100571985, 10.991744628283294 45.85539621678084, 10.644988646837982 45.85539621678084))",
      "area_sampling": "True",
      "tmp_path_same_folder_dwl":"True",
      "artifact_name": "sentinel1_GRD_preflood"
  }"""
list_args =  ["main.py",string_dict_data]

In [None]:
run = function_data.run(action="job",
    secrets=["CDSETOOL_ESA_USER","CDSETOOL_ESA_PASSWORD"],
    fs_group="8877",
    args=list_args,
    resources={"cpu": "6","mem": "32Gi"},
    volumes=[{
        "volume_type": "persistent_volume_claim",
        "name": "volume-flood",
        "mount_path": "/app/files",
        "spec": {
            "size": "100Gi"
        }}])
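The four run cells above repeat the same arguments and differ only in the payload. As a sketch, the shared settings could be collected in one helper and unpacked into `function_data.run`. The helper name `job_kwargs` is hypothetical; the values are copied from the cells in this notebook.

```python
def job_kwargs(list_args, volume_name, size="100Gi", tmpdir=None):
    """Build the keyword arguments shared by the download job runs."""
    kwargs = {
        "action": "job",
        "secrets": ["CDSETOOL_ESA_USER", "CDSETOOL_ESA_PASSWORD"],
        "fs_group": "8877",
        "args": list_args,
        "resources": {"cpu": "6", "mem": "32Gi"},
        "volumes": [{
            "volume_type": "persistent_volume_claim",
            "name": volume_name,
            "mount_path": "/app/files",
            "spec": {"size": size},
        }],
    }
    if tmpdir is not None:
        # Optional: point temporary files at the mounted volume.
        kwargs["envs"] = [{"name": "TMPDIR", "value": tmpdir}]
    return kwargs

# Assumed usage (requires the platform):
# run = function_data.run(**job_kwargs(list_args, "volume-flood"))
```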

Another example of a natural disaster event is the geological hazard or landslide monitoring scenario. For landslide monitoring, the "download-sentinel-data" function is used to fetch Sentinel-1 SLC scenes that cover the specified AOI geometry and time window. Retrieved images are split by orbit direction: ascending acquisitions are stored in an ascending artifact and descending acquisitions in a descending artifact. This allows independent processing workflows (e.g., SLC stacking or InSAR) for each orbit geometry. The example cells below submit one job per orbit direction and save the results as project artifacts.
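The two landslide payloads differ only in orbit direction and relative orbit number, so they can be generated from one template. The sketch below uses the AOI and relative orbit numbers from the cells that follow, and assumes the 2020-10-01 to 2020-10-14 window for both directions; `landslide_payload` is a hypothetical helper.

```python
import json

AOI = "POLYGON ((10.595369 45.923394, 10.644894 45.923394, 10.644894 45.945838, 10.595369 45.945838, 10.595369 45.923394))"
START, END = "2020-10-01", "2020-10-14"
ORBITS = {"ASCENDING": "117", "DESCENDING": "168"}  # direction -> relative orbit

def landslide_payload(direction, relative_orbit):
    """Return (artifact_name, JSON payload) for one orbit direction."""
    name = f"s1_{direction.lower()}_landslide_{START}_{END}"
    payload = {
        "satelliteParams": {
            "satelliteType": "Sentinel1",
            "processingLevel": "LEVEL1",
            "sensorMode": "IW",
            "productType": "SLC",
            "orbitDirection": direction,
            "relativeOrbitNumber": relative_orbit,
        },
        "startDate": START,
        "endDate": END,
        "geometry": AOI,
        "area_sampling": "True",
        "tmp_path_same_folder_dwl": "True",
        "artifact_name": name,
    }
    return name, json.dumps(payload)

landslide_payloads = {d: landslide_payload(d, o) for d, o in ORBITS.items()}
```

Deriving the artifact name from the same date variables keeps the name and the requested window consistent.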

#### Landslide Sentinel 1 Data acquisition (Ascending)

In [None]:
s1_ascending = "s1_ascending_landslide_2020-10-01_2020-10-14"
startDate = "2020-10-01"
endDate = "2020-10-14"
geometry = "POLYGON ((10.595369 45.923394, 10.644894 45.923394, 10.644894 45.945838, 10.595369 45.945838, 10.595369 45.923394))"
string_dict_data_asc = """{
    "satelliteParams":{
        "satelliteType": "Sentinel1",
        "processingLevel": "LEVEL1",
        "sensorMode": "IW",
        "productType": "SLC",
        "orbitDirection": "ASCENDING",
        "relativeOrbitNumber": "117"
        },
    "startDate": \"""" + startDate + """\",
    "endDate": \"""" + endDate + """\",
    "geometry": \"""" + geometry  + """\",
    "area_sampling": "True",
    "tmp_path_same_folder_dwl":"True",
    "artifact_name": \"""" + s1_ascending + """\"
    }
"""
list_args =  ["main.py",string_dict_data_asc]

In [None]:
run = function_data.run(action="job",
    secrets=["CDSETOOL_ESA_USER","CDSETOOL_ESA_PASSWORD"],
    fs_group="8877",
    args=list_args,
    resources={"cpu": "6","mem": "32Gi"},
    envs=[{"name": "TMPDIR", "value": "/app/files"}],
    volumes=[{
        "volume_type": "persistent_volume_claim",
        "name": "volume-land",
        "mount_path": "/app/files",
        "spec": {
            "size": "100Gi"
        }}]
    )

#### Landslide Sentinel 1 Data acquisition (Descending)

In [None]:
s1_descending = "s1_descending_landslide_2020-10-01_2020-10-14"
startDate = "2020-10-01"
endDate = "2020-10-14"
geometry = "POLYGON ((10.595369 45.923394, 10.644894 45.923394, 10.644894 45.945838, 10.595369 45.945838, 10.595369 45.923394))"
string_dict_data_des = """{
    "satelliteParams":{
        "satelliteType": "Sentinel1",
        "processingLevel": "LEVEL1",
        "sensorMode": "IW",
        "productType": "SLC",
        "orbitDirection": "DESCENDING",
        "relativeOrbitNumber": "168"
        },
    "startDate": \"""" + startDate + """\",
    "endDate": \"""" + endDate + """\",
    "geometry": \"""" + geometry + """\",
    "tmp_path_same_folder_dwl":"True",
    "area_sampling": "True",
    "artifact_name": \"""" + s1_descending + """\"
    }"""
list_args =  ["main.py",string_dict_data_des]

In [None]:
run = function_data.run(action="job",
    secrets=["CDSETOOL_ESA_USER","CDSETOOL_ESA_PASSWORD"],
    fs_group="8877",
    args=list_args,
    resources={"cpu": "6","mem": "32Gi"},
    envs=[{"name": "TMPDIR", "value": "/app/files"}],
    volumes=[{
        "volume_type": "persistent_volume_claim",
        "name": "volume-land",
        "mount_path": "/app/files",
        "spec": {
            "size": "100Gi"
        }}]
    )

### 2- Environmental degradation data


This section demonstrates the data-fetch process for environmental degradation monitoring (e.g., deforestation, vegetation loss). It explains required inputs, provides a sample payload for Sentinel-2, and lists recommended steps and best practices to obtain consistent baseline and change-detection datasets.

Purpose
- Acquire time-series optical imagery to compute vegetation and burn/change indices (NDVI, NBR, BSI) for baseline and monitoring windows.
- Produce artifacts per time window (e.g., baseline, disturbance, post-disturbance) to support analysis (change detection, classification, trend analysis).
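For reference, the three indices named above follow standard definitions based on surface reflectance. The sketch below shows the formulas for single pixel values, using the usual Sentinel-2 band assignment (Blue B02, Red B04, NIR B08, SWIR1 B11, SWIR2 B12). It illustrates the definitions only; how the download function computes `bandmath` internally is not shown in this notebook.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)

def nbr(nir, swir2):
    """Normalized Burn Ratio: (NIR - SWIR2) / (NIR + SWIR2)."""
    return normalized_difference(nir, swir2)

def bsi(blue, red, nir, swir1):
    """Bare Soil Index: ((SWIR1 + Red) - (NIR + Blue)) / ((SWIR1 + Red) + (NIR + Blue))."""
    return normalized_difference(swir1 + red, nir + blue)

# Illustrative reflectance values for a densely vegetated pixel (made up for the example).
pixel = {"blue": 0.04, "red": 0.05, "nir": 0.45, "swir1": 0.25, "swir2": 0.15}
vegetation_index = ndvi(pixel["nir"], pixel["red"])
```

All three indices range from -1 to 1 for non-negative reflectances; high NDVI indicates dense vegetation, while a drop in NDVI or NBR between windows points to vegetation loss or burn scars.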

In [None]:
# Example payload (stringified JSON)
{
    "satelliteParams": {
        "satelliteType": "Sentinel2",
        "processingLevel": "S2MSI2A",
        "bandmath": ["NDVI","NBR","BSI"]
    },
    "startDate": "2018-06-01",
    "endDate": "2018-08-31",
    "geometry": "POLYGON ((...))",
    "cloudCover": "[0,20]",
    "area_sampling": "True",
    "tmp_path_same_folder_dwl": "True",
    "artifact_name": "sentinel2_envdeg_baseline",
    "preprocess_data_only": "false"
}


Example workflow: for deforestation analysis, temporal Sentinel-2 Level-2A imagery spanning one to two years is required to build monthly time series (e.g., NDVI, BSI), enable trend and seasonality modeling, and detect change events using methods like BFAST. The temporal dataset is prepared and logged as the artifact `data_s2_deforestation` in the project context. Accordingly, the section below demonstrates a run of the "download-sentinel-data" function.

Fetch the "download-sentinel-data" operation in the project. 

In [None]:
function_data = proj.get_function("download-sentinel-data")

In [None]:
string_dict_data = """{
 "satelliteParams":{
     "satelliteType": "Sentinel2"
 },
 "startDate": "2018-01-01",
 "endDate": "2018-12-31",
 "geometry": "POLYGON((10.968432350469937 46.093829019481056,10.968432350469937 46.09650743619973, 10.97504139531014 46.09650743619973,10.97504139531014 46.093829019481056, 10.968432350469937 46.093829019481056))",
 "area_sampling": "true",
 "cloudCover": "[0,5]",
 "artifact_name": "data_s2_deforestation"
 }"""

list_args =  ["main.py",string_dict_data]
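Before submitting a long-running job, it can help to sanity-check the payload locally. The sketch below parses the JSON, checks that the dates are ISO formatted and in order, and verifies that a simple single-ring polygon is closed. The list of mandatory fields is an assumption based on the examples in this notebook, and `check_payload` is a hypothetical helper.

```python
import json
from datetime import date

def check_payload(payload_str):
    """Return a list of problems found in a payload string (empty if none)."""
    payload = json.loads(payload_str)  # raises ValueError on malformed JSON
    problems = []

    # Fields assumed to be mandatory, based on the examples in this notebook.
    for key in ("satelliteParams", "startDate", "endDate", "geometry", "artifact_name"):
        if not payload.get(key):
            problems.append(f"missing or empty field: {key}")
    if problems:
        return problems

    # Dates must be ISO formatted (YYYY-MM-DD) and in chronological order.
    start = date.fromisoformat(payload["startDate"])
    end = date.fromisoformat(payload["endDate"])
    if start > end:
        problems.append("startDate is after endDate")

    # A WKT polygon ring is closed when its first and last vertices match.
    ring = payload["geometry"].partition("(")[2].strip("() ")
    points = [" ".join(p.split()) for p in ring.split(",")]
    if len(points) < 4 or points[0] != points[-1]:
        problems.append("geometry is not a closed polygon ring")
    return problems

# Assumed usage with the payload defined above:
# assert check_payload(string_dict_data) == []
```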

In [None]:
run = function_data.run(action="job",
    secrets=["CDSETOOL_ESA_USER","CDSETOOL_ESA_PASSWORD"],
    fs_group="8877",
    args=list_args,
    resources={"cpu": "6","mem": "32Gi"},
    envs=[{"name": "TMPDIR", "value": "/app/files"}],
    volumes=[{
        "volume_type": "persistent_volume_claim",
        "name": "volume-deforestation",
        "mount_path": "/app/files",
        "spec": {
            "size": "50Gi"
        }}]
    )