# How to prepare data for download

To prepare the geological data, the data must first be logged in the project context.

## 1. Initialize the project

Create the working context, a data download project, if it has not been created already. The project is a placeholder for the code, the data, and the management of the data operations.

In [None]:
import digitalhub as dh
PROJECT_NAME = "<YOUR_PROJECT_NAME>"
proj = dh.get_or_create_project(PROJECT_NAME)

## 2. Preparation guidelines and examples

This section describes required fields, recommended practices, and example payloads (stringified JSON) to prepare Sentinel data downloads. Use these payloads as the `string_dict_data` argument when calling the download function in this notebook.

Key rules
- Dates: use ISO format `YYYY-MM-DD`. Ensure `startDate <= endDate`.
- Geometry: provide WKT (e.g. `POLYGON(...)`) or GeoJSON geometry. Keep geometries reasonably simplified for large areas.
- cloudCover: specify an interval (e.g. `[0,5]`) meaning 0–5%. Use numeric values or stringified array depending on caller expectations.
- satelliteParams.satelliteType: common values `Sentinel2`, `Sentinel1`, `Landsat`.
- satelliteParams.bandmath: list of indices or formulas, e.g. `["NDWI"]`, `["NDVI"]`, or custom band expressions.
- area_sampling vs tile-based: `area_sampling: true` for sampling within polygon; use `false` or omit for tile-based downloads.
- artifact_name: unique name for the produced artifact (include sensor and date range).
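
As an illustration of the rules above, the following minimal sketch assembles a payload from Python values and serializes it with `json.dumps`, checking the date order and the cloud cover interval along the way. The helper name `build_payload` and its signature are hypothetical and not part of the download function; the field names are taken from the examples in this notebook.

```python
import json
from datetime import datetime


def _parse_day(value):
    # strptime raises ValueError if the string does not match year-month-day
    # or is not a real calendar date.
    return datetime.strptime(value, "%Y-%m-%d").date()


def build_payload(satellite_params, start_date, end_date, geometry,
                  artifact_name, cloud_cover=None, area_sampling=True):
    """Hypothetical helper: validate the inputs and return a JSON string."""
    if _parse_day(start_date) > _parse_day(end_date):
        raise ValueError(f"startDate {start_date} is after endDate {end_date}")

    payload = {
        "satelliteParams": satellite_params,
        "startDate": start_date,
        "endDate": end_date,
        "geometry": geometry,
        # The examples in this notebook pass booleans as strings ("True").
        "area_sampling": str(area_sampling),
        "artifact_name": artifact_name,
    }
    if cloud_cover is not None:
        low, high = cloud_cover
        if not 0 <= low <= high <= 100:
            raise ValueError(f"cloudCover {cloud_cover} is not within 0-100")
        # The examples pass the interval as a stringified array.
        payload["cloudCover"] = f"[{low},{high}]"
    return json.dumps(payload)


string_dict_data = build_payload(
    {"satelliteType": "Sentinel2", "processingLevel": "S2MSI2A", "bandmath": ["NDWI"]},
    "2020-10-02",
    "2020-10-22",
    "POLYGON ((10.64 45.85, 10.64 46.06, 10.99 46.06, 10.99 45.85, 10.64 45.85))",
    "sentinel2_post_flood",
    cloud_cover=(0, 20),
)
```

Building the string from a dictionary avoids quoting mistakes that are easy to make when the JSON is written by hand.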

Operational tips
- For large AOIs, split jobs by tiles or temporal ranges to avoid timeouts or huge data volumes.
- For time-series jobs, use monthly or seasonal batches.
- If you need SAR data (Sentinel-1), include polarization and processing parameters in `satelliteParams`.
- Provide required secrets (e.g. ESA credentials) and sufficient resources (cpu/mem) in the run invocation.
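
The temporal batching suggested above could be sketched as follows. The function `monthly_batches` is a hypothetical helper, not part of the download function; it splits an inclusive date range into calendar-month pieces, each of which could serve as the `startDate` and `endDate` of one job.

```python
from datetime import date, timedelta


def monthly_batches(start_date, end_date):
    """Split an inclusive ISO date range into per-month (start, end) pairs."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    batches = []
    while start <= end:
        # First day of the month following the current batch start.
        if start.month == 12:
            next_month = date(start.year + 1, 1, 1)
        else:
            next_month = date(start.year, start.month + 1, 1)
        # A batch ends on the last day of its month or on the overall end date.
        batch_end = min(end, next_month - timedelta(days=1))
        batches.append((start.isoformat(), batch_end.isoformat()))
        start = next_month
    return batches


batches = monthly_batches("2020-09-12", "2020-11-05")
for batch_start, batch_end in batches:
    print(batch_start, batch_end)
```

Each pair can then be combined with a distinct `artifact_name` that includes the sensor and the date range, as recommended in the key rules.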

The following sections provide usage examples for different kinds of data.

### 1- Flood data preparation and download

Fetch the 'download-sentinel-data' operation from the project.

In [None]:
function_data = proj.get_function('download-sentinel-data')

For flood analysis we process four datasets: Sentinel‑2 imagery for the 20‑day pre‑ and post‑event windows, and Sentinel‑1 SAR for the 7‑day pre‑ and post‑event windows. Accordingly, the section below demonstrates four separate runs of the 'download-sentinel-data' function — one per sensor/time window.
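
The four time windows can be derived from a single event date, as in the sketch below. The event date of 2020-10-02 is inferred from the payloads in this section rather than stated explicitly, and `flood_windows` is a hypothetical helper introduced only for illustration.

```python
from datetime import date, timedelta

# Assumption: event date inferred from the example payloads in this section.
EVENT_DATE = date(2020, 10, 2)
# Window length in days per sensor, as described above.
WINDOW_DAYS = {"sentinel2": 20, "sentinel1": 7}


def flood_windows(event_date, window_days):
    """Return {name: (startDate, endDate)} for pre- and post-event windows."""
    windows = {}
    for sensor, days in window_days.items():
        delta = timedelta(days=days)
        windows[f"{sensor}_pre"] = (
            (event_date - delta).isoformat(),
            event_date.isoformat(),
        )
        windows[f"{sensor}_post"] = (
            event_date.isoformat(),
            (event_date + delta).isoformat(),
        )
    return windows


windows = flood_windows(EVENT_DATE, WINDOW_DAYS)
for name, (start, end) in windows.items():
    print(name, start, end)
```

The resulting pairs match the `startDate` and `endDate` values used in the four payloads of this section.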

#### Post flood Sentinel2 data +20days

The parameters passed for the Sentinel-2 download include the start and end dates of the 20-day window following the flood event. The output of this step is logged in the platform project context under the name given by the 'artifact_name' parameter ('sentinel2_post_flood'). Several other parameters, such as the geometry and the cloud cover percentage, can be configured as required.

In [None]:
string_dict_data = """{
 "satelliteParams":{
    "satelliteType": "Sentinel2",
    "processingLevel": "S2MSI2A",
	"bandmath": ["NDWI"]
 },
 "startDate": "2020-10-02",
 "endDate": "2020-10-22",
 "geometry": "POLYGON ((10.644988646837982 45.85539621678084, 10.644988646837982 46.06780100571985, 10.991744628283294 46.06780100571985, 10.991744628283294 45.85539621678084, 10.644988646837982 45.85539621678084))",
 "cloudCover": "[0,20]",
 "area_sampling": "True",
 "artifact_name": "sentinel2_post_flood",
 "preprocess_data_only": "false"
 }"""

list_args =  ["main.py",string_dict_data]
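
Before submitting the job, it can help to check locally that a hand-written payload string is well-formed. The sketch below parses the string as JSON, confirms that a few fields are present, and verifies that a WKT polygon ring is closed. The function `check_payload` is hypothetical, the list of checked keys is an assumption based on the fields shared by the examples in this notebook, and the geometry check covers only single-ring WKT polygons, not GeoJSON.

```python
import json
import re

# Assumption: fields common to all example payloads in this notebook.
EXPECTED_KEYS = ("satelliteParams", "startDate", "endDate", "geometry", "artifact_name")


def check_payload(payload_str):
    """Parse the payload, run basic structural checks, and return the dict."""
    # json.loads raises json.JSONDecodeError on invalid JSON,
    # for example when single quotes are used around keys or values.
    payload = json.loads(payload_str)

    for key in EXPECTED_KEYS:
        if key not in payload:
            raise KeyError(f"missing field: {key}")

    # Extract the coordinate pairs of a single-ring WKT polygon.
    match = re.fullmatch(r"\s*POLYGON\s*\(\((.+)\)\)\s*", payload["geometry"])
    if match is None:
        raise ValueError("geometry is not a single-ring WKT POLYGON")
    points = [
        tuple(float(v) for v in pair.split())
        for pair in match.group(1).split(",")
    ]
    # A valid ring has at least four points and repeats its first point last.
    if len(points) < 4 or points[0] != points[-1]:
        raise ValueError("polygon ring must be closed and have at least 4 points")
    return payload


example = """{
 "satelliteParams": {"satelliteType": "Sentinel2"},
 "startDate": "2020-10-02",
 "endDate": "2020-10-22",
 "geometry": "POLYGON ((10.64 45.85, 10.64 46.06, 10.99 46.06, 10.99 45.85, 10.64 45.85))",
 "artifact_name": "sentinel2_post_flood"
}"""
checked = check_payload(example)
```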

Run the function. As a result, the post-flood Sentinel-2 data is logged as the project artifact 'sentinel2_post_flood'.

In [None]:
run = function_data.run(action="job",
        secrets=["CDSETOOL_ESA_USER","CDSETOOL_ESA_PASSWORD"],
        fs_group='8877',
        args=list_args,
        resources={"mem":{"requests": "32Gi", "limits": "64Gi"}},
        volumes=[{
            "volume_type": "persistent_volume_claim",
            "name": "volume-flood",
            "mount_path": "/app/files",
            "spec": {
                "size": "100Gi"
            }}])

#### Pre flood Sentinel2 data -20 days

In [None]:
string_dict_data = """{
     "satelliteParams":{
        "satelliteType": "Sentinel2",
        "processingLevel": "S2MSI2A",
    	"bandmath": ["NDWI"]
     },
     "startDate": "2020-09-12",
     "endDate": "2020-10-02",
     "geometry": "POLYGON ((10.644988646837982 45.85539621678084, 10.644988646837982 46.06780100571985, 10.991744628283294 46.06780100571985, 10.991744628283294 45.85539621678084, 10.644988646837982 45.85539621678084))",
     "cloudCover": "[0,20]",
     "area_sampling": "True",
     "artifact_name": "sentinel2_pre_flood",
     "preprocess_data_only": "false"
     }"""

list_args =  ["main.py",string_dict_data]

Run the function again. As a result, the pre-flood Sentinel-2 data is logged as the project artifact 'sentinel2_pre_flood'.

In [None]:
run = function_data.run(action="job",
        secrets=["CDSETOOL_ESA_USER","CDSETOOL_ESA_PASSWORD"],
        fs_group='8877',
        args=list_args,
        resources={"mem":{"requests": "32Gi", "limits": "64Gi"}},
        volumes=[{
            "volume_type": "persistent_volume_claim",
            "name": "volume-flood",
            "mount_path": "/app/files",
            "spec": {
                "size": "100Gi"
            }}])

#### Post flood Sentinel1 data +7days

The parameters passed for the Sentinel-1 download include the start and end dates of the 7-day window following the flood event date. The output of this step is logged in the platform project context under the name given by the 'artifact_name' parameter ('sentinel1_GRD_postflood'). Several other parameters, such as the geometry and the sensor mode, can be configured as required.

In [None]:
string_dict_data = """{
  "satelliteParams": {
          "satelliteType": "Sentinel1",
          "processingLevel": "LEVEL1",
          "sensorMode": "IW",
          "productType": "GRD"
      },
      "startDate": "2020-10-02",
      "endDate": "2020-10-09",
      "geometry": "POLYGON ((10.644988646837982 45.85539621678084, 10.644988646837982 46.06780100571985, 10.991744628283294 46.06780100571985, 10.991744628283294 45.85539621678084, 10.644988646837982 45.85539621678084))",
      "area_sampling": "True",
      "tmp_path_same_folder_dwl": "True",
      "artifact_name": "sentinel1_GRD_postflood"
  }"""
list_args =  ["main.py",string_dict_data]

Run the function. As a result, the post-flood Sentinel-1 data is logged as the project artifact 'sentinel1_GRD_postflood'.

In [None]:
run = function_data.run(action="job",
        secrets=["CDSETOOL_ESA_USER","CDSETOOL_ESA_PASSWORD"],
        fs_group='8877',
        args=list_args,
        volumes=[{
            "volume_type": "persistent_volume_claim",
            "name": "volume-flood",
            "mount_path": "/app/files",
            "spec": {
                "size": "100Gi"
            }}])

#### Pre flood Sentinel1 data -7days
Similarly, download the Sentinel-1 data for the 7 days before the flood event.

In [None]:
string_dict_data = """{
  "satelliteParams": {
          "satelliteType": "Sentinel1",
          "processingLevel": "LEVEL1",
          "sensorMode": "IW",
          "productType": "GRD"
      },
      "startDate": "2020-09-25",
      "endDate": "2020-10-02",
      "geometry": "POLYGON ((10.644988646837982 45.85539621678084, 10.644988646837982 46.06780100571985, 10.991744628283294 46.06780100571985, 10.991744628283294 45.85539621678084, 10.644988646837982 45.85539621678084))",
      "area_sampling": "True",
      "tmp_path_same_folder_dwl": "True",
      "artifact_name": "sentinel1_GRD_preflood"
  }"""

# s3 path is not mandatory

list_args =  ["main.py",string_dict_data]

In [None]:
run = function_data.run(action="job",
        secrets=["CDSETOOL_ESA_USER","CDSETOOL_ESA_PASSWORD"],
        fs_group='8877',
        args=list_args,
        volumes=[{
            "volume_type": "persistent_volume_claim",
            "name": "volume-flood",
            "mount_path": "/app/files",
            "spec": {
                "size": "100Gi"
            }}])

### Landslide data preparation example

In [None]:
# TODO: add the landslide data preparation example

### Deforestation data preparation example

In [None]:
# TODO: add the deforestation data preparation example