## EO Data Extraction Workflow
This notebook demonstrates a streamlined workflow for extracting and processing Earth Observation (EO) data 
using the **openEO** Python client. 

### Key Steps:
1. Load and align input data (shapefile).
2. Generate UTM-aligned patches for analysis.
3. Split patches into smaller manageable jobs.
4. Run the extraction process using openEO backends.
5. View and analyze the outputs (e.g., NetCDF files).

### Required Libraries:
- `openeo` for interacting with EO backends.
- `openeo-gfmap` for handling geospatial data.
- `geopandas` for reading the input shapefile.

### Step 1: Load the Shapefile

In [None]:
import openeo
import geopandas as gpd
from pathlib import Path

# Import functions from the cleaned-up module
from helper.eo_utils import (
    generate_patches_by_crs,
    process_split_jobs,
    create_job_dataframe
)

# Filepath to the input shapefile
file_path = Path(r"C:\Git_projects\WAC\production\resources\Land_use_Roads_tile.shp")

# Load input shapefile as GeoDataFrame
base_df = gpd.read_file(file_path)
print(base_df.head())  # Preview the input data

### Step 2: Generate Patches

Here we create as many UTM-aligned patches as fit within the provided polygon, based on the following parameters:
- **Patch size**: Edge length of each patch in pixels.
- **Resolution**: Alignment resolution in meters. With a patch size of 64 pixels and a resolution of 10 m, each patch edge spans 640 m.
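
To make the alignment idea concrete, here is a minimal sketch of how patches could be snapped to a resolution grid. It is not the implementation of `generate_patches_by_crs`: the function `aligned_patches` is hypothetical, it tiles a bounding box rather than a polygon, and it assumes the coordinates are already in a projected (UTM) CRS with units in meters.

```python
import math


def aligned_patches(xmin, ymin, xmax, ymax, patch_size=64, resolution=10.0):
    """Tile a projected (UTM) bounding box with square, grid-aligned patches.

    Hypothetical helper for illustration only. Patch origins are snapped to
    multiples of `resolution`, so patch edges coincide with pixel edges.
    Returns a list of (xmin, ymin, xmax, ymax) tuples in meters.
    """
    side = patch_size * resolution  # patch edge length in meters

    # Snap the lower-left corner up to the next multiple of the resolution
    x0 = math.ceil(xmin / resolution) * resolution
    y0 = math.ceil(ymin / resolution) * resolution

    # Number of whole patches that fit along each axis
    nx = max(int((xmax - x0) // side), 0)
    ny = max(int((ymax - y0) // side), 0)

    patches = []
    for i in range(nx):
        for j in range(ny):
            px = x0 + i * side
            py = y0 + j * side
            patches.append((px, py, px + side, py + side))
    return patches


# Example bounding box in UTM coordinates (values chosen for illustration)
patches = aligned_patches(500003.0, 5600007.0, 502000.0, 5601500.0)
print(len(patches), patches[0])
```

Only whole patches are kept, so a strip narrower than one patch along the upper and right edges stays uncovered in this sketch.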

To complete the dataframe required by the job manager, we also add the temporal extent:
- **Start date and duration**: Start date and length in months of the extraction period.
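
As a rough illustration of how a start date and a duration in months translate into a temporal extent, the sketch below computes the end date with the standard library. The function `temporal_extent` is hypothetical and may differ from what `generate_patches_by_crs` does internally. It assumes the end date is simply the start date shifted by the given number of months; openEO treats the end of a temporal extent as exclusive.

```python
import calendar
from datetime import date


def temporal_extent(start_date, duration_months):
    """Return [start, end] ISO date strings, with end = start + N months.

    Hypothetical helper for illustration only.
    """
    start = date.fromisoformat(start_date)

    # Shift the month index and carry any overflow into the year
    total_months = start.month - 1 + duration_months
    year = start.year + total_months // 12
    month = total_months % 12 + 1

    # Clamp the day for shorter target months (e.g. 31 Jan + 1 month)
    day = min(start.day, calendar.monthrange(year, month)[1])

    return [start.isoformat(), date(year, month, day).isoformat()]


print(temporal_extent("2023-01-01", 3))
```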



In [None]:
# Parameters for processing
patch_size = 64          # Size of patches in pixels
resolution = 10.0         # Alignment resolution in meters
start_date = "2023-01-01" # Temporal extent start date
nb_months = 3             # Number of months for the temporal extent

# Generate aligned patches, grouped by UTM CRS
dataframes_by_crs = generate_patches_by_crs(
    base_gdf=base_df,
    start_date=start_date,
    duration_months=nb_months,
    patch_size=patch_size,
    resolution=resolution
)

### Split Patches into Jobs
We combine the patches into jobs using an **H3 hexagonal grid**. Grouping nearby patches this way lets us extract multiple patches within one openEO batch job, which reduces the total cost.

Parameters include:
- **Max points per job**: Controls the size of each job.
- **H3 resolution**: Sets the grid resolution for spatial division.
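
The grouping logic can be sketched without any geospatial dependencies. The example below is an assumption-laden stand-in, not the code behind `process_split_jobs`: it replaces the H3 index with a plain square longitude/latitude grid, and the names `split_into_jobs` and `cell_size_deg` are hypothetical. It shows the two-stage idea: bucket points by grid cell, then cut each bucket into chunks of at most `max_points`.

```python
from collections import defaultdict


def split_into_jobs(points, max_points=10, cell_size_deg=1.0):
    """Group (lon, lat) points into jobs of at most `max_points` each.

    Hypothetical helper for illustration only. A square grid stands in
    for the H3 cells used by the real workflow.
    """
    # Stage 1: bucket the points by the grid cell they fall in
    cells = defaultdict(list)
    for lon, lat in points:
        key = (int(lon // cell_size_deg), int(lat // cell_size_deg))
        cells[key].append((lon, lat))

    # Stage 2: cut every bucket into chunks no larger than max_points
    jobs = []
    for key in sorted(cells):
        members = cells[key]
        for start in range(0, len(members), max_points):
            jobs.append(members[start:start + max_points])
    return jobs


# 25 points close together plus one isolated point far away
points = [(4.0 + 0.01 * i, 51.0 + 0.01 * i) for i in range(25)] + [(10.5, 45.5)]
jobs = split_into_jobs(points, max_points=10)
print([len(job) for job in jobs])
```

A smaller `max_points` produces more, lighter jobs; a coarser grid keeps more distant patches in the same job.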

In [None]:
max_points = 10           # Maximum points per job for splitting
grid_resolution = 3       # H3 index resolution

# Group the patches into jobs using H3 indexing
split_jobs = process_split_jobs(
    geodataframes=dataframes_by_crs,
    max_points=max_points,
    grid_resolution=grid_resolution
)

### Create Job DataFrame
From the split jobs we create a dataframe that can be passed to the `MultiBackendJobManager`.

In [None]:
# Create a summary DataFrame for the split jobs
job_dataframe = create_job_dataframe(split_jobs)
job_dataframe

For testing, reduce the dataframe to its first two jobs.

In [None]:
job_dataframe = job_dataframe[0:2]


### Submit Extraction Jobs

Using the openEO backend, we authenticate and submit the jobs to process the EO data. 
Each job extracts Sentinel and climate data for its assigned spatial and temporal parameters.
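
The job manager calls a `start_job` callback once per dataframe row, passing the row and an authenticated connection as keyword arguments, and expects a batch job in return. The sketch below shows the general shape of such a callback. It is not the actual `wac_extraction_job`: the function name, the column names (`west`, `south`, `east`, `north`, `start_date`, `end_date`, `tile_name`), the collection, and the band selection are all assumptions made for illustration.

```python
def example_extraction_job(row, connection, **kwargs):
    """Hypothetical start_job callback: build one batch job per row.

    `row` is one row of the job dataframe and `connection` is an
    authenticated openEO connection. All column names are assumed.
    """
    cube = connection.load_collection(
        "SENTINEL2_L2A",  # assumed collection; the real job may load several
        spatial_extent={
            "west": row["west"],
            "south": row["south"],
            "east": row["east"],
            "north": row["north"],
        },
        temporal_extent=[row["start_date"], row["end_date"]],
        bands=["B04", "B08"],  # assumed band selection
    )
    # Return the created job so the manager can start and track it
    return cube.create_job(
        title=f"extraction-{row['tile_name']}",
        out_format="NetCDF",
    )
```

Accepting `**kwargs` keeps the callback compatible with any extra keyword arguments the manager passes along, such as the backend name.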

In [None]:
# Authenticate and connect to openEO backend
connection = openeo.connect(url="openeo.dataspace.copernicus.eu").authenticate_oidc()

# Initialize MultiBackendJobManager
from openeo.extra.job_management import MultiBackendJobManager, ParquetJobDatabase

manager = MultiBackendJobManager()
manager.add_backend("cdse", connection=connection, parallel_jobs=2)

# Initialize or load job tracker
job_tracker = 'job_tracker.parquet'
job_db = ParquetJobDatabase(path=job_tracker)

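# Only initialize the tracker on the first run; an existing tracker file
# is reused so that an interrupted run can resume where it stopped.
if not job_db.exists():
    # NOTE: _normalize_df is a private method and may change between openeo releases
    df = manager._normalize_df(job_dataframe)
    job_db.persist(df)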

# Submit jobs
from eo_extractors.extractor import wac_extraction_job
manager.run_jobs(start_job=wac_extraction_job, job_db=job_db)