# EOPF STAC and Xarray Tutorial

This tutorial demonstrates how to work with Earth Observation Processing Framework (EOPF) data using STAC (SpatioTemporal Asset Catalog) and xarray. You'll learn how to:

1. Connect to the EOPF STAC API
2. Search for satellite data collections
3. Access and open Zarr datasets using xarray
4. Explore the data structure using DataTree

## Prerequisites

This tutorial requires the following Python packages:
- `pystac` and `pystac-client` for STAC operations
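- `xarray` for data manipulation, plus `dask` for lazy chunked loading and a backend that provides the `eopf-zarr` engine used in step 8 (for example the `xarray-eopf` plugin)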
- `requests` for HTTP operations
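
Before running the notebook, it can help to confirm that these packages are importable. The following is a minimal sketch using only the standard library. The helper name `check_packages` is made up for this example, and the module list uses import names rather than pip names (`pystac-client` is imported as `pystac_client`).

```python
from importlib.util import find_spec

# Import names (not pip names): pystac-client is imported as pystac_client.
REQUIRED_MODULES = ["pystac", "pystac_client", "xarray", "requests"]


def check_packages(module_names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: find_spec(name) is not None for name in module_names}


for name, available in check_packages(REQUIRED_MODULES).items():
    print(f"{name}: {'ok' if available else 'MISSING'}")
```

Any module reported as `MISSING` needs to be installed before continuing.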

## What is STAC?

STAC (SpatioTemporal Asset Catalog) is a specification that provides a common language to describe geospatial information, making it easier to work with satellite imagery and other geospatial data.
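
To make the idea concrete, here is a simplified, hand-written sketch of what a STAC Item looks like when written as a Python dictionary. All values (IDs, coordinates, URL) are illustrative placeholders and are not taken from the EOPF catalog. Real items usually carry more properties and links.

```python
import json

# A hand-written, simplified STAC Item. All values are illustrative placeholders.
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "collection": "example-collection",
    "bbox": [11.0, 47.0, 12.0, 48.0],  # west, south, east, north
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [[11.0, 47.0], [12.0, 47.0], [12.0, 48.0], [11.0, 48.0], [11.0, 47.0]]
        ],
    },
    "properties": {"datetime": "2024-05-01T10:00:00Z"},  # the "when"
    "assets": {  # links to the actual data files
        "product": {
            "href": "https://example.com/example-item.zarr",
            "roles": ["data", "metadata"],
        }
    },
    "links": [],
}

print(json.dumps(example_item["properties"], indent=2))
print("Assets:", list(example_item["assets"]))
```

The `bbox` and `geometry` fields describe where the data is, `properties.datetime` describes when it was acquired, and `assets` points to the data itself.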

## 1. Import Required Libraries

First, let's import all the necessary libraries for this tutorial.

In [None]:
import requests
from typing import List, Optional, cast
from pystac import Collection, MediaType
from pystac_client import Client, CollectionClient
from datetime import datetime
import xarray as xr

# Optional: Configure xarray display options
# xr.set_options(display_expand_attrs=False)

## 2. Define Helper Functions

Let's create a utility function to extract information from search results.

In [None]:
def list_found_elements(search_result):
    """
    Extract item IDs and collection IDs from STAC search results.
    
    Parameters:
    -----------
    search_result : pystac_client.ItemSearch
        The search result object from STAC API
    
    Returns:
    --------
    tuple: (list of item IDs, list of collection IDs)
    """
    item_ids = []  # avoid shadowing the built-in id()
    collection_ids = []
    for item in search_result.items():  # iterate over the items matched by the search
        item_ids.append(item.id)
        collection_ids.append(item.collection_id)
    return item_ids, collection_ids
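
When a search spans several collections, it can be useful to summarize the results per collection. The sketch below is one possible way to do that. The function name `count_items_per_collection` and the stand-in objects are hypothetical; in the notebook you would pass `search_result.items()` instead of `fake_items`.

```python
from collections import Counter
from types import SimpleNamespace


def count_items_per_collection(items):
    """Count how many items belong to each collection ID."""
    return Counter(item.collection_id for item in items)


# Stand-in objects that mimic the two attributes used by the helper (id, collection_id).
fake_items = [
    SimpleNamespace(id="item-a", collection_id="sentinel-2-l2a"),
    SimpleNamespace(id="item-b", collection_id="sentinel-2-l2a"),
    SimpleNamespace(id="item-c", collection_id="other-collection"),
]

counts = count_items_per_collection(fake_items)
print(dict(counts))
```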

## 3. Connect to the EOPF STAC API

Now we'll establish a connection to the EOPF STAC API endpoint.

In [None]:
# Configuration
max_description_length = 100
eopf_stac_api_root_endpoint = "https://stac.core.eopf.eodc.eu/"  # root starting point

# Connect to the STAC catalog
client = Client.open(url=eopf_stac_api_root_endpoint)

print(
    "Connected to Catalog {id}: {description}".format(
        id=client.id,
        description=client.description
        if len(client.description) <= max_description_length
        else client.description[: max_description_length - 3] + "...",
    )
)
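
The truncation logic inside the `print` call can also be factored into a small function, which makes it easier to reuse and test. This is a minimal sketch; the name `shorten_description` is made up for this example, and the `None` guard is an added assumption for catalogs that report no description.

```python
def shorten_description(text, max_length=100):
    """Truncate text to max_length characters, ending with '...' if it was cut."""
    if text is None:
        return ""
    if len(text) <= max_length:
        return text
    return text[: max_length - 3] + "..."


print(shorten_description("A short catalog description"))
print(shorten_description("x" * 250, max_length=20))
```

With this helper, the message above could be built as `shorten_description(client.description, max_description_length)`.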

## 4. Search for Sentinel-2 Data

Let's search for Sentinel-2 Level 2A data over Innsbruck, Austria for a specific time period.

### Search Parameters:
- **Bounding Box**: Innsbruck, Austria area (11.124756, 47.311058, 11.459839, 47.463624)
- **Collection**: Sentinel-2 L2A (surface reflectance data)
- **Time Range**: May 1, 2020 to May 31, 2025

In [None]:
# Define search parameters for Innsbruck area
innsbruck_s2 = client.search(
    bbox=(11.124756, 47.311058,  # AOI extent (Area of Interest)
          11.459839, 47.463624),
    collections=['sentinel-2-l2a'],  # collection of interest
    datetime='2020-05-01T00:00:00Z/2025-05-31T23:59:59.999999Z'  # period of interest
)

# Extract search results
new_ins = list_found_elements(innsbruck_s2)

print('Retrieved Sentinel-2 L2A items between 01-May-2020 and 31-May-2025 near Innsbruck, Austria:', len(new_ins[0]))
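
The `bbox` and `datetime` arguments are easy to get wrong when typed by hand. The sketch below shows one way to build the interval string from `datetime` objects and to sanity-check a bounding box before searching. Both helper names (`to_stac_interval`, `validate_bbox`) are hypothetical and not part of `pystac-client`.

```python
from datetime import datetime, timezone


def to_stac_interval(start, end):
    """Format two timezone-aware datetimes as a 'start/end' interval in UTC."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start_utc = start.astimezone(timezone.utc)
    end_utc = end.astimezone(timezone.utc)
    if start_utc > end_utc:
        raise ValueError("start must not be after end")
    return f"{start_utc.strftime(fmt)}/{end_utc.strftime(fmt)}"


def validate_bbox(bbox):
    """Check a (west, south, east, north) bounding box given in degrees."""
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return tuple(bbox)


interval = to_stac_interval(
    datetime(2020, 5, 1, tzinfo=timezone.utc),
    datetime(2025, 5, 31, 23, 59, 59, tzinfo=timezone.utc),
)
print(interval)
print(validate_bbox((11.124756, 47.311058, 11.459839, 47.463624)))
```

The west/east order is deliberately not checked, because a bounding box that crosses the antimeridian has a west value greater than its east value.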

## 5. Access Collection and Item Details

Now we'll get detailed information about the Sentinel-2 collection and retrieve URLs for the found items.

In [None]:
# Get the Sentinel-2 L2A collection
c_sentinel2 = client.get_collection('sentinel-2-l2a')

# Collect the self URL of every found item (this issues one request per item)
c_sentinel2_urls = [
    c_sentinel2.get_item(item_id).self_href for item_id in new_ins[0]
]

# Store item IDs for easy access
c_sentinel2_ids = new_ins[0]
print(f"Found {len(c_sentinel2_ids)} Sentinel-2 items")
print(f"First few item IDs: {c_sentinel2_ids[:3]}")
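
Fetching every item just to read its self link costs one request per item. STAC APIs that follow the usual Features layout expose items under `/collections/{collection_id}/items/{item_id}`, so the URL can often be composed locally. The sketch below assumes that layout; the helper name `item_url` and the item ID are placeholders, and the `self_href` reported by the server remains the authoritative value.

```python
from urllib.parse import quote


def item_url(api_root, collection_id, item_id):
    """Build the conventional STAC API URL of an item (assumed layout)."""
    root = api_root.rstrip("/")
    collection_part = quote(collection_id, safe="")
    item_part = quote(item_id, safe="")
    return f"{root}/collections/{collection_part}/items/{item_part}"


print(item_url("https://stac.core.eopf.eodc.eu/", "sentinel-2-l2a", "EXAMPLE_ITEM_ID"))
```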

## 6. Select and Examine a Specific Item

Let's select the first available item and examine its assets, particularly focusing on Zarr format data.

In [None]:
# Choose the first item available to be opened
item = c_sentinel2.get_item(id=c_sentinel2_ids[0])
print(f"Selected item: {item.id}")
print(f"Item datetime: {item.datetime}")

## 7. Explore Zarr Assets

EOPF data is stored in Zarr format, which is optimized for cloud-native access. Let's examine the available Zarr assets and identify the main data group.

In [None]:
# Initialize variable to store the top-level Zarr group asset
top_level_zarr_group_asset = None

# Iterate through Zarr assets
for asset_name, asset in sorted(
    item.get_assets(media_type=MediaType.ZARR).items(),
    key=lambda name_and_asset: name_and_asset[1].href,  # sort by asset URL
):
    roles = asset.roles or []
    print(
        "Zarr asset {group_path} ({title}) has roles {roles}".format(
            group_path="".join(asset.href.split(".zarr")[-1:]) or "/",
            title=asset.title,
            roles=roles,
        )
    )
    
    # Identify the top-level Zarr group asset (contains both data and metadata)
    if "data" in roles and "metadata" in roles:
        top_level_zarr_group_asset = asset

# Verify we found the top-level asset
assert (
    top_level_zarr_group_asset is not None
), "Unable to find top-level Zarr group asset"

print(
    "\nAsset '{name}' is the top-level Zarr group asset".format(
        name=top_level_zarr_group_asset.title
    )
)
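
The two pieces of logic in the loop above, deriving the group path from the asset URL and picking the asset by its roles, can be tried out without network access. The sketch below uses plain dictionaries as stand-ins for STAC assets. The helper names and the example URLs are hypothetical. Unlike the loop above, `find_top_level_asset` returns the first match rather than the last.

```python
def zarr_group_path(href):
    """Return the part of href after the last '.zarr', or '/' for the store root."""
    _, separator, remainder = href.rpartition(".zarr")
    if not separator:
        raise ValueError(f"not a Zarr href: {href}")
    return remainder or "/"


def find_top_level_asset(assets):
    """Return (name, asset) for the first asset with both 'data' and 'metadata' roles."""
    for name, asset in assets.items():
        roles = asset.get("roles") or []
        if "data" in roles and "metadata" in roles:
            return name, asset
    return None


# Hypothetical assets written as plain dicts instead of pystac Asset objects.
example_assets = {
    "product": {
        "href": "https://example.com/scene.zarr",
        "roles": ["data", "metadata"],
    },
    "reflectance": {
        "href": "https://example.com/scene.zarr/measurements/reflectance",
        "roles": ["data"],
    },
}

for name, asset in example_assets.items():
    print(name, "->", zarr_group_path(asset["href"]))
print("Top-level asset:", find_top_level_asset(example_assets)[0])
```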

## 8. Open Data with Xarray DataTree

Now we'll open the Zarr dataset using xarray's DataTree functionality, which is well suited to hierarchical data structures like EOPF datasets.

### About DataTree
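DataTree is an xarray data structure for hierarchical datasets with multiple groups. It is part of recent xarray releases (it was previously distributed as the separate `xarray-datatree` package). Such hierarchies are common in Earth observation data, where different measurements and metadata are organized in separate groups.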

In [None]:
# Open the dataset using xarray DataTree with EOPF-specific engine
dt = xr.open_datatree(
    top_level_zarr_group_asset.href,
    engine="eopf-zarr",  # EOPF-specific Zarr engine
    op_mode="native",    # Native operation mode
    chunks={}            # Load lazily with dask, using the engine's preferred chunk sizes
)

print("Successfully opened DataTree!")
print(f"DataTree type: {type(dt)}")

## 9. Explore DataTree Structure

Let's examine the hierarchical structure of our dataset by listing all available groups.

In [None]:
# List all groups in the DataTree
print("Available DataTree groups:")
print("=" * 30)
for dt_group in sorted(dt.groups):
    print(f"📁 DataTree group: {dt_group}")

print(f"\nTotal number of groups: {len(dt.groups)}")
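
A flat list of group paths can be hard to read for deep hierarchies. The sketch below renders such paths as an indented outline using only the standard library. The function name `render_group_tree` and the example paths are made up for illustration. In the notebook you would pass `dt.groups`.

```python
def render_group_tree(group_paths):
    """Return an indented text outline for '/'-separated group paths."""
    lines = []
    # Sort by path components so that children always follow their parent.
    for path in sorted(group_paths, key=lambda p: p.split("/")):
        parts = [part for part in path.split("/") if part]
        if not parts:
            lines.append("/")  # the root group
            continue
        indent = "  " * len(parts)
        lines.append(f"{indent}{parts[-1]}")
    return "\n".join(lines)


# Hypothetical group paths; a real product has its own layout.
example_groups = (
    "/",
    "/measurements",
    "/measurements/reflectance",
    "/quality",
)
print(render_group_tree(example_groups))
```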

## 10. Display Asset URL

For reference, let's display the absolute URL of the Zarr asset we're working with.

In [None]:
# Display the absolute URL of the top-level Zarr group asset
asset_url = top_level_zarr_group_asset.get_absolute_href()
print(f"Asset URL: {asset_url}")
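
Asset hrefs in STAC may be relative, in which case they are resolved against the location of the item that owns them. The sketch below illustrates that general idea with the standard library. It is not pystac's implementation, and the helper name `resolve_href` and the URLs are placeholders.

```python
from urllib.parse import urljoin


def resolve_href(base_href, asset_href):
    """Resolve asset_href against base_href; absolute hrefs are returned unchanged."""
    return urljoin(base_href, asset_href)


base = "https://example.com/collections/demo/items/item-1"
print(resolve_href(base, "./data/scene.zarr"))
print(resolve_href(base, "https://other.example.com/scene.zarr"))
```

If the asset href is already absolute, as in the second call, it is returned as-is.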

## 11. Further Exploration (Optional)

Now that you have successfully opened the EOPF dataset, you can explore its contents further. Here are some suggestions for next steps:

In [None]:
# Example: Explore the root dataset
print("Root dataset info:")
print(dt.ds)

# Example: Access a specific group (uncomment to try)
# Note: dt.groups[0] is the root group "/", so take the next entry to get a child group.
# if len(dt.groups) > 1:
#     first_group = dt.groups[1]
#     print(f"\nFirst child group '{first_group}' contents:")
#     print(dt[first_group].ds)

## Summary

In this tutorial, you learned how to:

1. ✅ **Connect to EOPF STAC API** - Established connection to the Earth Observation data catalog
2. ✅ **Search for satellite data** - Found Sentinel-2 L2A data over Innsbruck, Austria
3. ✅ **Identify Zarr assets** - Located the main data assets in cloud-optimized Zarr format
4. ✅ **Open with xarray DataTree** - Successfully loaded hierarchical satellite data
5. ✅ **Explore data structure** - Examined the organization of the dataset groups

### Key Concepts Covered:
- **STAC (SpatioTemporal Asset Catalog)**: Standard for describing geospatial data
- **EOPF**: Earth Observation Processing Framework for cloud-native data processing
- **Zarr**: Cloud-optimized data format for large arrays
- **xarray DataTree**: Tool for working with hierarchical datasets

### Next Steps:
- Explore individual data groups and variables
- Perform data analysis and visualization
- Apply geospatial operations and transformations
- Integrate with other Earth observation workflows