# STAC to `zarr`: How to access information

### Introduction

In this tutorial we will demonstrate how to discover, access, and analyse Earth Observation data using the [EOPF Sentinel Zarr Sample Service STAC Catalog](https://stac.browser.user.eopf.eodc.eu/?.language=en) and EOPF `zarr` datasets. 
We will walk through a step-by-step guide suited to beginners in Earth Observation data processing.

### What we will learn

- ☁️ How to open cloud-optimised datasets through the EOPF STAC Catalog
- 🏗️ How EOPF Zarr datasets are organised, with visualisations
- 🔎 Common techniques for examining datasets
- 📊 How to perform simple data analysis

### Prerequisites

This tutorial requires the `xarray-eopf` extension for data manipulation. To find out more about the library, access the [documentation](https://eopf-sample-service.github.io/xarray-eopf/).

The [Searching the EOPF Sentinel Zarr Samples Service STAC API]() tutorial gives an introduction to the workflow for accessing the STAC collection we are interested in.

<hr>

#### Import libraries

In [1]:
import requests
from typing import List, Optional, cast
from pystac import Collection, MediaType
from pystac_client import Client, CollectionClient
from datetime import datetime
import xarray as xr

# xr.set_options(display_expand_attrs=False)

#### Helper functions

##### `list_found_elements`
Since a search can return several elements, we define a function that collects the `id` of each item and the `id` of the collection it belongs to into two lists, so that we can retrieve them later.

In [2]:
def list_found_elements(search_result):
    # Use descriptive names to avoid shadowing the built-in `id`
    ids = []
    collections = []
    for item in search_result.items():  # iterate over the items returned by the search
        ids.append(item.id)
        collections.append(item.collection_id)
    return ids, collections
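
To check what the helper returns without querying the catalogue, the sketch below feeds it a stand-in search result. `FakeSearch` and the item values are hypothetical and exist only for illustration; a real search result comes from `pystac_client`.

```python
from types import SimpleNamespace


def list_found_elements(search_result):
    """Collect item ids and collection ids from a search result."""
    ids = []
    collections = []
    for item in search_result.items():
        ids.append(item.id)
        collections.append(item.collection_id)
    return ids, collections


class FakeSearch:
    """Hypothetical stand-in that mimics the `.items()` method of a search result."""

    def __init__(self, items):
        self._items = items

    def items(self):
        return iter(self._items)


# Two made-up items that only carry the attributes the helper reads
fake_search = FakeSearch(
    [
        SimpleNamespace(id="item-a", collection_id="sentinel-2-l2a"),
        SimpleNamespace(id="item-b", collection_id="sentinel-2-l2a"),
    ]
)

found_ids, found_collections = list_found_elements(fake_search)
print(found_ids)
print(found_collections)
```

The helper returns a tuple of two lists of equal length, so index `[0]` holds the item ids and index `[1]` holds the matching collection ids.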

## Establish the connection

Our first step is to create our connection to interact with the EOPF STAC Catalog.<br>
This involves defining the starting point for the data we wish to retrieve.<br>

The API's base URL is available through the 🔗**Source** ([click here](https://stac.core.eopf.eodc.eu/)), which can be found in the **API & URL** tab of the [EOPF Sentinel Zarr Sample Service STAC Catalog](https://stac.browser.user.eopf.eodc.eu/?.language=en).

![EOPF API url for connection](img/api_connection.png)

With the `Client.open()` function, we can access the starting point of the Catalog by providing its URL.

In [3]:
max_description_length = 100

eopf_stac_api_root_endpoint = "https://stac.core.eopf.eodc.eu/" #root starting point
eopf_catalog = Client.open(url=eopf_stac_api_root_endpoint)

Verifying the catalog we have just accessed:

In [4]:
print(
    "Connected to Catalog {id}: {description}".format(
        id=eopf_catalog.id,
        description=eopf_catalog.description
        if len(eopf_catalog.description) <= max_description_length
        else eopf_catalog.description[: max_description_length - 3] + "...",
    )
)
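
The conditional expression in the cell above shortens long descriptions before printing them. The same logic, pulled out into a small function for clarity, might look like the sketch below. The name `truncate_description` is hypothetical and not part of any library.

```python
def truncate_description(text, max_length=100):
    """Return text unchanged if it fits, otherwise cut it and append '...'."""
    if len(text) <= max_length:
        return text
    # Reserve three characters for the ellipsis so the result is max_length long
    return text[: max_length - 3] + "..."


short_text = truncate_description("EOPF catalog", max_length=100)
long_text = truncate_description("x" * 150, max_length=100)

print(short_text)
print(len(long_text))
```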

For this tutorial, we will focus on the Sentinel-2 L2A Collection. Its id in the EOPF STAC Catalog is `sentinel-2-l2a`.

As we are interested in retrieving and exploring an Item from the collection, we will again focus on the Innsbruck area we defined in the [previous tutorial]().

In [5]:
innsbruck_s2 = eopf_catalog.search(
    collections="sentinel-2-l2a",  # collection of interest
    bbox=(11.124756, 47.311058,  # AOI extent: west, south,
          11.459839, 47.463624),  # east, north
    datetime="2020-05-01T00:00:00Z/2025-05-31T23:59:59.999999Z",  # period of interest
)

combined_ins = list_found_elements(innsbruck_s2)

print("Search Results:")
print("Total Items Found for Sentinel-2 L2A over Innsbruck:  ", len(combined_ins[0]))
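
The search above takes a bounding box ordered as west, south, east, north and a time interval written as `start/end`. As a minimal sketch, the two helpers below build the interval string from `datetime` objects and run a basic sanity check on the bounding box. The names `stac_interval` and `check_bbox` are hypothetical, and the helpers assume timezone-aware datetimes.

```python
from datetime import datetime, timezone


def stac_interval(start, end):
    """Format two timezone-aware datetimes as a 'start/end' interval in UTC."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start_utc = start.astimezone(timezone.utc)
    end_utc = end.astimezone(timezone.utc)
    if start_utc > end_utc:
        raise ValueError("start must not be later than end")
    return f"{start_utc.strftime(fmt)}/{end_utc.strftime(fmt)}"


def check_bbox(bbox):
    """Check that a (west, south, east, north) box holds plausible coordinates."""
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return bbox


interval = stac_interval(
    datetime(2020, 5, 1, tzinfo=timezone.utc),
    datetime(2025, 5, 31, 23, 59, 59, tzinfo=timezone.utc),
)
innsbruck_bbox = check_bbox((11.124756, 47.311058, 11.459839, 47.463624))

print(interval)
print(innsbruck_bbox)
```

West is deliberately allowed to exceed east, because a bounding box that crosses the antimeridian is written that way.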

In [6]:
first_item_id=combined_ins[0][0]

With the `id` of the first item stored, we can display it:

In [7]:
first_item_id

In [8]:
c_sentinel2 = eopf_catalog.get_collection('sentinel-2-l2a')
# c_sentinel2_urls=[]
# for x in range(len(combined_ins[0])):
#     c_sentinel2_urls.append(c_sentinel2.get_item(combined_ins[0][x]).self_href)

In [9]:
# innsbruck_items=[] # a list to store the assets information
# for x in range(len(combined_ins[0])): # We retrieve the available Items over Innsbruck
#     innsbruck_items.append(c_sentinel2 # we set into the Sentinel-2 L-2A collection
#                       .get_item(combined_ins[0][x])  # We only get the Innsbruck filtered items
#                       .get_assets(media_type=MediaType.ZARR)) # we obtain the .zarr location
    
# first_item = innsbruck_items[0]   # we select the first item from our list

In [10]:
# len(combined_ins[0])

In [11]:
#Choosing the first item available to be opened:
item= c_sentinel2.get_item(id=first_item_id) 

In [12]:
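try:
    # Select the top-level Zarr asset of the item: the one that carries
    # both the "data" and "metadata" roles
    product_asset = next(
        asset
        for asset in item.get_assets(media_type=MediaType.ZARR).values()
        if {"data", "metadata"} <= set(asset.roles or [])
    )

    # Open the dataset using the EOPF-specific engine
    datatree = xr.open_datatree(
        product_asset.href,
        engine="eopf-zarr",  # EOPF-specific Zarr engine
        op_mode="native",    # Native operation mode
        chunks={}            # Lazy Dask arrays with the engine's preferred chunks
    )

    print("\n✅ Dataset opened successfully!")
    print(f"📊 DataTree type: {type(datatree)}")
    print(f"🌳 Number of groups: {len(datatree.groups)}")

except Exception as e:
    print(f"❌ Error opening dataset: {e}")
    print("This might be due to network issues or dataset access permissions.")
    raise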

In [13]:
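# Initialise the variable so that it is defined even if no asset matches
top_level_zarr_group_asset = None

# List the Zarr assets of the item, sorted by their href
for asset_name, asset in sorted(
    item.get_assets(media_type=MediaType.ZARR).items(), key=lambda pair: pair[1].href
):
    roles = asset.roles or []
    print(
        "Zarr asset {group_path} ({title}) has roles {roles}".format(
            # The group path is the part of the href after ".zarr"; "/" is the store root
            group_path="".join(asset.href.split(".zarr")[-1:]) or "/",
            title=asset.title,
            roles=roles,
        )
    )
    # Identify the top-level Zarr group asset. This is what we will access with xarray.
    if "data" in roles and "metadata" in roles:
        top_level_zarr_group_asset = asset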
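
The loop above derives a group path from each asset href and uses the asset roles to spot the top-level group. The sketch below isolates both steps with plain strings so that they can be tried offline. The hrefs, roles and function names are hypothetical examples, not values taken from the catalogue.

```python
def zarr_group_path(href):
    """Return the part of an href after '.zarr', or '/' for the store root."""
    return "".join(href.split(".zarr")[-1:]) or "/"


def is_top_level(roles):
    """The top-level Zarr group is assumed to carry both 'data' and 'metadata'."""
    return "data" in roles and "metadata" in roles


# Hypothetical hrefs mapped to hypothetical roles
example_assets = {
    "https://example.com/products/sample.zarr": ["data", "metadata"],
    "https://example.com/products/sample.zarr/measurements/reflectance/r10m": ["data"],
}

for href, roles in sorted(example_assets.items()):
    print(zarr_group_path(href), roles, is_top_level(roles))
```

Note that an href without `.zarr` in it is returned whole by `zarr_group_path`, so the expression is only meaningful for Zarr assets.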

In [14]:
print(item)

In [15]:
assert (
    top_level_zarr_group_asset is not None
), "Unable to find top-level Zarr group asset"
print(
    "Asset {name} is the top-level Zarr group asset".format(
        name=top_level_zarr_group_asset.title
    )
)

In [16]:
# dt = xr.open_datatree(top_level_zarr_group_asset.href, **top_level_zarr_group_asset.extra_fields["xarray:open_datatree_kwargs"])
# dt = xr.open_datatree(
#     top_level_zarr_group_asset.href,
#     **top_level_zarr_group_asset.extra_fields["xarray:open_datatree_kwargs"]
# )
# dt = xr.open_datatree(top_level_zarr_group_asset.href, engine="eopf-zarr", op_mode="native", chunks={})
# dt

dt = xr.open_datatree(
    top_level_zarr_group_asset.href, engine="eopf-zarr", op_mode="native", chunks={})
for dt_group in sorted(dt.groups):
    print("DataTree group {group_name}".format(group_name=dt_group))

In [17]:
top_level_zarr_group_asset.get_absolute_href()

In [18]:
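# Show the signature and documentation of the function
# (calling it without a path raises a TypeError)
help(xr.open_datatree)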

In [19]:
item

## 💪 Now it is your turn

## Conclusion


## What's next?