# Searching the EOPF Sentinel Zarr Samples Service STAC API

### Introduction

In this tutorial, we will explore how to access the Sentinel-1, Sentinel-2 and Sentinel-3 `.zarr` Collections available in the [EOPF Sentinel Zarr Sample Service STAC](https://stac.browser.user.eopf.eodc.eu/?.language=en). <br>
This API provides a structured way to search and access the EOPF data with Python.

### What we will learn
- 🔍 How to **programmatically browse** available collections inside the EOPF STAC API
- 📊 Understanding **collection metadata** in user-friendly terms
- 🎯 **Searching for specific data** with practical examples

### Prerequisites

The `pystac` and `pystac_client` libraries let us send requests to a STAC API and search inside it, which makes data discovery efficient.
Check out the [pystac documentation](https://pystac.readthedocs.io/en/stable/) and the [pystac_client documentation](https://pystac-client.readthedocs.io/en/latest/api.html) for additional resources.

> **Note:** <br>
> We recommend creating a virtual environment as it helps manage library versions and prevents conflicts with other Python projects on your system. <br>
> Follow [this tutorial]() to create a virtual environment that will allow us to run all the available tutorials in [EOPF-101](https://github.com/eopf-toolkit/eopf-101).

To ensure a stable and reproducible environment for our project, we start by setting up our dependencies.

<hr>

#### Import libraries

In [None]:
import requests
from typing import List, Optional, cast
from pystac import Collection, MediaType
from pystac_client import Client, CollectionClient
from datetime import datetime

#### Helper functions

##### `list_found_elements`
As we expect to inspect several elements that will be stored in lists, we define a function that collects the item `id`s and their collection `id`s for later retrieval.

In [None]:
def list_found_elements(search_result):
    item_ids = []        # avoids shadowing the built-in `id`
    collection_ids = []
    for item in search_result.items(): # iterates over the items matched by the search
        item_ids.append(item.id)
        collection_ids.append(item.collection_id)
    return item_ids, collection_ids
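
To see what the helper returns without contacting the API, it can be exercised on a stand-in object. The sketch below repeats the helper so that it runs on its own. `FakeSearch` and the item ids are hypothetical: they only mimic the `.items()` method and the `id` / `collection_id` attributes that the helper relies on.

```python
from types import SimpleNamespace


def list_found_elements(search_result):
    """Return two parallel lists: item ids and their collection ids."""
    item_ids = []
    collection_ids = []
    for item in search_result.items():
        item_ids.append(item.id)
        collection_ids.append(item.collection_id)
    return item_ids, collection_ids


class FakeSearch:
    """Hypothetical stand-in for a search result (not part of pystac_client)."""

    def items(self):
        # Made-up items exposing only the two attributes the helper reads.
        yield SimpleNamespace(id="item-a", collection_id="collection-1")
        yield SimpleNamespace(id="item-b", collection_id="collection-2")


found_ids, found_collections = list_found_elements(FakeSearch())
print(found_ids)
print(found_collections)
```

The two lists are parallel: position `i` in the first list is the item whose collection sits at position `i` in the second.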

<hr>

## API connection

Our first step is to construct our request to interact with the EOPF STAC API.<br>
This involves defining the parameters for the data we wish to retrieve.<br>

The API's base URL is conveniently available through the 🔗**Source** ([click here](https://stac.core.eopf.eodc.eu/)), which can be found in the **API & URL** tab of the [EOPF Sentinel Zarr Sample Service STAC API](https://stac.browser.user.eopf.eodc.eu/?.language=en).

![EOPF API url for connection](img/api_connection.png)

This entry point gives us access to the root of the Catalogue.

In [None]:
max_description_length = 100

eopf_stac_api_root_endpoint = "https://stac.core.eopf.eodc.eu/" #root starting point
client = Client.open(url=eopf_stac_api_root_endpoint) # calls the selected url

Verifying the catalog we have just accessed:

In [None]:
print(
    "Selected Catalog: {id}: {description}".format(
        id=client.id,
        description=client.description
        # if len(client.description) <= max_description_length
        # else client.description[: max_description_length - 3] + "...",
    )
)
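
The commented-out lines in the cell above shorten a long description to `max_description_length` characters. As a minimal sketch, that logic could live in a small function so it can be reused for collection descriptions as well. The name `shorten` and the sample text are assumptions made for illustration.

```python
def shorten(text: str, max_length: int = 100) -> str:
    """Return text unchanged if it is short enough, else cut it and append '...'."""
    if len(text) <= max_length:
        return text
    # Reserve three characters for the ellipsis so the result has max_length characters.
    return text[: max_length - 3] + "..."


# Made-up description, used only to demonstrate the behaviour.
sample_description = "A catalogue description " * 10
print(shorten(sample_description))
```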

It is important to remember that the Sentinel Zarr Sample Service STAC **is still under development** and receives constant updates and additions to the Collections.<br>
To ensure we only work with available resources, we include a check of the data availability inside the catalogue. <br>
This proactive step helps us understand what data is currently accessible.

> **Note:** <br>
> To explore known issues or further considerations, check the [EOPF Sentinel Zarr Samples Service](https://zarr.eopf.copernicus.eu/) updates and their [GitHub Issues](https://github.com/EOPF-Sample-Service/eopf-stac/issues)

In [None]:
all_collections: Optional[List[Collection]] = None
# The simplest approach to retrieve all collections may fail due to issue #18 on GitHub.

try:
    all_collections = list(client.get_all_collections())
    print(
        "* [https://github.com/EOPF-Sample-Service/eopf-stac/issues/18 appears to be resolved]"
    )
except Exception:
    print(
        "* [https://github.com/EOPF-Sample-Service/eopf-stac/issues/18 appears to not be resolved]"
    )


In [None]:
print(all_collections)

We see that one collection has not been updated yet, so we need to clean the search and work only with the available resources.

## Available Collections

We can filter the available Collections to have an overview of the instances we can retrieve:

In [None]:
if all_collections is None:
    # If collection retrieval fails due to #18.
    valid_collections: List[Collection] = []
    for collection_href in [link.absolute_href for link in client.get_child_links()]:
        collection_dict = requests.get(url=collection_href).json()
        try:
            # Attempt to retrieve collections individually.
            valid_collections.append(Collection.from_dict(collection_dict))
        except Exception as e:
            if isinstance(e, TypeError) and "not subscriptable" in str(e).lower():
                # This exception is expected for some collections due to #18.
                continue
            else:
                raise e
            
    all_collections = valid_collections


And the available collections that can be explored are:

In [None]:
print(all_collections)

After this initial check, we can list the collections from which we can successfully retrieve `zarr` encoded items, together with their description and temporal extent.

In [None]:
for collection in all_collections: # We go through each collection individually
    # First interval of the temporal extent; either end may be None (open-ended)
    start_date, end_date = collection.extent.temporal.intervals[0]
    print("Collection {id}".format(id=collection.id))  # Collection id
    print(
        " - Description: {description}".format(        # Summary of the contained information
            description=collection.description
            if len(collection.description) <= max_description_length
            else collection.description[: max_description_length - 3] + "..."
        )
    )
    print(
        " - Temporal Extent: {start_date} to {end_date}".format(
            start_date=start_date.strftime("%Y-%m-%d") if start_date else "open",
            end_date=end_date.strftime("%Y-%m-%d") if end_date else "open",
        )
    )

## Searching inside the EOPF STAC API

With the `.search()` method of our `client` (the Catalog), we can define a series of parameters that filter the data down to the items matching the criteria we are interested in.

### Spatial Extent
To narrow down our data search, we can define a specific area of interest.<br>
We are able to do this by providing a bounding box (`bbox`), given as the coordinates of the bottom-left (south-west) and top-right (north-east) corners, in the order minimum longitude, minimum latitude, maximum longitude, maximum latitude. It is similar to drawing the extent in the interactive map of the EOPF browser interface.<br>

We can focus on a specific area for our search.<br>
For example, we can define the outskirts of Innsbruck, Austria.
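
Because the order of the four `bbox` values is easy to mix up, a small check before searching can help. The sketch below assumes the usual STAC order (west, south, east, north) in degrees. The helper name `make_bbox` is hypothetical, and for simplicity it rejects boxes that cross the antimeridian.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]


def make_bbox(west: float, south: float, east: float, north: float) -> BBox:
    """Validate and return a (west, south, east, north) bounding box in degrees."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie between -180 and 180")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError("latitudes must lie between -90 and 90")
    if south > north:
        raise ValueError("south must not exceed north")
    if west > east:
        # Simplification: boxes crossing the antimeridian are not supported here.
        raise ValueError("west must not exceed east")
    return (west, south, east, north)


# Coordinates of the Innsbruck area used in this tutorial.
innsbruck_bbox = make_bbox(11.124756, 47.311058, 11.459839, 47.463624)
print(innsbruck_bbox)
```

The returned tuple can be passed directly as the `bbox` argument of a search.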

In [None]:
bbox_search = client.search( # searches for items intersecting the bounding box
    bbox=(
        11.124756, 47.311058, # bottom-left corner (min lon, min lat)
        11.459839, 47.463624  # top-right corner (max lon, max lat)
        )
)

innsbruck_sets=list_found_elements(bbox_search) # we apply our helper function, which returns the item ids and collection ids

#Results
print("🔍 Search Results:")
print('🛰️ Collections Represented:','(',len(set(innsbruck_sets[1])),')',set(innsbruck_sets[1]))
print('📊 Total Items Found:  ',len(innsbruck_sets[0]))

Based on our search within the defined area of interest (as of the latest update of this tutorial), we can see that, out of the initial 11 collections, 9 contain `zarr` encoded items that intersect the specified coordinates.<br>

This gives us a clear picture of the data density and variety available for our Area of Interest (AOI).
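
To go one step further than counting distinct collections, the number of items per collection can be tallied with `collections.Counter`. This is a minimal sketch: the list below is a hypothetical stand-in for the second list returned by `list_found_elements`, and the counts are made up.

```python
from collections import Counter

# Hypothetical collection ids, one entry per item found (made-up data).
collection_ids = [
    "sentinel-2-l2a",
    "sentinel-3-slstr-l2-lst",
    "sentinel-2-l2a",
]

# Count how many items belong to each collection.
items_per_collection = Counter(collection_ids)

for collection_id, count in items_per_collection.most_common():
    print(f"{collection_id}: {count}")
```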

### Temporal Extent

Filtering data by a specific time interval is also very useful. The `datetime` parameter of `client.search()` allows us to focus on imagery captured within a particular period.<br>
We will define an interval that spans, for example, from May 1, 2020 to May 31, 2023.
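
The interval is written as two RFC 3339 timestamps separated by `/`. Rather than typing the string by hand, it could be built from `datetime` objects, as in this minimal sketch. The helper name `to_stac_interval` is hypothetical, and it formats to whole seconds only.

```python
from datetime import datetime, timezone


def to_stac_interval(start: datetime, end: datetime) -> str:
    """Format two timezone-aware datetimes as a 'start/end' string in UTC."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("both datetimes must be timezone-aware")
    if start > end:
        raise ValueError("start must not be later than end")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    # Convert to UTC first so the trailing 'Z' is always correct.
    return "/".join(d.astimezone(timezone.utc).strftime(fmt) for d in (start, end))


interval = to_stac_interval(
    datetime(2020, 5, 1, tzinfo=timezone.utc),
    datetime(2023, 5, 31, 23, 59, 59, tzinfo=timezone.utc),
)
print(interval)  # 2020-05-01T00:00:00Z/2023-05-31T23:59:59Z
```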


In [None]:
time_frame = client.search( # inside the catalog
    datetime="2020-05-01T00:00:00Z/2023-05-31T23:59:59.999999Z")  # the interval we are interested in, separated by '/'

time_items=list_found_elements(time_frame) #we apply our constructed function

#Results
print("🔍 Search Results:")
print('🛰️ Collections Represented','(',len(set(time_items[1])),')',set(time_items[1]))
print('🕐 Total Items Found:  ',len(time_items[0]))


### Combined Search

Now, we can explore how to refine our search even further by combining multiple criteria. <br>
This capability is incredibly powerful for pinpointing precisely the data we need. For instance, we can search for items within a specific time frame and from a particular collection simultaneously.<br>
We will focus our attention on the Sentinel-2 L2A Collection for this tutorial.

#### Collection + Temporal extent

We set the `collections` argument to the collection `id` defined inside the Catalogue. As a second parameter, we reuse the `datetime` interval we had previously chosen.

In [None]:
sentinel2 = client.search(
    collections= ['sentinel-2-l2a'], # the collection we are interested in
    datetime="2020-05-01T00:00:00Z/2023-05-31T23:59:59.999999Z"
)

multiple_items=list_found_elements(sentinel2) #we apply our constructed function

print("🔍 Search Results:")
print('🕐 Total Items Found for Sentinel-2 L-2A:  ',len(multiple_items[0]))

Note that these results come **exclusively** from the **Sentinel-2 Level-2A** collection and can be refined in a further step.

#### Collection + Temporal + Spatial extents
A usual workflow in EO analysis involves retrieving datasets within an AOI and a time frame. `pystac_client` allows us to combine the `collections`, `bbox` and `datetime` arguments for a fine-grained data retrieval.

Searching over Innsbruck for the **Sentinel-2 Level-2A** collection, this time with the time frame extended to May 31, 2025:

In [None]:
innsbruck_s2 = client.search( 
    collections= ['sentinel-2-l2a'], # interest Collection,
    bbox=(11.124756, 47.311058, # AOI extent
          11.459839,47.463624),
    datetime='2020-05-01T00:00:00Z/2025-05-31T23:59:59.999999Z' # interest period
)

combined_ins =list_found_elements(innsbruck_s2)

print("🔍 Search Results:")
print('🕐 Total Items Found for Sentinel-2 L-2A over Innsbruck:  ',len(combined_ins[0]))

We can try the same workflow for a completely different location. <br>

Let's define a new AOI over the coastal area of Rostock, Germany, and search the Sentinel-3 SLSTR-L2 collection:

In [None]:
rostock_s3 = client.search(
    bbox=(11.766357,53.994566, # AOI extent
          12.332153,54.265086),
    collections= ['sentinel-3-slstr-l2-lst'], # interest Collection
    datetime='2020-05-01T00:00:00Z/2025-05-31T23:59:59.999999Z' # interest period
)

combined_ros=list_found_elements(rostock_s3)

print("🔍 Search Results:")
print('🕐 Total Items Found for Sentinel-3 SLSTR-L2 over Rostock Coast:  ',len(combined_ros[0]))
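
Since the Innsbruck and Rostock searches share the same structure, their filters could be gathered by a small helper before calling the search. This is only a sketch: `build_search_params` is a hypothetical name, and it just assembles a dictionary of keyword arguments without contacting the API.

```python
from typing import Any, Dict, Optional, Sequence


def build_search_params(
    collections: Optional[Sequence[str]] = None,
    bbox: Optional[Sequence[float]] = None,
    datetime: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect only the filters that were provided into a keyword dictionary."""
    params: Dict[str, Any] = {}
    if collections:
        params["collections"] = list(collections)
    if bbox is not None:
        if len(bbox) != 4:
            raise ValueError("bbox needs four values: west, south, east, north")
        params["bbox"] = tuple(bbox)
    if datetime:
        params["datetime"] = datetime
    return params


rostock_params = build_search_params(
    collections=["sentinel-3-slstr-l2-lst"],
    bbox=(11.766357, 53.994566, 12.332153, 54.265086),
    datetime="2020-05-01T00:00:00Z/2025-05-31T23:59:59.999999Z",
)
print(rostock_params)
```

With the `client` defined earlier, the dictionary could then be unpacked as `client.search(**rostock_params)`.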

### Location in Catalogue

So far, we have searched the STAC catalog and browsed the general metadata of the collections. To **download** or open the actual `zarr` Items, we need the URL of their storage location in the cloud.<br>
This URL can be found through the `.get_assets()` method of each item. Its `media_type` argument lets us keep only the assets of the `MediaType` we are interested in, in our case `MediaType.ZARR`.

Defining our search again in Innsbruck:

In [None]:
print("🔍 Search Results:")
print('🕐 Total Items Found for Sentinel-2 L-2A over Innsbruck:  ',len(combined_ins[0]))

In [None]:
c_sentinel2 = client.get_collection('sentinel-2-l2a')

assets_loc=[]
for item_id in combined_ins[0]: # We loop over the ids of the Innsbruck filtered items
    assets_loc.append(c_sentinel2 # we look inside the Sentinel-2 L-2A collection
                      .get_item(item_id)  # we retrieve the item by its id
                      .get_assets(media_type=MediaType.ZARR)) # we keep only the .zarr assets
    
print("🔍 Search Results:")
print('🔗 URL for the',combined_ins[0][0],'item:  ',assets_loc[0]['product'].href)
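
Each entry returned by `.get_assets()` is a dictionary that maps an asset key to an asset object, and the storage URL is held in the asset's `href` attribute. The sketch below shows how all URLs of one item could be gathered. The helper `asset_hrefs` is hypothetical, and `fake_assets` is a stand-in with made-up keys and URLs so that the example runs without the API.

```python
from types import SimpleNamespace
from typing import Any, Dict, Mapping


def asset_hrefs(assets: Mapping[str, Any]) -> Dict[str, str]:
    """Map each asset key to the href of the asset stored under it."""
    return {key: asset.href for key, asset in assets.items()}


# Hypothetical stand-in for the dictionary returned by get_assets();
# the second key and both URLs are made up for illustration.
fake_assets = {
    "product": SimpleNamespace(href="https://example.com/data/item-a.zarr"),
    "other": SimpleNamespace(href="https://example.com/data/item-a.zarr/measurements"),
}

hrefs = asset_hrefs(fake_assets)
print(hrefs["product"])
```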

## 💪 Now it is your turn

These exercises will help you master the STAC API and understand how to find the data you need.



#### 1. Explore Your Own Area of Interest

**Your Task**: 
1. Go to [http://bboxfinder.com/](http://bboxfinder.com/) and select an area of interest (your hometown, a research site, etc.)
2. Copy the bounding box coordinates of your interest
3. Change the provided code to search for data over your area of interest

#### 2. Temporal Analysis

**Your Task**: 
1. Compare data availability across different years (2022, 2023, 2024)

##### 3. 