# Discovering Earth Observation Data with the EO-MQS

The EO-MQS service is hosted within the C-SCALE federated cloud infrastructure and provides a unified way of discovering Copernicus data available within the federation by making use of the SpatioTemporal Asset Catalog (STAC) specification. The purpose of this notebook is to provide a concise introduction to using open-source Python libraries to search for geospatial data exposed by the EO-MQS STAC API.

## Prerequisites

In this example, we are going to use `pystac-client`, a popular STAC client for Python. The library is already installed in this environment, and can be installed anywhere else via `pip install pystac-client`.
Alternatively, general-purpose HTTP libraries such as `requests` are equally well suited, since the STAC API is served over plain HTTP.

To get started, we need to import the `Client` class to connect to the EO-MQS which exposes its STAC API under `https://eo-mqs.c-scale.eu/stac/v1`.

In [69]:
try:
    from pystac_client import Client
except ImportError:
    %pip install pystac-client
    from pystac_client import Client

client = Client.open("https://eo-mqs.c-scale.eu/stac/v1")


In [70]:
client.title

'C-SCALE Earth Observation Metadata Query Service (EO-MQS)'
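As mentioned in the prerequisites, the same information can be reached over plain HTTP. The following is a minimal sketch, not part of the original workflow: it uses the standard library's `urllib` instead of `requests` to stay dependency-free, the helper names (`endpoint`, `fetch_json`, `fetch_title`) are hypothetical, and it assumes the landing page is a JSON document carrying a `title` field. No request is sent unless `fetch_title()` is called.

```python
import json
from urllib.request import urlopen

STAC_ROOT = "https://eo-mqs.c-scale.eu/stac/v1"


def endpoint(root: str, *parts: str) -> str:
    """Join an API root and path segments without doubling slashes."""
    return "/".join([root.rstrip("/"), *[p.strip("/") for p in parts]])


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Download a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def fetch_title(root: str = STAC_ROOT) -> str:
    """Return the landing page title (performs a network request)."""
    return fetch_json(root).get("title", "")


# Building the URL is offline; only fetch_json/fetch_title touch the network.
print(endpoint(STAC_ROOT, "collections"))
```

Calling `fetch_title()` in a connected environment should return the same string as `client.title`, provided the assumption about the `title` field holds.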

## CollectionClient

The client can be used to iterate through the Collections available in the EO-MQS Catalog. 

The `get_collections` method fetches the collections from the `/collections` endpoint and returns an iterable. To load a particular collection for further use we call the `get_collection(<collection_id>)` method below.

In [71]:
for collection in client.get_collections():
    print(collection)

<CollectionClient id=NCG-INGRID-PT|SENTINEL2_L1C_INCD>
<CollectionClient id=EODC|SENTINEL2_L2A>
<CollectionClient id=EODC|SENTINEL2_GRI_L1C>
<CollectionClient id=EODC|GFM>
<CollectionClient id=EODC|SENTINEL1_HPAR>
<CollectionClient id=EODC|DOP_AUT_K_KLAGENFURT>
<CollectionClient id=EODC|DOP_AUT_K_OSTTIROL>
<CollectionClient id=EODC|DOP_AUT_K_TAMSWEG>
<CollectionClient id=EODC|DOP_AUT_K_VILLACH>
<CollectionClient id=EODC|DOP_AUT_K_WOLFSBERG>
<CollectionClient id=EODC|DOP_AUT_K_ZELL_AM_SEE>
<CollectionClient id=EODC|DOP_AUT_K_ZELTWEG>
<CollectionClient id=EODC|COP_DEM>
<CollectionClient id=EODC|SENTINEL1_SLC>
<CollectionClient id=EODC|SENTINEL1_MPLIA>
<CollectionClient id=EODC|SENTINEL1_SIG0_20M>
<CollectionClient id=EODC|AI4SAR_SIG0>
<CollectionClient id=EODC|SENTINEL1_GRD>
<CollectionClient id=EODC|SENTINEL2_L1C>
<CollectionClient id=EODC|SENTINEL3_SRAL_L2>
<CollectionClient id=EODC|SENTINEL1_GRD_COVERAGE>
<CollectionClient id=EODC|INTRA_FIELD_CROP_GROWTH_POTENTIAL>
<CollectionClient i

On static as well as dynamic catalogs, we can also make use of the `links` attribute, which lets us quickly examine, for instance, the number of available collections.

In [72]:
child_links = client.get_links('child')
print(f"The EO-MQS currently features {len(child_links)} collections.")

The EO-MQS currently features 235 collections.
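Under the hood, a catalog document carries a `links` array in which every entry has a `rel` type, and counting collections amounts to counting the entries with `rel` equal to `child`. A small sketch on a hand-written catalog dictionary (the URLs are made up and `count_links` is a hypothetical helper):

```python
from collections import Counter


def count_links(catalog: dict) -> Counter:
    """Count the links of a catalog dictionary by their `rel` type."""
    return Counter(link.get("rel") for link in catalog.get("links", []))


# Made-up catalog with one self link and two child links.
sample_catalog = {
    "links": [
        {"rel": "self", "href": "https://example.org/stac"},
        {"rel": "child", "href": "https://example.org/stac/collections/a"},
        {"rel": "child", "href": "https://example.org/stac/collections/b"},
    ]
}

counts = count_links(sample_catalog)
print(f"child links: {counts['child']}")
```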


In [73]:
collection = client.get_collection("EODC|SENTINEL2_L1C")
collection

There are many ways to access the collection metadata programmatically. 

In [74]:
print(f"This collection contains data in the following temporal interval: {collection.extent.temporal.to_dict()}")

This collection contains data in the following temporal interval: {'interval': [['2015-07-04T00:00:00Z', None]]}
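The `None` in the second position marks an open-ended interval, i.e. the collection is still being extended. A minimal sketch for turning such an extent dictionary into a readable summary; the helper names are hypothetical, and only the first interval (the overall extent) is considered:

```python
from datetime import datetime
from typing import Optional


def parse_stac_datetime(value: Optional[str]) -> Optional[datetime]:
    """Parse an RFC 3339 timestamp; None stands for an open interval end."""
    if value is None:
        return None
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def describe_interval(extent: dict) -> str:
    """Summarize the first interval of a temporal extent dictionary."""
    start, end = (parse_stac_datetime(v) for v in extent["interval"][0])
    start_text = start.date().isoformat() if start else "open start"
    end_text = end.date().isoformat() if end else "ongoing"
    return f"{start_text} to {end_text}"


extent = {"interval": [["2015-07-04T00:00:00Z", None]]}
print(describe_interval(extent))
```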


In [75]:
# To verify this extent, we can calculate the actual limits like this:
# collection.update_extent_from_items()

In [76]:
# collection.extent.temporal.to_dict()

In [77]:
# Check which STAC Extensions are used by the collection
collection.stac_extensions

['https://stac-extensions.github.io/sat/v1.0.0/schema.json',
 'https://stac-extensions.github.io/eo/v1.0.0/schema.json',
 'https://stac-extensions.github.io/projection/v1.1.0/schema.json',
 'https://stac-extensions.github.io/alternate-assets/v1.1.0/schema.json',
 'https://stac-extensions.github.io/item-assets/v1.0.0/schema.json',
 'https://stac-extensions.github.io/datacube/v2.0.0/schema.json',
 'https://stac-extensions.github.io/timestamps/v1.0.0/schema.json',
 'https://stac-extensions.github.io/processing/v1.1.0/schema.json']

## STAC Items

As before, we can use the collection client instance to iterate over the items contained in the collection. The server must provide the `/collections/<collection_id>/items` endpoint to support this feature automatically. This can be useful to manually filter items or extract information programmatically. The `get_all_items()` method again returns an iterator.

In [78]:
items = collection.get_all_items()

In [79]:
# Load 10 items with cloud cover less than 10%
items10 = []
for item in items:
    if len(items10) == 10:
        break
    cloud_cover = item.properties.get("eo:cloud_cover")
    # Skip items without a cloud cover value; comparing None with a number raises a TypeError
    if cloud_cover is not None and cloud_cover < 10:
        print(f"Append item {item.id} with {cloud_cover:.2f}% cloud cover")
        items10.append(item)

Append item S2B_MSIL1C_20240301T124939_R138_T30WWE_20240301T134634 with 0.73% cloud cover
Append item S2B_MSIL1C_20240301T124939_R138_T30WVD_20240301T134634 with 9.63% cloud cover
Append item S2A_MSIL1C_20240301T123851_R066_T23ENN_20240301T140054 with 1.00% cloud cover
Append item S2A_MSIL1C_20240301T123851_R066_T23EMP_20240301T140054 with 5.11% cloud cover
Append item S2A_MSIL1C_20240301T123851_R066_T23EMN_20240301T140054 with 4.77% cloud cover
Append item S2A_MSIL1C_20240301T123851_R066_T23ELP_20240301T140054 with 2.23% cloud cover
Append item S2A_MSIL1C_20240301T123851_R066_T23ELN_20240301T140054 with 0.09% cloud cover
Append item S2A_MSIL1C_20240301T123851_R066_T22EFU_20240301T140054 with 6.46% cloud cover
Append item S2A_MSIL1C_20240301T123851_R066_T22EFT_20240301T140054 with 0.00% cloud cover
Append item S2A_MSIL1C_20240301T123851_R066_T22EET_20240301T140054 with 0.00% cloud cover
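The same client-side filtering can be written as a generator combined with `itertools.islice`, which stops pulling items from the iterator as soon as enough matches are found. The sketch below runs on stand-in objects rather than real STAC items; `low_cloud_items` is a hypothetical helper and only relies on each item having `id` and `properties` attributes.

```python
from itertools import islice
from types import SimpleNamespace


def low_cloud_items(items, max_cloud_cover=10.0):
    """Yield items whose eo:cloud_cover is present and below the threshold."""
    for item in items:
        cloud_cover = item.properties.get("eo:cloud_cover")
        if cloud_cover is not None and cloud_cover < max_cloud_cover:
            yield item


# Stand-ins that mimic the `id` and `properties` attributes of an item.
fake_items = [
    SimpleNamespace(id="a", properties={"eo:cloud_cover": 3.2}),
    SimpleNamespace(id="b", properties={}),  # no cloud cover recorded
    SimpleNamespace(id="c", properties={"eo:cloud_cover": 55.0}),
    SimpleNamespace(id="d", properties={"eo:cloud_cover": 0.0}),
]

# Take at most two matches; the generator is not consumed any further.
first_two = list(islice(low_cloud_items(fake_items), 2))
print([item.id for item in first_two])
```

With the real iterator from above, `list(islice(low_cloud_items(items), 10))` would play the role of `items10`.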


In [80]:
items10[-1]

In [81]:
# If the item provides a preview image, we can display it here using the following code
from IPython.display import Image

Image(url=items10[7].assets["thumbnail"].href, width=500)

## Item Search

Data providers that have realized their STAC implementation in terms of a dynamic STAC API offer users the opportunity to search their Catalogs using spatial and temporal constraints. `pystac-client` exposes this search through the `Client.search` method, which returns an `ItemSearch` instance from which the matched items can be retrieved.

Note that in its current implementation, the EO-MQS supports the *core* STAC search endpoint parameters as described in the [STAC API - Item Search](https://github.com/radiantearth/stac-api-spec/tree/master/item-search#query-parameter-table) specification. Those are:
- limit
- bbox
- datetime
- intersects
- ids
- collections


### Example 1: Search for Sentinel-1 GRD data over Austria (bbox) in 2022

This first example makes use of the `bbox`, `datetime` and the `collections` parameters. Learn about the correct formatting of these values on the STAC Spec GitHub page or by looking at the [pystac-client docs](https://pystac-client.readthedocs.io/en/latest/api.html#item-search).

In [82]:
# We can iterate and grep the collections automatically or look them up in the browser or API
s1_collections = []
for collection in client.get_collections():
    if "grd" in collection.id.lower():
        print(f"Append collection {collection.id} to list of Sentinel-1 collections.")
        s1_collections.append(collection.id)

# manually add collections as required
s1_collections.append("CREODIAS|SENTINEL-1")

Append collection EODC|SENTINEL1_GRD to list of Sentinel-1 collections.
Append collection EODC|SENTINEL1_GRD_COVERAGE to list of Sentinel-1 collections.
Append collection CollGS_CZ|sentinel-1-grd to list of Sentinel-1 collections.
Append collection VITO|urn:eop:VITO:CGS_S1_GRD_L1 to list of Sentinel-1 collections.
Append collection VITO|urn:eop:VITO:CGS_S1_GRD_SIGMA0_L1 to list of Sentinel-1 collections.


If you do not have bbox coordinates at hand, you can quickly create your region of interest at [geojson.io](https://geojson.io).

In [83]:
bbox_aut = [9.25, 46.31, 17.46, 49.18]
time_period = "2022-01-01/2022-10-31"
limit = 20 # limit the number of items to be returned (per data provider)
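If the region of interest is available as a GeoJSON geometry, for example one drawn at geojson.io, the bounding box can be derived from its coordinates. The following is a minimal sketch with hypothetical helper names; it assumes a geometry with a `coordinates` member (so no `GeometryCollection`) and does not handle geometries crossing the antimeridian.

```python
def iter_positions(coordinates):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if isinstance(coordinates[0], (int, float)):
        # Reached a single position: [x, y] or [x, y, z]
        yield coordinates[0], coordinates[1]
    else:
        for part in coordinates:
            yield from iter_positions(part)


def bbox_from_geometry(geometry: dict) -> list:
    """Return [min_x, min_y, max_x, max_y] for a GeoJSON geometry."""
    xs, ys = zip(*iter_positions(geometry["coordinates"]))
    return [min(xs), min(ys), max(xs), max(ys)]


# A rectangle matching the Austrian bounding box used in this example.
polygon = {
    "type": "Polygon",
    "coordinates": [[[9.25, 46.31], [17.46, 46.31], [17.46, 49.18],
                     [9.25, 49.18], [9.25, 46.31]]],
}
print(bbox_from_geometry(polygon))
```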

In [84]:
# put together the search dictionary
search1 = {"collections": s1_collections,
           "bbox": bbox_aut,
           "datetime": time_period,
           "limit": limit}    

In [85]:
results1 = client.search(**search1)

In [86]:
items = results1.item_collection()

print(f"We found {len(items)} matching items.")

We found 20 matching items.


#### Tips on how to increase match rate

**NOTE:** Depending on the backend implementation, the `collections` parameter might have restrictions on allowed values. To make sure all possible items are fetched, consider iterating over the collections in our list and issuing a separate request for each of them.

In [87]:
items_list = []
for s1_collection in s1_collections:
    results = client.search(collections=[s1_collection], 
                            bbox=bbox_aut, 
                            datetime=time_period, 
                            limit=limit)
    try:
        items_list.extend(results.item_collection())
    except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
        print(f"Search for items with collection id {s1_collection} failed or no items found.")


Search for items with collection id EODC|SENTINEL1_GRD_COVERAGE failed or no items found.
Search for items with collection id CollGS_CZ|sentinel-1-grd failed or no items found.


In [88]:
print(f"Now, we found {len(items_list)} matching items.")

Now, we found 80 matching items.


**NOTE:** By default, `Client.search` uses the HTTP method *POST* when making requests to the STAC API. Some data providers only allow *GET* requests, so datasets that are actually available might remain undetected. *Hint: specify the method explicitly.*

In [89]:
items_list = []
for s1_collection in s1_collections:
    results = client.search(collections=[s1_collection], 
                            bbox=bbox_aut, 
                            datetime=time_period, 
                            limit=limit,
                            method="GET")
    try:
        items_list.extend(results.item_collection())
    except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
        print(f"Search for items with collection id {s1_collection} failed.")


Search for items with collection id EODC|SENTINEL1_GRD_COVERAGE failed.


In [90]:
print(f"Now, we found {len(items_list)} matching items.")

Now, we found 100 matching items.
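Both tips can be combined: query every collection separately and, for each one, try the HTTP methods in turn until a request succeeds. The sketch below is one possible way to structure this and is demonstrated with a stand-in search function, so it runs without network access; `search_each` and `fake_search` are hypothetical names.

```python
def search_each(search_fn, collection_ids, methods=("POST", "GET"), **params):
    """Query collections one by one, trying each HTTP method until one works."""
    found, failed = [], []
    for collection_id in collection_ids:
        for method in methods:
            try:
                found.extend(
                    search_fn(collections=[collection_id], method=method, **params)
                )
                break  # this method worked, move on to the next collection
            except Exception:
                continue  # try the next method
        else:
            # The inner loop ended without break: every method failed.
            failed.append(collection_id)
    return found, failed


def fake_search(collections, method, **params):
    """Stand-in for a real search: one backend is GET-only, one is broken."""
    collection_id = collections[0]
    if collection_id == "get-only" and method != "GET":
        raise RuntimeError("method not allowed")
    if collection_id == "broken":
        raise RuntimeError("backend unavailable")
    return [f"{collection_id}-item"]


found, failed = search_each(fake_search, ["post-ok", "get-only", "broken"], limit=20)
print(found, failed)
```

With the real client, `search_fn` could be `lambda **kwargs: client.search(**kwargs).item_collection()`, called with the `bbox`, `datetime` and `limit` values from above. Note that a successful request returning zero items is not treated as a failure.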


In [91]:
print(items_list[0].id)
Image(url=items_list[0].assets["thumbnail"].href, width=500)

S1A_IW_GRDH_1SDV_20221031T170811_20221031T170836_045689_0576D7


### Example 2: Search for Sentinel-2 data intersecting a GeoJSON object

The second example makes use of the `intersects` and the `collections` parameters. Note that you cannot specify both `bbox` and `intersects`; doing so will result in an error.

We make use of a GeoJSON file from EODC's OGC Features API to find data over Spain and Portugal.

In [92]:
from owslib.ogcapi.features import Features
try:
    import geopandas as gpd
except ImportError:
    %pip install geopandas
    import geopandas as gpd  # keep the gpd alias that is used below
from shapely import to_geojson

eodc_ogc = 'https://features.services.eodc.eu/'
eodc_ogc_features = Features(eodc_ogc)

cql_filter = "iso3='ESP' OR iso3='PRT'"

field_items = eodc_ogc_features.collection_items("world_administrative_boundaries", filter=cql_filter, limit=100)

europe = gpd.GeoDataFrame.from_features(field_items["features"], crs="EPSG:4326")

geom = to_geojson(europe.geometry.unary_union)

In [93]:
geom

'{"type":"MultiPolygon","coordinates":[[[[-18.156109999999956,27.705280000000073],[-18.169859999999915,27.735760000000084],[-18.167499999999905,27.753610000000037],[-18.16110999999995,27.76194000000004],[-18.146109999999908,27.76944000000009],[-18.132229999999936,27.77264000000008],[-18.11319999999995,27.76194000000004],[-18.06110999999993,27.755970000000048],[-18.039999999999907,27.76222000000007],[-18.015279999999905,27.790550000000053],[-18.001039999999932,27.816110000000037],[-17.931109999999933,27.848610000000065],[-17.907499999999914,27.848610000000065],[-17.898889999999938,27.842500000000086],[-17.890559999999937,27.82944000000009],[-17.883679999999913,27.81687000000005],[-17.883479999999906,27.79722000000004],[-17.903469999999913,27.780550000000062],[-17.91124999999994,27.773750000000064],[-17.963889999999935,27.682360000000074],[-17.982779999999934,27.637500000000045],[-18.014449999999954,27.649440000000084],[-18.156109999999956,27.705280000000073]]],[[[-17.29860999999994,28.0

In [94]:
s2_collections = []
for collection in client.get_collections():
    if "l1c" in collection.id.lower():
        print(f"Append collection {collection.id} to list of Sentinel-2 L1C collections.")
        s2_collections.append(collection.id)


Append collection NCG-INGRID-PT|SENTINEL2_L1C_INCD to list of Sentinel-2 L1C collections.
Append collection EODC|SENTINEL2_GRI_L1C to list of Sentinel-2 L1C collections.
Append collection EODC|SENTINEL2_L1C to list of Sentinel-2 L1C collections.
Append collection EODC|SENTINEL2_L1C_COVERAGE to list of Sentinel-2 L1C collections.
Append collection CollGS_CZ|sentinel-2-l1c to list of Sentinel-2 L1C collections.
Append collection CollGS_CZ|sentinel-2-l1c-2023 to list of Sentinel-2 L1C collections.
Append collection VITO|urn:eop:VITO:CGS_S2_L1C to list of Sentinel-2 L1C collections.
Append collection VITO|urn:eop:VITO:PROBAV_L1C_HDF_V2 to list of Sentinel-2 L1C collections.


We pick one that supports POST requests, for instance *EODC|SENTINEL2_L1C*.

In [95]:
search2 = {"collections": ["EODC|SENTINEL2_L1C"],
           "intersects": geom,
           "limit": limit,
           "method": "POST"}    

In [96]:
results2 = client.search(**search2)

In [97]:
items = results2.item_collection()

print(f"We found {len(items)} matching items.")

We found 20 matching items.


In [98]:
items[0]

**NOTE:** You can always visualize STAC data (collections, items, etc.) in external tools such as the STAC Browser, for instance as follows:

In [99]:
print(f"Look at this item in the STAC Browser: https://radiantearth.github.io/stac-browser/#/external/{items[0].get_self_href()}")

Look at this item in the STAC Browser: https://radiantearth.github.io/stac-browser/#/external/https://eo-mqs.c-scale.eu/stac/v1/collections/EODC|SENTINEL2_L1C/items/S2B_MSIL1C_20240301T110859_R137_T30TWP_20240301T131141


### Example 3: Use non-default search parameters (experimental!)

As mentioned, the EO-MQS officially does not support parameters that are not part of the STAC API core specifications. However, when realizing a STAC implementation at a data provider's site, we usually make use of open-source libraries that are constantly being developed and improved. An example of such an improvement is the addition of search parameters like `filter`, `sortby` or `fields`.

This section hints at what can be done using these additional parameters.

In [100]:
# let's re-use a sentinel-2 collection from before
search3 = {"collections": s2_collections[2], #'EODC|SENTINEL2_L1C'
           "limit": limit,
           "method": "GET"}    

In [101]:
# sort the results in a descending manner based on the datetime property
sortby = "-properties.datetime"
search_sort = dict(search3)  # copy, so that search3 itself stays unchanged
search_sort["sortby"] = sortby

In [102]:
results3 = client.search(**search_sort)
items = results3.item_collection()

In [103]:
print(f"First item {items[0]} has datetime {items[0].get_datetime()}")
print(f"Last item {items[-1]} has datetime {items[-1].get_datetime()}")

First item <Item id=S2A_MSIL1C_20240303T055741_R091_T48XWH_20240303T061256> has datetime 2024-03-03 05:57:41.024000+00:00
Last item <Item id=S2A_MSIL1C_20240303T055741_R091_T47WMT_20240303T061256> has datetime 2024-03-03 05:57:41.024000+00:00
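The string form used here prefixes each field with `-` for descending or `+` (or nothing) for ascending order. In the sort extension of the STAC API, this shorthand belongs to *GET* requests, while *POST* bodies carry a list of objects with `field` and `direction` keys. The sketch below illustrates the mapping between the two; `parse_sortby` is a hypothetical helper and not the conversion code of `pystac-client` itself.

```python
def parse_sortby(value: str) -> list:
    """Convert "+field,-field" shorthand into the list-of-objects form."""
    rules = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        direction = "desc" if part.startswith("-") else "asc"
        rules.append({"field": part.lstrip("+-"), "direction": direction})
    return rules


print(parse_sortby("-properties.datetime"))
```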


In [104]:
# filter based on specific property, e.g. cloud cover
filterby = {
  "filter": {
    "op": "and",
    "args": [
      {
        "op": "<",
        "args": [
          {
            "property": "eo:cloud_cover"
          },
          10
        ]
      }
    ]
  }
}
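search_filter = dict(search3)  # copy, so that search3 itself stays unchanged
search_filter["sortby"] = "-properties.eo:cloud_cover"
# pass the CQL2 expression itself, i.e. the value stored under the "filter" key
search_filter["filter"] = filterby["filter"]
search_filter["method"] = "POST"
search_filter["limit"] = 100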

In [105]:
results3 = client.search(**search_filter)
items = results3.item_collection()

In [106]:
print(f"First item {items[0]} has cloud cover {items[0].properties.get('eo:cloud_cover')}%")
print(f"Last item {items[-1]} has cloud cover {items[-1].properties.get('eo:cloud_cover')}%")


First item <Item id=S2B_MSIL1C_20240301T000749_R130_T49CDQ_20240301T022654> has cloud cover 9.9916444890407%
Last item <Item id=S2B_MSIL1C_20240229T154309_R125_T08CMU_20240229T213657> has cloud cover 9.01783044065629%
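Writing CQL2 JSON expressions by hand quickly gets verbose. Two small helper functions can assemble them, as sketched below; `compare` and `all_of` are hypothetical names, and whether a given data provider evaluates such a filter still depends on its backend, as noted at the beginning of this example.

```python
import json


def compare(op: str, prop: str, value) -> dict:
    """Build a CQL2 JSON comparison such as `prop < value`."""
    return {"op": op, "args": [{"property": prop}, value]}


def all_of(*conditions: dict) -> dict:
    """Combine conditions with a logical AND; a single one is returned as is."""
    if len(conditions) == 1:
        return conditions[0]
    return {"op": "and", "args": list(conditions)}


# Cloud cover between 5% (inclusive) and 10% (exclusive)
cloud_filter = all_of(
    compare(">=", "eo:cloud_cover", 5),
    compare("<", "eo:cloud_cover", 10),
)
print(json.dumps(cloud_filter, indent=2))
```

The resulting dictionary is the expression that would be passed as the `filter` argument of `client.search`.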
