# STAC


When you ingest a dataset into EOTDL, a `catalog.parquet` file containing the dataset's metadata is created. This metadata is STAC-compliant, so it can be queried through the STAC API and used to generate STAC catalogs.


# STAC Catalogs


As seen before, the following code ingests a dataset into EOTDL and creates a `catalog.parquet` file with its metadata. Since this was covered in the previous notebook, we won't run it again, but feel free to do so!


In [None]:
## Uncomment to run

# from eotdl.datasets import ingest_dataset

# path = "example_data/EuroSAT-small"
# ingest_dataset(path)

During ingestion, a `catalog.parquet` file with STAC metadata is created. If your dataset already has STAC metadata (a `catalog.json` file at the root of the dataset), that metadata is parsed and added to `catalog.parquet`. Otherwise, EOTDL generates STAC-compatible metadata from the directory structure.
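The directory-based fallback can be pictured with a minimal sketch. This is not EOTDL's implementation. `items_from_directory` and `sha1_of` are hypothetical helpers that build one STAC-like record per file. Each record mirrors the columns of the `catalog.parquet` output shown next: the `id` is the path relative to the dataset root, and a single `asset` carries a checksum, size and timestamp. The 40-character checksums in this notebook are consistent with SHA-1. For example, the `hello.txt` checksum matches the SHA-1 of `"hello\n"`, which is 6 bytes long. Using SHA-1 here is therefore an assumption based on that match.

```python
import hashlib
from datetime import datetime
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-1 hex digest of a file, read in chunks so large rasters need not fit in memory."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def items_from_directory(root: str) -> list:
    """Hypothetical sketch: build one STAC-like record per file under `root`."""
    root_path = Path(root)
    records = []
    for path in sorted(p for p in root_path.rglob("*") if p.is_file()):
        rel_id = path.relative_to(root_path).as_posix()  # e.g. "Forest/Forest_1.tif"
        if rel_id == "catalog.parquet":
            continue  # skip the metadata file itself
        now = datetime.now().isoformat()
        records.append({
            "type": "Feature",
            "stac_version": "1.0.0",
            "stac_extensions": [],
            "datetime": now,
            "id": rel_id,
            # Plain files carry no spatial information, so the bbox is all
            # zeros and the geometry is left empty (None in this sketch).
            "bbox": {"xmax": 0.0, "xmin": 0.0, "ymax": 0.0, "ymin": 0.0},
            "geometry": None,
            "assets": {"asset": {
                "checksum": sha1_of(path),
                "size": path.stat().st_size,  # bytes on disk
                "timestamp": now,
            }},
            "links": [],
            "repository": "eotdl",
        })
    return records
```

To use it, call `items_from_directory("workshop_data/EuroSAT-small")` and inspect the `id` of each record. Hashing in chunks keeps memory flat for large GeoTIFFs.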


In [2]:
import geopandas as gpd

path = "workshop_data/EuroSAT-small"
catalog = f"{path}/catalog.parquet"

gdf = gpd.read_parquet(catalog)
gdf.head()

| | type | stac_version | stac_extensions | datetime | id | bbox | geometry | assets | links | repository |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Feature | 1.0.0 | [] | 2025-09-28 18:01:32.817591 | README.md | {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin'... | POLYGON EMPTY | {'asset': {'checksum': 'b320743f60bdc9c45b67e4... | [] | eotdl |
| 1 | Feature | 1.0.0 | [] | 2025-09-28 18:01:32.818327 | hello.txt | {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin'... | POLYGON EMPTY | {'asset': {'checksum': 'f572d396fae9206628714f... | [] | eotdl |
| 2 | Feature | 1.0.0 | [] | 2025-09-28 18:01:32.818447 | Forest/Forest_1.tif | {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin'... | POLYGON EMPTY | {'asset': {'checksum': 'f3b8b9fef6b2df6f24792e... | [] | eotdl |
| 3 | Feature | 1.0.0 | [] | 2025-09-28 18:01:32.818601 | Forest/Forest_2.tif | {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin'... | POLYGON EMPTY | {'asset': {'checksum': '2e38dab64435bfbab25bab... | [] | eotdl |
| 4 | Feature | 1.0.0 | [] | 2025-09-28 18:01:32.818748 | Forest/Forest_3.tif | {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin'... | POLYGON EMPTY | {'asset': {'checksum': '3e7bb982f9db5f7dabc556... | [] | eotdl |


Since the metadata generated by the EOTDL is STAC-compliant, it can be used to automatically generate STAC catalogs.


In [3]:
from eotdl.curation.stac import create_stac_catalog

items = create_stac_catalog(catalog)

items

100%|██████████| 8/8 [00:00<00:00, 413.54it/s]


[<Item id=README.md>,
 <Item id=hello.txt>,
 <Item id=Forest/Forest_1.tif>,
 <Item id=Forest/Forest_2.tif>,
 <Item id=Forest/Forest_3.tif>,
 <Item id=AnnualCrop/AnnualCrop_2.tif>,
 <Item id=AnnualCrop/AnnualCrop_3.tif>,
 <Item id=AnnualCrop/AnnualCrop_1.tif>]
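`create_stac_catalog` turns each catalog row into a `pystac` Item. To see roughly what such a conversion involves, here is a hypothetical, standard-library-only sketch. `to_stac_item` is not the EOTDL implementation. It only illustrates the STAC 1.0.0 Item layout, which differs from the flat rows in three ways:

- `datetime` lives under `properties`.
- `bbox` is a list ordered `[west, south, east, north]`.
- `bbox` is only required when there is a geometry.

Treating naive timestamps as UTC is an assumption.

```python
import json
from datetime import datetime, timezone


def to_stac_item(record: dict) -> dict:
    """Hypothetical helper: turn a flat catalog record into a STAC 1.0.0 Item dict."""
    dt = datetime.fromisoformat(record["datetime"])
    if dt.tzinfo is None:
        # Assumption: naive timestamps are UTC; STAC requires RFC 3339 with an offset.
        dt = dt.replace(tzinfo=timezone.utc)
    item = {
        "type": "Feature",
        "stac_version": record.get("stac_version", "1.0.0"),
        # Empty containers sometimes come back as {}, so normalise them to lists.
        "stac_extensions": list(record.get("stac_extensions") or []),
        "id": record["id"],
        "geometry": record.get("geometry"),  # GeoJSON dict, or None if non-spatial
        "properties": {"datetime": dt.isoformat()},  # STAC keeps datetime here
        "links": list(record.get("links") or []),
        "assets": record.get("assets", {}),
    }
    if item["geometry"] is not None:
        b = record["bbox"]
        # A 2D STAC bbox is ordered [west, south, east, north].
        item["bbox"] = [b["xmin"], b["ymin"], b["xmax"], b["ymax"]]
    return item


# Based on the README.md item returned by retrieve_stac_item later in this
# notebook, with its geometry replaced by None.
record = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "stac_extensions": {},
    "datetime": "2025-09-28T18:00:54.692721",
    "id": "README.md",
    "bbox": {"xmax": 0.0, "xmin": 0.0, "ymax": 0.0, "ymin": 0.0},
    "geometry": None,
    "assets": {"asset": {
        "checksum": "b320743f60bdc9c45b67e40377eeecca7e14890d",
        "href": "https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/README.md",
        "size": 279,
    }},
    "links": {},
}
print(json.dumps(to_stac_item(record), indent=2))
```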

Optionally, you can create a STAC catalog (or collection) and link the items to it by passing it to `create_stac_catalog`.


In [4]:
from eotdl.curation.stac import create_stac_catalog
import pystac

stac_catalog = pystac.Catalog(
    id="bids25-catalog",
    description="BiDS 2025 STAC Catalog created in the EOTDL tutorial",
    title="BiDS 2025 Catalog",
    stac_extensions=[],
    extra_fields={},
)

stac_catalog = create_stac_catalog(catalog, stac_catalog)

stac_catalog

100%|██████████| 8/8 [00:00<00:00, 2027.95it/s]


Once the STAC catalog has been generated, it can be saved to disk.


In [5]:
stac_catalog.normalize_and_save(
    root_href="data/stac", catalog_type=pystac.CatalogType.SELF_CONTAINED
)

Keep in mind that if the original dataset already has STAC metadata, it will be overwritten.
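In a `SELF_CONTAINED` catalog, all links are relative, so the saved folder can be moved or shared as a unit. `pystac.Catalog.from_file("data/stac/catalog.json")` reads it back. As a dependency-free illustration, the hypothetical helper `iter_item_paths` walks the saved JSON files by following the standard STAC `child` and `item` link relations. It ignores `root` and `parent` links to avoid cycles.

```python
import json
from pathlib import Path


def iter_item_paths(catalog_file):
    """Yield the paths of all Item JSON files reachable from a catalog file."""
    catalog_file = Path(catalog_file)
    data = json.loads(catalog_file.read_text())
    for link in data.get("links", []):
        # In a self-contained catalog each href is relative to the file holding the link.
        target = (catalog_file.parent / link["href"]).resolve()
        if link["rel"] == "item":
            yield target
        elif link["rel"] == "child":
            yield from iter_item_paths(target)  # descend into sub-catalogs/collections
        # "root", "parent" and "self" links are skipped to avoid walking in circles.
```

For example, `for p in iter_item_paths("data/stac/catalog.json"): print(p)` lists every saved Item file.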


# STAC API


You can interact with EOTDL via its STAC API, both with the `eotdl` CLI and the Python API.


In [9]:
!uv run eotdl stac --help


 Usage: eotdl stac [OPTIONS] COMMAND [ARGS]...

 EOTDL STAC API

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ status

We can explore the available collections from the CLI:


In [13]:
!uv run eotdl stac collections

[{'name': 'EuroSAT-small-bids25', 'id': '68d94da70839c97bf32294ec'}, {'name': 'EuroSAT-RGB-bids25', 'id': '68d94d6b0839c97bf32294eb'}, {'name': 'boadella-jsl2025', 'id': '68c7e5f8ca20b5063257cf81'}, {'name': 'EuroSAT-small-jsl2025', 'id': '68c3fbdb0839c97bf32294e6'}, {'name': 'RFInject', 'id': '68b4ab34ca20b5063257cf76'}, {'name': 'Boadella-LPS25', 'id': '6855292a28fd5bfec013f993'}, {'name': 'EuroSAT-small-lps25', 'id': '685523fc28fd5bfec013f992'}, {'name': 'HyperspectralSimForS2-waters', 'id': '684ad47b28fd5bfec013f986'}, {'name': 'SatellogicDataset', 'id': '6841551a25cf895e2fa41a48'}, {'name': 'MassachusettsRoadsS2', 'id': '683ed91a00cf9ffafe807822'}, {'name': 'EuroCropsCloudNative', 'id': '682f2d186a29eac175867330'}, {'name': 'MSC-France', 'id': '682731d2180d79b848ab04f2'}, {'name': 'ESAWAAI', 'id': '6826ee856a29eac175867327'}, {'name': 'JPL-CH4-detection', 'id': '680760267b05622170bef9ff'}, {'name': 'HYPERVIEW2', 'id': '68074b43c8575682bb134c3e'}, {'name': 'PASTIS-HD', 'id': '6800b

And from the Python library as well:


In [16]:
from eotdl.curation.stac.api import retrieve_stac_collections

retrieve_stac_collections()

[{'name': 'EuroSAT-small-bids25', 'id': '68d94da70839c97bf32294ec'},
 {'name': 'EuroSAT-RGB-bids25', 'id': '68d94d6b0839c97bf32294eb'},
 {'name': 'boadella-jsl2025', 'id': '68c7e5f8ca20b5063257cf81'},
 {'name': 'EuroSAT-small-jsl2025', 'id': '68c3fbdb0839c97bf32294e6'},
 {'name': 'RFInject', 'id': '68b4ab34ca20b5063257cf76'},
 {'name': 'Boadella-LPS25', 'id': '6855292a28fd5bfec013f993'},
 {'name': 'EuroSAT-small-lps25', 'id': '685523fc28fd5bfec013f992'},
 {'name': 'HyperspectralSimForS2-waters', 'id': '684ad47b28fd5bfec013f986'},
 {'name': 'SatellogicDataset', 'id': '6841551a25cf895e2fa41a48'},
 {'name': 'MassachusettsRoadsS2', 'id': '683ed91a00cf9ffafe807822'},
 {'name': 'EuroCropsCloudNative', 'id': '682f2d186a29eac175867330'},
 {'name': 'MSC-France', 'id': '682731d2180d79b848ab04f2'},
 {'name': 'ESAWAAI', 'id': '6826ee856a29eac175867327'},
 {'name': 'JPL-CH4-detection', 'id': '680760267b05622170bef9ff'},
 {'name': 'HYPERVIEW2', 'id': '68074b43c8575682bb134c3e'},
 {'name': 'PASTIS-HD
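As the output shows, `retrieve_stac_collections()` returns a plain list of dictionaries with `name` and `id` keys, so standard Python is enough to narrow it down. `find_collections` is a hypothetical helper, and the sample list is a subset copied from the output above.

```python
import re


def find_collections(collections: list, pattern: str) -> dict:
    """Map name -> id for collections whose name matches a regex (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return {c["name"]: c["id"] for c in collections if regex.search(c["name"])}


# A few entries from the output above; in practice, pass the
# result of retrieve_stac_collections() directly.
collections = [
    {"name": "EuroSAT-small-bids25", "id": "68d94da70839c97bf32294ec"},
    {"name": "EuroSAT-RGB-bids25", "id": "68d94d6b0839c97bf32294eb"},
    {"name": "boadella-jsl2025", "id": "68c7e5f8ca20b5063257cf81"},
    {"name": "EuroSAT-small-lps25", "id": "685523fc28fd5bfec013f992"},
]
print(find_collections(collections, r"^eurosat-small"))
```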

We can also retrieve the metadata of a specific collection by name:


In [21]:
!uv run eotdl stac collection EuroSAT-small-bids25

{'uid': 'auth0|642adbfdb3da3ab51492d60a', 'id': '68d94da70839c97bf32294ec', 'name': 'EuroSAT-small-bids25', 'metadata': {'authors': ['Fran Martín'], 'license': 'open', 'source': 'https://github.com/earthpulse/eotdl/blob/develop/tutorials/workshops/lps25/02_training.ipynb', 'description': '# EuroSAT-small-bids25\n\nThis is a toy model trained with the EuroSAT dataset for the LPS25 workshop.', 'thumbnail': 'https://images.unsplash.com/photo-1446776811953-b23d57bd21aa?q=80&w=2072&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D'}, 'versions': [{'version_id': 1, 'createdAt': '2025-07-16T15:43:28.724000', 'size': 643749}], 'tags': [], 'createdAt': '2025-09-28T17:00:55.105000', 'updatedAt': '2025-09-28T17:00:55.105000', 'likes': 0, 'downloads': 0, 'quality': 0, 'active': True, 'allowed_users': [], 'benchmark': None, 'visibility': 'public'}


In [20]:
from eotdl.curation.stac.api import retrieve_stac_collection

retrieve_stac_collection("EuroSAT-small-bids25")

{'uid': 'auth0|642adbfdb3da3ab51492d60a',
 'id': '68d94da70839c97bf32294ec',
 'name': 'EuroSAT-small-bids25',
 'metadata': {'authors': ['Fran Martín'],
  'license': 'open',
  'source': 'https://github.com/earthpulse/eotdl/blob/develop/tutorials/workshops/lps25/02_training.ipynb',
  'description': '# EuroSAT-small-bids25\n\nThis is a toy model trained with the EuroSAT dataset for the LPS25 workshop.',
  'thumbnail': 'https://images.unsplash.com/photo-1446776811953-b23d57bd21aa?q=80&w=2072&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D'},
 'versions': [{'version_id': 1,
   'createdAt': '2025-07-16T15:43:28.724000',
   'size': 643749}],
 'tags': [],
 'createdAt': '2025-09-28T17:00:55.105000',
 'updatedAt': '2025-09-28T17:00:55.105000',
 'likes': 0,
 'downloads': 0,
 'quality': 0,
 'active': True,
 'allowed_users': [],
 'benchmark': None,
 'visibility': 'public'}
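The collection record is a nested dictionary, so a few fields can be pulled out for a quick overview. `summarize_collection` is a hypothetical helper. It picks the entry with the highest `version_id` and reports `size` exactly as the API returns it, because the units are not stated in this notebook. The input is a subset of the fields shown above.

```python
def summarize_collection(collection: dict) -> dict:
    """Hypothetical helper: pick a few useful fields from a collection record."""
    latest = max(collection["versions"], key=lambda v: v["version_id"])
    return {
        "name": collection["name"],
        "id": collection["id"],
        "authors": collection["metadata"]["authors"],
        "license": collection["metadata"]["license"],
        "latest_version": latest["version_id"],
        "size": latest["size"],  # as reported by the API
        "public": collection["visibility"] == "public",
    }


# Subset of the fields returned above for EuroSAT-small-bids25.
collection = {
    "id": "68d94da70839c97bf32294ec",
    "name": "EuroSAT-small-bids25",
    "metadata": {"authors": ["Fran Martín"], "license": "open"},
    "versions": [{"version_id": 1, "createdAt": "2025-07-16T15:43:28.724000", "size": 643749}],
    "visibility": "public",
}
print(summarize_collection(collection))
```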

Since EOTDL represents each file of a dataset as a STAC item, we can also list the items of a collection.


In [23]:
!uv run eotdl stac items EuroSAT-small-bids25

[{'id': 'README.md', 'assets': {'asset': {'checksum': 'b320743f60bdc9c45b67e40377eeecca7e14890d', 'href': 'https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/README.md', 'size': 279, 'timestamp': '2025-09-28T18:00:54.693326'}}}, {'id': 'hello.txt', 'assets': {'asset': {'checksum': 'f572d396fae9206628714fb2ce00f72e94f2258f', 'href': 'https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/hello.txt', 'size': 6, 'timestamp': '2025-09-28T18:00:54.693712'}}}, {'id': 'Forest/Forest_1.tif', 'assets': {'asset': {'checksum': 'f3b8b9fef6b2df6f24792ead860616186fe5efe0', 'href': 'https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/Forest/Forest_1.tif', 'size': 107244, 'timestamp': '2025-09-28T18:00:54.694847'}}}, {'id': 'Forest/Forest_2.tif', 'assets': {'asset': {'checksum': '2e38dab64435bfbab25bab8c779ecad6c0764677', 'href': 'https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/Forest/Forest_2.tif', 'size': 107244, 'timestamp': '2025-09-28T18:00:54.695736'}}},

In [24]:
from eotdl.curation.stac.api import retrieve_stac_items

retrieve_stac_items("EuroSAT-small-bids25")

[{'id': 'README.md',
  'assets': {'asset': {'checksum': 'b320743f60bdc9c45b67e40377eeecca7e14890d',
    'href': 'https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/README.md',
    'size': 279,
    'timestamp': '2025-09-28T18:00:54.693326'}}},
 {'id': 'hello.txt',
  'assets': {'asset': {'checksum': 'f572d396fae9206628714fb2ce00f72e94f2258f',
    'href': 'https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/hello.txt',
    'size': 6,
    'timestamp': '2025-09-28T18:00:54.693712'}}},
 {'id': 'Forest/Forest_1.tif',
  'assets': {'asset': {'checksum': 'f3b8b9fef6b2df6f24792ead860616186fe5efe0',
    'href': 'https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/Forest/Forest_1.tif',
    'size': 107244,
    'timestamp': '2025-09-28T18:00:54.694847'}}},
 {'id': 'Forest/Forest_2.tif',
  'assets': {'asset': {'checksum': '2e38dab64435bfbab25bab8c779ecad6c0764677',
    'href': 'https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/Forest/Forest_2.tif',
    'size'
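In the outputs above, each item has an `id` and an `assets` dictionary whose single `asset` entry holds the `href`, `checksum` and `size`. Assuming that layout, building a download list or estimating the total volume is straightforward. `asset_index` and `total_size` are hypothetical helpers. The sample items are trimmed from the output above, with checksums and timestamps omitted.

```python
def asset_index(items: list) -> dict:
    """Map each item id to its asset href (assumes one asset keyed 'asset')."""
    return {item["id"]: item["assets"]["asset"]["href"] for item in items}


def total_size(items: list) -> int:
    """Sum the reported asset sizes of all items."""
    return sum(item["assets"]["asset"]["size"] for item in items)


base = "https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage"
items = [
    {"id": "README.md", "assets": {"asset": {"href": f"{base}/README.md", "size": 279}}},
    {"id": "hello.txt", "assets": {"asset": {"href": f"{base}/hello.txt", "size": 6}}},
    {"id": "Forest/Forest_1.tif",
     "assets": {"asset": {"href": f"{base}/Forest/Forest_1.tif", "size": 107244}}},
]
# Keep only the GeoTIFF assets, e.g. to download the imagery alone.
tifs = {k: v for k, v in asset_index(items).items() if k.endswith(".tif")}
print(tifs)
print(total_size(items))
```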

And retrieve a single item by its id.


In [25]:
!uv run eotdl stac item EuroSAT-small-bids25 README.md

{'type': 'Feature', 'stac_version': '1.0.0', 'stac_extensions': {}, 'datetime': '2025-09-28T18:00:54.692721', 'id': 'README.md', 'bbox': {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin': 0.0}, 'geometry': '\x01\x03\x00\x00\x00\x00\x00\x00\x00', 'assets': {'asset': {'checksum': 'b320743f60bdc9c45b67e40377eeecca7e14890d', 'href': 'https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/README.md', 'size': 279, 'timestamp': '2025-09-28T18:00:54.693326'}}, 'links': {}, 'repository': 'eotdl'}


In [26]:
from eotdl.curation.stac.api import retrieve_stac_item

retrieve_stac_item("EuroSAT-small-bids25", "README.md")

{'type': 'Feature',
 'stac_version': '1.0.0',
 'stac_extensions': {},
 'datetime': '2025-09-28T18:00:54.692721',
 'id': 'README.md',
 'bbox': {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin': 0.0},
 'geometry': '\x01\x03\x00\x00\x00\x00\x00\x00\x00',
 'assets': {'asset': {'checksum': 'b320743f60bdc9c45b67e40377eeecca7e14890d',
   'href': 'https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/README.md',
   'size': 279,
   'timestamp': '2025-09-28T18:00:54.693326'}},
 'links': {},
 'repository': 'eotdl'}
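Here the `geometry` field comes back as raw Well-Known Binary (WKB) rather than GeoJSON. The nine bytes shown form the little-endian WKB encoding of an empty polygon:

- byte-order flag `1`;
- geometry type `3` (Polygon);
- ring count `0`.

This matches the `POLYGON EMPTY` values in the local `catalog.parquet`. The sketch below decodes only the WKB header with `struct`. `describe_wkb` is a hypothetical helper. Converting the returned string with `encode("latin-1")` assumes it holds one character per byte. For full decoding, `shapely.wkb.loads` is the usual choice.

```python
import struct

# 2D WKB geometry type codes from the OGC Simple Features specification.
WKB_TYPES = {1: "Point", 2: "LineString", 3: "Polygon", 4: "MultiPoint",
             5: "MultiLineString", 6: "MultiPolygon", 7: "GeometryCollection"}


def describe_wkb(wkb: bytes) -> str:
    """Decode the header of a 2D WKB geometry and report whether it is empty."""
    byte_order = "<" if wkb[0] == 1 else ">"  # 1 = little-endian (NDR), 0 = big-endian (XDR)
    (type_code,) = struct.unpack_from(byte_order + "I", wkb, 1)
    name = WKB_TYPES.get(type_code, f"type {type_code}")
    if name == "Point":
        return name  # points have coordinates, not a count, after the header
    # For the other 2D types the next uint32 counts points, rings or parts.
    (count,) = struct.unpack_from(byte_order + "I", wkb, 5)
    return f"{name} EMPTY" if count == 0 else f"{name} with {count} element(s)"


item_geometry = "\x01\x03\x00\x00\x00\x00\x00\x00\x00"  # 'geometry' value shown above
print(describe_wkb(item_geometry.encode("latin-1")))  # Polygon EMPTY
```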

We can also search items using SQL queries with [DuckDB](https://duckdb.org/).

DuckDB is a lightweight, in-process database that runs fast SQL queries directly on data files such as CSV or Parquet without a separate server. This makes it well suited to quickly filtering and exploring datasets.

As always, we can do it using the CLI, as follows:


In [28]:
!uv run eotdl stac search EuroSAT-small-bids25 --query "id IN ('README.md', 'Forest/Forest_3.tif')"

[{'type': 'Feature', 'stac_version': '1.0.0', 'stac_extensions': [], 'datetime': 1759082454692, 'id': 'README.md', 'bbox': {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin': 0.0}, 'geometry': {}, 'assets': {'asset': {'checksum': 'b320743f60bdc9c45b67e40377eeecca7e14890d', 'href': 'https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/README.md', 'size': 279, 'timestamp': 1759082454693}}, 'links': [], 'repository': 'eotdl'}, {'type': 'Feature', 'stac_version': '1.0.0', 'stac_extensions': [], 'datetime': 1759082454695, 'id': 'Forest/Forest_3.tif', 'bbox': {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin': 0.0}, 'geometry': {}, 'assets': {'asset': {'checksum': '3e7bb982f9db5f7dabc556016c3d081dfb1fb73d', 'href': 'https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/Forest/Forest_3.tif', 'size': 107244, 'timestamp': 1759082454696}}, 'links': [], 'repository': 'eotdl'}]


Or simply using the Python Library.


In [29]:
from eotdl.curation.stac.api import search_stac_items

query = "id IN ('README.md', 'Forest/Forest_3.tif')"

search_stac_items("EuroSAT-small-bids25", query)

[{'type': 'Feature',
  'stac_version': '1.0.0',
  'stac_extensions': [],
  'datetime': 1759082454692,
  'id': 'README.md',
  'bbox': {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin': 0.0},
  'geometry': {},
  'assets': {'asset': {'checksum': 'b320743f60bdc9c45b67e40377eeecca7e14890d',
    'href': 'https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/README.md',
    'size': 279,
    'timestamp': 1759082454693}},
  'links': [],
  'repository': 'eotdl'},
 {'type': 'Feature',
  'stac_version': '1.0.0',
  'stac_extensions': [],
  'datetime': 1759082454695,
  'id': 'Forest/Forest_3.tif',
  'bbox': {'xmax': 0.0, 'xmin': 0.0, 'ymax': 0.0, 'ymin': 0.0},
  'geometry': {},
  'assets': {'asset': {'checksum': '3e7bb982f9db5f7dabc556016c3d081dfb1fb73d',
    'href': 'https://api.eotdl.com/datasets/68d94da70839c97bf32294ec/stage/Forest/Forest_3.tif',
    'size': 107244,
    'timestamp': 1759082454696}},
  'links': [],
  'repository': 'eotdl'}]
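The query is passed as a plain SQL string, so it can help to build the `IN` list programmatically. Any single quote inside an id must then be doubled, following standard SQL string-literal escaping. `ids_query` is a hypothetical helper. The check at the end uses the standard library's `sqlite3` as a local stand-in for DuckDB, only to show that the predicate selects the expected rows. Whether the EOTDL endpoint accepts every DuckDB expression is not covered here.

```python
import sqlite3


def ids_query(ids: list) -> str:
    """Build an SQL `id IN (...)` predicate, escaping single quotes in each id."""
    quoted = ", ".join("'" + i.replace("'", "''") + "'" for i in ids)
    return f"id IN ({quoted})"


query = ids_query(["README.md", "Forest/Forest_3.tif"])
print(query)  # id IN ('README.md', 'Forest/Forest_3.tif')

# Stand-in check with sqlite3 (not DuckDB): the same predicate on a toy table of ids.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id TEXT)")
con.executemany("INSERT INTO items VALUES (?)",
                [("README.md",), ("hello.txt",), ("Forest/Forest_3.tif",)])
rows = con.execute(f"SELECT id FROM items WHERE {query} ORDER BY id").fetchall()
print(rows)
con.close()
```

The resulting string can then be passed as `search_stac_items("EuroSAT-small-bids25", query)`.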

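Sometimes you may not know which fields you can filter on. Luckily, you can retrieve them: calling `search` without a query returns the catalog's schema, that is, the available fields and their Parquet types.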

Using the CLI:


In [31]:
!uv run eotdl stac search EuroSAT-small-bids25

{'schema': None, 'type': 'BYTE_ARRAY', 'stac_version': 'BYTE_ARRAY', 'stac_extensions': None, 'list': None, 'element': 'INT32', 'datetime': 'INT64', 'id': 'BYTE_ARRAY', 'bbox': None, 'xmax': 'DOUBLE', 'xmin': 'DOUBLE', 'ymax': 'DOUBLE', 'ymin': 'DOUBLE', 'geometry': 'BYTE_ARRAY', 'assets': None, 'asset': None, 'checksum': 'BYTE_ARRAY', 'href': 'BYTE_ARRAY', 'size': 'INT64', 'timestamp': 'INT64', 'links': None, 'repository': 'BYTE_ARRAY'}


Or using the Python library:


In [33]:
search_stac_items("EuroSAT-small-bids25")

{'schema': None,
 'type': 'BYTE_ARRAY',
 'stac_version': 'BYTE_ARRAY',
 'stac_extensions': None,
 'list': None,
 'element': 'INT32',
 'datetime': 'INT64',
 'id': 'BYTE_ARRAY',
 'bbox': None,
 'xmax': 'DOUBLE',
 'xmin': 'DOUBLE',
 'ymax': 'DOUBLE',
 'ymin': 'DOUBLE',
 'geometry': 'BYTE_ARRAY',
 'assets': None,
 'asset': None,
 'checksum': 'BYTE_ARRAY',
 'href': 'BYTE_ARRAY',
 'size': 'INT64',
 'timestamp': 'INT64',
 'links': None,
 'repository': 'BYTE_ARRAY'}
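The schema reads more easily once it is split into two kinds of entry:

- **Group nodes.** Entries with `None` appear to be nodes of the nested Parquet schema (`assets`, `bbox`, `links`, ...).
- **Leaf columns.** Entries with a physical type (`BYTE_ARRAY` for strings, `INT64`, `DOUBLE`) are the actual columns.

The listing is flattened, so nested names such as `href` lose their parent path.

Note also that `datetime` and `timestamp` are `INT64`. In the search results above, the README.md item has `datetime` 1759082454692, while `retrieve_stac_item` returned `'2025-09-28T18:00:54.692721'`. This suggests, but does not confirm, epoch milliseconds in UTC. `leaf_columns` and `from_epoch_ms` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone


def leaf_columns(schema: dict) -> dict:
    """Keep entries that carry a Parquet physical type; None entries look like group nodes."""
    return {name: ptype for name, ptype in schema.items() if ptype is not None}


def from_epoch_ms(ms: int) -> datetime:
    """Convert an INT64 epoch-milliseconds value to an aware UTC datetime without float rounding."""
    return datetime.fromtimestamp(ms // 1000, tz=timezone.utc) + timedelta(milliseconds=ms % 1000)


# Schema as returned by search_stac_items("EuroSAT-small-bids25") above.
schema = {
    "schema": None, "type": "BYTE_ARRAY", "stac_version": "BYTE_ARRAY",
    "stac_extensions": None, "list": None, "element": "INT32", "datetime": "INT64",
    "id": "BYTE_ARRAY", "bbox": None, "xmax": "DOUBLE", "xmin": "DOUBLE",
    "ymax": "DOUBLE", "ymin": "DOUBLE", "geometry": "BYTE_ARRAY", "assets": None,
    "asset": None, "checksum": "BYTE_ARRAY", "href": "BYTE_ARRAY", "size": "INT64",
    "timestamp": "INT64", "links": None, "repository": "BYTE_ARRAY",
}
print(leaf_columns(schema))
print(from_epoch_ms(1759082454692).isoformat())  # 2025-09-28T18:00:54.692000+00:00
```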

## Discussion and Contribution opportunities

Feel free to ask questions now (live or through Discord) and make suggestions for future improvements.

- Do you have any experience working with the STAC specification?
- Why or why not?
- Which features do you miss when working with STAC? And which features do you enjoy the most?
