adopting stac-table and xstac #218

Closed
TomAugspurger opened this issue Jan 14, 2022 · 6 comments
Labels
enhancement New feature or request


@TomAugspurger
Collaborator

Hi all,

I have a couple of small-ish packages to assist in creating STAC metadata for

  1. Tabular data, such as data stored in Parquet and loaded with pandas: stac-table
  2. n-dimensional labeled data, such as data stored in NetCDF or Zarr and loaded with xarray: xstac
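For a rough sense of the metadata involved, here is a dependency-free sketch of the kind of output stac-table is meant to produce: the STAC Table extension's `table:columns` and `table:row_count` fields. The `table_fields` helper and its dict-based input are hypothetical and are not stac-table's API; the real package works from the dataset itself rather than from a hand-written schema.

```python
from typing import Any, Dict, List, Mapping, Optional


def table_fields(
    schema: Mapping[str, str],
    row_count: Optional[int] = None,
    descriptions: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Build STAC Table extension fields from a {column name: type} mapping.

    Hypothetical helper for illustration; this is not stac-table's API.
    """
    descriptions = descriptions or {}
    columns: List[Dict[str, str]] = []
    for name, dtype in schema.items():
        # Each entry is a Table extension Column Object: "name" is required,
        # "type" and "description" are optional.
        column = {"name": name, "type": dtype}
        if name in descriptions:
            column["description"] = descriptions[name]
        columns.append(column)

    fields: Dict[str, Any] = {"table:columns": columns}
    if row_count is not None:
        fields["table:row_count"] = row_count
    return fields


if __name__ == "__main__":
    props = table_fields(
        {"id": "int64", "geometry": "binary"},
        row_count=3,
        descriptions={"geometry": "WKB-encoded footprint"},
    )
    print(props["table:columns"][1])
    # {'name': 'geometry', 'type': 'binary', 'description': 'WKB-encoded footprint'}
```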

I was wondering if stactools would be a good home for them.

Specifically, I'm proposing adding two new sub-modules / sub-packages to stactools.
Each of these sub-packages would be either a namespace package or a regular package that requires optional dependencies.
In other words, no new required dependencies for stactools itself, and no breaking API changes.
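One common way to keep such dependencies optional is to import them lazily, inside the functions that need them, and fail with an actionable message when they are missing. A minimal sketch, assuming a hypothetical `stactools[...]` pip extra per sub-package; `require_optional` is also an illustrative name, not an existing stactools function:

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or fail with an actionable message.

    `extra` names a hypothetical pip extra such as "table" or "xarray".
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:  # ModuleNotFoundError is a subclass of ImportError
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f"Install it with: pip install 'stactools[{extra}]'"
        ) from err
```

A module like `stactools.table` would call `require_optional("pandas", extra="table")` inside its functions rather than at import time, so `import stactools` never pulls in pandas or xarray.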

I think the new sub-packages should be named after the type of data they're tailored for, so I would propose something like:

  1. stactools.table
  2. stactools.datacube (dunno about this name. But it matches the most prominent STAC extension used)

Logistics

Commit History

The easiest path forward is to just copy-paste the code from those libraries into the relevant submodules. Preserving the commit history is probably the responsible thing to do, but might be complicated. I'd recommend we give it one shot to make a pull request that preserves the commit history, and if that fails we can just copy the source itself.

CLI

xstac includes a CLI based on argparse, while stactools uses click. The xstac CLI should be rewritten to use click and integrated into the stactools CLI.
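A sketch of what the click port could look like. stactools' subcommands are click commands attached to the top-level `stac` group; here a local `cli` group stands in for it, and `create_xstac_command`, `create-collection`, and `--reference-system` are hypothetical names rather than xstac's or stactools' actual interface. How the function gets registered with stactools is not shown.

```python
import click
from click.testing import CliRunner


def create_xstac_command(cli: click.Group) -> click.Command:
    """Attach a hypothetical `xstac` subcommand group to a parent click group."""

    @cli.group("xstac", short_help="Create STAC metadata from xarray datasets.")
    def xstac() -> None:
        pass

    # argparse positionals become click.argument; --flags become click.option.
    @xstac.command("create-collection")
    @click.argument("source")
    @click.argument("destination")
    @click.option(
        "--reference-system",
        default="4326",
        show_default=True,
        help="EPSG code for the spatial dimensions (hypothetical option).",
    )
    def create_collection(source: str, destination: str, reference_system: str) -> None:
        # A real command would open `source` with xarray and write a Collection
        # to `destination`; this sketch only echoes its inputs.
        click.echo(f"{source} -> {destination} (EPSG:{reference_system})")

    return xstac


@click.group()
def cli() -> None:
    """Stand-in for stactools' top-level `stac` command."""


create_xstac_command(cli)

if __name__ == "__main__":
    # CliRunner invokes the command in-process and captures its output.
    result = CliRunner().invoke(cli, ["xstac", "create-collection", "in.zarr", "out.json"])
    print(result.output, end="")  # in.zarr -> out.json (EPSG:4326)
```

Building the command inside a function that receives the parent group is what makes the subcommand pluggable: the same function can attach `xstac` to a test group, as here, or to the real `stac` command.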

Packaging

xstac and stac-table use a mix of setuptools' declarative setup.cfg and pyproject.toml. I don't know how compatible those are with whatever stactools needs, so they might need to be rewritten too.

Tests

It seems like stactools is using unittest as a test runner. xstac and stac-table are using pytest. pytest is really nice, and I would recommend it for any new project. Its test runner can handle modules that use unittest's basic features, so we might be able to just switch out the runner. If that causes issues, we'd need to either update stactools to use pytest, or xstac and stac-table to use unittest.
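To illustrate the mixed setup: pytest collects `unittest.TestCase` subclasses as-is (including `setUp`/`tearDown`), alongside plain functions that use bare `assert`. In a module like the following (test names are illustrative), `python -m pytest` runs both tests, while `python -m unittest` only finds the `TestCase` one.

```python
import unittest


class TestTableColumns(unittest.TestCase):
    """A unittest-style test; pytest collects TestCase subclasses unchanged."""

    def setUp(self) -> None:
        self.columns = [{"name": "id", "type": "int64"}]

    def test_column_names(self) -> None:
        self.assertEqual([c["name"] for c in self.columns], ["id"])


def test_plain_assert() -> None:
    # A pytest-style test: a bare function using `assert`.
    # pytest collects it; unittest's loader ignores module-level functions.
    assert sum([1, 2, 3]) == 6
```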

Deprecating xstac and stac-table

We don't want people hanging around on old versions of xstac and stac-table. So once we have a release of stactools with these modules integrated, I'll make a release of xstac and stac-table that issues a warning at import time instructing users to upgrade.
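A minimal sketch of such an import-time warning, assuming it lives in each package's `__init__.py`; the helper name and message text are placeholders. `FutureWarning` is used instead of `DeprecationWarning` because Python hides the latter by default unless it is triggered from `__main__`, so a `DeprecationWarning` raised inside a library's `__init__.py` would go unseen by most end users.

```python
import warnings

# Placeholder wording; the real message would name the release to upgrade to.
_MOVED_MESSAGE = (
    "xstac has been merged into stactools and will not receive further updates. "
    "Please install stactools and update your imports."
)


def warn_moved(message: str = _MOVED_MESSAGE) -> None:
    """Emit the 'this package has moved' warning (hypothetical helper)."""
    # FutureWarning is shown to end users by default; DeprecationWarning is not
    # unless it originates in __main__ or is enabled via -W / PYTHONWARNINGS.
    # stacklevel=2 attributes the warning to the line that called warn_moved
    # (e.g. the package's __init__.py) rather than to this helper.
    warnings.warn(message, FutureWarning, stacklevel=2)
```

With a single top-level `warn_moved()` call in `xstac/__init__.py`, the warning fires the first time the package is imported in a session, since a module's top-level code runs only once.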

Alternatives

Another option, still building on the stactools namespace-package mechanism, is to put these repos in the stactools-packages org. But that doesn't feel quite right, since that org is for datasets, and these proposed modules are lower-level utilities that might be used in the generation of STAC metadata for a specific dataset.

TomAugspurger added the enhancement (New feature or request) label Jan 14, 2022
gadomski mentioned this issue Jan 14, 2022
@gadomski
Member

👍🏽 in general, with a couple of comments:

> If that causes issues, we'd need to either update stactools to use pytest, or xstac and stac-table to use unittest.

I think stactools should migrate to pytest regardless of the resolution here. Our default policy seems to be "do what the other STAC libraries are doing" and since PySTAC is on pytest, we probably should be too. #219.

> But that doesn't feel quite right, since that org is for datasets, and these proposed modules are lower-level utilities that might be used in the generation of STAC metadata for a specific dataset.

So this isn't strictly true -- when we created stactools-packages we chose that name instead of "stactools-datasets" because we wanted the org to support dependency-heavy non-dataset packages, e.g. browse. You're right that in practice it's been a storage shelf for dataset packages. Since your proposed changes are dependency-light, I see no reason they couldn't/shouldn't live in the main stactools repo as namespace packages (similar to stactools.cli and stactools.testing).
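For reference, a namespace package lets several directories (or separately installed distributions) contribute sub-packages to one top-level name. Assuming stactools relies on PEP 420 implicit namespace packages (no `stactools/__init__.py`), the sketch below builds two such portions in temporary directories and imports both through one namespace; `nsdemo` stands in for `stactools`, and `make_portion`/`demo` are illustrative helpers.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def make_portion(root: Path, namespace: str, subpackage: str) -> None:
    """Create root/namespace/subpackage/__init__.py without a
    root/namespace/__init__.py, so `namespace` is a PEP 420 namespace package."""
    pkg = root / namespace / subpackage
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(f"NAME = {subpackage!r}\n")


def demo() -> Tuple[str, str, int]:
    # "nsdemo" stands in for "stactools" so a real installation is not disturbed.
    names = ("nsdemo.cli", "nsdemo.xarray", "nsdemo")
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        # Two separate directories play the role of two separately installed
        # distributions that both contribute to the same namespace.
        make_portion(Path(a), "nsdemo", "cli")
        make_portion(Path(b), "nsdemo", "xarray")
        sys.path[:0] = [a, b]
        importlib.invalidate_caches()
        try:
            cli = importlib.import_module("nsdemo.cli")
            xr_portion = importlib.import_module("nsdemo.xarray")
            ns = importlib.import_module("nsdemo")
            return cli.NAME, xr_portion.NAME, len(list(ns.__path__))
        finally:
            # Undo the sys.path and sys.modules changes made above.
            for entry in (a, b):
                sys.path.remove(entry)
            for name in names:
                sys.modules.pop(name, None)


if __name__ == "__main__":
    print(demo())  # ('cli', 'xarray', 2)
```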

> stactools.datacube (dunno about this name. But it matches the most prominent STAC extension used)

I'm a little wary here just because of the growing exposure of the odc-stac project and the related increase in frequency of the "OpenDataCube" project/name appearing in the STAC ecosystem. Since we're using it to make xarrays, is stactools.xarray in play?

@TomAugspurger
Collaborator Author

Good call about avoiding a naming clash with datacube.

stactools.xarray is probably best, though we'll need to be a bit careful with imports and namespaces if we shadow the name "xarray".
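The import caveat can be made concrete. Inside a sub-package named after a library, an absolute `import xarray` still resolves to the top-level library, because absolute imports search `sys.path` rather than the current package. Shadowing only appears when the package directory itself ends up on `sys.path`, for example when a script inside it is run directly. A self-contained sketch, with `toplib` standing in for xarray and `shadowdemo` for stactools (both hypothetical):

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def write(path: Path, text: str) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)


def demo() -> Tuple[str, str]:
    # "toplib" stands in for the real xarray library and "shadowdemo" for stactools.
    names = ("toplib", "shadowdemo", "shadowdemo.toplib")
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write(root / "toplib" / "__init__.py", "KIND = 'library'\n")
        write(root / "shadowdemo" / "__init__.py", "")
        # shadowdemo/toplib/__init__.py has the same name as the library it wraps.
        write(
            root / "shadowdemo" / "toplib" / "__init__.py",
            "KIND = 'subpackage'\nimport toplib as _lib\nLIB_KIND = _lib.KIND\n",
        )
        inner = str(root / "shadowdemo")
        try:
            # Normal case: only the directory containing the package is on sys.path,
            # so the absolute `import toplib` finds the top-level library.
            sys.path.insert(0, str(root))
            importlib.invalidate_caches()
            normal = importlib.import_module("shadowdemo.toplib").LIB_KIND

            for name in names:
                sys.modules.pop(name, None)

            # Shadowed case: the package directory itself is first on sys.path, as
            # when a script inside it is run directly; `toplib` now resolves to the
            # sub-package, whose own `import toplib` returns itself.
            sys.path.insert(0, inner)
            importlib.invalidate_caches()
            shadowed = importlib.import_module("toplib").LIB_KIND
            return normal, shadowed
        finally:
            for entry in (inner, str(root)):
                if entry in sys.path:
                    sys.path.remove(entry)
            for name in names:
                sys.modules.pop(name, None)


if __name__ == "__main__":
    print(demo())  # ('library', 'subpackage')
```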

@duckontheweb

> since PySTAC is on pytest

Actually, PySTAC is still running tests using python -m unittest, but I think we should move to pytest as well.

@TomAugspurger
Collaborator Author

TomAugspurger commented Feb 7, 2023

My proposal focused on moving xstac & stac-table into this repository. Alternatively, they could continue to live in their own repositories, but under the stac-utils or stactools-packages GitHub organization. That would address one of my larger concerns: that they currently live in a personal GitHub repository.

@gadomski
Member

gadomski commented Feb 9, 2023

We discussed this offline as well, but (in my opinion) it makes the most sense to move xstac and stac-table into https://github.com/stac-utils as their own standalone repositories. We're starting to rethink the "stactools is the swiss-army knife / dumping ground" model, so there really isn't too much benefit to integrating those packages into this one ... the only use-case could be to hook into the stac CLI endpoint, but (a) you can still do that using the plugin mechanism, and (b) it's my hope that stac-cli will take over that endpoint in the medium future.

Tl;dr: I think stac-utils/xstac and stac-utils/stac-table make the most sense. LMK if I can help with the move or any reworks required.

@TomAugspurger
Collaborator Author

Transferred to https://github.com/stac-utils/stac-table and https://github.com/stac-utils/xstac. This should suffice for now.

3 participants