Sentinel2 Dataset behavior #1758

Open
calebrob6 opened this issue Dec 5, 2023 · 3 comments
Labels
datasets Geospatial or benchmark datasets

Comments

@calebrob6
Member

Description

I have a Sentinel 2 scene with the following files (e.g. in ./test_scene/):

T36KVU_20210513T073609_B01_60m.tif
T36KVU_20210513T073609_B02_10m.tif
T36KVU_20210513T073609_B03_10m.tif
T36KVU_20210513T073609_B04_10m.tif
T36KVU_20210513T073609_B05_20m.tif
T36KVU_20210513T073609_B06_20m.tif
T36KVU_20210513T073609_B07_20m.tif
T36KVU_20210513T073609_B08_10m.tif
T36KVU_20210513T073609_B09_60m.tif
T36KVU_20210513T073609_B11_20m.tif
T36KVU_20210513T073609_B12_20m.tif
T36KVU_20210513T073609_B8A_20m.tif
T36KVU_20210513T073609_TCI_10m.tif

I would expect any of the following to work:

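from torchgeo.datasets import Sentinel2

# A single band at its native 60m resolution
ds = Sentinel2(
    "test_scene/",
    bands=["B01"],
)
# Two bands with different native resolutions (60m and 10m)
ds = Sentinel2(
    "test_scene/",
    bands=["B01", "B02"],
)
# Two bands resampled to an arbitrary target resolution
ds = Sentinel2(
    "test_scene/",
    bands=["B01", "B02"],
    res=37,
)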

However, filename_glob and filename_regex are set up in such a way that none of the above are recognized as valid Sentinel 2 scenes.

Steps to reproduce

See the description above.

Version

0.6.0.dev0

@calebrob6
Member Author

Further:

ds = Sentinel2(
    "test_scene/",
    bands=["B01", "B02"],
    res=60,
)

will not throw an error when the dataset is constructed, but indexing it with ds[ds.bounds] will.

@estherrolf for visibility

@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Dec 5, 2023
@adamjstewart
Collaborator

This was specifically broken by https://github.com/microsoft/torchgeo/pull/754/files#diff-79277b084e67f13f6469cba19e6eadb93ce6c6479cef26161a0c847b75705a81

Basically, depending on where you download your data from, you either get:

  1. All bands in 10m resolution (resampled, no suffix)
  2. All bands in native resolution (10m, 20m, or 60m)
  3. All bands in all resolutions (10m, 20m, and 60m)

Layouts 1, 2, and 3 are somewhat contradictory. We could easily support each one on its own, but supporting all three in combination is hard. Each candidate change to the regex covers only part of this:

A. Remove resolution from the regex (only supports 1)
B. Replace resolution with a wildcard (only supports 1 and 2)
C. Include 10m in the regex (only supports 3)
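
To make the trade-off concrete, here is a minimal sketch of options A, B, and C applied to the three download layouts. The patterns, the `bands_matched` helper, and the sample filenames for layouts 1 and 3 are assumptions made up for illustration; they are not torchgeo's actual filename_regex.

```python
import re

# Hypothetical patterns for illustration only, not torchgeo's real filename_regex.
PREFIX = r"^T\d{2}[A-Z]{3}_\d{8}T\d{6}_(?P<band>B[018][\dA])"
OPTIONS = {
    "A": re.compile(PREFIX + r"\.tif$"),            # no resolution suffix at all
    "B": re.compile(PREFIX + r"(?:_\d+m)?\.tif$"),  # any resolution suffix, or none
    "C": re.compile(PREFIX + r"_10m\.tif$"),        # only the 10m suffix
}


def bands_matched(option, filenames):
    """Map each band to the filenames accepted by the given option's pattern."""
    matched = {}
    for name in filenames:
        m = OPTIONS[option].match(name)
        if m:
            matched.setdefault(m.group("band"), []).append(name)
    return matched


stem = "T36KVU_20210513T073609"
# Layout 1: resampled to 10m, no suffix. Layout 2: native resolution only.
# Layout 3: every band at every resolution.
layout_1 = [f"{stem}_B01.tif", f"{stem}_B02.tif"]
layout_2 = [f"{stem}_B01_60m.tif", f"{stem}_B02_10m.tif"]
layout_3 = [f"{stem}_{b}_{r}m.tif" for b in ("B01", "B02") for r in (10, 20, 60)]

for option in OPTIONS:
    for label, files in (("1", layout_1), ("2", layout_2), ("3", layout_3)):
        counts = {band: len(names) for band, names in bands_matched(option, files).items()}
        print(f"option {option}, layout {label}: files per band = {counts}")
```

In this sketch a layout counts as supported only when every requested band maps to exactly one file. Option B accepts layout 3 but returns three candidates per band, and option C drops B01 entirely in layout 2.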

In order to prioritize the highest resolution, maybe we could sort the glob results lexicographically and choose the first one only? But that feels really sloppy and could probably break for more complicated hypothetical datasets.
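
As a rough sketch of the sorting idea and of how it could break, the snippet below compares "first match after a lexicographic sort" with an explicit numeric comparison of the resolution suffix. The pattern, both helper names, and the 100m filename are hypothetical and only serve the illustration.

```python
import re

# Hypothetical pattern: band and resolution are read from the end of the filename.
PATTERN = re.compile(r"_(?P<band>B[018][\dA])_(?P<res>\d+)m\.tif$")


def pick_first_lexicographic(filenames):
    """Keep the first file per band after sorting the names as plain strings."""
    chosen = {}
    for name in sorted(filenames):
        m = PATTERN.search(name)
        if m:
            chosen.setdefault(m.group("band"), name)
    return chosen


def pick_finest_numeric(filenames):
    """Keep the file with the smallest numeric resolution per band."""
    best = {}
    for name in filenames:
        m = PATTERN.search(name)
        if m is None:
            continue
        band, res = m.group("band"), int(m.group("res"))
        if band not in best or res < best[band][0]:
            best[band] = (res, name)
    return {band: name for band, (_, name) in best.items()}


# Sentinel-2 style suffixes: both strategies agree on the 10m file.
s2_files = ["X_B01_60m.tif", "X_B01_20m.tif", "X_B01_10m.tif"]
print(pick_first_lexicographic(s2_files), pick_finest_numeric(s2_files))

# A made-up dataset with a three-digit resolution: "100m" sorts before "20m"
# as a string, so the lexicographic choice is the coarser file.
other_files = ["X_B01_100m.tif", "X_B01_20m.tif"]
print(pick_first_lexicographic(other_files), pick_finest_numeric(other_files))
```

For the 10m/20m/60m suffixes the two approaches coincide, which is why the shortcut looks tempting. The second case shows the kind of hypothetical dataset where string ordering and resolution ordering diverge.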

@calebrob6
Member Author

I think this is strange behavior, as one of the points of RasterDataset is that it can resample/align different layers to the same resolution.
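
To illustrate what resampling layers to a common resolution means, here is a toy nearest-neighbor sketch in pure Python. It is not how RasterDataset or its underlying libraries do it; the `resample_nearest` function and the tiny 2x2 grid are assumptions for illustration only.

```python
def resample_nearest(grid, src_res, dst_res):
    """Nearest-neighbor resample of a 2D list over a fixed extent.

    src_res and dst_res are pixel sizes in metres. Toy illustration only.
    """
    rows, cols = len(grid), len(grid[0])
    out_rows = round(rows * src_res / dst_res)
    out_cols = round(cols * src_res / dst_res)
    scale = dst_res / src_res
    return [
        [
            # Look up the source pixel that contains the centre of the output pixel.
            grid[min(rows - 1, int((r + 0.5) * scale))][min(cols - 1, int((c + 0.5) * scale))]
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# A 2x2 "60m band" brought onto 20m and 37m grids covering the same extent.
band_60m = [[1, 2], [3, 4]]
band_20m = resample_nearest(band_60m, src_res=60, dst_res=20)
band_37m = resample_nearest(band_60m, src_res=60, dst_res=37)
print(len(band_20m), len(band_20m[0]))
print(len(band_37m), len(band_37m[0]))
```

Once every band can be brought onto the requested grid like this, the native resolution in the filename no longer has to match the target res, which is the expectation behind the res=37 example in the description.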
