Sentinel2 data set not able to locate data file downloaded from https://scihub.copernicus.eu #505

Closed
tinkueg opened this issue Apr 13, 2022 · 7 comments · Fixed by #754

tinkueg commented Apr 13, 2022

I see an error when I try to access data downloaded from https://scihub.copernicus.eu/dhus/#/home for Sentinel-2 using the torchgeo Sentinel2 dataset. After investigating, I found that the dataset class does not use the standard naming convention for Sentinel-2 data products.

Files downloaded from USGS Earth Explorer seem to have a different filename format.

Naming Conventions
https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/naming-convention
https://www.usgs.gov/centers/eros/science/sentinel-2-data-dictionary#tile_number

If there is a way to change the file names, please suggest one.

----ERROR LOG----

$ python train.py
Traceback (most recent call last):
  File "train.py", line 12, in <module>
    dataset = Sentinel2(root='./../data/sentinel', crs=None, transforms=None, bands=[], cache=True)
  File "/home/manish/projects/pytorch/torchgeo_lcc/torchgeo_env/lib/python3.8/site-packages/torchgeo/datasets/sentinel.py", line 103, in __init__
    super().__init__(root, crs, res, transforms, cache)
  File "/home/manish/projects/pytorch/torchgeo_lcc/torchgeo_env/lib/python3.8/site-packages/torchgeo/datasets/geo.py", line 374, in __init__
    raise FileNotFoundError(
FileNotFoundError: No Sentinel2 data was found in './../data/sentinel'
@calebrob6
Member

@tinkueg, sorry you've run into problems and are frustrated, and thanks for reporting the issue. That said, please refrain from opening future issues with such a hostile tone. The people that work on this library do so voluntarily in their spare time and do not need to read or respond to hostile issues. Also, feel free to contribute pull requests that fix or better document the issues you run into. We're all trying to make torchgeo into a better tool!

@tinkueg
Author

tinkueg commented Apr 13, 2022

@calebrob6 I am new to open source projects and not a developer. I was unaware that it is supported by volunteers; I thought it was a Microsoft project!
I have deleted my comments. Thanks for all your efforts!

@tinkueg tinkueg changed the title Sentinel2 data set not able to locate data file downloaded from https://scihub.copernicus.eu/dhus/#/home Sentinel2 data set not able to locate data file downloaded from https://scihub.copernicus.eu Apr 13, 2022
@calebrob6
Member

No problem! Again, thanks for reporting the issue (without issues we wouldn't be aware of problems like this) -- this is definitely something we want to fix before an eventual 1.0 release.

@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Apr 13, 2022
@adamjstewart adamjstewart added this to the 0.2.2 milestone Apr 13, 2022
@tinkueg
Author

tinkueg commented Apr 13, 2022

@calebrob6 Is there a guideline that explains how I can update the documentation so that other people know about the current issues? Maybe I can contribute by fixing things.

@adamjstewart
Collaborator

I tend not to bother documenting bugs and instead try to fix them. In this case, this bug is relatively easy to fix. All we need to do is update the filename_regex to support both USGS Earth Explorer and Copernicus filename formats. If no one gets to it I can do this within a week.

@adamjstewart adamjstewart self-assigned this Apr 14, 2022
@adamjstewart
Collaborator

adamjstewart commented Apr 14, 2022

Okay, I took a look at this. For others following this issue, this is what the file structure looks like:

USGS EarthExplorer

├── L1C_T16TFM_A035544_20220412T163959.zip
└── S2A_MSIL1C_20220412T162841_N0400_R083_T16TFM_20220412T202300.SAFE
    ├── GRANULE
    │   └── L1C_T16TFM_A035544_20220412T163959
    │       ├── IMG_DATA
    │       │   ├── T16TFM_20220412T162841_B01.jp2
    │       │   ├── T16TFM_20220412T162841_B02.jp2
    │       │   ├── T16TFM_20220412T162841_B03.jp2
    │       │   ├── T16TFM_20220412T162841_B04.jp2
    │       │   ├── T16TFM_20220412T162841_B05.jp2
    │       │   ├── T16TFM_20220412T162841_B06.jp2
    │       │   ├── T16TFM_20220412T162841_B07.jp2
    │       │   ├── T16TFM_20220412T162841_B08.jp2
    │       │   ├── T16TFM_20220412T162841_B09.jp2
    │       │   ├── T16TFM_20220412T162841_B10.jp2
    │       │   ├── T16TFM_20220412T162841_B11.jp2
    │       │   ├── T16TFM_20220412T162841_B12.jp2
    │       │   ├── T16TFM_20220412T162841_B8A.jp2
    │       │   └── T16TFM_20220412T162841_TCI.jp2

Copernicus Open Access Hub

├── S2A_MSIL2A_20220414T110751_N0400_R108_T26EMU_20220414T165533.zip
├── S2A_MSIL2A_20220414T110751_N0400_R108_T26EMU_20220414T165533.SAFE
│   ├── GRANULE
│   │   └── L2A_T26EMU_A035569_20220414T110747
│   │       ├── IMG_DATA
│   │       │   ├── R10m
│   │       │   │   ├── T26EMU_20220414T110751_AOT_10m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_B02_10m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_B03_10m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_B04_10m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_B08_10m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_TCI_10m.jp2
│   │       │   │   └── T26EMU_20220414T110751_WVP_10m.jp2
│   │       │   ├── R20m
│   │       │   │   ├── T26EMU_20220414T110751_AOT_20m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_B01_20m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_B02_20m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_B03_20m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_B04_20m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_B05_20m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_B06_20m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_B07_20m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_B11_20m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_B12_20m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_B8A_20m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_SCL_20m.jp2
│   │       │   │   ├── T26EMU_20220414T110751_TCI_20m.jp2
│   │       │   │   └── T26EMU_20220414T110751_WVP_20m.jp2
│   │       │   └── R60m
│   │       │       ├── T26EMU_20220414T110751_AOT_60m.jp2
│   │       │       ├── T26EMU_20220414T110751_B01_60m.jp2
│   │       │       ├── T26EMU_20220414T110751_B02_60m.jp2
│   │       │       ├── T26EMU_20220414T110751_B03_60m.jp2
│   │       │       ├── T26EMU_20220414T110751_B04_60m.jp2
│   │       │       ├── T26EMU_20220414T110751_B05_60m.jp2
│   │       │       ├── T26EMU_20220414T110751_B06_60m.jp2
│   │       │       ├── T26EMU_20220414T110751_B07_60m.jp2
│   │       │       ├── T26EMU_20220414T110751_B09_60m.jp2
│   │       │       ├── T26EMU_20220414T110751_B11_60m.jp2
│   │       │       ├── T26EMU_20220414T110751_B12_60m.jp2
│   │       │       ├── T26EMU_20220414T110751_B8A_60m.jp2
│   │       │       ├── T26EMU_20220414T110751_SCL_60m.jp2
│   │       │       ├── T26EMU_20220414T110751_TCI_60m.jp2
│   │       │       └── T26EMU_20220414T110751_WVP_60m.jp2

Although the zip file names and directory structures are quite different, the filenames are almost identical except for the resolution suffix at the end. So changing things to support USGS is easy (just make the resolution optional). But this raises the question of whether or not we're actually properly supporting Copernicus. Right now the resolution is just a wildcard, so we could pick up the same tile at multiple different resolutions. Also, when we search for matching tiles, I think we're replacing the resolution with a wildcard, so that could cause issues too. I'll have to think about this one. I think the correct solution is:

  1. Change the filename_glob and filename_regex in sentinel.py to make the resolution optional
  2. Remove replacement of <resolution> with wildcard in geo.py
  3. Set the resolution used in filename_regex inside of __init__ in sentinel.py based on user-specified res
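
For readers following along, steps 1 and 3 could look roughly like the sketch below. This is not torchgeo's actual `filename_regex`; the pattern, the group names, and the `build_regex` helper are assumptions made up for illustration, based only on the filenames in the two listings above. The resolution suffix is optional, and when a resolution is given, only that suffix (or no suffix at all) is accepted.

```python
import re
from typing import Optional, Pattern

# Hypothetical pattern (NOT torchgeo's real filename_regex): tile ID,
# sensing time, and band, as seen in the directory listings above.
PREFIX = r"^T(?P<tile>\d{2}[A-Z]{3})_(?P<date>\d{8}T\d{6})_(?P<band>B[018][\dA])"


def build_regex(res: Optional[int] = None) -> Pattern[str]:
    """Build a filename pattern whose resolution suffix is optional.

    With res=None any two-digit suffix such as "_10m" is accepted.
    With res=10 only "_10m" (or no suffix, as in the USGS layout) matches.
    """
    res_part = r"\d{2}m" if res is None else f"{res}m"
    return re.compile(PREFIX + rf"(?:_(?P<resolution>{res_part}))?\.jp2$")


names = [
    "T16TFM_20220412T162841_B02.jp2",      # USGS EarthExplorer style
    "T26EMU_20220414T110751_B02_10m.jp2",  # Copernicus style, 10 m
    "T26EMU_20220414T110751_B02_20m.jp2",  # Copernicus style, 20 m
    "T26EMU_20220414T110751_B02_60m.jp2",  # Copernicus style, 60 m
    "T26EMU_20220414T110751_AOT_10m.jp2",  # not a spectral band
]

any_res = build_regex()
ten_m = build_regex(10)

# The wildcard-like pattern picks up the same band at every resolution.
print([n for n in names if any_res.match(n)])
# Pinning the resolution leaves the suffix-free file and the 10 m file.
print([n for n in names if ten_m.match(n)])
```

The first `print` shows the duplication concern: the same B02 band matches three times for the Copernicus product. Pinning the resolution removes the 20 m and 60 m copies while still accepting the suffix-free USGS filename.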

Hopefully that should allow us to support both? I wish things were a bit more uniform. Also, we don't currently look at anything other than the filename, but other datasets will require use of the full filepath, so that may come into play at some point. Anyway, this will need extensive testing to make sure it works as expected. I'll try to get started on this soon.
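
As a rough illustration of what "use of the full filepath" could mean for these two layouts, the sketch below reads the processing level from the directory under `GRANULE` and the resolution folder (if any) from the path. The `describe` helper and its return keys are hypothetical and are not part of torchgeo; the example paths are copied from the listings above.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

RESOLUTION_DIRS = {"R10m", "R20m", "R60m"}


def describe(path: str) -> Dict[str, Optional[str]]:
    """Extract information that is only available in the directory names."""
    parts = PurePosixPath(path).parts
    info: Dict[str, Optional[str]] = {
        "level": None,
        "resolution_dir": None,
        "filename": parts[-1],
    }
    if "GRANULE" in parts:
        idx = parts.index("GRANULE") + 1
        if idx < len(parts):
            # Granule directory looks like "L2A_T26EMU_A035569_20220414T110747".
            info["level"] = parts[idx].split("_")[0]
    for part in parts:
        if part in RESOLUTION_DIRS:
            info["resolution_dir"] = part
    return info


usgs = (
    "S2A_MSIL1C_20220412T162841_N0400_R083_T16TFM_20220412T202300.SAFE/GRANULE/"
    "L1C_T16TFM_A035544_20220412T163959/IMG_DATA/T16TFM_20220412T162841_B02.jp2"
)
copernicus = (
    "S2A_MSIL2A_20220414T110751_N0400_R108_T26EMU_20220414T165533.SAFE/GRANULE/"
    "L2A_T26EMU_A035569_20220414T110747/IMG_DATA/R10m/"
    "T26EMU_20220414T110751_B02_10m.jp2"
)

print(describe(usgs))
print(describe(copernicus))
```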

@adamjstewart
Collaborator

@tinkueg in your issue, you mention that the dataset is unable to detect files downloaded from Copernicus. But our regex is actually designed for Copernicus; it should only be incompatible with EarthExplorer data. Can you share an example filename so I know what your data looks like?
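
One possible way to collect those example filenames is a short script like the one below. It is only a sketch using the standard library; the `list_jp2` helper is a made-up name, and the root path is taken from the traceback in the original report, so adjust it to wherever the data actually lives.

```python
from pathlib import Path
from typing import List


def list_jp2(root: str) -> List[str]:
    """Return sorted paths, relative to root, of every .jp2 file below it."""
    base = Path(root)
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*.jp2"))


if __name__ == "__main__":
    # Root directory as passed to Sentinel2(root=...) in the traceback.
    for name in list_jp2("./../data/sentinel"):
        print(name)
```

If this prints nothing, the archives may still be zipped or the root may point at the wrong directory.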
