adding seasonet dataset + tests + doc #1466

Merged
merged 17 commits into from
Sep 24, 2023

Conversation

dkosm
Contributor

@dkosm dkosm commented Jul 6, 2023

This PR adds the SeasoNet Seasonal Land Cover Segmentation Dataset.

Huge thanks to @briktor for his work on the dataset and the TorchGeo version!

The SeasoNet dataset consists of 1,759,830 multi-spectral Sentinel-2 image patches, taken from 519,547 unique locations, covering the whole surface area of Germany. Annotations are provided in the form of pixel-level land cover and land use segmentation masks from the German land cover model LBM-DE2018, with land cover classes based on the CORINE Land Cover database (CLC) 2018. The set is split into two overlapping grids of roughly 880,000 samples each, which are shifted against each other by half the patch size in both dimensions. Within each grid, the images do not overlap.
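To illustrate the two-grid layout described above, here is a minimal sketch. The patch edge length of 120 pixels and the 2 x 2 grid extent are assumptions chosen for illustration, and `grid_origins` and `overlaps` are hypothetical helpers, not part of the dataset code.

```python
def grid_origins(n_rows, n_cols, patch_size, shift=0):
    """Top-left pixel coordinates of every patch in one regular grid."""
    return [
        (row * patch_size + shift, col * patch_size + shift)
        for row in range(n_rows)
        for col in range(n_cols)
    ]


def overlaps(a, b, size):
    """True if two square patches of the given edge length share any pixel."""
    return abs(a[0] - b[0]) < size and abs(a[1] - b[1]) < size


PATCH = 120  # assumed patch edge length in pixels, for illustration only

# Grid 1 starts at the origin; grid 2 is shifted by half a patch in both axes.
grid_1 = grid_origins(2, 2, PATCH)
grid_2 = grid_origins(2, 2, PATCH, shift=PATCH // 2)

# Patches inside one grid only touch at their edges ...
same_grid_overlap = any(
    overlaps(a, b, PATCH) for i, a in enumerate(grid_1) for b in grid_1[i + 1:]
)
# ... while a patch from grid 2 overlaps its neighbours in grid 1.
cross_grid_overlap = overlaps(grid_1[0], grid_2[0], PATCH)
```

In practice this means that mixing both grids in one train/test split can leak pixels between samples, which is worth keeping in mind when choosing grids.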

Example Plot:
example_plot

Dataset format:

  • images are 16-bit GeoTiffs, split into separate files based on resolution
  • images include 12 spectral bands with 10, 20 and 60 m per pixel resolutions
  • masks are single-channel 8-bit GeoTiffs

Paper: here

@github-actions github-actions bot added documentation Improvements or additions to documentation datasets Geospatial or benchmark datasets testing Continuous integration testing labels Jul 6, 2023
@adamjstewart adamjstewart added this to the 0.5.0 milestone Jul 6, 2023
Collaborator

@adamjstewart adamjstewart left a comment

Thanks for the PR, excited to add your dataset to TorchGeo!

@dkosm
Contributor Author

dkosm commented Jul 10, 2023

Thanks for the fast reply! We will look into the necessary changes asap!

@calebrob6
Member

Thanks @dkosm! Please let us know if you have questions about the changes :)

@dkosm
Contributor Author

dkosm commented Aug 31, 2023

@microsoft-github-policy-service agree company="TU Dortmund"

@dkosm
Contributor Author

dkosm commented Aug 31, 2023

So after some delay, we were able to submit the CLA. In the meantime, we tried to address all of your remarks. Is anything left?

adamjstewart
adamjstewart previously approved these changes Sep 2, 2023
Collaborator

@adamjstewart adamjstewart left a comment

This looks fantastic!

Left a couple of optional style suggestions.

SeasoNet.metadata[6],
"url",
os.path.join("tests", "data", "seasonet", "meta.csv"),
)
Collaborator

Could use a list and a for-loop so there isn't so much code duplication.
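A minimal sketch of what the loop could look like. The `metadata` list here is a hypothetical stand-in for `SeasoNet.metadata` that uses only the keys visible in the diff (`name`, `ext`, `url`), and the URLs are placeholders. In an actual pytest test, the assignment would go through `monkeypatch.setitem` instead.

```python
import os

# Hypothetical stand-in for SeasoNet.metadata; the real list has more entries.
metadata = [
    {"name": "spring", "ext": ".zip", "url": "https://example.invalid/spring.zip"},
    {"name": "meta", "ext": ".csv", "url": "https://example.invalid/meta.csv"},
]

data_dir = os.path.join("tests", "data", "seasonet")

# One loop replaces one hand-written patch call per metadata entry.
for entry in metadata:
    local_path = os.path.join(data_dir, entry["name"] + entry["ext"])
    # In a pytest test this line would be:
    #     monkeypatch.setitem(entry, "url", local_path)
    entry["url"] = local_path
```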

{
"name": "spring",
"ext": ".zip",
"url": "https://zenodo.org/api/files/e2288446-9ee8-4b2e-ae76-cd80366a40e1/spring.zip", # noqa: E501
Collaborator

Personally I would save a base url like:

url = "https://zenodo.org/api/files/e2288446-9ee8-4b2e-ae76-cd80366a40e1/"

and then you can append only when you need the full URL. Then you don't have so much code duplication. Not sure if you really need ext either, could just infer that from the filename.
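A rough sketch of that idea, assuming the metadata stores only filenames. `full_url` and `extension` are hypothetical helper names, and whether `meta.csv` lives under the same base URL is an assumption.

```python
import os

base_url = "https://zenodo.org/api/files/e2288446-9ee8-4b2e-ae76-cd80366a40e1/"

# Only filenames are stored; "meta.csv" being served from base_url is assumed.
filenames = ["spring.zip", "meta.csv"]


def full_url(filename):
    """Build the download URL only when it is actually needed."""
    return base_url + filename


def extension(filename):
    """Infer the extension from the filename instead of storing it separately."""
    return os.path.splitext(filename)[1]


urls = [full_url(name) for name in filenames]
exts = [extension(name) for name in filenames]
```

One caveat: `os.path.splitext` returns only the last suffix, so a name like `archive.tar.gz` would yield `.gz`.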

self,
root: str = "data",
split: str = "train",
seasons: Collection[str] = all_seasons,
Collaborator

How is Collection different from Iterable? How did you decide which to use?
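For context, a short sketch of the difference in Python's abstract base classes: an `Iterable` only promises `__iter__`, while a `Collection` additionally promises `__len__` and `__contains__`. The `describe` function and the season names are hypothetical, and this is not necessarily the reasoning the PR authors followed.

```python
from collections.abc import Collection, Iterable

seasons_list = ["spring", "summer"]
seasons_gen = (s for s in seasons_list)  # a generator: iterable, but single-use

list_is_collection = isinstance(seasons_list, Collection)
gen_is_iterable = isinstance(seasons_gen, Iterable)
gen_is_collection = isinstance(seasons_gen, Collection)


def describe(seasons: Collection[str]) -> str:
    """Needs len() and `in`, which Iterable alone does not guarantee."""
    return f"{len(seasons)} season(s), spring included: {'spring' in seasons}"


summary = describe(seasons_list)
```

So `Collection` is the tighter annotation when the argument is measured, searched, or iterated more than once; `Iterable` is enough for a single pass.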

"""
path = self.files.iloc[index][0]
with rasterio.open(f"{path}_labels.tif") as f:
array = f.read() - 1
Collaborator

Maybe add a comment explaining why - 1?
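One possible shape for such a comment, sketched in plain Python. The thread does not state the reason for the subtraction, so the explanation below is an assumption: that the label rasters store class IDs starting at 1 while downstream code expects indices starting at 0. `to_zero_based` and the sample mask are hypothetical.

```python
def to_zero_based(mask):
    """Shift class IDs in a 2D mask from 1-based to 0-based."""
    # Assumption: the stored label IDs start at 1, so subtracting 1 maps
    # class 1 to index 0, class 2 to index 1, and so on.
    return [[value - 1 for value in row] for row in mask]


raw_mask = [[1, 2], [3, 1]]  # made-up 2 x 2 label patch
shifted = to_zero_based(raw_mask)
```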

@adamjstewart adamjstewart enabled auto-merge (squash) September 24, 2023 14:13
@adamjstewart
Collaborator

Thanks for the contribution, hopefully the first of many!

Apologies it took so long to get the PR merged. In the meantime I moved to Germany! Starting a postdoc at TUM, maybe we'll see each other around someday.

@adamjstewart adamjstewart merged commit b796bbd into microsoft:main Sep 24, 2023
21 checks passed
"nodata": None,
"crs": CRS.from_epsg(32632),
"transform": Affine(10.0, 0.0, 664800.0, 0.0, -10.0, 5342400.0),
"compress": "zstd",
Collaborator

Just want to confirm that the original SeasoNet files are also compressed with ZSTD

Contributor

Yes, they are

Labels
datasets Geospatial or benchmark datasets documentation Improvements or additions to documentation testing Continuous integration testing
4 participants