Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add year filter functionality CDL #1337

Merged
merged 5 commits into from
May 25, 2023
Merged

Conversation

nilsleh
Copy link
Collaborator

@nilsleh nilsleh commented May 15, 2023

This PR adds years argument to select a number of years for which to download CDL or use CDL so the user doesn't have to download all 15 years. This is how we decided to implement it for NLCD as well.

@github-actions github-actions bot added datasets Geospatial or benchmark datasets testing Continuous integration testing labels May 15, 2023
@adamjstewart adamjstewart added this to In progress in SSL4EO-L via automation May 15, 2023
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs more coverage

torchgeo/datasets/cdl.py Outdated Show resolved Hide resolved
torchgeo/datasets/cdl.py Outdated Show resolved Hide resolved
torchgeo/datasets/cdl.py Outdated Show resolved Hide resolved
@adamjstewart adamjstewart added this to the 0.5.0 milestone May 15, 2023
torchgeo/datasets/cdl.py Outdated Show resolved Hide resolved
torchgeo/datasets/cdl.py Outdated Show resolved Hide resolved
torchgeo/datasets/cdl.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining issues are very trivial. Feel free to merge this if you want to focus on different things.

@@ -363,15 +373,30 @@ def _verify(self) -> None:
RuntimeError: if ``download=False`` but dataset is missing or checksum fails
"""
# Check if the extracted files already exist
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very unimportant but I wonder if we should instead have a single for-loop over all versions and only download/extract the exact versions we need to. Imagine a situation where someone downloaded and extracted 2021, but then 2022 came out and they wanted to use that too. At the moment, Spack would try to re-download (torchvision will skip this for us) and re-extract the zip file for 2021 even though it's already extracted.

@@ -43,23 +43,23 @@ class CDL(RasterDataset):
is_image = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the glob/regex will still pick up versions other than the ones requested if they are found on the system. You could fix this by editing the filename_regex during __init__. We do something similar to this in our Landsat datasets where we patch filename_glob based on the bands requested.

@nilsleh nilsleh merged commit 2d576d9 into microsoft:main May 25, 2023
18 checks passed
SSL4EO-L automation moved this from In progress to Done May 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets testing Continuous integration testing
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

2 participants