Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CMS Global Mangrove Canopy dataset #391

Merged
merged 10 commits into from
Feb 20, 2022
Merged

Conversation

nilsleh
Copy link
Collaborator

@nilsleh nilsleh commented Feb 7, 2022

This PR adds the CMS Global Mangrove Canopy dataset. It's a dataset of mangrove areas around the world at 30m resolution, where the raster displays pixel-wise measurements of either aboveground biomass, basal area weighted height, or maximum canopy height. An example for agb can be found below.

Since there are 116 different countries, I think it is desirable to let the user choose for which countries data should be retrieved. Likewise this should be possible for choosing one of the three measurements. However, I am not sure how to best do this with inheriting the Raster dataset class, where I need to only select possible filenames matching country and measurement to populate the index.

Example AGB:
cms

@nilsleh nilsleh marked this pull request as draft February 7, 2022 18:13
@github-actions github-actions bot added datasets Geospatial or benchmark datasets documentation Improvements or additions to documentation testing Continuous integration testing labels Feb 7, 2022
@adamjstewart
Copy link
Collaborator

Since there are 116 different countries, I think it is desirable to let the user choose for which countries data should be retrieved. Likewise this should be possible for choosing one of the three measurements. However, I am not sure how to best do this with inheriting the Raster dataset class, where I need to only select possible filenames matching country and measurement to populate the index.

The easiest way to do this would be to always load all countries and allow the user to select a region by passing an ROI to the GeoSampler. It'll be a bit harder to let them select the countries.

@adamjstewart adamjstewart added this to the 0.3.0 milestone Feb 7, 2022
@nilsleh
Copy link
Collaborator Author

nilsleh commented Feb 7, 2022

Okay, I just thought that having a list of country names as an argument would be the most user friendly thing to do. With respect to selecting one of three measurements, should I then overwrite the getitem method that filters for the selected measurement?

@adamjstewart
Copy link
Collaborator

If you do want to support a country name argument, you should filter by filename during __init__ (after super) so that the sampler doesn't even know about regions that it can't sample from. Alternatively, if we only supported a single country at a time, you could have a filename_glob property set dynamically based on the value of self.country but it sounds like you'd prefer to be able to choose multiple countries.

@nilsleh
Copy link
Collaborator Author

nilsleh commented Feb 8, 2022

After second thought, supporting only a single country at a time might actually be very reasonable, since individual country tiles can be very large and a user could still use the Union to combine several countries.

However, I have come across an inconsistency for three country names in the dataset file naming:

  • there is NewZealand and Newzealand
  • FrenchGuiana and FrenchGuyana
  • CoteDivoire and CotedIvoire

Do you have a suggestion on how to deal with this? Write a warning in the documentation?

@adamjstewart
Copy link
Collaborator

What do you mean by inconsistency? Inconsistency between what and what?

@ashnair1
Copy link
Collaborator

ashnair1 commented Feb 8, 2022

Inconsistency in naming. For case 1 and 3, you could lowercase them via .lower() before doing comparisons right?

@nilsleh
Copy link
Collaborator Author

nilsleh commented Feb 8, 2022

Sorry, inconsistency of naming a country for one of the three measurements (agb, hba95 or hmax95):

  • agb_FrenchGuyana.tif, hmax95_FrenchGuyana.tif, but hba95_FrenchGuiana.tif
  • agb_Newzealand.tif, hmax95_Newzealand.tif, but hba_95_NewZealand.tif
  • agb_CoteDivoire.tif, hmax95_CoteDivoire.tif, but hba_95_CotedIvoire.tif

@adamjstewart
Copy link
Collaborator

Yeah, that's a bit harder. It's easy to do if you override all of __init__/__getitem__ but we would like to avoid that if we can because it's a lot of duplicated code and it's something we continue to optimize.

@nilsleh
Copy link
Collaborator Author

nilsleh commented Feb 8, 2022

Yes that's what I thought as well. Maybe then just add a warning for those three cases and wait for them to correct the dataset? I have emailed the data center there.

@nilsleh nilsleh marked this pull request as ready for review February 9, 2022 13:41
@adamjstewart
Copy link
Collaborator

It looks like you have filename_glob set to select only a single country/measurement combo so there's no need to warn now.

tests/datasets/test_cms_mangrove_canopy.py Show resolved Hide resolved
torchgeo/datasets/cms_mangrove_canopy.py Outdated Show resolved Hide resolved
torchgeo/datasets/cms_mangrove_canopy.py Outdated Show resolved Hide resolved
torchgeo/datasets/cms_mangrove_canopy.py Outdated Show resolved Hide resolved
torchgeo/datasets/cms_mangrove_canopy.py Outdated Show resolved Hide resolved
torchgeo/datasets/cms_mangrove_canopy.py Outdated Show resolved Hide resolved
torchgeo/datasets/cms_mangrove_canopy.py Outdated Show resolved Hide resolved
@calebrob6
Copy link
Member

(just cleaned up a little bit)

@calebrob6
Copy link
Member

@adamjstewart, are your comments resolved here?

@calebrob6 calebrob6 merged commit 9cf36fa into microsoft:main Feb 20, 2022
@adamjstewart adamjstewart mentioned this pull request Jul 11, 2022
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
* CMS dataset

* dynamically set filename

* add warning in documentation

* requested changes and data.py

* single zip file and camel case

* md5 check added

* correct error messages

* compression smaller test file

Co-authored-by: Caleb Robinson <calebrob6@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets documentation Improvements or additions to documentation testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants