Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add L8 Biome Dataset #1200

Merged
merged 47 commits into from
Apr 5, 2023
Merged

Conversation

shradhasehgal
Copy link
Contributor

@shradhasehgal shradhasehgal commented Mar 28, 2023

Overview

Adds L8 Biome dataset as a RasterDataset to TorchGeo.

Main changes:

  • Added functions to initialize the L8 Biome dataset, sample from it, and plot images.
  • Added data generation script and artificial data corresponding to the same.
  • Wrote tests to check the functionality of the L8 Biome class.
  • Integrated L8Biome Data Module and included different splits.
  • Included dataset details into the docs.

Here is a sample plot of an image and its mask from the dataset:

L8 Biome

@github-actions github-actions bot added datasets Geospatial or benchmark datasets documentation Improvements or additions to documentation labels Mar 28, 2023
@adamjstewart adamjstewart added this to In progress in SSL4EO-L via automation Mar 28, 2023
@adamjstewart
Copy link
Collaborator

Used getitem and plotting functions from CDL dataset. How should these be changed for L8 Biome?

CDL is useless for __getitem__ and plot since this dataset is drastically different. CDL is only useful for copying the downloading code. For __getitem__, LandCoverAIGeo is a better reference. For plotting, every dataset is different, there's isn't a great example, but any land cover mapping dataset should have a similar image + mask plotting setup.

What will the cmap be for this dataset?

First of all, cmap isn't required for any dataset. But you do need a way to plot the image, and imshow + colormap is the best way to do this. Oftentimes, like in the case of CDL, this cmap is hardcoded into the GeoTIFF file, and TorchGeo will automatically read and use it without listing it in the dataset, but I don't think that's the case with your masks. In the case of this dataset, the authors purposefully chose 0, 64, 128, 192, 255 for the class values instead of 0, 1, 2, 3, 4. I'm guessing that's because these are the cmap values they recommend. Basically, we plot the image in grayscale with each class having these color values. Note that in the __getitem__ you'll need to undo this and ensure that these classes are 0, 1, 2, 3, 4.

torchgeo/datasets/l8biome.py Outdated Show resolved Hide resolved
torchgeo/datasets/l8biome.py Outdated Show resolved Hide resolved
torchgeo/datasets/l8biome.py Outdated Show resolved Hide resolved
torchgeo/datasets/l8biome.py Show resolved Hide resolved
torchgeo/datasets/l8biome.py Show resolved Hide resolved
torchgeo/datasets/l8biome.py Show resolved Hide resolved
torchgeo/datasets/l8biome.py Outdated Show resolved Hide resolved
torchgeo/datasets/l8biome.py Outdated Show resolved Hide resolved
@adamjstewart
Copy link
Collaborator

@adamjstewart adamjstewart marked this pull request as draft March 28, 2023 02:33
@adamjstewart adamjstewart changed the title Adding L8Biome Dataset - WIP Add L8 Biome Dataset Mar 28, 2023
@adamjstewart adamjstewart added this to the 0.5.0 milestone Mar 28, 2023
@github-actions github-actions bot added the datamodules PyTorch Lightning datamodules label Mar 30, 2023
@shradhasehgal
Copy link
Contributor Author

Latest modifications:

  • Fixed L8Biome dataset verification and download
  • Resolved comments from previous review
  • Changed auxiliary files for dataset addition
  • Created L8Biome datamodule

Doubts:

  • Used getitem function from LandCoverAI as per this comment. Wondering if anything needs to change for L8Biome in particular?
  • Altered cmap values to 0-4 as per this discussion. What do the values in this dictionary represent and how to obtain them for L8 Biome?
  • Some tests are still failing after running linters. How to resolve them?

TODOs:

  • Write tests
  • Plotting function

Tagging @adamjstewart for review.

Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very close to being done (aside from plotting and testing)!

docs/api/datasets.rst Outdated Show resolved Hide resolved
docs/api/geo_datasets.csv Outdated Show resolved Hide resolved
torchgeo/datamodules/__init__.py Outdated Show resolved Hide resolved
torchgeo/datamodules/l8biome.py Show resolved Hide resolved
torchgeo/datasets/l8biome.py Show resolved Hide resolved
torchgeo/datasets/l8biome.py Outdated Show resolved Hide resolved
torchgeo/datasets/l8biome.py Outdated Show resolved Hide resolved
torchgeo/datasets/l8biome.py Outdated Show resolved Hide resolved
torchgeo/datasets/l8biome.py Outdated Show resolved Hide resolved
@github-actions github-actions bot added the testing Continuous integration testing label Apr 1, 2023
tests/data/l8biome/data.py Outdated Show resolved Hide resolved
tests/datasets/test_l8biome.py Outdated Show resolved Hide resolved
tests/datasets/test_l8biome.py Outdated Show resolved Hide resolved
tests/datasets/test_l8biome.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully my last review!

Docs look good: https://torchgeo--1200.org.readthedocs.build/en/1200/api/datasets.html#l8-biome

Can you plot an image and add the plot to the PR description so we can make sure everything looks visually correct? Colors might be better now that it isn't only B2 data.

torchgeo/datasets/l8biome.py Outdated Show resolved Hide resolved
torchgeo/datasets/l8biome.py Show resolved Hide resolved
torchgeo/datasets/l8biome.py Outdated Show resolved Hide resolved
torchgeo/datasets/l8biome.py Outdated Show resolved Hide resolved
torchgeo/datasets/l8biome.py Outdated Show resolved Hide resolved
Downloading raw data directly

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
shradhasehgal and others added 2 commits April 5, 2023 13:02
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, congrats on your first PR! Hopefully the first of many if the extensive review process didn't scare you away 😅

@adamjstewart adamjstewart merged commit 24afe5a into microsoft:main Apr 5, 2023
20 checks passed
SSL4EO-L automation moved this from In progress to Done Apr 5, 2023
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
* Added varification of data

* Added getitem and plotting functions - partial

* Update torchgeo/datasets/l8biome.py

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

* Changes corresponding to draft PR

* Changed auxilliary files for dataset addition; added L8Biome datamodule

* Fixed dataset verification and download; minor changes in variable declaration

* Running linters to properly format code

* Fixing failed linter tests from previous commit

* Fixing comments from review microsoft#2

* Added script to generate artificial data in similar format as original files

* Removed dependency on existing dataset

* Fixes to the getitem function; tested with samples

* Making suggested changes; fixing mask; writing tests

* Changes to plot RGB image

* Changes to plot RGB image - running into errors related to band variables declaration

* Fixed plotting of dataset

* Added masks to artificial data generation; tested with pytest - all tests pass

* Resolving file and code formatting with linters

* Added artificial data

* Fixing lists in docstrings

* Fixing comments from review

* Added new tests

* Regenerated artificial data

* Added test for l8datamodule; linters formatting

* Fixed image generation with more rigorous check on filepath

* docstring formatting change

* Resolving CI test errors

* Added conf/yaml for datamodules test

* Fixing CI test failure bug; Changes from review; Samplers in datamodule WIP

* Added more artificial data

* Fixed yaml and datamodule changes

* Running linters to properly format code

* Reducing batch_size and length for datamodules tests

* Reduced patch size in l8biome yaml to fit within artificial data

* Declaring all samplers each time

* Reverting to stages checks for samplers

* Changed data generation file and generated data again

* Fix getitem

* Included more encompassing tests

* Adding test for RGB bands being absent; running linters

* Added check for mask in getitem test

* Changes corresponding to review

* Update torchgeo/datasets/l8biome.py

Downloading raw data directly

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

* Update l8biome.py

* Update torchgeo/datasets/l8biome.py: Changing RGB bands order

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

* Change B2 band to B1

---------

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datamodules PyTorch Lightning datamodules datasets Geospatial or benchmark datasets documentation Improvements or additions to documentation testing Continuous integration testing
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

2 participants