Add MapInWild dataset #1131

burakekim · 2023-02-21T16:01:45Z

Here is the MapInWild dataset instance. I solved some things with ugly fixes and quick workarounds, but I hope this pull request is still a good enough starting point.

TODO and issues

Checksum: There is no checksum yet. I cannot seem to find the MD5s on Hugging Face; there are only SHA256s.
It is currently not straightforward to distinguish the modalities in __getitem__, which makes it harder to apply modality-specific normalization steps (e.g., s2/10000).
Documentation and tests

Nice to do

Picking modalities band-wise to enable the use of various combinations of bands. This could even supersede the modality-wise picking.
Improving the plot function. A normalization function (e.g., percentile_normalization) is needed for at least the Sentinel-2 data. An extra argument and modality-aware logic inside the plot function could help with that.
Remove the pandas and move to a lower-level library.
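
As a rough illustration of the percentile normalization mentioned above for plotting Sentinel-2 data, here is a minimal NumPy sketch. The name `percentile_normalization` comes from the comment, but the signature, the 2nd/98th percentile defaults, and the clipping behavior are assumptions rather than what the PR implements.

```python
import numpy as np


def percentile_normalization(img, lower=2.0, upper=98.0):
    """Rescale an image to [0, 1] using percentile clipping (hypothetical helper)."""
    img = np.asarray(img, dtype=np.float32)
    lo = np.percentile(img, lower)
    hi = np.percentile(img, upper)
    if hi <= lo:
        # Constant image: avoid dividing by zero.
        return np.zeros_like(img)
    # Values outside the [lo, hi] range are clipped to 0 or 1.
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)


# Random values standing in for Sentinel-2 reflectances, for demonstration only.
rng = np.random.default_rng(0)
fake_s2 = rng.integers(0, 10000, size=(3, 64, 64))
scaled = percentile_normalization(fake_s2)
```

Clipping at percentiles rather than at the raw minimum and maximum keeps a few very bright pixels from washing out the rest of the image.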

Any help is appreciated. Thanks in advance for your time in reviewing this pull request.

P.S. I am by no means a GitHub pro, so any heads-up is appreciated.

The signature:
mapinwild = MapInWild(root="data", download=True, modality=["mask", "esa_wc", "s2_temporal_subset", "viirs", "s1"], split="train", transforms=None, checksum=False)

Some sample images:

VIIRS Nighttime light band

Sentinel-2 single temporal subset (the Sentinel-2 season with the highest score)

ESA WorldCover map

Sentinel-1

adamjstewart

This looks like a great start! Let me know if you have any specific questions.

Checksum: So far no checksum. I can not seem to find the MD5s from Hugging Face. There are only SHA256s.

You can calculate them yourself using md5 <filename> on the command line. We'll likely move from MD5 to SHA256 someday.
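
For reference, the same digest can be computed from Python with the standard library. Note that the `md5` command is the macOS spelling; on most Linux systems the equivalent is `md5sum`. The helper name `file_md5` and the demo file below are hypothetical and used only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file; in practice, pass the path of a downloaded archive.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.bin"
    demo_path.write_bytes(b"hello")
    demo_digest = file_md5(demo_path)
```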

Remove the pandas and move to a lower-level library.

You should probably just keep pandas. I've been thinking about making it a required dep of TorchGeo because so many datasets use it.

torchgeo/datasets/mapinwild.py

burakekim · 2023-02-22T09:41:34Z

@microsoft-github-policy-service agree company="unibwmunich"

accept suggestions Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

burakekim · 2023-02-22T10:33:43Z

Thanks a bunch for your revisions!

I applied some quick changes and tests. I will make sure there is full test coverage after applying the necessary additions and changes, which for now are:

Applying MD5 checks.
Finding a generic way to handle modalities one by one in the __getitem__ so that we can apply modality-specific normalizations. As a last resort, I will create some mess by simply using self._load_raster(id, modals) to load the modalities separately.
Making the plot function modality-aware so that it applies proper visualization scaling.
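
One possible way to handle the modalities one by one, sketched here with NumPy: keep each modality as a separate array keyed by name, apply a per-modality scaling, and only then stack everything along the channel axis. The modality names come from the constructor call in the first comment and the Sentinel-2 factor from the s2/10000 remark; the helper `normalize_and_stack`, the `NORMALIZERS` table, and the dummy shapes are assumptions, not the PR's implementation.

```python
import numpy as np

# Per-modality scaling functions. Only the Sentinel-2 factor (1/10000) is taken
# from the discussion; leaving every other modality unscaled is an assumption.
NORMALIZERS = {
    "s2_temporal_subset": lambda x: x / 10000.0,
}


def normalize_and_stack(sample):
    """Normalize each modality separately, then concatenate along channels.

    `sample` maps a modality name to an array of shape (C, H, W).
    """
    normalized = []
    for name, array in sample.items():
        array = np.asarray(array, dtype=np.float32)
        # Fall back to the identity for modalities without a registered scaling.
        normalized.append(NORMALIZERS.get(name, lambda x: x)(array))
    return np.concatenate(normalized, axis=0)


# Dummy data with made-up shapes, for demonstration only.
sample = {
    "s2_temporal_subset": np.full((10, 4, 4), 5000),
    "viirs": np.ones((1, 4, 4)),
}
image = normalize_and_stack(sample)
```

Keeping the scaling in a lookup table means the same table could also be consulted by a modality-aware plot function.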

Issues

Do you happen to know why the line pyupgrade --py37-plus $(find . -name "*.py") outputs the error below? It works without the query argument $(find . -name "*.py"). Is this argument needed?

usage: pyupgrade [-h] [--exit-zero-even-if-changed] [--keep-percent-format] [--keep-mock] [--keep-runtime-typing]
                 [--py3-plus] [--py36-plus] [--py37-plus] [--py38-plus] [--py39-plus] [--py310-plus] [--py311-plus]
                 [filenames ...]
pyupgrade: error: unrecognized arguments: -name *.py)

I shortened the plain texts that are going over 89 characters and applied # noqa: E501 for the links, is this OK?

adamjstewart · 2023-02-22T19:23:28Z

Do you happen to know why the line pyupgrade --py37-plus $(find . -name "*.py") outputs the error below?

Nope, that syntax should be correct. Note that our tests use the more complicated:

$ pyupgrade --py38-plus $(find . -path ./docs/src -prune -o -name "*.py" -print)
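
The error message shows `-name *.py)` reaching pyupgrade as literal arguments, which suggests the `$(...)` command substitution was not expanded by the shell. That would happen in a non-POSIX shell such as Windows cmd, though the thread does not confirm which shell was used. As a shell-independent alternative, the file list can be built in Python and passed to the `pyupgrade` command directly. The helper names below are hypothetical, and the pruned directory mirrors the `find` invocation above.

```python
import subprocess
from pathlib import Path


def python_files(root=".", prune="docs/src"):
    """List *.py files under root, skipping everything below the pruned directory."""
    root = Path(root)
    pruned = root / prune
    return sorted(str(p) for p in root.rglob("*.py") if pruned not in p.parents)


def run_pyupgrade(files, flag="--py38-plus"):
    """Invoke the pyupgrade CLI on the given files (pyupgrade must be installed)."""
    return subprocess.run(["pyupgrade", flag, *files], check=False)


# Collect the files; call run_pyupgrade(files) to actually rewrite them.
files = python_files(".")
```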

I shortened the plain texts that are going over 89 characters and applied # noqa: E501 for the links, is this OK?

Yes this is perfect.

adamjstewart · 2023-04-17T16:05:57Z

Note that we just dropped Python 3.8 support, so --py38-plus is now --py39-plus

Improves the dataset instance and test data script.

burakekim · 2023-04-24T19:46:19Z

@adamjstewart I have improved the dataset class (mapinwild.py) and added the code to create test data (mapinwild\data.py) with the last commit.

Now, I assume the next step is to populate the test_mapinwild.py.

Progress is quite slow as I have very little bandwidth for this side project, sorry for that! Feel free to contribute to test_mapinwild.py to wrap things up faster; otherwise, it could take me quite some time.

adamjstewart · 2023-04-24T21:43:43Z

No rush. Most of us are currently working towards a paper deadline, so we prob won't be able to help much either. But might be able to help after June.

torchgeo/datasets/mapinwild.py

burakekim · 2023-09-25T20:29:43Z

Could you please help me understand the check "license/cla Expected — Waiting for status to be reported"? I cannot seem to complete that.

Also, please let me know if there is anything else to be done from my side before the release. Thank you for your time in helping me out here, it was a great learning experience for me. I could never have done it without your great reviews :) I plan to contribute to the library as much as I can!

adamjstewart · 2023-09-25T21:01:14Z

@calebrob6 any idea what's wrong with the CLA bot? It was accepted here: #1131 (comment)

Glad my harsh reviews didn't scare you away 😆. I also see that you're in Munich? I just started a postdoc at TUM, maybe I'll see you around!

I'll review everything in detail again tomorrow, but I think we're probably pretty close now.

calebrob6 · 2023-09-25T21:08:33Z

No idea, let's see if the ol' close and reopen trick works :)

adamjstewart · 2023-09-25T21:11:29Z

I hate how reliable ol' close and reopen is...

calebrob6 · 2023-09-25T21:17:08Z

Yep, I think that did it! Thanks for all the hard work on this @burakekim!

burakekim · 2023-09-26T08:27:36Z

I also see that you're in Munich? I just started a postdoc at TUM, maybe I'll see you around!

@adamjstewart Yes, I am doing my PhD here in Munich. Definitely, it would be great to meet up! I will shoot you an e-mail :)

burakekim · 2023-09-26T08:44:43Z

Thanks for all the hard work on this @burakekim!

Thanks a bunch, @calebrob6! Contributing to the library has been an enjoyable experience :)

torchgeo/datasets/mapinwild.py

tests/datasets/test_mapinwild.py

torchgeo/datasets/mapinwild.py

calebrob6 · 2023-09-27T17:07:52Z

What are the requested changes here?

adamjstewart · 2023-09-27T18:50:11Z

Need to re-review tonight

adamjstewart

Two remaining easy changes and I think this will be ready to merge!

tests/datasets/test_mapinwild.py

torchgeo/datasets/mapinwild.py

tests/datasets/test_mapinwild.py

adamjstewart

Great work on this! Glad to have your dataset in TorchGeo!

adamjstewart · 2023-09-29T10:38:27Z

Bumping tests...

add mapinwild dataset

f8b0a5d

github-actions bot added the datasets Geospatial or benchmark datasets label Feb 21, 2023

adamjstewart added this to the 0.5.0 milestone Feb 21, 2023

adamjstewart requested changes Feb 21, 2023

burakekim and others added 7 commits February 22, 2023 10:53

add copyright and move the header

7a1b539

Apply suggestions from code review

7dcba84

accept suggestions Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

add spaces between sections

eef425e

Merge branch 'main' of https://github.com/burakekim/torchgeo

6e0f20a

test_black

9b9c1f8

test_isort

18141f8

test_flake8

b962cfd

burakekim added 2 commits March 31, 2023 12:14

Merge branch 'microsoft:main' into main

f496123

Merge branch 'microsoft:main' into main

517e2cd

burakekim and others added 2 commits April 18, 2023 17:28

Merge branch 'main' into main

067b902

dataset instance and test data

fd19fbf

Improves the dataset instance and test data script.

github-actions bot added the testing Continuous integration testing label Apr 24, 2023

Merge branch 'main' into main

9adadee

improvements in test script and dataset class

87c1813

github-actions bot added the documentation Improvements or additions to documentation label Aug 4, 2023

burakekim changed the title ~~Add mapinwild dataset~~ Add MapInWild dataset Aug 4, 2023

burakekim and others added 4 commits August 4, 2023 17:11

update test data

08c01a3

Merge branch 'main' into main

128ffe1

Merge branch 'main' into main

b21fce0

Merge branch 'main' of https://github.com/burakekim/torchgeo

89f091a

burakekim and others added 3 commits September 25, 2023 20:24

addressing the comments

2cd5b0b

fix mypy plt.Figure not defined

816fcea

Merge branch 'main' into main

4d764ed

burakekim commented Sep 25, 2023

torchgeo/datasets/mapinwild.py

torchgeo/datasets/mapinwild.py

calebrob6 closed this Sep 25, 2023

calebrob6 reopened this Sep 25, 2023

make the _merge_parts slimmer

f2ff730

adamjstewart reviewed Sep 26, 2023

pandas and reviews

30b1499

adamjstewart requested changes Sep 27, 2023

tests/datasets/test_mapinwild.py

torchgeo/datasets/mapinwild.py

monkeypatch tvt sets

181527b

adamjstewart reviewed Sep 29, 2023

tests/datasets/test_mapinwild.py

Simplify MonkeyPatch import

6e3e1b3

adamjstewart approved these changes Sep 29, 2023

adamjstewart enabled auto-merge (squash) September 29, 2023 09:56

adamjstewart closed this Sep 29, 2023

auto-merge was automatically disabled September 29, 2023 10:38
Pull request was closed

adamjstewart reopened this Sep 29, 2023

adamjstewart enabled auto-merge (squash) September 29, 2023 10:38

adamjstewart merged commit c51014c into microsoft:main Sep 29, 2023
22 checks passed
