Add MapInWild dataset #1131
This looks like a great start! Let me know if you have any specific questions.
Checksum: So far no checksum. I cannot seem to find the MD5s on Hugging Face. There are only SHA256s.
You can calculate them yourself using md5 <filename> on the command line. We'll likely move from MD5 to SHA256 someday.
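If no md5 binary is at hand, Python's standard hashlib module can produce the same digests. The sketch below is a minimal helper under that assumption. It reads the file in chunks, so large archives never load fully into memory, and the same function also returns SHA256 digests for a later switch. The name file_digest is hypothetical and not part of TorchGeo.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: str | Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file (hypothetical helper, not a TorchGeo API)."""
    h = hashlib.new(algorithm)  # "md5", "sha256", ... any name hashlib supports
    with open(path, "rb") as f:
        # iter() with a b"" sentinel keeps calling f.read until EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, file_digest("some_archive.zip") gives the MD5 and file_digest("some_archive.zip", "sha256") gives the SHA256. The filename here is a placeholder. Compare the SHA256 result against the value shown on Hugging Face to confirm that you hashed the same file.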
Remove the pandas and move to a lower-level library.
You should probably just keep pandas. I've been thinking about making it a required dep of TorchGeo because so many datasets use it.
@microsoft-github-policy-service agree company="unibwmunich"
Accept suggestions
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Thanks a bunch for your revisions! I applied some quick changes and tests. I will make sure there is full test coverage after applying the necessary additions and changes, which for now are:
Issues
Nope, that syntax should be correct. Note that our tests use the more complicated invocation:
$ pyupgrade --py38-plus $(find . -path ./docs/src -prune -o -name "*.py" -print)
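The find expression in that command collects every .py file in the repository except those under ./docs/src. On a machine without a POSIX shell, a rough Python equivalent might look like the sketch below. It is an approximation under assumptions: python_files and pyupgrade_command are hypothetical names, and the sketch only builds the argument list. It does not run pyupgrade.

```python
from __future__ import annotations

import os
from pathlib import Path


def python_files(root: str = ".", prune: str = "docs/src") -> list[str]:
    """List .py files under root, skipping the pruned subtree (like find -prune)."""
    root_path = Path(root)
    pruned = (root_path / prune).resolve()
    found = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Editing dirnames in place stops os.walk from descending into the pruned dir.
        dirnames[:] = [d for d in dirnames if (Path(dirpath) / d).resolve() != pruned]
        for name in filenames:
            if name.endswith(".py"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def pyupgrade_command(files: list[str], min_version: str = "py38") -> list[str]:
    """Build the pyupgrade argv; pass it to subprocess.run(...) to execute it."""
    return ["pyupgrade", f"--{min_version}-plus", *files]
```

Making min_version a parameter lets you raise the --pyXY-plus floor when the minimum supported Python version changes, without touching the file-collection logic.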
Yes, this is perfect.
Note that we just dropped Python 3.8 support, so
Improves the dataset instance and test data script.
@adamjstewart I have improved the dataset class (mapinwild.py) and added the code to create test data (mapinwild/data.py) with the last commit. Now, I assume the next step is to populate test_mapinwild.py. Progress is quite slow because I have very little bandwidth for this side hustle; sorry for that! In fact, feel free to contribute to test_mapinwild.py to wrap things up faster. Otherwise, it could take quite some time for me.
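A test-data script of this kind usually writes a few tiny random files per modality, archives them, and prints their checksums. The sketch below shows that shape under stated assumptions. The modality names come from the constructor call in the PR description. The channel counts, dtypes, tile size and the helper make_fake_modality are all hypothetical. To keep dependencies light, the sketch writes .npy files, whereas a real script would more likely write GeoTIFFs (for example with rasterio).

```python
from __future__ import annotations

import hashlib
import os
import shutil

import numpy as np

# Hypothetical modality -> (channels, dtype) table; real MapInWild band counts may differ.
MODALITIES = {
    "mask": (1, np.uint8),
    "esa_wc": (1, np.uint8),
    "viirs": (1, np.float32),
    "s1": (2, np.float32),
    "s2_temporal_subset": (10, np.uint16),
}
SIZE = 32  # tiny tiles keep the test fixtures small


def make_fake_modality(root: str, name: str, ids: list[str], seed: int = 0) -> str:
    """Write one random array per sample ID, zip the folder, return the zip's MD5."""
    channels, dtype = MODALITIES[name]
    rng = np.random.default_rng(seed)
    folder = os.path.join(root, name)
    os.makedirs(folder, exist_ok=True)
    for sample_id in ids:
        if np.issubdtype(dtype, np.integer):
            # Upper bound is exclusive, so values stay inside the dtype's range.
            arr = rng.integers(0, np.iinfo(dtype).max, size=(channels, SIZE, SIZE), dtype=dtype)
        else:
            arr = rng.random((channels, SIZE, SIZE)).astype(dtype)
        np.save(os.path.join(folder, f"{sample_id}.npy"), arr)
    # Creates <root>/<name>.zip containing <name>/<id>.npy entries.
    archive = shutil.make_archive(folder, "zip", root_dir=root, base_dir=name)
    with open(archive, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()
```

Zip archives embed timestamps, so the printed MD5 changes every time the script runs. Regenerate the data and update the checksums in the test setup together.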
No rush. Most of us are currently working towards a paper deadline, so we prob won't be able to help much either. But might be able to help after June.
Could you please help me understand the check "license/cla Expected — Waiting for status to be reported"? I cannot seem to complete it. Also, please let me know if there is anything else to be done on my side before the release. Thank you for your time in helping me out here; it was a great learning experience for me. I could never have done it without your great reviews :) I plan to contribute to the library as much as I can!
@calebrob6 any idea what's wrong with the CLA bot? It was accepted here: #1131 (comment) Glad my harsh reviews didn't scare you away 😆. I also see that you're in Munich? I just started a postdoc at TUM, maybe I'll see you around! I'll review everything in detail again tomorrow, but I think we're probably pretty close now.
No idea, let's see if the ol' close and reopen trick works :)
I hate how reliable ol' close and reopen is...
Yep, I think that did it! Thanks for all the hard work on this @burakekim!
@adamjstewart Yes, I am doing my PhD here in Munich. Definitely, it would be great to meet up! I will shoot you an e-mail :)
Thanks a bunch, @calebrob6! Contributing to the library has been an enjoyable experience :)
What are the requested changes here?
Need to re-review tonight
Two remaining easy changes and I think this will be ready to merge!
Great work on this! Glad to have your dataset in TorchGeo!
Bumping tests...
Closes #1096
Here is the MapInWild dataset class. I solved some things with ugly fixes and quick workarounds, but I hope this pull request is still a good-enough starting point.
TODO and issues
- __getitem__ … making it harder to apply modality-specific normalization steps (e.g., s2/10000).
Nice to do
- plot function: a normalization function (e.g., percentile_normalization) is needed for at least the Sentinel-2 data. An argument plus modality-aware logic inside this function could help here.
- Remove the pandas dependency and move to a lower-level library.
Any help is appreciated. Thanks in advance for your time in reviewing this pull request.
P.S. I am by no means a GitHub pro, so any heads-up is appreciated.
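The __getitem__ item above notes that modality-specific normalization such as s2/10000 is harder when modalities are returned together. A sketch of one workaround: if the channel count and stacking order of every modality are known, the stacked (C, H, W) array can be sliced back apart, normalized per modality, and re-stacked. The channel counts, the NORMALIZERS table and both function names are hypothetical. NumPy is used for illustration, and the same slicing works on torch tensors.

```python
from __future__ import annotations

import numpy as np

# Hypothetical per-modality channel counts, listed in the order they are stacked.
CHANNELS = {"s1": 2, "s2_temporal_subset": 10, "viirs": 1}

# Hypothetical per-modality normalizers; s2 / 10000 follows the PR description.
NORMALIZERS = {
    "s2_temporal_subset": lambda x: x / 10000.0,
}


def split_modalities(image: np.ndarray, order: list[str]) -> dict[str, np.ndarray]:
    """Slice a channel-stacked (C, H, W) array back into one array per modality."""
    out, start = {}, 0
    for name in order:
        stop = start + CHANNELS[name]
        out[name] = image[start:stop]
        start = stop
    if start != image.shape[0]:
        raise ValueError(f"expected {start} channels, got {image.shape[0]}")
    return out


def normalize_stacked(image: np.ndarray, order: list[str]) -> np.ndarray:
    """Apply each modality's normalizer to its own channels, then re-stack."""
    parts = split_modalities(image.astype(np.float32), order)
    identity = lambda x: x  # modalities without an entry pass through unchanged
    normalized = [NORMALIZERS.get(name, identity)(parts[name]) for name in order]
    return np.concatenate(normalized, axis=0)
```

Another design would return each modality under its own sample key, so a transform could target it directly. The slicing approach above keeps a single stacked "image" tensor and moves the bookkeeping into CHANNELS.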
Example instantiation:
from torchgeo.datasets import MapInWild
mapinwild = MapInWild(root="data", download=True, modality=["mask", "esa_wc", "s2_temporal_subset", "viirs", "s1"], split="train", transforms=None, checksum=False)
Some sample images:
- VIIRS nighttime light band
- Sentinel-2 single temporal subset (the Sentinel-2 season with the highest score)
- ESA WorldCover map
- Sentinel-1
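The "Nice to do" item on the plot function asks for a percentile-based normalization of Sentinel-2 data before display. The sketch below is a minimal NumPy version of that idea. It is not TorchGeo's own percentile_normalization helper. It clips to the 2nd/98th percentiles and rescales to [0, 1]. The RGB band indices (3, 2, 1) assume a band order starting B1, B2, B3, B4. Both function names are hypothetical.

```python
from __future__ import annotations

import numpy as np


def percentile_stretch(img: np.ndarray, lower: float = 2, upper: float = 98) -> np.ndarray:
    """Clip img to its [lower, upper] percentiles and rescale to [0, 1] for display."""
    lo, hi = np.percentile(img, (lower, upper))
    if hi <= lo:  # constant image: avoid dividing by zero
        return np.zeros_like(img, dtype=np.float32)
    return np.clip((img - lo) / (hi - lo), 0, 1).astype(np.float32)


def s2_rgb_preview(s2: np.ndarray, rgb_idx: tuple[int, int, int] = (3, 2, 1)) -> np.ndarray:
    """Pick assumed RGB bands from a (C, H, W) Sentinel-2 array; return (H, W, 3)."""
    rgb = np.stack([s2[i] for i in rgb_idx], axis=-1).astype(np.float32)
    return percentile_stretch(rgb)
```

The result can go straight to matplotlib's imshow. The percentiles here are computed over all three channels together, which preserves relative colour balance. Stretching each channel on its own gives more contrast but can shift the colours.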