-
Notifications
You must be signed in to change notification settings - Fork 347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for radiant-mlhub 0.5+ #1102
Support for radiant-mlhub 0.5+ #1102
Conversation
We'll need to update all datasets, not just NASA Marine Debris.
Can we just ask to resume whenever |
Got it, working on all the datatsets now! |
2286ec1
to
e9a2366
Compare
Ok! So what I ended up doing is falling back to download the old file structure by grabbing the |
Does that approach work for older radiant-mlhub or is 0.5.0 the minimum version where that technique works? |
cef2957
to
d369a34
Compare
Good idea! Checked with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also update requirements/datasets.txt
to radiant-mlhub==0.5.5
and remove radiant-mlhub from .github/dependabot.yml
?
@@ -16,15 +16,15 @@ | |||
from torchgeo.datasets import BeninSmallHolderCashews | |||
|
|||
|
|||
class Dataset: | |||
class Collection: | |||
def download(self, output_dir: str, **kwargs: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When radiant-mlhub 0.5 was first released, we actually didn't notice when things broke because we replace their download method with our own. Do you know if it's possible to monkeypatch something else farther internal so that we can actually test their download method but still use a local file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually have never used monkeypatch! So I'm not sure how that would work. It looks like the download
call just creates a requests. Session
and download chunks for the dataset in a threadpool. We would probably have to over-ride something in requests? But then our test data would need to be chunked I think.
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Can we loosen the |
Bleh, that's annoying. Let me see if I can convince them to support newer shapely. I would rather focus on testing newer shapely than on newer radiant-mlhub. So we can test the latest version of radiant-mlhub 0.4 and prevent dependabot from updating it until radiant-mlhub supports shapely 2. |
Done: https://github.com/radiantearth/radiant-mlhub/pull/195 Hopefully the next release will support newer shapely. Until then, let's keep testing old radiant-mlhub. |
So that means changing |
You can actually leave your changes to
|
8f24cb0
to
76268b3
Compare
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Everything looks good at first glance, testing this locally now... |
This is working for most datasets but doesn't seem to work for SpaceNet 6. Maybe @ashnair1 can comment on why only SpaceNet 6 is different.
(scroll right to see error messages) |
So |
So it looks like SpaceNet provides a direct download link shown here: https://spacenet.ai/sn6-challenge/. Since it is 39GB, it looks like ml-hub does not support downloading that dataset directly anymore and only allow the new method of downloading each image individually using the |
I'll let @ashnair1 decide, he wrote all of our SpaceNet data loaders. I think the current approach is probably fine. |
So the difference in SpaceNet6 is primarily caused by the way it was archived. Refer to radiantearth/radiant-mlhub#93 While writing this dataset, Long term, I think it might be best to download all SpaceNet datasets from the official site - https://spacenet.ai/ and not use radiant-mlhub. |
With this PR both SpaceNet 6 and SpaceNet != 6 seem to download properly for all versions of radiant-mlhub (just ignore the warning messages) so I think this is safe to download. We can consider moving the SpaceNet downloads (or rehosting) later. Would be great if we could download all of our datasets without an API key. |
* update datasets and tests to support radiant-mlhub>0.5 * add test coverage for nasa_marine_debris corrupted cases * style fixes * Correct return type in test_nasa_marine_debris.py * Update setup.cfg to limit radiant-mlhub version Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com> * radiant-mlhub version updates to <0.6 * Update environment.yml to not upper bound radiant-mlhub Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com> --------- Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
* update datasets and tests to support radiant-mlhub>0.5 * add test coverage for nasa_marine_debris corrupted cases * style fixes * Correct return type in test_nasa_marine_debris.py * Update setup.cfg to limit radiant-mlhub version Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com> * radiant-mlhub version updates to <0.6 * Update environment.yml to not upper bound radiant-mlhub Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com> --------- Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Quick changes to several datasets and the tests to support the new download file structures in
radiant-mlhub>=0.5
.Quick changes for now. Currently no checksum checking for the dataset since
radiant-mlhub
just downloads the individual files directly. Any ideas how I can check for partial dataset downloads?radiant-mlhub
says it will auto-resume dataset downloads if requested but this code does not currently check or ask it to resume.Closes #711
Closes #1101