Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip downloading when not in pre-sampled coords #1821

Merged
merged 4 commits into from
Mar 2, 2024

Conversation

tatsubori
Copy link
Contributor

@tatsubori tatsubori commented Jan 23, 2024

SSL4EO's download_ssl4eo.py can fail with NoKey error if given pre-sampled coords like sampled_locations.csv do not contain any index specified by the indices range.

@github-actions github-actions bot added the scripts Training and evaluation scripts label Jan 23, 2024
adamjstewart
adamjstewart previously approved these changes Jan 23, 2024
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue occurs when the range specified for sample_ssl4eo.py is different from the range specified for download_ssl4eo.py and some indices have no lat/lon specified in sampled_locations.csv? Makes sense, it's probably good to skip those. I wonder if we should add a warning that we are skipping those. For example, if someone requests to try downloading 1M images but only gets 1K, they might be surprised.

@adamjstewart
Copy link
Collaborator

@wangyi111 may be interested in reviewing this

@wangyi111
Copy link
Contributor

yes I think if we skip then a warning is good. Normally, the indices range for download_ssl4eo.py is supposed to be covered by sampled_locations.csv. So the warning can help the user check again if this is really needed.

One rare use case could be you want to filter out some coords in sampled_locations.csv (e.g. you want to download more images within an area), move those entries to another sampled_locations_special.csv file for download. In this case, you still specify the full indices range in sampled_locations.csv to download_ssl4eo.py, and the script can skip those ids that are not in the new csv.

@calebrob6
Copy link
Member

I'd recommend returning False if skipped and returning True if completed, then at the end of the script printing how many were skipped.

@adamjstewart adamjstewart added this to the 0.5.2 milestone Feb 29, 2024
@adamjstewart
Copy link
Collaborator

I added a warning message. Adding return values and summing them is more work, but @calebrob6 can add another commit/PR if he wants.

@isaaccorley isaaccorley merged commit f8faf9f into microsoft:main Mar 2, 2024
24 checks passed
isaaccorley pushed a commit that referenced this pull request Mar 2, 2024
* Skip downloading an entry if it is not in the given pre-sampled list.

* Add warning message when skipping

---------

Co-authored-by: Michiaki Tatsubori <mich@mich-mbp5.local>
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
scripts Training and evaluation scripts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants