Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GridGeoSampler: Sample beyond roi limit if needed #448

Closed
wants to merge 3 commits into from

Conversation

remtav
Copy link
Contributor

@remtav remtav commented Mar 2, 2022

fix for #431 as per suggested by @adamjstewart

@ghost
Copy link

ghost commented Mar 2, 2022

CLA assistant check
All CLA requirements met.

@github-actions github-actions bot added the samplers Samplers for indexing datasets label Mar 2, 2022
@adamjstewart adamjstewart added this to the 0.2.1 milestone Mar 8, 2022
@adamjstewart
Copy link
Collaborator

Sorry it took me so long to review this. For the sake of completeness, here are all possible solutions to #431 that I can think of:

  1. Skip the last partial patch (current behavior)
  2. Sample outside the bounds of the image with padding
  3. Adjust the stride of the last patch only
  4. Adjust the stride of all patches
  5. Adjust the size of the last patch only
  6. Adjust the size of all patches

The more I think about it, I agree that 1 is not the best solution, especially if you are trying to make predictions for an entire region. You raised some concerns about how to pad things properly with 2 but I think it's actually easier than you think, rasterio will automatically pad with nodata values if you do an out-of-bounds windowed sample (if I remember correctly).

I think 5 and 6 are both bad solutions because they make it difficult to construct a mini-batch without padding. I personally like 4 the best (calculate the total number of patches we need then adjust stride to match) but it is also very unintuitive. I can imagine someone adjusting stride only to find that the sampler doesn't obey their choice and being very frustrated.

For the sake of completeness (sampling all parts of the image) and intuitiveness (not changing the meaning of size/stride) I think I'm actually leaning towards 2. Before we decide to go with this solution, can you see how other libraries deal with this? I can imagine that satellite imagery isn't the only image source large enough that you have to chop it up into patches. @isaaccorely what does Kornia do?

P.S. I noticed that you added warnings to let the user know that the stride is being changed. I don't think we'll want to do this. I usually only add warnings to let users know that what they're trying to do isn't the correct way to do it (e.g. this function is deprecated, this input value is not ideal). In this case, because all satellite images vary in width/height by a few pixels, it is actually impossible to choose a value for size/stride that doesn't result in this warning.

@remtav
Copy link
Contributor Author

remtav commented Mar 16, 2022

Thanks for the complete evaluation and feedback @adamjstewart !
Knowing that rasterio already pads with nodata outside the bounds of an image makes solution 2 more interesting.
I can certaintly investigate how other librairies (maybe with medical imagery?) go about with this issue.

@adamjstewart
Copy link
Collaborator

We could of course make this a parameter that the user decides like we did with size (CRS or pixel units) but I would like to avoid this unless we have a good use case where one of these other solutions is needed. Trying to keep things simple when possible.

@adamjstewart
Copy link
Collaborator

FWIW, Pytorch's torch.nn.functional.conv2d seems to use behavior 1 if no padding is given. Still looking to see what other libraries do.

@adamjstewart adamjstewart modified the milestones: 0.2.1, 0.2.2 Mar 19, 2022
@remtav
Copy link
Contributor Author

remtav commented Mar 31, 2022

@adamjstewart

The solaris library seems to favor option 2.
Here's what I tested this morning:

import rasterio

from solaris.tile.raster_tile import RasterTiler

raster = "https://github.com/remtav/geo-deep-learning/blob/222-stac-item-input/tests/data/spacenet/SpaceNet_AOI_2_Las_Vegas-056155973080_01_P001-WV03-B.tif?raw=true"

with rasterio.open(raster) as src:
    rtiler = RasterTiler(dest_dir='/tmp/tiles', src_tile_size=(250, 250), verbose=True)
    rtiler.tile(src)

You'll see that tile() calls a tile_generator(), which then uses get_tile_bounds() to split geometry into "equal sized tiles" with split_geom()

This process ends up with a rasterio.window.from_bounds() on the generated bounds (which extend passed the original bounds). Rasterio is therefore responsible for the padding.

@adamjstewart
Copy link
Collaborator

Based on https://docs.rastervision.io/en/0.13/api.html#geodatawindowconfig it seems like Raster Vision adds a padding parameter that lets you decide whether or not to include windows/patches/chips outside of the image bounds?

@adamjstewart
Copy link
Collaborator

Let's go with option 2, where size and stride are held constant but the last patch samples outside the bounds of the image with padding. I think this will be the least surprising behavior. No need for a warning since the image size will almost never be an exact multiple of size.

I would also like to undo the change in #376 so that areas smaller than the patch size will still be sampled from. We need to be careful to skip patches with area=0 though. This occurs when two tiles are side-by-side with an overlapping edge face.

We'll want tests for at least 4 cases:

  • Area of overlap = 0 (sample no patches)
  • 0 < area of overlap < patch size (sample 1 patch)
  • area of overlap = some multiple of patch size w/ stride factored in (sample the # of patches we currently do)
  • area of overlap != some multiple (sample 1 more patch than we currently do)

@adamjstewart
Copy link
Collaborator

@remtav have you gotten a chance to work on this? We're planning a 0.3.0 release in about a week and I would love to see this change in that release.

@adamjstewart adamjstewart marked this pull request as draft June 27, 2022 01:02
@remtav
Copy link
Contributor Author

remtav commented Jun 27, 2022

@remtav have you gotten a chance to work on this? We're planning a 0.3.0 release in about a week and I would love to see this change in that release.

Thanks for the reminder @adamjstewart . I can move this on top of my TODO and tackle this during this week.

@remtav remtav changed the base branch from main to datasets/nongeo June 28, 2022 18:03
@remtav remtav changed the base branch from datasets/nongeo to main June 28, 2022 18:03
remtav added a commit to remtav/torchgeo that referenced this pull request Jun 28, 2022
…artial patch (to be padded)

test_single.py: add tests for multiple limit cases (see issue microsoft#448)
@remtav remtav changed the title GridGeoSampler: Adjust minx/miny with a smaller stride for the last sample per row/col GridGeoSampler: Sample beyond roi limit if needed Jun 28, 2022
@remtav
Copy link
Contributor Author

remtav commented Jun 28, 2022

I can't easily rebase my github's branch on top of main to integrate latest developments (I'd new github desktop and don't have admin rights on my laptop...). I'll close this PR and open a new one with an up-to-date branch.

@remtav remtav closed this Jun 28, 2022
@remtav remtav deleted the samplers/gridgeosampler_bounds branch June 28, 2022 18:28
remtav added a commit to remtav/torchgeo that referenced this pull request Jun 28, 2022
…artial patch (to be padded)

test_single.py: add tests for multiple limit cases (see issue microsoft#448)
@adamjstewart adamjstewart removed this from the 0.2.2 milestone Jul 2, 2022
@adamjstewart adamjstewart added this to the 0.3.1 milestone Sep 3, 2022
adamjstewart added a commit that referenced this pull request Sep 3, 2022
* Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning

* style and mypy fixes

* black test fix

* Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning

* style and mypy fixes

* black test fix

* single.py: adapt gridgeosampler to sample beyond limit of ROI for a partial patch (to be padded)
test_single.py: add tests for multiple limit cases (see issue #448)

* format for black and flake8

* format for black and flake8

* once again, format for black and flake8

* Revert "Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning"

This reverts commit cb554c6

* adapt unit tests, remove warnings

* flake8: remove warnings import

* Address some comments

* Simplify computation of # rows/cols

* Document this new feature

* Fix size of ceiling symbol

* Simplify tests

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Modexus pushed a commit to Modexus/torchgeo that referenced this pull request Sep 3, 2022
…rosoft#630)

* Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning

* style and mypy fixes

* black test fix

* Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning

* style and mypy fixes

* black test fix

* single.py: adapt gridgeosampler to sample beyond limit of ROI for a partial patch (to be padded)
test_single.py: add tests for multiple limit cases (see issue microsoft#448)

* format for black and flake8

* format for black and flake8

* once again, format for black and flake8

* Revert "Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning"

This reverts commit cb554c6

* adapt unit tests, remove warnings

* flake8: remove warnings import

* Address some comments

* Simplify computation of # rows/cols

* Document this new feature

* Fix size of ceiling symbol

* Simplify tests

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
adamjstewart added a commit that referenced this pull request Sep 3, 2022
* Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning

* style and mypy fixes

* black test fix

* Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning

* style and mypy fixes

* black test fix

* single.py: adapt gridgeosampler to sample beyond limit of ROI for a partial patch (to be padded)
test_single.py: add tests for multiple limit cases (see issue #448)

* format for black and flake8

* format for black and flake8

* once again, format for black and flake8

* Revert "Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning"

This reverts commit cb554c6

* adapt unit tests, remove warnings

* flake8: remove warnings import

* Address some comments

* Simplify computation of # rows/cols

* Document this new feature

* Fix size of ceiling symbol

* Simplify tests

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
…rosoft#630)

* Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning

* style and mypy fixes

* black test fix

* Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning

* style and mypy fixes

* black test fix

* single.py: adapt gridgeosampler to sample beyond limit of ROI for a partial patch (to be padded)
test_single.py: add tests for multiple limit cases (see issue microsoft#448)

* format for black and flake8

* format for black and flake8

* once again, format for black and flake8

* Revert "Adjust minx/miny with a smaller stride for the last sample per row/col and issue warning"

This reverts commit cb554c6

* adapt unit tests, remove warnings

* flake8: remove warnings import

* Address some comments

* Simplify computation of # rows/cols

* Document this new feature

* Fix size of ceiling symbol

* Simplify tests

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
samplers Samplers for indexing datasets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants