Samplers not completely respecting RoI #260

Closed
robertomest opened this issue Nov 23, 2021 · 3 comments · Fixed by #144

@robertomest

Hi Folks,
I've been using torchgeo for loading data and it's working really well. There seems to be a bug in the samplers (I verified it in both RandomGeoSampler and GridGeoSampler). When a sampler is created with an RoI, it currently uses the RoI only to select among the available files in the rtree index. If one of those files extends beyond the RoI, the part of the file outside the RoI will be sampled as well. This is especially problematic for datasets composed of large TIFFs (like CDL).

Example:

from torchgeo.samplers import RandomGeoSampler
from torchgeo.datasets import CDL
from torchgeo.datasets import BoundingBox
from shapely import geometry as shpg
import geopandas as gpd

dataset = CDL("/tmp/cdl2")
minx, maxx, miny, maxy, mint, maxt = dataset.bounds
roi = BoundingBox(minx, (minx + maxx) / 2, miny, (miny + maxy) / 2, mint, maxt)
sampler = RandomGeoSampler(dataset, size=1e5, length=200, roi=roi)

dataset_bounds = gpd.GeoSeries(shpg.box(minx, miny, maxx, maxy))
roi_bounds = gpd.GeoSeries(shpg.box(minx, miny, (minx + maxx) / 2, (miny + maxy) / 2))
# Sample some bounding boxes
samples = gpd.GeoSeries([shpg.box(b[0], b[2], b[1], b[3]) for b in sampler])

ax = dataset_bounds.boundary.plot(color="black")
roi_bounds.boundary.plot(ax=ax, color="green")
samples.boundary.plot(ax=ax, color="red")

[Image "bug": plot of the dataset bounds (black), the RoI (green), and the sampled boxes (red)]

I think the problem would be fixed by computing the sampling bounds as the intersection of the hit bounds and the RoI:

bounds = intersection(BoundingBox(*hit.bounds), self.roi)
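As a rough illustration of that idea, here is a minimal, self-contained sketch of what such an intersection could look like. The `Box` type and the `intersection` helper are hypothetical stand-ins written for this example; they are not torchgeo's API, and the field order simply mirrors the (minx, maxx, miny, maxy, mint, maxt) layout used in the snippet above.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Hypothetical stand-in for a (minx, maxx, miny, maxy, mint, maxt) box."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float


def intersection(a: Box, b: Box) -> Optional[Box]:
    """Return the overlap of two boxes, or None if they do not overlap."""
    box = Box(
        max(a.minx, b.minx), min(a.maxx, b.maxx),
        max(a.miny, b.miny), min(a.maxy, b.maxy),
        max(a.mint, b.mint), min(a.maxt, b.maxt),
    )
    # An inverted interval on any axis means the boxes are disjoint.
    if box.minx > box.maxx or box.miny > box.maxy or box.mint > box.maxt:
        return None
    return box


hit = Box(0, 10, 0, 10, 0, 1)  # extent of a file returned by the index query
roi = Box(0, 5, 0, 5, 0, 1)    # requested region of interest
bounds = intersection(hit, roi)  # sampling would then be restricted to this box
```

With the sampling bounds clipped this way, a file that extends past the RoI would only contribute the part that overlaps it. How disjoint boxes should be handled (returning `None` here) is a design choice of this sketch.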

Let me know if you would like me to open a PR and help out on this.

@adamjstewart
Collaborator

Duplicate of #149, will be fixed by #144

@calebrob6
Member

@robertomest, thanks for opening an issue! Please let us know if you run into any others or want to contribute in another way. I'll ping you here when #144 is merged.

@adamjstewart
Collaborator

@robertomest #144 is now ready if you want to test it out. You can clone the repo, check out the feature/zipdatasets branch, and run your code from the same directory or add the directory to your PYTHONPATH. Let me know if you notice any bugs!
