Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reprojection issues #1344

Merged
merged 16 commits into from
May 19, 2023
Merged

Fix reprojection issues #1344

merged 16 commits into from
May 19, 2023

Conversation

adamjstewart
Copy link
Collaborator

This PR fixes 2 issues related to reprojection:

  • RasterDataset: fix loading of files requiring reprojection

I broke this in #1329. When loading any file that requires changing the CRS (and therefore WarpedVRT), we see the following error:

Traceback (most recent call last):
  File "/Users/Adam/torchgeo/test.py", line 11, in <module>
    ds[ds.bounds]
  File "/Users/Adam/torchgeo/torchgeo/datasets/geo.py", line 899, in __getitem__
    samples = [ds[query] for ds in self.datasets]
  File "/Users/Adam/torchgeo/torchgeo/datasets/geo.py", line 899, in <listcomp>
    samples = [ds[query] for ds in self.datasets]
  File "/Users/Adam/torchgeo/torchgeo/datasets/geo.py", line 443, in __getitem__
    data = self._merge_files(filepaths, query, self.band_indexes)
  File "/Users/Adam/torchgeo/torchgeo/datasets/geo.py", line 486, in _merge_files
    dest = src.read(
  File "rasterio/_warp.pyx", line 1233, in rasterio._warp.WarpedVRTReaderBase.read
ValueError: WarpedVRT does not permit boundless reads

The fix is to always use rasterio.merge.merge instead of src.read, even if there is only a single file. This bug wasn't detected because we don't actually load real files and change the CRS during the RasterDataset tests. I added real files and ensured that we test loading those files with a different CRS.

  • Intersection/UnionDataset: fix crs/res propagation

This bug was reported by @jatentaki in #1341. Basically, the following works fine:

ds = (ds1 & ds2) & ds3
ds = (ds1 | ds2) | ds3

However, the following is completely broken:

ds = ds1 & (ds2 & ds3)
ds = ds1 | (ds2 | ds3)

The fix for this was much more involved. We need to add a res getter/setter to GeoDataset, similar to our crs getter/setter. Then we need to override this in Intersection/UnionDataset to update both datasets. This is updated recursively to ensure that all datasets have matching crs/res. This is a pretty uncommon use case (ds1 & ds2 & ds3 maps to the former), so that's why no one noticed until now. I added tests for both of the above situations so this doesn't break again.

Thanks @jatentaki for reporting this!

Fixes #1341

@adamjstewart adamjstewart added this to the 0.4.2 milestone May 17, 2023
@github-actions github-actions bot added datasets Geospatial or benchmark datasets testing Continuous integration testing labels May 17, 2023

Returns:
the :term:`coordinate reference system (CRS)`

.. versionadded:: 0.2
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the getter/setter were added in 0.2, the attribute itself has always been present, so technically there was no API change as far as the user is concerned.

@adamjstewart adamjstewart added this to In progress in SSL4EO-L via automation May 17, 2023
calebrob6
calebrob6 previously approved these changes May 18, 2023
tests/data/raster/uint16/corrupted.tif Outdated Show resolved Hide resolved
dest, _ = rasterio.merge.merge(
vrt_fhs, bounds, self.res, indexes=band_indexes
)
dest, _ = rasterio.merge.merge(vrt_fhs, bounds, self.res, indexes=band_indexes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern here is on performance but let's sort that out later. I'm imagining that we can have something like benchmark.py run during integration tests to test for dataloader performance regressions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall it being slightly slower (that's why we added the if-statement in the first place), but I'm unaware of any other way of doing this with src.read, so for now if we want boundless reading (needed for GridGeoSampler) then we have to use rasterio.merge.merge.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but it still stands that this is a place that needs revisiting

SSL4EO-L automation moved this from In progress to Done May 18, 2023
@adamjstewart adamjstewart reopened this May 18, 2023
SSL4EO-L automation moved this from Done to In progress May 18, 2023
dest, _ = rasterio.merge.merge(
vrt_fhs, bounds, self.res, indexes=band_indexes
)
dest, _ = rasterio.merge.merge(vrt_fhs, bounds, self.res, indexes=band_indexes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but it still stands that this is a place that needs revisiting

@calebrob6 calebrob6 merged commit bbf6108 into main May 19, 2023
32 checks passed
SSL4EO-L automation moved this from In progress to Done May 19, 2023
@calebrob6 calebrob6 deleted the fixes/reprojection branch May 19, 2023 04:33
@adamjstewart adamjstewart modified the milestones: 0.4.2, 0.5.0 Sep 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets testing Continuous integration testing
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

UnionDataset does not propagate .res
2 participants