Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add src_geoloc_array kwarg to reproject and _reproject #2884

Merged
merged 7 commits into from
Dec 24, 2023
Merged

Conversation

sgillies
Copy link
Member

@sgillies sgillies commented Jul 25, 2023

Rasterio 1.3 can warp (reproject) rasters using crs and transform, ground control point, or rational polynomial coefficient inputs that are not stored on the source raster. Like this:

    reproject(                                                                                                                                                                                             
        source,                                                                                                                                                                                            
        destination,                                                                                                                                                                                            
        src_crs=SRC_CRS,                                                                                                                                                                                   
        src_transform=SRC_TRANSFORM,
        ...                                                                                                                                                                                                                                                                                                                            
    )

    reproject(                                                                                                                                                                                             
        source,                                                                                                                                                                                            
        destination,                                                                                                                                                                                            
        src_crs=SRC_CRS,                                                                                                                                                                                   
        gcps=SRC_GCPS,
        ...                                                                                                                                                                                                                                                                                                                            
    )

    reproject(                                                                                                                                                                                             
        source,                                                                                                                                                                                            
        destination,                                                                                                                                                                                             
        src_crs=SRC_CRS,                                                                                                                                                                                   
        rpcs=SRC_RPCS,
        ...                                                                                                                                                                                                                                                                                                                            
    )

For 1.4, let's add external geolocation arrays (https://gdal.org/development/rfc/rfc4_geolocate.html) to the mix.

Proposed usage is this:

    reproject(                                                                                                                                                                                             
        source,                                                                                                                                                                                            
        destination,                                                                                                                                                                                             
        src_crs=SRC_CRS,                                                                                                                                                                                   
        geoloc_array=(xs, ys),
        ...                                                                                                                                                                                                                                                                                                                            
    )

where xs and ys are 2-D arrays of x and y coordinates respectively. Appropriate arrays can be computed using rasterio and numpy objects. For example, full resolution arrays can be generated for a rasterio dataset like this:

xs, ys = dataset.transform * numpy.meshgrid(numpy.arange(dataset.width), numpy.arange(dataset.height))

Lower resolution arrays are acceptable, too. They will give slightly different results.

xs, ys = dataset.transform * numpy.meshgrid(numpy.arange(0, dataset.width, 10), numpy.arange(0, dataset.height, 10))

Internally, these arrays are copied so that they can be accessed via a single pointer using a GDAL MEM dataset. A copy could be avoided if a 3-D array is passed in (that's a TODO).

The GDAL API for configuration geolocation arrays is detailed at https://gdal.org/api/gdal_alg.html#_CPPv432GDALCreateGenImgProjTransformer212GDALDatasetH12GDALDatasetHPPc.

On the command line, rio-warp could be extended with a --geoloc-dataset option specifying a 2-band dataset file containing the equivalent data. There's a chance that rio warp --to SRC_GEOLOC_ARRAY=FILENAME (where FILENAME is a 2-band raster dataset containing x and y coordinates) works right now (with GDAL >= 3.5.2).

Resolves #2883

Plus an experimental _xygrid method for datasets.

Resolves #2883
@sgillies sgillies added this to the 1.4.0 milestone Jul 25, 2023
@sgillies sgillies self-assigned this Jul 25, 2023
@sgillies
Copy link
Member Author

So far, so good. We have a limited capability to warp datasets and arrays using a pair of numpy arrays that hold coordinates. For example, here is the output raster from test_geoloc_warp_array().

foo-rgb

rasterio/_base.pyx Outdated Show resolved Hide resolved
rasterio/_base.pyx Outdated Show resolved Hide resolved
rasterio/_base.pyx Outdated Show resolved Hide resolved
rasterio/_base.pyx Outdated Show resolved Hide resolved
@sgillies
Copy link
Member Author

sgillies commented Jul 26, 2023

I've added a test for lower resolution array input. It works!

test_geoloc_warp_array_subsampled

@groutr @snowman2 I added more details to the top comment. What do you think?

@snowman2
Copy link
Member

Two thoughts:

  • I think it would be helpful to add src_geoloc_array to the docstrings
  • Do you think it would be a good idea to guard against providing src_geoloc_array if used with an older GDAL version?

@sgillies
Copy link
Member Author

sgillies commented Jul 31, 2023

@snowman2 absolutely yes to both!

Helping users avoid conflicting input (gcps and geolocation arrays, for example) is another TODO. One approach to that would be to change from 4 different geolocation keyword arguments to a single one, but I'm not sure about doing that now.

@snowman2
Copy link
Member

One approach to that would be to change from 4 different geolocation keyword arguments to a single one, but I'm not sure about doing that now.

Datacube has the concept of a GeoBox class that contains the geolocation information. rasterio could potentially follow a similar pattern where the user passes this a geolocation object they constructed previously.

@groutr
Copy link
Contributor

groutr commented Jul 31, 2023

We have nice classes in transform.py for GCPSTransformer, RPCTransformer, and AffineTransformer. A GridTransformer can be added. Then in reproject, we can accept an instance of the transformer class.

    reproject(                                                                                                                                                                                             
        source,                                                                                                                                                                                            
        destination,                                                                                                                                                                                             
        src_crs=SRC_CRS,
        transformer = <instance of transformer class>
        ...                                                                                                                                                                                                                                                                                                                            
    )

@sgillies
Copy link
Member Author

@snowman2 @groutr yeah, I was thinking along those lines. Maybe geolocation array could be made available only via transformer 🤔

@sgillies
Copy link
Member Author

sgillies commented Aug 7, 2023

Also to consider: opendatacube/datacube-core#1457.

@@ -223,6 +223,7 @@ def _reproject(
num_threads=1,
warp_mem_limit=0,
working_data_type=0,
src_geoloc_array=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few usability notes (context in GDAL RFC #4):

  • kwargs should disallow the use of METHOD=GEOLOC_ARRAY and SRC_GEOLOC_ARRAY.
  • Is the intention to support PIXEL_STEP and LINE_STEP via kwargs? These seem like key parameters for some users. PIXEL_OFFSET and LINE_OFFSET seem less useful, but worth supporting for completeness.
  • It is possible to assign a geolocation array to the target image using the DST_GEOLOC_ARRAY transformer option. If kwargs is passed along correctly it seems like this should work if the caller configures things correctly, but it may be worth thinking about whether or not this should be more officially supported, or explicitly disallowed.
  • It is possible for the input image to have GEOLOCATION metadata set. This is a special metadata domain like RPC, and normally points to an external dataset. Another useful pattern is to construct a geolocation array, attach GEOLOCATION metadata, and omit the *_DATASET metadata items. This dataset can then be applied to any dataset of the right shape without having to manage geolocation metadata on the primary image dataset. Is Rasterio's stance that this metadata will be ignored, and users must manually load it and craft an appropriate call to reproject()?
  • Rasterio must be able to read and write GEOLOCATION metadata, otherwise the library cannot be used to craft an image whose geolocation is provided by a geolocation array. I think I confirmed this works, but IMO should be intentionally tested.
  • A geolocation can include an optional Z_BAND, although I suspect this is ignored by the geolocation array transformer, and exists to provide a mechanism to include elevation for use elsewhere.

rasterio/_warp.pyx Show resolved Hide resolved
tests/test_warp.py Show resolved Hide resolved
@sgillies
Copy link
Member Author

sgillies commented Sep 1, 2023

Scope for 1.4.0: A geoloc array transformer as discussed above, but none of the extra features from your usability notes @pl-kevinwurster. We can add those later if needed.

@sgillies sgillies modified the milestone: 1.4.0 Sep 1, 2023
@pl-kevinwurster
Copy link
Contributor

@sgillies sounds good – thanks!

@sgillies sgillies marked this pull request as ready for review December 22, 2023 02:02
Copy link
Member

@snowman2 snowman2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

@sgillies
Copy link
Member Author

Thanks @snowman2. Though I liked the idea of grouping source geolocation parameters into one, it's not a good fit. GDAL's warper can't use our Python transformers. It needs GCPs and RPCs attached to a dataset. I'm against using the classes only as containers.

Last thing I want to do here is add support to calculate_default_transform().

@djhoese
Copy link
Contributor

djhoese commented Dec 22, 2023

Sorry to jump in here, but I found this PR from @snowman2's linking to it in rioxarray issues/PRs. Do I understand this PR and the GDAL RFC 4 correctly that this would allow non-uniformly spaced coordinate arrays in warping/resampling operations? For example, satellite-based instrument data where the spacing of the pixels depends on the scanning pattern of the instrument and geometry of the Earth.

@sgillies sgillies merged commit 163fa83 into main Dec 24, 2023
15 of 16 checks passed
@sgillies sgillies deleted the issue2883 branch December 24, 2023 00:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for warping with external geolocation arrays
5 participants