Pyresample has had some real growing pains. These have become more and more apparent as Pytroll developers have worked on Satpy, where we had more freedom to start fresh and create interfaces that worked for us. That development, along with the Pyresample User Survey conducted in 2021, has guided us to a new design for Pyresample that we hope to release as version 2.0.
Below are the main categories of Pyresample components and how we see them in 2.0. In most cases we expect existing interfaces to remain backwards compatible; in the extreme cases we want to add new interfaces alongside the existing ones for an easier transition to the new functionality. This will mean deprecating some interfaces so that we can make the user experience more consistent regardless of resampling algorithm or other use case.
You can track the progress of Pyresample 2.0 by following the issues and pull requests on the v2.0 milestone on GitHub.
- Vertical or other higher dimension resampling: This is not a simple problem and with the other changes planned for 2.0 we won't be able to look into this.
- New resampling algorithms: The 2.0 version bump is not meant for new algorithms. We'll keep these more for minor version releases. Version 2.0 is about larger breaking changes.
There are currently five types of user-facing geometry objects in Pyresample:
- GridDefinition (to be deprecated)
- SwathDefinition
- DynamicAreaDefinition
- AreaDefinition
- StackedAreaDefinition
We've realized that the GridDefinition is actually a special case of an AreaDefinition and therefore it will be deprecated. The other classes, as far as purpose, are still needed in some sense and will most likely not go anywhere. Most changes to these classes will be internal for code cleanliness.
However, we'd like these classes to be easier to create and use. We'd like to focus on what is actually required to work with these objects with the rest of Pyresample. This means separating the "numbers" from the metadata. AreaDefinitions will no longer require a name or other descriptive information. You can provide them, but they won't be required.
Going forward we'd like users to focus on using class methods to create these objects. We hope this will provide an easier connection from the information you have to a usable object. For example, instead of:
area = AreaDefinition(name, description, proj_id, projection, width, height, area_extent)
You would do this in pyresample 2.0:
metadata = {"name": name, "description": description} # optional
area = AreaDefinition.from_extent_shape(projection, area_extent, (height, width), metadata)
You'll also be allowed to provide arbitrary metadata to swath definitions and the other geometry types.
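To make the separation of "numbers" from metadata concrete, here is a minimal sketch using a stand-in class. ToyAreaDefinition and its pixel_size property are hypothetical and only mimic the from_extent_shape call shown above; the real Pyresample 2.0 class may look different.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAreaDefinition:
    """Hypothetical stand-in for an area definition (not the pyresample class)."""

    projection: str      # e.g. a PROJ string or "EPSG:4326"
    area_extent: tuple   # (x_min, y_min, x_max, y_max) in projection units
    shape: tuple         # (height, width) in pixels
    metadata: dict = field(default_factory=dict)  # optional, descriptive only

    @classmethod
    def from_extent_shape(cls, projection, area_extent, shape, metadata=None):
        """Build an area from the information the caller already has."""
        return cls(projection, tuple(area_extent), tuple(shape), dict(metadata or {}))

    @property
    def pixel_size(self):
        """Pixel size (x, y), derived only from the numeric definition."""
        x_min, y_min, x_max, y_max = self.area_extent
        height, width = self.shape
        return ((x_max - x_min) / width, (y_max - y_min) / height)


extent = (-180.0, -90.0, 180.0, 90.0)

# No name or description is needed to get a usable object.
area = ToyAreaDefinition.from_extent_shape("EPSG:4326", extent, (180, 360))

# Metadata can still be attached, but nothing numeric depends on it.
named = ToyAreaDefinition.from_extent_shape(
    "EPSG:4326", extent, (180, 360), {"name": "global_1deg"}
)
```

Both objects describe the same grid; only the optional metadata differs.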
Currently there are different interfaces for calling the different resampling options in Pyresample. Sometimes you call a function, sometimes you create a class and call a "resample" method on it, and sometimes if you want finer control you call multiple functions and have to pass things between them. In Pyresample 2.0 we want to get things down to a few consistent interfaces all wrapped up into a series of Resampler classes.
You'll create resampler instances by doing:
from pyresample import create_resampler
resampler = create_resampler(src_geom, dst_geom, resampler='some-resampler-name')
Pyresample 2.0 will maintain a "registry" of available resampler classes that you can refer to by name or get one by default based on the passed geometries. This registry of resamplers will also make it easier for users or third-party libraries to add their own resamplers.
We hope that this basic creation process gives us more control over which algorithms support which features, and lets us tell the user early on, with clear error messages, when something isn't allowed. Examples include which combinations of geometry types a resampler supports and which types of arrays (xarray.DataArray, dask, or numpy) can be provided.
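The registry idea can be sketched in a few lines of plain Python. Everything here is hypothetical: register_resampler, create_toy_resampler, the Toy* classes, and the resampler names are illustrations of the pattern, not the Pyresample 2.0 API.

```python
_REGISTRY = {}  # maps a resampler name to its class


def register_resampler(name, supported_sources):
    """Class decorator that records a resampler class under ``name``."""
    def decorator(cls):
        cls.supported_sources = tuple(supported_sources)
        _REGISTRY[name] = cls
        return cls
    return decorator


class ToySwath:
    """Hypothetical non-gridded geometry type."""


class ToyArea:
    """Hypothetical gridded geometry type."""


class ToyResampler:
    def __init__(self, src_geom, dst_geom):
        self.src_geom = src_geom
        self.dst_geom = dst_geom


@register_resampler("toy-nearest", supported_sources=(ToySwath, ToyArea))
class ToyNearestResampler(ToyResampler):
    pass


@register_resampler("toy-bilinear", supported_sources=(ToyArea,))
class ToyBilinearResampler(ToyResampler):
    pass


def create_toy_resampler(src_geom, dst_geom, resampler=None):
    """Look up a resampler by name, or pick the first registered one that fits."""
    if resampler is None:
        candidates = [cls for cls in _REGISTRY.values()
                      if isinstance(src_geom, cls.supported_sources)]
        if not candidates:
            raise ValueError(
                f"No registered resampler supports {type(src_geom).__name__}")
        cls = candidates[0]
    else:
        if resampler not in _REGISTRY:
            raise ValueError(
                f"Unknown resampler {resampler!r}; available: {sorted(_REGISTRY)}")
        cls = _REGISTRY[resampler]
        # Fail early, before any data is involved, with a clear message.
        if not isinstance(src_geom, cls.supported_sources):
            raise TypeError(
                f"{resampler!r} does not support {type(src_geom).__name__} sources")
    return cls(src_geom, dst_geom)


by_name = create_toy_resampler(ToySwath(), ToyArea(), resampler="toy-nearest")
by_default = create_toy_resampler(ToyArea(), ToyArea())
```

A third-party library could add its own resampler with the same decorator, and an unsupported geometry combination is rejected at creation time rather than in the middle of a computation.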
Once you have your resampler instance you can resample your data by doing:
new_data = resampler.resample(data, **kwargs)
That's it. There are of course a lot of options hidden in the **kwargs, but those will be specific to each algorithm. Our hope is that any optimizations or conversions that need to happen to get your data resampled can all be contained in these resampler objects and hopefully require less from the user.
As an alternative to the .resample call, users can call two methods in sequence:
resampler.precompute()
new_data = resampler.compute(data, **kwargs)
This precompute method will perform any computations that can be done without needing the actual "image" data. You can then call .compute to do the actual resampling. This separation is important when we start talking about caching (see below).
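As a rough illustration of why this split is useful, the sketch below implements a toy nearest-neighbour resampler for 1D coordinates in plain Python. ToyNearest1D is hypothetical and far simpler than any real resampler; the point is only that the geometry-dependent work happens once and is reused for every data array.

```python
class ToyNearest1D:
    """Hypothetical nearest-neighbour resampler for 1D coordinates."""

    def __init__(self, src_coords, dst_coords):
        self.src_coords = list(src_coords)
        self.dst_coords = list(dst_coords)
        self._indexes = None  # filled in by precompute()

    def precompute(self):
        """Geometry-only step: nearest source index for each destination point."""
        self._indexes = [
            min(range(len(self.src_coords)),
                key=lambda i: abs(self.src_coords[i] - x))
            for x in self.dst_coords
        ]

    def compute(self, data):
        """Data step: reuse the stored indexes for any matching data array."""
        if self._indexes is None:
            raise RuntimeError("call precompute() before compute()")
        if len(data) != len(self.src_coords):
            raise ValueError("data does not match the source geometry")
        return [data[i] for i in self._indexes]

    def resample(self, data):
        """Convenience wrapper: precompute if needed, then compute."""
        if self._indexes is None:
            self.precompute()
        return self.compute(data)


resampler = ToyNearest1D([0.0, 10.0, 20.0], [1.0, 9.0, 14.0, 19.0])
resampler.precompute()                           # no image data needed here
temps = resampler.compute([270.0, 280.0, 290.0])
heights = resampler.compute([5, 6, 7])           # index search is not repeated
```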
One major simplification we're hoping to achieve with Pyresample 2.0 is a defined set of caching functionality all encapsulated in "Cache" objects. These objects can be passed to create_resampler to enable the resampler to store intermediate computation results for reuse. How and where that storing is done is up to the specific cache object. It could be in-memory only, or in zarr datasets on local disk, or in some remote storage.
By calling the .precompute method, the user will be able to pre-fill this cache without needing any image data. This will be useful for users running pyresample in operations, where they may want to manually fill the cache before spawning realtime (time-sensitive) processing.
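The following sketch shows one way the cache and precompute ideas could fit together. The load/store interface, the ToyMemoryCache and ToyJSONDirCache classes, and the searches counter are all assumptions made for illustration; JSON files stand in for the zarr or remote storage mentioned above.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class ToyMemoryCache:
    """Hypothetical cache that keeps results in a dict."""

    def __init__(self):
        self._store = {}

    def load(self, key):
        return self._store.get(key)

    def store(self, key, value):
        self._store[key] = value


class ToyJSONDirCache:
    """Hypothetical cache that writes one JSON file per key to a directory."""

    def __init__(self, directory):
        self._dir = Path(directory)

    def load(self, key):
        path = self._dir / f"{key}.json"
        return json.loads(path.read_text()) if path.exists() else None

    def store(self, key, value):
        (self._dir / f"{key}.json").write_text(json.dumps(value))


class ToyCachedResampler:
    """Hypothetical 1D nearest-neighbour resampler that uses a cache object."""

    def __init__(self, src_coords, dst_coords, cache):
        self.src_coords = list(src_coords)
        self.dst_coords = list(dst_coords)
        self.cache = cache
        self.searches = 0  # how often the expensive step actually ran

    def _key(self):
        # The key depends only on the two geometries, never on image data.
        raw = json.dumps([self.src_coords, self.dst_coords]).encode()
        return hashlib.sha1(raw).hexdigest()

    def precompute(self):
        indexes = self.cache.load(self._key())
        if indexes is None:
            self.searches += 1
            indexes = [
                min(range(len(self.src_coords)),
                    key=lambda i: abs(self.src_coords[i] - x))
                for x in self.dst_coords
            ]
            self.cache.store(self._key(), indexes)
        return indexes

    def compute(self, data):
        return [data[i] for i in self.precompute()]


src, dst = [0.0, 10.0, 20.0], [1.0, 9.0, 14.0, 19.0]
with tempfile.TemporaryDirectory() as tmp:
    # Fill the cache ahead of time; no image data is involved.
    warmup = ToyCachedResampler(src, dst, ToyJSONDirCache(tmp))
    warmup.precompute()
    # A later, time-sensitive process finds the indexes already on disk.
    realtime = ToyCachedResampler(src, dst, ToyJSONDirCache(tmp))
    result = realtime.compute([270.0, 280.0, 290.0])
```

Swapping ToyJSONDirCache for ToyMemoryCache changes where the indexes live without touching the resampler itself.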
From our survey we learned that a lot of users use the indexes returned by get_neighbour_info for their own custom analyses. We recognize this need and while Cache objects could be written to get the same result, we think there is a better way. We plan to implement this functionality through a separate "Index" interface. Like Resamplers, these would provide you a way of relating a source geometry to a destination geometry. However, these objects would only be responsible for returning the index information.
We haven't defined the interface for these yet, but hope that having something separate from resamplers will serve more people.
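Because that interface does not exist yet, the snippet below only illustrates the kind of custom analysis that index information makes possible. toy_nearest_index is a hypothetical helper working on 1D coordinates; it is not a proposal for the eventual "Index" API.

```python
from collections import Counter


def toy_nearest_index(src_coords, dst_coords):
    """Return, for each destination point, the index of the nearest source point."""
    return [
        min(range(len(src_coords)), key=lambda i: abs(src_coords[i] - x))
        for x in dst_coords
    ]


src_coords = [0.0, 10.0, 20.0, 30.0]
dst_coords = [1.0, 9.0, 14.0, 19.0, 21.0]

indexes = toy_nearest_index(src_coords, dst_coords)

# Custom analysis: how many destination points does each source point feed,
# and which source points are never used at all?
usage = Counter(indexes)
unused = [i for i in range(len(src_coords)) if i not in usage]
```

No image data is resampled here; the index information alone answers the question.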
We would like to better support pyresample users who use the xarray and dask libraries. Behind the scenes over the last couple of years we've added a lot of dask-based support to pyresample through the Satpy library. We've slowly moved that functionality over to Pyresample and the Resampler objects mentioned above are the first defined interface for that. However, there is still a lot of work to be done to completely take advantage of the parallel nature that dask arrays provide us.
It should also be easier for users with data in xarray DataArray or Dataset objects to access pyresample functionality, even without knowing the metadata that pyresample will need to do some resampling (e.g. CRS, extents, etc.). Usually that type of information is held in the metadata of the xarray object already. New tools are in development to make this information easier to access, mainly the Geoxarray project. We will be working on Geoxarray and Pyresample to simplify common resampling tasks for xarray users.
The documentation for Pyresample is in need of a lot of love. As Pyresample has grown, the documentation hasn't really been restructured to best present the new information it has taken on. We hope that as part of Pyresample 2.0 we can clean out the cobwebs and make it easier to find the information you are looking for.