regrid operation on grid more accurate and faster without datashader #5387

ludwigVonKoopa · 2022-08-12T13:21:30Z

Hi,

For my use case, i often need to plot high-res geographical gridded data, and zooming in and out to see the global data but some precise parts too.

Thanks to datashader and holoviews' regrid function, i am able to do this without too much work from myself.

But as i'm using it, i see some details which unsatisfy me. For example, we can't see real data, only aggregation or interpolation.
When looking at global data it doesn't matter, but when you zoom, you cannot understand well the data :

code to create figure

example of file free to use
!wget https://www.unidata.ucar.edu/software/netcdf/examples/smith_sandwell_topo_v8_2.nc

import xarray as xr
import hvplot.xarray
import holoviews as hv
import numpy as np

ds = xr.open_dataset("smith_sandwell_topo_v8_2.nc")
ds["longitude"] = ((ds.longitude + 180 ) % 360 ) -180
ds = ds.sortby("longitude").load()

# interp latitudes from mercator grid to plateCarree grid
lat, lon = ds.latitude, ds.longitude
ds = ds.interp(
    latitude= np.linspace(lat.min(), lat.max(), 180*60), 
    longitude=np.linspace(lon.min(), lon.max(), 360*60), 
    method="cubic"
)

from holoviews.operation.datashader import regrid

plot = ds.ROSE[::60, ::60].T.hvplot(x="longitude", y="latitude")
regrid(plot, interpolation="linear").options(width=600, title="regrid") + plot.options(width=600, title="raw")

I understand the aim of datashader, to create an image from any source of data for visualisation optimisation.
But for this specific type of data, a regular grid, we should not need datashader to build a lighweight image send to the plot.

For example, i was thinking of using indices and isel to select only the wanted data, thus not using datashader.
Something like that :

import param
from holoviews.core import Operation
from holoviews.streams import RangeXY
from holoviews.core.data import XArrayInterface

class regrid_resample(Operation):
    """regrid the data with slices.

    no computation is done on the data, except for indices searching.
    """

    width  = param.Integer(default=800, doc="Maximum number of data on x dimension. If there is more, it slice")
    height = param.Integer(default=400, doc="Maximum number of data on y dimension. If there is more, it slice")

    streams = param.ClassSelector(default=[RangeXY], class_=(dict, list))
    x_range  = param.NumericTuple(default=None, length=2)
    y_range  = param.NumericTuple(default=None, length=2)


    def _process(self, element, key=None):
        xmax = self.p.width
        ymax = self.p.height

        # create coords for image
        x, y = element.kdims
        dims = [y.name, x.name]

        coords = tuple(element.dimension_values(d, expanded=False) for d in [x, y])
        coord_dict = {x.name: coords[0], y.name: coords[1]}

        # shape of the image
        imax, jmax = coords[0].size, coords[1].size

        # coords of the image.
        #   needed for the first initialisation, where RangeXY return None
        x0 = float(coords[0][0])
        x1 = float(coords[0][-1])
        y0 = float(coords[1][0])
        y1 = float(coords[1][-1])

        # step of image coordinates
        x_step = float(coords[0][1] - x0)
        y_step = float(coords[1][1] - y0)

        # display limits
        xstart, xend = self.p.x_range if self.p.x_range else (x0, x1)
        ystart, yend = self.p.y_range if self.p.y_range else (y0, y1)

        # computing slice indices
        #   we should not have a too big image. If it is, we crop data
        #   take +-2 more indices to always have a hole image display, and avoid no data at the edge
        i0 = int(max(0, (xstart - x0)/x_step -2))
        i1 = int(min(imax, (xend - x0)/x_step))+2
        i_step = int(max(1, (i1 - i0) / xmax))


        j0 = int(max(0, (ystart - y0)/y_step -2))
        j1 = int(min(jmax, (yend - y0)/y_step))+2
        j_step = int(max(1, (j1 - j0) / ymax))


        # constructing xarray Dataset
        arrays = {}
        for i, vd in enumerate(element.vdims):
            if element.interface is XArrayInterface:
                xarr = element.data[vd.name]
            else:
                arr = element.dimension_values(vd, flat=False)
                xarr = xr.DataArray(arr, coords=coord_dict, dims=dims)
            arrays[vd.name] = xarr

        # applying slice indices
        regridded = xr.Dataset(arrays).isel(**{dims[0]: slice(j0, j1, j_step), dims[1]: slice(i0, i1, i_step)})

        cloned = element.clone(regridded, datatype=['xarray'] + element.datatype, shared_data=False, bounds=None)
        return cloned

code to create figure

plot = ds.ROSE.T.hvplot(x="longitude", y="latitude", data_aspect=2)

opts_kwargs  = dict(width=300, clim=(-7000, 7000), cmap="RdBu_r")
param_kwargs = dict(width=400, height=400)

plot1 = regrid_resample(plot, **param_kwargs).options(title="resample class", **opts_kwargs)
plot2 = regrid(plot, interpolation="linear", **param_kwargs).options(title="original regrid class", **opts_kwargs)

full_plot = plot1 + plot2
full_plot

Test on the 21600x10800 grid :

Benchmarks

If we look at benchmark, i have found that :

when no zooming, resampling method is far faster than regrid (~130ms / 10ms)
the more you zoom, the less difference there is. regrid take ~ the same time to compute

code used to benchmark

import time

class regrid_with_print(regrid):
    def _process(self, element, key=None):
        _timer = time.time()
        res = super()._process(element, key=key)
        print(f"regrid   : {(time.time()-_timer)*1e3:7.2f}ms")
        return res

class resample_with_print(regrid_resample):
    def _process(self, element, key=None):
        _timer = time.time()
        res = super()._process(element, key=key)
        print(f"resample : {(time.time()-_timer)*1e3:7.2f}ms")
        return res

I don't know if holoviews would benefit from this kind of class / function.
I can put this in a merge request if needed and wanted.

Thanks

The text was updated successfully, but these errors were encountered:

philippjfr · 2022-08-12T13:32:06Z

Thanks for the issue, I think we actually used to have a regrid implementation similar to yours and I certainly wouldn't object to shipping yours again. I do think there's good reason to provide both versions however. The datashader regrid is statistically accurate, i.e. it uses actual aggregation, this means that if you had outliers those could easily be discovered while a subsampling implementation like yours can easily miss outlier samples. I'd be in favor of making regrid a composite operation that can use either subsampling or statistical aggregation (via datashader).

ludwigVonKoopa · 2022-08-12T13:51:29Z

Thanks for the reply.

I did not know there was a resampling method before. And i aggree that datashader should not be dropped, and would be better on most situation.

I'd be in favor of making regrid a composite operation that can use either subsampling or statistical aggregation (via datashader).

Do you have in mind something like :

class regrid(AggregationOperation):
    ...
    
    def _process(self, element, key=None):
        if self.p.interpolation is None:
            # use classic datashader regrid method
            return self._datashader_process(element, key=key)
        else:
            # use resampling method
            return self._resample_process(element, key=key)

?

hoxbro · 2022-08-22T16:03:06Z

@ianthomas23 Do you think that there could be some easy datashader performance improvements here? Do you think this condition is worth having a fast code path for?

philippjfr · 2022-08-22T16:13:58Z

Do you have in mind something like :

Yes, precisely.

philippjfr · 2022-08-22T16:14:52Z

Do you think that there could be some easy datashader performance improvements here? Do you think this condition is worth having a fast code path for?

Doesn't seem worth it, resampling like suggested is a simple reindexing operation in numpy and should be very cheap.

droumis · 2022-09-19T16:26:17Z

@jbednar, do you have an opinion on this? In particular, how do you feel about the tradeoff of performance vs API complexity and code maintenance?

Maybe we should have a separate operation that strides the array rather than aggregation. It seems likely that many users care a lot about performance and the problem becomes educating them about when to use this approach.

jbednar · 2022-09-20T02:55:35Z

My vote would be for HoloViews to implement this quick-and-dirty subsampling directly, not via Datashader, and to make it available as a separate operation in HoloViews, and controlled with an argument in hvPlot. As Philipp points out, there's no need for bringing in Datashader's complexity to implement indexing of this type for a regular grid.

hoxbro assigned ianthomas23 Aug 22, 2022

droumis added the type: enhancement Minor feature or improvement to an existing feature label Sep 19, 2022

droumis added this to the Wishlist milestone Sep 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

regrid operation on grid more accurate and faster without datashader #5387

regrid operation on grid more accurate and faster without datashader #5387

ludwigVonKoopa commented Aug 12, 2022

philippjfr commented Aug 12, 2022

ludwigVonKoopa commented Aug 12, 2022 •

edited

Loading

hoxbro commented Aug 22, 2022

philippjfr commented Aug 22, 2022

philippjfr commented Aug 22, 2022

droumis commented Sep 19, 2022

jbednar commented Sep 20, 2022

regrid operation on grid more accurate and faster without datashader #5387

regrid operation on grid more accurate and faster without datashader #5387

Comments

ludwigVonKoopa commented Aug 12, 2022

Benchmarks

philippjfr commented Aug 12, 2022

ludwigVonKoopa commented Aug 12, 2022 • edited Loading

hoxbro commented Aug 22, 2022

philippjfr commented Aug 22, 2022

philippjfr commented Aug 22, 2022

droumis commented Sep 19, 2022

jbednar commented Sep 20, 2022

ludwigVonKoopa commented Aug 12, 2022 •

edited

Loading