Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regrid operation on grid more accurate and faster without datashader #5387

Open
ludwigVonKoopa opened this issue Aug 12, 2022 · 7 comments
Open
Assignees
Labels
type: enhancement Minor feature or improvement to an existing feature
Milestone

Comments

@ludwigVonKoopa
Copy link

Hi,

For my use case, i often need to plot high-res geographical gridded data, and zooming in and out to see the global data but some precise parts too.

Thanks to datashader and holoviews' regrid function, i am able to do this without too much work from myself.

But as i'm using it, i see some details which unsatisfy me. For example, we can't see real data, only aggregation or interpolation.
When looking at global data it doesn't matter, but when you zoom, you cannot understand well the data :

code to create figure
example of file free to use
!wget https://www.unidata.ucar.edu/software/netcdf/examples/smith_sandwell_topo_v8_2.nc

import xarray as xr
import hvplot.xarray
import holoviews as hv
import numpy as np

ds = xr.open_dataset("smith_sandwell_topo_v8_2.nc")
ds["longitude"] = ((ds.longitude + 180 ) % 360 ) -180
ds = ds.sortby("longitude").load()

# interp latitudes from mercator grid to plateCarree grid
lat, lon = ds.latitude, ds.longitude
ds = ds.interp(
    latitude= np.linspace(lat.min(), lat.max(), 180*60), 
    longitude=np.linspace(lon.min(), lon.max(), 360*60), 
    method="cubic"
)

image

from holoviews.operation.datashader import regrid

plot = ds.ROSE[::60, ::60].T.hvplot(x="longitude", y="latitude")
regrid(plot, interpolation="linear").options(width=600, title="regrid") + plot.options(width=600, title="raw")

holoviews2_resize

I understand the aim of datashader, to create an image from any source of data for visualisation optimisation.
But for this specific type of data, a regular grid, we should not need datashader to build a lighweight image send to the plot.

For example, i was thinking of using indices and isel to select only the wanted data, thus not using datashader.
Something like that :

import param
from holoviews.core import Operation
from holoviews.streams import RangeXY
from holoviews.core.data import XArrayInterface

class regrid_resample(Operation):
    """regrid the data with slices.

    no computation is done on the data, except for indices searching.
    """

    width  = param.Integer(default=800, doc="Maximum number of data on x dimension. If there is more, it slice")
    height = param.Integer(default=400, doc="Maximum number of data on y dimension. If there is more, it slice")

    streams = param.ClassSelector(default=[RangeXY], class_=(dict, list))
    x_range  = param.NumericTuple(default=None, length=2)
    y_range  = param.NumericTuple(default=None, length=2)


    def _process(self, element, key=None):
        xmax = self.p.width
        ymax = self.p.height

        # create coords for image
        x, y = element.kdims
        dims = [y.name, x.name]

        coords = tuple(element.dimension_values(d, expanded=False) for d in [x, y])
        coord_dict = {x.name: coords[0], y.name: coords[1]}

        # shape of the image
        imax, jmax = coords[0].size, coords[1].size

        # coords of the image.
        #   needed for the first initialisation, where RangeXY return None
        x0 = float(coords[0][0])
        x1 = float(coords[0][-1])
        y0 = float(coords[1][0])
        y1 = float(coords[1][-1])

        # step of image coordinates
        x_step = float(coords[0][1] - x0)
        y_step = float(coords[1][1] - y0)

        # display limits
        xstart, xend = self.p.x_range if self.p.x_range else (x0, x1)
        ystart, yend = self.p.y_range if self.p.y_range else (y0, y1)

        # computing slice indices
        #   we should not have a too big image. If it is, we crop data
        #   take +-2 more indices to always have a hole image display, and avoid no data at the edge
        i0 = int(max(0, (xstart - x0)/x_step -2))
        i1 = int(min(imax, (xend - x0)/x_step))+2
        i_step = int(max(1, (i1 - i0) / xmax))


        j0 = int(max(0, (ystart - y0)/y_step -2))
        j1 = int(min(jmax, (yend - y0)/y_step))+2
        j_step = int(max(1, (j1 - j0) / ymax))


        # constructing xarray Dataset
        arrays = {}
        for i, vd in enumerate(element.vdims):
            if element.interface is XArrayInterface:
                xarr = element.data[vd.name]
            else:
                arr = element.dimension_values(vd, flat=False)
                xarr = xr.DataArray(arr, coords=coord_dict, dims=dims)
            arrays[vd.name] = xarr

        # applying slice indices
        regridded = xr.Dataset(arrays).isel(**{dims[0]: slice(j0, j1, j_step), dims[1]: slice(i0, i1, i_step)})

        cloned = element.clone(regridded, datatype=['xarray'] + element.datatype, shared_data=False, bounds=None)
        return cloned
code to create figure
plot = ds.ROSE.T.hvplot(x="longitude", y="latitude", data_aspect=2)

opts_kwargs  = dict(width=300, clim=(-7000, 7000), cmap="RdBu_r")
param_kwargs = dict(width=400, height=400)

plot1 = regrid_resample(plot, **param_kwargs).options(title="resample class", **opts_kwargs)
plot2 = regrid(plot, interpolation="linear", **param_kwargs).options(title="original regrid class", **opts_kwargs)

full_plot = plot1 + plot2
full_plot

Test on the 21600x10800 grid :

holoviews3_resize

Benchmarks

If we look at benchmark, i have found that :

  • when no zooming, resampling method is far faster than regrid (~130ms / 10ms)
  • the more you zoom, the less difference there is. regrid take ~ the same time to compute
code used to benchmark
import time

class regrid_with_print(regrid):
    def _process(self, element, key=None):
        _timer = time.time()
        res = super()._process(element, key=key)
        print(f"regrid   : {(time.time()-_timer)*1e3:7.2f}ms")
        return res

class resample_with_print(regrid_resample):
    def _process(self, element, key=None):
        _timer = time.time()
        res = super()._process(element, key=key)
        print(f"resample : {(time.time()-_timer)*1e3:7.2f}ms")
        return res

holoviews4_resize

I don't know if holoviews would benefit from this kind of class / function.
I can put this in a merge request if needed and wanted.

Thanks

@philippjfr
Copy link
Member

Thanks for the issue, I think we actually used to have a regrid implementation similar to yours and I certainly wouldn't object to shipping yours again. I do think there's good reason to provide both versions however. The datashader regrid is statistically accurate, i.e. it uses actual aggregation, this means that if you had outliers those could easily be discovered while a subsampling implementation like yours can easily miss outlier samples. I'd be in favor of making regrid a composite operation that can use either subsampling or statistical aggregation (via datashader).

@ludwigVonKoopa
Copy link
Author

ludwigVonKoopa commented Aug 12, 2022

Thanks for the reply.

I did not know there was a resampling method before. And i aggree that datashader should not be dropped, and would be better on most situation.

I'd be in favor of making regrid a composite operation that can use either subsampling or statistical aggregation (via datashader).

Do you have in mind something like :

class regrid(AggregationOperation):
    ...
    
    def _process(self, element, key=None):
        if self.p.interpolation is None:
            # use classic datashader regrid method
            return self._datashader_process(element, key=key)
        else:
            # use resampling method
            return self._resample_process(element, key=key)

?

@hoxbro
Copy link
Member

hoxbro commented Aug 22, 2022

@ianthomas23 Do you think that there could be some easy datashader performance improvements here? Do you think this condition is worth having a fast code path for?

@philippjfr
Copy link
Member

Do you have in mind something like :

Yes, precisely.

@philippjfr
Copy link
Member

Do you think that there could be some easy datashader performance improvements here? Do you think this condition is worth having a fast code path for?

Doesn't seem worth it, resampling like suggested is a simple reindexing operation in numpy and should be very cheap.

@droumis
Copy link
Member

droumis commented Sep 19, 2022

@jbednar, do you have an opinion on this? In particular, how do you feel about the tradeoff of performance vs API complexity and code maintenance?

Maybe we should have a separate operation that strides the array rather than aggregation. It seems likely that many users care a lot about performance and the problem becomes educating them about when to use this approach.

@droumis droumis added the type: enhancement Minor feature or improvement to an existing feature label Sep 19, 2022
@droumis droumis added this to the Wishlist milestone Sep 19, 2022
@jbednar
Copy link
Member

jbednar commented Sep 20, 2022

My vote would be for HoloViews to implement this quick-and-dirty subsampling directly, not via Datashader, and to make it available as a separate operation in HoloViews, and controlled with an argument in hvPlot. As Philipp points out, there's no need for bringing in Datashader's complexity to implement indexing of this type for a regular grid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement Minor feature or improvement to an existing feature
Projects
None yet
Development

No branches or pull requests

6 participants