regrid operation on grid more accurate and faster without datashader #5387
Comments
Thanks for the issue. I think we actually used to have a regrid implementation similar to yours, and I certainly wouldn't object to shipping yours again. I do think there's good reason to provide both versions, however. The datashader regrid is statistically accurate, i.e. it uses actual aggregation. This means that if you had outliers, those could easily be discovered, while a subsampling implementation like yours can easily miss outlier samples. I'd be in favor of making |
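To illustrate the aggregation vs. subsampling point, here is a minimal NumPy sketch. The grid, the outlier position and the downsampling factor are made up for illustration, and the block reduction below is only a stand-in for aggregation in general, not the Datashader or HoloViews implementation:

```python
import numpy as np

# Hypothetical 8x8 grid of zeros with a single outlier cell.
grid = np.zeros((8, 8))
grid[3, 5] = 100.0

factor = 4  # downsample 8x8 -> 2x2

# Subsampling: keep every 4th cell along each axis (plain strided indexing).
subsampled = grid[::factor, ::factor]

# Aggregation: reduce each 4x4 block to one value. Max is used here; a mean
# would also be affected by the outlier, only less strongly.
blocks = grid.reshape(8 // factor, factor, 8 // factor, factor)
aggregated = blocks.max(axis=(1, 3))

print(subsampled.max())  # 0.0: cell (3, 5) does not fall on the stride
print(aggregated.max())  # 100.0: the block containing the outlier reports it
```

Whether the subsampled output happens to contain a given outlier depends only on whether it falls on the stride, which is why the two approaches can look very different when zoomed out.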
Thanks for the reply. I did not know there was a resampling method before. And I agree that datashader should not be dropped, and that it would be the better choice in most situations.
Do you have something like this in mind?

class regrid(AggregationOperation):
    ...
    def _process(self, element, key=None):
        if self.p.interpolation is None:
            # use classic datashader regrid method
            return self._datashader_process(element, key=key)
        else:
            # use resampling method
            return self._resample_process(element, key=key)
|
@ianthomas23 Do you think that there could be some easy datashader performance improvements here? Do you think this condition is worth having a fast code path for? |
Yes, precisely. |
Doesn't seem worth it; resampling as suggested is a simple reindexing operation in numpy and should be very cheap. |
@jbednar, do you have an opinion on this? In particular, how do you feel about the tradeoff of performance vs API complexity and code maintenance? Maybe we should have a separate operation that strides the array rather than aggregation. It seems likely that many users care a lot about performance and the problem becomes educating them about when to use this approach. |
My vote would be for HoloViews to implement this quick-and-dirty subsampling directly, not via Datashader, and to make it available as a separate operation in HoloViews, and controlled with an argument in hvPlot. As Philipp points out, there's no need for bringing in Datashader's complexity to implement indexing of this type for a regular grid. |
Hi,
For my use case, I often need to plot high-resolution geographical gridded data, zooming in and out to see the global data as well as specific regions.
Thanks to datashader and holoviews' regrid function, I am able to do this without too much work on my side. But as I use it, I see some details that I am not satisfied with. For example, we can't see the real data, only an aggregation or interpolation.
When looking at global data it doesn't matter, but when you zoom in, you cannot make sense of the data well:
code to create figure
I understand the aim of datashader: to create an image from any source of data in order to optimise visualisation.
But for this specific type of data, a regular grid, we should not need datashader to build a lightweight image to send to the plot.
For example, I was thinking of using indices and isel to select only the wanted data, thus not using datashader.
Something like this:
code to create figure
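For readers without access to the collapsed code, the idea of selecting by index and stride can be sketched roughly as follows. This is not the code from the issue: `subsample_grid` and all of its parameters are hypothetical, it uses plain NumPy rather than xarray, and it assumes a regular grid with ascending coordinates. With an xarray DataArray, the same slices could presumably be passed to `isel` instead.

```python
import numpy as np

def subsample_grid(xs, ys, values, x_range, y_range, width, height):
    """Hypothetical helper (not HoloViews API): crop a regular grid to the
    visible window and stride it down to at most width x height samples."""
    # Index bounds of the visible window (xs and ys must be sorted ascending).
    x0 = np.searchsorted(xs, x_range[0], side="left")
    x1 = np.searchsorted(xs, x_range[1], side="right")
    y0 = np.searchsorted(ys, y_range[0], side="left")
    y1 = np.searchsorted(ys, y_range[1], side="right")
    # Ceiling division, so no more than width/height samples remain per axis.
    xstep = max(1, -(-(x1 - x0) // width))
    ystep = max(1, -(-(y1 - y0) // height))
    xsl, ysl = slice(x0, x1, xstep), slice(y0, y1, ystep)
    # values is indexed as [y, x]; the result contains only original samples.
    return xs[xsl], ys[ysl], values[ysl, xsl]

# Made-up global grid, much smaller than the 21600x10800 one in the issue.
xs = np.linspace(-180, 180, 3600)
ys = np.linspace(-90, 90, 1800)
values = np.random.default_rng(0).random((1800, 3600))

# Zoomed out: the whole grid is strided down to the plot size.
gx, gy, gv = subsample_grid(xs, ys, values, (-180, 180), (-90, 90), 400, 300)
print(gv.shape)  # (300, 400)

# Zoomed in: fewer samples than pixels, so the raw values come back untouched.
zx, zy, zv = subsample_grid(xs, ys, values, (0, 10), (0, 10), 400, 300)
```

Because the step drops to 1 once the visible window holds fewer samples than the plot has pixels, zooming in shows the real data rather than an aggregation or interpolation.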
Test on the 21600x10800 grid :
Benchmarks
If we look at the benchmarks, I have found that:
regrid (~130ms / 10ms)
regrid take ~ the same time to compute
code used to benchmark
I don't know if holoviews would benefit from this kind of class / function.
I can put this in a merge request if needed and wanted.
Thanks