New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added datashader regridding operation #1773

Merged
merged 14 commits into from Aug 4, 2017

Conversation

Projects
None yet
4 participants
@philippjfr
Member

philippjfr commented Aug 1, 2017

As suggested and outlined in #1552 this PR adds a regrid operation that allows dynamically downsampling and upsampling HoloViews Image, RGB and HSV types to a specified x_range, y_range, width and height. It closely mirrors the aggregate operation but operates on Images instead. I introduced a common baseclass for the two operations and have started using it in various projects. It will require some further cleanup, decisions on naming and documentation but is now fully functional.

@jlstevens jlstevens added the WIP label Aug 1, 2017

@philippjfr philippjfr added the feature label Aug 4, 2017

@philippjfr philippjfr removed the WIP label Aug 4, 2017

@philippjfr

This comment has been minimized.

Member

philippjfr commented Aug 4, 2017

@jlstevens @jbednar I've now finished the code changes to this PR leaving decisions on naming and docstrings. I've included various changes to the aggregate implementation for compatibility with the most recent changes in datashader. All the changes should be backward compatible. I've also added unit tests for regridding.

The main naming decisions to make are about the aggregator and interpolation parameters which correspond to the downsample and upsample methods in gridtools respectively.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Aug 4, 2017

@jlstevens Any idea how I can make the raster regridding unit tests dependent on a particular version of datashader? They're failing now because it's still pulling datashader 0.5.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Aug 4, 2017

You could try doing a version check in the test class setUp method and raising SkipTest if it isn't a version it should be testing against. Not entirely sure that would work but seems worth trying...

@jbednar

Looks good; mainly commenting on docstrings.

dynamic = param.Boolean(default=True, doc="""
Enables dynamic processing by default.""")
expand = param.Boolean(default=True, doc="""
Whether the x_range and y_range should be allowed to expand
beyond the extent of the data.""")

This comment has been minimized.

@jbednar

jbednar Aug 4, 2017

Member

Maybe this docstring could be, ahem, expanded, with some of the pros and cons. My guess is:

Whether the x_range and y_range should be allowed to expand
beyond the extent of the data.  Setting this value to True is useful
for the case where you want to ensure a certain size of output 
grid, e.g. if you are doing masking or other arithmetic on the
grids.  A value of False ensures that the grid is only just as 
large as it needs to be to contain the data, which will be faster
and use less memory if the resulting aggregate is being overlaid 
on a much larger background.""")

If that description is accurate, maybe it should be False by default?

This comment has been minimized.

@philippjfr

philippjfr Aug 4, 2017

Member

Yes, that is accurate. I agree that making it False might be useful as that's usually what you want for visualization purposes, True is mostly needed when you want to match grid sizes for computation.

This comment has been minimized.

@jbednar

jbednar Aug 4, 2017

Member

Ok, False then; HV should be optimized for viz, I think.

This comment has been minimized.

@philippjfr

philippjfr Aug 4, 2017

Member

Actually there's a good reason not to enable it for aggregate. Computing the actual range is expensive when you have to iterate over all the datapoints and it doesn't actually gain you anything except making sure that pixels aren't wasted on regions where there is no data. I could set it to True for regrid only, but it might be better to be consistent.

This comment has been minimized.

@jbednar

jbednar Aug 4, 2017

Member

I think it should be True for regrid only; gridded data has very clear bounds and it's probably more confusing if we don't respect those, and definitely slower.

This comment has been minimized.

@philippjfr

philippjfr Aug 4, 2017

Member

and definitely slower.

Not necessarily true, it just changes whether the pixels are crammed into the bounds of the image or whether they're wasted on the area beyond the original image bounds. If you're using first/last aggregators that's true though.

By default, the link_inputs parameter is set to True so that
when applying shade, backends that support linked streams
update RangeXY streams on the inputs of the shade operation.""")

This comment has been minimized.

@jbednar

jbednar Aug 4, 2017

Member

Maybe expand to say why one might want to set it to False? E.g. if you reuse objects in different cells of a notebook and find them inappropriately linked?

This comment has been minimized.

@philippjfr

philippjfr Aug 4, 2017

Member

Will do.

This comment has been minimized.

@jlstevens

jlstevens Aug 4, 2017

Member

Agree with Jim's suggestion here. Doesn't look like it has been updated yet...

This comment has been minimized.

@jlstevens

jlstevens Aug 4, 2017

Member

I think we settled for ResamplingOperation.

This comment has been minimized.

@jbednar

jbednar Aug 4, 2017

Member

Looks like it's been updated below; maybe it was supposed to be done here?

This comment has been minimized.

@philippjfr

philippjfr Aug 4, 2017

Member

Yes, meant to add it everywhere.

the x_range and y_range. If x_sampling or y_sampling are supplied
the operation will ensure that a bin is no smaller than the minimum
sampling distance by reducing the width and height when the zoomed
in beyond the minimum sampling distance.

This comment has been minimized.

@jbednar

jbednar Aug 4, 2017

Member

when zoomed

else:
layers = {}
for c in agg.coords[column].data:
cagg = agg.sel(**{column: c})
layers[c] = self.p.element_type((xs, ys, cagg.data), **params)
eldata = cagg if ds_version > '0.5.0' else (xs, ys, cagg.data)
layers[c] = self.p.element_type(eldata, **params)
return NdOverlay(layers, kdims=[data.get_dimension(column)])

This comment has been minimized.

@jbednar

jbednar Aug 4, 2017

Member

Hopefully once there is a new ds release we can remove this bit and simply require ds>=0.6.0.

upsample = param.Boolean(default=True, doc="""
Whether to allow upsampling if the source array is smaller than
the requested array.""")

This comment has been minimized.

@jbednar

jbednar Aug 4, 2017

Member

Should this be False by default since interpolation is nearest by default, and thus the result will be approximately the same whether or not upsampling is done? With linear interpolation the results will be different, but one could say in the docstring for the interpolation parameter that one might want upsample=False for interpolation methods that smooth the data, unlike nearest.

This comment has been minimized.

@philippjfr

philippjfr Aug 4, 2017

Member

Yes, I'd agree with that, upsampling is rarely needed I think, again only when you're trying to match a higher resolution target gridding (in which case you should probably downsample the higher resolution grid rather than upsampling the lower resolution one).

@jbednar

This comment has been minimized.

Member

jbednar commented Aug 4, 2017

The main naming decisions to make are about the aggregator and interpolation parameters which correspond to the downsample and upsample methods in gridtools respectively.

I'm fine with those names.

@@ -133,7 +139,7 @@ class aggregate(resample_operation):
the x_range and y_range. If x_sampling or y_sampling are supplied
the operation will ensure that a bin is no smaller than the minimum
sampling distance by reducing the width and height when the zoomed
in beyond the minimum sampling distance.
beyond the minimum sampling distance.

This comment has been minimized.

@jbednar

jbednar Aug 4, 2017

Member

when zoomed in beyond

@jbednar

This comment has been minimized.

Member

jbednar commented Aug 4, 2017

Ready to merge? Looks good to me.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Aug 4, 2017

I'll be able to review this PR shortly. Don't merge till then!

aggregator = param.ClassSelector(class_=ds.reductions.Reduction,
default=ds.count())
class resample_operation(Operation):

This comment has been minimized.

@jlstevens

jlstevens Aug 4, 2017

Member

Why is it called resample_operation and not just resample?

This comment has been minimized.

@jlstevens

jlstevens Aug 4, 2017

Member

Is this an abstract base class or a usable one?

This comment has been minimized.

@philippjfr

philippjfr Aug 4, 2017

Member

Abstract, doesn't have a _process

This comment has been minimized.

@jlstevens

jlstevens Aug 4, 2017

Member

My issue here is that the name does not sound like an abstract class. I think it can have a simple name and should simply be hidden from users when importing.

This comment has been minimized.

@philippjfr

philippjfr Aug 4, 2017

Member

What's your suggestion?

This comment has been minimized.

@jbednar

jbednar Aug 4, 2017

Member

ResampleOperation?

if self.p.x_sampling:
width = int(min([(xspan/self.p.x_sampling), width]))
if self.p.y_sampling:
height = int(min([(yspan/self.p.y_sampling), height]))

This comment has been minimized.

@jlstevens

jlstevens Aug 4, 2017

Member

I think you need to be careful here - I don't see why xspan and self.p.x_sampling can't both be integers on Python 2 causing potential integer division issues. Same when computing height.

This comment has been minimized.

@philippjfr

philippjfr Aug 4, 2017

Member

Sure will add the __future__ import.

# Disable upsampling if requested
(xstart, xend), (ystart, yend) = (x_range, y_range)
xspan, yspan = (xend-xstart), (yend-ystart)
if not self.p.upsample and self.p.target is None:

This comment has been minimized.

@jlstevens

jlstevens Aug 4, 2017

Member

Shouldn't there at least be a warning to say upsampling has been disabled?

This comment has been minimized.

@philippjfr

philippjfr Aug 4, 2017

Member

Huh why? It's a parameter.

This comment has been minimized.

@jbednar

jbednar Aug 4, 2017

Member

Seems like the parameter should be respected as set, with no warning.

This comment has been minimized.

@jlstevens

jlstevens Aug 4, 2017

Member

In that case the docstring needs improvement:

"Whether to allow upsampling if the source array is smaller than the requested array."

What is the behavior if upsample=False? Padding?

This comment has been minimized.

@jlstevens

jlstevens Aug 4, 2017

Member

Ok, Philipp's now explained it to me. The behavior makes sense though I think there could be a bit more clarification in the docstrings.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Aug 4, 2017

Looks good other than a few minor comments I made. Happy to see this PR merged once those things are addressed.

just as large as it needs to be to contain the data, which will
be faster and use less memory if the resulting aggregate is
being overlaid on a much larger background.""")
height = param.Integer(default=400, doc="""
The height of the aggregated image in pixels.""")

This comment has been minimized.

@jlstevens

jlstevens Aug 4, 2017

Member

Isn't this height parameter used by regrid? I'm not sure it is clear to talk about an 'aggregated' image in that case. Maybe just talk about the 'output' image?

This comment has been minimized.

@philippjfr

philippjfr Aug 4, 2017

Member

True, although in aggregation is still the correct term at least when downsampling. Will fix it anyway.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Aug 4, 2017

Looks good and the unit tests have passed. Merging.

@jlstevens jlstevens merged commit 95691a4 into master Aug 4, 2017

1 of 3 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
continuous-integration/travis-ci/push The Travis CI build is in progress
Details
s3-reference-data-cache Test data is cached.
Details

ea42gh added a commit to ea42gh/holoviews that referenced this pull request Aug 12, 2017

@jlstevens jlstevens added this to the 1.8.3 milestone Aug 21, 2017

@jlstevens jlstevens deleted the datashader_regridding branch Aug 21, 2017

@jakirkham

This comment has been minimized.

Contributor

jakirkham commented Nov 2, 2017

So GitHub makes it look like this is in v1.8.3 and v1.8.4, but went to import it and it wasn't. Then I realized it had been removed from those releases ( #1884 ). Maybe GitHub is confused. Anyways when is this planned to land in a release?

@philippjfr

This comment has been minimized.

Member

philippjfr commented Nov 2, 2017

Anyways when is this planned to land in a release?

Today :-)

@jakirkham

This comment has been minimized.

Contributor

jakirkham commented Nov 2, 2017

Well isn't it my lucky day. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment