Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added datashader regridding operation #1773

Merged
merged 14 commits into from Aug 4, 2017
Merged

Conversation

philippjfr
Copy link
Member

@philippjfr philippjfr commented Aug 1, 2017

As suggested and outlined in #1552 this PR adds a regrid operation that allows dynamically downsampling and upsampling HoloViews Image, RGB and HSV types to a specified x_range, y_range, width and height. It closely mirrors the aggregate operation but operates on Images instead. I introduced a common baseclass for the two operations and have started using it in various projects. It will require some further cleanup, decisions on naming and documentation but is now fully functional.

@philippjfr philippjfr added the type: feature A major new feature label Aug 4, 2017
@philippjfr
Copy link
Member Author

@jlstevens @jbednar I've now finished the code changes to this PR leaving decisions on naming and docstrings. I've included various changes to the aggregate implementation for compatibility with the most recent changes in datashader. All the changes should be backward compatible. I've also added unit tests for regridding.

The main naming decisions to make are about the aggregator and interpolation parameters which correspond to the downsample and upsample methods in gridtools respectively.

@philippjfr
Copy link
Member Author

@jlstevens Any idea how I can make the raster regridding unit tests dependent on a particular version of datashader? They're failing now because it's still pulling datashader 0.5.

@jlstevens
Copy link
Contributor

You could try doing a version check in the test class setUp method and raising SkipTest if it isn't a version it should be testing against. Not entirely sure that would work but seems worth trying...

Copy link
Member

@jbednar jbednar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good; mainly commenting on docstrings.


dynamic = param.Boolean(default=True, doc="""
Enables dynamic processing by default.""")

expand = param.Boolean(default=True, doc="""
Whether the x_range and y_range should be allowed to expand
beyond the extent of the data.""")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this docstring could be, ahem, expanded, with some of the pros and cons. My guess is:

Whether the x_range and y_range should be allowed to expand
beyond the extent of the data.  Setting this value to True is useful
for the case where you want to ensure a certain size of output 
grid, e.g. if you are doing masking or other arithmetic on the
grids.  A value of False ensures that the grid is only just as 
large as it needs to be to contain the data, which will be faster
and use less memory if the resulting aggregate is being overlaid 
on a much larger background.""")

If that description is accurate, maybe it should be False by default?

Copy link
Member Author

@philippjfr philippjfr Aug 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is accurate. I agree that making it False might be useful as that's usually what you want for visualization purposes, True is mostly needed when you want to match grid sizes for computation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, False then; HV should be optimized for viz, I think.

Copy link
Member Author

@philippjfr philippjfr Aug 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there's a good reason not to enable it for aggregate. Computing the actual range is expensive when you have to iterate over all the datapoints and it doesn't actually gain you anything except making sure that pixels aren't wasted on regions where there is no data. I could set it to True for regrid only, but it might be better to be consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be True for regrid only; gridded data has very clear bounds and it's probably more confusing if we don't respect those, and definitely slower.

Copy link
Member Author

@philippjfr philippjfr Aug 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and definitely slower.

Not necessarily true, it just changes whether the pixels are crammed into the bounds of the image or whether they're wasted on the area beyond the original image bounds. If you're using first/last aggregators that's true though.

By default, the link_inputs parameter is set to True so that
when applying shade, backends that support linked streams
update RangeXY streams on the inputs of the shade operation.""")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe expand to say why one might want to set it to False? E.g. if you reuse objects in different cells of a notebook and find them inappropriately linked?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Jim's suggestion here. Doesn't look like it has been updated yet...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we settled for ResamplingOperation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's been updated below; maybe it was supposed to be done here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, meant to add it everywhere.

the x_range and y_range. If x_sampling or y_sampling are supplied
the operation will ensure that a bin is no smaller than the minimum
sampling distance by reducing the width and height when the zoomed
in beyond the minimum sampling distance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when zoomed

else:
layers = {}
for c in agg.coords[column].data:
cagg = agg.sel(**{column: c})
layers[c] = self.p.element_type((xs, ys, cagg.data), **params)
eldata = cagg if ds_version > '0.5.0' else (xs, ys, cagg.data)
layers[c] = self.p.element_type(eldata, **params)
return NdOverlay(layers, kdims=[data.get_dimension(column)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully once there is a new ds release we can remove this bit and simply require ds>=0.6.0.


upsample = param.Boolean(default=True, doc="""
Whether to allow upsampling if the source array is smaller than
the requested array.""")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be False by default since interpolation is nearest by default, and thus the result will be approximately the same whether or not upsampling is done? With linear interpolation the results will be different, but one could say in the docstring for the interpolation parameter that one might want upsample=False for interpolation methods that smooth the data, unlike nearest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd agree with that, upsampling is rarely needed I think, again only when you're trying to match a higher resolution target gridding (in which case you should probably downsample the higher resolution grid rather than upsampling the lower resolution one).

@jbednar
Copy link
Member

jbednar commented Aug 4, 2017

The main naming decisions to make are about the aggregator and interpolation parameters which correspond to the downsample and upsample methods in gridtools respectively.

I'm fine with those names.

@@ -133,7 +139,7 @@ class aggregate(resample_operation):
the x_range and y_range. If x_sampling or y_sampling are supplied
the operation will ensure that a bin is no smaller than the minimum
sampling distance by reducing the width and height when the zoomed
in beyond the minimum sampling distance.
beyond the minimum sampling distance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when zoomed in beyond

@jbednar
Copy link
Member

jbednar commented Aug 4, 2017

Ready to merge? Looks good to me.

@jlstevens
Copy link
Contributor

I'll be able to review this PR shortly. Don't merge till then!


aggregator = param.ClassSelector(class_=ds.reductions.Reduction,
default=ds.count())
class resample_operation(Operation):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it called resample_operation and not just resample?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an abstract base class or a usable one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstract, doesn't have a _process

Copy link
Contributor

@jlstevens jlstevens Aug 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My issue here is that the name does not sound like an abstract class. I think it can have a simple name and should simply be hidden from users when importing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's your suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResampleOperation?

if self.p.x_sampling:
width = int(min([(xspan/self.p.x_sampling), width]))
if self.p.y_sampling:
height = int(min([(yspan/self.p.y_sampling), height]))
Copy link
Contributor

@jlstevens jlstevens Aug 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to be careful here - I don't see why xspan and self.p.x_sampling can't both be integers on Python 2 causing potential integer division issues. Same when computing height.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure will add the __future__ import.

# Disable upsampling if requested
(xstart, xend), (ystart, yend) = (x_range, y_range)
xspan, yspan = (xend-xstart), (yend-ystart)
if not self.p.upsample and self.p.target is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there at least be a warning to say upsampling has been disabled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh why? It's a parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the parameter should be respected as set, with no warning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case the docstring needs improvement:

"Whether to allow upsampling if the source array is smaller than the requested array."

What is the behavior if upsample=False? Padding?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, Philipp's now explained it to me. The behavior makes sense though I think there could be a bit more clarification in the docstrings.

@jlstevens
Copy link
Contributor

Looks good other than a few minor comments I made. Happy to see this PR merged once those things are addressed.

just as large as it needs to be to contain the data, which will
be faster and use less memory if the resulting aggregate is
being overlaid on a much larger background.""")

height = param.Integer(default=400, doc="""
The height of the aggregated image in pixels.""")
Copy link
Contributor

@jlstevens jlstevens Aug 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this height parameter used by regrid? I'm not sure it is clear to talk about an 'aggregated' image in that case. Maybe just talk about the 'output' image?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, although in aggregation is still the correct term at least when downsampling. Will fix it anyway.

@jlstevens
Copy link
Contributor

Looks good and the unit tests have passed. Merging.

@jlstevens jlstevens merged commit 95691a4 into master Aug 4, 2017
ea42gh pushed a commit to ea42gh/holoviews that referenced this pull request Aug 12, 2017
@jlstevens jlstevens added this to the 1.8.3 milestone Aug 21, 2017
@jlstevens jlstevens deleted the datashader_regridding branch August 21, 2017 20:58
@jakirkham
Copy link
Contributor

So GitHub makes it look like this is in v1.8.3 and v1.8.4, but went to import it and it wasn't. Then I realized it had been removed from those releases ( #1884 ). Maybe GitHub is confused. Anyways when is this planned to land in a release?

@philippjfr
Copy link
Member Author

Anyways when is this planned to land in a release?

Today :-)

@jakirkham
Copy link
Contributor

Well isn't it my lucky day. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: feature A major new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants