New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Datashader operations #894

Merged
merged 23 commits into from Oct 5, 2016

Conversation

Projects
None yet
3 participants
@philippjfr
Member

philippjfr commented Oct 4, 2016

As suggested in #812 this adds datashader operations to HoloViews, making it trivial to aggregate any Element type into an Image or stack of images and then shade it. The PR currently adds two main operations:

  • Aggregate: Takes any 2D dataset with or without values and generates a GridImage aggregate wrapping an xarray DataArray.
  • Shade: Takes the output of an aggregate and applies color interpolation and a transfer function ('linear', 'log', 'eq_hist'), returning an RGB element.

I've not yet written docstrings or any tests yet, but I do have the initial skeleton of a tutorial: https://anaconda.org/philippjfr/holoviews_datashader/notebook

@philippjfr philippjfr added the feature label Oct 4, 2016

cmap = param.ClassSelector(class_=(Iterable, Callable))
how = param.ObjectSelector(default='eq_hist', objects=['linear', 'log', 'eq_hist'])

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Member

Not sure how is a great parameter name. Maybe something like color_mapping or shading_mode?

This comment has been minimized.

@philippjfr

philippjfr Oct 4, 2016

Member

It's consistent with datashader at least, I'd prefer it over either of those.

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

It's actually a normalization function, so normalizer? normalization?

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Member

I made a comment about this below (my fault for reviewing PRs bottom up!). I see the argument regarding consistency but I still don't like a parameter called 'how'. Personally, I find 'normalization' to be a much nicer name...

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

I don't much like it either. I might change it in datashader as well, so I'd vote for using normalization here.

return cvs.points(data, x, y, self.p.aggregator)
def uint32_to_uint8(img):

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Member

Given this is used in a single place, I would consider inlining it or at least making it a classmethod of Shade...

This comment has been minimized.

@philippjfr
@jbednar

jbednar approved these changes Oct 4, 2016

Looks great!

y_range = param.NumericTuple(default=None, length=2,
allow_None=True)

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

You shouldn't need allow_None if the default is None.

y_sampling = param.Number(default=None, doc="""
Specifies the smallest allowed sampling interval along the y-axis.""")

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

I can't quite see what these are used for.

This comment has been minimized.

@philippjfr

philippjfr Oct 4, 2016

Member

They let you define a minimum sampling interval. It's one of my major annoyances with datashader you can zoom in beyond the sampling resolution, giving you useless speckly aggregates.

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

Should that be added to datashader instead?

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Member

@jbednar I would say so - I agree fixing this in datashader would be a better solution. If it gets parameterized there, the corresponding parameter could be reflected here.

cmap = param.ClassSelector(class_=(Iterable, Callable))
how = param.ObjectSelector(default='eq_hist', objects=['linear', 'log', 'eq_hist'])

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

The actual how attribute also supports 'cbrt', so that should probably be listed here. But it also supports any callable; is it possible to have that allowed here as well without preventing tab completion?

This comment has been minimized.

@philippjfr

philippjfr Oct 4, 2016

Member

Hmm, does it even support tab-completion? Either way cbrt and any other callable should be allowed.

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Member

So how is an existing parameter name in datashader then? I suspected as much...

Edit: More discussion about naming above.

@@ -37,7 +37,17 @@ def init(cls, eltype, data, kdims, vdims):
kdim_param = element_params['kdims']
vdim_param = element_params['vdims']
if not isinstance(data, xr.Dataset):
if isinstance (data, xr.DataArray):

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

A comment would be helpful above this block explaining what the purpose is. Maybe "Extract dimensions from xarray data structure"?

vdims = [dataset.get_dimension(column)(name) if column
else Dimension('Count')]
aggregate = pandas_pipeline(dataset.dframe(), schema, canvas,

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

I can't see why you needed to use pandas_pipeline explicitly here, but that's because I haven't studied that part of the datashader code.

elif isinstance(obj, Element):
line = isinstance(obj, Curve)
paths.append(obj.dframe())
if line:

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

It's never nice to see a bunch of if-else blocks to support different types, but I don't see a cleaner way to propose.

return cvs.line(data, x, y, self.p.aggregator)
else:
return cvs.points(data, x, y, self.p.aggregator)

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

Maybe instead of a boolean line, this could be handled more generally by setting glyph=Canvas.points or glyph=Canvas.line above, then doing return getattr(cvs,glyph)(data, x, y, self.p.aggregator) here?

This comment has been minimized.

@philippjfr

philippjfr Oct 4, 2016

Member

Sounds good.

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

Calling it "glyph" is a bit confusing, as it's actually a method name rather than an actual glyph object, but "method" seems meaningless...

array = element.data[element.vdims[0].name]
kdims = element.kdims
shade_opts = dict(how=self.p.how)
if element.ndims > 2:

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

A comment would be helpful here to summarize what this fairly convoluted processing of colormaps is for.

This comment has been minimized.

@philippjfr

philippjfr Oct 4, 2016

Member

Will add docstrings and comments next.

"""
callable = param.Callable(default=lambda x: x, doc="""

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Member

In discussion, we decided operation would be an even better name...

@@ -70,28 +70,45 @@ def get_overlay_bounds(cls, overlay):
raise ValueError("Extents across the overlay are inconsistent")
class DynamicOperation(Operation):
class Dynamic(Operation):

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Member

Should probably be a ParameterizedFunction and moved to some other file as a utility..

@@ -182,7 +182,7 @@ def _process(self, view, key=None):
raise NotImplementedError
def process_element(self, element, key, **params):
def process_element(self, element, key=None, **params):

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Member

Why is key now optional? From memory, the purpose of this method was to always supply a key and get the corresponding result...

# Initialize an empty axis layout
grid_data = ((pos, self(cell, **params))
for pos, cell in element.items())
processed = GridSpace(grid_data, label=element.label,
kdims=element.kdims)
elif dynamic:
processed = DynamicFunction(element, function=self, kwargs=params)
streams = getattr(self, 'streams', [])

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Member

This feature probably needs mentioning in the docstring wherever dynamic is mentioned.

Aggregate implements 2D binning for any valid HoloViews Element
type using datashader. By default it will simply count the number
of values in each bin but custom aggregators can be supplied
implementing mean, max, min and other reduction operations.

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

I wouldn't call them "custom"; they are just other aggregators. So maybe "but other aggregators can be supplied explicitly".

@@ -68,74 +69,91 @@ def dataset_pipeline(dataset, schema, canvas, glyph, summary):
class Aggregate(ElementOperation):
"""
Aggregate implements 2D binning for any valid HoloViews Element
type using datashader. By default it will simply count the number

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

Maybe add a sentence saying "I.e., this operation turns a HoloViews Element or overlay of Elements into an hv.Image or an overlay of hv.Images by rasterizing it, which provides a fixed-sized representation independent of the original dataset size.

the x_range and y_range. If x_sampling or y_sampling are supplied
the operation will ensure that a bin is no smaller than the
minimum sampling distance.
"""

This comment has been minimized.

@jbednar
width = param.Integer(default=600)
width = param.Integer(default=600, doc="""
The width of the aggregated image in pixels.""")

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

Should it be square by default, since the default hv image size is square in aspect?

This comment has been minimized.

@philippjfr

philippjfr Oct 4, 2016

Member

True, I'll also make it a bit smaller since our plots aren't nearly this big by default.

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

Can you just make it respect the normal holoviews dpi, etc., so that there are no explicit height and width parameters here at all?

This comment has been minimized.

@philippjfr

philippjfr Oct 5, 2016

Member

Hmm, the user still might want a specific height/width for export. Adding a stream for the bokeh plot width and height would be possible though, then it's even responsive to resizing.

This comment has been minimized.

@jbednar

jbednar Oct 5, 2016

Member

But isn't that true for hv in general, not just datashader? In any case, respecting the bokeh plot height and width however that could be done would be fabulous, because that will ensure pixel-accurate rendering with no wasted pixels. Not sure how you can get that information out of bokeh, though.

This comment has been minimized.

@philippjfr

philippjfr Oct 5, 2016

Member

Easily enough, I should be able to send it along with the ranges.

This comment has been minimized.

@jbednar

jbednar Oct 5, 2016

Member

I thought one could only get the overall size of the plot, not the size of the data area? Anyway, either is better than nothing!

This comment has been minimized.

@philippjfr

philippjfr Oct 5, 2016

Member

Ah that's true. If you disable axes it'll be fairly accurate I guess.

class Shade(ElementOperation):
"""
Shade applies a normalization function to the data and then
applies colormapping to an Image or NdOverlay of Images, returning

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Member

Maybe explain what "the data" is? I.e. is it an xarray, either 2D or if 3D then a stack of categories?

The normalization operation applied before colormapping.
Valid options include 'linear', 'log', 'eq_hist', 'cbrt',
and any valid transfer function that acces data, mask, nbins
arguments.""")

This comment has been minimized.

@jbednar

jbednar Oct 5, 2016

Member

acces -> has?

DynamicMap with a callback that will apply the operation
dynamically. An ElementOperation may also supply a list of Stream
classes on the streams attribute, which can allow dynamic control
over the parameters on the operation.

This comment has been minimized.

@jlstevens

jlstevens Oct 5, 2016

Member

I would have said 'on a streams parameter' instead of 'on the streams attribute'...

"""
operation = param.Callable(default=lambda x: x, doc="""
Callable to apply to DynamicMap items dynamically.""")

This comment has been minimized.

@jlstevens

jlstevens Oct 5, 2016

Member

How about 'Operation or user-defined callable to apply dynamically.'?

@jlstevens

This comment has been minimized.

Member

jlstevens commented Oct 5, 2016

Looks good to me!

There are a few other things we have discussed:

  • Making Shade into a compositor.
  • Updating the compositor system to become a plot option
  • Generalizing (compositors are really operations+predicates) and documenting them.

I also expect there will be some additional PRs for small fixes and general clean-up but this one is now ready to merge.

@jlstevens jlstevens merged commit 7c6a567 into master Oct 5, 2016

0 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
continuous-integration/travis-ci/push The Travis CI build is in progress
Details

@philippjfr philippjfr deleted the datashader branch Oct 14, 2016

@philippjfr philippjfr referenced this pull request Oct 21, 2016

Closed

Accept xarray DataArray #869

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment