Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Datashader operations #894

Merged
merged 23 commits into from Oct 5, 2016

Conversation

Projects
None yet
3 participants
@philippjfr
Copy link
Contributor

philippjfr commented Oct 4, 2016

As suggested in #812 this adds datashader operations to HoloViews, making it trivial to aggregate any Element type into an Image or stack of images and then shade it. The PR currently adds two main operations:

  • Aggregate: Takes any 2D dataset with or without values and generates a GridImage aggregate wrapping an xarray DataArray.
  • Shade: Takes the output of an aggregate and applies color interpolation and a transfer function ('linear', 'log', 'eq_hist'), returning an RGB element.

I've not yet written docstrings or any tests yet, but I do have the initial skeleton of a tutorial: https://anaconda.org/philippjfr/holoviews_datashader/notebook

@philippjfr philippjfr added the feature label Oct 4, 2016


cmap = param.ClassSelector(class_=(Iterable, Callable))

how = param.ObjectSelector(default='eq_hist', objects=['linear', 'log', 'eq_hist'])

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Contributor

Not sure how is a great parameter name. Maybe something like color_mapping or shading_mode?

This comment has been minimized.

@philippjfr

philippjfr Oct 4, 2016

Author Contributor

It's consistent with datashader at least, I'd prefer it over either of those.

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

It's actually a normalization function, so normalizer? normalization?

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Contributor

I made a comment about this below (my fault for reviewing PRs bottom up!). I see the argument regarding consistency but I still don't like a parameter called 'how'. Personally, I find 'normalization' to be a much nicer name...

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

I don't much like it either. I might change it in datashader as well, so I'd vote for using normalization here.

return cvs.points(data, x, y, self.p.aggregator)


def uint32_to_uint8(img):

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Contributor

Given this is used in a single place, I would consider inlining it or at least making it a classmethod of Shade...

This comment has been minimized.

@philippjfr

philippjfr Oct 4, 2016

Author Contributor

Sure.

@jbednar

jbednar approved these changes Oct 4, 2016

Copy link
Contributor

jbednar left a comment

Looks great!


y_range = param.NumericTuple(default=None, length=2,
allow_None=True)

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

You shouldn't need allow_None if the default is None.


y_sampling = param.Number(default=None, doc="""
Specifies the smallest allowed sampling interval along the y-axis.""")

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

I can't quite see what these are used for.

This comment has been minimized.

@philippjfr

philippjfr Oct 4, 2016

Author Contributor

They let you define a minimum sampling interval. It's one of my major annoyances with datashader you can zoom in beyond the sampling resolution, giving you useless speckly aggregates.

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

Should that be added to datashader instead?

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Contributor

@jbednar I would say so - I agree fixing this in datashader would be a better solution. If it gets parameterized there, the corresponding parameter could be reflected here.

cmap = param.ClassSelector(class_=(Iterable, Callable))

how = param.ObjectSelector(default='eq_hist', objects=['linear', 'log', 'eq_hist'])

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

The actual how attribute also supports 'cbrt', so that should probably be listed here. But it also supports any callable; is it possible to have that allowed here as well without preventing tab completion?

This comment has been minimized.

@philippjfr

philippjfr Oct 4, 2016

Author Contributor

Hmm, does it even support tab-completion? Either way cbrt and any other callable should be allowed.

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Contributor

So how is an existing parameter name in datashader then? I suspected as much...

Edit: More discussion about naming above.

@@ -37,7 +37,17 @@ def init(cls, eltype, data, kdims, vdims):
kdim_param = element_params['kdims']
vdim_param = element_params['vdims']

if not isinstance(data, xr.Dataset):
if isinstance (data, xr.DataArray):

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

A comment would be helpful above this block explaining what the purpose is. Maybe "Extract dimensions from xarray data structure"?

vdims = [dataset.get_dimension(column)(name) if column
else Dimension('Count')]

aggregate = pandas_pipeline(dataset.dframe(), schema, canvas,

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

I can't see why you needed to use pandas_pipeline explicitly here, but that's because I haven't studied that part of the datashader code.

elif isinstance(obj, Element):
line = isinstance(obj, Curve)
paths.append(obj.dframe())
if line:

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

It's never nice to see a bunch of if-else blocks to support different types, but I don't see a cleaner way to propose.

return cvs.line(data, x, y, self.p.aggregator)
else:
return cvs.points(data, x, y, self.p.aggregator)

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

Maybe instead of a boolean line, this could be handled more generally by setting glyph=Canvas.points or glyph=Canvas.line above, then doing return getattr(cvs,glyph)(data, x, y, self.p.aggregator) here?

This comment has been minimized.

@philippjfr

philippjfr Oct 4, 2016

Author Contributor

Sounds good.

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

Calling it "glyph" is a bit confusing, as it's actually a method name rather than an actual glyph object, but "method" seems meaningless...

array = element.data[element.vdims[0].name]
kdims = element.kdims
shade_opts = dict(how=self.p.how)
if element.ndims > 2:

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

A comment would be helpful here to summarize what this fairly convoluted processing of colormaps is for.

This comment has been minimized.

@philippjfr

philippjfr Oct 4, 2016

Author Contributor

Will add docstrings and comments next.

"""

callable = param.Callable(default=lambda x: x, doc="""

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Contributor

In discussion, we decided operation would be an even better name...

@@ -70,28 +70,45 @@ def get_overlay_bounds(cls, overlay):
raise ValueError("Extents across the overlay are inconsistent")


class DynamicOperation(Operation):
class Dynamic(Operation):

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Contributor

Should probably be a ParameterizedFunction and moved to some other file as a utility..

@@ -182,7 +182,7 @@ def _process(self, view, key=None):
raise NotImplementedError


def process_element(self, element, key, **params):
def process_element(self, element, key=None, **params):

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Contributor

Why is key now optional? From memory, the purpose of this method was to always supply a key and get the corresponding result...

# Initialize an empty axis layout
grid_data = ((pos, self(cell, **params))
for pos, cell in element.items())
processed = GridSpace(grid_data, label=element.label,
kdims=element.kdims)
elif dynamic:
processed = DynamicFunction(element, function=self, kwargs=params)
streams = getattr(self, 'streams', [])

This comment has been minimized.

@jlstevens

jlstevens Oct 4, 2016

Contributor

This feature probably needs mentioning in the docstring wherever dynamic is mentioned.

Aggregate implements 2D binning for any valid HoloViews Element
type using datashader. By default it will simply count the number
of values in each bin but custom aggregators can be supplied
implementing mean, max, min and other reduction operations.

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

I wouldn't call them "custom"; they are just other aggregators. So maybe "but other aggregators can be supplied explicitly".

@@ -68,74 +69,91 @@ def dataset_pipeline(dataset, schema, canvas, glyph, summary):


class Aggregate(ElementOperation):
"""
Aggregate implements 2D binning for any valid HoloViews Element
type using datashader. By default it will simply count the number

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

Maybe add a sentence saying "I.e., this operation turns a HoloViews Element or overlay of Elements into an hv.Image or an overlay of hv.Images by rasterizing it, which provides a fixed-sized representation independent of the original dataset size.

the x_range and y_range. If x_sampling or y_sampling are supplied
the operation will ensure that a bin is no smaller than the
minimum sampling distance.
"""

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

How?


width = param.Integer(default=600)
width = param.Integer(default=600, doc="""
The width of the aggregated image in pixels.""")

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

Should it be square by default, since the default hv image size is square in aspect?

This comment has been minimized.

@philippjfr

philippjfr Oct 4, 2016

Author Contributor

True, I'll also make it a bit smaller since our plots aren't nearly this big by default.

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

Can you just make it respect the normal holoviews dpi, etc., so that there are no explicit height and width parameters here at all?

This comment has been minimized.

@philippjfr

philippjfr Oct 5, 2016

Author Contributor

Hmm, the user still might want a specific height/width for export. Adding a stream for the bokeh plot width and height would be possible though, then it's even responsive to resizing.

This comment has been minimized.

@jbednar

jbednar Oct 5, 2016

Contributor

But isn't that true for hv in general, not just datashader? In any case, respecting the bokeh plot height and width however that could be done would be fabulous, because that will ensure pixel-accurate rendering with no wasted pixels. Not sure how you can get that information out of bokeh, though.

This comment has been minimized.

@philippjfr

philippjfr Oct 5, 2016

Author Contributor

Easily enough, I should be able to send it along with the ranges.

This comment has been minimized.

@jbednar

jbednar Oct 5, 2016

Contributor

I thought one could only get the overall size of the plot, not the size of the data area? Anyway, either is better than nothing!

This comment has been minimized.

@philippjfr

philippjfr Oct 5, 2016

Author Contributor

Ah that's true. If you disable axes it'll be fairly accurate I guess.



class Shade(ElementOperation):
"""
Shade applies a normalization function to the data and then
applies colormapping to an Image or NdOverlay of Images, returning

This comment has been minimized.

@jbednar

jbednar Oct 4, 2016

Contributor

Maybe explain what "the data" is? I.e. is it an xarray, either 2D or if 3D then a stack of categories?

Philipp Rudiger Philipp Rudiger
The normalization operation applied before colormapping.
Valid options include 'linear', 'log', 'eq_hist', 'cbrt',
and any valid transfer function that acces data, mask, nbins
arguments.""")

This comment has been minimized.

@jbednar

jbednar Oct 5, 2016

Contributor

acces -> has?

Philipp Rudiger added some commits Oct 5, 2016

DynamicMap with a callback that will apply the operation
dynamically. An ElementOperation may also supply a list of Stream
classes on the streams attribute, which can allow dynamic control
over the parameters on the operation.

This comment has been minimized.

@jlstevens

jlstevens Oct 5, 2016

Contributor

I would have said 'on a streams parameter' instead of 'on the streams attribute'...

Philipp Rudiger Philipp Rudiger
"""

operation = param.Callable(default=lambda x: x, doc="""
Callable to apply to DynamicMap items dynamically.""")

This comment has been minimized.

@jlstevens

jlstevens Oct 5, 2016

Contributor

How about 'Operation or user-defined callable to apply dynamically.'?

Philipp Rudiger Philipp Rudiger
@jlstevens

This comment has been minimized.

Copy link
Contributor

jlstevens commented Oct 5, 2016

Looks good to me!

There are a few other things we have discussed:

  • Making Shade into a compositor.
  • Updating the compositor system to become a plot option
  • Generalizing (compositors are really operations+predicates) and documenting them.

I also expect there will be some additional PRs for small fixes and general clean-up but this one is now ready to merge.

@jlstevens jlstevens merged commit 7c6a567 into master Oct 5, 2016

0 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
continuous-integration/travis-ci/push The Travis CI build is in progress
Details

@philippjfr philippjfr deleted the datashader branch Oct 14, 2016

@philippjfr philippjfr referenced this pull request Oct 21, 2016

Closed

Accept xarray DataArray #869

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.