Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make HeatMap more general #849

Merged
merged 22 commits into from Jan 9, 2017

Conversation

Projects
None yet
2 participants
@philippjfr
Copy link
Contributor

philippjfr commented Sep 5, 2016

The HeatMap Element has gone through a lot of redesign, initially being defined as a Raster type and then allowing columnar data. This PR refactors HeatMap to improve the simplify aggregation step and allow any number of value dimensions which can appear as hover information in bokeh. Still a WIP and I need to ensure the code is backward compatible (it should be).

@philippjfr philippjfr referenced this pull request Sep 13, 2016

Merged

Bokeh colorbars #861

4 of 4 tasks complete

@philippjfr philippjfr force-pushed the nd_heatmap branch from 9ee8034 to 6d2eca3 Sep 19, 2016

@philippjfr philippjfr added this to the v1.7.0 milestone Nov 16, 2016

@philippjfr philippjfr force-pushed the nd_heatmap branch from 6d2eca3 to 097680b Dec 10, 2016

@philippjfr philippjfr force-pushed the nd_heatmap branch from 097680b to 69a9793 Jan 8, 2017

Philipp Rudiger Philipp Rudiger

@philippjfr philippjfr force-pushed the nd_heatmap branch from 125c2a4 to efd4bd9 Jan 8, 2017

@philippjfr philippjfr added API data and removed in progress WIP labels Jan 8, 2017

@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Jan 8, 2017

@jlstevens The PR is now ready for review, I'll rebuild the test data shortly. The test data will have to be updated because HeatMap no longer pads with NaN values in the constructor, instead it computes a 2D gridded aggregate, which is used for display. I've also readded the previously supported raster as a property so it can go through a deprecation cycle. Representing the HeatMap as a gridded aggregate is much more flexible since it allows multiple value dimensions.

@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Jan 8, 2017

Still need to add thorough unit tests for the aggregation. I'm also now wondering whether I should add a warning when you pass non-aggregated data to a HeatMap, e.g. there are two different values for index ('A', 'a') in the heatmap, it would just silently ignore the second value. Really it should warn you that you should aggregate your data using some function (mean, max, min) before passing it to the HeatMap.

@philippjfr philippjfr requested a review from jlstevens Jan 8, 2017

@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Jan 8, 2017

Also deprecates support for unpickling the original HeatMap format (which was replaced in 1.4).

Philipp Rudiger Philipp Rudiger

@philippjfr philippjfr force-pushed the nd_heatmap branch from b8289fe to 3f4b073 Jan 8, 2017



def get_2d_aggregate(obj):
"""

This comment has been minimized.

@jlstevens

jlstevens Jan 8, 2017

Contributor

Perhaps this would be better expressed as an operation? Then maybe it could have a minimal docstring example in the class docstring?

for k1, k2 in product(keys1, keys2)})
return dense_map.dframe()
return super(HeatMap, self).dframe()
self.gridded = get_2d_aggregate(self)

This comment has been minimized.

@jlstevens

jlstevens Jan 8, 2017

Contributor

Nice to see how much HeatMap has been simplified!

This comment has been minimized.

@jlstevens

jlstevens Jan 8, 2017

Contributor

That said, it isn't immediately obvious that gridded is now a Dataset. Not sure I am necessarily recommending changing the name as gridded_dataset is awkward...

@@ -383,85 +381,18 @@ class HeatMap(Dataset, Element2D):

vdims = param.List(default=[Dimension('z')])

def __init__(self, data, extents=None, **params):
depth = 1

This comment has been minimized.

@jlstevens

jlstevens Jan 8, 2017

Contributor

I might have forgotten...what is this depth class attribute?

This comment has been minimized.

@philippjfr

philippjfr Jan 8, 2017

Author Contributor

I think this may be wrong now, will have to look into it.

This comment has been minimized.

@philippjfr

philippjfr Jan 8, 2017

Author Contributor

Wasn't needed at all in the end, removed it.

@@ -130,26 +136,31 @@ class HeatmapPlot(ColorbarPlot):
def _axes_props(self, plots, subplots, element, ranges):
dims = element.dimensions()
labels = self._get_axis_labels(dims)
xvals, yvals = [element.dimension_values(i, False)
agg = element.gridded
xvals, yvals = [unique_array(agg.dimension_values(i, False))

This comment has been minimized.

@jlstevens

jlstevens Jan 8, 2017

Contributor

I thought gridded Datasets have the 1D coordinate arrays available. Is the uniqueness being applied over the 2D set of samples or the 1D sequence?

This comment has been minimized.

@philippjfr

philippjfr Jan 8, 2017

Author Contributor

Yes, good point, no longer any need for the unique_array here.

for i in range(2)]
data = {x: xvals, y: yvals, z: zvals}

if 'hover' in self.tools+self.default_tools:
for vdim in element.vdims[1:]:
data[vdim.name] = ['' if is_nan(v) else v

This comment has been minimized.

@jlstevens

jlstevens Jan 8, 2017

Contributor

Wondering if an empty string really suggests NaN. 'NaN' would be explicit but might look noisy.

This comment has been minimized.

@philippjfr

philippjfr Jan 8, 2017

Author Contributor

Good point, I'm now using masked arrays to represent the data, in matplotlib the NaNs are therefore represented by -, which might be better.

This comment has been minimized.

@jlstevens

jlstevens Jan 8, 2017

Contributor

Yes, I think - might be a good compromise.

shape = data.shape
cmap_name = style.pop('cmap', None)
cmap = copy.copy(plt.cm.get_cmap('gray' if cmap_name is None else cmap_name))
cmap.set_bad('w', 1.)

This comment has been minimized.

@jlstevens

jlstevens Jan 8, 2017

Contributor

Might want to make this a plot option at some point instead of hard coding 'w'.

This comment has been minimized.

@philippjfr

philippjfr Jan 8, 2017

Author Contributor

Again good point, indeed we already expose this via clipping_colors, should hook that in here.

This comment has been minimized.

@jlstevens

jlstevens Jan 8, 2017

Contributor

I also find it curious that you are using copy.copy on a colormap - which suggests you are mutating it. I guess set_bad must have side-effects which explains the copying...

@jlstevens

This comment has been minimized.

Copy link
Contributor

jlstevens commented Jan 8, 2017

Ok, I've made my comments for now (and you have already replied to most of them). The biggest suggestion is that get_2d_aggregate might be better expressed as an operation (if that makes sense).

@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Jan 8, 2017

The biggest suggestion is that get_2d_aggregate might be better expressed as an operation (if that makes sense).

Yes, get_2d_aggregate is basically a fairly crude approximation of datashader aggregation for categorical key dimensions (i.e. 2D aggregation without the binning). Unfortunately a fair bit of complexity is required to allow aggregating without sorting the key dimensions (just added topological sorting to make that work properly).

@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Jan 8, 2017

@jlstevens Once tests are passing this is ready for a second review and then merge.

Philipp Rudiger added some commits Jan 8, 2017

@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Jan 8, 2017

@jlstevens Tests now passing on the PR build.

@@ -647,6 +647,30 @@ def walk_depth_first(name):
(names_by_level.get(i, None)
for i in itertools.count())))


def is_cyclic(graph):
"""Return True if the directed graph g has a cycle."""

This comment has been minimized.

@jlstevens

jlstevens Jan 9, 2017

Contributor

What is the representation of the graph? A list of edges as tuples? Would be good to mention in the docstring.

This comment has been minimized.

@jlstevens

jlstevens Jan 9, 2017

Contributor

I'm guessing the representation is similar as in one_to_one...even so, probably worth mentioning..

This comment has been minimized.

@philippjfr

philippjfr Jan 9, 2017

Author Contributor

Right, all three methods here (sort_topologically, cyclical and one_to_one) use the same representation, which is mapping between nodes and edges, will add the docstring.

return np.NaN


class categorical_aggregate2d(ElementOperation):

This comment has been minimized.

@jlstevens

jlstevens Jan 9, 2017

Contributor

Looks great! I was just wondering if you want to keep this class in util or move it to operation.element?

This comment has been minimized.

@philippjfr

philippjfr Jan 9, 2017

Author Contributor

It's imported there but can't be moved, cyclical imports again.

This comment has been minimized.

@jlstevens

jlstevens Jan 9, 2017

Contributor

Ok, having it available for operation.element is fine.

Generates a categorical 2D aggregate by inserting NaNs at all
cross-product locations that do not already have a value assigned.
Returns a 2D gridded Dataset object.
"""

This comment has been minimized.

@jlstevens

jlstevens Jan 9, 2017

Contributor

Quite a long method...if you see chunks that could be split up into helper methods, that might be sensible. Up to you though!

This comment has been minimized.

@philippjfr

philippjfr Jan 9, 2017

Author Contributor

Happy to split it up.

@jlstevens

This comment has been minimized.

Copy link
Contributor

jlstevens commented Jan 9, 2017

I made three more comments for you to reply to. Otherwise looks good and I expect this will be merged very soon!

@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Jan 9, 2017

Latest comments addressed, ready to merge when tests pass.

@jlstevens

This comment has been minimized.

Copy link
Contributor

jlstevens commented Jan 9, 2017

Tests passed. Merging!

@jlstevens jlstevens merged commit 66901bc into master Jan 9, 2017

4 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.08%) to 76.896%
Details
s3-reference-data-cache Test data is cached.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.