Add bokeh statistics operations, elements and plots #1985

Merged
merged 40 commits on Oct 31, 2017

2 participants
@philippjfr
Member

philippjfr commented Oct 9, 2017

This PR will replace the seaborn interface with a set of statistical elements, operations and plots which will all work together. For now it's very much a WIP since there's still some groundwork to be laid, but it's a good start with the basic functionality in place for Distribution and Bivariate elements and corresponding operations.

Demo notebook can be seen here

@@ -164,6 +164,10 @@ class Dataset(Element):
    # to supplied data
    _auto_indexable_1d = True

    # Determines whether value dimensions are in data or should be emulated
    # Useful for elements which compute statistics from the data
    _virtual_vdims = False

@jlstevens

jlstevens Oct 30, 2017

Member

Minor comment: _pseudo_vdims sounds a little less wacky than _virtual_vdims :-)

@jlstevens

jlstevens Oct 30, 2017

Member

Or maybe these are 'derived' vdims?

@philippjfr

philippjfr Oct 30, 2017

Member

Now removed.

@@ -72,7 +72,7 @@ def init(cls, eltype, data, kdims, vdims):
     @classmethod
     def validate(cls, dataset):
-        ndims = len(dataset.dimensions())
+        ndims = dataset.ndims if dataset._virtual_vdims else len(dataset.dimensions())

@jlstevens

jlstevens Oct 30, 2017

Member

I assume this is because the 'virtual' vdims are supposed to be hidden?

result = applicable_op.apply(sliced, ranges, key=key)
result = result.relabel(group=applicable_op.group)
result = list(zip(sliced.keys(), [result]))
overlay = overlay.clone(values[:start]+result+values[stop:])

@jlstevens

jlstevens Oct 30, 2017

Member

I haven't looked ahead yet, but I'm hoping to see some new tests showing this new compositor functionality.

"""
Transfers options for all backends from one object to another.
Drops any options defined in the supplied drop list.
"""

@jlstevens

jlstevens Oct 30, 2017

Member

Seems useful though it definitely needs some examples (even if only in unit tests).
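As a rough illustration of the behaviour the docstring describes, here is a minimal sketch. It does not use the real Store: the `options_by_backend` dictionary, the string keys standing in for objects, and the `transfer_options` signature shown here are all assumptions made for the example.

```python
from copy import deepcopy

# Hypothetical stand-in for a per-backend options registry. The real
# Store keeps options in a more elaborate structure; this is only a toy.
options_by_backend = {
    "bokeh": {"source": {"bw": 0.5, "color": "red", "width": 400}},
    "matplotlib": {"source": {"bw": 0.5, "color": "blue"}},
}

def transfer_options(registry, src, dst, drop=()):
    """Copy the options of `src` to `dst` for every backend, omitting
    any option whose name appears in `drop`."""
    for backend_opts in registry.values():
        if src not in backend_opts:
            continue  # nothing registered for this object on this backend
        backend_opts[dst] = {
            name: deepcopy(value)
            for name, value in backend_opts[src].items()
            if name not in drop
        }

# 'bw' is consumed by the operation, so it is dropped from the output.
transfer_options(options_by_backend, "source", "target", drop=["bw"])
print(options_by_backend["bokeh"]["target"])
```

A unit test along these lines would check that every backend received the options, that the dropped names are absent from the target, and that the source options are left untouched.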

from .chart import Chart, Scatter
class _StatisticsElement(Chart):

@jlstevens

jlstevens Oct 30, 2017

Member

Not sure about the need for an underscore, I don't see why this shouldn't be called StatisticalElement even if it is a base class. I can imagine isinstance(foo, StatisticalElement) being useful.

from .element import contours
class univariate_kde(Operation):

@jlstevens

jlstevens Oct 30, 2017

Member

Will need a docstring. I know you were probably already planning to add one - just making sure!
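For readers following along, this is roughly the kind of computation a univariate KDE operation performs and that a docstring would need to describe. It is a plain-Python sketch of a Gaussian kernel density estimate with a fixed bandwidth, not the code in this PR; the function name `gaussian_kde` and its arguments are assumptions for illustration.

```python
import math

def gaussian_kde(samples, bandwidth, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` at each
    point of `grid`, using a fixed kernel width `bandwidth`."""
    n = len(samples)
    # Normalisation so that the estimate integrates to one.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in grid:
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        density.append(norm * total)
    return density

samples = [1.0, 1.5, 2.0, 2.2, 3.5]
step = 0.05
grid = [-2.0 + i * step for i in range(181)]  # evenly spaced points on [-2, 7]
density = gaussian_kde(samples, bandwidth=0.5, grid=grid)

# Riemann-sum check: the density should integrate to approximately one.
area = sum(density) * step
```

A smaller bandwidth gives a spikier estimate and a larger one a smoother estimate, which is why the bandwidth is worth exposing as a parameter.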

@philippjfr
@@ -451,6 +453,31 @@ def get_min_distance(element):
return 0
class univariate_composite(Operation):

@jlstevens

jlstevens Oct 30, 2017

Member

Minor point, I think you could call this univariate_compositor.

@philippjfr

philippjfr Oct 30, 2017

Member

Agreed.

@@ -101,7 +101,10 @@ def init(cls, eltype, data, kdims, vdims):
@classmethod
def validate(cls, dataset):

@jlstevens

jlstevens Oct 30, 2017

Member

Why not make the derived_vdims flag (or similar, validate_vdims maybe?) an explicit argument to validate?

@philippjfr

philippjfr Oct 30, 2017

Member

Now removed.
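The suggestion to pass the flag explicitly, rather than reading a class attribute, could look roughly like the sketch below. `ToyDataset`, `ToyInterface` and the `vdims` keyword are hypothetical names chosen for the example and are not taken from this PR.

```python
class ToyDataset:
    """Hypothetical stand-in for a Dataset with key and value dimensions."""
    def __init__(self, columns, kdims, vdims):
        self.columns = columns  # mapping of column name -> values
        self.kdims = kdims
        self.vdims = vdims

class ToyInterface:
    @classmethod
    def validate(cls, dataset, vdims=True):
        """Check that the declared dimensions exist in the data.

        With vdims=False only the key dimensions are checked, which suits
        elements whose value dimensions are computed from the data."""
        dims = dataset.kdims + (dataset.vdims if vdims else [])
        missing = [d for d in dims if d not in dataset.columns]
        if missing:
            raise ValueError("Dimensions not found in data: %s" % missing)

# The 'Density' value dimension is derived, so it is not a data column.
dist = ToyDataset(columns={"x": [0.1, 0.4, 0.9]}, kdims=["x"], vdims=["Density"])
ToyInterface.validate(dist, vdims=False)  # passes: only 'x' is checked

try:
    ToyInterface.validate(dist)  # strict check also looks for 'Density'
    strict_check_failed = False
except ValueError:
    strict_check_failed = True
```

The caller then decides per element whether value dimensions should be validated, and the interface needs no knowledge of a private flag on the dataset.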

def __init__(self, data, kdims=None, vdims=None, **params):
    super(_StatisticsElement, self).__init__(data, kdims, vdims, **params)
    if not self.vdims:
        self.vdims = [Dimension('Density')]

@jlstevens

jlstevens Oct 30, 2017

Member

Is 'Density' going to be right for all statistical elements?

@philippjfr

philippjfr Oct 30, 2017

Member

Yeah, this logic isn't right at all, will get to it.

bw = plot_opts.pop('bw', univariate_kde.bandwidth)
transformed = univariate_kde(element, bandwidth=bw)
Store.transfer_options(element, transformed, ['bw'])
return transformed

@jlstevens

jlstevens Oct 30, 2017

Member

Transferring styles from input to output seems like it should be an option for the Compositor when you register the operation.

@philippjfr

philippjfr Oct 30, 2017

Member

Do you want that to hold up merging this PR?

@jlstevens

Member

jlstevens commented Oct 30, 2017

Just to say that with a few tweaks discussed with @philippjfr (namely using the Compositor itself to transfer styles, and passing the validate_vdims flag as an explicit boolean parameter), I am very happy with the approach taken here.

Using the compositor in the background and writing things in terms of operations is very clean and I find it to be a very powerful approach. It is along the lines of the 'plotting macro' idea we had before and this is an initial step in that direction (using the compositor to do that job for now).

Other than seeing those tweaks and the other minor comments above addressed, I would be happy to merge once some unit tests are added (they don't have to be extensive!) and the corresponding element notebooks are added. I expect coverage would go up even without new unit tests, just because this would delete that old seaborn/pandas interface code!

@philippjfr

Member

philippjfr commented Oct 31, 2017

Docstrings, element references and unit tests left, should be able to finish that up in the morning. Compositor now handles everything, and I'm very happy with it overall. It did touch an awful lot of files though.

@philippjfr

Member

philippjfr commented Oct 31, 2017

@jlstevens This is now ready for review. I've added element notebooks for both new Elements along with a new demo, added docstrings to the operations and added unit tests for transferring options, for the statistics elements themselves and for the compositing of the statistics elements.

Should get a very healthy boost in coverage overall.

@philippjfr

Member

philippjfr commented Oct 31, 2017

Almost a 1% coverage increase! Ready for final review and merge whenever.

from .chart import Chart, Scatter
class _StatisticsElement(Chart):

@jlstevens

jlstevens Oct 31, 2017

Member

I would still prefer to have this called StatisticalElement which matches the name used in the unit testing class.

@philippjfr

philippjfr Oct 31, 2017

Member

Sure, don't want that appearing in the top-level namespace though.

@jlstevens

jlstevens Oct 31, 2017

Member

Can't we just set __all__ so if it is used it has to be explicitly imported?

@philippjfr

philippjfr Oct 31, 2017

Member

I'm now filtering abstract classes from hv.element. Works fine.
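For context, the `__all__` idea raised in this thread can be sketched as follows. This shows only the general Python mechanism, not the filtering that was finally implemented; the module name `toy_stats` and the class bodies are made up for the example.

```python
import sys
import types

# Source of a hypothetical module that exports only the concrete elements.
module_source = '''
__all__ = ["Distribution", "Bivariate"]

class StatisticalElement:
    """Base class, deliberately left out of __all__."""

class Distribution(StatisticalElement):
    pass

class Bivariate(StatisticalElement):
    pass
'''

# Build the module in memory and register it so it can be imported.
stats = types.ModuleType("toy_stats")
exec(module_source, stats.__dict__)
sys.modules["toy_stats"] = stats

# A star import only picks up the names listed in __all__ ...
star_namespace = {}
exec("from toy_stats import *", star_namespace)

# ... while the base class is still available through an explicit import.
from toy_stats import StatisticalElement, Distribution
is_statistical = isinstance(Distribution(), StatisticalElement)
```

This keeps the base class out of any namespace populated by a star import while still allowing `isinstance` checks for code that imports it explicitly.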

-    if type(element) not in Store.registry[backend]:
+    eltype = type(element)
+    if (eltype not in Store.registry[backend] and
+        all(eltype.__name__ != d.pattern for d in Compositor.definitions)):

@jlstevens

jlstevens Oct 31, 2017

Member

How come we didn't need this before when the compositor was used?

@philippjfr

philippjfr Oct 31, 2017

Member

I can actually remove this again, it was needed when I didn't define plotting class stubs for these elements, which I then restored because I realized options wouldn't work correctly without them.

@jlstevens

jlstevens Oct 31, 2017

Member

Ok, removing this would be good in that case.

if any(len(c._pattern_spec) == 1 for c in Compositor.definitions):
    obj = obj.map(lambda obj: Compositor.collapse_element(obj, mode='data',
                                                          backend=backend),
                  [Element])

@jlstevens

jlstevens Oct 31, 2017

Member

Maybe this stuff should be provided by Compositor itself? (i.e. CompositeOverlay vs Element)

@jlstevens

jlstevens Oct 31, 2017

Member

I also think this should be done by compositor as you are accessing an underscore attribute, namely _pattern_spec.

@jlstevens

jlstevens Oct 31, 2017

Member

Maybe something like Compositor.map(obj, backend)?

new_ids = tuple(overlay.traverse(lambda x: id(x), [spec_fn]))
if new_ids == prev_ids:
    return overlay
prev_ids = new_ids

@jlstevens

jlstevens Oct 31, 2017

Member

I am happy that we have StatisticalCompositorTest but I still don't yet have a mental model of how this code extends how Compositor works.

@philippjfr

philippjfr Oct 31, 2017

Member

I'll quickly describe it:

a) Compositors can now be applied to individual elements not just overlays.
b) Compositors are applied iteratively until there are no more matches in the compositor definitions (needed to reduce Overlays with multiple elements that should be transformed)
c) If transfer_options is true, the options are transferred from the input element to the output element. Additionally plot options that also apply to the operation are transferred there.
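The three points above can be sketched with a toy model. This is not the Compositor implementation: `Element`, `Definition`, `collapse`, `op_params` and the placeholder `kde` operation are all hypothetical names, and options are modelled as a plain dictionary on each element.

```python
class Element:
    """Toy element: a type name, some data and a dict of options."""
    def __init__(self, kind, data, options=None):
        self.kind = kind
        self.data = data
        self.options = dict(options or {})

def kde(element, bandwidth=1.0):
    # Placeholder operation: records its input instead of computing a KDE.
    return Element("Area", {"source": element.data, "bandwidth": bandwidth})

class Definition:
    """Toy compositor definition matching elements by type name."""
    def __init__(self, pattern, operation, op_params=(), transfer_options=False):
        self.pattern = pattern
        self.operation = operation
        self.op_params = op_params  # option names the operation accepts
        self.transfer_options = transfer_options

def apply_definition(definition, element):
    kwargs = {}
    if definition.transfer_options:
        # (c) options that are also operation parameters go to the operation
        kwargs = {k: v for k, v in element.options.items()
                  if k in definition.op_params}
    result = definition.operation(element, **kwargs)
    if definition.transfer_options:
        # (c) the remaining options move from the input to the output
        result.options.update({k: v for k, v in element.options.items()
                               if k not in definition.op_params})
    return result

def collapse(obj, definitions):
    # (a) accept a single element as well as a list standing in for an overlay
    single = isinstance(obj, Element)
    elements = [obj] if single else list(obj)
    changed = True
    while changed:  # (b) repeat until no definition matches any element
        changed = False
        for i, el in enumerate(elements):
            for d in definitions:
                if d.pattern == el.kind:
                    elements[i] = apply_definition(d, el)
                    changed = True
                    break
    return elements[0] if single else elements

definitions = [Definition("Distribution", kde, op_params=("bandwidth",),
                          transfer_options=True)]
overlay = [Element("Distribution", [1, 2, 3], {"bandwidth": 0.3, "color": "red"}),
           Element("Curve", [4, 5]),
           Element("Distribution", [6, 7], {"color": "blue"})]
result = collapse(overlay, definitions)
lone = collapse(Element("Distribution", [8, 9]), definitions)
```

Note that the loop in this sketch only terminates if an operation's output no longer matches any pattern; the actual code in this PR compares element ids between passes to detect when nothing has changed.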

@jlstevens

jlstevens Oct 31, 2017

Member

Ok. I think this is all stuff to demonstrate with examples when we finally expose Compositor as a useful thing to users.

"not %s." % (group, dims))
dimensions[group] = [d if isinstance(d, Dimension) else Dimension(d) for d in dims]
return dimensions

@jlstevens

jlstevens Oct 31, 2017

Member

Just checking that this function is confined to core....

@philippjfr

philippjfr Oct 31, 2017

Member

Good thing to check but I'm fairly certain.

@philippjfr

philippjfr Oct 31, 2017

Member

Yes it is.

    The group identifier for the output of this particular compositor""")

kwargs = param.Dict(doc="""
    Optional set of parameters to pass to the operation.""")

transfer_options = param.Boolean(default=False, doc="""
    Whether to transfer the options from the input to the output.""")

@jlstevens

jlstevens Oct 31, 2017

Member

I think this is fine, but maybe the part that transfers parameters (i.e. plot options) to the operation should be a separate flag. Maybe transfer_parameters or propagate_params?

@jlstevens

Member

jlstevens commented Oct 31, 2017

After a few more comments above are addressed, I'm happy to merge.

@philippjfr

Member

philippjfr commented Oct 31, 2017

Okay all comments addressed, just waiting on tests now.

@philippjfr philippjfr removed the WIP label Oct 31, 2017

@jlstevens

Member

jlstevens commented Oct 31, 2017

Tests have passed. Merging!

@jlstevens jlstevens merged commit 77b92f0 into master Oct 31, 2017

4 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
continuous-integration/travis-ci/push: The Travis CI build passed
coverage/coveralls: Coverage increased (+0.9%) to 80.739%
s3-reference-data-cache: Test data is cached.

@philippjfr philippjfr added this to the v1.9 milestone Nov 2, 2017

@philippjfr philippjfr deleted the bokeh_stats_elements branch Nov 2, 2017

@pyup-bot pyup-bot referenced this pull request Nov 3, 2017

Closed

Update holoviews to 1.9.0 #104

@pyup-bot pyup-bot referenced this pull request Nov 13, 2017

Closed

Update holoviews to 1.9.1 #120

@pyup-bot pyup-bot referenced this pull request Dec 12, 2017

Merged

Update holoviews to 1.9.2 #139
