New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeseries operations #1172

Merged
merged 10 commits into from Mar 6, 2017

Conversation

Projects
None yet
3 participants
@philippjfr
Member

philippjfr commented Mar 3, 2017

Some basic operations to work with timeseries, the outlier detection is very useful in combination with datashader because it lets you get hover information about the outliers. Needs tests and support for datetime types.

Until we sort out input/output specifications on operations, I'm handling unpacking of (Nd)Overlays with a .map call in the _process method.

I've also added a streams parameter to ElementOperation by default. It was already supported API and I've found myself wanting to attach custom streams (based on paramnb) to existing ElementOperations and not being able to. Here's the pattern I'm talking about, in the following code there are two streams one for the ranges shared amongst the data loading, datashading and decimate operations, and self which is a class defined for paramnb which controls the rolling window and sigma parameters of the smoothing and outlier detection operations:

stream = RangeXY()

# Define DynamicMap to load dataset based on stream parameters
dmap = hv.DynamicMap(self.load_data, kdims=[], streams=[self])

# Apply rolling mean with the window controlled by stream 
smoothed = rolling(dmap, streams=[self])

# Apply steps interpolation
smoothed = interpolate_curve(dmap)

# Datashade based on ranges
smoothed = datashade(smoothed, cmap=get_cmap('Set1'),
                     aggregator=ds.count_cat('Conf'),
                     width=675, height=300, streams=[stream])

# Compute outliers with window and sigma controlled by stream
outliers = rolling_outlier_std(dmap, streams=[self])

# Decimate outliers only based on current range
outliers = decimate(outliers, streams=[stream])

# Overlay dynamically smoothed, interpolated, datashaded timeseries
# with dynamically decimated outliers
smoothed * outliers

@philippjfr philippjfr added the WIP label Mar 3, 2017

@jlstevens

This comment has been minimized.

Member

jlstevens commented Mar 3, 2017

The functionality here is great!

@philippjfr @jbednar That said, this PR illustrates one worry I have with operations - the number of possible operations you can imagine implementing is unbounded. Once we split out the plotting backends, are operations really part of the core HoloViews?

I worry that even a set of general scientific set of operations could keep growing forever, nevermind any domain specific stuff. I am tempted to say that the core operations included with HoloViews should only be those used by the core codebase.

We've already started adding operations that depend on third party libraries (e.g datashader) and once we split out plotting backends, you'll need to install things like holoviews-bokeh and holoviews-mpl. Why not extend this concept to operations and have things like holoviews-datashader, holoviews-timeseries and holoviews-image-tools? We can always have a metapackage that includes everything e.g holoviews-all.

In short, I strongly feel we need a plan to keep what is in core HoloViews well bounded allowing us to split off more open ended things like plotting backends and operations elsewhere where they can grow more freely. This would make sense to consider for HoloViews 2.0.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Mar 3, 2017

That said, this PR illustrates one worry I have with operations - the number of possible operations you can imagine implementing is unbounded. Once we split out the plotting backends, are operations really part of the core HoloViews?

I'd argue they are unless they are clearly domain specific, which I don't think is the case here. I actually think there are other operations that are far more domain specific than these. If holoviews-core is about working with data then I don't mind shipping other operations with it. The main issue I have with splitting out all operations is that they are actually user level API, which means users will have to import them from various random places. Plotting backends don't suffer from the same issue because they can be loaded invisibly in the background without having the user execute awkward imports.

That said I'd be happy to discuss splitting them out if we have a good plan for them, and there is more than two or three operations per extension (like there would currently be if we split out holoviews-datashader or holoviews-timeseries).

@philippjfr

This comment has been minimized.

Member

philippjfr commented Mar 3, 2017

Now working with the dask interface without having to cast the data so we should be able to apply these operations nicely to huge timeseries.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Mar 3, 2017

The main issue I have with splitting out all operations is that they are actually user level API, which means users will have to import them from various random places.

That's not necessarily true as you can imagine trying to import the various operation packages into the normal place (if available). Not saying that is a brilliant idea though!

@jbednar

This comment has been minimized.

Member

jbednar commented Mar 3, 2017

@philippjfr, this looks great; let's merge it!

@jlstevens , I don't think the operations we have now or can reasonably anticipate would add up to be worthy of the overhead of maintaining and installing a separate package, per operation or group of operations. Given the amount of code involved, the extra boilerplate and tracking required seems like it would make life worse, not better.

That said, it certainly is important to be careful about scope, and it is also important to think about future maintainability. In this context, is it possible for us to define a rigorous interface that operations use and staunchly defend that, rather than trying to handle it at the packaging level? The important thing is to avoid having a combinatorial spaghetti mess, where we when we change one thing in the core we have to update lots of operations that do things that only one of us knows anything about. Seems like this could be avoided by focusing on having a clear interface for operations to use, so that as long as that doesn't change, other changes to the core don't make us have to revisit each of the disparate operations supported.

Overall, I think operations really exploit the power of HoloViews to provide high-level analysis easily and safely, so I do think we should make them maintainable and feasible and something we can encourage rather than something to fear.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Mar 3, 2017

I'm still not sure how we will define the scope of what operations do or don't belong into HoloViews.

What we can do is establish a high bar for the things that do belong. In particular, I would like to see some decent documentation/notebooks describing all the operations we have, justifying their cross-domain generality and giving examples.

@jbednar

This comment has been minimized.

Member

jbednar commented Mar 3, 2017

Ok, I think that should be simple enough. I don't think there is any issue with current operations not being suitable across domains; these are all easily applicable across domains as far as I can remember.

class rolling_outlier_std(ElementOperation):
"""
Detect outliers for using the standard devitation within a rolling window.

This comment has been minimized.

@jbednar

jbednar Mar 3, 2017

Member

Detect outliers using the standard deviation

@philippjfr philippjfr added feature and removed WIP labels Mar 4, 2017

@philippjfr

This comment has been minimized.

Member

philippjfr commented Mar 4, 2017

Added some tests now, so this should be ready to merge. I'll also put together a notebook covering these operations, which I'll add to contrib.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Mar 4, 2017

That said, it certainly is important to be careful about scope, and it is also important to think about future maintainability. In this context, is it possible for us to define a rigorous interface that operations use and staunchly defend that, rather than trying to handle it at the packaging level? The important thing is to avoid having a combinatorial spaghetti mess, where we when we change one thing in the core we have to update lots of operations that do things that only one of us knows anything about. Seems like this could be avoided by focusing on having a clear interface for operations to use, so that as long as that doesn't change, other changes to the core don't make us have to revisit each of the disparate operations supported.

Overall operations have changed very little since their first inception, particularly ElementOperations. One thing that's bugging me and others (see #1107) is that by default they do not unpack NdOverlays so if you want that to be supported you need to add some special handling in the _process method like I have done here. Having discussed this with Jean-Luc, we both agree this will have to be handled by allowing operations to declare both their input and output types. It's still unclear to both of us how best to specify nested type declarations if at all, so that's an issue that still needs discussing. In the meantime I think it's fine to resort to the _process/_apply pattern used here as that's always going to be compatible. So really what I'm saying is, even if retrofit type declarations on operations in the future, old operations without them should continue working.

@jbednar

This comment has been minimized.

Member

jbednar commented Mar 4, 2017

Well, we wanted to add output types to param for other reasons anyway...

import param
from ..core import ElementOperation, Element
from ..core.util import pd

This comment has been minimized.

@jlstevens

jlstevens Mar 5, 2017

Member

I think it might be better to make this explicit i.e show that pd is None if pandas isn't available.

Minimum sigma before a value is considered an outlier.""")
def _apply(self, element, key=None):
sigma, window = self.p.sigma, self.p.rolling_window

This comment has been minimized.

@jlstevens

jlstevens Mar 5, 2017

Member

I think it might be worth checking if pd is None here and raising a suitable error (cannot proceed without pandas).

@@ -575,7 +575,7 @@ class decimate(ElementOperation):
The x_range as a tuple of min and max y-value. Auto-ranges
if set to None.""")
def _process(self, element, key=None):
def _apply(self, element, key=None):

This comment has been minimized.

@jlstevens

jlstevens Mar 5, 2017

Member

Not sure I like the name _apply. Even though _process is supposed to process elements already, how about _process_element which processes elements, excluding Overlays/NdOverlays.

This comment has been minimized.

@philippjfr

philippjfr Mar 5, 2017

Member

Unfortunately we already have a process_element method, which just does this:

    def process_element(self, element, key, **params):
        """
        The process_element method allows a single element to be
        operated on given an externally supplied key.
        """
        self.p = param.ParamOverrides(self, params)
        return self._process(element, key)

This comment has been minimized.

@jlstevens

jlstevens Mar 5, 2017

Member

Yeah, I just noticed it here. Why aren't we using this method then?

This comment has been minimized.

@philippjfr

philippjfr Mar 5, 2017

Member

Because it still doesn't handle this case, and changing it would not be backward compatible. It's badly named is all.

This comment has been minimized.

@jlstevens

jlstevens Mar 5, 2017

Member

How about _process_layer then? At least this is supposed to be temporary, but best pick a meaningful name in case it isn't as temporary as we would like...

This comment has been minimized.

@philippjfr

philippjfr Mar 5, 2017

Member

Sure, I don't mind.

streams = param.List(default=[], doc="""
List of streams that are applied if dynamic=True, allowing
for dynamic interaction with the plot.""")

This comment has been minimized.

@jlstevens

jlstevens Mar 5, 2017

Member

Seems fine but we do need to document this before before 1.7.

This comment has been minimized.

@philippjfr

philippjfr Mar 5, 2017

Member

Yes, we'll have to add documentation about dynamic operations to our DynamicMap and streams sections.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Mar 6, 2017

Looks good and the tests are passing. Merging.

@jlstevens jlstevens merged commit 2047dd3 into master Mar 6, 2017

4 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage remained the same at 78.393%
Details
s3-reference-data-cache Test data is cached.
Details

@jbednar jbednar deleted the timeseries_operations branch Mar 22, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment