New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement lazy evaluation for DynamicMap methods #422

Closed
philippjfr opened this Issue Jan 27, 2016 · 31 comments

Comments

Projects
None yet
3 participants
@philippjfr
Member

philippjfr commented Jan 27, 2016

After a lengthy discussion on how to work with very large datasets with @jlstevens, we realized that DynamicMap callbacks and the lazy evaluation they allow gets us most of the way there but there's some crucial things missing for usability. In particular the methods on DynamicMap are all inherited from HoloMap, which means they work on the assumption that all the data is already loaded. This means that currently when using a method on a DynamicMap it will apply the operation only to the elements that's already loaded, which will in the best case do nothing but will usually result in exceptions when the next Element is loaded.

What I propose is to subclass the methods on DynamicMap to generate closures, which will apply the requested operation when a new key is requested. This will allow slicing, sampling, reducing, aggregating and overlaying operations to be applied at runtime. Even operations could trivially be modified to simply define a closure when passed a DynamicMap.

This will be a huge improvement for usability and make exploration and analysis of large datasets incredibly easy. The user could specify the data transformations to be applied directly on the object rather than having to write complex callbacks.

Here's a small toy example:

Say we have an array of temperatures on disk which is indexed by Date x Atmospheric Height x Latitude x Longitude.

Currently required implementation:

def get_contours(key):
    array, metadata = load_data_from disk(key)
    img = hv.Image(array, **metadata).select(Latitude=(0, 30), Longitude=(50, 80))
    return hv.contours(img)

def get_curve(key):
    array, metadata = load_data_from disk(key)
    return hv.Image(array, **metadata).reduce(Latitude=np.mean)

kdims = [hv.Dimension('Date', ...), hv.Dimension('Atmospheric Height', ...)]
contours = hv.DynamicMap(get_contours, kdims=kdims)
curve = hv.DynamicMap(get_curve, kdims=kdims)
contours + curve

With lazy evaluation:

def get_data(key):
    array, metadata = load_data_from disk(key)
    return hv.Image(array, **metadata)

dmap = hv.DynamicMap(get_data, kdims=[hv.Dimension('Date', ...),
                     hv.Dimension('Atmospheric Height', ...)])

hv.contours(dmap.select(Latitude=(0, 30), Longitude=(50, 80))) +\
dmap.reduce(Latitude=np.mean)

The lazy evaluation example really demonstrates the strengths of this approach, you can define the data loading operation once and then interactively explore the whole dataset without having to define more and more functions. I hope this gets the idea across, I think this is relatively little work for a pretty huge payoff, this would open up a lot of the functionality in HoloViews up to much, much larger datasets.

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 27, 2016

Sounds like an excellent idea to me. How will DynamicMap know that get_data returns an Image? Or does it not need to know that, as it will simply apply whatever operations are requested onto whatever it returns? And if so, does that mean that get_data can dynamically return different Element types?

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 27, 2016

How will DynamicMap know that get_data returns an Image? Or does it not need to know that, as it will simply apply whatever operations are requested onto whatever it returns?

Correct, it simply wouldn't care, if the operation you applied doesn't work for the return type it will complain at evaluation.

And if so, does that mean that get_data can dynamically return different Element types?

No, get_data has to consistently return the same Element type because the plotting code needs to initialize the appropriate artist/glyph. But you wouldn't want to have it return different types because then you can't apply any operations to it. The proposal does however allow for lazy overlaying so you can load the data into two different dynamicmaps and overlay them.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 27, 2016

Having outlined the idea there are various methods that are problematic because they can't easily be lazily evaluated:

  • range could be made to work by working in chunks and with some caching, which would allow for our regular normalization options to be applied.
  • dimension_values for the value dimension would mean returning a 1D array of literally all the datapoints, which isn't really desirable.
  • groupby is a tough one but could be made to work if we had proper support for constant dimensions, which would also allow you to explode a DynamicMap into a GridSpace or NdLayout of DynamicMaps using .layout and .grid.
  • collapse is also tough because it would require more chunking so that only a certain amount of the data is in memory at the same time.

Everything else operates on individual elements and should thus be relatively easy to support.

@philippjfr philippjfr added the feature label Feb 4, 2016

@philippjfr philippjfr added this to the v1.5.0 milestone Feb 4, 2016

@philippjfr

This comment has been minimized.

Member

philippjfr commented Feb 10, 2016

Just a quick note that .relabel with depth > 1 should also lazily evaluate on generated Elements.

@philippjfr philippjfr modified the milestones: v1.6.0, v1.5.0 Apr 20, 2016

@philippjfr

This comment has been minimized.

Member

philippjfr commented May 9, 2016

As a summary, these are the methods that now support lazy evaluation:

  • __call__
  • __mul__
  • groupby
  • overlay, grid, layout
  • relabel with depth >= 1
  • redim
  • __getitem__/select (deep slicing and deep select)
  • hist

Methods that aren't yet implemented:

  • map
  • collate

Methods that will never be implemented:

  • dframe
  • table
  • dimension_values for non-key dimensions
  • split_overlay
  • collapse
  • sample
  • reduce/aggregate
@jlstevens

This comment has been minimized.

Member

jlstevens commented Jan 30, 2017

I would prefer it if these methods that cannot be made lazy simply didn't exist on DynamicMap. Even if they are made to always raise NotImplementedError, the fact they are defined and tab-complete but don't actually work is rather annoying.

Sadly, I don't know a good way of 'undefining' inherited methods (see this stackoverflow post) so the only suggestion is that these methods are introduced as a mixin class for HoloMap only, with a new intermediate class shared by HoloMap and DynamicMap.

The prospect of more classes isn't one I am particularly pleased with but I do think we should try to solve the issue of these dud methods somehow.

Edit: I know that IPython annoyingly ignores the order of __dir__ but maybe it doesn't ignore the set of items returned by it. In that case we could use this along with the trick proposed in the stackoverflow post of declaring these dud methods as properties that raise AttributeError...

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 31, 2017

A shared base class seems like the clean way to do this. Undefining those methods will completely violate the meaning of class inheritance, which is really not a good idea---the subclass then no longer satisfies "is a", making it very hard to reason about.

Coming up with a name for the shared base class seems like the trickiest part, as I don't know what defines the set of things that make sense for both HoloMaps and DynamicMaps. But having such a name will be useful, because there are already a variety of cases where we need to refer to both of them without making a distinction. "Map" is the clearly shared bit, but doesn't mean much (which makes both of those names slightly dubious already).

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 31, 2017

A shared base class seems like the clean way to do this.

Strongly agree, we decided on ViewableMap as the baseclass.

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 31, 2017

That name sounds reasonable, though of course a Curve is also a viewable map (from x to y), as is every other HoloViews Element with key and value dimensions. Can't think of anything better, though.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Jan 31, 2017

We already have a ViewableElement base class of Element which is why I suggested ViewableMap: this scheme seems fairly consistent to me.

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 31, 2017

Sure.

Fundamentally, what all three of these classes share is that they (a) contain at least one additional value dimension beyond those mapped on to the screen at any one instant, and (b) specifically allow those other dimension(s) to be selected or iterated over or animated to choose the one specific value to be displayed at that instant. I.e., for dimensions n>s, dimensions 1 to s are mapped onto the screen in some way, and dimensions s+1 to n are selected in some way, whether by widgets or iteration.

So ViewableMap is a bit backwards, since compared to a regular Element, what's different about these classes is that they have mappings that are not viewable at any one time, only if iterated over. So IterableMap or SelectableMap or PlayableMap would also be true. Anyway, I don't much like any of these names; nothing is really all that accurate or compelling, mainly because the Map bit is not actually the thing that should be shared between all three, as that's not the distinguishing bit compared to Elements or other Containers.

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 31, 2017

But of course, it's too late to change HoloMap and DynamicMap...

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 31, 2017

ViewableMap should be read as "Map of Viewables" not as "Map that is Viewable".

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 31, 2017

That would work if Elements were called Viewables, but they aren't.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 31, 2017

They are Viewable types along with Overlays.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Jan 31, 2017

Twice I was about to echo Philipp's reply but he beat me to it ;-p

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 31, 2017

I don't know what you mean; there doesn't seem to be anything called Viewable in the hv source code, apart from one comment:

~/holoviews/holoviews> \grep Viewable *.py */*.py | \grep -v ViewableElement
plotting/renderer.py:        Given a HoloViews Viewable return a corresponding plot instance.
@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 31, 2017

Right, the actual name is ViewableElement.

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 31, 2017

So we are meant to read ViewableMap as "a map of ViewableElements"? Then it all makes sense, but I have to say that it's quite a stretch! You didn't like ElementMap, in that case? "Element" is normally a noun, while "Viewable" is normally an adjective, so ElementMap naturally leads to the parsing you had in mind.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 31, 2017

So we are meant to read ViewableMap as "a map of ViewableElements"? Then it all makes sense, but I have to say that it's quite a stretch! You didn't like ElementMap, in that case?

Right, ViewableElement could really be renamed to Viewable. Basically Overlay types are also ViewableElement types, when I'd argue they should just be Viewable types since they really are not Elements. That's also why ElementMap isn't right.

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 31, 2017

Well, you don't often use ViewableElement anywhere in the documentation, so I would think you could change it to Viewable now. And then one could replace many of the occurrences of Element in the tutorials to say the base class Viewable instead, i.e. anything that could either be an Element or an Overlay. At that point, ViewableMap would make sense (though still awkward due to the grammatically surprising use of Viewable as a noun (a la Smalltalk)).

That said, seems to me that a Layout is also viewable, but is not a ViewableElement, which argues for keeping the current name ViewableElement. The parent class of HoloMap and DynamicMap could then even be ViewableElementMap; it's long and ugly, but is not a name we need all that often, and it's more obvious how to parse that.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 31, 2017

Sure, for me the more descriptive the better, this class should definitely not make it beyond the namespace of the core.spaces module.

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 31, 2017

For code purposes, that's true, but for the documentation, we do need a term that unifies both HoloMap and DynamicMap, and ViewableElementMap would then be that term. Still, even though it's clunky, it probably helps explanations be clearer, because when we first introduce it we can explain why it's called that, and that there are two subtypes, which should actually help people understand HoloMap and DynamicMap better.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Jan 31, 2017

I agree ViewableElementMap is ugly and descriptive and fine for coding purposes (and I agree it should not escape core.spaces) but I'm not sure I would be happy seeing that name leak into our user docs.

The term I would be happy to use to describe both HoloMap and DynamicMap is NdMapping without mentioning ViewableElementMap. We can still use the term 'viewable' if we wish without mentioning the class names of things that users really don't need to know about...

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 31, 2017

But NdMapping doesn't convey anything about what is mapped. It's fine to mention that these are both NdMappings, but it doesn't help explain how they relate to Elements or to other containers, it just explains how they do mapping.

ViewableElementMap does explain what is mapped. There are ViewableElements, which are things that can be viewed in a single screen-based coordinate system at a single time, of which there are many types (Curve, Image, etc., and various kinds of Overlays thereof). Then there are ViewableElementMaps, which are collections of ViewableElements that select one coordinate from these mapped extra dimensions to display at any one time. ViewableElements and ViewableElementMaps share much in common -- both have one screen-based coordinate system, both take up the same space on the screen, and both can be used interchangeably in a Layout, but one is fully rendered on screen, and what's rendered onscreen for the other is just one particular coordinate's data at any one time, from some larger mapping.

To me this makes much more sense than our current documentation, where it's nearly impossible to see or remember how HoloMaps relate to Elements. Not being as deeply involved in the code and principles as you two, I think I'm actually more trustworthy in this particular case than you two -- I know very clearly what's confusing, whereas you got over your confusion years ago. :-)

@philippjfr

This comment has been minimized.

Member

philippjfr commented Mar 4, 2017

So I had a go implementing the final outstanding dynamic method, map and ran into trouble, if I allow a map that applies both to the DynamicMap and its items I ended up with infinitely nested DynamicMaps (well up to the max_recursion limit), here's my current (slightly incorrect) implementation to avoid this:

    def map(self, map_fn, specs=None, clone=True):
        """
        Recursively replaces elements using a map function when the
        specification applies.
        """
        if specs and not isinstance(specs, list): specs = [specs]
        applies = specs is None or any(self.matches(spec) for spec in specs)

        deep_mapped = self.clone() if clone else self
        if applies:
            return map_fn(deep_mapped)
        for k, v in self.items():
            deep_mapped[k] = v.map(map_fn, specs, clone)

        from ..util import Dynamic
        def dynamic_map(obj):
            return obj.map(map_fn, specs, clone)
        return Dynamic(deep_mapped, shared_data=True, operation=dynamic_map)

Will need to think about it more.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Apr 10, 2017

I've just done a final review of all the DynamicMap methods, and I think we're very close. Once #1240 is merged I believe all methods that can be made dynamic have been implemented that way. I also don't mind retaining the list of non-dynamic methods that apply to the cache. However there are three methods which absolutely cannot be supported, which are the top three in the list below. My suggestion would simply raise NotImplementedErrors for those cases.

# Should be disabled
add_dimension
drop_dimension
reindex

# General methods
dimensions
get
get_dimension
get_dimension_index
get_dimension_type
get_param_values
matches
pprint
params
warning

# Applies to cache (i.e. not dynamic)
collapse
dframe
dimension_values
items
info
keys
reduce
sample
split_overlays
table
traverse
update
values

# Dynamically implemented
__call__
__mul__
__getitem__
collate
grid
groupby
hist
layout
map
overlay
redim
relabel
select

# DynamicMap method
event
reset

One final thing I need to do is ensure that the __call__ is working correctly as I've encountered some weird interactions with it.

@jbednar

This comment has been minimized.

Member

jbednar commented Apr 10, 2017

Did there not end up being a shared base class so that these methods could truly just not be defined?

@philippjfr

This comment has been minimized.

Member

philippjfr commented Apr 10, 2017

Did there not end up being a shared base class so that these methods could truly just not be defined?

These methods are defined all the way down at the MultiDimensionalMapping level, so I don't think there's a clean way not to define them.

@jbednar

This comment has been minimized.

Member

jbednar commented Apr 10, 2017

Ok, then...

@philippjfr

This comment has been minimized.

Member

philippjfr commented Apr 10, 2017

Since merging #1262 this is now finally complete.

@philippjfr philippjfr closed this Apr 10, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment