Added support for dynamic groupby on all data interfaces #711

philippjfr · 2016-06-04T20:48:33Z

A dynamic version of groupby proved exceptionally useful for the large datasets we can now handle via the iris interface. However, it can trivially be implemented in a general way using select, which is what I've done here.
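
As a rough illustration of the idea (not the actual HoloViews implementation), a dynamic groupby can be modelled as a callback that runs a select on demand for each requested key. The names `select_rows` and `lazy_groupby` are hypothetical, and rows are plain dicts rather than a Dataset:

```python
def select_rows(rows, **constraints):
    """Return the rows whose columns equal every given constraint."""
    return [row for row in rows
            if all(row[col] == val for col, val in constraints.items())]


def lazy_groupby(rows, dimensions):
    """Return a callback that selects one group on demand.

    Nothing is computed up front: each call filters the rows for the
    requested key, which is the essence of a dynamic groupby.
    """
    def load_subset(*key):
        if len(key) != len(dimensions):
            raise ValueError("expected one key value per grouped dimension")
        return select_rows(rows, **dict(zip(dimensions, key)))
    return load_subset


rows = [
    {"Country": "UK", "Year": 1995, "Value": 0.1},
    {"Country": "UK", "Year": 1996, "Value": 0.2},
    {"Country": "USA", "Year": 1995, "Value": 0.3},
]
get_group = lazy_groupby(rows, ["Country", "Year"])
uk_1996 = get_group("UK", 1996)      # one matching row
usa_1996 = get_group("USA", 1996)    # no matching rows: an empty group
```

Because each group is only materialized when requested, the cost of the groupby is paid per key rather than for the whole cartesian product of keys.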

However, there are also some cases where the behavior is not well defined. When you apply a dynamic groupby to a columnar dataset, the data can be sparse, which means that some portions of the cartesian grid the DynamicMap defines can be empty. A simple example would be something like this:

import holoviews as hv

# Three columns (Country, Year, Value); the ('USA', 1996) combination is absent.
dataset = hv.Dataset((['UK', 'UK', 'USA'], [1995, 1996, 1995], [0.1, 0.2, 0.3]),
                     kdims=['Country', 'Year'], vdims=['Value'])
dmap = dataset.groupby(['Country', 'Year'], dynamic=True)

:DynamicMap   [Country,Year]

assert len(dmap['USA', 1996]) == 0

Here the entry for USA and 1996 did not exist, so it returned an empty Element. Alternatively, it could raise a KeyError. However, since the semantics of a DynamicMap mean that anything inside the space defined by the Dimensions should be addressable, I think returning an empty Element might be more appropriate. When you access a value that was not defined in the original Dataset, though, it should definitely raise a KeyError:

dmap['Canada', 1955]

So I'll have to make sure that DynamicMap.__getitem__ (and select), when in bounded mode, check that the requested key is among the defined values, not just within the bounds.
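
One way to picture that check, as a sketch only: a key is accepted when every component is one of the values defined for its dimension, and rejected with a KeyError otherwise. The helpers `check_key` and `within_bounds` are hypothetical and are not part of the HoloViews API:

```python
def check_key(key, defined_values):
    """Raise KeyError unless every key component is a defined value.

    `defined_values` holds, for each dimension, the set of values that
    occur in the original data.
    """
    for component, allowed in zip(key, defined_values):
        if component not in allowed:
            raise KeyError(key)
    return key


def within_bounds(value, low, high):
    """Bounds-only test, shown for contrast."""
    return low <= value <= high


# Values per dimension, mirroring the example dataset above.
defined_values = [{"UK", "USA"}, {1995, 1996}]

# ('USA', 1996) has no data but lies inside the defined space: accepted.
accepted = check_key(("USA", 1996), defined_values)

# ('Canada', 1955) uses values that were never defined: rejected.
try:
    check_key(("Canada", 1955), defined_values)
    rejected = False
except KeyError:
    rejected = True

# A bounds check alone is too weak: 1995.5 is within [1995, 1996]
# but is not one of the defined years.
passes_bounds = within_bounds(1995.5, 1995, 1996)
is_defined = 1995.5 in defined_values[1]
```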

philippjfr · 2016-06-08T16:15:48Z

Requires review and discussion about the behavior described above.

jlstevens · 2016-06-08T23:12:14Z

By an empty element, do you mean an element without any data?

I remember discussing empty elements with you ages ago. If that is indeed what you mean, then that is the right behaviour, as long as all the visualization code is happy to process elements without data in them.

philippjfr · 2016-06-08T23:15:11Z

> By an empty element, do you mean an element without any data?

Yes, basically Elements containing a length zero array or equivalent. We'd likely have to double check that all plots will handle them correctly though.

jlstevens · 2016-06-08T23:35:22Z

Maybe for a separate issue, but I would like to say we always support empty elements. To do this, it would be good to automatically test that empty elements always work. That said, I'm not sure that what counts as an 'empty element' is always well defined. I suppose it is any valid data structure (shape, type etc) with no data in it? Though, how could you have an empty MxN numpy array for instance?

For instance, we could have arranged it so data=None could be supported everywhere, which could be useful but would be an orthogonal feature to empty data.

I like the idea of empty elements and have wanted to support them for ages. I'm just not entirely sure that their semantics (i.e. how they should be declared) are entirely defined and unambiguous.

philippjfr · 2016-06-08T23:38:26Z

> Maybe for a separate issue, but I would like to say we always support empty elements. To do this, it would be good to automatically test that empty elements always work. That said, I'm not sure that what counts as an 'empty element' is always well defined. I suppose it is any valid data structure (shape, type etc) with no data in it? Though, how could you have an empty MxN numpy array for instance?

You can define an array of shape (0, 0), so I don't think it's an issue. In the sparse data formats the shape is (0, D), which also works fine. Just need to make sure the plots don't choke on it.
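
For reference, a minimal NumPy sketch of the two shapes mentioned here. The choice of D = 3 columns is an arbitrary assumption for illustration:

```python
import numpy as np

# An empty gridded array: both axes have length zero.
grid = np.empty((0, 0))

# An empty columnar array: zero rows, D = 3 columns (D chosen arbitrarily).
columns = np.empty((0, 3))

# Shape is preserved, while size and len() both report that there is no data.
grid_info = (grid.shape, grid.size, len(grid))
columns_info = (columns.shape, columns.size, len(columns))
```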

jlstevens · 2016-06-08T23:40:15Z

I suppose the simplest solution might be to define the semantics as 'an empty element is any element with zero length data'. Very clear, even if you can always make data of the right shape etc that is empty. The assumption though is that len makes sense for all the data structures we support.

Otherwise, you can just declare it appropriately as you suggest.

philippjfr · 2016-06-08T23:43:11Z

> I suppose the simplest solution might be to define the semantics as 'an empty element is any element with zero length data'. Very clear, even if you can always make data of the right shape etc that is empty. The assumption though is that len makes sense for all the data structures we support.

Length on Elements using the data interfaces is always defined as the total number of samples, so for a grid-based interface that's the product of the shape and in the column-based format that's the number of rows. Checking for an empty Element should therefore be easy; the only problem is that some artists in matplotlib and bokeh might not allow being initialized with an empty array.
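
A small sketch of that definition, assuming the length rules as just described. The helper names `gridded_length`, `columnar_length` and `is_empty` are hypothetical and only illustrate the arithmetic:

```python
import math


def gridded_length(shape):
    """Total number of samples in a grid: the product of its shape."""
    return math.prod(shape)


def columnar_length(rows):
    """Total number of samples in a column-based format: the row count."""
    return len(rows)


def is_empty(length):
    """An element is treated as empty when it holds zero samples."""
    return length == 0


full_grid = gridded_length((3, 4))       # 3 * 4 samples
empty_grid = gridded_length((0, 5))      # any zero-length axis empties the grid
empty_table = columnar_length([])        # no rows at all
```

Under this definition a single zero-length axis is enough to make a gridded element empty, so one `is_empty` test covers both kinds of interface.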

jlstevens · 2016-07-14T05:28:15Z

Looks good and dynamic is False by default so nothing should break unless the new feature is used. Merging.

github-actions · 2024-10-26T04:28:58Z

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

philippjfr added the in progress label Jun 4, 2016

philippjfr force-pushed the dataset_dynamic_groupby branch from 266480d to 11bdd6e June 6, 2016 15:02

philippjfr added the type: feature and tag: component: data labels Jun 6, 2016

philippjfr added 4 commits July 13, 2016 22:45

Added support for dynamic groupby on all data interfaces

56cff41

Implemented dropping static dimensions using iris reindex

ae47ae9

Handled scalar values in dynamic Dataset.groupby

0478e02

Fix for dynamic Dataset.groupby

733d329

philippjfr force-pushed the dataset_dynamic_groupby branch from 744f187 to 733d329 July 14, 2016 03:45

jlstevens merged commit 124019e into master Jul 14, 2016

philippjfr removed the in progress label Jul 14, 2016

philippjfr deleted the dataset_dynamic_groupby branch September 2, 2016 00:57

github-actions bot locked as resolved and limited conversation to collaborators Oct 26, 2024
