Added support for dynamic groupby on all data interfaces #711
Requires review and discussion about the behavior described above.
By an empty element, do you mean an element without any data? I remember discussing empty elements with you ages ago. If that is indeed what you mean, then that is the right behaviour, as long as all the visualization code is happy to process elements without data in them.
Yes, basically Elements containing a length-zero array or equivalent. We'd likely have to double-check that all plots handle them correctly, though.
Maybe for a separate issue, but I would like to say we always support empty elements. To do this, it would be good to automatically test that empty elements always work. That said, I'm not sure that what counts as an 'empty element' is always well defined. I suppose it is any valid data structure (shape, type, etc.) with no data in it? Though, how could you have an empty For instance, we could have arranged it so I like the idea of empty elements and have wanted to support them for ages. I'm just not entirely sure that their semantics (i.e. how they should be declared) are entirely defined and unambiguous.
You can define an array of shape (0, 0), so I don't think it's an issue. In the sparse data formats the shape is (0, D), which also works fine. Just need to make sure the plots don't choke on it.
I suppose the simplest solution might be to define the semantics as 'an empty element is any element with zero length data'. Very clear, even if you can always make data of the right shape etc. that is empty. The assumption though is that Otherwise, you can just declare it appropriately as you suggest.
Length on Elements using the data interfaces is always defined as the total number of samples: for a grid-based interface that's the product of the shape, and in the column-based format that's the number of rows. Checking for an empty Element should therefore be easy. The only problem is that some artists in matplotlib and bokeh might not allow being initialized with an empty array.
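To make that definition of length concrete, here is a minimal NumPy sketch. The helper names (`n_samples_gridded`, `n_samples_columnar`, `is_empty`) are hypothetical and only illustrate the rule; they are not the actual data interface API.

```python
import numpy as np

def n_samples_gridded(values):
    """Gridded data: the length is the product of the array's shape."""
    return int(np.prod(values.shape))

def n_samples_columnar(table):
    """Columnar data: the length is the number of rows."""
    return table.shape[0]

def is_empty(n_samples):
    """Under the 'zero length data' rule, empty just means length 0."""
    return n_samples == 0

empty_grid = np.empty((0, 0))    # empty gridded data
empty_table = np.empty((0, 3))   # empty columnar data with D = 3 columns
full_grid = np.zeros((4, 5))     # 4 x 5 grid, i.e. 20 samples

print(is_empty(n_samples_gridded(empty_grid)))     # True
print(is_empty(n_samples_columnar(empty_table)))   # True
print(n_samples_gridded(full_grid))                # 20
```

Whether a given matplotlib or bokeh artist accepts such an array would still need to be checked per plot type.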
Looks good, and dynamic is False by default, so nothing should break unless the new feature is used. Merging.
A dynamic version of groupby proved exceptionally useful for the large datasets we can now handle via the iris interface. However, it can trivially be implemented in a general way using select, which is what I've done here.
There are also some cases where the behavior is not well defined. When you apply a dynamic groupby to a columnar dataset, the data can be sparse, which means that some portions of the cartesian grid the DynamicMap defines can be empty. A simple example would be something like this:
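As a rough, library-free stand-in for that situation, the sketch below groups lazily by calling a select helper on demand. The `rows` data and the `select` and `LazyGroupby` names are hypothetical illustrations, not the HoloViews API or the code in this PR.

```python
from itertools import product

# Hypothetical sparse columnar data: (country, year, value) rows.
# There is no row for ("USA", 1996).
rows = [
    ("USA", 1995, 1.0),
    ("UK", 1995, 2.0),
    ("UK", 1996, 3.0),
]

def select(rows, country, year):
    """Return only the rows matching the requested key."""
    return [r for r in rows if r[0] == country and r[1] == year]

class LazyGroupby:
    """Computes each group on access instead of up front."""

    def __init__(self, rows):
        self.rows = rows
        self.countries = sorted({r[0] for r in rows})
        self.years = sorted({r[1] for r in rows})

    def keys(self):
        # The addressable space is the full cartesian grid of values,
        # even where the underlying data has no rows.
        return list(product(self.countries, self.years))

    def __getitem__(self, key):
        country, year = key
        return select(self.rows, country, year)

groups = LazyGroupby(rows)
print(groups["UK", 1996])    # [('UK', 1996, 3.0)]
print(groups["USA", 1996])   # [] (an empty group, not an error)
```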
Here the value entry for USA and 1996 did not exist, so it returned an empty Element. Alternatively it could raise a KeyError. However, the semantics of a DynamicMap mean that anything inside the space defined by the Dimensions should be addressable, so I think returning an empty Element might be more appropriate. When you access a value that was not defined in the original Dataset, on the other hand, it should definitely raise a KeyError:
So I'll have to make sure that DynamicMap.__getitem__ (and select), when in bounded mode, check that the requested key is in the defined values, not just within the bounds.
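The difference between a bounds check and a values check can be sketched in plain Python. `BoundedLookup` and its `callback` argument are hypothetical and only stand in for the idea; this is not how DynamicMap is implemented.

```python
class BoundedLookup:
    """Hypothetical stand-in for a bounded, lazily evaluated mapping."""

    def __init__(self, values, callback):
        self.values = list(values)   # explicitly defined key values
        self.callback = callback     # computes the item for a valid key

    def __getitem__(self, key):
        lo, hi = min(self.values), max(self.values)
        # A bounds check alone would accept any key between lo and hi.
        if not lo <= key <= hi:
            raise KeyError(f"{key!r} is outside the bounds ({lo}, {hi})")
        # The stricter check: the key must be one of the defined values.
        if key not in self.values:
            raise KeyError(f"{key!r} is not one of the defined values")
        return self.callback(key)

lookup = BoundedLookup([1995, 1997, 1999], callback=lambda year: f"data for {year}")
print(lookup[1997])   # data for 1997

try:
    lookup[1996]      # within the bounds, but never defined
except KeyError as err:
    print("KeyError:", err)
```

With only the first check, `lookup[1996]` would reach the callback even though 1996 was never a defined value.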