New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement dropping dimensions in gridded groupby #1154

Merged
merged 6 commits into from Feb 27, 2017

Conversation

Projects
None yet
3 participants
@philippjfr
Member

philippjfr commented Feb 26, 2017

When using the .to method to convert a gridded dataset into lower dimensional views it is possible to drop dimensions, e.g. to visualize the distribution of values in some space. Think for example of a 3D cube of temperatures (lat * lon * altitude) . I want to view the distribution of values for each altitude, dropping the lat and lon dimensions. Previously we supported this in some limited cases but there's a simple solution to the problem, when applying a groupby on a gridded dataset which drops certain dimensions we simply convert to a columnar format. This will only affect the .to method which is meant specifically to convert high-dimensional data into a lower dimensional view so I think it's consistent with the overall semantics.

@philippjfr philippjfr added the feature label Feb 26, 2017

@jbednar

This comment has been minimized.

Member

jbednar commented Feb 26, 2017

Sounds like the right thing to do, to me.

@philippjfr philippjfr requested a review from jlstevens Feb 26, 2017

@jlstevens

This comment has been minimized.

Member

jlstevens commented Feb 27, 2017

This makes sense for conversion to elements that work best with columnar data anyway - no need to throw away data unless really necessary (and in most cases, it shouldn't be necessary).

My main question is about the potential performance implications of this approach?

@jlstevens

This comment has been minimized.

Member

jlstevens commented Feb 27, 2017

The title of this PR should be updated as I think it is misleading - if I understand correctly, this PR is preserving data instead of dropping it!

@philippjfr

This comment has been minimized.

Member

philippjfr commented Feb 27, 2017

My main question is about the potential performance implications of this approach?

Unless your data is huge it's pretty fast, it only expands the dimensions that are left so there's no huge overhead there and even then you can use the dynamic groupby. In the end it's definitely better than applying the groupby and then throwing an exception because the Elements can't interpret the data.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Feb 27, 2017

Ok, I now see that these operations weren't working at all before (raising exceptions).

Note that it is good to preserve columns, even if they aren't used in the rendering/display. We might want to consider a utility that traverses a nested structure to strip out unused columns (e.g if the user decides they are happy to throw away data to save space). Might be worth filing as a feature request...

@philippjfr

This comment has been minimized.

Member

philippjfr commented Feb 27, 2017

Note that it is good to preserve columns, even if they aren't used in the rendering/display. We might want to consider a utility that traverses a nested structure to strip out unused columns (e.g if the user decides they are happy to throw away data to save space).

I think something like that might be a good idea, we should probably also look into how reindex behaves on the different interfaces. In some cases it explicitly drops columns (e.g. array) when in other cases it simply ignores the dimension, being more consistent about this behavior or providing better control over it might be enough to address this problem.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Feb 27, 2017

Ok, looks good! Merging.

@jlstevens jlstevens merged commit 336ccf6 into master Feb 27, 2017

4 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.006%) to 78.208%
Details
s3-reference-data-cache Test data is cached.
Details

@philippjfr philippjfr deleted the groupby_fixes branch Feb 27, 2017

@philippjfr philippjfr added this to the v1.7.0 milestone Feb 28, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment