
Add `sum_cat` aggregation and support for `span` in `_colorize` #875

Open: wants to merge 4 commits into master

maihde commented Feb 14, 2020

I have categorical data where I want to sum a column grouped by category. `count_cat` only counts the number of rows in each category, so this PR adds a new aggregation called `sum_cat`.
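To make the difference between the two aggregations concrete, here is a minimal sketch in plain NumPy (not Datashader's own reduction machinery). `aggregate_by_category` is a hypothetical helper. It assumes points have already been mapped to integer pixel indices and categories to integer codes, and it accumulates a per-pixel, per-category row count (what `count_cat` reports) alongside a per-pixel, per-category value sum (what `sum_cat` is meant to report).

```python
import numpy as np


def aggregate_by_category(xs, ys, cats, vals, width, height, n_cats):
    """Per-pixel, per-category row counts and value sums.

    Hypothetical helper, not Datashader code. Assumes xs/ys are integer pixel
    indices already inside the canvas and cats are integer codes in [0, n_cats).
    """
    counts = np.zeros((height, width, n_cats), dtype=np.int64)
    sums = np.zeros((height, width, n_cats), dtype=np.float64)
    index = (np.asarray(ys), np.asarray(xs), np.asarray(cats))
    # np.add.at accumulates unbuffered, so rows landing on the same
    # (pixel, category) cell are all added rather than overwriting each other.
    np.add.at(counts, index, 1)                       # count_cat-style result
    np.add.at(sums, index, np.asarray(vals, float))   # sum_cat-style result
    return counts, sums


# Four rows on a 1x2 canvas with two categories (codes 0 and 1).
xs = np.array([0, 0, 1, 1])
ys = np.array([0, 0, 0, 0])
cats = np.array([0, 1, 1, 1])
vals = np.array([2.0, 5.0, 1.5, 0.5])
counts, sums = aggregate_by_category(xs, ys, cats, vals, width=2, height=1, n_cats=2)
print(counts[0, 1], sums[0, 1])
```

In the second pixel, category 1 has a count of 2 but a sum of 2.0, so shading by `sum_cat` weights each category by the summed column rather than by how many rows fell in that pixel.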

I also often need to produce multiple Datashader visualizations whose colorization distributes the alpha range across the same fixed `span`. This PR adds support for `span` with categorical data.
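The idea behind a fixed span can be sketched independently of Datashader. The hypothetical `alpha_from_span` below assumes simple linear scaling (Datashader also offers other scalings) and a `min_alpha` default of 40. It normalizes each pixel's total against a caller-supplied `(lo, hi)` range instead of the image's own minimum and maximum, so the same total produces the same alpha in every image shaded with that span.

```python
import numpy as np


def alpha_from_span(totals, span, min_alpha=40, max_alpha=255):
    """Map per-pixel totals to alpha using a fixed (lo, hi) span.

    Hypothetical sketch assuming linear scaling; not Datashader's _colorize.
    """
    lo, hi = span
    if hi <= lo:
        raise ValueError("span must satisfy lo < hi")
    totals = np.asarray(totals, dtype=float)
    # Normalize against the fixed span, not this image's own min/max,
    # and clip so values outside the span saturate at the ends.
    norm = np.clip((totals - lo) / (hi - lo), 0.0, 1.0)
    alpha = min_alpha + norm * (max_alpha - min_alpha)
    # Pixels with no data stay fully transparent.
    alpha = np.where(totals > 0, alpha, 0)
    return alpha.astype(np.uint8)


# A total of 5 gets the same alpha whether the rest of the image
# peaks at 6 or at 100, because both use the span (0, 10).
a = alpha_from_span([5, 6], span=(0, 10))
b = alpha_from_span([5, 100], span=(0, 10))
print(a[0], b[0])
```

Without a fixed span, each image would be normalized to its own maximum, and the same total would look different across a series of plots.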

Please provide comments or suggestions on how to improve the PR if necessary.

Practical examples can be found here:
https://github.com/spectriclabs/jupyter-notebooks/blob/master/Datashader-Improvements.ipynb

jbednar (Member) commented Feb 14, 2020

This looks great! Thanks for working on this. We've had an issue open for years to generalize `count_cat` (#140), but the limitation never affected our own work, so it was never addressed. And I don't think we had ever noticed that `span` wasn't implemented for categoricals! Some notes:

  • Several tests are currently failing and would need to be fixed.
  • Originally, the idea was to create a single categorical "meta-aggregator" that accepts any reduction function and computes it per category. That is, instead of `count_cat(col)` and `sum_cat(col)`, there would be `categorical(count, col)` and `categorical(sum, col)`. We could then still offer `count_cat` and `sum_cat` as simple macros for use when the aggregator is specified as a string, without having to implement each reduction operator categorically. Now that you've implemented a second categorical operator, how difficult do you think it would be to turn this into a single general operation that takes a reduction operator as an argument?
  • Once it's part of the codebase, it will need to be documented in the user guide, but it's fine to leave that job to me since you've already made such nice examples.
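The meta-aggregator idea in the second note can be illustrated without any pixel binning. In this sketch, `categorical`, `count_cat` and `sum_cat` are hypothetical plain-Python stand-ins, not Datashader's API: `categorical` takes any reduction over a list of values plus the name of the category column, and the two specific aggregators become thin wrappers around it. A real implementation would perform this per pixel and would need reductions that can be computed incrementally.

```python
from collections import defaultdict


def categorical(reduction, cat_column, value_column=None):
    """Build an aggregator that applies `reduction` separately per category.

    Hypothetical sketch; rows are mappings from column name to value.
    """
    def aggregate(rows):
        groups = defaultdict(list)
        for row in rows:
            # With no value column, each row contributes a 1 (for counting).
            value = row[value_column] if value_column is not None else 1
            groups[row[cat_column]].append(value)
        return {cat: reduction(values) for cat, values in groups.items()}
    return aggregate


def count_cat(cat_column):
    # Thin wrapper: counting rows per category.
    return categorical(len, cat_column)


def sum_cat(cat_column, value_column):
    # Thin wrapper: summing a value column per category.
    return categorical(sum, cat_column, value_column)


rows = [
    {"fruit": "apple", "weight": 2.0},
    {"fruit": "pear", "weight": 1.0},
    {"fruit": "apple", "weight": 3.0},
]
print(count_cat("fruit")(rows))
print(sum_cat("fruit", "weight")(rows))
print(categorical(max, "fruit", "weight")(rows))
```

Under this design, supporting a new categorical reduction such as `max` requires no new operator, only a new reduction argument.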

maihde (Author) commented Feb 14, 2020

jbednar (Member) commented Feb 14, 2020

It looks like the failing tests on master are due to API changes in Pandas 1.0, which we'll address separately. For now we've pinned Pandas to <1.0, so could you please rebase this PR (or patch in the changes from #876) so that the tests can run properly?

@maihde maihde force-pushed the spectriclabs:develop-spectric branch from 5c6535a to 2d64820 Feb 15, 2020
@maihde maihde force-pushed the spectriclabs:develop-spectric branch from 2d64820 to 20f00f3 Feb 16, 2020

maihde (Author) commented Feb 17, 2020

jbednar (Member) commented Feb 17, 2020

Sounds good! I'll leave this open until then, as it's already useful as-is, but can close it as soon as the more general one is available. Thanks so much!
