Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize categorical aggregation and support span in _colorize #875

Merged
merged 5 commits into from Mar 20, 2020

Conversation

maihde
Copy link
Contributor

@maihde maihde commented Feb 14, 2020

I have categorical data where I want to sum a column, but grouped by category. The count_cat functionality simply counts the number of rows of each category. This PR creates a new aggregation called sum_cat.

In addition, I often need to produce multiple datashader visualizations that are colorized where the alpha range is distributed across a fixed span. This PR enables support for using span on categorical data.

Please provide comments or suggestions on how to improve the PR if necessary.

Practical examples can be found here:
https://github.com/spectriclabs/jupyter-notebooks/blob/master/Datashader-Improvements.ipynb

@jbednar
Copy link
Member

jbednar commented Feb 14, 2020

This looks great! Thanks for working on this. We've had an issue open for years now to generalize count_cat (#140), but the limitation has never affected our own work, so it's never gotten addressed. And I don't think we'd ever noticed that span wasn't implemented for categoricals! Some notes:

  • There are various tests failing right now that would need to be addressed
  • Originally, the idea was to create a single categorical "meta-aggregator" that would accept any reduction function and compute it categorically. I.e., instead of count_cat(col) and sum_cat(col), it would be categorical(count,col) and categorical(sum,col). We could then offer count_cat and sum_cat as simple macros that would be useful when using a string to specify the aggregator, but wouldn't need to try to implement each of the various reduction operators categorically. Having implemented a second categorical operator, how difficult do you think it would be to make it be a single general operation with a reduction-operator argument?
  • Once it's part of the codebase, it would need to be documented in the user guide, but it would be ok to leave that job for me since you've already made such nice examples.

@maihde
Copy link
Contributor Author

maihde commented Feb 14, 2020 via email

@jbednar
Copy link
Member

jbednar commented Feb 14, 2020

It looks like the failing tests on master are due to API changes for Pandas 1.0, which we'll address separately. For now we've pinned Pandas <1.0, so can you please rebase this PR (or patch in the changes from #876) so that the tests can run properly?

@maihde
Copy link
Contributor Author

maihde commented Feb 17, 2020 via email

@jbednar
Copy link
Member

jbednar commented Feb 17, 2020

Sounds good! I'll leave this open until then, as it's already useful as-is, but can close it as soon as the more general one is available. Thanks so much!

@maihde
Copy link
Contributor Author

maihde commented Feb 21, 2020 via email

@jbednar
Copy link
Member

jbednar commented Feb 21, 2020

Cool, thanks! Presumably this implementation would now replace count_cat and sum_cat? Presumably we'd leave a wrapper to support count_cat for old code, but now simply calling the new code, and we'd delete sum_cat entirely.

For the API for the new functionality, I think we should probably call it "by" as suggested in #140, because even though it is currently limited to a categorical axis, in the future it doesn't seem terribly difficult to generalize this approach to perform 3D aggregation by handling numerical axes as well (by first binning the value then treating it like a category). So for now it would be:

class by(Reduction):
    """Apply the provided reduction separately per categorical ``column`` value.
    Parameters
    ----------
    column : str
        Name of the column to aggregate over. Column data type must be
        categorical. Resulting aggregate has a outer dimension axis along the
        categories present.
    reduction : Reduction
        Per-category reduction function.
    """

@jbednar jbednar changed the title Add sum_cat aggregation and support for span in _colorize Generalize categorical aggregation and support span in _colorize Feb 21, 2020
@maihde
Copy link
Contributor Author

maihde commented Feb 21, 2020 via email

@maihde
Copy link
Contributor Author

maihde commented Feb 22, 2020 via email

Fixes holoviz#140 by providing the `by()` reduction which allows generic
categorical reductions.
@maihde
Copy link
Contributor Author

maihde commented Feb 27, 2020

@jbednar any thoughts on this PR? I think I have finished everything and this is ready to merge into master.

Copy link
Member

@jbednar jbednar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! It's so great to have this functionality at last; good job! I made a few tiny suggestions that you should be able to just merge in place, and then I can merge this PR. I've played with it a little, but once this PR is merged I'll try it out some more when I update the docs. Welcome to the Datashader developers club! :-)

datashader/reductions.py Outdated Show resolved Hide resolved
datashader/reductions.py Show resolved Hide resolved
examples/environment.yml Outdated Show resolved Hide resolved
doc/api.rst Outdated Show resolved Hide resolved
fix: address minor changes

Co-Authored-By: James A. Bednar <jbednar@users.noreply.github.com>
@mattastley
Copy link

@jbednar I merged in the minor changes and it should be ready to merge into master.

@jbednar jbednar merged commit 958e662 into holoviz:master Mar 20, 2020
@jbednar
Copy link
Member

jbednar commented Mar 20, 2020

Thanks!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants