Add new where reduction #1155

ianthomas23 · 2022-12-16T18:13:53Z

This partially implements issue #1126, adding a new where reduction that accepts either a max or min reduction. Best illustrated via an example:

import datashader as ds
import numpy as np
import pandas as pd

x = np.arange(2)
df = pd.DataFrame(dict(
    y_from = [0.0, 1.0, 0.0, 1.0, 0.0],
    y_to   = [0.0, 1.0, 1.0, 0.0, 0.5],
    value  = [1.1, 3.3, 5.5, 2.2, 4.4],
    other  = [-55, -77, -99, -66, -88],
))

canvas = ds.Canvas(plot_height=3, plot_width=5)
agg = canvas.line(
    source=df, x=x, y=["y_from", "y_to"], axis=1,
    agg=ds.where(ds.max("value"), "other"),
)

print(agg)

which outputs

<xarray.DataArray (y: 3, x: 5)>
array([[-99., -88., -55., -66., -66.],
       [ nan, -99., -99., -88., -88.],
       [-77., -77., -77., -99., -99.]])
Coordinates:
  * x        (x) float64 0.1 0.3 0.5 0.7 0.9
  * y        (y) float64 0.1667 0.5 0.8333

You can think of this as applying the max('value') reduction as normal, but then returning the corresponding values from the 'other' column rather than from the 'value' column.
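To make the semantics concrete, here is a minimal NumPy sketch of the per-pixel logic. It is not datashader's implementation: the function name `where_max` and the `pixel` assignments are hypothetical, and the rows are assumed to have already been mapped to pixel indices.

```python
import numpy as np

def where_max(pixel, value, other, n_pixels):
    """For each pixel, return `other` from the row with the largest `value`.

    Hypothetical helper for illustration only; pixels that receive no rows
    stay NaN.
    """
    best_value = np.full(n_pixels, np.nan)  # running max of `value` per pixel
    result = np.full(n_pixels, np.nan)      # matching entry from `other`
    for p, v, o in zip(pixel, value, other):
        if np.isnan(v):
            continue  # rows without a selector value never win
        if np.isnan(best_value[p]) or v > best_value[p]:
            best_value[p] = v
            result[p] = o
    return result

# Assumed pixel assignments: rows 0-1 land in pixel 0, rows 2-4 in pixel 2.
pixel = np.array([0, 0, 2, 2, 2])
value = np.array([1.1, 3.3, 5.5, 2.2, 4.4])
other = np.array([-55, -77, -99, -66, -88])

result = where_max(pixel, value, other, n_pixels=4)
```

Pixel 0 reports -77 because 3.3 is the largest value there, pixel 2 reports -99 because 5.5 is the largest, and the untouched pixels remain NaN.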

What it currently supports:

where takes either a min or max selector reduction.
Works on CPU (not GPU), with or without dask.
Works with antialiased lines.
Cannot be used within a summary or categorical by reduction.
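For the dask case in the list above, each partition produces its own partial aggregates, which then have to be merged. The following is a rough sketch of how such a merge could work for a max selector, assuming each partition yields a selector array and a lookup array of the same shape. The function `combine_where_max` is hypothetical and is not the code in this PR.

```python
import numpy as np

def combine_where_max(selector_aggs, lookup_aggs):
    """Merge per-partition (selector, lookup) aggregates for a max selector.

    Hypothetical sketch: both arguments are lists of float arrays of the
    same shape, one pair per partition, with NaN meaning "no data".
    """
    combined_selector = selector_aggs[0].copy()
    combined_lookup = lookup_aggs[0].copy()
    for sel, look in zip(selector_aggs[1:], lookup_aggs[1:]):
        # Take the new partition where it has data and either nothing has
        # been seen yet or its selector value is strictly larger.
        take = ~np.isnan(sel) & (
            np.isnan(combined_selector) | (sel > combined_selector)
        )
        combined_selector[take] = sel[take]
        combined_lookup[take] = look[take]
    return combined_selector, combined_lookup

nan = np.nan
sel_a, look_a = np.array([1.0, nan, 5.0]), np.array([-10.0, nan, -50.0])
sel_b, look_b = np.array([2.0, 3.0, nan]), np.array([-20.0, -30.0, nan])

selector, lookup = combine_where_max([sel_a, sel_b], [look_a, look_b])
```

In this sketch ties go to the earlier partition because the comparison is strict.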

Note that first and last are not supported within a where, because there is no advantage in doing so: you can just use first or last directly on their own.

Future improvements:

Support within categorical reductions.
Support for GPU.
If lookup_column is not specified, use the index of the row in the supplied DataFrame.
New max_n, min_n, first_n, last_n reductions.

All of these are possible but fiddly to implement, so I would rather make partial functionality available now for users to experiment with, and add these improvements over time.

Currently some combinations of lines and dask give different results depending on the number of dask partitions, but this has always been the situation and is no worse here.

ianthomas23 · 2022-12-16T18:14:16Z

datashader/reductions.py

@@ -1159,6 +1223,75 @@ def _finalize(bases, **kwargs):
        raise NotImplementedError("mode is currently implemented only for rasters")


+class where(FloatingReduction):
+    def __init__(self, selector: Reduction, lookup_column: str):


Needs a docstring with a good example.

I have added a docstring. It is somewhat tortuous English when written in a generic form so I have added a concrete example that should help.

I am using the argument name selector because long-term I would like to divide the Reduction class hierarchy into two:

Selection reductions that use values from a column without modifying them, e.g. first and max.

Combination reductions that do some form of mathematical combination of the values from a column, e.g. mean, std, count.

Given that division, a where reduction will accept a Selection but not a Combination to select values from the lookup_column.
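A minimal sketch of what that division might look like, purely as an illustration of the idea. All class names here (`SelectionReduction`, `CombinationReduction`, `Max`, `Mean`, `Where`) are hypothetical stand-ins and do not exist in datashader.

```python
class Reduction:
    """Stand-in base class; only stores the column name."""
    def __init__(self, column):
        self.column = column

class SelectionReduction(Reduction):
    """Hypothetical: returns a column value unmodified (e.g. first, max)."""

class CombinationReduction(Reduction):
    """Hypothetical: mathematically combines values (e.g. mean, std, count)."""

class Max(SelectionReduction):
    pass

class Mean(CombinationReduction):
    pass

class Where(Reduction):
    """Hypothetical where: accepts a Selection but not a Combination."""
    def __init__(self, selector, lookup_column):
        if not isinstance(selector, SelectionReduction):
            raise TypeError("selector must be a SelectionReduction")
        super().__init__(lookup_column)
        self.selector = selector

w = Where(Max("value"), "other")   # accepted: Max selects a value

try:
    Where(Mean("value"), "other")  # rejected: Mean combines values
    rejected = False
except TypeError:
    rejected = True
```

With a split like this, the check in `Where.__init__` becomes a single isinstance test rather than a list of allowed reduction types.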

Ok, sounds good. Agreed that it's tortuous, but the example does help a lot.

jbednar · 2022-12-16T19:36:09Z

Thanks! Can you clarify the current status of types? I.e. can you return an integer aggregate when testing on a float condition?

ianthomas23 · 2022-12-18T13:53:08Z

where always returns a float64 with NaNs to represent no data, just like the min, max, first, last, etc. reductions.
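A small NumPy illustration of why an integer lookup column comes back as float64: NaN is a floating-point value, so an aggregate that uses NaN for "no data" cannot be an integer array. This is plain NumPy with made-up array shapes, not datashader code.

```python
import numpy as np

other = np.array([-55, -77, -99], dtype=np.int64)  # integer lookup column

# The aggregate starts as all-NaN float64, so empty pixels are representable.
agg = np.full((2, 2), np.nan, dtype=np.float64)
agg[0, 0] = other[1]  # integer values are converted to float on assignment
agg[1, 1] = other[2]

# An integer array cannot hold NaN at all.
int_agg = np.zeros((2, 2), dtype=np.int64)
try:
    int_agg[0, 0] = np.nan
    int_accepts_nan = True
except ValueError:
    int_accepts_nan = False
```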

jbednar · 2022-12-20T21:42:57Z

Ok, I guess we'll need to deal with datatype issues when we support using the Pandas index as the "column" (actually just imputed values that act like a column, hence needing special support).

ianthomas23 · 2023-01-09T09:28:49Z

Rebased on top of main to pick up the CI fixes.

ianthomas23 · 2023-01-09T11:21:58Z

The reduction in coverage is mostly due to changes to the CUDA append functions; CUDA code is not run in GitHub Actions.

codecov · 2023-01-09T12:33:51Z

Codecov Report

Merging #1155 (b34ffd6) into main (645ae07) will increase coverage by 0.03%.
The diff coverage is 83.68%.

@@            Coverage Diff             @@
##             main    #1155      +/-   ##
==========================================
+ Coverage   85.39%   85.43%   +0.03%     
==========================================
  Files          35       35              
  Lines        7819     7941     +122     
==========================================
+ Hits         6677     6784     +107     
- Misses       1142     1157      +15

Impacted Files	Coverage Δ
datashader/core.py	`88.05% <ø> (ø)`
datashader/reductions.py	`86.94% <80.83%> (-0.29%)`	⬇️
datashader/compiler.py	`95.62% <100.00%> (+0.53%)`	⬆️
datashader/glyphs/line.py	`92.95% <0.00%> (+0.09%)`	⬆️

ianthomas23 · 2023-01-11T12:02:44Z

Pinging @jbednar. I'd like to merge this and add the extra functionality (such as use of a virtual integer row index) as separate PRs.

jbednar

Those look like some painful changes, and it's great to have them behind us! Thanks, and it looks good to me!

jbednar · 2023-01-13T00:05:38Z

datashader/reductions.py

@@ -507,6 +531,10 @@ def __init__(self, cat_column, reduction=count()):
            self.categorizer = category_codes(cat_column)
        else:
            raise TypeError("first argument must be a column name or a CategoryPreprocess instance")
+
+        if isinstance(reduction, where):
+            raise TypeError("'by' reduction cannot use a 'where' reduction")


Can you make "use" more specific? Cannot accept, does not support, etc.? It's confusing to read about reductions using reductions.

jbednar · 2023-01-13T00:07:12Z

datashader/reductions.py

@@ -1159,6 +1223,75 @@ def _finalize(bases, **kwargs):
        raise NotImplementedError("mode is currently implemented only for rasters")


+class where(FloatingReduction):
+    def __init__(self, selector: Reduction, lookup_column: str):


Ok, sounds good. Agreed that it's tortuous, but the example does help a lot.

ianthomas23 added 14 commits January 9, 2023 09:26

Basics of where reduction

9ec80a1

Where with min or max

8f4aa30

Add bool returns to all append functions

e15a68d

Tidy up

d71b897

where cannot use same column as its contained reduction

8e489ba

Dask combine for cpu where reduction

a1cae7f

More sensible arg names

7a63723

Add new tests

6a47af3

Dask where.combine for multiple-stage combines

d0c030c

Support antialiasing, including dask cpu

02b30b1

cuda tests

aae19f6

Temporarily disable test_dask.py::test_line_antialias_where

26e0e1f

Fix antialiased where reduction tests

9ab6d6d

Exclude elements that may vary from dask line antialias where tests

140b631

where docstring

4b2172d

Exclude docstring example from doctest

efd78ec

ianthomas23 requested a review from jbednar January 12, 2023 09:41

ianthomas23 mentioned this pull request Jan 12, 2023

Where reduction using dataframe row index #1164

Merged

jbednar approved these changes Jan 13, 2023

Improved exception message

b34ffd6

ianthomas23 merged commit 2e0f8e0 into holoviz:main Jan 16, 2023

ianthomas23 deleted the where_reduction branch January 16, 2023 10:11

ianthomas23 added this to the v0.14.4 milestone Jan 19, 2023
