Support `ds.where` and `ds.summary` and add selector #5805

Hoxbro · 2023-07-13T13:54:37Z

Example code:

import datashader as ds
import holoviews as hv
import numpy as np
import pandas as pd
from holoviews.operation.datashader import rasterize

hv.extension("bokeh")

num = 10000
np.random.seed(1)

dists = {
    cat: pd.DataFrame(
        {
            "x": np.random.normal(x, s, num),
            "y": np.random.normal(y, s, num),
            "s": s,
            "val": val,
            "cat": cat,
        }
    )
    for x, y, s, val, cat in [
        (2, 2, 0.03, 0, "d1"),
        (2, -2, 0.10, 1, "d2"),
        (-2, -2, 0.50, 2, "d3"),
        (-2, 2, 1.00, 3, "d4"),
        (0, 0, 3.00, 4, "d5"),
    ]
}

df = pd.concat(dists, ignore_index=True)
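# For each pixel, select the row with the smallest "s"; with no lookup
# column, ds.where returns the index of that row.
agg = ds.where(ds.min("s"))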

plot = rasterize(hv.Points(df), aggregator=agg).opts(
    tools=["hover"], colorbar=True, width=500
)

With agg = ds.where(ds.min("val")):

val2.mp4

With agg = ds.where(ds.min("s")):

s.mp4
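
To make the two variants above concrete, here is a toy, pure-Python model of what `ds.where(ds.min("s"))` computes per pixel when no lookup column is given: the index of the row with the smallest `s`. The function name `where_min_index`, the 2x2 grid, and the use of `-1` for empty pixels are assumptions made for illustration only; datashader's actual implementation is compiled and organised differently.

```python
from math import floor


def where_min_index(rows, key, x_range, y_range, width, height):
    """Toy model: per pixel, keep the index of the row with the smallest `key`."""
    grid = [[-1] * width for _ in range(height)]  # -1 marks an empty pixel (assumption)
    (x0, x1), (y0, y1) = x_range, y_range
    for idx, row in enumerate(rows):
        x, y = row["x"], row["y"]
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue  # point falls outside the canvas
        col = floor((x - x0) / (x1 - x0) * width)
        line = floor((y - y0) / (y1 - y0) * height)
        best = grid[line][col]
        if best == -1 or row[key] < rows[best][key]:
            grid[line][col] = idx
    return grid


rows = [
    {"x": 0.2, "y": 0.2, "s": 3.00, "cat": "d5"},
    {"x": 0.3, "y": 0.1, "s": 0.03, "cat": "d1"},
    {"x": 0.8, "y": 0.9, "s": 0.50, "cat": "d3"},
]
grid = where_min_index(rows, "s", (0, 1), (0, 1), width=2, height=2)

# The stored index can then be used to look up any other column of that row.
hovered_cat = rows[grid[0][0]]["cat"]
```

Because only a row index is stored per pixel, every other column of the selected row can be recovered afterwards, which is what makes the hover information in the recordings possible.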

codecov-commenter · 2023-07-13T13:59:00Z

Codecov Report

Merging #5805 (cab0cc9) into main (d48950a) will increase coverage by 0.01%.
The diff coverage is 97.22%.

@@            Coverage Diff             @@
##             main    #5805      +/-   ##
==========================================
+ Coverage   88.19%   88.20%   +0.01%     
==========================================
  Files         307      307              
  Lines       63305    63399      +94     
==========================================
+ Hits        55829    55920      +91     
- Misses       7476     7479       +3

| Flag | Coverage Δ | |
|---|---|---|
| ui-tests | `22.33% <18.51%> (-0.02%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| holoviews/operation/datashader.py | `83.80% <95.00%> (+0.42%)` | ⬆️ |
| holoviews/tests/operation/test_datashader.py | `97.59% <100.00%> (+0.13%)` | ⬆️ |

Hoxbro · 2023-07-17T13:49:39Z

With selector:

screenrecord-2023-07-17_15.47.31.mp4

jlstevens · 2023-07-17T13:58:45Z

Looking good!

philippjfr

This looks really good! It will need a whole new documentation section, though (which we'll have @jbednar write), and the datashader user guide really needs to be split up. I did ask one clarifying question about how `aggregator=where(...)` and `selector=...` interact, because I can't quite wrap my head around it.

jlstevens · 2023-07-20T15:19:43Z

Very happy with this PR already, just used it successfully in a talk at EuroPython! :-)

I'll be taking a closer look soon, but my initial question is whether you have any idea how `first_n` and `last_n` might work as selectors. I suppose the Dataset could have m columns × n selector layers? While this would quickly blow up for wide data with lots of columns (or large n), I assume this is the natural extension?

Alternatively, first and last should probably behave like first_n and last_n where n=1...
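
A rough sketch of that idea, in plain Python rather than datashader: if each pixel keeps up to n row indices, then `first` is just the n=1 case. The helper names `first_n_indices` and `first_index`, the pre-binned `pixels` mapping, and the `-1` fill value are all hypothetical and only illustrate the relationship being discussed.

```python
def first_n_indices(pixel_rows, n, fill=-1):
    """Return the first n row indices seen in one pixel, padded with `fill`."""
    picked = list(pixel_rows[:n])
    return picked + [fill] * (n - len(picked))


def first_index(pixel_rows, fill=-1):
    """Return the first row index seen in one pixel, or `fill` if it is empty."""
    return pixel_rows[0] if pixel_rows else fill


# Row indices that landed in each pixel, in arrival order (made-up data).
pixels = {(0, 0): [4, 7, 9], (0, 1): [2], (1, 0): []}

# One list of n index "layers" per pixel.
layers = {px: first_n_indices(idxs, 3) for px, idxs in pixels.items()}

# With n=1 the result is the same single index that `first` would give.
same_as_first = all(
    first_n_indices(idxs, 1) == [first_index(idxs)] for idxs in pixels.values()
)
```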

jlstevens · 2023-07-20T15:23:06Z

holoviews/operation/datashader.py

+                params["vdims"] = [params["vdims"]]
+            sum_agg = ds.summary(**{str(params["vdims"][0]): agg_fn, "index": ds.where(sel_fn)})
+            agg = self._apply_datashader(dfdata, cvs_fn, sum_agg, agg_kwargs, x, y)
+            _ignore = [*params["vdims"], "index"]


While it is a good default, the name "index" is a magic value. I could be wrong but couldn't this clash with a column name?
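
To illustrate the concern with plain Python (no datashader involved): one way such a clash could surface is if the value dimension itself is named "index", in which case the two entries of the keyword dict collapse into one. The functions `build_summary_kwargs` and `unique_key` are hypothetical stand-ins, the aggregators are represented by strings, and whether the PR code is actually affected has not been verified here.

```python
def build_summary_kwargs(vdim, agg_fn, sel_fn, index_key="index"):
    """Mimic the keyword dict from the snippet above, using plain strings."""
    return {str(vdim): agg_fn, index_key: f"where({sel_fn})"}


def unique_key(base, taken):
    """Prefix `base` with underscores until it no longer appears in `taken`."""
    key = base
    while key in taken:
        key = f"_{key}"
    return key


# A value dimension called "index" silently overwrites the aggregate entry.
clash = build_summary_kwargs("index", "count()", "min(s)")

# Picking a key that is not already a column name avoids the collision.
safe_key = unique_key("index", ["x", "y", "index"])
safe = build_summary_kwargs("index", "count()", "min(s)", index_key=safe_key)
```

Deriving the key from the existing column names is only one possible mitigation; a reserved, clearly private name would be another.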

jbednar · 2023-07-20T19:47:25Z

> While this would quickly blow up for wide data with lots of columns (or large n) I assume this is the natural extension?

With the index array approach, first_3 should always only be 3x larger than the aggregate array, while with the approach of returning all columns the size and time taken would scale with the number and size of columns. Seems unsafe as a default!

An intermediate approach could be to return a fixed number of scalar columns only (up to e.g. 3) by default, but that seems quite arbitrary.
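
A back-of-the-envelope comparison of the two approaches, counting array cells only. The function names and the 500x500 canvas with 20 columns are made-up numbers for illustration, and the count ignores per-column dtype sizes, which would widen the gap further for large columns.

```python
def index_layers_cells(width, height, n):
    """Cells needed when storing only n row-index layers per pixel."""
    return width * height * n


def all_columns_cells(width, height, n, n_columns):
    """Cells needed when materialising every column for each of the n layers."""
    return width * height * n * n_columns


aggregate = 500 * 500                                  # one value per pixel
index_only = index_layers_cells(500, 500, n=3)         # first_3 via indices
everything = all_columns_cells(500, 500, n=3, n_columns=20)

ratio_index = index_only // aggregate    # stays at n regardless of column count
ratio_all = everything // aggregate      # grows with n * number of columns
```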

philippjfr · 2023-07-24T09:41:09Z

I'd say we handle the first_n and last_n cases later.

philippjfr · 2023-07-24T10:21:24Z

I'm going to merge. I'll look into the first_n and last_n thing separately.

Hoxbro added 3 commits June 22, 2023 19:23

Adding basic where support in datashader

e3a05aa

Merge branch 'main' into inspect_where

f943208

Support rd.SpecialColumn

c2845be

Hoxbro marked this pull request as draft July 13, 2023 13:54

Hoxbro added 2 commits July 13, 2023 16:44

Add agg_types

4791b4f

Use xr.Dataset to get extra columns

1ca5fa3

Hoxbro force-pushed the inspect_where branch from 2bb7c6e to 1daac82 Compare July 14, 2023 11:29

Remove hack and better naming

8a40312

Hoxbro force-pushed the inspect_where branch from 1daac82 to 8a40312 Compare July 14, 2023 11:30

Hoxbro added 4 commits July 14, 2023 14:57

Add support for where column

91df239

Merge branch 'main' into inspect_where

bfb4fc2

Don't set nodata=0

a62f828

Use sorted to reorder first item

f3347ce

Hoxbro force-pushed the inspect_where branch from 6abdf83 to f3347ce Compare July 15, 2023 14:10

Hoxbro commented Jul 15, 2023

Hoxbro added 3 commits July 17, 2023 14:34

Simplify sorted

b8d9ab6

Refactor out _apply_datashader

ef19e71

Add selector

ff1ff5a

Hoxbro added 9 commits July 18, 2023 14:40

Add basic summary support

1918ebc

Use summary for selector

e546332

Small cleanup

9690288

Remove prefix

7d2ffc9

Fix wrong variable used

703f3a2

Fix failing test

d0d3731

Add guards for ds.where

58dd018

Revert removing column for ds.where as an aggregator

aee0dcc

Add unittests

4c4498f

Hoxbro changed the title ~~Support where inspection~~ Support ds.where and ds.summary and add selector Jul 19, 2023

Hoxbro force-pushed the inspect_where branch 2 times, most recently from 15faf72 to 16d72a1 Compare July 19, 2023 09:52

Add docstring to selector

6fb7721

Hoxbro force-pushed the inspect_where branch from 16d72a1 to 6fb7721 Compare July 19, 2023 09:53

Hoxbro marked this pull request as ready for review July 19, 2023 09:57

Hoxbro added this to the 1.17.0 milestone Jul 19, 2023

philippjfr reviewed Jul 19, 2023
