Support by(max_n) and by(min_n) #1229

Merged
merged 1 commit into from Jun 7, 2023

ianthomas23
Member

Adds support for categorical max_n and min_n reductions, such as ds.by("cat", ds.max_n("value", n=3)), on CPU and GPU, both with and without dask. This is the first part of issue #1210; support for categorical first_n, last_n and where will follow.

Example:

import datashader as ds
import numpy as np
from numpy import nan
import pandas as pd

x = np.arange(2)
df = pd.DataFrame(dict(
    y_from = [0.0, 1.0, 0.0, 1.0, 0.0],
    y_to   = [0.0, 1.0, 1.0, 0.0, 0.5],
    value  = [1.1, 3.3, 5.5, 2.2, 4.4],
    cat    = ['a', 'b', 'a', 'b', 'a'],
))
df["cat"] = df["cat"].astype("category")

canvas = ds.Canvas(plot_height=2, plot_width=3)
agg = canvas.line(source=df, x=x, y=["y_from", "y_to"], axis=1,
                  agg=ds.by("cat", ds.max_n("value", n=3)))
print(agg)

which prints

<xarray.DataArray (y: 2, x: 3, cat: 2, n: 3)>
array([[[[5.5, 4.4, 1.1],
         [nan, nan, nan]],

        [[1.1, nan, nan],
         [2.2, nan, nan]],

        [[1.1, nan, nan],
         [2.2, nan, nan]]],


       [[[nan, nan, nan],
         [3.3, 2.2, nan]],

        [[5.5, 4.4, nan],
         [3.3, nan, nan]],

        [[5.5, 4.4, nan],
         [3.3, nan, nan]]]])
Coordinates:
  * x        (x) float64 0.1667 0.5 0.8333
  * y        (y) float64 0.25 0.75
  * cat      (cat) <U1 'a' 'b'
  * n        (n) int64 0 1 2
Attributes:
    x_range:  (0, 1)
    y_range:  (0.0, 1.0)

Note that the returned DataArray has shape (ny, nx, ncat, n) which I think is more logical than the alternative possibility of (ny, nx, n, ncat).
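To make the (ny, nx, ncat, n) layout concrete, here is a minimal NumPy-only sketch that rebuilds the values printed above and slices them by category and by rank. The variable names are illustrative and not part of Datashader; on the real DataArray the equivalent selections would presumably be agg.sel(cat="a") and agg.isel(n=0).

```python
import numpy as np
from numpy import nan

# Values copied from the printed output above; shape is (ny, nx, ncat, n).
values = np.array([
    [[[5.5, 4.4, 1.1], [nan, nan, nan]],
     [[1.1, nan, nan], [2.2, nan, nan]],
     [[1.1, nan, nan], [2.2, nan, nan]]],
    [[[nan, nan, nan], [3.3, 2.2, nan]],
     [[5.5, 4.4, nan], [3.3, nan, nan]],
     [[5.5, 4.4, nan], [3.3, nan, nan]]],
])

# All n values for category 'a' (index 0 on the cat axis): shape (ny, nx, n).
cat_a = values[:, :, 0, :]

# The single largest value per category (index 0 on the n axis): shape (ny, nx, ncat).
top1 = values[:, :, :, 0]

# The alternative (ny, nx, n, ncat) layout is only an axis swap away.
alt = np.swapaxes(values, 2, 3)

print(cat_a.shape, top1.shape, alt.shape)
```

With n as the last axis, the n values for one pixel and one category sit next to each other in the array, and taking index 0 on that axis gives an array with the same shape as a plain categorical max.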

In terms of implementation, functions like nanmax_n_in_place now always accept a 4D array so that there is a single implementation for 3D (max) and 4D (max_n) arrays for each of CPU and GPU. Use of the combine function in max inserts the extra dimension of size 1 to change the shape without copying any data.
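The idea of one 4D code path serving both max and max_n can be sketched in plain NumPy. This is not Datashader's actual implementation: nanmax_n_merge_sketch is a hypothetical stand-in for functions like nanmax_n_in_place, and it assumes each input is sorted in descending order along the last axis with NaN padding at the end.

```python
import numpy as np

def nanmax_n_merge_sketch(ret, other):
    """Hypothetical helper: merge `other` into `ret` in place.

    Both arrays have shape (ny, nx, ncat, n). After the call, the last axis
    of `ret` holds the n largest non-NaN values of both inputs, descending,
    padded with NaN.
    """
    merged = np.concatenate([ret, other], axis=-1)
    # np.sort puts NaN last, so sorting the negated values ascending and
    # negating again yields descending order with NaN still at the end.
    merged = -np.sort(-merged, axis=-1)
    ret[...] = merged[..., :ret.shape[-1]]

# A plain categorical max produces 3D arrays of shape (ny, nx, ncat).
max_a = np.array([[[1.0, np.nan]]])
max_b = np.array([[[3.0, 2.0]]])

# Adding a trailing axis of size 1 gives a 4D view onto the same memory.
a4 = np.expand_dims(max_a, -1)
b4 = np.expand_dims(max_b, -1)
print(np.shares_memory(a4, max_a))

# The same 4D routine now handles the n == 1 case, writing through the view.
nanmax_n_merge_sketch(a4, b4)
print(max_a)
```

Because the 4D array is a view, the in-place merge updates the original 3D array directly, so the max case needs no separate implementation and no extra copy of the data.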

df_pd.at[2,'f32'] = nan
df_pd.at[2,'f64'] = nan
df_pd.at[2,'plusminus'] = nan
# x 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
Member Author

I find myself creating this table by hand whenever I need to check new tests, so I am including it here to make new tests easier to check in future.

@@ -652,7 +708,7 @@ def test_categorical_sum_binning(df):


@pytest.mark.parametrize('df', dfs)
-def test_categorical_max(df):
+def test_categorical_max2(df):
Member Author

Keeping the existing categorical max test but renaming it so it doesn't overwrite the new one which is above it in this file.
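The rename matters because a second def with the same name in a Python module silently rebinds that name, so only the later function remains reachable. A minimal illustration, using the hypothetical name sample_test rather than the real test functions:

```python
def sample_test():
    return "first definition"

def sample_test():  # same name: rebinds it, the first function is now unreachable
    return "second definition"

print(sample_test())
```

Only the second body runs. Applied to a test module, this means a test runner would see just one of two identically named tests, which is why the older test gets a distinct name.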

@ianthomas23 ianthomas23 added this to the v0.15.1 milestone Jun 6, 2023
codecov bot commented Jun 6, 2023

Codecov Report

Merging #1229 (bfdcc66) into main (28c8581) will decrease coverage by 0.04%.
The diff coverage is 67.79%.

@@            Coverage Diff             @@
##             main    #1229      +/-   ##
==========================================
- Coverage   83.62%   83.59%   -0.04%     
==========================================
  Files          35       35              
  Lines        8738     8751      +13     
==========================================
+ Hits         7307     7315       +8     
- Misses       1431     1436       +5     
Impacted Files                                 Coverage Δ
datashader/transfer_functions/_cuda_utils.py   20.63% <0.00%> (ø)
datashader/reductions.py                       79.02% <47.05%> (-0.22%) ⬇️
datashader/compiler.py                         88.42% <100.00%> (+0.06%) ⬆️
datashader/utils.py                            81.63% <100.00%> (+0.09%) ⬆️

Member

@jbednar jbednar left a comment

Thanks!

Member

jbednar commented Jun 6, 2023

We'll need some docs at the Datashader level when you're done with all this, of course.

@ianthomas23 ianthomas23 merged commit f917cd9 into holoviz:main Jun 7, 2023
14 of 16 checks passed
@ianthomas23 ianthomas23 deleted the cat_max_n branch June 7, 2023 09:46