
Multi-dimensional dim transforms on data sets #4080

Merged: 50 commits merged into master on Mar 9, 2020
Conversation

@poplarShift (Collaborator) commented on Oct 31, 2019:

Addresses #3932 and #237

Supersedes #3636.

Related to #3790.

This PR makes it possible to apply arbitrary dim transforms with multiple output values to Datasets, taking care to insert dimensions correctly. The new method is called .transform, and it accepts transforms in two ways: as positional (dimensions, dim_transform) tuples or as keyword arguments. The method applies each transform and either replaces the existing dimensions or appends the outputs as new value dimensions.

The upside of all of this is that we get complex statistical aggregation for free; see below for an example with hex bins that compute a trend within each bin.

List of changes

  • Add Dataset.transform
  • Implement Interface.assign
  • Dataset.aggregate now handles dim transforms
  • Add tests
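As a rough mental model of what `.transform` does (a pure-pandas sketch on made-up data, not the actual HoloViews code path): a multi-output transform evaluates a function over existing columns and inserts each output as a new value dimension, broadcasting scalars as needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(u=[0, 1, 2, 3], v=[3, 2, 1, 0]))

# Hypothetical stand-in for a two-output dim transform:
# returns one array and one scalar.
def two_outputs(u, v):
    return u + v, np.mean(v)

a, b = two_outputs(df['u'].to_numpy(), df['v'].to_numpy())
out = df.assign(a=a, b=b)  # the scalar b is broadcast across all rows
```

Here `two_outputs`, `a`, and `b` are illustrative names; in HoloViews the output names come from the tuple or keyword supplied to `.transform`.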

Setup

import xarray as xr
from holoviews import Dataset
import numpy as np
import pandas as pd
import holoviews as hv
from holoviews import dim, opts

Multi-dimensional dim transforms and aggregations

df = pd.DataFrame(dict(
    x=np.array(range(7))%2,
    y=np.array(range(7))%3,
    u=np.array(range(7))%4,
    v=np.array(range(7))%5
))
ds = Dataset(df, ['x', 'y'], ['u', 'v'])
nds = ds.groupby(['x'])

# scalar output
tf1 = dim('u', lambda u, v: np.sum(u) + np.sum(v), dim('v'))
# tuple of arrays
tf2 = dim('u', lambda u, v: (u, np.mean(v)), dim('v'))
print(ds.data.head())
   x  y  u  v
0  0  0  0  0
1  1  1  1  1
2  0  2  2  2
3  1  0  3  3
4  0  1  0  4
print(ds.transform(w=tf1).data)

# same as:
# ds.transform(('w', tf1)).data
   x  y  u  v   w
0  0  0  0  0  20
1  1  1  1  1  20
2  0  2  2  2  20
3  1  0  3  3  20
4  0  1  0  4  20
5  1  2  1  0  20
6  0  0  2  1  20
print(ds.transform((('a', 'b'), tf2), drop=True).data)
a  b
0  1.5714285714285714
1  1.5714285714285714
2  1.5714285714285714
3  1.5714285714285714
print(ds.aggregate('y', w=tf1).data)
   y  w
0  0  9
1  1  6
2  2  5
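For comparison, the `aggregate` result above can be reproduced with a plain pandas groupby on the same toy data (a sketch of the semantics, not the HoloViews implementation):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    x=np.arange(7) % 2,
    y=np.arange(7) % 3,
    u=np.arange(7) % 4,
    v=np.arange(7) % 5,
))

# tf1 reduces each group to sum(u) + sum(v); grouping by 'y'
# mirrors ds.aggregate('y', w=tf1).
w = df.groupby('y')[['u', 'v']].sum().sum(axis=1)
print(w.tolist())  # [9, 6, 5]
```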

Example: Complex hex binning operations

hv.extension('bokeh')
xds = xr.tutorial.open_dataset('air_temperature').sel(time=slice(None, '2013-1-5'))
df = xds.to_dataframe().reset_index()

def regression(x1, x2):
    x1 = pd.to_numeric(x1) / 1e9 / 86400.  # datetime64[ns] -> days since the epoch
    p = np.polyfit(x1, x2, 1)
    return p[0], p[1]
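A quick sanity check of `regression` on synthetic data (the function is repeated here so the snippet is self-contained; the division turns nanoseconds since the epoch into days, so the slope is in units per day):

```python
import numpy as np
import pandas as pd

def regression(x1, x2):
    x1 = pd.to_numeric(x1) / 1e9 / 86400.  # datetime64[ns] -> days
    p = np.polyfit(x1, x2, 1)
    return p[0], p[1]

# Values rising by 2.0 per day should yield a slope of ~2.0.
times = pd.Series(pd.to_datetime(['2013-01-01', '2013-01-02', '2013-01-03']))
slope, offset = regression(times, np.array([10.0, 12.0, 14.0]))
```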

tf = dim('time', regression, dim('air'))
ds = hv.Dataset(df, ['lon', 'lat'], ['time', 'air'])

e = hv.HexTiles(ds)
e.opts(gridsize=10, aggregator=(('trend', 'offset'), tf),
       color=dim('trend'),
       scale=dim('offset').norm(),
       colorbar=True, width=600)
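The aggregation this drives can be sketched without Bokeh: bin the points spatially, then fit a line within each bin. The sketch below uses simple square bins and synthetic data purely for illustration (`HexTiles` bins hexagonally, and names like `trend_offset` are hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(dict(
    lon=rng.uniform(0, 10, n),
    lat=rng.uniform(0, 10, n),
    t=np.tile(np.arange(10.0), 20),  # fake time axis, in days
))
df['air'] = 0.5 * df['t'] + rng.normal(0, 0.1, n)  # built-in trend of 0.5/day

# Square bins as a stand-in for hexagonal bins.
df['bin'] = (df['lon'] // 5).astype(int) * 2 + (df['lat'] // 5).astype(int)

def trend_offset(g):
    slope, intercept = np.polyfit(g['t'], g['air'], 1)
    return pd.Series(dict(trend=slope, offset=intercept))

per_bin = df.groupby('bin')[['t', 'air']].apply(trend_offset)
# each bin's 'trend' should come out close to 0.5
```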

[Image: hex-tiles plot of the air_temperature dataset, colored by per-bin trend and scaled by per-bin offset]

@philippjfr (Member) commented:

Sorry I never reviewed this. I'd very much like to get this into the release, so I'm going to take it over.

@poplarShift (Collaborator, Author) commented:

Sorry for not replying earlier. I currently don't have a lot of spare time, but I'm still interested in getting this in.

@philippjfr (Member) commented:

@poplarShift I have a potentially much simpler implementation here, but one thing I haven't figured out is the drop_duplicate_data keyword argument. Why is it needed here?

@poplarShift (Collaborator, Author) commented:

I see you already went ahead and got rid of it. I like the solution with .assign!

@philippjfr (Member) commented:

> I see you already went ahead and got rid of it. I like the solution with .assign!

I did, but is there a good reason why I shouldn't have?

@jbednar (Member) left a review comment:

This is some really amazing and useful functionality. Among many other things, it makes the new link_selections even more powerful, by making it simple to express arbitrarily complex data transformation pipelines that can then be linked by dimension automatically. This is a major step up in power for HoloViews!

Review threads (resolved) on: holoviews/core/accessors.py, holoviews/core/data/__init__.py, holoviews/core/data/dictionary.py, holoviews/core/util.py, holoviews/util/transform.py
@philippjfr philippjfr merged commit eaa8e4c into master Mar 9, 2020
@poplarShift (Collaborator, Author) commented:

@philippjfr Awesome!

Sorry for being a bit slow with replying these days. I don't remember the exact reasons for the drop_duplicate_data kwarg, but I think they were specific to my implementation.

Also, thanks for seeing this through. I'm super excited about using this straight from the source instead of my monkey-patched code snippets! As @jbednar said, this is indeed a major step up for workflow design.

@philippjfr philippjfr deleted the transforms branch April 25, 2022 14:41