
compose weighted with groupby, coarsen, resample, rolling etc. #3937

Open
mathause opened this issue Apr 5, 2020 · 7 comments

@mathause
Collaborator

mathause commented Apr 5, 2020

It would be nice to make weighted work with groupby - e.g. #3935 (comment)

However, it is not entirely clear to me how that should be done. One way would be to do:

da.groupby(...).weighted(weights).mean()

This would require applying the groupby operation to the weights as well (how would this be done?).
Or should it be

da.weighted(weights).groupby(...).mean()

but this seems less intuitive to me.

Or

da.groupby(..., weights=weights).mean()
@shoyer
Member

shoyer commented Apr 9, 2020

I think da.groupby(...).weighted(weights).mean() would be pretty sensible.

We could probably make both da.groupby(...).weighted(weights) and da.weighted(weights).groupby(...) return exactly equivalent objects, if desired.
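For what it's worth, the first spelling can already be emulated with today's API by applying `.weighted()` inside `groupby(...).map(...)`. A sketch with made-up toy data (the `w` coordinate name and the values here are illustrative, not an existing API):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    [1.0, 2.0, 3.0, 4.0],
    dims="time",
    coords={"season": ("time", ["DJF", "DJF", "JJA", "JJA"])},
)
# attach the weights as a coordinate so each group carries its matching slice
da = da.assign_coords(w=("time", [1.0, 3.0, 1.0, 1.0]))

# weighted mean within each group
result = da.groupby("season").map(lambda g: g.weighted(g.w).mean("time"))
# DJF: (1*1 + 2*3) / (1 + 3) = 1.75, JJA: (3*1 + 4*1) / (1 + 1) = 3.5
```

A built-in `da.groupby(...).weighted(weights).mean()` would presumably do this without the user having to thread the weights through as a coordinate.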

@dcherian
Contributor

dcherian commented Jul 8, 2020

@ahuang11 in #4210:

I want to do something similar to xesmf's weighted regridding, but without the need to install esmpy, which has a lot of dependencies.

Are variations of the following possible?

ds.weighted(coslat_weights).coarsen(lat=2, lon=2).mean()
ds.coarsen(lat=2, lon=2).weighted(coslat_weights).mean()

@dcherian dcherian changed the title weighted and groupby compose weighted with groupby, coarsen, rolling etc. Jul 8, 2020
@mathause
Collaborator Author

mathause commented Jul 9, 2020

That's currently not possible. What you can try is the following:

(ds * coslat_weights).coarsen(lat=2, lon=2).sum() / coslat_weights.coarsen(lat=2, lon=2).sum()

but this only works if you don't have any NaNs.

@roxyboy

roxyboy commented Jul 9, 2020

@mathause Does the skipna=True flag not work in your last example?

@mathause
Collaborator Author

mathause commented Jul 9, 2020

No, that won't work. You need to mask the weights where the data is NaN. An untested and not very efficient way might be:

def coarsen_weighted_mean(da, weights, dims, skipna=None, boundary="exact"):

    # numerator: coarsened sum of the weighted data
    weighted_sum = (da * weights).coarsen(dims, boundary=boundary).sum(skipna=skipna)

    # denominator: sum of the weights, masked where the data is NaN
    masked_weights = weights.where(da.notnull())
    sum_of_weights = masked_weights.coarsen(dims, boundary=boundary).sum()

    # avoid division by zero where a window contains no valid data
    valid_weights = sum_of_weights != 0
    sum_of_weights = sum_of_weights.where(valid_weights)

    return weighted_sum / sum_of_weights

An example (without NaNs though):

import xarray as xr
import numpy as np

air = xr.tutorial.open_dataset("air_temperature").air
weights = np.cos(np.deg2rad(air.lat))

# give the weights their own name so they don't clash with the "lat" coordinate
weights.name = "weights"

c_w_m = coarsen_weighted_mean(air, weights, dict(lat=2), boundary="trim")

# to compare do it for one slice
alt = air.isel(lat=slice(0, 2)).weighted(weights).mean("lat") 

# compare if it is the same
xr.testing.assert_allclose(c_w_m.isel(lat=0, drop=True), alt)
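To spell out the skipna point above, here is a tiny plain-numpy sketch (toy values made up for illustration): a skipna-style sum drops the NaN from the numerator, but the NaN cell's weight still inflates the denominator unless the weights are masked too.

```python
import numpy as np

data = np.array([1.0, np.nan])
weights = np.array([1.0, 3.0])

# skipna-style sum: NaN is dropped from the numerator...
# ...but the denominator still counts the NaN cell's weight: 1.0 / 4.0 = 0.25
naive = np.nansum(data * weights) / weights.sum()

# masking the weights where the data is NaN gives the expected mean: 1.0 / 1.0 = 1.0
masked = np.nansum(data * weights) / weights[~np.isnan(data)].sum()
```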

@dcherian dcherian changed the title compose weighted with groupby, coarsen, rolling etc. compose weighted with groupby, coarsen, resample, rolling etc. Feb 24, 2021
@kuchaale

kuchaale commented Jul 6, 2021

Would

da.weighted(weights).integrate(...)

be in scope as well?
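If it were, one plausible semantics (an assumption on my part, not anything xarray defines) would be to integrate the weighted product and normalise by the integrated weights:

```python
import numpy as np
import xarray as xr

x = np.linspace(0.0, 1.0, 5)
da = xr.DataArray(np.ones(5), dims="x", coords={"x": x})
weights = xr.DataArray([1.0, 2.0, 3.0, 2.0, 1.0], dims="x", coords={"x": x})

# normalised weighted integral; for constant da == 1 this recovers 1.0
weighted_integral = (da * weights).integrate("x") / weights.integrate("x")
```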

@jbusecke
Contributor

I am interested in the coarsen with weights scenario that @dcherian and @mathause described here for a current project of ours.

I solved the issue manually and it's not that hard:

import xarray as xr
import numpy as np

# example data with some NaNs
data = np.arange(16).reshape(4, 4).astype(float)
data[2, 2] = np.nan
data[1, 1] = np.nan

# create some simple weights
weights = np.repeat(np.array([[1, 2, 1, 3]]).T, 4, axis=1)

da = xr.DataArray(data, dims=["x", "y"], coords={"w": (["x", "y"], weights)})

masked_weights = da.w.where(~np.isnan(da))  # .weighted() already knows how to do this
da_weighted = da * masked_weights
da_coarse = da_weighted.coarsen(x=2, y=2).sum() / masked_weights.coarsen(x=2, y=2).sum()

but I feel all of this duplicates existing functionality (e.g. the masking of weights based on NaNs in the data) and might be sensibly streamlined into something like:

da.weighted(da.w).coarsen(...).mean()

at least from a user perspective (there might be unique challenges with the implementation that I am overlooking here).

Happy to help but would definitely need some guidance on this one.

I do believe that this would provide a very useful functionality for many folks who work with curvilinear grids and want to prototype things that depend on some sort of scale reduction (coarsening).

Also cc'ing @TomNicholas who is involved in the same project 🤗
