
compose weighted with groupby, coarsen, resample, rolling etc. #3937

Open
mathause opened this issue Apr 5, 2020 · 7 comments

@mathause
Collaborator

mathause commented Apr 5, 2020

It would be nice to make weighted work with groupby - e.g. #3935 (comment)

However, it is not entirely clear to me how that should be done. One way would be to do:

da.groupby(...).weighted(weights).mean()

This would require applying the groupby operation to the weights as well (how would this be done?).
Or should it be

da.weighted(weights).groupby(...).mean()

but this seems less intuitive to me.

Or

da.groupby(..., weights=weights).mean()
@shoyer
Member

shoyer commented Apr 9, 2020

I think da.groupby(...).weighted(weights).mean() would be pretty sensible.

We could probably make both da.groupby(...).weighted(weights) and da.weighted(weights).groupby(...) return exactly equivalent objects, if desired.
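For what it's worth, the first spelling can already be emulated with today's API by applying `.weighted()` inside `groupby(...).map(...)`. A sketch with made-up toy data (the `w` coordinate name and the values here are illustrative, not an existing API):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    [1.0, 2.0, 3.0, 4.0],
    dims="time",
    coords={"season": ("time", ["DJF", "DJF", "JJA", "JJA"])},
)
# attach the weights as a coordinate so each group carries its matching slice
da = da.assign_coords(w=("time", [1.0, 3.0, 1.0, 1.0]))

# weighted mean within each group
result = da.groupby("season").map(lambda g: g.weighted(g.w).mean("time"))
# DJF: (1*1 + 2*3) / (1 + 3) = 1.75, JJA: (3*1 + 4*1) / (1 + 1) = 3.5
```

A built-in `da.groupby(...).weighted(weights).mean()` would presumably do this without the user having to thread the weights through as a coordinate.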

@dcherian
Contributor

dcherian commented Jul 8, 2020

@ahuang11 in #4210:

I want to do something similar to xesmf's weighted regridding, but without the need to install esmpy, which has a lot of dependencies.

Are variations of the following possible?

ds.weighted(coslat_weights).coarsen(lat=2, lon=2).mean()
ds.coarsen(lat=2, lon=2).weighted(coslat_weights).mean()

@dcherian dcherian changed the title weighted and groupby compose weighted with groupby, coarsen, rolling etc. Jul 8, 2020
@mathause
Collaborator Author

mathause commented Jul 9, 2020

That's currently not possible. What you can try is the following:

(ds * coslat_weights).coarsen(lat=2, lon=2).sum() / coslat_weights.coarsen(lat=2, lon=2).sum()

but this only works if you don't have any NaNs.

@roxyboy

roxyboy commented Jul 9, 2020

@mathause Does the skipna=True flag not work in your last example?

@mathause
Collaborator Author

mathause commented Jul 9, 2020

No, that won't work. You need to mask the weights where the data is NaN. An untested and not very efficient way might be:

def coarsen_weighted_mean(da, weights, dims, skipna=None, boundary="exact"):

    # numerator: coarsened sum of the weighted data
    weighted_sum = (da * weights).coarsen(dims, boundary=boundary).sum(skipna=skipna)

    # denominator: sum of the weights, masked where the data is NaN
    masked_weights = weights.where(da.notnull())
    sum_of_weights = masked_weights.coarsen(dims, boundary=boundary).sum()

    # avoid division by zero where a window contains no valid data
    valid_weights = sum_of_weights != 0
    sum_of_weights = sum_of_weights.where(valid_weights)

    return weighted_sum / sum_of_weights

An example (without NaNs though):

import xarray as xr
import numpy as np

air = xr.tutorial.open_dataset("air_temperature").air
weights = np.cos(np.deg2rad(air.lat))

# give the weights their own name so they don't clash with the "lat" coordinate
weights.name = "weights"

c_w_m = coarsen_weighted_mean(air, weights, dict(lat=2), boundary="trim")

# to compare do it for one slice
alt = air.isel(lat=slice(0, 2)).weighted(weights).mean("lat") 

# compare if it is the same
xr.testing.assert_allclose(c_w_m.isel(lat=0, drop=True), alt)
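To spell out the skipna point above, here is a tiny plain-numpy sketch (toy values made up for illustration): a skipna-style sum drops the NaN from the numerator, but the NaN cell's weight still inflates the denominator unless the weights are masked too.

```python
import numpy as np

data = np.array([1.0, np.nan])
weights = np.array([1.0, 3.0])

# skipna-style sum: NaN is dropped from the numerator...
# ...but the denominator still counts the NaN cell's weight: 1.0 / 4.0 = 0.25
naive = np.nansum(data * weights) / weights.sum()

# masking the weights where the data is NaN gives the expected mean: 1.0 / 1.0 = 1.0
masked = np.nansum(data * weights) / weights[~np.isnan(data)].sum()
```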

@dcherian dcherian changed the title compose weighted with groupby, coarsen, rolling etc. compose weighted with groupby, coarsen, resample, rolling etc. Feb 24, 2021
@kuchaale

kuchaale commented Jul 6, 2021

Would

da.weighted(weights).integrate(...)

be in scope as well?
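If it were, one plausible semantics (an assumption on my part, not anything xarray defines) would be to integrate the weighted product and normalise by the integrated weights:

```python
import numpy as np
import xarray as xr

x = np.linspace(0.0, 1.0, 5)
da = xr.DataArray(np.ones(5), dims="x", coords={"x": x})
weights = xr.DataArray([1.0, 2.0, 3.0, 2.0, 1.0], dims="x", coords={"x": x})

# normalised weighted integral; for constant da == 1 this recovers 1.0
weighted_integral = (da * weights).integrate("x") / weights.integrate("x")
```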

@jbusecke
Contributor

I am interested in the coarsen with weights scenario that @dcherian and @mathause described here for a current project of ours.

I solved the issue manually and it's not that hard:

import xarray as xr
import numpy as np

# example data with some NaNs
data = np.arange(16).reshape(4, 4).astype(float)
data[2, 2] = np.nan
data[1, 1] = np.nan

# create some simple weights
weights = np.repeat(np.array([[1, 2, 1, 3]]).T, 4, axis=1)

da = xr.DataArray(data, dims=["x", "y"], coords={"w": (["x", "y"], weights)})

masked_weights = da.w.where(~np.isnan(da))  # .weighted() already knows how to do this
da_weighted = da * masked_weights
da_coarse = da_weighted.coarsen(x=2, y=2).sum() / masked_weights.coarsen(x=2, y=2).sum()

but I feel all of this duplicates existing functionality (e.g. the masking of weights based on NaNs in the data) and might be sensibly streamlined into something like:

da.weighted(da.w).coarsen(...).mean()

at least from a user perspective (there might be unique challenges with the implementation that I am overlooking here).

Happy to help but would definitely need some guidance on this one.

I do believe that this would provide a very useful functionality for many folks who work with curvilinear grids and want to prototype things that depend on some sort of scale reduction (coarsening).

Also cc'ing @TomNicholas who is involved in the same project 🤗
