gridding data with groupby_bins in 2 dim #2488

Closed
HandmannP opened this issue Oct 15, 2018 · 5 comments

Comments

HandmannP commented Oct 15, 2018

Dear everybody,

I am just starting to get to know the xarray data structures and Python, so I am really still a beginner.
I am working with scattered data which needs to be brought onto a regular grid.
I found your function groupby_bins, but it only works in one dimension, and on GitHub I couldn't find anything about whether grouping in 2D is possible now or not.
It would be very helpful to get some more info about that.

Here is a code example with a small dataset:

```
geop1 =
<xarray.Dataset>
Dimensions:    (pos_compl: 44229)
Coordinates:
    lon        (pos_compl) float64 -29.8 -31.14 -32.65 ... -25.26 -16.4 -43.75
    lat        (pos_compl) float64 46.48 46.07 45.66 46.18 ... 45.34 61.06 53.19
    z          (pos_compl) float64 -3.205e+03 -3.197e+03 ... -3.758e+03
    time       (pos_compl) float64 7.299e+05 7.299e+05 ... 7.367e+05 7.367e+05
  * pos_compl  (pos_compl) complex128 (-29.805+46.485j) ... (-43.75400000000002+53.188j)
Data variables:
    geopot     (pos_compl) float64 9.363 7.93 8.218 8.621 ... 10.44 4.293 0.4243
```

```python
import numpy as np

# mat4 holds the lat/lon ranges of the grid (defined elsewhere, not shown here)

# groupby bins: edges for 0.25° in latitude
lat_bin = np.arange(mat4['lat_range'][0, 0] - 0.25/2, mat4['lat_range'][0, 1] + 0.25, 0.25)
# edges for 0.5° in longitude
lon_bin = np.arange(mat4['lon_range'][0, 0] - 0.5/2, mat4['lon_range'][0, 1] + 0.5, 0.5)

# define bin centers
# 0.25
lat_cent = np.arange(mat4['lat_range'][0, 0], mat4['lat_range'][0, 1] + 0.25, 0.25)
# 0.5
lon_cent = np.arange(mat4['lon_range'][0, 0], mat4['lon_range'][0, 1] + 0.5, 0.5)

# Now only these two options are possible: binning by lon or by lat, one at a time
geop_mean_lon = geop1.geopot.groupby_bins('lon', lon_bin, labels=lon_cent)
geop_mean_lat = geop1.geopot.groupby_bins('lat', lat_bin, labels=lat_cent)
```
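One detail worth checking in the bin setup: `groupby_bins` hands `bins` and `labels` on to `pandas.cut`, which needs exactly one label per bin, i.e. `len(labels) == len(bins) - 1`. Two independent `np.arange` calls with a float step can occasionally disagree by one element because of rounding. A minimal sketch that derives both edges and centers from one integer count; the helper name `edges_from_centers` and the example ranges are hypothetical, since the contents of `mat4` are not shown:

```python
import numpy as np


def edges_from_centers(start, stop, step):
    """Hypothetical helper: regular bin centers from start to stop (inclusive)
    and the matching edges, offset by half a step on either side."""
    # integer number of centers; rounding protects against float drift in the ratio
    n = int(round((stop - start) / step)) + 1
    centers = start + step * np.arange(n)
    edges = start - step / 2 + step * np.arange(n + 1)
    return edges, centers


# illustrative ranges only (assumed, not taken from mat4)
lat_bin, lat_cent = edges_from_centers(40.0, 65.0, 0.25)
lon_bin, lon_cent = edges_from_centers(-45.0, -15.0, 0.5)

# one label per bin, as groupby_bins(..., labels=...) requires
assert len(lat_cent) == len(lat_bin) - 1
assert len(lon_cent) == len(lon_bin) - 1
```

The resulting arrays can be used exactly like the `np.arange` versions, e.g. `geop1.geopot.groupby_bins('lat', lat_bin, labels=lat_cent)`.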

It would be really nice to have all the information in each grid box. Or is there some other recommended way to grid big datasets like this?
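For keeping all the information in each grid box, one possible approach (a plain NumPy sketch, not an xarray API) is to compute both bin indices at once with `np.digitize`, combine them into one integer id per box, and sort once to collect the point indices of every box. The function name `grid_members` is hypothetical; `right=True` makes the intervals closed on the right, `(left, right]`, which matches the `groupby_bins` default:

```python
import numpy as np


def grid_members(lat, lon, lat_bin, lon_bin):
    """Map (i_lat, i_lon) box indices to the indices of the points in that box.

    Bins are right-closed, (left, right]; points outside the edges are dropped
    and boxes without points do not appear in the result.
    """
    n_lat, n_lon = len(lat_bin) - 1, len(lon_bin) - 1
    # 0-based box index along each axis (-1 or n means outside the grid)
    i_lat = np.digitize(lat, lat_bin, right=True) - 1
    i_lon = np.digitize(lon, lon_bin, right=True) - 1
    inside = (i_lat >= 0) & (i_lat < n_lat) & (i_lon >= 0) & (i_lon < n_lon)
    point_idx = np.nonzero(inside)[0]
    # one integer id per box, row-major over (lat, lon)
    cell = i_lat[inside] * n_lon + i_lon[inside]
    # sort once by box id, then split wherever the id changes
    order = np.argsort(cell, kind="stable")
    ids, starts = np.unique(cell[order], return_index=True)
    groups = np.split(point_idx[order], starts[1:])
    return {(int(c // n_lon), int(c % n_lon)): g for c, g in zip(ids, groups)}
```

Each index array can then be passed to `geop1.isel(pos_compl=...)` to get the points of that box together with all their coordinates and metadata.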

Thank you for your help!

HandmannP (author) commented Oct 16, 2018

I wrote a workaround for my purpose, but I guess it could still be faster...

HandmannP (author) commented:

```python
%%time
import numpy as np
import xarray as xr


def group_lat(x):
    # x is the Dataset holding the points of one longitude bin
    # find the longitude value of this box, to be used in the dictionary key
    value = np.ones(1)
    value[0] = x.lon.mean()
    idx = (np.abs(lon_cent - value)).argmin()
    lokey = lon_cent[idx]  # longitude value of the box

    # compute groups for the latitude
    y = dict(x.groupby_bins('lat', lat_bin, labels=lat_cent))
    # replace the old key (lat) with the new key (lon, lat)
    key = np.asarray(list(y.keys()))  # old keys as an array
    newkey = np.stack((np.ones(len(key)) * lokey, key), axis=1)
    newkey = tuple(newkey.tolist())
    key = tuple(y.keys())  # old keys as a tuple

    for i in range(len(key)):
        y[tuple(newkey[i])] = y[key[i]]
        del y[key[i]]
    return y


#geop_mean = geop1.groupby_bins('lon', lon_bin, labels=lon_cent).apply(group_lat)
geop_mean = dict(geop1.groupby_bins('lon', lon_bin, labels=lon_cent))

# group into lat boxes
geo_grid = dict()
for x in geop_mean:
    geo_grid.update(group_lat(geop_mean[x]))

# Now the data is sorted into boxes and still contains all metadata

# Now get the mean values for each box
d = np.asarray(list(geo_grid.keys()))

# initialise with NaN so that boxes without any data stay missing
shape = (lat_cent.shape[0], lon_cent.shape[0])
gp = xr.Dataset({'geopot': (['lat', 'lon'], np.full(shape, np.nan)),
                 'z': (['lat', 'lon'], np.full(shape, np.nan))},
                coords={'lon': (['lon'], lon_cent),
                        'lat': (['lat'], lat_cent)})

for k in range(d.shape[0]):
    e = tuple(d[k])
    gp['geopot'].loc[dict(lat=d[k][1], lon=d[k][0])] = geo_grid[e].geopot.mean()
    gp['z'].loc[dict(lat=d[k][1], lon=d[k][0])] = geo_grid[e].z.mean()
```

HandmannP (author) commented:

I am open for suggestions to get the code running faster :D
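On speed: the nested `groupby_bins` calls and the Python loop over boxes can be replaced by a vectorized per-box mean. A sketch using a hypothetical helper `grid_mean`, assuming right-closed bins `(left, right]` as in the `groupby_bins` default. It sums the values and counts the points per box with `np.bincount`, so boxes without points come out as NaN. Unlike the workaround, it returns only the means, not the grouped metadata:

```python
import numpy as np


def grid_mean(lat, lon, values, lat_bin, lon_bin):
    """Mean of `values` in each (lat, lon) box, shape (n_lat, n_lon).

    Bins are right-closed, (left, right]; points outside the edges are
    ignored and boxes without points are NaN.
    """
    n_lat, n_lon = len(lat_bin) - 1, len(lon_bin) - 1
    # 0-based box index along each axis (-1 or n means outside the grid)
    i_lat = np.digitize(lat, lat_bin, right=True) - 1
    i_lon = np.digitize(lon, lon_bin, right=True) - 1
    inside = (i_lat >= 0) & (i_lat < n_lat) & (i_lon >= 0) & (i_lon < n_lon)
    # one integer id per box, row-major over (lat, lon)
    cell = i_lat[inside] * n_lon + i_lon[inside]
    vals = np.asarray(values, dtype=float)[inside]
    # per-box sum and point count, one pass each
    sums = np.bincount(cell, weights=vals, minlength=n_lat * n_lon)
    counts = np.bincount(cell, minlength=n_lat * n_lon)
    with np.errstate(invalid="ignore"):
        mean = sums / counts  # 0 / 0 gives NaN for empty boxes
    return mean.reshape(n_lat, n_lon)
```

The result can be wrapped as `xr.DataArray(result, coords={'lat': lat_cent, 'lon': lon_cent}, dims=('lat', 'lon'))`. Note that a single NaN in `values` makes its whole box NaN.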

stale bot commented Oct 4, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

stale bot added the stale label Oct 4, 2020
dcherian (contributor) commented Oct 4, 2020

This will be addressed as part of multi-variable groupby.

dcherian closed this as completed Oct 4, 2020