cartesian product of coordinates and using it to index / fill empty dataset #1914
Comments
Let me give a bit of a background what I would like to do:
In principle |
I think that this shouldn't be too hard to 'get done', but xarray may not give you much help natively. (I'm not sure though, so take this as a hopefully helpful contribution rather than a definitive answer.) Specifically, can you do (2) by generating a product of the coords? Either using numpy, stacking, or some simple Python:
In [3]: list(product(*((data[x].values) for x in data.dims)))
Out[3]:
[(0.287706062977495, 0.065327131503921),
(0.287706062977495, 0.17398282388217068),
(0.287706062977495, 0.1455022501442349),
(0.42398126102299216, 0.065327131503921),
(0.42398126102299216, 0.17398282388217068),
(0.42398126102299216, 0.1455022501442349),
(0.13357153947234057, 0.065327131503921),
(0.13357153947234057, 0.17398282388217068),
(0.13357153947234057, 0.1455022501442349),
(0.42347765161572537, 0.065327131503921),
(0.42347765161572537, 0.17398282388217068),
(0.42347765161572537, 0.1455022501442349)]
then distribute those out to a cluster if you need, and then unstack them back into a dataset? |
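The "product, compute, unstack" idea above could be sketched roughly as follows. This is only an illustration: the `coords` values and the `evaluate` function are hypothetical, and the list comprehension stands in for whatever cluster map is actually used. It relies on `pandas.MultiIndex.from_tuples` and `xarray.DataArray.from_series` to turn the flat results back into a grid.

```python
from itertools import product

import pandas as pd
import xarray as xr

# Hypothetical coordinates and function, used only for illustration.
coords = {"x": [0.0, 0.5, 1.0], "y": [1, 2]}


def evaluate(x, y):
    return x * y


# Cartesian product of the coordinate values, one tuple per grid point.
points = list(product(*coords.values()))

# Stand-in for the cluster step: any map-like call could go here.
results = [evaluate(*p) for p in points]

# Label the flat results with a MultiIndex, then let xarray unstack them.
index = pd.MultiIndex.from_tuples(points, names=list(coords))
grid = xr.DataArray.from_series(pd.Series(results, index=index))
```

Because the labels travel with the results, the order in which the points come back from the workers should not matter, as long as each result is paired with its own tuple.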
For "get done" I had, for example, the following (similar to what I linked as my initial attempt):
import itertools as it
import numpy as np
import xarray as xr

coordinates = {
    'x': np.linspace(-1, 1),
    'y': np.linspace(0, 10),
}
constants = {
    'a': 1,
    'b': 5
}
# one dict of keyword arguments per point of the cartesian product
inps = [{**constants, **{k: v for k, v in zip(coordinates.keys(), x)}}
        for x in list(it.product(*coordinates.values()))]

def f(x, y, a, b):
    """Some dummy function."""
    v = a * x**2 + b * y**2
    return xr.DataArray(v, {'x': x, 'y': y, 'a': a, 'b': b})

# simulate computation on cluster
values = list(map(lambda s: f(**s), inps))
# gather and unstack the inputs
ds = xr.concat(values, dim='new', coords='all')
ds = ds.set_index(new=list(set(ds.coords) - set(ds.dims)))
ds = ds.unstack('new')
It is very close to what you suggest. My main question is if this can be done better. Mainly I am wondering if
inputs = cartesian_product(coordinates)  # list similar to ``inps`` above
values = [function(inp) for inp in inputs]  # or using ipyparallel map
xarray_data = ...  # some empty xarray object
for inp, val in zip(inputs, values):
    xarray_data[inp] = val
I asked how to generate the product of coordinates from an xarray object because I was expecting that I can create
Added comment
Having an empty, as filled with |
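One way the pseudocode above might be made concrete is to pre-allocate an all-NaN `DataArray` and assign through `.loc` with a dict of labels. This is a sketch under assumptions, not an established xarray recipe: the coordinate values are made up, `function` is a hypothetical stand-in, and the product is built with `itertools.product` rather than a dedicated xarray helper.

```python
from itertools import product

import numpy as np
import xarray as xr

# Hypothetical coordinates and function, used only for illustration.
coordinates = {"x": [-1.0, 0.0, 1.0], "y": [0, 5, 10]}


def function(x, y, a=1, b=5):
    return a * x**2 + b * y**2


# "Empty" container: all-NaN data with the coordinates already attached.
shape = tuple(len(v) for v in coordinates.values())
xarray_data = xr.DataArray(
    np.full(shape, np.nan), coords=coordinates, dims=list(coordinates)
)

# One {'x': ..., 'y': ...} dict per point of the cartesian product.
inputs = [dict(zip(coordinates, p)) for p in product(*coordinates.values())]
values = [function(**inp) for inp in inputs]  # or a parallel map

# Label-based assignment, one grid point at a time.
for inp, val in zip(inputs, values):
    xarray_data.loc[inp] = val
```

Assigning point by point is convenient when results arrive incrementally, but it is likely slower than filling a whole array at once for large grids.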
I am not sure if it is efficient to interact with a cluster, but I often use
In [1]: import xarray as xr
...: import numpy as np
...: data = xr.DataArray(np.full((3, 4), np.nan), dims=('x', 'y'),
...: coords={'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']})
...:
...: data
...:
Out[1]:
<xarray.DataArray (x: 3, y: 4)>
array([[ nan, nan, nan, nan],
[ nan, nan, nan, nan],
[ nan, nan, nan, nan]])
Coordinates:
* x (x) int64 0 1 2
* y (y) <U1 'a' 'b' 'c' 'd'
In [2]: data1 = data.stack(xy=['x', 'y'])
...: data1
...:
Out[2]:
<xarray.DataArray (xy: 12)>
array([ nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
Coordinates:
* xy (xy) MultiIndex
- x (xy) int64 0 0 0 0 1 1 1 1 2 2 2 2
  - y (xy) object 'a' 'b' 'c' 'd' 'a' 'b' 'c' 'd' 'a' 'b' 'c' 'd'
For the above example,
In [3]: data1[0]
Out[3]:
<xarray.DataArray ()>
array(np.nan)
Coordinates:
    xy object (0, 'a')
and we can assign a value for given coordinate values by
In [5]: # Assuming we found the result with (1, 'a') is 2.0
...: data1.loc[(1, 'a'), ] = 2.0
In [6]: data1
Out[6]:
<xarray.DataArray (xy: 12)>
array([ nan, nan, nan, nan, 2., nan, nan, nan, nan, nan, nan, nan])
Coordinates:
* xy (xy) MultiIndex
- x (xy) int64 0 0 0 0 1 1 1 1 2 2 2 2
  - y (xy) object 'a' 'b' 'c' 'd' 'a' 'b' 'c' 'd' 'a' 'b' 'c' 'd'
Note that we need to access via
EDIT: I modified my previous comment to take the partial assignment into account. |
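Building on the stacking idea above, the stacked index itself can serve as the cartesian product of the coordinates, and `unstack` restores the grid once the values are filled in. The following is a minimal sketch assuming a hypothetical `compute` function and small made-up coordinates; it fills all points at once instead of assigning them one by one.

```python
import numpy as np
import xarray as xr

# Empty (all-NaN) array that only carries the coordinates.
empty = xr.DataArray(
    np.full((3, 4), np.nan),
    dims=("x", "y"),
    coords={"x": [0, 1, 2], "y": [10, 20, 30, 40]},
)

# Stacking yields a MultiIndex whose entries are the (x, y) product.
stacked = empty.stack(xy=["x", "y"])
points = list(stacked.indexes["xy"])


def compute(x, y):
    """Hypothetical stand-in for the real computation."""
    return x + y / 100


# Evaluate every point (this is the part one would send to a cluster).
results = [compute(x, y) for x, y in points]

# Put the results back in stacked order and restore the 2-D layout.
filled = stacked.copy(data=np.asarray(results, dtype=float)).unstack("xy")
```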
After preparing list similar to
Thanks for your suggestion, I need to try a few things. I also want to try to extend it to a function that computes a few different things that could be multi-valued, e.g.
def dummy(x, y):
    ds = xr.Dataset(
        {'out1': ('n', [1*x, 2*x, 3*x]), 'out2': ('m', [x, y])},
        coords={'x': x, 'y': y, 'n': range(3), 'm': range(2)}
    )
    return ds
and then group together such outputs... OK, I know. I go from a simple problem to a much more complicated one, but isn't that usually the case? |
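For grouping such multi-valued outputs, one possible approach is to promote the scalar `x` and `y` coordinates to length-1 dimensions with `expand_dims` and let `xr.combine_by_coords` assemble the pieces. This is a sketch, not a recommendation from the maintainers; the `xs` and `ys` values are hypothetical, and the plain list comprehension stands in for a parallel map.

```python
from itertools import product

import xarray as xr


def dummy(x, y):
    """Return two differently shaped outputs for one (x, y) point."""
    return xr.Dataset(
        {"out1": ("n", [1 * x, 2 * x, 3 * x]), "out2": ("m", [x, y])},
        coords={"x": x, "y": y, "n": list(range(3)), "m": list(range(2))},
    )


# Hypothetical parameter values, used only for illustration.
xs, ys = [1, 2, 3], [10, 20]

# Promote the scalar x and y coordinates to length-1 dimensions so that
# every piece knows where it belongs on the (x, y) grid.
pieces = [dummy(x, y).expand_dims(["x", "y"]) for x, y in product(xs, ys)]

# Assemble the pieces along x and y using their coordinate labels.
combined = xr.combine_by_coords(pieces)
```

Each variable keeps its own extra dimension (`n` or `m`) and gains `x` and `y`, so the outputs can be selected with `combined["out1"].sel(x=2, y=20)` and similar calls.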
import xarray as xr
import numpy as np
data = xr.Dataset(coords={'x': np.linspace(-1, 1), 'y': np.linspace(0, 10), 'a': 1, 'b': 5})
def some_function(x, y):
    return float(x) * float(y)
xr.apply_ufunc(some_function, data['x'], data['y'], vectorize=True)
Results in:
You can even do this with dask arrays if you set That said, it does feel like there's some missing functionality here for the xarray equivalent of |
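As a rough check of what the `apply_ufunc` call in this comment returns, here is a smaller sketch of the same pattern. The grid sizes are chosen only for illustration; the point is that the two 1-D coordinates are broadcast against each other, so the result is a 2-D `DataArray` over `x` and `y`.

```python
import numpy as np
import xarray as xr

# Smaller grid than in the comment above, to keep the result easy to inspect.
data = xr.Dataset(
    coords={"x": np.linspace(-1, 1, 3), "y": np.linspace(0, 10, 5)}
)


def some_function(x, y):
    # Called once per (x, y) pair because of vectorize=True.
    return float(x) * float(y)


# x and y live on different dimensions, so they are broadcast to a grid.
result = xr.apply_ufunc(some_function, data["x"], data["y"], vectorize=True)
```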
This issue has brought up a lot of the same issues: #1773 Clearly, we need better documentation here at the very least. |
@shoyer Thanks for your suggestions and for linking the other issue. I think this one can also be labelled as a "usage question". |
This StackOverflow question is related to this "issue". |
xyzpy (by @jcmgray) looks like it might be a nice way to solve this problem, e.g., see http://xyzpy.readthedocs.io/en/latest/examples/complex%20output%20example.html |
Indeed, this is exactly the kind of situation I wrote
import numpy as np
import xyzpy as xyz

def some_function(x, y, z):
    return x * np.random.randn(3, 4) + y / z

# Define how to label the function's output
runner_opts = {
    'fn': some_function,
    'var_names': ['output'],
    'var_dims': {'output': ['a', 'b']},
    'var_coords': {'a': [10, 20, 30]},
}
runner = xyz.Runner(**runner_opts)

# set the parameters we want to explore (combos <-> cartesian product)
combos = {
    'x': np.linspace(1, 2, 11),
    'y': np.linspace(2, 3, 21),
    'z': np.linspace(4, 5, 31),
}

# run them
runner.run_combos(combos)
Should produce:
And there are options for merging successive, disjoint sets of data ( There are also multiple ways to define functions inputs/outputs (the easiest of which is just to actually return a |
@jcmgray I must have missed your reply to this issue; I saw it just now. I love your code! I will definitely include xyzpy in my tools from now on ;-). |
Suggest we close, given xyzpy seems great at this |
xyzpy seems to be unmaintained at present - the last release was 3 years ago. Given how useful/common this functionality is, I would support pulling it into Xarray proper. I still find myself writing adhoc versions of this pretty regularly (usually involving expand_dims and concat). |
Yeah, it was late 2021. https://pypi.org/project/xyzpy/#history. I had looked at the commit history, which seems reasonably active this year, maybe a release is forthcoming... That said, reopening! |
It does actually have commits pretty recently: https://github.com/jcmgray/xyzpy/commits/main/ |
|
For a given empty dataset with only coordinates, I'd like to iterate over the product of coordinates, in a similar way as it can be done for numpy.arrays, to fill the data with values of some function. Also, I'd like to extend this to the cases of functions that are multi-valued, i.e. they return a numpy.array. Is there an easy way to do so? I was unable to find anything similar in the docs.