Update docs to include how to Join using a non-index coord #2460

Open
max-sixty opened this issue Oct 3, 2018 · 3 comments

@max-sixty
Collaborator

I originally posted this on SO, as I thought it was a user question rather than a library issue. But after working on it more today, I'm not so sure.

I'm trying to do a 'join' in xarray, but using a non-index coordinate rather than a shared dim.

I have a Dataset with a data variable 'a' along dimension 'x' and a non-index coord 'b' on that same dimension, and a DataArray indexed on 'b':

In [17]: ds=xr.Dataset(dict(a=(('x'),np.random.rand(10))), coords=dict(b=(('x'),list(range(10)))))

In [18]: ds
Out[18]:
<xarray.Dataset>
Dimensions:  (x: 10)
Coordinates:
    b        (x) int64 0 1 2 3 4 5 6 7 8 9
Dimensions without coordinates: x
Data variables:
    a        (x) float64 0.3634 0.2132 0.6945 0.5359 0.1053 0.07045 0.5945 ...

In [19]: da=xr.DataArray(np.random.rand(10), dims=('b',), coords=dict(b=(('b'),list(range(10)))))

In [20]: da
Out[20]:
<xarray.DataArray (b: 10)>
array([0.796987, 0.275992, 0.747882, 0.240374, 0.435143, 0.285271, 0.753582,
       0.556038, 0.365889, 0.434844])
Coordinates:
  * b        (b) int64 0 1 2 3 4 5 6 7 8 9

Can I add da onto my dataset, by joining on ds.b equalling da.b? The result would be:

<xarray.Dataset>
Dimensions:  (x: 10)
Coordinates:
    b        (x) int64 0 1 2 3 4 5 6 7 8 9
Dimensions without coordinates: x
Data variables:
    a        (x) float64 0.3634 0.2132 0.6945 0.5359 0.1053 0.07045 0.5945 ...
    da       (x) float64 0.796987, 0.275992, 0.747882, 0.240374, 0.435143 ...

(For completeness: the data isn't currently in the correct position.)
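
For anyone landing here, a minimal sketch of the join described above, assuming the `.sel`-with-a-DataArray-indexer approach from the SO answer. The values are made up for illustration, and `b` is shuffled on purpose so that a purely positional match would give the wrong result:

```python
import numpy as np
import xarray as xr

# Dataset with a data variable 'a' on dimension 'x' and a NON-index coord 'b'.
# (Illustrative values; 'b' is shuffled so positions and labels disagree.)
ds = xr.Dataset(
    {"a": ("x", np.array([0.1, 0.2, 0.3, 0.4]))},
    coords={"b": ("x", [3, 1, 2, 0])},
)

# DataArray indexed on 'b'. Here 'b' is a dimension coordinate, so it has an index.
da = xr.DataArray(
    np.array([10.0, 20.0, 30.0, 40.0]),
    dims=("b",),
    coords={"b": [0, 1, 2, 3]},
)

# Look up every value of ds.b in da's 'b' index. Because the indexer ds.b
# lives on dimension 'x', the result is laid out along 'x' too.
joined = da.sel(b=ds.b)

# Attach the looked-up values to the dataset as a new data variable.
ds["da"] = joined
print(ds)
```

The lookup runs on `da` (the object whose `b` is indexed), with `ds.b` supplying the labels; the result then already has the dataset's `x` dimension and can be assigned directly.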

@max-sixty
Collaborator Author

There's some advanced indexing that I didn't know about, as highlighted in the SO answer.

If anyone else has been confused on this, lmk and I'll add a section to the docs on how .sel can be specifically used for joins.

@dcherian
Contributor

dcherian commented Oct 6, 2018

"add a section to the docs on how .sel can be specifically used for joins."

@max-sixty I think this would be great

@max-sixty
Collaborator Author

max-sixty commented Oct 7, 2018

I'm writing something up for the docs. But I'm spinning once again.
Did that example only work because the coords were ints that are being used as index positions?
Edit: this approach works with a superset of labels, but not a subset

For example:

In [48]:     da = xr.DataArray(np.arange(4), dims=['y'],
    ...:                       coords={'y': list('abcd')})

In [49]: da
Out[49]:
<xarray.DataArray (y: 4)>
array([0, 1, 2, 3])
Coordinates:
  * y        (y) <U1 'a' 'b' 'c' 'd'


In [45]: da2 = xr.DataArray(np.arange(6), dims=['a'], coords=dict(y=(('a'),list('fedcba'))))

In [46]: da2
Out[46]:
<xarray.DataArray (a: 6)>
array([0, 1, 2, 3, 4, 5])
Coordinates:
    y        (a) <U1 'f' 'e' 'd' 'c' 'b' 'a'
Dimensions without coordinates: a


In [50]: da2.sel(a=da.y)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-50-9514c3c27169> in <module>
----> 1 da2.sel(a=da.y)

/usr/local/lib/python3.7/site-packages/xarray/core/dataarray.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
    845         ds = self._to_temp_dataset().sel(
    846             indexers=indexers, drop=drop, method=method, tolerance=tolerance,
--> 847             **indexers_kwargs)
    848         return self._from_temp_dataset(ds)
    849

/usr/local/lib/python3.7/site-packages/xarray/core/dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   1620         pos_indexers, new_indexes = remap_label_indexers(
   1621             self, indexers=indexers, method=method, tolerance=tolerance)
-> 1622         result = self.isel(indexers=pos_indexers, drop=drop)
   1623         return result._replace_indexes(new_indexes)
   1624

/usr/local/lib/python3.7/site-packages/xarray/core/dataset.py in isel(self, indexers, drop, **indexers_kwargs)
   1537         for name, var in iteritems(self._variables):
   1538             var_indexers = {k: v for k, v in indexers_list if k in var.dims}
-> 1539             new_var = var.isel(indexers=var_indexers)
   1540             if not (drop and name in var_indexers):
   1541                 variables[name] = new_var

/usr/local/lib/python3.7/site-packages/xarray/core/variable.py in isel(self, indexers, drop, **indexers_kwargs)
    905             if dim in indexers:
    906                 key[i] = indexers[dim]
--> 907         return self[tuple(key)]
    908
    909     def squeeze(self, dim=None):

/usr/local/lib/python3.7/site-packages/xarray/core/variable.py in __getitem__(self, key)
    614         array `x.values` directly.
    615         """
--> 616         dims, indexer, new_order = self._broadcast_indexes(key)
    617         data = as_indexable(self._data)[indexer]
    618         if new_order:

/usr/local/lib/python3.7/site-packages/xarray/core/variable.py in _broadcast_indexes(self, key)
    485                 dims.append(d)
    486         if len(set(dims)) == len(dims):
--> 487             return self._broadcast_indexes_outer(key)
    488
    489         return self._broadcast_indexes_vectorized(key)

/usr/local/lib/python3.7/site-packages/xarray/core/variable.py in _broadcast_indexes_outer(self, key)
    537             new_key.append(k)
    538
--> 539         return dims, OuterIndexer(tuple(new_key)), None
    540
    541     def _nonzero(self):

/usr/local/lib/python3.7/site-packages/xarray/core/indexing.py in __init__(self, key)
    376                 if not np.issubdtype(k.dtype, np.integer):
    377                     raise TypeError('invalid indexer array, does not have '
--> 378                                     'integer dtype: {!r}'.format(k))
    379                 if k.ndim != 1:
    380                     raise TypeError('invalid indexer array for {}, must have '

TypeError: invalid indexer array, does not have integer dtype: array(['a', 'b', 'c', 'd'], dtype='<U1')
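
One reading of the traceback: `a` is a dimension without a coordinate on `da2`, so `.sel(a=...)` has no index to look labels up in and falls through to positional indexing, which rejects the string labels. A possible workaround, sketched below under that assumption, is to promote `y` to the dimension coordinate with `swap_dims` first, and to use `reindex` when some requested labels may be missing (the label `'z'` is made up for illustration):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4), dims=["y"], coords={"y": list("abcd")})
da2 = xr.DataArray(
    np.arange(6), dims=["a"], coords={"y": ("a", list("fedcba"))}
)

# Promote the non-index coord 'y' to the dimension coordinate, so that
# label-based lookup on 'y' has an index to work with.
da2_by_y = da2.swap_dims({"a": "y"})

# Label-based selection now accepts the string labels from da.y.
picked = da2_by_y.sel(y=da.y)
print(picked.values)

# If some requested labels might be absent, reindex fills those with NaN
# rather than failing ('z' is a hypothetical missing label).
padded = da2_by_y.reindex(y=["a", "b", "z"])
print(padded.values)
```

Note that `reindex` needs the labels in `y` to be unique, and the integer data is upcast to float so that the missing entry can hold NaN.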

@max-sixty max-sixty reopened this Oct 7, 2018
@max-sixty max-sixty changed the title Join using a non-index coord Update docs to include how to Join using a non-index coord Oct 8, 2018
@max-sixty max-sixty self-assigned this Oct 8, 2018