New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(feat): Support for pandas
ExtensionArray
#8723
Conversation
…all back to numpy
Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. |
This looks like a big feature addition! I'm very ignorant of |
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
What's the status here? |
Sorry, this fell off my radar. Can you open an issue regarding the |
for more information, see https://pre-commit.ci
* main: (feat): Support for `pandas` `ExtensionArray` (pydata#8723) Migrate datatree mapping.py (pydata#8948) Add mypy to dev dependencies (pydata#8947) Convert 360_day calendars by choosing random dates to drop or add (pydata#8603)
elif array_type_cupy := array_type("cupy") and any( # noqa: F841 | ||
isinstance(x, array_type_cupy) for x in scalars_or_arrays # noqa: F821 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this kind of syntax allowed?
I suspect the CI didn't run this code:
$ pytest xarray/tests/test_array_api.py -x
================================= test session starts =================================
platform linux -- Python 3.10.14, pytest-8.2.0, pluggy-1.5.0
rootdir: /home/mark/git/xarray
configfile: pyproject.toml
collected 13 items
xarray/tests/test_array_api.py .F
====================================== FAILURES =======================================
__________________________________ test_aggregation ___________________________________
arrays = (<xarray.DataArray (x: 2, y: 3)> Size: 48B
array([[ 1., 2., 3.],
[ 4., 5., nan]])
Coordinates:
* x ... [ 4., 5., nan]], dtype=float64)
Coordinates:
* x (x) int64 16B 10 20
Dimensions without coordinates: y)
def test_aggregation(arrays: tuple[xr.DataArray, xr.DataArray]) -> None:
np_arr, xp_arr = arrays
> expected = np_arr.sum()
/home/mark/git/xarray/xarray/tests/test_array_api.py:51:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/mark/git/xarray/xarray/core/_aggregations.py:1857: in sum
return self.reduce(
/home/mark/git/xarray/xarray/core/dataarray.py:3805: in reduce
var = self.variable.reduce(func, dim, axis, keep_attrs, keepdims, **kwargs)
/home/mark/git/xarray/xarray/core/variable.py:1663: in reduce
result = super().reduce(
/home/mark/git/xarray/xarray/namedarray/core.py:889: in reduce
data = func(self.data, **kwargs)
/home/mark/git/xarray/xarray/core/duck_array_ops.py:427: in f
return func(values, axis=axis, **kwargs)
/home/mark/git/xarray/xarray/core/nanops.py:99: in nansum
result = sum_where(a, axis=axis, dtype=dtype, where=mask)
/home/mark/git/xarray/xarray/core/duck_array_ops.py:333: in sum_where
a = where_method(xp.zeros_like(data), where, data)
/home/mark/git/xarray/xarray/core/duck_array_ops.py:349: in where_method
return where(cond, data, other)
/home/mark/git/xarray/xarray/core/duck_array_ops.py:343: in where
return xp.where(condition, *as_shared_dtype([x, y], xp=xp))
/home/mark/git/xarray/xarray/core/duck_array_ops.py:236: in as_shared_dtype
elif array_type_cupy := array_type("cupy") and any( # noqa: F841
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.0 = <list_iterator object at 0x7e404174b6d0>
elif array_type_cupy := array_type("cupy") and any( # noqa: F841
> isinstance(x, array_type_cupy) for x in scalars_or_arrays # noqa: F821
):
E NameError: free variable 'array_type_cupy' referenced before assignment in enclosing scope
/home/mark/git/xarray/xarray/core/duck_array_ops.py:237: NameError
=============================== short test summary info ===============================
FAILED xarray/tests/test_array_api.py::test_aggregation - NameError: free variable 'array_type_cupy' referenced before assignment in enclosi...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================= 1 failed, 1 passed in 0.33s =============================
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hmaarrfk Interesting. I see you fixed this. I must have done this when testing because I do specifically remember testing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issues. It was a straightforward fix.
Some outstanding points/decisions brought up by this PR:
as_shared_dtypes
#8714)- [ ] Acceptance ofplum
as a dispatch method. Without it, the behavior should be fallen back on from before (cast tonumpy
types). I am a big fan of dispatching and think it could serve as a model going forward for making support of other data types/arrays more feasible. The other option, I think, would be to just use the underlyingarray
of theExtensionDuckArray
class to decide and then have some central registry that serves as the basis for a decorator (like the api for accessors via_CachedAccessor
). That being said, the current defaults are quite good so this is a marginal feature, in all likelihood.ExtensionArray
directly or can we also allowSeries
?Possible missing something else! Let me know!
Checklist:
whats-new.rst
api.rst