Add method to format xarray.DataArray as strings for issue #5985 #7628
…into round-stringify
for more information, see https://pre-commit.ci
@TomNicholas Can you review this for me?
Wouldn't it be more useful to have something like Or maybe |
By the way, this works (but is probably slow?):
def tostr(x):
    return f"{x:.2f}"
tostr_v = np.vectorize(tostr)
da = xr.DataArray([1, 2, 3.456], dims="x")
tostr_v(da)
@TomNicholas & @headtr1ck, can you review this so I know what to change? I think it's failing just one test.
Again, I think that this implementation is too specific and that the issue wanted a more general string formatting option. Also, such a method is better kept as part of the StrAccessor.
I also feel that the suggestion here is a bit too specific, so before discussing the implementation (and fixing tests) I think we should figure out a good API. Below I'm suggesting .str.format because that's what the original issue proposed, but really the only requirement I have is that the API feels natural and does not do too much at once.
The original issue also suggested
xr.DataArray("%.3f").str % arr
which would allow having an array of formats, but I feel that's a bit too complicated for now (or in general, not sure).
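To make the "array of formats" idea concrete, here is a minimal NumPy-only sketch. It is not the xarray API under discussion; it assumes numpy.char.mod as the underlying primitive, and the format strings and values are made up for illustration.

```python
import numpy as np

# Hypothetical data: one printf-style format string per element.
formats = np.array(["%.1f", "%.3f"])
values = np.array([1.23456, 1.23456])

# numpy.char.mod evaluates `format % value` element-wise, so each value
# is rendered with its own format string.
formatted = np.char.mod(formats, values)
print(formatted.tolist())
```

A single scalar format string broadcasts against the values in the same way, which is the simpler case most of this thread is concerned with.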
doc/warnings.log
Outdated
I think this has been added by accident. Could you remove that file?
xarray/core/dataarray.py
Outdated
@@ -6670,6 +6670,68 @@ def resample(
            **indexer_kwargs,
        )

    def roundStringify(self, n: int) -> DataArray:
We use snake_case for our functions, so it should be round_stringify. However, before you change that: I agree with @headtr1ck that the method combines two different things (rounding plus string interpolation).
As an API, I would probably prefer something like this:
(
    arr.round(2)  # round to two decimal digits
    .str.format("%.03f")  # display three decimals
)
where .str.format can be implemented in an efficient manner using numpy.char.mod.
The advantage of that is that it is a lot more versatile, for example:
ds = xr.Dataset({"a": ("x", ["a", "b", "c"]), "b": ("x", [0.2756, 6.7279, 3.7832])})
ds.a.str.format("%s: ").str + ds.b.str.format("%.03f")
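The numpy.char.mod building block mentioned above can be tried on plain NumPy arrays. The sketch below is only an illustration of that primitive, not the proposed .str.format method; it reuses the values from the Dataset example and assumes numpy.char.add for the concatenation step.

```python
import numpy as np

labels = np.array(["a", "b", "c"])
values = np.array([0.2756, 6.7279, 3.7832])

# Apply one printf-style format to every element of each array.
prefixes = np.char.mod("%s: ", labels)
numbers = np.char.mod("%.3f", values)

# Concatenate the two string arrays element-wise.
combined = np.char.add(prefixes, numbers)
print(combined.tolist())
```

Because both steps operate on whole arrays, no Python-level loop over the elements has to be written by hand.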
@keewis .str.format already exists and behaves like Python's str.format method (which is also different from the builtin format() function).
Good find.
This method works the other way around from the proposed solution, but it should work perfectly.
Untested:
fmt = xr.DataArray("{:.2f}")
da = xr.DataArray([1, 2, 3.456], dims="x")
fmt.str.format(da)
The issue with that is that it will "stringify" the values before passing them to str.format, so the only format you can pass is :s. So I guess that is the only thing that needs to change?
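The effect of stringifying values before formatting can be reproduced with plain Python strings. This is a minimal sketch of the general behaviour of str.format and format(), not of xarray's internals; the value 3.456 is taken from the example above.

```python
value = 3.456
fmt = "{:.2f}"

# Formatting the number directly applies the float format spec.
direct = fmt.format(value)

# The builtin format() takes a bare format spec instead of a template.
builtin = format(value, ".2f")

# Formatting an already-stringified value fails, because "f" is not a
# valid format code for str objects.
try:
    fmt.format(str(value))
    stringified_error = None
except ValueError as err:
    stringified_error = err

# A string format spec such as ":s" still works on the stringified value.
as_string = "{:s}".format(str(value))

print(direct, builtin, as_string, type(stringified_error).__name__)
```

So a formatter that converts its arguments to str first can only honour string format specs, which matches the limitation described above.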
xarray/tests/test_dataarray.py
Outdated
    def test_roundStringify_output_type(self, my_obj):
        result = my_obj.roundStringify(3)
        assert isinstance(result, xr.DataArray), "output should be a DataArray"
        assert isinstance(result, xr.DataArray), "output should be a DataArray"

    def test_roundStringify_attrs(self, my_obj):
        attrs = {"units": "meters"}
        my_obj.attrs = attrs
        new_da = my_obj.roundStringify(2)
        assert "units" in new_da.attrs.keys(), "expected attribute not found"
        assert new_da.attrs["units"] == attrs["units"], "attribute values not matched"

    def test_roundStringify_edge_case(self):
        # test case when all values are zero
        zeros = xr.DataArray(np.zeros((3,), dtype=float), dims=["x"])
        rounded_zeros = zeros.roundStringify(n=3)
        expected_result = xr.DataArray(["0", "0", "0"], dims=["x"])
        xr.testing.assert_allclose(rounded_zeros, expected_result)
We usually use an actual-expected pattern (if possible), combined with heavy use of pytest.mark.parametrize.
Edit: oh, sorry, I just saw that the last test already uses that pattern.
For example, the attrs check could become:
@pytest.mark.parametrize("keep_attrs", [True, False])
def test_str_format_attrs(self, keep_attrs):
    attrs = {"units": "meters"}
    arr = xr.DataArray(["a", "b", "c"], attrs=attrs)
    expected = arr.copy()
    if not keep_attrs:
        expected.attrs.clear()
    with xr.set_options(keep_attrs=keep_attrs):
        actual = arr.str.format("%s")
    xr.testing.assert_identical(actual, expected)
but of course, if we test .str.format, it should go into the module that contains the tests for the string accessor (xarray.tests.test_accessor_str, if I'm not mistaken).
Thank you very much. I'll look into the changes and recommendations right away
@headtr1ck @keewis can you check it now? I’ve modified it a bit.
Since da.str.format already exists, I am not sure how much benefit this function gives.
But apparently it is difficult to discover and use da.str.format, so maybe a simpler interface is useful.
Anyway, in the current state the implementation is still too specific.
xarray/core/accessor_str.py
Outdated
@@ -51,6 +51,7 @@

import numpy as np

import xarray as xr
No need to import this
xarray/core/accessor_str.py
Outdated
        array(['1.46e+05', '-3.25e+06', '0.00125'], dtype='<U8')
        Dimensions without coordinates: x
        """
        self._obj = self._obj.astype(float)
Do not change the stored object
This was to ensure all inputs were the same dtype before formatting in case of nan or inf values.
That's ok, but then store it in a temporary, local variable.
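A minimal sketch of that advice, using a hypothetical accessor class around a NumPy array rather than xarray's actual StringAccessor (the class name, method name, and data are assumptions for illustration): the cast result goes into a local variable, so the wrapped object keeps its original dtype.

```python
import numpy as np


class FormatAccessor:
    """Hypothetical accessor that wraps an array as ``_obj``."""

    def __init__(self, obj):
        self._obj = obj

    def format_float(self, fmt):
        # Cast into a local variable; self._obj is never reassigned.
        values = self._obj.astype(float)
        return np.char.mod(fmt, values)


ints = np.array([1, 2, 3])
accessor = FormatAccessor(ints)
formatted = accessor.format_float("%.1f")

# The wrapped array is still an integer array after the call.
print(formatted.tolist(), accessor._obj.dtype.kind)
```

Reassigning self._obj inside the method would instead change what every later call on the same accessor sees.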
xarray/core/accessor_str.py
Outdated
        Dimensions without coordinates: x
        """
        self._obj = self._obj.astype(float)
        return xr.DataArray(
Use type(self._obj)(...)
Actually, do you even need to wrap it into a DataArray? What does _apply return?
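The thread does not spell out the reason for preferring type(self._obj)(...). One plausible reading is that the result should have the same class as the wrapped object instead of a hard-coded one. The sketch below shows that pattern with hypothetical pure-Python classes (Labels and UpperAccessor are invented for illustration and are not part of xarray).

```python
class Labels(list):
    """Hypothetical container type, standing in for a DataArray subclass."""


class UpperAccessor:
    """Hypothetical accessor that wraps a container as ``_obj``."""

    def __init__(self, obj):
        self._obj = obj

    def upper(self):
        # Build the result with the wrapped object's own class instead of
        # hard-coding a specific container type.
        return type(self._obj)(s.upper() for s in self._obj)


result = UpperAccessor(Labels(["a", "b"])).upper()
print(type(result).__name__, result)
```

If the helper that computes the values already returns an object of the right type, as the follow-up question about _apply suggests might be the case, no re-wrapping is needed at all.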
@keewis it seems |
@keewis @TomNicholas @headtr1ck any comments? |
whats-new.rst
api.rst