Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

preserve chunked data when creating DataArray from DataArray #5984

Merged
merged 8 commits into from Jan 13, 2022
Merged

preserve chunked data when creating DataArray from DataArray #5984

merged 8 commits into from Jan 13, 2022

Conversation

FabianHofmann
Copy link
Contributor

…ata`

test: add test for preserving chunks DataArray from DataArray creation
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
Copy link
Collaborator

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I can merge in a few days if there are no objections.

You could also add a short note to whats-new - probably under bug fixes.

edit: the test failure is unrelated.

@mathause
Copy link
Collaborator

@dcherian you had an opinion on this in #5983 - would you rather raise a ValueError here and advise to pass da.data?

@dcherian
Copy link
Contributor

How about we raise DeprecationWarning for now asking the user to pass in .data instead.

@mathause
Copy link
Collaborator

How about we raise DeprecationWarning for now asking the user to pass in .data instead.

Makes sense - how about:

msg = (
    "Passing a `DataArray` object to the `DataArray` constructor is ambiguous.\n"
    "Pass `da.data`, or use `da.copy(deep=True)`, or one of the `xarray.*_like` (e.g. "
    "`xarray.zeros_like`) functions instead,\n"
    "depending on your use case."
)
warnings.warn(msg, FutureWarning)

@max-sixty
Copy link
Collaborator

How about we raise DeprecationWarning for now asking the user to pass in .data instead.

Does this mean xr.DataArray(da) would be deprecated? That seems like a very common case. Sorry for joining late but could I ask why this is necessary, if my assumption is right? I had been thinking that is broadly equivalent to a copy.

@keewis
Copy link
Collaborator

keewis commented Nov 25, 2021

for reference, the situation we already deprecated and removed was passing (["x", "y"], da) to as_variable (for example, when passed to the coords parameter of DataArray), which would create a new variable with new dimension names, but drop the old attrs and the name / coords and it would convert var to numpy, which could trigger a computation if the data was a dask array (or raise / emit a warning for other array types).

In contrast to that, I think we can keep xr.DataArray(da) working, but only if it's the only argument (i.e. I don't think we should allow xr.DataArray(da, dims=("x", "y")) or something similar – those might still be useful, but I'd make them separate methods instead of overloading the constructor).

However, this might make the constructor a bit more difficult to understand so if we don't think that's worth the (minor) inconvenience:

da if isinstance(da, xr.DataArray) else xr.DataArray(da)

we can go ahead with the deprecation.

@max-sixty
Copy link
Collaborator

In contrast to that, I think we can keep xr.DataArray(da) working, but only if it's the only argument (i.e. I don't think we should allow xr.DataArray(da, dims=("x", "y")) or something similar – those might still be useful, but I'd make them separate methods instead of overloading the constructor).

Perfect, very much agree, thank you for the clear description @keewis !

@mathause
Copy link
Collaborator

mathause commented Dec 1, 2021

Any objections to merging this as is? If we don't disallow xr.DataArray(da) this is a clear advantage over the current version.

However,

xr.DataArray(da).data is da.data # True
xr.DataArray(da.data).data is da.data # True
# while
da.copy().data is da.data # False (deep=True is the default)

might just be surprising in some situations.

(Just for completeness - xr.DataArray(da, **kwargs) does a good job at erroring if the kwargs are not compatible with da.)

@dcherian
Copy link
Contributor

dcherian commented Dec 1, 2021

(Just for completeness - xr.DataArray(da, **kwargs) does a good job at erroring if the kwargs are not compatible with da.)

Sounds good to me then!

@Illviljan Illviljan added the plan to merge Final call for comments label Dec 29, 2021
@Illviljan Illviljan merged commit 0259063 into pydata:main Jan 13, 2022
@Illviljan
Copy link
Contributor

Thank you, @FabianHofmann !

@FabianHofmann
Copy link
Contributor Author

It was a pleasure!

jjpr-mit added a commit to brain-score/brainio that referenced this pull request Oct 24, 2022
…0. 0.21.0 drops support for python 3.7. 0.20.2 has the bug. This commit tests if 0.20.1 does, too.

issue:  [preserve chunked data when creating DataArray from itself #5983](pydata/xarray#5983)
corresponding pull request:  [preserve chunked data when creating DataArray from DataArray #5984](pydata/xarray#5984)
released in 0.21.0:  [https://docs.xarray.dev/en/stable/whats-new.html#id81]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

preserve chunked data when creating DataArray from itself
6 participants