Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update docs on view / copies #8744

Merged
merged 8 commits into from
Mar 25, 2024
Merged

Conversation

ks905383
Copy link
Contributor

- Add reference to numpy docs on view / copies in the corresponding section of the xarray docs, to help clarify pydata#8728 .
- Add note that `da.values()` returns a view in the header for `da.values()`.
Copy link

welcome bot commented Feb 13, 2024

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@@ -761,7 +761,7 @@ def data(self, value: Any) -> None:
@property
def values(self) -> np.ndarray:
"""
The array's data as a numpy.ndarray.
A view onto the DataArray's underlying data as a numpy.ndarray.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm no expert here.)

Is this correct? I would have thought that it's returning the ndarray. And then that can get used as a view in many situations, consistent with numpy's rules.

If my sense is correct, we could just comment that it's not copied, and so mutations to the array will also affect the dataarray?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification! How's this?

"The array's underlying numpy.ndarray. Note that this array is not copied; operations on it follow numpy's rules of what generates a view vs. a copy, and changes to this array will be reflected in the DataArray as well."

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect I think!

(though to confirm, I'm not confident that the initial thing isn't so accurate, fine to wait for others to comment if we prefer...)

Copy link
Collaborator

@keewis keewis Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what we're doing is roughly

return np.asarray(self._data)

(with slightly different behavior for 0d arrays with datetime64 or timedelta64 dtype)

This means that "The array's data converted to numpy.ndarray." would be a more accurate summary, but we could definitely emphasize the forwarding towards numpy.asarray / numpy.array a bit more (the current "If the array's data is not a numpy.ndarray this will attempt to convert it naively using np.array()" is wrong, numpy.asarray is always called).

We already have a note about non-castable arrays, so I'd probably append a warning about this not copying the data for numpy (I don't think we can or should cover all the edge cases of numpy.asarray here).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks.

So let's merge after this is done...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's this?
"The array's data converted to numpy.ndarray. Note that this array is not copied; operations on it follow numpy's rules of what generates a view vs. a copy, and changes to this array may be reflected in the DataArray as well."

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there's two line breaks between the first and second sentences I don't have any objections (the first line is supposed to be a brief summary). Just make sure to coordinate the stuff that's already there:

If the array's data is not a numpy.ndarray this will attempt to convert it naively using np.array(), which will raise an error if the array type does not support coercion like this (e.g. cupy).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good - from your comment above, should "convert it naively using np.array()" just be changed to "convert it naively using np.asarray()" then? Or should that clause just be struck

Copy link
Collaborator

@keewis keewis Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the difference between asarray and array (I think it's different defaults), but in any case the important part is that the function is always applied to the data, not just when it's not already a numpy array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh okay - makes sense. I think the tweaks in the latest push should address that

xarray/core/dataarray.py Outdated Show resolved Hide resolved
@ks905383
Copy link
Contributor Author

I think the failing check is something that came from the latest main branch update, so once that gets resolved, I'll merge this barring any other comments?

Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, this looks good to me now.

The failing tests are flaky (not sure why only on python=3.11, though), a rerun should fix that. I'll try once the MacOS run completed.

@ks905383
Copy link
Contributor Author

does anything else need to happen before it's cleared for merging btw?

@dcherian dcherian changed the title Clarify pydata#8728 in docs Update docs on view / copies Mar 25, 2024
@dcherian dcherian merged commit 2f34895 into pydata:main Mar 25, 2024
28 of 29 checks passed
Copy link

welcome bot commented Mar 25, 2024

Congratulations on completing your first pull request! Welcome to Xarray! We are proud of you, and hope to see you again! celebration gif

dcherian added a commit to dcherian/xarray that referenced this pull request Apr 2, 2024
* main: (26 commits)
  [pre-commit.ci] pre-commit autoupdate (pydata#8900)
  Bump the actions group with 1 update (pydata#8896)
  New empty whatsnew entry (pydata#8899)
  Update reference to 'Weighted quantile estimators' (pydata#8898)
  2024.03.0: Add whats-new (pydata#8891)
  Add typing to test_groupby.py (pydata#8890)
  Avoid in-place multiplication of a large value to an array with small integer dtype (pydata#8867)
  Check for aligned chunks when writing to existing variables (pydata#8459)
  Add dt.date to plottable types (pydata#8873)
  Optimize writes to existing Zarr stores. (pydata#8875)
  Allow multidimensional variable with same name as dim when constructing dataset via coords (pydata#8886)
  Don't allow overwriting indexes with region writes (pydata#8877)
  Migrate datatree.py module into xarray.core. (pydata#8789)
  warn and return bytes undecoded in case of UnicodeDecodeError in h5netcdf-backend (pydata#8874)
  groupby: Dispatch quantile to flox. (pydata#8720)
  Opt out of auto creating index variables (pydata#8711)
  Update docs on view / copies (pydata#8744)
  Handle .oindex and .vindex for the PandasMultiIndexingAdapter and PandasIndexingAdapter (pydata#8869)
  numpy 2.0 copy-keyword and trapz vs trapezoid (pydata#8865)
  upstream-dev CI: Fix interp and cumtrapz (pydata#8861)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants