Lingering memory connections when extracting underlying `np.arrays` from datasets #8728

ks905383 · 2024-02-09T18:39:34Z

What is your issue?

I know that generally, ds2 = ds connects the two objects in memory, and changes in one will also cause changes in the other.

However, I generally assume that certain operations should break this connection, for example:

- extracting the underlying np.array from a dataset (changing its type and discarding much of the xarray-specific information: index, dimensions, etc.)
- using the underlying np.array in a new dataset

In other words, I would expect that using ds['var'].values would be similar to copy.deepcopy(ds['var'].values).

Here's an example that illustrates how in these cases, the objects are still linked in memory:

(apologies for the somewhat hokey example)

import xarray as xr
import numpy as np

# Create a dataset
ds = xr.Dataset(coords = {'lon':(['lon'],np.array([178.2,179.2,-179.8, -178.8,-177.8,-176.8]))})
print('\nds: ')
print(ds)

# Create a new dataset that uses the values of the first dataset
ds2 = xr.Dataset({'lon1':(['lon'],ds.lon.values)},
                  coords = {'lon':(['lon'],ds.lon.values)})
print('\nds2: ')
print(ds2)

# Change ds2's 'lon1' variable 
ds2['lon1'][ds2['lon1']<0] = 360 + ds2['lon1'][ds2['lon1']<0]

# `ds2` is changed as expected
print('\nds2 (should be modified): ')
print(ds2)

# `ds` is changed, which is *not* expected
print('\nds (should not be modified): ')
print(ds)

The question is - am I right (from a UX perspective) to expect these kinds of operations to disconnect the objects in memory? If so, I might try to update the docs to be a bit clearer on this. (or, alternatively, if these kinds of operations should disconnect the objects in memory, maybe it's better to have .values also call .copy(deep=True).values)

Appreciate y'all's thoughts on this!


dcherian · 2024-02-09T18:57:27Z

In general, you're expected to deep-copy explicitly to break these "links". This is the numpy paradigm.
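As a rough sketch of what "deep-copy explicitly" could look like for the example in the report: copy the extracted array with `.copy()` before modifying or reusing it. The name `lon_copy` is only illustrative and not from the original post; `ds.copy(deep=True)` is another way to get an independent dataset.

```python
import numpy as np
import xarray as xr

# Same longitudes as in the report above
ds = xr.Dataset(
    coords={"lon": (["lon"], np.array([178.2, 179.2, -179.8, -178.8, -177.8, -176.8]))}
)

# Explicit copy: lon_copy (illustrative name) owns its own buffer
lon_copy = ds["lon"].values.copy()

# Wrap negative longitudes into the 0-360 range, on the copy only
lon_copy[lon_copy < 0] += 360

# Build the second dataset from copies, so it cannot alias ds
ds2 = xr.Dataset(
    {"lon1": (["lon"], lon_copy)},
    coords={"lon": (["lon"], ds["lon"].values.copy())},
)

print(ds["lon"].values)    # still contains the negative longitudes
print(ds2["lon1"].values)  # wrapped values
```

Because the copy happens before any in-place assignment, the result does not depend on whether `.values` happens to return a view in a given xarray version.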

max-sixty · 2024-02-09T19:01:46Z

If you want to read up on this, look for "view vs copy"!
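For anyone landing here later, a small numpy-only sketch of the distinction (variable names are illustrative). `np.shares_memory` is a handy way to check whether two arrays are backed by the same buffer:

```python
import numpy as np

a = np.arange(6, dtype=float)

view = a[2:5]              # basic slicing returns a view onto a's buffer
fancy = a[[2, 3, 4]]       # integer-array ("fancy") indexing returns a copy
explicit = a[2:5].copy()   # an explicit copy is always independent

print(np.shares_memory(a, view))      # True
print(np.shares_memory(a, fancy))     # False
print(np.shares_memory(a, explicit))  # False

# Writing through `a` is visible in the view, but not in the copies
a[2] = 99.0
print(view[0], fancy[0], explicit[0])  # 99.0 2.0 2.0
```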

ks905383 · 2024-02-09T19:22:14Z

Yeah, I guess in this case from a legibility standpoint, the fact that .values 'changes' (from the user point of view) the form (and type) of the data from a DataArray to the underlying numpy array just feels different?

Like I wouldn't expect the following two operations:

a = np.ones(3)
b = a.astype(str)
a[0] = 5
print(b)

and

a = np.ones(3)
b = a
a[0] = 5
print(b)

to behave the same. But I do understand that from the backend perspective, .values seems to be more of the latter than the former, since it is just accessing something that's already there...

(relatedly, would it be worth it to link to the relevant numpy docs in this part of the xarray docs?)

ks905383 · 2024-02-09T19:52:38Z

A related issue is that this allows you to (possibly inadvertently) circumvent certain xarray safeguards, like the TypeError around not being able to modify IndexVariables:

import xarray as xr

# Create sample dataset
ds = xr.Dataset({'test':(['lon'],[5,6,7])},coords = {'lon':(('lon'),[0,1,2])})

# Raises TypeError, to avoid changing indices like this
try:
    ds['lon'][0] = 2
except TypeError as err:
    print(err)

# Now, extract the underlying numpy array
a = ds.lon.values

# Change value
a[0] = 2

# This changes `ds` without raising an error
print(ds)
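One possible user-side guard, shown here with plain numpy only: hand out a read-only view instead of the array itself. This is just a sketch of the numpy mechanism, not something xarray does for you, and `data` / `readonly` are illustrative names standing in for the array behind a dataset variable.

```python
import numpy as np

# Stand-in for the array backing a dataset variable (illustrative)
data = np.array([0, 1, 2])

# A view shares data's memory but carries its own flags
readonly = data.view()
readonly.flags.writeable = False

try:
    readonly[0] = 2
except ValueError as err:
    # numpy refuses in-place writes through a read-only view
    print("blocked:", err)

print(data)  # unchanged
```

The original array stays writable; only writes through the read-only view are rejected.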

max-sixty · 2024-02-09T19:54:02Z

> (relatedly, would it be worth it to link to the relevant numpy docs in this part of the xarray docs?)

Yes! That would be a welcome contribution.

> A related issue is that this allows you to (possibly inadvertently) circumvent certain xarray safeguards, like the TypeError around not being able to modify IndexVariables:

Yes. But I'm not sure there's much we can do about this. Our focus should be "if you use xarray operations, you won't get surprises"...

ks905383 · 2024-02-09T20:38:34Z

> Yes! That would be a welcome contribution.

Sounds good, I'll prep a PR

- Add reference to numpy docs on view / copies in the corresponding section of the xarray docs, to help clarify pydata#8728.
- Add note that `da.values()` returns a view in the header for `da.values()`.


kmuehlbauer · 2024-06-05T11:34:41Z

Resolved by #8744.

ks905383 added the needs triage label Feb 9, 2024

ks905383 mentioned this issue Feb 13, 2024

Update docs on view / copies #8744

Merged

max-sixty added topic-documentation and removed needs triage labels Feb 26, 2024

kmuehlbauer closed this as completed Jun 5, 2024
