Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate/remove array_data_set #7093

Closed
njsmith opened this issue Jan 22, 2016 · 9 comments
Closed

Deprecate/remove array_data_set #7093

njsmith opened this issue Jan 22, 2016 · 9 comments

Comments

@njsmith
Copy link
Member

njsmith commented Jan 22, 2016

So it turns out that ndarray.data supports assignment at the Python level, and what it does is just assign to the ->data field of the ndarray object:
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/getset.c#L325

This kind of assignment been deprecated at the C level since 1.7, and is totally unsafe -- if there are any views pointing to the array when this happens, then they'll be left pointing off into unallocated memory.

E.g.:

a = np.arange(10)
b = np.linspace(0, 1, 10)
c = a.view()
a.data = b.data
# Now c points into free'd memory

Can we deprecate or just remove this?

@njsmith njsmith changed the title Deprecate/remove array_set_data Deprecate/remove array_data_set Jan 22, 2016
@njsmith
Copy link
Member Author

njsmith commented Jan 22, 2016

@seberg
Copy link
Member

seberg commented Jan 22, 2016

Sure, see also gh-5958, but Jaime never came around to doing the deprecation instead of just removing it. Of course if we want to see if we can just remove it....

@njsmith
Copy link
Member Author

njsmith commented Jan 22, 2016

Doh, thanks :-)

Given that we have bugs back to 2009 complaining about this and no-one speaking up in favor, it sounds like deprecating it at least is a no-brainer.

@jaimefrio
Copy link
Member

I sometimes don't read long threads with sufficient attention... Was it agreed that this should be deprecated? I thought there was some concerns that things may break somewhere,

@njsmith
Copy link
Member Author

njsmith commented Jan 22, 2016

From skimming the old bug reports about this, I think the only positions
that anyone has seriously considered are whether we should
deprecate-and-then-remove, or just skip deprecating and kill it with fire.
.
Given that it has survived for however many decades now without triggering
the apocalypse, I guess we can afford a deprecation cycle. Even if leaving
the code in there makes me feel dirty :-)

On Thu, Jan 21, 2016 at 4:45 PM, Jaime notifications@github.com wrote:

I sometimes don't read long threads with sufficient attention... Was it
agreed that this should be deprecated? I thought there was some concerns
that things may break somewhere,


Reply to this email directly or view it on GitHub
#7093 (comment).

Nathaniel J. Smith -- https://vorpus.org http://vorpus.org

@embray
Copy link
Contributor

embray commented Aug 16, 2018

By the way, the docs cite #7083 as the reason for the deprecation, not this issue (a small one-digit typo it seems).

I don't fully agree with this deprecation, though I understand the reasoning for it. There is at least one valid use case I have had for updating .data directly from Python, and it is ironically very closely related to the reasoning behind this deprecation in the first place.

This first came to my attention in astropy/astropy#5797. The reasoning behind it is that in Astropy, there are objects representing a FITS file--an astronomy data format. The data in FITS files can be accessed as mmap-backed Numpy arrays. There is functionality for performing "in-place" updates of FITS files, including updates which resize the original file. Resizing--especially if data in the middle of the file is being resized--often means an in-place rewrite of the file. This means writing to a temp file, and then renaming the temp file over the original file.

As Astropy is often used in interactive data analysis sessions, the following scenario is not unheard of:

1. User opens a FITS file with mmap backing
2. User accesses the data array in a FITS file, and creates a variable referencing it; i.e. `mydata = myfitsfile[0].data`  (the `0` in this case is because a single FITS file can contain multiple array)
3. User modifies some things in the FITS file in such a way that the file requires resizing, and saves the updates using in-place update mode.  They still have a reference to the original data (backed by the original file) in `mydata`.

On most OS's this is not a problem. E.g. Linux will keep the original file data around, and just move dirents around instead. On Windows, however, this is a problem. It is not possible to delete or rename a file while there are still open handles to that file, including handles used for memory mapping. Therefore, in order to perform such an in-place update, it is necessary on Windows to close the mmaps underlying each array read from the FITS file before the user can save their changes.

That's easy enough to do, but the user, who is generally an astronomer unwary of things like mmaps and file handles, may still try to access mydata. If they are lucky, they'll get a page that's already been mapped; if they're unlucky that'll try to load a page that wasn't already mapped, and will get a segfault, bringing down their entire Python interpreter.

The workaround, then, is after rewriting the file to open new mmaps to the data that was already loaded, but backed by the new file, and then to update each ndarray's buffer to use the new mmap instead. This generally works, but is maybe still not worth the trouble. After all, as this ticket points out, if the user had a reference to a view of the array, the view would be broken I guess. It would help here if each ndarray instance kept track of all its views, and had a way to access all views.

As an alternative, I'd be happy enough if the scenario I described above still broke the user's original reference to the array (mydata) but raised an exception when trying to use it in any way. My thinking was maybe to have a "closed" flag on ndarrays meaning that their underlying buffer is no longer usable, and that any operations that access it should fail.

@charris
Copy link
Member

charris commented Aug 16, 2018

This sounds like a discussion that should happen on the mailing list.

@embray
Copy link
Contributor

embray commented Aug 16, 2018

@charris You're right, of course , I just wanted to place my comments in context; but I can also bring it up there.

@charris
Copy link
Member

charris commented Aug 17, 2018

It would be good to bring it up there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants