Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug in fromfile with non-seekable file causing reference count error #16473

Closed
adeak opened this issue Jun 1, 2020 · 7 comments
Closed

Comments

@adeak
Copy link
Contributor

adeak commented Jun 1, 2020

Apologies in advance for an inexact issue. I saw someone mention surprising behaviour in chat. They have fixed their original problem by updating python versions and going from 32-bit to 64-bit python, but I believe the original problem signals a bug in numpy.

The user saw this behaviour on 32-bit Python 3.7.1 (numpy version unknown) on Windows:

>>> np.fromfile(path, dtype=np.uint8)
OSError: could not seek in file
*** Reference count error detected:
an attempt was made to deallocate 2 (B) *** 

That refcount error raised a red flag. The binary file was 2-3 GB large. By upgrading to 64-bit Python 3.8 the original error went away. I couldn't persuade the user to investigate the original issue to see if it's present in numpy master, for instance.

I believe the problem is here:

/* This array creation function steals the reference to dtype. */
static PyArrayObject *
array_fromfile_binary(FILE *fp, PyArray_Descr *dtype, npy_intp num, size_t *nread)
{
    PyArrayObject *r;
    npy_off_t start, numbytes;
    int elsize;

    if (num < 0) {
        int fail = 0;
        start = npy_ftell(fp);
        if (start < 0) {
            fail = 1;
        }
        if (npy_fseek(fp, 0, SEEK_END) < 0) {
            fail = 1;
        }
        numbytes = npy_ftell(fp);
        if (numbytes < 0) {
            fail = 1;
        }
        numbytes -= start;
        if (npy_fseek(fp, start, SEEK_SET) < 0) {
            fail = 1;
        }
        if (fail) {
            PyErr_SetString(PyExc_IOError,
                            "could not seek in file");
            Py_DECREF(dtype);
            return NULL;
        }
        /* snip */

This seems to be the exact code path that was tripped by the user. I couldn't reproduce the problem locally since npy_fseek calls OS fseek directly, so I couldn't mock an object that fails to seek. However, it seems clear enough that when the IOError is raised, we Py_DECREF the dtype even though nothing has happened to dtype up until that point. This would explain why dtype gets a spurious decref, eventually leading to the reference count error.

My hunch is that we merely have to remove the Py_DECREF(dtype); line. If there's an error, we don't touch refcounts. When there's no error PyArray_NewFromDescr steals the reference to dtype as promised. When that fails there's no Py_INCREF either, which is probably fine: I suspect that when PyArray_NewFromDescr fails then the reference is not stolen (but I couldn't figure this out either way from around here).

Since I cannot reproduce the exact issue I'd rather wait for expert feedback before trying to fix the problem :)

@seberg
Copy link
Member

seberg commented Jun 1, 2020

We had a couple of fixes in this area a while ago. Its possible its fixed in master. I think the only way to easily reproduce the ref-counting error is by just changing the code to always error.

The decref there is correct right now, but I think the calling function had an issue thinking that it would not decref on error (but a function that steals a reference should always steal a reference, although stealing references tends to be confusing).

@eric-wieser
Copy link
Member

Based on the comment only, it seems the intent of this function is to either:

  • DECREF dtype
  • Store it somewhere without INCREFing it

It's possible there is a bug somewhere, but it looks to me like it would be in the caller.

@adeak
Copy link
Contributor Author

adeak commented Jun 1, 2020

Thank you both! @seberg good point about consistency of references, I didn't think of that. I'll close this then and try to look and figure out what may have happened on the calling side. And only return when I have something reproducible :) Odds are it is indeed already fixed in master, then!

@adeak adeak closed this as completed Jun 1, 2020
@seberg
Copy link
Member

seberg commented Jun 1, 2020

Just to make sure you don't do double work, make sure to look at the history to see recent changes in the caller here...

@adeak
Copy link
Contributor Author

adeak commented Jun 1, 2020

Thanks @seberg, I was going to do that. The reason I didn't look at recent changes was that the code where I suspected the bug hasn't been touched in 8-10 years. If the issue is in the caller it's a whole different kettle of fish.

@seberg
Copy link
Member

seberg commented Jun 1, 2020

I thought I remembered something: 454c5b5 likely includes the fix for this.

@adeak
Copy link
Contributor Author

adeak commented Jun 2, 2020

@seberg thank you for looking into this! I can confirm that 454c5b5 fixed this specific refcount issue. Sorry for the false alarm!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants