mmap objects used in base should expose a filename attribute #470

@GaelVaroquaux

In numpy <= 1.6 it is possible to inspect an array to find out whether it comes from a memmapped file and, if so, which file:

>>> import numpy as np
>>> a = np.ones((10, 10))
>>> np.save('a.npy', a)
>>> b = np.load('a.npy', mmap_mode='r')
>>> c = np.asarray(b)
>>> c.base is b
True
>>> c.base.filename
'/home/varoquau/a.npy'

In 1.7b2, the 'base' of 'c' is now a reference to the Python mmap object, so it is no longer possible to get back to the original filename. This may be an important use case in parallel computing (e.g. with multiprocessing, or IPython) to check whether data is shared between processes or not.

>>> import numpy as np
>>> a = np.ones((10, 10))
>>> np.save('a.npy', a)
>>> b = np.load('a.npy', mmap_mode='r')
>>> c = np.asarray(b)
>>> c.base
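
For illustration, here is a minimal sketch of the kind of check that parallel-computing code could run on an array to find its backing file. The helper name find_backing_file is hypothetical and not part of numpy; it only assumes that some object in the 'base' chain exposes a 'filename' attribute, as numpy <= 1.6 memmap objects do.

```python
def find_backing_file(array):
    """Walk the .base chain and return the first 'filename' found, else None.

    Hypothetical helper: it relies only on the 'base' and 'filename'
    attributes discussed in this issue, not on any other numpy API.
    """
    obj = array
    while obj is not None:
        filename = getattr(obj, "filename", None)
        if filename is not None:
            return filename
        # Objects without a 'base' attribute (e.g. a raw mmap.mmap) end the walk.
        obj = getattr(obj, "base", None)
    return None
```

With the 1.7b2 behaviour described above, the walk ends on the plain mmap object and the helper returns None, which is the regression this issue is about.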

I propose to use as the 'base' of numpy memmap objects something that keeps track of the filename (and maybe the offset, although it can be retrieved with a bit of magic). The simplest option would probably be to use a subclass of mmap.mmap.
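
A rough sketch of what such a subclass might look like, using only the standard library. The names NamedMmap and open_named_mmap are made up for this example, and this is not the patch itself, just an illustration of the idea under the assumption that storing the path and offset as plain attributes is enough.

```python
import mmap
import os


class NamedMmap(mmap.mmap):
    """Hypothetical mmap.mmap subclass that remembers its backing file."""

    def __new__(cls, fileno, length, filename=None, offset=0, **kwargs):
        # mmap.mmap is constructed in __new__, so the extra bookkeeping goes here.
        self = super().__new__(cls, fileno, length, offset=offset, **kwargs)
        self.filename = None if filename is None else os.path.abspath(filename)
        self.offset = offset
        return self


def open_named_mmap(path, access=mmap.ACCESS_READ):
    """Map the whole (non-empty) file at 'path' read-only and keep its name."""
    with open(path, "rb") as f:
        # length=0 maps the entire file; the mapping outlives the file object.
        return NamedMmap(f.fileno(), 0, filename=path, access=access)
```

An array whose 'base' is such an object would let callers read base.filename and base.offset directly, without numpy having to keep a reference to the memmap array itself.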

I, @GaelVaroquaux, volunteer to implement such a patch if the strategy above is deemed acceptable.
