
ENH: Add compresslevel input parameter in savez_compressed() function #20995

@geniuskey

Proposed new feature or change:

I'm working with a bunch of large 2D arrays (about 10000 x 10000) and need to compress and save them quickly.
The problem is that savez_compressed() always uses the default compression level of the zipfile.ZipFile class.
(With zipfile.ZIP_DEFLATED, the default is zlib's level 6.)
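
One way to check the default-level point is the sketch below. It uses only the standard library and assumes Python 3.7 or later, where `zipfile.ZipFile` accepts `compresslevel`. It stores the same illustrative payload as a ZIP_DEFLATED member once with `compresslevel=None` and once with `compresslevel=6`, then compares the compressed sizes. Identical sizes are consistent with `None` falling back to zlib's default level, which is 6.

```python
import io
import zipfile


def deflated_size(payload: bytes, compresslevel=None) -> int:
    """Return the compressed size of `payload` stored as one ZIP_DEFLATED member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=compresslevel) as zf:
        zf.writestr("data.bin", payload)
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return zf.getinfo("data.bin").compress_size


# Illustrative, moderately compressible payload (not real project data).
payload = b"".join(str(i).encode() for i in range(200_000))

default_size = deflated_size(payload)     # compresslevel=None -> zlib default
level6_size = deflated_size(payload, 6)   # explicit level 6
level1_size = deflated_size(payload, 1)   # fastest deflate level
print(default_size, level6_size, level1_size)
print(default_size == level6_size)  # True: the default is level 6
```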

Being able to use compression level 1 in savez_compressed() would make a big difference. On my data:

  • Compression level 1: compression ratio 66.8%, execution time about 3.2 seconds
  • Compression level 6: compression ratio 69.5%, execution time about 24.7 seconds

Level 1 is the most suitable for my project: it is roughly 7.7 times faster for a loss of only 2.7 percentage points in compression ratio. Therefore I'd like to be able to configure the compression level in savez_compressed().
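
To reproduce this kind of comparison on other data, a rough sketch like the one below times raw zlib compression of an array's bytes at several levels. It calls `zlib` directly instead of going through savez_compressed(), so the absolute timings will differ somewhat from a real `.npz` write. The test array and the reading of "compression ratio" as space saved (1 - compressed/original) are assumptions for illustration; `compare_levels` is a hypothetical helper name.

```python
import time
import zlib

import numpy as np


def compare_levels(arr: np.ndarray, levels=(1, 6)):
    """Hypothetical helper: time zlib compression of arr's raw bytes per level.

    Returns {level: (space_saving, seconds)}, where space_saving is
    1 - compressed_size / original_size (an assumed definition).
    """
    raw = arr.tobytes()
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(raw, level)
        elapsed = time.perf_counter() - start
        results[level] = (1 - len(compressed) / len(raw), elapsed)
    return results


# Small stand-in for a 10000 x 10000 array; rounding makes values repeat so they compress.
rng = np.random.default_rng(0)
arr = np.round(rng.normal(size=(500, 500)), 1)

for level, (saving, seconds) in compare_levels(arr).items():
    print(f"level {level}: space saved {saving:.1%}, {seconds:.3f} s")
```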

Here is my solution; it only takes a few extra lines.

Below is my modified version of the _savez() function from numpy/lib/npyio.py:

def _savez(file, args, kwds, compress, allow_pickle=True, pickle_kwargs=None):
    # Import is postponed to here since zipfile depends on gzip, an optional
    # component of the so-called standard library.
    import zipfile

    if not hasattr(file, 'write'):
        file = os_fspath(file)
        if not file.endswith('.npz'):
            file = file + '.npz'

    namedict = kwds
    for i, val in enumerate(args):
        key = 'arr_%d' % i
        if key in namedict.keys():
            raise ValueError(
                "Cannot use un-named variables and keyword %s" % key)
        namedict[key] = val

    if compress:
        compression = zipfile.ZIP_DEFLATED
    else:
        compression = zipfile.ZIP_STORED

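    # Pop 'compresslevel' out of the keyword arguments so it is not also
    # written into the archive as an array named 'compresslevel.npy'.
    # Values that are not ints in 1..9 fall back to zipfile's default.
    compresslevel = namedict.pop('compresslevel', None)
    if not isinstance(compresslevel, int) or not 1 <= compresslevel <= 9:
        compresslevel = None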

    zipf = zipfile_factory(file, mode="w", compression=compression, compresslevel=compresslevel)

    for key, val in namedict.items():
        fname = key + '.npy'
        val = np.asanyarray(val)
        # always force zip64, gh-10776
        with zipf.open(fname, 'w', force_zip64=True) as fid:
            format.write_array(fid, val,
                               allow_pickle=allow_pickle,
                               pickle_kwargs=pickle_kwargs)

    zipf.close()
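
Until something like this exists in NumPy itself, a similar effect may be possible without patching npyio.py. The sketch below builds the `.npz` archive directly with `zipfile` and `numpy.lib.format.write_array`, loosely mirroring what _savez() does. The name `savez_compressed_level` is hypothetical and not part of NumPy; it assumes Python 3.7+ for `compresslevel`. Because each member is a standard `.npy` file inside a ZIP archive, the result loads with the ordinary `np.load`. Unlike `np.savez_compressed`, it does not append `.npz` to string file names.

```python
import io
import zipfile

import numpy as np


def savez_compressed_level(file, *args, compresslevel=6, **kwds):
    """Hypothetical helper: np.savez_compressed-style archive with a chosen deflate level.

    compresslevel follows zlib's range (0-9).
    """
    if not isinstance(compresslevel, int) or not 0 <= compresslevel <= 9:
        raise ValueError(f"compresslevel must be an int in 0..9, got {compresslevel!r}")
    arrays = dict(kwds)
    for i, val in enumerate(args):
        key = f"arr_{i}"
        if key in arrays:
            raise ValueError(f"Cannot use un-named variables and keyword {key}")
        arrays[key] = val
    with zipfile.ZipFile(file, mode="w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True, compresslevel=compresslevel) as zf:
        for key, val in arrays.items():
            # Each array becomes a standard .npy member; force_zip64 mirrors
            # NumPy's own _savez (gh-10776) so very large members work.
            with zf.open(key + ".npy", "w", force_zip64=True) as fid:
                np.lib.format.write_array(fid, np.asanyarray(val))


# Usage: write to an in-memory buffer at level 1, then read back with np.load.
buf = io.BytesIO()
savez_compressed_level(buf, np.arange(10), b=np.eye(3), compresslevel=1)
buf.seek(0)
with np.load(buf) as data:
    print(sorted(data.files))  # ['arr_0', 'b']
```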

It works!

I don't yet understand the GitHub process for taking an idea all the way to a merge in a large open source repository like this one, since I have no experience contributing upstream. So for now I'm sharing my idea through an issue.

Thanks.
