
Add bitshuffle/LZ4 compression to reduced output #124

@aaronfinke


Each reduced output file is currently about 1.8 GB. For users, this could be smaller, especially since one such file is produced per image taken. Fortunately, compression is straightforward. Bitshuffle/LZ4 (https://github.com/kiyo-masui/bitshuffle) is particularly well suited to this task, and because Dectris uses it to compress their NeXus files, it is compatible with DIALS. Bitshuffle/LZ4 tends to give compression ratios similar to GZIP and other algorithms while performing better. A naive implementation of bitshuffle/LZ4 compression, shown below, reduced the file size by about 80% (365 MB vs 1.8 GB). The chunk size has been optimized.

The only datasets that need to be compressed are entry/instrument/nD_Mantid_{n}/data; the rest can be left as is.
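To select these datasets for any panel index n instead of listing each name explicitly, a name test along these lines could be used. This is a sketch: `MANTID_DATA` and `needs_compression` are hypothetical names, and the non-matching example path is illustrative only. It assumes names in the form h5py passes to a `visititems` callback, i.e. paths relative to the file root without a leading slash.

```python
import re

# Hypothetical pattern: entry/instrument/nD_Mantid_<n>/data for any panel index n
MANTID_DATA = re.compile(r"entry/instrument/nD_Mantid_\d+/data")


def needs_compression(name: str) -> bool:
    """True only for the panel data datasets; everything else is copied as is."""
    return MANTID_DATA.fullmatch(name) is not None


print(needs_compression("entry/instrument/nD_Mantid_2/data"))            # True
print(needs_compression("entry/instrument/nD_Mantid_0/time_of_flight"))  # False
```

Using `fullmatch` rather than `match` or `search` avoids accidentally compressing sibling datasets whose names merely start with `data`.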

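import h5py
import bitshuffle.h5  # importing this module registers the bitshuffle HDF5 filter

# Detector panel datasets to compress; everything else is copied unchanged
COMPRESSED = {
    'entry/instrument/nD_Mantid_0/data',
    'entry/instrument/nD_Mantid_1/data',
    'entry/instrument/nD_Mantid_2/data',
}

def copy_item2(name, obj, dst):
    """visititems callback: copy one group or dataset, with its attributes, into dst."""
    if isinstance(obj, h5py.Group):
        dst.create_group(name)
    elif isinstance(obj, h5py.Dataset):
        if name in COMPRESSED:
            dst.create_dataset(
                name,
                data=obj[()],  # shape comes from the data, so any frame count works
                dtype='uint64',
                compression=bitshuffle.h5.H5FILTER,
                compression_opts=(0, bitshuffle.h5.H5_COMPRESS_LZ4),  # 0 = automatic block size
                chunks=(1280, 1280, 1))  # one 1280 x 1280 frame per chunk
        else:
            # dtype=obj.dtype keeps special types such as variable-length strings
            dst.create_dataset(name, data=obj[()], dtype=obj.dtype)
    for attrname, attrvalue in obj.attrs.items():
        dst[name].attrs[attrname] = attrvalue

# Hypothetical file names; replace with the actual reduced-output paths
with h5py.File('reduced.nxs', 'r') as src, h5py.File('reduced_compressed.nxs', 'w') as dst:
    for attrname, attrvalue in src.attrs.items():  # visititems does not visit the root group
        dst.attrs[attrname] = attrvalue
    src.visititems(lambda name, obj: copy_item2(name, obj, dst))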
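As a sanity check on the sizes quoted above, a back-of-the-envelope calculation reproduces the 1.8 GB figure. It assumes three panels (nD_Mantid_0 to nD_Mantid_2), each 1280 × 1280 pixels × 50 frames stored as uint64, as in the snippet; all other datasets in the file are ignored, and reading the reported 365 MB as MiB is an assumption.

```python
# Assumed layout: 3 panels, each 1280 x 1280 pixels x 50 frames of uint64 (8 bytes)
PANELS, NX, NY, FRAMES, BYTES_PER_VALUE = 3, 1280, 1280, 50, 8

panel_bytes = NX * NY * FRAMES * BYTES_PER_VALUE  # 655,360,000 B per panel
total_bytes = PANELS * panel_bytes                # 1,966,080,000 B, about 1.83 GiB
chunk_bytes = NX * NY * 1 * BYTES_PER_VALUE       # one (1280, 1280, 1) chunk: 13,107,200 B = 12.5 MiB

compressed_bytes = 365 * 1024**2  # reported compressed size, assumed to be MiB
reduction = 1 - compressed_bytes / total_bytes

print(f"uncompressed panel data: {total_bytes / 1024**3:.2f} GiB")
print(f"chunk size: {chunk_bytes / 1024**2:.1f} MiB")
print(f"reduction: {reduction:.0%}")
```

The panel data alone accounts for essentially the whole file, which is why compressing only those three datasets is enough.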
