Closed
Description
Each reduced output file is currently about 1.8 GB. Because one file is written per image taken, a smaller size would help users considerably. Fortunately, compression is straightforward. Bitshuffle/LZ4 (https://github.com/kiyo-masui/bitshuffle) is particularly well suited to this task. Because Dectris uses it to compress their NeXus files, it is also compatible with DIALS. Bitshuffle/LZ4 tends to give compression ratios similar to GZIP and other algorithms at higher speed. A naive implementation of bitshuffle/LZ4 compression, shown below, reduced the file size by about 80% (365 MB vs 1.8 GB). The chunk size has been optimized.
The only datasets that need to be compressed are entry/instrument/nD_Mantid_{n}/data; the rest can be left alone.
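The snippet below lists three dataset paths explicitly. If the number of nD_Mantid_{n} entries can vary, a pattern match is one way to select them. This is only a sketch: it assumes {n} is a non-negative integer and that names arrive as relative paths without a leading slash, which is how h5py's visititems passes them. The helper name needs_compression is hypothetical.

```python
import re

# Assumption: {n} is a non-negative integer index, and names are relative
# paths with no leading slash, as h5py's visititems passes them.
_DATA_PATH = re.compile(r"entry/instrument/nD_Mantid_(\d+)/data")


def needs_compression(name):
    """Return True for the detector data datasets named in this issue."""
    # fullmatch rejects longer paths that merely start with the pattern.
    return _DATA_PATH.fullmatch(name) is not None
```

Such a helper could replace the `name in [...]` test inside the copy callback.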
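import h5py
import bitshuffle.h5

def copy_item2(name, obj):
    # Callback for src.visititems(); `dst` is the open, writable output file.
    if isinstance(obj, h5py.Group):
        dst.create_group(name)
        for attrname, attroutput in obj.attrs.items():
            dst[name].attrs[attrname] = attroutput
    elif isinstance(obj, h5py.Dataset):
        chunk_size = (1280, 1280, 1)  # one chunk per 1280 x 1280 slice
        if name in ['entry/instrument/nD_Mantid_0/data',
                    'entry/instrument/nD_Mantid_1/data',
                    'entry/instrument/nD_Mantid_2/data']:
            # Detector data: rewrite with the bitshuffle/LZ4 filter.
            dst.create_dataset(
                name,
                (1280, 1280, 50),
                data=obj[:],
                compression=bitshuffle.h5.H5FILTER,
                # Block size 0 lets bitshuffle pick its default.
                compression_opts=(0, bitshuffle.h5.H5_COMPRESS_LZ4),
                dtype='uint64',
                chunks=chunk_size)
        else:
            # Everything else is copied uncompressed.
            dst.create_dataset(name, data=obj)
        for attrname, attroutput in obj.attrs.items():
            dst[name].attrs[attrname] = attroutput

# The issue does not show how the files are opened; the file names here are
# placeholders. Attributes on the root group are not visited by visititems.
with h5py.File('reduced.h5', 'r') as src, h5py.File('reduced_compressed.h5', 'w') as dst:
    src.visititems(copy_item2)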
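The shape, chunk shape, and dtype hard-coded in the snippet above allow a quick size check. The sketch below is plain arithmetic. It assumes exactly three such datasets per file (the three paths listed in the snippet) and reads "1.8 GB" as 1800 MB. Neither assumption is confirmed in the issue.

```python
# Shape, chunk shape, and dtype as hard-coded in the snippet above.
SHAPE = (1280, 1280, 50)
CHUNK = (1280, 1280, 1)
ITEMSIZE = 8  # bytes per uint64 element


def nbytes(shape, itemsize=ITEMSIZE):
    """Uncompressed size in bytes of an array with the given shape."""
    n = itemsize
    for dim in shape:
        n *= dim
    return n


dataset_bytes = nbytes(SHAPE)                      # 655_360_000
chunk_bytes = nbytes(CHUNK)                        # 13_107_200 (12.5 MiB)
chunks_per_dataset = dataset_bytes // chunk_bytes  # 50

# Assumption: three nD_Mantid_{n}/data datasets per file.
total_bytes = 3 * dataset_bytes                    # 1_966_080_000
total_gib = total_bytes / 2**30                    # roughly 1.83

# Reduction quoted in the issue, taking 1.8 GB as 1800 MB (assumption).
reduction = 1 - 365 / 1800                         # roughly 0.80
```

Under these assumptions the three raw datasets alone come to about 1.83 GiB, which would be consistent with the reported file size if that figure is in binary gigabytes, and would explain why compressing only those datasets is enough.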