Unexpected compression rate on LZ4 filter #121

Closed
Willian-Zhang opened this issue Jun 10, 2021 · 1 comment · Fixed by #123

Willian-Zhang commented Jun 10, 2021

Using the 32004 (LZ4) filter on obviously compressible (OC) data does not compress the data.

Expected Behavior

h5ls -vr should report a utilization rate far above 100% on OC data (20227.16% for the dataset created in Steps to Reproduce).

Current Behavior

h5ls -vr reports 100.00% utilization on OC data, i.e. the allocated size equals the logical size and no space is saved.
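For a rough sense of scale, the sketch below compresses one chunk of the reproduction data (4096 doubles of 0.233, i.e. 32768 bytes, per the chunk layout in the h5ls output under Steps to Reproduce) with Python's standard zlib module. zlib is only a stand-in here; the filter itself uses LZ4. The point is that this chunk shrinks dramatically under any reasonable codec, so a 100.00% utilization points at the filter rather than the data.

```python
import struct
import zlib

# One chunk as reported by h5ls: 4096 float64 values of 0.233 (32768 bytes).
raw = struct.pack("<d", 0.233) * 4096

# zlib/DEFLATE is a stand-in for LZ4 here, used only to show compressibility.
compressed = zlib.compress(raw)

print(len(raw), len(compressed))
```

The printed compressed length is a small fraction of the 32768 input bytes.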

Possible Solution

#120

Steps to Reproduce

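import h5py
import numpy as np
import hdf5plugin  # makes the LZ4 filter (ID 32004) available to h5py

# 2**20 identical float64 values: trivially compressible
nda = np.full(2**20, 0.233)

# Open explicitly in write mode; h5py 3.x defaults to read-only
with h5py.File("data/2.hdf", "w") as f:
    f.create_dataset(name="nda", data=nda, **hdf5plugin.LZ4())

# IPython/Jupyter shell escape
!h5ls -vr data/2.hdf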
Opened "data/1.hdf" with sec2 driver.
/                        Group
    Location:  1:96
    Links:     1
/nda                     Dataset {1048576/1048576}
    Location:  1:800
    Links:     1
    Chunks:    {4096} 32768 bytes
    Storage:   8388608 logical bytes, 8388608 allocated bytes, 100.00% utilization
    Filter-0:  HDF5 lz4 filter; see http://www.hdfgroup.org/services/contributions.html-32004 OPT {0}
    Type:      native double
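The utilization that h5ls prints is the logical size divided by the allocated size. A minimal sketch for computing the same ratio from Python, assuming h5py's low-level DatasetID.get_storage_size() (which returns the bytes actually allocated for the dataset); the function names utilization_percent and dataset_utilization are hypothetical.

```python
def utilization_percent(logical_bytes: int, allocated_bytes: int) -> float:
    """Same ratio h5ls -v prints as 'utilization': logical / allocated * 100."""
    if allocated_bytes == 0:
        raise ValueError("dataset has no allocated storage")
    return 100.0 * logical_bytes / allocated_bytes


def dataset_utilization(path: str, name: str) -> float:
    # Imported lazily so utilization_percent() works without h5py installed.
    import h5py

    with h5py.File(path, "r") as f:
        dset = f[name]
        logical = dset.size * dset.dtype.itemsize   # uncompressed size in bytes
        allocated = dset.id.get_storage_size()      # bytes allocated on disk
    return utilization_percent(logical, allocated)
```

For the output above, utilization_percent(8388608, 8388608) gives 100.0. The expected 20227.16% would correspond to about 8388608 / 202.27 ≈ 41,472 allocated bytes. Calling dataset_utilization("data/2.hdf", "nda") reads the same numbers from the file.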

Context (Environment)

pip list | grep -e h5py -e numpy -e hdf
h5py                2.10.0
hdf5plugin          3.0.0
numpy               1.19.2

Detailed Description

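problem call path
LZ4_compress_default(rpos, roBuf+4, blockSize, nBlocks*4);
-> LZ4_compress_fast(rpos, roBuf+4, blockSize, nBlocks*4, 1);
-> ...

The fourth argument of LZ4_compress_default is dstCapacity, the size of the destination buffer. nBlocks*4 is tiny (4 bytes for a single-block chunk), so the compressed block does not fit, compression fails (LZ4_compress_default returns 0), and the chunk ends up stored at full size.

fixed call path
LZ4_compress(rpos, roBuf+4, blockSize);
-> LZ4_compress_default(rpos, roBuf+4, blockSize, LZ4_compressBound(blockSize));
-> LZ4_compress_default(rpos, roBuf+4, blockSize,
          ((unsigned)(blockSize) > (unsigned)0x7E000000   /* 2 113 929 216 bytes */ ?
                  0 :
                  (blockSize) + ((blockSize)/255) + 16));
-> ...

LZ4_compressBound(blockSize) is the worst-case compressed size of a block, so the destination capacity is always large enough.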
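To make the size mismatch concrete, here is a minimal Python sketch of the two capacities. lz4_compress_bound transcribes the LZ4_compressBound macro shown above; dst_capacities is a hypothetical helper, and the example call assumes (not confirmed here) that the 32768-byte chunk from the h5ls output is compressed as a single block, i.e. nBlocks = 1.

```python
from typing import Tuple

# Same limit as in the LZ4_compressBound macro: 0x7E000000 = 2 113 929 216 bytes.
LZ4_MAX_INPUT_SIZE = 0x7E000000


def lz4_compress_bound(input_size: int) -> int:
    """Python transcription of LZ4_compressBound: worst-case LZ4 output size."""
    # The C macro compares as unsigned, so negative sizes also map to 0.
    if input_size < 0 or input_size > LZ4_MAX_INPUT_SIZE:
        return 0
    return input_size + input_size // 255 + 16


def dst_capacities(chunk_bytes: int, block_size: int) -> Tuple[int, int]:
    """Hypothetical helper: dstCapacity in the problem path vs. the fixed path.

    Assumes the chunk is split into ceil(chunk_bytes / block_size) blocks.
    """
    n_blocks = (chunk_bytes - 1) // block_size + 1
    buggy = n_blocks * 4                      # problem path: nBlocks*4
    fixed = lz4_compress_bound(block_size)    # fixed path: LZ4_compressBound(blockSize)
    return buggy, fixed


# One 32768-byte chunk (4096 doubles) compressed as a single block.
print(dst_capacities(32768, 32768))
```

For that chunk this prints (4, 32912): the problem path offers 4 bytes of output space, while the bound guarantees room for 32768 + 128 + 16 bytes.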

Possible Implementation

#120


t20100 commented Jun 10, 2021

Thanks for the report!

I reported this issue upstream (nexusformat/HDF5-External-Filter-Plugins#16), since hdf5plugin only packages this filter and the reference implementation lives there (see https://confluence.hdfgroup.org/display/support/Filters#Filters-32004).

t20100 added the bug label Jun 11, 2021
t20100 added this to the Next release milestone Jun 11, 2021
t20100 added a commit to t20100/hdf5plugin that referenced this issue Sep 4, 2023
092190c38 Fix spelling errors (silx-kit#128)
b241ca0d0 Fix the CMake multiconfig var and test and tools handling (silx-kit#127)
7b33e4f19 Fix some make output issues (silx-kit#126)
51d221848 Documentation fixes and fixes from testing 1.1.1 (silx-kit#125)
4ae30429c CMake: (fix) HDF5 1.14.0 (silx-kit#121)
e1072ec5b Update Github actions  (silx-kit#118)
b021532ea Fix leak and consistify generic API with ZFP C API (silx-kit#117)
06408bb22 Enable CMake Testing (silx-kit#116)
eb5440043 Fix version info in docs
ad61dfa41 Revert "rework feat-cmake_tests CMake and replace scripts (silx-kit#111)" (silx-kit#115)
03b86413f rework feat-cmake_tests CMake and replace scripts (silx-kit#111)
08116fa9e Update README.md (silx-kit#114)
a10bd7bdb Merge branch 'master' of github.com:LLNL/H5Z-ZFP
1660d94a4 Fix h5repack test (silx-kit#107)
fd01bf1d7 mention generic interface
8d5ec2e74 Update docs (silx-kit#106)
c76f847d9 Added new endianness test (silx-kit#102)
8acf8244b fix reading of byte-swapped input files (silx-kit#95) (silx-kit#101)
62ab4eb6d Minor code clean-up. (silx-kit#103)
60032f640 Bugfix/hdf5 1.14.0 (silx-kit#99)
8fad85d72 CMake: (fix) MPI parallel HDF5 1.14.0 (silx-kit#97)
e81aec114 Don't flag a skipped test as a failed test.  (silx-kit#94)

git-subtree-dir: src/H5Z-ZFP
git-subtree-split: 092190c38d5760b3212ce4d96cced767b90aa189