
Add ZSTD support (#104)
Add support for compressing the data with ZSTD, a new fast and efficient compression algorithm. ZSTD is pulled in as a submodule. This bumps the version to 0.4.0.

Co-authored-by: Richard Shaw <richard@phas.ubc.ca>
james-s-willis and jrs65 committed Feb 25, 2022
1 parent 0aee87e commit 9ba2580
Showing 22 changed files with 578 additions and 70 deletions.
5 changes: 4 additions & 1 deletion .github/workflows/main.yml
@@ -45,8 +45,11 @@ jobs:
          pip install -r requirements.txt
          pip install pytest
+         # Pull in ZSTD repo
+         git submodule update --init
          # Installing the plugin to arbitrary directory to check the install script.
-         python setup.py install --h5plugin --h5plugin-dir ~/hdf5/lib
+         python setup.py install --h5plugin --h5plugin-dir ~/hdf5/lib --zstd
      - name: Run tests
        run: pytest .
7 changes: 4 additions & 3 deletions .github/workflows/wheels.yml
@@ -26,14 +26,15 @@ jobs:
        run: python -m cibuildwheel --output-dir wheelhouse-hdf5-${{ matrix.hdf5 }}
        env:
          CIBW_ARCHS_LINUX: "x86_64"
-         CIBW_BEFORE_BUILD_LINUX: chmod +x .github/workflows/install_hdf5.sh; .github/workflows/install_hdf5.sh ${{ matrix.hdf5 }}
-         CIBW_ENVIRONMENT: "LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib"
+         CIBW_BEFORE_BUILD_LINUX: chmod +x .github/workflows/install_hdf5.sh; .github/workflows/install_hdf5.sh ${{ matrix.hdf5 }};
+           git submodule update --init
+         CIBW_ENVIRONMENT: "LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib ENABLE_ZSTD=1"
          CIBW_TEST_REQUIRES: pytest
          # Install different version of HDF5 for unit tests to ensure the
          # wheels are independent of HDF5 installation
          CIBW_BEFORE_TEST: chmod +x .github/workflows/install_hdf5.sh; .github/workflows/install_hdf5.sh 1.8.11;
          # Run unit tests but disable test_h5plugin.py
-         CIBW_TEST_COMMAND: CI_BUILD_WHEEL=1 pytest {package}/tests
+         CIBW_TEST_COMMAND: pytest {package}/tests

# Package wheels and host on CI
- uses: actions/upload-artifact@v2
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
+[submodule "zstd"]
+	path = zstd
+	url = https://github.com/facebook/zstd
29 changes: 20 additions & 9 deletions README.rst
@@ -21,12 +21,12 @@ is performed within blocks of data roughly 8kB long [1]_.

This does not in itself compress data, only rearranges it for more efficient
compression. To perform the actual compression you will need a compression
-library. Bitshuffle has been designed to be well matched Marc Lehmann's
-LZF_ as well as LZ4_. Note that because Bitshuffle modifies the data at the bit
+library. Bitshuffle has been designed to be well matched to Marc Lehmann's
+LZF_ as well as LZ4_ and ZSTD_. Note that because Bitshuffle modifies the data at the bit
level, sophisticated entropy reducing compression libraries such as GZIP and
BZIP are unlikely to achieve significantly better compression than simpler and
-faster duplicate-string-elimination algorithms such as LZF and LZ4. Bitshuffle
-thus includes routines (and HDF5 filter options) to apply LZ4 compression to
+faster duplicate-string-elimination algorithms such as LZF, LZ4 and ZSTD. Bitshuffle
+thus includes routines (and HDF5 filter options) to apply LZ4 and ZSTD compression to
each block after shuffling [2]_.

The Bitshuffle algorithm relies on neighbouring elements of a dataset being
@@ -50,7 +50,7 @@ used outside of python and in command line utilities such as ``h5dump``.
.. [1] Chosen to fit comfortably within L1 cache as well as be well matched
   to the window of the LZF compression library.
-.. [2] Over applying bitshuffle to the full dataset then applying LZ4
+.. [2] Over applying bitshuffle to the full dataset then applying LZ4/ZSTD
   compression, this has the tremendous advantage that the block is
   already in the L1 cache.
@@ -62,6 +62,8 @@ used outside of python and in command line utilities such as ``h5dump``.

.. _LZ4: https://code.google.com/p/lz4/

+.. _ZSTD: https://github.com/facebook/zstd


Applications
------------
@@ -97,11 +99,14 @@ Installation for Python

Installation requires python 2.7+ or 3.3+, HDF5 1.8.4 or later, HDF5 for python
(h5py), Numpy and Cython. Bitshuffle is linked against HDF5. To use the dynamically
-loaded HDF5 filter requires HDF5 1.8.11 or later.
+loaded HDF5 filter requires HDF5 1.8.11 or later. If ZSTD support is enabled, the ZSTD
+repo needs to be pulled into bitshuffle before installation with::
+
+    git submodule update --init
+
-To install::
+To install bitshuffle::

-    python setup.py install [--h5plugin [--h5plugin-dir=spam]]
+    python setup.py install [--h5plugin [--h5plugin-dir=spam] --zstd]

To get finer control of installation options, including whether to compile
with OpenMP multi-threading, copy the ``setup.cfg.example`` to ``setup.cfg``
@@ -112,6 +117,8 @@ Bitshuffle and LZF filters outside of python), set the environment variable
``HDF5_PLUGIN_PATH`` to the value of ``--h5plugin-dir`` or use HDF5's default
search location of ``/usr/local/hdf5/lib/plugin``.

+ZSTD support is enabled with ``--zstd``.
+
If you get an error about missing source files when building the extensions,
try upgrading setuptools. There is a weird bug where setuptools prior to 0.7
doesn't work properly with Cython in some cases.
@@ -133,9 +140,13 @@ the filter will be available only within python and only after importing
The filter can be added to new datasets either through the `h5py` low level
interface or through the convenience functions provided in
`bitshuffle.h5`. See the docstrings and unit tests for examples. For `h5py`
-version 2.5.0 and later Bitshuffle can added to new datasets through the
+version 2.5.0 and later Bitshuffle can be added to new datasets through the
high level interface, as in the example below.

+The compression algorithm can be configured using the `filter_opts` in
+`bitshuffle.h5.create_dataset()`. LZ4 is chosen with:
+`(BLOCK_SIZE, h5.H5_COMPRESS_LZ4)` and ZSTD with:
+`(BLOCK_SIZE, h5.H5_COMPRESS_ZSTD, COMP_LVL)`. See `test_h5filter.py` for an example.
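
For illustration (not part of the diff), a minimal sketch of selecting ZSTD
through the `h5py` high level interface; the file name, dataset shape, and
compression level 3 are arbitrary assumptions::

    import h5py
    import numpy as np
    import bitshuffle.h5

    # Filter options: (block size, compressor flag, compression level).
    # A block size of 0 lets Bitshuffle choose one automatically.
    f = h5py.File("example.h5", "w")
    dataset = f.create_dataset(
        "data",
        (100, 100),
        dtype=np.float32,
        compression=bitshuffle.h5.H5FILTER,
        compression_opts=(0, bitshuffle.h5.H5_COMPRESS_ZSTD, 3),
    )
    dataset[:] = np.random.standard_normal((100, 100)).astype(np.float32)
    f.close()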

Example h5py
------------
16 changes: 15 additions & 1 deletion bitshuffle/__init__.py
@@ -1,3 +1,4 @@
+# flake8: noqa
"""
Filter for improving compression of typed binary data.
@@ -11,6 +12,8 @@
    bitunshuffle
    compress_lz4
    decompress_lz4
+    compress_zstd
+    decompress_zstd
"""

@@ -19,6 +22,7 @@

from bitshuffle.ext import (
    __version__,
+    __zstd__,
    bitshuffle,
    bitunshuffle,
    using_NEON,
@@ -28,6 +32,16 @@
    decompress_lz4,
)

+# Import ZSTD API if enabled
+zstd_api = []
+if __zstd__:
+    from bitshuffle.ext import (
+        compress_zstd,
+        decompress_zstd,
+    )
+
+    zstd_api += ["compress_zstd", "decompress_zstd"]

__all__ = [
    "__version__",
    "bitshuffle",
@@ -37,4 +51,4 @@
    "using_AVX2",
    "compress_lz4",
    "decompress_lz4",
-]
+] + zstd_api
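
As a usage note (not part of the diff): because the ZSTD names are exported
only when the extension was built with ZSTD, callers can feature-detect at
runtime. A small sketch with illustrative data::

    import numpy as np
    import bitshuffle

    data = np.arange(2048, dtype=np.uint16)

    # __zstd__ reports whether this build exposes the ZSTD codecs.
    if bitshuffle.__zstd__:
        buf = bitshuffle.compress_zstd(data)
    else:
        buf = bitshuffle.compress_lz4(data)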
122 changes: 119 additions & 3 deletions bitshuffle/ext.pyx
@@ -33,14 +33,23 @@ cdef extern from b"bitshuffle.h":
                             int block_size) nogil
    int bshuf_decompress_lz4(void *A, void *B, int size, int elem_size,
                             int block_size) nogil
+    IF ZSTD_SUPPORT:
+        int bshuf_compress_zstd_bound(int size, int elem_size, int block_size)
+        int bshuf_compress_zstd(void *A, void *B, int size, int elem_size,
+                                int block_size, const int comp_lvl) nogil
+        int bshuf_decompress_zstd(void *A, void *B, int size, int elem_size,
+                                  int block_size) nogil
    int BSHUF_VERSION_MAJOR
    int BSHUF_VERSION_MINOR
    int BSHUF_VERSION_POINT

-__version__ = "%d.%d.%d" % (BSHUF_VERSION_MAJOR, BSHUF_VERSION_MINOR,
-                            BSHUF_VERSION_POINT)
+__version__ = "{0}.{1}.{2}".format(BSHUF_VERSION_MAJOR, BSHUF_VERSION_MINOR,
+                                   BSHUF_VERSION_POINT)
+
+IF ZSTD_SUPPORT:
+    __zstd__ = True
+ELSE:
+    __zstd__ = False

# Prototypes from bitshuffle.c
cdef extern int bshuf_copy(void *A, void *B, int size, int elem_size)
@@ -451,3 +460,110 @@ def decompress_lz4(np.ndarray arr not None, shape, dtype, int block_size=0):
    return out


+IF ZSTD_SUPPORT:
+    @cython.boundscheck(False)
+    @cython.wraparound(False)
+    def compress_zstd(np.ndarray arr not None, int block_size=0, int comp_lvl=1):
+        """Bitshuffle then compress an array using ZSTD.
+
+        Parameters
+        ----------
+        arr : numpy array
+            Data to be processed.
+        block_size : positive integer
+            Block size in number of elements. By default, block size is chosen
+            automatically.
+        comp_lvl : positive integer
+            Compression level applied by ZSTD
+
+        Returns
+        -------
+        out : array with np.uint8 data type
+            Buffer holding compressed data.
+
+        """
+
+        cdef int ii, size, itemsize, count=0
+        shape = (arr.shape[i] for i in range(arr.ndim))
+        if not arr.flags['C_CONTIGUOUS']:
+            msg = "Input array must be C-contiguous."
+            raise ValueError(msg)
+        size = arr.size
+        dtype = arr.dtype
+        itemsize = dtype.itemsize
+
+        max_out_size = bshuf_compress_zstd_bound(size, itemsize, block_size)
+
+        cdef np.ndarray out
+        out = np.empty(max_out_size, dtype=np.uint8)
+
+        cdef np.ndarray[dtype=np.uint8_t, ndim=1, mode="c"] arr_flat
+        arr_flat = arr.view(np.uint8).ravel()
+        cdef np.ndarray[dtype=np.uint8_t, ndim=1, mode="c"] out_flat
+        out_flat = out.view(np.uint8).ravel()
+        cdef void* arr_ptr = <void*> &arr_flat[0]
+        cdef void* out_ptr = <void*> &out_flat[0]
+        with nogil:
+            for ii in range(REPEATC):
+                count = bshuf_compress_zstd(arr_ptr, out_ptr, size, itemsize, block_size, comp_lvl)
+        if count < 0:
+            msg = "Failed. Error code %d."
+            excp = RuntimeError(msg % count, count)
+            raise excp
+        return out[:count]

+    @cython.boundscheck(False)
+    @cython.wraparound(False)
+    def decompress_zstd(np.ndarray arr not None, shape, dtype, int block_size=0):
+        """Decompress a buffer using ZSTD then bitunshuffle it yielding an array.
+
+        Parameters
+        ----------
+        arr : numpy array
+            Input data to be decompressed.
+        shape : tuple of integers
+            Shape of the output (decompressed array). Must match the shape of the
+            original data array before compression.
+        dtype : numpy dtype
+            Datatype of the output array. Must match the data type of the original
+            data array before compression.
+        block_size : positive integer
+            Block size in number of elements. Must match value used for
+            compression.
+
+        Returns
+        -------
+        out : numpy array with shape *shape* and data type *dtype*
+            Decompressed data.
+
+        """
+
+        cdef int ii, size, itemsize, count=0
+        if not arr.flags['C_CONTIGUOUS']:
+            msg = "Input array must be C-contiguous."
+            raise ValueError(msg)
+        size = np.prod(shape)
+        itemsize = dtype.itemsize
+
+        cdef np.ndarray out
+        out = np.empty(tuple(shape), dtype=dtype)
+
+        cdef np.ndarray[dtype=np.uint8_t, ndim=1, mode="c"] arr_flat
+        arr_flat = arr.view(np.uint8).ravel()
+        cdef np.ndarray[dtype=np.uint8_t, ndim=1, mode="c"] out_flat
+        out_flat = out.view(np.uint8).ravel()
+        cdef void* arr_ptr = <void*> &arr_flat[0]
+        cdef void* out_ptr = <void*> &out_flat[0]
+        with nogil:
+            for ii in range(REPEATC):
+                count = bshuf_decompress_zstd(arr_ptr, out_ptr, size, itemsize,
+                                              block_size)
+        if count < 0:
+            msg = "Failed. Error code %d."
+            excp = RuntimeError(msg % count, count)
+            raise excp
+        if count != arr.size:
+            msg = "Decompressed different number of bytes than input buffer size."
+            msg += " Input buffer %d, decompressed %d." % (arr.size, count)
+            raise RuntimeError(msg, count)
+        return out
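
Taken together, the two new functions round-trip as below; a sketch assuming a
build with ZSTD enabled (the array contents and ``comp_lvl=5`` are arbitrary)::

    import numpy as np
    import bitshuffle

    data = np.random.randint(0, 8, size=(64, 64)).astype(np.uint32)

    # Bitshuffle then ZSTD-compress; returns a uint8 buffer.
    compressed = bitshuffle.compress_zstd(data, block_size=0, comp_lvl=5)

    # Shape, dtype and block_size must match those used for compression.
    restored = bitshuffle.decompress_zstd(compressed, data.shape, data.dtype,
                                          block_size=0)
    assert np.array_equal(data, restored)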
3 changes: 3 additions & 0 deletions bitshuffle/h5.pyx
@@ -14,6 +14,7 @@ Constants
H5FILTER : The Bitshuffle HDF5 filter integer identifier.
H5_COMPRESS_LZ4 : Filter option flag for LZ4 compression.
+H5_COMPRESS_ZSTD : Filter option flag for ZSTD compression.
Functions
=========
@@ -54,13 +55,15 @@ cdef extern from b"bshuf_h5filter.h":
    int bshuf_register_h5filter()
    int BSHUF_H5FILTER
    int BSHUF_H5_COMPRESS_LZ4
+    int BSHUF_H5_COMPRESS_ZSTD

cdef extern int init_filter(const char* libname)

cdef int LZF_FILTER = 32000

H5FILTER = BSHUF_H5FILTER
H5_COMPRESS_LZ4 = BSHUF_H5_COMPRESS_LZ4
+H5_COMPRESS_ZSTD = BSHUF_H5_COMPRESS_ZSTD

# Init HDF5 dynamic loading with HDF5 library used by h5py
if not sys.platform.startswith('win'):
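
A brief sketch (not part of the diff) of how the new constant is visible from
python: importing ``bitshuffle.h5`` registers the filter with the HDF5 library
used by h5py, after which the filter and the ZSTD flag can be inspected::

    import h5py
    import bitshuffle.h5

    # The Bitshuffle filter should be registered once bitshuffle.h5 is imported.
    assert h5py.h5z.filter_avail(bitshuffle.h5.H5FILTER)

    # The ZSTD flag goes in the filter options when creating datasets.
    print("ZSTD option flag:", bitshuffle.h5.H5_COMPRESS_ZSTD)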