hdf5 compression breaks on sparse series which have all values censored #2931
In pandas 0.10.0 I cannot use hdf5 compression when storing sparse series for which all values are "sparsified"/censored (i.e. the same as fill_value).
Compression works fine if there's at least one non-sparse value in each series of a sparse dataframe. And dataframes with series that are completely sparse can be stored in hdf5 without compression.
But combining the two, as in case 4 below, breaks:
    import pandas as pd
    import numpy as np

    # make sparse dataframe (integer sizes; float sizes such as 1e4 are rejected by newer NumPy)
    df = pd.DataFrame(np.random.binomial(n=1, p=.01, size=(10000, 100))).to_sparse(fill_value=0)

    # case 1: store uncompressed (works)
    store1 = pd.HDFStore('sparse_uncompressed.h5')
    store1['sparse_df'] = df
    store1.close()

    # case 2: store compressed (works)
    store2 = pd.HDFStore('sparse_compressed.h5', complib='zlib', complevel=9)
    store2['sparse_df'] = df
    store2.close()

    # set one series to be completely sparse
    df[0] = np.zeros(10000)

    # case 3: store df with completely sparse series uncompressed (works)
    store3 = pd.HDFStore('sparser_uncompressed.h5')
    store3['sparse_df'] = df
    store3.close()

    # case 4: try storing df with completely sparse series compressed (fails)
    store4 = pd.HDFStore('sparser_compressed.h5', complib='zlib', complevel=9)
    store4['sparse_df'] = df
    store4.close()
Case 4 fails with the following error:
ValueError: shape parameter cannot have zero-dimensions.
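
To illustrate why a completely sparse series is a special case, here is a minimal NumPy-only sketch. It does not reproduce pandas' internal sparse layout; `sparse_parts` is a hypothetical helper that just splits an array into its non-fill values and their positions. The assumption is that a sparse series stores only the values that differ from `fill_value`, so a series where every value equals `fill_value` leaves a zero-length array to store, which is presumably the shape the error above complains about.

```python
import numpy as np


def sparse_parts(values, fill_value=0):
    """Hypothetical helper: return the non-fill values and their positions."""
    values = np.asarray(values)
    mask = values != fill_value
    return values[mask], np.flatnonzero(mask)


# A series with a single non-fill value keeps one stored element.
mostly_sparse = np.zeros(10000)
mostly_sparse[42] = 1.0
stored_a, idx_a = sparse_parts(mostly_sparse)

# A series where everything equals fill_value keeps nothing at all.
all_sparse = np.zeros(10000)
stored_b, idx_b = sparse_parts(all_sparse)

print(stored_a.shape)  # (1,)
print(stored_b.shape)  # (0,)
```

If that assumption holds, cases 2 and 4 differ only in whether the stored-values array for some column is empty.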