hdf5 compression breaks on sparse series which have all values censored #2931

Closed
kforeman opened this issue Feb 25, 2013 · 3 comments

Comments

@kforeman
commented Feb 25, 2013

In pandas 0.10.0 I cannot use hdf5 compression when storing sparse series for which all values are "sparsified"/censored (i.e. the same as fill_value).

Compression works fine if there's at least one non-sparse value in each series of a sparse dataframe. And dataframes with series that are completely sparse can be stored in hdf5 without compression.

But combining the two, as in case 4 below, breaks:

import pandas as pd
import numpy as np

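# make a sparse dataframe (about 1% ones, everything else equal to fill_value=0)
# note: array sizes must be integers, not floats such as 1e4
df = pd.DataFrame(np.random.binomial(n=1, p=.01, size=(10000, 100))).to_sparse(fill_value=0)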

# case 1: store uncompressed (works)
store1 = pd.HDFStore('sparse_uncompressed.h5')
store1['sparse_df'] = df
store1.close()

# case 2: store compressed (works)
store2 = pd.HDFStore('sparse_compressed.h5', complib='zlib', complevel=9)
store2['sparse_df'] = df
store2.close()

# set one series to be completely sparse (every value equals fill_value)
df[0] = np.zeros(10000)

# case 3: store df with completely sparse series uncompressed (works)
store3 = pd.HDFStore('sparser_uncompressed.h5')
store3['sparse_df'] = df
store3.close()

# case 4: try storing df with completely sparse series compressed (fails)
store4 = pd.HDFStore('sparser_compressed.h5', complib='zlib', complevel=9)
store4['sparse_df'] = df
store4.close()

The resulting error comes from PyTables (the tables module):

ValueError: shape parameter cannot have zero-dimensions.
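
To illustrate why a completely sparse series is a special case, here is a toy model of sparse storage in plain Python. It is only a sketch, not pandas internals: `sparsify` and `fully_sparse_columns` are hypothetical helper names, and the link to the error above rests on the assumption that the compressed write path rejects zero-length arrays. The point is that a column whose values all equal fill_value has nothing left to store.

```python
# Toy model of sparse storage in plain Python; NOT pandas internals.
# sparsify and fully_sparse_columns are hypothetical helper names.

def sparsify(values, fill_value=0):
    """Return (positions, stored), keeping only entries that differ from fill_value."""
    positions = [i for i, v in enumerate(values) if v != fill_value]
    stored = [values[i] for i in positions]
    return positions, stored


def fully_sparse_columns(columns, fill_value=0):
    """Return the names of columns that have no stored (non-fill) values."""
    return [name for name, values in columns.items()
            if not sparsify(values, fill_value)[1]]


columns = {
    "a": [0, 0, 1, 0],  # one non-fill value, so one stored entry
    "b": [0, 0, 0, 0],  # every value censored, so nothing is stored
}

print(sparsify(columns["a"]))         # ([2], [1])
print(sparsify(columns["b"]))         # ([], [])
print(fully_sparse_columns(columns))  # ['b']
```

A check along these lines could be used to spot which columns are completely sparse before writing to a compressed store.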
@jreback

Contributor

commented Feb 26, 2013

Thanks for the heads up.

Please try PR #2933 and let me know if you run into any more issues.

@kforeman

Author

commented Feb 26, 2013

Thanks so much, patch works perfectly!

@kforeman kforeman closed this Feb 26, 2013

@kforeman kforeman reopened this Feb 26, 2013

@jreback

Contributor

commented Feb 26, 2013

FYI: I believe there is a bug in 0.10.1 where the fill values are incorrectly recorded on sparse series. I think 0.10 should be OK, and it is fixed for 0.11.

jreback added a commit that referenced this issue Feb 26, 2013
Merge pull request #2933 from jreback/pytables_2931
BUG: fixes issue in HDFStore w.r.t. compressed empty sparse series (GH #2931)

@jreback jreback closed this Feb 26, 2013
