hdf5 compression breaks on sparse series which have all values censored #2931

Closed
kforeman opened this issue Feb 25, 2013 · 3 comments

@kforeman

In pandas 0.10.0 I cannot use hdf5 compression when storing sparse series for which all values are "sparsified"/censored (i.e. the same as fill_value).

Compression works fine if there's at least one non-sparse value in each series of a sparse dataframe. Dataframes containing completely sparse series can also be stored in hdf5, as long as compression is off.

But combining the two, as in case 4 below, breaks:

import pandas as pd
import numpy as np

# make sparse dataframe
df = pd.DataFrame(np.random.binomial(n=1, p=.01, size=(10000, 100))).to_sparse(fill_value=0)

# case 1: store uncompressed (works)
store1 = pd.HDFStore('sparse_uncompressed.h5')
store1['sparse_df'] = df
store1.close()

# case 2: store compressed (works)
store2 = pd.HDFStore('sparse_compressed.h5', complib='zlib', complevel=9)
store2['sparse_df'] = df
store2.close()

# set one series to be completely sparse
df[0] = np.zeros(10000)

# case 3: store df with completely sparse series uncompressed (works)
store3 = pd.HDFStore('sparser_uncompressed.h5')
store3['sparse_df'] = df
store3.close()

# case 4: try storing df with completely sparse series compressed (fails)
store4 = pd.HDFStore('sparser_compressed.h5', complib='zlib', complevel=9)
store4['sparse_df'] = df
store4.close()

The resulting error comes from PyTables (the tables package):

ValueError: shape parameter cannot have zero-dimensions.
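
A plausible reading of this error, going by the title of the fix commit ("compressed empty sparse series"), is that a completely sparse column has no stored values left, so a zero-length array ends up being written with compression enabled. The toy sketch below only illustrates that idea. `sparsify` is a hypothetical helper invented for this example; it is not pandas or PyTables code and not the fix in PR #2933.

```python
# Illustrative toy model of sparse storage (an assumption, not pandas internals):
# only the entries that differ from fill_value are kept.
def sparsify(values, fill_value=0):
    """Return (indices, stored_values) for entries that differ from fill_value."""
    indices = [i for i, v in enumerate(values) if v != fill_value]
    stored = [values[i] for i in indices]
    return indices, stored


# A column with a few non-fill entries keeps a non-empty list of stored values.
mostly_sparse = [0, 0, 1, 0, 1]
print(sparsify(mostly_sparse))  # ([2, 4], [1, 1])

# A column made up entirely of fill_value keeps nothing at all,
# so the array that would be written has length 0.
fully_sparse = [0] * 5
indices, stored = sparsify(fully_sparse)
print(len(stored))  # 0
```

If that reading is right, case 4 differs from case 2 only in that one column's stored values have length 0, which matches the "zero-dimensions" wording of the ValueError.
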
@jreback
Contributor

jreback commented Feb 26, 2013

Thanks for the heads up.

Please try PR #2933 and let me know if there are any more issues.

@kforeman
Author

Thanks so much, patch works perfectly!

@kforeman kforeman reopened this Feb 26, 2013
@jreback
Contributor

jreback commented Feb 26, 2013

FYI: I believe there is a bug in 0.10.1 where the fill values are incorrectly recorded on sparse series.
0.10 should be OK, I think.
It is fixed for 0.11.

jreback added a commit that referenced this issue Feb 26, 2013
BUG: fixes issue in HDFStore w.r.t. compressed empty sparse series (GH #2931)
@jreback jreback closed this as completed Feb 26, 2013