BUG: DataFrame.to_hdf doesn't pass along min_itemsize for index #10381

Closed
TomAugspurger opened this Issue Jun 17, 2015 · 4 comments

4 participants
Contributor

TomAugspurger commented Jun 17, 2015

Unless I'm missing something, min_itemsize isn't being applied to the index here:

In [21]: df = DataFrame(dict(A = 'foo', B = 'bar'),index=range(5)).set_index("A")

In [22]: df.to_hdf('store.h5', 'test', format='table', min_itemsize={'index': 10})

In [23]: store = pd.HDFStore('store.h5')

In [24]: store.get_storer('test').table
Out[24]:
/test/table (Table(5,)) ''
  description := {
  "index": StringCol(itemsize=3, shape=(), dflt=b'', pos=0),   # <---- I think this should be 10
  "values_block_0": StringCol(itemsize=3, shape=(1,), dflt=b'', pos=1)}
  byteorder := 'irrelevant'
  chunkshape := (10922,)
  autoindex := True
  colindexes := {
    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False}
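
For context, here is a rough sketch of what min_itemsize is expected to do for a single string column: the stored width should be the larger of the longest encoded value and the requested minimum. `expected_itemsize` is a hypothetical helper written only for this illustration; it is not a pandas or PyTables function, and the real implementation may differ.

```python
def expected_itemsize(values, min_itemsize=None, encoding="utf-8"):
    """Width in bytes that a fixed-width string column would need.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # Longest value once encoded to bytes (0 for an empty column).
    longest = max((len(str(v).encode(encoding)) for v in values), default=0)
    if min_itemsize is None:
        return longest
    # The minimum can only widen the column, never truncate it.
    return max(longest, min_itemsize)


index_values = ["foo"] * 5
print(expected_itemsize(index_values))                   # 3, as in the output above
print(expected_itemsize(index_values, min_itemsize=10))  # 10, what the report expects
```

Under that reading, the itemsize=3 shown for "index" suggests the requested minimum never reached the code that sizes the column.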

Also, FYI, this raises (not sure whether it should work):

In [25]: df.index.name = 'theindex'

In [26]: df.to_hdf('store.h5', 'test2', format='table', min_itemsize={'theindex': 10})
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

Just a report right now... no time.

TomAugspurger added this to the Next Major Release milestone Jun 17, 2015

Contributor

TomAugspurger commented Jun 17, 2015

FYI, the workaround of appending through the store directly works. Something like

store.append('test', df, min_itemsize={'index': 30})

so it should just be a matter of passing along arguments.
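
To illustrate the "passing along arguments" hypothesis, here is a toy sketch of a convenience wrapper that forwards every keyword argument to the lower-level writer. `ToyStore` and `toy_to_hdf` are made-up stand-ins, not pandas classes or functions, and this does not claim to show where the actual bug lives.

```python
class ToyStore:
    """Stand-in for a store object; hypothetical, not pandas.HDFStore."""

    def __init__(self):
        self.calls = []

    def append(self, key, value, **kwargs):
        # Record what the low-level writer actually received.
        self.calls.append((key, kwargs))


def toy_to_hdf(store, key, value, **kwargs):
    """Hypothetical wrapper: forward all keyword arguments unchanged."""
    store.append(key, value, **kwargs)


store = ToyStore()
toy_to_hdf(store, "test", object(), format="table", min_itemsize={"index": 10})

# If forwarding works, the writer sees the same min_itemsize the caller gave.
print(store.calls[0][1]["min_itemsize"])
```

If a wrapper dropped or renamed the keyword on the way down, the recorded kwargs would lack min_itemsize, which is the symptom described in the report.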

Contributor

jreback commented Jun 17, 2015

I think maybe add to the docs a bit more here. This is like saying that the default for column A should be the same as B, which is not very explicit.

That said, it should work for min_itemsize=30 (i.e. a default applied to all object columns).
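
A small sketch of the two forms being discussed, assuming an integer acts as a default for every string column while a dict applies only to the columns it names. `resolve_min_itemsize` and the column names passed to it are hypothetical and only meant to make the distinction concrete; pandas' own handling may differ in detail.

```python
def resolve_min_itemsize(min_itemsize, string_columns):
    """Expand min_itemsize into an explicit per-column mapping.

    Hypothetical helper for illustration only; not part of pandas.
    """
    if min_itemsize is None:
        return {}
    if isinstance(min_itemsize, int):
        # A bare integer is the default for every string column.
        return {name: min_itemsize for name in string_columns}
    # A dict is explicit: only the named columns get a minimum.
    return dict(min_itemsize)


columns = ["index", "values_block_0"]
print(resolve_min_itemsize(30, columns))             # both columns get 30
print(resolve_min_itemsize({"index": 10}, columns))  # only "index" gets 10
```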

@DSLituiev DSLituiev added a commit to DSLituiev/pandas that referenced this issue Oct 19, 2015

@DSLituiev DSLituiev fixes #11364, #10381 6a7bf9f
Contributor

toobaz commented Oct 22, 2015

Notice that if I do

ddf = pd.DataFrame([['a', 'b', 1],
                    ['a', 'b', 2]],
                    columns=['A', 'B', 'C']).set_index(['A', 'B'])

and then

ddf['C'].to_hdf('/tmp/store.hdf', 'test',
          format="table",
          min_itemsize={'index' : 3})

(as far as I understand, the suggested workaround), I still get the error.

@toobaz toobaz added a commit to toobaz/pandas that referenced this issue Feb 7, 2016

@toobaz toobaz BUG: Ensure data_columns is always a list (i.e. min_itemsize can extend it)


closes #10381
0b06d76

toobaz referenced this issue Feb 7, 2016

Closed

Hdffixes #12248

@toobaz toobaz added a commit to toobaz/pandas that referenced this issue Feb 7, 2016

@toobaz toobaz BUG: Ensure data_columns is always a list (i.e. min_itemsize can extend it)


closes #10381
be81d09
Contributor

toobaz commented Dec 6, 2016

Just for the record: the bug isn't specific to to_hdf(); it affects any write in table format that doesn't (explicitly) append:

store.put('key', df, format='table', min_itemsize={'index' : 10})

fails in the same way.

I'm pushing a PR in a few seconds.

jreback closed this in e833096 Dec 11, 2016

@jorisvandenbossche jorisvandenbossche added a commit that referenced this issue Dec 15, 2016

@toobaz @jorisvandenbossche toobaz + jorisvandenbossche BUG: Apply min_itemsize to index even when not appending
closes #10381

Author: Pietro Battiston <me@pietrobattiston.it>

Closes #14812 from toobaz/to_hdf_min_itemsize and squashes the following commits:

c07f1e4 [Pietro Battiston] Whatsnew
38b8fcc [Pietro Battiston] Tests for previous commit
c838afa [Pietro Battiston] BUG: set min_itemsize even when there is no need to validate (#10381)

(cherry picked from commit e833096)
26920d1

@ischurov ischurov added a commit to ischurov/pandas that referenced this issue Dec 19, 2016

@toobaz @ischurov toobaz + ischurov BUG: Apply min_itemsize to index even when not appending
closes #10381

Author: Pietro Battiston <me@pietrobattiston.it>

Closes #14812 from toobaz/to_hdf_min_itemsize and squashes the following commits:

c07f1e4 [Pietro Battiston] Whatsnew
38b8fcc [Pietro Battiston] Tests for previous commit
c838afa [Pietro Battiston] BUG: set min_itemsize even when there is no need to validate (#10381)
9529ec0