BUG: DataFrame.to_hdf doesn't pass along min_itemsize for index #10381

Closed
TomAugspurger opened this issue Jun 17, 2015 · 4 comments · 4 participants

Contributor

TomAugspurger commented Jun 17, 2015

Unless I'm missing something, min_itemsize is ignored for the index:

In [21]: df = DataFrame(dict(A = 'foo', B = 'bar'),index=range(5)).set_index("A")

In [22]: df.to_hdf('store.h5', 'test', format='table', min_itemsize={'index': 10})

In [23]: store = pd.HDFStore('store.h5')

In [24]: store.get_storer('test').table
Out[24]:
/test/table (Table(5,)) ''
  description := {
  "index": StringCol(itemsize=3, shape=(), dflt=b'', pos=0),   # <---- I think this should be 10
  "values_block_0": StringCol(itemsize=3, shape=(1,), dflt=b'', pos=1)}
  byteorder := 'irrelevant'
  chunkshape := (10922,)
  autoindex := True
  colindexes := {
    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False}

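The expectation can be made concrete with a toy model (not pandas or PyTables code; all names here are hypothetical). It assumes the itemsize of a string column is the byte length of its longest UTF-8-encoded value, widened to any requested minimum, which matches the itemsize=3 shown for 'foo' above:

```python
def natural_itemsize(values, encoding="utf-8"):
    """Byte width of the longest encoded string in `values`."""
    return max((len(v.encode(encoding)) for v in values), default=0)


def chosen_itemsize(values, requested=None):
    """Hypothetical rule: as wide as the data, widened to `requested`."""
    return max(natural_itemsize(values), requested or 0)


def fits(value, itemsize, encoding="utf-8"):
    """Would `value` fit in a fixed-width column of `itemsize` bytes?"""
    return len(value.encode(encoding)) <= itemsize


index_values = ["foo"] * 5
print(chosen_itemsize(index_values))                # 3, what the dump shows
print(chosen_itemsize(index_values, requested=10))  # 10, what was asked for
print(fits("foobar", 3), fits("foobar", 10))        # False True
```

In this model the width matters beyond cosmetics: a column created at width 3 cannot later hold 'foobar', which is exactly the situation min_itemsize is meant to prevent when more rows are appended.
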
And FYI, this raises (not sure whether it should work):

In [25]: df.index.name = 'theindex'

In [26]: df.to_hdf('store.h5', 'test2', format='table', min_itemsize={'theindex': 10})
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

Just a report right now... no time.
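The ValueError for 'theindex' is consistent with the description dump above, where the index column is stored as "index" even though df.index.name was 'A'. A toy validator reproduces that behavior, assuming (not quoting pandas) that every dict key must name a column the table actually stores; validate_min_itemsize here is a hypothetical helper:

```python
def validate_min_itemsize(min_itemsize, stored_names):
    """Toy check: every dict key must name a column the table stores.

    A scalar min_itemsize names no column, so there is nothing to check.
    """
    if not isinstance(min_itemsize, dict):
        return None
    unknown = sorted(k for k in min_itemsize if k not in stored_names)
    if unknown:
        raise ValueError(f"min_itemsize keys {unknown} match no stored column")
    return None


# Assumed targetable names: just the index column, stored as "index"
# (no data_columns were requested in the example above).
stored = {"index"}
validate_min_itemsize({"index": 10}, stored)  # accepted

try:
    validate_min_itemsize({"theindex": 10}, stored)
except ValueError as exc:
    print(exc)  # min_itemsize keys ['theindex'] match no stored column
```
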

TomAugspurger added this to the Next Major Release milestone Jun 17, 2015

Contributor

TomAugspurger commented Jun 17, 2015

FYI, the solution here works. Something like:

store.append('test', df, min_itemsize={'index': 30})

So it should just be a matter of passing the arguments along.

Contributor

jreback commented Jun 17, 2015

I think we should add a bit more to the docs here. This is like saying that the default for column A should be the same as for B, which is not very explicit.

That said, it should work with min_itemsize=30 (i.e. a scalar sets the default for all object columns).
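The contrast between naming a column and passing a single number can be sketched as a toy resolver (hypothetical, not pandas' implementation). It assumes a dict only affects the columns it names, while a scalar applies to every string column, index included:

```python
def resolve_min_itemsize(column, min_itemsize):
    """Toy rule: minimum width requested for one string column."""
    if min_itemsize is None:
        return 0
    if isinstance(min_itemsize, dict):
        # Explicit per-column request; unnamed columns get no minimum.
        return min_itemsize.get(column, 0)
    # A single number acts as the default for every string column.
    return int(min_itemsize)


columns = ["index", "values_block_0"]
print({c: resolve_min_itemsize(c, {"index": 30}) for c in columns})
# {'index': 30, 'values_block_0': 0}
print({c: resolve_min_itemsize(c, 30) for c in columns})
# {'index': 30, 'values_block_0': 30}
```
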

Member

toobaz commented Oct 22, 2015

Notice that if I do

ddf = pd.DataFrame([['a', 'b', 1],
                    ['a', 'b', 2]],
                    columns=['A', 'B', 'C']).set_index(['A', 'B'])

and then

ddf['C'].to_hdf('/tmp/store.hdf', 'test',
                format="table",
                min_itemsize={'index': 3})

(which, as far as I understand, is the suggested workaround), I still get the error.
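One possible factor, offered as an assumption rather than a diagnosis: if table format stores each MultiIndex level as its own column (named A and B here), then a key spelled 'index' does not name any of the string columns. The toy below (hypothetical helpers, not pandas code) flattens the levels into named columns and checks which min_itemsize keys match:

```python
def level_columns(index_tuples, level_names):
    """Toy flattening: one column of values per MultiIndex level."""
    return {name: [t[i] for t in index_tuples] for i, name in enumerate(level_names)}


def matching_keys(min_itemsize, columns):
    """Which min_itemsize keys name an existing string column?"""
    return sorted(k for k in min_itemsize if k in columns)


cols = level_columns([("a", "b"), ("a", "b")], ["A", "B"])
print(cols)                                   # {'A': ['a', 'a'], 'B': ['b', 'b']}
print(matching_keys({"index": 3}, cols))      # []
print(matching_keys({"A": 3, "B": 3}, cols))  # ['A', 'B']
```
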

Member

toobaz commented Dec 6, 2016

Just for the record: the bug isn't specific to to_hdf(); it affects storing in table format without (explicitly) appending:

store.put('key', df, format='table', min_itemsize={'index': 10})

fails in the same way.

I'm pushing a PR in a few seconds.
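The reason to_hdf() and put() fail alike is routing. A toy dispatcher makes it explicit, assuming to_hdf() forwards its keyword arguments to HDFStore.append when append=True and to HDFStore.put otherwise; RecordingStore is a hypothetical stand-in that records calls instead of writing files:

```python
class RecordingStore:
    """Stand-in for a store: records which method was called, with what."""

    def __init__(self):
        self.calls = []

    def put(self, key, value, **kwargs):
        self.calls.append(("put", key, kwargs))

    def append(self, key, value, **kwargs):
        self.calls.append(("append", key, kwargs))


def to_hdf(store, key, value, append=False, **kwargs):
    """Toy front end: forward everything to put() or append()."""
    if append:
        store.append(key, value, **kwargs)
    else:
        store.put(key, value, **kwargs)


store = RecordingStore()
to_hdf(store, "test", None, format="table", min_itemsize={"index": 10})
to_hdf(store, "test", None, append=True, format="table", min_itemsize={"index": 10})
print([c[0] for c in store.calls])  # ['put', 'append']
```

Under this routing, a bug in how the put path handles min_itemsize shows up through to_hdf() whenever append is left at False.
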

jorisvandenbossche modified the milestones: 0.19.2, Next Major Release Dec 11, 2016

jreback closed this in e833096 Dec 11, 2016

jorisvandenbossche added a commit that referenced this issue Dec 15, 2016

BUG: Apply min_itemsize to index even when not appending
closes #10381

Author: Pietro Battiston <me@pietrobattiston.it>

Closes #14812 from toobaz/to_hdf_min_itemsize and squashes the following commits:

c07f1e4 [Pietro Battiston] Whatsnew
38b8fcc [Pietro Battiston] Tests for previous commit
c838afa [Pietro Battiston] BUG: set min_itemsize even when there is no need to validate (#10381)

(cherry picked from commit e833096)
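Going by the commit titles above ("Apply min_itemsize to index even when not appending", "set min_itemsize even when there is no need to validate"), the bug and the fix can be pictured with a toy width rule. This is an illustrative reconstruction from those titles, not the actual diff; build_index_width and its flags are hypothetical:

```python
def build_index_width(values, min_itemsize, validate, fixed):
    """Toy model of choosing the index column's width.

    validate: True when appending (assumed to be the only time the
              pre-fix code looked at min_itemsize).
    fixed:    False reproduces the reported bug, True the patched behavior.
    """
    width = max(len(v.encode("utf-8")) for v in values)
    requested = (min_itemsize or {}).get("index", 0)
    if validate or fixed:
        width = max(width, requested)
    return width


vals = ["foo"] * 5
req = {"index": 10}
print(build_index_width(vals, req, validate=True, fixed=False))   # 10: append path
print(build_index_width(vals, req, validate=False, fixed=False))  # 3: put / to_hdf path (bug)
print(build_index_width(vals, req, validate=False, fixed=True))   # 10: after the fix
```
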

ischurov added a commit to ischurov/pandas that referenced this issue Dec 19, 2016
