min_itemsize not working on MultiIndex columns for Series, with format="table" #11412

Closed
toobaz opened this Issue Oct 22, 2015 · 3 comments

Contributor

toobaz commented Oct 22, 2015

If I do

import pandas as pd

ddf = pd.DataFrame([['a', 'b', 1],
                    ['a', 'b', 2]],
                   columns=['A', 'B', 'C']).set_index(['A', 'B'])

and then

ddf['C'].to_hdf('/tmp/store.hdf', 'test',
          format="table",
          min_itemsize={'A' : 3})

I get the following:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-46-66f05c11146d> in <module>()
      1 ddf['C'].to_hdf('/tmp/store.hdf', 'test',
      2           format="table",
----> 3           min_itemsize={'A' : 3})

/usr/lib/python2.7/dist-packages/pandas/core/generic.pyc in to_hdf(self, path_or_buf, key, **kwargs)
    936 
    937         from pandas.io import pytables
--> 938         return pytables.to_hdf(path_or_buf, key, self, **kwargs)
    939 
    940     def to_msgpack(self, path_or_buf=None, **kwargs):

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in to_hdf(path_or_buf, key, value, mode, complevel, complib, append, **kwargs)
    268         with HDFStore(path_or_buf, mode=mode, complevel=complevel,
    269                        complib=complib) as store:
--> 270             f(store)
    271     else:
    272         f(path_or_buf)

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in <lambda>(store)
    263         f = lambda store: store.append(key, value, **kwargs)
    264     else:
--> 265         f = lambda store: store.put(key, value, **kwargs)
    266 
    267     if isinstance(path_or_buf, string_types):

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in put(self, key, value, format, append, **kwargs)
    825             format = get_option("io.hdf.default_format") or 'fixed'
    826         kwargs = self._validate_format(format, kwargs)
--> 827         self._write_to_group(key, value, append=append, **kwargs)
    828 
    829     def remove(self, key, where=None, start=None, stop=None):

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
   1263 
   1264         # write the object
-> 1265         s.write(obj=value, append=append, complib=complib, **kwargs)
   1266 
   1267         if s.is_table and index:

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in write(self, obj, **kwargs)
   4104         cols.append(name)
   4105         obj.columns = cols
-> 4106         return super(AppendableMultiSeriesTable, self).write(obj=obj, **kwargs)
   4107 
   4108 

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in write(self, obj, data_columns, **kwargs)
   4071             obj.columns = [name]
   4072         return super(AppendableSeriesTable, self).write(
-> 4073             obj=obj, data_columns=obj.columns, **kwargs)
   4074 
   4075     def read(self, columns=None, **kwargs):

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
   3769         self.create_axes(axes=axes, obj=obj, validate=append,
   3770                          min_itemsize=min_itemsize,
-> 3771                          **kwargs)
   3772 
   3773         for a in self.axes:

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   3371             axis, axis_labels = self.non_index_axes[0]
   3372             data_columns = self.validate_data_columns(
-> 3373                 data_columns, min_itemsize)
   3374             if len(data_columns):
   3375                 mgr = block_obj.reindex_axis(

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in validate_data_columns(self, data_columns, min_itemsize)
   3247 
   3248             existing_data_columns = set(data_columns)
-> 3249             data_columns.extend([
   3250                 k for k in min_itemsize.keys()
   3251                 if k != 'values' and k not in existing_data_columns

AttributeError: 'Index' object has no attribute 'extend'

Everything works, however, if I don't specify format="table", if I don't specify min_itemsize, or if I save a DataFrame (ddf[['C']]) rather than a Series.

Tested with up-to-date pandas from git and pytables 3.2.2-1.
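
The last two frames of the traceback suggest the mechanism: the Series writer passes data_columns=obj.columns (an Index), and validate_data_columns then calls .extend() on it, which only a list provides. The sketch below reproduces that failure mode without pandas or PyTables. FakeIndex and merge_data_columns are hypothetical names invented for illustration, and the list conversion is an assumption about one way to avoid the error, not the code that was merged into pandas.

```python
class FakeIndex:
    """Hypothetical stand-in for an Index-like object: iterable, but no .extend()."""

    def __init__(self, labels):
        self._labels = tuple(labels)

    def __iter__(self):
        return iter(self._labels)


def merge_data_columns(data_columns, min_itemsize):
    """Return data_columns plus any min_itemsize keys not already present.

    Hypothetical helper: mirrors the logic visible in the traceback, but
    copies the input into a plain list before extending it.
    """
    data_columns = list(data_columns) if data_columns is not None else []
    if isinstance(min_itemsize, dict):
        existing = set(data_columns)
        data_columns.extend(
            k for k in min_itemsize
            if k != 'values' and k not in existing
        )
    return data_columns


# For the Series in this report, the columns are the index levels plus the name.
columns = FakeIndex(['A', 'B', 'C'])

# Calling .extend() directly on the Index-like object fails, as in the traceback.
try:
    columns.extend(['index'])
except AttributeError as exc:
    print(exc)

# After conversion to a list, extending works for either key.
print(merge_data_columns(columns, {'A': 3}))      # ['A', 'B', 'C']
print(merge_data_columns(columns, {'index': 3}))  # ['A', 'B', 'C', 'index']
```

Because the failure occurs before the min_itemsize keys are even inspected, the choice of key ('A' or 'index') should make no difference to whether the AttributeError is raised.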

Contributor

jreback commented Oct 22, 2015

Dupe of #11364.

It's a bug; specify 'index' as the key to make it work.

jreback closed this Oct 22, 2015

jreback added the IO HDF5 label Oct 22, 2015

Contributor

toobaz commented Oct 22, 2015

Sorry for the dupe (and for the ridiculous bug title).

But that said,

ddf['C'].to_hdf('/tmp/store.hdf', 'test',
          format="table",
          min_itemsize={'index' : 3})

still gives exactly the same error.

toobaz changed the title from min_itemsize to min_itemsize not working on MultiIndex columns for Series, with format="table" Oct 22, 2015

Contributor

jreback commented Oct 22, 2015

You can post that as an example in the other issue, then. It's the same or a related bug.

@toobaz toobaz added a commit to toobaz/pandas that referenced this issue Nov 24, 2016

@toobaz toobaz BUG: Ensure min_itemsize is always a list
Closes #11412
7d5c9df

@toobaz toobaz added a commit to toobaz/pandas that referenced this issue Dec 5, 2016

@toobaz toobaz BUG: Ensure min_itemsize is always a list
Closes #11412
6406ee8

jreback added this to the 0.19.2 milestone Dec 5, 2016

@jreback jreback added a commit that referenced this issue Dec 5, 2016

@toobaz @jreback toobaz + jreback BUG: Ensure min_itemsize is always a list (#11412)
closes #11412

Author: Pietro Battiston <me@pietrobattiston.it>

Closes #14728 from toobaz/minitemsizefix and squashes the following commits:

e25cd1f [Pietro Battiston] Whatsnew
b9bb88f [Pietro Battiston] Tests for previous commit
6406ee8 [Pietro Battiston] BUG: Ensure min_itemsize is always a list
53bf1b2

@jorisvandenbossche jorisvandenbossche added a commit that referenced this issue Dec 15, 2016

@toobaz @jorisvandenbossche toobaz + jorisvandenbossche BUG: Ensure min_itemsize is always a list (#11412)
closes #11412

Author: Pietro Battiston <me@pietrobattiston.it>

Closes #14728 from toobaz/minitemsizefix and squashes the following commits:

e25cd1f [Pietro Battiston] Whatsnew
b9bb88f [Pietro Battiston] Tests for previous commit
6406ee8 [Pietro Battiston] BUG: Ensure min_itemsize is always a list

(cherry picked from commit 53bf1b2)
90e1922