Wrong error message in HDFStore.append when appending dataframe with None item in previously string-only column #16300

toobaz · 2017-05-09T14:08:39Z

Code Sample, a copy-pastable example if possible

In [2]: store = pd.HDFStore('teststore.h5', 'w')

In [3]: chunk = pd.DataFrame({'V1':['a','b','c','d','e'], 'data': range(5)})

In [4]: store.append('df', chunk)

In [5]: chunk = pd.DataFrame({'V1':[None, None], 'data': [3, 5]})

In [6]: store.append('df', chunk)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-d2de62391194> in <module>()
----> 1 store.append('df', chunk)

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in append(self, key, value, format, append, columns, dropna, **kwargs)
    966         kwargs = self._validate_format(format, kwargs)
    967         self._write_to_group(key, value, append=append, dropna=dropna,
--> 968                              **kwargs)
    969 
    970     def append_to_multiple(self, d, value, selector, data_columns=None,

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
   1311 
   1312         # write the object
-> 1313         s.write(obj=value, append=append, complib=complib, **kwargs)
   1314 
   1315         if s.is_table and index:

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
   3867         self.create_axes(axes=axes, obj=obj, validate=append,
   3868                          min_itemsize=min_itemsize,
-> 3869                          **kwargs)
   3870 
   3871         for a in self.axes:

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   3539                 self.values_axes.append(col)
   3540             except (NotImplementedError, ValueError, TypeError) as e:
-> 3541                 raise e
   3542             except Exception as detail:
   3543                 raise Exception(

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   3534                              encoding=self.encoding,
   3535                              info=self.info,
-> 3536                              **kwargs)
   3537                 col.set_pos(j)
   3538 

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in set_atom(self, block, block_items, existing_col, min_itemsize, nan_rep, info, encoding, **kwargs)
   1894                 min_itemsize,
   1895                 nan_rep,
-> 1896                 encoding)
   1897 
   1898         # set as a data block

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in set_atom_string(self, block, block_items, existing_col, min_itemsize, nan_rep, encoding)
   1941         # check for column in the values conflicts
   1942         if existing_col is not None:
-> 1943             eci = existing_col.validate_col(itemsize)
   1944             if eci > itemsize:
   1945                 itemsize = eci

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in validate_col(self, itemsize)
   1633                         "column but\nthis column has a limit of [%s]!\n"
   1634                         "Consider using min_itemsize to preset the sizes on "
-> 1635                         "these columns" % (itemsize, self.cname, c.itemsize))
   1636                 return c.itemsize
   1637 

ValueError: Trying to store a string with len [3] in [values_block_1] column but
this column has a limit of [1]!
Consider using min_itemsize to preset the sizes on these columns

Problem description

The error message is confusing, particularly when you are unaware that your data contains None: it reports a string of length 3 that appears nowhere in the data, and it names an internal block (values_block_1) rather than the actual column (V1).

Expected Output

Something analogous to

In [5]: chunk = pd.DataFrame({'V1':['f', 4.5], 'data': [3, 5]})

In [6]: store.append('df', chunk)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-d2de62391194> in <module>()
----> 1 store.append('df', chunk)

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in append(self, key, value, format, append, columns, dropna, **kwargs)
    966         kwargs = self._validate_format(format, kwargs)
    967         self._write_to_group(key, value, append=append, dropna=dropna,
--> 968                              **kwargs)
    969 
    970     def append_to_multiple(self, d, value, selector, data_columns=None,

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
   1311 
   1312         # write the object
-> 1313         s.write(obj=value, append=append, complib=complib, **kwargs)
   1314 
   1315         if s.is_table and index:

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
   3867         self.create_axes(axes=axes, obj=obj, validate=append,
   3868                          min_itemsize=min_itemsize,
-> 3869                          **kwargs)
   3870 
   3871         for a in self.axes:

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   3539                 self.values_axes.append(col)
   3540             except (NotImplementedError, ValueError, TypeError) as e:
-> 3541                 raise e
   3542             except Exception as detail:
   3543                 raise Exception(

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   3534                              encoding=self.encoding,
   3535                              info=self.info,
-> 3536                              **kwargs)
   3537                 col.set_pos(j)
   3538 

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in set_atom(self, block, block_items, existing_col, min_itemsize, nan_rep, info, encoding, **kwargs)
   1894                 min_itemsize,
   1895                 nan_rep,
-> 1896                 encoding)
   1897 
   1898         # set as a data block

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in set_atom_string(self, block, block_items, existing_col, min_itemsize, nan_rep, encoding)
   1926                         "Cannot serialize the column [%s] because\n"
   1927                         "its data contents are [%s] object dtype"
-> 1928                         % (item, inferred_type)
   1929                     )
   1930 

TypeError: Cannot serialize the column [V1] because
its data contents are [mixed] object dtype

Notice that the error is even different if you try with chunk = pd.DataFrame({'V1':[7, 4.5], 'data': [3, 5]}), which results in ValueError: invalid combinate of [values_axes] on appending data [name->values_block_1,cname->values_block_1,dtype->float64,kind->float,shape->(1, 2)] vs current table [name->values_block_1,cname->values_block_1,dtype->bytes8,kind->string,shape->None].
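The three chunks above fail in three different ways. As a rough illustration of what a uniform report could look like, here is a minimal stdlib-only sketch of a check a caller might run before appending. The helper name `describe_mismatch` and its message wording are hypothetical and are not part of pandas; it only inspects Python types and does not reproduce pandas' own dtype inference.

```python
def describe_mismatch(column, values):
    """Describe why `values` is not a purely string column.

    Hypothetical helper (not a pandas API). Returns None when every
    value is a str, otherwise a message that names the column.
    """
    present = [v for v in values if v is not None]
    kinds = sorted({type(v).__name__ for v in present})
    problems = []
    if len(present) < len(values):
        # None entries are what later get written as a NaN placeholder string.
        problems.append("%d missing value(s)" % (len(values) - len(present)))
    other = [k for k in kinds if k != "str"]
    if other:
        problems.append("non-string types %s" % other)
    if not problems:
        return None
    return "Column [%s] is not purely string: %s" % (column, " and ".join(problems))


# The three chunks from this report, plus the original string-only data.
print(describe_mismatch("V1", ["a", "b", "c", "d", "e"]))
print(describe_mismatch("V1", [None, None]))
print(describe_mismatch("V1", ["f", 4.5]))
print(describe_mismatch("V1", [7, 4.5]))
```

In all three failing cases the message refers to V1 rather than to an internal block name, which is the behaviour requested here.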

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.7.0-1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8

pandas: 0.21.0.dev+5.g82b9a172e
pytest: 3.0.6
pip: 9.0.1
setuptools: 33.1.1
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.18.1
xarray: 0.9.2
IPython: 5.1.0.dev
sphinx: 1.4.9
patsy: 0.3.0-dev
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.3.0.dev0
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: 3.7.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: 0.2.1


toobaz · 2017-05-09T14:09:38Z

This is similar to #12242 , but the cause seems different.

jreback · 2017-05-09T14:35:44Z

this is a fixed store, there is no concept of column names. that said, you might be able to provide a better error msg.

toobaz · 2017-05-09T14:47:15Z

> this is a fixed store, there is no concept of column names

What I mean is just that "the column [V1]" (see my last example) is informative, while "in [values_block_1] column" is not (regardless of what is in the store).

bryanfree66 · 2018-08-23T20:18:26Z

Working on it

bryanfree66 · 2018-08-24T18:16:46Z

From the comment above: "Notice that the error is even different if you try with chunk = pd.DataFrame({'V1':[7, 4.5], 'data': [3, 5]}), which results in ValueError: invalid combinate of [values_axes] on appending data [name->values_block_1,cname->values_block_1,dtype->float64,kind->float,shape->(1, 2)] vs current table [name->values_block_1,cname->values_block_1,dtype->bytes8,kind->string,shape->None]."

The two test cases:

chunk = pd.DataFrame({'V1':[None, None], 'data': [3, 5]})
chunk = pd.DataFrame({'V1':[7, 4.5], 'data': [3, 5]})

raise two different errors from two different locations. The first fails column validation in validate_col, because the None values are written as the string 'nan' (length 3) into a column whose item size is 1. The second fails table validation because of a data type mismatch (float64 versus the stored string column).

I'm not sure how to proceed with this.
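To make the first failure concrete, here is a minimal stdlib-only sketch of the size check, assuming (as described above) that missing values are replaced by the placeholder 'nan' before the width is measured. The function names `required_itemsize` and `check_itemsize` and the message wording are hypothetical; this is not the code in pandas/io/pytables.py, only an illustration of how the message could name the column and the cause.

```python
def required_itemsize(values, nan_rep="nan", encoding="utf-8"):
    """Widest encoded string once None is replaced by nan_rep.

    Assumes every non-missing value is already a str.
    """
    replaced = [nan_rep if v is None else v for v in values]
    return max((len(s.encode(encoding)) for s in replaced), default=0)


def check_itemsize(column, values, existing_itemsize, nan_rep="nan"):
    """Raise a ValueError naming the user-facing column (hypothetical message)."""
    needed = required_itemsize(values, nan_rep)
    if needed <= existing_itemsize:
        return
    hint = ""
    if any(v is None for v in values):
        # Point at the real cause: the placeholder, not the user's strings.
        hint = " (missing values are stored as %r)" % nan_rep
    raise ValueError(
        "Column [%s] needs itemsize [%d]%s but the stored column "
        "has a limit of [%d]" % (column, needed, hint, existing_itemsize)
    )


# First append: single-character strings fit in a width-1 column.
check_itemsize("V1", ["a", "b", "c", "d", "e"], existing_itemsize=1)

# Second append: None becomes 'nan', which needs width 3.
try:
    check_itemsize("V1", [None, None], existing_itemsize=1)
except ValueError as exc:
    print(exc)
```

This also explains the "len [3]" in the original message: it is the length of the placeholder, not of any string in the chunk.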

christopherkang · 2018-10-02T15:16:51Z

Were you able to make any progress with this?

jreback added the IO HDF5, Difficulty Novice, and Error Reporting labels May 9, 2017

jreback added this to the Next Major Release milestone May 9, 2017

TomAugspurger added the good first issue label Oct 11, 2017

jreback removed the Difficulty Novice label Dec 15, 2017

bryanfree66 mentioned this issue Aug 23, 2018

Wrong error message in HDFStore.append #22489

Closed

4 tasks

Alexandreae mentioned this issue Aug 13, 2019

Issues de sala de aula Insper/open-dev#45

Closed

jbrockmendel removed the Effort Low label Oct 21, 2019

mroeschke added the Enhancement label May 16, 2020

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
